Use GetLatestSnapshot in CleanupRecordExists#8594
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8594 +/- ##
==========================================
+ Coverage 87.99% 88.73% +0.74%
==========================================
Files 288 288
Lines 64382 64385 +3
Branches 8108 8109 +1
==========================================
+ Hits 56650 57133 +483
+ Misses 5381 4911 -470
+ Partials 2351 2341 -10 🚀 New features to boost your workflow:
|
| ScanKeyInit(&scanKey[0], Anum_pg_dist_cleanup_record_id, | ||
| BTEqualStrategyNumber, F_INT8EQ, Int64GetDatum(recordId)); | ||
|
|
||
| /* make sure we can always see deletion after acquiring the operation ID lock */ |
There was a problem hiding this comment.
nit: comment format strays from convention. Maybe add this text to the function comment at line 1179 ?
There was a problem hiding this comment.
Thanks for this PR, it looks like it fixes a real race condition by ensuring the catalog is read at a fresh snapshot. Is it possible to add a regress test ? An isolation test where the maintenance daemon contends over pg_dist_cleanup with a shard move/split operation, for example. The main concern is that without regress test(s), a future code change could compromise this fix.
Since cleanup does not block waiting for the lock, the isolation testing framework cannot intercept it. It would require injection points, but Citus does not use those yet. |
How did you encounter this bug ? Curious what prompted the PR. Another test approach is a superuser only GUC that reflects a millisec delay, and enables a test to create a race and assert there's only one drop. But that may be overkill for this PR - possibly requires a TAP test - so for this but can you update the function's comment with fix details before we merge it ? And also a note about a future test using Postgres injection points. It's worth emphasizing since comments are a useful (and in this case, the only) defense again a silent regression. |
Expand the function header to record:
* Why GetLatestSnapshot() is required (post-TryLockOperationId
visibility of a delete committed by the operation owner just
before the advisory lock was released);
* That this helper is not appropriate for callers that want their
own outer-transaction snapshot semantics;
* That no automated regression test exists today because cleanup
uses TryLockOperationId(dontWait=true), which pg_isolationtester
cannot interleave, and that adopting PostgreSQL's INJECTION_POINT()
infrastructure (or an equivalent test-only delay GUC) is required
before such a test can be written;
* That, until that test infrastructure exists, the GetLatestSnapshot()
call is load-bearing and must not be reverted to NULL.
This is a follow-up to the snapshot fix in the previous commit and
does not change behavior.
2d018eb to
3171fce
Compare
|
@microsoft-github-policy-service agree company="Snowflake" |
|
We found it because we use a similar technique in https://github.com/Snowflake-Labs/pg_lake/blob/main/pg_lake_engine/src/cleanup/in_progress_files.c for file cleanup in pg_lake, which happens at a much higher rate than shard moves making it easier to detect the bug. At some point we realized Citus might have a similar bug. |
DESCRIPTION: Prevent a possible race condition in shard cleanup
systable_beginscan without a snapshot calls GetCatalogSnapshot, which might be cached from a previous call. That means that if a shard move succeeded and the deletion of the cleanup record completed just before the cleaner tries to acquire a lock,
CleanupRecordExistsmight still see the record in its snapshot and shard cleanup will incorrectly proceed.