feat: support planning cleanup#7147
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
@claude review |
| async fn plan_with_referenced_branches( | ||
| &self, | ||
| referenced_branches: Vec<(String, u64)>, | ||
| ) -> Result<CleanupPlan> { | ||
| let latest_version = self.dataset.latest_version_id().await?; | ||
| self.plan_with_referenced_branches_at_version(referenced_branches, latest_version) | ||
| .await | ||
| } |
There was a problem hiding this comment.
🔴 plan.read_version is captured before manifest listing, causing spurious cleanup_with_plan rejections on commits-that-land-during-planning. plan_with_referenced_branches (cleanup.rs:289-296) resolves latest_version via latest_version_id() before listing manifests, but the listing-consistency guard (cleanup.rs:377-378) accepts max_listed_version >= latest_version and process_manifest_file treats any manifest with version >= latest_version as in-the-working-set and protects its references. So a commit V+1 landing during planning is absorbed into the plan — yet plan.read_version is still stamped with the older V, and validate_plan_read_version rejects with "plan was created from version V, but latest dataset version is V+1" even though no commit landed between plan completion and execute. Fix: set plan.read_version = inspection.max_listed_version (or re-resolve latest_version_id after listing) so the recorded read-version matches what the planner actually observed.
Extended reasoning...
What the bug is
plan_with_referenced_branches resolves latest_version from storage at the start of planning (cleanup.rs:293), then passes that value into plan_with_referenced_branches_at_version → process_manifests → process_manifest_file. While planning, two things happen with that captured latest_version:
process_manifest_file(cleanup.rs:414) computesis_latest = latest_version <= manifest.version. Any newly-listed manifest with a version>= latest_versionis treated as in the working set, and its referenced data/transaction/index/deletion files are added toinspection.referenced_files(i.e. protected from deletion).process_manifests(cleanup.rs:377-378) accepts the listing as consistent whenevermax_listed_version >= latest_version. The PR description explicitly calls this a guard against "eventual-consistency or racing list output".
Then build_cleanup_plan stamps plan.read_version = latest_version — the pre-listing snapshot, not what the planner actually observed.
The race window
There is real I/O between latest_version_id() (a store request via resolve_latest_location) and the start of list_manifest_locations. Suppose:
- Planner calls
latest_version_id()→ returns V (e.g. 2). - A concurrent writer commits → storage latest is now V+1 (3).
- Planner runs
list_manifest_locations. It returns v1, v2, and v3. - For v3:
is_latest = V (2) <= 3 = true→in_working_set = true→ v3's references are added toverified_files.max_listed_versionbecomes 3. - Listing-consistency check:
3 >= 2→ passes. build_cleanup_planwritesplan.read_version = 2.- Operator immediately calls
cleanup_with_plan(plan). No commit happens between step 6 and step 7. validate_plan_read_version(cleanup.rs:722) re-resolveslatest_version_id()→ 3.2 != 3→Error::Cleanup: "cleanup plan was created from version 2, but latest dataset version is 3".
The plan was internally consistent with V+1 — the planner saw V+1, treated it as live, and kept its files protected. The plan would have safely executed. The rejection is spurious.
Why existing code doesn't prevent it
The PR's rustdoc on cleanup_with_plan documents: "cleanup_with_plan will reject a plan if any commit lands between plan_cleanup and cleanup_with_plan on the same dataset." That contract is what validate_plan_read_version is implementing. But here the commit landed during plan_cleanup, not between plan and execute, and the planner already absorbed it. The strict-equality check immediately undoes the planner's TOCTOU tolerance.
The existing test internal_cleanup_plan_allows_toctou_commit_before_delete (added in this PR, lines around 2210) explicitly demonstrates that the author already considers "plan saw a newer manifest than read_version claims" to be a legal, safe state — that test has to call execute_plan_unchecked to skip the version check. The new public cleanup_with_plan API has no such escape hatch.
Impact
For the PR's stated motivation (operators planning cleanups on tables with 100k+ fragments in busy production environments), every concurrent commit that happens to land during the latest_version_id() → list_manifest_locations window forces a full re-plan, even though the planner already observed and accepted the newer state. The user is shown an error that says "a commit landed between plan and execute" when in fact no such commit happened. This is precisely the operational scenario the listing-consistency guard was designed to tolerate.
No data is at risk — the failure mode is rejection, not deletion of live files.
Fix
Set plan.read_version from what the planner actually observed rather than the pre-listing snapshot. Either:
// In build_cleanup_plan, instead of latest_version:
read_version: inspection.max_listed_version
.expect("process_manifests would have errored if no manifests were listed"),or re-resolve latest_version_id() after process_manifests returns and use that. inspection.max_listed_version is guaranteed Some here because process_manifests (cleanup.rs:387-390) errors out if it is None. With this change, the post-plan equality check in validate_plan_read_version accepts the V+1-absorbed plan, while still rejecting any genuine commit landing between plan completion and cleanup_with_plan.
| pub async fn cleanup_with_plan(dataset: &Dataset, plan: CleanupPlan) -> Result<RemovalStats> { | ||
| let cleanup = CleanupTask::new(dataset, plan.policy.clone()); | ||
| cleanup.execute_plan(plan).await | ||
| } |
There was a problem hiding this comment.
🔴 cleanup_with_plan silently skips the clean_referenced_branches cascade. When a caller builds a plan with CleanupPolicy { clean_referenced_branches: true, .. } and then runs cleanup_with_plan, child branches are never visited and the returned RemovalStats omits their contribution — even though the rustdoc on plan_cleanup_with_policy (dataset.rs:1208-1213) promises parity with cleanup_with_policy for the same policy. Fix by cascading on plan.referenced_branches inside execute_plan when policy.clean_referenced_branches is true (or rejecting/documenting that the plan path does not support that policy).
Extended reasoning...
What is broken
Dataset::cleanup_with_plan (dataset.rs:1230) calls cleanup::cleanup_with_plan → CleanupTask::execute_plan (cleanup.rs:356) → execute_plan_unchecked. execute_plan_unchecked only iterates plan.files and deletes the listed paths. The plan.referenced_branches field — which build_cleanup_plan (cleanup.rs:622-625) faithfully populates from the planning-time branch scan — is read by no code path during execution. By contrast, the equivalent direct entry point cleanup_old_versions → CleanupTask::run → run_at_version (cleanup.rs:255-267) explicitly does:
if self.policy.clean_referenced_branches {
let branch_stats = self.clean_referenced_branches(&referenced_branches).await?;
final_stats.bytes_removed += branch_stats.bytes_removed;
// ... accumulates the rest of branch_stats into final_stats
}clean_referenced_branches cascades into cleanup_cascade_branch for each child branch, which itself runs a full file cleanup against that branch. Skipping it means child-branch garbage is never collected.
Why nothing protects against this
plan_cleanup accepts any CleanupPolicy — including one with clean_referenced_branches=true — and produces a populated referenced_branches Vec. cleanup_with_plan validates the dataset URI, base path, file paths, and read_version, but does not inspect plan.policy.clean_referenced_branches. There is no warning, no error, no log line.
Impact
This is the exact use case the PR motivates: a dry-run + execute pair. An operator who currently relies on cleanup_old_versions(policy{clean_referenced_branches:true}) and wants to migrate to plan/execute will get silently divergent behaviour:
plan_cleanupreturns stats for the current branch only (theCleanupPlanStatsfield counts files inplan.files, which never includes child-branch files).cleanup_with_plandeletes only those files; child-branch storage continues to accumulate.- The returned
RemovalStatsis missing the branch contribution thatcleanup_old_versionswould have reported (and that the PR description explicitly highlights as a new behavior: "RemovalStatsreturned bycleanup_old_versionsnow includes the stats of cascadedclean_referenced_branchescleanups").
This directly contradicts the rustdoc on plan_cleanup_with_policy (dataset.rs:1208-1213): "The returned plan contains the concrete files that would be removed by Self::cleanup_with_policy for the same dataset snapshot and policy." For clean_referenced_branches=true, that statement is false.
Step-by-step proof
Setup: a dataset with one child branch feature/x, both with old versions eligible for cleanup. Policy: CleanupPolicy { before_timestamp: Some(now), clean_referenced_branches: true, .. }.
Path A — dataset.cleanup_with_policy(policy):
cleanup_old_versions→CleanupTask::run→run_at_version.- Line 259:
self.find_referenced_branches()returns[("feature/x", root_version)]. - Line 261: policy flag is true →
self.clean_referenced_branches(&referenced_branches)runs, which callscleanup_cascade_branchagainst branchfeature/x. Suppose that frees 500 MB across 30 data files. - Lines 262-267: those numbers are folded into
final_stats. - Main-branch plan is built and executed; suppose that frees 100 MB across 10 files.
- Returned
RemovalStats:bytes_removed = 600 MB,data_files_removed = 40.
Path B — let p = dataset.plan_cleanup_with_policy(policy).await?; dataset.cleanup_with_plan(p).await?:
plan_cleanup→CleanupTask::plan→plan_with_referenced_branches.find_referenced_branchesruns and the result is only attached toplan.referenced_branches(cleanup.rs:622-625).build_cleanup_plandoes not scan child-branch files.plan.filescontains only main-branch entries;plan.statsreflects only 100 MB / 10 files.cleanup_with_plan→execute_plan→validate_plan+validate_plan_read_version+execute_plan_unchecked.execute_plan_uncheckediteratesplan.files.plan.referenced_branchesis in scope but never read.- Returned
RemovalStats:bytes_removed = 100 MB,data_files_removed = 10. Branchfeature/xis untouched.
500 MB / 30 files silently leaked, no error, no log. Two API paths the rustdoc promises to be equivalent diverge.
Suggested fix
In execute_plan (cleanup.rs:356-360), before calling execute_plan_unchecked, check plan.policy.clean_referenced_branches. If true, build the Vec<(String, u64)> from plan.referenced_branches, call self.clean_referenced_branches(...), and fold those stats into the result returned by execute_plan_unchecked (mirroring run_at_version lines 261-267). Alternatively, plan_cleanup could reject policies with clean_referenced_branches=true and the rustdoc should be updated to call that out. The first option preserves the documented equivalence; the second is a smaller change but is a feature gap.
Background / Motivation
cleanup_old_versionstoday behaves as a black box: callers hand in a policyand either get a
RemovalStatsback or, on failure, a partially mutateddataset with no record of which files were inspected, kept, or deleted.
That opacity becomes painful in three scenarios we hit in production:
upcoming cleanup will remove (and how many bytes that frees) before
actually running it, especially on tables with 100k+ fragments where a
mistaken policy could remove tens of GB.
(commit hooks, schedulers), there is no artifact we can inspect afterwards
to answer "why did this file go away?". The
tracingaudit log helps, butonly if you were already capturing it.
execute on another (or in a maintenance window), which the current API
does not support at all.
This PR splits cleanup into an explicit plan and execute pair, while
keeping the existing
cleanup_old_versionsentry point byte-for-bytecompatible. The plan is a serializable description of every file the cleanup
intends to delete, the reason it qualifies, and the dataset snapshot it was
built from.
What's in this PR
plan_cleanup(&Dataset, CleanupPolicy) -> CleanupPlancleanup_with_plan(&Dataset, CleanupPlan) -> RemovalStatsCleanupPlan/CleanupFile/CleanupFileKind/CleanupFileReason/CleanupPlanStats/CleanupReferencedBranchdata types.CleanupTaskinto three explicit execution paths(
cleanup_old_versions/cleanup_with_plan/ commit hooks), each withits own trust model documented in-source.
cleanup_with_planvalidates: dataset URI, base path, that every plannedpath stays under the dataset base, and that the plan's
read_versionstill matches the storage-resolved latest version. A residual TOCTOU
window between the version check and the deletes is documented in the
rustdoc; callers running concurrent writers must serialize externally.
the in-memory dataset handle, so plans built from a stale handle are
still safe.
list_manifest_locationsdid not return the storage-resolved latest version (defends against
eventual-consistency or racing list output).
Behavior changes worth flagging
RemovalStatsreturned bycleanup_old_versionsnow includes the statsof cascaded
clean_referenced_branchescleanups. Previously those weresilently dropped. Monitoring/dashboards that compared against the old
numbers will see an increase.
cleanup_with_planwill reject a plan if any commit lands betweenplan_cleanupandcleanup_with_planon the same dataset. This is bydesign — see rustdoc. The internal
cleanup_old_versionspath isunaffected.
Tests
plan_cleanup_does_not_delete_filesplan_cleanup_uses_latest_version_with_stale_handlecleanup_with_plan_rejects_stale_versioncleanup_with_plan_rejects_toctou_commit_with_stale_handleinternal_cleanup_plan_allows_toctou_commit_before_deleteprocess_manifests_rejects_listing_missing_latest_versioncleanup_old_versions/cleanup_with_policytests continueto pass unmodified.