Skip to content

delete workflow in batches#256

Open
ianmacartney wants to merge 4 commits into
mainfrom
ian/batch-delete
Open

delete workflow in batches#256
ianmacartney wants to merge 4 commits into
mainfrom
ian/batch-delete

Conversation

@ianmacartney
Copy link
Copy Markdown
Member

@ianmacartney ianmacartney commented May 8, 2026

Make cleanup resilient to transaction limits

Fixes #255

Workflow cleanup now deletes steps in batches rather than collecting all steps at once. After each batch, the transaction budget is checked, and if it is mostly consumed, a scheduled continuation (cleanupContinue) is enqueued to pick up where deletion left off.

Share deleteStepsFrom

restartHandler and cleanup now both route through a single deleteStepsFrom helper, replacing the old deleteSteps function that operated on a pre-fetched slice.

All step deletion is batched

deleteStepBatch accumulates nested workflow IDs encountered during a batch and enqueues their cleanup in a single enqueueMutationBatch call rather than one enqueueMutation per step.

Copy link
Copy Markdown
Member Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented May 8, 2026

Open in StackBlitz

npm i https://pkg.pr.new/@convex-dev/workflow@256

commit: abd7483

@ianmacartney ianmacartney marked this pull request as ready for review May 12, 2026 19:29
@ianmacartney ianmacartney requested a review from reeceyang May 12, 2026 19:30
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR refactors workflow cleanup to be resilient for large workflows by implementing batched, transaction-budget-aware deletion. The cleanup function now calls deleteStepsFrom(ctx, workflowId, 0) which deletes steps in fixed-size batches, deletes associated events, and enqueues nested-workflow cleanup in batch. When transaction metrics indicate budget is mostly consumed, deletions pause and schedule an internal mutation cleanupContinue to resume later. restartHandler is updated to compute the correct starting step and reuse prefetched docs, and a test verifies cleanup of 100 steps with continuation draining.

Sequence Diagram

sequenceDiagram
  participant cleanup as cleanup()
  participant deleteStepsFrom as deleteStepsFrom()
  participant deleteStepBatch as deleteStepBatch()
  participant budget as transactionBudgetMostlyConsumed()
  participant cleanupContinue as cleanupContinue()
  
  cleanup->>deleteStepsFrom: deleteStepsFrom(ctx, workflowId, 0)
  loop Process batches
    deleteStepsFrom->>deleteStepBatch: fetch and delete next batch
    deleteStepBatch->>budget: check transaction budget
    alt Budget OK
      deleteStepBatch->>deleteStepBatch: delete steps, delete events, enqueue nested cleanup batch
      deleteStepBatch-->>deleteStepsFrom: continue to next batch
    else Budget exceeded
      deleteStepBatch->>deleteStepsFrom: return offset for resumption
      deleteStepsFrom->>cleanupContinue: schedule cleanupContinue(ctx, workflowId, offset)
      Note over cleanupContinue: Scheduled for next execution
      cleanupContinue->>deleteStepsFrom: resume deleteStepsFrom(ctx, workflowId, offset)
    end
  end
  deleteStepsFrom-->>cleanup: all steps deleted
Loading

Possibly related PRs

  • get-convex/workflow#205: Modifies src/component/workflow.ts's cleanup and step-deletion flow and introduces shared step-deletion helpers.

Suggested reviewers

  • reeceyang
  • sethconvex
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: implementing batched deletion for workflow cleanup instead of eager deletion.
Description check ✅ Passed The description is clearly related to the changeset, explaining how cleanup now handles transaction limits through batching and continuation.
Linked Issues check ✅ Passed The PR directly addresses issue #255 by implementing batched step deletion with transaction budget awareness to prevent read limit failures.
Out of Scope Changes check ✅ Passed All changes are in scope: batched deletion logic, new cleanupContinue mutation, refactored deleteStepsFrom helper, and test updates validate the fix.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ian/batch-delete

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/component/workflow.test.ts (1)

574-577: ⚡ Quick win

Increase test size to exercise batching behavior.

Using stepCount = 100 won’t cross the 256-delete batch boundary, so this test can pass without covering the new multi-batch cleanup path.

Simple update
-    const stepCount = 100;
+    const stepCount = 300;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/component/workflow.test.ts` around lines 574 - 577, The test uses
stepCount = 100 which doesn't trigger the multi-batch delete path; increase the
stepCount value to a number greater than the delete batch size (e.g., set
stepCount to 300) so the loop that calls ctx.db.insert("steps", ...) within the
t.run(async (ctx) => { ... }) block creates enough records to exercise the
256-delete batching behavior and validate the multi-batch cleanup logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/component/workflow.ts`:
- Line 314: deleteStepsFrom currently schedules cleanup via cleanupContinue and
can return before deletions finish, allowing restartHandler to increment
generation and restart while old steps remain; change deleteStepsFrom (or add a
new variant like deleteStepsFromAndWait) so it returns a Promise that resolves
only after cleanupContinue and all targeted step deletions complete, then update
restartHandler (and the other call sites using deleteStepsFrom) to await that
completed promise before incrementing the workflow generation and triggering the
restart to prevent stale steps from persisting.

---

Nitpick comments:
In `@src/component/workflow.test.ts`:
- Around line 574-577: The test uses stepCount = 100 which doesn't trigger the
multi-batch delete path; increase the stepCount value to a number greater than
the delete batch size (e.g., set stepCount to 300) so the loop that calls
ctx.db.insert("steps", ...) within the t.run(async (ctx) => { ... }) block
creates enough records to exercise the 256-delete batching behavior and validate
the multi-batch cleanup logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: get-convex/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 33725fd3-2e6d-4b66-9cfe-67a6f6b02e2b

📥 Commits

Reviewing files that changed from the base of the PR and between 56f74ee and 4547e51.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • src/component/workflow.test.ts
  • src/component/workflow.ts

Comment thread src/component/workflow.ts
fromStepNumber = found;
prefetched = visited;
}
await deleteStepsFrom(ctx, args.workflowId, fromStepNumber, prefetched);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Prevent deferred step deletion during restart.

deleteStepsFrom can schedule cleanupContinue and return before all target steps are deleted. restartHandler then increments generation and restarts immediately, which can leave stale steps from the old run and corrupt replay behavior.

Suggested direction
-export async function restartHandler(...) {
+export async function restartHandler(...) {
   ...
-  await deleteStepsFrom(ctx, args.workflowId, fromStepNumber, prefetched);
+  await deleteStepsFrom(ctx, args.workflowId, fromStepNumber, prefetched, {
+    allowContinuation: false,
+  });
   ...
 }
 
 async function deleteStepsFrom(
   ctx: MutationCtx,
   workflowId: Id<"workflows">,
   fromStepNumber: number,
   prefetched?: Doc<"steps">[],
+  opts: { allowContinuation?: boolean } = {},
 ) {
+  const allowContinuation = opts.allowContinuation ?? true;
   ...
-      if (await transactionBudgetMostlyConsumed(ctx)) {
+      if (allowContinuation && (await transactionBudgetMostlyConsumed(ctx))) {
         await ctx.scheduler.runAfter(0, internal.workflow.cleanupContinue, {
           workflowId,
           fromStepNumber,
         });
         return;
       }
   ...
-    if (await transactionBudgetMostlyConsumed(ctx)) {
+    if (allowContinuation && (await transactionBudgetMostlyConsumed(ctx))) {
       await ctx.scheduler.runAfter(0, internal.workflow.cleanupContinue, {
         workflowId,
         fromStepNumber: batch[batch.length - 1].stepNumber + 1,
       });
       return;
     }

Also applies to: 546-551, 565-570

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/component/workflow.ts` at line 314, deleteStepsFrom currently schedules
cleanup via cleanupContinue and can return before deletions finish, allowing
restartHandler to increment generation and restart while old steps remain;
change deleteStepsFrom (or add a new variant like deleteStepsFromAndWait) so it
returns a Promise that resolves only after cleanupContinue and all targeted step
deletions complete, then update restartHandler (and the other call sites using
deleteStepsFrom) to await that completed promise before incrementing the
workflow generation and triggering the restart to prevent stale steps from
persisting.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/component/workflow.ts (1)

316-316: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Keep restart on a synchronous deletion path.

deleteStepsFrom can enqueue cleanupContinue and return before the target range is actually deleted. restartHandler then bumps generationNumber and starts the workflow again, so stale steps from the previous run can survive into the restarted execution.

Suggested direction
 export async function restartHandler(
   ctx: MutationCtx,
   args: Infer<typeof restartArgs>,
 ) {
   ...
-  await deleteStepsFrom(ctx, args.workflowId, fromStepNumber, prefetched);
+  await deleteStepsFrom(ctx, args.workflowId, fromStepNumber, prefetched, {
+    allowContinuation: false,
+  });
   ...
 }

 async function deleteStepsFrom(
   ctx: MutationCtx,
   workflowId: Id<"workflows">,
   fromStepNumber: number,
   prefetched?: Doc<"steps">[],
+  opts: { allowContinuation?: boolean } = {},
 ) {
+  const allowContinuation = opts.allowContinuation ?? true;
   ...
-    if (await transactionBudgetMostlyConsumed(ctx)) {
+    if (allowContinuation && (await transactionBudgetMostlyConsumed(ctx))) {
       await ctx.scheduler.runAfter(0, internal.workflow.cleanupContinue, {
         workflowId,
         fromStepNumber,
       });
       return;
     }
   }
 }

Also applies to: 538-570

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/component/workflow.ts` at line 316, The call to deleteStepsFrom can
return before the deletion finishes (it may enqueue cleanupContinue), allowing
restartHandler to increment generationNumber and restart the workflow while
stale steps remain; modify the restart path to ensure deletions complete
synchronously before restarting by either (A) adding/using a synchronous
deletion API or awaiting the completion signal from deleteStepsFrom (e.g., wait
for its cleanupContinue work to finish), or (B) change deleteStepsFrom to return
a Promise that resolves only after the target steps are fully removed and have
been persisted, then have restartHandler (which updates generationNumber and
restarts the workflow) await that Promise so no stale steps survive into the
restarted execution; update calls at the shown locations (the deleteStepsFrom
invocation at this diff and the other occurrences around lines 538-570) to use
the synchronous/awaiting behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/component/workflow.ts`:
- Line 316: The call to deleteStepsFrom can return before the deletion finishes
(it may enqueue cleanupContinue), allowing restartHandler to increment
generationNumber and restart the workflow while stale steps remain; modify the
restart path to ensure deletions complete synchronously before restarting by
either (A) adding/using a synchronous deletion API or awaiting the completion
signal from deleteStepsFrom (e.g., wait for its cleanupContinue work to finish),
or (B) change deleteStepsFrom to return a Promise that resolves only after the
target steps are fully removed and have been persisted, then have restartHandler
(which updates generationNumber and restarts the workflow) await that Promise so
no stale steps survive into the restarted execution; update calls at the shown
locations (the deleteStepsFrom invocation at this diff and the other occurrences
around lines 538-570) to use the synchronous/awaiting behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: get-convex/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7a887a95-9e70-494c-a09e-180ebdebc2db

📥 Commits

Reviewing files that changed from the base of the PR and between 4547e51 and abd7483.

📒 Files selected for processing (1)
  • src/component/workflow.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Impossible to cleanup workflow with too many steps(+events)

1 participant