# Backlog: Refactor recovery process and parallelize log replay

## Summary

Recovery has accumulated timestamp-boundary logic, checkpoint bootstrap handling, hot MemTree rebuild behavior, and sequential redo replay in one coordinator. Add a follow-up to clean up the recovery code structure and to evaluate or implement parallel log replay without regressing catalog or user-table replay correctness.

## Reference

Raised during task 000120 while wiring dual-tree secondary-index recovery. Discussion clarified LogRecovery state such as `catalog_replay_start_ts`, `replay_floor`, `max_recovered_cts`, `table_states`, and `recovered_tables`, and showed that recovery needs clearer structure before more concurrency is added.
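
To make the boundary semantics concrete, here is a minimal sketch of how that state could be grouped. The field names come from the discussion above; the types, the per-table grouping, and the `above_floor` helper are illustrative assumptions, not the actual doradb-storage definitions.

```rust
use std::collections::{HashMap, HashSet};

// Assumed aliases; the real commit-timestamp and table-id types may differ.
type TrxTs = u64;
type TableId = u64;

// Hypothetical per-table boundaries: heap redo start and deletion cutoff.
struct TableReplayState {
    heap_redo_start_ts: TrxTs,
    delete_cutoff_ts: TrxTs,
}

// Hypothetical grouping of the LogRecovery state named in the discussion.
struct LogRecovery {
    catalog_replay_start_ts: TrxTs, // where catalog replay begins
    replay_floor: TrxTs,            // global lower bound for replay
    max_recovered_cts: TrxTs,       // global commit-timestamp watermark
    table_states: HashMap<TableId, TableReplayState>,
    recovered_tables: HashSet<TableId>,
}

impl LogRecovery {
    // Illustrative helper: a record below the global floor is never replayed.
    fn above_floor(&self, cts: TrxTs) -> bool {
        cts >= self.replay_floor
    }
}
```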

## Deferred From (Optional)

docs/tasks/000120-secondary-index-runtime-access-and-recovery.md

## Deferral Context (Optional)

- Defer Reason: Task 000120 is scoped to secondary-index runtime access and recovery cutover. Broad recovery cleanup and replay parallelization would widen the task beyond dual-tree integration and risk delaying the current feature.
- Findings: Current recovery tracks multiple replay boundaries: the catalog checkpoint replay start, per-table heap redo starts, deletion cutoffs, a global replay floor, and the max recovered CTS. Checkpointed DiskTree secondary-index roots should remain cold state while recovery rebuilds only hot MemTree rows. DDL acts as a pipeline breaker today, and DML replay is still sequential, with a TODO for dispatching work to multiple threads.
- Direction Hint: Start with documentation and small structural cleanup so that timestamp-boundary semantics remain explicit. Then plan parallel replay around DDL barriers, catalog-table serialization, user-table independence, and row-page conflict grouping. Preserve the max recovered CTS as a global timestamp watermark, even for skipped log records.
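
The watermark rule in the direction hint can be sketched as follows. `ReplayCursor` and `observe` are hypothetical names; only the invariant (every observed record advances the watermark, replayed or skipped) comes from the discussion.

```rust
type TrxTs = u64; // assumed commit-timestamp type

// Hypothetical cursor holding the global max-recovered-CTS watermark.
struct ReplayCursor {
    max_recovered_cts: TrxTs,
}

impl ReplayCursor {
    // Advance the watermark for every record seen, even one that is
    // skipped (e.g. below a per-table redo start), so timestamp
    // allocation after recovery can never reuse a committed CTS.
    fn observe(&mut self, record_cts: TrxTs, replayed: bool) {
        let _ = replayed; // skipping replay must not skip this update
        self.max_recovered_cts = self.max_recovered_cts.max(record_cts);
    }
}
```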

## Scope Hint

Audit the trx recovery flow and, where useful, separate the checkpoint bootstrap, DDL and catalog replay, user-table DML replay, index rebuild, and page refresh responsibilities. Then design parallel DML or log replay boundaries with deterministic ordering around DDL pipeline breakers and per-table or per-page conflicts.
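
As one possible shape for those boundaries: the sketch below partitions a redo stream into batches of per-table work, treating each DDL record as a pipeline breaker that closes the current batch. The record shapes and names are assumptions for illustration; real conflict grouping would also need per-page ordering within a table.

```rust
use std::collections::HashMap;

// Simplified log records: DML carries its table and row page; DDL is a barrier.
#[derive(Clone)]
enum LogRecord {
    Dml { table_id: u64, page_id: u64 },
    Ddl,
}

// Partition a redo stream into parallel-safe batches: DML records are
// grouped by table (independent tables can replay concurrently), and a
// DDL record acts as a pipeline breaker that closes the current batch.
// DDL itself would replay serially between batches (not modeled here).
fn plan_batches(records: &[LogRecord]) -> Vec<HashMap<u64, Vec<LogRecord>>> {
    let mut batches = Vec::new();
    let mut current: HashMap<u64, Vec<LogRecord>> = HashMap::new();
    for rec in records {
        match rec {
            LogRecord::Dml { table_id, .. } => {
                current.entry(*table_id).or_default().push(rec.clone());
            }
            LogRecord::Ddl => {
                if !current.is_empty() {
                    batches.push(std::mem::take(&mut current));
                }
            }
        }
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```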

## Acceptance Hint

Recovery code has clearer ownership boundaries and comments. Tests cover catalog replay boundaries, per-table heap and delete cutoffs, hot secondary-index rebuilds, and parallel replay ordering. The full doradb-storage nextest suite passes with no timestamp-reuse or index recovery regressions.

## Notes (Optional)

Consider whether a full RFC is needed before implementation, because parallel replay touches recovery ordering, error propagation, and replay determinism.

## Close Reason (Added When Closed)

When a backlog item is moved to `docs/backlogs/closed/`, append:

```md
## Close Reason

- Type: <implemented|stale|replaced|duplicate|wontfix|already-implemented|other>
- Detail: <reason detail>
- Closed By: <backlog close>
- Reference: <task/issue/pr reference>
- Closed At: <YYYY-MM-DD>
```