
chore: DiskTree Prefix Compression #566

Merged
jiangzhe merged 8 commits into main from btree-fences
Apr 18, 2026



@jiangzhe
Owner

@jiangzhe jiangzhe commented Apr 18, 2026

Closes #565

Summary by CodeRabbit

  • Refactor
    • Reorganized B‑Tree and storage internals to centralize packing/merge logic and tighten node interfaces.
  • New Features
    • Smarter sibling packing and merge planning with explicit fence handling and range-based node rebuilding for denser, safer blocks.
  • Bug Fixes
    • Fixed edge cases in packing/merge and rewrite flows that could cause incorrect repacks or failed rewrites.
  • Tests
    • Expanded tests for fence selection, space estimation, merge outcomes, rewrite/absorption, and multi-range packing.
  • Chores
    • Removed legacy convenience scan helpers; tests and tooling now use the encoded scan API.

@jiangzhe jiangzhe self-assigned this Apr 18, 2026
@coderabbitai

coderabbitai bot commented Apr 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

Adds a new crate-internal B‑Tree packing/planning module and integrates it across B‑tree and DiskTree: introduces packed-node space estimation, pack/plan APIs, tightens node helper visibility, restructures B‑Tree submodules and re-exports, rewrites DiskTree rewrite/packing to use deferred materialization and the new packing helpers, and updates callsites/tests to the new APIs.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| B-Tree Packing Algorithm | `doradb-storage/src/index/btree/algo.rs` | New module implementing packed entry types, a fence-aware space estimator, sibling-plan algorithms, packers for ranges/multi-ranges, a MemTree sibling-merge planner, and unit tests. |
| B-Tree Module Reorg & Re-exports | `doradb-storage/src/index/btree/mod.rs`, `doradb-storage/src/index/mod.rs` | Added algo and hint modules, reorganized/re-exported btree submodules, removed legacy btree_* re-exports at the index root; callsites updated to the new exports. |
| Node API & Packed-Node Estimator | `doradb-storage/src/index/btree/node.rs` | Narrowed many BTreeNode helper visibilities to pub(in crate::index) and added the PackedNodeSpace estimator and entry-space accounting helpers used by the planner/packers. |
| DiskTree Rewrite & Packing Integration | `doradb-storage/src/index/disk_tree.rs` | Reworked rewrite to deferred RewriteEntry/PendingRewriteEntry, replaced eager packing with plan_sibling_node + pack_* routines, added allocation guard/rollback and sibling absorption logic, and updated tests for fence propagation and rollback. |
| Packing/Pack Callsites | `doradb-storage/src/index/btree/mod.rs`, `.../merge/compaction` (callers) | Replaced manual node init/copy/update_hints with pack_node_range_into/box and pack_node_ranges_box, and use plan_memtree_sibling_merge for compaction/merge decisions; updated separator deletion encoding width usage. |
| Imports & Callsite Adjustments | `doradb-storage/src/index/btree/hint.rs`, `.../scan.rs`, `.../non_unique_index.rs`, `.../secondary_index.rs`, `.../unique_index.rs`, `.../table/tests.rs`, `.../trx/recover.rs` | Updated use paths to consume consolidated crate::index::btree exports; tests and callsites migrated to encoded APIs (prefix_scan_entries, encoded batch put/insert) and adjusted helpers. |
| Key Encoder API | `doradb-storage/src/index/btree/key.rs` | Removed the KeyEncoder::encode_extend method and its implementations; tests updated to use encode_copy and est_encode_len. |
| Small Import Fix | `doradb-storage/src/index/btree/hint.rs` | Adjusted the KeyHeadInt import path and conditional compilation for the scalar fallback. |
| Tests & Assertions | `doradb-storage/src/index/*`, `trx/*` tests | Updated tests to the new encoded batch/scan APIs and validated packing/merge/fence behaviors; added tests for partial/full-merge branches and packing equivalence. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant DiskTree
  participant BTreeAlgo
  participant BTreeNode
  participant MemTree

  DiskTree->>BTreeAlgo: request plan(entries, lower/upper fences, min_slots)
  BTreeAlgo->>BTreeNode: estimate packed space with fences & prefix logic
  BTreeNode-->>BTreeAlgo: space estimation result
  BTreeAlgo->>DiskTree: return plan (Full / Partial{right_count} / FenceOutOfSpace / NoProgress)

  alt Full or Partial
    DiskTree->>BTreeAlgo: request pack(KnownFenceNodeParams, ranges/entries)
    BTreeAlgo->>BTreeNode: init boxed node and copy slots (pack ranges)
    BTreeNode-->>DiskTree: packed node box
    DiskTree->>MemTree: apply merge/purge or update separator
  else FenceOutOfSpace / NoProgress
    DiskTree->>MemTree: handle skip/abandon or defer
  end
```
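To make the four plan outcomes in the diagram concrete, here is a minimal greedy sketch. It assumes a toy model in which every entry and fence costs a fixed number of bytes, prefix compression is ignored, and the upper fence stays fixed even when the node is split; `SiblingPlan`, `plan_sibling` and `USABLE` are hypothetical names, not the crate's `plan_sibling_node` API, and the 64-byte page is for illustration only.

```rust
/// Outcome of planning how many entries fit into one packed node
/// (hypothetical mirror of the outcomes named in the diagram).
#[derive(Debug, PartialEq, Eq)]
pub enum SiblingPlan {
    /// Every entry fits alongside both fences.
    Full,
    /// Only a leading run fits; `right_count` entries remain for the right node.
    Partial { right_count: usize },
    /// The fences alone exceed the usable page space.
    FenceOutOfSpace,
    /// Fewer than `min_slots` entries fit, so packing would not make progress.
    NoProgress,
}

/// Assumed usable bytes per node (illustrative, not the crate's constant).
pub const USABLE: usize = 64;

/// Greedy left-to-right plan over entry sizes in bytes.
pub fn plan_sibling(
    entry_sizes: &[usize],
    lower_fence: usize,
    upper_fence: usize,
    min_slots: usize,
) -> SiblingPlan {
    let fence_space = lower_fence + upper_fence;
    if fence_space > USABLE {
        return SiblingPlan::FenceOutOfSpace;
    }
    let mut used = fence_space;
    let mut fitted = 0;
    for &size in entry_sizes {
        if used + size > USABLE {
            break; // the next entry would overflow the page
        }
        used += size;
        fitted += 1;
    }
    if fitted == entry_sizes.len() {
        SiblingPlan::Full
    } else if fitted < min_slots {
        SiblingPlan::NoProgress
    } else {
        SiblingPlan::Partial { right_count: entry_sizes.len() - fitted }
    }
}
```

Under this model, DiskTree would call the packer only for `Full` or `Partial`, matching the `alt` branch of the diagram, and skip or defer on the other two outcomes.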

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I nibble prefixes, counting bytes at night,
I tuck the siblings snug and tight,
With fences set and pockets neat,
I hop the slots until they meet,
A tiny hop — the tree sleeps light.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'chore: DiskTree Prefix Compression' directly describes the main change: implementing prefix compression in the DiskTree module, which aligns with the core refactoring of prefix fence handling and space estimation. |
| Linked Issues check | ✅ Passed | The PR closes issue #565, which is a task for DiskTree prefix compression. The changes implement comprehensive prefix fence compression support through new B+Tree node packing utilities, space estimation, merge planning, and DiskTree rewrites using these mechanisms. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: new algo module for node packing/merging, modified hint/scan imports, updated disk_tree with deferred rewrite blocks and sibling absorption, removal of convenience wrapper APIs (UniqueDiskTreePut, NonUniqueDiskTreeDelete, etc.), module restructuring under btree/, and test updates to use new encoded APIs. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |




@codecov

codecov bot commented Apr 18, 2026

Codecov Report

❌ Patch coverage is 96.95148% with 71 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.18%. Comparing base (779e572) to head (eac6856).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| doradb-storage/src/index/disk_tree.rs | 95.70% | 44 Missing ⚠️ |
| doradb-storage/src/index/btree/algo.rs | 97.79% | 13 Missing ⚠️ |
| doradb-storage/src/index/btree/mod.rs | 97.98% | 8 Missing ⚠️ |
| doradb-storage/src/index/btree/node.rs | 94.25% | 5 Missing ⚠️ |
| doradb-storage/src/table/tests.rs | 98.59% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #566      +/-   ##
==========================================
+ Coverage   91.72%   92.18%   +0.45%
==========================================
  Files         102      103       +1
  Lines       56084    57525    +1441
==========================================
+ Hits        51445    53029    +1584
+ Misses       4639     4496     -143
```
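The report's numbers are internally consistent, as a quick check confirms: hits plus misses equal the line totals, the per-file missing lines (44 + 13 + 8 + 5 + 1) sum to the 71 quoted in the summary, and the percentages are reproduced when hits / lines is truncated, not rounded, to two decimals. For example, the base ratio is about 91.728%, which is shown as 91.72%. The helpers below are an illustrative sketch, not Codecov's implementation.

```rust
/// Raw line coverage in percent: hits / lines * 100.
pub fn coverage_pct(hits: u64, lines: u64) -> f64 {
    hits as f64 * 100.0 / lines as f64
}

/// Truncate (floor) a non-negative percentage to two decimal places.
pub fn truncate2(pct: f64) -> f64 {
    (pct * 100.0).floor() / 100.0
}
```

Applying `truncate2` to the difference of the raw head and base percentages (about 0.456) also reproduces the reported +0.45%.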



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
doradb-storage/src/index/btree/algo.rs (2)

451-475: Test name vs. covered scenario is slightly misleading.

test_pack_sibling_node_accounts_for_upper_fence_capacity actually exercises the path where the oversized lower_fence alone exhausts the page, and rightmost_count == 0 with min_slots > 0 forces the finite-fence loop to bail out with InvalidArgument. The upper fence is never considered, because the loop range 1..1 is empty. Consider renaming the test to reflect the actual scenario (e.g. test_pack_sibling_node_rejects_when_fences_exceed_page) or extending it with a second entry so the upper-fence capacity path is genuinely exercised.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doradb-storage/src/index/btree/algo.rs` around lines 451 - 475, The test name
misrepresents what it covers: update the test function
test_pack_sibling_node_accounts_for_upper_fence_capacity to either (A) rename it
to reflect the actual behavior (e.g.
test_pack_sibling_node_rejects_when_fences_exceed_page) or (B) change the setup
so the upper-fence path is exercised by adding a second entry (create a second
PackedNodeEntry and adjust entries and assertions) so that pack_sibling_node's
finite-fence loop (the branch that considers the upper fence) is actually
entered; keep references to pack_sibling_node, PackedNodeParams,
lower_fence/rightmost_count/min_slots and assert the expected
Error::InvalidArgument outcome when appropriate.

130-172: Minor: fences_fit is conservative, and one branch appears unreachable.

Two observations on the inner loop:

  1. fences_fit (via SpaceEstimation::new) counts prefix_len bytes unconditionally, whereas PackedNodeSpace::total_space() correctly accounts for the inline threshold via common_prefix_body_space, which returns 0 when prefix_len <= INLINE_PREFIX_LEN (16 bytes). When the prefix is inline, fences_fit can over-estimate by up to 16 bytes and trigger the early break at line 137 sooner than necessary. Since PackedNodeSpace::total_space() already models the correct layout, consider replacing the fences_fit guard at line 135 with a check against space.total_space() instead to eliminate the discrepancy.

  2. The condition included_count > packed at line 142 is unreachable. After each iteration (whether via the recompute path at line 151 or the incremental path at line 160), included_count equals packed. On the next iteration, packed increments strictly while included_count is not modified until one of those paths executes again, so included_count can never exceed packed.
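The inline-threshold point in item 1 can be illustrated with a small sketch. It assumes, as the comment states, that a common prefix of at most 16 bytes (INLINE_PREFIX_LEN) lives in the node header and costs no body bytes. It further assumes that a longer prefix is charged in full. `conservative_prefix_space`, `over_estimate` and `fits` are hypothetical helpers, not the crate's SpaceEstimation or PackedNodeSpace. The `fits` check uses checked_add, as the suggested fix does, so an overflowing sum counts as "does not fit".

```rust
/// Inline threshold taken from the review comment (16 bytes).
pub const INLINE_PREFIX_LEN: usize = 16;

/// Prefix-aware accounting: an inline prefix costs no extra body bytes.
/// (Charging a longer prefix in full is an assumption of this sketch.)
pub fn common_prefix_body_space(prefix_len: usize) -> usize {
    if prefix_len <= INLINE_PREFIX_LEN { 0 } else { prefix_len }
}

/// Conservative accounting that always charges the prefix to the body.
pub fn conservative_prefix_space(prefix_len: usize) -> usize {
    prefix_len
}

/// Bytes by which the conservative estimate exceeds the prefix-aware one.
pub fn over_estimate(prefix_len: usize) -> usize {
    conservative_prefix_space(prefix_len) - common_prefix_body_space(prefix_len)
}

/// Fit check: header/fences + prefix body + included entries <= usable.
/// Any arithmetic overflow is treated as "does not fit".
pub fn fits(header_and_fences: usize, prefix_len: usize, included_space: usize, usable: usize) -> bool {
    header_and_fences
        .checked_add(common_prefix_body_space(prefix_len))
        .and_then(|s| s.checked_add(included_space))
        .map_or(false, |total| total <= usable)
}
```

With 40 bytes of header and fences, a 16-byte prefix and 24 bytes of entries, the prefix-aware check accepts a 64-byte page that the conservative sum (40 + 16 + 24 = 80) would reject. The over-estimate peaks at exactly 16 bytes, matching the comment.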

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doradb-storage/src/index/btree/algo.rs` around lines 130 - 172, Replace the
conservative fences_fit::<V>(lower_fence, upper_fence) guard with a check based
on the actual PackedNodeSpace for the fence pair: construct space via
PackedNodeSpace::with_fences(lower_fence, upper_fence) (as already done) and
rely on space.total_space() together with included_space to determine fit (i.e.,
compare space.total_space().checked_add(included_space) to
BTREE_NODE_USABLE_SIZE) instead of using fences_fit; also remove the unreachable
branch condition included_count > packed (and its associated recompute path)
since included_count will never be greater than packed given the loop logic
(keep the incremental accumulation path that advances included_count to packed).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a6cad0a4-4a63-4c24-91fa-ad5891479b2a

📥 Commits

Reviewing files that changed from the base of the PR and between 1d1e57f and 25c7305.

📒 Files selected for processing (2)
  • doradb-storage/src/index/btree/algo.rs
  • doradb-storage/src/index/btree/node.rs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@doradb-storage/src/index/btree/algo.rs`:
- Around line 471-477: The fences_fit function currently uses the conservative
SpaceEstimation, which rejects keys that would fit once prefix-compression is
applied; change fences_fit to use the prefix-aware estimator (PackedNodeSpace)
so common-prefix compression is considered — replace the
SpaceEstimation::with_fences(...) call with the prefix-aware estimator (e.g.
PackedNodeSpace::with_fences(lower_fence, upper_fence,
V::ENCODED_LEN).total_space() or the project’s prefix-aware estimator API) and
remove any unnecessary ::<V> generic qualifiers at call sites so the
packed/prefix-aware estimator is used consistently for fence-fit checks.

In `@doradb-storage/src/index/btree/mod.rs`:
- Around line 1026-1041: The branch-node partial-merge currently sets
lower_fence_value to r_node.value::<BTreeU64>(count) while pack_node_range_box
copies the slot at index count, causing the first child to be reachable twice;
to fix, when r_node.height() != 0 set KnownFenceNodeParams.lower_fence_value to
BTreeU64::INVALID_VALUE (matching split_node) instead of
r_node.value::<BTreeU64>(count), so the right suffix doesn’t duplicate the first
branch child (alternatively you may change the copied range from
count..r_node.count() to (count + 1)..r_node.count(), but prefer the
INVALID_VALUE approach to align with split_node).
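A deliberately simplified branch model shows the duplicate-child hazard. As the comment describes, it assumes that a valid lower-fence value on a branch node acts as an extra child pointer placed ahead of the copied slots. `Branch`, `right_suffix` and `INVALID_VALUE` are hypothetical stand-ins for the crate's node, pack_node_range_box and BTreeU64::INVALID_VALUE.

```rust
/// Sentinel meaning "no child attached to the lower fence"
/// (hypothetical stand-in for BTreeU64::INVALID_VALUE).
pub const INVALID_VALUE: u64 = u64::MAX;

/// Simplified branch node: an optional lower-fence child plus (key, child) slots.
pub struct Branch {
    pub lower_fence_value: u64,
    pub slots: Vec<(Vec<u8>, u64)>,
}

impl Branch {
    /// All child page ids reachable from this node, in order.
    pub fn children(&self) -> Vec<u64> {
        let mut out = Vec::new();
        if self.lower_fence_value != INVALID_VALUE {
            out.push(self.lower_fence_value);
        }
        out.extend(self.slots.iter().map(|(_, child)| *child));
        out
    }
}

/// Build the right suffix of `src` by copying slots `count..`.
pub fn right_suffix(src: &Branch, count: usize, fence_value: u64) -> Branch {
    Branch { lower_fence_value: fence_value, slots: src.slots[count..].to_vec() }
}
```

Passing `src.slots[count].1` as the fence value makes the child at `count` appear twice. Passing `INVALID_VALUE`, as split_node is said to do, leaves each child reachable exactly once.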

In `@doradb-storage/src/index/disk_tree.rs`:
- Around line 1368-1379: The branch write path currently allocates rewrite block
ids in write_one_branch_block (and write_node_block) and
materialize_rewrite_entries can create child rewrite blocks, but on any
subsequent async write failure those ids remain allocated; modify
materialize_rewrite_entries / materialize_rewrite_entry and
write_one_branch_block to track allocated rewrite block ids (both the parent id
from write_one_*_block and any child ids returned by
materialize_rewrite_entries), and on any Err return roll them back by calling
the block-deallocation routine you already use elsewhere (same pattern as other
write paths), ensuring that when a parent write fails you also deallocate any
materialized children; apply the same pattern to the other occurrences
referenced in the comment (the branch write sites around the blocks at the other
lines).
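The rollback the comment asks for is often expressed as an allocation guard. The guard records every id it hands out and releases all of them on drop unless the write is committed. The sketch below is synchronous and uses a toy allocator. `BlockAllocator`, `AllocGuard` and `write_branch` are hypothetical. In an async write path where deallocation must itself be awaited, a `Drop` impl cannot await, so explicit rollback on the error path may be needed instead.

```rust
/// Toy block allocator: hands out increasing ids and records freed ones.
#[derive(Default)]
pub struct BlockAllocator {
    next: u64,
    pub freed: Vec<u64>,
}

impl BlockAllocator {
    pub fn allocate(&mut self) -> u64 {
        let id = self.next;
        self.next += 1;
        id
    }
    pub fn deallocate(&mut self, id: u64) {
        self.freed.push(id);
    }
}

/// Tracks ids allocated during one rewrite. Unless `commit` is called,
/// dropping the guard (e.g. on an early error return) releases them all.
pub struct AllocGuard<'a> {
    alloc: &'a mut BlockAllocator,
    ids: Vec<u64>,
    committed: bool,
}

impl<'a> AllocGuard<'a> {
    pub fn new(alloc: &'a mut BlockAllocator) -> Self {
        AllocGuard { alloc, ids: Vec::new(), committed: false }
    }
    pub fn allocate(&mut self) -> u64 {
        let id = self.alloc.allocate();
        self.ids.push(id);
        id
    }
    /// All writes succeeded: keep the ids and disarm the rollback.
    pub fn commit(mut self) -> Vec<u64> {
        self.committed = true;
        std::mem::take(&mut self.ids)
    }
}

impl Drop for AllocGuard<'_> {
    fn drop(&mut self) {
        if !self.committed {
            // Release in reverse allocation order (children before parent).
            for id in self.ids.drain(..).rev() {
                self.alloc.deallocate(id);
            }
        }
    }
}

/// Hypothetical branch write: allocate the parent, then each child;
/// a failing child write rolls back every id allocated so far.
pub fn write_branch(alloc: &mut BlockAllocator, children: usize, fail_at: Option<usize>) -> Result<Vec<u64>, String> {
    let mut guard = AllocGuard::new(alloc);
    guard.allocate(); // parent block
    for i in 0..children {
        guard.allocate();
        if fail_at == Some(i) {
            return Err(format!("write of child {i} failed"));
        }
    }
    Ok(guard.commit())
}
```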
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0af6f8c3-5839-4c24-a866-6f0673bc35cf

📥 Commits

Reviewing files that changed from the base of the PR and between 25c7305 and 61e3d21.

📒 Files selected for processing (6)
  • doradb-storage/src/index/btree/algo.rs
  • doradb-storage/src/index/btree/mod.rs
  • doradb-storage/src/index/disk_tree.rs
  • doradb-storage/src/index/secondary_index.rs
  • doradb-storage/src/table/tests.rs
  • doradb-storage/src/trx/recover.rs
✅ Files skipped from review due to trivial changes (2)
  • doradb-storage/src/trx/recover.rs
  • doradb-storage/src/index/secondary_index.rs

Comment thread doradb-storage/src/index/btree/algo.rs Outdated
Comment thread doradb-storage/src/index/btree/mod.rs Outdated
Comment thread doradb-storage/src/index/disk_tree.rs

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@doradb-storage/src/index/btree/mod.rs`:
- Around line 973-974: The code wrongly uses V::ENCODED_LEN when deleting the
separator from the parent branch (p_node.delete_at(p_r_idx, value_size)) which
uses child page ids encoded with the parent branch width; replace V::ENCODED_LEN
with the parent branch's encoded value length—obtain the length from the parent
node/type (e.g., use the branch value constant or a method on p_node such as
branch/value encoded length) and pass that to p_node.delete_at instead of
V::ENCODED_LEN to avoid corrupting the parent layout.
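The width matters because a delete that shifts a packed value area by the wrong stride leaves every later value misaligned. The following minimal sketch assumes that branch values are 8-byte little-endian child page ids, which is the role BTreeU64 plays per the comment. It models only the value area, not the crate's real slot layout, and all names are hypothetical.

```rust
/// Assumed width of a child page id stored in a branch (like BTreeU64).
pub const BRANCH_VALUE_LEN: usize = 8;

/// Remove the `idx`-th fixed-width value from a packed value area.
pub fn delete_at(values: &mut Vec<u8>, idx: usize, value_size: usize) {
    let start = idx * value_size;
    values.drain(start..start + value_size);
}

/// Encode child ids as consecutive little-endian u64 values.
pub fn encode_children(ids: &[u64]) -> Vec<u8> {
    ids.iter().flat_map(|id| id.to_le_bytes()).collect()
}

/// Decode the packed area back into child ids (ignores a trailing partial chunk).
pub fn decode_children(values: &[u8]) -> Vec<u64> {
    values
        .chunks_exact(BRANCH_VALUE_LEN)
        .map(|c| {
            let mut buf = [0u8; BRANCH_VALUE_LEN];
            buf.copy_from_slice(c);
            u64::from_le_bytes(buf)
        })
        .collect()
}
```

Deleting index 1 from `[10, 20, 30]` with width 8 yields `[10, 30]`. Using a 4-byte leaf width instead removes half of the first id and leaves a 20-byte area that no longer decodes to whole child ids.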
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 56746692-5aaf-4622-906d-9236e50ac648

📥 Commits

Reviewing files that changed from the base of the PR and between 61e3d21 and 2cbda92.

⛔ Files ignored due to path filters (2)
  • docs/backlogs/000092-refactor-memtree-compaction-policy.md is excluded by !docs/**, !**/*.md
  • docs/backlogs/next-id is excluded by !docs/**
📒 Files selected for processing (3)
  • doradb-storage/src/index/btree/algo.rs
  • doradb-storage/src/index/btree/mod.rs
  • doradb-storage/src/index/disk_tree.rs
✅ Files skipped from review due to trivial changes (1)
  • doradb-storage/src/index/btree/algo.rs

Comment thread doradb-storage/src/index/btree/mod.rs Outdated
@jiangzhe jiangzhe merged commit c2c92af into main Apr 18, 2026
6 checks passed
@jiangzhe jiangzhe deleted the btree-fences branch April 18, 2026 14:51


Development

Successfully merging this pull request may close these issues.

Task: DiskTree Prefix Compression

1 participant