📝 Walkthrough

Adds a composite secondary index combining MemTree + DiskTree runtimes, refactors DiskTree into runtime/snapshot surfaces, expands MemTree encoded-entry helpers, adjusts B-tree scan callback error semantics, and removes the ReadonlyBackingFile indirection across buffer/file/write paths.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant DualTree as DualTreeUniqueIndex
    participant MemTree as MemTree
    participant DiskRuntime as SecondaryDiskTreeRuntime
    participant DiskTree as DiskTreeSnapshot
    Client->>DualTree: lookup(key)
    DualTree->>MemTree: lookup(key)
    alt mem hit
        MemTree-->>DualTree: Some(row_id)
        DualTree-->>Client: row_id
    else mem miss
        DualTree->>DiskRuntime: open_unique_at(root, guard)
        DiskRuntime-->>DiskTree: DiskTree snapshot
        DualTree->>DiskTree: lookup(key)
        alt disk match
            DiskTree-->>DualTree: row_id
            DualTree->>MemTree: insert overlay for disk match
            MemTree-->>DualTree: inserted
            DualTree-->>Client: row_id
        else no match
            DiskTree-->>DualTree: None
            DualTree-->>Client: None
        end
    end
```

```mermaid
sequenceDiagram
    participant Client
    participant DualTree as DualTreeNonUniqueIndex
    participant MemTree as MemTree
    participant DiskRuntime as SecondaryDiskTreeRuntime
    participant DiskTree as DiskTreeSnapshot
    Client->>DualTree: lookup(prefix)
    DualTree->>MemTree: lookup_encoded_entries(prefix)
    MemTree-->>DualTree: mem_entries (encoded, row_id, deleted)
    DualTree->>DiskRuntime: open_non_unique_at(root, guard)
    DiskRuntime-->>DiskTree: DiskTree snapshot
    DualTree->>DiskTree: prefix_scan_entries(prefix)
    DiskTree-->>DualTree: disk_entries (encoded_exact_key, row_id)
    DualTree->>DualTree: merge_non_unique_entries(mem_entries, disk_entries)
    DualTree-->>Client: deduplicated merged RowIDs
```
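Taken together, the two diagrams describe a mem-first probe with a disk fallback that warms the mem overlay on a hit. A minimal sketch of the unique-lookup path, using plain `BTreeMap`s as hypothetical stand-ins for the MemTree overlay and the DiskTree snapshot (only the names follow the diagram; this is not the real API):

```rust
use std::collections::BTreeMap;

type RowID = u64;

// Hypothetical stand-ins: the real MemTree/DiskTree are concurrent tree
// structures; plain maps keep the sketch self-contained.
struct DualTreeUniqueIndex {
    mem: BTreeMap<Vec<u8>, RowID>,
    disk: BTreeMap<Vec<u8>, RowID>,
}

impl DualTreeUniqueIndex {
    // Mem-first probe; on a disk hit, insert an overlay entry into the
    // mem tree so the next lookup is served warm, as in the diagram.
    fn lookup(&mut self, key: &[u8]) -> Option<RowID> {
        if let Some(&row_id) = self.mem.get(key) {
            return Some(row_id);
        }
        let row_id = *self.disk.get(key)?;
        self.mem.insert(key.to_vec(), row_id);
        Some(row_id)
    }
}

fn demo() -> (Option<RowID>, bool, Option<RowID>) {
    let mut idx = DualTreeUniqueIndex {
        mem: BTreeMap::new(),
        disk: BTreeMap::from([(b"k1".to_vec(), 42)]),
    };
    let hit = idx.lookup(b"k1"); // disk hit
    let cached = idx.mem.contains_key(b"k1".as_slice()); // now overlaid in mem
    let miss = idx.lookup(b"k2"); // absent everywhere
    (hit, cached, miss)
}

fn main() {
    assert_eq!(demo(), (Some(42), true, None));
    println!("ok");
}
```

The second lookup of `k1` would be served from `mem` without touching `disk`, which is the point of the overlay insert in the diagram.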
Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Codecov Report

❌ Patch coverage is … Additional details and impacted files:

```diff
@@ Coverage Diff @@
##             main     #558      +/-   ##
==========================================
+ Coverage   91.59%   91.67%   +0.08%
==========================================
  Files         100      101       +1
  Lines       53220    54203     +983
==========================================
+ Hits        48746    49691     +945
- Misses       4474     4512      +38
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
🧹 Nitpick comments (2)
doradb-storage/src/index/disk_tree.rs (1)
1520-1545: Keep `prefix_scan()` on the row-id-only path.

Line 1521 now routes the legacy `Vec<RowID>` API through `prefix_scan_entries()`, which clones every matching exact key before immediately discarding it. That adds avoidable allocation/copy cost to an existing lookup path; please share the filtering loop without materializing keys unless the caller actually asks for them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doradb-storage/src/index/disk_tree.rs` around lines 1520 - 1545, prefix_scan currently calls prefix_scan_entries and thus decodes and clones full exact keys only to discard them, causing extra allocations; change prefix_scan(&self, key: &[Val]) to perform the same loop as prefix_scan_entries but only collect RowID values: compute the prefix via self.tree.encoder.encode_prefix(...), iterate over self.tree.collect_entries().await?, check entry.key.starts_with(prefix.as_bytes()), call unpack_row_id_from_exact_key(&entry.key)? to get the RowID and push that into the Vec<RowID>, and return it—leave prefix_scan_entries unchanged for callers that need (Vec<u8>, RowID).

doradb-storage/src/index/composite_secondary_index.rs (1)
519-577: Reduce allocation churn in non-unique merge dedup tracking.

`push_row_once` currently allocates (`to_vec`) on every emitted key. For large scans, this can become avoidable overhead. Consider reusing one buffer for the last emitted key.

Optional perf-oriented refactor
```diff
 fn merge_non_unique_entries(
     mem_entries: &[NonUniqueMemTreeEntry],
     disk_entries: &[(Vec<u8>, RowID)],
     values: &mut Vec<RowID>,
 ) {
     let mut mem_idx = 0;
     let mut disk_idx = 0;
-    let mut last_emitted_key: Option<Vec<u8>> = None;
+    let mut last_emitted_key = Vec::new();
+    let mut has_last = false;
     while mem_idx < mem_entries.len() && disk_idx < disk_entries.len() {
         let mem = &mem_entries[mem_idx];
         let (disk_key, disk_row_id) = &disk_entries[disk_idx];
         match mem.encoded_key.as_slice().cmp(disk_key.as_slice()) {
             Ordering::Less => {
-                push_active_mem_entry(mem, values, &mut last_emitted_key);
+                push_active_mem_entry(mem, values, &mut last_emitted_key, &mut has_last);
                 mem_idx += 1;
             }
             Ordering::Equal => {
                 if !mem.deleted {
-                    push_row_once(&mem.encoded_key, mem.row_id, values, &mut last_emitted_key);
+                    push_row_once(
+                        &mem.encoded_key,
+                        mem.row_id,
+                        values,
+                        &mut last_emitted_key,
+                        &mut has_last,
+                    );
                 }
                 mem_idx += 1;
                 disk_idx += 1;
             }
             Ordering::Greater => {
-                push_row_once(disk_key, *disk_row_id, values, &mut last_emitted_key);
+                push_row_once(
+                    disk_key,
+                    *disk_row_id,
+                    values,
+                    &mut last_emitted_key,
+                    &mut has_last,
+                );
                 disk_idx += 1;
             }
         }
     }
     for mem in &mem_entries[mem_idx..] {
-        push_active_mem_entry(mem, values, &mut last_emitted_key);
+        push_active_mem_entry(mem, values, &mut last_emitted_key, &mut has_last);
     }
     for (disk_key, row_id) in &disk_entries[disk_idx..] {
-        push_row_once(disk_key, *row_id, values, &mut last_emitted_key);
+        push_row_once(disk_key, *row_id, values, &mut last_emitted_key, &mut has_last);
     }
 }

 fn push_active_mem_entry(
     entry: &NonUniqueMemTreeEntry,
     values: &mut Vec<RowID>,
-    last_emitted_key: &mut Option<Vec<u8>>,
+    last_emitted_key: &mut Vec<u8>,
+    has_last: &mut bool,
 ) {
     if !entry.deleted {
-        push_row_once(&entry.encoded_key, entry.row_id, values, last_emitted_key);
+        push_row_once(&entry.encoded_key, entry.row_id, values, last_emitted_key, has_last);
     }
 }

 fn push_row_once(
     encoded_key: &[u8],
     row_id: RowID,
     values: &mut Vec<RowID>,
-    last_emitted_key: &mut Option<Vec<u8>>,
+    last_emitted_key: &mut Vec<u8>,
+    has_last: &mut bool,
 ) {
-    if last_emitted_key.as_deref() != Some(encoded_key) {
+    if !*has_last || last_emitted_key.as_slice() != encoded_key {
         values.push(row_id);
-        *last_emitted_key = Some(encoded_key.to_vec());
+        last_emitted_key.clear();
+        last_emitted_key.extend_from_slice(encoded_key);
+        *has_last = true;
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doradb-storage/src/index/composite_secondary_index.rs` around lines 519 - 577, The current push_row_once causes a new allocation for every emitted key by calling encoded_key.to_vec(); change push_row_once (and callers merge_non_unique_entries and push_active_mem_entry) to reuse the single Optional buffer in last_emitted_key instead of always allocating: compare encoded_key with last_emitted_key.as_deref() as now, but when updating last_emitted_key, if last_emitted_key.as_mut() yields Some(buf) then clear and extend_from_slice(encoded_key) (reusing capacity), otherwise set last_emitted_key to Some(encoded_key.to_vec()) only once; keep the same push semantics (values.push(row_id) only when key differs). This reduces allocation churn while keeping function signatures push_row_once(encoded_key, row_id, values, last_emitted_key) and push_active_mem_entry(entry, values, last_emitted_key) intact.
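For context on what the allocating variant under review actually computes, here is a self-contained sketch of the two-pointer merge with the per-push `to_vec()` dedup tracking the comment critiques. `MemEntry`, `push_once`, and the slice-based inputs are simplified stand-ins, not the real crate types:

```rust
use std::cmp::Ordering;

type RowID = u64;

// Hypothetical simplified MemTree entry: encoded key, row id, and a
// deletion marker that masks a matching disk entry.
struct MemEntry {
    encoded_key: Vec<u8>,
    row_id: RowID,
    deleted: bool,
}

// Dedup helper: emit a row only when the key differs from the last one
// emitted. This is the allocating variant the review suggests tuning.
fn push_once(key: &[u8], row: RowID, out: &mut Vec<RowID>, last: &mut Option<Vec<u8>>) {
    if last.as_deref() != Some(key) {
        out.push(row);
        *last = Some(key.to_vec());
    }
}

// Two-pointer merge of key-sorted mem/disk entries: on equal keys the
// mem entry wins, so a deleted mem entry suppresses the disk row.
fn merge_non_unique(mem: &[MemEntry], disk: &[(Vec<u8>, RowID)]) -> Vec<RowID> {
    let mut out = Vec::new();
    let mut last: Option<Vec<u8>> = None;
    let (mut m, mut d) = (0, 0);
    while m < mem.len() && d < disk.len() {
        match mem[m].encoded_key.as_slice().cmp(disk[d].0.as_slice()) {
            Ordering::Less => {
                if !mem[m].deleted {
                    push_once(&mem[m].encoded_key, mem[m].row_id, &mut out, &mut last);
                }
                m += 1;
            }
            Ordering::Equal => {
                if !mem[m].deleted {
                    push_once(&mem[m].encoded_key, mem[m].row_id, &mut out, &mut last);
                }
                m += 1;
                d += 1;
            }
            Ordering::Greater => {
                push_once(&disk[d].0, disk[d].1, &mut out, &mut last);
                d += 1;
            }
        }
    }
    for e in &mem[m..] {
        if !e.deleted {
            push_once(&e.encoded_key, e.row_id, &mut out, &mut last);
        }
    }
    for (k, r) in &disk[d..] {
        push_once(k, *r, &mut out, &mut last);
    }
    out
}

fn demo() -> Vec<RowID> {
    let mem = vec![
        MemEntry { encoded_key: b"a".to_vec(), row_id: 1, deleted: false },
        MemEntry { encoded_key: b"b".to_vec(), row_id: 2, deleted: true },
    ];
    let disk = vec![(b"b".to_vec(), 20), (b"c".to_vec(), 30)];
    merge_non_unique(&mem, &disk)
}

fn main() {
    assert_eq!(demo(), vec![1, 30]);
    println!("ok");
}
```

Note how the deleted mem entry for `b` suppresses the disk row with the same key, which is the overlay semantics the walkthrough describes.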
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@doradb-storage/src/index/non_unique_index.rs`:
- Around line 163-176: The scan currently swallows decode failures from
push_encoded_exact_entry (via CollectEncodedExactEntries) causing
lookup_encoded_entries to return partial results instead of failing; change the
callback (CollectEncodedExactEntries / push_encoded_exact_entry) to return a
Result and propagate any decode error (e.g. Error::InvalidState) up instead of
converting it to a stop signal, ensure prefix_scanner/scan_prefix returns that
error, and make lookup_encoded_entries propagate the scan_prefix Result (no
longer treating a stopped scan as success) so concrete decode failures surface
to the caller.
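The error-propagation shape this comment asks for can be sketched in miniature: a scan drives a fallible callback, and a decode failure aborts the scan and surfaces to the caller instead of becoming an early-stop signal. All names here (`scan_prefix`, `decode_entry`, `lookup_encoded_entries`) are simplified stand-ins for the real functions:

```rust
// Hypothetical minimal shapes for a fallible scan callback.
#[derive(Debug, PartialEq)]
enum Error {
    InvalidState,
}

// Treat anything that is not exactly 8 bytes as a decode failure.
fn decode_entry(raw: &[u8]) -> Result<u64, Error> {
    let bytes: [u8; 8] = raw.try_into().map_err(|_| Error::InvalidState)?;
    Ok(u64::from_be_bytes(bytes))
}

// The scan drives the callback; a callback error aborts the scan and
// propagates, rather than being swallowed as "stop scanning".
fn scan_prefix<F>(raw_entries: &[Vec<u8>], mut callback: F) -> Result<(), Error>
where
    F: FnMut(&[u8]) -> Result<(), Error>,
{
    for raw in raw_entries {
        callback(raw.as_slice())?;
    }
    Ok(())
}

// The caller sees either all decoded entries or a concrete error,
// never a silently partial result.
fn lookup_encoded_entries(raw_entries: &[Vec<u8>]) -> Result<Vec<u64>, Error> {
    let mut out = Vec::new();
    scan_prefix(raw_entries, |raw| {
        out.push(decode_entry(raw)?);
        Ok(())
    })?;
    Ok(out)
}

fn main() {
    let good = vec![1u64.to_be_bytes().to_vec(), 2u64.to_be_bytes().to_vec()];
    assert_eq!(lookup_encoded_entries(&good), Ok(vec![1, 2]));
    let bad = vec![vec![0u8; 3]];
    assert_eq!(lookup_encoded_entries(&bad), Err(Error::InvalidState));
    println!("ok");
}
```

The key design point is the callback's `Result` return type: with a plain `bool` stop signal, the distinction between "done" and "corrupt entry" is lost at the scan boundary.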
---
Nitpick comments:
In `@doradb-storage/src/index/composite_secondary_index.rs`:
- Around line 519-577: The current push_row_once causes a new allocation for
every emitted key by calling encoded_key.to_vec(); change push_row_once (and
callers merge_non_unique_entries and push_active_mem_entry) to reuse the single
Optional buffer in last_emitted_key instead of always allocating: compare
encoded_key with last_emitted_key.as_deref() as now, but when updating
last_emitted_key, if last_emitted_key.as_mut() yields Some(buf) then clear and
extend_from_slice(encoded_key) (reusing capacity), otherwise set
last_emitted_key to Some(encoded_key.to_vec()) only once; keep the same push
semantics (values.push(row_id) only when key differs). This reduces allocation
churn while keeping function signatures push_row_once(encoded_key, row_id,
values, last_emitted_key) and push_active_mem_entry(entry, values,
last_emitted_key) intact.
In `@doradb-storage/src/index/disk_tree.rs`:
- Around line 1520-1545: prefix_scan currently calls prefix_scan_entries and
thus decodes and clones full exact keys only to discard them, causing extra
allocations; change prefix_scan(&self, key: &[Val]) to perform the same loop as
prefix_scan_entries but only collect RowID values: compute the prefix via
self.tree.encoder.encode_prefix(...), iterate over
self.tree.collect_entries().await?, check
entry.key.starts_with(prefix.as_bytes()), call
unpack_row_id_from_exact_key(&entry.key)? to get the RowID and push that into
the Vec<RowID>, and return it—leave prefix_scan_entries unchanged for callers
that need (Vec<u8>, RowID).
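The row-id-only loop this prompt describes can be sketched without the async tree plumbing: filter raw keys by prefix and unpack the trailing RowID, never cloning the exact keys. `unpack_row_id` and the `Vec<Vec<u8>>` entry list are hypothetical simplifications of `unpack_row_id_from_exact_key` and the collected tree entries:

```rust
type RowID = u64;
const ROW_ID_SIZE: usize = 8;

// Hypothetical encoding, mirroring unpack_row_id_from_exact_key: an
// exact key is the encoded prefix followed by the big-endian RowID.
fn unpack_row_id(exact_key: &[u8]) -> Option<RowID> {
    let split = exact_key.len().checked_sub(ROW_ID_SIZE)?;
    let tail: [u8; ROW_ID_SIZE] = exact_key[split..].try_into().ok()?;
    Some(RowID::from_be_bytes(tail))
}

// Row-id-only scan: filter by prefix and unpack the trailing RowID
// without materializing or cloning the matching keys.
fn prefix_scan(entries: &[Vec<u8>], prefix: &[u8]) -> Vec<RowID> {
    entries
        .iter()
        .filter(|k| k.starts_with(prefix))
        .filter_map(|k| unpack_row_id(k))
        .collect()
}

fn demo() -> Vec<RowID> {
    // Build exact keys as prefix + big-endian row id.
    let mk = |p: &[u8], id: RowID| {
        let mut k = p.to_vec();
        k.extend_from_slice(&id.to_be_bytes());
        k
    };
    let entries = vec![mk(b"ab", 1), mk(b"ab", 2), mk(b"ac", 3)];
    prefix_scan(&entries, b"ab")
}

fn main() {
    assert_eq!(demo(), vec![1, 2]);
    println!("ok");
}
```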
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 543e40ad-f329-4eaa-a3cc-10443e230681
⛔ Files ignored due to path filters (4)
- `docs/rfcs/0014-dual-tree-secondary-index.md` is excluded by `!docs/**`, `!**/*.md`
- `docs/tasks/000119-composite-secondary-index-core.md` is excluded by `!docs/**`, `!**/*.md`
- `docs/tasks/next-id` is excluded by `!docs/**`
- `docs/unsafe-usage-baseline.md` is excluded by `!docs/**`, `!**/*.md`
📒 Files selected for processing (6)
- doradb-storage/src/index/btree_node.rs
- doradb-storage/src/index/composite_secondary_index.rs
- doradb-storage/src/index/disk_tree.rs
- doradb-storage/src/index/mod.rs
- doradb-storage/src/index/non_unique_index.rs
- doradb-storage/src/index/unique_index.rs
Actionable comments posted: 2
🧹 Nitpick comments (2)
doradb-storage/src/index/disk_tree.rs (1)
638-649: Let `open()` create and own its `PoolGuard`.

The runtime already owns the readonly pool, but callers still have to thread an external guard with the same lifetime. Making the snapshot self-contained here would remove an easy-to-misuse parameter and simplify downstream APIs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doradb-storage/src/index/disk_tree.rs` around lines 638 - 649, open currently requires callers to pass a PoolGuard; instead have open() acquire and own the guard itself by calling self.disk_pool.pool_guard() so callers no longer thread a guard. Change open(&self, root_block_id: BlockID, disk_pool_guard: &PoolGuard) -> DiskTree to remove the external PoolGuard parameter and inside call self.disk_pool.pool_guard(), then pass that owned PoolGuard into DiskTree::from_root_snapshot(root_block_id, self, pool_guard). Update DiskTree::from_root_snapshot signature if needed to accept an owned PoolGuard (or adjust lifetimes) and remove usages that expect an external guard; you may also remove or keep disk_pool_guard() helper (disk_pool_guard) if still useful.

doradb-storage/src/index/composite_secondary_index.rs (1)
601-612: Consider reducing allocations in `push_row_once`.

The function clones `encoded_key` on every successful push (line 610). Since this is called in a tight merge loop, consider tracking the last emitted position/slice rather than cloning the full key each time.

♻️ Alternative using last index tracking
One approach is to track which source (mem or disk) and index was last emitted, then compare against the original slice directly instead of cloning:
```diff
-fn push_row_once(
-    encoded_key: &[u8],
-    row_id: RowID,
-    values: &mut Vec<RowID>,
-    last_emitted_key: &mut Option<Vec<u8>>,
-) {
-    if last_emitted_key.as_deref() != Some(encoded_key) {
-        values.push(row_id);
-        *last_emitted_key = Some(encoded_key.to_vec());
-    }
-}
+fn push_row_once<'a>(
+    encoded_key: &'a [u8],
+    row_id: RowID,
+    values: &mut Vec<RowID>,
+    last_emitted_key: &mut Option<&'a [u8]>,
+) {
+    if *last_emitted_key != Some(encoded_key) {
+        values.push(row_id);
+        *last_emitted_key = Some(encoded_key);
+    }
+}
```

This avoids per-push allocations by borrowing the slice directly.
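As a quick behavioral check, the borrowed-slice variant can be exercised in a standalone toy harness (types aliased locally; the entry list is illustrative, not the crate's data):

```rust
type RowID = u64;

// Borrowed-slice dedup: track the last emitted key as &[u8] tied to the
// input's lifetime, so no per-push allocation is needed.
fn push_row_once<'a>(
    encoded_key: &'a [u8],
    row_id: RowID,
    values: &mut Vec<RowID>,
    last_emitted_key: &mut Option<&'a [u8]>,
) {
    if *last_emitted_key != Some(encoded_key) {
        values.push(row_id);
        *last_emitted_key = Some(encoded_key);
    }
}

fn demo() -> Vec<RowID> {
    let entries: Vec<(Vec<u8>, RowID)> =
        vec![(b"a".to_vec(), 1), (b"a".to_vec(), 2), (b"b".to_vec(), 3)];
    let mut values = Vec::new();
    let mut last: Option<&[u8]> = None;
    for (key, row_id) in &entries {
        push_row_once(key, *row_id, &mut values, &mut last);
    }
    values // duplicate key "a" emits only its first row id
}

fn main() {
    assert_eq!(demo(), vec![1, 3]);
    println!("ok");
}
```

The lifetime parameter is what makes this safe: the tracked slice cannot outlive the entry list it borrows from, which is exactly the caveat the AI prompt raises about storing borrowed slices across iterations.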
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doradb-storage/src/index/composite_secondary_index.rs` around lines 601 - 612, push_row_once currently clones encoded_key into last_emitted_key on each push, causing allocations in a hot merge loop; change the tracking to store a lightweight identifier instead of a Vec<u8> (for example replace last_emitted_key: &mut Option<Vec<u8>> with a small enum/tuple like Option<(SourceTag, usize)> or Option<(source_id, index)> that records which input source and index produced the last emitted key) and then compare that identifier to the current source+index before pushing into values (keep function name push_row_once, parameters encoded_key and row_id, and update callers that maintain last_emitted_key to provide and update the new lightweight identifier). Ensure no long-lived borrowed slices are stored across iterations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@doradb-storage/src/index/composite_secondary_index.rs`:
- Around line 452-474: In compare_delete, if mem.lookup_unique(...) returns None
you must still consult the DiskTree like the unique variant does: after the mem
check, perform a disk lookup for the key+row_id (or call the disk-layer
compare_delete equivalent) on self.disk using the same pool_guard, key, row_id
and ts; if the disk-layer lookup indicates the entry exists, invoke the disk
compare_delete behavior and return its Result<bool>, otherwise return Ok(true).
Update the compare_delete function to call self.disk (e.g.,
self.disk.lookup_non_unique or self.disk.compare_delete) when mem lookup is None
so non-unique indexes correctly verify cold entries.
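The mem-miss fallthrough this comment requires can be sketched with deliberately simplified semantics (maps keyed by `(key, row_id)`, a boolean tombstone flag, and a `compare_delete` that always succeeds; none of this mirrors the real signatures):

```rust
use std::collections::BTreeMap;

type RowID = u64;

// Hypothetical simplified dual-tree shape: the mem overlay maps
// (key, row_id) to a tombstone flag; `disk` stands in for the cold tree.
struct DualTree {
    mem: BTreeMap<(Vec<u8>, RowID), bool>, // true => tombstone
    disk: BTreeMap<(Vec<u8>, RowID), ()>,
}

impl DualTree {
    // A mem miss must still consult the disk layer, because the entry
    // may exist only in the cold tree; skipping that check would leave
    // the cold entry visible after the delete.
    fn compare_delete(&mut self, key: &[u8], row_id: RowID) -> bool {
        let k = (key.to_vec(), row_id);
        match self.mem.get(&k).copied() {
            Some(true) => true, // already tombstoned
            Some(false) => {
                self.mem.insert(k, true);
                true
            }
            None => {
                if self.disk.contains_key(&k) {
                    // Cold entry: mask it with a mem tombstone.
                    self.mem.insert(k, true);
                }
                true
            }
        }
    }
}

fn demo() -> bool {
    let mut t = DualTree { mem: BTreeMap::new(), disk: BTreeMap::new() };
    t.disk.insert((b"k".to_vec(), 9), ());
    t.compare_delete(b"k", 9);
    // The disk-only entry is now masked by a tombstone in the overlay.
    t.mem.get(&(b"k".to_vec(), 9)).copied().unwrap_or(false)
}

fn main() {
    assert!(demo());
    println!("ok");
}
```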
In `@doradb-storage/src/index/disk_tree.rs`:
- Around line 1557-1566: prefix_scan_entries currently calls collect_entries()
which materializes the entire tree and then filters, causing O(N) scans; change
it to a streamed prefix scan: compute the prefix bytes via
self.encoder().encode_prefix(key, Some(ROW_ID_SIZE)), obtain an async
iterator/stream that starts scanning at that prefix (replace collect_entries()
with the crate/engine method that yields entries from a start key), iterate
entries one-by-one, for each entry check
entry.key.starts_with(prefix.as_bytes()) and
unpack_row_id_from_exact_key(&entry.key) and push into results, and break the
loop as soon as an entry no longer matches the prefix to avoid scanning the rest
of the tree. Use prefix_scan_entries, encoder().encode_prefix,
unpack_row_id_from_exact_key and remove the collect_entries() materialization.
---
Nitpick comments:
In `@doradb-storage/src/index/composite_secondary_index.rs`:
- Around line 601-612: push_row_once currently clones encoded_key into
last_emitted_key on each push, causing allocations in a hot merge loop; change
the tracking to store a lightweight identifier instead of a Vec<u8> (for example
replace last_emitted_key: &mut Option<Vec<u8>> with a small enum/tuple like
Option<(SourceTag, usize)> or Option<(source_id, index)> that records which
input source and index produced the last emitted key) and then compare that
identifier to the current source+index before pushing into values (keep function
name push_row_once, parameters encoded_key and row_id, and update callers that
maintain last_emitted_key to provide and update the new lightweight identifier).
Ensure no long-lived borrowed slices are stored across iterations.
In `@doradb-storage/src/index/disk_tree.rs`:
- Around line 638-649: open currently requires callers to pass a PoolGuard;
instead have open() acquire and own the guard itself by calling
self.disk_pool.pool_guard() so callers no longer thread a guard. Change
open(&self, root_block_id: BlockID, disk_pool_guard: &PoolGuard) -> DiskTree to
remove the external PoolGuard parameter and inside call
self.disk_pool.pool_guard(), then pass that owned PoolGuard into
DiskTree::from_root_snapshot(root_block_id, self, pool_guard). Update
DiskTree::from_root_snapshot signature if needed to accept an owned PoolGuard
(or adjust lifetimes) and remove usages that expect an external guard; you may
also remove or keep disk_pool_guard() helper (disk_pool_guard) if still useful.
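The ownership change this prompt describes amounts to the snapshot holding its own guard. A minimal sketch with `Arc`-based stand-ins for the pool types (only the shape matters; the real guard presumably does more than keep the pool alive):

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the pool types: the point is only that the
// snapshot owns its guard instead of borrowing one from the caller.
struct Pool;

struct PoolGuard {
    _pool: Arc<Pool>, // keeps the pool alive while the snapshot is open
}

struct Runtime {
    disk_pool: Arc<Pool>,
}

struct DiskTreeSnapshot {
    root_block_id: u64,
    _guard: PoolGuard, // owned, so there is no external lifetime to thread
}

impl Runtime {
    fn pool_guard(&self) -> PoolGuard {
        PoolGuard { _pool: Arc::clone(&self.disk_pool) }
    }

    // open() acquires its own guard; callers just pass the root block id.
    fn open(&self, root_block_id: u64) -> DiskTreeSnapshot {
        DiskTreeSnapshot { root_block_id, _guard: self.pool_guard() }
    }
}

fn demo() -> u64 {
    let rt = Runtime { disk_pool: Arc::new(Pool) };
    rt.open(7).root_block_id
}

fn main() {
    assert_eq!(demo(), 7);
    println!("ok");
}
```

Because the guard is a struct field rather than a borrowed parameter, the snapshot can be returned, stored, or sent across call boundaries without the caller managing a matching guard lifetime.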
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 26d41c2e-0490-40f0-a8bf-4d27e84cdc6b
⛔ Files ignored due to path filters (2)
- `docs/rfcs/0014-dual-tree-secondary-index.md` is excluded by `!docs/**`, `!**/*.md`
- `docs/tasks/000119-composite-secondary-index-core.md` is excluded by `!docs/**`, `!**/*.md`
📒 Files selected for processing (11)
- doradb-storage/src/buffer/mod.rs
- doradb-storage/src/buffer/readonly.rs
- doradb-storage/src/file/cow_file.rs
- doradb-storage/src/file/mod.rs
- doradb-storage/src/file/multi_table_file.rs
- doradb-storage/src/file/table_file.rs
- doradb-storage/src/index/composite_secondary_index.rs
- doradb-storage/src/index/disk_tree.rs
- doradb-storage/src/table/mod.rs
- doradb-storage/src/table/persistence.rs
- doradb-storage/src/table/tests.rs
✅ Files skipped from review due to trivial changes (1)
- doradb-storage/src/buffer/readonly.rs
Closes #557
Summary by CodeRabbit
New Features
Bug Fixes
Tests