
chore: composite secondary index core#558

Merged
jiangzhe merged 6 commits into main from dual-tree-core on Apr 15, 2026

Conversation

@jiangzhe
Owner

@jiangzhe jiangzhe commented Apr 14, 2026

Closes #557

Summary by CodeRabbit

  • New Features

    • Composite secondary indexes combining in-memory and on-disk layers with consistent snapshot opens and merged scan/lookup results.
    • Snapshot-backed disk-tree runtime for more stable, efficient snapshot reads.
  • Bug Fixes

    • Index scans now abort on callback errors; malformed encoded entries are detected and reported.
    • Improved common-prefix aware lower-bound probing for more accurate and efficient prefix scans.
  • Tests

    • Added unit tests for unique/non-unique semantics, merge/overlay behavior, snapshot consistency, scans, and error propagation.

@jiangzhe jiangzhe self-assigned this Apr 14, 2026
@coderabbitai

coderabbitai bot commented Apr 14, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e338b977-89d1-4d0c-8788-502443b9aae4

📥 Commits

Reviewing files that changed from the base of the PR and between a47ff01 and 52aee6c.

⛔ Files ignored due to path filters (3)
  • docs/backlogs/000086-secondary-index-dual-tree-access-path.md is excluded by !docs/**, !**/*.md
  • docs/backlogs/next-id is excluded by !docs/**
  • docs/tasks/000119-composite-secondary-index-core.md is excluded by !docs/**, !**/*.md
📒 Files selected for processing (1)
  • doradb-storage/src/index/composite_secondary_index.rs
📝 Walkthrough

Adds a composite secondary index combining MemTree + DiskTree runtimes, refactors DiskTree into runtime/snapshot surfaces, expands MemTree encoded-entry helpers, adjusts B-tree scan callback error semantics, and removes the ReadonlyBackingFile indirection across buffer/file/write paths.

Changes

  • Composite secondary index (doradb-storage/src/index/composite_secondary_index.rs, doradb-storage/src/index/mod.rs): New module implementing SecondaryDiskTreeRuntime, DualTreeUniqueIndex, DualTreeNonUniqueIndex, and the DualTreeSecondaryIndex wrapper; runtime-based openers for unique/non-unique disk snapshots; integrated into module exports.
  • DiskTree runtime & snapshot (doradb-storage/src/index/disk_tree.rs): Introduced DiskTreeRuntime<F> owning IO/context and DiskTree<'a,F> as a borrowing snapshot; added Unique/NonUnique runtime factories; added an incremental scan API and non-unique prefix-scan entries; increased visibility of several internal types.
  • MemTree encoded-entry APIs (doradb-storage/src/index/non_unique_index.rs, doradb-storage/src/index/unique_index.rs): Added NonUniqueMemTreeEntry / UniqueMemTreeEntry and async helpers: insert-overlay/delete-shadow, lookup/scan of encoded entries, and scan helpers used to merge MemTree/DiskTree results.
  • B-tree scan callback & node helpers (doradb-storage/src/index/btree_scan.rs, doradb-storage/src/index/btree_node.rs): BTreeSlotCallback::apply now returns Result<bool> (the scan propagates callback errors); added lower_bound_* probe-aware helpers and a thin value_for_slot wrapper, plus unit tests for B-tree node probing.
  • Readonly buffer & file ownership removal (doradb-storage/src/buffer/readonly.rs, doradb-storage/src/buffer/mod.rs, doradb-storage/src/file/*.rs): Removed the ReadonlyBackingFile indirection; ReadSubmission/WriteSubmission and write/publish helpers now carry Arc<SparseFile> directly; removed *_with_owner variants and readonly_backing() usages; tests updated.
  • Table / ColumnStorage integration (doradb-storage/src/table/mod.rs, doradb-storage/src/table/persistence.rs, doradb-storage/src/table/tests.rs): ColumnStorage::new now constructs per-index SecondaryDiskTreeRuntime instances and returns Result; persistence and tests use secondary_index_runtime(...).open_*_at(...) instead of direct DiskTree construction.
  • Misc tests & visibility tweaks (many src/index tests and callers): Updated tests and call sites to use the new runtime/snapshot APIs, adjusted imports/visibility, and added unit tests for new behaviors (prefix scans, encoded-entry errors, callback error propagation).
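The callback-error change in btree_scan.rs can be modeled as follows. This is a hedged sketch, not the doradb code: SlotCallback, scan, and ScanError are illustrative stand-ins showing how a Result<bool> return lets a scan distinguish an early stop (Ok(false)) from a hard error (e.g. a malformed encoded entry) that aborts the scan and propagates to the caller.

```rust
// Illustrative model of a scan callback returning Result<bool>.
// Ok(true) = continue, Ok(false) = stop scan, Err = abort with error.
#[derive(Debug, PartialEq)]
enum ScanError {
    MalformedEntry,
}

trait SlotCallback {
    fn apply(&mut self, key: &[u8], value: u64) -> Result<bool, ScanError>;
}

fn scan<C: SlotCallback>(entries: &[(Vec<u8>, u64)], cb: &mut C) -> Result<(), ScanError> {
    for (key, value) in entries {
        if !cb.apply(key, *value)? {
            break; // callback asked to stop; this is not an error
        }
    }
    Ok(())
}

struct Collect {
    rows: Vec<u64>,
}

impl SlotCallback for Collect {
    fn apply(&mut self, key: &[u8], value: u64) -> Result<bool, ScanError> {
        if key.is_empty() {
            // Malformed entry: surfaced to the caller instead of swallowed.
            return Err(ScanError::MalformedEntry);
        }
        self.rows.push(value);
        Ok(true)
    }
}

fn main() {
    let entries = vec![(b"a".to_vec(), 1u64), (b"b".to_vec(), 2)];
    let mut cb = Collect { rows: Vec::new() };
    assert!(scan(&entries, &mut cb).is_ok());
    assert_eq!(cb.rows, vec![1, 2]);

    let bad = vec![(Vec::new(), 3u64)];
    let mut cb = Collect { rows: Vec::new() };
    assert_eq!(scan(&bad, &mut cb), Err(ScanError::MalformedEntry));
}
```

Before this PR, a callback could only signal stop/continue, so decode failures had to be folded into a stop and the caller saw a silently truncated result.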

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant DualTree as DualTreeUniqueIndex
    participant MemTree as MemTree
    participant DiskRuntime as SecondaryDiskTreeRuntime
    participant DiskTree as DiskTreeSnapshot

    Client->>DualTree: lookup(key)
    DualTree->>MemTree: lookup(key)
    alt mem hit
        MemTree-->>DualTree: Some(row_id)
        DualTree-->>Client: row_id
    else mem miss
        DualTree->>DiskRuntime: open_unique_at(root, guard)
        DiskRuntime-->>DiskTree: DiskTree snapshot
        DualTree->>DiskTree: lookup(key)
        alt disk match
            DiskTree-->>DualTree: row_id
            DualTree->>MemTree: insert overlay for disk match
            MemTree-->>DualTree: inserted
            DualTree-->>Client: row_id
        else no match
            DiskTree-->>DualTree: None
            DualTree-->>Client: None
        end
    end
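The unique lookup flow above can be modeled minimally: check the in-memory overlay first, treat a delete shadow as a definitive miss, and only fall back to the disk snapshot otherwise. MemLayer, DiskLayer, and the Option-based delete shadow are assumptions for illustration, not the actual doradb-storage types.

```rust
use std::collections::BTreeMap;

type RowId = u64;

/// In-memory overlay: Some(row_id) is a live entry; None is a delete shadow
/// that masks an older on-disk entry with the same key.
struct MemLayer(BTreeMap<Vec<u8>, Option<RowId>>);

/// Immutable on-disk snapshot, modeled here as a plain sorted map.
struct DiskLayer(BTreeMap<Vec<u8>, RowId>);

fn lookup(mem: &MemLayer, disk: &DiskLayer, key: &[u8]) -> Option<RowId> {
    match mem.0.get(key) {
        Some(Some(row_id)) => Some(*row_id), // mem hit
        Some(None) => None,                  // delete shadow: disk entry masked
        None => disk.0.get(key).copied(),    // mem miss: consult the snapshot
    }
}

fn main() {
    let mut mem = MemLayer(BTreeMap::new());
    let mut disk = DiskLayer(BTreeMap::new());
    disk.0.insert(b"k1".to_vec(), 10);
    mem.0.insert(b"k2".to_vec(), Some(20));
    mem.0.insert(b"k1".to_vec(), None); // shadow the on-disk entry

    assert_eq!(lookup(&mem, &disk, b"k2"), Some(20));
    assert_eq!(lookup(&mem, &disk, b"k1"), None);
    assert_eq!(lookup(&mem, &disk, b"k3"), None);
}
```

The delete shadow is what keeps the two layers consistent without rewriting the immutable disk snapshot: a deletion is recorded in the overlay and intercepted before the lookup ever reaches the DiskTree.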
sequenceDiagram
    participant Client
    participant DualTree as DualTreeNonUniqueIndex
    participant MemTree as MemTree
    participant DiskRuntime as SecondaryDiskTreeRuntime
    participant DiskTree as DiskTreeSnapshot

    Client->>DualTree: lookup(prefix)
    DualTree->>MemTree: lookup_encoded_entries(prefix)
    MemTree-->>DualTree: mem_entries (encoded, row_id, deleted)
    DualTree->>DiskRuntime: open_non_unique_at(root, guard)
    DiskRuntime-->>DiskTree: DiskTree snapshot
    DualTree->>DiskTree: prefix_scan_entries(prefix)
    DiskTree-->>DualTree: disk_entries (encoded_exact_key, row_id)
    DualTree->>DualTree: merge_non_unique_entries(mem_entries, disk_entries)
    DualTree-->>Client: deduplicated merged RowIDs
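The merge_non_unique_entries step in the diagram above amounts to a merge of two key-sorted lists where mem entries win on ties and a deleted mem entry suppresses the matching disk entry. The sketch below is a simplified model under those assumptions (MemEntry and the exact signatures are illustrative, not the real code, which additionally deduplicates repeated keys).

```rust
use std::cmp::Ordering;

type RowId = u64;

struct MemEntry {
    encoded_key: Vec<u8>,
    row_id: RowId,
    deleted: bool,
}

/// Merge key-sorted mem and disk entry lists into `out`.
/// On equal keys the mem entry wins; a delete shadow drops the disk row.
fn merge_non_unique_entries(mem: &[MemEntry], disk: &[(Vec<u8>, RowId)], out: &mut Vec<RowId>) {
    let (mut m, mut d) = (0, 0);
    while m < mem.len() && d < disk.len() {
        match mem[m].encoded_key.as_slice().cmp(disk[d].0.as_slice()) {
            Ordering::Less => {
                if !mem[m].deleted {
                    out.push(mem[m].row_id);
                }
                m += 1;
            }
            Ordering::Equal => {
                // Mem entry shadows the disk entry with the same key.
                if !mem[m].deleted {
                    out.push(mem[m].row_id);
                }
                m += 1;
                d += 1;
            }
            Ordering::Greater => {
                out.push(disk[d].1);
                d += 1;
            }
        }
    }
    // Drain whichever side remains.
    for e in &mem[m..] {
        if !e.deleted {
            out.push(e.row_id);
        }
    }
    for (_, row_id) in &disk[d..] {
        out.push(*row_id);
    }
}

fn main() {
    let mem = vec![
        MemEntry { encoded_key: b"a1".to_vec(), row_id: 1, deleted: false },
        MemEntry { encoded_key: b"b2".to_vec(), row_id: 2, deleted: true },
    ];
    let disk = vec![(b"b2".to_vec(), 2u64), (b"c3".to_vec(), 3)];
    let mut out = Vec::new();
    merge_non_unique_entries(&mem, &disk, &mut out);
    assert_eq!(out, vec![1, 3]); // b2 suppressed by the delete shadow
}
```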

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰
I nibble suffixes, stitch two roots as one,
MemTree hops quick, DiskTree sleeps in sun,
I merge, I mask, I peek where keys reside,
Two-layered index — a hop and a glide,
Carrot-encoded rows—hoppy work well done! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'chore: composite secondary index core' directly describes the main changeset: adding composite secondary index core infrastructure via a new module, a refactored disk-tree design, and supporting helpers.
  • Linked Issues Check: ✅ Passed. The PR closes issue #557, which references task 000119 'Composite Secondary Index Core'. The changeset implements the composite secondary index core through a new module, a refactored disk-tree design, and supporting infrastructure.
  • Out of Scope Changes Check: ✅ Passed. All changes are in scope: B-tree node improvements (lower_bound helpers), disk-tree refactoring (runtime split), the composite secondary index module, supporting index helpers, and the buffer/file ownership cleanup all align with implementing the composite secondary index core.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@codecov

codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 96.11111% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.67%. Comparing base (34bf56c) to head (52aee6c).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
...adb-storage/src/index/composite_secondary_index.rs 95.02% 42 Missing ⚠️
doradb-storage/src/index/disk_tree.rs 95.92% 9 Missing ⚠️
doradb-storage/src/index/btree_node.rs 98.46% 1 Missing ⚠️
doradb-storage/src/index/btree_scan.rs 96.55% 1 Missing ⚠️
doradb-storage/src/index/non_unique_index.rs 99.16% 1 Missing ⚠️
doradb-storage/src/index/unique_index.rs 98.21% 1 Missing ⚠️
doradb-storage/src/table/mod.rs 97.22% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #558      +/-   ##
==========================================
+ Coverage   91.59%   91.67%   +0.08%     
==========================================
  Files         100      101       +1     
  Lines       53220    54203     +983     
==========================================
+ Hits        48746    49691     +945     
- Misses       4474     4512      +38     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
doradb-storage/src/index/disk_tree.rs (1)

1520-1545: Keep prefix_scan() on the row-id-only path.

Line 1521 now routes the legacy Vec<RowID> API through prefix_scan_entries(), which clones every matching exact key before immediately discarding it. That adds avoidable allocation/copy cost to an existing lookup path; please share the filtering loop without materializing keys unless the caller actually asks for them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doradb-storage/src/index/disk_tree.rs` around lines 1520 - 1545, prefix_scan
currently calls prefix_scan_entries and thus decodes and clones full exact keys
only to discard them, causing extra allocations; change prefix_scan(&self, key:
&[Val]) to perform the same loop as prefix_scan_entries but only collect RowID
values: compute the prefix via self.tree.encoder.encode_prefix(...), iterate
over self.tree.collect_entries().await?, check
entry.key.starts_with(prefix.as_bytes()), call
unpack_row_id_from_exact_key(&entry.key)? to get the RowID and push that into
the Vec<RowID>, and return it—leave prefix_scan_entries unchanged for callers
that need (Vec<u8>, RowID).
doradb-storage/src/index/composite_secondary_index.rs (1)

519-577: Reduce allocation churn in non-unique merge dedup tracking.

push_row_once currently allocates (to_vec) on every emitted key. For large scans, this can become avoidable overhead. Consider reusing one buffer for the last emitted key.

Optional perf-oriented refactor
 fn merge_non_unique_entries(
     mem_entries: &[NonUniqueMemTreeEntry],
     disk_entries: &[(Vec<u8>, RowID)],
     values: &mut Vec<RowID>,
 ) {
     let mut mem_idx = 0;
     let mut disk_idx = 0;
-    let mut last_emitted_key: Option<Vec<u8>> = None;
+    let mut last_emitted_key = Vec::new();
+    let mut has_last = false;
     while mem_idx < mem_entries.len() && disk_idx < disk_entries.len() {
         let mem = &mem_entries[mem_idx];
         let (disk_key, disk_row_id) = &disk_entries[disk_idx];
         match mem.encoded_key.as_slice().cmp(disk_key.as_slice()) {
             Ordering::Less => {
-                push_active_mem_entry(mem, values, &mut last_emitted_key);
+                push_active_mem_entry(mem, values, &mut last_emitted_key, &mut has_last);
                 mem_idx += 1;
             }
             Ordering::Equal => {
                 if !mem.deleted {
-                    push_row_once(&mem.encoded_key, mem.row_id, values, &mut last_emitted_key);
+                    push_row_once(
+                        &mem.encoded_key,
+                        mem.row_id,
+                        values,
+                        &mut last_emitted_key,
+                        &mut has_last,
+                    );
                 }
                 mem_idx += 1;
                 disk_idx += 1;
             }
             Ordering::Greater => {
-                push_row_once(disk_key, *disk_row_id, values, &mut last_emitted_key);
+                push_row_once(
+                    disk_key,
+                    *disk_row_id,
+                    values,
+                    &mut last_emitted_key,
+                    &mut has_last,
+                );
                 disk_idx += 1;
             }
         }
     }
     for mem in &mem_entries[mem_idx..] {
-        push_active_mem_entry(mem, values, &mut last_emitted_key);
+        push_active_mem_entry(mem, values, &mut last_emitted_key, &mut has_last);
     }
     for (disk_key, row_id) in &disk_entries[disk_idx..] {
-        push_row_once(disk_key, *row_id, values, &mut last_emitted_key);
+        push_row_once(disk_key, *row_id, values, &mut last_emitted_key, &mut has_last);
     }
 }
 
 fn push_active_mem_entry(
     entry: &NonUniqueMemTreeEntry,
     values: &mut Vec<RowID>,
-    last_emitted_key: &mut Option<Vec<u8>>,
+    last_emitted_key: &mut Vec<u8>,
+    has_last: &mut bool,
 ) {
     if !entry.deleted {
-        push_row_once(&entry.encoded_key, entry.row_id, values, last_emitted_key);
+        push_row_once(&entry.encoded_key, entry.row_id, values, last_emitted_key, has_last);
     }
 }
 
 fn push_row_once(
     encoded_key: &[u8],
     row_id: RowID,
     values: &mut Vec<RowID>,
-    last_emitted_key: &mut Option<Vec<u8>>,
+    last_emitted_key: &mut Vec<u8>,
+    has_last: &mut bool,
 ) {
-    if last_emitted_key.as_deref() != Some(encoded_key) {
+    if !*has_last || last_emitted_key.as_slice() != encoded_key {
         values.push(row_id);
-        *last_emitted_key = Some(encoded_key.to_vec());
+        last_emitted_key.clear();
+        last_emitted_key.extend_from_slice(encoded_key);
+        *has_last = true;
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doradb-storage/src/index/composite_secondary_index.rs` around lines 519 -
577, The current push_row_once causes a new allocation for every emitted key by
calling encoded_key.to_vec(); change push_row_once (and callers
merge_non_unique_entries and push_active_mem_entry) to reuse the single Optional
buffer in last_emitted_key instead of always allocating: compare encoded_key
with last_emitted_key.as_deref() as now, but when updating last_emitted_key, if
last_emitted_key.as_mut() yields Some(buf) then clear and
extend_from_slice(encoded_key) (reusing capacity), otherwise set
last_emitted_key to Some(encoded_key.to_vec()) only once; keep the same push
semantics (values.push(row_id) only when key differs). This reduces allocation
churn while keeping function signatures push_row_once(encoded_key, row_id,
values, last_emitted_key) and push_active_mem_entry(entry, values,
last_emitted_key) intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@doradb-storage/src/index/non_unique_index.rs`:
- Around line 163-176: The scan currently swallows decode failures from
push_encoded_exact_entry (via CollectEncodedExactEntries) causing
lookup_encoded_entries to return partial results instead of failing; change the
callback (CollectEncodedExactEntries / push_encoded_exact_entry) to return a
Result and propagate any decode error (e.g. Error::InvalidState) up instead of
converting it to a stop signal, ensure prefix_scanner/scan_prefix returns that
error, and make lookup_encoded_entries propagate the scan_prefix Result (no
longer treating a stopped scan as success) so concrete decode failures surface
to the caller.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 543e40ad-f329-4eaa-a3cc-10443e230681

📥 Commits

Reviewing files that changed from the base of the PR and between 34bf56c and 25f0d61.

⛔ Files ignored due to path filters (4)
  • docs/rfcs/0014-dual-tree-secondary-index.md is excluded by !docs/**, !**/*.md
  • docs/tasks/000119-composite-secondary-index-core.md is excluded by !docs/**, !**/*.md
  • docs/tasks/next-id is excluded by !docs/**
  • docs/unsafe-usage-baseline.md is excluded by !docs/**, !**/*.md
📒 Files selected for processing (6)
  • doradb-storage/src/index/btree_node.rs
  • doradb-storage/src/index/composite_secondary_index.rs
  • doradb-storage/src/index/disk_tree.rs
  • doradb-storage/src/index/mod.rs
  • doradb-storage/src/index/non_unique_index.rs
  • doradb-storage/src/index/unique_index.rs

Comment thread doradb-storage/src/index/non_unique_index.rs

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
doradb-storage/src/index/disk_tree.rs (1)

638-649: Let open() create and own its PoolGuard.

The runtime already owns the readonly pool, but callers still have to thread an external guard with the same lifetime. Making the snapshot self-contained here would remove an easy-to-misuse parameter and simplify downstream APIs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doradb-storage/src/index/disk_tree.rs` around lines 638 - 649, open currently
requires callers to pass a PoolGuard; instead have open() acquire and own the
guard itself by calling self.disk_pool.pool_guard() so callers no longer thread
a guard. Change open(&self, root_block_id: BlockID, disk_pool_guard: &PoolGuard)
-> DiskTree to remove the external PoolGuard parameter and inside call
self.disk_pool.pool_guard(), then pass that owned PoolGuard into
DiskTree::from_root_snapshot(root_block_id, self, pool_guard). Update
DiskTree::from_root_snapshot signature if needed to accept an owned PoolGuard
(or adjust lifetimes) and remove usages that expect an external guard; you may
also remove or keep disk_pool_guard() helper (disk_pool_guard) if still useful.
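The "let open() own its guard" suggestion above is an RAII pattern that can be sketched generically: the snapshot stores the guard it acquired, so callers cannot supply a guard with a mismatched lifetime. Runtime, PoolGuard, and Snapshot here are illustrative stand-ins, not the actual doradb types.

```rust
/// Stands in for an RAII guard pinning readonly pool pages.
struct PoolGuard;

struct Runtime;

impl Runtime {
    fn pool_guard(&self) -> PoolGuard {
        PoolGuard
    }

    /// The snapshot acquires and owns its guard; callers no longer
    /// thread an external guard through the call.
    fn open(&self, root_block_id: u64) -> Snapshot<'_> {
        Snapshot {
            _guard: self.pool_guard(),
            runtime: self,
            root_block_id,
        }
    }
}

struct Snapshot<'a> {
    _guard: PoolGuard, // dropped with the snapshot, releasing pinned pages
    runtime: &'a Runtime,
    root_block_id: u64,
}

fn main() {
    let rt = Runtime;
    let snap = rt.open(42);
    assert_eq!(snap.root_block_id, 42);
    let _ = snap.runtime; // snapshot borrows the runtime for its lifetime
}
```

Tying the guard's lifetime to the snapshot removes the easy-to-misuse parameter the comment points out: the guard can never be dropped while a snapshot still reads through it.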
doradb-storage/src/index/composite_secondary_index.rs (1)

601-612: Consider reducing allocations in push_row_once.

The function clones encoded_key on every successful push (line 610). Since this is called in a tight merge loop, consider tracking the last emitted position/slice rather than cloning the full key each time.

♻️ Alternative using last index tracking

One approach is to track which source (mem or disk) and index was last emitted, then compare against the original slice directly instead of cloning:

-fn push_row_once(
-    encoded_key: &[u8],
-    row_id: RowID,
-    values: &mut Vec<RowID>,
-    last_emitted_key: &mut Option<Vec<u8>>,
-) {
-    if last_emitted_key.as_deref() != Some(encoded_key) {
-        values.push(row_id);
-        *last_emitted_key = Some(encoded_key.to_vec());
-    }
-}
+fn push_row_once<'a>(
+    encoded_key: &'a [u8],
+    row_id: RowID,
+    values: &mut Vec<RowID>,
+    last_emitted_key: &mut Option<&'a [u8]>,
+) {
+    if *last_emitted_key != Some(encoded_key) {
+        values.push(row_id);
+        *last_emitted_key = Some(encoded_key);
+    }
+}

This avoids per-push allocations by borrowing the slice directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doradb-storage/src/index/composite_secondary_index.rs` around lines 601 -
612, push_row_once currently clones encoded_key into last_emitted_key on each
push, causing allocations in a hot merge loop; change the tracking to store a
lightweight identifier instead of a Vec<u8> (for example replace
last_emitted_key: &mut Option<Vec<u8>> with a small enum/tuple like
Option<(SourceTag, usize)> or Option<(source_id, index)> that records which
input source and index produced the last emitted key) and then compare that
identifier to the current source+index before pushing into values (keep function
name push_row_once, parameters encoded_key and row_id, and update callers that
maintain last_emitted_key to provide and update the new lightweight identifier).
Ensure no long-lived borrowed slices are stored across iterations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@doradb-storage/src/index/composite_secondary_index.rs`:
- Around line 452-474: In compare_delete, if mem.lookup_unique(...) returns None
you must still consult the DiskTree like the unique variant does: after the mem
check, perform a disk lookup for the key+row_id (or call the disk-layer
compare_delete equivalent) on self.disk using the same pool_guard, key, row_id
and ts; if the disk-layer lookup indicates the entry exists, invoke the disk
compare_delete behavior and return its Result<bool>, otherwise return Ok(true).
Update the compare_delete function to call self.disk (e.g.,
self.disk.lookup_non_unique or self.disk.compare_delete) when mem lookup is None
so non-unique indexes correctly verify cold entries.

In `@doradb-storage/src/index/disk_tree.rs`:
- Around line 1557-1566: prefix_scan_entries currently calls collect_entries()
which materializes the entire tree and then filters, causing O(N) scans; change
it to a streamed prefix scan: compute the prefix bytes via
self.encoder().encode_prefix(key, Some(ROW_ID_SIZE)), obtain an async
iterator/stream that starts scanning at that prefix (replace collect_entries()
with the crate/engine method that yields entries from a start key), iterate
entries one-by-one, for each entry check
entry.key.starts_with(prefix.as_bytes()) and
unpack_row_id_from_exact_key(&entry.key) and push into results, and break the
loop as soon as an entry no longer matches the prefix to avoid scanning the rest
of the tree. Use prefix_scan_entries, encoder().encode_prefix,
unpack_row_id_from_exact_key and remove the collect_entries() materialization.
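The streamed prefix scan suggested above can be modeled on a plain sorted map: start iterating at the encoded prefix and break as soon as a key no longer matches, instead of materializing every entry and filtering afterward. A minimal sketch, with BTreeMap standing in for the on-disk tree (the real change would use the engine's start-key scan API):

```rust
use std::collections::BTreeMap;

/// Scan only the contiguous run of keys sharing `prefix`.
/// Keys are sorted, so the first non-matching key ends the scan.
fn prefix_scan(tree: &BTreeMap<Vec<u8>, u64>, prefix: &[u8]) -> Vec<u64> {
    let mut row_ids = Vec::new();
    // Start at the first key >= prefix rather than at the tree's beginning.
    for (key, row_id) in tree.range(prefix.to_vec()..) {
        if !key.starts_with(prefix) {
            break; // past the prefix range; nothing later can match
        }
        row_ids.push(*row_id);
    }
    row_ids
}

fn main() {
    let mut tree = BTreeMap::new();
    tree.insert(b"ab1".to_vec(), 1u64);
    tree.insert(b"ab2".to_vec(), 2);
    tree.insert(b"ac9".to_vec(), 9);
    assert_eq!(prefix_scan(&tree, b"ab"), vec![1, 2]);
}
```

This turns the O(N) collect-then-filter pass into work proportional to the size of the matching range plus one probe.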


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 26d41c2e-0490-40f0-a8bf-4d27e84cdc6b

📥 Commits

Reviewing files that changed from the base of the PR and between 43b3fa5 and 1e8b199.

⛔ Files ignored due to path filters (2)
  • docs/rfcs/0014-dual-tree-secondary-index.md is excluded by !docs/**, !**/*.md
  • docs/tasks/000119-composite-secondary-index-core.md is excluded by !docs/**, !**/*.md
📒 Files selected for processing (11)
  • doradb-storage/src/buffer/mod.rs
  • doradb-storage/src/buffer/readonly.rs
  • doradb-storage/src/file/cow_file.rs
  • doradb-storage/src/file/mod.rs
  • doradb-storage/src/file/multi_table_file.rs
  • doradb-storage/src/file/table_file.rs
  • doradb-storage/src/index/composite_secondary_index.rs
  • doradb-storage/src/index/disk_tree.rs
  • doradb-storage/src/table/mod.rs
  • doradb-storage/src/table/persistence.rs
  • doradb-storage/src/table/tests.rs
✅ Files skipped from review due to trivial changes (1)
  • doradb-storage/src/buffer/readonly.rs

Comment thread doradb-storage/src/index/composite_secondary_index.rs
Comment thread doradb-storage/src/index/disk_tree.rs
@jiangzhe jiangzhe merged commit 1772490 into main Apr 15, 2026
6 checks passed
@jiangzhe jiangzhe deleted the dual-tree-core branch April 15, 2026 15:28

Development

Successfully merging this pull request may close these issues.

Task: Composite Secondary Index Core

1 participant