feat(memwal): implement read-path shard pruning for MemWAL queries#7122
Open
beinan wants to merge 5 commits into
added 3 commits
June 5, 2026 12:45
When a query filter contains an equality or IN predicate on the sharding column, the scan planner now evaluates the filter values through the sharding transform (bucket/identity) and skips shards whose field values do not match. This avoids scanning data in non-matching shards entirely.
Key changes:
- Add shard_field_values to ShardSnapshot for read-path access
- Add hash_scalar_to_bucket and source_column_for_field helpers to sharding.rs
- New shard_pruning module with filter-to-shard-id evaluation
- Wire collect_pruned into LsmScanPlanner with type coercion for SQL literals
- Add with_sharding_spec to LsmScanner and LsmDataSourceCollector
Co-Authored-By: Beinan Wang <beinanwang@microsoft.com>
Run cargo fmt and address clippy::collapsible_if warning in collector.rs by merging nested if-let chains. Co-Authored-By: Beinan Wang <beinanwang@microsoft.com>
The ShardSnapshot struct gained a new shard_field_values field for read-path shard pruning, but the constructors in python/src/mem_wal.rs and java/lance-jni/src/mem_wal.rs were not updated. Co-Authored-By: Beinan Wang <beinanwang@microsoft.com>
70ada29 to
908a635
Compare
Co-Authored-By: Beinan Wang <beinanwang@microsoft.com>
908a635 to
784b9e1
Compare
… empty snapshots Co-Authored-By: Beinan Wang <beinanwang@microsoft.com>
f4b943e to
a2961e6
Compare
Summary
This PR implements read-path shard pruning for MemWAL datasets.
When a query's filter contains an equality (`=`) or `IN` predicate on the designated sharding column (derived from the dataset's `ShardingSpec`), the query planner evaluates the target shard IDs using the sharding transform (e.g., identity value matches or bucket Murmur3 hashes). It then calls `collect_for_shards()` to open and scan only the matching shard subdirectories under `_mem_wal/`, skipping irrelevant directories entirely.
Key Changes
- `data_source.rs`: Enrich `ShardSnapshot` with each shard's partition column values (`shard_field_values: HashMap<String, Vec<u8>>`), derived from the transaction manifest.
- `shard_pruning.rs`: Extract literal values from DataFusion expressions (`&Expr`), coerce them to match the column schema, and evaluate the targets against both Identity and Bucket sharding specifications.
- `collector.rs`, `planner.rs`: Change `LsmScanPlanner` to call `collect_pruned(filter)` instead of unconditionally loading all shards via `collect()`.
- `planner.rs`, `shard_pruning.rs`: Add unit tests for pruning evaluation and an integration test verifying that only the single correct shard subdirectory is read during active queries.
Test plan
- `shard_pruning.rs`: verify bucket/identity/unsharded logic and coercion logic.
- `planner.rs`: verify correct shard selection and check that `collect_pruned` skips unrelated directories under `_mem_wal/` on active queries.
🤖 Generated with Claude Code
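For reviewers who want a mental model of the pruning flow described in the Summary, here is a minimal, self-contained sketch. It is not the code in this PR: `Transform`, `Predicate`, `Shard`, `apply`, `prune`, and `make_shard` are hypothetical names, the sharding column is assumed to be an `i64`, and FNV-1a stands in for the Murmur3 hash so the example needs only the standard library. The only idea borrowed from the description is that each shard carries an encoded field value (`Vec<u8>`) that can be compared with the transformed filter literals.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the sharding transform (not the PR's type).
#[derive(Debug, Clone, Copy)]
enum Transform {
    Identity,
    Bucket { num_buckets: u32 },
}

/// Hypothetical, simplified view of a filter on the sharding column.
enum Predicate {
    Eq(i64),
    In(Vec<i64>),
    /// Any predicate that cannot be used for pruning.
    Other,
}

/// Hypothetical snapshot: a shard id plus its encoded field values.
struct Shard {
    id: u32,
    field_values: HashMap<String, Vec<u8>>,
}

/// FNV-1a (32-bit). ASSUMPTION: used here only as a stand-in for Murmur3.
fn fnv1a(bytes: &[u8]) -> u32 {
    let mut h: u32 = 0x811c_9dc5;
    for b in bytes {
        h ^= *b as u32;
        h = h.wrapping_mul(0x0100_0193);
    }
    h
}

/// Encode a literal the same way a shard's field value is assumed to be stored.
fn apply(transform: Transform, value: i64) -> Vec<u8> {
    match transform {
        Transform::Identity => value.to_le_bytes().to_vec(),
        Transform::Bucket { num_buckets } => {
            let bucket = fnv1a(&value.to_le_bytes()) % num_buckets;
            bucket.to_le_bytes().to_vec()
        }
    }
}

/// Returns `None` when the predicate gives no pruning opportunity (scan all
/// shards), otherwise the ids of the shards that may contain matching rows.
fn prune(
    shards: &[Shard],
    field: &str,
    transform: Transform,
    pred: &Predicate,
) -> Option<BTreeSet<u32>> {
    let literals: &[i64] = match pred {
        Predicate::Eq(v) => std::slice::from_ref(v),
        Predicate::In(vs) => vs.as_slice(),
        Predicate::Other => return None,
    };
    let targets: Vec<Vec<u8>> = literals.iter().map(|v| apply(transform, *v)).collect();
    Some(
        shards
            .iter()
            .filter(|s| match s.field_values.get(field) {
                Some(stored) => targets.iter().any(|t| t == stored),
                // No recorded value: keep the shard rather than risk dropping rows.
                None => true,
            })
            .map(|s| s.id)
            .collect(),
    )
}

/// Convenience constructor for the examples.
fn make_shard(id: u32, field: &str, encoded: Vec<u8>) -> Shard {
    Shard {
        id,
        field_values: HashMap::from([(field.to_string(), encoded)]),
    }
}
```

In this sketch a shard with no recorded value for the sharding column is kept, so pruning can only skip shards that are known not to match. Whether the PR makes the same conservative choice is not stated in the description.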