feat(python): expose zonemap segment builds#7177
Conversation
9b63bd7 to
71532a2
Compare
d7af83b to
bc4063e
Compare
| ) | ||
|
|
||
|
|
||
| def test_zonemap_uncommitted_segments_can_be_merged_and_committed_from_python(tmp_path): |
There was a problem hiding this comment.
Minor: Simplify the names of the test cases.
There was a problem hiding this comment.
simplified to test_zonemap_segment_merge_and_commit_from_python
| ) | ||
|
|
||
| if fragment_ids is not None and logical_index_type in {"BTREE", "BITMAP"}: | ||
| if fragment_ids is not None and logical_index_type in { |
There was a problem hiding this comment.
Minor: {"BTREE", "BITMAP", "ZONEMAP"} appears multiple times. Can we abstract it into a method?
There was a problem hiding this comment.
extract these code into a new function: _is_segment_native_scalar_index_type
e7933e9 to
3ef12ba
Compare
3ef12ba to
1066074
Compare
|
@claude review |
|
|
||
| def test_zonemap_segment_merge_and_commit_from_python(tmp_path): | ||
| ds = generate_multi_fragment_dataset( | ||
| tmp_path, num_fragments=4, rows_per_fragment=40 |
There was a problem hiding this comment.
Can we add the number about rows_per_fragment to be larger than 8192(e.g. > 2 * 8192), so that we can have two zones in one fragment? Because the default value of rows_per_zone is 8192.
| with pytest.raises(ValueError, match="create_index_uncommitted"): | ||
| ds.create_scalar_index( | ||
| column="id", | ||
| index_type="ZONEMAP", | ||
| fragment_ids=[fragment_ids[0]], | ||
| ) |
There was a problem hiding this comment.
IMO, it would be better to add a test case named .e.g test_zonemap_fragment_ids_parameter_validation?
Summary
create_index_uncommitted(..., index_type="ZONEMAP", fragment_ids=...)to use the scalar segment build pathcreate_scalar_index(..., index_type="ZONEMAP", fragment_ids=...)with the same migration guidance used for other segment-native scalar buildsContext
ZoneMap segment merge support landed in #7128, but the Python public
create_index_uncommittedhelper still only routed BTREE/BITMAP scalar requests through the uncommitted scalar segment path. As a result, ZONEMAP fell through to vector validation and failed on scalar columns.I searched for open duplicate PRs with
ZONEMAP create_index_uncommitted Python distributedandzonemap uncommitted python; no open matches were found.Validation
UV_PYTHON=/usr/bin/python3.11 uv run pytest python/tests/test_scalar_index.py::test_bitmap_uncommitted_segments_can_be_committed_from_python python/tests/test_scalar_index.py::test_btree_fragment_ids_parameter_validation python/tests/test_scalar_index.py::test_zonemap_index python/tests/test_scalar_index.py::test_zonemap_segment_merge_and_commit_from_pythonpassedUV_PYTHON=/usr/bin/python3.11 uv run ruff format --check --diff python/lance/dataset.py python/tests/test_scalar_index.pypassedUV_PYTHON=/usr/bin/python3.11 uv run ruff check python/lance/dataset.py python/tests/test_scalar_index.pypassedUV_PYTHON=/usr/bin/python3.11 uv run make lintpassed after installing the optionaltorchextra in the local uv environment.