
feat(python): expose zonemap segment builds#7177

Open
everySympathy wants to merge 1 commit into
lance-format:mainfrom
everySympathy:codex/zonemap-uncommitted-python

Conversation


@everySympathy everySympathy commented Jun 9, 2026


Summary

  • allow Python create_index_uncommitted(..., index_type="ZONEMAP", fragment_ids=...) to use the scalar segment build path
  • reject create_scalar_index(..., index_type="ZONEMAP", fragment_ids=...) with the same migration guidance used for other segment-native scalar builds
  • add a Python regression test that stages ZoneMap segments, merges them, commits the merged segment, and verifies indexed query results

Context

ZoneMap segment merge support landed in #7128, but the Python public create_index_uncommitted helper still only routed BTREE/BITMAP scalar requests through the uncommitted scalar segment path. As a result, ZONEMAP fell through to vector validation and failed on scalar columns.
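The fall-through described above can be sketched roughly as follows. This is a simplified illustration, not the actual code in python/lance/dataset.py: the function name pick_build_path, the constant names, and the returned path labels are hypothetical. Only the index type names come from this PR.

```python
# Index types routed through the uncommitted scalar segment path.
# The set members follow the PR description; the constant names are illustrative.
BEFORE_FIX = frozenset({"BTREE", "BITMAP"})
AFTER_FIX = frozenset({"BTREE", "BITMAP", "ZONEMAP"})


def pick_build_path(logical_index_type, fragment_ids, segment_native):
    """Hypothetical stand-in for the routing inside create_index_uncommitted."""
    if fragment_ids is not None and logical_index_type in segment_native:
        return "scalar_segment"
    # Everything else is treated as a vector index request. Before the fix,
    # ZONEMAP ended up here and failed vector validation on scalar columns.
    return "vector"


print(pick_build_path("ZONEMAP", [0], BEFORE_FIX))  # vector
print(pick_build_path("ZONEMAP", [0], AFTER_FIX))   # scalar_segment
```

With the pre-fix set, a ZONEMAP request falls through to the vector branch; adding ZONEMAP to the set is enough to send it down the scalar segment path.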

I searched for open duplicate PRs using the queries "ZONEMAP create_index_uncommitted Python distributed" and "zonemap uncommitted python"; no open matches were found.

Validation

  • UV_PYTHON=/usr/bin/python3.11 uv run pytest python/tests/test_scalar_index.py::test_bitmap_uncommitted_segments_can_be_committed_from_python python/tests/test_scalar_index.py::test_btree_fragment_ids_parameter_validation python/tests/test_scalar_index.py::test_zonemap_index python/tests/test_scalar_index.py::test_zonemap_segment_merge_and_commit_from_python passed
  • UV_PYTHON=/usr/bin/python3.11 uv run ruff format --check --diff python/lance/dataset.py python/tests/test_scalar_index.py passed
  • UV_PYTHON=/usr/bin/python3.11 uv run ruff check python/lance/dataset.py python/tests/test_scalar_index.py passed
  • UV_PYTHON=/usr/bin/python3.11 uv run make lint passed after installing the optional torch extra in the local uv environment.

@github-actions github-actions Bot added A-python Python bindings enhancement New feature or request labels Jun 9, 2026
@everySympathy everySympathy marked this pull request as ready for review June 9, 2026 11:08
@everySympathy everySympathy force-pushed the codex/zonemap-uncommitted-python branch from 9b63bd7 to 71532a2 Compare June 9, 2026 12:38
@everySympathy everySympathy marked this pull request as draft June 9, 2026 12:38
@everySympathy everySympathy force-pushed the codex/zonemap-uncommitted-python branch 3 times, most recently from d7af83b to bc4063e Compare June 9, 2026 12:49
@everySympathy everySympathy marked this pull request as ready for review June 9, 2026 13:15
@everySympathy everySympathy marked this pull request as draft June 9, 2026 13:20
@everySympathy everySympathy marked this pull request as ready for review June 9, 2026 13:20
)


def test_zonemap_uncommitted_segments_can_be_merged_and_committed_from_python(tmp_path):

Contributor

Minor: Simplify the names of the test cases.

@everySympathy everySympathy Jun 9, 2026

Author

Simplified to test_zonemap_segment_merge_and_commit_from_python.

Comment thread python/python/lance/dataset.py Outdated
)

if fragment_ids is not None and logical_index_type in {"BTREE", "BITMAP"}:
if fragment_ids is not None and logical_index_type in {

Contributor

Minor: {"BTREE", "BITMAP", "ZONEMAP"} appears multiple times. Can we abstract it into a method?

Author

Extracted this check into a new function: _is_segment_native_scalar_index_type.

@everySympathy everySympathy force-pushed the codex/zonemap-uncommitted-python branch 2 times, most recently from e7933e9 to 3ef12ba Compare June 9, 2026 14:05
@everySympathy everySympathy force-pushed the codex/zonemap-uncommitted-python branch from 3ef12ba to 1066074 Compare June 9, 2026 14:16
@yanghua

yanghua commented Jun 10, 2026

Collaborator

@claude review

@yanghua yanghua left a comment

Collaborator

Left two comments.


def test_zonemap_segment_merge_and_commit_from_python(tmp_path):
ds = generate_multi_fragment_dataset(
tmp_path, num_fragments=4, rows_per_fragment=40

Collaborator

Can we increase rows_per_fragment to more than 8192 (e.g. > 2 * 8192), so that a single fragment spans more than one zone? The default value of rows_per_zone is 8192.
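The arithmetic behind this suggestion can be checked quickly. The sketch below assumes that zones never cross fragment boundaries and that each zone covers up to rows_per_zone consecutive rows; the helper name zones_per_fragment is hypothetical and not part of Lance.

```python
import math

DEFAULT_ROWS_PER_ZONE = 8192  # default stated in the comment above


def zones_per_fragment(rows_per_fragment, rows_per_zone=DEFAULT_ROWS_PER_ZONE):
    """Zones needed to cover one fragment, assuming zones do not span fragments."""
    return math.ceil(rows_per_fragment / rows_per_zone)


for rows in (40, 8192, 8193, 2 * 8192 + 1):
    print(rows, zones_per_fragment(rows))
# 40 1
# 8192 1
# 8193 2
# 16385 3
```

Under these assumptions, the current 40 rows per fragment exercise only the single-zone case, while anything above 8192 rows produces multiple zones per fragment.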

Comment on lines +4018 to +4023
with pytest.raises(ValueError, match="create_index_uncommitted"):
ds.create_scalar_index(
column="id",
index_type="ZONEMAP",
fragment_ids=[fragment_ids[0]],
)

Collaborator

IMO, it would be better to add a dedicated test case for this, named e.g. test_zonemap_fragment_ids_parameter_validation.
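One possible shape for such a dedicated test, sketched against a stand-in rather than a real dataset. The function create_scalar_index_stub and its error text are made up for illustration; only the expectation that the ValueError mentions create_index_uncommitted comes from the snippet above. A real test would call ds.create_scalar_index inside pytest.raises instead.

```python
import re


def create_scalar_index_stub(column, index_type, fragment_ids=None):
    """Hypothetical stand-in for ds.create_scalar_index; not the Lance API."""
    if fragment_ids is not None and index_type in {"BTREE", "BITMAP", "ZONEMAP"}:
        # Illustrative message; the real migration guidance may be worded differently.
        raise ValueError(
            f"fragment_ids is not supported for {index_type} here; "
            "use create_index_uncommitted instead"
        )
    return "committed"


def test_zonemap_fragment_ids_parameter_validation():
    # Passing fragment_ids must be rejected with guidance to the new entry point.
    try:
        create_scalar_index_stub("id", "ZONEMAP", fragment_ids=[0])
    except ValueError as err:
        assert re.search("create_index_uncommitted", str(err))
    else:
        raise AssertionError("expected ValueError")
    # Without fragment_ids, the regular committed build is still allowed.
    assert create_scalar_index_stub("id", "ZONEMAP") == "committed"


test_zonemap_fragment_ids_parameter_validation()
```

Keeping the rejection check in its own test mirrors the existing test_btree_fragment_ids_parameter_validation and keeps the merge-and-commit test focused on the happy path.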



3 participants