feat(python): expose zonemap segment builds#7177

Merged
yanghua merged 1 commit into lance-format:main from everySympathy:codex/zonemap-uncommitted-python on Jun 11, 2026

@everySympathy (Contributor) commented Jun 9, 2026

Summary

  • allow Python create_index_uncommitted(..., index_type="ZONEMAP", fragment_ids=...) to use the scalar segment build path
  • reject create_scalar_index(..., index_type="ZONEMAP", fragment_ids=...) with the same migration guidance used for BTREE/BITMAP segment-native scalar builds
  • add Python regression coverage for ZoneMap fragment-id validation and for staging, merging, committing, and querying ZoneMap segments
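
The rejection in the second bullet can be pictured with a small, self-contained sketch. Everything here is hypothetical: check_fragment_ids_for_scalar_index and SEGMENT_NATIVE_TYPES are illustrative names, not Lance's actual code, and the only assumption carried over from the tests is that the error message mentions create_index_uncommitted.

```python
from typing import Optional, Sequence

# Hypothetical set of types this sketch treats as segment-native.
SEGMENT_NATIVE_TYPES = frozenset({"BTREE", "BITMAP", "ZONEMAP"})


def check_fragment_ids_for_scalar_index(
    index_type: str, fragment_ids: Optional[Sequence[int]]
) -> None:
    """Reject fragment-scoped builds that should use create_index_uncommitted.

    Illustrative only; the real guard and its wording may differ.
    """
    if fragment_ids is not None and index_type.upper() in SEGMENT_NATIVE_TYPES:
        raise ValueError(
            f"fragment_ids is not supported for {index_type.upper()} in "
            "create_scalar_index; build segments with create_index_uncommitted "
            "and commit them instead"
        )


# A whole-dataset build passes the check.
check_fragment_ids_for_scalar_index("ZONEMAP", None)

# A fragment-scoped ZONEMAP build is rejected with migration guidance.
try:
    check_fragment_ids_for_scalar_index("ZONEMAP", [0])
except ValueError as err:
    print(err)
```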

Context

ZoneMap segment merge support landed in #7128, but the public Python create_index_uncommitted helper still routed only BTREE/BITMAP/INVERTED scalar requests through the uncommitted scalar segment path. As a result, ZONEMAP requests fell through to vector index validation and failed on scalar columns.
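
To make the failure mode concrete, here is a hypothetical model of the dispatch in create_index_uncommitted before and after this change. The function name and the returned path labels are invented for illustration; only the sets of index types come from the description above.

```python
# Scalar types routed to the uncommitted segment path before this PR.
SCALAR_SEGMENT_TYPES_BEFORE = frozenset({"BTREE", "BITMAP", "INVERTED"})
# After this PR, ZONEMAP takes the same path.
SCALAR_SEGMENT_TYPES_AFTER = SCALAR_SEGMENT_TYPES_BEFORE | {"ZONEMAP"}


def route_uncommitted(index_type: str, scalar_segment_types: frozenset) -> str:
    """Return which build path a create_index_uncommitted request takes (sketch)."""
    if index_type.upper() in scalar_segment_types:
        return "scalar_segment"
    # Anything else falls through to vector index validation, which
    # fails when the target is a scalar column such as an integer "id".
    return "vector"


print(route_uncommitted("ZONEMAP", SCALAR_SEGMENT_TYPES_BEFORE))  # → vector
print(route_uncommitted("ZONEMAP", SCALAR_SEGMENT_TYPES_AFTER))   # → scalar_segment
```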

I searched open PRs for duplicates using the queries "ZONEMAP create_index_uncommitted Python distributed" and "zonemap uncommitted python"; no open matches were found.

Validation

  • UV_PYTHON=/usr/bin/python3.11 uv run pytest python/tests/test_scalar_index.py::test_fragment_ids_parameter_validation python/tests/test_scalar_index.py::test_segment_fts python/tests/test_scalar_index.py::test_zonemap_fragment_ids_parameter_validation python/tests/test_scalar_index.py::test_zonemap_segment_merge_and_commit_from_python python/tests/test_scalar_index.py::test_bitmap_uncommitted_segments_can_be_committed_from_python python/tests/test_scalar_index.py::test_btree_fragment_ids_parameter_validation passed
  • UV_PYTHON=/usr/bin/python3.11 uv run ruff format --check --diff python/lance/dataset.py python/tests/test_scalar_index.py passed
  • UV_PYTHON=/usr/bin/python3.11 uv run ruff check python/lance/dataset.py python/tests/test_scalar_index.py passed
  • UV_PYTHON=/usr/bin/python3.11 uv run make lint passed

@github-actions (bot) added the A-python (Python bindings) and enhancement (New feature or request) labels Jun 9, 2026
@everySympathy marked this pull request as ready for review June 9, 2026 11:08
@everySympathy force-pushed the codex/zonemap-uncommitted-python branch from 9b63bd7 to 71532a2 June 9, 2026 12:38
@everySympathy marked this pull request as draft June 9, 2026 12:38
@everySympathy force-pushed the codex/zonemap-uncommitted-python branch 3 times, most recently from d7af83b to bc4063e June 9, 2026 12:49
@everySympathy marked this pull request as ready for review June 9, 2026 13:15
@everySympathy marked this pull request as draft June 9, 2026 13:20
@everySympathy marked this pull request as ready for review June 9, 2026 13:20

    def test_zonemap_uncommitted_segments_can_be_merged_and_committed_from_python(tmp_path):

Contributor:

Minor: Simplify the names of the test cases.

@everySympathy (Contributor Author), Jun 9, 2026:

Simplified the name to test_zonemap_segment_merge_and_commit_from_python.

Comment thread on python/python/lance/dataset.py (outdated):

    - if fragment_ids is not None and logical_index_type in {"BTREE", "BITMAP"}:
    + if fragment_ids is not None and logical_index_type in {

Contributor:

Minor: {"BTREE", "BITMAP", "ZONEMAP"} appears multiple times. Can we abstract it into a method?

Contributor Author:

Extracted this check into a new function: _is_segment_native_scalar_index_type.

@everySympathy force-pushed the codex/zonemap-uncommitted-python branch 3 times, most recently from 3ef12ba to 1066074 June 9, 2026 14:16
@yanghua (Collaborator) commented Jun 10, 2026

@claude review

@yanghua (Collaborator) left a comment:

Left two comments.


    def test_zonemap_segment_merge_and_commit_from_python(tmp_path):
        ds = generate_multi_fragment_dataset(
            tmp_path, num_fragments=4, rows_per_fragment=40

Collaborator:

Can we increase rows_per_fragment to more than 8192 (e.g. > 2 * 8192), so that a single fragment contains at least two zones? The default value of rows_per_zone is 8192.

@everySympathy (Contributor Author), Jun 10, 2026:

Adjusted the merge/commit test to use rows_per_fragment = 20_000, so each fragment contains multiple ZoneMap zones. The query now targets rows in the second zone.
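
The zone arithmetic behind this change is easy to check. A sketch, assuming zones are consecutive fixed-size row ranges within each fragment with a possibly shorter final zone; zone_bounds is a hypothetical helper, and the 8192 default is taken from the reviewer's comment.

```python
def zone_bounds(rows_per_fragment: int, rows_per_zone: int = 8192) -> list[tuple[int, int]]:
    """Split a fragment's row offsets into [start, end) ranges of at most rows_per_zone rows."""
    return [
        (start, min(start + rows_per_zone, rows_per_fragment))
        for start in range(0, rows_per_fragment, rows_per_zone)
    ]


print(zone_bounds(40))      # → [(0, 40)]
print(zone_bounds(20_000))  # → [(0, 8192), (8192, 16384), (16384, 20000)]
```

Under that assumption, the original 40-row fragments fit in a single zone, while 20_000-row fragments split into three zones, the second covering row offsets 8192 to 16383 within each fragment.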

Comment on lines +4018 to +4023
    with pytest.raises(ValueError, match="create_index_uncommitted"):
        ds.create_scalar_index(
            column="id",
            index_type="ZONEMAP",
            fragment_ids=[fragment_ids[0]],
        )

Collaborator:

IMO, would it be better to add a separate test case for this, e.g. test_zonemap_fragment_ids_parameter_validation?

Contributor Author:

Split the create_scalar_index(..., fragment_ids=...) validation into test_zonemap_fragment_ids_parameter_validation, leaving the merge/commit test focused on the segment lifecycle.

@everySympathy force-pushed the codex/zonemap-uncommitted-python branch 2 times, most recently from e2255e9 to d928901 June 10, 2026 12:04
@everySympathy force-pushed the codex/zonemap-uncommitted-python branch from d928901 to 01d7152 June 10, 2026 12:13
@everySympathy requested a review from yanghua June 10, 2026 12:15

@yanghua (Collaborator) left a comment:

LGTM. Thanks for your contribution!

@yanghua merged commit e8748a4 into lance-format:main on Jun 11, 2026 (17 checks passed)
3 participants