fix: preserve overwrite schema changes #5161

Open
LuciferYang wants to merge 3 commits into lance-format:main from LuciferYang:fix/write-lance-schema-evolution

@LuciferYang

Summary

  • pass the write mode through fragment writes so overwrite/create fragments use the new schema instead of append validation
  • propagate mode through LanceFragmentWriter and streaming writes, with later streaming batches switching back to append
  • add regressions for URI, streaming, and namespace overwrite schema changes while preserving append behavior
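
The streaming rule in the second bullet can be pictured with a small sketch. This is not the lance_ray implementation: `modes_for_batches` is a hypothetical helper, and the only behaviour taken from the summary is that the first batch carries the caller's mode and later batches fall back to append.

```python
from typing import Iterator


def modes_for_batches(mode: str, num_batches: int) -> Iterator[str]:
    """Yield the write mode to use for each streaming batch.

    Hypothetical helper: the first batch uses the caller's mode
    ("create", "overwrite", or "append"); every later batch uses
    "append" so it adds to what the first batch produced.
    """
    if mode not in {"create", "overwrite", "append"}:
        raise ValueError(f"unsupported write mode: {mode!r}")
    for index in range(num_batches):
        # Only the first batch may replace or create the dataset schema.
        yield mode if index == 0 else "append"


print(list(modes_for_batches("overwrite", 3)))
```

Under this assumption an overwrite stream replaces the schema once, and the remaining batches are validated against the schema the first batch established.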

Testing

  • python -m ruff check .
  • python -m ruff format --check lance_ray/fragment.py lance_ray/datasink.py lance_ray/io.py tests/test_basic_read_write.py
  • python -m pytest tests/test_basic_read_write.py tests/test_fragment.py tests/test_blob.py::test_stream_copy_basic_local tests/test_blob.py::test_stream_copy_resume_local tests/test_blob_v2.py::test_blob_v2_append_with_target_bases_stream -q
  • local namespace overwrite smoke test for #95 ("write_lance failed after add a new column to existed dataset.")

Closes #95

@github-actions (bot) added the bug label on Jun 4, 2026

Distributed create/overwrite writes assign Lance field ids positionally, so
blocks whose columns arrive in a different order (e.g. a union of differently
ordered sources) were committed under a single schema and read back transposed.
Add a guard in LanceDatasink.on_write_complete that raises on inconsistent
column order, and align Arrow blocks by name in pd_to_arrow so an explicit
schema writes correctly.
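
A rough illustration of the two ideas in this paragraph, under stated assumptions: blocks are modelled as plain dicts from column name to values, and `check_consistent_column_order` and `align_by_name` are hypothetical names, not the functions in LanceDatasink or pd_to_arrow.

```python
from typing import Dict, List, Sequence


def check_consistent_column_order(block_columns: Sequence[Sequence[str]]) -> None:
    """Raise if any block's column list differs from the first block's."""
    if not block_columns:
        return
    expected = list(block_columns[0])
    for index, columns in enumerate(block_columns[1:], start=1):
        if list(columns) != expected:
            raise ValueError(
                f"block {index} has columns {list(columns)}, expected {expected}"
            )


def align_by_name(block: Dict[str, List], names: Sequence[str]) -> Dict[str, List]:
    """Reorder a column-name -> values mapping to follow `names`."""
    return {name: block[name] for name in names}


# A block that arrived as (b, a) is realigned to the explicit order (a, b).
aligned = align_by_name({"b": [2], "a": [1]}, ["a", "b"])
print(list(aligned))
```

Matching by name rather than by position is what keeps values from landing under the wrong field when sources disagree on column order. The guard covers the case where no explicit schema is available to align against.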

Also harden the schema-evolution mode plumbing: document per-mode schema
behaviour (append validates/drops; create and overwrite evolve the schema),
fix the LanceDatasink default-mode docstring, document the write_fragment mode
parameter and the writer/committer mode-pairing requirement, and add regression
tests for multi-fragment overwrite, schema drop/type-change, streaming overwrite
resume, streaming-append rejection, and the column-order guard.