
fix(#48): version-aware binary tuple decoding for ALTER COLUMN #64

Merged
farhan-syah merged 3 commits into main from fix/issue-48-alter-column-schema-evolution
Apr 16, 2026


@farhan-syah
Contributor

Fixes #48. ALTER COLLECTION ADD/DROP COLUMN on strict document collections silently corrupted pre-ALTER rows (every column read back as NULL, and UPDATE failed to decode the tuple) because the catalog schema was mutated without migrating existing rows or providing a read-time compatibility shim.

Summary

  • Track when each column was added (added_at_version) and keep tombstones for dropped columns (DroppedColumn { def, position, dropped_at_version }) on StrictSchema.
  • StrictSchema::schema_for_version(v) reconstructs the physical tuple layout at any historical version — excludes columns added after v, re-inserts columns dropped after v at their original positions.
  • Reader (binary_tuple_to_value / binary_tuple_to_json) detects tuple-version < schema-version and decodes with a sub-schema matching the tuple's physical layout, then virtually fills new columns with their DEFAULT.
  • parse_default_literal() resolves common SQL defaults ('n/a', 0, true, false, null) at read time.
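A minimal sketch of read-time DEFAULT resolution in the spirit of `parse_default_literal()`. The `Value` enum is a hypothetical stand-in for the engine's value type, and the exact literal grammar is an assumption; anything that is not a plain literal (for example `now()`) returns `None` so the caller can fall back to NULL.

```rust
/// Hypothetical stand-in for the engine's value type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(String),
}

/// Sketch: evaluate a literal SQL DEFAULT expression at read time.
/// Non-literal expressions yield `None`.
pub fn parse_default_literal(expr: &str) -> Option<Value> {
    let s = expr.trim();
    if s.eq_ignore_ascii_case("null") {
        return Some(Value::Null);
    }
    if s.eq_ignore_ascii_case("true") {
        return Some(Value::Bool(true));
    }
    if s.eq_ignore_ascii_case("false") {
        return Some(Value::Bool(false));
    }
    // Single-quoted string literal; SQL escapes an embedded quote as ''.
    if s.len() >= 2 && s.starts_with('\'') && s.ends_with('\'') {
        return Some(Value::String(s[1..s.len() - 1].replace("''", "'")));
    }
    if let Ok(i) = s.parse::<i64>() {
        return Some(Value::Int(i));
    }
    // Require a digit so identifiers such as `nan` or `inf` are not read as floats.
    if s.chars().any(|c| c.is_ascii_digit()) {
        if let Ok(f) = s.parse::<f64>() {
            return Some(Value::Float(f));
        }
    }
    None
}
```

Deferring evaluation to read time means the DEFAULT text stored in the catalog is the single source of truth; no backfill of old rows is needed for the value to appear.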

Coverage

Added 7 regression tests to nodedb/tests/sql_transactions.rs covering the full class of schema-altering DDL. Every test asserts that pre-ALTER rows remain readable with correct values after the DDL:

  • add_column_preserves_pre_alter_row_existing_columns: original columns must not read back as NULL
  • add_column_returns_default_for_pre_alter_row: the new column is virtually filled with its DEFAULT
  • add_column_then_update_pre_alter_row: UPDATE on a schema-mismatched row must not fail with "failed to decode Binary Tuple"
  • multiple_add_columns_preserves_pre_alter_row: compound schema drift across several ADD COLUMNs
  • drop_column_preserves_pre_alter_row_remaining_columns: remaining columns keep their values
  • rename_column_preserves_pre_alter_row_value
  • alter_column_type_preserves_pre_alter_row_value

Test plan

  • cargo nextest run -p nodedb — 2907/2907 passed
  • cargo nextest run -p nodedb-strict -p nodedb-types -p nodedb-columnar — 369/369 passed
  • cargo fmt --all — clean
  • cargo clippy --all-targets -- -D warnings — no issues

Introduce `added_at_version: u16` on `ColumnDef` to record the schema
version at which a column was added. Columns present at collection
creation default to version 1.

Add a `DroppedColumn` tombstone struct that captures the full column
definition, its ordinal position, and the schema version at which it
was removed. `StrictSchema` now carries a `dropped_columns` list so
the physical layout of any historical tuple version can be
reconstructed without row migration.
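The metadata described above can be sketched as plain Rust structs. This is a simplified model, not the crate's actual definitions: the `name` and `default` fields on `ColumnDef`, the `version` field on `StrictSchema`, the `usize`/`u16` types of the tombstone fields, and the `new` constructor are assumptions made for illustration.

```rust
/// Simplified column definition. `added_at_version` comes from the PR;
/// `name` and `default` are illustrative fields.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnDef {
    pub name: String,
    /// Raw SQL DEFAULT expression, evaluated lazily at read time.
    pub default: Option<String>,
    /// Schema version in which this column was added (1 = at creation).
    pub added_at_version: u16,
}

/// Tombstone kept when a column is dropped.
#[derive(Debug, Clone, PartialEq)]
pub struct DroppedColumn {
    /// Full definition, so older tuples can still be laid out.
    pub def: ColumnDef,
    /// Index in the live column list at the moment of the drop (assumed).
    pub position: usize,
    /// Schema version produced by the DROP.
    pub dropped_at_version: u16,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StrictSchema {
    /// Current schema version (field name assumed).
    pub version: u16,
    pub columns: Vec<ColumnDef>,
    pub dropped_columns: Vec<DroppedColumn>,
}

impl StrictSchema {
    /// Hypothetical constructor for a new collection: every column is
    /// stamped with version 1 and there are no tombstones yet.
    pub fn new(columns: Vec<(String, Option<String>)>) -> Self {
        let columns = columns
            .into_iter()
            .map(|(name, default)| ColumnDef {
                name,
                default,
                added_at_version: 1,
            })
            .collect();
        StrictSchema {
            version: 1,
            columns,
            dropped_columns: Vec::new(),
        }
    }
}
```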

New helpers on `StrictSchema`:
- `schema_for_version(v)` — builds a sub-schema matching the physical
  layout of tuples written at version `v` by excluding later-added
  columns and re-inserting dropped columns at their original positions.
- `parse_default_literal(expr)` — evaluates a SQL DEFAULT expression
  (string, boolean, integer, float, NULL) to a `Value` at read time.
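One way to implement `schema_for_version(v)` on such a model is to replay the DDL history backwards from the current version: undo each drop by re-inserting the tombstoned column at its recorded position, and undo each add by removing the column stamped with that version. This sketch assumes one DDL change per schema version and that `position` is the column's index in the live list at drop time; the real implementation may order these steps differently.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnDef {
    pub name: String,
    pub added_at_version: u16,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DroppedColumn {
    pub def: ColumnDef,
    /// Index in the live column list when the column was dropped (assumed).
    pub position: usize,
    pub dropped_at_version: u16,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StrictSchema {
    pub version: u16,
    pub columns: Vec<ColumnDef>,
    pub dropped_columns: Vec<DroppedColumn>,
}

impl StrictSchema {
    /// Reconstruct the physical column layout of tuples written at version `v`.
    pub fn schema_for_version(&self, v: u16) -> StrictSchema {
        let mut cols = self.columns.clone();
        // Undo DDL newest-first, one schema version at a time, down to `v + 1`.
        for ver in (v.saturating_add(1)..=self.version).rev() {
            // Undo a DROP made at `ver`: re-insert the column at its old index.
            for d in self
                .dropped_columns
                .iter()
                .rev()
                .filter(|d| d.dropped_at_version == ver)
            {
                let at = d.position.min(cols.len()); // defensive clamp
                cols.insert(at, d.def.clone());
            }
            // Undo an ADD made at `ver`: the column did not exist yet.
            cols.retain(|c| c.added_at_version != ver);
        }
        StrictSchema {
            version: v.min(self.version),
            columns: cols,
            dropped_columns: self
                .dropped_columns
                .iter()
                .filter(|d| d.dropped_at_version <= v)
                .cloned()
                .collect(),
        }
    }
}
```

For example, with `id, name, age` at version 1, `ADD COLUMN email` at version 2 and `DROP COLUMN name` at version 3, `schema_for_version(1)` yields `id, name, age`, `schema_for_version(2)` yields `id, name, age, email`, and `schema_for_version(3)` yields `id, age, email`. Processing versions newest-first guarantees that, when a tombstone is re-inserted, the list already matches the layout that existed right after the drop, so the recorded `position` is still valid.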

`DroppedColumn` is re-exported from `nodedb_types::columnar`.

ALTER TABLE ... ADD COLUMN now stamps the new column's `added_at_version`
with the bumped schema version before appending it to the live column
list, so the read path can distinguish columns that did not exist when
older tuples were written.
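A sketch of the ADD COLUMN bookkeeping on a simplified model: bump the schema version first, then stamp the new column with the bumped value before appending it. `add_column`, its error strings, and the struct fields other than `added_at_version` are hypothetical, and tombstones are omitted to keep the example focused.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnDef {
    pub name: String,
    pub default: Option<String>,
    pub added_at_version: u16,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StrictSchema {
    pub version: u16,
    pub columns: Vec<ColumnDef>,
}

impl StrictSchema {
    /// Hypothetical ADD COLUMN handler: bump the version, then stamp the
    /// new column with the bumped value before appending it.
    pub fn add_column(&mut self, name: &str, default: Option<&str>) -> Result<(), String> {
        if self.columns.iter().any(|c| c.name == name) {
            return Err(format!("column '{name}' already exists"));
        }
        // Fail instead of wrapping if the u16 version counter is exhausted.
        let next = self.version.checked_add(1).ok_or("schema version overflow")?;
        self.version = next;
        self.columns.push(ColumnDef {
            name: name.to_string(),
            default: default.map(str::to_string),
            added_at_version: next,
        });
        Ok(())
    }
}
```

Because tuples record the schema version they were written under, any tuple with a version below a column's `added_at_version` is known not to contain that column.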

ALTER TABLE ... DROP COLUMN now records a `DroppedColumn` tombstone
(definition, original position, version at drop) instead of silently
discarding the column definition. This allows the reader to reconstruct
the physical layout of any tuple written before the drop without
requiring row migration.
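The DROP side can be sketched the same way: remove the column from the live list but keep its definition, its index at the time of the drop, and the bumped version as a tombstone. As above, `drop_column` and the field layout are hypothetical simplifications, not the catalog's actual code.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnDef {
    pub name: String,
    pub added_at_version: u16,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DroppedColumn {
    pub def: ColumnDef,
    pub position: usize,
    pub dropped_at_version: u16,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StrictSchema {
    pub version: u16,
    pub columns: Vec<ColumnDef>,
    pub dropped_columns: Vec<DroppedColumn>,
}

impl StrictSchema {
    /// Hypothetical DROP COLUMN handler that records a tombstone
    /// instead of discarding the column definition.
    pub fn drop_column(&mut self, name: &str) -> Result<(), String> {
        let position = self
            .columns
            .iter()
            .position(|c| c.name == name)
            .ok_or_else(|| format!("column '{name}' does not exist"))?;
        let next = self.version.checked_add(1).ok_or("schema version overflow")?;
        // Keep the full definition so pre-drop tuples can still be laid out.
        let def = self.columns.remove(position);
        self.version = next;
        self.dropped_columns.push(DroppedColumn {
            def,
            position,
            dropped_at_version: next,
        });
        Ok(())
    }
}
```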

The CONVERT path initialises `dropped_columns` to an empty vec to keep
all `StrictSchema` construction sites consistent.

`binary_tuple_to_value` now detects when a stored tuple's schema version
is behind the current catalog version and decodes using
`schema_for_version` to match the physical column layout at write time.
Columns added after the tuple's version are filled with their DEFAULT
value (or NULL) rather than causing an index-out-of-bounds or
returning corrupt data.
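The virtual-fill step can be sketched independently of the binary encoding. Here a stored tuple is modeled as already-split fields in physical order (a hypothetical `StoredTuple`), `physical` is assumed to be the column names from `schema_for_version(tuple.schema_version)`, and DEFAULTs are assumed to be already resolved to values. Columns are matched by name, a simplification that does not model RENAME COLUMN.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Int(i64),
    Text(String),
}

pub struct ColumnDef {
    pub name: String,
    /// DEFAULT already resolved to a value (assumed).
    pub default: Option<Value>,
}

/// Hypothetical pre-split tuple: the schema version it was written under
/// and its fields in physical (write-time) order.
pub struct StoredTuple {
    pub schema_version: u16,
    pub fields: Vec<Value>,
}

/// Map a stored tuple onto the current column list.
pub fn tuple_to_row(
    current: &[ColumnDef],
    physical: &[String],
    tuple: &StoredTuple,
) -> Result<Vec<(String, Value)>, String> {
    if tuple.fields.len() != physical.len() {
        return Err(format!(
            "tuple written at version {} has {} fields, layout expects {}",
            tuple.schema_version,
            tuple.fields.len(),
            physical.len()
        ));
    }
    // Pair each physical field with the name of the column it was written for.
    let by_name: HashMap<&str, &Value> = physical
        .iter()
        .map(String::as_str)
        .zip(tuple.fields.iter())
        .collect();
    // Fields of dropped columns are in `by_name` but never looked up, so they are skipped.
    let row: Vec<(String, Value)> = current
        .iter()
        .map(|col| {
            let value = match by_name.get(col.name.as_str()) {
                Some(v) => (*v).clone(),
                // Added after this tuple was written: fill with DEFAULT, else NULL.
                None => col.default.clone().unwrap_or(Value::Null),
            };
            (col.name.clone(), value)
        })
        .collect();
    Ok(row)
}
```

Checking the field count against the reconstructed layout turns a layout mismatch into an explicit error instead of an out-of-bounds read or silently shifted values.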

`binary_tuple_to_json` is refactored to delegate to
`binary_tuple_to_value` so the version-aware path is shared across both
read modes without duplication.
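Delegating the JSON path to the value decoder means JSON rendering never inspects schema versions; it only serializes an already-reconciled row. A minimal sketch of that shape, assuming a hypothetical `tuple_to_json` that receives the value decoder as a closure and a hand-rolled serializer (the real code presumably uses its own JSON machinery):

```rust
use std::fmt::Write;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Text(String),
}

pub type Row = Vec<(String, Value)>;

/// Append `s` as a JSON string literal, escaping per JSON rules.
fn push_json_string(out: &mut String, s: &str) {
    out.push('"');
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out.push('"');
}

fn push_json_value(out: &mut String, v: &Value) {
    match v {
        Value::Null => out.push_str("null"),
        Value::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Value::Int(i) => {
            let _ = write!(out, "{i}");
        }
        // JSON has no NaN/Infinity, so non-finite floats become null.
        Value::Float(f) if f.is_finite() => {
            let _ = write!(out, "{f}");
        }
        Value::Float(_) => out.push_str("null"),
        Value::Text(s) => push_json_string(out, s),
    }
}

/// All version handling lives in `decode_to_value` (the role played by
/// binary_tuple_to_value); this function only serializes its output.
pub fn tuple_to_json<D>(decode_to_value: D, bytes: &[u8]) -> Result<String, String>
where
    D: Fn(&[u8]) -> Result<Row, String>,
{
    let row = decode_to_value(bytes)?;
    let mut out = String::from("{");
    for (i, (name, value)) in row.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        push_json_string(&mut out, name);
        out.push(':');
        push_json_value(&mut out, value);
    }
    out.push('}');
    Ok(out)
}
```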

Remaining `StrictSchema` construction sites in the executor initialise
`dropped_columns` to keep all call sites consistent.

Add integration tests covering the full lifecycle of schema-altering DDL
on a strict collection:
- Pre-ALTER rows return correct values for existing columns after ADD COLUMN
- Pre-ALTER rows return the column DEFAULT for newly added columns
- Updating a pre-ALTER row migrates it to the current schema
- DROP COLUMN leaves pre-drop rows readable for surviving columns
- Multiple ADD COLUMN operations in sequence remain readable
- RENAME COLUMN and ALTER COLUMN TYPE on pre-existing rows
@farhan-syah merged commit aa5f1ee into main on Apr 16, 2026
2 checks passed
@farhan-syah deleted the fix/issue-48-alter-column-schema-evolution branch on April 16, 2026 at 20:06

Development

Successfully merging this pull request may close these issues.

ALTER COLLECTION ADD COLUMN zombifies existing rows — schema bumped without data migration
