Metadata-only CREATE TABLE for DSv2 + Kernel + CCv2 path by TimothyW553 · Pull Request #7 · TimothyW553/delta

TimothyW553 · 2026-03-24T18:57:43Z

Summary

- Implements a clean, sequential CREATE TABLE architecture for the DSv2 + Kernel + CCv2 path: build → commit → publish → load
- Adds a new ddl package in spark/v2 with CreateTableTxnBuilder, TableCommitter, CreateTablePublisher, CreateTableContext (plain POJO), and supporting DTOs/interfaces
- The DeltaCatalog orchestrator is 4 lines with zero branching; all UC/path-based logic is pushed into the builder and publisher
- Unity Catalog dependency pinned to 0.5.0-SNAPSHOT
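
To make the build → commit → publish → load shape concrete, here is a minimal sketch of a branch-free orchestrator. Every type and method name in it (`Context`, `TxnBuilder`, `Committer`, `Publisher`, `Loader`, `createTable`) is hypothetical and only mirrors the stage names from the summary; it is not the code added by this PR and it does not call Kernel or Unity Catalog.

```java
import java.util.Map;

// Hypothetical sketch: these types only mirror the stage names in the summary.
// They are NOT the classes introduced by this PR.
final class CreateTablePipelineSketch {
  record Context(String tableName, Map<String, String> properties) {}
  record PreparedTxn(Context context) {}
  record CommittedTxn(Context context, long version) {}
  record Publication(String tableName, long version) {}

  interface TxnBuilder { PreparedTxn build(Context ctx); }
  interface Committer { CommittedTxn commit(PreparedTxn txn); }
  interface Publisher { Publication publish(CommittedTxn committed); }
  interface Loader { String load(Publication publication); }

  // The orchestrator: four sequential calls and no branching.
  // Any table-type-specific decision lives inside the stage implementations.
  static String createTable(
      Context ctx, TxnBuilder builder, Committer committer, Publisher publisher, Loader loader) {
    PreparedTxn prepared = builder.build(ctx);
    CommittedTxn committed = committer.commit(prepared);
    Publication publication = publisher.publish(committed);
    return loader.load(publication);
  }

  // Wires trivial in-memory stages together to show the data flow.
  static String demo() {
    Context ctx = new Context("demo.t1", Map.of("delta.columnMapping.mode", "name"));
    return createTable(
        ctx,
        PreparedTxn::new,
        txn -> new CommittedTxn(txn.context(), 0L),
        committed -> new Publication(committed.context().tableName(), committed.version()),
        publication -> publication.tableName() + "@v" + publication.version());
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Because each stage consumes only the previous stage's output, a path-based or UC-managed variant could be swapped in by passing a different builder or publisher without touching the orchestrator.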

Test plan

5 unit tests for property filtering and DataLayoutSpec conversion
5 integration tests for full build→commit→publish flow (path-based tables)
UC end-to-end test (requires live UC server with 0.5.0-SNAPSHOT)
Spark SQL-level CREATE TABLE with STRICT mode enabled

This pull request was AI-assisted by Isaac.

#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

Add the collations table features, `collations-preview` and `collations`, as writer features.

## How was this patch tested?

New test using a golden table.

## Does this PR introduce _any_ user-facing changes?

No.

delta-io#6337)

Clarify in two places that checkpoints and reconciled snapshots should not contain `domainMetadata` actions with `removed=true`:

- Action reconciliation: the `domainMetadata` collection excludes tombstones.
- Checkpoint contents: the Domain Metadata bullet excludes removed entries.
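
The rule can be illustrated with a small sketch. It assumes the usual reconciliation behaviour in which the newest `domainMetadata` action per domain wins, then drops any survivor marked `removed=true`. The record and method names are hypothetical and do not come from the Delta code base.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of domainMetadata reconciliation; not Delta's implementation.
final class DomainMetadataReconciliationSketch {
  record DomainMetadata(String domain, String configuration, boolean removed) {}

  // Input is ordered oldest to newest. The newest action per domain wins,
  // and tombstones (removed=true) are excluded from the reconciled result.
  static List<DomainMetadata> reconcile(List<DomainMetadata> actionsInLogOrder) {
    Map<String, DomainMetadata> latestPerDomain = new LinkedHashMap<>();
    for (DomainMetadata action : actionsInLogOrder) {
      latestPerDomain.put(action.domain(), action);
    }
    List<DomainMetadata> reconciled = new ArrayList<>();
    for (DomainMetadata action : latestPerDomain.values()) {
      if (!action.removed()) {
        reconciled.add(action);
      }
    }
    return reconciled;
  }
}
```

With actions for domains `a`, `b`, and then a tombstone for `a`, only the entry for `b` survives, which is what a checkpoint or reconciled snapshot would carry under the clarified wording.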

…6350)

#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

Test-only follow-up to delta-io#6322. Revises a test so that it no longer relies on delta-spark versions.

## How was this patch tested?

Test-only change. The test still passes.

## Does this PR introduce _any_ user-facing changes?

No.

…ale (delta-io#6354)

## Summary

- Fix `ExpressionUtils.convertValueToKernelLiteral` crash when a `BigDecimal` has precision less than scale
- Normalize precision to `Math.max(bd.precision(), bd.scale() + 1)` before calling `Literal.ofDecimal`
- Add unit test for the edge case

## Problem

A query with a decimal `IN` predicate containing `0.00` crashes when using the V2 connector:

```sql
SELECT * FROM delta_table WHERE dec10_2 IN (0.00, 100.00)
```

This throws an `IllegalArgumentException: Invalid precision and scale combo` from Kernel's `Literal.ofDecimal`.

**Root cause:** Java's `BigDecimal("0.00")` reports `precision()=1` and `scale()=2`. This violates the invariant that `precision >= scale` required by Kernel's `Literal.ofDecimal`. The V2 code in `ExpressionUtils.convertValueToKernelLiteral` was passing these values through without normalization.

## Fix

Before constructing the Kernel literal, normalize precision:

```java
int precision = Math.max(bd.precision(), bd.scale() + 1);
```

For `BigDecimal("0.00")`, this yields `precision=3, scale=2` (i.e., `DECIMAL(3,2)`), which correctly represents the value.

## Test plan

- [x] New unit test `testConvertValueToKernelLiteral_DecimalWithScaleExceedingPrecision` passes
- [x] All 73 `ExpressionUtilsTest` tests pass
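
The edge case can be reproduced with the JDK alone. The sketch below uses only `java.math.BigDecimal` and the normalization expression quoted above; the class and method names are made up for illustration, and Kernel's `Literal.ofDecimal` is deliberately not called.

```java
import java.math.BigDecimal;

// Illustrative only: shows why BigDecimal("0.00") needs precision normalization.
final class DecimalPrecisionSketch {
  // Same expression as the fix: ensure precision is strictly greater than scale.
  static int normalizedPrecision(BigDecimal bd) {
    return Math.max(bd.precision(), bd.scale() + 1);
  }

  public static void main(String[] args) {
    BigDecimal zero = new BigDecimal("0.00");
    // The unscaled value is 0, so the JDK reports precision 1 while scale is 2.
    System.out.println(zero.precision() + " " + zero.scale());
    // After normalization the value fits DECIMAL(3,2).
    System.out.println(normalizedPrecision(zero));
    // Values whose precision already exceeds their scale are unchanged.
    System.out.println(normalizedPrecision(new BigDecimal("100.00")));
  }
}
```

The same situation arises for any value whose unscaled digits are fewer than its scale, such as `0.05`, not only for zero.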

…without a supported IcebergCompat version (delta-io#6352)

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

When a user enables `delta.universalFormat.enabledFormats = 'iceberg'` alongside an unrecognized IcebergCompat property (e.g. `delta.enableIcebergCompatV3`), the error message currently says:

> To enable IcebergCompatV2, set the table property 'delta.enableIcebergCompatV2' = 'true'.

This is misleading because:

1. Spark Delta silently ignores the unrecognized V3 property (only V1 and V2 are in `IcebergCompat.knownVersions`)
2. The error then directs the user to V2 with no indication that V3 is not supported

The fix updates the message to explicitly state that the supported versions are IcebergCompatV1 and IcebergCompatV2, making clear to users that higher versions are not yet supported in this Spark Delta release.

Resolves delta-io#6351

## How was this patch tested?

The change is limited to an error message string. Manual reproduction:

```sql
SET spark.databricks.delta.allowArbitraryProperties.enabled=true;
CREATE TABLE demo.icebergCompatV3 (i INT, s STRING) USING DELTA TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.enableIcebergCompatV3' = 'true',
  'delta.enableDeletionVectors' = 'false',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
```

**Before:** `"To enable IcebergCompatV2, set the table property 'delta.enableIcebergCompatV2' = 'true'."`

**After:** `"Supported versions are IcebergCompatV1 and IcebergCompatV2."`

## Does this PR introduce _any_ user-facing changes?

Yes. The error message for `DELTA_UNIVERSAL_FORMAT_VIOLATION` (when UniForm Iceberg is enabled without a recognized IcebergCompat version) now explicitly lists the supported IcebergCompat versions (V1 and V2), instead of directing the user to enable V2. This helps users who set an unsupported version (e.g. V3) understand why the error occurred.

---------

Signed-off-by: openinx <openinx@gmail.com>

…commits (delta-io#6338)

## Description

This PR adds a regression test that exposes a **data loss bug** in Delta streaming when used with Coordinated Commits and non-trivial backfill batch sizes.

Related issue: delta-io#6339

### Observed Behavior

The test writes 100 sequential single-row commits to a Delta table while a streaming query is running, then verifies all 100 values appear in the sink.

- **batchSize = 1**: Test passes consistently. All 100 values present.
- **batchSize = 2**: Test fails. ~4 out of 100 commits are lost on average. **Only odd-versioned commits are lost** (these are the versions that are not immediately backfilled).
- **batchSize = 3**: Test fails. The loss pattern follows the backfill cycle:
  - `v % 3 == 0`: no loss (these versions trigger backfill)
  - `v % 3 == 1`: v may be lost, and if v is lost, v+1 is always lost together (they get backfilled as a pair)
  - `v % 3 == 2`: v may be lost independently

The pattern strongly correlates with which commits are sitting unbackfilled in the coordinator at any given time, suggesting the bug is related to the interaction between backfill and commit listing.

### Hypothesized Root Cause

We suspect the issue is a race condition in `CoordinatedCommitsUtils.commitFilesIterator`, which is used by `DeltaSource.getFileChanges` during both `latestOffset` and `getBatch`. This method lists commits in two lazy, sequential steps:

1. List backfilled commits from the filesystem (`listedDeltas`)
2. Query the coordinator for unbackfilled commits (`tailFromSnapshot`)

These two steps are **not atomic**. A possible scenario (example with batchSize = 3):

1. **During `latestOffset`**: filesystem has `[0.json]`, coordinator has `[1, 2]`. `latestOffset` correctly computes endOffset covering through version 2.
2. **During `getBatch`**: the filesystem listing iterator runs and sees `[0.json]`.
3. **Between the two iterators**: a concurrent write creates version 3, triggering `backfillToVersion(3)`. This writes versions 1, 2, 3 to the filesystem and **removes them from the coordinator** via `registerBackfill`.
4. **The coordinator query runs**: returns empty, because versions 1 and 2 have been removed.
5. **Result**: `getBatch` misses versions 1 and 2. The next batch starts from version 3, so they are never re-read.

## How was this patch tested?

Added a new test `"streaming processes 100 sequential single-value commits and contains all values 0 to 99"` that:

- Creates a Delta table and starts a streaming query
- Appends 100 single-row commits while the stream is running
- Verifies all 100 values appear in the sink

The test passes for batchSize = 1 but fails for batchSize = 2 and 3 due to the data loss.

## Does this PR introduce _any_ user-facing changes?

No. This PR only adds a test to demonstrate the existing bug. A fix will follow in a subsequent PR.
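
The hypothesized interleaving can be replayed deterministically with a toy model. The sketch below is not the Delta code: `filesystem` and `coordinator` are plain in-memory sets, `commitAndBackfill` stands in for a commit that triggers a backfill, and `listCommits` imitates a two-step listing with a hook that runs between the steps. All names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of the hypothesized race; names and structure are illustrative only.
final class BackfillRaceSketch {
  final TreeSet<Long> filesystem = new TreeSet<>();  // backfilled commit versions
  final TreeSet<Long> coordinator = new TreeSet<>(); // unbackfilled commit versions

  // State from the scenario: filesystem has [0], coordinator has [1, 2].
  static BackfillRaceSketch initialState() {
    BackfillRaceSketch state = new BackfillRaceSketch();
    state.filesystem.add(0L);
    state.coordinator.add(1L);
    state.coordinator.add(2L);
    return state;
  }

  // Models a commit of `version` that backfills everything up to and including it,
  // then removes those versions from the coordinator.
  void commitAndBackfill(long version) {
    coordinator.add(version);
    filesystem.addAll(coordinator.headSet(version, true));
    coordinator.headSet(version, true).clear();
  }

  // Non-atomic two-step listing: step 1 reads the filesystem, the hook runs,
  // then step 2 reads whatever the coordinator holds at that later moment.
  List<Long> listCommits(Runnable betweenSteps) {
    List<Long> listed = new ArrayList<>(filesystem);
    betweenSteps.run();
    for (Long version : coordinator) {
      if (listed.isEmpty() || version > listed.get(listed.size() - 1)) {
        listed.add(version);
      }
    }
    return listed;
  }

  public static void main(String[] args) {
    BackfillRaceSketch quiet = initialState();
    System.out.println(quiet.listCommits(() -> {}));

    BackfillRaceSketch racy = initialState();
    System.out.println(racy.listCommits(() -> racy.commitAndBackfill(3L)));
  }
}
```

In the quiet run the listing sees versions 0, 1 and 2. In the racy run the filesystem snapshot was taken before the backfill and the coordinator query after it, so versions 1 and 2 appear in neither step, which matches the loss described in the scenario.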

## 🥞 Stacked PR

Use this [link](https://github.com/delta-io/delta/pull/6249/files/0d54da51e7eade47b8115d92aaf7be1e8e4c011f..7e400f87189bef892d3b0022c0aede238a0c84de) to review incremental changes.

- [stack/ignoreDeletesV2](delta-io#6245) [[Files changed](https://github.com/delta-io/delta/pull/6245/files)]
- [stack/skipChangeCommitsV2](delta-io#6246) [[Files changed](https://github.com/delta-io/delta/pull/6246/files/6e9962d4a30a63ed14786830bae2b668004a76c9..2e111cf6ac9d1e5f84d83d94412b05486f543613)]
- [**stack/ignoreChangesV2**](delta-io#6249) [[Files changed](https://github.com/delta-io/delta/pull/6249/files/0d54da51e7eade47b8115d92aaf7be1e8e4c011f..7e400f87189bef892d3b0022c0aede238a0c84de)]
- [stack/ignoreFileDeletionV2](delta-io#6250) [[Files changed](https://github.com/delta-io/delta/pull/6250/files/7e400f87189bef892d3b0022c0aede238a0c84de..af230590d0d9c1a5c37cc6a7b6af404025a6d415)]

---------

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

Support the `ignoreChanges` read option in DSv2, which skips all remove file actions but keeps the add file actions in the same commit.

## How was this patch tested?

Unit tests that check parity between the v1 and v2 connectors on both pure delete commits (only remove) and change commits (add + remove).

Integration tests:

- streaming with ignoreChanges = true allows both delete and change commits

## Does this PR introduce _any_ user-facing changes?

No
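
As a rough illustration of the described semantics, the sketch below filters one commit's file actions so that remove actions are skipped and add actions are kept. The `FileAction`, `AddFile`, and `RemoveFile` types are simplified stand-ins invented for this example, not the connector's classes.

```java
import java.util.List;

// Simplified stand-in types; the real connector's action classes differ.
final class IgnoreChangesSketch {
  interface FileAction {}
  record AddFile(String path) implements FileAction {}
  record RemoveFile(String path) implements FileAction {}

  // With ignoreChanges enabled, remove actions are skipped and
  // add actions from the same commit are still emitted downstream.
  static List<AddFile> filesToEmit(List<FileAction> commitActions) {
    return commitActions.stream()
        .filter(action -> action instanceof AddFile)
        .map(action -> (AddFile) action)
        .toList();
  }
}
```

Under this model a pure delete commit yields an empty result and a change commit yields only its added files, so rewritten rows can be re-emitted to the stream.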

…elta-io#6335)

## Description

Write the atomic-supported property to IcebergCompat tables. This is part of the project to support writes to UC-managed UniForm tables.

## How was this patch tested?

Added unit tests.

```
build/sbt -DsparkVersion=4.0 "iceberg/testOnly org.apache.spark.sql.delta.uniform.UniversalFormatSuite"
```

… path

Add a clean, sequential CREATE TABLE architecture for the Delta DSv2 catalog path using Kernel as the transaction engine: build → commit → publish → load

New components in spark/v2 ddl package:

- CreateTableContext: plain data POJO for operation inputs
- CreateTableTxnBuilder: encapsulates all prep logic (UC pre-registration, path resolution, property filtering, schema conversion, Kernel txn building)
- TableCommitter: generic Kernel commit boundary
- CreateTablePublisher: derives catalog publication from committed snapshot
- DTOs: PreparedTableTxn, PreparedCreateTableTxn, CommittedTableTxn, CreateTableCatalogPublication
- Generic interfaces: TableTxnBuilder, TablePublisher

DeltaCatalog.createTable routes to the new path when DeltaV2Mode is STRICT or AUTO (for UC-managed tables). The orchestrator is 4 lines with zero branching; all table-type logic is pushed into the builder/publisher.

Also:

- build.sbt: Unity Catalog version → 0.5.0-SNAPSHOT
- DeltaV2Mode: added shouldUseKernelForCreateTable()
- CatalogTableUtils: added isCatalogManagedFromProperties()
- AbstractDeltaCatalog: isUnityCatalog visibility → protected

Tests: 5 unit tests (property filtering, DataLayoutSpec) + 5 integration tests (full build→commit→publish for path-based tables).

Replace 10-file over-engineered design with minimal plumbing:

- DDLRequest: generic POJO for all DDL ops (CREATE, CTAS, RTAS)
- CreateTableBuilder: prepare() + buildTransaction()
- CreateTablePublisher: publish abstraction point

DeltaCatalog.createTable() is now 5 lines: prepare → buildTransaction → commit → publish → loadTable

yyanyy and others added 10 commits March 20, 2026 20:56

[Spark] Fix flaky testPlanInputPartitionsGroupsFilesByPartition (delta-io#6344) (commit cd215cb)

TimothyW553 force-pushed the DDL-new-create-table branch 3 times, most recently from bbb0e88 to 2e86dfc Compare March 24, 2026 20:20

TimothyW553 force-pushed the DDL-new-create-table branch from 2e86dfc to 5dcc3c8 Compare March 24, 2026 20:22

TimothyW553 wants to merge 11 commits into master from DDL-new-create-table


9 participants
