Skip to content

Metadata-only CREATE TABLE for DSv2 + Kernel + CCv2 path#7

Draft
TimothyW553 wants to merge 11 commits into
masterfrom
DDL-new-create-table
Draft

Metadata-only CREATE TABLE for DSv2 + Kernel + CCv2 path#7
TimothyW553 wants to merge 11 commits into
masterfrom
DDL-new-create-table

Conversation

@TimothyW553
Copy link
Copy Markdown
Owner

Summary

  • Implements a clean sequential CREATE TABLE architecture for the DSv2 + Kernel + CCv2 path: build → commit → publish → load
  • New ddl package in spark/v2 with: CreateTableTxnBuilder, TableCommitter, CreateTablePublisher, CreateTableContext (plain POJO), and supporting DTOs/interfaces
  • DeltaCatalog orchestrator is 4 lines, zero branching — all UC/path-based logic pushed into builder and publisher
  • Unity Catalog dependency pinned to 0.5.0-SNAPSHOT

Test plan

  • 5 unit tests for property filtering and DataLayoutSpec conversion
  • 5 integration tests for full build→commit→publish flow (path-based tables)
  • UC end-to-end test (requires live UC server with 0.5.0-SNAPSHOT)
  • Spark SQL-level CREATE TABLE with STRICT mode enabled

This pull request was AI-assisted by Isaac.

yyanyy and others added 10 commits March 20, 2026 20:56
<!--
Thanks for sending a pull request!  Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md
2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]
Your PR title ...'.
  3. Be sure to keep the PR description updated to reflect all changes.
  4. Please write your PR title to summarize what this PR proposes.
5. If possible, provide a concise example to reproduce the issue for a
faster review.
6. If applicable, include the corresponding issue number in the PR title
and link it in the body.
-->

#### Which Delta project/connector is this regarding?
<!--
Please add the component selected below to the beginning of the pull
request title
For example: [Spark] Title of my pull request
-->

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description
Add collations table features, `collations-preview` and `collations`, as
writer features.

## How was this patch tested?
New test using golden table.

## Does this PR introduce _any_ user-facing changes?
No.
delta-io#6337)

Clarify in two places that checkpoints and reconciled snapshots should
not contain `domainMetadata` actions with `removed=true`:

- Action reconciliation: the `domainMetadata` collection excludes
tombstones.
- Checkpoint contents: the Domain Metadata bullet excludes removed
entries.
…6350)

<!--
Thanks for sending a pull request!  Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md
2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]
Your PR title ...'.
  3. Be sure to keep the PR description updated to reflect all changes.
  4. Please write your PR title to summarize what this PR proposes.
5. If possible, provide a concise example to reproduce the issue for a
faster review.
6. If applicable, include the corresponding issue number in the PR title
and link it in the body.
-->

#### Which Delta project/connector is this regarding?
<!--
Please add the component selected below to the beginning of the pull
request title
For example: [Spark] Title of my pull request
-->

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)
## Description
test only follow up to delta-io#6322. revises a test to not be reliant on
delta-spark versions.

## How was this patch tested?
test only. test still passes
## Does this PR introduce _any_ user-facing changes?

<!--
If yes, please clarify the previous behavior and the change this PR
proposes - provide the console output, description and/or an example to
show the behavior difference if possible.
If possible, please also clarify if this is a user-facing change
compared to the released Delta Lake versions or within the unreleased
branches such as master.
If no, write 'No'.
-->
 no
…ale (delta-io#6354)

## Summary
- Fix `ExpressionUtils.convertValueToKernelLiteral` crash when a
`BigDecimal` has precision less than scale
- Normalize precision to `Math.max(bd.precision(), bd.scale() + 1)`
before calling `Literal.ofDecimal`
- Add unit test for the edge case

## Problem

A query with a decimal `IN` predicate containing `0.00` crashes when
using the V2 connector:

```sql
SELECT * FROM delta_table WHERE dec10_2 IN (0.00, 100.00)
```

This throws an `IllegalArgumentException: Invalid precision and scale
combo` from Kernel's `Literal.ofDecimal`.

**Root cause:** Java's `BigDecimal("0.00")` reports `precision()=1` and
`scale()=2`. This violates the invariant that `precision >= scale`
required by Kernel's `Literal.ofDecimal`. The V2 code in
`ExpressionUtils.convertValueToKernelLiteral` was passing these values
through without normalization.

## Fix

Before constructing the Kernel literal, normalize precision:

```java
int precision = Math.max(bd.precision(), bd.scale() + 1);
```

For `BigDecimal("0.00")`, this yields `precision=3, scale=2` (i.e.,
`DECIMAL(3,2)`), which correctly represents the value.

## Test plan
- [x] New unit test
`testConvertValueToKernelLiteral_DecimalWithScaleExceedingPrecision`
passes
- [x] All 73 `ExpressionUtilsTest` tests pass
…without a supported IcebergCompat version (delta-io#6352)

<!--
Thanks for sending a pull request!  Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md
2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]
Your PR title ...'.
  3. Be sure to keep the PR description updated to reflect all changes.
  4. Please write your PR title to summarize what this PR proposes.
5. If possible, provide a concise example to reproduce the issue for a
faster review.
6. If applicable, include the corresponding issue number in the PR title
and link it in the body.
-->

#### Which Delta project/connector is this regarding?
<!--
Please add the component selected below to the beginning of the pull
request title
For example: [Spark] Title of my pull request
-->

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

When a user enables `delta.universalFormat.enabledFormats = 'iceberg'`
alongside an unrecognized IcebergCompat property (e.g.
`delta.enableIcebergCompatV3`), the error message currently says:

> To enable IcebergCompatV2, set the table property
'delta.enableIcebergCompatV2' = 'true'.

This is misleading because:
1. Spark Delta silently ignores the unrecognized V3 property (only V1
and V2 are in `IcebergCompat.knownVersions`)
2. The error then directs the user to V2 with no indication that V3 is
not supported

The fix updates the message to explicitly state that the supported
versions are IcebergCompatV1 and IcebergCompatV2, making clear to users
that higher versions are not yet supported in this Spark Delta release.

Resolves delta-io#6351

## How was this patch tested?

The change is limited to an error message string. Manual reproduction:

```sql
SET spark.databricks.delta.allowArbitraryProperties.enabled=true;

CREATE TABLE demo.icebergCompatV3 (i INT, s STRING) USING DELTA
TBLPROPERTIES (
  'delta.columnMapping.mode'             = 'name',
  'delta.enableIcebergCompatV3'          = 'true',
  'delta.enableDeletionVectors'          = 'false',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
```

**Before:** `"To enable IcebergCompatV2, set the table property
'delta.enableIcebergCompatV2' = 'true'."`

**After:** `"Supported versions are IcebergCompatV1 and
IcebergCompatV2."`

## Does this PR introduce _any_ user-facing changes?

Yes. The error message for `DELTA_UNIVERSAL_FORMAT_VIOLATION` (when
UniForm Iceberg is enabled without a recognized IcebergCompat version)
now explicitly lists the supported IcebergCompat versions (V1 and V2),
instead of directing the user to enable V2. This helps users who set an
unsupported version (e.g. V3) understand why the error occurred.

---------

Signed-off-by: openinx <openinx@gmail.com>
…commits (delta-io#6338)

## Description

This PR adds a regression test that exposes a **data loss bug** in Delta
streaming when used with Coordinated Commits and non-trivial backfill
batch sizes.

Related issue: delta-io#6339

### Observed Behavior

The test writes 100 sequential single-row commits to a Delta table while
a streaming query is running, then verifies all 100 values appear in the
sink.

- **batchSize = 1**: Test passes consistently. All 100 values present.
- **batchSize = 2**: Test fails. ~4 out of 100 commits are lost on
average. **Only odd-versioned commits are lost** (these are the versions
that are not immediately backfilled).
- **batchSize = 3**: Test fails. The loss pattern follows the backfill
cycle:
  - `v % 3 == 0`: no loss (these versions trigger backfill)
- `v % 3 == 1`: v may be lost, and if v is lost, v+1 is always lost
together (they get backfilled as a pair)
  - `v % 3 == 2`: v may be lost independently

The pattern strongly correlates with which commits are sitting
unbackfilled in the coordinator at any given time, suggesting the bug is
related to the interaction between backfill and commit listing.

### Hypothesized Root Cause

We suspect the issue is a race condition in
`CoordinatedCommitsUtils.commitFilesIterator`, which is used by
`DeltaSource.getFileChanges` during both `latestOffset` and `getBatch`.
This method lists commits in two lazy, sequential steps:

1. List backfilled commits from the filesystem (`listedDeltas`)
2. Query the coordinator for unbackfilled commits (`tailFromSnapshot`)

These two steps are **not atomic**. A possible scenario (example with
batchSize = 3):

1. **During `latestOffset`**: filesystem has `[0.json]`, coordinator has
`[1, 2]`. `latestOffset` correctly computes endOffset covering through
version 2.
2. **During `getBatch`**: the filesystem listing iterator runs and sees
`[0.json]`.
3. **Between the two iterators**: a concurrent write creates version 3,
triggering `backfillToVersion(3)`. This writes versions 1, 2, 3 to the
filesystem and **removes them from the coordinator** via
`registerBackfill`.
4. **The coordinator query runs**: returns empty — versions 1 and 2 have
been removed.
5. **Result**: `getBatch` misses versions 1 and 2. The next batch starts
from version 3, so they are never re-read.

## How was this patch tested?

Added a new test `"streaming processes 100 sequential single-value
commits and contains all values 0 to 99"` that:
- Creates a Delta table and starts a streaming query
- Appends 100 single-row commits while the stream is running
- Verifies all 100 values appear in the sink

The test passes for batchSize = 1 but fails for batchSize = 2 and 3 due
to the data loss.

## Does this PR introduce _any_ user-facing changes?

No. This PR only adds a test to demonstrate the existing bug. A fix will
follow in a subsequent PR.
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta/pull/6249/files/0d54da51e7eade47b8115d92aaf7be1e8e4c011f..7e400f87189bef892d3b0022c0aede238a0c84de)
to review incremental changes.
- [stack/ignoreDeletesV2](delta-io#6245)
[[Files changed](https://github.com/delta-io/delta/pull/6245/files)]
-
[stack/skipChangeCommitsV2](delta-io#6246)
[[Files
changed](https://github.com/delta-io/delta/pull/6246/files/6e9962d4a30a63ed14786830bae2b668004a76c9..2e111cf6ac9d1e5f84d83d94412b05486f543613)]
-
[**stack/ignoreChangesV2**](delta-io#6249)
[[Files
changed](https://github.com/delta-io/delta/pull/6249/files/0d54da51e7eade47b8115d92aaf7be1e8e4c011f..7e400f87189bef892d3b0022c0aede238a0c84de)]
-
[stack/ignoreFileDeletionV2](delta-io#6250)
[[Files
changed](https://github.com/delta-io/delta/pull/6250/files/7e400f87189bef892d3b0022c0aede238a0c84de..af230590d0d9c1a5c37cc6a7b6af404025a6d415)]

---------
<!--
Thanks for sending a pull request!  Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md
2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]
Your PR title ...'.
  3. Be sure to keep the PR description updated to reflect all changes.
  4. Please write your PR title to summarize what this PR proposes.
5. If possible, provide a concise example to reproduce the issue for a
faster review.
6. If applicable, include the corresponding issue number in the PR title
and link it in the body.
-->

#### Which Delta project/connector is this regarding?
<!--
Please add the component selected below to the beginning of the pull
request title
For example: [Spark] Title of my pull request
-->

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
Support ignoreChanges read option in DSv2, which skip all remove file
actions but keep the add file actions in the same commit.
<!--
- Describe what this PR changes.
- Describe why we need the change.
 
If this PR resolves an issue be sure to include "Resolves #XXX" to
correctly link and close the issue upon merge.
-->

## How was this patch tested?
Unit tests that test parity between v1 and v2 connector on both pure
deletes commit (only remove) and change commit (add + remove).

Integration tests:
- streaming with ignoreChanges = true allows both delete and change
commits
<!--
If tests were added, say they were added here. Please make sure to test
the changes thoroughly including negative and positive cases if
possible.
If the changes were tested in any way other than unit tests, please
clarify how you tested step by step (ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future).
If the changes were not tested, please explain why.
-->

## Does this PR introduce _any_ user-facing changes?
No
<!--
If yes, please clarify the previous behavior and the change this PR
proposes - provide the console output, description and/or an example to
show the behavior difference if possible.
If possible, please also clarify if this is a user-facing change
compared to the released Delta Lake versions or within the unreleased
branches such as master.
If no, write 'No'.
-->
…elta-io#6335)

<!--
Thanks for sending a pull request!  Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md
2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]
Your PR title ...'.
  3. Be sure to keep the PR description updated to reflect all changes.
  4. Please write your PR title to summarize what this PR proposes.
5. If possible, provide a concise example to reproduce the issue for a
faster review.
6. If applicable, include the corresponding issue number in the PR title
and link it in the body.
-->

## Description
Write atomic-supported property to Iceberg compact tables. This is part
of project to support write to UC managed UniForm tables

## How was this patch tested?
Add UTs
```
build/sbt -DsparkVersion=4.0 "iceberg/testOnly org.apache.spark.sql.delta.uniform.UniversalFormatSuite"
```
… path

Add a clean, sequential CREATE TABLE architecture for the Delta DSv2 catalog
path using Kernel as the transaction engine:

  build → commit → publish → load

New components in spark/v2 ddl package:
- CreateTableContext: plain data POJO for operation inputs
- CreateTableTxnBuilder: encapsulates all prep logic (UC pre-registration,
  path resolution, property filtering, schema conversion, Kernel txn building)
- TableCommitter: generic Kernel commit boundary
- CreateTablePublisher: derives catalog publication from committed snapshot
- DTOs: PreparedTableTxn, PreparedCreateTableTxn, CommittedTableTxn,
  CreateTableCatalogPublication
- Generic interfaces: TableTxnBuilder, TablePublisher

DeltaCatalog.createTable routes to the new path when DeltaV2Mode is
STRICT or AUTO (for UC-managed tables). The orchestrator is 4 lines with
zero branching — all table-type logic is pushed into the builder/publisher.

Also:
- build.sbt: Unity Catalog version → 0.5.0-SNAPSHOT
- DeltaV2Mode: added shouldUseKernelForCreateTable()
- CatalogTableUtils: added isCatalogManagedFromProperties()
- AbstractDeltaCatalog: isUnityCatalog visibility → protected

Tests: 5 unit tests (property filtering, DataLayoutSpec) + 5 integration
tests (full build→commit→publish for path-based tables).
@TimothyW553 TimothyW553 force-pushed the DDL-new-create-table branch 3 times, most recently from bbb0e88 to 2e86dfc Compare March 24, 2026 20:20
Replace 10-file over-engineered design with minimal plumbing:
- DDLRequest: generic POJO for all DDL ops (CREATE, CTAS, RTAS)
- CreateTableBuilder: prepare() + buildTransaction()
- CreateTablePublisher: publish abstraction point

DeltaCatalog.createTable() is now 5 lines:
  prepare → buildTransaction → commit → publish → loadTable
@TimothyW553 TimothyW553 force-pushed the DDL-new-create-table branch from 2e86dfc to 5dcc3c8 Compare March 24, 2026 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants