[Storage] Add tableIdentifier to UCClient getCommits#6788

Merged
openinx merged 1 commit into delta-io:master from TimothyW553:getcommits-table-identifier
May 19, 2026

Conversation

Collaborator

@TimothyW553 TimothyW553 commented May 14, 2026

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (Storage)

Description

Adds TableIdentifier to UCClient#getCommits so UC clients can receive the catalog/schema/table name when fetching commits.

The identifier is forwarded from UCCommitCoordinatorClient when available in the TableDescriptor. Kernel catalog-managed snapshot loading also has an overload that forwards the identifier, and Spark v2 UC snapshot metadata now carries the identifier from CatalogTable into that Kernel path.

The legacy UCTokenBasedRestClient accepts the new argument but keeps sending the existing legacy request fields.

Resolves #6784.
Addresses the getCommits API follow-up from #6780.

tableIdentifier contract

tableIdentifier is the three-part catalog.schema.table name (not the UC UUID tableId). Callers pass it when they have catalog context, null otherwise; receivers either require it (rejecting null) or ignore it.

| Caller | Passes |
| --- | --- |
| Kernel UCCatalogManagedClient | Non-null (API requires) |
| Spark V2 UCManagedTableSnapshotManager | Non-null (from CatalogTable) |
| Spark V1 catalog access (via DeltaTableV2) | Non-null (threaded from DeltaTableV2.catalogTable) |
| Spark V1 path-based (DeltaTable.forPath, delta.`s3://...`) | Null (no name exists) |
| Flink CatalogManagedTable | Non-null (from qualified name) |

What does each receiver do when it receives the new tableIdentifier?

| Receiver | Behavior |
| --- | --- |
| UCTokenBasedRestClient (legacy) | Ignores |
| InMemoryUCClient (test) | Ignores |
| Future UC Delta REST client like UCDeltaTokenBasedRestClient | Will require non-null |
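
The caller/receiver contract above can be illustrated with a minimal sketch. This is not the code in this PR: `CommitsSource`, `IgnoringClient`, `RequiringClient`, and the simplified `TableIdentifier` class are hypothetical stand-ins for the shared interface, a legacy receiver that ignores the identifier, and a name-based receiver that rejects null.

```java
import java.util.Objects;

public class TableIdentifierContractSketch {

  /** Simplified three-part name (catalog.schema.table); not the UC table UUID. Hypothetical. */
  static final class TableIdentifier {
    final String catalog;
    final String schema;
    final String table;

    TableIdentifier(String catalog, String schema, String table) {
      this.catalog = Objects.requireNonNull(catalog, "catalog");
      this.schema = Objects.requireNonNull(schema, "schema");
      this.table = Objects.requireNonNull(table, "table");
    }

    String fullName() {
      return catalog + "." + schema + "." + table;
    }
  }

  /** Hypothetical stand-in for a shared getCommits-style interface. */
  interface CommitsSource {
    String describeRequest(String tableId, TableIdentifier tableIdentifier);
  }

  /** Legacy-style receiver: accepts the argument but builds its request from tableId only. */
  static final class IgnoringClient implements CommitsSource {
    @Override
    public String describeRequest(String tableId, TableIdentifier tableIdentifier) {
      return "legacy:" + tableId;
    }
  }

  /** Name-based receiver: rejects a null identifier before doing anything else. */
  static final class RequiringClient implements CommitsSource {
    @Override
    public String describeRequest(String tableId, TableIdentifier tableIdentifier) {
      Objects.requireNonNull(tableIdentifier, "tableIdentifier is required");
      return "delta:" + tableIdentifier.fullName() + "#" + tableId;
    }
  }

  public static void main(String[] args) {
    TableIdentifier name = new TableIdentifier("main", "sales", "orders");
    // Path-based callers have no name and pass null; only the ignoring receiver tolerates it.
    System.out.println(new IgnoringClient().describeRequest("uuid-1", null));
    System.out.println(new RequiringClient().describeRequest("uuid-1", name));
  }
}
```

In this sketch a caller without catalog context can still use the ignoring receiver, while the requiring receiver fails fast with a `NullPointerException` rather than sending an incomplete request.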

How was this patch tested?

build/sbt javafmtAll scalafmtAll
build/sbt "storage/testOnly io.delta.storage.commit.uccommitcoordinator.UCTokenBasedRestClientSuite"
build/sbt "spark/testOnly org.apache.spark.sql.delta.coordinatedcommits.UCCommitCoordinatorClientSuite"
build/sbt "kernelUnityCatalog/testOnly io.delta.kernel.unitycatalog.InMemoryUCClientSuite" "kernelUnityCatalog/testOnly io.delta.kernel.unitycatalog.UCCatalogManagedClientSuite"
build/sbt "sparkV2/testOnly io.delta.spark.internal.v2.snapshot.unitycatalog.UCUtilsSuite" "sparkV2/testOnly io.delta.spark.internal.v2.snapshot.unitycatalog.UCManagedTableSnapshotManagerSuite" "sparkV2/testOnly io.delta.spark.internal.v2.snapshot.unitycatalog.UCTableInfoTest"

Does this PR introduce any user-facing changes?

No.

@TimothyW553 TimothyW553 requested a review from openinx May 14, 2026 20:00
@TimothyW553 TimothyW553 marked this pull request as ready for review May 14, 2026 20:00
try {
return ucClient.getCommits(
ucTableId,
null /* tableIdentifier */,
Collaborator

Hi @TimothyW553, I think we still need to pass the tableIdentifier to getCommits in this UCCatalogManagedClient, since we will use the same method to delegate to either UCClient or UCDeltaClient.

Regardless of the client, both implementations should work with UCCatalogManagedClient, so we have to pass the correct tableIdentifier here.

Collaborator Author

Done, Kernel now passes the real TableIdentifier to getCommits when it has the UC table identifier.

val response = client.getCommits("tableId", fakeURI, startVersionOpt, endVersionOpt)
val response = client.getCommits(
"tableId",
null,
Collaborator

This one is fine to be null, since we only use it for testing.

Collaborator Author

Ack, I kept null only in tests where tableIdentifier is not used.

Comment on lines +209 to +210
// Build the DeltaGetCommits request using SDK models. The legacy API does not accept
// tableIdentifier, but UC Delta Rest Catalog clients need it in this shared interface.
Collaborator

This change can be removed, since the UCTokenBasedRestClient shouldn't know any internal impls of UCDeltaTokenBasedRestClient.

Collaborator Author

Done, removed the comment about UCDeltaTokenBasedRestClient.

}
}

test("getCommits accepts table identifier without changing legacy request") {
Collaborator

This test seems useless; I think we can just remove it, since the tableIdentifier passed to getCommits in the `UCTokenBasedRestClient` is never actually used. I don't think it's worth adding a separate unit test for it.

Collaborator Author

Done, removed this test because UCTokenBasedRestClient does not use tableIdentifier.

Collaborator

@openinx openinx left a comment

Left a few comments; everything else looks good to me. Thanks @TimothyW553.

@TimothyW553 TimothyW553 force-pushed the getcommits-table-identifier branch from 2d3b137 to 0c814bf Compare May 14, 2026 20:13
*/
GetCommitsResponse getCommits(
String tableId,
Optional<TableIdentifier> tableIdentifier,
Collaborator

Follow the commit method. Do not make tableIdentifier optional. In tests where it doesn't matter, it can be set to null.

 @Override
  public void commit(
      String tableId,
      URI tableUri,
      TableIdentifier tableIdentifier,

Collaborator Author

Done, getCommits now uses TableIdentifier directly, same as commit.

@TimothyW553 TimothyW553 force-pushed the getcommits-table-identifier branch from 0c814bf to 5aaf7d1 Compare May 14, 2026 20:37
@TimothyW553 TimothyW553 force-pushed the getcommits-table-identifier branch 4 times, most recently from bc923e5 to f6dba9c Compare May 14, 2026 23:20
String tablePath,
Optional<Long> versionOpt,
Optional<Long> timestampOpt,
UCTableIdentifier ucTableIdentifier) {
Collaborator

Move ucTableIdentifier to right after tablePath

Collaborator Author

done

@TimothyW553 TimothyW553 force-pushed the getcommits-table-identifier branch from f6dba9c to 6bdba53 Compare May 15, 2026 00:39
String tablePath,
Optional<Long> versionOpt,
Optional<Long> timestampOpt) {
return loadSnapshotImpl(engine, ucTableId, tablePath, null, versionOpt, timestampOpt);
Collaborator

Hi @TimothyW553, we cannot pass null for the UCTableIdentifier here, because if deltaRestApi.enabled is used for Kernel, we won't be able to use the Kernel API to access the deltaRestCatalog.

So we have to pass the UCTableIdentifier through all the code paths.

Collaborator

I think the only way to support this is to add the ucTableIdentifier to this loadSnapshot API, and please keep just one loadSnapshot API. We cannot provide a loadSnapshot API without the ucTableIdentifier.

Collaborator Author

loadSnapshot now has one API that requires UCTableIdentifier, and all callers pass it through.

ucTableId,
tablePath,
versionOpt,
toStorageTableIdentifierOrNull(ucTableIdentifier)));
Collaborator

The ucTableIdentifier is always required, cannot be null.

Collaborator Author

Done, this path now requires a non-null UCTableIdentifier

startTimestampOpt,
endVersionOpt,
endTimestampOpt,
null);
Collaborator

This cannot be null, otherwise we won't be able to support the DeltaRestCatalog in kernel.

Collaborator Author

Done, loadCommitRange now requires UCTableIdentifier, and all callers pass it through.

Comment on lines +123 to +129
scala.Option<String> schemaOption = catalogTable.identifier().database();
if (schemaOption.isEmpty()) {
throw new IllegalArgumentException(
"Unable to determine Unity Catalog schema for table "
+ catalogTable.identifier()
+ ": schema name is missing.");
}
Collaborator

Why not just convert the catalogTable.identifier to a UCTableIdentifier directly? We don't actually need the catalogName parameter in this method.

Collaborator Author

Done, UCUtils now converts catalogTable.identifier() directly and no longer takes catalogName.
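
A rough sketch of what such a direct conversion might look like, under stated assumptions: `UcName` and `toUcName` are hypothetical names, `java.util.Optional` stands in for the Scala `Option` values that a Spark table identifier exposes, and the catalog check is an assumption modeled on the schema check quoted above. The actual UCUtils code may differ.

```java
import java.util.Optional;

public class UcIdentifierConversionSketch {

  /** Hypothetical three-part Unity Catalog name. */
  static final class UcName {
    final String catalog;
    final String schema;
    final String table;

    UcName(String catalog, String schema, String table) {
      this.catalog = catalog;
      this.schema = schema;
      this.table = table;
    }

    @Override
    public String toString() {
      return catalog + "." + schema + "." + table;
    }
  }

  /**
   * Builds the three-part name from the parts of a single identifier, so no separate
   * catalogName parameter is needed. Fails fast when a part is missing.
   */
  static UcName toUcName(Optional<String> catalog, Optional<String> schema, String table) {
    String catalogName = catalog.orElseThrow(() -> new IllegalArgumentException(
        "Unable to determine Unity Catalog catalog for table " + table
            + ": catalog name is missing."));
    String schemaName = schema.orElseThrow(() -> new IllegalArgumentException(
        "Unable to determine Unity Catalog schema for table " + table
            + ": schema name is missing."));
    return new UcName(catalogName, schemaName, table);
  }

  public static void main(String[] args) {
    System.out.println(toUcName(Optional.of("main"), Optional.of("sales"), "orders"));
  }
}
```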

@openinx openinx requested review from weiluo-db and zhenlineo May 15, 2026 05:09
@TimothyW553 TimothyW553 force-pushed the getcommits-table-identifier branch from 1e73c1e to 58f4727 Compare May 15, 2026 05:11
@TimothyW553 TimothyW553 requested review from openinx and yili-db May 15, 2026 21:27
@TimothyW553 TimothyW553 force-pushed the getcommits-table-identifier branch from e06ed8f to a1edad4 Compare May 17, 2026 08:35
schemaName = namespaces[0];
tableName = namespaces[1];
}
UCTableIdentifier tableIdentifier = toUcTableIdentifier(tableId);
Collaborator

nit: okay, so this tableId is the table identifier, while in this PR https://github.com/delta-io/delta/pull/6796/changes#diff-3faa0eba650a9174ae36b64cd5c3ba38387988ccf586b119bb1386592442dc1aR108 , @yili-db and I discussed using tableId to represent the table UUID. I think we may need to unify the naming across the code.

Not a blocker for the PR, just a comment here for context.

String ucTableId,
String tablePath,
Optional<Long> versionOpt,
TableIdentifier tableIdentifier) {
Collaborator

nit: if all the callers need to convert the UCTableIdentifier into a TableIdentifier, then a good approach may be to accept the UCTableIdentifier directly in getRatifiedCommitsFromUC and push the conversion inside the getRatifiedCommitsFromUC implementation. Not a blocker for this PR.

Collaborator

@openinx openinx left a comment

Generally, this looks good to me. I left several minor comments, none of which are blockers. Thanks @TimothyW553 for the work.

Collaborator

@harperjiang harperjiang left a comment

High-level questions:

It seems TableIdentifier includes the qualified "name" of the UC table, while we still need UC's table UUID.

Is "identifier" a good name for this variable?
Having two parameters, "tableIdentifier" and "ucTableId", in the same method call does not sound natural. Is this only a temporary solution?

Collaborator Author

TimothyW553 commented May 18, 2026

@harperjiang

> High-level questions:
>
> It seems TableIdentifier includes the qualified "name" of the UC table, while we still need UC's table UUID.
>
> Is "identifier" a good name for this variable? Having two parameters, "tableIdentifier" and "ucTableId", in the same method call does not sound natural. Is this only a temporary solution?

"identifier" is in fact a terrible name, especially since I have been confused multiple times between "id" and "identifier".

Ultimately, this is a short-term solution and calls for a large renaming. I have opened an issue that we should address once the bigger changes are done: #6810

@TimothyW553 TimothyW553 force-pushed the getcommits-table-identifier branch 2 times, most recently from 016025d to 4a73aa9 Compare May 18, 2026 23:10
@TimothyW553 TimothyW553 force-pushed the getcommits-table-identifier branch from 4a73aa9 to 728930d Compare May 19, 2026 00:14
Comment thread build.sbt
(kernelApi / Test / packageBin).value,
(kernelDefaults / Test / packageBin).value,
(kernelUnityCatalog / Test / packageBin).value
"io.delta" % "delta-kernel-unitycatalog" % v,
Collaborator Author

@TimothyW553 TimothyW553 May 19, 2026

This change is needed because in Maven mode (-DkernelVersion=...), sparkV2 can't reach kernel-unitycatalog's test helpers (InMemoryUCClient, UCCatalogManagedTestUtils) via the source-mode test->test dep, so we publish and consume them as a -tests classifier jar -- same pattern kernelApi already uses.

Collaborator Author

same as the change below.

Contributor

@tdas tdas left a comment

build.sbt changes approved. @openinx please review and approve the changes

Collaborator

openinx commented May 19, 2026

Thanks @tdas and @harperjiang for reviewing, and thanks @TimothyW553 for the contribution. I approve this PR and plan to merge it now.

@openinx openinx merged commit 0906bc3 into delta-io:master May 19, 2026
33 checks passed
openinx pushed a commit that referenced this pull request May 20, 2026
#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [x] Other (Storage)

## Description

This PR is stacked on #6788.

#6788 passes the Unity Catalog table identifier (`catalog.schema.table`)
through the `UCClient#getCommits` path. This PR builds on top of that
and implements `getCommits` in `UCDeltaTokenBasedRestClient`.

The implementation:

- loads the UC Delta table by `catalog.schema.table`
- validates that the returned UC table UUID matches the requested
`tableId`
- converts returned UC `DeltaCommit` entries into storage `Commit`
entries
- applies the optional start/end version filters locally
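
The UUID validation and local version filtering steps listed above could look roughly like the following. This is a sketch under assumptions, not the stacked PR's code: `CommitEntry`, `requireSameTableId`, and `filterByVersion` are hypothetical names, and treating both bounds as inclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class LocalCommitFilterSketch {

  /** Hypothetical minimal commit entry; a real storage commit carries more fields. */
  static final class CommitEntry {
    final long version;

    CommitEntry(long version) {
      this.version = version;
    }
  }

  /** Fails if the table loaded by name is not the table the caller asked for by UUID. */
  static void requireSameTableId(String requestedTableId, String returnedTableId) {
    if (!requestedTableId.equals(returnedTableId)) {
      throw new IllegalStateException(
          "UC table UUID mismatch: requested " + requestedTableId
              + " but catalog returned " + returnedTableId);
    }
  }

  /** Keeps commits within the optional bounds; both bounds are inclusive (assumption). */
  static List<CommitEntry> filterByVersion(
      List<CommitEntry> commits, Optional<Long> startVersion, Optional<Long> endVersion) {
    List<CommitEntry> kept = new ArrayList<>();
    for (CommitEntry commit : commits) {
      boolean afterStart = !startVersion.isPresent() || commit.version >= startVersion.get();
      boolean beforeEnd = !endVersion.isPresent() || commit.version <= endVersion.get();
      if (afterStart && beforeEnd) {
        kept.add(commit);
      }
    }
    return kept;
  }
}
```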

Review focus for this stacked PR:

Stack diff:
TimothyW553/delta@728930d...29c699c

## How was this patch tested?

```bash
build/sbt "storage/testOnly io.delta.storage.commit.uccommitcoordinator.UCDeltaTokenBasedRestClientSuite" "storage/testOnly io.delta.storage.commit.uccommitcoordinator.UCTokenBasedRestClientSuite"
```

## Does this PR introduce _any_ user-facing changes?

No.

Development

Successfully merging this pull request may close these issues.

Add the tableIdentifier in the UCClient#getCommits API to support both uc client and uc delta client.

5 participants