[Storage] Add tableIdentifier to UCClient getCommits#6788
Conversation
| try { | ||
| return ucClient.getCommits( | ||
| ucTableId, | ||
| null /* tableIdentifier */, |
There was a problem hiding this comment.
Hi @TimothyW553 , I think we still need to pass the tableIdentifier for the getCommits , in this UCCatalogManagedClient. since we will use the same method to delegate to either UCClient or UCDeltaClient.
And regardless of the client, both impls should work for the UCCatalogManagedClient. so we will have to pass the correct tableIdentifier here.
There was a problem hiding this comment.
Done, Kernel now passes the real TableIdentifier to getCommits when it has the UC table identifier.
| val response = client.getCommits("tableId", fakeURI, startVersionOpt, endVersionOpt) | ||
| val response = client.getCommits( | ||
| "tableId", | ||
| null, |
There was a problem hiding this comment.
This one is fine to be null, since we are only use it for testing.
There was a problem hiding this comment.
Ack, I kept null only in tests where tableIdentifier is not used.
| // Build the DeltaGetCommits request using SDK models. The legacy API does not accept | ||
| // tableIdentifier, but UC Delta Rest Catalog clients need it in this shared interface. |
There was a problem hiding this comment.
This change can be removed, since the UCTokenBasedRestClient shouldn't know any internal impls of UCDeltaTokenBasedRestClient.
There was a problem hiding this comment.
Done, removed the comment about UCDeltaTokenBasedRestClient.
| } | ||
| } | ||
|
|
||
| test("getCommits accepts table identifier without changing legacy request") { |
There was a problem hiding this comment.
This test seems like useless, I think we can just remove, since the tableIdentifier that passed for getCommits in the `UCTokenBasedRestClient, actually is never used. I think it's not worth for us to add a separate unit test for it.
There was a problem hiding this comment.
Done, removed this test because UCTokenBasedRestClient does not use tableIdentifier.
openinx
left a comment
There was a problem hiding this comment.
Left few comments, others looks good to me. thanks @TimothyW553
2d3b137 to
0c814bf
Compare
| */ | ||
| GetCommitsResponse getCommits( | ||
| String tableId, | ||
| Optional<TableIdentifier> tableIdentifier, |
There was a problem hiding this comment.
Follow commit. Do not make tableIdentifier optional. In tests where it doesn't matter, it can be set to null.
@Override
public void commit(
String tableId,
URI tableUri,
TableIdentifier tableIdentifier,
There was a problem hiding this comment.
Done, getCommits now uses TableIdentifier directly, same as commit.
0c814bf to
5aaf7d1
Compare
bc923e5 to
f6dba9c
Compare
| String tablePath, | ||
| Optional<Long> versionOpt, | ||
| Optional<Long> timestampOpt, | ||
| UCTableIdentifier ucTableIdentifier) { |
There was a problem hiding this comment.
Move ucTableIdentifier to right after tablePath
f6dba9c to
6bdba53
Compare
| String tablePath, | ||
| Optional<Long> versionOpt, | ||
| Optional<Long> timestampOpt) { | ||
| return loadSnapshotImpl(engine, ucTableId, tablePath, null, versionOpt, timestampOpt); |
There was a problem hiding this comment.
Hi @TimothyW553 , we cannot set the null for the UCTableIdentifier here. because if we use the deltaRestApi.enabled for kernel, then we won't be able to use kernel API access the deltaRestCatalog.
So we have to pass through the UCTableIdentifier to all the code path.
There was a problem hiding this comment.
I think the only way for us to support it is: adding the ucTableIdentifier in this loadSnapshot API, and pls just keep one loadSnapshot API . we cannot provide a loadSnapshot API without the ucTableIdentifier.
There was a problem hiding this comment.
loadSnapshot now has one API that requires UCTableIdentifier, and all callers pass it through.
| ucTableId, | ||
| tablePath, | ||
| versionOpt, | ||
| toStorageTableIdentifierOrNull(ucTableIdentifier))); |
There was a problem hiding this comment.
The ucTableIdentifier is always required, cannot be null.
There was a problem hiding this comment.
Done, this path now requires a non-null UCTableIdentifier
| startTimestampOpt, | ||
| endVersionOpt, | ||
| endTimestampOpt, | ||
| null); |
There was a problem hiding this comment.
This cannot be null, otherwise we won't be able to support the DeltaRestCatalog in kernel.
There was a problem hiding this comment.
Done, loadCommitRange now requires UCTableIdentifier, and all callers pass it through.
| scala.Option<String> schemaOption = catalogTable.identifier().database(); | ||
| if (schemaOption.isEmpty()) { | ||
| throw new IllegalArgumentException( | ||
| "Unable to determine Unity Catalog schema for table " | ||
| + catalogTable.identifier() | ||
| + ": schema name is missing."); | ||
| } |
There was a problem hiding this comment.
Why not just use convert the catalogTable.identifier to UCTableIdentifier directly ? actually, we don't need the catalogName parameter in this method.
There was a problem hiding this comment.
Done, UCUtils now converts catalogTable.identifier() directly and no longer takes catalogName.
1e73c1e to
58f4727
Compare
e06ed8f to
a1edad4
Compare
| schemaName = namespaces[0]; | ||
| tableName = namespaces[1]; | ||
| } | ||
| UCTableIdentifier tableIdentifier = toUcTableIdentifier(tableId); |
There was a problem hiding this comment.
nit: okay, so this tableId is table identifier, while in this PR https://github.com/delta-io/delta/pull/6796/changes#diff-3faa0eba650a9174ae36b64cd5c3ba38387988ccf586b119bb1386592442dc1aR108 , @yili-db and I discussed to use the tableId to represent table uuid. I think we may need to unify the naming for all the code.
not a blocker for the PR, just a comment here, for knowing the context.
| String ucTableId, | ||
| String tablePath, | ||
| Optional<Long> versionOpt, | ||
| TableIdentifier tableIdentifier) { |
There was a problem hiding this comment.
nit: if all the caller need to conver the UCTableIdentifier into the TableIdentifier, then a good approach for us may be, define the UCTableIdentifier directly in the getRatifiedCommitsFromUC, and push conversion inside the getRatifiedCommitsFromUC impl . but not a blocker for this PR.
openinx
left a comment
There was a problem hiding this comment.
Generally, looks good to me, I left several minor comments, which are not blockers for us to go. thanks @TimothyW553 for the work.
harperjiang
left a comment
There was a problem hiding this comment.
High level questions:
It seems TableIdentifier include the qualified "name" of UC Table, while we still need UC's table UUID.
Is "identifier" a good name for this variable?
Having two parameters "tableIdentifier" and "ucTableId" in the same method call do not sound natural. Is this only a temporary solution?
"identifier" is in fact a terrible name - especially I have been confused multiple times between "id" <> "identifier" ultimately, this is a short term solution and calls for a large renaming. I have opened an issue which we should address once the bigger changes are done: #6810 |
016025d to
4a73aa9
Compare
Adds `TableIdentifier` to `UCClient#getCommits` so UC clients can receive the catalog/schema/table name when fetching commits. The identifier is forwarded from `UCCommitCoordinatorClient` when available in the `TableDescriptor`. Kernel catalog-managed snapshot loading also has an overload that forwards the identifier, and Spark v2 UC snapshot metadata now carries the identifier from `CatalogTable` into that Kernel path. The legacy `UCTokenBasedRestClient` accepts the new argument but keeps sending the existing legacy request fields. `tableIdentifier` is the three-part `catalog.schema.table` name (not the UC UUID `tableId`); callers pass it when they have catalog context and null otherwise, and receivers either require it (rejecting null) or ignore it. Resolves delta-io#6784.
4a73aa9 to
728930d
Compare
| (kernelApi / Test / packageBin).value, | ||
| (kernelDefaults / Test / packageBin).value, | ||
| (kernelUnityCatalog / Test / packageBin).value | ||
| "io.delta" % "delta-kernel-unitycatalog" % v, |
There was a problem hiding this comment.
This change is needed because in Maven mode (-DkernelVersion=...), sparkV2 can't reach kernel-unitycatalog's test helpers (InMemoryUCClient, UCCatalogManagedTestUtils) via the source-mode test->test dep, so we publish and consume them as a -tests classifier jar -- same pattern kernelApi already uses.
There was a problem hiding this comment.
same as the change below.
|
Thanks @tdas and @harperjiang for the reviewing, and thanks @TimothyW553 for the contribution, I'm approved for this PR. I will plan to merge this PR now. |
#### Which Delta project/connector is this regarding? - [ ] Spark - [ ] Standalone - [ ] Flink - [ ] Kernel - [x] Other (Storage) ## Description This PR is stacked on #6788. #6788 passes the Unity Catalog table identifier (`catalog.schema.table`) through the `UCClient#getCommits` path. This PR builds on top of that and implements `getCommits` in `UCDeltaTokenBasedRestClient`. The implementation: - loads the UC Delta table by `catalog.schema.table` - validates that the returned UC table UUID matches the requested `tableId` - converts returned UC `DeltaCommit` entries into storage `Commit` entries - applies the optional start/end version filters locally Review focus for this stacked PR: Stack diff: TimothyW553/delta@728930d...29c699c ## How was this patch tested? ```bash build/sbt "storage/testOnly io.delta.storage.commit.uccommitcoordinator.UCDeltaTokenBasedRestClientSuite" "storage/testOnly io.delta.storage.commit.uccommitcoordinator.UCTokenBasedRestClientSuite" ``` ## Does this PR introduce _any_ user-facing changes? No.
Which Delta project/connector is this regarding?
Description
Adds
TableIdentifiertoUCClient#getCommitsso UC clients can receive the catalog/schema/table name when fetching commits.The identifier is forwarded from
UCCommitCoordinatorClientwhen available in theTableDescriptor. Kernel catalog-managed snapshot loading also has an overload that forwards the identifier, and Spark v2 UC snapshot metadata now carries the identifier fromCatalogTableinto that Kernel path.The legacy
UCTokenBasedRestClientaccepts the new argument but keeps sending the existing legacy request fields.Resolves #6784.
Addresses the
getCommitsAPI follow-up from #6780.tableIdentifier contract
tableIdentifieris the three-partcatalog.schema.tablename (not the UC UUIDtableId). Callers pass it when they have catalog context, null otherwise; receivers either require it (rejecting null) or ignore it.UCCatalogManagedClientUCManagedTableSnapshotManagerCatalogTable)DeltaTableV2)DeltaTableV2.catalogTable)DeltaTable.forPath,delta.`s3://...`)CatalogManagedTablewhat does each receiver do when it gets this new tableIdentifier?
UCTokenBasedRestClient(legacy)InMemoryUCClient(test)UCDeltaTokenBasedRestClientHow was this patch tested?
Does this PR introduce any user-facing changes?
No.