[Storage] Implement getCommits in UC Delta token-based client#6814
Conversation
| String[] namespace = Objects.requireNonNull( | ||
| tableIdentifier.getNamespace(), "tableIdentifier namespace must not be null"); | ||
| if (namespace.length != 2) { | ||
| throw new IllegalArgumentException( | ||
| "tableIdentifier must be a three-part Unity Catalog table name"); | ||
| } |
There was a problem hiding this comment.
I think this is common for all the tableIdentifier verification. Could you pls help us to move it a to a static method, so that both getCommits and commit API can share the same static method ?
There was a problem hiding this comment.
good idea, pulled it out as requireThreePartName and used it in commit, getCommits, and loadTable.
29c699c to
36aee29
Compare
| String catalog = Objects.requireNonNull(namespace[0], "catalog name must not be null"); | ||
| String schema = Objects.requireNonNull(namespace[1], "schema name must not be null"); | ||
| String table = Objects.requireNonNull(tableIdentifier.getName(), "table name must not be null"); | ||
| String fullName = catalog + "." + schema + "." + table; |
There was a problem hiding this comment.
Even for those, the getCommits and commit should also share the same method.
| response = deltaTablesApi.loadTable(catalog, schema, table); | ||
| } catch (ApiException e) { | ||
| if (e.getCode() == HTTP_NOT_FOUND) { | ||
| throw new InvalidTargetTableException( |
There was a problem hiding this comment.
yes this was temporary until his changes were in
There was a problem hiding this comment.
yes, swapped to NoSuchTableException once #6811 landed.
- Compare table UUIDs via UUID equality instead of case-sensitive string compare, matching the existing UUID.fromString pattern in commit(). - Extract fromDeltaCommit helper to mirror UCTokenBasedRestClient's fromDeltaCommitInfo and remove the inline null-check storm inside the Commit/FileStatus constructor. - Document why version filtering is client-side (loadTable does not expose server-side filters). - Drop the dead requireNonNull(response, ...) after a successful loadTable; the SDK never returns null on 2xx.
36aee29 to
473ef06
Compare
| response = deltaTablesApi.loadTable(catalog, schema, table); | ||
| } catch (ApiException e) { | ||
| if (e.getCode() == HTTP_NOT_FOUND) { | ||
| throw new InvalidTargetTableException( |
There was a problem hiding this comment.
Oh, I think one legacy issue is: the UCTokenBasedRestClient already use this InvalidTargetTableException exception now.
@yili-db , could you pls help us differentiate the difference between InvalidTargetTableException and your newly introduced NoSuchTableException ?
There was a problem hiding this comment.
I'll go with 404 → NoSuchTableException and wrong-target → InvalidTargetTableException. let me know if you want it different.
- Extract resolveThreePartName helper used by loadTable, commit, and getCommits, replacing three near-identical inline parses of TableIdentifier with one source of truth (per openinx review). - Change getCommits 404 from InvalidTargetTableException to NoSuchTableException, matching loadTable and the typed exception introduced in delta-io#6811. - Update the 404 test to mirror loadTable's NoSuchTableException test (asserts the qualified table name and the response body are in the error message).
|
|
||
| TableMetadata metadata = response.getMetadata(); | ||
| UUID actualTableUuid = metadata != null ? metadata.getTableUuid() : null; | ||
| if (!expectedTableUuid.equals(actualTableUuid)) { |
There was a problem hiding this comment.
So this will happen in the following case:
- t1: create a table
cat.db.table, with tableId = 111. - t2: drop the table .
- t3: construct the
getCommits (cat.db.table, tableId=111). - t4: re-create the table, tableId=222.
- t5: send the
getCommits(cat.db.table, tableId=111).
There was a problem hiding this comment.
yes — this is exactly the case the UUID check protects against.
| Objects.requireNonNull(startVersion, "startVersion must not be null"); | ||
| Objects.requireNonNull(endVersion, "endVersion must not be null"); | ||
|
|
||
| UUID expectedTableUuid = UUID.fromString(tableId); |
There was a problem hiding this comment.
wait, does all the tableId is a unique string that can be serialized to be strickly UUID ? I think OSS UC server should have the implication, but pls take a look for the delta uc spec, and see if it clearly says it's a string that can be deserialized to a UUID ?
Otherwise, if the table id is a unique one, but it will not be able to deserialize to a UUID, then that's a serious bug.
There was a problem hiding this comment.
I will more suggest to use a String, rather than explicitly deserializing to a UUID instance.
There was a problem hiding this comment.
okay, the spec clearly says it's UUID. https://github.com/unitycatalog/unitycatalog/blob/main/api/delta-docs/Models/TableMetadata.md
But I will still suggest to use a String.
There was a problem hiding this comment.
good catch, switched to String.
There was a problem hiding this comment.
It is fine to be a String here, only because the function signature says "String".
In this case, aString.equals(anotherUUID.toString()) is better than UUID.fromString(...).equals(anotherUUID)
57ee576 to
9b9b931
Compare
| Objects.requireNonNull( | ||
| deltaCommit.getFileModificationTimestamp(), | ||
| "commit fileModificationTimestamp must not be null"), |
There was a problem hiding this comment.
will it possible for this getFileModificationTimestamp to be nullable ? the annotation already says is not nullable.
@jakarta.annotation.Nonnull
@JsonProperty(JSON_PROPERTY_FILE_MODIFICATION_TIMESTAMP)
@JsonInclude(value = JsonInclude.Include.ALWAYS)
public Long getFileModificationTimestamp() {
return fileModificationTimestamp;
}
There was a problem hiding this comment.
yes, it's @Nonnull in the SDK — removed the requireNonNull.
| deltaCommit.getFileModificationTimestamp(), | ||
| "commit fileModificationTimestamp must not be null"), | ||
| new Path(basePath, Objects.requireNonNull( | ||
| deltaCommit.getFileName(), "commit fileName must not be null"))); |
There was a problem hiding this comment.
Same comment for getFileName.
There was a problem hiding this comment.
same here — removed the requireNonNull on getFileName too.
| Objects.requireNonNull(deltaCommit.getVersion(), "commit version must not be null"), | ||
| fileStatus, | ||
| Objects.requireNonNull(deltaCommit.getTimestamp(), "commit timestamp must not be null")); |
There was a problem hiding this comment.
those nullable checks may not necessary.
There was a problem hiding this comment.
you're right, removed all the requireNonNull from fromDeltaCommit.
- Rename resolveThreePartName to requireThreePartName: the method validates and unpacks, it does not resolve. requireThreePartName matches the requireNonNull idiom. - Drop the per-field Objects.requireNonNull storm inside fromDeltaCommit and from the outer loop's version unbox. The SDK marks every DeltaCommit getter @nonnull; matching the sibling fromDeltaCommitInfo (which trusts the SDK) keeps the two helpers symmetric. - Import java.util.Arrays instead of fully-qualifying inside requireThreePartName. - Move scala.jdk.CollectionConverters._ into its own scala.* import group between java.* and the third-party block (scalafmt order). - Extend the getCommits null-parameter test to cover startVersion and endVersion as well, matching the five requireNonNull calls in the method body. - Assert message contents on the getCommits UUID-mismatch test (qualified table name plus both UUIDs), matching the assertion shape used by the loadTable 404 test.
Per openinx r3263890890: prefer plain String comparison over round-tripping tableId through UUID.fromString. The UC delta spec canonicalizes the UUID form so both sides produce the same string. Drops the upfront UUID.fromString validation step and keeps the mismatch error message in terms of the strings the caller passed in.
| // Inner Classes | ||
| // =========================== | ||
|
|
||
| /** A Unity Catalog three-part table name resolved from a {@link TableIdentifier}. */ |
There was a problem hiding this comment.
this is intentional kept inner - its a private impl detail
openinx
left a comment
There was a problem hiding this comment.
Generally looks good to me, no blocker comments.
| ensureOpen(); | ||
| Objects.requireNonNull(tableId, "tableId must not be null"); | ||
| Objects.requireNonNull(tableIdentifier, "tableIdentifier must not be null"); | ||
| ResolvedTableName name = requireThreePartName(tableIdentifier); |
There was a problem hiding this comment.
nit: name -> resolvedTable.
| Objects.requireNonNull(startVersion, "startVersion must not be null"); | ||
| Objects.requireNonNull(endVersion, "endVersion must not be null"); | ||
|
|
||
| ResolvedTableName name = requireThreePartName(tableIdentifier); |
| } | ||
|
|
||
| long latestTableVersion = response.getLatestTableVersion() != null | ||
| ? response.getLatestTableVersion() : -1L; |
There was a problem hiding this comment.
will it be the case if the table does not even have a single table version ?
Which Delta project/connector is this regarding?
Description
This PR is stacked on #6788.
#6788 passes the Unity Catalog table identifier (
catalog.schema.table) through theUCClient#getCommitspath. This PR builds on top of that and implementsgetCommitsinUCDeltaTokenBasedRestClient.The implementation:
catalog.schema.tabletableIdDeltaCommitentries into storageCommitentriesReview focus for this stacked PR:
Stack diff: TimothyW553/delta@728930d...29c699c
How was this patch tested?
Does this PR introduce any user-facing changes?
No.