
[Storage] Implement getCommits in UC Delta token-based client#6814

Merged
openinx merged 6 commits into delta-io:master from TimothyW553:ucdelta-getcommits-pr6788 on May 20, 2026


TimothyW553 (Collaborator) commented on May 19, 2026

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (Storage)

Description

This PR is stacked on #6788.

#6788 passes the Unity Catalog table identifier (catalog.schema.table) through the UCClient#getCommits path. This PR builds on top of that and implements getCommits in UCDeltaTokenBasedRestClient.

The implementation:

  • loads the UC Delta table by catalog.schema.table
  • validates that the returned UC table UUID matches the requested tableId
  • converts returned UC DeltaCommit entries into storage Commit entries
  • applies the optional start/end version filters locally
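A minimal sketch of steps 2 and 4, assuming (as the requireNonNull calls on startVersion/endVersion later in the diff suggest) that the bounds are Optional<Long>, and assuming they are inclusive. LoadedTable and CommitEntry are hypothetical stand-ins: steps 1 and 3 are represented by an already-loaded, already-converted input, and the mismatch is reported as an IllegalStateException rather than the client's real exception type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the getCommits flow; the nested records are stand-ins,
// not the UC Delta SDK response or the storage Commit type.
final class GetCommitsSketch {

  /** Stand-in for a loadTable response: the table UUID plus the commits it returned. */
  record LoadedTable(String tableUuid, List<CommitEntry> commits) {}

  /** Stand-in for a converted storage Commit: version, commit timestamp, file name. */
  record CommitEntry(long version, long timestamp, String fileName) {}

  private GetCommitsSketch() {}

  static List<CommitEntry> getCommits(
      LoadedTable loaded,
      String expectedTableId,
      Optional<Long> startVersion,
      Optional<Long> endVersion) {
    // Step 2: refuse to serve commits from a table other than the one requested.
    if (!expectedTableId.equals(loaded.tableUuid())) {
      throw new IllegalStateException(
          "table UUID mismatch: expected " + expectedTableId + ", got " + loaded.tableUuid());
    }
    List<CommitEntry> result = new ArrayList<>();
    for (CommitEntry commit : loaded.commits()) {
      // Step 4: client-side [startVersion, endVersion] filter (inclusive bounds assumed).
      if (startVersion.isPresent() && commit.version() < startVersion.get()) {
        continue;
      }
      if (endVersion.isPresent() && commit.version() > endVersion.get()) {
        continue;
      }
      result.add(commit);
    }
    return result;
  }
}
```

Filtering after the fact means the client still receives every commit the server returns; a later commit message in this PR attributes that to loadTable not exposing server-side version filters.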

Review focus for this stacked PR:

Stack diff: TimothyW553/delta@728930d...29c699c

How was this patch tested?

build/sbt "storage/testOnly io.delta.storage.commit.uccommitcoordinator.UCDeltaTokenBasedRestClientSuite" "storage/testOnly io.delta.storage.commit.uccommitcoordinator.UCTokenBasedRestClientSuite"

Does this PR introduce any user-facing changes?

No.

Comment on lines +219 to +224
String[] namespace = Objects.requireNonNull(
    tableIdentifier.getNamespace(), "tableIdentifier namespace must not be null");
if (namespace.length != 2) {
  throw new IllegalArgumentException(
      "tableIdentifier must be a three-part Unity Catalog table name");
}
Collaborator

I think this is common to all the tableIdentifier verification. Could you please move it into a static method so that both getCommits and the commit API can share it?

Collaborator Author

good idea, pulled it out as requireThreePartName and used it in commit, getCommits, and loadTable.
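Roughly what such a shared validator could look like. This is a sketch, not the PR's code: it takes the namespace array and table name directly instead of a TableIdentifier so it runs on its own, and the ResolvedTableName record here only mirrors the inner class of that name mentioned later in the thread.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch of a shared "validate and unpack" helper for catalog.schema.table names.
final class ThreePartNames {

  /** Stand-in for ResolvedTableName: the three parts plus the dotted full name. */
  record ResolvedTableName(String catalog, String schema, String table) {
    String fullName() {
      return catalog + "." + schema + "." + table;
    }
  }

  private ThreePartNames() {}

  /** Validates a two-element namespace plus a name and returns the unpacked parts. */
  static ResolvedTableName requireThreePartName(String[] namespace, String name) {
    Objects.requireNonNull(namespace, "tableIdentifier namespace must not be null");
    if (namespace.length != 2) {
      throw new IllegalArgumentException(
          "tableIdentifier must be a three-part Unity Catalog table name, got namespace "
              + Arrays.toString(namespace));
    }
    return new ResolvedTableName(
        Objects.requireNonNull(namespace[0], "catalog name must not be null"),
        Objects.requireNonNull(namespace[1], "schema name must not be null"),
        Objects.requireNonNull(name, "table name must not be null"));
  }
}
```

Returning a small value type instead of three locals lets loadTable, commit and getCommits share both the validation and the fullName formatting used in error messages.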

TimothyW553 force-pushed the ucdelta-getcommits-pr6788 branch from 29c699c to 36aee29 on May 19, 2026 05:05
Comment on lines +225 to +228
String catalog = Objects.requireNonNull(namespace[0], "catalog name must not be null");
String schema = Objects.requireNonNull(namespace[1], "schema name must not be null");
String table = Objects.requireNonNull(tableIdentifier.getName(), "table name must not be null");
String fullName = catalog + "." + schema + "." + table;
Collaborator

Same for these lines: getCommits and commit should share the same method.

  response = deltaTablesApi.loadTable(catalog, schema, table);
} catch (ApiException e) {
  if (e.getCode() == HTTP_NOT_FOUND) {
    throw new InvalidTargetTableException(
Collaborator

I think this is incorrect. For this 404 error code, we should throw NoSuchTableException, which comes from @yili-db's latest PR: #6811.

Collaborator Author

yes this was temporary until his changes were in

Collaborator Author

yes, swapped to NoSuchTableException once #6811 landed.

- Compare table UUIDs via UUID equality instead of case-sensitive string
  compare, matching the existing UUID.fromString pattern in commit().
- Extract fromDeltaCommit helper to mirror UCTokenBasedRestClient's
  fromDeltaCommitInfo and remove the inline null-check storm inside the
  Commit/FileStatus constructor.
- Document why version filtering is client-side (loadTable does not
  expose server-side filters).
- Drop the dead requireNonNull(response, ...) after a successful
  loadTable; the SDK never returns null on 2xx.
TimothyW553 force-pushed the ucdelta-getcommits-pr6788 branch from 36aee29 to 473ef06 on May 19, 2026 05:13
  response = deltaTablesApi.loadTable(catalog, schema, table);
} catch (ApiException e) {
  if (e.getCode() == HTTP_NOT_FOUND) {
    throw new InvalidTargetTableException(
Collaborator

Oh, I think one legacy issue is that UCTokenBasedRestClient already uses InvalidTargetTableException today.

@yili-db, could you please help clarify the difference between InvalidTargetTableException and your newly introduced NoSuchTableException?

Collaborator Author

I'll go with 404 → NoSuchTableException and wrong-target → InvalidTargetTableException. Let me know if you want it differently.
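A sketch of that split. The two exception classes below are local stand-ins that only borrow the PR's names; the real types live in the Delta storage module and #6811, and nothing about their constructors or checked-ness is assumed here. Treating non-404 failures as a generic error is also an assumption of this sketch.

```java
// Hypothetical sketch: 404 means "no table by that name", a UUID mismatch means
// "the name resolved, but to the wrong table".
final class ErrorMappingSketch {
  static final int HTTP_NOT_FOUND = 404;

  /** Stand-in: the name does not resolve to any table. */
  static final class NoSuchTableException extends RuntimeException {
    NoSuchTableException(String message) {
      super(message);
    }
  }

  /** Stand-in: the name resolves, but to a different table than the caller expected. */
  static final class InvalidTargetTableException extends RuntimeException {
    InvalidTargetTableException(String message) {
      super(message);
    }
  }

  private ErrorMappingSketch() {}

  static RuntimeException mapLoadTableError(int httpCode, String fullName, String responseBody) {
    if (httpCode == HTTP_NOT_FOUND) {
      // Keep the qualified name and the server body in the message for debuggability.
      return new NoSuchTableException("Table not found: " + fullName + " (" + responseBody + ")");
    }
    // Assumption: any other failure surfaces as a generic error in this sketch.
    return new IllegalStateException("loadTable(" + fullName + ") failed with HTTP " + httpCode);
  }

  static void requireSameTable(String fullName, String expectedTableId, String actualTableUuid) {
    // A null actualTableUuid (missing metadata) also counts as a mismatch.
    if (!expectedTableId.equals(actualTableUuid)) {
      throw new InvalidTargetTableException(
          fullName + " resolved to table " + actualTableUuid + ", expected " + expectedTableId);
    }
  }
}
```

Keeping the two cases distinct lets a caller tell "the table is gone" apart from "a different table now lives under this name", which is the drop/re-create case discussed further down.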

- Extract resolveThreePartName helper used by loadTable, commit, and
  getCommits, replacing three near-identical inline parses of
  TableIdentifier with one source of truth (per openinx review).
- Change getCommits 404 from InvalidTargetTableException to
  NoSuchTableException, matching loadTable and the typed exception
  introduced in delta-io#6811.
- Update the 404 test to mirror loadTable's NoSuchTableException test
  (asserts the qualified table name and the response body are in the
  error message).

TableMetadata metadata = response.getMetadata();
UUID actualTableUuid = metadata != null ? metadata.getTableUuid() : null;
if (!expectedTableUuid.equals(actualTableUuid)) {
Collaborator

So this will happen in the following case:

  • t1: create a table cat.db.table, with tableId = 111.
  • t2: drop the table.
  • t3: build the getCommits(cat.db.table, tableId=111) request.
  • t4: re-create the table, tableId=222.
  • t5: send the getCommits(cat.db.table, tableId=111).
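The timeline above, replayed against a hypothetical in-memory catalog that maps a full table name to its current table id. The request built at t3 still carries tableId=111, so at t5 the name resolves successfully but to a different table, and only the id comparison catches it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory catalog used only to replay the drop/re-create timeline.
final class DropRecreateScenario {
  // Full table name -> current table id.
  private final Map<String, String> tables = new HashMap<>();

  void create(String fullName, String tableId) {
    tables.put(fullName, tableId);
  }

  void drop(String fullName) {
    tables.remove(fullName);
  }

  /** Current id behind a name, or null if the name does not resolve (the 404 case). */
  String resolve(String fullName) {
    return tables.get(fullName);
  }

  /** The check getCommits performs: does the name still point at the requested table? */
  boolean isSameTable(String fullName, String expectedTableId) {
    return expectedTableId.equals(resolve(fullName));
  }

  public static void main(String[] args) {
    DropRecreateScenario catalog = new DropRecreateScenario();
    String name = "cat.db.table";
    catalog.create(name, "111");   // t1: create, tableId = 111
    catalog.drop(name);            // t2: drop
    String requestTableId = "111"; // t3: request built for the old table
    catalog.create(name, "222");   // t4: re-create, tableId = 222
    // t5: the name alone still resolves, but to the new table...
    System.out.println(catalog.resolve(name));                     // prints 222
    // ...so the id check rejects the stale request.
    System.out.println(catalog.isSameTable(name, requestTableId)); // prints false
  }
}
```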

Collaborator Author

yes — this is exactly the case the UUID check protects against.

Objects.requireNonNull(startVersion, "startVersion must not be null");
Objects.requireNonNull(endVersion, "endVersion must not be null");

UUID expectedTableUuid = UUID.fromString(tableId);
Collaborator

Wait, is every tableId a unique string that can be strictly deserialized to a UUID? I think the OSS UC server implies this, but please check the Delta UC spec and see whether it clearly says it's a string that can be deserialized to a UUID.

Otherwise, if a table id is unique but cannot be deserialized to a UUID, that's a serious bug.

Collaborator

I would rather suggest using a String than explicitly deserializing to a UUID instance.

Collaborator

okay, the spec clearly says it's UUID. https://github.com/unitycatalog/unitycatalog/blob/main/api/delta-docs/Models/TableMetadata.md

But I would still suggest using a String.

Collaborator Author

good catch, switched to String.

Collaborator

It's fine to use a String here, but only because the function signature declares a String.
In this case, aString.equals(anotherUUID.toString()) is better than UUID.fromString(...).equals(anotherUUID).
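A small illustration of the trade-off, assuming the actual id is a java.util.UUID as in the metadata.getTableUuid() snippet above. UUID.toString() always emits lowercase hex, so plain string equality only matches a tableId already in that canonical form (which the follow-up commit message says the UC Delta spec guarantees). Round-tripping through UUID.fromString tolerates case differences, but throws IllegalArgumentException for ids that are not UUIDs at all.

```java
import java.util.UUID;

// Hypothetical helper contrasting the two comparison styles discussed above.
final class UuidComparisonSketch {

  /** String comparison: exact and case-sensitive, never throws on odd input. */
  static boolean sameByString(String tableId, UUID actual) {
    return actual != null && tableId.equals(actual.toString());
  }

  /** UUID round-trip: case-insensitive, but rejects anything that is not a UUID. */
  static boolean sameByUuid(String tableId, UUID actual) {
    return UUID.fromString(tableId).equals(actual);
  }

  public static void main(String[] args) {
    UUID actual = UUID.fromString("abcdef00-0000-0000-0000-000000000001");
    String lower = "abcdef00-0000-0000-0000-000000000001";
    String upper = "ABCDEF00-0000-0000-0000-000000000001";
    System.out.println(sameByString(lower, actual)); // prints true
    System.out.println(sameByString(upper, actual)); // prints false: toString() is lowercase
    System.out.println(sameByUuid(upper, actual));   // prints true
    try {
      sameByUuid("table-111", actual);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: not a UUID");
    }
  }
}
```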

TimothyW553 force-pushed the ucdelta-getcommits-pr6788 branch from 57ee576 to 9b9b931 on May 19, 2026 05:50
Comment on lines +353 to +355
Objects.requireNonNull(
deltaCommit.getFileModificationTimestamp(),
"commit fileModificationTimestamp must not be null"),
Collaborator

Is it possible for getFileModificationTimestamp to return null? The annotation already says it is not nullable:

@jakarta.annotation.Nonnull
@JsonProperty(JSON_PROPERTY_FILE_MODIFICATION_TIMESTAMP)
@JsonInclude(value = JsonInclude.Include.ALWAYS)
public Long getFileModificationTimestamp() {
  return fileModificationTimestamp;
}

Collaborator Author

yes, it's @Nonnull in the SDK — removed the requireNonNull.

deltaCommit.getFileModificationTimestamp(),
"commit fileModificationTimestamp must not be null"),
new Path(basePath, Objects.requireNonNull(
deltaCommit.getFileName(), "commit fileName must not be null")));
Collaborator

Same comment for getFileName.

Collaborator Author

same here — removed the requireNonNull on getFileName too.

Comment on lines +359 to +361
Objects.requireNonNull(deltaCommit.getVersion(), "commit version must not be null"),
fileStatus,
Objects.requireNonNull(deltaCommit.getTimestamp(), "commit timestamp must not be null"));
Collaborator

Those null checks may not be necessary.

Collaborator Author

you're right, removed all the requireNonNull from fromDeltaCommit.
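A sketch of the resulting mapping with every requireNonNull dropped, following the Commit(version, fileStatus, timestamp) argument order visible in the diff above. DeltaCommitView, FileStatusView and CommitView are hypothetical stand-ins: only the four getters discussed in this thread are modeled, and the path is joined as a plain string instead of a Hadoop Path.

```java
// Hypothetical stand-ins for the SDK DeltaCommit, Hadoop FileStatus and storage Commit.
final class FromDeltaCommitSketch {

  /** Stand-in for the SDK DeltaCommit: only the getters discussed in this thread. */
  record DeltaCommitView(
      long version, long timestamp, String fileName, long fileModificationTimestamp) {}

  /** Stand-in for a Hadoop FileStatus, reduced to path and modification time. */
  record FileStatusView(String path, long modificationTime) {}

  /** Stand-in for the storage Commit(version, fileStatus, commitTimestamp). */
  record CommitView(long version, FileStatusView fileStatus, long commitTimestamp) {}

  private FromDeltaCommitSketch() {}

  /** Plain field-by-field mapping; no requireNonNull, since the SDK getters are @Nonnull. */
  static CommitView fromDeltaCommit(DeltaCommitView deltaCommit, String basePath) {
    // Hypothetical: "/" join in place of new Path(basePath, fileName).
    FileStatusView fileStatus = new FileStatusView(
        basePath + "/" + deltaCommit.fileName(), deltaCommit.fileModificationTimestamp());
    return new CommitView(deltaCommit.version(), fileStatus, deltaCommit.timestamp());
  }
}
```

With the SDK's boxed Long getters, a server that violated @Nonnull would likely surface as a NullPointerException on unboxing rather than a descriptive message; the thread treats the annotation as sufficient.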

- Rename resolveThreePartName to requireThreePartName: the method
  validates and unpacks, it does not resolve. requireThreePartName
  matches the requireNonNull idiom.
- Drop the per-field Objects.requireNonNull storm inside fromDeltaCommit
  and from the outer loop's version unbox. The SDK marks every
  DeltaCommit getter @Nonnull; matching the sibling fromDeltaCommitInfo
  (which trusts the SDK) keeps the two helpers symmetric.
- Import java.util.Arrays instead of fully-qualifying inside
  requireThreePartName.
- Move scala.jdk.CollectionConverters._ into its own scala.* import
  group between java.* and the third-party block (scalafmt order).
- Extend the getCommits null-parameter test to cover startVersion and
  endVersion as well, matching the five requireNonNull calls in the
  method body.
- Assert message contents on the getCommits UUID-mismatch test
  (qualified table name plus both UUIDs), matching the assertion shape
  used by the loadTable 404 test.
Per openinx r3263890890: prefer plain String comparison over
round-tripping tableId through UUID.fromString. The UC delta spec
canonicalizes the UUID form so both sides produce the same string.
Drops the upfront UUID.fromString validation step and keeps the
mismatch error message in terms of the strings the caller passed in.
TimothyW553 marked this pull request as ready for review on May 19, 2026 21:10
// Inner Classes
// ===========================

/** A Unity Catalog three-part table name resolved from a {@link TableIdentifier}. */
Collaborator Author

this is intentionally kept inner; it's a private impl detail.

Collaborator

openinx left a comment

Generally looks good to me, no blocker comments.

ensureOpen();
Objects.requireNonNull(tableId, "tableId must not be null");
Objects.requireNonNull(tableIdentifier, "tableIdentifier must not be null");
ResolvedTableName name = requireThreePartName(tableIdentifier);
Collaborator

nit: name -> resolvedTable.

Objects.requireNonNull(startVersion, "startVersion must not be null");
Objects.requireNonNull(endVersion, "endVersion must not be null");

ResolvedTableName name = requireThreePartName(tableIdentifier);
Collaborator

ditto.

}

long latestTableVersion = response.getLatestTableVersion() != null
? response.getLatestTableVersion() : -1L;
Collaborator

Will this be the case if the table does not have even a single version?
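For reference, the ternary under discussion maps a missing latestTableVersion to -1. A hypothetical sketch of that fallback and of one way a caller might read the sentinel; whether the server actually omits the field for a table with no versions is exactly the open question here, and the sketch does not answer it.

```java
// Hypothetical sketch of the -1 fallback for a missing latestTableVersion.
final class LatestVersionSketch {

  /** Same fallback as the reviewed code: a null latestTableVersion becomes -1. */
  static long latestOrSentinel(Long latestTableVersion) {
    return latestTableVersion != null ? latestTableVersion : -1L;
  }

  /** Hypothetical reading of the sentinel: -1 means no version is known yet. */
  static boolean hasAnyVersion(long latestTableVersion) {
    return latestTableVersion >= 0;
  }

  public static void main(String[] args) {
    System.out.println(latestOrSentinel(null));                // prints -1
    System.out.println(latestOrSentinel(7L));                  // prints 7
    System.out.println(hasAnyVersion(latestOrSentinel(null))); // prints false
  }
}
```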

openinx merged commit d6ce0e6 into delta-io:master on May 20, 2026
32 checks passed
3 participants