
[Storage] Extend UCDeltaClient with table-loading ops + exceptions #6811

Merged
openinx merged 3 commits into delta-io:master from yili-db:stack/UCDeltaTokenBasedRestClient_load on May 19, 2026.


@yili-db (Collaborator) commented May 18, 2026

🥞 Stacked PR

Use this link to review incremental changes.


Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (Storage)

Description

[Storage] Extend UCDeltaClient with table-loading ops + exceptions

  • loadTable / createStagingTable / createTable now take TableIdentifier; createTable takes AbstractMetadata + AbstractProtocol (mirrors commit()).
  • TableInfo realigned with StagingTableInfo (tableId field + ordering).
  • New typed exceptions: CredentialFetchFailedException, NoSuchTableException, UnsupportedTableFormatException.
  • build.sbt: conditional unitycatalog-hadoop dep gated on UC version >= 0.5.0 via a small isAtLeastVersion helper.
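The isAtLeastVersion helper lives in build.sbt (Scala), and its body isn't shown on this page. As a rough illustration of what a numeric version gate like "UC version >= 0.5.0" has to get right (in particular, comparing "0.10.0" numerically rather than lexically), here is a Java sketch. The class name, the dropping of qualifiers such as "-SNAPSHOT", and the zero-padding of missing components are all assumptions, not the PR's code.

```java
import java.util.Arrays;

/** Hypothetical Java stand-in for the build.sbt isAtLeastVersion helper. */
final class VersionGate {
  private VersionGate() {}

  /** Parses "0.5.0-SNAPSHOT" into [0, 5, 0], dropping any qualifier after the first '-'. */
  static int[] parse(String version) {
    String core = version.split("-", 2)[0];
    return Arrays.stream(core.split("\\.")).mapToInt(Integer::parseInt).toArray();
  }

  /** True if {@code version} >= {@code minimum}, comparing components numerically. */
  static boolean isAtLeastVersion(String version, String minimum) {
    int[] v = parse(version);
    int[] m = parse(minimum);
    int n = Math.max(v.length, m.length);
    for (int i = 0; i < n; i++) {
      int a = i < v.length ? v[i] : 0; // missing components count as 0 ("1.0" == "1.0.0")
      int b = i < m.length ? m[i] : 0;
      if (a != b) {
        return a > b;
      }
    }
    return true; // every component is equal
  }
}
```

Note that this sketch treats "0.5.0-SNAPSHOT" as satisfying ">= 0.5.0". That is a deliberate simplification so that snapshot builds of the gated release pick up the dependency; a stricter gate would rank snapshots below the release.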

How was this patch tested?

Added tests

Does this PR introduce any user-facing changes?

No

Comment on lines +37 to +39
public TableInfo getTableInfoWithoutCredentials() {
return tableInfoWithoutCredentials;
}
Collaborator

Is this getTableInfoWithoutCredentials still needed?

Comment on lines +27 to +29
public UnsupportedTableFormatException(String message) {
super(message);
}
Collaborator

Is this still used?

Collaborator Author (@yili-db, May 18, 2026)

Pretty much everything in this PR is unused until #6796
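The superclass of the new exceptions isn't visible in this thread. Because loadTable is declared `throws IOException`, this sketch assumes they extend IOException so existing signatures still cover them. The 404 → NoSuchTableException mapping matches the follow-up commit referenced at the end of this page, and the error message includes the qualified name and response body as that commit describes. The UCErrorMapper helper and its method name are hypothetical.

```java
import java.io.IOException;

// Assumption: typed exceptions extend IOException so `throws IOException` still covers them.
class NoSuchTableException extends IOException {
  NoSuchTableException(String message) {
    super(message);
  }
}

/** Hypothetical helper: turns a non-2xx loadTable response into an exception to throw. */
final class UCErrorMapper {
  private UCErrorMapper() {}

  static IOException forLoadTable(String qualifiedName, int status, String body) {
    if (status == 404) {
      // Typed exception lets callers distinguish "table missing" from transport failures.
      return new NoSuchTableException("Table not found: " + qualifiedName + ". Response: " + body);
    }
    return new IOException(
        "loadTable(" + qualifiedName + ") failed with HTTP " + status + ". Response: " + body);
  }
}
```

Callers can then `catch (NoSuchTableException e)` for the missing-table case and let every other IOException propagate.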

* @throws IOException on network or API errors
*/
- AbstractMetadata loadTable(String catalog, String schema, String table) throws IOException;
+ TableInfo loadTable(TableIdentifier tableIdentifier) throws IOException;
Collaborator

For the parameters, should we use catalog, schema, and table directly rather than TableIdentifier, so that we stay aligned with all the existing public method definitions in this interface?

Collaborator Author

This is to align with:

  @Override
  public void commit(
      String tableId,
      URI tableUri,
      TableIdentifier tableIdentifier,

/** Result of {@link UCDeltaClient#loadTable}. */
public static final class TableInfo {

private final String tableId;
Collaborator

If this is the table UUID, I suggest renaming it to tableUuid. I just reviewed @TimothyW553's PR and found that Kernel actually uses tableId to represent the table identifier; see #6788 (comment).

Collaborator Author

You mean Kernel is naming a TableIdentifier tableId? Kernel shouldn't do that, then. tableId refers to the UUID everywhere else. The similarity between the names TableIdentifier and tableId is sometimes confusing, but tableId must be that UUID.
To make it easier to understand and harder to confuse, I am changing it to a UUID object instead of a String.
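Typing tableId as java.util.UUID means a malformed value fails when the response is parsed, instead of travelling onward as an arbitrary string. This sketch shows only tableId and location; the real TableInfo has more fields. The class name and the fromResponse factory are hypothetical.

```java
import java.util.Objects;
import java.util.UUID;

/** Sketch of a TableInfo whose tableId is the UC table UUID, typed as UUID instead of String. */
final class TableInfoSketch {
  private final UUID tableId;
  private final String location;

  TableInfoSketch(UUID tableId, String location) {
    this.tableId = Objects.requireNonNull(tableId, "tableId");
    this.location = Objects.requireNonNull(location, "location");
  }

  /** Hypothetical factory: UUID.fromString throws IllegalArgumentException on malformed input. */
  static TableInfoSketch fromResponse(String tableUuid, String location) {
    return new TableInfoSketch(UUID.fromString(tableUuid), location);
  }

  UUID getTableId() {
    return tableId;
  }

  String getLocation() {
    return location;
  }
}
```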

this.credentialRenewalEnabled = credentialRenewalEnabled;
this.credentialScopedFsEnabled = credentialScopedFsEnabled;
this.hadoopConfSupplier =
hadoopConfSupplier != null ? hadoopConfSupplier : () -> new Configuration();
Collaborator

nit: we can use Configuration::new directly here.
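As a `Supplier<Configuration>`, `() -> new Configuration()` and `Configuration::new` are interchangeable, and both create a fresh instance on each `get()`. Hadoop's Configuration isn't on the classpath here, so this self-contained illustration uses a stand-in FakeConfiguration class. The orDefault helper is hypothetical.

```java
import java.util.function.Supplier;

/** Stand-in for org.apache.hadoop.conf.Configuration, only to keep the sketch self-contained. */
class FakeConfiguration {}

final class ConfSupplierDefaults {
  private ConfSupplierDefaults() {}

  /** Falls back to a fresh-instance supplier when the caller passes null. */
  static Supplier<FakeConfiguration> orDefault(Supplier<FakeConfiguration> supplier) {
    // Same behavior as `() -> new FakeConfiguration()`; the constructor reference is shorter.
    return supplier != null ? supplier : FakeConfiguration::new;
  }
}
```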

// UC's loadTable response carries the UC table_uuid (exposed via TableInfo.getTableId),
// not the Delta Metadata.id. The Delta id only lives in the Delta log Metadata action and
// is never sent to UC; callers that need it must read the log.
return null;
Collaborator

Will it be a problem if metadata.id is null?

Collaborator Author

It's still better than returning the UC table ID as metadata.id. Here the client really doesn't know metadata.id.

// come out as bare strings (Delta's wire format), e.g. "integer" rather than
// {"type":"integer"}. The result is parseable by Delta's schema readers
// (e.g. DataType.fromJson on the Spark side).
return DELTA_SCHEMA_MAPPER.writeValueAsString(m.getColumns());
Collaborator

Do we have any test coverage for these changes?

Collaborator Author

Yes. Just look for getSchemaString in this PR.
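The code comment in this thread says the schema string emits primitives as bare strings ("integer" rather than {"type":"integer"}). Nested types use camelCase keys. This dependency-free sketch shows that target shape for arrays and maps, following the Spark/Delta schema JSON layout. The real converter gets there through Jackson and mixins rather than string building, and the small type model below is hypothetical. It uses records and instanceof patterns, so it needs Java 16+.

```java
/** Hypothetical minimal type model, just enough to show Delta's schema JSON shape. */
interface DType {}

record Prim(String name) implements DType {}

record Arr(DType element, boolean containsNull) implements DType {}

record MapT(DType key, DType value, boolean valueContainsNull) implements DType {}

final class DeltaWireJson {
  private DeltaWireJson() {}

  /** Serializes a type; assumes type names are JSON-safe, so no string escaping is done. */
  static String toJson(DType t) {
    if (t instanceof Prim p) {
      return "\"" + p.name() + "\""; // primitives are bare strings, e.g. "integer"
    }
    if (t instanceof Arr a) {
      return "{\"type\":\"array\",\"elementType\":" + toJson(a.element())
          + ",\"containsNull\":" + a.containsNull() + "}";
    }
    if (t instanceof MapT m) {
      return "{\"type\":\"map\",\"keyType\":" + toJson(m.key())
          + ",\"valueType\":" + toJson(m.value())
          + ",\"valueContainsNull\":" + m.valueContainsNull() + "}";
    }
    throw new IllegalArgumentException("unsupported type: " + t);
  }
}
```

For example, `toJson(new Arr(new Prim("string"), true))` yields `{"type":"array","elementType":"string","containsNull":true}`.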

@openinx (Collaborator) left a comment

Generally looks good to me; thanks @yili-db for the work. I left a few minor comments, and I'm fine with addressing them in a follow-up PR.

@yili-db force-pushed the stack/UCDeltaTokenBasedRestClient_load branch 2 times, most recently from 8d98f06 to f4f7d88 on May 18, 2026 23:15
Comment thread build.sbt
if (useDefaultUnityCatalogReleaseVersion) defaultUnityCatalogReleaseVersion
else unityCatalogReleaseVersion.getOrElse(pinnedUnityCatalogVersion))

/**
Contributor

Can we move this into the file that defines unityCatalogReleaseVersion?

Collaborator Author

unityCatalogReleaseVersion is defined in this file.

Contributor

Ah, I thought we had made a separate utility file. I think we should do that and move all this crud out of this master file, but that can happen in a later PR.

"BOOLEAN", "BYTE", "SHORT", "INT", "LONG", "FLOAT", "DOUBLE",
"DATE", "TIMESTAMP", "TIMESTAMP_NTZ", "STRING", "BINARY", "DECIMAL");

/** Emits Delta's schema JSON wire format: bare-string primitives + camelCase field names. */
Contributor

can you add docs to explain how this works?

Collaborator Author

done.

ObjectMapper m = JSON.getDefault().getMapper().copy();
m.registerModule(new DeltaTypeModule());
m.addMixIn(ArrayType.class, CamelCaseArrayMixin.class);
m.addMixIn(MapType.class, CamelCaseMapMixin.class);
Contributor

please comment to explain what this is doing.

Collaborator Author

comment added
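The two addMixIn calls exist to control the property key names of nested array and map types. This illustration assumes the UC SDK's default serialization uses kebab-case keys such as `contains-null`; the converter suite's check that no kebab-case key leaks into the output suggests this. Under that assumption, the net effect on key names is the rename below. The code only illustrates the mapping; it is not how Jackson applies mixins.

```java
final class KeyCase {
  private KeyCase() {}

  /** "contains-null" -> "containsNull", "value-contains-null" -> "valueContainsNull". */
  static String kebabToCamel(String key) {
    StringBuilder out = new StringBuilder(key.length());
    boolean upperNext = false;
    for (char c : key.toCharArray()) {
      if (c == '-') {
        upperNext = true; // drop the hyphen and capitalize the next character
      } else {
        out.append(upperNext ? Character.toUpperCase(c) : c);
        upperNext = false;
      }
    }
    return out.toString();
  }
}
```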

return m;
}

abstract static class CamelCaseArrayMixin {
Contributor

Please add class docs explaining what this is used for.

And why is this abstract?

Shouldn't this be private? Does anyone outside this class need it?

Collaborator Author

Made it private and added docs.

   * <p>The class is {@code abstract} and the methods abstract because Jackson never
   * instantiates the mixin: it only inspects annotated method signatures and projects the
   * annotations onto the target class. Making the class abstract makes that contract
   * explicit and avoids a no-op constructor.

abstract void setContainsNull(Boolean v);
}
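The Javadoc's point, that the mixin is never instantiated and only its annotated signatures are read, relies on a plain JVM property: annotations on an abstract class's abstract methods can be read by reflection without creating an instance. This sketch demonstrates just that property. `@WireName` is a made-up annotation standing in for Jackson's `@JsonProperty` / `@JsonSetter`; this is not Jackson's mixin machinery.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for Jackson's property-naming annotations. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface WireName {
  String value();
}

/** Never instantiated: only its method signatures and their annotations are inspected. */
abstract class ArrayMixinSketch {
  @WireName("containsNull")
  abstract Boolean getContainsNull();

  @WireName("elementType")
  abstract Object getElementType();
}

final class MixinReader {
  private MixinReader() {}

  /** Maps method name -> wire name for each @WireName-annotated method declared on {@code mixin}. */
  static Map<String, String> wireNames(Class<?> mixin) {
    Map<String, String> names = new TreeMap<>();
    for (Method m : mixin.getDeclaredMethods()) {
      WireName w = m.getAnnotation(WireName.class);
      if (w != null) {
        names.put(m.getName(), w.value());
      }
    }
    return names;
  }
}
```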

abstract static class CamelCaseMapMixin {
Contributor

I wonder if all the schema conversion logic should be a utility class of its own, thoroughly tested. Otherwise its tests will end up spread across multiple endpoints, with gaps.

Contributor

Not a blocker for this PR, but please improve this in a follow-up PR.

Collaborator Author

Moved to storage/src/main/java/io/delta/storage/commit/uccommitcoordinator/DeltaSchemaConverter.java

Contributor

no testsuite for this?

Collaborator Author

Added testsuite.

- val m = c.loadTable(testCatalog, testSchema, testTable)
+ val info = c.loadTable(testIdentifier)
assert(info.getLocation === "s3://bucket/table")
assert(info.getTableId === testTableId)
Contributor

[AI Assisted] This assertion (and the matching one at line 556 for createStagingTable) will fail at runtime now that TableInfo.getTableId / StagingTableInfo.getTableId return java.util.UUID instead of String.

testTableId is still declared as a String ("550e8400-e29b-41d4-a716-446655440000"). ScalaTest's === uses Equality[A].areEqual, which falls back to a.equals(b) — and UUID.equals(String) == false. The expression typechecks (because === is Any-shaped) but the assertion fails on a real run.

Suggested fix:

private val testTableIdStr = "550e8400-e29b-41d4-a716-446655440000"
private val testTableId = java.util.UUID.fromString(testTableIdStr)

and use testTableIdStr at JSON sites ("table-uuid":"$testTableIdStr", "table-id":"$testTableIdStr", uuid assertions on the captured commit body) where a string is genuinely needed; keep testTableId (now a UUID) for the info.getTableId === ... comparisons.

Worth double-checking by running this suite locally — if the assertions are actually green for you, I'd like to understand why, because by my reading they shouldn't be.

Collaborator Author

Changed testTableId to UUID.
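The failure mode in the review comment above is easy to reproduce in plain Java. Per that comment, ScalaTest's default `===` falls back to `equals`, and a `UUID` never equals a `String`, even one with the identical textual form. The class and method names below are illustrative only.

```java
import java.util.UUID;

final class UuidEqualityDemo {
  private UuidEqualityDemo() {}

  static final String TEST_TABLE_ID_STR = "550e8400-e29b-41d4-a716-446655440000";
  static final UUID TEST_TABLE_ID = UUID.fromString(TEST_TABLE_ID_STR);

  /** Mirrors the buggy assertion: a UUID compared to its String form via equals. */
  static boolean uuidEqualsString() {
    Object fromApi = TEST_TABLE_ID; // what getTableId returns after the change
    Object expected = TEST_TABLE_ID_STR; // what the test still compared against
    return fromApi.equals(expected); // always false: different types
  }

  /** Mirrors the fix: compare UUID to UUID (or String to String). */
  static boolean uuidEqualsUuid() {
    return TEST_TABLE_ID.equals(UUID.fromString(TEST_TABLE_ID_STR));
  }
}
```

Because `===` accepts `Any` on both sides, the mismatch compiles cleanly, which is why it only surfaces when the suite actually runs.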

@yili-db requested review from openinx and tdas on May 19, 2026 00:45
* {@link StructType}. Only primitive types are supported today; complex types throw
* {@link UnsupportedOperationException}.
*/
static StructType toSDKStructType(List<UCClient.ColumnDef> columns) {
Contributor

Can you rename SDK*** to UC***, e.g. SDKStructType to UCStructType?
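The Javadoc above says the create-table path accepts only primitive types and throws UnsupportedOperationException for complex ones. Here is a sketch of that contract, using the type-name set from the PRIMITIVE_TYPE_NAMES constant shown in this diff. Representing columns as plain type-name strings and the method name requireAllPrimitive are assumptions; the real method takes `List<UCClient.ColumnDef>`.

```java
import java.util.List;
import java.util.Set;

final class PrimitiveOnlyCheck {
  private PrimitiveOnlyCheck() {}

  /** The primitive type names listed in the diff's PRIMITIVE_TYPE_NAMES set. */
  static final Set<String> PRIMITIVE_TYPE_NAMES = Set.of(
      "BOOLEAN", "BYTE", "SHORT", "INT", "LONG", "FLOAT", "DOUBLE",
      "DATE", "TIMESTAMP", "TIMESTAMP_NTZ", "STRING", "BINARY", "DECIMAL");

  /** Throws UnsupportedOperationException for any complex or unknown type name. */
  static void requireAllPrimitive(List<String> typeNames) {
    for (String name : typeNames) {
      if (!PRIMITIVE_TYPE_NAMES.contains(name)) { // exact, case-sensitive match
        throw new UnsupportedOperationException("Unsupported column type: " + name);
      }
    }
  }
}
```

The exact-match lookup also pins case sensitivity, so "int" or "Integer" is rejected just like "ARRAY".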

* </ul>
* The resulting JSON is parseable by Delta's schema readers (e.g. {@code DataType.fromJson}).
*/
final class DeltaSchemaConverter {
Contributor

can we name this UCDeltaSchemaConverter because this is specific to UC types.

can we move all the UC-specific classes to a package of its own?

Collaborator Author (@yili-db, May 19, 2026)

Renamed. This package is uccommitcoordinator, so it is already UC-specific.
It's a little odd that UCDeltaClient as a whole is no longer limited to commit coordination, but that's a bigger refactor for later.

Contributor

sounds good.

* Singleton mapper preconfigured to emit Delta's wire format
* (bare-string primitives, camelCase keys for nested types). See class-level docs.
*/
static final ObjectMapper DELTA_SCHEMA_MAPPER = createDeltaSchemaMapper();
Contributor

does this need to be public?

Collaborator Author

made private

* Primitive type names that the legacy create-table path ({@link #toSDKStructType}) accepts.
* Visible to the package for tests; not part of the converter's public surface.
*/
static final Set<String> PRIMITIVE_TYPE_NAMES = Set.of(
Contributor

Do all of these methods and fields need to be public?

Collaborator Author

made private

abstract Boolean getValueContainsNull();
@JsonSetter("valueContainsNull")
abstract void setValueContainsNull(Boolean v);
}
Contributor (@tdas, May 19, 2026)

no bidirectional test suite for this?

Collaborator Author

Added testsuite UCDeltaSchemaConverterSuite.scala

@yili-db requested a review from tdas on May 19, 2026 02:35
yili-db added 2 commits May 18, 2026 20:20
- loadTable / createStagingTable / createTable now take TableIdentifier; createTable takes AbstractMetadata + AbstractProtocol (mirrors commit()).
- TableInfo realigned with StagingTableInfo (tableId field + ordering).
- New typed exceptions: CredentialFetchFailedException, NoSuchTableException, UnsupportedTableFormatException.
- build.sbt: conditional unitycatalog-hadoop dep gated on UC version >= 0.5.0 via a small isAtLeastVersion helper.

Signed-off-by: Yi Li <yi.li@databricks.com>
@yili-db force-pushed the stack/UCDeltaTokenBasedRestClient_load branch from 5787f44 to 3e2f64f on May 19, 2026 03:24
kebabKeys.foreach { k =>
assert(!json.contains("\"" + k + "\""), s"unexpected kebab-case key '$k'")
}
}
Contributor

[AI Assisted] Nice converter suite — it nails the wire-format edges (bare-string primitives, camelCase keys, no kebab leakage) and the toUCStructType happy path. A few gaps worth filing as follow-ups (none blocking this PR):

Higher-value follow-ups

  1. Nested struct. Struct<...> as a field's type is the most common Delta shape after primitives, but there is zero coverage. Add at least:

    • Struct<...> directly as a field type (field("nested", innerStruct))
    • Array<Struct<...>>
    • Map<string, Struct<...>> and Map<string, Array<...>> (complex value type, not just complex outer)
    • >2-level nesting (e.g. Array<Array<Map<...>>>)
  2. StructField.metadata round-trip. Delta wire format carries "metadata":{...} per field (e.g. comment, generated). The UC SDK's StructField has a metadata slot. Untested. One test that sets non-empty metadata on a field and confirms the JSON output contains it verbatim. If it silently drops, that's a real bug for any non-trivial Delta table loaded through this client.

  3. DataType.fromJson round-trip. The class doc explicitly claims "parseable by Delta's schema readers (e.g. DataType.fromJson)". Current tests parse with a generic ObjectMapper and check key shapes — that proves the JSON is well-formed, not that Delta's reader accepts it. If delta-storage can't depend on delta-spark directly, add the round-trip test in a delta-spark test suite where org.apache.spark.sql.types.DataType.fromJson is already on the classpath.

Lower-value follow-ups

  1. Strengthen the "every primitive is accepted" smoke test (lines 184-193). It only asserts getFields.size() === 1, so if all 13 primitives collapsed to the same wire string, the test would still pass. Also assert that s.getFields.get(0).getType.asInstanceOf[PrimitiveType].getType matches the expected per-type wire form.

  2. toUCStructType unsupported-name coverage. Only ARRAY is tested for rejection. Parameterize over Seq("MAP", "STRUCT", "VARIANT", "int", "Integer", "") to pin case sensitivity and the rejection contract.

  3. containsNull == null / valueContainsNull == null. The Jackson setters allow null. One test that pins whether the output omits the key or emits "containsNull":null — Delta's reader rejects the latter, so this matters.

  4. All 13 primitives in nested contexts. Today only ~5 (string, integer, double, long, date) appear inside arrays/maps. Easy to widen the existing nested tests to cover the rest.

Happy to file these as a tracking issue if useful.
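For the first lower-value follow-up, a table-driven expectation catches the "all primitives collapse to one wire string" failure that a field-count check misses. This sketch assumes each UC primitive name should map to Spark's JSON type name (for example, INT to "integer"). DECIMAL is left out because its wire form depends on precision and scale, such as decimal(10,2). The converter under test is passed in as a function, so the checker itself makes no assumption about the real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class PrimitiveWireNames {
  private PrimitiveWireNames() {}

  /** Assumed expected wire name per UC primitive (DECIMAL omitted: needs precision/scale). */
  static final Map<String, String> EXPECTED = Map.ofEntries(
      Map.entry("BOOLEAN", "boolean"),
      Map.entry("BYTE", "byte"),
      Map.entry("SHORT", "short"),
      Map.entry("INT", "integer"),
      Map.entry("LONG", "long"),
      Map.entry("FLOAT", "float"),
      Map.entry("DOUBLE", "double"),
      Map.entry("DATE", "date"),
      Map.entry("TIMESTAMP", "timestamp"),
      Map.entry("TIMESTAMP_NTZ", "timestamp_ntz"),
      Map.entry("STRING", "string"),
      Map.entry("BINARY", "binary"));

  /** Returns one message per primitive whose converted wire name differs from the expectation. */
  static List<String> mismatches(Function<String, String> converter) {
    List<String> bad = new ArrayList<>();
    for (Map.Entry<String, String> e : EXPECTED.entrySet()) {
      String actual = converter.apply(e.getKey());
      if (!e.getValue().equals(actual)) {
        bad.add(e.getKey() + ": expected " + e.getValue() + ", got " + actual);
      }
    }
    return bad;
  }
}
```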

@tdas (Contributor) left a comment

[AI Assisted] LGTM. Blocking issue from the earlier round (UUID vs String test assertions) is fixed, and the schema-conversion code has been promoted into its own UCDeltaSchemaConverter with a dedicated UCDeltaSchemaConverterSuite that solidly covers primitives (bare-string + decimal), arrays, maps, 2-level nesting, and the no-kebab-leakage invariant.

Left a single inline comment with follow-up test gaps (nested struct, StructField.metadata, DataType.fromJson round-trip, plus a few smaller items) — none blocking this PR.

Other open items from earlier rounds that are also fine as follow-ups:

  • PR description still says createTable takes AbstractMetadata + AbstractProtocol and createStagingTable takes a TableIdentifier, but the interfaces still have the older signatures. Either update the description or the interfaces in the follow-up.
  • UnsupportedTableFormatException detection via substring-match on the response body is brittle; ideally parse the structured error envelope.
  • fetchTableCredentials catches IllegalArgumentException to detect "no creds for this scheme" — prefer a positive scheme check.
  • Untested production paths: CredentialFetchFailedException (cred-fetch retries exhausted), loadTable namespace validation (non-2-component TableIdentifier), and the 9-param UCDeltaClient.createTable.

Thanks for the responsiveness on the addressed comments.
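For the "positive scheme check" item, here is a sketch. It decides up front whether a table location's URI scheme is one that credentials are vended for, instead of catching IllegalArgumentException after the fact. The set of credentialed schemes is an assumption for illustration, not UC's actual list, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

final class CredentialSchemes {
  private CredentialSchemes() {}

  /** Assumed set of cloud-storage schemes that receive temporary credentials. */
  static final Set<String> CREDENTIALED_SCHEMES = Set.of("s3", "s3a", "gs", "abfs", "abfss");

  /** True if credentials should be fetched for {@code location}; false for file:// or bare paths. */
  static boolean needsCredentials(String location) {
    String scheme = URI.create(location).getScheme(); // null for scheme-less paths like "/tmp/t"
    return scheme != null && CREDENTIALED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
  }
}
```

With an explicit check, an IllegalArgumentException thrown elsewhere in the credential path stays a real error instead of being silently read as "no credentials for this scheme".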

@openinx merged commit 972e615 into delta-io:master on May 19, 2026
32 checks passed
TimothyW553 added a commit to TimothyW553/delta that referenced this pull request May 19, 2026
- Extract resolveThreePartName helper used by loadTable, commit, and
  getCommits, replacing three near-identical inline parses of
  TableIdentifier with one source of truth (per openinx review).
- Change getCommits 404 from InvalidTargetTableException to
  NoSuchTableException, matching loadTable and the typed exception
  introduced in delta-io#6811.
- Update the 404 test to mirror loadTable's NoSuchTableException test
  (asserts the qualified table name and the response body are in the
  error message).