[Storage] Extend UCDeltaClient with table-loading ops + exceptions#6811
Conversation
| public TableInfo getTableInfoWithoutCredentials() { | ||
| return tableInfoWithoutCredentials; | ||
| } |
There was a problem hiding this comment.
Are this getTableInfoWithoutCredentials still needed ?
There was a problem hiding this comment.
| public UnsupportedTableFormatException(String message) { | ||
| super(message); | ||
| } |
There was a problem hiding this comment.
Pretty much everything in this PR is unused until #6796
| * @throws IOException on network or API errors | ||
| */ | ||
| AbstractMetadata loadTable(String catalog, String schema, String table) throws IOException; | ||
| TableInfo loadTable(TableIdentifier tableIdentifier) throws IOException; |
There was a problem hiding this comment.
for the parameters, should we use the catalog, schema, table directly, rather than the TableIdentifier, so that we can keep align with all the existing public methods definition in this interface ?
There was a problem hiding this comment.
This is to align with:
@Override
public void commit(
String tableId,
URI tableUri,
TableIdentifier tableIdentifier,
| /** Result of {@link UCDeltaClient#loadTable}. */ | ||
| public static final class TableInfo { | ||
|
|
||
| private final String tableId; |
There was a problem hiding this comment.
If this is the tableUuid, then I suggest to rename it as tableUuid, since I just reviewed @TimothyW553 's PR, and found that the kernal actually use the tableId to represent the table identifier. see here: #6788 (comment)
There was a problem hiding this comment.
You mean Kernel is writing TableIdentifier tableId? Kernel shouldn't do that then. tableId is referring to a UUID universally everywhere else. The co-incidence of TableIdentifier being similar to tableId is making it sometimes confusing. But tableId must be that UUID.
To make it easier to understand and harder to confuse, I am changing it to a UUID object instead of String.
| this.credentialRenewalEnabled = credentialRenewalEnabled; | ||
| this.credentialScopedFsEnabled = credentialScopedFsEnabled; | ||
| this.hadoopConfSupplier = | ||
| hadoopConfSupplier != null ? hadoopConfSupplier : () -> new Configuration(); |
There was a problem hiding this comment.
nit: we can use the Configuration::new direclty here.
| // UC's loadTable response carries the UC table_uuid (exposed via TableInfo.getTableId), | ||
| // not the Delta Metadata.id. The Delta id only lives in the Delta log Metadata action and | ||
| // is never sent to UC; callers that need it must read the log. | ||
| return null; |
There was a problem hiding this comment.
will it have problem if the metadata.id is null ?
There was a problem hiding this comment.
It's still better than returning the UC table id as metadata.id. Here it really doesn't know the metadata.id.
| // come out as bare strings (Delta's wire format), e.g. "integer" rather than | ||
| // {"type":"integer"}. The result is parseable by Delta's schema readers | ||
| // (e.g. DataType.fromJson on the Spark side). | ||
| return DELTA_SCHEMA_MAPPER.writeValueAsString(m.getColumns()); |
There was a problem hiding this comment.
do we have any test coverage for those changes ?
There was a problem hiding this comment.
Yes. Just look for getSchemaString in this PR.
8d98f06 to
f4f7d88
Compare
| if (useDefaultUnityCatalogReleaseVersion) defaultUnityCatalogReleaseVersion | ||
| else unityCatalogReleaseVersion.getOrElse(pinnedUnityCatalogVersion)) | ||
|
|
||
| /** |
There was a problem hiding this comment.
can we move this into the file that defines the unityCatalogReleaseVersion??
There was a problem hiding this comment.
unityCatalogReleaseVersion is defined in this file.
There was a problem hiding this comment.
aah .. i thought we have made a separate utility file. i think we should do that and remove all this crud from this master file... but that can happen in a later pr.
| "BOOLEAN", "BYTE", "SHORT", "INT", "LONG", "FLOAT", "DOUBLE", | ||
| "DATE", "TIMESTAMP", "TIMESTAMP_NTZ", "STRING", "BINARY", "DECIMAL"); | ||
|
|
||
| /** Emits Delta's schema JSON wire format: bare-string primitives + camelCase field names. */ |
There was a problem hiding this comment.
can you add docs to explain how this works?
| ObjectMapper m = JSON.getDefault().getMapper().copy(); | ||
| m.registerModule(new DeltaTypeModule()); | ||
| m.addMixIn(ArrayType.class, CamelCaseArrayMixin.class); | ||
| m.addMixIn(MapType.class, CamelCaseMapMixin.class); |
There was a problem hiding this comment.
please comment to explain what this is doing.
| return m; | ||
| } | ||
|
|
||
| abstract static class CamelCaseArrayMixin { |
There was a problem hiding this comment.
class docs explaining what this is used for
and why is this abstract?
shouldnt this be private? does not any one outside this class need this?
There was a problem hiding this comment.
It is made private and with doc added.
* <p>The class is {@code abstract} and the methods abstract because Jackson never
* instantiates the mixin: it only inspects annotated method signatures and projects the
* annotations onto the target class. Making the class abstract makes that contract
* explicit and avoids a no-op constructor.
| abstract void setContainsNull(Boolean v); | ||
| } | ||
|
|
||
| abstract static class CamelCaseMapMixin { |
There was a problem hiding this comment.
i wonder if all the schema conversion logic should be a utility class of its own and thoroughly tested. otherwise these tests will be shared across multiple endpoints, etc. with gaps and all.
There was a problem hiding this comment.
not a blocker for this Pr.. but please follow up PR to make this better.
There was a problem hiding this comment.
Moved to storage/src/main/java/io/delta/storage/commit/uccommitcoordinator/DeltaSchemaConverter.java
| val m = c.loadTable(testCatalog, testSchema, testTable) | ||
| val info = c.loadTable(testIdentifier) | ||
| assert(info.getLocation === "s3://bucket/table") | ||
| assert(info.getTableId === testTableId) |
There was a problem hiding this comment.
[AI Assisted] This assertion (and the matching one at line 556 for createStagingTable) will fail at runtime now that TableInfo.getTableId / StagingTableInfo.getTableId return java.util.UUID instead of String.
testTableId is still declared as a String ("550e8400-e29b-41d4-a716-446655440000"). ScalaTest's === uses Equality[A].areEqual, which falls back to a.equals(b) — and UUID.equals(String) == false. The expression typechecks (because === is Any-shaped) but the assertion fails on a real run.
Suggested fix:
private val testTableIdStr = "550e8400-e29b-41d4-a716-446655440000"
private val testTableId = java.util.UUID.fromString(testTableIdStr)and use testTableIdStr at JSON sites ("table-uuid":"$testTableIdStr", "table-id":"$testTableIdStr", uuid assertions on the captured commit body) where a string is genuinely needed; keep testTableId (now a UUID) for the info.getTableId === ... comparisons.
Worth double-checking by running this suite locally — if the assertions are actually green for you, I'd like to understand why, because by my reading they shouldn't be.
There was a problem hiding this comment.
Changed testTableId to UUID.
| * {@link StructType}. Only primitive types are supported today; complex types throw | ||
| * {@link UnsupportedOperationException}. | ||
| */ | ||
| static StructType toSDKStructType(List<UCClient.ColumnDef> columns) { |
There was a problem hiding this comment.
Can you replace SDK*** to UC***. SDKStructType to UCStructType.
| * </ul> | ||
| * The resulting JSON is parseable by Delta's schema readers (e.g. {@code DataType.fromJson}). | ||
| */ | ||
| final class DeltaSchemaConverter { |
There was a problem hiding this comment.
can we name this UCDeltaSchemaConverter because this is specific to UC types.
can we move all the UC-specific classes to a package of its own?
There was a problem hiding this comment.
Renamed. This package name is uccommitcoordinator so it is UC-specific.
It's a little odd that this entire UCDeltaClient is no longer limited to commitcoordinator. But that's a bigger refactor for later.
| * Singleton mapper preconfigured to emit Delta's wire format | ||
| * (bare-string primitives, camelCase keys for nested types). See class-level docs. | ||
| */ | ||
| static final ObjectMapper DELTA_SCHEMA_MAPPER = createDeltaSchemaMapper(); |
| * Primitive type names that the legacy create-table path ({@link #toSDKStructType}) accepts. | ||
| * Visible to the package for tests; not part of the converter's public surface. | ||
| */ | ||
| static final Set<String> PRIMITIVE_TYPE_NAMES = Set.of( |
There was a problem hiding this comment.
do all of these methods and fields need to public.
| abstract Boolean getValueContainsNull(); | ||
| @JsonSetter("valueContainsNull") | ||
| abstract void setValueContainsNull(Boolean v); | ||
| } |
There was a problem hiding this comment.
no bidirectional test suite for this?
There was a problem hiding this comment.
Added testsuite UCDeltaSchemaConverterSuite.scala
- loadTable / createStagingTable / createTable now take TableIdentifier; createTable takes AbstractMetadata + AbstractProtocol (mirrors commit()). - TableInfo realigned with StagingTableInfo (tableId field + ordering). - New typed exceptions: CredentialFetchFailedException, NoSuchTableException, UnsupportedTableFormatException. - build.sbt: conditional unitycatalog-hadoop dep gated on UC version >= 0.5.0 via a small isAtLeastVersion helper. Signed-off-by: Yi Li <yi.li@databricks.com>
Signed-off-by: Yi Li <yi.li@databricks.com>
Signed-off-by: Yi Li <yi.li@databricks.com>
5787f44 to
3e2f64f
Compare
| kebabKeys.foreach { k => | ||
| assert(!json.contains("\"" + k + "\""), s"unexpected kebab-case key '$k'") | ||
| } | ||
| } |
There was a problem hiding this comment.
[AI Assisted] Nice converter suite — it nails the wire-format edges (bare-string primitives, camelCase keys, no kebab leakage) and the toUCStructType happy path. A few gaps worth filing as follow-ups (none blocking this PR):
Higher-value follow-ups
-
Nested struct.
Struct<...>as a field's type is the most common Delta shape after primitives, but there is zero coverage. Add at least:Struct<...>directly as a field type (field("nested", innerStruct))Array<Struct<...>>Map<string, Struct<...>>andMap<string, Array<...>>(complex value type, not just complex outer)>2-level nesting (e.g.Array<Array<Map<...>>>)
-
StructField.metadataround-trip. Delta wire format carries"metadata":{...}per field (e.g.comment,generated). The UC SDK'sStructFieldhas a metadata slot. Untested. One test that sets non-empty metadata on a field and confirms the JSON output contains it verbatim. If it silently drops, that's a real bug for any non-trivial Delta table loaded through this client. -
DataType.fromJsonround-trip. The class doc explicitly claims "parseable by Delta's schema readers (e.g.DataType.fromJson)". Current tests parse with a genericObjectMapperand check key shapes — that proves the JSON is well-formed, not that Delta's reader accepts it. Ifdelta-storagecan't depend ondelta-sparkdirectly, add the round-trip test in adelta-sparktest suite whereorg.apache.spark.sql.types.DataType.fromJsonis already on the classpath.
Lower-value follow-ups
-
Strengthen the "every primitive is accepted" smoke test (line 184-193). It only asserts
getFields.size() === 1— if all 13 primitives collapsed to the same wire string the test would still pass. Also asserts.getFields.get(0).getType.asInstanceOf[PrimitiveType].getTypematches the expected per-type wire form. -
toUCStructTypeunsupported-name coverage. OnlyARRAYis tested for rejection. Parameterize overSeq("MAP", "STRUCT", "VARIANT", "int", "Integer", "")to pin case sensitivity and the rejection contract. -
containsNull == null/valueContainsNull == null. The Jackson setters allownull. One test that pins whether the output omits the key or emits"containsNull":null— Delta's reader rejects the latter, so this matters. -
All 13 primitives in nested contexts. Today only ~5 (
string,integer,double,long,date) appear inside arrays/maps. Easy to widen the existing nested tests to cover the rest.
Happy to file these as a tracking issue if useful.
There was a problem hiding this comment.
[AI Assisted] LGTM. Blocking issue from the earlier round (UUID vs String test assertions) is fixed, and the schema-conversion code has been promoted into its own UCDeltaSchemaConverter with a dedicated UCDeltaSchemaConverterSuite that solidly covers primitives (bare-string + decimal), arrays, maps, 2-level nesting, and the no-kebab-leakage invariant.
Left a single inline comment with follow-up test gaps (nested struct, StructField.metadata, DataType.fromJson round-trip, plus a few smaller items) — none blocking this PR.
Other open items from earlier rounds that are also fine as follow-ups:
- PR description still says
createTabletakesAbstractMetadata + AbstractProtocolandcreateStagingTabletakes aTableIdentifier, but the interfaces still have the older signatures. Either update the description or the interfaces in the follow-up. UnsupportedTableFormatExceptiondetection via substring-match on the response body is brittle; ideally parse the structured error envelope.fetchTableCredentialscatchesIllegalArgumentExceptionto detect "no creds for this scheme" — prefer a positive scheme check.- Untested production paths:
CredentialFetchFailedException(cred-fetch retries exhausted),loadTablenamespace validation (non-2-componentTableIdentifier), and the 9-paramUCDeltaClient.createTable.
Thanks for the responsiveness on the addressed comments.
- Extract resolveThreePartName helper used by loadTable, commit, and getCommits, replacing three near-identical inline parses of TableIdentifier with one source of truth (per openinx review). - Change getCommits 404 from InvalidTargetTableException to NoSuchTableException, matching loadTable and the typed exception introduced in delta-io#6811. - Update the 404 test to mirror loadTable's NoSuchTableException test (asserts the qualified table name and the response body are in the error message).
🥞 Stacked PR
Use this link to review incremental changes.
Which Delta project/connector is this regarding?
Description
[Storage] Extend UCDeltaClient with table-loading ops + exceptions
How was this patch tested?
Added tests
Does this PR introduce any user-facing changes?
No