[Spark] Wire UC Delta Rest Catalog API loadTable into Delta catalog#6660
[Spark] Wire UC Delta Rest Catalog API loadTable into Delta catalog#6660TimothyW553 wants to merge 2 commits into
Conversation
3ff4533 to
a215f41
Compare
Range-diff: stack/drc-loadtable-storage (a215f41 -> 018519b)
Reproduce locally: |
018519b to
900cb31
Compare
Range-diff: stack/drc-loadtable-storage (018519b -> 900cb31)
Reproduce locally: |
900cb31 to
b2a7aff
Compare
Range-diff: stack/drc-loadtable-storage (900cb31 -> b2a7aff)
Reproduce locally: |
Range-diff: stack/drc-loadtable-storage (b2a7aff -> 62b4288)
Reproduce locally: |
a2b0fa0 to
b91d3d0
Compare
Range-diff: stack/drc-loadtable-storage (a2b0fa0 -> b91d3d0)
Reproduce locally: |
e43d057 to
2dda66c
Compare
4f79913 to
2c0be33
Compare
| } | ||
| Some(V1Table(toCatalogTable(ident, metadata, locationUri, credentials))) | ||
| case _ => | ||
| None |
There was a problem hiding this comment.
We need to gracefully handle non-Delta passthrough even in Delta code I think. See AbstractDeltaCatalog.loadTable:
val table = super.loadTable(ident)
table match {
case v1: V1Table if DeltaTableUtils.isDeltaTable(v1.catalogTable) =>
loadCatalogTable(ident, v1.catalogTable)
case o => o
}
@tdas please correct me if I am wrong. This also means UC should provide limited support in loadTable to allow the non-Delta tables to be loaded via DRC.
| s"Unsupported Delta REST table type for " + | ||
| s"$catalogName.${ident.namespace().mkString(".")}.${ident.name()}: $other") | ||
| }, | ||
| storage = CatalogStorageFormat( |
There was a problem hiding this comment.
in UCSingleCatalog this is: CatalogStorageFormat.empty.copy(
| compressed = false, | ||
| properties = toStorageProperties(metadata, credentials, locationUri.getScheme)), | ||
| schema = DeltaRestSchemaConverter.toSparkType(metadata.getColumns), | ||
| provider = Some(DeltaSourceUtils.ALT_NAME), |
There was a problem hiding this comment.
should pick up from metadata
There was a problem hiding this comment.
done, picked up from metadata and matched UCSingleCatalog:
provider = Some(metadata.getDataSourceFormat.getValue.toLowerCase(Locale.ROOT))
| partitionColumnNames = Option(metadata.getPartitionColumns) | ||
| .map(_.asScala.toSeq) | ||
| .getOrElse(Nil), | ||
| properties = Map.empty) |
There was a problem hiding this comment.
properties = Map.empty is unnecessary.
| // features here so the Delta load path receives them with the same option.* shape as other | ||
| // storage-level UC properties, while CatalogTable.properties stays reserved for Spark metadata. | ||
| Option(metadata.getProperties).map(_.asScala.toMap).getOrElse(Map.empty) ++ | ||
| Map(UC_TABLE_ID_KEY -> metadata.getTableUuid.toString) ++ |
There was a problem hiding this comment.
For managed delta tables, UC_TABLE_ID_KEY should be in metadata.getProperties already. If not, it means it's an external table and doesn't need it anyway.
There was a problem hiding this comment.
got it. removed this to stop synthesizing UC_TABLE_ID_KEY
| normalizedLocation.charAt(normalizedPrefix.length) == '/')) | ||
| } | ||
|
|
||
| private def storageCredentialToProperties( |
There was a problem hiding this comment.
This function mirrors CredPropsUtil.createTableCredProps from UC-Spark which happens to be public. Should it just call CredPropsUtil.createTableCredProps instead?
Also @tdas please take a look.
There was a problem hiding this comment.
should we depend on UC-Spark? I was thinking of refactoring CredPropsUtil.createTableCredProps and other cloud-credential utils to uc-client
2c0be33 to
041d54a
Compare
| case _ => false | ||
| } | ||
|
|
||
| private def isNumeric(value: Any): Boolean = isIntegral(value) || (value match { |
There was a problem hiding this comment.
Just make it isFloatOrDouble and remove the call to isIntegral
|
|
||
| val spark = SparkSession.active | ||
|
|
||
| private lazy val deltaCatalogClient: DeltaCatalogClient = DeltaCatalogClient(delegate, spark) |
There was a problem hiding this comment.
Can you enable this in UC integration test in UnityCatalogSupport already?
There was a problem hiding this comment.
sure, yes. i've just been enabling it via compiler flag. updated.
c08f5a4 to
cabd7ac
Compare
| "For managed tables, path-based access should fail"); | ||
| } else { | ||
| // For EXTERNAL tables, path-based access should work | ||
| // TODO: Enable remote external path reads after Delta wires DRC |
There was a problem hiding this comment.
the next PR removes this as this PR doesn't handle path-based tables.
| UCTokenBasedRestClientFactory | ||
| } | ||
|
|
||
| private class DeltaCatalogClient private ( |
There was a problem hiding this comment.
class docs based on offline discussion. we need to have clear idea of the difference between this interface and UCClient and UCDeltaClient
There was a problem hiding this comment.
added class docs clarifying DeltaCatalogClient
🥞 Stacked PR
Use this link to review incremental changes.
Which Delta project/connector is this regarding?
Spark / Unity Catalog
Description
This PR wires named UC Delta table
loadTablein the Spark Delta catalog through the storage client added in the previous PR.Flow:
This PR adds:
DeltaCatalogClientfor Spark-side UC Delta Rest Catalog API configuration and named-table loading.StructTypeto SparkStructTypeconversion at the Spark boundary.READ_WRITEfirst andREADfallback for read-only principals.This is named-table
loadTableonly. Raw path-based Delta access does not have a table identifier and is handled by the next stacked PR using temporary path credentials.How was this patch tested?
Covered by
DeltaCatalogClientSuitefor named-table loading, credential handling, local path behavior, server error propagation, disabled config behavior, and unsupported API failures.Covered by
UCDeltaRestCatalogApiSchemaConverterSuitefor Kernel schema to Spark schema conversion.Covered by
UCDeltaTableReadTestin the Spark UC integration tests.Local verification:
./build/sbt "spark/testOnly org.apache.spark.sql.delta.catalog.UCDeltaRestCatalogApiSchemaConverterSuite org.apache.spark.sql.delta.catalog.DeltaCatalogClientSuite"Does this PR introduce any user-facing changes?
No released user-facing change. The new path is controlled by
spark.sql.catalog.<catalog>.deltaRestApi.enabled; when that config is disabled, Delta stays on the existing legacy path.