[Spark] Add UC Delta Rest Catalog API path credentials for raw path reads#6682
[Spark] Add UC Delta Rest Catalog API path credentials for raw path reads#6682TimothyW553 wants to merge 3 commits into
Conversation
9f0faa3 to
41924e3
Compare
4df9d78 to
cf85522
Compare
162948f to
c6f036a
Compare
c6f036a to
f1d45b2
Compare
49ed145 to
dbccc2b
Compare
faf76b2 to
4318b06
Compare
e858b6f to
e6355cf
Compare
| val metadata = try { | ||
| client.loadTable(catalogName, schemaName, tableName) | ||
| .asInstanceOf[TableMetadataAdapter] | ||
| } catch { | ||
| case e: IOException if isUnsupportedTableFormat(e) => | ||
| return None | ||
| case e: IOException => | ||
| throw translateLoadTableException(ident, e) | ||
| } | ||
| val location = metadata.getLocation | ||
| val locationUri = CatalogUtils.stringToURI(location) | ||
| val credentials = Option.when(isCloudScheme(locationUri.getScheme)) { | ||
| try { | ||
| // Prefer READ_WRITE so a loaded table can be used for writes without reloading | ||
| // credentials; read-only principals fall back to READ below. | ||
| client.getTableCredentials( | ||
| CredentialOperation.READ_WRITE, | ||
| catalogName, | ||
| schemaName, | ||
| tableName) | ||
| } catch { | ||
| case e: IOException if isAuthError(e) => | ||
| client.getTableCredentials( | ||
| CredentialOperation.READ, | ||
| catalogName, | ||
| schemaName, | ||
| tableName) | ||
| } | ||
| } |
There was a problem hiding this comment.
why don't we loadTable from the UCSingleCatlaog ? in this current code, all the credential renewal won't work, that's a huge risk.
| val credentials = client.getTemporaryPathCredentials( | ||
| location, | ||
| CredentialOperation.READ) | ||
| buildHadoopCredentialPropertiesForPath( | ||
| location, | ||
| getStorageCredentials(credentials), | ||
| locationScheme, | ||
| config.credentialContext) | ||
| } catch { | ||
| case _: UnsupportedOperationException => | ||
| Map.empty[String, String] | ||
| case e: IOException if isNotFound(e) => | ||
| Map.empty[String, String] | ||
| } |
There was a problem hiding this comment.
I see the reason why you add the getTemporaryPathCredentials API in the client interface now, since you need it to generate the initial temp credentials.
But after this unitycatalog/unitycatalog#1549, actually we don't need to explicitly set the initial credential any more.
So we can entirely remove the getTemporaryPathCredentials code in oss delta now.
|
|
||
| val spark = SparkSession.active | ||
|
|
||
| private lazy val deltaCatalogClient: DeltaCatalogClient = DeltaCatalogClient(delegate, spark) |
There was a problem hiding this comment.
Rather than defining a separate DeltaCatalogClient, why not just introduce a separate StagingTableCatalog, just like the UCSingleCatalog in oss-unitycatalog.
Essentially, the UCSingleCatalog is using the old and legacy client api to talk to unitycatalog. and the UCDeltaCatalog will use the new client api to talk to new catalog endpoint.
Does this make sense ?
🥞 Stacked PR
Use this link to review incremental changes.
Which Delta project/connector is this regarding?
Spark / Unity Catalog
Description
This PR adds UC Delta Rest Catalog API temporary path credentials for raw Delta path reads.
Named UC table reads already use table credentials from the previous PR. Raw path reads use a different Spark/Delta entry point, so they need
GET /delta/v1/temporary-path-credentialsinstead of named-tableloadTablecredentials.This PR adds:
UCClient.getTemporaryPathCredentials(...)andUCTokenBasedRestClientsupport for the temporary path credentials endpoint.DeltaCatalogClient.pathCredentialOptions(...)to fetch path-scoped credentials for cloud paths using the UC Delta Rest Catalog API-enabled default catalog.DeltaTableV2wiring so raw path reads receive those credential options before touching storage.loadTable.DeltaTable.forPathwith UC Delta Rest Catalog API path credentials.If no UC Delta Rest Catalog API-enabled default catalog is configured, or the path is not a supported cloud path, this PR returns no extra credential options and keeps the existing behavior.
How was this patch tested?
Covered by
UCTokenBasedRestClientSuitefor temporary path credentials and unsupported endpoint behavior.Covered by
DeltaCatalogClientSuitefor path credential options and path identifier skip behavior.Covered by
ServerSidePlannedTableSuitefor the disabled server-side planning guard.Covered by the Spark UC integration tests.
Local verification:
Does this PR introduce any user-facing changes?
No released user-facing change. This only applies when UC Delta Rest Catalog API is enabled for the default catalog used to authorize raw path access.