[Storage] Expose Kernel schema in metadata API#6716
b860716 to 0bd6839
    }

    @Override
    public StructType getSchema() {
Do you plan to use this io.delta.kernel.unitycatalog.adapters.MetadataAdapter in Delta Storage and Spark?
Can we do that?
No, and I'm pretty sure we can't do that either.
- kernel-unitycatalog depends on storage, so if we let storage use this adapter, that's a circular dependency.
- Spark holds a regular Metadata object, but the constructor for MetadataAdapter takes Kernel metadata: public MetadataAdapter(Metadata kernelMetadata). Of course we can convert Spark Metadata to Kernel Metadata, but that seems unnecessary.
    /**
     * The table schema as a Delta Kernel type.
     */
    StructType getSchema();
This would force org.apache.spark.sql.delta.actions.Metadata to import Kernel StructType too. But can it (a non-v2 class) do that?
I changed getSchema() to a default method, so normal AbstractMetadata implementers are not forced to import Kernel StructType. Spark Metadata still overrides it because this PR intentionally exposes the Kernel schema at the storage boundary, and Delta-Spark already depends on storage.
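A minimal sketch of the default-method approach described in this reply. All type names here are hypothetical stand-ins (SchemaType in place of Kernel StructType, MetadataSketch in place of AbstractMetadata), and throwing UnsupportedOperationException from the default body is an assumption; the thread does not say what the real default does.

```java
final class DefaultMethodSketch {
    // Hypothetical stand-in for Kernel's StructType; not the real class.
    static final class SchemaType {
        final String json;
        SchemaType(String json) { this.json = json; }
    }

    // Hypothetical stand-in for AbstractMetadata.
    interface MetadataSketch {
        String getSchemaString();

        // Default body, so existing implementers keep compiling unchanged.
        // Throwing here is an assumption made for this sketch.
        default SchemaType getSchema() {
            throw new UnsupportedOperationException("typed schema not provided");
        }
    }

    // An implementer that only knows about schema JSON and never overrides getSchema().
    static final class LegacyMetadata implements MetadataSketch {
        @Override
        public String getSchemaString() { return "{\"type\":\"struct\",\"fields\":[]}"; }
    }

    // An implementer that opts in by overriding the default method.
    static final class TypedMetadata implements MetadataSketch {
        @Override
        public String getSchemaString() { return "{\"type\":\"struct\",\"fields\":[]}"; }

        @Override
        public SchemaType getSchema() { return new SchemaType(getSchemaString()); }
    }

    // Reports whether an implementation provides a typed schema.
    static boolean supportsTypedSchema(MetadataSketch metadata) {
        try {
            metadata.getSchema();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(supportsTypedSchema(new LegacyMetadata()));
        System.out.println(supportsTypedSchema(new TypedMetadata()));
    }
}
```

The point of the design is that only implementers that override the method need the typed schema class on their compile path in their own source.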
    exportJars := true,
    javaOnlyReleaseSettings,

    // Use the shaded kernel-api jar. A direct project dependency puts kernel-api's unshaded
I don't know why Kernel API can't be a dependency without shading.
Yeah, I tried just doing storage.dependsOn(kernelApi), but it ran into problems. Here's the story:
- Kernel source code uses normal Jackson names:
    com.fasterxml.jackson.databind.node.ObjectNode
- When Kernel builds its published jar, Delta renames Jackson packages inside that jar:
    com.fasterxml.jackson...
  becomes:
    io.delta.kernel.shaded.com.fasterxml.jackson...
  This way Kernel does not fight with Spark/Hadoop/other libraries that may use different Jackson versions.
- I first wanted the simple build change:
    storage.dependsOn(kernelApi)
- But in sbt, .dependsOn(kernelApi) does not use Kernel's final shaded jar. It uses Kernel's raw compiled class directory. That raw class directory still references:
    com.fasterxml.jackson...
- Other modules/tests were using the final shaded Kernel jar, which references:
    io.delta.kernel.shaded.com.fasterxml.jackson...
- Then the runtime had two different ObjectNode classes:
    com.fasterxml.jackson.databind.node.ObjectNode
    io.delta.kernel.shaded.com.fasterxml.jackson.databind.node.ObjectNode
  Java sees those as totally different classes. So this failed:
    com.fasterxml.jackson.databind.node.ObjectNode cannot be cast to
    io.delta.kernel.shaded.com.fasterxml.jackson.databind.node.ObjectNode

Fix:
Instead of:
    storage.dependsOn(kernelApi)
we make storage compile against Kernel's final shaded jar:
    Compile / unmanagedJars += (kernelApi / Compile / packageBin).value
And we still add delta-kernel-api to the published POM, so external users get the normal dependency.
0bd6839 to 9684101
Which Delta project/connector is this regarding?
Storage / Kernel
Description
This PR extends the storage metadata abstraction so Delta code can access table schema as a Kernel StructType, not only as schema JSON.

Without this, each caller that needs typed schema has to parse getSchemaString() itself. That makes UC Delta Rest Catalog API integration harder because the typed schema would be converted in multiple places.

This PR adds:
- storage access to Kernel API types.
- AbstractMetadata.getSchema() method returning Kernel StructType.
- Metadata.getSchema parses schema JSON through Kernel serde.
- AbstractMetadata.getSchemaString() remains for existing callers.

How was this patch tested?
Local verification:
Downstream stack verification:
Does this PR introduce any user-facing changes?
No.
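As a rough illustration of the motivation in the description (parse the schema JSON in one place and hand callers a typed value), here is a small sketch. Schema, parse, and this Metadata class are hypothetical stand-ins; the PR itself returns a Kernel StructType and parses through Kernel serde, neither of which is used here. The regex-based parser is a toy for the example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TypedSchemaSketch {
    // Hypothetical typed schema; stands in for Kernel StructType.
    static final class Schema {
        final List<String> fieldNames;
        Schema(List<String> fieldNames) { this.fieldNames = fieldNames; }
    }

    // Toy parser for this sketch only: collects every "name":"..." value in order.
    static Schema parse(String schemaJson) {
        List<String> names = new ArrayList<>();
        Matcher m = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]+)\"").matcher(schemaJson);
        while (m.find()) {
            names.add(m.group(1));
        }
        return new Schema(names);
    }

    // Hypothetical metadata holder exposing both the JSON and the typed view.
    static final class Metadata {
        private final String schemaString;
        Metadata(String schemaString) { this.schemaString = schemaString; }

        // Kept for existing callers that want the raw JSON.
        String getSchemaString() { return schemaString; }

        // The single place where JSON becomes a typed schema.
        Schema getSchema() { return parse(schemaString); }
    }

    static final String EXAMPLE =
        "{\"type\":\"struct\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\",\"nullable\":false,\"metadata\":{}},"
            + "{\"name\":\"value\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}";

    public static void main(String[] args) {
        Metadata metadata = new Metadata(EXAMPLE);
        // Callers read the typed schema instead of re-parsing getSchemaString().
        System.out.println(metadata.getSchema().fieldNames);
    }
}
```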