[Storage] Expose Kernel schema in metadata API #6716

Closed
TimothyW553 wants to merge 1 commit into delta-io:master from TimothyW553:stack/drc-storage-kernel-schema

@TimothyW553 TimothyW553 commented May 4, 2026

🥞 Stacked PR

Use this link to review incremental changes.


Which Delta project/connector is this regarding?

Storage / Kernel

Description

This PR extends the storage metadata abstraction so Delta code can access table schema as a Kernel StructType, not only as schema JSON.

Without this, each caller that needs typed schema has to parse getSchemaString() itself. That makes UC Delta Rest Catalog API integration harder because the typed schema would be converted in multiple places.

This PR adds:

  • Storage access to Kernel API types.
  • A default AbstractMetadata.getSchema() method returning Kernel StructType.
  • Implementations for Spark metadata and Kernel metadata adapters.
  • A Spark-side test that verifies Metadata.getSchema parses schema JSON through Kernel serde.

AbstractMetadata.getSchemaString() remains for existing callers.
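
As a rough illustration of the default-method approach, here is a minimal, self-contained sketch. `MetadataLike`, `TypedSchema`, and `parseSchema` are hypothetical stand-ins for `AbstractMetadata`, Kernel `StructType`, and the Kernel JSON deserialization; they are not the real classes. The sketch also assumes the default derives the typed schema from `getSchemaString()`, which this description does not spell out.

```java
public class DefaultSchemaSketch {
    /** Hypothetical stand-in for Kernel's StructType; it only wraps the JSON here. */
    public static final class TypedSchema {
        public final String json;
        TypedSchema(String json) { this.json = json; }
    }

    /** Hypothetical stand-in for AbstractMetadata, reduced to the two schema accessors. */
    public interface MetadataLike {
        String getSchemaString();

        // Assumption: the default derives the typed schema from the JSON string,
        // so implementers that predate getSchema() keep compiling unchanged.
        default TypedSchema getSchema() {
            return parseSchema(getSchemaString());
        }
    }

    /** Placeholder for the single shared JSON-to-typed-schema conversion. */
    public static TypedSchema parseSchema(String json) {
        if (json == null) {
            throw new IllegalArgumentException("schema string is null");
        }
        return new TypedSchema(json);
    }

    public static void main(String[] args) {
        // An implementer that only knows about getSchemaString().
        MetadataLike legacy = () -> "{\"type\":\"struct\",\"fields\":[]}";
        System.out.println(legacy.getSchema().json);
    }
}
```

The point of the pattern is that the conversion lives in one place, while callers that only need the JSON can keep using `getSchemaString()`.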

How was this patch tested?

Local verification:

./build/sbt "storage/compile" "storage/testOnly io.delta.storage.commit.uccommitcoordinator.UCTokenBasedRestClientSuite" "spark/testOnly org.apache.spark.sql.delta.ActionSerializerSuite"
./build/sbt scalafmtAll javafmtAll
git diff --check

Downstream stack verification:

./build/sbt "storage/compile" "storage/testOnly io.delta.storage.commit.uccommitcoordinator.UCTokenBasedRestClientSuite"
./build/sbt "spark/testOnly org.apache.spark.sql.delta.catalog.UCDeltaRestCatalogApiSchemaConverterSuite org.apache.spark.sql.delta.catalog.DeltaCatalogClientSuite"
./build/sbt "sparkUnityCatalog/testOnly io.sparkuctest.UCDeltaTableReadTest"

Does this PR introduce any user-facing changes?

No.

}

@Override
public StructType getSchema() {
Collaborator

Do you plan to use this io.delta.kernel.unitycatalog.adapters.MetadataAdapter in Delta Storage and Spark?
Can we do that?

Collaborator Author

No, and I'm pretty sure we can't do that either.

  1. kernel-unitycatalog depends on storage, so letting storage use this adapter would create a circular dependency.
  2. Spark holds a regular Metadata object, but the MetadataAdapter constructor takes Kernel metadata: public MetadataAdapter(Metadata kernelMetadata). We could convert Spark Metadata to Kernel Metadata, but that seems unnecessary.

/**
* The table schema as a Delta Kernel type.
*/
StructType getSchema();
Collaborator

This would force org.apache.spark.sql.delta.actions.Metadata to import Kernel StructType too. But can it (a non-v2 class) do that?

Collaborator Author

I changed getSchema() to a default method, so normal AbstractMetadata implementers are not forced to import Kernel StructType. Spark Metadata still overrides it because this PR intentionally exposes Kernel schema at the storage boundary, and Delta-Spark already depends on storage.

Comment thread build.sbt
exportJars := true,
javaOnlyReleaseSettings,

// Use the shaded kernel-api jar. A direct project dependency puts kernel-api's unshaded
Collaborator

I don't know why Kernel API can't be a dependency without shading.

Collaborator Author

Yeah, I tried just doing storage.dependsOn(kernelApi), but it ran into problems. Here's the story:

  1. Kernel source code uses normal Jackson names:
com.fasterxml.jackson.databind.node.ObjectNode
  2. When Kernel builds its published jar, Delta renames Jackson packages inside that jar:
com.fasterxml.jackson...

becomes:

io.delta.kernel.shaded.com.fasterxml.jackson...

This way Kernel does not fight with Spark/Hadoop/other libraries that may use different Jackson versions.

  3. I first wanted the simple build change:
storage.dependsOn(kernelApi)
  4. But in sbt, .dependsOn(kernelApi) does not use Kernel's final shaded jar. It uses Kernel's raw compiled class directory.

That raw class directory still references:

com.fasterxml.jackson...
  5. Other modules/tests were using the final shaded Kernel jar, which references:
io.delta.kernel.shaded.com.fasterxml.jackson...
  6. Then the runtime had two different ObjectNode classes:
com.fasterxml.jackson.databind.node.ObjectNode
io.delta.kernel.shaded.com.fasterxml.jackson.databind.node.ObjectNode

Java sees those as totally different classes. So this failed:

  com.fasterxml.jackson.databind.node.ObjectNode cannot be cast to
  io.delta.kernel.shaded.com.fasterxml.jackson.databind.node.ObjectNode

Fix:

Instead of:

storage.dependsOn(kernelApi)

we make storage compile against Kernel’s final shaded jar:

Compile / unmanagedJars += (kernelApi / Compile / packageBin).value

And we still add delta-kernel-api to the published POM, so external users get the normal dependency.
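
To make the cast failure concrete, here is a minimal, self-contained sketch. The nested classes `Unshaded.ObjectNode` and `Shaded.ObjectNode` are hypothetical stand-ins for the original and relocated Jackson classes (they are not Jackson itself); the only thing the sketch relies on is that the JVM treats classes with different fully qualified names as unrelated types, even when their simple names match.

```java
public class ShadedClassIdentitySketch {
    /** Stand-in for the original package, e.g. com.fasterxml.jackson.databind.node. */
    public static final class Unshaded {
        public static final class ObjectNode {}
    }

    /** Stand-in for the relocated package, e.g. io.delta.kernel.shaded.com.fasterxml... */
    public static final class Shaded {
        public static final class ObjectNode {}
    }

    /** Returns true when the value cannot be cast to the shaded ObjectNode stand-in. */
    public static boolean castFails(Object value) {
        try {
            Shaded.ObjectNode.class.cast(value);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Same simple name, different class: the cast is rejected.
        System.out.println(castFails(new Unshaded.ObjectNode())); // prints "true"
        // Same class on both sides: the cast succeeds.
        System.out.println(castFails(new Shaded.ObjectNode()));   // prints "false"
    }
}
```

Compiling storage against the shaded jar keeps every module on the relocated names, so the mismatch in the first case cannot arise.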

TimothyW553 force-pushed the stack/drc-storage-kernel-schema branch from 0bd6839 to 9684101, May 4, 2026 19:02
TimothyW553 requested a review from yili-db, May 4, 2026 19:51
TimothyW553 closed this, May 4, 2026