Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 31 additions & 0 deletions build.sbt
Original file line number Diff line number Diff line change
Expand Up @@ -1163,6 +1163,37 @@ lazy val storage = (project in file("storage"))
commonSettings,
exportJars := true,
javaOnlyReleaseSettings,

// Use the shaded kernel-api jar. A direct project dependency puts kernel-api's unshaded
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why Kernel API can't be dependency without shading.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i tried just doing storage.dependsOn(kernelApi) but it ran into problems. here's the story:

  1. Kernel source code uses normal Jackson names:
com.fasterxml.jackson.databind.ObjectNode
  1. When Kernel builds its published jar, Delta renames Jackson packages inside that jar:
com.fasterxml.jackson...

becomes:

io.delta.kernel.shaded.com.fasterxml.jackson...

this way Kernel does not fight with Spark/Hadoop/other libraries that may use different Jackson versions.

  1. I first wanted the simple build change:
storage.dependsOn(kernelApi)
  1. But in sbt, .dependsOn(kernelApi) does not use Kernel’s final shaded jar. It uses Kernel’s raw compiled class directory.

That raw class directory still references:

com.fasterxml.jackson...
  1. Other modules/tests were using the final shaded Kernel jar, which references:
io.delta.kernel.shaded.com.fasterxml.jackson...
  1. Then runtime had two different ObjectNode classes:
com.fasterxml.jackson.databind.node.ObjectNode
io.delta.kernel.shaded.com.fasterxml.jackson.databind.node.ObjectNode

Java sees those as totally different classes. So this failed:

  com.fasterxml.jackson.databind.node.ObjectNode cannot be cast to
  io.delta.kernel.shaded.com.fasterxml.jackson.databind.node.ObjectNode

Fix:

Instead of:

storage.dependsOn(kernelApi)

we make storage compile against Kernel’s final shaded jar:

Compile / unmanagedJars += (kernelApi / Compile / packageBin).value

And we still add delta-kernel-api to the published POM, so external users get the normal dependency.

// class directory on downstream classpaths, which conflicts with Kernel's shaded Jackson API.
Compile / unmanagedJars += (kernelApi / Compile / packageBin).value,
Test / unmanagedJars += (kernelApi / Compile / packageBin).value,
Compile / compile := (Compile / compile).dependsOn(kernelApi / Compile / packageBin).value,
Test / test := (Test / test).dependsOn(kernelApi / Compile / packageBin).value,

// delta-storage exposes Kernel types in its public API, so the published POM must still
// declare delta-kernel-api even though local compilation uses the shaded jar above.
pomPostProcess := { node =>
val ver = version.value
import scala.xml._
import scala.xml.transform._

val kernelApiDependency =
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-kernel-api</artifactId>
<version>{ver}</version>
</dependency>

new RuleTransformer(new RewriteRule {
override def transform(n: Node): Seq[Node] = n match {
case e: Elem if e.label == "dependencies" =>
Seq(e.copy(child = e.child ++ kernelApiDependency))
case _ => Seq(n)
}
}).transform(node).head
},

libraryDependencies ++= Seq(
// User can provide any 2.x or 3.x version. We don't use any new fancy APIs. Watch out for
// versions with known vulnerabilities.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@

import io.delta.kernel.internal.actions.Metadata;
import io.delta.kernel.internal.util.VectorUtils;
import io.delta.kernel.types.StructType;
import io.delta.storage.commit.actions.AbstractMetadata;
import java.util.*;

Expand Down Expand Up @@ -58,6 +59,11 @@ public Map<String, String> getFormatOptions() {
return Collections.unmodifiableMap(kernelMetadata.getFormat().getOptions());
}

@Override
public StructType getSchema() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to use this io.delta.kernel.unitycatalog.adapters.MetadataAdapter in Delta Storage and Spark?
Can we do that?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. and im pretty sure we cant do that either.

  1. kernel-unitycatalog depends on storage. so if we let storage use this adapter thats a circular dependency
  2. spark holds a regular Metadata object - but the constructor for MetadataAdapter takes in kernel metadata - public MetadataAdapter(Metadata kernelMetadata). of course we can convert Spark Metadata to Kernel Metadata but that seems unnecessary

return kernelMetadata.getSchema();
}

@Override
public String getSchemaString() {
return kernelMetadata.getSchemaString();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,7 @@ class ActionAdaptersSuite extends AnyFunSuite {
assert(adapter.getDescription === "description")
assert(adapter.getProvider === "parquet")
assert(adapter.getFormatOptions.asScala == Map("foo" -> "bar"))
assert(adapter.getSchema === kernelMetadata.getSchema)
assert(adapter.getSchemaString === "schemaStringJson")
assert(adapter.getPartitionColumns.asScala == Seq("part1"))
assert(adapter.getConfiguration.asScala == Map("zip" -> "zap"))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1251,6 +1251,12 @@ case class Metadata(
.map(DataType.fromJson(_).asInstanceOf[StructType])
.getOrElse(StructType.apply(Nil))

/** Returns the schema as a Kernel [[io.delta.kernel.types.StructType]] */
@JsonIgnore
private lazy val kernelSchema: io.delta.kernel.types.StructType = Option(schemaString)
.map(io.delta.kernel.internal.types.DataTypeJsonSerDe.deserializeStructType)
.orNull

/** Returns the partitionSchema as a [[StructType]] */
@JsonIgnore
lazy val partitionSchema: StructType =
Expand Down Expand Up @@ -1312,6 +1318,9 @@ case class Metadata(
@JsonIgnore
override def getFormatOptions: java.util.Map[String, String] = format.options.asJava

@JsonIgnore
override def getSchema: io.delta.kernel.types.StructType = kernelSchema

override def getSchemaString: String = schemaString

override def getPartitionColumns: java.util.List[String] = partitionColumns.asJava
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -134,6 +134,19 @@ class ActionSerializerSuite extends QueryTest with SharedSparkSession with Delta
new StructType().json,
Seq("a")))

test("Metadata getSchema parses schemaString as Kernel schema") {
val schemaString = new StructType()
.add("id", "long")
.add("nested", new StructType().add("name", "string"))
.json
val metadata = Metadata(schemaString = schemaString)
val expected =
io.delta.kernel.internal.types.DataTypeJsonSerDe.deserializeStructType(schemaString)

assert(metadata.getSchema === expected)
assert(Metadata().getSchema === null)
}

test("extra fields") {
// TODO reading from checkpoint
Action.fromJson("""{"txn": {"test": 1}}""")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,11 @@

package io.delta.storage.commit.actions;

import java.util.Map;
import java.util.List;
import java.util.Map;

import io.delta.kernel.internal.types.DataTypeJsonSerDe;
import io.delta.kernel.types.StructType;

/**
* Interface for metadata actions in Delta. The metadata defines the metadata
Expand Down Expand Up @@ -46,6 +49,17 @@ public interface AbstractMetadata {
/** The format options */
Map<String, String> getFormatOptions();

/**
* The table schema as a Delta Kernel type.
*
* <p>The default implementation parses {@link #getSchemaString()}; if {@code getSchemaString()}
* returns {@code null}, this method returns {@code null}.
*/
default StructType getSchema() {
String schemaString = getSchemaString();
return schemaString == null ? null : DataTypeJsonSerDe.deserializeStructType(schemaString);
}

/**
* The table schema in string representation.
*/
Expand Down
Loading