Skip to content

[KERNEL] Kernel fix clustering and add tests#6749

Open
milastdbx wants to merge 5 commits into
delta-io:masterfrom
milastdbx:kerneltests
Open

[KERNEL] Kernel fix clustering and add tests#6749
milastdbx wants to merge 5 commits into
delta-io:masterfrom
milastdbx:kerneltests

Conversation

@milastdbx
Copy link
Copy Markdown
Collaborator

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

In this PR we are adding additional testing for geospatial types for kernel:

  • Checkpointing
  • Clustering
  • Type widening / schema evolution
  • Partitoning

We've also fixed issue where clustering was accidentaly passing.

How was this patch tested?

New tests

Does this PR introduce any user-facing changes?

Forbids clustering on geo columns

val before = table.getLatestSnapshot(engine).asInstanceOf[SnapshotImpl]
val featuresBefore = before.getProtocol.getImplicitlyAndExplicitlySupportedFeatures
assert(
!featuresBefore.contains(io.delta.kernel.internal.tablefeatures.TableFeatures
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several fully-qualified references in tests (e.g. io.delta.kernel.internal.tablefeatures.TableFeatures.GEOSPATIAL_RW_FEATURE and io.delta.kernel.internal.TableConfig.CHECKPOINT_POLICY). To avoid repeating inline several times, could we add the corresponding imports and tighten readability a bit?

Copy link
Copy Markdown
Collaborator

@uros-db uros-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new clustering check inspects top-level columns, but how about clustering on a nested geo field? Let's confirm that and perhaps consider adding appropriate tests

Copy link
Copy Markdown
Collaborator

@uros-db uros-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments! @scottsand-db please review

Copy link
Copy Markdown

@thinh2 thinh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with some comments. Thank you!

Comment on lines +2008 to +2009
assert(after.getProtocol.getMinReaderVersion >= 3)
assert(after.getProtocol.getMinWriterVersion >= 7)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(after.getProtocol.getMinReaderVersion >= 3)
assert(after.getProtocol.getMinWriterVersion >= 7)
assert(after.getProtocol.getMinReaderVersion == 3)
assert(after.getProtocol.getMinWriterVersion == 7)

.fromMetadata(getMetadata(engine, tablePath))
}

// Adding a geo column auto-enables GEOSPATIAL_RW_FEATURE and bumps protocol to (3, 7).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add test cover the case where schema evolution reject the GeometryType due to malformed GeographyType, e.g GeographyType("XyZ", "vincenty") ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants