Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 5 additions & 2 deletions PROTOCOL.md
Original file line number Diff line number Diff line change
Expand Up @@ -187,7 +187,7 @@ This example represents a table after [metadata cleanup](#metadata-cleanup) has
Data files can be stored in the root directory of the table or in any non-hidden subdirectory (i.e., one whose name does not start with an `_`).
By default, the reference implementation stores data files in directories that are named based on the partition values for data in that file (i.e. `part1=value1/part2=value2/...`).
This directory format is only used to follow existing conventions and is not required by the protocol.
Actual partition values for a file must be read from the transaction log.
Actual partition values for a file must be read from the transaction log, even if they are present in the data file itself.

### Deletion Vector Files
Deletion Vector (DV) files are stored in the root directory of the table alongside the data files. A DV file contains one or more serialised DV, each describing the set of *invalidated* (or "soft deleted") rows for a particular data file it is associated with.
Expand Down Expand Up @@ -2443,6 +2443,7 @@ Property | Description | Details
-|-|-
`delta.parquet.compression.codec` | Compression codec writers SHOULD use for new Parquet data and checkpoint files. Changing this property does not affect existing files; a table may contain files written with different codecs, which is a normal and expected state. | Widely supported values (matched case-insensitively): `uncompressed`/`none` (no compression), `snappy`, `gzip`, `lz4` (deprecated, Hadoop framing), `lz4_raw` ([LZ4 block format](https://parquet.apache.org/docs/file-format/data-pages/compression/#lz4_raw)), `zstd`.<br><br>When absent, writers SHOULD default to `zstd`. If a writer does not support or recognize the specified codec, it SHOULD abort with an appropriate error or fall back to a default codec.<br><br>Readers SHOULD support all codecs listed above regardless of the current property value. Parquet files written with other [parquet-supported codecs](https://parquet.apache.org/docs/file-format/data-pages/compression/) may also exist; readers MAY support reading these files.
`delta.parquet.format.version` | Parquet data page format writers SHOULD use for new data files. This property is a directive to writers only; readers do not need to consult it, as Parquet pages are self-describing via the `PageType` field in each page header. Changing this property does not affect existing files; a table MAY contain files written with different data page versions, which is a normal and expected state. | Valid values: `1.0.0` (DataPageV1) and `2.x.x` (DataPageV2, where `x.x` is any minor.patch version). Recommended values are `1.0.0` and `2.12.0`.<br><br>When absent, writers SHOULD default to `1.0.0`. Writers SHOULD validate this property and abort if the value does not match `1.0.0` or `2.MINOR.PATCH`.<br><br>Readers SHOULD support both DataPageV1 and DataPageV2 pages regardless of this property's value. Tables intended for access by engines beyond the Delta Lake connectors SHOULD use `1.0.0`, as DataPageV2 support varies across the broader Parquet ecosystem.
`delta.writePartitionColumnsToParquet` | Controls whether writers SHOULD write partition columns in newly written data parquet files, in the absence of any writer features that necessitate writing of partition columns (eg. `IcebergCompatV1`). In other words, if no writer feature is enabled that requires materialization of partition columns, writers should read this property to decide whether to materialize partition columns in data parquet files or not. Writer features requirements take precedence over this property's value. Readers should continue to read partition values off of AddFile actions, regardless of the presence of partition values in data files. File-level statistics should not be present for partition columns in partitioned tables in any case. This setting does not apply to writers of files of any other file format. | Boolean field, with valid values `false` and `true`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1/ I don't see a section which explains how this relates to https://github.com/delta-io/delta/blob/master/protocol_rfcs/materialize-partition-columns.md

2/ I think this needs to be structured closer to the section which talks about serializing partition values or in a separate section with the serialization section linking to the new section.

Its hard to read this and understand what the algorithm/ behavior would be on the Kernel/ connection side? Maybe have a truth table?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does reference writer features that govern materialization of partition values, which are multiple (IcebergCompatV1, IcebergCompatV2, IcebergCompatV3, IcebergWriterCompatV1, IcebergWriterCompatV3, materializePartitionColumns), and the writer features are treated as one set that are prioritized over this property. I don't see why we should talk about one writer feature in isolation.

2/ I think this needs to be structured closer to the section which talks about serializing partition values or in a separate section with the serialization section linking to the new section.

The protocol doesn't spell out materialization of partition values at all anywhere else, with the exception of the above-listed writer features. We could do it if we want but we would then need to decide if there's a default stance the writers should take.

Its hard to read this and understand what the algorithm/ behavior would be on the Kernel/ connection side? Maybe have a truth table?

For this property, should I just reword it to "this property only governs materialization of partition columns if no writer feature requiring materialization of partition columns is enabled on the table" - that's the same point as the existing text with fewer words, and still leaves enough ambiguity for either default true / default false.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1/ We might not need to talk about one feature in isolation. But materialization of partition values is an entire functional area. I think we need a top level section covering all features for which we materialize partition values in addition to https://github.com/sanujbasu/delta/blob/e4c1fa465d51a02355b78366151818ffaee342a7/PROTOCOL.md#partition-value-serialization.

2/ "this property only governs materialization of partition columns if no writer feature requiring materialization of partition columns is enabled on the table" is good phrasing but it hard to understand why "materializePartitionColumns" alone doesnt address that use case since its enabled by default?
Also there is no mention of materializePartitionColumns in the protocol.md? Everyone ive discussed this with is confused about why we need the toggle if we already have a feature? If the toggle is to add the materializePartitionColumns writer feature to the list, why not just add the feature to the vector?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1/ We might not need to talk about one feature in isolation. But materialization of partition values is an entire functional area. I think we need a top level section covering all features for which we materialize partition values in addition to https://github.com/sanujbasu/delta/blob/e4c1fa465d51a02355b78366151818ffaee342a7/PROTOCOL.md#partition-value-serialization.

That one is about how they're serialized in AddFile though. In parquet we don't always use the string type, it's just the same data type that's used if it were any other column.

I've made some references to the optionality of the presence of partition values in data files in this PR now, let me know how that sounds. I'm wary of writing up the whole decision tree in protocol because not all writers will be expected to support iceberg compat, etc., and the table property spec makes it pretty clear that the toggle is best-effort. Really the only thing the protocol should enforce is "readers should understand both states, writers are sometimes required to write partition values".

2/ "this property only governs materialization of partition columns if no writer feature requiring materialization of partition columns is enabled on the table" is good phrasing but it hard to understand why "materializePartitionColumns" alone doesnt address that use case since its enabled by default?
Also there is no mention of materializePartitionColumns in the protocol.md? Everyone ive discussed this with is confused about why we need the toggle if we already have a feature? If the toggle is to add the materializePartitionColumns writer feature to the list, why not just add the feature to the vector?

materializePartitionColumns is not in the protocol because it's just an RFC that hasn't been merged yet. (funny enough, we did merge it into the protocol... and then reverted it because it hadn't baked as an RFC for long enough yet)

Everyone ive discussed this with is confused about why we need the toggle if we already have a feature?

Features block writes if they're present in a protocol and a writer doesn't understand them. We want the ability to materialize partition columns without blocking writers, especially since this is a new writer feature and we don't want to block writers all the time but rather we just want a best-effort way to require writers to materialize partition values.

If you're suggesting that we go ahead and default to writing partition values all the time, I'm supportive of a change like that. I just think that in that reality we would still need the ability to configure that behaviour, and for that we'd need a table property.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on an offline conversation, we decided to move this change to the materialize partition columns RFC, along with a detailed description of when and when not to materialize partition columns. Opened a separate PR for that: #6738


# Appendix

Expand Down Expand Up @@ -2573,7 +2574,7 @@ maxValues | A value that is equal to the largest valid value[^1] present in the

## Partition Value Serialization

Partition values are stored as strings, using the following formats. An empty string for any type translates to a `null` partition value.
Partition values are stored as strings in actions, using the following formats. An empty string for any type translates to a `null` partition value.

Type | Serialization Format
-|-
Expand All @@ -2591,6 +2592,8 @@ Note: A timestamp value in a partition value may be stored in one of the followi

It is highly recommended that modern writers adjust the timestamp to UTC and store the timestamp in ISO8601 format as outlined in 2.

In addition to being stored in actions, partition values could also be present in data files. Some writer features (eg. [Iceberg Compatibility V1](#iceberg-compatibility-v1)) require the presence of partition values in data files.

## Schema Serialization Format

Delta uses a subset of Spark SQL's JSON Schema representation to record the schema of a table in the transaction log.
Expand Down