
[PROTOCOL] Add delta.parquet.compression.codec property to protocol #6324

Merged
tdas merged 18 commits into delta-io:master from emkornfield:rfc_for_compression_setting on Apr 8, 2026

@emkornfield
Collaborator

Which Delta project/connector is this regarding?

  • [ ] Spark
  • [ ] Standalone
  • [ ] Flink
  • [ ] Kernel
  • [x] Other (protocol)

Description

Add a proposal RFC to document parquet compression.

How was this patch tested?

N/A

Does this PR introduce any user-facing changes?

No

Comment thread protocol_rfcs/parquet-compression-codec.md Outdated
@emkornfield emkornfield requested a review from scovich March 19, 2026 22:07
@emkornfield emkornfield marked this pull request as draft March 25, 2026 22:43
@emkornfield
Collaborator Author

Offline feedback around proposal:

  • Create a new table properties section
  • Try to make the RFC more concise under a single header.

@emkornfield emkornfield marked this pull request as ready for review March 26, 2026 22:09
Comment thread protocol_rfcs/parquet-compression-codec.md Outdated
@@ -0,0 +1,38 @@
# Parquet Compression Codec
Collaborator

Hey @emkornfield ! I think for something like this you can just make a PR directly against PROTOCOL.md.

WDYT @tdas ?

Contributor

I think so too. If it's not a breaking change, just fully backward-compatible improvements, then we can add it to the protocol directly.


Specifies the compression codec writers SHOULD use when writing new Parquet data and checkpoint files. Changing this property does not affect existing files; a table may contain files written with different codecs, which is a normal and expected state.

Supported values (matched case-insensitively):
Collaborator

Should we clarify that this is a best-effort list? We don't definitively state the exhaustive list of supported values?

In other words: is it VALID for a Delta table to have a DIFFERENT value?

Collaborator Author

I think this is covered below on writer requirements?

Collaborator

Somewhat?

  • here we state a list of "supported values"
  • below it states "If a writer does not support the specified codec, it SHOULD abort with an appropriate error or fall back to a default codec."
  • that doesn't answer the question: is it perfectly fine to use "foofoobarbar" as a codec value? There is "what a writer supports" and there is "is there the concept of an unsupported value".

Is there a simple sentence or clause we can add that clears this up and removes the ambiguity?

Co-authored-by: emkornfield <emkornfield@gmail.com>

When the property is absent, writers SHOULD default to `zstd`. If a writer does not support or recognize the specified codec, it SHOULD abort with an appropriate error or fall back to a default codec.
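
A minimal sketch of the writer behavior this paragraph describes, in Python and under stated assumptions. The names `resolve_write_codec`, `UnsupportedCodecError`, and `writer_codecs` are hypothetical and not part of Delta or of the RFC. The codec set stands for what one imaginary writer can produce, not for the RFC's list of supported values. The sketch also assumes the writer itself can produce `zstd`.

```python
from typing import FrozenSet, Mapping

PROPERTY_KEY = "delta.parquet.compression.codec"
DEFAULT_CODEC = "zstd"  # default named in the quoted RFC text


class UnsupportedCodecError(ValueError):
    """Hypothetical error for a codec this writer cannot produce."""


def resolve_write_codec(
    table_properties: Mapping[str, str],
    writer_codecs: FrozenSet[str],
    fall_back: bool = False,
) -> str:
    """Pick the codec for NEW files only; existing files are never rewritten."""
    raw = table_properties.get(PROPERTY_KEY)
    if raw is None:
        # Property absent: writers SHOULD default to zstd.
        return DEFAULT_CODEC
    codec = raw.lower()  # values are matched case-insensitively
    if codec in writer_codecs:
        return codec
    if fall_back:
        # One of the two options the text allows: fall back to a default codec.
        return DEFAULT_CODEC
    # The other option: abort with an appropriate error.
    raise UnsupportedCodecError(
        f"{PROPERTY_KEY}={raw!r} is not supported by this writer"
    )


# Capabilities of an imaginary writer (assumption for illustration only).
example_writer_codecs = frozenset({"zstd", "snappy"})
chosen = resolve_write_codec({PROPERTY_KEY: "SNAPPY"}, example_writer_codecs)
```

Whether to abort or fall back is left to the writer by the quoted text, so the sketch exposes it as a flag rather than choosing one.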

Readers SHOULD be able to read parquet files compressed with any of the supported codecs, regardless of the current table property value. In some cases parquet files might have been written codecs that [parquet supports](https://parquet.apache.org/docs/file-format/data-pages/compression/) that are not in the list above, readers MAY support reading these files.
Collaborator

been written *with codecs that ...
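
To illustrate the reader requirement quoted above, here is a hedged Python sketch. It assumes one codec per file for simplicity (Parquet actually records the codec per column chunk in the file metadata), and the function name `undecodable_files`, the file names, and the reader's codec set are all hypothetical. The point it shows is that the table property is not an input: a reader decides from each file's own codec, so a table with mixed codecs is handled file by file.

```python
from typing import FrozenSet, List, Mapping


def undecodable_files(
    file_codecs: Mapping[str, str],
    reader_codecs: FrozenSet[str],
) -> List[str]:
    """Return the files whose recorded codec this reader cannot decode.

    The current value of delta.parquet.compression.codec is deliberately
    not a parameter: it only guides writers of new files.
    """
    return sorted(
        path
        for path, codec in file_codecs.items()
        if codec.lower() not in reader_codecs
    )


# A table whose files were written at different times with different codecs.
files = {
    "part-0.parquet": "SNAPPY",
    "part-1.parquet": "ZSTD",
    "part-2.parquet": "BROTLI",
}
# Capabilities of an imaginary reader (assumption for illustration only).
blocked = undecodable_files(files, frozenset({"snappy", "zstd"}))
```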

Co-authored-by: emkornfield <emkornfield@gmail.com>

## Property Details

### delta.parquet.compression.codec
Contributor

nit: this is a title. Will each property be a title?
Shouldn't we make this into a table? Most projects I know define properties as a table:
https://spark.apache.org/docs/latest/configuration.html
https://iceberg.apache.org/docs/latest/configuration/

Without a table, I am not sure what the list of properties will look like.

Not a blocker for merging the RFC; we can refactor it when merging into the protocol as well. But I suggest following standards eventually.

Contributor

@tdas tdas left a comment

LGTM, but see my comment.

@scottsand-db scottsand-db changed the title from "[RFC] for compression setting" to "[PROTOCOL] Add delta.parquet.compression.codec property to protocol" on Apr 8, 2026
@tdas tdas merged commit 6921ec7 into delta-io:master Apr 8, 2026
2 checks passed
huashi-st pushed a commit to huashi-st/delta that referenced this pull request Apr 24, 2026
…elta-io#6324)

#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [x] Other (protocol)

## Description

Add a proposal RFC to document parquet compression.

## How was this patch tested?

N/A

## Does this PR introduce _any_ user-facing changes?

No

---------

Co-authored-by: Scott Sandre <scott.sandre@databricks.com>
@felipepessoto
Contributor

fix #6323?

@felipepessoto
Contributor

@emkornfield, @scovich, @scottsand-db, @tdas, Spark uses Snappy by default. Should we make changes in Spark-Delta to use ZSTD by default and align with new spec?

@scottsand-db
Collaborator

@felipepessoto yes that seems reasonable to me -- wdyt @emkornfield ?

@emkornfield
Collaborator Author

> @felipepessoto yes that seems reasonable to me -- wdyt @emkornfield ?

Yes, this seems reasonable to me.

@felipepessoto
Contributor

@emkornfield, @scottsand-db are you doing it or should I send a PR?

@emkornfield
Collaborator Author

> @emkornfield, @scottsand-db are you doing it or should I send a PR?

If you have bandwidth a PR would be appreciated.

@felipepessoto
Contributor

felipepessoto commented May 15, 2026

Created a new issue: #6803 and PR: #6802
