[Spark] Allow file filtering with partial stats over AND predicates #6756

Open
BenMagyar wants to merge 1 commit into
delta-io:master from
BenMagyar:bmagyar/predicate-stats

Conversation

@BenMagyar

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Fixes #6755. When data skipping evaluates an AND predicate, a file can be pruned if either side of the predicate shows that the file cannot match, provided that side alone has the stats it needs. Currently, both sides of the predicate must have stats before a file can be pruned, regardless of which side actually rules the file out.

Here we instead validate the stats separately for each piece of the predicate, so a file with only partial stats can still be pruned when the piece that rules it out has the stats it needs.
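The difference between the two behaviors can be sketched with a small standalone model. This is not the Delta implementation: the stats layout (a dict of column to min/max) and the names `less_than`, `can_skip_all_or_nothing`, and `can_skip_per_conjunct` are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-file stats: column name -> (min, max).
# A column that is absent from the dict has no stats for that file.
Stats = Dict[str, Tuple[int, int]]

# One piece of an AND predicate: the column whose stats it needs, plus a
# check answering "could this file contain a matching row?" from (min, max).
Conjunct = Tuple[str, Callable[[int, int], bool]]


def less_than(column: str, bound: int) -> Conjunct:
    # "column < bound" can match only if the file's minimum is below the bound.
    return (column, lambda lo, hi: lo < bound)


def can_skip_all_or_nothing(stats: Stats, conjuncts: List[Conjunct]) -> bool:
    # Behavior described as current: if any conjunct lacks stats, keep the file.
    if any(col not in stats for col, _ in conjuncts):
        return False
    return any(not might_match(*stats[col]) for col, might_match in conjuncts)


def can_skip_per_conjunct(stats: Stats, conjuncts: List[Conjunct]) -> bool:
    # Behavior described in this PR: one conjunct that has its own stats and
    # cannot match is enough to prune, even if other conjuncts lack stats.
    return any(
        col in stats and not might_match(*stats[col])
        for col, might_match in conjuncts
    )


predicate = [less_than("a", 0), less_than("b", 100)]  # a < 0 AND b < 100
only_a = {"a": (5, 10)}  # stats on column a only

print(can_skip_all_or_nothing(only_a, predicate))  # False
print(can_skip_per_conjunct(only_a, predicate))    # True
```

Pruning on a single conjunct is safe for AND because a row must satisfy every conjunct to match; a conjunct without stats can never be the reason to prune, so a file is still kept when no conjunct with stats rules it out.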

How was this patch tested?

Added a test covering the partial-stats case: we write one file with stats on columns (a, b), then a second file with stats only on column (a), and check that we can still prune on the predicate a < 0 AND b < 100.
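The scenario can be modeled in a few lines. The min/max values below are assumed for illustration (the description does not state the data written), and `prunable` is a hypothetical helper, not a function from the Delta codebase.

```python
# Assumed stats for the two files in the scenario: column -> (min, max).
files = {
    "file1": {"a": (1, 10), "b": (1, 10)},  # stats on both columns
    "file2": {"a": (1, 10)},                # stats on column a only
}

# Upper bounds taken from the predicate a < 0 AND b < 100.
bounds = {"a": 0, "b": 100}


def prunable(stats):
    # "col < bound" cannot match when the file's minimum is >= bound.
    # A column without stats is never used as a reason to prune.
    return any(
        col in stats and stats[col][0] >= bound
        for col, bound in bounds.items()
    )


scanned = [name for name, stats in files.items() if not prunable(stats)]
print(scanned)  # []
```

With these assumed values, both files are pruned by the a < 0 piece alone, including the second file, which has no stats on column b.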

Does this PR introduce any user-facing changes?

No

@BenMagyar BenMagyar force-pushed the bmagyar/predicate-stats branch 2 times, most recently from 590aaab to 55b6941 Compare May 9, 2026 15:39
@BenMagyar BenMagyar force-pushed the bmagyar/predicate-stats branch from 55b6941 to f5f60d4 Compare May 9, 2026 15:39

Development

Successfully merging this pull request may close these issues.

[BUG][Spark] Data skipping is too conservative for AND predicates