[Spark] Allow file filtering with partial stats over AND predicates #6756
Open
BenMagyar wants to merge 1 commit into
Which Delta project/connector is this regarding?
Description
Fixes #6755. For data skipping on `AND` predicates, a file can be pruned if either side of the predicate shows that the file cannot match and that side alone has the required stats. Currently, both sides of the predicate must have stats, regardless of which side rules the file out.
This change validates the stats for each side of the predicate separately, so a file can still be pruned when the side that cannot match has stats and the other side does not.
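To illustrate the pruning rule described above, here is a minimal sketch in Python. It is a toy model, not the Delta implementation: the `LessThan` and `And` classes, the `(min, max)` stats dictionary, and both function names are hypothetical, and nulls are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

# Hypothetical per-file stats: column name -> (min, max).
# A column missing from the dict means the file has no stats for it.
Stats = Dict[str, Tuple[int, int]]


@dataclass(frozen=True)
class LessThan:
    column: str
    value: int


@dataclass(frozen=True)
class And:
    left: object
    right: object


def referenced_columns(pred) -> Set[str]:
    """Collect every column a predicate refers to."""
    if isinstance(pred, LessThan):
        return {pred.column}
    if isinstance(pred, And):
        return referenced_columns(pred.left) | referenced_columns(pred.right)
    raise TypeError(f"unsupported predicate: {pred!r}")


def can_skip_per_side(pred, stats: Stats) -> bool:
    """Return True only if the stats prove that no row in the file matches."""
    if isinstance(pred, LessThan):
        bounds = stats.get(pred.column)
        if bounds is None:
            return False  # no stats for this column: keep the file
        min_value, _max_value = bounds
        return min_value >= pred.value  # every row has column >= value
    if isinstance(pred, And):
        # A conjunction cannot match if either side cannot match, so each
        # side is checked against its own stats independently.
        return can_skip_per_side(pred.left, stats) or can_skip_per_side(pred.right, stats)
    raise TypeError(f"unsupported predicate: {pred!r}")


def can_skip_all_stats_required(pred, stats: Stats) -> bool:
    """Stricter rule: every referenced column must have stats before pruning."""
    if not referenced_columns(pred) <= stats.keys():
        return False
    return can_skip_per_side(pred, stats)


predicate = And(LessThan("a", 0), LessThan("b", 100))
full_stats = {"a": (5, 10), "b": (0, 50)}  # stats on (a, b)
partial_stats = {"a": (5, 10)}             # stats on (a) only

print(can_skip_per_side(predicate, full_stats))
print(can_skip_per_side(predicate, partial_stats))
print(can_skip_all_stats_required(predicate, partial_stats))
```

In this model, `a >= 5` in both files, so `a < 0` alone rules them out. The per-side rule prunes the file that only has stats on `a`, while the stricter rule keeps it because `b` has no stats.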
How was this patch tested?
Added a test to cover the partial stats case: we write one file with stats across columns `(a, b)` and another file with stats only on column `(a)`, then check that we are still able to prune over the predicate `a < 0 AND b < 100`.
Does this PR introduce any user-facing changes?
No