feat: Support IEEE 754 negative zero semantics#22835
Conversation
|
run benchmark tpch tpcds |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing array_nan (1a564ec) to 883c38e (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing array_nan (1a564ec) to 883c38e (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
run benchmark tpch tpcds |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing array_nan (beb8750) to 883c38e (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing array_nan (beb8750) to 883c38e (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
mbutrovich
left a comment
There was a problem hiding this comment.
Thanks for tackling this, @comphead!
One thing I wanted to ask about: optimizing for the common case where an array has no -0.0.
normalize_float_zero allocates unconditionally
PrimitiveArray::unary always allocates a fresh values buffer of length n, even when there's no -0.0 to fix — which is the overwhelmingly common case. This now sits on some hot paths:
apply_cmp— every float=/</>/<=/>=/IS [NOT] DISTINCTcomparison, on both operands.eq_dyn_null— float hash-join key verification.GroupValuesRows::intern— a deep copy of each float column on every batch.
Would it be worth gating the copy behind a SIMD-reducible pre-scan, so we only allocate when a -0.0 is actually present?
// f64 case; sign-bit-only pattern == -0.0
let needs = arr.values().iter().any(|v| v.to_bits() == 0x8000_0000_0000_0000);
if !needs { return Arc::clone(array); }.any() over a contiguous slice should vectorize to a read-only OR-reduction, so the common case drops from "allocate + write n" to just a scan, and we pay the unary cost only when there's something to normalize. The per-native-value canonicalize path in the primitive group-by builders already avoids allocation entirely — this would bring the array path closer to that.
Small question on coverage
The fix covers hash joins via eq_dyn_null + create_hashes, but JoinKeyComparator builds via make_comparator (arrow totalOrder), which isn't normalized. Is sort-merge join reachable for float equi-keys, and if so is it covered? The current tests look like they exercise the hash-join plan only.
Either way, this is a solid fix — mostly just wondering whether we can keep the no--0.0 path allocation-free.
| /// (PostgreSQL / IEEE 754 equality) require them to compare equal, so | ||
| /// callers normalize before invoking those kernels. | ||
| /// | ||
| /// The common case — no `-0.0` present — is allocation-free: a single |
There was a problem hiding this comment.
Parenthetical in em dashes is awkward.
mbutrovich
left a comment
There was a problem hiding this comment.
Minor nit, thanks @comphead.
Which issue does this PR close?
Rationale for this change
SQL (per PG and IEEE 754) treats
+0.0and-0.0as equal in=,IS DISTINCT FROM, DISTINCT, GROUP BY, UNION/INTERSECT/EXCEPT, equi-joins, andarray_*set ops. DataFusion treated them asdistinct because:
cmp::eq/gt/ltuse IEEE 754 totalOrder for floats —arrow-ord-58.3.0/src/cmp.rs:71-75explicitly says "please normalize zeros before calling this kernel".RowConverterrow-encodes floats with totalOrder; ±0 produce different bytes.to_bits()/to_ne_bytes(), so ±0 hashed to different buckets.What changes are included in this PR?
Helper in
datafusion/common/src/utils/mod.rs:normalize_float_zero(&ArrayRef) -> ArrayRef— rewrites-0.0 → +0.0for Float16/32/64 viaPrimitiveArray::unary;Arc::clonefor non-float. NaN payloads preserved (bits << 1 == 0matches only±0).
normalize_float_zero_scalar(ScalarValue) -> ScalarValue— symmetric for scalars.Applied at six boundary sites where DataFusion hands float data to Arrow:
physical-expr-common/src/datum.rs::apply_cmp=,<,>,IS DISTINCT FROMphysical-plan/src/joins/utils.rs::eq_dyn_nullphysical-plan/src/aggregates/group_values/row.rs::internfunctions-nested/src/set_ops.rs::general_array_distinctarray_distinctfunctions-nested/src/set_ops.rs::generic_set_listsarray_union,array_intersectfunctions-nested/src/except.rs::general_exceptarray_exceptFloat hash macros normalize for consistency:
datafusion/common/src/hash_utils.rs::hash_float_value!—create_hashes, hash joins, shuffle.datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive.rs::hash_float!— single-column primitive GROUP BY fast path.Single-column primitive GROUP BY / DISTINCT (
GroupValuesPrimitive::intern,PrimitiveGroupValueBuilder::{append_val,vectorized_append,equal_to,vectorized_equal_to_*}) canonicalize the input viaa new default-identity
canonicalizemethod on the localHashValuetrait (float override only). Trait visibility lifted topubso the multi-column file can use it.