[GH-3027] Consolidate Box predicates into ST_Intersects / ST_Contains#3028
[GH-3027] Consolidate Box predicates into ST_Intersects / ST_Contains#3028jiayuasu wants to merge 1 commit into
Conversation
85ea310 to
0a183ad
Compare
There was a problem hiding this comment.
Pull request overview
This PR consolidates the Box2D/Box3D SQL predicate surface by folding the standalone box predicates into type-dispatched overloads of the existing ST_Intersects and ST_Contains across Sedona’s Spark SQL integration (plus corresponding updates in tests and pushdown/join planning), and updates the Python/Flink surfaces accordingly.
Changes:
- Add Box2D/Box3D branches to Spark
ST_Intersects/ST_Contains(via inferred dispatch) and remove the standaloneST_Box*/ST_3DBox*expressions and catalog registrations. - Update Spark planner + GeoParquet pushdown logic to detect Box2D variants by operand
dataTyperather than predicate class name. - Update Scala/Python/Flink tests and DataFrame helpers to use the unified
ST_Intersects/ST_Containsnames.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| spark/common/src/test/scala/org/apache/sedona/sql/predicateTestScala.scala | Updates predicate SQL tests to use ST_Intersects/ST_Contains for Box2D. |
| spark/common/src/test/scala/org/apache/sedona/sql/GeoParquetSpatialFilterPushDownSuite.scala | Updates GeoParquet pushdown tests to match unified predicate names. |
| spark/common/src/test/scala/org/apache/sedona/sql/dataFrameAPITestScala.scala | Updates DataFrame API tests to use unified predicates for Box3D. |
| spark/common/src/test/scala/org/apache/sedona/sql/Box3DIntersectsContainsSuite.scala | Renames suite and updates SQL to use unified predicates for Box3D. |
| spark/common/src/test/scala/org/apache/sedona/sql/Box2DJoinSuite.scala | Updates join planning tests to use ST_Intersects/ST_Contains on Box2D. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/TraitJoinQueryBase.scala | Adjusts join strategy commentary to reflect unified predicate names. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/OptimizableJoinCondition.scala | Removes standalone box predicate classes from join-condition whitelist. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/JoinQueryDetector.scala | Adds Box2D-pair detection for unified predicates and removes old box-specific matches. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/optimization/SpatialFilterPushDownForGeoParquet.scala | Rewrites pushdown matching to detect Box2D literals under unified predicates. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/st_predicates.scala | Removes Scala DataFrame API helpers for deleted standalone box predicates. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Predicates.scala | Extends ST_Intersects/ST_Contains inferred dispatch to Box2D/Box3D and removes old expressions. |
| spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetSpatialFilter.scala | Updates pushdown filter documentation strings to unified predicate names. |
| spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala | Drops registrations for removed standalone box predicate expressions. |
| python/tests/sql/test_predicate.py | Updates SQL-level Python tests to use unified predicate names for Box2D. |
| python/tests/sql/test_dataframe_api.py | Updates Python DataFrame API tests to use unified predicates for Box3D. |
| python/sedona/spark/sql/st_predicates.py | Removes Python helpers for deleted standalone box predicates; expands docs for unified ones. |
| flink/src/test/java/org/apache/sedona/flink/PredicateTest.java | Updates Flink SQL tests to use unified predicate names for Box2D. |
| flink/src/main/java/org/apache/sedona/flink/expressions/Predicates.java | Renames/remaps Flink predicate UDFs to unified names and adds Box2D overloads. |
| flink/src/main/java/org/apache/sedona/flink/Catalog.java | Removes registrations for deleted Flink box predicate UDFs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Geometry geom1 = (Geometry) o1; | ||
| Geometry geom2 = (Geometry) o2; | ||
| return org.apache.sedona.common.Predicates.intersects(geom1, geom2); |
There was a problem hiding this comment.
Fixed in latest push: added if (o1 == null || o2 == null) return null; to the Geometry-typed eval so it matches the Box2D overload below.
| Geometry geom1 = (Geometry) o1; | ||
| Geometry geom2 = (Geometry) o2; | ||
| return org.apache.sedona.common.Predicates.intersects(geom1, geom2); | ||
| return org.apache.sedona.common.Predicates.contains(geom1, geom2); |
There was a problem hiding this comment.
Fixed in latest push: same null guard added to the Geometry-typed ST_Contains eval.
| @@ -59,13 +62,14 @@ public Boolean eval( | |||
| bridgedTo = Box2D.class) | |||
| Box2D b) { | |||
| if (a == null || b == null) return null; | |||
| return org.apache.sedona.common.Predicates.boxContains(a, b); | |||
| return org.apache.sedona.common.Predicates.boxIntersects(a, b); | |||
| } | |||
There was a problem hiding this comment.
Correct. The Flink module has no Box3DTypeSerializer and no Box3D scalar surface yet; only the Box2D overload is added here. PR description corrected to scope this PR as Box2D-on-Flink only. Box3D in Flink is tracked under the Box3D EPIC (#2973) as a separate follow-up.
| - Geometry / Geometry — topological containment via JTS. | ||
| - Box2D / Box2D — closed-interval bbox containment on both axes (PostGIS ``~`` on box2d). | ||
| - Box3D / Box3D — closed-interval bbox containment on all three axes. |
There was a problem hiding this comment.
Fixed in latest push: added the Geography / Geography branch to the ST_Contains docstring.
| - Geometry / Geometry — topological intersection via JTS. | ||
| - Box2D / Box2D — closed-interval bbox intersection on both axes (PostGIS ``&&`` on box2d). | ||
| - Box3D / Box3D — closed-interval bbox intersection on all three axes (PostGIS ``&&&``). |
There was a problem hiding this comment.
Fixed in latest push: added the Geography / Geography branch to the ST_Intersects docstring.
…ntains Folds the four standalone box predicates into type-dispatched overloads on the existing geometry predicates, matching the precedent set by ST_DWithin(Box2D, Box2D, double). - ST_Intersects gains (Box2D, Box2D) and (Box3D, Box3D) inferrable branches in addition to the existing Geometry and Geography ones. - ST_Contains gains the same two branches. - Removed ST_BoxIntersects, ST_BoxContains, ST_3DBoxIntersects, and ST_3DBoxContains entirely from the Spark expression catalog, Scala DataFrame helpers, Python DataFrame helpers, and the Flink catalog + ScalarFunction module. Planner / pushdown sites that previously pattern-matched on the standalone class names now key on child dataType instead: - JoinQueryDetector matches Box2D-typed `ST_Intersects` / `ST_Contains` first via a new `isBox2DPair` helper and routes them through the bbox R-tree path (Box3D inputs fall through to row-by-row, matching the previous coverage). - OptimizableJoinCondition drops the explicit Box class names — the unified `ST_Intersects` / `ST_Contains` are already on the whitelist and the dataType check is delegated to JoinQueryDetector. - SpatialFilterPushDownForGeoParquet matches `Box2DUDT`-typed literals on `ST_Intersects` / `ST_Contains` ahead of the Geometry-typed cases so the Box2D row-group filter still fires. Tests rewritten to call the unified names (Spark, Flink, Python). The Java `Predicates.boxIntersects` / `boxContains` / `box3dIntersects` / `box3dContains` implementation methods are unchanged; only the SQL / Catalyst layer is consolidated.
0a183ad to
0162f32
Compare
Did you read the Contributor Guide?
Is this PR related to a ticket?
What changes were proposed in this PR?
Folds the four standalone box predicates into type-dispatched overloads on the existing geometry predicates, matching the precedent set by
ST_DWithin(Box2D, Box2D, double).Spark SQL surface:
ST_Intersectsgains(Box2D, Box2D)and(Box3D, Box3D)inferrable branches in addition to the existing Geometry and Geography ones.ST_Containsgains the same two branches.ST_BoxIntersects,ST_BoxContains,ST_3DBoxIntersects,ST_3DBoxContainsentirely. Clean cut — the surface is young (Box3D versions landed in [GH-3012] Box3D predicates: ST_3DBoxIntersects and ST_3DBoxContains #3014) and carrying deprecated aliases doubles the surface area for no real callers.Python: same overloads exposed via the existing
ST_Intersects/ST_Containshelpers (Python dispatch is JVM-side, so no code change beyond docstrings and the removal of the four standalone helpers).Flink: the existing
ST_Intersects/ST_ContainsScalarFunctions gain a(Box2D, Box2D)evaloverload. Box3D is not wired into Flink — there's noBox3DTypeSerializeryet and no Box3D scalar surface in the Flink module. Adding Box3D to Flink is tracked as a separate follow-up under the Box3D EPIC (#2973). Also added the null guards Copilot flagged on the geometry-typedevalpaths so the Box2D and Geometry overloads behave consistently onNULL.Planner / pushdown sites that previously pattern-matched on the standalone class names now key on child dataType:
JoinQueryDetectormatches Box2D-typedST_Intersects/ST_Containsfirst via a newisBox2DPairhelper and routes them through the bbox R-tree path. A second guard,hasBox3DInput, short-circuits Box3D-typed inputs back to row-by-row evaluation (the join executors only know Geometry / Geography / Box2D — treating a Box3D shape as a Geometry binary failed at runtime before this guard).OptimizableJoinConditiondrops the explicit Box class names — the unifiedST_Intersects/ST_Containsare already on the whitelist and the dataType discrimination is delegated toJoinQueryDetector.SpatialFilterPushDownForGeoParquetmatchesBox2DUDT-typed literals onST_Intersects/ST_Containsahead of the Geometry-typed cases so the Box2D row-group filter still fires. Box3DUDT-typed literals are explicitly skipped (returnNone) so they don't fall through intoGeometryUDT.deserializeand fail planning — Box3D filter pushdown is a separate follow-up.Underlying implementation methods (
Predicates.boxIntersects/boxContains/box3dIntersects/box3dContains) are unchanged. Only the SQL / Catalyst layer is consolidated.Not folded (intentional):
ST_3DDWithinstays separate fromST_DWithin— the 3D distance metric is a real semantic difference, not just a different input type. PostGIS keeps them separate.ST_Extent/ST_3DExtent, constructors (ST_Box2D,ST_Box3D,ST_MakeBox2D,ST_3DMakeBox), andST_GeomFromBox2Dstay — different return types, PostGIS canon.Docs:
Predicates/ST_Intersects.mdandPredicates/ST_Contains.mddescribe the new Box2D / Box3D / Geography polymorphism; the standalonebox2d/Box2D-Predicates/ST_BoxIntersects.md/ST_BoxContains.mdpages are deleted;box2d/Box2D-Functions.mdandOptimizer.mdrewritten to use the unified names.How was this patch tested?
predicateTestScala,Box3DIntersectsContainsSuite,Box2DJoinSuite,GeoParquetSpatialFilterPushDownSuite,dataFrameAPITestScala) all pass locally — 69+ tests, 0 failed. Includes a new regression test verifying that a Box3D-on-Box3DST_Intersectsjoin falls back to row-by-row evaluation without the prior cast error.PredicateTest— 18/18 pass.test_predicate.py+test_dataframe_api.py— the intersect/contains/box-related cases pass; the 5 pre-existing flakes (Buffer / SubDivide / dbscan / lof) are unchanged.Did this PR include necessary documentation updates?