Skip to content

[GH-3027] Consolidate Box predicates into ST_Intersects / ST_Contains#3028

Draft
jiayuasu wants to merge 1 commit into
apache:masterfrom
jiayuasu:feature/consolidate-box-predicates
Draft

[GH-3027] Consolidate Box predicates into ST_Intersects / ST_Contains#3028
jiayuasu wants to merge 1 commit into
apache:masterfrom
jiayuasu:feature/consolidate-box-predicates

Conversation

@jiayuasu
Copy link
Copy Markdown
Member

@jiayuasu jiayuasu commented Jun 3, 2026

Did you read the Contributor Guide?

  • Yes

Is this PR related to a ticket?

What changes were proposed in this PR?

Folds the four standalone box predicates into type-dispatched overloads on the existing geometry predicates, matching the precedent set by ST_DWithin(Box2D, Box2D, double).

Spark SQL surface:

  • ST_Intersects gains (Box2D, Box2D) and (Box3D, Box3D) inferrable branches in addition to the existing Geometry and Geography ones.
  • ST_Contains gains the same two branches.
  • Removed ST_BoxIntersects, ST_BoxContains, ST_3DBoxIntersects, ST_3DBoxContains entirely. Clean cut — the surface is young (Box3D versions landed in [GH-3012] Box3D predicates: ST_3DBoxIntersects and ST_3DBoxContains #3014) and carrying deprecated aliases doubles the surface area for no real callers.

Python: same overloads exposed via the existing ST_Intersects / ST_Contains helpers (Python dispatch is JVM-side, so no code change beyond docstrings and the removal of the four standalone helpers).

Flink: the existing ST_Intersects / ST_Contains ScalarFunctions gain a (Box2D, Box2D) eval overload. Box3D is not wired into Flink — there's no Box3DTypeSerializer yet and no Box3D scalar surface in the Flink module. Adding Box3D to Flink is tracked as a separate follow-up under the Box3D EPIC (#2973). Also added the null guards Copilot flagged on the geometry-typed eval paths so the Box2D and Geometry overloads behave consistently on NULL.

Planner / pushdown sites that previously pattern-matched on the standalone class names now key on child dataType:

  • JoinQueryDetector matches Box2D-typed ST_Intersects / ST_Contains first via a new isBox2DPair helper and routes them through the bbox R-tree path. A second guard, hasBox3DInput, short-circuits Box3D-typed inputs back to row-by-row evaluation (the join executors only know Geometry / Geography / Box2D — treating a Box3D shape as a Geometry binary failed at runtime before this guard).
  • OptimizableJoinCondition drops the explicit Box class names — the unified ST_Intersects / ST_Contains are already on the whitelist and the dataType discrimination is delegated to JoinQueryDetector.
  • SpatialFilterPushDownForGeoParquet matches Box2DUDT-typed literals on ST_Intersects / ST_Contains ahead of the Geometry-typed cases so the Box2D row-group filter still fires. Box3DUDT-typed literals are explicitly skipped (return None) so they don't fall through into GeometryUDT.deserialize and fail planning — Box3D filter pushdown is a separate follow-up.

Underlying implementation methods (Predicates.boxIntersects / boxContains / box3dIntersects / box3dContains) are unchanged. Only the SQL / Catalyst layer is consolidated.

Not folded (intentional):

  • ST_3DDWithin stays separate from ST_DWithin — the 3D distance metric is a real semantic difference, not just a different input type. PostGIS keeps them separate.
  • ST_Extent / ST_3DExtent, constructors (ST_Box2D, ST_Box3D, ST_MakeBox2D, ST_3DMakeBox), and ST_GeomFromBox2D stay — different return types, PostGIS canon.

Docs: Predicates/ST_Intersects.md and Predicates/ST_Contains.md describe the new Box2D / Box3D / Geography polymorphism; the standalone box2d/Box2D-Predicates/ST_BoxIntersects.md / ST_BoxContains.md pages are deleted; box2d/Box2D-Functions.md and Optimizer.md rewritten to use the unified names.

How was this patch tested?

  • Touched Scala suites (predicateTestScala, Box3DIntersectsContainsSuite, Box2DJoinSuite, GeoParquetSpatialFilterPushDownSuite, dataFrameAPITestScala) all pass locally — 69+ tests, 0 failed. Includes a new regression test verifying that a Box3D-on-Box3D ST_Intersects join falls back to row-by-row evaluation without the prior cast error.
  • Flink PredicateTest — 18/18 pass.
  • Python test_predicate.py + test_dataframe_api.py — the intersect/contains/box-related cases pass; the 5 pre-existing flakes (Buffer / SubDivide / dbscan / lof) are unchanged.

Did this PR include necessary documentation updates?

  • Yes — see the Docs bullet above.

@jiayuasu jiayuasu requested a review from Copilot June 3, 2026 06:27
@jiayuasu jiayuasu force-pushed the feature/consolidate-box-predicates branch from 85ea310 to 0a183ad Compare June 3, 2026 06:30
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR consolidates the Box2D/Box3D SQL predicate surface by folding the standalone box predicates into type-dispatched overloads of the existing ST_Intersects and ST_Contains across Sedona’s Spark SQL integration (plus corresponding updates in tests and pushdown/join planning), and updates the Python/Flink surfaces accordingly.

Changes:

  • Add Box2D/Box3D branches to Spark ST_Intersects / ST_Contains (via inferred dispatch) and remove the standalone ST_Box* / ST_3DBox* expressions and catalog registrations.
  • Update Spark planner + GeoParquet pushdown logic to detect Box2D variants by operand dataType rather than predicate class name.
  • Update Scala/Python/Flink tests and DataFrame helpers to use the unified ST_Intersects / ST_Contains names.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
spark/common/src/test/scala/org/apache/sedona/sql/predicateTestScala.scala Updates predicate SQL tests to use ST_Intersects/ST_Contains for Box2D.
spark/common/src/test/scala/org/apache/sedona/sql/GeoParquetSpatialFilterPushDownSuite.scala Updates GeoParquet pushdown tests to match unified predicate names.
spark/common/src/test/scala/org/apache/sedona/sql/dataFrameAPITestScala.scala Updates DataFrame API tests to use unified predicates for Box3D.
spark/common/src/test/scala/org/apache/sedona/sql/Box3DIntersectsContainsSuite.scala Renames suite and updates SQL to use unified predicates for Box3D.
spark/common/src/test/scala/org/apache/sedona/sql/Box2DJoinSuite.scala Updates join planning tests to use ST_Intersects/ST_Contains on Box2D.
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/TraitJoinQueryBase.scala Adjusts join strategy commentary to reflect unified predicate names.
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/OptimizableJoinCondition.scala Removes standalone box predicate classes from join-condition whitelist.
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/JoinQueryDetector.scala Adds Box2D-pair detection for unified predicates and removes old box-specific matches.
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/optimization/SpatialFilterPushDownForGeoParquet.scala Rewrites pushdown matching to detect Box2D literals under unified predicates.
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/st_predicates.scala Removes Scala DataFrame API helpers for deleted standalone box predicates.
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Predicates.scala Extends ST_Intersects/ST_Contains inferred dispatch to Box2D/Box3D and removes old expressions.
spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetSpatialFilter.scala Updates pushdown filter documentation strings to unified predicate names.
spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala Drops registrations for removed standalone box predicate expressions.
python/tests/sql/test_predicate.py Updates SQL-level Python tests to use unified predicate names for Box2D.
python/tests/sql/test_dataframe_api.py Updates Python DataFrame API tests to use unified predicates for Box3D.
python/sedona/spark/sql/st_predicates.py Removes Python helpers for deleted standalone box predicates; expands docs for unified ones.
flink/src/test/java/org/apache/sedona/flink/PredicateTest.java Updates Flink SQL tests to use unified predicate names for Box2D.
flink/src/main/java/org/apache/sedona/flink/expressions/Predicates.java Renames/remaps Flink predicate UDFs to unified names and adds Box2D overloads.
flink/src/main/java/org/apache/sedona/flink/Catalog.java Removes registrations for deleted Flink box predicate UDFs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +46 to +48
Geometry geom1 = (Geometry) o1;
Geometry geom2 = (Geometry) o2;
return org.apache.sedona.common.Predicates.intersects(geom1, geom2);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest push: added if (o1 == null || o2 == null) return null; to the Geometry-typed eval so it matches the Box2D overload below.

Comment on lines 86 to +88
Geometry geom1 = (Geometry) o1;
Geometry geom2 = (Geometry) o2;
return org.apache.sedona.common.Predicates.intersects(geom1, geom2);
return org.apache.sedona.common.Predicates.contains(geom1, geom2);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest push: same null guard added to the Geometry-typed ST_Contains eval.

Comment on lines 51 to 66
@@ -59,13 +62,14 @@ public Boolean eval(
bridgedTo = Box2D.class)
Box2D b) {
if (a == null || b == null) return null;
return org.apache.sedona.common.Predicates.boxContains(a, b);
return org.apache.sedona.common.Predicates.boxIntersects(a, b);
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. The Flink module has no Box3DTypeSerializer and no Box3D scalar surface yet; only the Box2D overload is added here. PR description corrected to scope this PR as Box2D-on-Flink only. Box3D in Flink is tracked under the Box3D EPIC (#2973) as a separate follow-up.

Comment on lines +37 to +39
- Geometry / Geometry — topological containment via JTS.
- Box2D / Box2D — closed-interval bbox containment on both axes (PostGIS ``~`` on box2d).
- Box3D / Box3D — closed-interval bbox containment on all three axes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest push: added the Geography / Geography branch to the ST_Contains docstring.

Comment on lines +97 to +99
- Geometry / Geometry — topological intersection via JTS.
- Box2D / Box2D — closed-interval bbox intersection on both axes (PostGIS ``&&`` on box2d).
- Box3D / Box3D — closed-interval bbox intersection on all three axes (PostGIS ``&&&``).
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest push: added the Geography / Geography branch to the ST_Intersects docstring.

…ntains

Folds the four standalone box predicates into type-dispatched overloads
on the existing geometry predicates, matching the precedent set by
ST_DWithin(Box2D, Box2D, double).

- ST_Intersects gains (Box2D, Box2D) and (Box3D, Box3D) inferrable
  branches in addition to the existing Geometry and Geography ones.
- ST_Contains gains the same two branches.
- Removed ST_BoxIntersects, ST_BoxContains, ST_3DBoxIntersects, and
  ST_3DBoxContains entirely from the Spark expression catalog, Scala
  DataFrame helpers, Python DataFrame helpers, and the Flink catalog +
  ScalarFunction module.

Planner / pushdown sites that previously pattern-matched on the
standalone class names now key on child dataType instead:

- JoinQueryDetector matches Box2D-typed `ST_Intersects` / `ST_Contains`
  first via a new `isBox2DPair` helper and routes them through the
  bbox R-tree path (Box3D inputs fall through to row-by-row, matching
  the previous coverage).
- OptimizableJoinCondition drops the explicit Box class names — the
  unified `ST_Intersects` / `ST_Contains` are already on the whitelist
  and the dataType check is delegated to JoinQueryDetector.
- SpatialFilterPushDownForGeoParquet matches `Box2DUDT`-typed literals
  on `ST_Intersects` / `ST_Contains` ahead of the Geometry-typed cases
  so the Box2D row-group filter still fires.

Tests rewritten to call the unified names (Spark, Flink, Python). The
Java `Predicates.boxIntersects` / `boxContains` / `box3dIntersects` /
`box3dContains` implementation methods are unchanged; only the SQL /
Catalyst layer is consolidated.
@jiayuasu jiayuasu force-pushed the feature/consolidate-box-predicates branch from 0a183ad to 0162f32 Compare June 3, 2026 06:43
@jiayuasu jiayuasu marked this pull request as draft June 3, 2026 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants