Skip to content

feat(rust/sedona-raster-gdal): add RS_FromPath#831

Merged
Kontinuation merged 8 commits into
apache:mainfrom
Kontinuation:pr-c-rs-frompath
Jun 2, 2026
Merged

feat(rust/sedona-raster-gdal): add RS_FromPath#831
Kontinuation merged 8 commits into
apache:mainfrom
Kontinuation:pr-c-rs-frompath

Conversation

@Kontinuation
Copy link
Copy Markdown
Member

@Kontinuation Kontinuation commented May 11, 2026

Summary

  • add RS_FromPath for constructing out-db rasters from GDAL-openable paths
  • wire sedona-raster-gdal and SedonaContext minimally for this standalone child PR
  • keep other GDAL-backed raster functions out of scope for follow-up PRs

Testing

  • cargo test -p sedona-raster-gdal rs_from_path --no-default-features
  • cargo clippy -p sedona-raster-gdal --no-default-features -- -D warnings
  • cargo test -p sedona-raster-gdal -p sedona --lib
  • cargo clippy -p sedona-raster-gdal -p sedona --lib -- -D warnings

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new GDAL-backed raster UDF that constructs out-of-db rasters from GDAL-openable paths, and wires it into SedonaContext for availability.

Changes:

  • Introduced RS_FromPath-style UDF (rs_frompath) backed by GDAL, plus a Criterion benchmark.
  • Added out-db raster construction helper (append_as_outdb_raster) and centralized nodata (de)serialization helpers.
  • Registered GDAL UDFs during SedonaContext initialization and added SQL reference docs.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
rust/sedona/src/context.rs Registers all GDAL UDFs into SedonaContext during initialization.
rust/sedona-raster-gdal/src/utils.rs Adds out-db raster builder path, refactors nodata conversion usage, expands tests.
rust/sedona-raster-gdal/src/rs_frompath.rs Implements the path-based raster UDF and its unit tests.
rust/sedona-raster-gdal/src/lib.rs Exposes the new UDF and provides all_gdal_udfs().
rust/sedona-raster-gdal/src/gdal_dataset_provider.rs Refactors nodata setting to shared helper.
rust/sedona-raster-gdal/src/gdal_common.rs Adds shared helpers for reading/setting band nodata bytes.
rust/sedona-raster-gdal/benches/rs_frompath.rs Adds benchmarks for the new UDF using test fixtures.
rust/sedona-raster-gdal/Cargo.toml Adds deps needed for the UDF + registers the benchmark target.
docs/reference/sql/rs_frompath.qmd Adds SQL reference documentation for RS_FromPath.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rust/sedona-raster-gdal/src/rs_frompath.rs Outdated
Comment thread rust/sedona-raster-gdal/src/utils.rs
Comment thread rust/sedona/src/context.rs Outdated
Comment thread rust/sedona-raster-gdal/src/rs_frompath.rs Outdated
Comment thread rust/sedona-raster-gdal/src/gdal_common.rs
@Kontinuation Kontinuation marked this pull request as ready for review May 13, 2026 02:53
@jiayuasu
Copy link
Copy Markdown
Member

@james-willis please review too

Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this!

Comment on lines +68 to +77
let len = path_array.len();
let mut builder = RasterBuilder::new(len);
for i in 0..len {
if path_array.is_null(i) {
builder.append_null()?;
} else {
let path = path_array.value(i);
append_as_outdb_raster(gdal, path, &mut builder)?;
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, although our iteration elsewhere is generally for item_opt in path_array { ... }

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to for path_opt in path_array

Comment on lines +80 to +87

match &args[0] {
ColumnarValue::Scalar(_) => {
let scalar = datafusion_common::ScalarValue::try_from_array(&result, 0)?;
Ok(ColumnarValue::Scalar(scalar))
}
ColumnarValue::Array(_) => Ok(ColumnarValue::Array(result)),
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
match &args[0] {
ColumnarValue::Scalar(_) => {
let scalar = datafusion_common::ScalarValue::try_from_array(&result, 0)?;
Ok(ColumnarValue::Scalar(scalar))
}
ColumnarValue::Array(_) => Ok(ColumnarValue::Array(result)),
}
executor.finish(result)

Comment on lines +64 to +65

let paths = args[0].cast_to(&DataType::Utf8, None)?.into_array(1)?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let paths = args[0].cast_to(&DataType::Utf8, None)?.into_array(1)?;
let executor = WkbBytesExecutor::new(arg_types, args);
let paths = args[0].cast_to(&DataType::Utf8, None)?.into_array(executor.num_iterations())?;

The rest of this loop suggests this can handle both scalars and columns but I think as written this will fail for anything except a scalar or one row table

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to use WkbBytesExecutor. It looks a bit strange: we use the executor for purposes other than iterating over the input data, but it still make the code shorter.

Comment on lines +116 to +122
fn assert_raster_dimensions(
result: &ColumnarValue,
expected_len: usize,
width: u64,
height: u64,
) {
match result {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not your problem for this PR, but we really need support for rasters in the ScalarUdfTester so that these tests can be more compact.

In the meantime this could be moved to sedona_testing::rasters which has a few asserters along these lines.

Comment on lines +107 to +117
/// Append a raster source path as a single out-db raster to the provided [`RasterBuilder`].
pub fn append_as_outdb_raster(gdal: &Gdal, path: &str, builder: &mut RasterBuilder) -> Result<()> {
let gdal_path = normalize_outdb_source_path(path);
let dataset = gdal
.open_ex_with_options(
&gdal_path,
DatasetOptions {
open_flags: GDAL_OF_RASTER | GDAL_OF_READONLY,
..Default::default()
},
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing this does blocking IO. Should we make any attempt to mitigate this or implement RS_FromPath as an async UDF to get the auto batch size adjustment from DataFusion for these types of functions?

Comment thread rust/sedona-raster-gdal/Cargo.toml Outdated
result_large_err = "allow"

[dependencies]
arrow = { workspace = true }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a targeted subcrate for whatever uses this import?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depend on arrow-array, arrow-buffer and arrow-schema.

Comment thread rust/sedona/src/context.rs Outdated
);

for udf in sedona_raster_gdal::all_gdal_udfs() {
out.ctx.register_udf(udf.into());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use the same pattern as the other crates that provide scalar functions? (e.g., out.register_scalar_kernels(sedona_raster_gdal::register::scalar_kernels().into_iter())?;). I suppose if we make it an async UDF we'll need the pattern you have here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we have sedona_raster_gdal::register::default_function_set, so this loop was replaced by out.register_function_set(sedona_raster_gdal::register::default_function_set());.

Copy link
Copy Markdown
Member

@zhangfengcdt zhangfengcdt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to align the registration the same way as other UDFs via scalar_kernels() from a register module inside sedona-raster-gdal as @paleolimbot already points out, otherwise, LTGM!

@Kontinuation
Copy link
Copy Markdown
Member Author

I tried to implement RS_FromPath as a DataFusion async UDF but was blocked by a DataFusion bug: apache/datafusion#22662. I have submitted a patch for fixing this: apache/datafusion#22663.

I'd like to leave RS_FromPath as synchronous UDF for now until the fix got accepted and released by the upstream. The async version of RS_FromPath can be found in https://github.com/Kontinuation/sedona-db/tree/pr-c-rs-frompath-async. We extended SedonaScalarUDF to support async function and made RS_FromPath async by dispatching GDAL calls to tokio worker threads.

Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to implement RS_FromPath as a DataFusion async UDF but was blocked by a DataFusion bug:

Thank you for trying and for fixing that!

I'd like to leave RS_FromPath as synchronous UDF for now until the fix got accepted and released by the upstream.

Sounds good to me. If an async context is truly helpful we can store an Arc<Runtime> in the UDF and instantiate/register from the Cli/Python where we have access to it, but no need to do that here.

@paleolimbot
Copy link
Copy Markdown
Member

A note that this has some overlap with #886 but I also think it's good to have this merge and we can draw the best from both approaches as we go.

@Kontinuation Kontinuation merged commit badd49f into apache:main Jun 2, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants