Lazy Raster Loading Support for Zarr and GDAL#886
Conversation
Foundational module for async OutDb byte loading. Defines: - `AsyncByteLoader` trait — async fn that returns `arrow_buffer::Buffer`, enabling zero-copy hand-off to the downstream BinaryViewArray construction in RS_EnsureLoaded (one storage→column copy instead of three: storage→loader-internal→Vec<u8>→BinaryViewBuilder-block→column). - `OutDbLoadRequest` — borrowed request struct carrying the URI, dim_names, source_shape, and pixel type for one band's worth of bytes. - `OutDbLoaderRegistry` — format-keyed map from `outdb_format` strings to `Arc<dyn AsyncByteLoader>`. One instance per SedonaContext; the context wraps it in `Arc<RwLock<…>>` so plugin crates can register post-construction via a public SedonaContext API. No callers yet — the registry/trait stand alone. Subsequent commits add the SedonaScalarUDF needs_bytes annotation, the RS_EnsureLoaded async UDF that consumes the registry, the analyzer rule that injects the wrap, and the GDAL + Zarr backend impls. 7 unit tests cover registration, lookup, dispatch, overwrite semantics, and the format-iteration helper used for diagnostic messages. See docs/research/async-outdb-design.md for the full design rationale.
Class-level annotation declaring whether a UDF's kernels read raster pixel bytes. The analyzer rule (later commit) reads this to decide whether to wrap raster arguments with RS_EnsureLoaded. Shape: - New `needs_bytes: bool` field on SedonaScalarUDF, default false. - `with_needs_bytes()` builder method flips it to true at construction. - `needs_bytes()` accessor for the analyzer to query post-downcast. Class-level by convention: each UDF is constructed once per session, so the instance field reflects a fixed property of the underlying kernels. The field doesn't live on SedonaScalarKernel because kernels within a single UDF don't realistically disagree on byte-needing-ness — they differ by type signature, not by data-access pattern. If a future use case demands per-kernel granularity, that's the trigger to push the flag down to the kernel trait; not before. No current RS_* UDFs read pixel bytes (audit returns empty across sedona-raster-functions, sedona-raster-gdal, sedona). The annotation mechanism is staged for future byte-reading kernels (e.g., RS_Value, RS_Sum) which will call `.with_needs_bytes()` at construction. See docs/research/async-outdb-design.md §2 for the design rationale.
Materialises OutDb raster bands at query time via the format-keyed AsyncByteLoader registry from sedona-raster. The UDF instance closes over an Arc<RwLock<OutDbLoaderRegistry>> owned by SedonaContext, so loader registrations from session bootstrap (compiled-in defaults like GDAL under a Cargo feature, later commit) and from plugin entry points (out-of-tree crates calling SedonaContext::register_outdb_loader, later commits) are immediately visible at dispatch time without session-extension lookups — AsyncScalarUDFImpl::invoke_async_with_args in DataFusion 52.5 only receives Arc<ConfigOptions>, so the closure-bound registry is the right runtime path. Implementation: - `RsEnsureLoaded` implements `AsyncScalarUDFImpl`. For each input row, walks bands, passes through InDb bytes verbatim, dispatches OutDb bytes via the registered loader for the band's `outdb_format`, validates the loader-returned `Buffer` length against `Π source_shape × data_type.byte_size()`, and assembles the output raster StructArray via `RasterBuilder`. Sequential await for the first cut; parallel `buffer_unordered` fan-out is a follow-up that doesn't change the trait surface. - `SedonaContext` gains a private `outdb_registry: Arc<RwLock<...>>` field initialised empty at `new_from_context`. The `RsEnsureLoaded` UDF is registered there alongside other session UDFs, with a clone of the same Arc. - Public `SedonaContext::register_outdb_loader(format, loader)` is the single insertion hook used by both compiled-in defaults and plugins. `registered_outdb_formats()` exposes the keys for diagnostics. - `OutDbLoaderRegistry` gains a manual `Debug` impl listing format keys (the loader values themselves are opaque trait objects). - `SedonaScalarUDF` Eq/Hash/Debug derive lessons applied: manual impls by identity-by-name, since the registry field doesn't participate in UDF identity and isn't itself Eq/Hash. 5 unit tests cover the resolver end-to-end with a mock loader: InDb passthrough, OutDb byte materialisation, missing-format diagnostic with registered-formats listing, undersized-loader-output detection, null raster row preservation. One context-level test asserts the registry plumbing (empty initial state, post-register listing, UDF visible in the scalar function registry by name).
Pre-optimizer rewrite that makes OutDb byte materialisation explicit in the logical plan. For every `Expr::ScalarFunction` whose UDF downcasts to a `SedonaScalarUDF` annotated `with_needs_bytes()`, each raster-typed argument is wrapped in `RS_EnsureLoaded(arg)`. The wrap references the same `Arc<ScalarUDF>` as the one `SedonaContext::new_from_context` registers under the UDF registry, so both injected and explicitly-written calls dispatch through the same async resolver. Recursion guard: `RS_EnsureLoaded` itself is left alone, so `SELECT ... FROM RS_EnsureLoaded(rast)` doesn't recurse into `RS_EnsureLoaded(RS_EnsureLoaded(rast))`. Detection of "raster-typed" arguments uses `Expr::to_field(schema)` — not `get_type()` — so the Field's extension-type metadata is available; `SedonaType::Raster` is identified by the `"sedona.raster"` extension type, not by raw `DataType::Struct`. Two adjustments to Commit 3's RS_EnsureLoaded UDF that pair with the analyzer rule: - Signature volatility flipped from `Volatile` to `Stable`. DataFusion's `CommonSubexprEliminate` pass skips volatile expressions (`is_volatile_node()` returns true iff signature is exactly `Volatility::Volatile`), so the previous Volatile classification would have prevented dedup of injected wraps. Semantically Stable is the honest call: within a query, byte materialisation is deterministic for fixed inputs; across queries the underlying storage may change. - Signature shape changed from `user_defined` to `any(1, ...)`. The `AsyncScalarUDF` adapter doesn't delegate `coerce_types` to the inner impl, so `Signature::user_defined` (which requires `coerce_types`) produced a planning-time "function does not implement coerce_types" error. `any(1, ...)` lets DataFusion accept whatever single-arg type the caller passes; we validate "argument is Struct" in `return_type`. The CSE-dedup test (load-bearing) constructs the post-analyzer plan shape directly (two `needs_bytes` UDFs calling the same raster column, both with their args wrapped) and applies DataFusion's `CommonSubexprEliminate::rewrite` to it. Before CSE: two RS_EnsureLoaded(rast) calls in the plan tree. After CSE: exactly one. This validates the volatility-Stable choice and proves we lean on DataFusion's existing optimiser for dedup rather than re-implementing it in our rule. 8 tests cover: - `expr_is_raster` detection on raster + non-raster columns - analyzer rule wraps raster args of needs_bytes UDFs - leaves non-raster args alone - leaves metadata-only (un-annotated) UDFs alone - recursion guard against rewrapping RS_EnsureLoaded - nested calls are rewritten inside-out via transform_up - CSE dedup of injected wraps (2 → 1) Registered from `SedonaContext::new_from_context` via `ctx.state_ref().write().add_analyzer_rule(...)`. The rule runs in DataFusion's analyzer phase, before any optimizer rules including PushDownFilter, PushDownLimit, and CommonSubexprEliminate. See docs/research/async-outdb-design.md for the broader design.
Materialises OutDb GeoTIFF (and other GDAL-supported) bands when the RS_EnsureLoaded UDF dispatches against the `"gdal"` registry key. Shape: - `GdalLoader` is a stateless `AsyncByteLoader` impl. The thread-local `GDALDatasetCache` from `gdal_dataset_provider` does the per-thread dataset memoisation; the loader itself owns no state, so instances are interchangeable and cheap to clone. - Inside `load()`, GDAL's blocking API is wrapped in `tokio::task::spawn_blocking` so the calling tokio worker stays responsive. The dataset cache uses `Rc` (not Send/Sync) and is thread-local, so each blocking-pool worker maintains its own cache — concurrent loads via `buffer_unordered` build per-worker caches without contention. - `read_as_bytes` returns a `Vec<u8>`; `Buffer::from_vec` wraps it zero-copy for the return path, matching the design's one-storage-to-column-copy goal end-to-end on the GDAL backend. Request validation (synchronous, before spawning): - Source shape must be 2-D and `dim_names == ["y", "x"]`. Higher-rank or transposed OutDb reads need MDArray support and explicit axis mapping, tracked separately. - File's GDAL pixel type must match the band metadata's `data_type` claim. Caught here with a clear error before reading, otherwise a size mismatch would mis-blame the loader rather than naming the dtype problem. Bootstrap registration: `SedonaContext::new_from_context` calls `out.register_outdb_loader(GDAL_FORMAT, Arc::new(GdalLoader::new()))`. GDAL itself is dlopen'd lazily by `sedona-gdal` (workspace-default `gdal-sys = false`), so registration is safe on systems without libgdal — the first OutDb GeoTIFF query surfaces a clean "libgdal not found" error from `sedona-gdal::global` instead of a panic, and non-OutDb queries are unaffected. Module bookkeeping: the existing `source_uri` module (`parse_outdb_source`) was previously `#[cfg(test)]`-only; it's needed in production now, so the cfg gate is removed. `get_or_create_outdb_source` on `GDALDatasetCache` is promoted from private to `pub(crate)` so the loader can reach it. 7 unit tests cover: 2x3 UInt8 round-trip, default-band-1 when fragment absent, request-validation rejections (non-2D, non-yx dim_names, dtype mismatch), and propagation of GDAL-layer errors (missing file, out-of-range band index). Tests use real GDAL via the existing `with_gdal` helper and the in-crate driver pattern from `gdal_dataset_provider.rs`. Updated context-level test: bootstrap registry contains `"gdal"` instead of being empty; plugin registrations are stacked on top.
Materialises OutDb Zarr-backed bands when the RS_EnsureLoaded UDF
dispatches against the `"zarr"` registry key. Pairs with the Zarr
reader's existing chunk-anchor URI scheme
(`<store_uri>#array=<array_path>&chunk=<i0>,<i1>,...`): the loader
parses the anchor, opens the Zarr store + array, retrieves the named
chunk via `zarrs`, and returns the bytes as an `arrow_buffer::Buffer`.
Shape:
- `ZarrLoader` is a stateless `AsyncByteLoader` impl. Per-call store
opens; per-(store_uri, array_path) caching is a follow-up
optimisation that doesn't change the trait surface.
- Inside `load()`, the sync `zarrs` API runs inside
`tokio::task::spawn_blocking` so the calling tokio worker stays
responsive. Resulting `Buffer` is Send, crosses the await cleanly.
- Validates the Zarr array's dtype against the request's
`data_type` claim before reading. Mismatches surface here with a
clear diagnostic rather than as a downstream byte-count surprise.
Delivery model: as an out-of-tree plugin, `sedona-raster-zarr` does
NOT depend on `sedona` — callers register the loader from their own
context-setup code:
ctx.register_outdb_loader(
sedona_raster_zarr::ZARR_FORMAT,
Arc::new(sedona_raster_zarr::ZarrLoader::new()),
);
This is the asymmetry from Commit 5 (GDAL): GDAL is upstream of
sedona and registered automatically; Zarr is a plugin and explicit.
The Python wheel `sedonadb-zarr` will surface a Python-level
`register(con)` shim that wraps this call in a follow-up.
Module bookkeeping: `ChunkAnchor`, `parse_chunk_anchor`, and the
`retrieve_chunk_bytes` chunk-read primitive in `loader.rs` were
previously `#[cfg(test)]`-only — they're production-needed now, so
the cfg gates are removed. `retrieve_chunk_bytes` is now `pub(crate)`
since `outdb_loader` is its only consumer outside tests.
5 unit tests cover: UInt8 chunk round-trip (write via `ArrayBuilder`,
read via the loader), dtype mismatch with clear diagnostic, malformed
chunk-anchor URI rejection (missing fragment), missing-array-path
propagation from `zarrs`, and cloud-scheme rejection (until cloud
stores are supported). Tests build a real Zarr v3 group via
`GroupBuilder` + `ArrayBuilder` against a temp filesystem.
4ab8a0d to
d7fce30
Compare
Three small adjustments from review:
- Drop the "Class-level by convention…" paragraph from
`SedonaScalarUDF::with_needs_bytes()`'s doc comment. The remaining
doc already names the purpose; the extra prose was prescriptive
without adding information.
- Replace `unreachable!("matched above")` in
`ensure_loaded_analyzer::rewrite_expr_node` with a
`sedona_internal_datafusion_err!`-bearing let-else. Same
intent-preserved-as-leading-comment, but the failure mode on a
future refactor that breaks the invariant is a clean query-level
internal error instead of a process-level panic — matches the
defensive style used everywhere else in this PR.
- Fix codespell: "Implementors" → "Implementers" in the
`AsyncByteLoader` trait doc.
paleolimbot
left a comment
There was a problem hiding this comment.
I didn't get through the analyzer bits today but thought I'd leave the loader part and I'll take a look at the analyzer tomorrow. In general this looks great!
| /// Raw source shape in `dim_names` order. The loader returns a | ||
| /// `Buffer` whose length equals `Π source_shape × data_type.byte_size()` | ||
| /// bytes, encoding pixels in C-order over `dim_names`. | ||
| pub source_shape: &'a [u64], |
There was a problem hiding this comment.
Shouldn't this see the view as well? In some cases the IO will be the same (but we can discard a lot of pixels right away); in other cases we can avoid IO because the retile will map to a small subset of a contiguous range. (Apologies if I misunderstood what source_shape is here)
There was a problem hiding this comment.
I'm not sure. I havent thought about designing the system to allow "pushing" views into loaders. This PR does not support that and I'm not sure when we would be able to take advantage of such a thing.
Generally a cog tile or zarr chunk its retrieved atomically so we can't benefit from "pushing" the view in these cases. so I dont think theres a way to reduce io with such support.
Im not sure when you want to eagerly slice/crop vs when you want to perform that lazily.
Do you think crops on indb rasters should be eager in general? if so we could probably solve this with operator fusion + a rule in the crop operators impl that always eager crops the indb raster
There was a problem hiding this comment.
Yes, I think that if a loader is asked to load a 3x3 chunk of a 1e6 by 1e6 raster, the loader should return 9 pixels. It would be up to the caller to choose whether retiling or cropping is appropriate (and up to the loader to decide whether or not to cache 1e6 by 1e6 pixels in anticipation of reading another slice of the same chunk).
It's OK not to implement crop pushdown but if the information isn't there it can never happen.
There was a problem hiding this comment.
I created #897 to track this.
I'm still not really sure this kind of feature belongs in the loader and I feel it adds a large amount of complexity to each loader implementation, even if it doesnt support view-pruning.
I think we can do this later if we decide to do it; adding fields and tweaking/shimming existing identity based implementations is a 2 way door.
Addresses two of Dewey's review comments on the GDAL loader: the non-cancellable spawn_blocking concern and the absence of a size sanity check. - New `MAX_OUTDB_LOAD_BYTES` (4 GiB) constant. Requests whose `Π source_shape × byte_size` exceeds this are rejected before the blocking task is spawned — runaway / corrupt-metadata loads can't tie up a blocking-pool thread. - `CancelOnDrop` guard flips an `Arc<AtomicBool>` when the outer async future is dropped (e.g. query-cancel). The blocking task observes the flag between block-height-aligned strip reads and returns a cancellation error rather than running to completion. - Read loop replaced: single `band.read_as_bytes` → pre-allocated output `Vec<u8>` + `band.read_into_bytes` per strip of `band.block_size().1` rows. Each strip writes into a contiguous output slice (no per-block row-splice math). Cancellation granule = natural block height: 1–64 rows for strip GeoTIFFs, 256 rows for tile GeoTIFFs (COG-style). For whole-image-block formats (PNG / JPEG) the byte cap is the primary safety net. - Helper `read_band_blockwise(band, output, w, h, byte_size, cancel)` extracted as the testable unit. Direct tests for pre-armed cancel and multi-strip byte-correctness; full-loader integration tests for byte-cap rejection. 10 unit tests, up from 7. All existing tests pass unchanged.
Addresses Dewey's review feedback that the registry belongs in the session's `ConfigOptions` extensions, mirroring how `CrsProvider` is hosted inside `SedonaOptions`. The motivation: any UDF can reach `ConfigOptions` at dispatch time via `AsyncScalarUDFImpl::invoke_async_with_args`, whereas the previous closure-bound `Arc` was reachable only by the one UDF that closed over it. ConfigOptions is the DataFusion-idiomatic placement. Shape: - New `OutDbLoaderConfig` `ConfigExtension` in `sedona-raster` that holds a `OutDbLoaderRegistryOption(Arc<RwLock<OutDbLoaderRegistry>>)` field. Lives in `sedona-raster` (not `sedona-common`) because the registry type is here and a cross-crate move would create a circular dep. Has its own `sedona.outdb_loader` PREFIX namespace alongside the existing `sedona` `SedonaOptions`. - `SedonaContext::new_from_context` builds the shared `Arc<RwLock<OutDbLoaderRegistry>>` once, stashes a clone inside ConfigOptions via `extensions.insert(OutDbLoaderConfig::from_handle(...))`, and keeps another clone in the existing `SedonaContext.outdb_registry` field. Both handles share the same `RwLock`, so writes through `register_outdb_loader(&self, ...)` are immediately visible to UDF reads. - `RsEnsureLoaded` no longer holds a registry field. It pulls the handle out of `args.config_options.extensions.get::<OutDbLoaderConfig>()` at the top of every `invoke_async_with_args` call. The struct is now session-agnostic — one instance is correct for any session that has the extension registered. Tests updated to call `RsEnsureLoaded::new()` with no args. - `ensure_loaded(...)` test helper signature simplified to `(input, lookup_closure)`; the redundant registry arg is gone. Mirrors the precedent established by `CrsProviderOption`/`SedonaOptions` in `sedona-common`. If a future caller bypasses `SedonaContext::new()` and builds their own session without the extension, RS_EnsureLoaded returns a clear internal error naming `OutDbLoaderConfig` so the fix is obvious. All 109 sedona-crate tests + 72 sedona-raster tests pass unchanged.
paleolimbot
left a comment
There was a problem hiding this comment.
A few notes, but the details here are all great!
| /// Raw source shape in `dim_names` order. The loader returns a | ||
| /// `Buffer` whose length equals `Π source_shape × data_type.byte_size()` | ||
| /// bytes, encoding pixels in C-order over `dim_names`. | ||
| pub source_shape: &'a [u64], |
There was a problem hiding this comment.
Yes, I think that if a loader is asked to load a 3x3 chunk of a 1e6 by 1e6 raster, the loader should return 9 pixels. It would be up to the caller to choose whether retiling or cropping is appropriate (and up to the loader to decide whether or not to cache 1e6 by 1e6 pixels in anticipation of reading another slice of the same chunk).
It's OK not to implement crop pushdown but if the information isn't there it can never happen.
…bug on AsyncByteLoader Addresses review comments on the OutDb lazy-loading PR: * RS_EnsureLoaded now implements `return_field_from_args` returning the input field verbatim, so the `"sedona.raster"` extension metadata survives. The previous `return_type` returned a bare `DataType::Struct`, which dropped the extension — downstream code (including the analyzer's own raster detection) would stop recognising the wrapped column as a Raster. `return_type` now hard-errors since it is no longer the authoritative output-type path. Covered by two regression tests. * `AsyncByteLoader` gains a `Debug` supertrait so loader-holding structs can derive `Debug`; test mocks updated. * Mark the RS_EnsureLoaded InDb passthrough copy with a pointer to the tracking issue for the RasterBuilder zero-copy work (GH apache#894).
…r rule Addresses review comments on the OutDb lazy-loading PR: * Convert the `EnsureLoadedAnalyzerRule` (analyzer rule in `sedona`) to `EnsureLoadedOptimizerRule` (logical optimizer rule in `sedona-query-planner`, alongside the spatial-join rules). Registered immediately before DataFusion's `common_sub_expression_eliminate` so CSE dedupes the injected `RS_EnsureLoaded(col)` wraps in the same pass. * Resolve `RS_EnsureLoaded` from `OptimizerConfig::function_registry()` at rewrite time instead of capturing an `Arc<ScalarUDF>` at construction. `SedonaContext` no longer threads the UDF into a rule — it registers the UDF (before any query plans) and pushes the rule. * Move `RS_ENSURE_LOADED_NAME` to `sedona-expr` so both the UDF impl (`sedona`) and the rule (`sedona-query-planner`) can reference it without a dependency cycle. * Idempotency: optimizer rules run to a fixpoint, and since the output field now preserves the raster extension type, a naive re-pass would double-wrap. Guard against re-wrapping an arg already wrapped in `RS_EnsureLoaded`. * Handle binary/multi-input nodes: resolve expression types against the merged schema of all inputs (via `DFSchema::join`) so a join filter like `RS_Intersects(a.rast, b.rast)` gets its raster args wrapped. Unresolvable/ambiguous merges skip the node rather than failing. Tests cover wrap/no-wrap/metadata-only, recursion + idempotency guards, join-merged-schema resolution, and the before-CSE registration order.
…eralize UDF metadata Addresses review comments on the OutDb lazy-loading PR: * Move the `RsEnsureLoaded` async UDF from `sedona` to `sedona-raster-functions`, alongside the other `RS_*` functions. It only reaches down into `sedona-raster`/`sedona-expr`/`sedona-schema`, so no dependency cycle; `sedona` already depends on `sedona-raster-functions` and imports it from there. * Replace `SedonaScalarUDF`'s `needs_bytes: bool` with a generic `metadata: HashMap<String, String>` (`with_metadata`/`metadata()`), with `with_needs_bytes`/`needs_bytes` kept as convenience over a well-known `NEEDS_PIXELS_METADATA_KEY`. A string map future-proofs the planner-visible UDF metadata surface without a new field per flag. * Carry that metadata across the sedona-extension FFI: a new `metadata` callback on `SedonaCScalarKernel` (mirroring `function_name`) returns the map as a JSON object string; `ExportedScalarKernel::with_metadata` serialises it and `ImportedScalarKernel::metadata()` parses it back, so a plugin-defined function can declare planner flags across the cdylib boundary. Header updated to match. Round-trip test added.
Breadcrumbs at the two places the non-identity-view gap lives: - `OutDbLoadRequest` doc — the request carries no view and `load` returns a bare `Buffer`; the view-in/realized-out contract is apache#897. - `RasterRefImpl::band()`'s non-identity-view rejection — note that it is the guardrail keeping `RS_EnsureLoaded` (which drops `view()` on rebuild) correct, and that lifting it requires the apache#897 round-trip.
Now that AsyncByteLoader carries a Debug supertrait, Arc<dyn AsyncByteLoader> is Debug and the registry can derive it — replacing the hand-written impl that existed only to work around the loaders not being Debug (and whose "loaders don't carry a meaningful Debug surface" comment is now stale). Completes Dewey's review comment: add the supertrait AND derive here.
…e declaration references The logical optimizer rule's source file was left untracked when its `pub mod ensure_loaded;` declaration and call sites landed, so the pushed branch had a dangling module — local builds passed (file present but untracked) while CI rustfmt failed to resolve the module.
… field Adding a `metadata` callback to `SedonaCScalarKernel` broke s2geography init: that struct's definition is duplicated in a vendored copy inside the `sedona-s2geography` submodule (s2geography/src/s2geography/sedona_udf/sedona_extension.h), and `S2GeogInitKernels` lays out the struct from that copy. Inserting the field changed the struct's size and field offsets on the Rust side only, so the C++ and Rust disagreed and init failed with code 22 (every test that builds a SedonaContext panicked). Revert just the FFI plumbing — the `metadata` struct field + header callback, `ExportedScalarKernel::with_metadata`, `ImportedScalarKernel` metadata parsing, the round-trip test, and the serde_json dep. The in-process `SedonaScalarUDF` metadata map (the substantive bool→map generalization) is unaffected and stays. Carrying UDF metadata across the kernel FFI requires updating the submodule's vendored header and bumping the submodule in lockstep; deferred to a follow-up that does it properly (and appends, never inserts mid-struct).
paleolimbot
left a comment
There was a problem hiding this comment.
All just details from me!
The one hiccup is that the AsyncUDFs might be dropping metadata (see comment), in which case you'll have to make a dummy scalar function struct InsertMetadataUdf { field: FieldRef, inner: ScalarUDF } that wraps RS_EnsureLoaded with an overridden return_field. 😮💨
| /// Mark this UDF as one whose kernels read raster pixel bytes from | ||
| /// their inputs — convenience for setting [`NEEDS_PIXELS_METADATA_KEY`] | ||
| /// to `"true"`. The `RS_EnsureLoaded` optimizer rule reads this to | ||
| /// decide whether to wrap raster arguments with the async | ||
| /// byte-materialisation UDF. | ||
| pub fn with_needs_bytes(self) -> SedonaScalarUDF { | ||
| self.with_metadata(NEEDS_PIXELS_METADATA_KEY, "true") | ||
| } | ||
|
|
||
| /// Returns whether this UDF reads raster pixel bytes from its inputs | ||
| /// (i.e. carries [`NEEDS_PIXELS_METADATA_KEY`] = `"true"`). | ||
| pub fn needs_bytes(&self) -> bool { | ||
| self.metadata | ||
| .get(NEEDS_PIXELS_METADATA_KEY) | ||
| .map(String::as_str) | ||
| == Some("true") |
There was a problem hiding this comment.
I don't love having raster function (or any geo-specific) stuff live here. with_needs_bytes is sufficiently compact that you can probably just inline it to the functions that need it. The named constant can probably live in sedona-raster-functions and you can duplicate it if that creates weird cross-crate interactions?
| /// Register an async OutDb byte loader under a `format` key. | ||
| /// | ||
| /// `format` matches the band-level `outdb_format` column value | ||
| /// (`"gdal"`, `"zarr"`, …). At query time the `RS_EnsureLoaded` UDF |
There was a problem hiding this comment.
Should this format be more like "tif", "zarr", etc.? As in, should the byte loader get registered by file format or backend? (I had assumed we'd register by file format). You could also have the trait have a pub fn supported_formats() -> Vec<&str> and just register_outdb_loader(loader).
There was a problem hiding this comment.
I don't have a strong opinion here. part of the reason I used "gdal" is because I don't really know what formats:
- What formats GDAL supports
- What formats GDAL will support
- what formats we [will] support with GDAL
calling it GDAL leaves it such that users can just try to load random things with GDAL and it might even work.
There was a problem hiding this comment.
There's a list of drivers, but I do think you want to separate the production of raster metadata from the resolving of metadata bytes (stac is the best reason for this but there could be other metadata catalogs that make sense). You could use a Vec<AsyncByteLoader> instead of hash map and call byte_loader.supports(format), where the gdal loader could return true based on some runtime call.
| /// and out-of-tree plugins (`sedona-raster-zarr::register(&ctx)` | ||
| /// from user code after construction). Later registrations under the | ||
| /// same `format` key overwrite earlier ones. | ||
| pub fn register_outdb_loader( |
There was a problem hiding this comment.
nit: Should this be an array_loader or raster_loader? OutDb is a bit of a historical name.
| // GDAL is always registered at bootstrap (compiled-in backend). | ||
| // Plugin backends like Zarr add themselves later via | ||
| // `register_outdb_loader`. |
There was a problem hiding this comment.
nit: I know you've been using the word "plugin" but "extension" or "registered at runtime" is closer to existing language (e.g., sedona-extension).
| fn return_field_from_args(&self, args: ReturnFieldArgs) -> Result<FieldRef> { | ||
| // Identity on schema: the output raster has the same fields as the | ||
| // input — only the `data` column's bytes change. Return the input | ||
| // field verbatim so its `"sedona.raster"` extension metadata |
There was a problem hiding this comment.
In #831 Kristin found that Async UDFs drop metadata (at the physical expression). The optimizer rule that emplaces this may have to add it back via a dummy udf (or, simpler, we don't strictly need SedonaType::Raster to emit an extension name...a normal struct is fine).
| #[async_trait] | ||
| impl AsyncScalarUDFImpl for RsEnsureLoaded { |
There was a problem hiding this comment.
Should we also suggest a batch size here? (I think that's something you can implement on the async udf)
| /// Re-exported from `sedona-expr` (its canonical home — see | ||
| /// [`sedona_expr::scalar_udf::RS_ENSURE_LOADED_NAME`]) so existing | ||
| /// `crate::rs_ensure_loaded::RS_ENSURE_LOADED_NAME` references keep | ||
| /// resolving. | ||
| pub use sedona_expr::scalar_udf::RS_ENSURE_LOADED_NAME; |
There was a problem hiding this comment.
I think you can just use "rs_ensureloaded" instead of a named constant
| // Expressions on a node are evaluated against the combined schema | ||
| // of its inputs: single-input nodes (Projection, Filter, …) use | ||
| // their one input; a Join's `filter` references left ⋈ right, so | ||
| // a predicate like `RS_Intersects(a.rast, b.rast)` resolves only | ||
| // against the merged schema. Leaf nodes carry no wrappable | ||
| // expressions. | ||
| let inputs = plan.inputs(); | ||
| if inputs.is_empty() { | ||
| return Ok(Transformed::no(plan)); | ||
| } | ||
| let Some(schema) = merged_input_schema(&inputs) else { | ||
| // Schemas couldn't be merged (e.g. ambiguous duplicate | ||
| // qualifiers in a self-join). Skip this node rather than | ||
| // failing the query — a missed wrap surfaces later as a clear | ||
| // "raster bytes not loaded" error, not a wrong result. | ||
| return Ok(Transformed::no(plan)); | ||
| }; | ||
| drop(inputs); | ||
|
|
||
| plan.map_expressions(|e| { | ||
| e.transform_up(|expr| rewrite_expr_node(expr, &schema, &ensure_loaded_udf)) | ||
| }) |
There was a problem hiding this comment.
Does this work?
| // Expressions on a node are evaluated against the combined schema | |
| // of its inputs: single-input nodes (Projection, Filter, …) use | |
| // their one input; a Join's `filter` references left ⋈ right, so | |
| // a predicate like `RS_Intersects(a.rast, b.rast)` resolves only | |
| // against the merged schema. Leaf nodes carry no wrappable | |
| // expressions. | |
| let inputs = plan.inputs(); | |
| if inputs.is_empty() { | |
| return Ok(Transformed::no(plan)); | |
| } | |
| let Some(schema) = merged_input_schema(&inputs) else { | |
| // Schemas couldn't be merged (e.g. ambiguous duplicate | |
| // qualifiers in a self-join). Skip this node rather than | |
| // failing the query — a missed wrap surfaces later as a clear | |
| // "raster bytes not loaded" error, not a wrong result. | |
| return Ok(Transformed::no(plan)); | |
| }; | |
| drop(inputs); | |
| plan.map_expressions(|e| { | |
| e.transform_up(|expr| rewrite_expr_node(expr, &schema, &ensure_loaded_udf)) | |
| }) | |
| plan.map_expressions(|e| { | |
| e.transform_up(|expr| rewrite_expr_node(expr, plan.schema(), &ensure_loaded_udf)) | |
| }) |
| // so a future refactor that breaks the invariant fails the query | ||
| // cleanly instead of crashing a worker. | ||
| let Expr::ScalarFunction(ScalarFunction { func, args }) = expr else { | ||
| return Err(sedona_internal_datafusion_err!( |
There was a problem hiding this comment.
| return Err(sedona_internal_datafusion_err!( | |
| return sedona_internal_err!( |
# Conflicts: # Cargo.lock # rust/sedona-raster-gdal/Cargo.toml # rust/sedona-raster-gdal/src/lib.rs
Summary
This PR adds support for OutDB raster loading via the
RS_EnsureLoadedUDF. When a function that needs the bytes of a raster available is called, an optimizer rule injects RS_EnsureLoaded into the expression tree. This _should also leverage Common Subexpression Elimination from Datafusion to avoid repeated loads.This PR supercedes #849. This is the new approach based on the review comments I received from Dewey on that PR.
Copy reduction
Returning
arrow_buffer::Buffer(notVec<u8>orBytes) and constructing the outputBinaryViewArraydirectly from collectedBuffers — bypassingBinaryViewBuilder's internal block copy — gives one copy total (storage → loader-internalBuffer) on the byte path, down from three with a naiveVec<u8>interface.Zarr Registration
I punted on this but we will need it in the near future. The standard datafusion FFI doesnt help us here because these OutDBLoaders are not part of the datafusion interface. the Agent presented two solutions:
A. Build a C-ABI wrapper for AsyncByteLoader following the datafusion-ffi pattern (#[repr(C)] struct, extern "C" function pointers, abi_stable types). ~200-400 LoC of FFI scaffolding + safety review. Future-proof but heavyweight, and we'd own the ABI compatibility story going forward.
B. PyCapsule with raw Arc transit. Non-standard but works when both wheels link the same sedona-raster version. PyArrow has analogous patterns (PyCapsule carrying ArrowSchema* etc., where the C struct happens to be standardized — ours wouldn't be). Document the version-coupling constraint.
Merge coordination (with #888)
#888 migrates
sedona-raster-zarrto all-async and establishes theopen_storage_from_uri(uri, store)store-resolution contract. Whichever of #886/#888 lands second must reconcile:Arc<dyn ObjectStore>(from the sessionObjectStoreRegistry) intoOutDbLoadRequestso the Zarr loader reads through the host's credentialed store instead of re-deriving creds from env (Dewey review comment onOutDbLoadRequest).AsyncByteLoaderonto the all-async storage path and drop the syncFilesystemStore/spawn_blockingshim in the zarr loader.