Merged
29 commits
bfd435d
feat: use fast division on Nvidia GPUs in newton_div context
Mikolaj-A-Kowalski Jan 9, 2026
a2ee0a5
fix: do not specify the type of numerator
Mikolaj-A-Kowalski Jan 15, 2026
0930500
refactor: select type of newton_div in a WENO scheme by type parameter
Mikolaj-A-Kowalski Jan 30, 2026
29bfd5d
fix: newton_div type propagation into buffer schemes
Mikolaj-A-Kowalski Feb 2, 2026
d5fe7b8
test: update doctests
Mikolaj-A-Kowalski Jan 30, 2026
2ab9cf5
feat: add CUDA fast division for f32
Mikolaj-A-Kowalski Jan 30, 2026
bbc4642
refactor: remove lower-precision WENO type parameter
Mikolaj-A-Kowalski Feb 5, 2026
ecf4e8e
refactor: use `weight_computation` to refer to division type in WENO
Mikolaj-A-Kowalski Feb 27, 2026
3cd0d41
refactor: make `newton_div` type names less verbose
Mikolaj-A-Kowalski Feb 27, 2026
e351c16
test: add unit tests for newton_div
Mikolaj-A-Kowalski Jan 15, 2026
6a65274
refactor: BackendOptimizedDivision in ConvertingDivision on CPU
Mikolaj-A-Kowalski Mar 4, 2026
c0c0ed2
refactor: use BackendOptimizedDivision by default
Mikolaj-A-Kowalski Mar 4, 2026
506fadf
refactor: use normal division on the CPU
Mikolaj-A-Kowalski Mar 4, 2026
e025c51
refactor: do not use CUDA intrinsics under Reactant
Mikolaj-A-Kowalski Mar 4, 2026
d6588f8
feat: add materialize_advection to defer configuration options
Mikolaj-A-Kowalski Mar 19, 2026
c4adca2
feat: use advection materialisation in models
Mikolaj-A-Kowalski Mar 19, 2026
3c8d143
refactor: get rid of the global weight_computation setting
Mikolaj-A-Kowalski Mar 20, 2026
f3a7456
fix: failing reactant tests
Mikolaj-A-Kowalski Mar 20, 2026
9fe1731
fix: add missing materialize_advection overloads
Mikolaj-A-Kowalski Mar 26, 2026
3ab7ec9
test: fix tests broken by changes to the API
Mikolaj-A-Kowalski Mar 26, 2026
4e76efa
fix: add missing overload for Distributed grid
Mikolaj-A-Kowalski Mar 26, 2026
db30a4c
test: add missing materialization step to test_forcing
Mikolaj-A-Kowalski Mar 27, 2026
fed538a
fix: extra end-of-file newlines
Mikolaj-A-Kowalski Mar 27, 2026
329632d
Merge branch 'main' into mak/fast-cuda-div
simone-silvestri Mar 27, 2026
d535820
Merge branch 'main' into mak/fast-cuda-div
simone-silvestri Mar 28, 2026
5368248
Merge branch 'main' into mak/fast-cuda-div
glwagner Mar 31, 2026
4f295ea
test: fix newton_div test
Mikolaj-A-Kowalski Apr 1, 2026
3c90552
Merge branch 'main' into mak/fast-cuda-div
simone-silvestri Apr 13, 2026
c656070
update minor version (numerical difference)
simone-silvestri Apr 15, 2026
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,6 +1,6 @@
name = "Oceananigans"
uuid = "9e8cae18-63c1-5223-a75c-80ca9d6e9a09"
version = "0.106.5"
version = "0.106.6"
authors = ["Climate Modeling Alliance and contributors"]

[deps]
27 changes: 27 additions & 0 deletions ext/OceananigansCUDAExt.jl
@@ -132,4 +132,31 @@ end
@inline UT.sync_device!(::CUDAGPU) = CUDA.synchronize()
@inline UT.sync_device!(::CUDABackend) = CUDA.synchronize()

# Use faster versions of `newton_div` on Nvidia GPUs
CUDA.@device_override UT.newton_div(::Type{UT.BackendOptimizedDivision}, a, b) = a * fast_inv_cuda(b)

function fast_inv_cuda(a::Float64)
    # Get an approximate reciprocal:
    # https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions-rcp-approx-ftz-f64
    # This instruction drops the last 32 bits of the mantissa and computes the
    # inverse, treating all subnormal inputs as 0.0.
    # If the reciprocal would be subnormal, it underflows to 0.0.
    # The 32 least significant bits of the result are filled with 0s.
    inv_a = ccall("llvm.nvvm.rcp.approx.ftz.d", llvmcall, Float64, (Float64,), a)
@wsmoses @avik-pal Reactant tests on CPU are understandably failing with https://buildkite.com/clima/oceananigans/builds/28492#019bc195-ae24-48aa-82c6-1e534be018b3/L709

LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.rcp.approx.ftz.d

Is there a way to tell Reactant to ignore this?

so reactant currently faithfully represents the full cuda kernel. There is no mechanism for determining if you're in reactant or not within a kernel -- we get the code out of cuda.jl. We could add a mechanism for doing so here: https://github.com/EnzymeAD/Reactant.jl/blob/730804beb548017e819b3589b9f6f793a4f19792/ext/ReactantCUDAExt.jl#L1332. We'd need to make a new gpucompiler job/config/state that looks identical to the cuda one, except with a second method overlay table that itself overlays the cuda overlay table

Or, alternatively, in the Reactant extension can we use Reactant.@reactant_overlay for these methods?

yes, but you have to overlay the whole kernel. reactant_overlay does not apply within a cuda kernel [my comment above was describing what would be required for something within a kernel to be overlaid]

Wouldn't it be sufficient to define overlaid methods for UT.newton_div (what's done here for CUDA) for Reactant? Wouldn't that override the overlay here?

cuda kernels are still compiled with normal cuda.jl. the reactant overlay method table only applies for generic code outside of kernels.

Ok ok, got it.

Can't Reactant/Polygeist/whatever raises this to MLIR just handle that intrinsic as a fast 1/x, so that backends can emit fast reciprocals if they want to? (I blame LLVM for having 50 rcp intrinsics that are all different.)


    # Recover the missing 32 bits of the mantissa with a single cubic iteration
    e = fma(inv_a, -a, 1.0)
    e = fma(e, e, e)
    inv_a = fma(e, inv_a, inv_a)
    return inv_a
end

function fast_inv_cuda(a::Float32)
    # This instruction computes the reciprocal while flushing subnormals to 0.0.
    # Hence, for subnormal inputs it returns Inf.
    # For large inputs whose reciprocal is subnormal, it underflows to 0.0.
    inv_a = ccall("llvm.nvvm.rcp.approx.ftz.f", llvmcall, Float32, (Float32,), a)
    return inv_a
end

end # module OceananigansCUDAExt
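The cubic refinement in `fast_inv_cuda` is plain FMA arithmetic, so its behaviour can be sketched and checked on the CPU. The sketch below is an illustrative model, not the PR's device code: `approx_inv` and `refined_inv` are hypothetical names, and a `Float32` reciprocal stands in for the truncated-mantissa `rcp.approx.ftz.d` seed.

```julia
# CPU model of the device-side refinement: seed with a low-precision
# reciprocal, then apply one cubic Newton iteration.
# Writing e = 1 - a*inv_a, the exact inverse is
#   1/a = inv_a / (1 - e) = inv_a * (1 + e + e^2 + ...),
# so inv_a * (1 + e + e^2) roughly triples the number of correct bits.
approx_inv(a::Float64) = Float64(inv(Float32(a)))  # stand-in for rcp.approx.ftz.d

function refined_inv(a::Float64)
    inv_a = approx_inv(a)
    e = fma(inv_a, -a, 1.0)      # e = 1 - a * inv_a
    e = fma(e, e, e)             # e = e + e^2
    return fma(e, inv_a, inv_a)  # inv_a * (1 + e + e^2)
end
```

Since the seed carries roughly 24 correct bits, one cubic step brings the result to full `Float64` precision (up to rounding), which mirrors why a single iteration suffices after the 20-plus correct bits of the PTX approximation.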
8 changes: 8 additions & 0 deletions ext/OceananigansReactantExt/Models.jl
@@ -17,6 +17,8 @@ using ..Grids: ShardedGrid, ShardedDistributed
import Oceananigans.Models:
complete_communication_and_compute_buffer!,
interior_tendency_kernel_parameters
import Oceananigans.Advection: default_weno_weight_computation


const ReactantHFSM{TS, E} = Union{
HydrostaticFreeSurfaceModel{TS, E, <:ReactantState},
@@ -60,4 +62,10 @@ maybe_prepare_first_time_step!(model::ReactantHFSM, callbacks) = nothing
complete_communication_and_compute_buffer!(model, ::ShardedGrid, ::ShardedDistributed) = nothing
interior_tendency_kernel_parameters(::ShardedDistributed, grid) = :xyz

# Reactant uses the CUDA version of the code to uplift the program description to MLIR.
# Since the default `weno_weight_computation` on CUDA uses LLVM's NVPTX intrinsics,
# it causes Reactant to crash.
# We need to fall back to a different optimization when running with Reactant.
default_weno_weight_computation(::ReactantState) = Oceananigans.Utils.ConvertingDivision{Float32}

end # module
1 change: 1 addition & 0 deletions ext/OceananigansReactantExt/OceananigansReactantExt.jl
@@ -335,6 +335,7 @@ end

Base.getindex(array::OffsetVector{T, <:Reactant.AbstractConcreteArray{T, 1}}, ::Colon) where T = array


# These are additional modules that may need to be Reactantified in the future:
#
# include("Utils.jl")
1 change: 1 addition & 0 deletions src/Advection/Advection.jl
@@ -84,5 +84,6 @@ include("tracer_advection_operators.jl")
include("bounds_preserving_tracer_advection_operators.jl")
include("cell_advection_timescale.jl")
include("adapt_advection_order.jl")
include("materialize_advection.jl")

end # module
64 changes: 64 additions & 0 deletions src/Advection/materialize_advection.jl
@@ -0,0 +1,64 @@
import Oceananigans: architecture

"""
materialize_advection(advection, grid)

Return a fully materialized advection scheme appropriate for `grid`.
It allows advection schemes to defer specializing settings until information
about the backend is available from the grid.

For example, per-backend defaults for the WENO weight computation can be set
this way.
"""
materialize_advection(advection, grid) = advection
materialize_advection(::Nothing, grid) = nothing
materialize_advection(advection::FluxFormAdvection, grid) = FluxFormAdvection(
materialize_advection(advection.x, grid),
materialize_advection(advection.y, grid),
materialize_advection(advection.z, grid),
)

# Upwinding treatments hold a cross_scheme that may contain deferred WENO weight computation
materialize_advection(u::OnlySelfUpwinding, grid) =
OnlySelfUpwinding(materialize_advection(u.cross_scheme, grid),
u.δU_stencil, u.δV_stencil, u.δu²_stencil, u.δv²_stencil)

materialize_advection(u::CrossAndSelfUpwinding, grid) =
CrossAndSelfUpwinding(materialize_advection(u.cross_scheme, grid),
u.divergence_stencil, u.δu²_stencil, u.δv²_stencil)

materialize_advection(u::VelocityUpwinding, grid) =
VelocityUpwinding(materialize_advection(u.cross_scheme, grid))

# VectorInvariant wraps multiple sub-schemes; recurse into each
materialize_advection(vi::VectorInvariant{N,FT,M}, grid) where {N,FT,M} =
VectorInvariant{N,FT,M}(
materialize_advection(vi.vorticity_scheme, grid),
vi.vorticity_stencil,
materialize_advection(vi.vertical_advection_scheme, grid),
materialize_advection(vi.kinetic_energy_gradient_scheme, grid),
materialize_advection(vi.divergence_scheme, grid),
materialize_advection(vi.upwinding, grid),
)


materialize_advection(weno::WENO{N,FT,WCT}, grid) where {N,FT,WCT} = WENO{N,FT,WCT}(
weno.bounds,
materialize_advection(weno.buffer_scheme, grid),
materialize_advection(weno.advecting_velocity_scheme, grid),
)

materialize_advection(weno::WENO{N,FT,Nothing}, grid) where {N,FT} =
WENO{N,FT,default_weno_weight_computation(architecture(grid))}(
weno.bounds,
materialize_advection(weno.buffer_scheme, grid),
materialize_advection(weno.advecting_velocity_scheme, grid),
)

materialize_advection(scheme::UpwindBiased{N,FT}, grid) where {N,FT} = UpwindBiased{N,FT}(
materialize_advection(scheme.buffer_scheme, grid),
materialize_advection(scheme.advecting_velocity_scheme, grid),
)

materialize_advection(scheme::Centered{N,FT}, grid) where {N,FT} =
Centered{N,FT}(materialize_advection(scheme.buffer_scheme, grid))
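With these definitions, the deferral round-trip can be sketched as follows. This is a usage sketch assuming the API introduced in this PR; `RectilinearGrid` and `CPU` are existing Oceananigans names, and models perform the materialization step internally, so users normally never call it themselves.

```julia
using Oceananigans
using Oceananigans.Advection: materialize_advection

grid = RectilinearGrid(CPU(); size=(16, 16, 16), extent=(1, 1, 1))

scheme = WENO(order=5)   # weight_computation left as Nothing (deferred)

# Resolves the deferred setting: the scheme's weight-computation type
# parameter becomes the architecture-dependent default for `grid`.
concrete = materialize_advection(scheme, grid)
```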
13 changes: 6 additions & 7 deletions src/Advection/vector_invariant_advection.jl
@@ -137,8 +137,7 @@ function Base.summary(a::WENOVectorInvariant{N}) where N
vertical_order = weno_order(a.vertical_advection_scheme)
order = weno_order(a.vorticity_scheme)
FT = eltype(a.vorticity_scheme)
FT2 = eltype2(a.vorticity_scheme)
return string("WENOVectorInvariant{$N, $FT, $FT2}(vorticity_order=$vorticity_order, vertical_order=$vertical_order)")
return string("WENOVectorInvariant{$N, $FT}(vorticity_order=$vorticity_order, vertical_order=$vertical_order)")
end

function Base.show(io::IO, a::VectorInvariant{N, FT}) where {N, FT}
@@ -193,12 +192,12 @@ Example
julia> using Oceananigans

julia> WENOVectorInvariant()
WENOVectorInvariant{5, Float64, Float32}(vorticity_order=9, vertical_order=5)
├── vorticity_scheme: WENO{5, Float64, Float32}(order=9)
WENOVectorInvariant{5, Float64}(vorticity_order=9, vertical_order=5)
├── vorticity_scheme: WENO{5, Float64, Nothing}(order=9)
├── vorticity_stencil: Oceananigans.Advection.VelocityStencil
├── vertical_advection_scheme: WENO{3, Float64, Float32}(order=5)
├── kinetic_energy_gradient_scheme: WENO{3, Float64, Float32}(order=5)
├── divergence_scheme: WENO{3, Float64, Float32}(order=5)
├── vertical_advection_scheme: WENO{3, Float64, Nothing}(order=5)
├── kinetic_energy_gradient_scheme: WENO{3, Float64, Nothing}(order=5)
├── divergence_scheme: WENO{3, Float64, Nothing}(order=5)
└── upwinding: OnlySelfUpwinding
```
"""
4 changes: 2 additions & 2 deletions src/Advection/weno_interpolants.jl
@@ -291,7 +291,7 @@ end
@inline function metaprogrammed_zweno_alpha_loop(buffer)
elem = Vector(undef, buffer)
for stencil = 1:buffer
elem[stencil] = :(C★(scheme, Val($(stencil-1))) * (1 + (newton_div(FT2, τ, β[$stencil] + ϵ))^2))
elem[stencil] = :(C★(scheme, Val($(stencil-1))) * (1 + (newton_div(WCT, τ, β[$stencil] + ϵ))^2))
end

return :($(elem...),)
@@ -301,7 +301,7 @@ for buffer in advection_buffers[2:end]
@eval begin
@inline beta_sum(scheme::WENO{$buffer, FT}, β₁, β₂) where FT = @inbounds $(metaprogrammed_beta_sum(buffer))
@inline beta_loop(scheme::WENO{$buffer, FT}, ψ) where FT = @inbounds $(metaprogrammed_beta_loop(buffer))
@inline zweno_alpha_loop(scheme::WENO{$buffer, FT, FT2}, β, τ) where {FT, FT2} = @inbounds $(metaprogrammed_zweno_alpha_loop(buffer))
@inline zweno_alpha_loop(scheme::WENO{$buffer, FT, WCT}, β, τ) where {FT, WCT} = @inbounds $(metaprogrammed_zweno_alpha_loop(buffer))
end
end

74 changes: 44 additions & 30 deletions src/Advection/weno_reconstruction.jl
@@ -4,20 +4,20 @@ import Oceananigans
##### Weighted Essentially Non-Oscillatory (WENO) advection scheme
#####

struct WENO{N, FT, FT2, PP, CA, SI} <: AbstractUpwindBiasedAdvectionScheme{N, FT}
struct WENO{N, FT, WCT, PP, CA, SI} <: AbstractUpwindBiasedAdvectionScheme{N, FT}
bounds :: PP
buffer_scheme :: CA
advecting_velocity_scheme :: SI

function WENO{N, FT, FT2}(bounds::PP, buffer_scheme::CA,
advecting_velocity_scheme :: SI) where {N, FT, FT2, PP, CA, SI}
function WENO{N, FT, WCT}(bounds::PP, buffer_scheme::CA,
advecting_velocity_scheme :: SI) where {N, FT, WCT, PP, CA, SI}

return new{N, FT, FT2, PP, CA, SI}(bounds, buffer_scheme, advecting_velocity_scheme)
return new{N, FT, WCT, PP, CA, SI}(bounds, buffer_scheme, advecting_velocity_scheme)
end
end

"""
WENO([FT=Float64, FT2=Float32;]
WENO([FT=Float64;]
order = 5,
bounds = nothing,
minimum_buffer_upwind_order = 3)
@@ -28,11 +28,12 @@ Arguments
=========

- `FT`: The floating point type used in the scheme. Default: `Oceananigans.defaults.FloatType`
- `FT2`: The floating point type used in some performance-critical parts of the scheme. Default: `Float32`

Keyword arguments
=================

- `weight_computation`: The type of approximate division to use when computing WENO weights.
  Default: `Nothing` (deferred; an architecture-dependent default is assigned in
  `materialize_advection`)
- `order`: The order of the WENO advection scheme. Default: 5
- `bounds` (experimental): Whether to use bounds-preserving WENO, which produces a reconstruction
that attempts to restrict a quantity to lie between a `bounds` tuple.
@@ -51,8 +52,8 @@ To build the default 5th-order scheme:
julia> using Oceananigans

julia> WENO()
WENO{3, Float64, Float32}(order=5)
├── buffer_scheme: WENO{2, Float64, Float32}(order=3)
WENO{3, Float64, Nothing}(order=5)
├── buffer_scheme: WENO{2, Float64, Nothing}(order=3)
│ └── buffer_scheme: Centered(order=2)
└── advecting_velocity_scheme: Centered(order=4)
```
@@ -62,10 +63,10 @@ yet minimally-dissipative advection scheme):

```jldoctest weno
julia> WENO(order=9)
WENO{5, Float64, Float32}(order=9)
├── buffer_scheme: WENO{4, Float64, Float32}(order=7)
│ └── buffer_scheme: WENO{3, Float64, Float32}(order=5)
│ └── buffer_scheme: WENO{2, Float64, Float32}(order=3)
WENO{5, Float64, Nothing}(order=9)
├── buffer_scheme: WENO{4, Float64, Nothing}(order=7)
│ └── buffer_scheme: WENO{3, Float64, Nothing}(order=5)
│ └── buffer_scheme: WENO{2, Float64, Nothing}(order=3)
│ └── buffer_scheme: Centered(order=2)
└── advecting_velocity_scheme: Centered(order=8)
```
@@ -75,24 +76,34 @@ which uses `Centered(order=2)` as the innermost buffer scheme:

```jldoctest weno
julia> WENO(order=9, minimum_buffer_upwind_order=5)
WENO{5, Float64, Float32}(order=9)
├── buffer_scheme: WENO{4, Float64, Float32}(order=7)
│ └── buffer_scheme: WENO{3, Float64, Float32}(order=5)
WENO{5, Float64, Nothing}(order=9)
├── buffer_scheme: WENO{4, Float64, Nothing}(order=7)
│ └── buffer_scheme: WENO{3, Float64, Nothing}(order=5)
│ └── buffer_scheme: Centered(order=2)
└── advecting_velocity_scheme: Centered(order=8)
```

```jldoctest weno
julia> WENO(order=9, bounds=(0, 1))
WENO{5, Float64, Float32}(order=9, bounds=(0.0, 1.0))
├── buffer_scheme: WENO{4, Float64, Float32}(order=7, bounds=(0.0, 1.0))
│ └── buffer_scheme: WENO{3, Float64, Float32}(order=5, bounds=(0.0, 1.0))
│ └── buffer_scheme: WENO{2, Float64, Float32}(order=3, bounds=(0.0, 1.0))
WENO{5, Float64, Nothing}(order=9, bounds=(0.0, 1.0))
├── buffer_scheme: WENO{4, Float64, Nothing}(order=7, bounds=(0.0, 1.0))
│ └── buffer_scheme: WENO{3, Float64, Nothing}(order=5, bounds=(0.0, 1.0))
│ └── buffer_scheme: WENO{2, Float64, Nothing}(order=3, bounds=(0.0, 1.0))
│ └── buffer_scheme: Centered(order=2)
└── advecting_velocity_scheme: Centered(order=8)
```

To build a WENO scheme that uses approximate division on GPUs for faster execution:
```jldoctest weno
julia> WENO(;weight_computation=Oceananigans.Utils.BackendOptimizedDivision)
Suggested change
julia> WENO(;weight_computation=Oceananigans.Utils.BackendOptimizedDivision)
julia> WENO(; weight_computation=Oceananigans.Utils.BackendOptimizedDivision)

WENO{3, Float64, Oceananigans.Utils.BackendOptimizedDivision}(order=5)
├── buffer_scheme: WENO{2, Float64, Oceananigans.Utils.BackendOptimizedDivision}(order=3)
│ └── buffer_scheme: Centered(order=2)
└── advecting_velocity_scheme: Centered(order=4)
```
"""
function WENO(FT::DataType=Oceananigans.defaults.FloatType, FT2::DataType=Float32;
function WENO(FT::DataType=Oceananigans.defaults.FloatType;
weight_computation::DataType=Nothing,
order = 5,
buffer_scheme = DecreasingOrderAdvectionScheme(),
bounds = nothing,
@@ -116,20 +127,19 @@ function WENO(FT::DataType=Oceananigans.defaults.FloatType, FT2::DataType=Float3
# At minimum order, switch to Centered scheme
buffer_scheme = Centered(FT; order=2)
else
buffer_scheme = WENO(FT, FT2; order=order-2, bounds, minimum_buffer_upwind_order)
buffer_scheme = WENO(FT; order=order-2, bounds, minimum_buffer_upwind_order, weight_computation)
end
end

N = Int((order + 1) ÷ 2)
return WENO{N, FT, FT2}(bounds, buffer_scheme, advecting_velocity_scheme)
return WENO{N, FT, weight_computation}(bounds, buffer_scheme, advecting_velocity_scheme)
end
end

weno_order(::WENO{N}) where N = 2N-1
Base.eltype(::WENO{N, FT}) where {N, FT} = FT
eltype2(::WENO{N, FT, FT2}) where {N, FT, FT2} = FT2
Base.summary(a::WENO{N, FT, FT2, Nothing}) where {N, FT, FT2} = string("WENO{$N, $FT, $FT2}(order=", 2N-1, ")")
Base.summary(a::WENO{N, FT, FT2, PP}) where {N, FT, FT2, PP} = string("WENO{$N, $FT, $FT2}(order=", 2N-1, ", bounds=", string(a.bounds), ")")
Base.summary(a::WENO{N, FT, WCT, Nothing}) where {N, FT, WCT} = string("WENO{$N, $FT, $WCT}(order=", 2N-1, ")")
Base.summary(a::WENO{N, FT, WCT, PP}) where {N, FT, WCT, PP} = string("WENO{$N, $FT, $WCT}(order=", 2N-1, ", bounds=", string(a.bounds), ")")

function Base.show(io::IO, a::WENO)
print(io, summary(a), '\n')
@@ -145,12 +155,16 @@ function Base.show(io::IO, a::WENO)
print(io, "└── advecting_velocity_scheme: ", summary(a.advecting_velocity_scheme))
end

Adapt.adapt_structure(to, scheme::WENO{N, FT, FT2}) where {N, FT, FT2} =
WENO{N, FT, FT2}(Adapt.adapt(to, scheme.bounds),
Adapt.adapt_structure(to, scheme::WENO{N, FT, WCT}) where {N, FT, WCT} =
WENO{N, FT, WCT}(Adapt.adapt(to, scheme.bounds),
Adapt.adapt(to, scheme.buffer_scheme),
Adapt.adapt(to, scheme.advecting_velocity_scheme))

on_architecture(to, scheme::WENO{N, FT, FT2}) where {N, FT, FT2} =
WENO{N, FT, FT2}(on_architecture(to, scheme.bounds),
on_architecture(to, scheme::WENO{N, FT, WCT}) where {N, FT, WCT} =
WENO{N, FT, WCT}(on_architecture(to, scheme.bounds),
on_architecture(to, scheme.buffer_scheme),
on_architecture(to, scheme.advecting_velocity_scheme))

# Select the default WENO weight computation
# Specific backends may override
default_weno_weight_computation(arch) = Oceananigans.Utils.BackendOptimizedDivision
5 changes: 5 additions & 0 deletions src/DistributedComputations/DistributedComputations.jl
@@ -59,4 +59,9 @@ function precondition!(p, preconditioner::DistributedFourierTridiagonalPoissonSo
return p
end

# Pass the child architecture when determining the default weno_weight_computation
Oceananigans.Advection.default_weno_weight_computation(arch::Distributed) =
Oceananigans.Advection.default_weno_weight_computation(child_architecture(arch))


end # module
@@ -1,4 +1,4 @@
using Oceananigans.Advection: AbstractAdvectionScheme, Centered, VectorInvariant, adapt_advection_order
using Oceananigans.Advection: AbstractAdvectionScheme, Centered, VectorInvariant, adapt_advection_order, materialize_advection
using Oceananigans.Architectures: AbstractArchitecture
using Oceananigans.Biogeochemistry: validate_biogeochemistry, AbstractBiogeochemistry, biogeochemical_auxiliary_fields
using Oceananigans.BoundaryConditions: FieldBoundaryConditions, regularize_field_boundary_conditions
@@ -200,6 +200,10 @@ function HydrostaticFreeSurfaceModel(grid;
advection = merge(momentum_advection_tuple, tracer_advection_tuple)
advection = NamedTuple(name => adapt_advection_order(scheme, grid) for (name, scheme) in pairs(advection))

# Fill in any settings in the advection scheme that were deferred until
# the grid and backend are known
advection = NamedTuple(name => materialize_advection(scheme, grid) for (name, scheme) in pairs(advection))

validate_buoyancy(buoyancy, tracernames(tracers))
buoyancy = materialize_buoyancy(buoyancy, grid)

6 changes: 5 additions & 1 deletion src/Models/NonhydrostaticModels/nonhydrostatic_model.jl
@@ -1,4 +1,4 @@
using Oceananigans.Advection: Centered, adapt_advection_order
using Oceananigans.Advection: Centered, adapt_advection_order, materialize_advection
using Oceananigans.Architectures: AbstractArchitecture
using Oceananigans.Biogeochemistry: validate_biogeochemistry, AbstractBiogeochemistry, biogeochemical_auxiliary_fields
using Oceananigans.BoundaryConditions: MixedBoundaryCondition
@@ -217,6 +217,10 @@ function NonhydrostaticModel(grid;
# direction
advection = adapt_advection_order(advection, grid)

# Fill in any settings in the advection scheme that were deferred until
# the grid and backend are known
advection = materialize_advection(advection, grid)

# Adjust halos when the advection scheme or turbulence closure requires it.
# Note that halos are isotropic by default; however we respect user-input here
# by adjusting each (x, y, z) halo individually.