In Oceananigans we need to define a specialised method for the CUDA target only (it uses a PTX intrinsic), and in CliMA/Oceananigans.jl#5140 we are going to use `CUDA.@device_override` for that. The problem is that Reactant uses CUDA.jl-generated code for all targets, including the CPU one, so it would try to call that intrinsic on every backend, which is of course not possible.
so reactant currently faithfully fully represents the cuda kernel. There is no mechanism for determining if you're in reactant or not within a kernel -- we get the code out of cuda.jl. We could add a mechanism for doing so here: `config = GPUCompiler.CompilerConfig(`. We'd need to make a new gpucompiler job/config/state that looks identical to the cuda one, except with a second method overlay table that itself overlays the cuda overlay table
Originally posted by @wsmoses in CliMA/Oceananigans.jl#5140 (comment)
I don't really know how to accomplish this though. The method table for a compiler job is specified via dispatch, e.g. https://github.com/JuliaGPU/CUDA.jl/blob/9528a336f527bf49cc2255b51bbb59f87a533e43/src/compiler/compilation.jl#L52, so I don't immediately see how we can have the same compiler job/configuration as CUDA.jl but a different method table, unless you want to wrap `CUDACompilerJob`, and that's an endless rabbit hole. Can we overlay `GPUCompiler.method_table`? That sounds potentially "dangerous" though.
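For reference, here is a minimal sketch of the Base-level mechanism a second overlay table would build on. It uses only `Base.Experimental.@MethodTable` and `Base.Experimental.@overlay`; the names `REACTANT_OVERLAY_TABLE` and `generic_op` are hypothetical stand-ins, and it does not show how to make a GPUCompiler job consult the new table, which is the open question above.

```julia
# Hypothetical names throughout; only Base.Experimental is used here.

# A fresh method table that a Reactant-specific compiler job could consult.
Base.Experimental.@MethodTable(REACTANT_OVERLAY_TABLE)

# Stand-in for the Oceananigans function that gets a PTX-only method on CUDA.
generic_op(x) = x + 1

# A replacement that is visible only to a compiler using REACTANT_OVERLAY_TABLE.
Base.Experimental.@overlay REACTANT_OVERLAY_TABLE generic_op(x) = x + 2

# Ordinary dispatch ignores overlay tables, so the generic method still runs.
result = generic_op(1)
```

The overlay method only takes effect for a compiler whose method table lookup includes `REACTANT_OVERLAY_TABLE`, so the missing piece is still routing Reactant's compiler job to such a table ahead of the CUDA.jl one.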