Conversation
Is it possible to wait on this for 2 weeks?
```diff
-        CUDA.PTXCompilerTarget(; cap=llvm_cap, ptx=llvm_ptx, debuginfo),
-        CUDA.CUDACompilerParams(; cap=cuda_cap, ptx=cuda_ptx);
+        GPUCompiler.PTXCompilerTarget(; cap=llvm_cap, ptx=llvm_ptx, debuginfo),
+        CUDACore.CUDACompilerParams(; cap=cuda_cap, ptx=cuda_ptx);
```
This presumably breaks on CUDA 5, right?
As is right now, yes, but I'm pretty sure we can simply define `const CUDACore = CUDA` when `CUDACore` isn't defined, thus making all changes compatible with v5 as well (see the sketch below). And I pushed some changes in CUDA.jl itself to break fewer things (like making all the `@device_*` macros available in the `CUDA` scope).
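A minimal sketch of that fallback; whether `CUDACore` is reachable as `CUDA.CUDACore` on v6, and how v5 vs v6 is detected, are assumptions here, not necessarily how the extension does it:

```julia
using CUDA

# Sketch of the suggested fallback (illustrative; the real extension may
# detect v5 vs v6 differently): on CUDA.jl v5 there is no CUDACore module,
# so alias the name to CUDA itself and the rest of the code works unchanged.
if isdefined(CUDA, :CUDACore)
    const CUDACore = CUDA.CUDACore  # assumption: CUDACore reachable from CUDA on v6
else
    const CUDACore = CUDA           # CUDA.jl v5: everything still lives in CUDA
end
```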
With cff4b0a the extension should be fully compatible with both CUDA v5 and v6.
As I mentioned above and elsewhere, this requires a lot of other packages to update to CUDA.jl v6 (which isn't even released).
Side note: half of the changes to the extension are actually bug fixes independent of the upgrade to v6 (which only exposed the bugs), like trying to get symbols from the wrong modules.
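For illustration only (hypothetical modules, not the actual fixes), this is the class of bug meant here: looking up a binding in the wrong module fails even though the name exists elsewhere:

```julia
# Hypothetical illustration of the bug class: `getglobal` resolves a name
# in one specific module, so querying the wrong module throws even though
# the binding exists somewhere else.
module A
const secret = 42
end

module B end

getglobal(A, :secret)    # returns 42
# getglobal(B, :secret)  # UndefVarError: `secret` not defined in `B`
```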
This is ready for review. I'd like to tag the new version after this is merged; some users have already reported issues with Reactant being downgraded to very old versions when installing CUDA in the same environment.
KA tests are broken.
It'd make my life easier to get a direct link, instead of having to scavenge the tests.
Looks to me like a bug in LuxLib.
I'm going to assume the "broken KA tests" are https://buildkite.com/julialang/reactant-dot-jl/builds/17726#019d9c7c-885e-47ad-94da-9f8c2b1de61b/L2389. A standalone reproducer (requires an Nvidia GPU) is:

```julia
julia> using CUDA, KernelAbstractions, Reactant

julia> @kernel function square_kernel!(y, @Const(x))
           i = @index(Global)
           @inbounds y[i] = x[i] * x[i]
       end
square_kernel! (generic function with 4 methods)

julia> function square(x)
           y = similar(x)
           backend = KernelAbstractions.get_backend(x)
           kernel! = square_kernel!(backend)
           kernel!(y, x; ndrange=length(x))
           return y
       end
square (generic function with 1 method)

julia> x = Reactant.to_rarray(collect(1:1:64) ./ 64);

julia> @jit(raise = false, square(x));
Warning: detected a stack overflow; program state may be corrupted, so further execution might be unreliable.
Warning: detected a stack overflow; program state may be corrupted, so further execution might be unreliable.
ERROR: StackOverflowError:
Stacktrace:
     [1] rethrow()
       @ Base ./error.jl:71
     [2] macro expansion
       @ ./lock.jl:378 [inlined]
     [3] cufunction(f::typeof(gpu_square_kernel!), tt::Type{Tuple{…}}; kwargs::@Kwargs{})
       @ ReactantCUDAExt /mnt/giordano/.julia/dev/Reactant/ext/ReactantCUDAExt.jl:1599
     [4] call_with_reactant
       @ ./none:-1 [inlined]
     [5] call_with_reactant(::Reactant.EnsureReturnType{Any}, ::typeof(cufunction), ::typeof(gpu_square_kernel!), ::Type{Tuple{…}})
       @ Reactant /mnt/giordano/.julia/dev/Reactant/src/utils.jl:0
     [6] #launch_configuration#9
       @ /mnt/giordano/.julia/dev/Reactant/ext/ReactantCUDAExt.jl:614
     [7] call_with_reactant
       @ ./none:-1 [inlined]
     [8] call_with_reactant(::Reactant.EnsureReturnType{…}, ::ReactantCUDAExt.var"##launch_configuration#9", ::Int64, ::Int64, ::typeof(launch_configuration), ::Reactant.Compiler.LLVMFunc{…})
       @ Reactant /mnt/giordano/.julia/dev/Reactant/src/utils.jl:0
--- the above 2 lines are repeated 1 more time ---
--- the above 5 lines are repeated 6868 more times ---
 [34351] ka_with_reactant
       @ /mnt/giordano/.julia/dev/Reactant/ext/ReactantCUDAExt.jl:535
 [34352] call_with_reactant
       @ ./none:-1 [inlined]
 [34353] call_with_reactant(::typeof(Reactant.ka_with_reactant), ::Int64, ::Nothing, ::KernelAbstractions.Kernel{…}, ::Reactant.TracedRArray{…}, ::Reactant.TracedRArray{…})
       @ Reactant /mnt/giordano/.julia/dev/Reactant/src/utils.jl:0
 [34354] (::KernelAbstractions.Kernel{…})(::Reactant.TracedRArray{…}, ::Vararg{…}; ndrange::Int64, workgroupsize::Nothing)
       @ ReactantKernelAbstractionsExt /mnt/giordano/.julia/dev/Reactant/ext/ReactantKernelAbstractionsExt.jl:128
 [34355] square
       @ ./REPL[5]:5
 [34356] call_with_reactant
       @ ./none:-1 [inlined]
 [34357] call_with_reactant(::typeof(square), ::Reactant.TracedRArray{Float64, 1})
       @ Reactant /mnt/giordano/.julia/dev/Reactant/src/utils.jl:0
 [34358] make_mlir_fn(f::typeof(square), args::Tuple{…}, kwargs::@NamedTuple{}, name::String, concretein::Bool; toscalar::Bool, return_dialect::Symbol, args_in_result::Symbol, construct_function_without_args::Bool, do_transpose::Bool, within_autodiff::Bool, input_shardings::Nothing, output_shardings::Nothing, runtime::Val{…}, verify_arg_names::Nothing, argprefix::Symbol, resprefix::Symbol, resargprefix::Symbol, num_replicas::Int64, optimize_then_pad::Bool)
       @ Reactant.TracedUtils /mnt/giordano/.julia/dev/Reactant/src/TracedUtils.jl:370
```
I'll need some help digging into this. I followed ext/ReactantCUDAExt.jl lines 506 to 523 (at 2ee58db):

```julia
using CUDA, KernelAbstractions, Reactant
const KA = KernelAbstractions

@kernel function square_kernel!(y, @Const(x))
    i = @index(Global)
    @inbounds y[i] = x[i] * x[i]
end

x = Reactant.to_rarray(collect(1:1:64) ./ 64);
y = similar(x);
backend = KernelAbstractions.get_backend(x)
kernel! = square_kernel!(backend)
ndrange, workgroupsize = length(x), nothing
obj = kernel!
args = (y, x);
ndrange, workgroupsize, iterspace, dynamic = KA.launch_config(
    obj, ndrange, workgroupsize
)
ctx = KA.mkcontext(obj, ndrange, iterspace)
maxthreads = nothing
kernel = CUDA.@cuda launch = false always_inline = backend.always_inline maxthreads =
    maxthreads obj.f(ctx, args...)
```

but I get various errors both on
@maleadt was there any change to `cufunction`/friends?
The stack trace suggests the stack overflow happens at ext/ReactantCUDAExt.jl line 535 (at 3a2f3d4).
How does ext/ReactantCUDAExt.jl lines 606 to 614 (at 2ee58db) work? `cufunction` is overlayed at ext/ReactantCUDAExt.jl lines 1591 to 1626 (at 2ee58db) to return a `Reactant.Compiler.LLVMFunc`, also on main, so that `CUDA.launch_configuration` is effectively an infinitely recursive function? Did this work by chance so far? Or am I missing something?
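For intuition, a minimal self-contained sketch of the cycle the stack trace shows (stand-in functions, not Reactant's actual overlays): the `launch_configuration` overlay calls `cufunction`, whose overlaid version ends up calling `launch_configuration` again.

```julia
# Toy model of the cycle in the stack trace above (stand-in functions,
# not Reactant's actual overlays): inside the tracing interpreter every
# call re-enters the interpreter, so the two overlays feed each other.
overlaid_launch_configuration(f) = overlaid_cufunction(f)   # cf. frames [6]-[8]
overlaid_cufunction(f) = overlaid_launch_configuration(f)   # cf. frames [3]-[5]

# overlaid_launch_configuration(identity)  # throws StackOverflowError
```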
We should really change

```julia
Base.inferencebarrier(CUDA.cufunction)(f.f, Tuple{tt.parameters[2:end]...}).fun;
```

to

```julia
call_with_native(CUDA.cufunction, f.f, Tuple{tt.parameters[2:end]...}).fun;
```

since essentially the gist there is that within the Reactant interp we can call into the native interp result for `cufunction`.
```diff
diff --git a/ext/ReactantCUDAExt.jl b/ext/ReactantCUDAExt.jl
index f1ce6e1dd..6f3c419ef 100644
--- a/ext/ReactantCUDAExt.jl
+++ b/ext/ReactantCUDAExt.jl
@@ -7,7 +7,8 @@ using Reactant:
     AnyConcretePJRTArray,
     MLIR,
     TracedRNumber,
-    ReactantPrecompilationException
+    ReactantPrecompilationException,
+    call_with_native
 using Reactant.Compiler: raising, LLVMFunc, llvm_compiler_cache
 using Reactant.Ops: @opcall
@@ -612,7 +613,7 @@ end
     f::LLVMFunc{F,tt}; shmem::Union{Integer,Base.Callable}=0, max_threads::Integer=0
 ) where {F,tt}
     return CUDA.launch_configuration(
-        Base.inferencebarrier(CUDA.cufunction)(f.f, Tuple{tt.parameters[2:end]...}).fun;
+        call_with_native(CUDA.cufunction, f.f, Tuple{tt.parameters[2:end]...}).fun;
         shmem,
         max_threads,
     )
```

does fix the issue!
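In terms of the toy sketch above (again stand-ins, not Reactant's real code), the fix works because routing one edge of the loop through the native path means the interpreter is never re-entered:

```julia
# Continuing the toy model: executing cufunction via the native path
# (outside the tracing interpreter) breaks one edge of the loop, so the
# recursion terminates.
native_cufunction(f) = f   # stands in for call_with_native(CUDA.cufunction, ...)
fixed_launch_configuration(f) = native_cufunction(f)

fixed_launch_configuration(identity)   # returns identity; no StackOverflowError
```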
I believe we need to wait for LuxDL/Lux.jl#1696 for a clearer run; the Lux integration tests are going to fail without that.
Not quite ready, especially because it depends on upstream packages (`ArrayInterface`, `Flux`, `Lux`, `LuxLib`, `NNlib`, `NonuniformFFTs`, `OneHotArrays`) adapting to the upcoming CUDA v6 first, but I'm saving my progress so far; at least with these changes I can barely precompile the CUDA extension.