Fix GC race in CuRef getindex causing intermittent CUDA errors #3087

Merged
maleadt merged 1 commit into master from tb/refpointer_preserve on Apr 9, 2026

@maleadt (Member) commented Apr 9, 2026

The `getindex` methods for `CuRefValue` and `CuRefArray` preserved only the CPU `Ref` with `GC.@preserve`, not the GPU reference. After the raw device pointer was extracted via `unsafe_convert`, the GC could collect the `CuRefValue` (running its `pool_free` finalizer) before the `unsafe_copyto!` memcpy completed, resulting in a use-after-free.
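
The preservation pattern at issue can be sketched on the CPU without a GPU. This is an illustrative stand-in, not the code changed in this PR: `OwnedRef` is a hypothetical type that owns a `Libc.malloc` buffer and frees it in a finalizer (playing the role of `CuRefValue` and its `pool_free` finalizer), and its `getindex` roots both the destination `Ref` and the owning object while copying through raw pointers.

```julia
# Hypothetical host-memory stand-in for a CuRefValue: a mutable object that
# owns a raw buffer and releases it in a finalizer (playing the role of pool_free).
mutable struct OwnedRef{T}
    ptr::Ptr{T}
    function OwnedRef{T}(x::T) where {T}
        isbitstype(T) || throw(ArgumentError("OwnedRef requires an isbits type"))
        raw = Libc.malloc(sizeof(T))
        raw == C_NULL && throw(OutOfMemoryError())
        p = Ptr{T}(raw)
        unsafe_store!(p, x)
        obj = new{T}(p)
        # Runs once `obj` is unreachable; after that, `p` is dangling.
        finalizer(o -> Libc.free(o.ptr), obj)
        return obj
    end
end

# Extracting the raw pointer does NOT keep `r` alive on its own.
Base.unsafe_convert(::Type{Ptr{T}}, r::OwnedRef{T}) where {T} = r.ptr

function Base.getindex(src::OwnedRef{T}) where {T}
    dst = Ref{T}()
    # Root both ends of the copy: `dst` (the CPU Ref, which the buggy version
    # already preserved) and `src` (the owner of the raw buffer, which it did
    # not). Leaving `src` out would let its finalizer free the buffer while
    # `unsafe_copyto!` is still reading from it.
    GC.@preserve dst src begin
        unsafe_copyto!(Base.unsafe_convert(Ptr{T}, dst),
                       Base.unsafe_convert(Ptr{T}, src), 1)
    end
    return dst[]
end
```

For example, `OwnedRef{ComplexF64}(1.0 + 2.0im)[]` returns `1.0 + 2.0im`. In CUDA.jl the source would be a `CuRefValue` and the pointer a `CuPtr`; the host buffer here only keeps the sketch runnable anywhere.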

Fixes #3012

This manifested as intermittent `CUDA error: invalid argument` or
segfaults under GC pressure, particularly in multi-threaded workloads
performing many CUBLAS operations (e.g., dot products in Arnoldi iteration).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
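
Races like this only surface under allocation churn and concurrent collections, as in the multi-threaded workload described above. The sketch below is a hypothetical stress harness in that spirit, not a test from CUDA.jl: `read_first` and `stress` are illustrative names, and because it copies from correctly preserved host `Vector`s it cannot reproduce the original GPU bug. It only shows the shape of a workload (one task per thread, short-lived allocations, periodic `GC.gc(false)` calls) that tends to expose a missing `GC.@preserve`.

```julia
# Hypothetical helper: read the first element of `src` through raw pointers,
# keeping both the source array and the destination Ref rooted during the copy.
function read_first(src::Vector{T}) where {T}
    isbitstype(T) || throw(ArgumentError("read_first requires an isbits eltype"))
    isempty(src) && throw(ArgumentError("src must be non-empty"))
    dst = Ref{T}()
    GC.@preserve src dst begin
        unsafe_copyto!(Base.unsafe_convert(Ptr{T}, dst), pointer(src), 1)
    end
    return dst[]
end

# Hypothetical stress harness: one task per thread, each allocating and reading
# many short-lived objects while periodically forcing incremental collections.
function stress(n::Integer)
    tasks = map(1:Threads.nthreads()) do t
        Threads.@spawn begin
            ok = true
            for i in 1:n
                v = [ComplexF64(i, t)]         # fresh, short-lived allocation
                i % 64 == 0 && GC.gc(false)    # incremental collection adds GC pressure
                ok &= read_first(v) == ComplexF64(i, t)
            end
            ok
        end
    end
    return all(fetch, tasks)
end
```

Running `stress(10_000)` should return `true`; start Julia with `julia --threads=4` so the tasks actually run in parallel.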

maleadt merged commit 9bcae25 into master on Apr 9, 2026
1 of 2 checks passed
maleadt deleted the tb/refpointer_preserve branch on April 9, 2026, 16:10

codecov (Bot) commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.42%. Comparing base (a9a687c) to head (e49bd1b).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3087   +/-   ##
=======================================
  Coverage   90.41%   90.42%           
=======================================
  Files         141      141           
  Lines       11993    11993           
=======================================
+ Hits        10844    10845    +1     
+ Misses       1149     1148    -1     

github-actions (Bot, Contributor) left a comment

CUDA.jl Benchmarks

Benchmark suite   Current: e49bd1b   Previous: 5f45772   Ratio (Current / Previous)
array/accumulate/Float32/1d 101132 ns 101495 ns 1.00
array/accumulate/Float32/dims=1 76479 ns 76898 ns 0.99
array/accumulate/Float32/dims=1L 1583539 ns 1585143.5 ns 1.00
array/accumulate/Float32/dims=2 143853 ns 143801 ns 1.00
array/accumulate/Float32/dims=2L 657727 ns 657240 ns 1.00
array/accumulate/Int64/1d 118411 ns 118623 ns 1.00
array/accumulate/Int64/dims=1 80131 ns 80572.5 ns 0.99
array/accumulate/Int64/dims=1L 1695177 ns 1693852 ns 1.00
array/accumulate/Int64/dims=2 155949 ns 156484 ns 1.00
array/accumulate/Int64/dims=2L 961597.5 ns 961603 ns 1.00
array/broadcast 20514 ns 20294 ns 1.01
array/construct 1331.2 ns 1320.4 ns 1.01
array/copy 18720 ns 18780 ns 1.00
array/copyto!/cpu_to_gpu 215223.5 ns 214684 ns 1.00
array/copyto!/gpu_to_cpu 283574 ns 282072 ns 1.01
array/copyto!/gpu_to_gpu 11408 ns 11361 ns 1.00
array/iteration/findall/bool 131383 ns 131719.5 ns 1.00
array/iteration/findall/int 148780 ns 148883 ns 1.00
array/iteration/findfirst/bool 80906 ns 81470.5 ns 0.99
array/iteration/findfirst/int 83533.5 ns 83414 ns 1.00
array/iteration/findmin/1d 88432.5 ns 89419 ns 0.99
array/iteration/findmin/2d 117090.5 ns 117365 ns 1.00
array/iteration/logical 199596 ns 207612 ns 0.96
array/iteration/scalar 67301 ns 66780 ns 1.01
array/permutedims/2d 52399 ns 52471.5 ns 1.00
array/permutedims/3d 52919 ns 53137 ns 1.00
array/permutedims/4d 52303 ns 52429 ns 1.00
array/random/rand/Float32 13180 ns 13089 ns 1.01
array/random/rand/Int64 37361 ns 37236 ns 1.00
array/random/rand!/Float32 8520.333333333334 ns 8527.666666666666 ns 1.00
array/random/rand!/Int64 34437.5 ns 34109.5 ns 1.01
array/random/randn/Float32 44084 ns 38147 ns 1.16
array/random/randn!/Float32 31665 ns 31640 ns 1.00
array/reductions/mapreduce/Float32/1d 35194 ns 34735.5 ns 1.01
array/reductions/mapreduce/Float32/dims=1 40837 ns 40760 ns 1.00
array/reductions/mapreduce/Float32/dims=1L 51944 ns 51917 ns 1.00
array/reductions/mapreduce/Float32/dims=2 56712 ns 56503.5 ns 1.00
array/reductions/mapreduce/Float32/dims=2L 69408 ns 69496.5 ns 1.00
array/reductions/mapreduce/Int64/1d 42852 ns 42820 ns 1.00
array/reductions/mapreduce/Int64/dims=1 42338.5 ns 44181 ns 0.96
array/reductions/mapreduce/Int64/dims=1L 87836 ns 87798 ns 1.00
array/reductions/mapreduce/Int64/dims=2 59872 ns 59808 ns 1.00
array/reductions/mapreduce/Int64/dims=2L 85157 ns 85232 ns 1.00
array/reductions/reduce/Float32/1d 35522 ns 34883 ns 1.02
array/reductions/reduce/Float32/dims=1 43353.5 ns 39758 ns 1.09
array/reductions/reduce/Float32/dims=1L 52272 ns 52166 ns 1.00
array/reductions/reduce/Float32/dims=2 57289 ns 56925 ns 1.01
array/reductions/reduce/Float32/dims=2L 70052 ns 69909 ns 1.00
array/reductions/reduce/Int64/1d 43171 ns 42673 ns 1.01
array/reductions/reduce/Int64/dims=1 51441.5 ns 42123 ns 1.22
array/reductions/reduce/Int64/dims=1L 87779 ns 87782 ns 1.00
array/reductions/reduce/Int64/dims=2 59682 ns 59551 ns 1.00
array/reductions/reduce/Int64/dims=2L 84985.5 ns 84796 ns 1.00
array/reverse/1d 18621 ns 18432.5 ns 1.01
array/reverse/1dL 69138 ns 69025 ns 1.00
array/reverse/1dL_inplace 65918 ns 65968 ns 1.00
array/reverse/1d_inplace 8552.333333333334 ns 10240.666666666666 ns 0.84
array/reverse/2d 20669 ns 20709 ns 1.00
array/reverse/2dL 72745 ns 72815 ns 1.00
array/reverse/2dL_inplace 66016 ns 65992 ns 1.00
array/reverse/2d_inplace 10186 ns 11117.5 ns 0.92
array/sorting/1d 2734661 ns 2754859 ns 0.99
array/sorting/2d 1068455 ns 1075967 ns 0.99
array/sorting/by 3303203 ns 3328240 ns 0.99
cuda/synchronization/context/auto 1158.4 ns 1192.4 ns 0.97
cuda/synchronization/context/blocking 947.8181818181819 ns 947.7391304347826 ns 1.00
cuda/synchronization/context/nonblocking 7681.6 ns 7660.1 ns 1.00
cuda/synchronization/stream/auto 1018.4545454545455 ns 1032.5 ns 0.99
cuda/synchronization/stream/blocking 824.8139534883721 ns 841.5588235294117 ns 0.98
cuda/synchronization/stream/nonblocking 7679.9 ns 7189.6 ns 1.07
integration/byval/reference 144062 ns 143997 ns 1.00
integration/byval/slices=1 145976 ns 145776 ns 1.00
integration/byval/slices=2 284694 ns 284427 ns 1.00
integration/byval/slices=3 423386 ns 423129 ns 1.00
integration/cudadevrt 102770 ns 102598 ns 1.00
integration/volumerhs 9440456 ns 9429742.5 ns 1.00
kernel/indexing 13404 ns 13331 ns 1.01
kernel/indexing_checked 14139 ns 14116 ns 1.00
kernel/launch 2199.222222222222 ns 2147 ns 1.02
kernel/occupancy 679.9019607843137 ns 660.5723270440252 ns 1.03
kernel/rand 16496 ns 15598 ns 1.06
latency/import 3837634365 ns 3807044359.5 ns 1.01
latency/precompile 4602411854.5 ns 4590923492 ns 1.00
latency/ttfp 4415248089.5 ns 4392969126 ns 1.01

This comment was automatically generated by workflow using github-action-benchmark.

Development

Successfully merging this pull request may close these issues.

CUDA Invalid Argument error when calling dot between two complex vectors

1 participant