# Make CUDA & friends loadable on systems without NVPTX LLVM backend #3067

**Merged** — maleadt merged 2 commits into JuliaGPU:master on Mar 30, 2026

## Conversation
The symbol won't be available on platforms that don't support CUDA, causing this warning:

```
1 dependency had output during precompilation:
┌ CUDATools
│ WARNING: Imported binding CUDA_Compiler_jll.nvdisasm was undeclared at import time during import to CUDATools.
└
```
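A common way to avoid this class of warning is to bind the symbol conditionally rather than importing it unconditionally. The sketch below illustrates the pattern only; `FakeJLL` is a hypothetical stand-in for a platform-dependent JLL package (the real code would check `CUDA_Compiler_jll`), and this is not necessarily the exact change made in this PR:

```julia
module FakeJLL
# Stand-in for a JLL package; `nvdisasm` is intentionally not defined here,
# mimicking a platform where the artifact is unavailable.
end

# Guarded binding: only forward the symbol if the JLL actually provides it,
# so precompilation never imports an undeclared binding.
if isdefined(FakeJLL, :nvdisasm)
    const nvdisasm = FakeJLL.nvdisasm
else
    nvdisasm(args...) = error("nvdisasm is not available on this platform")
end
```

With this guard, loading the package succeeds everywhere; calling `nvdisasm` on an unsupported platform raises a descriptive error instead of failing at import time.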
**Codecov Report** ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##           master    #3067   +/-   ##
=======================================
  Coverage   90.42%   90.42%
=======================================
  Files         141      141
  Lines       11993    11993
=======================================
  Hits        10845    10845
  Misses       1148     1148
```

View full report in Codecov by Sentry.
**CUDA.jl Benchmarks**

| Benchmark suite | Current: f59f05f | Previous: d95add6 | Ratio |
|---|---|---|---|
| latency/precompile | 4592146077.5 ns | 4591137397 ns | 1.00 |
| latency/ttfp | 4399155900.5 ns | 4388441108 ns | 1.00 |
| latency/import | 3819111214.5 ns | 3806192215 ns | 1.00 |
| integration/volumerhs | 9439930.5 ns | 9441617.5 ns | 1.00 |
| integration/byval/slices=1 | 146076 ns | 145778 ns | 1.00 |
| integration/byval/slices=3 | 423209 ns | 423134 ns | 1.00 |
| integration/byval/reference | 144218 ns | 143890 ns | 1.00 |
| integration/byval/slices=2 | 284770 ns | 284514 ns | 1.00 |
| integration/cudadevrt | 102812 ns | 102578 ns | 1.00 |
| kernel/indexing | 13471.5 ns | 13667 ns | 0.99 |
| kernel/indexing_checked | 14259 ns | 14261 ns | 1.00 |
| kernel/occupancy | 681.0193548387097 ns | 664.75 ns | 1.02 |
| kernel/launch | 2221.5555555555557 ns | 2208.777777777778 ns | 1.01 |
| kernel/rand | 16362 ns | 18252 ns | 0.90 |
| array/reverse/1d | 18700.5 ns | 18706 ns | 1.00 |
| array/reverse/2dL_inplace | 66103 ns | 66090 ns | 1.00 |
| array/reverse/1dL | 69258 ns | 69365 ns | 1.00 |
| array/reverse/2d | 20940 ns | 20874 ns | 1.00 |
| array/reverse/1d_inplace | 8651.333333333334 ns | 10373 ns | 0.83 |
| array/reverse/2d_inplace | 10268 ns | 10322 ns | 0.99 |
| array/reverse/2dL | 72954 ns | 72995 ns | 1.00 |
| array/reverse/1dL_inplace | 66053.5 ns | 65923 ns | 1.00 |
| array/copy | 18724 ns | 18987 ns | 0.99 |
| array/iteration/findall/int | 149943.5 ns | 149956 ns | 1.00 |
| array/iteration/findall/bool | 132890 ns | 132890 ns | 1.00 |
| array/iteration/findfirst/int | 84061 ns | 84402 ns | 1.00 |
| array/iteration/findfirst/bool | 81951.5 ns | 82118 ns | 1.00 |
| array/iteration/scalar | 67980.5 ns | 68770 ns | 0.99 |
| array/iteration/logical | 201560.5 ns | 204009 ns | 0.99 |
| array/iteration/findmin/1d | 85760 ns | 86819.5 ns | 0.99 |
| array/iteration/findmin/2d | 117484 ns | 118061 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43796 ns | 43837 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1 | 42901 ns | 53019.5 ns | 0.81 |
| array/reductions/reduce/Int64/dims=2 | 59921 ns | 60078 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 87983 ns | 88042 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 85126 ns | 85281 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 35574 ns | 35957 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 42129 ns | 41971.5 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 57231 ns | 57207 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52087 ns | 52254 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 70549 ns | 70121.5 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 43545 ns | 43844 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 43249 ns | 42506 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=2 | 59882 ns | 59797 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 87920 ns | 87969 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 85248 ns | 85477 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 35402 ns | 35390 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 40235.5 ns | 39718.5 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 57151 ns | 56925 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52079.5 ns | 52147 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 70017 ns | 69617 ns | 1.01 |
| array/broadcast | 20624 ns | 20791 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11393 ns | 11363 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 218446 ns | 217254 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 284049 ns | 286261 ns | 0.99 |
| array/accumulate/Int64/1d | 119013 ns | 119453.5 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 79987 ns | 80552 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 156049 ns | 156859 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1694112 ns | 1706924 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 961481 ns | 962107.5 ns | 1.00 |
| array/accumulate/Float32/1d | 102076 ns | 101709 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 77112 ns | 77739 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 144441 ns | 144317 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1585891 ns | 1586304.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 657924.5 ns | 658932 ns | 1.00 |
| array/construct | 1303.5 ns | 1319 ns | 0.99 |
| array/random/randn/Float32 | 39636 ns | 44317.5 ns | 0.89 |
| array/random/randn!/Float32 | 31597 ns | 27470 ns | 1.15 |
| array/random/rand!/Int64 | 34479 ns | 27837 ns | 1.24 |
| array/random/rand!/Float32 | 8529.333333333334 ns | 8551.666666666666 ns | 1.00 |
| array/random/rand/Int64 | 37375 ns | 30127.5 ns | 1.24 |
| array/random/rand/Float32 | 13215 ns | 13103 ns | 1.01 |
| array/permutedims/4d | 52042 ns | 52084 ns | 1.00 |
| array/permutedims/2d | 52826 ns | 52953 ns | 1.00 |
| array/permutedims/3d | 53261.5 ns | 53412.5 ns | 1.00 |
| array/sorting/1d | 2735692 ns | 2736660 ns | 1.00 |
| array/sorting/by | 3307835 ns | 3305151.5 ns | 1.00 |
| array/sorting/2d | 1076070 ns | 1069633 ns | 1.01 |
| cuda/synchronization/stream/auto | 1018.1 ns | 1037.9 ns | 0.98 |
| cuda/synchronization/stream/nonblocking | 7654.6 ns | 7454.700000000001 ns | 1.03 |
| cuda/synchronization/stream/blocking | 831.952380952381 ns | 833.8589743589744 ns | 1.00 |
| cuda/synchronization/context/auto | 1154.8 ns | 1167.8 ns | 0.99 |
| cuda/synchronization/context/nonblocking | 7605.3 ns | 8361.2 ns | 0.91 |
| cuda/synchronization/context/blocking | 929.5555555555555 ns | 926.1538461538462 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
maleadt approved these changes on Mar 30, 2026.
This makes CUDA.jl loadable for me on macOS, without any warning (as it used to do in v5). PR is best reviewed by ignoring whitespace changes.