Make CUDA & friends loadable on systems without NVPTX LLVM backend #3067

Merged
maleadt merged 2 commits into JuliaGPU:master from giordano:mg/precompilation-non-cuda on Mar 30, 2026


@giordano commented Mar 30, 2026

This makes CUDA.jl loadable for me on macOS without any warning, as it was in v5. The PR is best reviewed with whitespace changes ignored.

The `CUDA_Compiler_jll.nvdisasm` symbol isn't available on platforms that don't support CUDA, which causes the following warning:

```
  1 dependency had output during precompilation:
┌ CUDATools
│  WARNING: Imported binding CUDA_Compiler_jll.nvdisasm was undeclared at import time during import to CUDATools.
└
```
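To illustrate the general failure mode (a sketch under stated assumptions, not the change made in this PR): explicitly importing a binding that a JLL only defines on supported platforms is what produces a warning like the one above, whereas importing just the module and checking for the binding with `isdefined` at call time does not. `FakeCompilerJLL`, `FakeTools`, and `disassembler` are hypothetical names, and treating Linux as the only supported platform is an assumption standing in for a real JLL's platform check.

```julia
# Hypothetical stand-in for a JLL wrapper: `nvdisasm` is only defined on
# "supported" platforms (assumed here to be Linux, purely for illustration).
module FakeCompilerJLL
    const SUPPORTED = Sys.islinux()
    if SUPPORTED
        nvdisasm() = "nvdisasm"
    end
end

module FakeTools
    # Import only the module, not the possibly-missing binding.
    # `import ..FakeCompilerJLL: nvdisasm` would name an undeclared
    # binding whenever SUPPORTED is false.
    import ..FakeCompilerJLL

    # Look the binding up lazily and return `nothing` when it is absent.
    function disassembler()
        if isdefined(FakeCompilerJLL, :nvdisasm)
            return FakeCompilerJLL.nvdisasm()
        else
            return nothing
        end
    end
end
```

On Linux `FakeTools.disassembler()` returns `"nvdisasm"`; on other platforms it returns `nothing`, and `FakeTools` never imports the missing name directly.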

@codecov (bot) commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.42%. Comparing base (d95add6) to head (f59f05f).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master    #3067   +/-   ##
=======================================
  Coverage   90.42%   90.42%
=======================================
  Files         141      141
  Lines       11993    11993
=======================================
  Hits        10845    10845
  Misses       1148     1148
```

@github-actions (bot) left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: f59f05f | Previous: d95add6 | Ratio |
|---|---|---|---|
| latency/precompile | 4592146077.5 ns | 4591137397 ns | 1.00 |
| latency/ttfp | 4399155900.5 ns | 4388441108 ns | 1.00 |
| latency/import | 3819111214.5 ns | 3806192215 ns | 1.00 |
| integration/volumerhs | 9439930.5 ns | 9441617.5 ns | 1.00 |
| integration/byval/slices=1 | 146076 ns | 145778 ns | 1.00 |
| integration/byval/slices=3 | 423209 ns | 423134 ns | 1.00 |
| integration/byval/reference | 144218 ns | 143890 ns | 1.00 |
| integration/byval/slices=2 | 284770 ns | 284514 ns | 1.00 |
| integration/cudadevrt | 102812 ns | 102578 ns | 1.00 |
| kernel/indexing | 13471.5 ns | 13667 ns | 0.99 |
| kernel/indexing_checked | 14259 ns | 14261 ns | 1.00 |
| kernel/occupancy | 681.0193548387097 ns | 664.75 ns | 1.02 |
| kernel/launch | 2221.5555555555557 ns | 2208.777777777778 ns | 1.01 |
| kernel/rand | 16362 ns | 18252 ns | 0.90 |
| array/reverse/1d | 18700.5 ns | 18706 ns | 1.00 |
| array/reverse/2dL_inplace | 66103 ns | 66090 ns | 1.00 |
| array/reverse/1dL | 69258 ns | 69365 ns | 1.00 |
| array/reverse/2d | 20940 ns | 20874 ns | 1.00 |
| array/reverse/1d_inplace | 8651.333333333334 ns | 10373 ns | 0.83 |
| array/reverse/2d_inplace | 10268 ns | 10322 ns | 0.99 |
| array/reverse/2dL | 72954 ns | 72995 ns | 1.00 |
| array/reverse/1dL_inplace | 66053.5 ns | 65923 ns | 1.00 |
| array/copy | 18724 ns | 18987 ns | 0.99 |
| array/iteration/findall/int | 149943.5 ns | 149956 ns | 1.00 |
| array/iteration/findall/bool | 132890 ns | 132890 ns | 1 |
| array/iteration/findfirst/int | 84061 ns | 84402 ns | 1.00 |
| array/iteration/findfirst/bool | 81951.5 ns | 82118 ns | 1.00 |
| array/iteration/scalar | 67980.5 ns | 68770 ns | 0.99 |
| array/iteration/logical | 201560.5 ns | 204009 ns | 0.99 |
| array/iteration/findmin/1d | 85760 ns | 86819.5 ns | 0.99 |
| array/iteration/findmin/2d | 117484 ns | 118061 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43796 ns | 43837 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1 | 42901 ns | 53019.5 ns | 0.81 |
| array/reductions/reduce/Int64/dims=2 | 59921 ns | 60078 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 87983 ns | 88042 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 85126 ns | 85281 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 35574 ns | 35957 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 42129 ns | 41971.5 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 57231 ns | 57207 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52087 ns | 52254 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 70549 ns | 70121.5 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 43545 ns | 43844 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 43249 ns | 42506 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=2 | 59882 ns | 59797 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 87920 ns | 87969 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 85248 ns | 85477 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 35402 ns | 35390 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 40235.5 ns | 39718.5 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 57151 ns | 56925 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52079.5 ns | 52147 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 70017 ns | 69617 ns | 1.01 |
| array/broadcast | 20624 ns | 20791 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11393 ns | 11363 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 218446 ns | 217254 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 284049 ns | 286261 ns | 0.99 |
| array/accumulate/Int64/1d | 119013 ns | 119453.5 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 79987 ns | 80552 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 156049 ns | 156859 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1694112 ns | 1706924 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 961481 ns | 962107.5 ns | 1.00 |
| array/accumulate/Float32/1d | 102076 ns | 101709 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 77112 ns | 77739 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 144441 ns | 144317 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1585891 ns | 1586304.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 657924.5 ns | 658932 ns | 1.00 |
| array/construct | 1303.5 ns | 1319 ns | 0.99 |
| array/random/randn/Float32 | 39636 ns | 44317.5 ns | 0.89 |
| array/random/randn!/Float32 | 31597 ns | 27470 ns | 1.15 |
| array/random/rand!/Int64 | 34479 ns | 27837 ns | 1.24 |
| array/random/rand!/Float32 | 8529.333333333334 ns | 8551.666666666666 ns | 1.00 |
| array/random/rand/Int64 | 37375 ns | 30127.5 ns | 1.24 |
| array/random/rand/Float32 | 13215 ns | 13103 ns | 1.01 |
| array/permutedims/4d | 52042 ns | 52084 ns | 1.00 |
| array/permutedims/2d | 52826 ns | 52953 ns | 1.00 |
| array/permutedims/3d | 53261.5 ns | 53412.5 ns | 1.00 |
| array/sorting/1d | 2735692 ns | 2736660 ns | 1.00 |
| array/sorting/by | 3307835 ns | 3305151.5 ns | 1.00 |
| array/sorting/2d | 1076070 ns | 1069633 ns | 1.01 |
| cuda/synchronization/stream/auto | 1018.1 ns | 1037.9 ns | 0.98 |
| cuda/synchronization/stream/nonblocking | 7654.6 ns | 7454.700000000001 ns | 1.03 |
| cuda/synchronization/stream/blocking | 831.952380952381 ns | 833.8589743589744 ns | 1.00 |
| cuda/synchronization/context/auto | 1154.8 ns | 1167.8 ns | 0.99 |
| cuda/synchronization/context/nonblocking | 7605.3 ns | 8361.2 ns | 0.91 |
| cuda/synchronization/context/blocking | 929.5555555555555 ns | 926.1538461538462 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt merged commit 441fec5 into JuliaGPU:master on Mar 30, 2026
2 checks passed
@giordano deleted the mg/precompilation-non-cuda branch on March 30, 2026 at 19:23