Skip to content

Fix Int32 overflow in !range metadata for large intrinsics#3119

Open
rainerrodrigues wants to merge 6 commits intoJuliaGPU:masterfrom
rainerrodrigues:fix-range-metadata
Open

Fix Int32 overflow in !range metadata for large intrinsics#3119
rainerrodrigues wants to merge 6 commits intoJuliaGPU:masterfrom
rainerrodrigues:fix-range-metadata

Conversation

@rainerrodrigues
Copy link
Copy Markdown

Fixes #3024

Summary

This PR fixes a bug where !range metadata on LLVM intrinsics was causing an Int32 overflow, which resulted in negative upper bounds and prevented LLVM from optimizing away standard bounds checks.

The Cause

LLVM !range metadata expects a half-open interval [start, stop). In CUDACore/src/device/intrinsics/indexing.jl, the exclusive upper bound is calculated as (range.stop + 1) % Int32.
For intrinsics with large maximums, like gridDim_x (2^31 - 1), adding 1 overflows the signed 32-bit integer, resulting in -2147483648. LLVM interprets this as a wrap-around range, effectively destroying bounds-check optimizations.

The Fix

Added a safeguard in _index to only attach the !range metadata if range.stop < typemax(Int32). Small-bounded intrinsics (like blockDim) keep their metadata, while large-bounded ones (like gridDim) safely omit it to prevent the overflow.

Testing

Added a unit test in test/core/device/intrinsics.jl using CUDA.code_llvm(..., dump_module=true, raw=true) to verify that:

  1. !range is correctly emitted for blockDim().x.
  2. !range is safely omitted for gridDim().x.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CUDA.jl Benchmarks

Details
Benchmark suite Current: 7e44cec Previous: d627b3e Ratio
array/accumulate/Float32/1d 101339 ns 100853 ns 1.00
array/accumulate/Float32/dims=1 77260 ns 76362 ns 1.01
array/accumulate/Float32/dims=1L 1586550 ns 1585357 ns 1.00
array/accumulate/Float32/dims=2 144263 ns 143506 ns 1.01
array/accumulate/Float32/dims=2L 658401 ns 657416 ns 1.00
array/accumulate/Int64/1d 118883.5 ns 118459.5 ns 1.00
array/accumulate/Int64/dims=1 80473 ns 79599 ns 1.01
array/accumulate/Int64/dims=1L 1695761.5 ns 1694409 ns 1.00
array/accumulate/Int64/dims=2 156095 ns 155450.5 ns 1.00
array/accumulate/Int64/dims=2L 962295.5 ns 960958 ns 1.00
array/broadcast 20701 ns 20282 ns 1.02
array/construct 1307.7 ns 1328.2 ns 0.98
array/copy 18275 ns 17911 ns 1.02
array/copyto!/cpu_to_gpu 214231.5 ns 212984 ns 1.01
array/copyto!/gpu_to_cpu 283233 ns 280259 ns 1.01
array/copyto!/gpu_to_gpu 11021 ns 10684 ns 1.03
array/iteration/findall/bool 134462 ns 134327 ns 1.00
array/iteration/findall/int 149883 ns 149222 ns 1.00
array/iteration/findfirst/bool 81431 ns 81163 ns 1.00
array/iteration/findfirst/int 83343 ns 82984 ns 1.00
array/iteration/findmin/1d 87141.5 ns 85430.5 ns 1.02
array/iteration/findmin/2d 117107 ns 116921 ns 1.00
array/iteration/logical 199980 ns 196920 ns 1.02
array/iteration/scalar 68578 ns 67453 ns 1.02
array/permutedims/2d 52164 ns 52024.5 ns 1.00
array/permutedims/3d 52715 ns 52690 ns 1.00
array/permutedims/4d 53830.5 ns 51921.5 ns 1.04
array/random/rand/Float32 12904 ns 12625 ns 1.02
array/random/rand/Int64 24944 ns 24502 ns 1.02
array/random/rand!/Float32 8426.333333333334 ns 8842 ns 0.95
array/random/rand!/Int64 21569 ns 21520 ns 1.00
array/random/randn/Float32 37511 ns 38339 ns 0.98
array/random/randn!/Float32 30819.5 ns 30558 ns 1.01
array/reductions/mapreduce/Float32/1d 34615 ns 34493 ns 1.00
array/reductions/mapreduce/Float32/dims=1 49961 ns 39238 ns 1.27
array/reductions/mapreduce/Float32/dims=1L 51391 ns 51115 ns 1.01
array/reductions/mapreduce/Float32/dims=2 58098 ns 56362 ns 1.03
array/reductions/mapreduce/Float32/dims=2L 67940 ns 68855 ns 0.99
array/reductions/mapreduce/Int64/1d 42975.5 ns 41840 ns 1.03
array/reductions/mapreduce/Int64/dims=1 42620 ns 43041.5 ns 0.99
array/reductions/mapreduce/Int64/dims=1L 87221 ns 87062 ns 1.00
array/reductions/mapreduce/Int64/dims=2 60865 ns 59400.5 ns 1.02
array/reductions/mapreduce/Int64/dims=2L 84250 ns 84650 ns 1.00
array/reductions/reduce/Float32/1d 35253.5 ns 34748 ns 1.01
array/reductions/reduce/Float32/dims=1 48912 ns 42567 ns 1.15
array/reductions/reduce/Float32/dims=1L 51580 ns 51269 ns 1.01
array/reductions/reduce/Float32/dims=2 58356 ns 56421.5 ns 1.03
array/reductions/reduce/Float32/dims=2L 68854 ns 69423 ns 0.99
array/reductions/reduce/Int64/1d 42724 ns 41786 ns 1.02
array/reductions/reduce/Int64/dims=1 42554 ns 41701 ns 1.02
array/reductions/reduce/Int64/dims=1L 87190 ns 87004 ns 1.00
array/reductions/reduce/Int64/dims=2 60933 ns 59173 ns 1.03
array/reductions/reduce/Int64/dims=2L 84180 ns 84474 ns 1.00
array/reverse/1d 17974 ns 17689 ns 1.02
array/reverse/1dL 68610 ns 68239 ns 1.01
array/reverse/1dL_inplace 65750 ns 65528 ns 1.00
array/reverse/1d_inplace 10291.666666666666 ns 8294.666666666666 ns 1.24
array/reverse/2d 20772 ns 20296 ns 1.02
array/reverse/2dL 72889 ns 72502 ns 1.01
array/reverse/2dL_inplace 65846 ns 65556 ns 1.00
array/reverse/2d_inplace 10049 ns 9743 ns 1.03
array/sorting/1d 2735535 ns 2734808 ns 1.00
array/sorting/2d 1069343 ns 1071711 ns 1.00
array/sorting/by 3304052 ns 3304331 ns 1.00
cuda/synchronization/context/auto 1171.7 ns 1162.7 ns 1.01
cuda/synchronization/context/blocking 923.5 ns 928.3666666666667 ns 0.99
cuda/synchronization/context/nonblocking 7085.6 ns 6978.1 ns 1.02
cuda/synchronization/stream/auto 998.9642857142858 ns 994.2105263157895 ns 1.00
cuda/synchronization/stream/blocking 804.0309278350516 ns 840.6543209876543 ns 0.96
cuda/synchronization/stream/nonblocking 6989.2 ns 7241.299999999999 ns 0.97
integration/byval/reference 143828 ns 143552.5 ns 1.00
integration/byval/slices=1 145614 ns 145385 ns 1.00
integration/byval/slices=2 284608 ns 284202.5 ns 1.00
integration/byval/slices=3 423154 ns 422620 ns 1.00
integration/cudadevrt 102393 ns 102213 ns 1.00
integration/volumerhs 23422519 ns 23474593 ns 1.00
kernel/indexing 13226 ns 12944 ns 1.02
kernel/indexing_checked 13872 ns 13691.5 ns 1.01
kernel/launch 2106.8888888888887 ns 2106.5555555555557 ns 1.00
kernel/occupancy 664.8993710691824 ns 665.54375 ns 1.00
kernel/rand 14214 ns 13998 ns 1.02
latency/import 3821530614.5 ns 3829156614.5 ns 1.00
latency/precompile 4585635970.5 ns 4603348569 ns 1.00
latency/ttfp 4391840056 ns 4403053778.5 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt
Copy link
Copy Markdown
Member

maleadt commented Apr 30, 2026

CI failures look related.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 1, 2026

Codecov Report

❌ Patch coverage is 0% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.43%. Comparing base (d627b3e) to head (7e44cec).

Files with missing lines Patch % Lines
lib/cusparse/src/array.jl 0.00% 23 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3119      +/-   ##
==========================================
+ Coverage   16.41%   16.43%   +0.01%     
==========================================
  Files         123      123              
  Lines        9678     9670       -8     
==========================================
  Hits         1589     1589              
+ Misses       8089     8081       -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Indexing range metadata is set up wrong

2 participants