
Homogenize default behavior of higher-dimensional `similar` to SparseArrays #3091

Open
alonsoC1s wants to merge 4 commits into JuliaGPU:master from alonsoC1s:issue-3061

Conversation

@alonsoC1s

As mentioned in #3061, when calling `similar` with CUSPARSE arguments and more than two dimensions, the fallback CPU method is called. As a result, calls to `similar` with GPU (sparse) matrices sometimes return host arrays, causing scalar-indexing issues down the line.

This PR introduces a fallback for `similar` that creates uninitialized `CuArray`s when three or more dimensions are given as arguments, which is similar to what Base and SparseArrays do.

Important note: I wasn't able to fully test this locally due to some issues instantiating the repo environment with Pkg.

@maleadt
Member

maleadt commented Apr 10, 2026

Test failures related.

@alonsoC1s
Author

@maleadt my bad. I think I got it this time.


github-actions bot left a comment


CUDA.jl Benchmarks

Details
Benchmark suite Current: 14bf466 Previous: 6ccd4b4 Ratio
array/accumulate/Float32/1d 101152 ns 101723 ns 0.99
array/accumulate/Float32/dims=1 76463 ns 76608 ns 1.00
array/accumulate/Float32/dims=1L 1585076 ns 1585294 ns 1.00
array/accumulate/Float32/dims=2 143498.5 ns 143948 ns 1.00
array/accumulate/Float32/dims=2L 657108 ns 657945 ns 1.00
array/accumulate/Int64/1d 118738 ns 118967 ns 1.00
array/accumulate/Int64/dims=1 79870 ns 79956 ns 1.00
array/accumulate/Int64/dims=1L 1705571 ns 1694445 ns 1.01
array/accumulate/Int64/dims=2 156206 ns 156040 ns 1.00
array/accumulate/Int64/dims=2L 961516 ns 961840 ns 1.00
array/broadcast 20611 ns 20347 ns 1.01
array/construct 1285.6 ns 1311.9 ns 0.98
array/copy 18763 ns 18931 ns 0.99
array/copyto!/cpu_to_gpu 214811 ns 215113 ns 1.00
array/copyto!/gpu_to_cpu 282848 ns 283517 ns 1.00
array/copyto!/gpu_to_gpu 11383 ns 11647 ns 0.98
array/iteration/findall/bool 130977 ns 132615 ns 0.99
array/iteration/findall/int 147872 ns 149623 ns 0.99
array/iteration/findfirst/bool 81414 ns 82175 ns 0.99
array/iteration/findfirst/int 83740 ns 84437 ns 0.99
array/iteration/findmin/1d 87551.5 ns 87647 ns 1.00
array/iteration/findmin/2d 117015 ns 117309 ns 1.00
array/iteration/logical 199736.5 ns 203627.5 ns 0.98
array/iteration/scalar 66057.5 ns 68729 ns 0.96
array/permutedims/2d 52506 ns 52820 ns 0.99
array/permutedims/3d 52690 ns 52914 ns 1.00
array/permutedims/4d 51768 ns 51983 ns 1.00
array/random/rand/Float32 13131 ns 13104 ns 1.00
array/random/rand/Int64 37428 ns 37312 ns 1.00
array/random/rand!/Float32 8547.666666666666 ns 8603.333333333334 ns 0.99
array/random/rand!/Int64 26583.5 ns 34156 ns 0.78
array/random/randn/Float32 41112.5 ns 38723.5 ns 1.06
array/random/randn!/Float32 27093 ns 31520 ns 0.86
array/reductions/mapreduce/Float32/1d 34955 ns 35427 ns 0.99
array/reductions/mapreduce/Float32/dims=1 49142.5 ns 49562 ns 0.99
array/reductions/mapreduce/Float32/dims=1L 51817 ns 51766 ns 1.00
array/reductions/mapreduce/Float32/dims=2 56653 ns 56838 ns 1.00
array/reductions/mapreduce/Float32/dims=2L 69781 ns 69604 ns 1.00
array/reductions/mapreduce/Int64/1d 42998 ns 43423 ns 0.99
array/reductions/mapreduce/Int64/dims=1 42561.5 ns 44694.5 ns 0.95
array/reductions/mapreduce/Int64/dims=1L 87664 ns 87805 ns 1.00
array/reductions/mapreduce/Int64/dims=2 59599.5 ns 60051.5 ns 0.99
array/reductions/mapreduce/Int64/dims=2L 84819.5 ns 85186 ns 1.00
array/reductions/reduce/Float32/1d 35358 ns 35458 ns 1.00
array/reductions/reduce/Float32/dims=1 39982 ns 46307.5 ns 0.86
array/reductions/reduce/Float32/dims=1L 52029 ns 52046 ns 1.00
array/reductions/reduce/Float32/dims=2 57206.5 ns 57117 ns 1.00
array/reductions/reduce/Float32/dims=2L 70193 ns 70127.5 ns 1.00
array/reductions/reduce/Int64/1d 43311.5 ns 41220 ns 1.05
array/reductions/reduce/Int64/dims=1 52670.5 ns 51895 ns 1.01
array/reductions/reduce/Int64/dims=1L 87719 ns 87739 ns 1.00
array/reductions/reduce/Int64/dims=2 59882 ns 59630 ns 1.00
array/reductions/reduce/Int64/dims=2L 85070 ns 84743.5 ns 1.00
array/reverse/1d 18364 ns 18349 ns 1.00
array/reverse/1dL 68923 ns 68960 ns 1.00
array/reverse/1dL_inplace 65903 ns 65909 ns 1.00
array/reverse/1d_inplace 10248.333333333334 ns 8540.833333333332 ns 1.20
array/reverse/2d 20486 ns 20881 ns 0.98
array/reverse/2dL 72567 ns 72996 ns 0.99
array/reverse/2dL_inplace 65929 ns 65926 ns 1.00
array/reverse/2d_inplace 10500 ns 10076 ns 1.04
array/sorting/1d 2722785 ns 2735188.5 ns 1.00
array/sorting/2d 1068162.5 ns 1069206 ns 1.00
array/sorting/by 3288624 ns 3304125.5 ns 1.00
cuda/synchronization/context/auto 1142.6 ns 1176.2 ns 0.97
cuda/synchronization/context/blocking 928.9117647058823 ns 924.5869565217391 ns 1.00
cuda/synchronization/context/nonblocking 8313.599999999999 ns 6942.1 ns 1.20
cuda/synchronization/stream/auto 987.1875 ns 999.9375 ns 0.99
cuda/synchronization/stream/blocking 829.2358490566038 ns 787.7961165048544 ns 1.05
cuda/synchronization/stream/nonblocking 8391.6 ns 7168.6 ns 1.17
integration/byval/reference 143925 ns 143982 ns 1.00
integration/byval/slices=1 145800 ns 145868 ns 1.00
integration/byval/slices=2 284394 ns 284528 ns 1.00
integration/byval/slices=3 423048 ns 422970 ns 1.00
integration/cudadevrt 102533 ns 102612 ns 1.00
integration/volumerhs 9439972 ns 9440461 ns 1.00
kernel/indexing 13275 ns 13181 ns 1.01
kernel/indexing_checked 14027.5 ns 14081 ns 1.00
kernel/launch 2183 ns 2150.777777777778 ns 1.01
kernel/occupancy 697.3758389261745 ns 672 ns 1.04
kernel/rand 14341 ns 14396 ns 1.00
latency/import 3828591693.5 ns 3814290062.5 ns 1.00
latency/precompile 4601504962.5 ns 4590207670.5 ns 1.00
latency/ttfp 4414741861.5 ns 4409319020 ns 1.00

This comment was automatically generated by a workflow using github-action-benchmark.

@alonsoC1s
Author

I found what was making the tests fail. I think there are a few workarounds, so I'd like some opinions on which is most suitable.

Basically, I defined the signature `Base.similar(::CuSparseMatrix, T::Type, dims::NTuple{N, Int}) where N`, thinking the cases of tuples of length 1 and 2 were already covered, so that this would only dispatch for 3-tuples or longer. However, it ended up being called instead of the more general `Base.similar(A::AbstractArray, I::Tuple{Int})`, which is used inside `Base._unsafe_getindex`. Effectively, this broke the implicit linearization that happens when you do:

A[:]
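That interception can be reproduced on the CPU with a small stand-in type (`Wrap` is hypothetical, for illustration only; no CUDA involved):

```julia
# Minimal AbstractMatrix wrapper standing in for a CuSparseMatrix:
struct Wrap{T} <: AbstractMatrix{T}
    data::Matrix{T}
end
Base.size(A::Wrap) = size(A.data)
Base.getindex(A::Wrap, i::Int, j::Int) = A.data[i, j]

W = Wrap([1.0 2.0; 3.0 4.0])
W[:]  # works: handled by Base's generic 1-tuple `similar` path

# An over-broad method like the one described also captures the 1-tuple case:
Base.similar(A::Wrap, ::Type{T}, dims::NTuple{N, Int}) where {T, N} =
    error("intercepted a $(N)-tuple")

# true: `A[:]` now hits the new method instead of the generic one
intercepted = try; W[:]; false; catch; true; end
```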

I tried easy fixes with limited success. A few possibilities to do this correctly would be:

  1. Explicitly figure out the linearization so that it works the way Base expects
  2. Only patch the fallback to `Base.similar` when 3 dimensions (as opposed to the usual 2) are given
  3. Patch the fallback for all `N >= 3` using generated functions or something similar

I'm not sure what the best way to proceed would be, @maleadt.
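For contrast, option 2 above amounts to a single extra method that matches only 3-tuples (a hedged sketch, not the PR's actual code; 4-D and higher would still fall through to the CPU fallback):

```julia
# Option 2, sketched: only the explicit 3-tuple case produces a dense GPU
# array; 1- and 2-tuples keep the existing sparse methods, and the generic
# `AbstractArray` machinery (e.g. linearization via `A[:]`) is untouched.
Base.similar(::CuSparseMatrix, T::Type, dims::Dims{3}) = CuArray{T}(undef, dims)
```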

@codecov

codecov bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.40%. Comparing base (8a4c30c) to head (14bf466).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
lib/cusparse/src/array.jl 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3091      +/-   ##
==========================================
- Coverage   90.43%   90.40%   -0.04%     
==========================================
  Files         141      141              
  Lines       12025    12028       +3     
==========================================
- Hits        10875    10874       -1     
- Misses       1150     1154       +4     

☔ View full report in Codecov by Sentry.


@maleadt
Member

maleadt commented Apr 14, 2026

Can we copy SparseArrays.jl and provide `similar` in CUSPARSE only for 1/2D? https://github.com/JuliaSparse/SparseArrays.jl/blob/3ab784c88070f1e8d4c3e424c9297547a956f63f/src/sparsematrix.jl#L723-L724
And then provide a `similar(a::AbstractGPUSparseArray, ::Type{T}, dims::Dims{N})` fallback that allocates a `CuArray`?

@alonsoC1s
Author

If I'm understanding correctly, this is what I tried to implement at first, and what ended up causing problems down the line in Base. The fallback with `Dims{N}` (I used `NTuple` directly, but `Dims` is better) ended up inadvertently covering the case where `similar` is called for a CUSPARSE array with a 1-tuple as `dims`. That particular method shouldn't be overloaded, as Base relies on it when doing implicit linearization.

The problem isn't catching the case SparseArrays catches with `Dims{N}`; the problem is that doing so would take precedence over another method and break lots of things. Defining

similar(::AbstractGPUSparseArray, ::Type{T}, dims::Dims{1})

ourselves would mean somehow calling the much more general method for `AbstractArray`s with the right arguments. The ideal case would be to dispatch on `Dims{N}` if and only if `N >= 3`, while leaving `N = 1` alone.
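One way to get the "`Dims{N}` if and only if `N >= 3`" behavior without generated functions is a runtime branch that defers small `N` back to the generic `AbstractArray` method via `invoke`. A CPU-only sketch with a hypothetical stand-in type (`Array` standing in for `CuArray`):

```julia
struct Sp{T} <: AbstractMatrix{T}  # stand-in for a CuSparseMatrix
    data::Matrix{T}
end
Base.size(A::Sp) = size(A.data)
Base.getindex(A::Sp, i::Int, j::Int) = A.data[i, j]

function Base.similar(A::Sp, ::Type{T}, dims::Dims{N}) where {T, N}
    if N >= 3
        # would be `CuArray{T}(undef, dims)` in CUSPARSE
        Array{T}(undef, dims)
    else
        # defer 1-/2-tuples to the generic method, so e.g. `A[:]` keeps working
        invoke(similar, Tuple{AbstractArray, Type{T}, Dims{N}}, A, T, dims)
    end
end

S = Sp([1.0 2.0; 3.0 4.0])
v = S[:]                            # linearization still works (1-tuple deferred)
B = similar(S, Float32, (2, 2, 2))  # dense, uninitialized, 3-dimensional
```

Since `N` is a compile-time constant at each call site, the branch is resolved during specialization and costs nothing at runtime.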

Comment thread on lib/cusparse/src/array.jl:
Base.similar(Mat::CuSparseMatrixCSR, T::Type, dims::Tuple{Int, Int}) = similar(Mat, T, dims...)
Base.similar(Mat::CuSparseMatrixCOO, T::Type, dims::Tuple{Int, Int}) = similar(Mat, T, dims...)
# The next one would overwrite `similar` for dims::Tuple{Int}, which breaks linearization
# Base.similar(::CuSparseMatrix, T::Type, dims::NTuple{N, Int}) where N = CuArray{T}(undef, dims)
Author


This line works effectively the same as the fallback from SparseArrays. It uses `NTuple` instead of `Dims{N}`, but `Dims{N}` would have the same issue AFAICT.

