Added Base.similar methods for CuSparseMatrixCOO and BSR #3114
Open
rainerrodrigues wants to merge 4 commits into JuliaGPU:master
kshyatt (Member) reviewed Apr 21, 2026
Also, can some tests be added?
Contributor
CUDA.jl Benchmarks
| Benchmark suite | Current: f08a059 | Previous: e4ac81a | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 101247 ns | 101073 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 76743 ns | 76196 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 1585917 ns | 1585166 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 143733 ns | 143846 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 658278 ns | 657343 ns | 1.00 |
| array/accumulate/Int64/1d | 118799 ns | 118428 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 80961 ns | 79813 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 1707513 ns | 1706332.5 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 156699.5 ns | 155958.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 962521 ns | 961689 ns | 1.00 |
| array/broadcast | 20622 ns | 20223 ns | 1.02 |
| array/construct | 1249.6 ns | 1268 ns | 0.99 |
| array/copy | 18153 ns | 18010.5 ns | 1.01 |
| array/copyto!/cpu_to_gpu | 214356 ns | 214386 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 283694 ns | 282599 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 10887 ns | 10725 ns | 1.02 |
| array/iteration/findall/bool | 135104 ns | 133957 ns | 1.01 |
| array/iteration/findall/int | 150112 ns | 148817 ns | 1.01 |
| array/iteration/findfirst/bool | 81558 ns | 80695 ns | 1.01 |
| array/iteration/findfirst/int | 82867 ns | 82681 ns | 1.00 |
| array/iteration/findmin/1d | 83372 ns | 85081 ns | 0.98 |
| array/iteration/findmin/2d | 116997 ns | 116308 ns | 1.01 |
| array/iteration/logical | 200714.5 ns | 196867 ns | 1.02 |
| array/iteration/scalar | 66145 ns | 66869 ns | 0.99 |
| array/permutedims/2d | 52371.5 ns | 51940.5 ns | 1.01 |
| array/permutedims/3d | 52407 ns | 52252 ns | 1.00 |
| array/permutedims/4d | 52082.5 ns | 51176 ns | 1.02 |
| array/random/rand/Float32 | 13014 ns | 13404 ns | 0.97 |
| array/random/rand/Int64 | 25431 ns | 24711 ns | 1.03 |
| array/random/rand!/Float32 | 10177.666666666666 ns | 10187 ns | 1.00 |
| array/random/rand!/Int64 | 21989 ns | 21588 ns | 1.02 |
| array/random/randn/Float32 | 37480 ns | 43197 ns | 0.87 |
| array/random/randn!/Float32 | 30651 ns | 30898 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 33980 ns | 34650 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1 | 40495.5 ns | 39535.5 ns | 1.02 |
| array/reductions/mapreduce/Float32/dims=1L | 51393.5 ns | 51174 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56615 ns | 56317 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 69345 ns | 69320 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 42363 ns | 42198 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 42214 ns | 41733 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 87299 ns | 86905 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59569 ns | 59221 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2L | 84785.5 ns | 84434 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34384 ns | 34198.5 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1 | 40444 ns | 39152.5 ns | 1.03 |
| array/reductions/reduce/Float32/dims=1L | 51476 ns | 51208.5 ns | 1.01 |
| array/reductions/reduce/Float32/dims=2 | 56764 ns | 56419 ns | 1.01 |
| array/reductions/reduce/Float32/dims=2L | 69652 ns | 69565 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 42501 ns | 42368 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1 | 42020 ns | 50017 ns | 0.84 |
| array/reductions/reduce/Int64/dims=1L | 87188 ns | 86919 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 59606.5 ns | 59687 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 84660.5 ns | 84484 ns | 1.00 |
| array/reverse/1d | 18051 ns | 17716 ns | 1.02 |
| array/reverse/1dL | 68725 ns | 68268 ns | 1.01 |
| array/reverse/1dL_inplace | 65738 ns | 65642 ns | 1.00 |
| array/reverse/1d_inplace | 8549.333333333334 ns | 10197.333333333334 ns | 0.84 |
| array/reverse/2d | 21092 ns | 20523 ns | 1.03 |
| array/reverse/2dL | 73335 ns | 72523 ns | 1.01 |
| array/reverse/2dL_inplace | 65750 ns | 65706 ns | 1.00 |
| array/reverse/2d_inplace | 9950 ns | 9831 ns | 1.01 |
| array/sorting/1d | 2736444 ns | 2735407.5 ns | 1.00 |
| array/sorting/2d | 1069601 ns | 1068528 ns | 1.00 |
| array/sorting/by | 3306636 ns | 3304139 ns | 1.00 |
| cuda/synchronization/context/auto | 1131.6 ns | 1165.7 ns | 0.97 |
| cuda/synchronization/context/blocking | 920.377358490566 ns | 876.8679245283018 ns | 1.05 |
| cuda/synchronization/context/nonblocking | 7005.6 ns | 6853.1 ns | 1.02 |
| cuda/synchronization/stream/auto | 981.1875 ns | 1035.7142857142858 ns | 0.95 |
| cuda/synchronization/stream/blocking | 802.4257425742575 ns | 789.9066666666666 ns | 1.02 |
| cuda/synchronization/stream/nonblocking | 7178.8 ns | 7513.5 ns | 0.96 |
| integration/byval/reference | 143915 ns | 143670 ns | 1.00 |
| integration/byval/slices=1 | 145772 ns | 145632 ns | 1.00 |
| integration/byval/slices=2 | 284572 ns | 284397 ns | 1.00 |
| integration/byval/slices=3 | 422978 ns | 423006 ns | 1.00 |
| integration/cudadevrt | 102395 ns | 102385 ns | 1.00 |
| integration/volumerhs | 23468528 ns | 23431934.5 ns | 1.00 |
| kernel/indexing | 13234 ns | 13020 ns | 1.02 |
| kernel/indexing_checked | 13957 ns | 13828 ns | 1.01 |
| kernel/launch | 2182.3333333333335 ns | 2128.1111111111113 ns | 1.03 |
| kernel/occupancy | 679.6064516129032 ns | 701.7571428571429 ns | 0.97 |
| kernel/rand | 16515 ns | 14086 ns | 1.17 |
| latency/import | 77746111924 ns | 3826333658 ns | 20.32 |
| latency/precompile | 8301636095.5 ns | 4608074622.5 ns | 1.80 |
| latency/ttfp | 78328318500 ns | 4404025901.5 ns | 17.79 |
This comment was automatically generated by workflow using github-action-benchmark.
kshyatt reviewed Apr 21, 2026
rainerrodrigues (Author) commented Apr 22, 2026
@kshyatt Hi, can you check whether this is suitable and extensive enough for testing?
This PR adds the missing Base.similar methods for CuSparseMatrixCOO and CuSparseMatrixBSR, allowing them to fall back gracefully without converting to dense CPU arrays.
Fixes #3061
Fixes #3055