Homogenize default behavior of higher-dimensional similar to SparseArrays#3091
Homogenize default behavior of higher-dimensional similar to SparseArrays#3091alonsoC1s wants to merge 4 commits intoJuliaGPU:masterfrom
Conversation
|
Test failures related. |
|
@maleadt my bad. I think I got it this time |
There was a problem hiding this comment.
CUDA.jl Benchmarks
Details
| Benchmark suite | Current: 14bf466 | Previous: 6ccd4b4 | Ratio |
|---|---|---|---|
array/accumulate/Float32/1d |
101152 ns |
101723 ns |
0.99 |
array/accumulate/Float32/dims=1 |
76463 ns |
76608 ns |
1.00 |
array/accumulate/Float32/dims=1L |
1585076 ns |
1585294 ns |
1.00 |
array/accumulate/Float32/dims=2 |
143498.5 ns |
143948 ns |
1.00 |
array/accumulate/Float32/dims=2L |
657108 ns |
657945 ns |
1.00 |
array/accumulate/Int64/1d |
118738 ns |
118967 ns |
1.00 |
array/accumulate/Int64/dims=1 |
79870 ns |
79956 ns |
1.00 |
array/accumulate/Int64/dims=1L |
1705571 ns |
1694445 ns |
1.01 |
array/accumulate/Int64/dims=2 |
156206 ns |
156040 ns |
1.00 |
array/accumulate/Int64/dims=2L |
961516 ns |
961840 ns |
1.00 |
array/broadcast |
20611 ns |
20347 ns |
1.01 |
array/construct |
1285.6 ns |
1311.9 ns |
0.98 |
array/copy |
18763 ns |
18931 ns |
0.99 |
array/copyto!/cpu_to_gpu |
214811 ns |
215113 ns |
1.00 |
array/copyto!/gpu_to_cpu |
282848 ns |
283517 ns |
1.00 |
array/copyto!/gpu_to_gpu |
11383 ns |
11647 ns |
0.98 |
array/iteration/findall/bool |
130977 ns |
132615 ns |
0.99 |
array/iteration/findall/int |
147872 ns |
149623 ns |
0.99 |
array/iteration/findfirst/bool |
81414 ns |
82175 ns |
0.99 |
array/iteration/findfirst/int |
83740 ns |
84437 ns |
0.99 |
array/iteration/findmin/1d |
87551.5 ns |
87647 ns |
1.00 |
array/iteration/findmin/2d |
117015 ns |
117309 ns |
1.00 |
array/iteration/logical |
199736.5 ns |
203627.5 ns |
0.98 |
array/iteration/scalar |
66057.5 ns |
68729 ns |
0.96 |
array/permutedims/2d |
52506 ns |
52820 ns |
0.99 |
array/permutedims/3d |
52690 ns |
52914 ns |
1.00 |
array/permutedims/4d |
51768 ns |
51983 ns |
1.00 |
array/random/rand/Float32 |
13131 ns |
13104 ns |
1.00 |
array/random/rand/Int64 |
37428 ns |
37312 ns |
1.00 |
array/random/rand!/Float32 |
8547.666666666666 ns |
8603.333333333334 ns |
0.99 |
array/random/rand!/Int64 |
26583.5 ns |
34156 ns |
0.78 |
array/random/randn/Float32 |
41112.5 ns |
38723.5 ns |
1.06 |
array/random/randn!/Float32 |
27093 ns |
31520 ns |
0.86 |
array/reductions/mapreduce/Float32/1d |
34955 ns |
35427 ns |
0.99 |
array/reductions/mapreduce/Float32/dims=1 |
49142.5 ns |
49562 ns |
0.99 |
array/reductions/mapreduce/Float32/dims=1L |
51817 ns |
51766 ns |
1.00 |
array/reductions/mapreduce/Float32/dims=2 |
56653 ns |
56838 ns |
1.00 |
array/reductions/mapreduce/Float32/dims=2L |
69781 ns |
69604 ns |
1.00 |
array/reductions/mapreduce/Int64/1d |
42998 ns |
43423 ns |
0.99 |
array/reductions/mapreduce/Int64/dims=1 |
42561.5 ns |
44694.5 ns |
0.95 |
array/reductions/mapreduce/Int64/dims=1L |
87664 ns |
87805 ns |
1.00 |
array/reductions/mapreduce/Int64/dims=2 |
59599.5 ns |
60051.5 ns |
0.99 |
array/reductions/mapreduce/Int64/dims=2L |
84819.5 ns |
85186 ns |
1.00 |
array/reductions/reduce/Float32/1d |
35358 ns |
35458 ns |
1.00 |
array/reductions/reduce/Float32/dims=1 |
39982 ns |
46307.5 ns |
0.86 |
array/reductions/reduce/Float32/dims=1L |
52029 ns |
52046 ns |
1.00 |
array/reductions/reduce/Float32/dims=2 |
57206.5 ns |
57117 ns |
1.00 |
array/reductions/reduce/Float32/dims=2L |
70193 ns |
70127.5 ns |
1.00 |
array/reductions/reduce/Int64/1d |
43311.5 ns |
41220 ns |
1.05 |
array/reductions/reduce/Int64/dims=1 |
52670.5 ns |
51895 ns |
1.01 |
array/reductions/reduce/Int64/dims=1L |
87719 ns |
87739 ns |
1.00 |
array/reductions/reduce/Int64/dims=2 |
59882 ns |
59630 ns |
1.00 |
array/reductions/reduce/Int64/dims=2L |
85070 ns |
84743.5 ns |
1.00 |
array/reverse/1d |
18364 ns |
18349 ns |
1.00 |
array/reverse/1dL |
68923 ns |
68960 ns |
1.00 |
array/reverse/1dL_inplace |
65903 ns |
65909 ns |
1.00 |
array/reverse/1d_inplace |
10248.333333333334 ns |
8540.833333333332 ns |
1.20 |
array/reverse/2d |
20486 ns |
20881 ns |
0.98 |
array/reverse/2dL |
72567 ns |
72996 ns |
0.99 |
array/reverse/2dL_inplace |
65929 ns |
65926 ns |
1.00 |
array/reverse/2d_inplace |
10500 ns |
10076 ns |
1.04 |
array/sorting/1d |
2722785 ns |
2735188.5 ns |
1.00 |
array/sorting/2d |
1068162.5 ns |
1069206 ns |
1.00 |
array/sorting/by |
3288624 ns |
3304125.5 ns |
1.00 |
cuda/synchronization/context/auto |
1142.6 ns |
1176.2 ns |
0.97 |
cuda/synchronization/context/blocking |
928.9117647058823 ns |
924.5869565217391 ns |
1.00 |
cuda/synchronization/context/nonblocking |
8313.599999999999 ns |
6942.1 ns |
1.20 |
cuda/synchronization/stream/auto |
987.1875 ns |
999.9375 ns |
0.99 |
cuda/synchronization/stream/blocking |
829.2358490566038 ns |
787.7961165048544 ns |
1.05 |
cuda/synchronization/stream/nonblocking |
8391.6 ns |
7168.6 ns |
1.17 |
integration/byval/reference |
143925 ns |
143982 ns |
1.00 |
integration/byval/slices=1 |
145800 ns |
145868 ns |
1.00 |
integration/byval/slices=2 |
284394 ns |
284528 ns |
1.00 |
integration/byval/slices=3 |
423048 ns |
422970 ns |
1.00 |
integration/cudadevrt |
102533 ns |
102612 ns |
1.00 |
integration/volumerhs |
9439972 ns |
9440461 ns |
1.00 |
kernel/indexing |
13275 ns |
13181 ns |
1.01 |
kernel/indexing_checked |
14027.5 ns |
14081 ns |
1.00 |
kernel/launch |
2183 ns |
2150.777777777778 ns |
1.01 |
kernel/occupancy |
697.3758389261745 ns |
672 ns |
1.04 |
kernel/rand |
14341 ns |
14396 ns |
1.00 |
latency/import |
3828591693.5 ns |
3814290062.5 ns |
1.00 |
latency/precompile |
4601504962.5 ns |
4590207670.5 ns |
1.00 |
latency/ttfp |
4414741861.5 ns |
4409319020 ns |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
|
I found what was making the tests fail. I think there are a few workarounds, so I'd like some opinions on which is more suitable Basically, I defined the signature A[:]I tried easy fixes with limited success. A few possibilities to do this correctly would be
I'm not sure what the best way to proceed would be @maleadt |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3091 +/- ##
==========================================
- Coverage 90.43% 90.40% -0.04%
==========================================
Files 141 141
Lines 12025 12028 +3
==========================================
- Hits 10875 10874 -1
- Misses 1150 1154 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Can we copy SparseArrays.jl and only provide |
|
If I'm understanding correctly, this is what I tried to implement at first and what ended up causing problems down the line in Base. The fallback with The problem isn't catching the case SparseArrays catches with similar(::AbstractGPUSparseArray, ::Type{T}, dims::Dims{1})ourselves would need to somehow call the much more general method for AbstractArrays with the right arguments. The ideal case would be to dispatch on |
| Base.similar(Mat::CuSparseMatrixCSR, T::Type, dims::Tuple{Int, Int}) = similar(Mat, T, dims...) | ||
| Base.similar(Mat::CuSparseMatrixCOO, T::Type, dims::Tuple{Int, Int}) = similar(Mat, T, dims...) | ||
| # The next one would overwrite `similar` for dims::Tuple{Int}, which breaks linearization | ||
| # Base.similar(::CuSparseMatrix, T::Type, dims::NTuple{N, Int}) where N = CuArray{T}(undef, dims) |
There was a problem hiding this comment.
This line works effectively the same as the fallback from SparseArrays. It uses NTuple instead of Dims{N}, but Dims{N} would have the same issue AFAICT
As mentioned in #3061 , when calling
similarwith CUSPARSE arguments and dimensions higher than 2, the fallback CPU method is called. As a result, calls tosimilarwith GPU (sparse) matrices sometimes return host arrays, causing scalar indexing issues down the lineThis PR introduces a fallback to
similarthat creates un-initializedCuArrays when there are 3 or more dimensions given as arguments, which is similar to what Base and SparseArrays doImportant note: I wasn't able to fully test this locally due to some issues instantiating the repo environment with Pkg