Conversation
|
Your PR requires formatting changes to meet the project's style guidelines. Click here to view the suggested changes.diff --git a/test/libraries/cusolver/dense.jl b/test/libraries/cusolver/dense.jl
index 11f771a9b..2ae9d19c0 100644
--- a/test/libraries/cusolver/dense.jl
+++ b/test/libraries/cusolver/dense.jl
@@ -265,7 +265,7 @@ k = 1
A += A'
d_A = CuArray(A)
Eig = eigen(LinearAlgebra.Hermitian(A))
- d_eig = eigen(d_A)
+ d_eig = eigen(d_A)
@test Eig.values ≈ collect(d_eig.values)
d_eig = eigen(LinearAlgebra.Hermitian(d_A))
@test Eig.values ≈ collect(d_eig.values)
@@ -535,7 +535,7 @@ k = 1
B = rand(elty, n)
d_B = CuArray(B)
@test collect(d_M \ d_B) ≈ M \ B
- @test_throws DimensionMismatch("arguments must have the same number of rows") d_M \ CUDA.ones(elty, n+1)
+ @test_throws DimensionMismatch("arguments must have the same number of rows") d_M \ CUDA.ones(elty, n + 1)
A = rand(elty, m, n) # A is a matrix and B,C is a vector
d_A = CuArray(A)
M = qr(A)
diff --git a/test/libraries/cusolver/dense_generic.jl b/test/libraries/cusolver/dense_generic.jl
index af851d314..2328ede47 100644
--- a/test/libraries/cusolver/dense_generic.jl
+++ b/test/libraries/cusolver/dense_generic.jl
@@ -96,7 +96,7 @@ p = 5
@testset "pivoting = $pivoting" for pivoting in (false, true)
A = rand(elty,n,n)
B = rand(elty,n,p)
- C = rand(elty,n)
+ C = rand(elty, n)
A = A + transpose(A)
d_A = CuMatrix(A)
d_B = CuMatrix(B) |
There was a problem hiding this comment.
CUDA.jl Benchmarks
Details
| Benchmark suite | Current: eff6181 | Previous: ab74bfc | Ratio |
|---|---|---|---|
latency/precompile |
45578858184 ns |
45609312670.5 ns |
1.00 |
latency/ttfp |
6537715929.5 ns |
6509804251 ns |
1.00 |
latency/import |
3174056456.5 ns |
3165547030 ns |
1.00 |
integration/volumerhs |
9616256.5 ns |
9618874 ns |
1.00 |
integration/byval/slices=1 |
146842 ns |
146985 ns |
1.00 |
integration/byval/slices=3 |
425135 ns |
425068 ns |
1.00 |
integration/byval/reference |
145080.5 ns |
145136 ns |
1.00 |
integration/byval/slices=2 |
285884 ns |
286134 ns |
1.00 |
integration/cudadevrt |
103462 ns |
103387 ns |
1.00 |
kernel/indexing |
14349 ns |
14093 ns |
1.02 |
kernel/indexing_checked |
14782 ns |
14974 ns |
0.99 |
kernel/occupancy |
688.4697986577181 ns |
665.62893081761 ns |
1.03 |
kernel/launch |
2098.7 ns |
2199.6666666666665 ns |
0.95 |
kernel/rand |
15211 ns |
15117 ns |
1.01 |
array/reverse/1d |
19927.5 ns |
19957 ns |
1.00 |
array/reverse/2d |
25134 ns |
24198 ns |
1.04 |
array/reverse/1d_inplace |
11425 ns |
10158 ns |
1.12 |
array/reverse/2d_inplace |
13345 ns |
11719 ns |
1.14 |
array/copy |
21276 ns |
21382 ns |
1.00 |
array/iteration/findall/int |
159081 ns |
157735 ns |
1.01 |
array/iteration/findall/bool |
139993 ns |
138994.5 ns |
1.01 |
array/iteration/findfirst/int |
154502 ns |
154611 ns |
1.00 |
array/iteration/findfirst/bool |
155177 ns |
155155 ns |
1.00 |
array/iteration/scalar |
73281 ns |
70419 ns |
1.04 |
array/iteration/logical |
216850.5 ns |
215597 ns |
1.01 |
array/iteration/findmin/1d |
42096 ns |
42227 ns |
1.00 |
array/iteration/findmin/2d |
94739 ns |
94292 ns |
1.00 |
array/reductions/reduce/1d |
40714 ns |
36180 ns |
1.13 |
array/reductions/reduce/2d |
44965 ns |
50906.5 ns |
0.88 |
array/reductions/mapreduce/1d |
39496 ns |
34242 ns |
1.15 |
array/reductions/mapreduce/2d |
51637 ns |
41685.5 ns |
1.24 |
array/broadcast |
21179 ns |
20673 ns |
1.02 |
array/copyto!/gpu_to_gpu |
11837 ns |
13667 ns |
0.87 |
array/copyto!/cpu_to_gpu |
210619 ns |
210128 ns |
1.00 |
array/copyto!/gpu_to_cpu |
244174 ns |
243984 ns |
1.00 |
array/accumulate/1d |
109343 ns |
109871 ns |
1.00 |
array/accumulate/2d |
80328 ns |
80363 ns |
1.00 |
array/construct |
1288.9 ns |
1295.6 ns |
0.99 |
array/random/randn/Float32 |
43969.5 ns |
49526 ns |
0.89 |
array/random/randn!/Float32 |
26606 ns |
26634 ns |
1.00 |
array/random/rand!/Int64 |
27255 ns |
27280 ns |
1.00 |
array/random/rand!/Float32 |
8775.666666666666 ns |
8717.666666666666 ns |
1.01 |
array/random/rand/Int64 |
29951 ns |
30145 ns |
0.99 |
array/random/rand/Float32 |
13025.5 ns |
13266 ns |
0.98 |
array/permutedims/4d |
61887.5 ns |
61147 ns |
1.01 |
array/permutedims/2d |
56050.5 ns |
55994 ns |
1.00 |
array/permutedims/3d |
56853 ns |
56646 ns |
1.00 |
array/sorting/1d |
2777230 ns |
2777259 ns |
1.00 |
array/sorting/by |
3368135 ns |
3369070 ns |
1.00 |
array/sorting/2d |
1085502 ns |
1085925.5 ns |
1.00 |
cuda/synchronization/stream/auto |
1047.3 ns |
1011.6 ns |
1.04 |
cuda/synchronization/stream/nonblocking |
6573.700000000001 ns |
6554.8 ns |
1.00 |
cuda/synchronization/stream/blocking |
853.275 ns |
833.1359223300971 ns |
1.02 |
cuda/synchronization/context/auto |
1212.9 ns |
1175.8 ns |
1.03 |
cuda/synchronization/context/nonblocking |
6716 ns |
6767.2 ns |
0.99 |
cuda/synchronization/context/blocking |
940.59375 ns |
939.3666666666667 ns |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2707 +/- ##
==========================================
+ Coverage 86.71% 86.97% +0.25%
==========================================
Files 153 153
Lines 13117 13117
==========================================
+ Hits 11375 11408 +33
+ Misses 1742 1709 -33 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
381bf62 to
9b67c0d
Compare
|
What is the method of CUBLAS that is used? |
|
I think it dispatches from It seems silly to keep both definitions, though. Can we benchmark them and select one? I would expect the specialized CUSOLVER method to perform better, but from a quick look it seems like the inverse (hah). |
So we're still keeping both versions? |
So it appears we have a duplication of
invfor triangulars - if the matrix is strided, CUSOLVER is used, but if it's dense, CUBLAS is used. Not sure we want to keep that but for now this finally tests and fixes the CUSOLVER one and addresses another bug inSymmetrix \ CuMat.