Skip to content

More tests and bugfixes for CUSOLVER#2707

Merged
kshyatt merged 2 commits intomasterfrom
ksh/cusolver
Mar 21, 2025
Merged

More tests and bugfixes for CUSOLVER#2707
kshyatt merged 2 commits intomasterfrom
ksh/cusolver

Conversation

@kshyatt
Copy link
Copy Markdown
Member

@kshyatt kshyatt commented Mar 20, 2025

So it appears we have a duplication of inv for triangulars - if the matrix is strided, CUSOLVER is used, but if it's dense, CUBLAS is used. Not sure we want to keep that but for now this finally tests and fixes the CUSOLVER one and addresses another bug in Symmetrix \ CuMat.

@kshyatt kshyatt added bugfix This gets something working again. cuda libraries Stuff about CUDA library wrappers. tests Adds or changes tests. labels Mar 20, 2025
@kshyatt kshyatt requested a review from amontoison March 20, 2025 14:44
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 20, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Click here to view the suggested changes.
diff --git a/test/libraries/cusolver/dense.jl b/test/libraries/cusolver/dense.jl
index 11f771a9b..2ae9d19c0 100644
--- a/test/libraries/cusolver/dense.jl
+++ b/test/libraries/cusolver/dense.jl
@@ -265,7 +265,7 @@ k = 1
         A             += A'
         d_A            = CuArray(A)
         Eig            = eigen(LinearAlgebra.Hermitian(A))
-        d_eig          = eigen(d_A)
+        d_eig = eigen(d_A)
         @test Eig.values ≈ collect(d_eig.values)
         d_eig          = eigen(LinearAlgebra.Hermitian(d_A))
         @test Eig.values ≈ collect(d_eig.values)
@@ -535,7 +535,7 @@ k = 1
         B              = rand(elty, n)
         d_B            = CuArray(B)
         @test collect(d_M \ d_B) ≈ M \ B
-        @test_throws DimensionMismatch("arguments must have the same number of rows") d_M \ CUDA.ones(elty, n+1)
+        @test_throws DimensionMismatch("arguments must have the same number of rows") d_M \ CUDA.ones(elty, n + 1)
         A              = rand(elty, m, n)  # A is a matrix and B,C is a vector
         d_A            = CuArray(A)
         M              = qr(A)
diff --git a/test/libraries/cusolver/dense_generic.jl b/test/libraries/cusolver/dense_generic.jl
index af851d314..2328ede47 100644
--- a/test/libraries/cusolver/dense_generic.jl
+++ b/test/libraries/cusolver/dense_generic.jl
@@ -96,7 +96,7 @@ p = 5
             @testset "pivoting = $pivoting" for pivoting in (false, true)
                 A = rand(elty,n,n)
                 B = rand(elty,n,p)
-                C = rand(elty,n)
+                C = rand(elty, n)
                 A = A + transpose(A)
                 d_A = CuMatrix(A)
                 d_B = CuMatrix(B)

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CUDA.jl Benchmarks

Details
Benchmark suite Current: eff6181 Previous: ab74bfc Ratio
latency/precompile 45578858184 ns 45609312670.5 ns 1.00
latency/ttfp 6537715929.5 ns 6509804251 ns 1.00
latency/import 3174056456.5 ns 3165547030 ns 1.00
integration/volumerhs 9616256.5 ns 9618874 ns 1.00
integration/byval/slices=1 146842 ns 146985 ns 1.00
integration/byval/slices=3 425135 ns 425068 ns 1.00
integration/byval/reference 145080.5 ns 145136 ns 1.00
integration/byval/slices=2 285884 ns 286134 ns 1.00
integration/cudadevrt 103462 ns 103387 ns 1.00
kernel/indexing 14349 ns 14093 ns 1.02
kernel/indexing_checked 14782 ns 14974 ns 0.99
kernel/occupancy 688.4697986577181 ns 665.62893081761 ns 1.03
kernel/launch 2098.7 ns 2199.6666666666665 ns 0.95
kernel/rand 15211 ns 15117 ns 1.01
array/reverse/1d 19927.5 ns 19957 ns 1.00
array/reverse/2d 25134 ns 24198 ns 1.04
array/reverse/1d_inplace 11425 ns 10158 ns 1.12
array/reverse/2d_inplace 13345 ns 11719 ns 1.14
array/copy 21276 ns 21382 ns 1.00
array/iteration/findall/int 159081 ns 157735 ns 1.01
array/iteration/findall/bool 139993 ns 138994.5 ns 1.01
array/iteration/findfirst/int 154502 ns 154611 ns 1.00
array/iteration/findfirst/bool 155177 ns 155155 ns 1.00
array/iteration/scalar 73281 ns 70419 ns 1.04
array/iteration/logical 216850.5 ns 215597 ns 1.01
array/iteration/findmin/1d 42096 ns 42227 ns 1.00
array/iteration/findmin/2d 94739 ns 94292 ns 1.00
array/reductions/reduce/1d 40714 ns 36180 ns 1.13
array/reductions/reduce/2d 44965 ns 50906.5 ns 0.88
array/reductions/mapreduce/1d 39496 ns 34242 ns 1.15
array/reductions/mapreduce/2d 51637 ns 41685.5 ns 1.24
array/broadcast 21179 ns 20673 ns 1.02
array/copyto!/gpu_to_gpu 11837 ns 13667 ns 0.87
array/copyto!/cpu_to_gpu 210619 ns 210128 ns 1.00
array/copyto!/gpu_to_cpu 244174 ns 243984 ns 1.00
array/accumulate/1d 109343 ns 109871 ns 1.00
array/accumulate/2d 80328 ns 80363 ns 1.00
array/construct 1288.9 ns 1295.6 ns 0.99
array/random/randn/Float32 43969.5 ns 49526 ns 0.89
array/random/randn!/Float32 26606 ns 26634 ns 1.00
array/random/rand!/Int64 27255 ns 27280 ns 1.00
array/random/rand!/Float32 8775.666666666666 ns 8717.666666666666 ns 1.01
array/random/rand/Int64 29951 ns 30145 ns 0.99
array/random/rand/Float32 13025.5 ns 13266 ns 0.98
array/permutedims/4d 61887.5 ns 61147 ns 1.01
array/permutedims/2d 56050.5 ns 55994 ns 1.00
array/permutedims/3d 56853 ns 56646 ns 1.00
array/sorting/1d 2777230 ns 2777259 ns 1.00
array/sorting/by 3368135 ns 3369070 ns 1.00
array/sorting/2d 1085502 ns 1085925.5 ns 1.00
cuda/synchronization/stream/auto 1047.3 ns 1011.6 ns 1.04
cuda/synchronization/stream/nonblocking 6573.700000000001 ns 6554.8 ns 1.00
cuda/synchronization/stream/blocking 853.275 ns 833.1359223300971 ns 1.02
cuda/synchronization/context/auto 1212.9 ns 1175.8 ns 1.03
cuda/synchronization/context/nonblocking 6716 ns 6767.2 ns 0.99
cuda/synchronization/context/blocking 940.59375 ns 939.3666666666667 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.97%. Comparing base (f7465ff) to head (aeeb48f).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2707      +/-   ##
==========================================
+ Coverage   86.71%   86.97%   +0.25%     
==========================================
  Files         153      153              
  Lines       13117    13117              
==========================================
+ Hits        11375    11408      +33     
+ Misses       1742     1709      -33     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kshyatt kshyatt force-pushed the ksh/cusolver branch 2 times, most recently from 381bf62 to 9b67c0d Compare March 21, 2025 01:13
@amontoison
Copy link
Copy Markdown
Member

What is the method of CUBLAS that is used?
Is it trsm used with I as right-hand side?

@maleadt
Copy link
Copy Markdown
Member

maleadt commented Mar 21, 2025

I think it dispatches from ldiv! through generic_trimatdiv! to trsv!.

It seems silly to keep both definitions, though. Can we benchmark them and select one? I would expect the specialized CUSOLVER method to perform better, but from a quick look it seems like the inverse (hah).

Comment thread lib/cusolver/dense_generic.jl
@kshyatt kshyatt merged commit cdb1e18 into master Mar 21, 2025
3 checks passed
@kshyatt kshyatt deleted the ksh/cusolver branch March 21, 2025 20:55
@maleadt
Copy link
Copy Markdown
Member

maleadt commented Mar 22, 2025

So it appears we have a duplication of inv for triangulars - if the matrix is strided, CUSOLVER is used, but if it's dense, CUBLAS is used.

So we're still keeping both versions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix This gets something working again. cuda libraries Stuff about CUDA library wrappers. tests Adds or changes tests.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants