More tests and bugfixes for CUSOLVER by kshyatt · Pull Request #2707 · JuliaGPU/CUDA.jl

kshyatt · 2025-03-20T14:44:58Z

So it appears we have a duplication of inv for triangulars - if the matrix is strided, CUSOLVER is used, but if it's dense, CUBLAS is used. Not sure we want to keep that but for now this finally tests and fixes the CUSOLVER one and addresses another bug in Symmetrix \ CuMat.

github-actions · 2025-03-20T14:45:34Z

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Click here to view the suggested changes.

diff --git a/test/libraries/cusolver/dense.jl b/test/libraries/cusolver/dense.jl
index 11f771a9b..2ae9d19c0 100644
--- a/test/libraries/cusolver/dense.jl
+++ b/test/libraries/cusolver/dense.jl
@@ -265,7 +265,7 @@ k = 1
         A             += A'
         d_A            = CuArray(A)
         Eig            = eigen(LinearAlgebra.Hermitian(A))
-        d_eig          = eigen(d_A)
+        d_eig = eigen(d_A)
         @test Eig.values ≈ collect(d_eig.values)
         d_eig          = eigen(LinearAlgebra.Hermitian(d_A))
         @test Eig.values ≈ collect(d_eig.values)
@@ -535,7 +535,7 @@ k = 1
         B              = rand(elty, n)
         d_B            = CuArray(B)
         @test collect(d_M \ d_B) ≈ M \ B
-        @test_throws DimensionMismatch("arguments must have the same number of rows") d_M \ CUDA.ones(elty, n+1)
+        @test_throws DimensionMismatch("arguments must have the same number of rows") d_M \ CUDA.ones(elty, n + 1)
         A              = rand(elty, m, n)  # A is a matrix and B,C is a vector
         d_A            = CuArray(A)
         M              = qr(A)
diff --git a/test/libraries/cusolver/dense_generic.jl b/test/libraries/cusolver/dense_generic.jl
index af851d314..2328ede47 100644
--- a/test/libraries/cusolver/dense_generic.jl
+++ b/test/libraries/cusolver/dense_generic.jl
@@ -96,7 +96,7 @@ p = 5
             @testset "pivoting = $pivoting" for pivoting in (false, true)
                 A = rand(elty,n,n)
                 B = rand(elty,n,p)
-                C = rand(elty,n)
+                C = rand(elty, n)
                 A = A + transpose(A)
                 d_A = CuMatrix(A)
                 d_B = CuMatrix(B)

github-actions

CUDA.jl Benchmarks

Details

Benchmark suite	Current: `eff6181`	Previous: `ab74bfc`	Ratio
`latency/precompile`	`45578858184` ns	`45609312670.5` ns	`1.00`
`latency/ttfp`	`6537715929.5` ns	`6509804251` ns	`1.00`
`latency/import`	`3174056456.5` ns	`3165547030` ns	`1.00`
`integration/volumerhs`	`9616256.5` ns	`9618874` ns	`1.00`
`integration/byval/slices=1`	`146842` ns	`146985` ns	`1.00`
`integration/byval/slices=3`	`425135` ns	`425068` ns	`1.00`
`integration/byval/reference`	`145080.5` ns	`145136` ns	`1.00`
`integration/byval/slices=2`	`285884` ns	`286134` ns	`1.00`
`integration/cudadevrt`	`103462` ns	`103387` ns	`1.00`
`kernel/indexing`	`14349` ns	`14093` ns	`1.02`
`kernel/indexing_checked`	`14782` ns	`14974` ns	`0.99`
`kernel/occupancy`	`688.4697986577181` ns	`665.62893081761` ns	`1.03`
`kernel/launch`	`2098.7` ns	`2199.6666666666665` ns	`0.95`
`kernel/rand`	`15211` ns	`15117` ns	`1.01`
`array/reverse/1d`	`19927.5` ns	`19957` ns	`1.00`
`array/reverse/2d`	`25134` ns	`24198` ns	`1.04`
`array/reverse/1d_inplace`	`11425` ns	`10158` ns	`1.12`
`array/reverse/2d_inplace`	`13345` ns	`11719` ns	`1.14`
`array/copy`	`21276` ns	`21382` ns	`1.00`
`array/iteration/findall/int`	`159081` ns	`157735` ns	`1.01`
`array/iteration/findall/bool`	`139993` ns	`138994.5` ns	`1.01`
`array/iteration/findfirst/int`	`154502` ns	`154611` ns	`1.00`
`array/iteration/findfirst/bool`	`155177` ns	`155155` ns	`1.00`
`array/iteration/scalar`	`73281` ns	`70419` ns	`1.04`
`array/iteration/logical`	`216850.5` ns	`215597` ns	`1.01`
`array/iteration/findmin/1d`	`42096` ns	`42227` ns	`1.00`
`array/iteration/findmin/2d`	`94739` ns	`94292` ns	`1.00`
`array/reductions/reduce/1d`	`40714` ns	`36180` ns	`1.13`
`array/reductions/reduce/2d`	`44965` ns	`50906.5` ns	`0.88`
`array/reductions/mapreduce/1d`	`39496` ns	`34242` ns	`1.15`
`array/reductions/mapreduce/2d`	`51637` ns	`41685.5` ns	`1.24`
`array/broadcast`	`21179` ns	`20673` ns	`1.02`
`array/copyto!/gpu_to_gpu`	`11837` ns	`13667` ns	`0.87`
`array/copyto!/cpu_to_gpu`	`210619` ns	`210128` ns	`1.00`
`array/copyto!/gpu_to_cpu`	`244174` ns	`243984` ns	`1.00`
`array/accumulate/1d`	`109343` ns	`109871` ns	`1.00`
`array/accumulate/2d`	`80328` ns	`80363` ns	`1.00`
`array/construct`	`1288.9` ns	`1295.6` ns	`0.99`
`array/random/randn/Float32`	`43969.5` ns	`49526` ns	`0.89`
`array/random/randn!/Float32`	`26606` ns	`26634` ns	`1.00`
`array/random/rand!/Int64`	`27255` ns	`27280` ns	`1.00`
`array/random/rand!/Float32`	`8775.666666666666` ns	`8717.666666666666` ns	`1.01`
`array/random/rand/Int64`	`29951` ns	`30145` ns	`0.99`
`array/random/rand/Float32`	`13025.5` ns	`13266` ns	`0.98`
`array/permutedims/4d`	`61887.5` ns	`61147` ns	`1.01`
`array/permutedims/2d`	`56050.5` ns	`55994` ns	`1.00`
`array/permutedims/3d`	`56853` ns	`56646` ns	`1.00`
`array/sorting/1d`	`2777230` ns	`2777259` ns	`1.00`
`array/sorting/by`	`3368135` ns	`3369070` ns	`1.00`
`array/sorting/2d`	`1085502` ns	`1085925.5` ns	`1.00`
`cuda/synchronization/stream/auto`	`1047.3` ns	`1011.6` ns	`1.04`
`cuda/synchronization/stream/nonblocking`	`6573.700000000001` ns	`6554.8` ns	`1.00`
`cuda/synchronization/stream/blocking`	`853.275` ns	`833.1359223300971` ns	`1.02`
`cuda/synchronization/context/auto`	`1212.9` ns	`1175.8` ns	`1.03`
`cuda/synchronization/context/nonblocking`	`6716` ns	`6767.2` ns	`0.99`
`cuda/synchronization/context/blocking`	`940.59375` ns	`939.3666666666667` ns	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

codecov · 2025-03-20T17:58:04Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.97%. Comparing base (f7465ff) to head (aeeb48f).
Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2707      +/-   ##
==========================================
+ Coverage   86.71%   86.97%   +0.25%     
==========================================
  Files         153      153              
  Lines       13117    13117              
==========================================
+ Hits        11375    11408      +33     
+ Misses       1742     1709      -33

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

amontoison · 2025-03-21T06:00:41Z

What is the method of CUBLAS that is used?
Is it trsm used with I as right-hand side?

maleadt · 2025-03-21T06:22:06Z

I think it dispatches from ldiv! through generic_trimatdiv! to trsv!.

It seems silly to keep both definitions, though. Can we benchmark them and select one? I would expect the specialized CUSOLVER method to perform better, but from a quick look it seems like the inverse (hah).

maleadt · 2025-03-22T08:34:16Z

So it appears we have a duplication of inv for triangulars - if the matrix is strided, CUSOLVER is used, but if it's dense, CUBLAS is used.

So we're still keeping both versions?

kshyatt added bugfix This gets something working again. cuda libraries Stuff about CUDA library wrappers. tests Adds or changes tests. labels Mar 20, 2025

kshyatt requested a review from amontoison March 20, 2025 14:44

github-actions Bot reviewed Mar 20, 2025

View reviewed changes

kshyatt force-pushed the ksh/cusolver branch 2 times, most recently from 381bf62 to 9b67c0d Compare March 21, 2025 01:13

amontoison reviewed Mar 21, 2025

View reviewed changes

Comment thread lib/cusolver/dense_generic.jl

More tests and bugfixes for CUSOLVER

17dd92e

kshyatt force-pushed the ksh/cusolver branch from 9b67c0d to aeeb48f Compare March 21, 2025 16:46

Add a test for sytrs

aeeb48f

amontoison approved these changes Mar 21, 2025

View reviewed changes

kshyatt merged commit cdb1e18 into master Mar 21, 2025
3 checks passed

kshyatt deleted the ksh/cusolver branch March 21, 2025 20:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More tests and bugfixes for CUSOLVER#2707

More tests and bugfixes for CUSOLVER#2707
kshyatt merged 2 commits intomasterfrom
ksh/cusolver

kshyatt commented Mar 20, 2025

Uh oh!

github-actions Bot commented Mar 20, 2025 •

edited

Loading

Uh oh!

github-actions Bot left a comment

Uh oh!

codecov Bot commented Mar 20, 2025 •

edited

Loading

Uh oh!

amontoison commented Mar 21, 2025

Uh oh!

maleadt commented Mar 21, 2025

Uh oh!

Uh oh!

Uh oh!

maleadt commented Mar 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

kshyatt commented Mar 20, 2025

Uh oh!

github-actions Bot commented Mar 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

CUDA.jl Benchmarks

Uh oh!

codecov Bot commented Mar 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

amontoison commented Mar 21, 2025

Uh oh!

maleadt commented Mar 21, 2025

Uh oh!

Uh oh!

Uh oh!

maleadt commented Mar 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions Bot commented Mar 20, 2025 •

edited

Loading

codecov Bot commented Mar 20, 2025 •

edited

Loading