Get rid of workspace buffers when requiring a new one. #3062

Merged
maleadt merged 1 commit into `master` from `tb/workspace` on Mar 26, 2026

maleadt commented Mar 26, 2026

This drops the RSS of the `libraries/cusolver/dense_generic` test, which has been failing recently, from 3710 MB to 2110 MB.
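
The diff itself is not shown in this conversation, so the following is only a minimal sketch of the general idea in the title, assuming a single-slot cache that releases its old workspace before allocating a larger one. `WorkspaceCache`, `require`, and `release_first` are hypothetical names for illustration, and a `bytearray` stands in for a device allocation; none of this is CUDA.jl's actual code.

```python
class WorkspaceCache:
    """Hypothetical single-slot workspace cache (illustration only)."""

    def __init__(self):
        self.buffer = None     # currently cached workspace, if any
        self.live_bytes = 0    # bytes currently held by the cache
        self.peak_bytes = 0    # high-water mark of live_bytes

    def _allocate(self, size):
        self.live_bytes += size
        self.peak_bytes = max(self.peak_bytes, self.live_bytes)
        return bytearray(size)  # stand-in for a device allocation

    def _release(self):
        # No-op when nothing is cached.
        if self.buffer is not None:
            self.live_bytes -= len(self.buffer)
            self.buffer = None

    def require(self, size, release_first=True):
        """Return a workspace of at least `size` bytes."""
        if self.buffer is not None and len(self.buffer) >= size:
            return self.buffer      # the cached buffer is large enough
        if release_first:
            self._release()         # drop the old buffer before growing
        new_buffer = self._allocate(size)
        self._release()             # late release when release_first is False
        self.buffer = new_buffer
        return new_buffer


def peak_for(sizes, release_first):
    """Peak bytes held while serving a sequence of workspace requests."""
    cache = WorkspaceCache()
    for size in sizes:
        cache.require(size, release_first=release_first)
    return cache.peak_bytes


sizes = [1024, 4096, 2048, 8192]
peak_eager = peak_for(sizes, release_first=True)
peak_lazy = peak_for(sizes, release_first=False)
print(peak_eager, peak_lazy)
```

In this toy model, releasing first caps the peak at the largest single request, whereas releasing late briefly holds the old and the new buffer at the same time.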

codecov Bot commented Mar 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.95%. Comparing base (b4d4284) to head (8b1a02b).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3062      +/-   ##
==========================================
+ Coverage   89.74%   89.95%   +0.21%     
==========================================
  Files         159      159              
  Lines       13374    13374              
==========================================
+ Hits        12002    12031      +29     
+ Misses       1372     1343      -29     

github-actions Bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 8b1a02b | Previous: b4d4284 | Ratio |
|---|---|---|---|
| `latency/precompile` | `4058602565` ns | `4061473970` ns | `1.00` |
| `latency/ttfp` | `14087237272` ns | `14133764963` ns | `1.00` |
| `latency/import` | `3542171230` ns | `3547097268` ns | `1.00` |
| `integration/volumerhs` | `9446785` ns | `9426493` ns | `1.00` |
| `integration/byval/slices=1` | `146076` ns | `145913` ns | `1.00` |
| `integration/byval/slices=3` | `423162` ns | `423161` ns | `1.00` |
| `integration/byval/reference` | `144173` ns | `143947` ns | `1.00` |
| `integration/byval/slices=2` | `284748` ns | `284734` ns | `1.00` |
| `integration/cudadevrt` | `102772` ns | `102560` ns | `1.00` |
| `kernel/indexing` | `13565` ns | `13222` ns | `1.03` |
| `kernel/indexing_checked` | `14271` ns | `13991` ns | `1.02` |
| `kernel/occupancy` | `659.5465838509317` ns | `673.639751552795` ns | `0.98` |
| `kernel/launch` | `2162.5` ns | `2213.3333333333335` ns | `0.98` |
| `kernel/rand` | `18279` ns | `16428` ns | `1.11` |
| `array/reverse/1d` | `18306` ns | `18516` ns | `0.99` |
| `array/reverse/2dL_inplace` | `66102.5` ns | `66029` ns | `1.00` |
| `array/reverse/1dL` | `68897` ns | `68986` ns | `1.00` |
| `array/reverse/2d` | `20632` ns | `21144` ns | `0.98` |
| `array/reverse/1d_inplace` | `10466.833333333332` ns | `10316.333333333334` ns | `1.01` |
| `array/reverse/2d_inplace` | `10670` ns | `10547` ns | `1.01` |
| `array/reverse/2dL` | `72742` ns | `73179` ns | `0.99` |
| `array/reverse/1dL_inplace` | `65960` ns | `65885` ns | `1.00` |
| `array/copy` | `18593` ns | `18715` ns | `0.99` |
| `array/iteration/findall/int` | `148897.5` ns | `149711.5` ns | `0.99` |
| `array/iteration/findall/bool` | `131473` ns | `132274` ns | `0.99` |
| `array/iteration/findfirst/int` | `82767.5` ns | `83298` ns | `0.99` |
| `array/iteration/findfirst/bool` | `80893` ns | `81740` ns | `0.99` |
| `array/iteration/scalar` | `68739` ns | `68091` ns | `1.01` |
| `array/iteration/logical` | `199703.5` ns | `203632` ns | `0.98` |
| `array/iteration/findmin/1d` | `84759` ns | `87161.5` ns | `0.97` |
| `array/iteration/findmin/2d` | `116619` ns | `117314` ns | `0.99` |
| `array/reductions/reduce/Int64/1d` | `42709` ns | `43093.5` ns | `0.99` |
| `array/reductions/reduce/Int64/dims=1` | `51699` ns | `43339.5` ns | `1.19` |
| `array/reductions/reduce/Int64/dims=2` | `59821` ns | `59714.5` ns | `1.00` |
| `array/reductions/reduce/Int64/dims=1L` | `87629` ns | `87713` ns | `1.00` |
| `array/reductions/reduce/Int64/dims=2L` | `84896` ns | `84933` ns | `1.00` |
| `array/reductions/reduce/Float32/1d` | `34824` ns | `35099.5` ns | `0.99` |
| `array/reductions/reduce/Float32/dims=1` | `40249.5` ns | `49465.5` ns | `0.81` |
| `array/reductions/reduce/Float32/dims=2` | `57452.5` ns | `56884` ns | `1.01` |
| `array/reductions/reduce/Float32/dims=1L` | `51898` ns | `51935` ns | `1.00` |
| `array/reductions/reduce/Float32/dims=2L` | `70145` ns | `69990` ns | `1.00` |
| `array/reductions/mapreduce/Int64/1d` | `42594` ns | `43043` ns | `0.99` |
| `array/reductions/mapreduce/Int64/dims=1` | `42456` ns | `44383.5` ns | `0.96` |
| `array/reductions/mapreduce/Int64/dims=2` | `59973` ns | `59605` ns | `1.01` |
| `array/reductions/mapreduce/Int64/dims=1L` | `87657` ns | `87872` ns | `1.00` |
| `array/reductions/mapreduce/Int64/dims=2L` | `85167.5` ns | `85316` ns | `1.00` |
| `array/reductions/mapreduce/Float32/1d` | `34151.5` ns | `35025` ns | `0.98` |
| `array/reductions/mapreduce/Float32/dims=1` | `39911` ns | `41001.5` ns | `0.97` |
| `array/reductions/mapreduce/Float32/dims=2` | `57040` ns | `56849` ns | `1.00` |
| `array/reductions/mapreduce/Float32/dims=1L` | `51996` ns | `51771` ns | `1.00` |
| `array/reductions/mapreduce/Float32/dims=2L` | `69528` ns | `69231` ns | `1.00` |
| `array/broadcast` | `20572` ns | `20483` ns | `1.00` |
| `array/copyto!/gpu_to_gpu` | `11131` ns | `11330` ns | `0.98` |
| `array/copyto!/cpu_to_gpu` | `213964` ns | `214913` ns | `1.00` |
| `array/copyto!/gpu_to_cpu` | `283062` ns | `282237` ns | `1.00` |
| `array/accumulate/Int64/1d` | `118404` ns | `118707` ns | `1.00` |
| `array/accumulate/Int64/dims=1` | `79481` ns | `79832` ns | `1.00` |
| `array/accumulate/Int64/dims=2` | `155850` ns | `155666` ns | `1.00` |
| `array/accumulate/Int64/dims=1L` | `1693815.5` ns | `1695124.5` ns | `1.00` |
| `array/accumulate/Int64/dims=2L` | `960659` ns | `961014` ns | `1.00` |
| `array/accumulate/Float32/1d` | `101080` ns | `101589.5` ns | `0.99` |
| `array/accumulate/Float32/dims=1` | `76385` ns | `76781` ns | `0.99` |
| `array/accumulate/Float32/dims=2` | `143233.5` ns | `143784` ns | `1.00` |
| `array/accumulate/Float32/dims=1L` | `1584876` ns | `1585667` ns | `1.00` |
| `array/accumulate/Float32/dims=2L` | `656755` ns | `657563` ns | `1.00` |
| `array/construct` | `1280.8` ns | `1327.7` ns | `0.96` |
| `array/random/randn/Float32` | `44009` ns | `38841` ns | `1.13` |
| `array/random/randn!/Float32` | `27321.5` ns | `31414` ns | `0.87` |
| `array/random/rand!/Int64` | `34378` ns | `33583` ns | `1.02` |
| `array/random/rand!/Float32` | `8588.333333333334` ns | `8570` ns | `1.00` |
| `array/random/rand/Int64` | `32498.5` ns | `37488` ns | `0.87` |
| `array/random/rand/Float32` | `13223` ns | `13104` ns | `1.01` |
| `array/permutedims/4d` | `51659` ns | `52907.5` ns | `0.98` |
| `array/permutedims/2d` | `52304` ns | `52478` ns | `1.00` |
| `array/permutedims/3d` | `52716` ns | `53034` ns | `0.99` |
| `array/sorting/1d` | `2734941` ns | `2735432.5` ns | `1.00` |
| `array/sorting/by` | `3303943` ns | `3304825.5` ns | `1.00` |
| `array/sorting/2d` | `1068616` ns | `1068239` ns | `1.00` |
| `cuda/synchronization/stream/auto` | `1088` ns | `1031.3` ns | `1.05` |
| `cuda/synchronization/stream/nonblocking` | `7461.1` ns | `7742.8` ns | `0.96` |
| `cuda/synchronization/stream/blocking` | `860.6739130434783` ns | `831.5466666666666` ns | `1.04` |
| `cuda/synchronization/context/auto` | `1241.4` ns | `1181` ns | `1.05` |
| `cuda/synchronization/context/nonblocking` | `6884.8` ns | `8005.7` ns | `0.86` |
| `cuda/synchronization/context/blocking` | `947.741935483871` ns | `938.6410256410256` ns | `1.01` |

This comment was automatically generated by workflow using github-action-benchmark.
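
For reading the table, the Ratio column appears to be Current divided by Previous, rounded to two decimals, so values above 1.00 mean the current commit was slower on that benchmark. That is an inference from the rows rather than something the comment states. A small sketch that recomputes a few of the larger deviations from the timings listed above (the `ratio` helper is a hypothetical name):

```python
def ratio(current_ns, previous_ns):
    """Current / Previous, rounded to two decimals (assumed convention)."""
    return round(current_ns / previous_ns, 2)


# (current ns, previous ns) copied from the benchmark table.
rows = {
    "kernel/rand": (18279, 16428),
    "array/reductions/reduce/Int64/dims=1": (51699, 43339.5),
    "array/reductions/reduce/Float32/dims=1": (40249.5, 49465.5),
    "cuda/synchronization/context/nonblocking": (6884.8, 8005.7),
}

ratios = {name: ratio(cur, prev) for name, (cur, prev) in rows.items()}
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```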

maleadt merged commit e260b92 into master on Mar 26, 2026
2 checks passed
maleadt deleted the tb/workspace branch on March 26, 2026, 09:50
kshyatt added a commit that referenced this pull request Apr 21, 2026
* Fix workspace size on 5.11 and CUDA 13.2

* Update Project.toml

Patch version bump