Get rid of workspace buffers when requiring a new one. #3062
Merged
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | master | #3062 | +/- |
|---|---|---|---|
| Coverage | 89.74% | 89.95% | +0.21% |
| Files | 159 | 159 | |
| Lines | 13374 | 13374 | |
| Hits | 12002 | 12031 | +29 |
| Misses | 1372 | 1343 | -29 |
CUDA.jl Benchmarks
| Benchmark suite | Current: 8b1a02b | Previous: b4d4284 | Ratio |
|---|---|---|---|
| latency/precompile | 4058602565 ns | 4061473970 ns | 1.00 |
| latency/ttfp | 14087237272 ns | 14133764963 ns | 1.00 |
| latency/import | 3542171230 ns | 3547097268 ns | 1.00 |
| integration/volumerhs | 9446785 ns | 9426493 ns | 1.00 |
| integration/byval/slices=1 | 146076 ns | 145913 ns | 1.00 |
| integration/byval/slices=3 | 423162 ns | 423161 ns | 1.00 |
| integration/byval/reference | 144173 ns | 143947 ns | 1.00 |
| integration/byval/slices=2 | 284748 ns | 284734 ns | 1.00 |
| integration/cudadevrt | 102772 ns | 102560 ns | 1.00 |
| kernel/indexing | 13565 ns | 13222 ns | 1.03 |
| kernel/indexing_checked | 14271 ns | 13991 ns | 1.02 |
| kernel/occupancy | 659.5465838509317 ns | 673.639751552795 ns | 0.98 |
| kernel/launch | 2162.5 ns | 2213.3333333333335 ns | 0.98 |
| kernel/rand | 18279 ns | 16428 ns | 1.11 |
| array/reverse/1d | 18306 ns | 18516 ns | 0.99 |
| array/reverse/2dL_inplace | 66102.5 ns | 66029 ns | 1.00 |
| array/reverse/1dL | 68897 ns | 68986 ns | 1.00 |
| array/reverse/2d | 20632 ns | 21144 ns | 0.98 |
| array/reverse/1d_inplace | 10466.833333333332 ns | 10316.333333333334 ns | 1.01 |
| array/reverse/2d_inplace | 10670 ns | 10547 ns | 1.01 |
| array/reverse/2dL | 72742 ns | 73179 ns | 0.99 |
| array/reverse/1dL_inplace | 65960 ns | 65885 ns | 1.00 |
| array/copy | 18593 ns | 18715 ns | 0.99 |
| array/iteration/findall/int | 148897.5 ns | 149711.5 ns | 0.99 |
| array/iteration/findall/bool | 131473 ns | 132274 ns | 0.99 |
| array/iteration/findfirst/int | 82767.5 ns | 83298 ns | 0.99 |
| array/iteration/findfirst/bool | 80893 ns | 81740 ns | 0.99 |
| array/iteration/scalar | 68739 ns | 68091 ns | 1.01 |
| array/iteration/logical | 199703.5 ns | 203632 ns | 0.98 |
| array/iteration/findmin/1d | 84759 ns | 87161.5 ns | 0.97 |
| array/iteration/findmin/2d | 116619 ns | 117314 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 42709 ns | 43093.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 51699 ns | 43339.5 ns | 1.19 |
| array/reductions/reduce/Int64/dims=2 | 59821 ns | 59714.5 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 87629 ns | 87713 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 84896 ns | 84933 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34824 ns | 35099.5 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 40249.5 ns | 49465.5 ns | 0.81 |
| array/reductions/reduce/Float32/dims=2 | 57452.5 ns | 56884 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1L | 51898 ns | 51935 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 70145 ns | 69990 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 42594 ns | 43043 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 42456 ns | 44383.5 ns | 0.96 |
| array/reductions/mapreduce/Int64/dims=2 | 59973 ns | 59605 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 87657 ns | 87872 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 85167.5 ns | 85316 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 34151.5 ns | 35025 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1 | 39911 ns | 41001.5 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=2 | 57040 ns | 56849 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 51996 ns | 51771 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69528 ns | 69231 ns | 1.00 |
| array/broadcast | 20572 ns | 20483 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 11131 ns | 11330 ns | 0.98 |
| array/copyto!/cpu_to_gpu | 213964 ns | 214913 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 283062 ns | 282237 ns | 1.00 |
| array/accumulate/Int64/1d | 118404 ns | 118707 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 79481 ns | 79832 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 155850 ns | 155666 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1693815.5 ns | 1695124.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 960659 ns | 961014 ns | 1.00 |
| array/accumulate/Float32/1d | 101080 ns | 101589.5 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 76385 ns | 76781 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 143233.5 ns | 143784 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1584876 ns | 1585667 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 656755 ns | 657563 ns | 1.00 |
| array/construct | 1280.8 ns | 1327.7 ns | 0.96 |
| array/random/randn/Float32 | 44009 ns | 38841 ns | 1.13 |
| array/random/randn!/Float32 | 27321.5 ns | 31414 ns | 0.87 |
| array/random/rand!/Int64 | 34378 ns | 33583 ns | 1.02 |
| array/random/rand!/Float32 | 8588.333333333334 ns | 8570 ns | 1.00 |
| array/random/rand/Int64 | 32498.5 ns | 37488 ns | 0.87 |
| array/random/rand/Float32 | 13223 ns | 13104 ns | 1.01 |
| array/permutedims/4d | 51659 ns | 52907.5 ns | 0.98 |
| array/permutedims/2d | 52304 ns | 52478 ns | 1.00 |
| array/permutedims/3d | 52716 ns | 53034 ns | 0.99 |
| array/sorting/1d | 2734941 ns | 2735432.5 ns | 1.00 |
| array/sorting/by | 3303943 ns | 3304825.5 ns | 1.00 |
| array/sorting/2d | 1068616 ns | 1068239 ns | 1.00 |
| cuda/synchronization/stream/auto | 1088 ns | 1031.3 ns | 1.05 |
| cuda/synchronization/stream/nonblocking | 7461.1 ns | 7742.8 ns | 0.96 |
| cuda/synchronization/stream/blocking | 860.6739130434783 ns | 831.5466666666666 ns | 1.04 |
| cuda/synchronization/context/auto | 1241.4 ns | 1181 ns | 1.05 |
| cuda/synchronization/context/nonblocking | 6884.8 ns | 8005.7 ns | 0.86 |
| cuda/synchronization/context/blocking | 947.741935483871 ns | 938.6410256410256 ns | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.
kshyatt added a commit that referenced this pull request on Apr 21, 2026
This drops `libraries/cusolver/dense_generic`, which has been failing recently, from 3710 MB RSS to 2110 MB.
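
To illustrate why releasing a workspace buffer before requesting a larger one can lower peak memory, here is a minimal toy model in Python. It is not CUDA.jl code and does not reflect the actual change in this PR: `ToyPool`, `grow_keep_old`, `grow_free_first`, and `peak_for` are hypothetical names, and the "buffers" are just byte counts.

```python
class ToyPool:
    """Hypothetical allocator that only counts live bytes and records the peak."""

    def __init__(self):
        self.live = 0
        self.peak = 0

    def alloc(self, nbytes):
        self.live += nbytes
        self.peak = max(self.peak, self.live)
        return nbytes  # the "handle" is simply the buffer size

    def free(self, handle):
        self.live -= handle


def grow_keep_old(pool, old, new_size):
    """Allocate the larger buffer first, then release the old one."""
    new = pool.alloc(new_size)
    pool.free(old)
    return new


def grow_free_first(pool, old, new_size):
    """Release the old buffer first, then allocate the larger one."""
    pool.free(old)
    return pool.alloc(new_size)


def peak_for(grow, sizes):
    """Replay a sequence of workspace size requests and return the peak usage."""
    pool = ToyPool()
    buf = pool.alloc(sizes[0])
    for size in sizes[1:]:
        if size > buf:  # the current workspace is too small, so replace it
            buf = grow(pool, buf, size)
    return pool.peak


requests = [100, 200, 400]  # made-up sizes, not measurements from this PR
print(peak_for(grow_keep_old, requests))    # old and new buffers overlap
print(peak_for(grow_free_first, requests))  # only one buffer is live at a time
```

In this model the first strategy peaks at the sum of the old and new buffer sizes, while the second peaks at the largest single request. Freeing first is only safe when the old buffer's contents are no longer needed, which is assumed here.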