CUDA.jl Benchmarks
| Benchmark suite | Current: 45b627c | Previous: af6961a | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 101055 ns | 101502.5 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 76936 ns | 76629.5 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1585505 ns | 1585691.5 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 144151 ns | 143840 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 658023 ns | 657866.5 ns | 1.00 |
| array/accumulate/Int64/1d | 118692 ns | 119085 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 79686 ns | 79723 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1694741 ns | 1694532 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 156062 ns | 156114 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 961855 ns | 961892 ns | 1.00 |
| array/broadcast | 20347 ns | 20629 ns | 0.99 |
| array/construct | 1287.6 ns | 1256.5 ns | 1.02 |
| array/copy | 18146 ns | 17886 ns | 1.01 |
| array/copyto!/cpu_to_gpu | 214789 ns | 212623 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 285256 ns | 283142 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 10932 ns | 10702 ns | 1.02 |
| array/iteration/findall/bool | 135084 ns | 134713 ns | 1.00 |
| array/iteration/findall/int | 150895 ns | 149574 ns | 1.01 |
| array/iteration/findfirst/bool | 82571 ns | 81484 ns | 1.01 |
| array/iteration/findfirst/int | 84149.5 ns | 83536 ns | 1.01 |
| array/iteration/findmin/1d | 85286.5 ns | 85967.5 ns | 0.99 |
| array/iteration/findmin/2d | 117254 ns | 116482 ns | 1.01 |
| array/iteration/logical | 200524.5 ns | 199753 ns | 1.00 |
| array/iteration/scalar | 67866 ns | 67105.5 ns | 1.01 |
| array/permutedims/2d | 52533 ns | 52568 ns | 1.00 |
| array/permutedims/3d | 52716.5 ns | 53057 ns | 0.99 |
| array/permutedims/4d | 51875 ns | 51868.5 ns | 1.00 |
| array/random/rand/Float32 | 13217 ns | 12842 ns | 1.03 |
| array/random/rand/Int64 | 25251 ns | 25396 ns | 0.99 |
| array/random/rand!/Float32 | 9853.5 ns | 8414.666666666666 ns | 1.17 |
| array/random/rand!/Int64 | 21914 ns | 21981 ns | 1.00 |
| array/random/randn/Float32 | 43758 ns | 37382.5 ns | 1.17 |
| array/random/randn!/Float32 | 31002 ns | 30580 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 35214.5 ns | 34346 ns | 1.03 |
| array/reductions/mapreduce/Float32/dims=1 | 39305 ns | 39918.5 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1L | 51269 ns | 51509 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56724.5 ns | 56563 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69541 ns | 69236 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 43334 ns | 42783 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1 | 43236.5 ns | 42869 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 87164 ns | 87545 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59640 ns | 59325 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2L | 85148 ns | 85131 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 35062 ns | 34338 ns | 1.02 |
| array/reductions/reduce/Float32/dims=1 | 42837 ns | 48895 ns | 0.88 |
| array/reductions/reduce/Float32/dims=1L | 51460 ns | 51567 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 56801 ns | 56622 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 70088 ns | 69625 ns | 1.01 |
| array/reductions/reduce/Int64/1d | 43395 ns | 42697 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1 | 42548 ns | 47176.5 ns | 0.90 |
| array/reductions/reduce/Int64/dims=1L | 87073 ns | 87361 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 59960.5 ns | 59337 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2L | 85238 ns | 84717 ns | 1.01 |
| array/reverse/1d | 17777 ns | 18051 ns | 0.98 |
| array/reverse/1dL | 68336 ns | 68628 ns | 1.00 |
| array/reverse/1dL_inplace | 65822 ns | 65835 ns | 1.00 |
| array/reverse/1d_inplace | 10346.166666666668 ns | 8676.333333333334 ns | 1.19 |
| array/reverse/2d | 20928 ns | 20541 ns | 1.02 |
| array/reverse/2dL | 72881 ns | 72496 ns | 1.01 |
| array/reverse/2dL_inplace | 65775 ns | 65804 ns | 1.00 |
| array/reverse/2d_inplace | 10414 ns | 10104 ns | 1.03 |
| array/sorting/1d | 2736888 ns | 2734890.5 ns | 1.00 |
| array/sorting/2d | 1069374 ns | 1069333 ns | 1.00 |
| array/sorting/by | 3306761 ns | 3305442 ns | 1.00 |
| cuda/synchronization/context/auto | 1174 ns | 1142.4 ns | 1.03 |
| cuda/synchronization/context/blocking | 945.3255813953489 ns | 918.9428571428572 ns | 1.03 |
| cuda/synchronization/context/nonblocking | 7649 ns | 7080.9 ns | 1.08 |
| cuda/synchronization/stream/auto | 974.7777777777778 ns | 983.3846153846154 ns | 0.99 |
| cuda/synchronization/stream/blocking | 791.6862745098039 ns | 819.060606060606 ns | 0.97 |
| cuda/synchronization/stream/nonblocking | 7291.200000000001 ns | 7951.1 ns | 0.92 |
| integration/byval/reference | 143814 ns | 143938 ns | 1.00 |
| integration/byval/slices=1 | 145593 ns | 145713 ns | 1.00 |
| integration/byval/slices=2 | 284337 ns | 284653 ns | 1.00 |
| integration/byval/slices=3 | 422732 ns | 422978 ns | 1.00 |
| integration/cudadevrt | 102289 ns | 102487 ns | 1.00 |
| integration/volumerhs | 23451596 ns | 23490453 ns | 1.00 |
| kernel/indexing | 13244 ns | 13409 ns | 0.99 |
| kernel/indexing_checked | 14046 ns | 14040 ns | 1.00 |
| kernel/launch | 2248.8888888888887 ns | 2267.4444444444443 ns | 0.99 |
| kernel/occupancy | 702.8958333333334 ns | 828.1463414634146 ns | 0.85 |
| kernel/rand | 14343 ns | 14466 ns | 0.99 |
| latency/import | 3824776930.5 ns | 3809615751 ns | 1.00 |
| latency/precompile | 4599001518 ns | 4560767959.5 ns | 1.01 |
| latency/ttfp | 4432304043.5 ns | 4401798424 ns | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.
Oh, right. Sorry. We would like to do the following in our package:
The functionality is too tightly tied to internal state for it to become public, I think, especially because I plan to rework exception handling, albeit at some undetermined point in the future. Would it be fine if you lose some of the reporting accuracy and simply do what we used to do before:
I see. We will stick with whatever is currently recommended. |
`@gputhrow` was exported on v5