Modify LTimes to match Kripke LTimes by michaelmckinsey1 · Pull Request #684 · llnl/RAJAPerf

michaelmckinsey1 · 2026-06-10T00:39:43Z

With matching parameters (zones, groups, moments, and directions), the runtime of RAJAPerf LTimes is within 1% of Kripke LTimes on CPU. On GPU, however, I have noticed that the runtime differs by ~15% on CUDA and ~60% on ROCm. After these changes, CUDA runtime is within 1% and ROCm within 6%. With block size 25, CUDA is now faster and ROCm is actually slower. This is because a block of 25 threads wastes significantly more lanes with the AMD wavefront size of 64 than with the NVIDIA warp size of 32.
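To put numbers on the wasted-thread argument, here is a minimal sketch of the lane arithmetic. The helper name `idle_lanes` is hypothetical and appears nowhere in RAJAPerf or Kripke. The sketch only assumes that a block is scheduled in whole warps or wavefronts.

```cpp
// Number of hardware lanes left idle when a block of `block_size` threads
// is scheduled in whole units of `simd_width` lanes (warp or wavefront).
// Hypothetical helper for illustration only.
constexpr int idle_lanes(int block_size, int simd_width) {
  const int units = (block_size + simd_width - 1) / simd_width;  // round up
  return units * simd_width - block_size;
}

// Block size 25 on a 32-wide warp leaves 7 of 32 lanes idle.
static_assert(idle_lanes(25, 32) == 7, "warp size 32");
// Block size 25 on a 64-wide wavefront leaves 39 of 64 lanes idle.
static_assert(idle_lanes(25, 64) == 39, "wavefront size 64");
```

So roughly 22% of the lanes are idle on a 32-wide warp versus roughly 61% on a 64-wide wavefront, which is consistent with ROCm being hurt more by this block size.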

Summary

This PR is a refactoring.
It does the following:
- Modifies the ell layout from d-contiguous to m-contiguous so that accesses are coalesced. This matches Kripke's ZGD layout, where Field_Ell is ordered with moment as stride-1. The change doesn't apply to psi, because it does not depend on m (all threads access the same value), or to phi, because it has no d index (it is already contiguous in m).
  - This change made a ~5% runtime difference on CUDA.
- Changes the default block size to m instead of 256 and remaps the kernel so z -> blockIdx.x, g -> blockIdx.y, m -> threadIdx.x. This changes the launch to grid=(num_z, num_g, 1), block=(m, 1, 1), which matches the Kripke launch. However, running at different Legendre orders (and therefore different m) will result in different block sizes.
- ~~Would need to make similar changes for LTIMES-NOVIEW?~~ done
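The layout and launch changes listed above can be sketched in plain C++. This is an illustrative serial emulation, not the RAJAPerf or Kripke source: the helper names (`ell_d_contig`, `ell_m_contig`, `zg_index`, `ltimes_emulated`, `matches_reference`) are hypothetical, and the psi/phi indexing (last index stride-1) is an assumption based on the description above.

```cpp
#include <cstddef>

using std::size_t;

// Old ell layout (assumed): direction d is stride-1.
constexpr size_t ell_d_contig(size_t m, size_t d, size_t num_d) { return d + m * num_d; }

// New ell layout: moment m is stride-1, so threads m and m+1 read adjacent values.
constexpr size_t ell_m_contig(size_t m, size_t d, size_t num_m) { return m + d * num_m; }

// Assumed indexing for psi(z,g,d) and phi(z,g,m): the last index is stride-1.
constexpr size_t zg_index(size_t z, size_t g, size_t i, size_t num_g, size_t num_i) {
  return i + num_i * (g + num_g * z);
}

// Serial stand-in for a launch with grid=(num_z, num_g, 1), block=(num_m, 1, 1).
constexpr void ltimes_emulated(double* phi, const double* ell, const double* psi,
                               size_t num_z, size_t num_g, size_t num_m, size_t num_d) {
  for (size_t z = 0; z < num_z; ++z) {        // plays the role of blockIdx.x
    for (size_t g = 0; g < num_g; ++g) {      // plays the role of blockIdx.y
      for (size_t m = 0; m < num_m; ++m) {    // plays the role of threadIdx.x
        double sum = 0.0;
        for (size_t d = 0; d < num_d; ++d) {  // innermost loop, serial per thread
          sum += ell[ell_m_contig(m, d, num_m)] * psi[zg_index(z, g, d, num_g, num_d)];
        }
        phi[zg_index(z, g, m, num_g, num_m)] += sum;
      }
    }
  }
}

// Checks the remapped kernel against a reference that reads ell in the old layout.
constexpr bool matches_reference() {
  constexpr size_t Z = 2, G = 3, M = 4, D = 5;
  double ell_old[M * D] = {}, ell_new[M * D] = {};
  double psi[Z * G * D] = {}, phi[Z * G * M] = {};
  for (size_t m = 0; m < M; ++m) {
    for (size_t d = 0; d < D; ++d) {
      const double v = static_cast<double>(1 + m + 2 * d);
      ell_old[ell_d_contig(m, d, D)] = v;
      ell_new[ell_m_contig(m, d, M)] = v;
    }
  }
  for (size_t i = 0; i < Z * G * D; ++i) psi[i] = static_cast<double>(i % 7);
  ltimes_emulated(phi, ell_new, psi, Z, G, M, D);
  for (size_t z = 0; z < Z; ++z)
    for (size_t g = 0; g < G; ++g)
      for (size_t m = 0; m < M; ++m) {
        double ref = 0.0;
        for (size_t d = 0; d < D; ++d)
          ref += ell_old[ell_d_contig(m, d, D)] * psi[zg_index(z, g, d, G, D)];
        if (phi[zg_index(z, g, m, G, M)] != ref) return false;
      }
  return true;
}

// Neighbouring threads (m, m+1) at fixed d: strided by num_d before, stride-1 after.
static_assert(ell_d_contig(1, 0, 8) - ell_d_contig(0, 0, 8) == 8, "old layout is strided");
static_assert(ell_m_contig(1, 0, 25) - ell_m_contig(0, 0, 25) == 1, "new layout is stride-1");
static_assert(matches_reference(), "layout change must not alter results");
```

The stride checks show why the m-contiguous layout lets threads in a block read adjacent ell entries, while the reference comparison shows that only the storage order changes, not the result.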

MrBurmark · 2026-06-18T16:30:16Z

Yes you would have to make LTIMES and LTIMES_NO_VIEW match.

michaelmckinsey1 · 2026-06-18T21:42:07Z

> Yes you would have to make LTIMES and LTIMES_NO_VIEW match.

Done

MrBurmark · 2026-06-19T21:52:34Z

+          RAJA::statement::CudaKernelAsync<
+            RAJA::statement::For<1, RAJA::cuda_block_x_loop, // z
+              RAJA::statement::For<2, RAJA::cuda_block_y_loop, // g
+                RAJA::statement::For<3, RAJA::cuda_thread_x_loop, // m


I assume this is a non-size loop policy because it is in ltimes. Here we know the block size at compile time; is that also true in Kripke?

rchen20 · 2026-06-19

In Kripke, the block sizes are determined by parameters passed in at runtime. For example, as in this version of LTimes, it will be blocked on zones and groups. The exact parameters that are blocked are not always the same for each Kripke run. For instance, if we use the DZG layout at runtime, then the loops will be blocked on directions and zones (while groups are threaded).

rchen20 · 2026-06-19T22:08:00Z

-  static const size_t default_gpu_block_size = 256;
-  using gpu_block_sizes_type = integer::make_gpu_block_size_list_type<default_gpu_block_size,
-                                                         integer::MultipleOf<32>>;
+  static const size_t default_gpu_block_size = 25;


How was a block size of 25 chosen? Would it be better for both GPU platforms to set this to 32?

michaelmckinsey1 · 2026-06-24

It is the square of (Legendre order + 1). This is also the case in Kripke: the default Legendre order is 4, so (4+1)^2 = 25 and the kernel in Kripke will be (5,5,1).

In RAJAPerf, we set m=25 directly (and this is the default). So m=36 would then be equivalent to Legendre order 5 in Kripke.
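As a small worked check of the relation described above, the number of moments (and therefore the block size in this tuning) follows from the Legendre order. The function name `num_moments` is hypothetical and is not taken from either code base.

```cpp
// Moments for a given Legendre order, following the (order + 1)^2 relation
// described in this thread. Hypothetical helper for illustration only.
constexpr int num_moments(int legendre_order) {
  return (legendre_order + 1) * (legendre_order + 1);
}

static_assert(num_moments(4) == 25, "Kripke default order 4 gives m = 25");
static_assert(num_moments(5) == 36, "order 5 gives m = 36");
```

With this mapping, each increase in Legendre order changes the block size, so block sizes that are multiples of 32 or 64 would only occur for specific orders (for example, order 7 gives 64).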

rchen20 · 2026-06-19T22:11:30Z

-            RAJA::statement::For<1, RAJA::cuda_global_size_z_direct<z_block_sz>,     //z
-              RAJA::statement::For<2, RAJA::cuda_global_size_y_direct<g_block_sz>,   //g
-                RAJA::statement::For<3, RAJA::cuda_global_size_x_direct<m_block_sz>, //m
+          RAJA::statement::CudaKernelAsync<


Note that I've reverted LTimes to launch synchronously in Kripke, for correctness. This is fine though because the direction loop is inner-most, which should avoid race conditions.

michaelmckinsey1 · 2026-06-24

It has been async in RAJAPerf; I just changed it from CudaKernelFixedAsync to CudaKernelAsync. For completeness I can make it CudaKernel. I don't think this would matter for performance in RAJAPerf.

MrBurmark · 2026-06-22T19:47:42Z

Do we want to have only this Kripke-conforming tuning, or does it make sense to keep the current tuning as well?

artv3 · 2026-06-22T21:37:17Z

@michaelmckinsey1 take a look here: https://github.com/llnl/RAJA/blob/develop/benchmark/ltimes.cpp, it would be cool to also have a GPU shared memory version as a tuning!

michaelmckinsey1 added 2 commits June 9, 2026 16:21

ell layout d-stride. Change default block size to 25 to match moments…

7bdc07f

… inner loop. Grid/block size changes to match Kripke Ltimessdom

Add link

7a794c7

michaelmckinsey1 self-assigned this Jun 10, 2026

Revert

1147a77

michaelmckinsey1 marked this pull request as ready for review June 17, 2026 21:37

michaelmckinsey1 changed the title ~~[WIP] Possible LTimes Changes~~ Modify LTimes to match Kripke LTimes Jun 17, 2026

michaelmckinsey1 requested a review from MrBurmark June 17, 2026 21:38

MrBurmark reviewed Jun 18, 2026

Comment thread src/apps/LTIMES.hpp Outdated

MrBurmark reviewed Jun 18, 2026

Comment thread src/apps/LTIMES.hpp

michaelmckinsey1 added 3 commits June 18, 2026 13:21

fix m stride-1

d2f79aa

Match sycl

6c6c9de

Apply same changes to noview

351f2d8

Merge branch 'develop' into ltimes-block25

3edfc3b

michaelmckinsey1 requested a review from MrBurmark June 18, 2026 21:45

MrBurmark reviewed Jun 19, 2026

MrBurmark requested a review from rchen20 June 19, 2026 21:53

rchen20 reviewed Jun 19, 2026

michaelmckinsey1 wants to merge 7 commits into develop from ltimes-block25

4 participants
