Annotate kernels separately #695

Draft
michaelmckinsey1 wants to merge 6 commits into develop from multi-kernel-regions
@michaelmckinsey1 michaelmckinsey1 commented Jun 24, 2026

Contributor

Summary

  • This PR adds Caliper regions for RAJAPerf kernels that have multiple kernel launches.
  • It does the following:
    • Adds additional Caliper regions at the request of @pearce8
  • These regions currently do not synchronize around asynchronous GPU kernels. They should be profiled with CUDA/HIP events anyway, which measure GPU time without CPU synchronization.
  • Adds a region type so that these regions can be filtered out: they are annotated with type=subkernel instead of type=function.

Examples

Polybench_JACOBI_1D has one launch per rep for poly_jacobi_1D_1 and one for poly_jacobi_1D_2, so its tree now profiles each launch separately:
image
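As a rough illustration of the resulting tree shape, here is a minimal sketch using only the standard library. `RegionRecorder` and `run_jacobi_1d` are hypothetical stand-ins, not Caliper or RAJAPerf APIs, and placing the outer region outside the rep loop is an assumption made for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an annotation backend (not Caliper's API):
// it only counts how often each nested region path is entered.
class RegionRecorder {
public:
  void begin(const std::string& name) {
    stack_.push_back(name);
    ++counts_[path()];
  }

  void end(const std::string& name) {
    // Regions must be closed in the reverse order they were opened.
    assert(!stack_.empty() && stack_.back() == name);
    stack_.pop_back();
  }

  // Number of times the region at the given "a/b/c" path was entered.
  int count(const std::string& region_path) const {
    auto it = counts_.find(region_path);
    return it == counts_.end() ? 0 : it->second;
  }

private:
  // Joins the currently open regions into a single path string.
  std::string path() const {
    std::string p;
    for (const auto& s : stack_) {
      if (!p.empty()) p += '/';
      p += s;
    }
    return p;
  }

  std::vector<std::string> stack_;
  std::map<std::string, int> counts_;
};

// Mimics a kernel with two launches per rep. The real kernel launches
// are omitted; only the annotation structure is shown.
inline RegionRecorder run_jacobi_1d(int reps) {
  RegionRecorder rec;
  rec.begin("Polybench_JACOBI_1D");
  for (int r = 0; r < reps; ++r) {
    rec.begin("poly_jacobi_1D_1");
    // first launch would go here
    rec.end("poly_jacobi_1D_1");

    rec.begin("poly_jacobi_1D_2");
    // second launch would go here
    rec.end("poly_jacobi_1D_2");
  }
  rec.end("Polybench_JACOBI_1D");
  return rec;
}
```

With this layout, each launch becomes its own child node under the kernel's region, and each child is entered once per rep.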

POLYBENCH_FLOYD_WARSHALL, HALO_PACKING, and HALO_EXCHANGE all have a variable number of launches, so we append a _k suffix instead of adding k regions to the tree:
image

image image
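The naming rule described above could be sketched as follows. This is only an illustration: `subregion_names`, `kVariableLaunches`, and the base names used in the usage note are hypothetical and do not appear in the PR.

```cpp
#include <string>
#include <vector>

// Hypothetical marker for kernels whose launch count varies at run time.
constexpr int kVariableLaunches = -1;

// Hypothetical helper that picks sub-region names for a kernel:
//   - variable launch count: a single region with a _k suffix,
//     reused for every launch, so the tree does not grow with k
//   - one launch per rep: no sub-regions (the kernel is unchanged)
//   - N >= 2 launches per rep: one numbered region per launch
inline std::vector<std::string> subregion_names(const std::string& base,
                                                int launches_per_rep) {
  if (launches_per_rep == kVariableLaunches) {
    return {base + "_k"};
  }
  std::vector<std::string> names;
  if (launches_per_rep < 2) {
    return names;
  }
  for (int i = 1; i <= launches_per_rep; ++i) {
    names.push_back(base + "_" + std::to_string(i));
  }
  return names;
}
```

For example, `subregion_names("poly_jacobi_1D", 2)` yields the two numbered names `poly_jacobi_1D_1` and `poly_jacobi_1D_2`, while a variable-launch kernel gets one `_k` region regardless of how many launches occur.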

All kernels with only 1 launch per rep are unchanged.

@michaelmckinsey1 michaelmckinsey1 self-assigned this Jun 24, 2026
@michaelmckinsey1 michaelmckinsey1 requested a review from pearce8 June 26, 2026 19:49
@michaelmckinsey1 michaelmckinsey1 changed the title [WIP] Annotate kernels separately Annotate kernels separately Jun 26, 2026
@michaelmckinsey1 michaelmckinsey1 marked this pull request as ready for review June 26, 2026 19:49
@michaelmckinsey1 michaelmckinsey1 marked this pull request as draft June 26, 2026 21:04
@michaelmckinsey1

Contributor Author

Converted this to a draft because these regions need distinct types so that we can filter them with Caliper.
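To make the intent concrete, here is a minimal sketch of filtering regions by a type attribute. It does not use Caliper; the `Region` struct and `filter_by_type` are hypothetical, and only the type values `function` and `subkernel` come from the PR description.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical record of an annotated region and its type attribute.
struct Region {
  std::string name;
  std::string type;  // "function" or "subkernel", as in the PR description
};

// Returns only the regions whose type matches the requested one.
inline std::vector<Region> filter_by_type(const std::vector<Region>& regions,
                                          const std::string& type) {
  std::vector<Region> out;
  std::copy_if(regions.begin(), regions.end(), std::back_inserter(out),
               [&type](const Region& r) { return r.type == type; });
  return out;
}
```

Given a kernel-level region of type `function` and per-launch regions such as ENERGY_1 and ENERGY_2 of type `subkernel`, selecting `function` keeps only the kernel-level region, which matches the pre-PR view of the tree.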

Comment thread src/apps/ENERGY-Cuda.cpp
RP_CALI_MARK_END(RP_CALI_REGION(ENERGY_1));

RP_CALI_MARK_BEGIN(RP_CALI_REGION(ENERGY_2));
RAJA::forall< RAJA::cuda_exec<block_size, async> >( res,

Member


@michaelmckinsey1, why not just use RAJA's kernel naming capability here? Example:

RAJA::forall<RAJA::cuda_exec<256>>(
    range, 
    RAJA::Name("VectorAddKernel"), // <-- Kernel Name injected here
    [=] RAJA_DEVICE (int i) {
        c[i] = a[i] + b[i];
    }
);

2 participants