Add reduced memory runtime toggle for LST #50925

Open: GNiendorf wants to merge 1 commit into cms-sw:master from SegmentLinking:min_mem

@GNiendorf
Contributor

This PR adds a `reduceMemByFullPrecompute` runtime flag that enables exact buffer sizing for all LST objects (LS, T3, T5, T4) in each counting kernel, reducing average memory usage from ~80 MB to ~33 MB per event. When the flag is off (the default), behavior is identical to master with negligible timing overhead: the new kernel launches are gated behind host-side `if (reduceMemByFullPrecompute_)` checks and use templated kernel variants. The flag is exposed as `--reduce_mem_by_full_precompute` in standalone and as a `reduceMemByFullPrecompute` config parameter in the CMSSW EDProducer. Enabling it increases LST time per event by roughly 10-20% on CPU and GPU (depending on stream count; the increase is lower when running multiple streams) in exchange for a 60-70% reduction in total memory. The table below shows the average and maximum decreases in memory per event over 100 events.

[Image: Screenshot 2026-05-08 at 11 54 48 AM]
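
For readers unfamiliar with the count-then-allocate pattern described above, here is a minimal CPU-only sketch of the idea. It is not the LST code: `passesCut`, `kMaxPairsPerInner`, `PairBuffer`, `countPairs`, and `buildPairs` are hypothetical names invented for illustration, and plain loops stand in for the Alpaka kernels. It only shows how a host-side flag can gate an extra counting pass that replaces an upper-bound buffer size with an exact one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical selection cut standing in for the real physics criteria.
inline bool passesCut(int inner, int outer) { return (inner + outer) % 3 == 0; }

// Hypothetical per-inner-object capacity used when the flag is off.
constexpr std::size_t kMaxPairsPerInner = 8;

struct PairBuffer {
  std::vector<int> outerIndex;  // storage reserved for accepted pairs
  std::size_t nUsed = 0;        // number of slots actually filled
};

// Pass 1 (stand-in for a counting kernel): count the pairs that would be stored.
std::size_t countPairs(const std::vector<int>& inner, const std::vector<int>& outer) {
  std::size_t n = 0;
  for (int i : inner)
    for (int o : outer)
      if (passesCut(i, o))
        ++n;
  return n;
}

// Pass 2: fill a buffer whose capacity was decided on the host.
PairBuffer buildPairs(const std::vector<int>& inner,
                      const std::vector<int>& outer,
                      bool reduceMemByFullPrecompute) {
  // Default path: upper-bound estimate, no extra pass.
  std::size_t capacity = inner.size() * kMaxPairsPerInner;
  if (reduceMemByFullPrecompute) {
    // Flag on: pay for one more pass to obtain the exact size.
    capacity = countPairs(inner, outer);
  }

  PairBuffer buf;
  buf.outerIndex.resize(capacity);
  for (int i : inner)
    for (int o : outer)
      if (passesCut(i, o) && buf.nUsed < capacity)
        buf.outerIndex[buf.nUsed++] = o;
  assert(buf.nUsed <= capacity);
  return buf;
}
```

With `inner = {0, 1, 2, 3}` and `outer = {0, 1, 2, 3, 4, 5}`, both settings store the same 8 pairs, but the exact path reserves 8 slots while the default path reserves 32. The trade-off mirrors the one reported in this PR: the extra pass costs time and saves memory.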

cc @slava77

@cmsbuild
Contributor

cmsbuild commented May 12, 2026

cms-bot internal usage

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50925/49310

@cmsbuild
Contributor

A new Pull Request was created by @GNiendorf for master.

It involves the following packages:

  • RecoTracker/LST (reconstruction)
  • RecoTracker/LSTCore (reconstruction)

@Moanwar, @cmsbuild, @jfernan2, @mandrenguyen, @srimanob can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @elusian, @felicepantaleo, @gpetruc, @mmasciov, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@mmusich
Contributor

mmusich commented May 12, 2026

test parameters:

  • enable = hlt_p2_timing

@mmusich
Contributor

mmusich commented May 12, 2026

@cmsbuild, please test with cms-sw/cms-bot#2740

@cmsbuild
Contributor

+1

Size: This PR adds an extra 80KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-052a34/53207/summary.html
COMMIT: 99fb77a
CMSSW: CMSSW_17_0_X_2026-05-12-1100/el8_amd64_gcc13
Additional Tests: HLT_P2_TIMING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/50925/53207/install.sh to create a dev area with all the needed externals and cmssw changes.

HLT P2 Timing: chart

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 55
  • DQMHistoTests: Total histograms compared: 4420967
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4420947
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 54 files compared)
  • Checked 235 log files, 207 edm output root files, 55 DQM output files
  • TriggerResults: no differences found

@jfernan2
Contributor

assign heterogeneous

@cmsbuild
Contributor

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

> assign heterogeneous

Do you have specific question(s) in mind?

@jfernan2
Contributor

Sorry to bother. I was not fully sure about how the kernel in RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc is used within Alpaka.
Please forgive my ignorance about these kinds of structures. The rest of the PR looks OK to me. Thank you.

@makortel
Contributor

test parameters:

  • enable = hlt_p2_timing,gpu

@makortel
Contributor

@cmsbuild, please test

@makortel
Contributor

I'm not seeing anything obviously concerning (beyond the presumable code size and compilation time increase from the two instantiations of the kernel class templates).
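
To illustrate the remark about two instantiations, here is a minimal sketch of how a runtime flag is typically mapped onto a compile-time `bool` template parameter. It is an assumption about the general technique, not the PR's code: `CountKernel`, `slotsToReserve`, and the even-number cut are hypothetical, and a plain functor stands in for an Alpaka kernel class. It requires C++17 for `if constexpr`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel functor. The bool template parameter selects the
// variant at compile time, so each value produces its own instantiation.
template <bool kFullPrecompute>
struct CountKernel {
  // Returns the number of slots to reserve for the given candidates.
  std::size_t operator()(const std::vector<int>& candidates) const {
    if constexpr (kFullPrecompute) {
      // Exact variant: count only candidates passing a (hypothetical) cut.
      std::size_t n = 0;
      for (int c : candidates)
        if (c % 2 == 0)
          ++n;
      return n;
    } else {
      // Default variant: upper bound, no per-candidate work compiled in.
      return candidates.size();
    }
  }
};

// Host-side dispatch: one runtime branch picks between the two
// compiled instantiations, CountKernel<true> and CountKernel<false>.
std::size_t slotsToReserve(const std::vector<int>& candidates, bool reduceMemByFullPrecompute) {
  if (reduceMemByFullPrecompute)
    return CountKernel<true>{}(candidates);
  return CountKernel<false>{}(candidates);
}
```

Because the branch on the flag happens once on the host, the default variant carries no extra work inside the kernel body. The cost is that both variants are compiled, which is the code size and compilation time increase mentioned above.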

@cmsbuild
Contributor

-1

Failed Tests: RelVals-AMD_MI300X
Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-052a34/53259/summary.html
COMMIT: 99fb77a
CMSSW: CMSSW_17_0_X_2026-05-14-1700/el8_amd64_gcc13
Additional Tests: HLT_P2_TIMING,GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/50925/53259/install.sh to create a dev area with all the needed externals and cmssw changes.

HLT P2 Timing: chart

Failed RelVals-AMD_MI300X

  • 34634.403: 34634.403_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation/step2_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation.log

Comparison Summary

Summary:

  • You potentially removed 9 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 9 differences found in the comparisons
  • DQMHistoTests: Total files compared: 55
  • DQMHistoTests: Total histograms compared: 4420967
  • DQMHistoTests: Total failures: 16
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4420931
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 54 files compared)
  • Checked 235 log files, 207 edm output root files, 55 DQM output files
  • TriggerResults: no differences found

AMD_W7900 Comparison Summary

Summary:

NVIDIA_H100 Comparison Summary

Summary:

NVIDIA_L40S Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 343 differences found in the comparisons
  • DQMHistoTests: Total files compared: 13
  • DQMHistoTests: Total histograms compared: 216259
  • DQMHistoTests: Total failures: 25384
  • DQMHistoTests: Total nulls: 32
  • DQMHistoTests: Total successes: 190843
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 12 files compared)
  • Checked 49 log files, 50 edm output root files, 13 DQM output files
  • TriggerResults: found differences in 1 / 12 workflows

@jfernan2
Contributor

+1

@fwyzard
Contributor

fwyzard commented May 18, 2026

+heterogeneous

Code changes look OK.

MI300X failure is a recurring problem, and seems unrelated to these changes.

@fwyzard
Contributor

fwyzard commented May 18, 2026

ignore tests-rejected with ib-failure

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (test failures were overridden). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @ftenchini, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)
