Implement stream::FixedQueueEDProducer #50627

Merged
cmsbuild merged 3 commits into cms-sw:master from fwyzard:FixedQueueEDProducer
Apr 11, 2026

@fwyzard
Contributor

@fwyzard fwyzard commented Apr 1, 2026

PR description:

This PR builds on top of #50675.

Implement a new kind of alpaka stream::EDProducer with a fixed association of device queues (e.g. CUDA streams) to framework streams.

This is useful for using external software that associates resources to the device queues, for example the PyTorch device memory caching allocator.

Migrating the PyTorch alpaka modules from stream::EDProducer to stream::FixedQueueEDProducer ensures that PyTorch sees only a limited number of device queues, reducing the overall device memory utilisation.

For more background information, see the presentation "ML inference on GPUs in CMSSW with PyTorch" by @EmanueleCoradin at the "CMS developments with GPUs" meeting on March 30th, 2026.
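
As a rough illustration of the fixed association described above, the sketch below keeps one queue per framework stream and reuses it for every event processed by that stream. `DeviceQueue` and `FixedQueueTable` are hypothetical names, not alpaka or CMSSW types, and creating the queue on first use is an assumption of this sketch. The code is single-threaded and only meant to show the bookkeeping, not the actual implementation in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a device queue (e.g. a CUDA stream); not an alpaka type.
struct DeviceQueue {
  explicit DeviceQueue(std::size_t streamId) : streamId(streamId) {}
  std::size_t streamId;  // framework stream that owns this queue
};

// Hypothetical helper: at most one queue per framework stream.
// The queue is created the first time a stream asks for it (an assumption
// of this sketch) and the same object is returned on every later call.
class FixedQueueTable {
public:
  explicit FixedQueueTable(std::size_t nStreams) : queues_(nStreams) {}

  DeviceQueue& queueFor(std::size_t streamId) {
    assert(streamId < queues_.size());
    auto& slot = queues_[streamId];
    if (!slot) {
      slot = std::make_unique<DeviceQueue>(streamId);
    }
    return *slot;
  }

  // Number of queues that have actually been created so far.
  std::size_t created() const {
    std::size_t n = 0;
    for (auto const& q : queues_) {
      if (q) {
        ++n;
      }
    }
    return n;
  }

private:
  std::vector<std::unique_ptr<DeviceQueue>> queues_;
};
```

With four framework streams, repeated calls to `queueFor(2)` return the same object, so an external library that caches resources per queue would see at most four distinct queues in this model.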

PR validation:

All unit tests pass.

@cmsbuild
Contributor

cmsbuild commented Apr 1, 2026

cms-bot internal usage

@fwyzard
Contributor Author

fwyzard commented Apr 1, 2026

enable gpu

@fwyzard
Contributor Author

fwyzard commented Apr 1, 2026

please test

@cmsbuild
Contributor

cmsbuild commented Apr 1, 2026

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50627/48825

ERROR: Build errors found during clang-tidy run.

Suppressed 1322 warnings (1318 in non-user code, 4 NOLINT).
--
src/HeterogeneousCore/AlpakaCore/interface/alpaka/stream/FixedQueueEDProducer.h:32:11: error: 'maybe_unused' attribute cannot be applied to a statement [clang-diagnostic-error]
   32 |         [[maybe_unused]] ev.queue();
      |           ^              ~~
Suppressed 2968 warnings (2964 in non-user code, 4 NOLINT).
--
src/HeterogeneousCore/AlpakaCore/interface/alpaka/stream/FixedQueueEDProducer.h:32:11: error: 'maybe_unused' attribute cannot be applied to a statement [clang-diagnostic-error]
   32 |         [[maybe_unused]] ev.queue();
      |           ^              ~~
Suppressed 2966 warnings (2962 in non-user code, 4 NOLINT).
--
src/HeterogeneousCore/AlpakaCore/interface/alpaka/stream/FixedQueueEDProducer.h:32:11: error: 'maybe_unused' attribute cannot be applied to a statement [clang-diagnostic-error]
   32 |         [[maybe_unused]] ev.queue();
      |           ^              ~~
Suppressed 2974 warnings (2970 in non-user code, 4 NOLINT).
--
src/HeterogeneousCore/AlpakaCore/interface/alpaka/stream/FixedQueueEDProducer.h:32:11: error: 'maybe_unused' attribute cannot be applied to a statement [clang-diagnostic-error]
   32 |         [[maybe_unused]] ev.queue();
      |           ^              ~~
Suppressed 2966 warnings (2962 in non-user code, 4 NOLINT).
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:129: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2
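
The diagnostic above arises because `[[maybe_unused]]` can be attached to declarations but not to an expression statement such as `ev.queue();`. The sketch below shows three standard C++ ways to evaluate a call and discard its result. `FakeEvent` is a hypothetical stand-in for the event type, and the log does not say which form, if any, the PR adopted.

```cpp
#include <tuple>  // std::ignore

// Hypothetical stand-in for the event wrapper named in the log; not the CMSSW class.
struct FakeEvent {
  int calls = 0;         // counts how often queue() was evaluated
  int queueHandle = 42;  // placeholder for the real queue object
  int& queue() {
    ++calls;
    return queueHandle;
  }
};

// Option 1: cast the result to void.
inline void discardWithCast(FakeEvent& ev) { static_cast<void>(ev.queue()); }

// Option 2: assign the result to std::ignore.
inline void discardWithIgnore(FakeEvent& ev) { std::ignore = ev.queue(); }

// Option 3: turn the statement into a declaration, which the attribute accepts.
inline void discardWithDeclaration(FakeEvent& ev) { [[maybe_unused]] auto& queue = ev.queue(); }
```

All three forms still evaluate `ev.queue()`, which matters when the call is made for its side effect rather than for its return value.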

@fwyzard
Contributor Author

fwyzard commented Apr 1, 2026

please test

@cmsbuild
Contributor

cmsbuild commented Apr 1, 2026

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50627/48826

@cmsbuild
Contributor

cmsbuild commented Apr 1, 2026

A new Pull Request was created by @fwyzard for master.

It involves the following packages:

  • HeterogeneousCore/AlpakaCore (heterogeneous)
  • HeterogeneousCore/AlpakaTest (heterogeneous)
  • PhysicsTools/PyTorchAlpakaTest (heterogeneous, ml)

@fwyzard, @hjkwon260, @makortel, @valsdav, @y19y19 can you please review it and eventually sign? Thanks.
@makortel, @rovere this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Apr 2, 2026

+1

Size: This PR adds an extra 44KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-35ac0e/52411/summary.html
COMMIT: 8b5045e
CMSSW: CMSSW_16_1_X_2026-04-01-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/50627/52411/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-35ac0e/52411/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-35ac0e/52411/git-merge-result

Comparison Summary

The workflows 2025.0010001, 2025.0000002, 2024.0070001, 2024.0060001, 2024.0050001, 2024.0040001, 2024.0030001, 2024.0020001, 2024.0010001, 2024.0000001, 2023.0020001 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons

Summary:

  • You potentially removed 299 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 41482 differences found in the comparisons
  • DQMHistoTests: Total files compared: 52
  • DQMHistoTests: Total histograms compared: 3449714
  • DQMHistoTests: Total failures: 162
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3449532
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 1.876 KiB( 40 files compared)
  • DQMHistoSizes: changed ( 18434.0,... ): 0.938 KiB HLT/ScoutingOffline
  • Checked 223 log files, 193 edm output root files, 52 DQM output files
  • TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

AMD_W7900 Comparison Summary

Summary:

NVIDIA_H100 Comparison Summary

Summary:

NVIDIA_L40S Comparison Summary

Summary:

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 6 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 2023.0020001_RunJetMET02023D_10k step3 max memory diff 329.9 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0000001_RunZeroBias2024B_10k step3 max memory diff -96.5 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0010001_RunJetMET02024C_10k step3 max memory diff 111.9 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0030001_RunDisplacedJet2024E_10k step3 max memory diff 184.7 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0050001_RunBTagMu2024G_10k step3 max memory diff 110.3 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0010001_RunJetMET02025C_10k step3 max memory diff 179.0 exceeds +/- 90.0 MiB

@cmsbuild
Contributor

cmsbuild commented Apr 6, 2026

Pull request #50627 was updated. @fwyzard, @hjkwon260, @makortel, @valsdav, @y19y19 can you please check and sign again.

@cmsbuild
Contributor

cmsbuild commented Apr 6, 2026

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-35ac0e/52492/summary.html
COMMIT: fdc8269
CMSSW: CMSSW_17_0_X_2026-04-06-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/50627/52492/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 5 lines to the logs
  • Reco comparison results: 3 differences found in the comparisons
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4180749
  • DQMHistoTests: Total failures: 47
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4180682
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
  • Checked 227 log files, 197 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

AMD_W7900 Comparison Summary

Summary:

  • You potentially added 8 lines to the logs
  • Reco comparison results: 385 differences found in the comparisons
  • DQMHistoTests: Total files compared: 13
  • DQMHistoTests: Total histograms compared: 216539
  • DQMHistoTests: Total failures: 31880
  • DQMHistoTests: Total nulls: 34
  • DQMHistoTests: Total successes: 184625
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 12 files compared)
  • Checked 49 log files, 50 edm output root files, 13 DQM output files
  • TriggerResults: found differences in 1 / 12 workflows

NVIDIA_H100 Comparison Summary

Summary:

NVIDIA_L40S Comparison Summary

Summary:

@fwyzard
Contributor Author

fwyzard commented Apr 7, 2026

@makortel I've split the first part of this PR into #50675, rewritten to allocate device queues only as needed.

If those changes look good I will rebase and update this PR on top of them.

fwyzard added 3 commits April 7, 2026 17:26
stream::FixedQueueEDProducer is a stream EDProducer with a fixed association of
device queues to framework streams.
This ensures that PyTorch sees only a limited number of device streams,
reducing the overall device memory utilisation.
@cmsbuild
Contributor

cmsbuild commented Apr 7, 2026

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50627/48906

@cmsbuild
Contributor

cmsbuild commented Apr 7, 2026

Pull request #50627 was updated. @cmsbuild, @fwyzard, @hjkwon260, @makortel, @valsdav, @y19y19 can you please check and sign again.

@fwyzard
Contributor Author

fwyzard commented Apr 7, 2026

@cms-sw/ml-l2 do you have any comments or suggestions?

@fwyzard
Contributor Author

fwyzard commented Apr 7, 2026

enable gpu

@fwyzard
Contributor Author

fwyzard commented Apr 7, 2026

please test

@fwyzard
Contributor Author

fwyzard commented Apr 7, 2026

type ngt

@cmsbuild
Contributor

cmsbuild commented Apr 8, 2026

+1

Size: This PR adds an extra 36KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-35ac0e/52512/summary.html
COMMIT: 00541fe
CMSSW: CMSSW_17_0_X_2026-04-07-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/50627/52512/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4180749
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4180726
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
  • Checked 227 log files, 197 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

AMD_W7900 Comparison Summary

Summary:

NVIDIA_H100 Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • Reco comparison results: 336 differences found in the comparisons
  • DQMHistoTests: Total files compared: 13
  • DQMHistoTests: Total histograms compared: 216539
  • DQMHistoTests: Total failures: 37208
  • DQMHistoTests: Total nulls: 36
  • DQMHistoTests: Total successes: 179295
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 12 files compared)
  • Checked 49 log files, 50 edm output root files, 13 DQM output files
  • TriggerResults: found differences in 1 / 12 workflows

NVIDIA_L40S Comparison Summary

Summary:

Max Memory Comparisons exceeding threshold NVIDIA_L40S

@cms-sw/core-l2 , I found 1 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 34634.7503_TTbar_14TeV+Run4D121PU_HLTHeterogeneousValid step2 max memory diff 118.4 exceeds +/- 90.0 MiB

@makortel
Contributor

makortel commented Apr 8, 2026

Looks ok to me

@fwyzard
Contributor Author

fwyzard commented Apr 8, 2026

+heterogeneous

@hjkwon260
Contributor

+ml

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @sextonkennedy, @ftenchini (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Contributor

+1
