Implement stream::FixedQueueEDProducer #50627
cms-bot internal usage |
|
enable gpu |
|
please test |
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50627/48825 ERROR: Build errors found during clang-tidy run. |
0dff13b to 8b5045e
|
please test |
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50627/48826
|
|
A new Pull Request was created by @fwyzard for master. It involves the following packages:
@fwyzard, @hjkwon260, @makortel, @valsdav, @y19y19 can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
|
+1
Size: This PR adds an extra 44KB to repository
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here:
Comparison Summary
The workflows 2025.0010001, 2025.0000002, 2024.0070001, 2024.0060001, 2024.0050001, 2024.0040001, 2024.0030001, 2024.0020001, 2024.0010001, 2024.0000001, 2023.0020001 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons.
Summary:
AMD_MI300X Comparison Summary
AMD_W7900 Comparison Summary
NVIDIA_H100 Comparison Summary
NVIDIA_L40S Comparison Summary
Max Memory Comparisons exceeding threshold
@cms-sw/core-l2, I found 6 workflow step(s) with memory usage exceeding the error threshold:
|
|
+1
Size: This PR adds an extra 20KB to repository
Comparison Summary
AMD_MI300X Comparison Summary
AMD_W7900 Comparison Summary
NVIDIA_H100 Comparison Summary
NVIDIA_L40S Comparison Summary
|
stream::FixedQueueEDProducer is a stream EDProducer with a fixed association of device queues to framework streams.
This ensures that PyTorch sees only a limited number of device streams, reducing the overall device memory utilisation.
fdc8269 to 00541fe
|
@cms-sw/ml-l2 do you have any comments or suggestions? |
|
enable gpu |
|
please test |
|
type ngt |
|
+1
Size: This PR adds an extra 36KB to repository
Comparison Summary
AMD_MI300X Comparison Summary
AMD_W7900 Comparison Summary
NVIDIA_H100 Comparison Summary
NVIDIA_L40S Comparison Summary
Max Memory Comparisons exceeding threshold
NVIDIA_L40S
@cms-sw/core-l2, I found 1 workflow step(s) with memory usage exceeding the error threshold:
|
|
Looks ok to me |
|
+heterogeneous |
|
+ml |
|
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @sextonkennedy, @ftenchini (and backports should be raised in the release meeting by the corresponding L2) |
|
+1 |
PR description:
This PR builds on top of #50675.
Implement a new kind of alpaka stream::EDProducer with a fixed association of device queues (e.g. CUDA streams) to framework streams.
This is useful when using external software that associates resources with the device queues, for example the PyTorch device memory caching allocator.
Migrating the PyTorch alpaka modules from stream::EDProducer to stream::FixedQueueEDProducer ensures that PyTorch sees only a limited number of device queues, reducing the overall device memory utilisation.
For more background information see the presentation ML inference on GPUs in CMSSW with PyTorch by @EmanueleCoradin at the CMS developments with GPUs on March 30th, 2026.
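To illustrate the idea of a fixed association, here is a minimal standalone C++ sketch. It is not the code in this PR and does not use the CMSSW or alpaka APIs: DeviceQueue, FixedQueueTable and distinctQueuesSeen are hypothetical names introduced only for this example, and the round-robin assignment of events to framework streams is an assumption made to keep the toy model simple.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a device queue (for example a CUDA stream).
struct DeviceQueue {
  int id;
};

// Hypothetical table holding exactly one queue per framework stream.
// The queues are created once and never replaced.
class FixedQueueTable {
public:
  explicit FixedQueueTable(std::size_t nStreams) {
    queues_.reserve(nStreams);
    for (std::size_t i = 0; i < nStreams; ++i) {
      queues_.push_back(DeviceQueue{static_cast<int>(i)});
    }
  }

  // The same framework stream always gets the same queue.
  DeviceQueue const& queueFor(std::size_t streamId) const { return queues_.at(streamId); }

  std::size_t size() const { return queues_.size(); }

private:
  std::vector<DeviceQueue> queues_;
};

// Count the distinct queues an external library would observe when nEvents
// events are processed, assuming events are spread round-robin over streams.
inline std::size_t distinctQueuesSeen(FixedQueueTable const& table, std::size_t nEvents) {
  if (table.size() == 0) {
    return 0;  // no streams, so no queue is ever used
  }
  std::set<int> seen;
  for (std::size_t event = 0; event < nEvents; ++event) {
    std::size_t const streamId = event % table.size();
    seen.insert(table.queueFor(streamId).id);
  }
  return seen.size();
}
```

In this toy model the number of distinct queues is bounded by the number of framework streams, however many events are processed. That bound is what limits the per-queue resources kept by an external library such as a caching allocator.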
PR validation:
All unit tests pass.