Extend the TrivialSerialisation mechanism to device products #50597

Merged: cmsbuild merged 5 commits into cms-sw:master from fwyzard:device_TrivialSerialisation on Mar 31, 2026

@fwyzard
Contributor

@fwyzard fwyzard commented Mar 31, 2026

PR description:

Reimplementation of #50154.

This adds support for device products to the serialisation mechanism under HeterogeneousCore/TrivialSerialisation.

It implements an alpaka Plugin Factory under HeterogeneousCore/TrivialSerialisation/interface/alpaka/ similar to the existing host one. Some of the features of this new Plugin Factory are:

  • There is one factory per backend. This is to prevent the "multiple definitions of plugin X found in different files" error when two .so files define the same plugin from the same factory, for separate backends. The factory registration is as follows:
EDM_REGISTER_PLUGINFACTORY(ALPAKA_ACCELERATOR_NAMESPACE::ngt::SerialiserFactoryPortable,
                           "SerialiserFactoryPortable@" EDM_STRINGIZE(ALPAKA_ACCELERATOR_NAMESPACE));
  • Plugins for device collections are registered via the DEFINE_TRIVIAL_SERIALISER_PLUGIN_DEVICE(TYPE) macro, where TYPE is the inner product type (e.g. PortableDeviceCollection<...>), not wrapped in DeviceProduct. The plugins are registered under both the mangled typeid name and EDM_STRINGIZE(TYPE). EDM_STRINGIZE(TYPE) is more human-readable, and thus more suitable for Python configuration files.

For example, a plugin might be registered as follows:

DEFINE_TRIVIAL_SERIALISER_PLUGIN_DEVICE(hcal::RecHitDeviceCollection);

and then referred to from a configuration:

    products = cms.VPSet(
        cms.PSet(
            type = cms.string("hcal::RecHitDeviceCollection"),
            ...

This feature is also added to the existing non-alpaka serialiser plugin factory.
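To illustrate the idea of registering one plugin under two names, here is a minimal, self-contained sketch. It is not the CMSSW implementation: `ToyFactory`, `ToySerialiser`, `registerToySerialiser`, `TOY_DEFINE_SERIALISER_PLUGIN` and `toy::RecHitCollection` are hypothetical names invented for this example, and the real macros rely on the EDM plugin system rather than on a simple map.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeinfo>
#include <utility>

// Two-step stringification, so that macro arguments are expanded first.
#define TOY_STRINGIZE_IMPL(x) #x
#define TOY_STRINGIZE(x) TOY_STRINGIZE_IMPL(x)

// Hypothetical product type, standing in for a device collection.
namespace toy {
  struct RecHitCollection {};
}  // namespace toy

// Hypothetical serialiser interface; not the CMSSW one.
struct ToySerialiserBase {
  virtual ~ToySerialiserBase() = default;
  virtual std::type_info const& productType() const = 0;
};

template <typename T>
struct ToySerialiser : ToySerialiserBase {
  std::type_info const& productType() const override { return typeid(T); }
};

// Minimal name -> maker map playing the role of a plugin factory.
class ToyFactory {
public:
  using Maker = std::function<std::unique_ptr<ToySerialiserBase>()>;

  static ToyFactory& get() {
    static ToyFactory factory;
    return factory;
  }

  void add(std::string const& name, Maker maker) { makers_[name] = std::move(maker); }

  std::unique_ptr<ToySerialiserBase> create(std::string const& name) const {
    auto it = makers_.find(name);
    if (it == makers_.end())
      throw std::runtime_error("no serialiser plugin named " + name);
    return it->second();
  }

private:
  std::map<std::string, Maker> makers_;
};

// Register the same maker under the mangled typeid name and under the
// human-readable spelling of the type.
template <typename T>
void registerToySerialiser(char const* readableName) {
  auto maker = [] { return std::unique_ptr<ToySerialiserBase>(new ToySerialiser<T>()); };
  ToyFactory::get().add(typeid(T).name(), maker);
  ToyFactory::get().add(readableName, maker);
}

// Hypothetical analogue of a "define plugin" macro: TYPE is used both as a
// template argument and, stringized, as the readable plugin name.
#define TOY_DEFINE_SERIALISER_PLUGIN(TYPE) registerToySerialiser<TYPE>(TOY_STRINGIZE(TYPE))
```

After calling `TOY_DEFINE_SERIALISER_PLUGIN(toy::RecHitCollection);`, the plugin can be created either with `ToyFactory::get().create("toy::RecHitCollection")`, as a configuration file would, or with `ToyFactory::get().create(typeid(toy::RecHitCollection).name())`, as generic C++ code would.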

This PR also adds plugin definitions for various types in DataFormats.
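Returning to the per-backend factories described in the first bullet: the factory name is built by concatenating a fixed prefix with the stringized backend namespace, so each backend library ends up with a distinct factory. The sketch below shows only that naming technique. `TOY_BACKEND_NAMESPACE`, `TOY_STR` and `toyFactoryName` are hypothetical names, and the value `toy_serial_sync` is an assumption chosen for illustration.

```cpp
#include <string>

// Two-step stringification: the inner macro stringizes, the outer one makes
// sure the argument is macro-expanded first.
#define TOY_STR_IMPL(x) #x
#define TOY_STR(x) TOY_STR_IMPL(x)

// Hypothetical: in a real build this would be defined once per backend
// library, with a different value for each backend.
#ifndef TOY_BACKEND_NAMESPACE
#define TOY_BACKEND_NAMESPACE toy_serial_sync
#endif

// Adjacent string literals are concatenated by the compiler, so the result is
// a single literal such as "SerialiserFactoryPortable@toy_serial_sync".
inline std::string toyFactoryName() {
  return "SerialiserFactoryPortable@" TOY_STR(TOY_BACKEND_NAMESPACE);
}
```

Compiling the same source once per backend, each time with a different `TOY_BACKEND_NAMESPACE`, gives each resulting library its own factory name, which is the property that avoids two libraries registering the same plugin in the same factory.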

PR validation:

The following tests are included in the PR:

  • DataFormats/TrivialSerialisation/test/alpaka/test_catch2_MemoryCopyTraitsPortable.dev.cc
  • HeterogeneousCore/TrivialSerialisation/test/alpaka/test_catch2_portableCollectionsSerialiserPluginFactory.dev.cc

A description of each test is included at the top of its file.
All tests pass.

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2026

enable gpu

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2026

please test

@cmsbuild
Contributor

cmsbuild commented Mar 31, 2026

cms-bot internal usage

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50597/48782

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2026

please test

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2026

+heterogeneous

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50597/48784

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard for master.

It involves the following packages:

  • DataFormats/BeamSpot (reconstruction)
  • DataFormats/EcalDigi (simulation)
  • DataFormats/EcalRecHit (reconstruction)
  • DataFormats/HGCalDigi (simulation)
  • DataFormats/HGCalReco (reconstruction)
  • DataFormats/HcalDigi (simulation)
  • DataFormats/HcalRecHit (reconstruction)
  • DataFormats/ParticleFlowReco (reconstruction)
  • DataFormats/Portable (heterogeneous)
  • DataFormats/PortableTestObjects (heterogeneous)
  • DataFormats/SiPixelClusterSoA (heterogeneous, reconstruction)
  • DataFormats/SiPixelDigiSoA (heterogeneous, reconstruction)
  • DataFormats/TrackSoA (heterogeneous, reconstruction)
  • DataFormats/TrackingRecHitSoA (heterogeneous, reconstruction)
  • DataFormats/TrivialSerialisation (heterogeneous)
  • DataFormats/VertexSoA (heterogeneous, reconstruction)
  • HeterogeneousCore/TrivialSerialisation (heterogeneous)
  • RecoLocalTracker/SiPixelClusterizer (reconstruction)

@Moanwar, @civanch, @jfernan2, @kpedro88, @mandrenguyen, @mdhildreth, @srimanob can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @IzaakWN, @ReyerBand, @VinInn, @VourMa, @abdoulline, @argiro, @bsunanda, @dkotlins, @elusian, @felicepantaleo, @ferencek, @gpetruc, @hatakeyamak, @lgray, @makortel, @mariadalfonso, @missirol, @mmasciov, @mmusich, @mroguljic, @mtosi, @pfs, @rchatter, @rovere, @thomreis, @threus, @tsusa, @wang0jin this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@fwyzard fwyzard changed the title from "Device trivial serialisation" to "Extend the TrivialSerialisation mechanism to device products" on Mar 31, 2026
@fwyzard
Contributor Author

fwyzard commented Mar 31, 2026

@cms-sw/reconstruction-l2 can you sign this?
All changes should be transparent.

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2026

please test

@fwyzard
Contributor Author

fwyzard commented Mar 31, 2026

ignore tests-rejected with ib-failure

@Moanwar
Contributor

Moanwar commented Mar 31, 2026

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @ftenchini, @mandrenguyen, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@ftenchini

+1

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Size: This PR adds an extra 96KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-792ca6/52359/summary.html
COMMIT: 1c886f7
CMSSW: CMSSW_16_1_X_2026-03-30-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/50597/52359/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed Unit Tests

I found 1 errors in the following unit tests:

---> test testMPIAutosplitter had ERRORS

Comparison Summary

Summary:

  • You potentially removed 3 lines from the logs
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 55
  • DQMHistoTests: Total histograms compared: 4420399
  • DQMHistoTests: Total failures: 5
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4420374
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 54 files compared)
  • Checked 235 log files, 208 edm output root files, 55 DQM output files
  • TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

AMD_W7900 Comparison Summary

Summary:

NVIDIA_H100 Comparison Summary

Summary:

NVIDIA_L40S Comparison Summary

Summary:

@cmsbuild cmsbuild merged commit c801033 into cms-sw:master Mar 31, 2026
21 of 22 checks passed
@smuzaffar
Contributor

@makortel @fwyzard, fwlite builds are failing [a] after this update. It looks like we need to add both DataFormats/AlpakaCommon and HeterogeneousCore/AlpakaInterface (which is used by DataFormats/AlpakaCommon) to the fwlite build set. Any objections?

[a] https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/fwlite/el8_amd64_gcc13/CMSSW_16_1_X_2026-03-31-2300/HeterogeneousCore/TrivialSerialisation

In file included from src/HeterogeneousCore/TrivialSerialisation/interface/alpaka/SerialiserFactoryDevice.h:8,
                 from src/HeterogeneousCore/TrivialSerialisation/src/alpaka/SerialiserFactoryDevice.cc:1:
src/HeterogeneousCore/TrivialSerialisation/interface/alpaka/Serialiser.h:8:10: fatal error: DataFormats/AlpakaCommon/interface/alpaka/DeviceProductType.h: No such file or directory
    8 | #include "DataFormats/AlpakaCommon/interface/alpaka/DeviceProductType.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

@fwyzard
Contributor Author

fwyzard commented Apr 1, 2026

I don't know what the downsides of adding DataFormats/AlpakaCommon and HeterogeneousCore/AlpakaInterface to fwlite would be 🤔

An alternative would be to remove HeterogeneousCore/MPICore and HeterogeneousCore/TrivialSerialisation.

What is the list of packages that we include in or exclude from fwlite?

@smuzaffar
Contributor

@fwyzard, the fwlite build set is available at https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_16_1_X/master/fwlite_build_set.file and I see that HeterogeneousCore/AlpakaInterface is already included, so only DataFormats/AlpakaCommon needs to be added.

@fwyzard
Contributor Author

fwyzard commented Apr 1, 2026

OK, then adding DataFormats/AlpakaCommon should be fine.

@makortel
Contributor

makortel commented Apr 1, 2026

I agree, adding DataFormats/AlpakaCommon is fine (in a way, the FWLite build set was one of the reasons for that package).

@fwyzard
Contributor Author

fwyzard commented Apr 25, 2026

type ngt

@fwyzard fwyzard deleted the device_TrivialSerialisation branch April 25, 2026 16:36
@cmsbuild cmsbuild added the ngt label Apr 25, 2026

8 participants