Extend the HeterogeneousCore/TrivialSerialisation mechanism to device products#50154
Extend the HeterogeneousCore/TrivialSerialisation mechanism to device products#50154ghyls wants to merge 1 commit into
HeterogeneousCore/TrivialSerialisation mechanism to device products#50154Conversation
|
cms-bot internal usage |
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50154/48098
|
d654851 to
0f09fae
Compare
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50154/48099
|
|
Pull request #50154 was updated. |
0f09fae to
a85a721
Compare
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50154/48111
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
a85a721 to
19318aa
Compare
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50154/48412
|
|
Pull request #50154 was updated. |
|
please test |
|
A new Pull Request was created by @ghyls for master. It involves the following packages:
@Moanwar, @civanch, @fwyzard, @jfernan2, @kpedro88, @makortel, @mandrenguyen, @mdhildreth, @srimanob can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
|
-1 Failed Tests: Build Failed BuildI found compilation error when building: /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/gcc/13.4.0-6908cfdf803923e783448096ca4f0923/bin/../lib/gcc/x86_64-redhat-linux-gnu/13.4.0/../../../../x86_64-redhat-linux-gnu/bin/ld.bfd: :(.text.startup+0x85): undefined reference to `edmplugin::PluginFactory::get()' /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/gcc/13.4.0-6908cfdf803923e783448096ca4f0923/bin/../lib/gcc/x86_64-redhat-linux-gnu/13.4.0/../../../../x86_64-redhat-linux-gnu/bin/ld.bfd: :(.text.startup+0xd2): undefined reference to `edmplugin::PluginFactory::get()' /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/gcc/13.4.0-6908cfdf803923e783448096ca4f0923/bin/../lib/gcc/x86_64-redhat-linux-gnu/13.4.0/../../../../x86_64-redhat-linux-gnu/bin/ld.bfd: :(.text.startup+0x11f): undefined reference to `edmplugin::PluginFactory::get()' /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/gcc/13.4.0-6908cfdf803923e783448096ca4f0923/bin/../lib/gcc/x86_64-redhat-linux-gnu/13.4.0/../../../../x86_64-redhat-linux-gnu/bin/ld.bfd: :(.text.startup+0x16c): undefined reference to `edmplugin::PluginFactory::get()' /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/gcc/13.4.0-6908cfdf803923e783448096ca4f0923/bin/../lib/gcc/x86_64-redhat-linux-gnu/13.4.0/../../../../x86_64-redhat-linux-gnu/bin/ld.bfd: tmp/el8_amd64_gcc13/src/DataFormats/PortableTestObjects/plugins/DataFormatsPortableTestObjectsPluginsAlpakaCudaAsync/ccUFa2Q7.ltrans0.ltrans.o::(.text.startup+0x1b9): more undefined references to `edmplugin::PluginFactory::get()' follow collect2: error: ld returned 1 exit status gmake: *** [tmp/el8_amd64_gcc13/src/DataFormats/PortableTestObjects/plugins/DataFormatsPortableTestObjectsPluginsAlpakaCudaAsync/libDataFormatsPortableTestObjectsPluginsAlpakaCudaAsync.so] Error 1 Leaving library rule at src/DataFormats/PortableTestObjects/plugins Entering library rule at src/DataFormats/PortableTestObjects/plugins >> Compiling alpaka/rocm edm plugin src/DataFormats/PortableTestObjects/plugins/alpaka/TrivialSerialisation.cc /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/gcc/13.4.0-6908cfdf803923e783448096ca4f0923/bin/c++ -c -DCMS_MICRO_ARCH='x86-64-v3' -DGNU_GCC -D_GNU_SOURCE -DTBB_USE_GLIBCXX_VERSION=130400 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -D__HIP_PLATFORM_HCC__ -D__HIP_PLATFORM_AMD__ -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DBOOST_MPL_IGNORE_PARENTHESES_WARNING -DCMSSW_GIT_HASH='CMSSW_16_1_X_2026-03-08-2300' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_16_1_X_2026-03-08-2300' -Isrc -Ipoison -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/cms/cmssw-patch/CMSSW_16_1_X_2026-03-08-2300/src -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/pcre/8.43-6d98fda3bfd074ebb583e2d6a2c75d25/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/boost/1.80.0-b819d3899535842b3b08dcd6a725af1a/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/bz2lib/1.0.6-d113e1c6278c07eeaff5f84db9548446/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/libuuid/2.34-5ba7a8abfc0c5fecdc448cca360c25ff/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/rocm/7.1.0-3368aff4c211469d06c08ddf03fbf8f5/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/lcg/root/6.36.09-8a6db48e639169d76071b7f4a20dbfbf/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/tbb/v2022.3.0-88eb7be4ee320d604a798a914aea6359/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/xz/5.6.4-b9c4ffbc390ed320a5d57fd552e29a05/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/zlib/1.2.13-589f6bb51bbeba38a7adf5a10ea8a093/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/alpaka/2.1.1-3caaac8d71f39d400ab2511b2403675a/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/eigen/3bb6a48d8c171cf20b5f8e48bfb4e424fbd4f79e-95c02b8a883b2934decb8bb53ff9b486/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/eigen/3bb6a48d8c171cf20b5f8e48bfb4e424fbd4f79e-95c02b8a883b2934decb8bb53ff9b486/include/eigen3 -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/fmt/10.2.1-31d67b0504b4ba2262f03d3c5cad83c1/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/md5/1.0.0-26057075013e190e56dad37d35219376/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02932/el8_amd64_gcc13/external/tinyxml2/6.2.0-67924ead96ecb4e69aad321b767979a5/include -O3 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++20 -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -Wno-error=array-bounds -Warray-bounds -fuse-ld=bfd -march=x86-64-v3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-unused-parameter -Wunused -Wparentheses -Werror=return-type -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -DEIGEN_DONT_PARALLELIZE -DEIGEN_MAX_ALIGN_BYTES=64 -DALPAKA_DEFAULT_HOST_MEMORY_ALIGNMENT=128 -DALPAKA_DISABLE_VENDOR_RNG -DALPAKA_HAS_STD_ATOMIC_REF -Wno-error=unused-variable -DALPAKA_ACC_GPU_HIP_ENABLED -DALPAKA_ACC_GPU_HIP_ONLY_MODE -DALPAKA_HOST_ONLY -DBOOST_DISABLE_ASSERTS -flto=auto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fPIC -MMD -MF tmp/el8_amd64_gcc13/src/DataFormats/PortableTestObjects/plugins/DataFormatsPortableTestObjectsPluginsAlpakaROCmAsync/alpaka/TrivialSerialisation.cc.d src/DataFormats/PortableTestObjects/plugins/alpaka/TrivialSerialisation.cc -o tmp/el8_amd64_gcc13/src/DataFormats/PortableTestObjects/plugins/DataFormatsPortableTestObjectsPluginsAlpakaROCmAsync/alpaka/TrivialSerialisation.cc.o |
| <use name="DataFormats/TrivialSerialisation"/> | ||
| <use name="FWCore/PluginManager"/> | ||
| <use name="FWCore/Utilities"/> | ||
| <use name="HeterogeneousCore/AlpakaCore"/> |
There was a problem hiding this comment.
The dependence on HeterogeneousCore/AlpakaCore should be avoided on packages that DataFormats are allowed to depend on, because AlpakaCore depends on e.g. FWCore/Framework.
| #include "HeterogeneousCore/AlpakaCore/interface/alpaka/DeviceProductType.h" | ||
| #include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDMetadata.h" |
There was a problem hiding this comment.
The DeviceProductType.h would be easily moved elsewhere from AlpakaCore.
The EDMetadata uses FWCore/Concurrency/interface/WaitingTaskWithArenaHolder.h, and we'd prefer to keep DataFormats independent of FWCore/Concurrency as well. The EDMetadata would probably have to be broken into two classes.
There was a problem hiding this comment.
@ghyls can you rebase this PR on top of #50490, and update the includes and BuildFile accordingly ?
| #include "HeterogeneousCore/AlpakaCore/interface/alpaka/DeviceProductType.h" | |
| #include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDMetadata.h" | |
| #include "DataFormats/AlpakaCommon/interface/alpaka/DeviceProductType.h" | |
| #include "DataFormats/AlpakaCommon/interface/alpaka/EDMetadata.h" |
|
Hi, we will build the next prerelease, CMSSW_16_1_0_pre3, in a couple of days. If you'd like this PR to be included into 16_1_0 and think this is feasible to converge within 1-2 days, please let us know. Alternatively, this can always go into a future 16_1_X build. |
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50154/48658
|
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50154/48663
|
|
please test |
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50154/48665
|
|
please test |
| <use name="DataFormats/Portable"/> | ||
| <use name="DataFormats/PortableTestObjects"/> | ||
| <use name="DataFormats/TrivialSerialisation"/> | ||
| <use name="Eigen"/> |
There was a problem hiding this comment.
Should this be eigen lowercase ?
| <use name="Eigen"/> | |
| <use name="eigen"/> |
| namespace alpaka_portable_test = ALPAKA_ACCELERATOR_NAMESPACE::portabletest; | ||
| namespace alpaka_ngt = ALPAKA_ACCELERATOR_NAMESPACE::ngt; |
There was a problem hiding this comment.
Since you do
using namespace ALPAKA_ACCELERATOR_NAMESPACE;these are not really needed, I think you can simply use e.g. portabletest::TestSoALayout and ngt::SerialiserBase, etc.
There was a problem hiding this comment.
The “issue” is that the portabletest and the ngt namespaces exist both inside and outside the ALPAKA_ACCELERATOR_NAMESPACE, so directly using e.g. ngt::SerialiserBase would make the compiler throw an error: reference to 'ngt' is ambiguous. If the aliases are not nice, we could use ALPAKA_ACCELERATOR_NAMESPACE::ngt every time instead of alpaka_ngt. I don't know if there is any other solution...
There was a problem hiding this comment.
I see.
OK, for a test let's not do anything more complicated.
| // that properties are not present, as they are not needed. | ||
| static_assert(not ngt::HasTrivialCopyProperties<T>); | ||
| } else if constexpr (not ngt::HasTrivialCopyProperties<T>) { | ||
| // If T has no TrivialCopyProperties, call initialize() without any additional arguments. |
There was a problem hiding this comment.
Can you add back these comments, here and just below ?
Or is there an explicit reason why they were removed ?
| <use name="DataFormats/AlpakaCommon"/> | ||
| <use name="DataFormats/Portable"/> |
There was a problem hiding this comment.
I'm commenting on this one, but it applies to all plugins/BuildFile.xml files.
Are these two really needed?
I expect that DataFormats/AlpakaCommon is included via HeterogeneousCore/TrivialSerialisation, and DataFormats/Portable via DataFormats/VertexSoA.
In general I find it cleaner to list only the packages for which source files have #include directives.
In this case it would be
<use name="DataFormats/VertexSoA"/>
<use name="HeterogeneousCore/AlpakaInterface"/>
<use name="HeterogeneousCore/TrivialSerialisation"/>
| EDMetadata& metadata, | ||
| bool tryReuseQueue) override { | ||
| WrapperType const& w = dynamic_cast<WrapperType const&>(wrapper); | ||
| if constexpr (detail::useProductDirectly) { |
There was a problem hiding this comment.
Isn't detail::useProductDirectly always false for the device serialiser ?
Or does the alpaka-based serialiser support also host products ?
| std::unique_ptr<edm::WrapperBase> get(std::shared_ptr<EDMetadata> metadata) override { | ||
| if constexpr (detail::useProductDirectly) { | ||
| return std::make_unique<WrapperType>(edm::WrapperBase::Emplace{}, std::move(data_)); | ||
| } else { | ||
| // metadata is needed to construct edm::DeviceProduct<T> | ||
| return std::make_unique<WrapperType>(edm::WrapperBase::Emplace{}, std::move(metadata), std::move(data_)); | ||
| } | ||
| } |
There was a problem hiding this comment.
Can you remind me why we need to pass a metadata object (which then is unused) also in the host case ?
|
-1 Failed Tests: UnitTests Failed Unit TestsI found 1 errors in the following unit tests: ---> test TestHeterogeneousCoreTrivialSerialisationGenericClonerPortable had ERRORS Comparison SummarySummary:
|
|
|
||
| <test name="TestHeterogeneousCoreTrivialSerialisationGenericCloner" command="cmsRun ${LOCALTOP}/src/HeterogeneousCore/TrivialSerialisation/test/testGenericCloner_cfg.py"/> | ||
| <test name="TestHeterogeneousCoreTrivialSerialisationGenericClonerHost" command="cmsRun ${LOCALTOP}/src/HeterogeneousCore/TrivialSerialisation/test/testGenericClonerHost_cfg.py"/> | ||
| <test name="TestHeterogeneousCoreTrivialSerialisationGenericClonerPortable" command="cmsRun ${LOCALTOP}/src/HeterogeneousCore/TrivialSerialisation/test/testGenericClonerPortable_cfg.py"/> |
There was a problem hiding this comment.
eterogeneousCore/TrivialSerialisation/test/testGenericClonerPortable_cfg.py does not exist.
|
The failure happens because the unit test that supposedly uses an alpaka-based However, given that some code that uses the device product serialisation has been developed in #50503, I think it would be better to reuse those developments and to actually implement an alpaka-based |
|
Last comment, I would suggest to split the PR into three commits:
|
|
superseded by #50597 |
|
please close |
PR description:
This adds support for device products to the serialisation mechanism under
HeterogeneousCore/TrivialSerialisation.It implements an alpaka Plugin Factory under
HeterogeneousCore/TrivialSerialisation/interface/alpaka/similar to the existing host one. Some of the features of this new Plugin Factory are:.sofiles define the same plugin from the same factory, for separate backends. The factory registration is as follows:DEFINE_TRIVIAL_SERIALISER_PLUGIN_DEVICE(TYPE)macro, whereTYPEis the inner product type (e.g. PortableDeviceCollection<...>), not wrapped in DeviceProduct. The plugins are registered under both the mangled typeid name and EDM_STRINGIZE(TYPE). EDM_STRINGIZE(TYPE) is more human-readable, and thus more suitable for Python configuration files.For example, a plugin might be registered as follows:
DEFINE_TRIVIAL_SERIALISER_PLUGIN_DEVICE(hcal::RecHitDeviceCollection);and then referred to from a configuration:
This feature is also added to the existing non-alpaka serialiser plugin factory.
This PR also adds plugin definitions for various types in DataFormats.
PR validation:
The following tests are included in the PR:
DataFormats/TrivialSerialisation/test/alpaka/test_catch2_MemoryCopyTraitsPortable.dev.ccHeterogeneousCore/TrivialSerialisation/test/alpaka/test_catch2_portableCollectionsSerialiserPluginFactory.dev.ccA description of the tests are included at the top of each file.
All tests pass.