Add an MPISenderPortable and MPIReceiverPortable modules to send/receive arbitrary device collections#50503
Add an MPISenderPortable and MPIReceiverPortable modules to send/receive arbitrary device collections#50503ghyls wants to merge 2 commits into
MPISenderPortable and MPIReceiverPortable modules to send/receive arbitrary device collections#50503Conversation
|
cms-bot internal usage |
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50503/48670
|
|
Milestone for this pull request has been moved to CMSSW_17_0_X. Please open a backport if it should also go in to CMSSW_16_1_X. |
|
A new Pull Request was created by @ghyls for master. It involves the following packages:
@Dr15Jones, @Moanwar, @civanch, @fwyzard, @jfernan2, @kpedro88, @makortel, @mandrenguyen, @mdhildreth, @smuzaffar, @srimanob can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
| <use name="HeterogeneousCore/TrivialSerialisation"/> | ||
| <flags ALPAKA_BACKENDS="1"/> | ||
| <flags EDM_PLUGIN="1"/> | ||
| </library> No newline at end of file |
There was a problem hiding this comment.
A newline will be added
There was a problem hiding this comment.
will be renamed to testGenericClonerPortable_cfg.py
| bool hasCopyToHost() const override { return HasCopyToHost<T, Queue>; } | ||
|
|
||
| bool hasCopyToDevice() const override { | ||
| if constexpr (HasCopyToHost<T, Queue>) { |
There was a problem hiding this comment.
Why do you check for HasCopyToHost before checking for HasCopyToDevice ?
There was a problem hiding this comment.
To be followed up in a separate PR, see https://github.com/cms-ngt-hlt/ngt32/issues/46 .
| } | ||
|
|
||
| std::function<std::shared_ptr<Queue>(edm::WrapperBase const&)> getQueue() const override { | ||
| if constexpr (HasCopyToHost<T, Queue>) { |
There was a problem hiding this comment.
Why do you check for HasCopyToHost before returning the queue?
A DeviceProduct<T> can provide the queue even if the underling type does not have a copy-to-host specialisation.
There was a problem hiding this comment.
Hmm true, thank you.
getQueue() is only used in the context of registering a D to H transformation, which requires a HasCopyToHost<T> to exist. That is why otherwise I made it return immediately.
However, I see how this makes getQueue misleading. Since this check is already done by preTransformDtoH and transformDtoH I have removed it from getQueue.
| if (deviceSerialiser) { | ||
| entry.typeID = edm::TypeID{deviceSerialiser->productTypeID()}; | ||
| entry.getToken = | ||
| this->consumes(edm::TypeToGet{entry.typeID, edm::PRODUCT_TYPE}, edm::InputTag{label, instance}); |
There was a problem hiding this comment.
| this->consumes(edm::TypeToGet{entry.typeID, edm::PRODUCT_TYPE}, edm::InputTag{label, instance}); | |
| this->consumes(edm::TypeToGet{entry.typeID, edm::PRODUCT_TYPE}, src); |
?
|
|
||
| if (verbose_) { | ||
| edm::LogInfo("GenericClonerPortable") << "will clone device product of type '" << type << "', label '" | ||
| << label << "', instance '" << instance << "'"; |
There was a problem hiding this comment.
| << label << "', instance '" << instance << "'"; | |
| << src.label() << "', instance '" << src.instance() << "'"; |
There was a problem hiding this comment.
Actually it may be possible to simply print src, can check if this works
edm::LogInfo("GenericClonerPortable") << "will clone device product of type '" << type << "', tag '" << src << '\'';?
There was a problem hiding this comment.
Indeed, it works. Thank you
| if (hostSerialiser) { | ||
| entry.typeID = edm::TypeID{twd.typeInfo()}; | ||
| entry.getToken = | ||
| this->consumes(edm::TypeToGet{entry.typeID, edm::PRODUCT_TYPE}, edm::InputTag{label, instance}); |
There was a problem hiding this comment.
| this->consumes(edm::TypeToGet{entry.typeID, edm::PRODUCT_TYPE}, edm::InputTag{label, instance}); | |
| this->consumes(edm::TypeToGet{entry.typeID, edm::PRODUCT_TYPE}, src); |
| edm::LogInfo("GenericClonerPortable") << "will clone host product of type '" << type << "', label '" << label | ||
| << "', instance '" << instance << "'"; |
There was a problem hiding this comment.
| edm::LogInfo("GenericClonerPortable") << "will clone host product of type '" << type << "', label '" << label | |
| << "', instance '" << instance << "'"; | |
| edm::LogInfo("GenericClonerPortable") << "will clone host product of type '" << type << "', label '" << src.label() | |
| << "', instance '" << src.instance() << "'"; |
| } | ||
|
|
||
| entry.typeID = edm::TypeID{twd.typeInfo()}; | ||
| entry.getToken = this->consumes(edm::TypeToGet{entry.typeID, edm::PRODUCT_TYPE}, edm::InputTag{label, instance}); |
There was a problem hiding this comment.
| entry.getToken = this->consumes(edm::TypeToGet{entry.typeID, edm::PRODUCT_TYPE}, edm::InputTag{label, instance}); | |
| entry.getToken = this->consumes(edm::TypeToGet{entry.typeID, edm::PRODUCT_TYPE}, src); |
| edm::LogInfo("GenericClonerPortable") << "will clone ROOT-serialised product of type '" << type << "', label '" | ||
| << label << "', instance '" << instance << "'"; |
There was a problem hiding this comment.
| edm::LogInfo("GenericClonerPortable") << "will clone ROOT-serialised product of type '" << type << "', label '" | |
| << label << "', instance '" << instance << "'"; | |
| edm::LogInfo("GenericClonerPortable") << "will clone ROOT-serialised product of type '" << type << "', label '" | |
| << src.label() << "', instance '" << src.instance() << "'"; |
| [copyAsync = std::forward<TCopyAsync>(copyAsync), synchronize = this->synchronize()]( | ||
| edm::StreamID streamID, edm::WrapperBase const& wb, edm::WaitingTaskWithArenaHolder holder) -> std::any { | ||
| detail::EDMetadataAcquireSentry sentry(streamID, std::move(holder), synchronize); | ||
| auto productOnHost = copyAsync(sentry.metadata()->queue(), wb); |
There was a problem hiding this comment.
isn't this productOnDevice ?
There was a problem hiding this comment.
Sorry for not having noticed this...
It is fixed now.
|
Pull request #50503 was updated. @Dr15Jones, @Moanwar, @civanch, @cmsbuild, @fwyzard, @jfernan2, @kpedro88, @makortel, @mandrenguyen, @mdhildreth, @smuzaffar, @srimanob can you please check and sign again. |
|
allow @ghyls test rights |
makortel
left a comment
There was a problem hiding this comment.
Few things that caught my eye
| std::string productInstance) { | ||
| TransformerBase::registerTransformAsyncImp( | ||
| *this, iToken, returnType, std::move(productInstance), std::move(iPre), std::move(iF)); | ||
| } |
There was a problem hiding this comment.
Please extend this overload to the other module base classes that provide Transformer (that is, global and limited).
The real test suite for the Transformer is in
cmssw/FWCore/Integration/test/BuildFile.xml
Lines 60 to 78 in bd607fa
Could you extend those to cover this overload as well?
| edm::EDPutToken produces(edm::TypeID deviceProductType, | ||
| edm::TypeID hostProductType, | ||
| std::string instanceName, | ||
| TCopyAsync&& copyAsync, | ||
| TTransform&& transform) { |
There was a problem hiding this comment.
I suppose this produces() overload can be called only when the copy operation exists for the host-to-device copy (which is different from the typed produces()), and the present API does not support producing a type-erased host-only data product. I'd suggest to add a comment about that.
| edm::EDPutToken produces(edm::TypeID deviceProductType, | ||
| edm::TypeID hostProductType, |
There was a problem hiding this comment.
My personal preference would be to have the primarily produced type first, and the implicitly copied type second
| edm::EDPutToken produces(edm::TypeID deviceProductType, | |
| edm::TypeID hostProductType, | |
| edm::EDPutToken produces(edm::TypeID hostProductType, | |
| edm::TypeID deviceProductType, |
| TTransform&& transform) { | ||
| edm::EDPutToken token = this->producesCollector().template produces<Tr>(hostProductType, instanceName); | ||
|
|
||
| if constexpr (not detail::useProductDirectly) { |
There was a problem hiding this comment.
In the detail::useProductDirectly == true case, should deviceProductType == hostProductType?
| std::string instanceName, | ||
| TCopyAsync&& copyAsync, | ||
| TTransform&& transform) { | ||
| edm::EDPutToken token = this->producesCollector().template produces<Tr>(hostProductType, instanceName); |
There was a problem hiding this comment.
Why go through producesCollector() instead of
| edm::EDPutToken token = this->producesCollector().template produces<Tr>(hostProductType, instanceName); | |
| edm::EDPutToken token = Base::template produces<Tr>(hostProductType, instanceName); |
?
PR description:
These PR includes two separate developments, one of which requires the other.
Enable the registration at runtime of DtoH and HtoD product transformations:
Add an
MPISenderPortableandMPIReceiverPortablemodules to send/receive device collectionsMPISenderPortable.ccandMPIReceiverPortable.cc, which can send/receive device runtime-typed device collections for which a device TrivialSerialiser plugin exists, directly to/from device memory.PR validation:
A
GenericClonerPortabletest module is introduced to demonstrate the D to H and H to D transformation registrations at runtime. The module clones a host or a device product and registers the H to D (or D to H) transformation for it. The test is configured viatestGenericClonerDevice.pyA small test is added to
FWCore/Framework/test/stream_producer_catch2.ccto test the non-templatedregisterTransformAsyncoverload added toFramework/interface/stream/implementors.h.The Portable MPI modules (
MPISenderPortable.ccandMPIReceiverPortable.cc) are tested via the two new configurations added toHeterogeneousCore/MPICore/test.Backport
16_0_Xand16_1_Xfor it to be used in the NGT demonstrator during data taking this year.