Extend edmMpiSplitConfig for multiple controllers #50678
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50678/48900
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
900d1a0 to ce7ce11
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50678/48901
|
|
A new Pull Request was created by @Annnnya for master. It involves the following packages:
@cmsbuild, @fwyzard, @makortel can you please review it and eventually sign? Thanks. |
|
How (un)likely is it that we get a collision in the hashes of two process names? Given that most process names are quite short, maybe we could gather the names directly instead of the hashes? |
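To put a rough number on the collision question, here is a minimal sketch of the usual birthday-bound estimate. It assumes an ideal, uniformly distributed hash; the 64-bit width and the job sizes are assumptions for illustration, not values taken from this PR.

```python
import math


def collision_probability(num_names: int, hash_bits: int) -> float:
    """Birthday-bound approximation of the chance that any two of
    num_names distinct names map to the same hash_bits-bit value.

    Assumes an ideal hash with uniformly distributed outputs.
    """
    pairs = num_names * (num_names - 1) / 2
    # 1 - exp(-x), written with expm1 so tiny probabilities do not round to 0.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


# Hypothetical numbers of MPI processes; 64 bits is an assumed hash width.
for n in (2, 16, 1024):
    print(n, collision_probability(n, 64))
```

Under these assumptions the probability is negligible for any realistic number of processes, but gathering the names directly, as suggested above, would remove the question altogether.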
|
please test |
|
@Annnnya could you
? |
|
From a cursory look the python part looks fine. |
|
type ngt |
|
-1
Failed Tests: RelVals-INPUT
Failed RelVals-INPUT: The relvals timed out after 4 hours.
|
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50678/49087
|
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50678/49089
|
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50678/49090
|
|
please test |
|
-1
Failed Tests: UnitTests
Failed Unit Tests: I found 1 errors in the following unit tests:
---> test testMPITooManyStreams had ERRORS
|
| } | ||
| edm::LogInfo("MPI") << "MPIController Comm World size: " << size; | ||
|
|
||
| // All processes exchane the hashes of their names |
| // All processes exchane the hashes of their names | |
| // All processes exchange the hashes of their names. |
| * | ||
| * If this module encounters a configuration error it will call MPI_Abort() instead of throwing an exception. | ||
| * Otherwise the call to MPI_Finalize() in the MPIService destructor may hang. |
| * | |
| * If this module encounters a configuration error it will call MPI_Abort() instead of throwing an exception. | |
| * Otherwise the call to MPI_Finalize() in the MPIService destructor may hang. |
|
@Annnnya can you update also the new tests, |
…ange the hashes at communicatior channels creation
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50678/49093
|
|
please test |
|
+1
Size: This PR adds an extra 32KB to repository
|
|
+heterogeneous |
|
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @ftenchini, @mandrenguyen, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2) |
|
+1 |
PR description:
This pull request corresponds to ngt issue #18.
It changes the splitter to allow creating multiple-remote setups, where the parameters for the different processes are separated with ":".
It also changes the logic by which the MPIController and MPISource discover each other: instead of specifying the ranks of the follower process, MPIController can set the remoteProcess parameter to the remote process name, and its rank will be discovered at runtime. MPISource has to specify the controller process name via the controllerProcessName parameter, and MPIController must specify the name of the matching process in the followerProcessName parameter.
Integration tests were modified to account for the configuration changes.
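The name-based discovery described above can be illustrated with a small, MPI-free sketch. It is not the PR's C++ implementation: it assumes every process has already learned the process name of every rank (for example through an all-gather), and `find_rank` and the example names are hypothetical.

```python
def find_rank(names_by_rank: list[str], wanted: str) -> int:
    """Return the rank whose process name equals `wanted`.

    `names_by_rank[i]` is the process name reported by rank i.
    A missing or ambiguous name is treated as a configuration error.
    """
    matches = [rank for rank, name in enumerate(names_by_rank) if name == wanted]
    if not matches:
        raise LookupError(f"no process named {wanted!r}")
    if len(matches) > 1:
        raise LookupError(f"process name {wanted!r} is used by ranks {matches}")
    return matches[0]


# Hypothetical job: one controller process and two remote processes.
names = ["HLT", "ECAL", "HCAL"]
print(find_rank(names, "HCAL"))  # prints 2
```

Looking up by name rather than by a configured rank means the same configuration keeps working regardless of the order in which the processes are launched.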
PR validation:
Tried the new splitter with the command:
edmMpiSplitConfig hlt.py -l local.py -m hltEcalDigisSoA hltEcalUncalibRecHitSoA -r remote_ecal.py -n ECAL : \
  -m hltHcalDigisSoA hltHbheRecoSoA hltParticleFlowRecHitHBHESoA hltParticleFlowClusterHBHESoA -d hltHcalDigis -r remote_hcal.py -n HCAL
An integration test that splits the configuration into multiple remotes was also added to the MPICore package.
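As an illustration of the ":" convention used in the command above, the following sketch splits an argument list into one group per remote process. It is only an assumption about how such a command line could be partitioned, not the actual edmMpiSplitConfig parser; `split_groups` is a hypothetical helper, and how shared options are handled is not described in this PR.

```python
def split_groups(args: list[str], separator: str = ":") -> list[list[str]]:
    """Split an argument list into groups at each standalone separator token."""
    groups: list[list[str]] = [[]]
    for arg in args:
        if arg == separator:
            groups.append([])  # start collecting the next remote's options
        else:
            groups[-1].append(arg)
    return groups


# Arguments taken from the validation command, abbreviated to two modules each.
argv = [
    "-m", "hltEcalDigisSoA", "hltEcalUncalibRecHitSoA", "-r", "remote_ecal.py", "-n", "ECAL",
    ":",
    "-m", "hltHcalDigisSoA", "hltHbheRecoSoA", "-d", "hltHcalDigis", "-r", "remote_hcal.py", "-n", "HCAL",
]
for group in split_groups(argv):
    print(group)
```

Each resulting group could then be handed to an ordinary option parser, so the per-remote options keep the same meaning as in the single-remote case.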