Set up HGCAL GPU vs CPU DQM#50974
Conversation
|
cms-bot internal usage |
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50974/49380
|
|
A new Pull Request was created by @fiemmi for master. It involves the following packages:
The following packages do not have a category, yet: DQM/HGCAL @Martin-Grunewald, @cmsbuild, @davidlange6, @fabiocos, @ftenchini, @mandrenguyen, @mmusich can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
| #include "FWCore/Framework/interface/MakerMacros.h" | ||
| #include "DataFormats/Candidate/interface/Candidate.h" | ||
| #include "DataFormats/CaloRecHit/interface/CaloClusterCollection.h" | ||
| #include "DataFormats/CaloRecHit/interface/CaloCluster.h" |
There was a problem hiding this comment.
picky but can you alpha-order?
| edm::EDGetTokenT<reco::CaloClusterCollection> tokenMonitoredLayerClusters_; | ||
| edm::EDGetTokenT<reco::CaloClusterCollection> tokenReferenceLayerClusters_; |
There was a problem hiding this comment.
| edm::EDGetTokenT<reco::CaloClusterCollection> tokenMonitoredLayerClusters_; | |
| edm::EDGetTokenT<reco::CaloClusterCollection> tokenReferenceLayerClusters_; | |
| const edm::EDGetTokenT<reco::CaloClusterCollection> tokenMonitoredLayerClusters_; | |
| const edm::EDGetTokenT<reco::CaloClusterCollection> tokenReferenceLayerClusters_; |
| tokenReferenceLayerClusters_( | ||
| consumes<reco::CaloClusterCollection>(iConfig.getParameter<edm::InputTag>("referenceLayerClusters"))) {} | ||
|
|
||
| HGCALGPUvsCPUComparisonHists::~HGCALGPUvsCPUComparisonHists() {} |
There was a problem hiding this comment.
| HGCALGPUvsCPUComparisonHists::~HGCALGPUvsCPUComparisonHists() {} |
provided the method declaration is declared override
| void HGCALGPUvsCPUComparisonHists::beginJob(const edm::EventSetup& iSetup) {} | ||
|
|
||
| void HGCALGPUvsCPUComparisonHists::bookHistograms(DQMStore::IBooker& iBooker, edm::Run const&, edm::EventSetup const&) { | ||
| iBooker.setCurrentFolder("HGCAL"); |
There was a problem hiding this comment.
shouldn't this be configurable ?
At the very least I'd like the new plot to appear under HLT...
| const std::vector<reco::CaloCluster>& monitoredLayerClusters = iEvent.get(tokenMonitoredLayerClusters_); | ||
| const std::vector<reco::CaloCluster>& referenceLayerClusters = iEvent.get(tokenReferenceLayerClusters_); |
There was a problem hiding this comment.
what if the product is not available?
We don't want to crash processing because of missing input in DQM.
|
@fiemmi in addition to the review above, this relatively simple PR has 15 commits with sometimes not very useful comments, please consider squashing to a minimum. Also
|
| protected: | ||
| void beginJob(const edm::EventSetup& iSetup); | ||
| void analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup) override; | ||
| void bookHistograms(DQMStore::IBooker& iBooker, edm::Run const& iRun, edm::EventSetup const& iSetup) override; |
There was a problem hiding this comment.
if this module has to run in the HLT, it must have a fillDescriptions. Please provide one.
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50974/49421
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
58d87d8 to
ad87c6a
Compare
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50974/49423
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50974/49424
|
|
-1 Failed Tests: UnitTests Failed Unit TestsI found 1 errors in the following unit tests: ---> test test_check_phase2_hlt_duplicates had ERRORS Comparison SummarySummary:
AMD_MI300X Comparison SummaryThere are some workflows for which there are errors in the baseline: Summary:
AMD_W7900 Comparison SummarySummary:
NVIDIA_H100 Comparison SummarySummary:
NVIDIA_L40S Comparison SummarySummary:
Max Memory Comparisons exceeding threshold NVIDIA_H100@cms-sw/core-l2 , I found 1 workflow step(s) with memory usage exceeding the error threshold: Expand to see workflows ...
Max Memory Comparisons exceeding threshold NVIDIA_L40S@cms-sw/core-l2 , I found 1 workflow step(s) with memory usage exceeding the error threshold: Expand to see workflows ...
|
|
Pull request #50974 was updated. @Martin-Grunewald, @ctarricone, @davidlange6, @fabiocos, @ftenchini, @gabrielmscampos, @mandrenguyen, @mmusich, @rseidita can you please check and sign again. |
| layerClusters = cms.VInputTag("hltHgcalLayerClustersEE", *ceh_layerClusters), | ||
| time_layerclusters = cms.VInputTag("hltHgcalLayerClustersEE:timeLayerCluster", *ceh_time_layerClusters), |
There was a problem hiding this comment.
I think we should define hltMergeLayerClustersSerialSync directly with the SerialSync collections. Otherwise, at definition time it is identical to hltMergeLayerClusters, and the Phase-2 HLT duplicate-check unit test fails.
| layerClusters = cms.VInputTag("hltHgcalLayerClustersEE", *ceh_layerClusters), | |
| time_layerclusters = cms.VInputTag("hltHgcalLayerClustersEE:timeLayerCluster", *ceh_time_layerClusters), | |
| layerClusters = cms.VInputTag("hltHgCalLayerClustersFromSoAProducerSerialSync", *ceh_layerClusters), | |
| time_layerclusters = cms.VInputTag("hltHgCalLayerClustersFromSoAProducerSerialSync:timeLayerCluster", *ceh_time_layerClusters) |
| alpakaValidationHLT.toModify(hltMergeLayerClustersSerialSync, | ||
| layerClusters = ["hltHgCalLayerClustersFromSoAProducerSerialSync", *ceh_layerClusters], | ||
| time_layerclusters = ["hltHgCalLayerClustersFromSoAProducerSerialSync:timeLayerCluster", *ceh_time_layerClusters] | ||
| ) |
There was a problem hiding this comment.
Given the comment above, this should no longer be needed
| alpakaValidationHLT.toModify(hltMergeLayerClustersSerialSync, | |
| layerClusters = ["hltHgCalLayerClustersFromSoAProducerSerialSync", *ceh_layerClusters], | |
| time_layerclusters = ["hltHgCalLayerClustersFromSoAProducerSerialSync:timeLayerCluster", *ceh_time_layerClusters] | |
| ) |
dadb719 to
cd5ccdb
Compare
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50974/49431
|
|
Pull request #50974 was updated. @Martin-Grunewald, @cmsbuild, @ctarricone, @davidlange6, @fabiocos, @ftenchini, @gabrielmscampos, @mandrenguyen, @mmusich, @rseidita can you please check and sign again. |
|
@cmsbuild, please test |
| ) | ||
|
|
||
| hltHgcalSoARecHitsProducerSerialSync = makeSerialClone(hltHgcalSoARecHitsProducer | ||
| ) |
There was a problem hiding this comment.
can you avoid going into the next line?
| # Process modifiers: ticl_barrel and alpaka | ||
| from Configuration.ProcessModifiers.alpaka_cff import alpaka | ||
| from Configuration.ProcessModifiers.ticl_barrel_cff import ticl_barrel | ||
| from Configuration.ProcessModifiers.alpakaValidationHLT_cff import alpakaValidationHLT |
There was a problem hiding this comment.
what do you need this for here?
There was a problem hiding this comment.
Good catch, this is just a leftover from the previous update, where we removed the invocation of alpakaValidationHLT.toModify(). It is not needed in the current version of the code. Will be removed in the next update.
| @@ -1,4 +1,5 @@ | |||
| import FWCore.ParameterSet.Config as cms | |||
| from HeterogeneousCore.AlpakaCore.functions import makeSerialClone | |||
There was a problem hiding this comment.
what do you need this for here?
There was a problem hiding this comment.
Along the same lines of the comment above, this is a leftover from a previous version of the code. Will be removed in the next update. Thanks for spotting it.
| hltHgcalLayerClustersHSci+ | ||
| hltHgcalLayerClustersHSi+ | ||
| hltMergeLayerClustersSerialSync) | ||
| alpakaValidationHLT.toReplaceWith(HLTTICLLocalRecoSequence, _HLTTICLLocalRecoSequence_heterogeneousGPUCPU) |
There was a problem hiding this comment.
the modifier should be explicitly imported in this file.
| @@ -1,4 +1,5 @@ | |||
| import FWCore.ParameterSet.Config as cms | |||
| from HeterogeneousCore.AlpakaCore.functions import makeSerialClone | |||
There was a problem hiding this comment.
what is this needed for here?
|
+1 Size: This PR adds an extra 56KB to repository Comparison SummarySummary:
AMD_MI300X Comparison SummarySummary:
AMD_W7900 Comparison SummarySummary:
NVIDIA_H100 Comparison SummarySummary:
NVIDIA_L40S Comparison SummarySummary:
Max Memory Comparisons exceeding threshold NVIDIA_H100@cms-sw/core-l2 , I found 1 workflow step(s) with memory usage exceeding the error threshold: Expand to see workflows ...
Max Memory Comparisons exceeding threshold NVIDIA_L40S@cms-sw/core-l2 , I found 1 workflow step(s) with memory usage exceeding the error threshold: Expand to see workflows ...
|
| private: | ||
| const edm::EDGetTokenT<reco::CaloClusterCollection> tokenMonitoredLayerClusters_; | ||
| const edm::EDGetTokenT<reco::CaloClusterCollection> tokenReferenceLayerClusters_; | ||
| const std::string topFolderName; |
There was a problem hiding this comment.
| const std::string topFolderName; | |
| const std::string topFolderName_; |
to be consistent.
| //2D | ||
| hLayerCluster2D_x = iBooker.book2D("hLayerCluster2D_x", "hLayerCluster2D_x", 200, -50, 50, 200, -50, 50); | ||
| hLayerCluster2D_y = iBooker.book2D("hLayerCluster2D_y", "hLayerCluster2D_y", 200, -50, 50, 200, -50, 50); | ||
| hLayerCluster2D_z = iBooker.book2D("hLayerCluster2D_z", "hLayerCluster2D_z", 250, -500, 500, 250, -500, 500); |
There was a problem hiding this comment.
do we really need this amount of bins in the 2D histograms?
The memory footprint of this PR in terms of DQM memory is on the high-ish side. See #50974 (comment)
DQMHistoSizes: Histogram memory added: 808.695 KiB( 65 files compared)
DQMHistoSizes: changed ( 34434.7503 ): 808.695 KiB HLT/HeterogeneousComparisons
There was a problem hiding this comment.
Will halve the number of bins on both axes in the next update.
| auto seed = (*referenceLayerClusters)[idx].seed(); | ||
| if (seedToIdx.find(seed) != seedToIdx.end()) { | ||
| edm::LogWarning("HGCALGPUvsCPUComparisonHists") << "Duplicate seed in reference collection."; | ||
| return; |
There was a problem hiding this comment.
do you really want to return here, or just continue in the loop?
| edm::Handle<reco::CaloClusterCollection> monitoredLayerClusters_, referenceLayerClusters_; | ||
| iEvent.getByToken(tokenMonitoredLayerClusters_, monitoredLayerClusters_); | ||
| iEvent.getByToken(tokenReferenceLayerClusters_, referenceLayerClusters_); | ||
| if (!(monitoredLayerClusters_.isValid()) || !(referenceLayerClusters_.isValid())) { | ||
| edm::LogWarning("HGCALGPUvsCPUComparisonHists") << "Monitored or reference collection is invalid."; | ||
| return; | ||
| } | ||
| const std::vector<reco::CaloCluster>* monitoredLayerClusters = monitoredLayerClusters_.product(); | ||
| const std::vector<reco::CaloCluster>* referenceLayerClusters = referenceLayerClusters_.product(); |
There was a problem hiding this comment.
| edm::Handle<reco::CaloClusterCollection> monitoredLayerClusters_, referenceLayerClusters_; | |
| iEvent.getByToken(tokenMonitoredLayerClusters_, monitoredLayerClusters_); | |
| iEvent.getByToken(tokenReferenceLayerClusters_, referenceLayerClusters_); | |
| if (!(monitoredLayerClusters_.isValid()) || !(referenceLayerClusters_.isValid())) { | |
| edm::LogWarning("HGCALGPUvsCPUComparisonHists") << "Monitored or reference collection is invalid."; | |
| return; | |
| } | |
| const std::vector<reco::CaloCluster>* monitoredLayerClusters = monitoredLayerClusters_.product(); | |
| const std::vector<reco::CaloCluster>* referenceLayerClusters = referenceLayerClusters_.product(); | |
| const auto& monitoredHandle = iEvent.getHandle(tokenMonitoredLayerClusters_); | |
| const auto& referenceHandle = iEvent.getHandle(tokenReferenceLayerClusters_); | |
| if (!monitoredHandle.isValid() || !referenceHandle.isValid()) { | |
| edm::LogWarning("HGCALGPUvsCPUComparisonHists") | |
| << "Monitored or reference LayerCluster collection is invalid."; | |
| return; | |
| } | |
| const reco::CaloClusterCollection& monitoredLayerClusters = *monitoredHandle; | |
| const reco::CaloClusterCollection& referenceLayerClusters = *referenceHandle; |
- Use edm::Handle with auto, and
getHandle()instead ofgetByToken() - Validity check -> same logic, cleaner syntax
- Prefer a
constreference over a raw pointer - do not use trailing underscores in locals.
|
|
||
| //look for GPU and CPU LayerClusters whose seeds match | ||
| //map LC seeds to LC indices for the reference collection | ||
| std::unordered_map<uint32_t, std::pair<unsigned, bool>> |
There was a problem hiding this comment.
I think the corresponding header file is missing for this.
| auto it = seedToIdx.find(monitored.seed()); | ||
| if (it != seedToIdx.end() && it->second.second == false) { | ||
| it->second.second = true; //establish a match | ||
| const auto& reference = (*referenceLayerClusters)[it->second.first]; |
There was a problem hiding this comment.
I think the whole code block L110 to L127 can be rewritten slightly more efficiently:
std::unordered_map<uint32_t, unsigned> seedToIdx;
seedToIdx.reserve(referenceLayerClusters->size());
for (unsigned idx = 0; idx < referenceLayerClusters->size(); idx++) {
auto [it, inserted] = seedToIdx.try_emplace((*referenceLayerClusters)[idx].seed(), idx);
if (!inserted) {
edm::LogWarning("HGCALGPUvsCPUComparisonHists") << "Duplicate seed in reference collection.";
return; // continue?
}
}
std::unordered_set<uint32_t> matched;
matched.reserve(referenceLayerClusters->size());
for (const auto& monitored : *monitoredLayerClusters) {
auto it = seedToIdx.find(monitored.seed());
if (it != seedToIdx.end() && !it->second.second) {
it->second.second = true;
const auto& reference = (*referenceLayerClusters)[it->second.first];
// fill histograms...
}
}There was a problem hiding this comment.
I really like the idea of using try_emplace. The current version of the code performs two hash-table operations (find(seed) and operator[](seed)) while try_emplace merges them to just one.
I propose to change
for (unsigned idx = 0; idx < referenceLayerClusters->size(); idx++) {
auto seed = (*referenceLayerClusters)[idx].seed();
if (seedToIdx.find(seed) != seedToIdx.end()) {
edm::LogWarning("HGCALGPUvsCPUComparisonHists") << "Duplicate seed in reference collection.";
return;
}
seedToIdx[seed] = {idx, false}; //initialze all reference LCs as unmatched
}to
for (unsigned idx = 0; idx < referenceLayerClusters.size(); idx++) {
auto [it, inserted] = seedToIdx.try_emplace(referenceLayerClusters[idx].seed(), idx, false); //initialze all reference LCs as unmatched
if (!inserted) {
edm::LogWarning("HGCALGPUvsCPUComparisonHists") << "Duplicate seed in reference collection.";
continue;
}
}At the same time, this proposal still uses one container (a std::unordered_map<uint32_t, std::pair<unsigned, bool>>) instead of two. Let me know if you would find this acceptable.
| hLayerCluster2D_nRecHits->Fill(reference.size(), monitored.size()); | ||
| } else { | ||
| edm::LogWarning("HGCALGPUvsCPUComparisonHists") << "No match or duplicate match to reference collection found."; | ||
| return; |
There was a problem hiding this comment.
again do you really want to return here?
| hltHgcalLayerClustersEE+ | ||
| hltHgcalLayerClustersHSci+ | ||
| hltHgcalLayerClustersHSi+ | ||
| hltMergeLayerClustersSerialSync) |
There was a problem hiding this comment.
overall I am a bit confused by how this sequence is written. Why are the producer instances hltHGCalUncalibRecHit , hltHGCalRecHit and the bloc hltHgcalLayerClustersEE+hltHgcalLayerClustersHSci+hltHgcalLayerClustersHSi repeated twice? Even if the framework elides the duplication is confusing to see them two times in the same sequence.
PR description:
This PR implements a new$x_{\textrm{GPU}} - x_{\textrm{CPU}}$ and $x_{\textrm{GPU}}:x_{\textrm{CPU}}$ respectively, where $x$ is a given TICL observable.
DQMEDAnalyzerto monitor TICL GPU and CPU reconstruction for HGCAL. It further schedules the analyzer through thealpakaValidationHLTprocModifier. The output consists ofTH1Fs andTH2Fs storingThis PR is not dependent on any other PR.
PR validation:
The PR has been validated through the following pipeline:
After running it, the aforementioned histograms can be found by opening the ROOT file
DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.rootand inspecting the directoryDQMData/Run 1/HGCAL/Run summmary.If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
This PR is not a backport.