Switch to the new tracking baseline (single iteration, CA-extended Patatrack + LST, mkFit) as Phase 2 HLT default #50040
A new Pull Request was created by @VourMa for master. It involves the following packages:
@AdrianoDee, @DickyChant, @Martin-Grunewald, @antoniovagnerini, @cmsbuild, @ctarricone, @davidlange6, @fabiocos, @ftenchini, @gabrielmscampos, @mandrenguyen, @miquork, @mmusich, @nothingface0, @rseidita can you please review it and eventually sign? Thanks. cms-bot commands are listed here.
test parameters:
@cmsbuild, please test
`from Configuration.ProcessModifiers.phase2CAExtension_cff import phase2CAExtension`
Previously, the extended pixel tracks without the corresponding ID were used here. These tracks are not used anywhere in the downstream code except for pixel vertexing, and the relevant module has no comment explaining why. This disagreed with the rest of the configurations, which use here the pixel tracks consumed by downstream modules.
With these lines deleted in this PR, the extended pixel tracks with the corresponding ID are used here, i.e. the ones used by downstream code. To me, this is the "natural" behavior, in agreement with the rest of the configuration. However, if there is a strong preference for keeping this different from the rest of the configurations, I will adjust accordingly.
Tagging @elenavernazza and @mmusich as the original authors of this code. FYI @rovere.
This also affects a couple of replacements in the following lines.
@VourMa, we are using, e.g., the non-high-purity pixel tracks for track-quality selection studies (Cc: @EmanueleCoradin). We would appreciate it if you could keep sending those to nanoAOD until these studies are done. Once we have a robust selection strategy for pixel tracks, we can go back to sending the HP collection.
Thanks for asking!
Do I understand correctly that the *pixelTrack* tables are not used in the default NANO (I was looking at an expanded config of 29834.772)? In this sense, these are a POG/DPG NANO flavor. Is there a workflow that enables these?
Is there a workflow that enables these?
Yes, actually multiple:
0.759: HLT phase-2 timing menu, with NANO:@Phase2HLT
0.772: HLT phase-2 NGT Scouting menu, with NANO:@NGTScouting
0.773: HLT phase-2 NGT Scouting menu, with NANO:@NGTScoutingVal
They are all included in the `ph2_hlt` matrix tested here:
cmssw/Configuration/PyReleaseValidation/scripts/runTheMatrix.py
Lines 178 to 182 in 19dbf0b
Ehm, I was looking at .772, and it does not use the pixelTrack table.
`nanoAOD_step = cms.Path(dstNanoFlavour)` does not include it.
There are sequences containing `NanoPixelTables`, but they are all unused:
- hltPixelOnlyNanoFlavour
- dstValidationNanoFlavour
- hltValidationNanoFlavour
This is CMSSW_16_1_0_pre1.
Did I miss something?
Try looking at .773 if you are referring specifically to the pixel tables. I thought you were referring in general to the HLT nanoAODs not being in workflows.
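The question above boils down to whether a sequence containing `NanoPixelTables` is reachable from a scheduled path. As a minimal sketch in plain Python (not the CMSSW configuration API), the expanded config can be modeled as a graph from each path or sequence to its direct members and searched breadth-first from the scheduled paths. Only the names quoted in the thread are taken from the expanded config; the members of `dstNanoFlavour` are a hypothetical placeholder, and the `CONFIG`, `reachable`, and `unused_containers` names are illustrative.

```python
from collections import deque

# Hypothetical, simplified view of an expanded config: each node maps to its
# direct members. "placeholderDstTables" stands in for the real (unknown here)
# contents of dstNanoFlavour.
CONFIG = {
    "nanoAOD_step": ["dstNanoFlavour"],
    "dstNanoFlavour": ["placeholderDstTables"],
    "hltPixelOnlyNanoFlavour": ["NanoPixelTables"],
    "dstValidationNanoFlavour": ["NanoPixelTables"],
    "hltValidationNanoFlavour": ["NanoPixelTables"],
}


def reachable(config, roots):
    """Return every node reachable from the given roots (breadth-first)."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in config.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def unused_containers(config, roots, target):
    """Sequences that directly contain `target` but are not reachable from roots."""
    scheduled = reachable(config, roots)
    return sorted(
        name for name, members in config.items()
        if target in members and name not in scheduled
    )


print("NanoPixelTables" in reachable(CONFIG, ["nanoAOD_step"]))
print(unused_containers(CONFIG, ["nanoAOD_step"], "NanoPixelTables"))
```

With `nanoAOD_step` as the only root, `NanoPixelTables` is not reachable and all three containing sequences are reported as unused; adding any of them as a root makes the tables reachable. A real expanded config can nest sequences several levels deep, which the breadth-first search handles, while `unused_containers` only looks at direct membership.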
-1 Failed Tests: HLTP2Integration HLTP2Timing RelVals
Failed RelVals
Would it be possible to identify the source of the memory usage increase on CPU?
I reported on this point previously: all in all, the memory of the new default baseline is consistent with that of the 0.7571 workflow (HLT75e33TimingAlpakaSingleIterLSTSeedingMkFitBuilding), which has now been deleted because it became the new baseline.
Is it possible instead to measure the workflow memory directly (and possibly profile it), rather than inferring it from the bot results?
These bot measurements are pretty consistent across PRs (e.g., similar results can be inferred from the tests in #50283). What is the reason to mistrust them?
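On the request above for a direct measurement rather than an inference from bot results: as a minimal sketch, assuming a Unix-like system, the peak resident memory of a single step can be read back with Python's standard `resource` module after running that step as a child process. The helper name is illustrative, the command in the example is a stand-in, and `step2_HLT.py` is a hypothetical file name; this gives one number per run and is not a substitute for proper profiling.

```python
import resource
import subprocess
import sys


def run_and_measure_peak_rss(cmd):
    """Run `cmd` to completion and return (returncode, peak RSS in MiB).

    RUSAGE_CHILDREN reports the largest maximum resident set size among all
    terminated, waited-for children of this process, so run one step per
    Python process for a clean number. ru_maxrss is in kilobytes on Linux
    and in bytes on macOS.
    """
    completed = subprocess.run(cmd, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, usage.ru_maxrss / divisor


if __name__ == "__main__":
    # Stand-in command that allocates about 50 MiB. For a real measurement,
    # substitute the step's invocation, e.g. ["cmsRun", "step2_HLT.py"]
    # (hypothetical file name).
    rc, peak_mib = run_and_measure_peak_rss(
        [sys.executable, "-c", "x = bytearray(50 * 1024 * 1024)"]
    )
    print(f"exit code {rc}, peak RSS {peak_mib:.1f} MiB")
```

Measuring from the parent keeps the step itself untouched; the trade-off is that only the peak is reported, not where the memory is allocated.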
Is this CPU memory a blocking issue? Perhaps it would be practical to open a GitHub issue for follow-up.
Yes, let's follow up in an issue. I created #50288.
+hlt
In general, nothing is a blocking issue. On the other hand, one of the outcomes of this "strict" review is a considerable reduction of the memory used by LST in the new tracking baseline. While I agree that we should try to integrate new developments as soon as possible, we also have to keep a constant eye on resource usage and act to reduce it when and where feasible.
While the physics performance was thoroughly reviewed within the TRK POG (@cms-sw/tracking-pog-l2), as well as within TSG/HLT Upgrade, and the TRK POG supports this PR, let me note for historical precision that further physics improvements are foreseen on top of this PR; see, e.g., the presentations at the TRK POG last week (Feb 23) and today (Mar 2).
Are the circles links available for the timing/job reports?
They are; follow the link at:
@cms-sw/dqm-l2 We have converged on this. Could you take a look and let me know whether it looks OK from your side?
+pdmv
+dqm
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @ftenchini (and backports should be raised in the release meeting by the corresponding L2)
+1
For the record, in IB CMSSW_16_1_X_2026-03-03-2300, after this PR was merged, we started observing a higher-than-usual rate of failed workflows in the GPU matrix on the machines with GPUs, from about 1 per IB to about 20-30 per IB.
The failure is, more often than not, a segmentation violation in the HLT in step2; example log file with the stack trace containing:
@JanGerritSchulz diagnosed and fixed this issue in #50318 (there might be other failures downstream).

This PR switches the default tracking sequence for Phase 2 HLT from the current baseline (two iterations, Patatrack quads + legacy triplets for seeding, CKF for building) to a new baseline (single iteration, CA-extended Patatrack + LST for seeding, mkFit for building) proposed by the TRK POG, in coordination with HLT Upgrade.
Previous behavior (configs as defined above):
- `phase2LegacyPixelTracks`: Current baseline but with legacy (instead of Patatrack) quads.
- `phase2CAExtension,singleIterPatatrack,trackingLST,seedingLST,trackingMkFitCommon,hltTrackingMkFitInitialStep`: New baseline.

Behavior after this PR (configs as defined above):
- `hltPhase2LegacyTracking`: Current baseline but with legacy (instead of Patatrack) quads.
- `hltPhase2LegacyTrackingPatatrackQuads`: Current baseline.

Switching to the new baseline has allowed a significant simplification of the tracking modules, since all intermediate tracking configurations have been removed. Apart from the configurations discussed above, only the following configurations remain for Phase 2 HLT:
- `trackingLST`: single iteration, CA-extended Patatrack, LST for building.
- `trackingMkFitFit`: single iteration, CA-extended Patatrack + LST for seeding, mkFit for building, mkFit (instead of CKF) fitting.

As a result, the workflows of the intermediate configurations have been removed. In addition, the updates of #49755 (that PR is superseded by this one) have been included here to avoid conflicts.
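To make the mapping above concrete, here is a minimal sketch in plain Python that records which process-modifier list selects each configuration before and after this PR, and renders a list as the comma-separated value of the `--procModifiers` option of `cmsDriver.py`. The dictionary labels and helper names are hypothetical; an empty list stands for running without extra tracking modifiers, i.e. the default, which this PR moves from the current baseline to the new baseline. Any other modifiers a given workflow needs (e.g. for the timing or scouting menus) are deliberately left out.

```python
# Hypothetical summary of the tracking configurations described in this PR,
# keyed by a descriptive label. An empty list means "no extra tracking
# modifier", i.e. whatever the default is at that point in time.
BEFORE = {
    "current baseline": [],
    "current baseline, legacy quads": ["phase2LegacyPixelTracks"],
    "new baseline": [
        "phase2CAExtension", "singleIterPatatrack", "trackingLST",
        "seedingLST", "trackingMkFitCommon", "hltTrackingMkFitInitialStep",
    ],
}

AFTER = {
    "current baseline": ["hltPhase2LegacyTrackingPatatrackQuads"],
    "current baseline, legacy quads": ["hltPhase2LegacyTracking"],
    "new baseline": [],
    "trackingLST": ["trackingLST"],
    "trackingMkFitFit": ["trackingMkFitFit"],
}


def migrate(old_modifiers):
    """Return the post-PR modifier list matching a pre-PR one, or None."""
    for label, modifiers in BEFORE.items():
        if sorted(modifiers) == sorted(old_modifiers):
            return AFTER[label]
    return None


def proc_modifiers_arg(modifiers):
    """Render modifiers as cmsDriver.py arguments (empty list if none needed)."""
    if not modifiers:
        return []
    return ["--procModifiers", ",".join(modifiers)]


for label, modifiers in AFTER.items():
    args = " ".join(proc_modifiers_arg(modifiers)) or "(default, no extra modifiers)"
    print(f"{label:32} {args}")
```

Matching on sorted lists makes the lookup insensitive to the order in which modifiers were written, since a comma-separated modifier list is a set of flags rather than an ordered sequence.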
The NGT scouting configurations have been touched as part of the aforementioned simplifications, but all of the previous configurations are still supported.
The PR has been validated by running all the supported configurations and making sure that they produce exactly the same results as before the changes, i.e. this PR is purely technical for those configurations:
- Current baseline
- Current baseline but with legacy (instead of Patatrack) quads
- New baseline
- `trackingLST`
- `trackingMkFitFit`
- `ngtScouting`
- `ngtScouting,trackingLST`
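The validation criterion above is exact agreement, not agreement within a tolerance. As a minimal sketch, assuming the validation histograms have already been exported to plain per-histogram lists of bin contents (the export step, the function name, and the toy values are hypothetical), the check reduces to an exact comparison that reports every histogram whose bins differ.

```python
import math


def _same(a, b):
    """Exact equality, treating two NaNs (e.g. empty efficiency bins) as equal."""
    if a == b:
        return True
    return (isinstance(a, float) and isinstance(b, float)
            and math.isnan(a) and math.isnan(b))


def compare_exact(reference, candidate):
    """Return a sorted list of (histogram, reason) for every mismatch.

    `reference` and `candidate` map histogram names to lists of bin contents.
    No tolerance is applied: a purely technical change must reproduce the
    reference bin by bin.
    """
    problems = []
    for name in sorted(set(reference) | set(candidate)):
        if name not in candidate:
            problems.append((name, "missing in candidate"))
        elif name not in reference:
            problems.append((name, "missing in reference"))
        elif len(reference[name]) != len(candidate[name]):
            problems.append((name, "different binning"))
        else:
            bad = [i for i, (a, b) in enumerate(zip(reference[name], candidate[name]))
                   if not _same(a, b)]
            if bad:
                problems.append((name, f"bins differ: {bad}"))
    return problems


# Made-up toy values, for illustration only.
before = {"efficiency_vs_eta": [0.91, 0.95, float("nan")], "fakerate_vs_pt": [0.02, 0.01]}
after = {"efficiency_vs_eta": [0.91, 0.95, float("nan")], "fakerate_vs_pt": [0.02, 0.01]}
print(compare_exact(before, after))
```

An empty result means the two sets of histograms are identical; any entry pinpoints the histogram and bins to inspect.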
FYI @rovere