soak: add Gate 8b MAC-churn 24h harness #104
Merged
Sibling to run-gate8b-soak.sh. Reuses the 2-PE shared-ESI topology at tests/soak/gate8b-soak.clab.yml and layers sustained bridge-FDB churn on top of the existing DF-flip loop, so the soak exercises:

- kernel-learn / local-MAC observation → Type 2 origination
- ADR-0059 receive-side aliasing-ECMP (FDB nexthop groups)
- RFC 7432 §15.1 MAC mobility sequencing
- Gate 8b BUM-suppression while FDB programming is in flight
- the ADR-0059 drift-recovery counters under realistic timing

MAC injection is direct kernel mutation via `docker exec <pe> bridge fdb add/del <mac> dev ce100a master static`: the daemon's local-MAC observation pipeline is the path under test, not the gRPC inject path.

Churn pattern: bounded rotating MAC pool (default 512 MACs, ~256 per PE), batched ops every CHURN_INTERVAL_SEC (default 5s) picked weighted across add / delete / PE-to-PE mobility. Pool occupancy is bounded so the soak doesn't drift into empty or saturated.

CSV samples extend the base harness with kernel-side FDB totals, extern_learn counts, ip nexthop counts, evpn_local_originations, evpn_local_origination_errors, evpn_local_observations_dropped, evpn_duplicate_mac_moves, and the four ADR-0059 drift counters, plus harness-side add / del / move totals.

tests/soak/README.md gains the operator-facing section covering topology reuse, run knobs, sampling fields, and smoke-before-soak discipline.

The MAC-churn variant is the alpha-checklist exit condition for flipping apply_bum_enforcement and apply_aliasing_ecmp production defaults.
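The churn pattern described above can be sketched roughly as follows. This is an illustrative dry run, not the harness's actual code: the function names (`pick_op`, `fdb_cmd`, `mac_for_index`), the 40/40/20 weights, and the quarter/three-quarter occupancy brackets are all assumptions. Only the pool size of 512 and the `bridge fdb` command shape come from the description. The sketch prints commands instead of executing them, so it needs neither docker nor a running topology.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of weighted op selection with bounded pool occupancy.
POOL_SIZE=512                      # default pool size from the PR description
LOW_WATER=$((POOL_SIZE / 4))       # assumed lower occupancy bracket
HIGH_WATER=$((POOL_SIZE * 3 / 4))  # assumed upper occupancy bracket

# pick_op <occupancy> <roll 0-99>: echo add, del, or move.
pick_op() {
  local occupancy="$1" roll="$2"
  # Outside the bracket, force the op that pulls occupancy back toward target.
  if [ "$occupancy" -le "$LOW_WATER" ]; then echo add; return; fi
  if [ "$occupancy" -ge "$HIGH_WATER" ]; then echo del; return; fi
  # Inside the bracket: assumed weights of 40% add, 40% del, 20% move.
  if [ "$roll" -lt 40 ]; then
    echo add
  elif [ "$roll" -lt 80 ]; then
    echo del
  else
    echo move
  fi
}

# mac_for_index <n>: deterministic, locally administered MAC for pool slot n.
mac_for_index() {
  printf '02:00:00:00:%02x:%02x\n' $(( ($1 >> 8) & 0xff )) $(( $1 & 0xff ))
}

# fdb_cmd <pe> <add|del> <mac>: print (do not run) the kernel mutation.
fdb_cmd() {
  printf 'docker exec %s bridge fdb %s %s dev ce100a master static\n' "$1" "$2" "$3"
}

# move_cmds <src-pe> <dst-pe> <mac>: assumed mobility shape, delete on the
# source PE and then add on the destination PE.
move_cmds() {
  fdb_cmd "$1" del "$3"
  fdb_cmd "$2" add "$3"
}
```

In a live loop the roll could come from bash's `$RANDOM` (for example `$((RANDOM % 100))`), with one batch of picks per CHURN_INTERVAL_SEC. Passing the roll in as an argument keeps the selection logic deterministic and easy to check in isolation.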
Summary
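Sibling to `tests/soak/run-gate8b-soak.sh`. Reuses the existing 2-PE shared-ESI topology and layers sustained bridge-FDB churn on top of the DF-flip loop so the soak exercises:

- kernel-learn / local-MAC observation → Type 2 origination
- ADR-0059 receive-side aliasing-ECMP (FDB nexthop groups)
- RFC 7432 §15.1 MAC mobility sequencing
- Gate 8b BUM-suppression while FDB programming is in flight
- the ADR-0059 drift-recovery counters under realistic timing

The alpha-checklist exit condition for flipping `apply_bum_enforcement` and `apply_aliasing_ecmp` production defaults.

Design choices (per discussion)

- MAC injection is direct kernel mutation (`docker exec <pe> bridge fdb add/del <mac> dev ce100a master static`). Exercises the local-MAC observation pipeline; deliberately not the gRPC inject path.
- Churn pattern: bounded rotating MAC pool (default 512 MACs, ~256 per PE), batched ops every `CHURN_INTERVAL_SEC` (default 5s), weighted across add / delete / PE-to-PE mobility. Pool occupancy is bracketed around a target so the soak doesn't drift to empty or saturated.
- CSV samples extend the base harness with kernel-side FDB totals, extern_learn counts, `ip nexthop` count (ADR-0059 NHGs), `evpn_local_originations_total`, `evpn_local_origination_errors_total`, `evpn_local_observations_dropped_total`, `evpn_duplicate_mac_moves_total`, all four ADR-0059 drift counters, and harness-side add / del / move totals.

Smoke-before-soak path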
Test plan
- `bash -n tests/soak/run-gate8b-mac-churn-soak.sh`: syntax clean

Notes for review

- `tests/soak/analyze-gate8b-soak.py` still covers the memory / DF gates that apply; MAC-churn-specific gates (e.g. `evpn_local_origination_errors_total == 0`, drift counters bounded, extern_learn count stable on receiver) surface from manual CSV inspection. A dedicated analyzer can land alongside the first 24h run's postmortem.
- Harness-side `add`/`del`/`move` totals must live in counter files (not bash vars) for the CSV writer to see them.
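The counter-file note follows from how subshells work: a variable incremented inside a backgrounded or parenthesized loop is updated only in that child process, so a writer in the parent shell never sees the change. The sketch below illustrates the difference under assumed names (`COUNTER_DIR`, `bump`, `read_counter` are hypothetical, not taken from the harness) and assumes a single writer per counter, since the read-modify-write is not atomic.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: file-backed counters survive subshell boundaries.
COUNTER_DIR=$(mktemp -d)

# bump <name> [delta]: add delta (default 1) to the counter file for <name>.
bump() {
  local file="$COUNTER_DIR/$1" current=0
  if [ -f "$file" ]; then current=$(cat "$file"); fi
  echo $(( current + ${2:-1} )) > "$file"
}

# read_counter <name>: print the current total, or 0 if never bumped.
read_counter() {
  local file="$COUNTER_DIR/$1"
  if [ -f "$file" ]; then cat "$file"; else echo 0; fi
}

adds_var=0
# Stand-in for the churn loop: it runs in a background subshell.
(
  for i in 1 2 3; do
    adds_var=$(( adds_var + 1 ))   # changes only the subshell's copy
    bump add                       # persisted on disk
  done
) &
wait

# Stand-in for the CSV writer, running in the parent shell.
csv_row="$(read_counter add),$(read_counter del),$(read_counter move)"
echo "shell var: $adds_var, csv row: $csv_row"
```

After the loop finishes, the parent still sees `adds_var` at its initial value, while the file-backed `add` counter reflects all three increments.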