Skip to content

fix(e2e): strict wireserver validation — fail fast on unexpected curl exits#8580

Open
r2k1 wants to merge 1 commit into
mainfrom
e2e-wireserver-strict-validation
Open

fix(e2e): strict wireserver validation — fail fast on unexpected curl exits#8580
r2k1 wants to merge 1 commit into
mainfrom
e2e-wireserver-strict-validation

Conversation

@r2k1
Copy link
Copy Markdown
Contributor

@r2k1 r2k1 commented May 25, 2026

What this PR does / why we need it:

Tightens the e2e wireserver-block validator so it fails fast on any unexpected curl exit code from an unprivileged pod, instead of retrying for a minute and only failing if no acceptable exit code ever showed up.

Why

validateWireServerBlocked is a security check: pods must not be able to reach the wireserver IP (168.63.129.16). The previous implementation retried for 1 minute, passing the check if curl eventually returned exit 28 (timeout). Two problems with that:

  1. It treats the result of a security check as something to wait out — but a successful curl from a pod is never a transient condition. It means FORWARD DROP/REJECT rules are missing right now, and the test should surface that immediately.
  2. The retry loop bounded the budget by time, not by observations. If the first curl returned exit 0 (reachable) but the last one returned 28 (timeout), the test would pass while a real regression was present somewhere in the window.

What

  • Whitelist exit codes 28 (FORWARD DROP timeout) and 7 (FORWARD REJECT refused) as the only valid "wireserver blocked" signals.
  • Anything else fails loudly with full diagnostics: FORWARD chain, KUBE-FORWARD chain, iptables-save filter, and conntrack entries for the wireserver IP. This makes regressions trivial to triage from the test log.
  • Retry the exec call only on transient kube-apiserver exec failures, never on the curl result itself — one observation of an unexpected exit code is enough to fail.

This is strictly more defensive than the original (which only accepted exit 28) because it also accepts REJECT-based blocks, while failing on every other class of regression instead of swallowing them.

Scope

e2e/validation.go only. Test-only change, no product code touched. Extracted from #8480.

Which issue(s) this PR fixes:

N/A — test-only hardening, no linked issue.

… exits

The previous validation retried for 1 minute, passing if curl eventually
timed out (exit 28). This had two problems:

1. Silently accepted other "success-looking" exit codes (e.g. 0 = reachable)
   if they happened on the last poll iteration in earlier variants.
2. Retried through what is fundamentally a binary security check — any
   successful curl from a pod means the FORWARD DROP/REJECT rules are
   missing or wrong, which is a regression to surface immediately, not a
   transient condition to wait out.

Changes:

- Whitelist exit codes 28 (FORWARD DROP timeout) and 7 (FORWARD REJECT
  refused) as the only valid "wireserver blocked" signals.
- Anything else fails loudly with full diagnostics: FORWARD chain,
  KUBE-FORWARD chain, iptables-save filter, and conntrack entries for
  the wireserver IP.
- Retry the exec call only on transient kube-apiserver exec failures,
  never on the curl result itself — a single observation of an
  unexpected exit code is enough to fail the security check.

This is strictly more defensive than the original (which only accepted
exit 28) because it also accepts REJECT-based blocks, while failing on
every other class of regression instead of swallowing them.

Extracted from #8480.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens the e2e security validator that ensures unprivileged pods cannot reach the Azure WireServer IP (168.63.129.16), changing it from “retry until timeout exit code appears” to “fail fast on any unexpected curl result,” while still retrying transient Kubernetes exec failures.

Changes:

  • Accept curl exit codes 28 (timeout/DROP) and 7 (connect failed/REJECT) as the only valid signals that WireServer is blocked.
  • Stop retrying based on curl outcomes; instead, fail immediately on any other curl exit code.
  • Add richer failure diagnostics (FORWARD + KUBE-FORWARD chain, iptables-save filter excerpt, conntrack entries) when an unexpected exit code occurs.

Comment thread e2e/validation.go
},
}

allowedExitCodes := map[string]bool{"28": true, "7": true}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be easier to read with a list of allowable exit codes rather than a map. The constant time lookup benefit we get below doesn't seem worth it given the length of time these tests take to run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants