Skip to content

feat: classify GitHub webhook destination and carry target_id to the worker#2503

Merged
solaris007 merged 11 commits into
mainfrom
feat/multi-github-destination-web
May 28, 2026
Merged

feat: classify GitHub webhook destination and carry target_id to the worker#2503
solaris007 merged 11 commits into
mainfrom
feat/multi-github-destination-web

Conversation

@solaris007
Copy link
Copy Markdown
Member

Summary

  • Web-tier half of multi-GitHub-destination support (ADR adobe/mysticat-architecture#94), pairing with the already-merged worker (chore(deps): update helix-post-deploy orb to v3.0.1 #27).
  • A GITHUB_TARGETS registry classifies each inbound webhook to a destination from its signed body; the HMAC handler selects the per-target webhook secret, verifies once, and attaches {target_id, app_slug} to the auth profile; the controller emits target_id on the SQS payload and uses the per-target appSlug for trigger rules.
  • With GITHUB_TARGETS unset, the handler is byte-for-byte the legacy single-secret path (no target_id) - so this ships as a no-op refactor until the registry is configured.

Plan: docs/plans/2026-05-27-multi-github-destination-web.md.

Migration / secrets (this PR is code only)

Ships safe (legacy path) with GITHUB_TARGETS unset. Cutover (later, per env): set GITHUB_TARGETS=[github-public default] once the worker's Vault targets["github-public"] exists (already added in dev) - this flips to the registry path emitting target_id="github-public". GHEC later adds GITHUB_WEBHOOK_SECRET_GHEC + a ghec rule. All via the existing deploy-secrets store. Full steps in the plan.

Test plan

  • npm test -> 10986 passing (full suite); github-targets.js 95% / handler 98.6% coverage
  • New: classifier truth table; HMAC classify-then-verify-once + skip (no-HMAC) + per-target secret + malformed-body; per-target appSlug; target_id on SQS; legacy no-op when GITHUB_TARGETS unset
  • Security: parse-before-verify, skip computes no HMAC, no untrusted body value in logs, target_id/app_slug sourced only from the post-HMAC profile
  • lint clean

Copilot AI review requested due to automatic review settings May 28, 2026 06:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds web-tier support for classifying GitHub webhooks by destination, verifying with per-target webhook secrets, and forwarding a non-secret target_id to the worker while preserving legacy behavior when GITHUB_TARGETS is unset.

Changes:

  • Adds a GitHub target registry/classifier and registry-mode HMAC verification path.
  • Passes per-target appSlug into trigger rules and emits target_id in SQS payloads.
  • Adds unit coverage and an implementation/rollout plan for multi-destination webhook support.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/support/github-targets.js Adds parsing, validation, metadata extraction, and classification for GitHub webhook targets.
src/support/github-webhook-hmac-handler.js Adds registry-mode classify/select-secret/verify flow while retaining legacy single-secret behavior.
src/controllers/webhooks.js Reads target metadata from auth profile and includes target_id in queued jobs.
src/utils/github-trigger-rules.js Allows caller-provided app slug for reviewer matching.
test/support/github-targets.test.js Covers registry parsing and classification behavior.
test/support/github-webhook-hmac-handler.test.js Covers registry-mode HMAC authentication and failure paths.
test/controllers/webhooks.test.js Covers target ID propagation and per-target app slug behavior.
test/utils/github-trigger-rules.test.js Covers explicit app slug override behavior.
docs/plans/2026-05-27-multi-github-destination-web.md Documents implementation plan, migration, and rollout steps.

| `GITHUB_WEBHOOK_SECRET_GHEC` | **Yes** | Envs serving GHEC | GHEC cutover only |

`GITHUB_WEBHOOK_SECRET` and `GITHUB_APP_SLUG` are unchanged. All of these are injected through the existing secret store via `npm run deploy-secrets` (`hedy --aws-update-secrets --params-file=secrets/secrets.env`, the same store that already provides `GITHUB_WEBHOOK_SECRET` / `GITHUB_APP_SLUG`). `GITHUB_TARGETS` is non-secret but rides the same channel.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@solaris007 solaris007 requested a review from MysticatBot May 28, 2026 07:40
@solaris007 solaris007 self-assigned this May 28, 2026
Copy link
Copy Markdown

@MysticatBot MysticatBot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @solaris007,

Strengths

  • Clean separation of concerns (src/support/github-targets.js): The classifier is pure, dependency-free, and testable in isolation. The HMAC handler owns the auth decision. The controller owns the routing decision. Textbook layered design.
  • Backward-compatible by construction: The if (!targetsRaw) branch preserves the legacy path byte-for-byte. The appSlug = env.GITHUB_APP_SLUG default parameter preserves existing 3-arg call sites. This deploys with zero behavioral change until GITHUB_TARGETS is set.
  • classify-then-verify-once is cryptographically sound: The attacker controls the body but not the secret. A forged body that classifies to target X fails HMAC verification against X's secret. No trial-verify loop means no timing side-channel across targets. O(1) and safe.
  • Correct log-injection avoidance (src/support/github-webhook-hmac-handler.js): All log strings are static literals. Body-derived values (meta.host, enterpriseSlug) are never interpolated into log messages.
  • Thorough input validation (src/support/github-targets.js): Every structural invariant is validated on parse with actionable error messages that name the offending entry. Good defensiveness for operator-authored JSON config.
  • Security-critical test (test/support/github-webhook-hmac-handler.test.js): The "rejects a GHEC body signed with the github-public secret" test directly verifies the core security invariant - classify-then-verify cannot be fooled by signing with the wrong target's secret.
  • Additive-then-cutover deployment model: GITHUB_TARGETS as a feature gate means the new code is a no-op until explicitly activated, with rollback being "unset the var." Operationally sound.

Issues

Important (Should Fix)

  1. parseTargets re-parses on every request in registry mode - src/support/github-webhook-hmac-handler.js (registry path, targets = parseTargets(context.env))

    The handler calls parseTargets(context.env) on every inbound webhook when GITHUB_TARGETS is set. This re-parses and re-validates the same JSON string on every request (JSON.parse, forEach with type checks, filter). context.env is immutable for the lifetime of a Lambda cold start.

    Why it matters: At webhook-processing volume (burst traffic from GitHub), repeated JSON parsing adds CPU overhead on a hot auth path. A misconfiguration also produces a noisy error log per-request rather than once.

    How to fix: Cache the parsed result keyed on the raw string value:

    let _cachedRaw = null;
    let _cachedTargets = null;
    
    // In checkAuth, replace the try/catch with:
    if (targetsRaw !== _cachedRaw) {
      try {
        _cachedTargets = parseTargets(context.env);
      } catch (e) {
        _cachedTargets = e; // cache the error too
      }
      _cachedRaw = targetsRaw;
    }
    if (_cachedTargets instanceof Error) {
      this.log(`Invalid GITHUB_TARGETS config (misconfigured=true): ${_cachedTargets.message}`, 'error');
      return null;
    }
    const targets = _cachedTargets;

Minor (Nice to Have)

  1. classify predicate order is counter-intuitive - src/support/github-targets.js:119-123

    The find callback checks t.match?.default === true as its first OR-branch before the slug match. This works correctly because parseTargets enforces default-last ordering (so slug-specific entries are iterated first). But a reader of classify alone gets the wrong mental model of the priority. Flipping the predicate so the slug check comes first makes classify correct regardless of input ordering:

    const match = targets.find((t) =>
      (enterpriseSlug
        && Array.isArray(t.match?.enterpriseSlug)
        && t.match.enterpriseSlug.includes(enterpriseSlug))
      || t.match?.default === true);
  2. webhookSecretEnvVar not validated as a valid env var name - src/support/github-targets.js (parseTargets)

    The validation checks it is a non-empty string but does not restrict to a safe character set. A typo like webhookSecretEnvVar: "__proto__" would access prototype properties on context.env. Low risk since the registry is operator-authored, but a format check is cheap defense-in-depth:

    if (!/^[A-Z][A-Z0-9_]*$/.test(t.webhookSecretEnvVar)) {
      throw new Error(`GITHUB_TARGETS["${t.id}"].webhookSecretEnvVar must be a valid env var name`);
    }
  3. Inconsistent naming across the flow - src/support/github-webhook-hmac-handler.js (.withProfile)

    The registry uses appSlug (camelCase), but the auth profile writes app_slug (snake_case, matching target_id). Both conventions are defensible for their respective boundaries, but tracing a value named two ways across three files adds a stumble point.

Recommendations

  • Consider emitting a structured log tag on classification outcome (classification_result: matched|skipped) for observability into traffic distribution once GHEC is added.
  • A JSDoc @typedef for the registry entry shape in github-targets.js would prevent drift as three files reference { id, match, appSlug, webhookSecretEnvVar }.
  • The 6 uncovered lines in github-targets.js (per codecov) are likely the missing-id branch and the typeof payload !== 'object' guard - trivial to cover with one more test each.

Assessment

Ready to merge? With fixes

The architecture is sound: correct secret boundary, safe deployment model, cryptographically valid classify-then-verify-once. The per-request parsing (finding 1) is the main item worth addressing before this hits production traffic - it is a straightforward cache that eliminates repeated JSON.parse on a hot auth path. The remaining findings are incremental improvements.

Next Steps

  1. Address the per-request parseTargets caching (Important).
  2. Consider the Minor items - the predicate flip and env-var-name validation are both one-liners.

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 1m 35s | Cost: $2.86 | Commit: edc50570a85a8c0e2dc772c684673b465d3018c3
If this code review was useful, please react with 👍. Otherwise, react with 👎.

@MysticatBot MysticatBot added the ai-reviewed Reviewed by AI label May 28, 2026
@github-actions
Copy link
Copy Markdown

This PR will trigger a minor release when merged.

@solaris007
Copy link
Copy Markdown
Member Author

Dispositions on this review:

  • M1 (cache parseTargets) - not taking. The parse is ~microseconds for a 2-3 entry array on a low-QPS webhook path, dwarfed by the HMAC compute + SQS send + Slack post. The handler is instantiated per request (AUTH_HANDLERS holds the class), so caching across requests needs module-level mutable state in a security-sensitive auth module - a testability cost that outweighs the gain. The per-request error log on a misconfig is minor and arguably useful visibility. Will revisit if profiling ever shows it matters.
  • M2 (flip classify predicate) - not taking; the proposed fix doesn't do what it claims. Flipping the || operands does not make classify order-independent: find returns the first matching element, and a default entry matches via its own branch regardless of operand order, so a mis-ordered [default, ghec] still returns default either way. Correctness here is guaranteed by parseTargets enforcing default-last (+ an inline comment on the coupling). Genuine order-independence would need a two-pass (slug-pass then default-fallback) - happy to add if we want that defense-in-depth, but it's not what the operand flip delivers.
  • M3 (validate webhookSecretEnvVar) - done in d0d54cc0: parseTargets now rejects names not matching ^[A-Z][A-Z0-9_]*$ (+ test), so a __proto__-style typo fails loudly at parse instead of resolving to a truthy prototype object.
  • M4 (naming appSlug/app_slug) - leaving as-is. The registry uses JS camelCase (appSlug); the auth profile uses snake_case (app_slug) to match the existing user_id profile field. Each boundary is locally consistent.
  • Recommendations: the classification-outcome log is deferred to the GHEC cutover (as scoped); the @typedef and the two guard-line coverage tests are noted for a follow-up.

Copy link
Copy Markdown

@MysticatBot MysticatBot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @solaris007,

Strengths

  • webhookSecretEnvVar validation addressed (src/support/github-targets.js:79-81): The /^[A-Z][A-Z0-9_]*$/ regex rejects prototype-pollution vectors like __proto__ at parse time with a clear error message, including a test that exercises the exact case. Clean fix.
  • Classify-then-verify-once remains cryptographically sound: A forged body selects a candidate target whose secret the attacker cannot forge, so HMAC verification fails. No trial-verify loop, no timing side-channel across targets. The test "rejects a GHEC body signed with the github-public secret" directly validates this core security invariant.
  • Clean refactor of readBodyWithLimits (src/support/github-webhook-hmac-handler.js:37-53): Extracting the two-tier size check into a shared method eliminates duplicated defense logic across legacy and registry paths.
  • verifySignature extraction (src/support/github-webhook-hmac-handler.js:24-27): Shared helper reduces HMAC verification to a single call site per path, preventing crypto-behavior drift between the two paths.
  • Conditional target_id emission (src/controllers/webhooks.js:220): ...(targetId ? { target_id: targetId } : {}) preserves backward compatibility - legacy-mode SQS payloads omit the field entirely rather than sending undefined.
  • Body-derived values never reach log format strings: All dynamic log interpolations use either numbers (contentLength, byteLength) or validated operator-config strings (result.id). No attacker-controlled data can reach log messages.
  • Prior finding dispositions are technically sound: The pushback on re-parsing (microseconds on low-QPS path), predicate order (flipping || does not change find semantics), and naming conventions (boundary-appropriate casing) all hold on re-evaluation.

Issues

Important (Should Fix)

  1. Codecov/patch is failing: 6 uncovered lines in github-targets.js - src/support/github-targets.js

    Two validation branches lack test coverage, causing the codecov/patch check to fail:

    (a) The !t || typeof t.id !== 'string' || !t.id guard (entry missing an id field):

    it('throws when an entry is missing id', () => {
      const noId = JSON.stringify([{ match: { default: true }, appSlug: 's', webhookSecretEnvVar: 'V' }]);
      expect(() => parseTargets({ GITHUB_TARGETS: noId })).to.throw('missing a string "id"');
    });

    (b) The !payload || typeof payload !== 'object' guard in extractClassificationMetadata (JSON that parses to a primitive):

    it('returns null for JSON primitives', () => {
      expect(extractClassificationMetadata('"just a string"')).to.be.null;
      expect(extractClassificationMetadata('null')).to.be.null;
    });

    Both are one-liners that pin real validation behavior and green the CI check.

Minor (Nice to Have)

  1. result.id interpolated into error log, inconsistent with the PR's own static-message discipline - src/support/github-webhook-hmac-handler.js (registry path, missing-secret log)

    this.log(`Webhook secret for target ${result.id} not configured (misconfigured=true)`, 'error');

    Six lines above, the skip case uses a fully static message. result.id is env-sourced and validated as a non-empty string, so this is not exploitable today. But the PR's own commenting discipline (and the plan doc's "do not interpolate body-derived values") establishes a static-message pattern for this handler. A static message here would be consistent:

    this.log('Webhook secret for selected target not configured (misconfigured=true)', 'error');

    Alternatively, if distinguishing which target failed matters operationally, keep the interpolation and consider adding an id-format constraint (e.g., /^[a-z][a-z0-9-]*$/) in parseTargets to bound the character set.

Recommendations

  • Add a one-line precondition comment at the top of classify noting that targets must have the default entry last (enforced by parseTargets). The function's correctness depends on this ordering invariant, and making it explicit at the consumption site prevents future maintainers from missing the coupling.
  • The classify test suite asserts only .id on matched targets. Consider one assertion on the full returned shape (deep.include({ id, appSlug, webhookSecretEnvVar })) to catch any future refactoring that returns a partial projection.

Assessment

Ready to merge? Yes

The security model is sound: correct secret boundary, O(1) classify-then-verify with no timing side-channel, safe deployment model, and a no-op legacy fallback until configured. The two fixup commits cleanly address the accepted finding. The only actionable item is greening the codecov check with two trivial test additions the author already acknowledged - once those land, this is ready.

Next Steps

  1. Add the two missing test cases to green codecov/patch (Important).
  2. Consider the static-log-message and id-format suggestions at your discretion (Minor).

@solaris007 solaris007 requested a review from MysticatBot May 28, 2026 09:19
@solaris007
Copy link
Copy Markdown
Member Author

Re-review dispositions:

  • New-1 (codecov/patch) - already addressed in 76e35518, which predated this re-review. Added the missing-id and non-object/primitive guard tests suggested here, plus malformed-URL and registry empty-body cases; github-targets.js and the handler are now 100% line coverage and codecov/patch is green.
  • New-2 (result.id in the missing-secret log) - took option (b) in e0030366: parseTargets now validates each id against ^[a-z][a-z0-9-]{0,63}$ - the worker's exact target_id pattern. That bounds result.id (so the informative log stays safe) and stops the web tier emitting a target_id the worker would reject, surfacing a bad id at config-parse rather than as a per-message worker failure.
  • Rec (classify precondition comment) - already present (4ed6ac27): a comment above the find notes parseTargets enforces default-last, so slug-specific entries are evaluated before the catch-all.
  • Rec (assert full target shape) - done in e0030366: the EMU-slug classify test now asserts deep.include({ id, appSlug, webhookSecretEnvVar }).

Copy link
Copy Markdown

@MysticatBot MysticatBot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @solaris007,

Strengths

  • Previously flagged issues resolved cleanly. The codecov coverage gap (prior Important) is addressed by commit 76e35518 adding the missing test cases for the !t.id guard and typeof payload !== 'object' branch. The result.id log interpolation concern (prior Minor) is closed by commit e0030366 which adds the ^[a-z][a-z0-9-]{0,63}$ constraint, bounding the interpolated value to a safe, predictable character set.
  • Cross-system contract enforcement at config time (src/support/github-targets.js:68-69): The id-format validation ensures the web tier cannot emit a target_id the worker would reject. Catching contract violations at deploy/cold-start rather than at message-processing time turns a runtime mystery into a loud, immediate startup failure with an actionable error message.
  • Classify-then-verify-once remains cryptographically sound (src/support/github-webhook-hmac-handler.js): verifySignature uses crypto.timingSafeEqual with both buffers guaranteed 71 bytes by the SIGNATURE_PATTERN pre-check. No timing oracle across targets. O(1) secret selection.
  • Trust boundary preserved in controller (src/controllers/webhooks.js:885-886): target_id and app_slug are sourced exclusively from the AuthInfo profile (populated only after successful HMAC verification). No untrusted body values reach the SQS payload or trigger-rule evaluation.
  • webhookSecretEnvVar charset restriction (src/support/github-targets.js:85): ^[A-Z][A-Z0-9_]*$ prevents prototype-chain keys from being used as property lookups on context.env.
  • No body-derived values in log format strings: All dynamic log interpolations source from validated operator config or numeric values. The skip-path log correctly avoids interpolating meta.host.
  • Existing test fixtures remain valid under the new constraint: All prior fixtures ('x', 'ghec', 'github-public') already satisfy the new regex, so no test churn was needed.

Assessment

Ready to merge? Yes

The architecture is sound: feature-gated additive behavior, correct secret boundary, O(1) classification with no timing side-channel, and a validated cross-service contract on target_id format. The new commit does exactly one thing - adds a parse-time id format constraint that closes the prior review's concern about log interpolation safety while also ensuring web/worker agreement at config time rather than at runtime. All prior findings are resolved. No new concerns.


Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 0m 57s | Cost: $3.52 | Commit: e0030366d621ca067c2dd2654f05e267767cbb3d
If this code review was useful, please react with 👍. Otherwise, react with 👎.

@solaris007 solaris007 merged commit ef66b61 into main May 28, 2026
20 checks passed
@solaris007 solaris007 deleted the feat/multi-github-destination-web branch May 28, 2026 09:36
solaris007 pushed a commit that referenced this pull request May 28, 2026
# [1.523.0](v1.522.2...v1.523.0) (2026-05-28)

### Features

* classify GitHub webhook destination and carry target_id to the worker ([#2503](#2503)) ([ef66b61](ef66b61)), closes [adobe/mysticat-architecture#94](https://github.com/adobe/mysticat-architecture/issues/94) [#27](#27)
@solaris007
Copy link
Copy Markdown
Member Author

🎉 This PR is included in version 1.523.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

solaris007 added a commit that referenced this pull request May 28, 2026
… secret note (#2508)

## Summary
Follow-up to #2503. Two doc fixes:

1. **Un-break the docs-site build.** #2503 merged
`docs/plans/2026-05-27-multi-github-destination-web.md`, whose embedded
JSDoc object type (`@returns {{host: ...}|null}`) is parsed by Jekyll's
Liquid engine as an unterminated `{{ }}` variable. That threw a
`Liquid::SyntaxError` and failed the `pages build and deployment` run on
`main` (commit `ef66b617`). This workflow only runs on push to `main`,
not on PRs, so neither CI nor review caught it pre-merge.

Fix: add `docs/_config.yml` excluding `plans/` and `specs/`. These are
internal working docs, not the public API reference. This un-breaks the
build, keeps internal plans/specs off the public docs site, and clears a
pre-existing (non-fatal) Liquid warning in
`plans/2026-05-21-slack-observability-web-tier.md`.

2. **Correct the secret-store guidance.** The plan doc and a
`github-targets.js` comment claimed webhook secrets are injected via the
`deploy-secrets` npm script. That script writes to AWS Secrets Manager
from a `secrets/` file absent from the repo and CI. This service
actually loads runtime config from HashiCorp Vault
(`dx_mysticat/{env}/api-service`, KV v2) at cold start via the
`vaultSecrets` middleware (`src/index.js`). Corrected so the runbook
points operators at `vault kv patch`.

## Validation
- Ran the CI action image locally
(`ghcr.io/actions/jekyll-build-pages:v1.0.13`): `_config.yml` loads and
the `Liquid::SyntaxError` is gone. A fully green local build is blocked
only by the `jekyll-github-metadata` plugin's GitHub-API auth (no
network/token in the sandbox), which real CI provides - so the
definitive gate is the post-merge `pages build and deployment` run.
- No functional code change (the only `src/` edit is a one-line
comment).
solaris007 added a commit that referenced this pull request May 28, 2026
## Summary
Observability follow-up to #2503. Adds `targetId` (the resolved
destination, or `'legacy'` when `GITHUB_TARGETS` is unset) to the
"Enqueued webhook job" info log.

Neither the web tier nor (in our Coralogix) the Python worker logs the
resolved `target_id` today, so the classification outcome is only
inferable from timing - this is the gap hit while validating the dev
cutover. This makes it directly observable, so the upcoming GHEC cutover
can be verified from logs. Implements the observability recommendation
from the #2503 review.

## Test plan
- [x] `mocha test/controllers/webhooks.test.js` -> 30 passing; two new
cases assert the resolved id (`ghec`) and the `legacy` fallback on the
enqueue log.
- [x] eslint clean. No behavior change beyond the added log field;
`target_id` on the SQS payload is unchanged.
solaris007 pushed a commit that referenced this pull request May 28, 2026
# [1.524.0](v1.523.0...v1.524.0) (2026-05-28)

### Features

* **webhooks:** log resolved targetId on the enqueue log ([#2509](#2509)) ([d289627](d289627)), closes [#2503](#2503)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants