From d2cecb80745cb7a6cbcefa7ac1d609aca5eb2259 Mon Sep 17 00:00:00 2001 From: Andy Dalton Date: Tue, 12 May 2026 10:34:18 -0400 Subject: [PATCH 01/11] Enforce epic-to-feature parent linking in design /sync The /sync command was silently dropping the parent field when creating epics, then listing it as a manual "Next Step" instead of treating it as a failure. This adds explicit Jira API field syntax, post-creation verification for both epics and stories, and prohibits deferring parent linking to manual follow-up. Co-Authored-By: Claude Opus 4.6 (1M context) --- design/skills/sync.md | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/design/skills/sync.md b/design/skills/sync.md index 077a099..e4d0047 100644 --- a/design/skills/sync.md +++ b/design/skills/sync.md @@ -148,7 +148,7 @@ For each epic, create a Jira issue: - **Type:** Epic - **Project:** {project key} -- **Parent:** {feature-key} +- **Parent:** {feature-key} — **mandatory; set via the `fields` parameter: `{"parent": {"key": "{feature-key}"}}`** - **Summary:** {epic title} - **Description:** @@ -173,8 +173,14 @@ Feature: {feature-key} - **T-Shirt Size:** {size} (set via the appropriate Jira field) -After creating each epic, record the Jira key in the sync manifest -immediately (before creating the next epic). +After creating each epic, verify the created issue has `parent.key` +equal to `{feature-key}` by reading the issue back. If the parent is +missing, stop and report the error — **do not silently defer parent +linking to a "Next Steps" section.** The Feature→Epic hierarchy is a +required outcome of this step, not an optional follow-up. + +Record the Jira key in the sync manifest immediately (before creating +the next epic). **If creation fails:** Stop immediately. Report which epics were created successfully (they are already recorded in the manifest) and which one @@ -200,7 +206,7 @@ For each story under each epic, create a Jira issue: - **Type:** Story - **Project:** {project key} -- **Parent:** {epic jira key} (from Step 4) +- **Parent:** {epic jira key} (from Step 4) — **set via the `fields` parameter: `{"parent": {"key": "{epic jira key}"}}`** - **Summary:** `[{prefix}] {story title}` - **Description:** @@ -247,8 +253,13 @@ local identifiers to Jira keys (see Reference Resolution). This applies to Dependencies (`Story 1.01` → `EDM-XXXX`) and Design Reference (`Epic 1` → `EDM-YYYY`). -After creating each story, record the Jira key in the sync manifest -immediately (before creating the next story). +After creating each story, verify the created issue has `parent.key` +equal to the epic's Jira key by reading the issue back. If the parent +is missing, stop and report the error — **do not silently defer parent +linking.** The Epic→Story hierarchy is a required outcome of this step. + +Record the Jira key in the sync manifest immediately (before creating +the next story). **Issue links:** After creating the story, create Jira issue links for each dependency listed in the story. 
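
> To make the `fields` syntax and the verify-after-create requirement above concrete, here is a minimal sketch against the Jira Cloud REST API (v3). The instance URL, credentials, and helper names are illustrative assumptions, not values defined by this workflow; adapt them to the project's actual Jira client or MCP tooling.

```python
import requests

JIRA_BASE = "https://example.atlassian.net/rest/api/3"  # placeholder instance
AUTH = ("sync-bot@example.com", "api-token")            # placeholder credentials

def create_epic(project_key: str, feature_key: str, summary: str) -> str:
    """Create an epic, passing the parent link through the fields payload."""
    payload = {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Epic"},
            "summary": summary,
            "parent": {"key": feature_key},  # the mandatory parent link
        }
    }
    resp = requests.post(f"{JIRA_BASE}/issue", json=payload, auth=AUTH)
    resp.raise_for_status()
    return resp.json()["key"]

def verify_parent(issue_key: str, expected_parent: str) -> None:
    """Read the issue back; fail loudly instead of deferring to a manual step."""
    resp = requests.get(
        f"{JIRA_BASE}/issue/{issue_key}", params={"fields": "parent"}, auth=AUTH
    )
    resp.raise_for_status()
    parent = resp.json()["fields"].get("parent") or {}
    if parent.get("key") != expected_parent:
        raise RuntimeError(
            f"{issue_key}: parent is {parent.get('key')!r}, "
            f"expected {expected_parent!r}; stop and report, do not defer"
        )
```
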
Resolve dependency references to Jira keys @@ -311,8 +322,13 @@ should be: Summarize: - How many epics and stories were created - Any issues that were skipped (already existed from previous sync) +- Confirm the hierarchy was verified: every epic has parent = Feature (verified in Step 4), every story has parent = its epic (verified in Step 5) - Link to the Feature issue in Jira (which now shows the full hierarchy) +**Do not suggest manual parent linking as a next step.** If any parent +link was not set during creation, that is a failure that should have been +caught in Steps 4-5. + Present a translation table mapping all local artifact identifiers to Jira keys. This table is the definitive reference for anyone working with the local `.artifacts/` files who needs to find the corresponding Jira issue: From 27b490a163a9ff5d59f47de50882f8bfc5b7800a Mon Sep 17 00:00:00 2001 From: Andy Dalton Date: Tue, 12 May 2026 11:25:19 -0400 Subject: [PATCH 02/11] Add scenario consolidation to e2e /plan phase Scenarios that share the same setup (given) and action (when) are now merged into consolidated tests with multiple validations, reducing e2e suite execution time by avoiding repeated expensive setup+action. Adds a consolidation step, 8-validation cap, AC-tagged traceability, and updates downstream phases (revise, publish) for consistency. Co-Authored-By: Claude Opus 4.6 (1M context) --- e2e/guidelines.md | 2 + e2e/skills/plan.md | 109 ++++++++++++++++++++++++++++++++---------- e2e/skills/publish.md | 3 +- e2e/skills/revise.md | 7 ++- 4 files changed, 94 insertions(+), 27 deletions(-) diff --git a/e2e/guidelines.md b/e2e/guidelines.md index 731176a..c6bc8d0 100644 --- a/e2e/guidelines.md +++ b/e2e/guidelines.md @@ -4,6 +4,7 @@ - The e2e tests must validate the **user-facing behaviors** described in the story's acceptance criteria. Each AC maps to one or more concrete test scenarios. - **E2e tests exercise the system from the outside.** They validate observable outcomes through the project's test infrastructure, not internal component contracts. Write tests that a QE engineer would write — scenario-driven, using the project's actual tools and infrastructure. +- **Consolidate scenarios that share expensive setup and actions.** E2e setup (provisioning, navigation, waiting for state) and actions (API calls, UI interactions) dominate test execution time. When multiple scenarios share the same given+when, merge their validations into a single test. Repeating expensive setup+action just to check one more assertion is the single largest contributor to avoidable e2e suite bloat. **Cap consolidated scenarios at 8 validations** — beyond this, failure output becomes difficult to triage. If merging would exceed the cap, split by validation category into separate tests. - **Scope is e2e only.** Do not consider, plan, or write unit or integration tests. Every test this workflow produces must exercise the system end-to-end. - **Follow the project's existing e2e test patterns.** Read the most similar existing test suite before writing new tests. Match the test infrastructure usage, lifecycle hooks, naming conventions, labels, and assertion style. - Follow the **project's commit format** as discovered during `/ingest` and recorded in the validation profile. Commit one logical unit of work per commit — typically one commit per plan task. Don't batch everything into a single commit, but don't create a commit per file either. 
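
> As a sketch of the consolidation principle above: one shared given+when, several AC-tagged validations. This assumes a pytest-style runner (one of the frameworks named in the plan template); the `fleet` fixture, the `e2e` marker, and the device API are hypothetical stand-ins for a project's real test infrastructure.

```python
import pytest

@pytest.mark.e2e
def test_rollout_applies_config_and_reports_status(fleet):
    # Given: one provisioned device (the expensive setup runs once)
    device = fleet.provision_device(profile="default")

    # When: one config rollout (the expensive action runs once)
    rollout = fleet.start_rollout(device, config_version="v2")
    rollout.wait_until_complete(timeout=300)

    # Then: consolidated validations, each tagged with its source AC
    assert device.config_version() == "v2"              # [AC-1]
    assert rollout.status() == "Succeeded"              # [AC-1]
    assert device.last_event().kind == "ConfigApplied"  # [AC-3]
```

> Running the setup and action once while keeping every assertion traceable to its AC is the whole payoff; if the validation list grows past the 8-validation cap, split by validation category rather than letting one test become untriageable.
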
@@ -43,6 +44,7 @@ - Order-dependent tests (each test must be independently runnable) - Shared mutable state between tests (use per-test isolation) - Missing cleanup (follow the project's teardown patterns) + - Duplicated setup+action across scenarios (consolidate scenarios that share the same given+when — see Principles) - Test infrastructure bypass (use the project's test abstractions, not ad-hoc API calls or direct infrastructure access) - Run the project's e2e test suite (scoped to the new tests) before considering the work complete. - Self-review test code before presenting. Check for: unused imports, dead code, debug artifacts, hardcoded values that should be constants, inconsistencies with the reference suite's patterns. diff --git a/e2e/skills/plan.md b/e2e/skills/plan.md index 5a968ef..66b7b1b 100644 --- a/e2e/skills/plan.md +++ b/e2e/skills/plan.md @@ -49,7 +49,35 @@ Before writing the plan, create a mental map: - How should scenarios be organized into the project's test grouping blocks? - What labels/tags should each scenario have (for CI filtering)? -### Step 3: Write the Test Plan +### Step 3: Consolidate Scenarios + +E2e setup and actions are expensive — repeating them across scenarios that +only differ in what they validate wastes execution time. Before writing the +plan, consolidate: + +1. **Identify shared setup+action pairs.** Scan the scenarios from Step 2 + for groups that have identical setup (given) and action (when) but + different validations (then). +2. **Merge validations.** For each group, combine the validation blocks + into a single consolidated scenario. Tag each validation with its + source AC (e.g., `[AC-1]`, `[AC-3]`) to preserve traceability. +3. **Apply the cap.** A consolidated scenario must not exceed **8 + validations** — beyond this, failure output becomes difficult to + triage. If merging would exceed this, split by validation category + (e.g., state correctness vs. side effects vs. error conditions). + This means the setup+action runs more than once, which is a + deliberate tradeoff: the cost of repeating setup is acceptable when + the alternative is a single test too large to debug. +4. **Keep independent scenarios separate.** If a scenario has unique setup + or action that no other scenario shares, leave it as-is. Consolidation + is not forced — it applies only when there's a genuine shared + given+when. +5. **Record consolidation decisions.** Note which raw scenarios were merged, + why, and any that were intentionally kept separate. These decisions are + recorded in the Scenario Consolidation section of the plan (see the + Step 4 template). + +### Step 4: Write the Test Plan Write `.artifacts/e2e/{jira-key}/02-plan.md` with this structure: @@ -96,27 +124,50 @@ Write `.artifacts/e2e/{jira-key}/02-plan.md` with this structure: the project's terminology (e.g., Describe/Context/It for Ginkgo, test classes/methods for pytest, describe/it for Playwright).} -### AC-1: {description} +{Scenarios come in two forms: consolidated (shared setup+action, multiple + validations) and standalone (unique setup or action). 
Use consolidated + form whenever multiple scenarios share the same given+when — this avoids + repeating expensive e2e setup.} -#### Scenario 1.1: {description — what the test verifies} +### Consolidated Scenario C1: {description — shared context} +- **Covers:** AC-1, AC-3 - **Block structure:** {test grouping/nesting using the project's vocabulary} - **Labels/tags:** {using the project's label convention} -- **Setup:** {what the test needs beyond suite-level per-test setup} -- **Steps:** - 1. {action using test infrastructure method} - 2. {action} - 3. {verification using the project's assertion/polling style} -- **Assertions:** {what to verify — use the project's assertion style} +- **Setup (given):** {shared setup — what the test needs beyond suite-level per-test setup} +- **Action (when):** + 1. {first action using test infrastructure methods} + 2. {subsequent action — multi-step sequences are common in e2e} +- **Validations (then):** + 1. [AC-1] {assertion — use the project's assertion/polling style} + 2. [AC-1] {assertion} + 3. [AC-3] {assertion} - **Cleanup:** {what teardown hooks handle vs. test-specific cleanup} -#### Scenario 1.2: {error or edge case} -... +### Standalone Scenario S1: {description — what the test verifies} -### AC-2: {description} +- **Covers:** AC-2 +- **Block structure:** {test grouping/nesting using the project's vocabulary} +- **Labels/tags:** {using the project's label convention} +- **Setup (given):** {what the test needs beyond suite-level per-test setup} +- **Action (when):** {action using test infrastructure methods} +- **Validations (then):** + 1. [AC-2] {assertion — use the project's assertion/polling style} + 2. [AC-2] {second assertion for the same AC} +- **Cleanup:** {what teardown hooks handle vs. test-specific cleanup} + +## Scenario Consolidation + +{Explain consolidation decisions. Only consolidated scenarios appear in + this table — standalone scenarios do not need consolidation rationale. + For each consolidated scenario, note which raw scenarios were merged, + why (shared setup+action), and the validation count. Also note any + scenarios intentionally kept separate despite similar setup (e.g., + merging would exceed the 8-validation cap).} -#### Scenario 2.1: {description} -... +| Consolidated Scenario | Merged From | Validation Count | Rationale | +|-----------------------|-------------|------------------|-----------| +| C1 | AC-1 happy path, AC-3 status verification | {N} | {shared setup+action description} | ## Test Infrastructure Usage @@ -162,25 +213,32 @@ Write `.artifacts/e2e/{jira-key}/02-plan.md` with this structure: - **Commit message:** `{use commit format from 01-context.md}` - **Status:** Pending -### Task 2: Implement AC-1 scenarios +### Task 2: Implement consolidated scenario C1 - **Files:** `{test file path}` -- **What:** Scenarios 1.1, 1.2 — {brief description of what they test} -- **Why:** AC-1 +- **What:** Consolidated scenario C1 — {brief description of shared context and validations} +- **Why:** AC-1, AC-3 - **Commit message:** `{format}` - **Status:** Pending -### Task 3: Implement AC-2 scenarios -... 
+### Task 3: Implement standalone scenario S1 + +- **Files:** `{test file path}` +- **What:** Standalone scenario S1 — {brief description} +- **Why:** AC-2 +- **Commit message:** `{format}` +- **Status:** Pending ## Acceptance Criteria Coverage | AC | Description | Scenarios | Task | |----|-------------|-----------|------| -| AC-1 | {brief} | 1.1, 1.2 | Task 2 | -| AC-2 | {brief} | 2.1 | Task 3 | +| AC-1 | {brief} | C1 | Task 2 | +| AC-2 | {brief} | S1 | Task 3 | +| AC-3 | {brief} | C1 | Task 2 | -{Every AC must appear in at least one scenario. Flag any gaps.} +{Every AC must appear in at least one scenario. Consolidated scenarios + will appear in multiple AC rows — this is expected. Flag any gaps.} ## Risk Assessment @@ -194,7 +252,7 @@ Write `.artifacts/e2e/{jira-key}/02-plan.md` with this structure: These may be carried forward from the ingest phase's open questions.} ``` -### Step 4: Self-Review +### Step 5: Self-Review Before presenting the plan, verify: @@ -209,9 +267,12 @@ Before presenting the plan, verify: - [ ] Commit messages follow the project's format (from validation profile) - [ ] No scenarios require environment capabilities not present in the project - [ ] Task count is reasonable — if you have more than 8 tasks, consider whether the story needs re-scoping +- [ ] Scenarios sharing setup+action are consolidated, not duplicated as separate tests +- [ ] No consolidated scenario exceeds 8 validations +- [ ] Each validation in a consolidated scenario is tagged with its source AC - [ ] The plan is achievable — no scenarios depend on unmerged features or unavailable test infrastructure methods -### Step 5: Present to User +### Step 6: Present to User Show the user the complete plan and highlight: - Test approach and reference suite selection diff --git a/e2e/skills/publish.md b/e2e/skills/publish.md index e1a88b8..6395277 100644 --- a/e2e/skills/publish.md +++ b/e2e/skills/publish.md @@ -138,7 +138,8 @@ In either case, save the result to behaviors they validate.} ### E2E Test Scenarios -{Bulleted list of test scenarios, grouped by acceptance criterion.} +{Bulleted list of test scenarios — consolidated and standalone — with + their AC coverage noted (e.g., "[AC-1, AC-3]").} ### Test Infrastructure - **Suite location:** {path to the new test suite directory} diff --git a/e2e/skills/revise.md b/e2e/skills/revise.md index b6d4eaf..47a3c7f 100644 --- a/e2e/skills/revise.md +++ b/e2e/skills/revise.md @@ -42,7 +42,7 @@ The user's feedback may target: - Different scenarios ("Add an error path for when the device is offline during rollback") - Scenario removal ("We don't need to test the concurrent update edge case — the [DEV] integration tests cover it") - Scenario reordering ("Test the happy path before the error cases") -- Scenario splitting ("Scenario 1.1 is testing too many things, split it") +- Scenario splitting ("Consolidated scenario C1 has too many validations, split it") **Test approach changes:** - Different reference suite ("Use the fleet_update suite as the pattern, not the agent suite") @@ -91,6 +91,9 @@ After applying changes, verify: - Do test infrastructure methods referenced actually exist in the project? - Are labels consistent with the project's conventions? - Does the AC coverage matrix reflect the current scenario mapping? +- Are there new consolidation opportunities (added scenarios sharing setup+action with existing ones)? +- Do consolidated scenarios still respect the 8-validation cap after changes? 
+- Does the Scenario Consolidation table still match the current scenario list? - Are commit messages still properly formatted? ### Step 5: Update Artifact @@ -105,7 +108,7 @@ Summarize what changed: ## Revision Summary ### Changes Applied -- Scenario 1.2: Added error path for offline device during rollback +- Standalone scenario S2: Added error path for offline device during rollback - Task 2: Split into Task 2a (happy path) and Task 2b (error cases) - Reference suite: Changed from agent suite to fleet_update suite From db2fbeba2c62590290895db9226b504e0755e1e3 Mon Sep 17 00:00:00 2001 From: Andy Dalton Date: Tue, 12 May 2026 14:14:34 -0400 Subject: [PATCH 03/11] Unify per-task code review to use shared self-review gate Replace inline three-tier review logic in implement and e2e /code phases with references to _shared/recipes/self-review-gate.md. Add SUPPLEMENTARY_CRITERIA parameter to the recipe so callers can pass domain-specific evaluation criteria (e2e anti-patterns) and review focus directives (cross-cutting concerns) through the formal channel into the reviewer subagent. Reframe publish-time review from full re-review to cross-cutting review focused on inter-task interactions, since each sub-task is now individually reviewed during /code. Co-Authored-By: Claude Opus 4.6 (1M context) --- _shared/recipes/self-review-gate.md | 10 ++++- e2e/skills/code.md | 54 ++++++++----------------- e2e/skills/publish.md | 10 +++-- implement/skills/code.md | 63 +++++++++-------------------- implement/skills/publish.md | 10 +++-- 5 files changed, 54 insertions(+), 93 deletions(-) diff --git a/_shared/recipes/self-review-gate.md b/_shared/recipes/self-review-gate.md index 6f59639..b3a0fcb 100644 --- a/_shared/recipes/self-review-gate.md +++ b/_shared/recipes/self-review-gate.md @@ -19,6 +19,7 @@ over inline review but does not replace independent review. | DIFF_COMMAND | No | Must be a `git diff` invocation (the gate appends `--name-status` to it). Note: `git diff` only shows tracked file changes — if the workflow creates new untracked files, the calling workflow should stage them first or use a DIFF_COMMAND that captures them. | `git diff HEAD` | | MAX_ROUNDS | No | Maximum fix-and-re-review iterations | `3` | | CONTEXT_FILES | No | Workflow artifacts providing review context (e.g., design docs, requirements, implementation notes) | None | +| SUPPLEMENTARY_CRITERIA | No | Additional evaluation criteria beyond the review protocol. Passed to the reviewer alongside the standard criteria. Use for domain-specific checks (e.g., e2e anti-patterns) or review focus directives (e.g., cross-cutting concerns to prioritize, issues to skip). | None | ## Procedure @@ -86,13 +87,18 @@ independence. Load it with: - The diff output - The project's AGENTS.md/CLAUDE.md (if they exist) - Any CONTEXT_FILES provided by the calling workflow +- Any SUPPLEMENTARY_CRITERIA provided by the calling workflow Retain the subagent's ID for use in Step 4 — resuming the same reviewer gives it memory of its previous findings and concerns, producing more coherent follow-up reviews. -**If subagents are not available**, perform the review inline. Adopt the -reviewer perspective: evaluate the code as if you did not write it. +**If subagents are not available**, perform the review inline. Apply +any SUPPLEMENTARY_CRITERIA alongside the standard evaluation criteria. +Adopt the reviewer perspective: evaluate the code as if you did not +write it. 
Do not let your knowledge of the implementation rationale +excuse issues that a fresh reviewer would flag — review the diff on +its own merits. ### Step 3: Validate and Assess Findings diff --git a/e2e/skills/code.md b/e2e/skills/code.md index 791f8ae..30900c2 100644 --- a/e2e/skills/code.md +++ b/e2e/skills/code.md @@ -236,44 +236,22 @@ operate on the staged diff: git add {specific files} ``` -Then run a code review on the staged changes. The review method is -discovered, not hardcoded — use the first tier that applies: - -**Tier 1: Project-defined review tooling.** Check the project's -`AGENTS.md` or `CLAUDE.md` for automated code review tooling — CLI -commands or scripts that run a review. If review tooling is found, run it. - -**Tier 2: Independent review agent.** If no project-specific review -tooling exists, spawn a code review subagent with a fresh context -window. Give it: - -- The project's `AGENTS.md` / `CLAUDE.md` and test documentation -- The staged diff for this task (`git diff --cached`) -- The reference suite files for pattern comparison -- The test scenarios being implemented (from the plan task) - -Do **not** give it the implementing agent's reasoning or conversation -history — the value comes from independent eyes. - -The subagent should review as a senior QE engineer familiar with the -project's testing conventions, focusing on: correct test infrastructure -usage, assertion completeness, anti-patterns (hardcoded sleeps, brittle -selectors, missing cleanup), label conventions, and test isolation. - -**Tier 3: Structured self-review.** If the runtime does not support -spawning subagents, fall back to a structured self-review. Re-read the -staged diff and check for: - -- Anti-patterns: hardcoded sleeps, shared mutable state, missing cleanup -- Test infrastructure bypass: direct API calls instead of project-provided abstractions -- Missing async polling: synchronous assertions on async operations -- Hardcoded values: inline strings/numbers instead of constants -- Pattern drift: deviations from the reference suite's conventions -- Missing labels: tests without CI-filtering labels - -**Triage findings.** Evaluate each finding on its technical merit. -Fix findings that add value. Dismiss findings that don't with a brief -rationale. If any fixes were applied, re-stage the affected files before +Run the self-review gate on the staged changes. + +Read and follow `../../_shared/recipes/self-review-gate.md` with these +parameters: + +| Parameter | Value | +|-----------|-------| +| DIFF_COMMAND | `git diff --cached` | +| MAX_ROUNDS | `1` | +| CONTEXT_FILES | `.artifacts/e2e/{jira-key}/01-context.md`, `.artifacts/e2e/{jira-key}/02-plan.md` (if they exist) | +| SUPPLEMENTARY_CRITERIA | Check for e2e-specific issues: (1) Anti-patterns: hardcoded sleeps, shared mutable state, missing cleanup. (2) Test infrastructure bypass: direct API calls instead of project abstractions. (3) Missing async polling: synchronous assertions on async operations. (4) Hardcoded values: inline strings/numbers instead of project-defined constants. (5) Pattern drift: deviations from the reference suite's conventions. (6) Missing labels: tests without CI-filtering labels. | + +If the gate reports FLAG (unfixed CRITICAL or HIGH findings), stop and +present the findings to the user before committing. + +If the gate made code fixes, re-stage the affected files before proceeding to commit. Note any dismissed findings in the implementation report (Discoveries section). 
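
> Of the supplementary criteria above, missing async polling is the easiest to show in miniature. A hedged sketch of the preferred pattern follows; the `poll_until` helper is hypothetical, and a real project should use its test infrastructure's own waiting primitives where they exist.

```python
import time

def poll_until(predicate, timeout=60.0, interval=2.0):
    """Re-evaluate `predicate` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(interval)
    raise AssertionError(f"condition not met within {timeout}s")

# Anti-pattern the gate should flag: a fixed sleep that races the async work.
#     time.sleep(30)
#     assert client.job_status(job_id) == "done"
#
# Preferred: a bounded poll that fails with a clear timeout.
#     poll_until(lambda: client.job_status(job_id) == "done", timeout=120)
```
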
diff --git a/e2e/skills/publish.md b/e2e/skills/publish.md index 6395277..da2f582 100644 --- a/e2e/skills/publish.md +++ b/e2e/skills/publish.md @@ -62,10 +62,11 @@ Verify readiness: gh auth status ``` -### Step 2: Self-Review Gate +### Step 2: Cross-Cutting Review -Before pushing, run a self-review of the branch changes to catch issues -before they reach external reviewers. +Each sub-task was already reviewed individually during `/code`. This +review focuses on issues that only emerge when looking at the branch +as a whole — problems that span tasks or arise from their interaction. Read the `## Branch` section of `02-plan.md` to get the base branch, then read and follow `../../_shared/recipes/self-review-gate.md` with these @@ -76,6 +77,7 @@ parameters: | DIFF_COMMAND | `git diff {base}...HEAD` | | MAX_ROUNDS | `3` | | CONTEXT_FILES | `.artifacts/e2e/{jira-key}/01-context.md`, `.artifacts/e2e/{jira-key}/02-plan.md` (if they exist) | +| SUPPLEMENTARY_CRITERIA | This is a cross-cutting review. Each sub-task was already reviewed individually. Focus on inter-task issues: (1) Cross-test consistency (setup/teardown patterns, assertion style). (2) Shared fixtures or helpers that emerged across tasks. (3) Label and tag consistency across test files. (4) Pattern drift between tests written in different tasks. Skip issues already caught per-task: individual test correctness, per-test anti-pattern checks, single-task infrastructure usage. | If the gate reports FLAG (unfixed CRITICAL or HIGH findings), stop and present the findings to the user. Do not proceed until the user decides @@ -88,7 +90,7 @@ git add {fixed files} ``` ```bash -git commit -m "{jira-key}: address self-review findings" +git commit -m "{JIRA-KEY}: address cross-cutting review findings" ``` ### Step 3: Confirm Details diff --git a/implement/skills/code.md b/implement/skills/code.md index b0e7fdb..d5e3529 100644 --- a/implement/skills/code.md +++ b/implement/skills/code.md @@ -236,51 +236,24 @@ operate on the staged diff: git add {specific files} ``` -Then run a code review on the staged changes. The review method is -discovered, not hardcoded — use the first tier that applies: - -**Tier 1: Project-defined review tooling.** Check the project's -`AGENTS.md` or `CLAUDE.md` for automated code review tooling — CLI -commands or scripts that run a review (e.g., a review CLI, a linter -with review rules). General review checklists intended for human -reviewers do not qualify — feed those into Tier 2 or 3 as supplementary -context instead. If review tooling is found, run it. This is the -extensibility point — when the project adds review tooling, it gets -picked up automatically. - -**Tier 2: Independent review agent.** If no project-specific review -tooling exists, spawn a code review subagent with a fresh context -window. Give it: - -- The project's `AGENTS.md` / `CLAUDE.md` (for coding conventions) -- The staged diff for this task (`git diff --cached`) -- The surrounding files for context (files in the same package) -- The behavioral contracts being implemented (from the plan task) - -Do **not** give it the implementing agent's reasoning or conversation -history — the value comes from independent eyes, not rubber-stamping. - -The subagent should review as a senior engineer familiar with the -project's conventions, focusing on: correctness, edge cases, error -handling, naming, security (injection vectors, credential leaking), -and adherence to project patterns. 
- -**Tier 3: Structured self-review.** If the runtime does not support -spawning subagents, fall back to a structured self-review. Re-read the -staged diff and check for: - -- Unused imports, dead code, debug artifacts -- Missing error handling or nil/null checks -- Hardcoded values that should be constants or configuration -- Inconsistencies with surrounding code patterns -- Security: input validation, credential exposure, injection vectors - -**Triage findings.** Evaluate each finding on its technical merit — -the same pushback principle used in `/respond`. Fix findings that add -value. Dismiss findings that don't with a brief rationale. If any -fixes were applied, re-stage the affected files before proceeding to -commit. Note any dismissed findings in the implementation report -(Discoveries section) so there is a paper trail. +Run the self-review gate on the staged changes. + +Read and follow `../../_shared/recipes/self-review-gate.md` with these +parameters: + +| Parameter | Value | +|-----------|-------| +| DIFF_COMMAND | `git diff --cached` | +| MAX_ROUNDS | `1` | +| CONTEXT_FILES | `.artifacts/implement/{jira-key}/01-context.md`, `.artifacts/implement/{jira-key}/02-plan.md` (if they exist) | +| SUPPLEMENTARY_CRITERIA | None | + +If the gate reports FLAG (unfixed CRITICAL or HIGH findings), stop and +present the findings to the user before committing. + +If the gate made code fixes, re-stage the affected files before +proceeding to commit. Note any dismissed findings in the implementation +report (Discoveries section) so there is a paper trail. #### 3g: Commit diff --git a/implement/skills/publish.md b/implement/skills/publish.md index 8f955b7..b1d878c 100644 --- a/implement/skills/publish.md +++ b/implement/skills/publish.md @@ -62,10 +62,11 @@ Verify readiness: gh auth status ``` -### Step 2: Self-Review Gate +### Step 2: Cross-Cutting Review -Before pushing, run a self-review of the branch changes to catch issues -before they reach external reviewers. +Each sub-task was already reviewed individually during `/code`. This +review focuses on issues that only emerge when looking at the branch +as a whole — problems that span tasks or arise from their interaction. Read the `## Branch` section of `02-plan.md` to get the base branch, then read and follow `../../_shared/recipes/self-review-gate.md` with these @@ -76,6 +77,7 @@ parameters: | DIFF_COMMAND | `git diff {base}...HEAD` | | MAX_ROUNDS | `3` | | CONTEXT_FILES | `.artifacts/implement/{jira-key}/01-context.md`, `.artifacts/implement/{jira-key}/02-plan.md` (if they exist) | +| SUPPLEMENTARY_CRITERIA | This is a cross-cutting review. Each sub-task was already reviewed individually. Focus on inter-task issues: (1) Inconsistencies across files or tasks (error handling style, naming conventions, logging patterns). (2) Duplicated logic that emerged across separate tasks. (3) Integration gaps between components implemented in different tasks. (4) API surface coherence (public interfaces make sense together). Skip issues already caught per-task: individual function correctness, per-file error handling completeness, single-task test coverage. | If the gate reports FLAG (unfixed CRITICAL or HIGH findings), stop and present the findings to the user. 
Do not proceed until the user decides @@ -88,7 +90,7 @@ git add {fixed files} ``` ```bash -git commit -m "{jira-key}: address self-review findings" +git commit -m "{JIRA-KEY}: address cross-cutting review findings" ``` ### Step 3: Confirm Details From d00ddfb3b6a60fe420ed645656409674d0bf8093 Mon Sep 17 00:00:00 2001 From: Andy Dalton Date: Tue, 12 May 2026 14:48:09 -0400 Subject: [PATCH 04/11] Deepen code review from checklist to adversarial design review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add Design as evaluation criterion #4 in the shared review protocol with concrete detection signals (parameter count, import spread, repeated error handling, exposed internal types). Add "Challenge decisions, not just implementation" as the first core principle, pushing reviewers to question whether abstractions should exist and whether boundaries are in the right place — not just whether the implementation is correct. Replace inline code quality checklists in implement and e2e /validate Step 6 with self-review gate invocations, bringing subagent isolation, iterative fix-and-re-review, and the full protocol to the validation phase. Preserve domain-specific checks as SUPPLEMENTARY_CRITERIA. Remove stale criteria enumeration from code-review workflow. Update validation report templates to reflect gate-based review. Reframe self-review gate preamble to scope what it can and cannot catch without enumerating specific criteria. Co-Authored-By: Claude Opus 4.6 (1M context) --- _shared/recipes/self-review-gate.md | 16 ++++---- _shared/review-protocol.md | 32 +++++++++++---- code-review/skills/start.md | 3 +- e2e/skills/validate.md | 42 +++++++++---------- implement/skills/validate.md | 64 ++++++++++------------------- 5 files changed, 76 insertions(+), 81 deletions(-) diff --git a/_shared/recipes/self-review-gate.md b/_shared/recipes/self-review-gate.md index b3a0fcb..f1f79a4 100644 --- a/_shared/recipes/self-review-gate.md +++ b/_shared/recipes/self-review-gate.md @@ -4,13 +4,15 @@ A quality gate that reviews code changes before they are pushed or submitted as a PR. Used by workflows that make code changes (bugfix, implement, cve-fix, e2e) to catch issues before external review. -This gate is one peer in a defence-in-depth chain: self-review catches -mechanical issues that automated checks are good at finding (convention -violations, obvious bugs, missing error handling, inconsistencies with -surrounding code), while downstream reviewers (coderabbit, human reviewers) -provide genuinely independent perspectives on design, architecture, and -subtle correctness issues. A same-model subagent improves review quality -over inline review but does not replace independent review. +This gate is one peer in a defence-in-depth chain. It evaluates all +criteria in the review protocol using a same-model subagent for +isolation. A subagent reviewing only the diff, without the +implementation rationale loaded, can catch issues visible from the code +that the implementor is too close to see. It will not catch everything +a human reviewer would — subtle correctness issues and cross-system +design concerns benefit from genuinely independent review by downstream +reviewers (CodeRabbit, human reviewers). The goal is to catch what can +be caught early, not to replace independent review. 
## Parameters diff --git a/_shared/review-protocol.md b/_shared/review-protocol.md index 40463d7..e775be8 100644 --- a/_shared/review-protocol.md +++ b/_shared/review-protocol.md @@ -6,10 +6,16 @@ workflow and the self-review gates embedded in other workflows. ## Core Principles -- **Review what the developer wrote, not what you would have written.** Evaluate - correctness, clarity, safety, and adherence to project conventions. Do not - impose stylistic preferences that are not codified in the project's linting - or contribution guidelines. +- **Challenge decisions, not just implementation.** Don't just verify that + the code does what it intends — question whether the intent is right. + Should this abstraction exist? Is this the right boundary between + components? Would a different approach avoid the complexity being managed + here? A review that only checks correctness is a linter with extra steps. +- **Review what the developer wrote, not what you would have written.** The + above principle challenges decisions; this one constrains how. Evaluate + the approach on its merits — does it work, is it clear, is it safe? Do + not impose stylistic preferences or rewrite working code in your preferred + idiom. Challenge the design, not the taste. - **Findings must be actionable.** Every finding must include a concrete suggestion the developer can evaluate and apply. Vague observations ("consider improving this") are not findings. @@ -36,13 +42,23 @@ Evaluate changes against these categories, prioritized by impact: appropriately? Are failure modes handled? 3. **Security** — Are there injection risks, unsafe operations, exposed secrets, or other OWASP-category concerns? -4. **Performance** — Are there unnecessary allocations, N+1 queries, unbounded +4. **Design** — Does each new abstraction earn its complexity? Are + responsibilities clearly divided — no god functions, no single type + accumulating unrelated concerns? Do interfaces hide implementation details + and expose coherent contracts? Are there implicit assumptions between + components that should be explicit (shared constants, expected call order, + assumed preconditions)? Signals: a function taking many parameters + (missing intermediate type), a type importing from several unrelated + packages (mixed concerns), identical error-handling blocks repeated + across call sites (missing shared helper), a new interface that exposes + internal types to callers. +5. **Performance** — Are there unnecessary allocations, N+1 queries, unbounded operations, or other efficiency concerns? -5. **Naming and clarity** — Are names descriptive? Is the intent clear from +6. **Naming and clarity** — Are names descriptive? Is the intent clear from reading the code? -6. **Test coverage** — Are the changes tested? Are edge cases covered? Are +7. **Test coverage** — Are the changes tested? Are edge cases covered? Are tests testing contracts, not implementation? -7. **Project conventions** — Do the changes follow the conventions discovered +8. **Project conventions** — Do the changes follow the conventions discovered from the project's AGENTS.md, CLAUDE.md, linting configs, and contribution guidelines? diff --git a/code-review/skills/start.md b/code-review/skills/start.md index 7cfa409..c96cc1c 100644 --- a/code-review/skills/start.md +++ b/code-review/skills/start.md @@ -247,8 +247,7 @@ change interact correctly with surrounding code? The diff shows what changed; the surrounding code reveals whether the change fits. 
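
> As an illustration of the Design criterion added to the review protocol above (specifically the "function taking many parameters" signal), here is a hedged before/after sketch. The names are invented for illustration and do not come from any project this workflow targets.

```python
from dataclasses import dataclass

# Signal: a long parameter list usually means a missing intermediate type.
# def create_issue(project, issue_type, summary, description, parent,
#                  labels, size): ...

@dataclass(frozen=True)
class IssueSpec:
    """Groups fields that always travel together into one explicit contract."""
    project: str
    issue_type: str
    summary: str
    description: str
    parent: str | None = None
    labels: tuple[str, ...] = ()

def create_issue(spec: IssueSpec) -> str:
    """Accepting a spec keeps call sites readable and defaults explicit."""
    raise NotImplementedError("hypothetical; shown only for the signature")
```
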
The reviewer should evaluate all categories defined in -`../../_shared/review-protocol.md` (Correctness, Error handling, Security, -Performance, Naming and clarity, Test coverage, Project conventions). +`../../_shared/review-protocol.md`. If the user provided focus guidance, prioritize those areas but still report CRITICAL and HIGH findings in other categories. diff --git a/e2e/skills/validate.md b/e2e/skills/validate.md index 657259c..b418b27 100644 --- a/e2e/skills/validate.md +++ b/e2e/skills/validate.md @@ -176,32 +176,30 @@ If regressions are found: After automated checks pass, review the new test code for issues that automated tooling does not catch. Read the base branch from the `## Branch` -section of `02-plan.md`, then diff against it: +section of `02-plan.md`, then run the self-review gate. -```bash -git diff {base}..HEAD -``` +Read and follow `../../_shared/recipes/self-review-gate.md` with these +parameters: -**Test quality:** -- Do tests actually verify what the AC describes? (not asserting something tangentially related) -- Are assertions specific enough? (not just "no error" — verify the actual outcome) -- Are step annotations (if the project uses them) clear and meaningful? -- Will these tests break only when real behavior changes, not when unrelated details change? +| Parameter | Value | +|-----------|-------| +| DIFF_COMMAND | `git diff {base}..HEAD` | +| MAX_ROUNDS | `3` | +| CONTEXT_FILES | `.artifacts/e2e/{jira-key}/01-context.md`, `.artifacts/e2e/{jira-key}/02-plan.md` (if they exist) | +| SUPPLEMENTARY_CRITERIA | This is the full-branch validation review for e2e test code. Anti-pattern detection is already complete (Step 4) — skip those checks. Focus on test design depth: (1) **Assertion specificity** — do tests verify the actual behavioral outcome the AC describes, or just assert "no error"? (2) **Brittleness** — will tests break only when real behavior changes, or will they break when unrelated implementation details change? (3) **Test isolation** — could any test leak state (credentials, resources, configuration) that affects other suites or environments? (4) **Reference suite consistency** — does the test structure match the project's reference suite, or does it introduce divergent patterns that will confuse future contributors? | -**Maintainability:** -- Are test names descriptive of what they verify? -- Is the test structure consistent with the reference suite? -- Are helper functions used appropriately (not duplicating existing utilities)? +If the gate reports FLAG (unfixed CRITICAL or HIGH findings), stop and +present the findings to the user before proceeding. -**Safety:** -- No test code that could leak credentials or modify production state -- No test code that creates resources without cleanup -- No test code that could interfere with other test suites +If the gate made code fixes, commit them before continuing: -If this review surfaces issues: +```bash +git add {fixed files} +``` -1. Fix issues in the new test code, commit each fix separately -2. Note pre-existing issues in the validation report (do not fix them) +```bash +git commit -m "{JIRA-KEY}: address validation review findings" +``` ### Step 7: Acceptance Criteria Verification @@ -286,8 +284,8 @@ Write `.artifacts/e2e/{jira-key}/05-validation-report.md`: ## Quality Review Findings -{Findings from the test quality, maintainability, and safety review. - Distinguish between: +{Findings from the code quality review gate (protocol criteria plus + e2e-specific supplementary criteria). 
Distinguish between: - Issues fixed during validation (with commit hashes) - Pre-existing issues noted but not fixed If none: "No quality review findings."} diff --git a/implement/skills/validate.md b/implement/skills/validate.md index 16978d0..1df607a 100644 --- a/implement/skills/validate.md +++ b/implement/skills/validate.md @@ -184,52 +184,32 @@ If regressions are found: ### Step 6: Code Quality Review -After automated checks pass, review the story's changes for issues that -automated tooling does not catch. Read the base branch from the `## Branch` -section of `02-plan.md`, then diff against it: +After automated checks pass, review the story's full diff for issues +that automated tooling does not catch. Read the base branch from the +`## Branch` section of `02-plan.md`, then run the self-review gate. -```bash -git diff {base}..HEAD -``` - -Evaluate each area below. Only flag concrete findings — do not pad with -generic observations. - -**Security** - -- No hardcoded secrets, tokens, API keys, or credentials in the diff -- Input validation on all external or user-facing data -- Error messages don't leak sensitive information (stack traces, internal - paths, credentials) -- No injection vectors (SQL, command, path traversal) introduced - -**Performance** +Read and follow `../../_shared/recipes/self-review-gate.md` with these +parameters: -- No unnecessary allocations in hot paths -- Loops bounded — no unbounded iteration over external data -- Resource cleanup: connections, file handles, channels properly closed -- Concurrent work exits cleanly — no leaked goroutines, threads, or async tasks +| Parameter | Value | +|-----------|-------| +| DIFF_COMMAND | `git diff {base}..HEAD` | +| MAX_ROUNDS | `3` | +| CONTEXT_FILES | `.artifacts/implement/{jira-key}/01-context.md`, `.artifacts/implement/{jira-key}/02-plan.md` (if they exist) | +| SUPPLEMENTARY_CRITERIA | This is the full-branch validation review — the last quality gate before PR creation. The reviewer has the complete diff, not a per-task slice. In addition to the standard protocol criteria, evaluate: (1) **Backward compatibility** — does the change modify public APIs, error formats, configuration, or wire protocols? If so, is it backward-compatible or is the break documented and justified? (2) **Completeness across call sites** — if the story introduces a guard, wrapper, or handling pattern in one location, search the codebase for similar patterns needing the same treatment. A pattern applied to 7 of 8 identical call sites is itself a bug. | -**Backward Compatibility** +If the gate reports FLAG (unfixed CRITICAL or HIGH findings), stop and +present the findings to the user before proceeding. -- Does the change modify any public APIs, error formats, configuration - options, or wire protocols? -- If so, is it backward-compatible, or is the breaking change documented - and justified? -- Could this change be reverted without leaving the system in an - inconsistent state? +If the gate made code fixes, commit them before continuing: -**Completeness Across Call Sites** - -- If the story introduces a guard, wrapper, or handling pattern in one - location, search the codebase for similar patterns that need the same - treatment -- A pattern applied to 7 of 8 identical call sites is itself a bug - -If this review surfaces issues: +```bash +git add {fixed files} +``` -1. Fix issues caused by the story's changes, commit each fix separately -2. 
Note pre-existing issues in the validation report (do not fix them) +```bash +git commit -m "{JIRA-KEY}: address validation review findings" +``` ### Step 7: Acceptance Criteria Verification @@ -333,8 +313,8 @@ Write `.artifacts/implement/{jira-key}/05-validation-report.md`: ## Quality Review Findings -{Findings from the security, performance, backward compatibility, and - completeness review. Distinguish between: +{Findings from the code quality review gate (protocol criteria plus + validate-specific supplementary criteria). Distinguish between: - Issues fixed during validation (with commit hashes) - Pre-existing issues noted but not fixed If none: "No quality review findings."} From 1dc6eecd6e056f5db402e2ea8d4f1860498758da Mon Sep 17 00:00:00 2001 From: Andy Dalton Date: Tue, 12 May 2026 15:36:02 -0400 Subject: [PATCH 05/11] Add independent decomposition review to design /decompose Replace the self-graded 17-item checklist in /decompose with a subagent-based review gate that evaluates decomposition artifacts against the PRD without access to the design document. The reviewer sees only what the artifacts actually say, catching gaps the decomposer's mental model papers over. - Add design/decomposition-review.md with 6 evaluation criteria (requirement coverage, epic structure, story completeness, testing commitment, integration stability, documentation), finding format, severity definitions, and validation rules - Replace decompose Step 7 checklist with Steps 7-12: structural verification, review invocation (subagent with inline fallback), validate-and-assess, re-review loop (max 2 rounds), report, and updated presentation with FLAG handling - Fix self-review-gate.md Step 4 fresh subagent path to include SUPPLEMENTARY_CRITERIA, CONTEXT_FILES, and AGENTS.md/CLAUDE.md Co-Authored-By: Claude Opus 4.6 (1M context) --- _shared/recipes/self-review-gate.md | 5 +- design/decomposition-review.md | 152 ++++++++++++++++++++++++++++ design/skills/decompose.md | 151 ++++++++++++++++++++++----- 3 files changed, 283 insertions(+), 25 deletions(-) create mode 100644 design/decomposition-review.md diff --git a/_shared/recipes/self-review-gate.md b/_shared/recipes/self-review-gate.md index f1f79a4..c47a2f1 100644 --- a/_shared/recipes/self-review-gate.md +++ b/_shared/recipes/self-review-gate.md @@ -135,8 +135,9 @@ If Step 3 produced code changes: original findings and lets it verify they were addressed correctly. **If resumption is not available:** Spawn a new subagent loaded with the - review protocol, the updated diff, and a summary of the previous round's - findings and fixes so it has full context. + review protocol, the updated diff, the project's AGENTS.md/CLAUDE.md, + any CONTEXT_FILES, any SUPPLEMENTARY_CRITERIA, and a summary of the + previous round's findings and fixes so it has full context. **If subagents are not available:** Re-review inline, focusing on the current state of the diff (not just the delta from last round). diff --git a/design/decomposition-review.md b/design/decomposition-review.md new file mode 100644 index 0000000..a756bd9 --- /dev/null +++ b/design/decomposition-review.md @@ -0,0 +1,152 @@ +# Decomposition Review Protocol + +Evaluation criteria, finding format, and assessment model used by the +decomposition review step in the `/decompose` phase. This protocol is +runtime-agnostic: it defines what to evaluate and how to report findings, +whether the reviewer is a subagent or the decomposer performing inline +self-review. 
+ +The reviewer receives the PRD and the decomposition artifacts (epics, +stories, coverage matrix) but intentionally does **not** receive the +design document. This is the independence property: the reviewer +evaluates what the artifacts actually say, not what the decomposer +intended. If a story's acceptance criteria are unclear without +cross-referencing the design document, that is a real finding — the +implementor needs self-contained acceptance criteria. Stories are +designed to be read alongside the design document during implementation +(Implementation Guidance sections may reference design sections by +number), so design-section references in Implementation Guidance are +expected and should not be flagged as a clarity problem. + +## Core Principles + +- **Challenge the decomposition structure, not just the content.** Don't + just verify that each story describes work — question whether the epic + boundaries are right, whether stories are right-sized, and whether the + dependency order would actually produce a working feature. A review + that only checks formatting is a linter with extra steps. +- **Evaluate against the PRD, not against assumed design intent.** The + reviewer intentionally does not receive the design document. Evaluate + whether the decomposition fully addresses the PRD's requirements. If a + requirement appears missing, verify against the PRD before flagging — + the design may have legitimately restructured how a requirement is + addressed. +- **Findings must be actionable.** Every finding must cite a specific + artifact file and section, and include a concrete suggestion the + decomposer can evaluate and apply. Vague observations ("consider + adding more detail") are not findings. +- **Severity must be honest.** CRITICAL and HIGH indicate problems that + would cause incorrect implementation or significant rework. MEDIUM and + LOW are suggestions that improve clarity or structure but are not + blocking. Inflating severity erodes trust; deflating it hides real + problems. +- **Assess on value, not severity.** A LOW finding that genuinely + improves story clarity is worth implementing. A MEDIUM finding that is + a structural preference without real impact is not. Severity indicates + potential damage; value determines whether the finding is worth acting + on. +- **Do not flag design-document cross-references in Implementation + Guidance.** Stories are designed to be read alongside the design + document during implementation. References like "See §4.3 of the + design document" are intentional and expected. Flag instead: acceptance + criteria that require the design document to understand, or + Implementation Guidance that lacks enough direction for an implementor + to start work. + +## Evaluation Criteria + +Evaluate the decomposition against these categories, prioritized by +impact: + +1. **Requirement Coverage** — Does the coverage matrix (`06-coverage.md`) + map every FR and NFR from the PRD? Do story acceptance criteria + collectively cover the PRD's acceptance criteria, not just the FR/NFR + descriptions? Are there stories without PRD traceability — and if so, + are they justified (e.g., infrastructure prerequisites)? Has any + requirement been silently dropped, deferred to "v2", or reduced in + scope? + +2. **Epic Structure** — Is each epic organized around user value, not + technical layers? Does each epic deliver complete functionality + independently? Are T-shirt sizes assigned and reasonable? Is the epic + dependency order documented and logical? 
Could implementing the epics + in the documented order actually work, or are there hidden + dependencies? + +3. **Story Completeness** — Are acceptance criteria behavioral outcomes + (what the system does, testable from outside), not implementation + specifications? Does each story have a title prefix assigned (`[DEV]`, + `[QE]`, `[UI]`, `[CI]`, `[DOCS]`)? Is the story implementation order + within each epic documented? Are stories right-sized — not so small + they add process overhead, not so large they resist meaningful review? + Does the Implementation Guidance give enough direction to start work? + If a story has more than approximately 8 acceptance criteria, should + it be split? + +4. **Testing Commitment** — Does every `[DEV]` story include both + functionality AND testing? Do `[QE]` stories define concrete e2e test + scenarios that validate user-facing behavior, not duplicate + unit/integration tests from `[DEV]` stories? Does each epic with + e2e-testable surface include at least one `[QE]` story? Does any + story fall back to "manual validation" or "document procedure for QE" + where an automated test could be written? + +5. **Integration Stability** — Would implementing all stories in + dependency order produce a working feature end-to-end? Does any + integration work fall between stories — work that is needed but not + captured in any story? Are story dependencies documented and + complete? Does each story leave the system in a stable state (all + tests passing after merge)? + +6. **Documentation** — Do `[DOCS]` stories include a Documentation + Inputs section? Does that section inventory all user-facing artifacts + from the implementation stories it references? Is the inventory + detailed enough for a documentation author to know what to cover + without reverse-engineering from code diffs? + +## Finding Format + +Each finding must include: + +- **File:** artifact file path (e.g., + `05-stories/epic-1/story-02-add-validation.md`) +- **Section:** section within the artifact (e.g., "Acceptance Criteria", + "Dependencies", "Testing Approach") +- **Severity:** CRITICAL | HIGH | MEDIUM | LOW +- **Category:** one of the evaluation criteria above +- **Issue:** what the problem is +- **Suggestion:** concrete, actionable fix + +Findings that cannot cite a specific file and section in the actual +artifacts must be discarded — they indicate hallucinated references. +Coverage matrix findings should cite `06-coverage.md` and the specific +row or gap. + +## Severity Definitions + +| Severity | Meaning | Action | +|----------|---------|--------| +| CRITICAL | Would cause incorrect implementation — missing requirements, contradictory acceptance criteria, broken dependency chain, scope reduction | Must fix before presenting | +| HIGH | Would cause rework — story too large to review in a single PR, missing testing commitment, integration gap between stories, `[QE]` story duplicating `[DEV]` test scope | Should fix before presenting | +| MEDIUM | Would reduce clarity — vague acceptance criteria, imprecise implementation guidance, sizing concerns, minor coverage matrix inconsistencies | Fix if it adds value | +| LOW | Structural improvements — naming consistency, ordering suggestions, documentation clarity | Fix only if clearly valuable | + +## Validation Rules + +After obtaining a review (whether from a subagent or inline), validate +every finding: + +1. **Verify file references.** Confirm the cited artifact file exists in + the decomposition output (under + `.artifacts/design/{issue-number}/`). 
+2. **Verify section references.** Confirm the cited section exists in + the artifact file (e.g., an "Acceptance Criteria" heading actually + appears in the story file). +3. **Discard hallucinated findings.** If a finding references an artifact + file or section that does not exist, discard it silently. Do not + present it. +4. **Cross-check against the PRD.** For findings that claim a + requirement is missing or inadequately covered, verify the claim + against the actual PRD. The PRD is the source of truth — if the + reviewer misidentified a requirement or the requirement does not + exist, discard the finding. diff --git a/design/skills/decompose.md b/design/skills/decompose.md index 0711125..40cfa80 100644 --- a/design/skills/decompose.md +++ b/design/skills/decompose.md @@ -305,29 +305,128 @@ Write `.artifacts/design/{issue-number}/06-coverage.md`: If none: "All stories trace to PRD requirements."} ``` -### Step 7: Self-Review - -Before presenting, verify: - -- [ ] Every epic is organized around user value, not technical layers -- [ ] Each epic is standalone — delivers complete functionality independently -- [ ] Every `[DEV]` story includes functionality AND testing (at minimum, comprehensive unit tests) -- [ ] `[QE]` stories define concrete e2e test scenarios — they are not duplicating unit/integration tests from `[DEV]` stories -- [ ] Each epic with e2e-testable surface includes at least one `[QE]` story -- [ ] Story acceptance criteria are behavioral outcomes, not implementation specifications — implementation details are in Implementation Guidance -- [ ] Epic T-shirt sizes are assigned and reasonable -- [ ] Story title prefixes are assigned -- [ ] Coverage matrix shows all PRD requirements mapped -- [ ] No coverage gaps (or gaps are flagged explicitly) -- [ ] Story ACs collectively cover the PRD's acceptance criteria (§5), not just the FR/NFR descriptions -- [ ] Implementing all stories in dependency order would produce a working feature end-to-end — no integration work falls between stories -- [ ] No story falls back to "manual validation" or "document procedure for QE" where an automated test could be written — if the barrier is infrastructure availability (a dependency), the story includes the automated test; the dependency is on where it runs, not whether it exists -- [ ] Epic dependency order is documented -- [ ] Story implementation order within each epic is documented -- [ ] `[DOCS]` stories include a Documentation Inputs section inventorying all user-facing artifacts from implementation stories -- [ ] No scope reduction — every PRD requirement is represented - -### Step 8: Present to User +### Step 7: Verify Artifact Structure + +Quick sanity check before invoking the decomposition review. Verify: + +1. `04-epics.md` exists and references all epic files +2. Each epic file referenced in `04-epics.md` exists (e.g., + `05-stories/epic-1-{slug}.md`) +3. Each epic has a corresponding story directory with story files +4. `06-coverage.md` exists and contains at least one mapping row + +If structural issues are found, fix them before proceeding. Do not +invoke a review on incomplete artifacts. + +### Step 8: Review Decomposition + +Review the decomposition for structural quality and requirement +coverage. This review operates independently from the design document — +the reviewer evaluates what the artifacts actually say, not what the +decomposer intended. + +Read `../decomposition-review.md` for evaluation criteria, finding +format, and severity definitions. 
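
> The protocol's validation rules are mechanical enough to sketch. Below is a hedged example of the file-reference check (rule 1), which Step 9 applies after the review; the finding shape follows the protocol's Finding Format, and the helper itself is hypothetical.

```python
from pathlib import Path

def keep_verifiable_findings(findings: list[dict], artifacts_root: Path) -> list[dict]:
    """Discard findings whose cited artifact file does not exist."""
    kept = []
    for finding in findings:
        # e.g. finding["file"] == "05-stories/epic-1/story-02-add-validation.md"
        cited = artifacts_root / finding["file"]
        if cited.is_file():
            kept.append(finding)
        # A missing file indicates a hallucinated reference: drop it
        # silently, per validation rule 3.
    return kept
```
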
+
+**If the AI runtime supports subagents**, spawn the review in a
+subagent for independence. Load it with:
+
+- The decomposition review protocol (`../decomposition-review.md`)
+- The PRD (`.artifacts/prd/{issue-number}/03-prd.md`)
+- All decomposition artifacts: `04-epics.md`, all
+  `05-stories/epic-{N}-{slug}.md` files, all
+  `05-stories/epic-{N}/story-{NN}-{slug}.md` files, `06-coverage.md`
+- NOT the design document (`03-design.md`) — the reviewer evaluates
+  the artifacts on their own merits
+
+Retain the subagent's ID for use in Step 10 — resuming the same
+reviewer gives it memory of its previous findings and concerns,
+producing more coherent follow-up reviews.
+
+**If subagents are not available**, perform the review inline. Read the
+decomposition review protocol and apply all evaluation criteria. Adopt
+the reviewer perspective: evaluate the artifacts as if you did not write
+them. Do not let your knowledge of the design document excuse issues
+that a reviewer seeing only the PRD and artifacts would flag.
+
+Both paths use the same evaluation criteria, finding format, and
+severity definitions from the protocol. The subagent path provides
+stronger independence; the inline path still catches issues by forcing
+a perspective shift.
+
+### Step 9: Validate and Assess Findings
+
+For each finding from Step 8:
+
+1. **Validate the reference.** Confirm the cited artifact file and
+   section exist. Discard any finding that references a file or section
+   that does not exist.
+
+2. **Cross-check PRD claims.** For findings that assert a missing
+   requirement or inadequate coverage, verify the claim against the
+   actual PRD. Discard findings based on misidentified requirements.
+
+3. **Assess on value.** Does this finding genuinely improve the
+   decomposition? Would the artifacts be meaningfully better (clearer
+   acceptance criteria, more complete coverage, better dependency
+   ordering, more appropriate sizing) if the change were made?
+
+   - If yes: fix the issue immediately. Re-read the affected artifact
+     file before modifying it. Verify the fix after applying it.
+   - If no: note why the finding was dismissed (structural preference,
+     no real improvement, would add churn without value).
+
+Only fix findings that add real value. Do not make changes for
+structural preferences not grounded in the evaluation criteria.
+
+### Step 10: Re-Review (if fixes were made)
+
+If Step 9 produced changes to the decomposition artifacts:
+
+1. Obtain a re-review of the updated artifacts:
+
+   **If a subagent was used in Step 8 and the runtime supports agent
+   resumption:** Resume the same reviewer agent. Send it the updated
+   artifacts and a summary of fixes applied. This gives the reviewer
+   memory of its original findings and lets it verify they were
+   addressed correctly.
+
+   **If resumption is not available:** Spawn a new subagent loaded with
+   the decomposition review protocol, the updated artifacts, the PRD
+   (but NOT the design document), and a summary of the previous round's
+   findings and fixes so it has full context.
+
+   **If subagents are not available:** Re-read all decomposition artifact
+   files before re-reviewing — do not rely on your cached understanding
+   from the previous round. Evaluate the full artifact set (not just the
+   delta from the last round).
+
+   The re-review should focus on:
+   - Whether fixes were applied correctly
+   - Whether fixes introduced new issues
+2. If new issues are found, fix them following the same
+   validate-and-assess procedure from Step 9
+3.
Cap at 2 review-fix rounds total. Decomposition fixes are structural + and less likely than code fixes to need multiple iterations. + +If no fixes were needed in Step 9, the review passes immediately. + +### Step 11: Report Review Summary + +```markdown +## Decomposition Review Summary + +- **Findings:** {total identified} ({N} after validation) +- **Fixed:** {N} findings addressed +- **Dismissed:** {N} findings ({brief rationale for each}) +- **Rounds:** {N} review-fix iterations +- **Gate:** PASS | FLAG + +{If any CRITICAL or HIGH findings remain unfixed, set Gate to FLAG and +list them with their file, section, and issue description.} +``` + +### Step 12: Present to User Present the decomposition and highlight: - Number of epics and stories @@ -335,6 +434,12 @@ Present the decomposition and highlight: - Any coverage gaps - Stories that might need size adjustment (too large or too small) - Any assumptions or judgment calls in the decomposition +- Decomposition review summary (from Step 11) + +If the decomposition review gate reported FLAG, present the unfixed +CRITICAL/HIGH findings and ask the user to decide how to handle them. +These may indicate genuine gaps requiring user judgment — the decomposer +should not resolve them unilaterally. ## Output From ffe9dc1d5b988848ae8c4659f66b39f1492165cf Mon Sep 17 00:00:00 2001 From: Andy Dalton Date: Tue, 12 May 2026 16:13:27 -0400 Subject: [PATCH 06/11] Give [DOCS] stories a dedicated template in design /decompose and /sync MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [DOCS] stories previously shared the same template as [DEV]/[QE]/etc stories, inheriting code-oriented sections (Implementation Guidance, Testing Approach) that are inappropriate for documentation work. This replaces those sections with Documentation Scope (what the reader needs to understand) and Documentation Inputs (inventory of user-facing artifacts to document), keeping [DOCS] stories focused on *what* needs documenting rather than *how* to build it. Documentation Inputs are explicitly framed as guidance, not authority — implementation may diverge from design, so documentation authors should verify against shipped code. Updates the decomposition review protocol and sync phase to recognize and evaluate the [DOCS]-specific template structure. Co-Authored-By: Claude Opus 4.6 (1M context) --- design/decomposition-review.md | 26 +++++---- design/skills/decompose.md | 99 +++++++++++++++++++++++++++------- design/skills/sync.md | 46 +++++++++++++--- 3 files changed, 136 insertions(+), 35 deletions(-) diff --git a/design/decomposition-review.md b/design/decomposition-review.md index a756bd9..168e272 100644 --- a/design/decomposition-review.md +++ b/design/decomposition-review.md @@ -51,7 +51,10 @@ expected and should not be flagged as a clarity problem. design document" are intentional and expected. Flag instead: acceptance criteria that require the design document to understand, or Implementation Guidance that lacks enough direction for an implementor - to start work. + to start work. Note: `[DOCS]` stories use `Documentation Scope` + instead of `Implementation Guidance` — evaluate those sections on + whether they describe what the reader needs to understand, not on + implementation direction. ## Evaluation Criteria @@ -79,7 +82,8 @@ impact: `[QE]`, `[UI]`, `[CI]`, `[DOCS]`)? Is the story implementation order within each epic documented? 
Are stories right-sized — not so small they add process overhead, not so large they resist meaningful review? - Does the Implementation Guidance give enough direction to start work? + For non-`[DOCS]` stories, does the Implementation Guidance give + enough direction to start work? If a story has more than approximately 8 acceptance criteria, should it be split? @@ -98,11 +102,15 @@ impact: complete? Does each story leave the system in a stable state (all tests passing after merge)? -6. **Documentation** — Do `[DOCS]` stories include a Documentation - Inputs section? Does that section inventory all user-facing artifacts - from the implementation stories it references? Is the inventory - detailed enough for a documentation author to know what to cover - without reverse-engineering from code diffs? +6. **Documentation** — Do `[DOCS]` stories use the `[DOCS]` template + (Documentation Scope and Documentation Inputs instead of + Implementation Guidance and Testing Approach)? Does the Documentation + Scope describe what the reader needs to understand without prescribing + how to write or organize the documentation? Does the Documentation + Inputs section inventory all user-facing artifacts from the + implementation stories it references? Is the inventory detailed enough + for a documentation author to know what to look for in the + implementation? ## Finding Format @@ -111,7 +119,7 @@ Each finding must include: - **File:** artifact file path (e.g., `05-stories/epic-1/story-02-add-validation.md`) - **Section:** section within the artifact (e.g., "Acceptance Criteria", - "Dependencies", "Testing Approach") + "Dependencies", "Testing Approach", "Documentation Scope") - **Severity:** CRITICAL | HIGH | MEDIUM | LOW - **Category:** one of the evaluation criteria above - **Issue:** what the problem is @@ -128,7 +136,7 @@ row or gap. 
|----------|---------|--------| | CRITICAL | Would cause incorrect implementation — missing requirements, contradictory acceptance criteria, broken dependency chain, scope reduction | Must fix before presenting | | HIGH | Would cause rework — story too large to review in a single PR, missing testing commitment, integration gap between stories, `[QE]` story duplicating `[DEV]` test scope | Should fix before presenting | -| MEDIUM | Would reduce clarity — vague acceptance criteria, imprecise implementation guidance, sizing concerns, minor coverage matrix inconsistencies | Fix if it adds value | +| MEDIUM | Would reduce clarity — vague acceptance criteria, imprecise implementation guidance or documentation scope, sizing concerns, minor coverage matrix inconsistencies | Fix if it adds value | | LOW | Structural improvements — naming consistency, ordering suggestions, documentation clarity | Fix only if clearly valuable | ## Validation Rules diff --git a/design/skills/decompose.md b/design/skills/decompose.md index 40cfa80..b54d1f9 100644 --- a/design/skills/decompose.md +++ b/design/skills/decompose.md @@ -165,6 +165,8 @@ mkdir -p .artifacts/design/{issue-number}/05-stories/epic-{N} Write each story to `.artifacts/design/{issue-number}/05-stories/epic-{N}/story-{NN}-{slug}.md`: +**For `[DEV]`, `[UI]`, `[UX]`, `[QE]`, and `[CI]` stories:** + ```markdown # Story {N}.{NN}: [{prefix}] {title} @@ -207,6 +209,51 @@ PRD Requirements: {FR-1, NFR-1} Design section: {§4.3 API Changes, or specific subsection} ``` +**For `[DOCS]` stories** (see `[DOCS]` story requirements below for the +Documentation Inputs section): + +```markdown +# Story {N}.{NN}: [DOCS] {title} + +**As a** {role}, +**I want to** {capability}, +**So that** {benefit}. + +## Acceptance Criteria + +- [ ] {documentation outcome — what the reader can learn or do, verifiable by reading the documentation} +- [ ] {documentation outcome} + +## Documentation Scope + +{Which documentation surfaces this story affects and why. Focus on what + the reader needs to understand — not how to write it or where to put it. + Those decisions belong to the documentation workflow or author who picks + up the ticket. + + Example: "This feature introduces a new CLI command for managing device + rollbacks. Users need to understand when to use rollback vs. revert, + the required preconditions, and how to verify success."} + +## Documentation Inputs + +{Inventories every user-facing artifact introduced or changed by the + feature, grouped by the implementation story that delivers it. + See the Documentation Inputs requirements below for format details.} + +## Dependencies + +{Comma-separated list using the canonical format: "Story 1.01, Story 1.02" + — always "Story" + space + epic.story number. Or "None" if no dependencies. 
+ This format is required for reliable reference resolution during /sync.} + +## Design Reference + +Epic: Epic {N} — {epic title} +PRD Requirements: {FR-1, NFR-1} +Design section: {§4.3 API Changes, or specific subsection} +``` + **Naming conventions:** - Story number is zero-padded: `story-01`, `story-02` - Slug is kebab-case derived from the story title: @@ -220,20 +267,29 @@ the work: - `[UX]` — user experience design work - `[QE]` — standalone e2e test automation, manual test execution, or negative test coverage owned by the QE team - `[CI]` — CI/CD infrastructure and pipeline work -- `[DOCS]` — documentation; requires a **Documentation Inputs** section (see below) +- `[DOCS]` — documentation; uses a dedicated template (see above) with **Documentation Scope** and **Documentation Inputs** sections If unsure of the correct prefix, use `[DEV]` as the default and note it for user review. -**`[DOCS]` story requirements:** `[DOCS]` stories must include a -**Documentation Inputs** section (added between Testing Approach and -Dependencies) that inventories every user-facing artifact introduced or -changed by the feature, grouped by the implementation story that delivers it. - -At design time you have the complete picture of what's new and what changed. -Capture it so the documentation author (human or AI) gets a concrete -checklist of what to cover — rather than reverse-engineering the user-visible -surface from code diffs after implementation. +**`[DOCS]` story requirements:** `[DOCS]` stories use a different template +than other story types (see above). They replace `Implementation Guidance` +and `Testing Approach` with `Documentation Scope` and `Documentation Inputs`. + +The **Documentation Scope** section describes what the reader needs to +understand about the feature — not how to write the docs or where to put +them. Those decisions belong to the documentation author or workflow that +picks up the ticket. See the template example above for illustration. + +The **Documentation Inputs** section inventories every user-facing artifact +introduced or changed by the feature, grouped by the implementation story +that delivers it. At design time you have the complete picture of what's +new and what changed. Capture it so the documentation author (human or AI) +gets a concrete starting point — rather than reverse-engineering the +user-visible surface from code diffs after implementation. Note that +implementation may diverge from the design, so these inputs are guidance, +not authority. The documentation author should verify against the actual +implementation. Group items under bold story headers (e.g., **Story 1.01 — {title}:**) followed by bullet points listing each user-facing artifact: new or changed @@ -256,19 +312,24 @@ the last epic. combining it with an adjacent story. - Are the acceptance criteria behavioral outcomes (what the system does), not implementation specifications (method names, internal structure)? - Implementation details belong in Implementation Guidance. -- Does the Implementation Guidance give enough direction for an AI agent - or developer to build the right thing without cross-referencing the - design document for every decision? -- Does the Implementation Guidance reference code by identifier (function - names, type names, struct names), not by line number? Line numbers - shift as the codebase evolves between design time and implementation - time. Identifiers are robust. + For non-`[DOCS]` stories, implementation details belong in + Implementation Guidance. 
For `[DOCS]` stories, implementation details + belong in Documentation Scope. +- For non-`[DOCS]` stories: does the Implementation Guidance give enough + direction for an AI agent or developer to build the right thing without + cross-referencing the design document for every decision? +- For non-`[DOCS]` stories: does the Implementation Guidance reference + code by identifier (function names, type names, struct names), not by + line number? Line numbers shift as the codebase evolves between design + time and implementation time. Identifiers are robust. - If a story has more than ~8 acceptance criteria, examine whether it can be split along a natural boundary. +- For `[DOCS]` stories: does the Documentation Scope describe what the + reader needs to understand without prescribing how to write or organize + the documentation? - For `[DOCS]` stories: does the Documentation Inputs section list every user-facing artifact from the implementation stories, with enough detail - for a documentation author to know what to cover? + for a documentation author to know what to look for in the implementation? **Filename stability:** Story filenames (e.g., `story-01-scaffold-build-pipeline.md`) serve as idempotency keys in the sync manifest. Once stories have been synced diff --git a/design/skills/sync.md b/design/skills/sync.md index e4d0047..2417734 100644 --- a/design/skills/sync.md +++ b/design/skills/sync.md @@ -22,7 +22,7 @@ creating, and always get explicit user approval. - **Idempotent.** Track what was created in a manifest. If re-run, only create new items. - **Create only — never update or delete.** Once Jira issues are created, they evolve independently — developers add implementation notes, QA adds test details, PMs adjust criteria. Pushing file content back to Jira would clobber those additions. If the decomposition is revised after sync, `/revise` will tell the user exactly which Jira issues need manual updates. - **Link to source.** Every Jira issue description references the design document. -- **Jira-native references.** Local identifiers (`Story 1.01`, `Epic 1`) have meaning only within the `.artifacts/` directory. When constructing Jira issue descriptions, resolve all local references (in Dependencies, Design Reference, and any other cross-references) to Jira issue keys using the sync manifest. Jira is the source of truth — readers of a Jira issue should never need to decode a local artifact numbering scheme. +- **Jira-native references.** Local identifiers (`Story 1.01`, `Epic 1`) have meaning only within the `.artifacts/` directory. When constructing Jira issue descriptions, resolve all local references (in Dependencies, Documentation Inputs, Design Reference, and any other cross-references) to Jira issue keys using the sync manifest. Jira is the source of truth — readers of a Jira issue should never need to decode a local artifact numbering scheme. ## Reference Resolution @@ -210,6 +210,8 @@ For each story under each epic, create a Jira issue: - **Summary:** `[{prefix}] {story title}` - **Description:** +**For `[DEV]`, `[UI]`, `[UX]`, `[QE]`, and `[CI]` stories:** + ```markdown ## User Story @@ -242,15 +244,45 @@ PRD Requirements: {requirement IDs} Design section: {§reference} ``` -**`[DOCS]` stories:** For stories with a `[DOCS]` prefix, include a -`## Documentation Inputs` section (between Testing Approach and -Dependencies) containing the documentation inputs from the story file. 
-Resolve story references in the documentation inputs to Jira keys -(`**Story 1.01 — {title}:**` → `**EDM-XXXX — {title}:**`). +**For `[DOCS]` stories:** + +```markdown +## User Story + +**As a** {role}, +**I want to** {capability}, +**So that** {benefit}. + +## Acceptance Criteria + +{acceptance criteria from story file} + +## Documentation Scope + +{documentation scope from story file} + +## Documentation Inputs + +{documentation inputs from story file, with story references resolved to + Jira keys: "**Story 1.01 — {title}:**" → "**EDM-XXXX — {title}:**"} + +## Dependencies + +{dependencies from story file, with local references resolved to Jira keys + per the Reference Resolution section} + +## Design Reference + +Design document: {link to design doc PR or file} +Epic: {epic jira key} +PRD Requirements: {requirement IDs} +Design section: {§reference} +``` **Reference resolution:** Before submitting the description, resolve all local identifiers to Jira keys (see Reference Resolution). This applies to -Dependencies (`Story 1.01` → `EDM-XXXX`) and Design Reference +Dependencies (`Story 1.01` → `EDM-XXXX`), Documentation Inputs +(`Story 1.01` → `EDM-XXXX`), and Design Reference (`Epic 1` → `EDM-YYYY`). After creating each story, verify the created issue has `parent.key` From b08b6378249a079d7d88812ab3c1560a98955ad8 Mon Sep 17 00:00:00 2001 From: Andy Dalton Date: Tue, 12 May 2026 16:22:49 -0400 Subject: [PATCH 07/11] Scope Implementation Guidance reference in guidelines.md to non-DOCS stories MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Same consistency fix applied to decompose.md quality checks and decomposition-review.md — guidelines.md was missed. Co-Authored-By: Claude Opus 4.6 (1M context) --- design/guidelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/guidelines.md b/design/guidelines.md index c25bdf4..79ddce7 100644 --- a/design/guidelines.md +++ b/design/guidelines.md @@ -39,7 +39,7 @@ - Follow the design document template (`templates/design.md`). Do not invent new sections or omit existing ones without user approval. Sections with no impact should say so explicitly (e.g., "No schema changes required") rather than being omitted. - Follow the section guidance (`templates/section-guidance.md`) for content standards. - Design-scoped goals are **implementation constraints**, not product outcomes. They complement — not duplicate — the PRD's goals. -- Acceptance criteria must be **behavioral outcomes** (what the system does, testable from outside), not activities or implementation specifications. Implementation details belong in Implementation Guidance. +- Acceptance criteria must be **behavioral outcomes** (what the system does, testable from outside), not activities or implementation specifications. For non-`[DOCS]` stories, implementation details belong in Implementation Guidance. For `[DOCS]` stories, implementation details belong in Documentation Scope. - Open questions must have owners and impact statements. ## Escalation From 2ec15cbb70c1ef2e20e4f6d6984dedf36dfa84eb Mon Sep 17 00:00:00 2001 From: Andy Dalton Date: Tue, 12 May 2026 16:25:37 -0400 Subject: [PATCH 08/11] Fix README decomposition constraints to reflect DOCS story template "Every story includes functionality and testing" was only true for [DEV] stories. Scoped it to [DEV] and added a line noting [DOCS] stories use their own template. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- design/README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/design/README.md b/design/README.md index 7c1e79f..df81ef3 100644 --- a/design/README.md +++ b/design/README.md @@ -127,12 +127,13 @@ The decomposition follows the team's Jira hierarchy: - **Feature** (exists in Jira — input to this workflow) - **Epic** — user-value oriented, standalone, T-shirt sized - - **Story** — right-sized, includes functionality + testing, prefixed with `[DEV]`/`[UI]`/`[UX]`/`[QE]`/`[DOCS]`/`[CI]` + - **Story** — right-sized, prefixed with `[DEV]`/`[UI]`/`[UX]`/`[QE]`/`[DOCS]`/`[CI]` Key constraints: - Each epic delivers complete functionality independently - Each story leaves the system in a stable state (CI/CD to main) -- Every story includes both functionality and testing (no deferred test stories) +- Every `[DEV]` story includes both functionality and testing (no deferred test stories) +- `[DOCS]` stories use a dedicated template with Documentation Scope and Documentation Inputs instead of Implementation Guidance and Testing Approach - Tests validate the software's contract, not its implementation — use test types appropriate to the change (unit, integration, e2e) - A coverage matrix ensures all PRD requirements are addressed From fe7efce74ef9c088f931d08f4c2fbebe7198245c Mon Sep 17 00:00:00 2001 From: Andy Dalton Date: Tue, 12 May 2026 16:33:46 -0400 Subject: [PATCH 09/11] Re-run checks after self-review gate fixes before committing The self-review gate can modify code, but all three workflows (e2e /code, implement /code, implement /validate) committed the fixes without re-running tests and lint. Now each re-runs the task-scoped checks to verify the post-fix state before committing. Co-Authored-By: Claude Opus 4.6 (1M context) --- e2e/skills/code.md | 7 ++++--- implement/skills/code.md | 8 +++++--- implement/skills/validate.md | 4 +++- 3 files changed, 12 insertions(+), 7 deletions(-) diff --git a/e2e/skills/code.md b/e2e/skills/code.md index 30900c2..6eb35bc 100644 --- a/e2e/skills/code.md +++ b/e2e/skills/code.md @@ -251,9 +251,10 @@ parameters: If the gate reports FLAG (unfixed CRITICAL or HIGH findings), stop and present the findings to the user before committing. -If the gate made code fixes, re-stage the affected files before -proceeding to commit. Note any dismissed findings in the implementation -report (Discoveries section). +If the gate made code fixes, re-stage the affected files, then re-run +the task-scoped tests (Step 3c) and fast quality checks (Step 3d) to +verify the fixes. Only proceed to commit once checks pass. Note any +dismissed findings in the implementation report (Discoveries section). #### 3f: Commit diff --git a/implement/skills/code.md b/implement/skills/code.md index d5e3529..18fcf55 100644 --- a/implement/skills/code.md +++ b/implement/skills/code.md @@ -251,9 +251,11 @@ parameters: If the gate reports FLAG (unfixed CRITICAL or HIGH findings), stop and present the findings to the user before committing. -If the gate made code fixes, re-stage the affected files before -proceeding to commit. Note any dismissed findings in the implementation -report (Discoveries section) so there is a paper trail. +If the gate made code fixes, re-stage the affected files, then re-run +the task-scoped tests (Step 3c) and fast quality checks (Step 3d) to +verify the fixes. Only proceed to commit once checks pass. Note any +dismissed findings in the implementation report (Discoveries section) +so there is a paper trail. 
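+
+A minimal sketch of the sequence — the actual test and lint commands
+are project-specific, and the `make` targets and `{task packages}`
+placeholder here are assumptions for illustration only:
+
+```bash
+# Re-stage the files the gate fixed
+git add {fixed files}
+
+# Step 3c: re-run the task-scoped tests (placeholder target and scope)
+make test-unit PKGS={task packages}
+
+# Step 3d: re-run the fast quality checks (placeholder target)
+make lint
+
+# Proceed to the commit step only after both pass
+```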
#### 3g: Commit diff --git a/implement/skills/validate.md b/implement/skills/validate.md index 1df607a..6f2532c 100644 --- a/implement/skills/validate.md +++ b/implement/skills/validate.md @@ -201,7 +201,9 @@ parameters: If the gate reports FLAG (unfixed CRITICAL or HIGH findings), stop and present the findings to the user before proceeding. -If the gate made code fixes, commit them before continuing: +If the gate made code fixes, re-run the affected pre-PR checks from +Step 3 (and Step 4 coverage analysis if the fixes touch covered +packages) to verify the post-fix state. Once checks pass, commit: ```bash git add {fixed files} From 1a7ebc563ddf902a96c6032e6f14095685201fd9 Mon Sep 17 00:00:00 2001 From: Andy Dalton Date: Wed, 13 May 2026 09:58:31 -0400 Subject: [PATCH 10/11] Raise validation cap to 15 and add minimize-actions principle to e2e MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The cap of 8 validations per consolidated scenario was too tight — 52% of existing flightctl e2e tests exceed it (median is 9). Raised to 15, which covers 75% of existing tests. Added a minimize-actions-per-scenario principle to e2e guidelines: prefer the fewest state-changing actions needed to reach the condition under test, so failures are diagnosable. Updated the plan template to express single-action preference instead of normalizing multi-step sequences. Co-Authored-By: Claude Opus 4.6 (1M context) --- e2e/guidelines.md | 3 ++- e2e/skills/plan.md | 7 ++++--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/e2e/guidelines.md b/e2e/guidelines.md index c6bc8d0..8cbdfce 100644 --- a/e2e/guidelines.md +++ b/e2e/guidelines.md @@ -4,7 +4,8 @@ - The e2e tests must validate the **user-facing behaviors** described in the story's acceptance criteria. Each AC maps to one or more concrete test scenarios. - **E2e tests exercise the system from the outside.** They validate observable outcomes through the project's test infrastructure, not internal component contracts. Write tests that a QE engineer would write — scenario-driven, using the project's actual tools and infrastructure. -- **Consolidate scenarios that share expensive setup and actions.** E2e setup (provisioning, navigation, waiting for state) and actions (API calls, UI interactions) dominate test execution time. When multiple scenarios share the same given+when, merge their validations into a single test. Repeating expensive setup+action just to check one more assertion is the single largest contributor to avoidable e2e suite bloat. **Cap consolidated scenarios at 8 validations** — beyond this, failure output becomes difficult to triage. If merging would exceed the cap, split by validation category into separate tests. +- **Consolidate scenarios that share expensive setup and actions.** E2e setup (provisioning, navigation, waiting for state) and actions (API calls, UI interactions) dominate test execution time. When multiple scenarios share the same given+when, merge their validations into a single test. Repeating expensive setup+action just to check one more assertion is the single largest contributor to avoidable e2e suite bloat. **Cap consolidated scenarios at 15 validations** — beyond this, failure output becomes difficult to triage. If merging would exceed the cap, split by validation category into separate tests. +- **Minimize actions per scenario.** Each scenario should use the fewest state-changing actions needed to reach the condition under test — ideally one. 
When a test fails after five actions, it's unclear which action caused the failure. Multi-action sequences are acceptable only when the desired state cannot be reached in fewer steps (e.g., create-then-update workflows). If a scenario requires many actions, consider whether it's testing a workflow or accidentally bundling independent behaviors. - **Scope is e2e only.** Do not consider, plan, or write unit or integration tests. Every test this workflow produces must exercise the system end-to-end. - **Follow the project's existing e2e test patterns.** Read the most similar existing test suite before writing new tests. Match the test infrastructure usage, lifecycle hooks, naming conventions, labels, and assertion style. - Follow the **project's commit format** as discovered during `/ingest` and recorded in the validation profile. Commit one logical unit of work per commit — typically one commit per plan task. Don't batch everything into a single commit, but don't create a commit per file either. diff --git a/e2e/skills/plan.md b/e2e/skills/plan.md index 66b7b1b..07fe37f 100644 --- a/e2e/skills/plan.md +++ b/e2e/skills/plan.md @@ -61,7 +61,7 @@ plan, consolidate: 2. **Merge validations.** For each group, combine the validation blocks into a single consolidated scenario. Tag each validation with its source AC (e.g., `[AC-1]`, `[AC-3]`) to preserve traceability. -3. **Apply the cap.** A consolidated scenario must not exceed **8 +3. **Apply the cap.** A consolidated scenario must not exceed **15 validations** — beyond this, failure output becomes difficult to triage. If merging would exceed this, split by validation category (e.g., state correctness vs. side effects vs. error conditions). @@ -136,8 +136,9 @@ Write `.artifacts/e2e/{jira-key}/02-plan.md` with this structure: - **Labels/tags:** {using the project's label convention} - **Setup (given):** {shared setup — what the test needs beyond suite-level per-test setup} - **Action (when):** - 1. {first action using test infrastructure methods} - 2. {subsequent action — multi-step sequences are common in e2e} + 1. {action using test infrastructure methods — prefer a single + state-changing action; use multiple only when the desired state + cannot be reached in fewer steps} - **Validations (then):** 1. [AC-1] {assertion — use the project's assertion/polling style} 2. [AC-1] {assertion} From b6938793b6e9cecdc7e45b6f60f0d1e1c6b3fcf5 Mon Sep 17 00:00:00 2001 From: Andy Dalton Date: Wed, 13 May 2026 10:53:51 -0400 Subject: [PATCH 11/11] Fix stale 8-validation cap refs, add re-run checks to e2e publish MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Updated remaining references to the old 8-validation cap in e2e/skills/plan.md (two places) and e2e/skills/revise.md to 15. Added re-run of validation checks after gate fixes in e2e/skills/publish.md — same pattern fixed in code.md and validate.md earlier. Added scenario identifier uniqueness checks to plan.md self-review checklist and revise.md consistency review. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- e2e/skills/plan.md | 5 +++-- e2e/skills/publish.md | 3 ++- e2e/skills/revise.md | 3 ++- 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/e2e/skills/plan.md b/e2e/skills/plan.md index 07fe37f..e02c13e 100644 --- a/e2e/skills/plan.md +++ b/e2e/skills/plan.md @@ -164,7 +164,7 @@ Write `.artifacts/e2e/{jira-key}/02-plan.md` with this structure: For each consolidated scenario, note which raw scenarios were merged, why (shared setup+action), and the validation count. Also note any scenarios intentionally kept separate despite similar setup (e.g., - merging would exceed the 8-validation cap).} + merging would exceed the 15-validation cap).} | Consolidated Scenario | Merged From | Validation Count | Rationale | |-----------------------|-------------|------------------|-----------| @@ -269,8 +269,9 @@ Before presenting the plan, verify: - [ ] No scenarios require environment capabilities not present in the project - [ ] Task count is reasonable — if you have more than 8 tasks, consider whether the story needs re-scoping - [ ] Scenarios sharing setup+action are consolidated, not duplicated as separate tests -- [ ] No consolidated scenario exceeds 8 validations +- [ ] No consolidated scenario exceeds 15 validations - [ ] Each validation in a consolidated scenario is tagged with its source AC +- [ ] Scenario identifiers and titles are unique across the plan (no duplicate C#/S# or repeated names) - [ ] The plan is achievable — no scenarios depend on unmerged features or unavailable test infrastructure methods ### Step 6: Present to User diff --git a/e2e/skills/publish.md b/e2e/skills/publish.md index da2f582..127ab01 100644 --- a/e2e/skills/publish.md +++ b/e2e/skills/publish.md @@ -83,7 +83,8 @@ If the gate reports FLAG (unfixed CRITICAL or HIGH findings), stop and present the findings to the user. Do not proceed until the user decides how to handle them. -If the gate made code fixes, commit them before proceeding: +If the gate made code fixes, re-run the validation profile's required +checks to verify the post-fix state. Once checks pass, commit: ```bash git add {fixed files} diff --git a/e2e/skills/revise.md b/e2e/skills/revise.md index 47a3c7f..7cd3457 100644 --- a/e2e/skills/revise.md +++ b/e2e/skills/revise.md @@ -92,7 +92,8 @@ After applying changes, verify: - Are labels consistent with the project's conventions? - Does the AC coverage matrix reflect the current scenario mapping? - Are there new consolidation opportunities (added scenarios sharing setup+action with existing ones)? -- Do consolidated scenarios still respect the 8-validation cap after changes? +- Do consolidated scenarios still respect the 15-validation cap after changes? +- Are scenario identifiers and titles unique across the plan (no duplicate C#/S# or repeated names)? - Does the Scenario Consolidation table still match the current scenario list? - Are commit messages still properly formatted?