-
-
Notifications
You must be signed in to change notification settings - Fork 50
feat(review): Cursor review backend (cursor-agent CLI) — fn-74 #184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
gmickel
wants to merge
36
commits into
main
Choose a base branch
from
fn-74-cursor-review-backend-cursor-agent-cli
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+6,107
−1,793
Open
Changes from 28 commits
Commits
Show all changes
36 commits
Select commit
Hold shift + click to select a range
1100568
plan(fn-74): decompose Cursor review backend into 4 tasks — codex pla…
gmickel 0d3aba1
feat(review): cursor backend foundation — registry + run_cursor_exec …
gmickel af7008a
chore(flow): fn-74.1 done-summary + evidence (codex impl-review SHIP)
gmickel 6818a07
feat(review): cursor review commands — impl/plan/completion/validate/…
gmickel 0dd4205
chore(flow): fn-74.2 done-summary + evidence (codex impl-review SHIP)…
gmickel f28bd0b
feat(review): cursor backend skill+setup wiring + codex mirror — work…
gmickel 3d58677
chore(flow): fn-74.3 done-summary + evidence (codex impl-review SHIP)
gmickel 5ac94f6
docs(review): cursor review backend — flowctl.md/README/GLOSSARY/CHAN…
gmickel 1b8a601
docs(review): note downstream doc-chain coverage in CHANGELOG cursor …
gmickel e38a250
docs(review): fix stale review.backend config enum + setup usage temp…
gmickel a468214
docs(review): teams.md backend enumeration adds cursor — fn-74.4
gmickel 5d14bae
chore(review): codex-mirror regen + dogfood usage.md parity for curso…
gmickel ca3511d
chore(flow): fn-74.4 done-summary + evidence + review-backend sweep m…
gmickel 8b7d829
fix(review): cursor completion-review fixes (is_error + --spec guard …
gmickel b7b9928
fix(review): impl-review/plan-review command hints → rp|codex|copilot…
gmickel 5dbe249
chore(flow): fn-74 completion-review SHIP (codex)
gmickel 6e502a9
chore(flow): drop incidental .flow/config.json drift — local review.b…
gmickel 6ae2f7e
fix(review): address PR #184 codex review (2× P2) — fn-74
gmickel 46864c8
perf(review): cursor reviews read files from disk, never embed conten…
gmickel 136b1e9
perf(review): codex + copilot also read files from disk, never embed …
gmickel 520edca
chore(review): remove dead embed helper + modernize copilot for CLI 1…
gmickel be51d52
chore(review): post-review cleanup — drop dead embed flag/vars, fix c…
gmickel 472bfcf
fix(review): codex/copilot coerce config-default to command backend —…
gmickel 7f475d8
fix(review): address codex-bot PR #184 findings — cursor/codex/copilo…
gmickel a8b6847
fix(review): reserve cursor re-review preamble in argv budget + omit …
gmickel 3a8d809
fix(work): propagate the resolved review backend to the worker's impl…
gmickel 513caf1
fix(review): wire cursor into Ralph init + fix copilot auth-check pro…
gmickel 40bef68
fix(ralph): enforce review-receipt gate for cursor backend — fn-74
gmickel b48a4dc
fix(review): general argv-budget backstop for cursor prompts (spec/ta…
gmickel 5511faf
fix(review): cursor prompt-budget off-by-one at exactly the argv cap …
gmickel 7556a71
fix(review): backstop cursor validator + deep-pass prompts under the …
gmickel 445247e
chore(flow): mark fn-74 task definitions done — fn-74
gmickel 3e25606
fix(review): per-spec default_review applies to spec-scoped plan/comp…
gmickel 5b7efed
fix(review): cursor coerces ANY non-cursor resolved spec, not just en…
gmickel c38f1dc
fix(review): cursor never hands the reviewer an empty diff without a …
gmickel 47068f9
feat(review): always-on code-smell baseline + tightened rubric — shar…
gmickel File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
32 changes: 32 additions & 0 deletions
32
.flow/memory/bug/integration/adding-a-review-backend-sweep-all-2026-06-29.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| --- | ||
| title: "Adding a review backend: sweep ALL enumeration sites (config table, stage list, " | ||
| date: "2026-06-29" | ||
| track: bug | ||
| category: integration | ||
| module: "plugins/flow-next/docs, plugins/flow-next/scripts/flowctl.py" | ||
| tags: [review-backend, enumeration-drift, docs-sweep, cursor, fn-74] | ||
| problem_type: integration | ||
| symptoms: "codex impl-review NEEDS_WORK x3: each round found another stale rp/codex/copilot enum missing the new backend" | ||
| root_cause: "review-backend enumerations are scattered across many non-obvious sites (config tables, stage lists, setup templates, vault notes); several already omitted copilot, so a new backend exposes them as contradictions" | ||
| resolution_type: fix | ||
| --- | ||
|
|
||
| ## Problem | ||
| Adding a 4th cross-model review backend (`cursor`, fn-74) and doing the "docs sweep" task, codex impl-review went NEEDS_WORK three times — each round surfaced ANOTHER stale backend-enumeration site the obvious prose lists had missed. The enumerations live in many non-obvious places, and several already omitted even the *previous* backend (`copilot`), so they read as contradictions the moment you add the new one. | ||
|
|
||
| ## What Didn't Work | ||
| Updating only the visible "RepoPrompt / Codex / Copilot" prose lists (README adversarial-gates row, GLOSSARY cross-model-review line, the impl-review command row). That left contradictory enumerations elsewhere in the SAME files the reviewer flagged as introduced findings. | ||
|
|
||
| ## Solution | ||
| Sweep ALL of these enumeration sites when adding a review backend (the ones missed in fn-74, in flag order): | ||
| - `docs/flowctl.md`: the command list (~L14), the new `### <backend>` section (mirror copilot), the `review-backend` spec-grammar example (~L647), AND the **config-table `review.backend` row** (~L597) + the `config set` example comment (~L583) — these two were stale at `rp, codex, none` (omitted copilot too). | ||
| - `docs/teams.md`: BOTH the "RepoPrompt / Codex / Copilot" prose (×2) AND the **stage-[6] `Backends: rp, codex, copilot, none` exhaustive list** (~L171). | ||
| - `docs/skills.md`: the plan-review row's `(rp/codex/copilot)`. | ||
| - `skills/flow-next-setup/templates/usage.md`: the `review.backend # rp|codex|copilot|none` comment (~L165). | ||
| - Vault (`~/Documents/GordonsVault/.../flow-next - *.md`): Vocabulary backends line, Skills Catalog plan-review row, Lifecycle handover-#5 line, Architecture cmd list, **Release Timeline** (watch for a concurrent release-doc agent leaving a DUPLICATE row — dedupe). | ||
| - Downstream repos: flow-next.dev (`review/workflow` table + `--review` examples + spec-form note, `review/receipts` mode field, `releases/changelog`), AI×SDLC (`guides/flow-next.md` backend list + `code-review-tools-changelog.md`), GF (`spec/05-cross-model-review.md` + re-render `dist/*.html` + the bundled `code-factory-onboarding.html`). | ||
|
|
||
| NOTE: codex impl-review READS the vault file via its absolute path (flagged the duplicate/stale Release Timeline row) — downstream repo files in OTHER git repos are not in the diff, but vault notes referenced by absolute path are visible to it. | ||
|
|
||
| ## Prevention | ||
| Before committing a review-backend docs task, run `grep -rniE "rp.{0,3}codex.{0,3}copilot|rp, codex|review.backend" docs/ skills/ README.md GLOSSARY.md | grep -vi <new-backend>` and confirm every hit is either a per-backend section header, a host-platform mention (Codex/Copilot/Droid as *drivers*), or a deliberately-scoped recommendation — never a stale exhaustive enumeration. Same shape as the tracker-adapter sweep (see related entry). |
16 changes: 8 additions & 8 deletions
16
.flow/specs/fn-74-cursor-review-backend-cursor-agent-cli.json
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
14 changes: 14 additions & 0 deletions
14
.flow/tasks/fn-74-cursor-review-backend-cursor-agent-cli.1.json
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| { | ||
| "assignee": null, | ||
| "claim_note": "", | ||
| "claimed_at": null, | ||
| "created_at": "2026-06-29T11:35:58.566755Z", | ||
| "depends_on": [], | ||
| "id": "fn-74-cursor-review-backend-cursor-agent-cli.1", | ||
| "priority": null, | ||
| "spec": "fn-74-cursor-review-backend-cursor-agent-cli", | ||
| "spec_path": ".flow/tasks/fn-74-cursor-review-backend-cursor-agent-cli.1.md", | ||
| "status": "todo", | ||
|
gmickel marked this conversation as resolved.
Outdated
|
||
| "title": "flowctl cursor backend foundation \u2014 registry + run_cursor_exec + check + parser tests", | ||
| "updated_at": "2026-06-29T11:44:49.065046Z" | ||
| } | ||
47 changes: 47 additions & 0 deletions
47
.flow/tasks/fn-74-cursor-review-backend-cursor-agent-cli.1.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| --- | ||
| satisfies: [R1, R2, R3, R4, R11] | ||
| --- | ||
|
|
||
| ## Description | ||
|
|
||
| Foundation of the `cursor` review backend in flowctl — the registry entry, the helper trio, the `cursor check` subcommand, and the parser/run-exec unit tests. **This is the early proof point:** it validates the `cursor-agent` contract (run_cursor_exec parses `.result`/`.session_id`/`.is_error`, read-only `--mode ask`, resume-only session) and confirms the existing `BackendSpec`/registry already accept the model-yes/effort-no shape with **zero parser changes** (verified during spec smoke-tests). | ||
|
|
||
| **Size:** M | ||
| **Files:** `plugins/flow-next/scripts/flowctl.py`, `plugins/flow-next/tests/test_cursor_run_exec.py` (new), `plugins/flow-next/tests/test_backend_spec.py` | ||
|
|
||
| ## Approach | ||
|
|
||
| - Add `"cursor"` to `BACKEND_REGISTRY` after the copilot entry — `models` set (`auto`, `gpt-5.5-high`, `gpt-5.4-high`, `gpt-5.3-codex(-high/-xhigh)`, `gpt-5.2`, `composer-2.5`, `claude-opus-4-8-thinking-high`, `claude-opus-4-7-thinking-high`), `efforts: None`, `default_model: "gpt-5.5-high"`. `VALID_BACKENDS` derives. | ||
| - Mirror `require_copilot` / `get_copilot_version` / `run_copilot_exec` → `require_cursor` / `get_cursor_version` / `run_cursor_exec`. Invocation: `cursor-agent -p --output-format json --trust --mode ask --model <m> [--resume <sid>]`, run with `cwd=repo_root`, `timeout=600`. `session_id` is an **optional input** (None ⇒ omit `--resume`, capture the returned id; non-None ⇒ `--resume <id>`). Parse `.result`/`.session_id`/`.is_error`; non-zero exit on `is_error`/timeout/CLI failure. | ||
| - **Prompt delivery is positional argv** (cursor-agent takes the prompt as a positional arg, NOT stdin). Up to a threshold, pass positionally. **Above the threshold, raise an explicit "prompt too large" error** — do NOT copy copilot's temp-file step (it just reads the file back into argv and bypasses no cap; cursor-agent stdin is unconfirmed). A stdin path is added only if cursor-agent confirms stdin input. | ||
| - **Do NOT copy `run_copilot_exec`'s `--effort`/`claude-`-drop logic** — cursor folds effort into the model name and takes no `--effort` flag. | ||
| - Add `cursor check [--skip-probe]` subparser + `cmd_cursor_check` returning `{available, version, authed}` (text + `--json`), schema-aligned to copilot's `check`. | ||
|
|
||
| ## Investigation targets | ||
|
|
||
| **Required:** | ||
| - `plugins/flow-next/scripts/flowctl.py:3416-3477` — `BACKEND_REGISTRY` + `VALID_BACKENDS` | ||
| - `plugins/flow-next/scripts/flowctl.py:3753`,`:3761`,`:3798` — `require_copilot` / `get_copilot_version` / `run_copilot_exec` (the template; note its argv-vs-temp + `--effort` logic is what we deliberately diverge from) | ||
| - `plugins/flow-next/scripts/flowctl.py:3480`,`:3617`,`:3658` — `BackendSpec` / `parse_backend_spec_lenient` / `resolve_review_spec` (already handle model-yes/effort-no — add tests, no edits) | ||
| - `plugins/flow-next/scripts/flowctl.py:18622`, `:25938-25948` — `cmd_copilot_check` + copilot `check` subparser | ||
| - `plugins/flow-next/tests/test_copilot_run_exec.py`, `plugins/flow-next/tests/test_backend_spec.py` — test templates | ||
|
|
||
| ## Key context | ||
|
|
||
| `run_cursor_exec` MUST set `cwd=repo_root` (cursor scopes to the workspace dir; a review from a subdir reads the wrong tree). `--trust` is mandatory headless or the CLI hangs on a trust prompt. (Both verified in spec smoke-tests.) | ||
|
|
||
| ## Acceptance | ||
|
|
||
| - [ ] `BACKEND_REGISTRY` has `cursor` (models set, `efforts: None`, `default_model: gpt-5.5-high`); `VALID_BACKENDS` includes it; `flowctl review-backend` reports `cursor` from `.flow/config.json` + `FLOW_REVIEW_BACKEND` (R1) | ||
| - [ ] `BackendSpec.parse("cursor")` / `parse("cursor:gpt-5.5-high")` succeed; `parse("cursor:gpt-5.5-high:high")` raises (effort rejected); `parse("cursor:bogus")` raises listing valid models; `.resolve()` fills `gpt-5.5-high` with effort `None` (R2) | ||
| - [ ] `run_cursor_exec` shells `cursor-agent -p --output-format json --trust --mode ask --model <m>` with `cwd=repo_root`, no `--effort`; test asserts the `--mode ask` (read-only) flag is present; first call omits `--resume` and returns the generated `session_id`; returns non-zero on `is_error`/600s timeout (R3) | ||
| - [ ] above the argv threshold `run_cursor_exec` raises an explicit "prompt too large" error (asserted by a test) — never a silent read-back-into-argv (R3) | ||
| - [ ] `flowctl cursor check [--skip-probe]` reports `{available, version, authed}` in text and `--json` (R4) | ||
| - [ ] `test_cursor_run_exec.py` (success / `is_error` / timeout / first-call-omits-resume / resume-passes-id / cwd=repo_root / mode-ask-flag / prompt-too-large) + `test_backend_spec.py` cursor cases pass; full Python suite green (R11) | ||
|
|
||
| ## Done summary | ||
| Added the `cursor` review backend foundation in flowctl: the BACKEND_REGISTRY entry (model-yes / effort-no shape, default gpt-5.5-high), the require_cursor / get_cursor_version / run_cursor_exec helper trio (positional-argv prompt, resume-only session, cwd=repo_root, --mode ask --trust, no --effort, explicit prompt-too-large raise, non-zero on is_error/timeout), the `cursor check [--skip-probe]` subcommand, and unit tests (test_cursor_run_exec.py + test_backend_spec.py cursor cases). Full Python suite green at 1271 tests. | ||
| ## Evidence | ||
| - Commits: dcbb1a7e5a6e39a021ee56dd81290b4101bf8559 | ||
| - Tests: python3 -m unittest discover -s plugins/flow-next/tests (1271 passed, skipped=2) | ||
| - PRs: |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.