Changes from 27 commits
36 commits
1100568
plan(fn-74): decompose Cursor review backend into 4 tasks — codex pla…
gmickel Jun 29, 2026
0d3aba1
feat(review): cursor backend foundation — registry + run_cursor_exec …
gmickel Jun 29, 2026
af7008a
chore(flow): fn-74.1 done-summary + evidence (codex impl-review SHIP)
gmickel Jun 29, 2026
6818a07
feat(review): cursor review commands — impl/plan/completion/validate/…
gmickel Jun 29, 2026
0dd4205
chore(flow): fn-74.2 done-summary + evidence (codex impl-review SHIP)…
gmickel Jun 29, 2026
f28bd0b
feat(review): cursor backend skill+setup wiring + codex mirror — work…
gmickel Jun 29, 2026
3d58677
chore(flow): fn-74.3 done-summary + evidence (codex impl-review SHIP)
gmickel Jun 29, 2026
5ac94f6
docs(review): cursor review backend — flowctl.md/README/GLOSSARY/CHAN…
gmickel Jun 29, 2026
1b8a601
docs(review): note downstream doc-chain coverage in CHANGELOG cursor …
gmickel Jun 29, 2026
e38a250
docs(review): fix stale review.backend config enum + setup usage temp…
gmickel Jun 29, 2026
a468214
docs(review): teams.md backend enumeration adds cursor — fn-74.4
gmickel Jun 29, 2026
5d14bae
chore(review): codex-mirror regen + dogfood usage.md parity for curso…
gmickel Jun 29, 2026
ca3511d
chore(flow): fn-74.4 done-summary + evidence + review-backend sweep m…
gmickel Jun 29, 2026
8b7d829
fix(review): cursor completion-review fixes (is_error + --spec guard …
gmickel Jun 29, 2026
b7b9928
fix(review): impl-review/plan-review command hints → rp|codex|copilot…
gmickel Jun 29, 2026
5dbe249
chore(flow): fn-74 completion-review SHIP (codex)
gmickel Jun 29, 2026
6e502a9
chore(flow): drop incidental .flow/config.json drift — local review.b…
gmickel Jun 30, 2026
6ae2f7e
fix(review): address PR #184 codex review (2× P2) — fn-74
gmickel Jun 30, 2026
46864c8
perf(review): cursor reviews read files from disk, never embed conten…
gmickel Jun 30, 2026
136b1e9
perf(review): codex + copilot also read files from disk, never embed …
gmickel Jun 30, 2026
520edca
chore(review): remove dead embed helper + modernize copilot for CLI 1…
gmickel Jun 30, 2026
be51d52
chore(review): post-review cleanup — drop dead embed flag/vars, fix c…
gmickel Jun 30, 2026
472bfcf
fix(review): codex/copilot coerce config-default to command backend —…
gmickel Jun 30, 2026
7f475d8
fix(review): address codex-bot PR #184 findings — cursor/codex/copilo…
gmickel Jun 30, 2026
a8b6847
fix(review): reserve cursor re-review preamble in argv budget + omit …
gmickel Jun 30, 2026
3a8d809
fix(work): propagate the resolved review backend to the worker's impl…
gmickel Jun 30, 2026
513caf1
fix(review): wire cursor into Ralph init + fix copilot auth-check pro…
gmickel Jun 30, 2026
40bef68
fix(ralph): enforce review-receipt gate for cursor backend — fn-74
gmickel Jun 30, 2026
b48a4dc
fix(review): general argv-budget backstop for cursor prompts (spec/ta…
gmickel Jun 30, 2026
5511faf
fix(review): cursor prompt-budget off-by-one at exactly the argv cap …
gmickel Jun 30, 2026
7556a71
fix(review): backstop cursor validator + deep-pass prompts under the …
gmickel Jun 30, 2026
445247e
chore(flow): mark fn-74 task definitions done — fn-74
gmickel Jun 30, 2026
3e25606
fix(review): per-spec default_review applies to spec-scoped plan/comp…
gmickel Jul 1, 2026
5b7efed
fix(review): cursor coerces ANY non-cursor resolved spec, not just en…
gmickel Jul 1, 2026
c38f1dc
fix(review): cursor never hands the reviewer an empty diff without a …
gmickel Jul 1, 2026
47068f9
feat(review): always-on code-smell baseline + tightened rubric — shar…
gmickel Jul 1, 2026
1 change: 1 addition & 0 deletions .flow/.gitignore
Original file line number Diff line number Diff line change
@@ -7,4 +7,5 @@ tmp/
.migrating
.migration-manifest
sync-runs/
pilot-runs/
# End of auto-managed block. User patterns below this line are preserved.
2,136 changes: 1,554 additions & 582 deletions .flow/bin/flowctl.py

Large diffs are not rendered by default.

@@ -0,0 +1,32 @@
---
title: "Adding a review backend: sweep ALL enumeration sites (config table, stage list, "
date: "2026-06-29"
track: bug
category: integration
module: "plugins/flow-next/docs, plugins/flow-next/scripts/flowctl.py"
tags: [review-backend, enumeration-drift, docs-sweep, cursor, fn-74]
problem_type: integration
symptoms: "codex impl-review NEEDS_WORK x3: each round found another stale rp/codex/copilot enum missing the new backend"
root_cause: "review-backend enumerations are scattered across many non-obvious sites (config tables, stage lists, setup templates, vault notes); several already omitted copilot, so a new backend exposes them as contradictions"
resolution_type: fix
---

## Problem
During the "docs sweep" task for the 4th cross-model review backend (`cursor`, fn-74), codex impl-review returned NEEDS_WORK three times. Each round surfaced ANOTHER stale backend-enumeration site that the obvious prose lists had missed. The enumerations live in many non-obvious places, and several already omitted even the *previous* backend (`copilot`), so they read as contradictions the moment you add the new one.

## What Didn't Work
Updating only the visible "RepoPrompt / Codex / Copilot" prose lists (README adversarial-gates row, GLOSSARY cross-model-review line, the impl-review command row). That left contradictory enumerations elsewhere in the SAME files, which the reviewer flagged as introduced findings.

## Solution
Sweep ALL of these enumeration sites when adding a review backend (the ones missed in fn-74, in flag order):
- `docs/flowctl.md`: the command list (~L14), the new `### <backend>` section (mirror copilot), the `review-backend` spec-grammar example (~L647), AND the **config-table `review.backend` row** (~L597) + the `config set` example comment (~L583) — these two were stale at `rp, codex, none` (omitted copilot too).
- `docs/teams.md`: BOTH the "RepoPrompt / Codex / Copilot" prose (×2) AND the **stage-[6] `Backends: rp, codex, copilot, none` exhaustive list** (~L171).
- `docs/skills.md`: the plan-review row's `(rp/codex/copilot)`.
- `skills/flow-next-setup/templates/usage.md`: the `review.backend # rp|codex|copilot|none` comment (~L165).
- Vault (`~/Documents/GordonsVault/.../flow-next - *.md`): Vocabulary backends line, Skills Catalog plan-review row, Lifecycle handover-#5 line, Architecture cmd list, **Release Timeline** (watch for a concurrent release-doc agent leaving a DUPLICATE row — dedupe).
- Downstream repos: flow-next.dev (`review/workflow` table + `--review` examples + spec-form note, `review/receipts` mode field, `releases/changelog`), AI×SDLC (`guides/flow-next.md` backend list + `code-review-tools-changelog.md`), GF (`spec/05-cross-model-review.md` + re-render `dist/*.html` + the bundled `code-factory-onboarding.html`).

NOTE: codex impl-review READS the vault file via its absolute path (flagged the duplicate/stale Release Timeline row) — downstream repo files in OTHER git repos are not in the diff, but vault notes referenced by absolute path are visible to it.

## Prevention
Before committing a review-backend docs task, run `grep -rniE "rp.{0,3}codex.{0,3}copilot|rp, codex|review.backend" docs/ skills/ README.md GLOSSARY.md | grep -vi <new-backend>` and confirm every hit is either a per-backend section header, a host-platform mention (Codex/Copilot/Droid as *drivers*), or a deliberately-scoped recommendation — never a stale exhaustive enumeration. Same shape as the tracker-adapter sweep (see related entry).
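
The same check can be sketched in Python, for instance inside a pre-commit script. This is a minimal illustration and not part of flowctl: the helper name `find_stale_enumerations` and the sample lines are assumptions, and the pattern simply mirrors the regex from the grep command above.

```python
import re

# Same pattern as the grep command: backend names listed close together,
# or any mention of the review.backend config key.
ENUM_PATTERN = re.compile(
    r"rp.{0,3}codex.{0,3}copilot|rp, codex|review.backend", re.IGNORECASE
)


def find_stale_enumerations(lines, new_backend):
    """Return (line_number, text) pairs that match the enumeration pattern
    but never mention new_backend (hypothetical helper, not in flowctl)."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if ENUM_PATTERN.search(text) and new_backend.lower() not in text.lower():
            hits.append((number, text))
    return hits


# Made-up sample lines, only to exercise the helper.
sample = [
    "Backends: rp, codex, copilot, none",
    "Backends: rp, codex, copilot, cursor, none",
    "review.backend  # rp|codex|copilot|cursor|none",
    "Unrelated prose about receipts.",
]
stale = find_stale_enumerations(sample, "cursor")
```

Each remaining hit still needs the manual triage described above: a per-backend section header or a host-platform mention is fine, a stale exhaustive list is not.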
16 changes: 8 additions & 8 deletions .flow/specs/fn-74-cursor-review-backend-cursor-agent-cli.json

Large diffs are not rendered by default.

50 changes: 44 additions & 6 deletions .flow/specs/fn-74-cursor-review-backend-cursor-agent-cli.md
@@ -96,7 +96,7 @@ Mirror the `copilot` backend end-to-end. Paths in
- **Repo scoping — REQUIRED.** `run_cursor_exec` runs with `cwd=repo_root`; add a test that invokes from a subdirectory and confirms the correct tree is reviewed.
- **`--trust` mandatory** headless or the CLI hangs on a trust prompt.
- **Read-only — VERIFIED.** `--mode ask` refused a "create a file" instruction; tree stayed clean. R8 asserts `git status` unchanged across a review.
- **Oversized prompts — VERIFIED on POSIX (60KB argv).** Reuse copilot's argv-vs-temp threshold. **Windows is the one open risk:** cursor-agent stdin support is unconfirmed and there is no `CreateProcessW`-safe path yet → during impl either confirm/implement a stdin path OR explicitly document Windows large-prompt as unsupported (don't silently hardcode argv).
- **Oversized prompts — VERIFIED on POSIX (60KB positional argv).** cursor-agent takes the prompt as a **positional argument** (not stdin). Up to the threshold, pass it positionally. **Above the threshold there is no safe path yet:** copilot's temp-file step just reads the file back into argv (it does NOT bypass any cap), and cursor-agent stdin support is unconfirmed → `run_cursor_exec` must raise an **explicit "prompt too large" error** above the threshold (with a test), NOT silently reuse the read-back-into-argv trick. Implement a stdin path only if cursor-agent confirms stdin input. (The Windows `CreateProcessW` cap is where this bites first.)
- **Triage precision** — see Architecture §8: deterministic by default; opt-in LLM judge stays codex/copilot and is a documented dependency for cursor users who enable it.
- **Auth not configured** → `check` and runners surface a clear error pointing at `cursor-agent` login / `CURSOR_API_KEY` (never a silent empty review).
- **`.result` empty / `is_error:true`** → backend failure (non-zero exit + stderr), never a false SHIP.
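
The "never a false SHIP" rule in the last bullet can be sketched as a strict parser for the CLI's JSON output. This is an illustration under stated assumptions, not the code in `flowctl.py`: the field names `result`, `session_id`, and `is_error` come from the spec, while `parse_cursor_output` and `CursorBackendError` are hypothetical names.

```python
import json


class CursorBackendError(RuntimeError):
    """Hypothetical error type for a failed cursor review run."""


def parse_cursor_output(stdout):
    """Parse the JSON printed by the CLI and return (result, session_id).

    Raises CursorBackendError on malformed JSON, is_error, or an empty
    result, so a failed run can never be read as a verdict.
    """
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise CursorBackendError(f"unparseable output: {exc}") from exc
    if not isinstance(payload, dict):
        raise CursorBackendError("expected a JSON object")
    if payload.get("is_error"):
        raise CursorBackendError("backend reported is_error")
    result = payload.get("result")
    if not isinstance(result, str) or not result.strip():
        raise CursorBackendError("empty result")
    return result, payload.get("session_id")
```

A caller would translate the exception into a non-zero exit plus a message on stderr, which is the failure shape the bullet asks for.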
@@ -106,25 +106,25 @@ Mirror the `copilot` backend end-to-end. Paths in

## Acceptance Criteria

- **R1:** `cursor` is in `BACKEND_REGISTRY` and `VALID_BACKENDS`; `flowctl review-backend` resolves/reports `cursor` from `.flow/config.json`, `FLOW_REVIEW_BACKEND`, per-task stored review, and `--spec`.
- **R1:** `cursor` is in `BACKEND_REGISTRY` and `VALID_BACKENDS`; `flowctl review-backend` reports `cursor` from `.flow/config.json` + `FLOW_REVIEW_BACKEND` (its only two sources); per-task `default_review` and `--spec cursor:<model>` resolve via `resolve_review_spec` / the review commands (NOT `review-backend`).
- **R2:** `BackendSpec.parse("cursor")` / `parse("cursor:gpt-5.5-high")` succeed; `parse("cursor:gpt-5.5-high:high")` raises (effort rejected); `parse("cursor:bogus")` raises listing valid models; `.resolve()` fills `gpt-5.5-high`, effort `None`.
- **R3:** `run_cursor_exec` shells `cursor-agent -p --output-format json --trust --mode ask --model <m>` with `cwd=repo_root`; on a first call it omits `--resume` and returns Cursor's generated `session_id`; on continuation it passes `--resume <session_id>`; parses `.result`/`.session_id`/`.is_error`; returns non-zero on a 600s timeout.
- **R4:** `flowctl cursor check [--skip-probe]` reports availability + version + auth (`authed`) in text and `--json`, schema-aligned to copilot's `check`.
- **R5:** `flowctl cursor impl-review <task> --base <b> --receipt <r>` writes a `mode:"cursor"` receipt (no `effort` key) and prints `VERDICT=...`.
- **R6:** `cursor plan-review`, `completion-review`, `validate`, `deep-pass` dispatch through `run_cursor_exec` and write the same additive receipt shapes as codex/copilot (`mode:"cursor"`).
- **R7:** Re-review with an existing `mode=="cursor"` receipt resumes via `--resume <session_id>` (using the persisted returned id); a cross-backend receipt starts fresh.
- **R8:** A cursor review leaves the working tree unchanged (`git status` identical before/after).
- **R8:** A cursor review leaves the working tree unchanged. Unit-level: `run_cursor_exec` is asserted to pass `--mode ask` (read-only) and never an edit/write flag. Integration-level: an **optional live smoke test gated on `cursor-agent` availability** runs a real `cursor impl-review` against a temp git repo and asserts `git status` is identical before/after (skipped when the CLI is absent — never a mocked clean-tree claim).
- **R9:** `/flow-next:impl-review` routes `BACKEND=="cursor"` to `workflow-cursor.md`; `/flow-next:plan-review` and `/flow-next:spec-completion-review` handle `cursor`; every user-facing `--review=rp|codex|copilot|none` string includes `cursor`.
- **R10:** `flow-next-setup` `review.backend` config accepts `cursor` and spec form `cursor:gpt-5.5-high`.
- **R11:** Tests: `test_cursor_run_exec.py` (mock subprocess: success / `is_error` / timeout / **first-call-omits-resume** / **resume-passes-id** / **cwd=repo_root** / **no-effort-in-receipt**), `test_backend_spec.py` cursor cases (model-yes/effort-no), receipt-schema `mode:"cursor"`. Full Python suite passes.
- **R11:** Tests: `test_cursor_run_exec.py` (mock subprocess: success / `is_error` / timeout / **first-call-omits-resume** / **resume-passes-id** / **cwd=repo_root** / **mode-ask-flag** / **prompt-too-large**), `test_backend_spec.py` cursor cases (model-yes/effort-no). Receipt-schema `mode:"cursor"` + the `effort`-absent assertion are the review-command tests (R14, task .2). Full Python suite passes.
- **R12:** `scripts/sync-codex.sh` regenerated; `cursor` surfaces in the codex mirror; install/sync parity tests pass.
- **R13:** Docs chain updated at the concrete targets below; **no version bump** (batched), entries under `## Unreleased`:
- **Repo:** `plugins/flow-next/docs/flowctl.md` (cmd list L14 + new cursor backend section), `README.md` (L44 / L253 / L290 backend lists), `GLOSSARY.md` (L29 "Backends:" list), root `CHANGELOG.md` `## Unreleased`.
- **flow-next.dev:** `src/content/docs/review/workflow.mdx` + `review/receipts.mdx` + `install.mdx` backend enumeration, `releases/changelog.mdx`, bump `src/lib/site.ts` `FLOW_NEXT_VERSION` + `package.json`. No new page → navbars unchanged. Run `pnpm build`.
- **flow-next.dev:** `src/content/docs/review/workflow.mdx` (flip the live "coming next release" Cursor row → shipped) + `review/receipts.mdx` + `install.mdx` backend enumeration + `releases/changelog.mdx`. **No `FLOW_NEXT_VERSION` / `package.json` bump in this spec** — the docs-site version bump is release-only (batched), same rule as the plugin. No new page → navbars unchanged. Run `pnpm build`.
- **AI-x-SDLC:** `guides/flow-next.md` (L65 "(RepoPrompt, OpenAI Codex, GitHub Copilot)" → add Cursor), `guides/code-review-tools-changelog.md`.
- **GrowthFactors:** `spec/05-cross-model-review.md` (claim already lists Cursor — verify/tighten), re-render `dist/gf.html` (+ `shd`/`shopfully`/`flooid`) and the bundled `~/work/AI-x-SDLC-Starter-Kit/resources/assets/code-factory-onboarding.html`.
- **Obsidian vault:** the cross-model-review / Skills Catalog / Release Timeline note(s).
- **R14:** Cursor `impl-review` / `completion-review` receipts carry the **same rigor fields as copilot** — confidence-rubric anchors, suppressed-finding counts, introduced-vs-pre_existing classification, unaddressed-R-ID surfacing, and protected-path filtering — asserted by a receipt-parity test against the copilot field set.
- **R14:** Cursor `impl-review` / `completion-review` receipts carry the same **rigor fields** as copilot — confidence-rubric anchors, suppressed-finding counts, introduced-vs-pre_existing classification, unaddressed-R-ID surfacing, protected-path filtering — asserted by a parity test scoped to **those rigor fields only**, which **also asserts `effort` is absent** (cursor must never write it; effort is not a cursor field).
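
The model-yes/effort-no grammar in R2 can be illustrated with a simplified stand-in. This is a sketch, not the real `BackendSpec`: `REGISTRY`, `Spec`, `parse_spec`, and `resolve` are hypothetical names, and the model set is an abbreviated subset chosen for the example.

```python
from typing import NamedTuple, Optional

# Simplified stand-in for one registry entry. efforts=None means the
# backend accepts no effort segment at all.
REGISTRY = {
    "cursor": {
        "models": {"auto", "gpt-5.5-high", "composer-2.5"},
        "efforts": None,
        "default_model": "gpt-5.5-high",
    },
}


class Spec(NamedTuple):
    backend: str
    model: Optional[str] = None
    effort: Optional[str] = None


def parse_spec(text):
    """Parse 'backend[:model[:effort]]' against REGISTRY."""
    parts = text.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many segments in {text!r}")
    backend = parts[0]
    entry = REGISTRY.get(backend)
    if entry is None:
        raise ValueError(f"unknown backend {backend!r}")
    model = parts[1] if len(parts) > 1 else None
    effort = parts[2] if len(parts) > 2 else None
    if model is not None and model not in entry["models"]:
        valid = ", ".join(sorted(entry["models"]))
        raise ValueError(f"unknown model {model!r}; valid models: {valid}")
    if effort is not None:
        if entry["efforts"] is None:
            raise ValueError(f"{backend} does not accept an effort")
        if effort not in entry["efforts"]:
            raise ValueError(f"unknown effort {effort!r}")
    return Spec(backend, model, effort)


def resolve(spec):
    """Fill the default model; effort stays None for effort-less backends."""
    entry = REGISTRY[spec.backend]
    return spec._replace(model=spec.model or entry["default_model"])
```

Keeping `efforts: None` in the registry entry, rather than an empty set, lets the parser distinguish "no effort concept" from "no efforts currently allowed".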

## Boundaries

@@ -176,3 +176,41 @@ fields [R14, R5, R11]. Proves the backend works end-to-end on a real spec.
Natural task seams: (1) flowctl core (registry + helpers + subcommands + handlers +
dispatch + unit tests), (2) skill/setup wiring + codex-mirror regen, (3) docs +
downstream chain.

## Plan (4 tasks)

Decomposed into 4 sequential tasks (a parity port is inherently code → wire → document); the flowctl core is split into **proof** + **commands** so each fits one `/flow-next:work` iteration.

1. **`.1` — flowctl cursor foundation** (M, no deps · **early proof**) — registry entry + `require_cursor`/`get_cursor_version`/`run_cursor_exec` + `cursor check` + parser/run-exec tests. → R1, R2, R3, R4, R11
2. **`.2` — cursor review commands** (M, deps .1) — 5 subcommands + `cmd_cursor_*` handlers + validator/deep dispatch + own-mode `mode:"cursor"` receipts (resume-guard, rigor parity, clean-tree live test). → R5, R6, R7, R8, R11, R14
3. **`.3` — skill + setup wiring + codex mirror** (M–L, deps .2) — `workflow-cursor.md` ×2 + plan-review section + `--review` literals (8 files) + setup config + `sync-codex.sh` regen. → R9, R10, R12
4. **`.4` — docs + downstream chain** (M, deps .3) — repo docs + flow-next.dev (flip the already-live "coming" Cursor row → shipped) + AI×SDLC + GF + vault. No version bump. → R13

### Early proof point
Task `.1` proves the `cursor-agent` contract end-to-end (`run_cursor_exec` + `check` + `BackendSpec` parse/resolve). Already de-risked by the spec's live smoke-tests + dogfood; if `.1` nonetheless fails, re-examine the cursor-agent CLI contract before `.2`+.

### Strategy Alignment
- **Cross-model review** — adds a fourth reviewer backend (Cursor: gpt-5.5-high / codex / composer / opus), widening the disagreement surface and letting teams bill review to an existing Cursor subscription.
- **Host agent IS the intelligence / lean flowctl** — pure parity port: a ~6-line registry entry + mirrored helpers; no new architecture, no new skill/command, no second-LLM-spawn-from-flowctl.

### Requirement coverage

| Req | Task(s) |
|-----|---------|
| R1 registry / resolve | .1 |
| R2 spec grammar (model-yes/effort-no) | .1 |
| R3 run_cursor_exec | .1 |
| R4 cursor check | .1 |
| R5 impl-review receipt mode:cursor | .2 |
| R6 plan/completion/validate/deep dispatch | .2 |
| R7 session-resume guard | .2 |
| R8 read-only / clean tree | .2 (live test) · .1 (`--mode ask` flag) |
| R9 skill routing + --review literals | .3 |
| R10 setup config | .3 |
| R11 tests | .1, .2 |
| R12 codex mirror | .3 |
| R13 docs chain | .4 |
| R14 receipt rigor parity | .2 |

### Soft sequencing note
fn-54 (eval-driven prompt optimization, 0 tasks) also edits the review `workflow*.md` files — coordinate on those edits if fn-54 activates concurrently. Not a hard dependency (spec-scout: standalone).
14 changes: 14 additions & 0 deletions .flow/tasks/fn-74-cursor-review-backend-cursor-agent-cli.1.json
@@ -0,0 +1,14 @@
{
"assignee": null,
"claim_note": "",
"claimed_at": null,
"created_at": "2026-06-29T11:35:58.566755Z",
"depends_on": [],
"id": "fn-74-cursor-review-backend-cursor-agent-cli.1",
"priority": null,
"spec": "fn-74-cursor-review-backend-cursor-agent-cli",
"spec_path": ".flow/tasks/fn-74-cursor-review-backend-cursor-agent-cli.1.md",
"status": "todo",
"title": "flowctl cursor backend foundation \u2014 registry + run_cursor_exec + check + parser tests",
"updated_at": "2026-06-29T11:44:49.065046Z"
}
47 changes: 47 additions & 0 deletions .flow/tasks/fn-74-cursor-review-backend-cursor-agent-cli.1.md
@@ -0,0 +1,47 @@
---
satisfies: [R1, R2, R3, R4, R11]
---

## Description

Foundation of the `cursor` review backend in flowctl — the registry entry, the helper trio, the `cursor check` subcommand, and the parser/run-exec unit tests. **This is the early proof point:** it validates the `cursor-agent` contract (run_cursor_exec parses `.result`/`.session_id`/`.is_error`, read-only `--mode ask`, resume-only session) and confirms the existing `BackendSpec`/registry already accept the model-yes/effort-no shape with **zero parser changes** (verified during spec smoke-tests).

**Size:** M
**Files:** `plugins/flow-next/scripts/flowctl.py`, `plugins/flow-next/tests/test_cursor_run_exec.py` (new), `plugins/flow-next/tests/test_backend_spec.py`

## Approach

- Add `"cursor"` to `BACKEND_REGISTRY` after the copilot entry — `models` set (`auto`, `gpt-5.5-high`, `gpt-5.4-high`, `gpt-5.3-codex(-high/-xhigh)`, `gpt-5.2`, `composer-2.5`, `claude-opus-4-8-thinking-high`, `claude-opus-4-7-thinking-high`), `efforts: None`, `default_model: "gpt-5.5-high"`. `VALID_BACKENDS` derives.
- Mirror `require_copilot` / `get_copilot_version` / `run_copilot_exec` → `require_cursor` / `get_cursor_version` / `run_cursor_exec`. Invocation: `cursor-agent -p --output-format json --trust --mode ask --model <m> [--resume <sid>]`, run with `cwd=repo_root`, `timeout=600`. `session_id` is an **optional input** (None ⇒ omit `--resume`, capture the returned id; non-None ⇒ `--resume <id>`). Parse `.result`/`.session_id`/`.is_error`; non-zero exit on `is_error`/timeout/CLI failure.
- **Prompt delivery is positional argv** (cursor-agent takes the prompt as a positional arg, NOT stdin). Up to a threshold, pass positionally. **Above the threshold, raise an explicit "prompt too large" error** — do NOT copy copilot's temp-file step (it just reads the file back into argv and bypasses no cap; cursor-agent stdin is unconfirmed). A stdin path is added only if cursor-agent confirms stdin input.
- **Do NOT copy `run_copilot_exec`'s `--effort`/`claude-`-drop logic** — cursor folds effort into the model name and takes no `--effort` flag.
- Add `cursor check [--skip-probe]` subparser + `cmd_cursor_check` returning `{available, version, authed}` (text + `--json`), schema-aligned to copilot's `check`.
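
The invocation and the prompt-size guard described above can be sketched as a pure argv builder. The flags are the ones listed in this task. Everything else is an assumption made for illustration: the names `build_cursor_argv` and `PromptTooLargeError`, the 60,000-byte default threshold, and placing the positional prompt last.

```python
class PromptTooLargeError(ValueError):
    """Hypothetical error for prompts above the argv threshold."""


def build_cursor_argv(prompt, model, session_id=None, max_prompt_bytes=60_000):
    """Build the cursor-agent command line for one review call.

    Raises PromptTooLargeError instead of falling back to a temp file,
    because reading a file back into argv bypasses no cap.
    """
    size = len(prompt.encode("utf-8"))
    if size > max_prompt_bytes:
        raise PromptTooLargeError(
            f"prompt too large: {size} bytes exceeds {max_prompt_bytes}"
        )
    argv = [
        "cursor-agent", "-p",
        "--output-format", "json",
        "--trust",            # avoids the interactive trust prompt
        "--mode", "ask",      # read-only mode
        "--model", model,
    ]
    if session_id is not None:
        # Continuation: resume the session returned by the first call.
        argv += ["--resume", session_id]
    argv.append(prompt)       # positional prompt (position is an assumption)
    return argv
```

Measuring the encoded byte length rather than `len(prompt)` matters for non-ASCII prompts, and the `>` comparison lets a prompt of exactly the threshold size through.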

## Investigation targets

**Required:**
- `plugins/flow-next/scripts/flowctl.py:3416-3477` — `BACKEND_REGISTRY` + `VALID_BACKENDS`
- `plugins/flow-next/scripts/flowctl.py:3753`,`:3761`,`:3798` — `require_copilot` / `get_copilot_version` / `run_copilot_exec` (the template; note its argv-vs-temp + `--effort` logic is what we deliberately diverge from)
- `plugins/flow-next/scripts/flowctl.py:3480`,`:3617`,`:3658` — `BackendSpec` / `parse_backend_spec_lenient` / `resolve_review_spec` (already handle model-yes/effort-no — add tests, no edits)
- `plugins/flow-next/scripts/flowctl.py:18622`, `:25938-25948` — `cmd_copilot_check` + copilot `check` subparser
- `plugins/flow-next/tests/test_copilot_run_exec.py`, `plugins/flow-next/tests/test_backend_spec.py` — test templates

## Key context

`run_cursor_exec` MUST set `cwd=repo_root` (cursor scopes to the workspace dir; a review from a subdir reads the wrong tree). `--trust` is mandatory headless or the CLI hangs on a trust prompt. (Both verified in spec smoke-tests.)
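
One way to picture the `cwd=repo_root` requirement is the sketch below. It is an assumption-laden illustration rather than flowctl's code: `find_repo_root` and `run_from_repo_root` are hypothetical helpers, and the root is located by walking up to the nearest `.git` entry.

```python
import subprocess
from pathlib import Path


def find_repo_root(start):
    """Walk upward from start until a directory containing .git is found."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists():
            return candidate
    raise FileNotFoundError(f"no git repository at or above {start}")


def run_from_repo_root(argv, start, timeout=600):
    """Run argv with the working directory pinned to the repository root,
    regardless of which subdirectory the caller started in."""
    root = find_repo_root(start)
    return subprocess.run(
        argv, cwd=root, capture_output=True, text=True, timeout=timeout
    )
```

A subdirectory test then only has to start from a nested path and confirm that the child process saw the root as its working directory.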

## Acceptance

- [ ] `BACKEND_REGISTRY` has `cursor` (models set, `efforts: None`, `default_model: gpt-5.5-high`); `VALID_BACKENDS` includes it; `flowctl review-backend` reports `cursor` from `.flow/config.json` + `FLOW_REVIEW_BACKEND` (R1)
- [ ] `BackendSpec.parse("cursor")` / `parse("cursor:gpt-5.5-high")` succeed; `parse("cursor:gpt-5.5-high:high")` raises (effort rejected); `parse("cursor:bogus")` raises listing valid models; `.resolve()` fills `gpt-5.5-high` with effort `None` (R2)
- [ ] `run_cursor_exec` shells `cursor-agent -p --output-format json --trust --mode ask --model <m>` with `cwd=repo_root`, no `--effort`; test asserts the `--mode ask` (read-only) flag is present; first call omits `--resume` and returns the generated `session_id`; returns non-zero on `is_error`/600s timeout (R3)
- [ ] above the argv threshold `run_cursor_exec` raises an explicit "prompt too large" error (asserted by a test) — never a silent read-back-into-argv (R3)
- [ ] `flowctl cursor check [--skip-probe]` reports `{available, version, authed}` in text and `--json` (R4)
- [ ] `test_cursor_run_exec.py` (success / `is_error` / timeout / first-call-omits-resume / resume-passes-id / cwd=repo_root / mode-ask-flag / prompt-too-large) + `test_backend_spec.py` cursor cases pass; full Python suite green (R11)
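
The mocked-subprocess cases in the last checklist item (success / `is_error` / timeout) might look roughly like this. It is a sketch of the testing pattern only: `run_exec` is a hypothetical stand-in, not the real `run_cursor_exec`, and `ARGV` is abbreviated because the command is never actually executed.

```python
import json
import subprocess
from unittest import mock


def run_exec(argv, timeout=600):
    """Hypothetical stand-in: run argv, return (exit_code, result, session_id)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return 1, "", None
    if proc.returncode != 0:
        return proc.returncode, "", None
    payload = json.loads(proc.stdout)
    code = 1 if payload.get("is_error") else 0
    return code, payload.get("result", ""), payload.get("session_id")


def completed(payload):
    """Build the object a successful subprocess.run call would return."""
    return subprocess.CompletedProcess(
        args=[], returncode=0, stdout=json.dumps(payload), stderr=""
    )


ARGV = ["cursor-agent", "-p", "review this"]  # abbreviated for the sketch

with mock.patch("subprocess.run") as fake_run:
    # Success: exit code 0 and the parsed fields come back unchanged.
    fake_run.return_value = completed(
        {"result": "ok", "session_id": "s1", "is_error": False}
    )
    success = run_exec(ARGV)

    # is_error: the process exits 0 but the payload flags a failure.
    fake_run.return_value = completed(
        {"result": "", "session_id": "s1", "is_error": True}
    )
    errored = run_exec(ARGV)

    # Timeout: subprocess.run raises instead of returning.
    fake_run.side_effect = subprocess.TimeoutExpired(cmd=ARGV, timeout=600)
    timed_out = run_exec(ARGV)
```

Patching `subprocess.run` keeps the suite independent of whether `cursor-agent` is installed, which is why the clean-tree check is listed separately as an optional live smoke test.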

## Done summary
Added the `cursor` review backend foundation in flowctl: the BACKEND_REGISTRY entry (model-yes / effort-no shape, default gpt-5.5-high), the require_cursor / get_cursor_version / run_cursor_exec helper trio (positional-argv prompt, resume-only session, cwd=repo_root, --mode ask --trust, no --effort, explicit prompt-too-large raise, non-zero on is_error/timeout), the `cursor check [--skip-probe]` subcommand, and unit tests (test_cursor_run_exec.py + test_backend_spec.py cursor cases). Full Python suite green at 1271 tests.
## Evidence
- Commits: dcbb1a7e5a6e39a021ee56dd81290b4101bf8559
- Tests: python3 -m unittest discover -s plugins/flow-next/tests (1271 passed, skipped=2)
- PRs: