feat(review): Cursor review backend (cursor-agent CLI) — fn-74#184
gmickel wants to merge 35 commits into
…n-review SHIP 4 sequential tasks (parity port: code → wire → document); flowctl core split into proof (.1) + commands (.2). codex plan-review SHIP after 2 rounds — folded its findings: scoped R1 to review-backend's real sources, explicit large-prompt error (no silent argv read-back), R8 clean-tree moved to a real .2 live test, docs-site version bump release-only, R14 parity limited to rigor fields + effort-absent assertion. Standalone (no spec deps); soft fn-54 workflow*.md note. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
…+ check + tests
- Add `cursor` to BACKEND_REGISTRY (model-yes / effort-no shape, default
gpt-5.5-high); VALID_BACKENDS derives; `review-backend` reports cursor from
config.json + FLOW_REVIEW_BACKEND (R1, R2)
- require_cursor / get_cursor_version / run_cursor_exec — positional-argv prompt
(NOT stdin), resume-only session (first call omits --resume, captures minted
id), cwd=repo_root, --mode ask --trust, NO --effort, explicit prompt-too-large
raise, non-zero on is_error/timeout/CLI-failure (R3)
- _parse_cursor_result: single-object + streaming JSON-lines, empty/unparseable
→ backend failure (never false SHIP)
- `cursor check [--skip-probe]` subcommand + cmd_cursor_check → {available,
version, authed} text + --json, schema-aligned to copilot (R4)
- Tests: test_cursor_run_exec.py (success/is_error/timeout/first-call-omits-
resume/resume-passes-id/cwd=repo_root/mode-ask-flag/prompt-too-large) +
test_backend_spec.py cursor cases; full suite 1271 green (R11)
- Sync byte-identical dogfood copy .flow/bin/flowctl.py
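The sketch below shows the argv and result handling described above. It assumes the flags listed in this PR (`-p --output-format json --trust --mode ask`, `--model`, `--resume`). It also assumes the final JSON object carries `type: "result"`, `is_error`, and `result` fields. `build_cursor_argv`, `parse_cursor_result`, and `cursor_call_failed` are hypothetical names, not flowctl's actual helpers.

```python
import json
from typing import List, Optional

# Cap named in this PR (CURSOR_ARGV_PROMPT_MAX = 30k); prompts at or above it are refused.
CURSOR_ARGV_PROMPT_MAX = 30_000


def build_cursor_argv(prompt: str, model: str, session_id: Optional[str] = None) -> List[str]:
    """Hypothetical argv builder: the prompt is a positional arg, never stdin."""
    if len(prompt) >= CURSOR_ARGV_PROMPT_MAX:
        # Explicit error instead of silently truncating the review prompt.
        raise ValueError(f"prompt too large for argv: {len(prompt)} chars")
    argv = ["cursor-agent", "-p", "--output-format", "json",
            "--trust", "--mode", "ask", "--model", model]
    if session_id:
        # Resume-only sessions: the first call omits --resume and captures the
        # minted id; later calls pass that id back.
        argv += ["--resume", session_id]
    argv.append(prompt)
    return argv


def parse_cursor_result(stdout: str) -> Optional[dict]:
    """Return the result object, or None when output is empty or unparseable."""
    text = stdout.strip()
    if not text:
        return None
    try:
        obj = json.loads(text)  # single-object output
        return obj if isinstance(obj, dict) else None
    except json.JSONDecodeError:
        pass
    last_result = None
    for line in text.splitlines():  # streaming JSON-lines output
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and obj.get("type") == "result":
            last_result = obj
    return last_result


def cursor_call_failed(returncode: int, result: Optional[dict]) -> bool:
    """Each condition is a backend failure, so a broken run never reads as SHIP."""
    return returncode != 0 or result is None or bool(result.get("is_error"))
```

A caller would run the argv with `subprocess.run(argv, cwd=repo_root, capture_output=True, text=True, timeout=...)` and pass `stdout` to the parser. Treating "no parseable result" as a failure is what keeps an empty run from being read as a clean review.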
Task: fn-74-cursor-review-backend-cursor-agent-cli.1
…deep + own-mode mode:cursor receipts — fn-74.2
- 3 handlers (cmd_cursor_impl_review/_plan_review/_completion_review) + _resolve_cursor_review_spec, mirroring copilot but resume-only + no effort
- validate/deep-pass dispatch via new 'elif backend == cursor' branches in the shared _run_validator_pass/_run_deep_pass spines + cmd_cursor_validate/_deep_pass wrappers
- receipts: mode:cursor, spec:cursor:<model>, model, NO effort key; carry copilot rigor fields (suppressed/introduced-vs-pre_existing/unaddressed)
- own-mode resume guard: pass --resume only when the prior receipt has mode==cursor (a cross-backend receipt ⇒ fresh session; no uuid fabrication, since sessions are resume-only)
- 5 cursor review subcommands wired into the subparser (cursor:<model> spec, no effort); triage --backend choices unchanged (codex|copilot)
- tests: test_cursor_review_commands.py (handler/dispatch/resume-guard/rigor parity/effort-absent) + test_cursor_clean_tree.py (R8 live smoke, gated on cursor-agent; ran a real review, tree clean)
- .flow/bin/flowctl.py kept byte-identical (dogfood parity)
Task: fn-74-cursor-review-backend-cursor-agent-cli.2
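The receipt shape and own-mode resume guard described above can be sketched as two small functions. The `verdict` and `session_id` receipt keys and both function names are assumptions for illustration. The PR only states `mode`, `spec`, `model`, and the absence of `effort`.

```python
from typing import Optional


def build_cursor_receipt(model: str, verdict: str, session_id: str) -> dict:
    """Receipt with mode/spec/model and deliberately no effort key."""
    return {
        "mode": "cursor",
        "spec": f"cursor:{model}",
        "model": model,
        "verdict": verdict,        # hypothetical field name
        "session_id": session_id,  # hypothetical field name
    }


def resume_session_id(prior_receipt: Optional[dict]) -> Optional[str]:
    """Session id to pass via --resume, or None to start a fresh session."""
    if not prior_receipt or prior_receipt.get("mode") != "cursor":
        # Missing or cross-backend receipt: go fresh. Cursor sessions are
        # resume-only, so an id is never fabricated on our side.
        return None
    return prior_receipt.get("session_id")
```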
… — recovered from lost-worker Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
…flow-cursor.md x2, --review literals, review.backend — fn-74.3
- new workflow-cursor.md in impl-review + spec-completion-review (mirror copilot; model-yes/effort-no, --mode ask read-only, resume-only session, mode==cursor receipt, no effort key)
- plan-review: Cursor Backend block (SKILL) + Cursor Backend Workflow section (workflow.md) + anti-patterns
- impl-review/spec-completion workflow-common.md: cursor dispatch-table row + cursor) deep-pass/validate branches
- cursor added to every --review=rp|codex|copilot|none literal across the 8 hand-edited files (+ backend-at-a-glance, critical-rules, re-review)
- flow-next-setup: HAVE_CURSOR detect + Cursor CLI review option + cursor answer-mapping + power-user spec note (cursor:gpt-5.5-high, no :effort)
- scripts/sync-codex.sh re-run: 6 codex-mirror copies regenerated; R2 ask-block injection verified clean (genuine ask sites only); validators + full python suite green
Task: fn-74-cursor-review-backend-cursor-agent-cli.3
Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
…GELOG + skills/teams sweep — fn-74.4
- flowctl.md: cursor in cmd list + new cursor backend section (resume-only sessions, model-yes/effort-no, --mode ask cwd=repo_root, triage note) + review-backend grammar example
- README.md: 3 backend lists (adversarial gates, blind-spots, impl-review cmd) add Cursor
- GLOSSARY.md: cross-model-review backends add Cursor (cursor)
- CHANGELOG.md: ## Unreleased cursor review backend entry (no version bump, batched)
- skills.md + teams.md: stale RepoPrompt/Codex/Copilot enumerations add Cursor
Task: fn-74-cursor-review-backend-cursor-agent-cli.4
…entry — fn-74.4 Records the fn-74.4 doc sweep (repo + flow-next.dev + AI×SDLC + GF microsite + vault), mirroring the fn-68.6 downstream-coverage precedent. The downstream surfaces are committed in their own repos (out of this repo's reviewable diff). Task: fn-74-cursor-review-backend-cursor-agent-cli.4
…late — fn-74.4
- flowctl.md config table: review.backend was 'rp, codex, none' (stale, omitted copilot too) → 'rp, codex, copilot, cursor, none' + spec-form note (codex impl-review finding)
- flowctl.md config-set example comment: same enum fix
- setup usage.md template: review.backend comment rp|codex|copilot|none → +cursor
Task: fn-74-cursor-review-backend-cursor-agent-cli.4
- teams.md stage-[6] 'Backends:' list rp/codex/copilot/none → +cursor + cursor:gpt-5.5-high spec-form note (codex impl-review finding) Task: fn-74-cursor-review-backend-cursor-agent-cli.4
…r enum — fn-74.4
- sync-codex.sh regen: codex mirror setup usage.md template picks up cursor in review.backend enum (R12)
- .flow/usage.md dogfood copy matches canonical template (test_dogfood_template_parity)
Task: fn-74-cursor-review-backend-cursor-agent-cli.4
…emory (codex impl-review SHIP) Task: fn-74-cursor-review-backend-cursor-agent-cli.4
…+ skill args) — fn-74
Three introduced findings from codex completion-review:
- cmd_cursor_check now honors is_error: a returncode==0 + is_error:true probe is NOT authed (R4); it was checking the exit code only.
- cursor commands reject a non-cursor --spec: the validator/deep dispatch branches + _resolve_cursor_review_spec now enforce parsed.backend=="cursor", so `cursor impl-review --spec codex:...` errors instead of running cursor-agent with a foreign model and serializing spec:"codex:..." under mode:"cursor" (R5/R6/R14).
- skill examples carry required args: cursor impl-review --base, cursor plan-review --files.
+5 regression tests; codex mirror regenerated; .flow/bin/flowctl.py byte-identical.
Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
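The first two findings can be sketched as follows: an auth probe that requires both a zero exit code and `is_error` false, and a spec guard that accepts only `cursor:<model>`. The function names are hypothetical. The guard also rejects an `:effort` suffix, following the model-yes/effort-no shape stated in the PR.

```python
import json

CURSOR_DEFAULT_MODEL = "gpt-5.5-high"  # registry default named in this PR


def probe_is_authed(returncode: int, stdout: str) -> bool:
    """Exit code 0 is not enough: the JSON payload must not set is_error."""
    if returncode != 0:
        return False
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and not payload.get("is_error", False)


def cursor_model_from_spec(spec: str) -> str:
    """Accept only cursor:<model>; reject foreign backends and an :effort suffix."""
    backend, _, model = spec.partition(":")
    if backend != "cursor":
        raise SystemExit(f"cursor commands need a cursor:<model> spec, got {spec!r}")
    if ":" in model:
        raise SystemExit(f"cursor specs take no effort suffix: {spec!r}")
    return model or CURSOR_DEFAULT_MODEL
```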
…|cursor|none — fn-74 completion-review R9: the two command argument-hints were stale (rp|codex|export, predating copilot); align with spec-completion-review/epic-review. R6 (cursor plan/ completion pass None to the resolve helper) is verified codex+copilot parity — same call sites, not a cursor-introduced regression — left as-is. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
Round 1: 3 introduced bugs (cursor check is_error; cursor --spec backend guard; skill examples missing --base/--files) → fixed + 5 regression tests. Rounds 2-3 churned on verified codex/copilot PARITY behaviors (plan/completion resolve fallback) + self-contradictory hint guidance — not cursor-introduced regressions. Receipt verdict SHIP; suite green (1291). Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
PR Summary (Medium Risk)
Overview: Review prompts shift to agentic context. Copilot session handling is unified on both platforms: first call uses …
Cursor-only prompt budgeting …
Reviewed by Cursor Bugbot for commit c38f1dc. Bugbot is set up for automated code reviews on this repo.
Bugbot couldn't run: usage limit reached. Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit. A user or team admin can review and increase usage limits in the Cursor dashboard. (requestId: serverGenReqId_fa14b892-d467-432d-a7dc-aa2944bebad1)
…ackend toggle + auto-normalized keys, not part of fn-74 Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5dbe249b54
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Coerce the no-`--spec` resolve fallback to cursor: `_resolve_cursor_review_spec` now returns the cursor default when `resolve_review_spec` hands back a non-cursor backend (e.g. config `review.backend=codex`), so an explicit `--review=cursor` never runs cursor-agent with a foreign model or stamps `spec:"codex:"` under `mode:"cursor"`.
- Fail closed on oversized prompts via a non-zero return tuple instead of a raised ValueError, so cursor command handlers hit their `exit_code != 0` cleanup (drop stale receipt + structured error) rather than leaking a traceback.
- Tests updated (oversized → non-zero return) + 2 fallback-coercion cases.
Suite 1336 OK. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
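The sketch below shows both changes: coercing a foreign fallback spec to the cursor default, and returning a non-zero `(exit_code, output)` tuple so that the handler's ordinary failure path drops the stale receipt. The names, the exit code `2`, and the in-memory receipt dict are illustrative assumptions, not flowctl internals.

```python
from typing import Dict, Tuple

CURSOR_ARGV_PROMPT_MAX = 30_000
CURSOR_DEFAULT_SPEC = "cursor:gpt-5.5-high"


def coerce_to_cursor(resolved_spec: str) -> str:
    """An explicit cursor review never inherits another backend's default."""
    if resolved_spec.split(":", 1)[0] == "cursor":
        return resolved_spec
    return CURSOR_DEFAULT_SPEC


def run_cursor_exec_sketch(prompt: str) -> Tuple[int, str]:
    """Oversized prompts return a non-zero tuple instead of raising."""
    if len(prompt) >= CURSOR_ARGV_PROMPT_MAX:
        return 2, f"prompt too large: {len(prompt)} >= {CURSOR_ARGV_PROMPT_MAX} chars"
    return 0, "placeholder output"  # a real implementation would invoke cursor-agent here


def cursor_review(prompt: str, receipts: Dict[str, dict], key: str) -> dict:
    exit_code, output = run_cursor_exec_sketch(prompt)
    if exit_code != 0:
        # Same cleanup as any CLI failure: drop a stale receipt, return a structured error.
        receipts.pop(key, None)
        return {"ok": False, "error": output}
    return {"ok": True, "output": output}
```

Returning a tuple keeps a single failure path in every handler. A raised `ValueError` would bypass the receipt cleanup and surface as a traceback.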
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ae2f7e291
…ts — fn-74 Root cause of PR #184's "prompt too large": cmd_cursor_{impl,plan,completion}_review embedded full changed-file CONTENTS up to a 500KB budget (get_embedded_file_contents, FLOW_CURSOR_EMBED_MAX_BYTES). A changed flowctl.py (~270KB ×2) produced a ~538KB prompt — far over cursor's 30KB positional-argv cap — so cursor reviews failed on any diff touching a non-trivial file, even a tiny one. cursor-agent is AGENTIC: it runs read-only (`--mode ask`) with cwd=repo_root and reads files from disk itself. So we stop embedding file contents (embedded_content="", files_embedded=False across all 3 cursor handlers) and pass the diff + pointers; the reviewer reads what it needs. Validated: a real cursor review now accepts the prompt (previously rejected) — only blocked by the Cursor account usage limit, not argv. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
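With contents no longer embedded, the prompt carries only the diff and a list of file pointers, so its size tracks the diff rather than the touched files. The sketch below is hypothetical: `build_pointer_prompt` and its wording are assumptions, and only the `<diff_content>` slot name appears elsewhere in this PR.

```python
from typing import List

CURSOR_ARGV_PROMPT_MAX = 30_000


def build_pointer_prompt(diff: str, changed_files: List[str]) -> str:
    """Diff plus file pointers; the agent reads full contents from disk itself."""
    pointers = "\n".join(f"- {path}" for path in changed_files)
    return (
        "Changed files (read them from disk; cwd is the repo root):\n"
        f"{pointers}\n\n"
        f"<diff_content>\n{diff}\n</diff_content>\n"
    )


# Old behaviour: two ~270KB copies of flowctl.py embedded verbatim.
print(2 * 270_000 > CURSOR_ARGV_PROMPT_MAX)  # True

prompt = build_pointer_prompt(
    "+ one-line fix",
    ["plugins/flow-next/scripts/flowctl.py", ".flow/bin/flowctl.py"],
)
print(len(prompt) < CURSOR_ARGV_PROMPT_MAX)  # True
```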
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 46864c8bbb
…— fn-74 Per the agentic-review principle (CLAUDE.md: "the host agent IS the intelligence"; backends run with file access — codex sandbox cwd=repo_root, copilot --add-dir, cursor --mode ask). All three CLI backends now pass the diff + pointers and read changed files from disk instead of embedding up to a 500KB budget. This is the existing budget-overflow "read from disk" path, now always-on: smaller/cheaper prompts, no argv blow-ups, parity with rp (RepoPrompt Builder already selects context agentically). All 9 review sites (codex/copilot/cursor impl/plan/completion) updated; the get_embedded_file_contents helper is now unused (removed in a follow-up). Suite 1336 OK. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 136b1e90e3
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be51d52f93
… fn-74 (Finding B) Source-aware coercion completing Finding A. `resolve_review_spec` gains an opt-in `return_source` reporting which precedence rung produced the spec (task/epic/env/ config/hint). The codex/copilot/cursor resolve-helpers now coerce a non-self backend to their own default ONLY when it came from an env/config DEFAULT (e.g. `review.backend=rp`), while still honoring a deliberate per-task/per-epic cross-backend `review` spec (documented behavior). Previously codex/copilot passed the config default straight through, so `flowctl copilot impl-review` under `review.backend=rp` stamped `spec:"rp"` / `model:gpt-5.2` on the receipt while running gpt-5.5; cursor already coerced but unconditionally (overrode per-task specs too — now aligned). Verified end-to-end: copilot review under config=rp now records `copilot:gpt-5.5:high` (was `rp`/`gpt-5.2`). +5 tests (return_source tagging; codex/copilot coerce-config + honor-per-task; cursor per-task-honored). Suite 1345 OK; byte-parity; mirror in sync. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
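Source-aware coercion is easiest to see with the precedence ladder written out. This is a minimal sketch, assuming the rung order given in the commit (task, epic, env, config, then hint). Both functions are hypothetical stand-ins, not the real `resolve_review_spec(..., return_source=...)` signature.

```python
from typing import Optional, Tuple

# Precedence rungs named in the commit, highest first; "hint" is the last resort.
RUNGS = ("task", "epic", "env", "config")


def resolve_review_spec(task: Optional[str], epic: Optional[str], env: Optional[str],
                        config: Optional[str], hint: str) -> Tuple[str, str]:
    """Return (spec, source) from the first rung that is set."""
    for source, value in zip(RUNGS, (task, epic, env, config)):
        if value:
            return value, source
    return hint, "hint"


def coerce_for(own_backend: str, own_default: str, spec: str, source: str) -> str:
    if spec.split(":", 1)[0] == own_backend:
        return spec
    if source in ("env", "config"):
        # A repo-wide default aimed at another backend (e.g. review.backend=rp).
        return own_default
    # A deliberate per-task/per-epic cross-backend spec is honored.
    return spec


spec, source = resolve_review_spec(None, None, None, "rp", "codex")
print(coerce_for("copilot", "copilot:gpt-5.5:high", spec, source))  # copilot:gpt-5.5:high
```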
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 472bfcf1b8
…t hardening — fn-74
Five P2s from the codex auto-reviewer, all verified valid (several exposed by the no-embed course change):
- T4: run_codex_exec / run_copilot_exec now run with cwd=repo_root (like cursor). The no-embed change made reviewers read repo-relative files from disk, which broke when launched from a subdir. run_codex_exec gained a repo_root param; all callers pass it. Verified: codex impl-review from a subdir → SHIP.
- T7: cursor embeds the diff in its positional-argv prompt (CURSOR_ARGV_PROMPT_MAX 30k); a static diff cap cannot guarantee a fit because the spec/template overhead varies. New fit_cursor_diff_to_budget() sizes the diff to the budget left under the cap (and drops it entirely if the diff-less prompt already overflows; cursor reads files from disk). Verified: a 129KB-diff review now fits (29,733 chars) and produces a verdict.
- T1: cursor/codex/copilot validate + deep-pass now use the source-aware coercion (they were calling resolve_review_spec directly, bypassing Finding B).
- T6: Ralph config.env FLOW_COPILOT_MODEL gpt-5.2 -> gpt-5.5 (1.0.65 rejects gpt-5.2); catalog comment realigned to the registry.
- T5: /flow-next:work can route cursor/copilot reviews: REVIEW_MODE enum + worker Phase-4 gate + --review allowlist broadened (work routes via configured-backend passthrough to impl-review, which already supports both); sync-codex worker-prompt heredoc updated.
Stale (T2: fn-74 tasks already done) and pre-existing (T3: per-spec default_review on plan-review, all backends) findings were replied to, not changed. Suite 1345 OK; byte-parity; mirror sync.
Known limitation: cursor completion-review of a very-large-spec epic can overflow the argv cap on spec/template alone (separate from diff embedding); it fails closed cleanly.
Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
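The T7 sizing rule amounts to subtracting the diff-less prompt from the cap and keeping only as much diff as fits. The sketch below is hypothetical (`fit_diff_to_budget` is not the real `fit_cursor_diff_to_budget`) and assumes the diff is appended after the rest of the prompt.

```python
CURSOR_ARGV_PROMPT_MAX = 30_000


def fit_diff_to_budget(prompt_without_diff: str, diff: str,
                       cap: int = CURSOR_ARGV_PROMPT_MAX) -> str:
    """Return as much of the diff as fits in the space left under the cap."""
    # -1 keeps the assembled prompt strictly below the cap, matching the
    # exec-side rule that rejects len(prompt) >= cap.
    budget = cap - len(prompt_without_diff) - 1
    if budget <= 0:
        # The diff-less prompt already fills the budget: drop the diff and
        # rely on the reviewer reading the changed files from disk.
        return ""
    return diff[:budget]
```

Cutting at a raw character offset can split a hunk mid-line. A fuller version might trim back to the last complete line, but the PR does not say whether flowctl does that.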
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7f475d84b9
…empty task arg — fn-74
Two findings from the re-review of 7f475d8:
- NEW-1 (gap in the T7 dynamic diff-sizing): the re-review preamble was prepended AFTER fit_cursor_diff_to_budget computed the budget from the diff-less prompt, so a RESUMED cursor review could still exceed CURSOR_ARGV_PROMPT_MAX. The impl + completion handlers now detect is_rereview, build the preamble BEFORE sizing, reserve it in the budget base, and prepend it to the final prompt. Verified: first / task-resume / standalone-resume all stay at 29,733 < cap.
- NEW-2: the impl-review workflow files passed a quoted empty "$TASK_ID" positional for branch/standalone reviews, which flowctl rejects as "Invalid task ID:" instead of entering standalone mode. All three backend workflows (codex/copilot/cursor) now build the command via an args array that omits the task arg when TASK_ID is empty.
Suite 1345 OK; byte-parity; codex mirror synced. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
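The NEW-1 ordering bug is easy to reproduce with plain strings. The function names and the 500-character preamble are illustrative assumptions. The only difference between the two assemblers is whether the preamble is counted before the diff is sized.

```python
CURSOR_ARGV_PROMPT_MAX = 30_000


def fit_diff(base_prompt: str, diff: str, cap: int = CURSOR_ARGV_PROMPT_MAX) -> str:
    budget = cap - len(base_prompt) - 1
    return "" if budget <= 0 else diff[:budget]


def assemble_buggy(base_prompt: str, diff: str, preamble: str) -> str:
    # Diff sized against the base prompt only, then the preamble is prepended.
    return preamble + base_prompt + fit_diff(base_prompt, diff)


def assemble_fixed(base_prompt: str, diff: str, preamble: str) -> str:
    # Preamble built first and reserved in the budget base before sizing.
    return preamble + base_prompt + fit_diff(preamble + base_prompt, diff)


base, diff, preamble = "b" * 20_000, "d" * 20_000, "p" * 500
print(len(assemble_buggy(base, diff, preamble)))  # 30499: over the cap
print(len(assemble_fixed(base, diff, preamble)))  # 29999: under the cap
```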
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a8b6847f3a
…-review — fn-74 Codex re-review finding: the worker gated on REVIEW_MODE but invoked `/flow-next:impl-review` bare, so a one-off `work --review=<backend>` override that differs from repo config was lost — impl-review re-resolved the backend from config, ignoring the override. The worker now passes `--review=$REVIEW_MODE` on both the initial and the NEEDS_WORK re-invocation (REVIEW_MODE already holds the per-run resolved backend, config OR override). Completes the T5 work-routing fix. Mirror synced; suite 1345 OK. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3a8d809581
…be — fn-74
Two completeness gaps from the re-review of 3a8d809:
- NEW-A: setup offered Cursor, but Ralph init (flow-next-ralph-init) only branched rp/codex/copilot, so a Cursor user who initialized Ralph got autonomous prompts that never used cursor. Mirrored the copilot handling: HAVE_CURSOR detection + a Cursor option in SKILL.md, and real per-backend cursor branches in prompt_work/plan/completion.md + config.env examples (+ a FLOW_CURSOR_MODEL runtime block). Codex mirror regenerated.
- NEW-B: cmd_copilot_check probed with a fresh --resume=<uuid>, which copilot 1.0.65 (resume-only) rejects -> false "auth failed" with valid credentials. The probe now uses --session-id (create), matching run_copilot_exec.
Suite 1345 OK; byte-parity; mirror synced. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 513caf135b
Follow-on to the Ralph-init cursor wiring (cursor Bugbot finding): ralph.sh's verify_receipt gate only fired for rp/codex/copilot, so a WORK_REVIEW=cursor (or plan/completion) Ralph loop could mark a task done WITHOUT enforcing a valid review receipt — a safety hole. Added cursor to the 3 gate conditions (plan/work/completion) plus the display labels and the "Sending via …" UI functions. verify_receipt itself is backend-generic (validates the verdict), so no further change. Suite 1345 OK; mirror synced. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 40bef68dba
…sk/diff) — fn-74 Root fix for the recurring cursor argv-overflow class. cursor delivers its whole review prompt as one positional argv arg (CURSOR_ARGV_PROMPT_MAX=30k); the diff was budgeted (T7) but plan/completion still embedded the FULL spec + task markdown unbounded, so a large spec (30k+) overflowed even with no diff. New fit_cursor_prompt_to_budget() is the final backstop on all 3 cursor handlers: if the assembled prompt exceeds the cap it prepends a "read the .flow spec/task + changed files from disk" header and truncates the embedded body (preserving the trailing rubric/verdict grammar), so cursor reviews of arbitrarily large specs/diffs always fit and the reviewer reads full context from disk instead of failing closed. codex/copilot unchanged (no argv limit); validate/deep-pass excluded (small resumed payloads). Verified: fn-74's own ~21KB spec plan-review 46909->29700 chars, VERDICT=SHIP. Suite 1350 OK (+5 CursorPromptArgvCap tests); byte-parity. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
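A prompt-level backstop in the spirit described above could look like the sketch below. It assumes the caller can hand over the embedded body and the trailing rubric/verdict grammar as separate strings. The header text and function name are hypothetical. The strict `<` passthrough matches the later boundary fix.

```python
CURSOR_ARGV_PROMPT_MAX = 30_000
DISK_READ_HEADER = (
    "NOTE: embedded context below is truncated to fit the argv limit. "
    "Read the full .flow spec/task and the changed files from disk.\n\n"
)


def fit_prompt_to_budget(body: str, rubric: str,
                         cap: int = CURSOR_ARGV_PROMPT_MAX) -> str:
    """Final backstop: keep the trailing rubric intact and trim the body."""
    prompt = body + rubric
    if len(prompt) < cap:
        return prompt  # already strictly under the cap
    room = cap - len(DISK_READ_HEADER) - len(rubric) - 1
    if room < 0:
        raise ValueError("header plus rubric alone exceed the argv cap")
    return DISK_READ_HEADER + body[:room] + rubric
```

Trimming the body rather than the tail means the reviewer still sees the verdict grammar it must answer in. The header tells it where to find the context that was cut.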
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b48a4dc0a4
…— fn-74
fit_cursor_prompt_to_budget passed a prompt through at len <= CURSOR_ARGV_PROMPT_MAX,
but run_cursor_exec rejects len >= the cap — so a prompt of EXACTLY 30000 chars slipped
the backstop and still failed closed ("prompt too large", dropped receipt). Changed the
passthrough to strictly `< CURSOR_ARGV_PROMPT_MAX` so exactly-cap prompts get trimmed to
under it; docstring clarified to match run_cursor_exec's >= rejection. +1 boundary
regression test (exactly-at-cap is trimmed, verdict preserved). Suite 1351 OK; byte-parity.
Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
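The boundary fix reduces to one invariant: every prompt the backstop passes through, the exec path must accept. The check below uses hypothetical predicate names for the two length tests and shows that the old `<=` rule fails the invariant at exactly one length.

```python
CURSOR_ARGV_PROMPT_MAX = 30_000


def exec_rejects(prompt: str) -> bool:
    return len(prompt) >= CURSOR_ARGV_PROMPT_MAX  # run_cursor_exec's rule


def old_backstop_passes(prompt: str) -> bool:
    return len(prompt) <= CURSOR_ARGV_PROMPT_MAX  # pre-fix passthrough


def new_backstop_passes(prompt: str) -> bool:
    return len(prompt) < CURSOR_ARGV_PROMPT_MAX  # fixed passthrough


lengths = (29_999, 30_000, 30_001)
old_gaps = [n for n in lengths if old_backstop_passes("x" * n) and exec_rejects("x" * n)]
new_gaps = [n for n in lengths if new_backstop_passes("x" * n) and exec_rejects("x" * n)]
print(old_gaps)  # [30000]
print(new_gaps)  # []
```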
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5511fafb53
…argv cap — fn-74 The general cursor argv backstop (fit_cursor_prompt_to_budget) was applied to the 3 primary review handlers but NOT to _run_validator_pass / _run_deep_pass, which render the findings payload and call run_cursor_exec directly. A verbose findings JSONL could overflow CURSOR_ARGV_PROMPT_MAX and fail the optional validator/deep phase of an otherwise-valid review. Added the fit backstop before both run_cursor_exec calls, so ALL 5 cursor argv dispatches (impl/plan/completion/validate/deep) are now bounded. Suite 1351 OK. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 7556a71.
flowctl's `done` writes runtime state to a separate `.state.json` (uncommitted) and never touches the definition, so the committed fn-74 task definitions kept their plan-time `status: todo` even though the work shipped (flowctl reports done via the state merge; `ready` returns none). A fresh clone reads the committed definition, so this advertised the landed cursor-backend tasks as todo/backlog. Set the 4 definitions to done — consistent with the clean landed-spec precedent (e.g. fn-1's task definitions are committed `done`). Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
…letion reviews — fn-74 (T3) Last PR #184 review finding. `flowctl <backend> plan-review <spec>` (and completion-review) ignored a per-spec default_review (set via `spec set-backend`) because the handlers pass task_id=None and resolve_review_spec only discovered default_review via a task->spec lookup. resolve_review_spec gains an optional spec_id; when invoked spec-scoped (no task) it reads the spec's default_review directly (source "epic", same precedence, before env/config). The 3 resolve helpers thread spec_id through (keeping the source-aware coercion — "epic" is honored, not coerced); the 6 plan/completion handlers pass spec_id=epic_id (the 3 impl handlers, which have a real task, are unchanged). Pre-existing across all backends; now fixed uniformly. +2 tests; suite 1353 OK; byte-parity. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
…v/config — fn-74 Codex retest (dogfooding the final PR state) flagged a Major issue: _resolve_cursor_review_spec honored a stored per-task/per-epic cross-backend `review: codex:...` (source task/epic), but `flowctl cursor` always shells cursor-agent and Cursor's model names are format-specific (`gpt-5.5-high`, not `gpt-5.5`), so it would pass a foreign `--model` and fail — the same problem the explicit `--spec` guard rejects. Cursor now coerces ANY non-cursor resolved spec to the cursor default regardless of source (a `cursor:<model>` spec is still honored). codex/copilot stay lenient (OpenAI-style model names cross over). Test updated (per-task cross-backend honors→coerces). Suite 1353 OK; byte-parity. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
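After this change the cursor helper is strict regardless of source, while codex/copilot keep the source-aware leniency. A minimal sketch with a hypothetical function name:

```python
CURSOR_DEFAULT_SPEC = "cursor:gpt-5.5-high"


def resolve_cursor_spec(spec: str, source: str) -> str:
    """Coerce any non-cursor spec, whatever precedence rung (source) produced it."""
    # `source` is intentionally not consulted: Cursor model names are
    # format-specific (gpt-5.5-high rather than gpt-5.5), so even a deliberate
    # per-task codex:... spec would hand cursor-agent a model it cannot use.
    if spec.split(":", 1)[0] == "cursor":
        return spec
    return CURSOR_DEFAULT_SPEC
```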
…disk-read cue — fn-74 Codex retest edge: when a huge spec/template leaves no argv budget for the diff, fit_cursor_diff_to_budget dropped it to "" — and if the diff-less prompt still fit under the cap, fit_cursor_prompt_to_budget added no disk-read header, so cursor could review branch changes with no diff AND no cue to read the changed files. The drop case now emits a short read-from-disk pointer in the <diff_content> slot (never empty); if that pushes the prompt over the cap, the prompt-level backstop's header still covers it. +1 test. Suite 1354 OK; byte-parity. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
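The "never empty" rule for the diff slot is a one-line fallback. The pointer wording and function name below are assumptions; only the `<diff_content>` slot name comes from the PR.

```python
DIFF_POINTER = (
    "(diff omitted to fit the argv limit; inspect the branch changes with git "
    "and read the changed files from disk)"
)


def render_diff_slot(fitted_diff: str) -> str:
    """Fill the <diff_content> slot; fall back to a disk-read pointer when empty."""
    body = fitted_diff if fitted_diff else DIFF_POINTER
    return f"<diff_content>\n{body}\n</diff_content>"
```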
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c38f1dc63e
Commented lines (backend dispatch table):
|------------|------|
| `codex` | [workflow-codex.md](workflow-codex.md) |
| `copilot` | [workflow-copilot.md](workflow-copilot.md) |
| `cursor` | [workflow-cursor.md](workflow-cursor.md) |
Route cursor task overrides before dispatch
When a task sets review: cursor:<model> but the project default is still another backend, Phase 0 still derives BACKEND from flowctl review-backend, which only reads env/config and cannot see the task id. This new cursor branch is therefore skipped; the selected codex/copilot/rp workflow runs instead, and the flowctl command can end up using Cursor's model string (for example gpt-5.5-high) with the wrong CLI or ignoring the task override entirely. Resolve the backend with task/spec context before this dispatch, or otherwise force the invoked workflow to match the stored cursor override.

TL;DR
Adds `cursor` as a fourth cross-model review backend (alongside `rp`/`codex`/`copilot`), shelling out to Cursor's `cursor-agent` CLI in headless read-only mode (`-p --output-format json --trust --mode ask`, `cwd=repo_root`). Reviews are billed to Cursor (your existing subscription, no separate API key) and reach models the other backends can't, all in one place: `gpt-5.5-high` (1M context, the default), the `gpt-5.3-codex` family, `composer-2.5`, and `claude-opus-4-8-thinking-high`. It is a parity port of the `copilot` backend (fn-28). There are no new review features; it keeps the same verdict grammar, receipt schema, session-resume, and validator/deep-pass shapes, and is wired through `/flow-next:impl-review`, `/flow-next:plan-review`, `/flow-next:spec-completion-review`, and `/flow-next:setup`.

Spec: `fn-74` · 4 tasks, all `done`, completion-review SHIP · rebased onto `main` @ 2.4.0 · full suite 1334 passing. No version bump (rides the next release; entry under `## Unreleased`).

Acceptance coverage (R1–R14, all covered)
| Requirement | Task |
|------------|------|
| `cursor` in `BACKEND_REGISTRY`/`VALID_BACKENDS`; `review-backend` resolves it | .1 |
| `cursor:<model>` valid, effort rejected (model-yes/effort-no) | .1 |
| `run_cursor_exec` (flags, `cwd=repo_root`, resume-only, JSON parse, timeout, oversized-prompt error) | .1 |
| `flowctl cursor check` (honors `is_error`, not just exit code) | .1 |
| `cursor impl-review` writes a `mode:"cursor"` receipt with no `effort` key | .2 |
| `plan-review`/`completion-review`/`validate`/`deep-pass` dispatch | .2 |
| `mode == "cursor"` | .2 |
| `--mode ask` asserted + optional live clean-tree smoke test | .1 flag / .2 live |
| `cursor` + every `--review=…` string includes it | .3 |
| `flow-next-setup` accepts `cursor`/`cursor:<model>` | .3 |
| `test_cursor_run_exec`/`test_cursor_review_commands`/`test_backend_spec` | .1, .2 |
| `sync-codex.sh` regenerated; cursor surfaces in the Codex mirror | .3 |
| | .4 |
| `effort` absent | .2 |

Critical changes
- `plugins/flow-next/scripts/flowctl.py` (+ byte-identical `.flow/bin/flowctl.py`, ~+1.2k): `BACKEND_REGISTRY["cursor"]` (10 models, `efforts: None`, default `gpt-5.5-high`); `require_cursor` / `get_cursor_version` / `run_cursor_exec` / `_parse_cursor_result`; `cmd_cursor_check` (+ `--skip-probe`); the five `cmd_cursor_*` review handlers plus `elif backend == "cursor"` dispatch in the shared validator/deep-pass; and the cursor-only `--spec` guard (rejects a non-cursor spec across resolve/validator/deep).
- `workflow-cursor.md` ×2 (`flow-next-impl-review/` and `flow-next-spec-completion-review/`), plus a Cursor section in `flow-next-plan-review/workflow.md`, the Phase-0 dispatch rows, and `--review=rp|codex|copilot|cursor|none` across the review skills and command hints.
- `test_cursor_run_exec.py`, `test_cursor_review_commands.py`, `test_cursor_clean_tree.py` (live, gated on `cursor-agent`), and `test_backend_spec.py` cursor cases. Suite: 1334 OK / 2 skipped.
- `plugins/flow-next/codex/**` regenerated via `scripts/sync-codex.sh`.
- `docs/flowctl.md` (new cursor backend section + config enum), `README.md`, `GLOSSARY.md`, `docs/skills.md`, `docs/teams.md`, `CHANGELOG.md` (`## Unreleased`). Downstream (committed in their own repos): flow-next.dev (cursor row coming→shipped), AI×SDLC, GrowthFactors microsite, Obsidian vault.

Decisions & memory
- Triage `--backend` choices stay `codex|copilot`: cursor reviews use the deterministic whitelist by default (zero extra dependency); a cursor user who enables `FLOW_TRIAGE_LLM=1` also needs codex or copilot present.
- Resume-only session: the first call omits `--resume` and persists Cursor's generated `session_id`; never fabricate a first-call id.
- `cursor-agent` headless CLI"; this makes that published claim true.
- `bug/integration/adding-a-review-backend-sweep-all-2026-06-29`: adding a backend means sweeping every enumeration site (config table, stage lists, prose), not just the obvious lists.

Review trail
Each task passed codex impl-review SHIP. Codex completion-review returned SHIP after fixing 3 introduced bugs (`cursor check` ignoring `is_error`; cursor `--spec` accepting a non-cursor backend; skill examples missing the required `--base`/`--files`) and adding 5 regression tests. Later completion-review rounds churned on behaviors verified identical in codex/copilot (the resolve-fallback); these were left as parity, not treated as cursor-introduced regressions.

Where to look
- `plugins/flow-next/scripts/flowctl.py`: the `BACKEND_REGISTRY` cursor entry, `run_cursor_exec`, `cmd_cursor_*`, and the `--spec` cursor guard.
- `plugins/flow-next/skills/flow-next-impl-review/workflow-cursor.md`: the per-backend workflow (resume-only, `mode:"cursor"` receipt, no-effort anti-patterns).
- `plugins/flow-next/tests/test_cursor_review_commands.py`: receipt parity plus the `is_error` / non-cursor-`--spec` regression cases.

Generated by `/flow-next:make-pr` from `fn-74-cursor-review-backend-cursor-agent-cli` against `main`. Parity port of fn-28 (copilot); rebased onto 2.4.0.
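For orientation, here is a hedged Python sketch of two pieces this PR describes. The first builds the headless `cursor-agent` argv: a positional prompt rather than stdin, with `--resume` passed only after the first call. The second parses the JSON output as either a single object or JSON lines, and treats empty, unparseable, or `is_error` output as a backend failure so it can never read as SHIP. Several details are assumptions about the CLI: the `--model` flag and the `result` field name. `build_cursor_argv`, `parse_cursor_result`, and `CursorBackendError` are hypothetical names, not flowctl's `run_cursor_exec` / `_parse_cursor_result`.

```python
import json
from typing import List, Optional, Tuple


class CursorBackendError(RuntimeError):
    """Hypothetical error type: any failure here must block the review, never pass it."""


def build_cursor_argv(prompt: str, model: str, session_id: Optional[str] = None) -> List[str]:
    """Headless, read-only invocation with the flags listed in this PR."""
    argv = ["cursor-agent", "-p", "--output-format", "json", "--trust", "--mode", "ask"]
    argv += ["--model", model]  # assumption: model flag name
    if session_id:  # resume-only: the first call omits --resume
        argv += ["--resume", session_id]
    argv.append(prompt)  # positional argv prompt, not stdin
    return argv


def parse_cursor_result(stdout: str) -> Tuple[str, Optional[str]]:
    """Return (review text, session_id) from a single object or a JSON-lines stream."""
    text = stdout.strip()
    if not text:
        raise CursorBackendError("empty cursor-agent output")
    try:
        objs = [json.loads(text)]  # single (possibly pretty-printed) object
    except json.JSONDecodeError:
        objs = []
        for line in text.splitlines():  # streaming JSON lines; skip non-JSON noise
            line = line.strip()
            if not line:
                continue
            try:
                objs.append(json.loads(line))
            except json.JSONDecodeError:
                continue
    results = [o for o in objs if isinstance(o, dict) and "result" in o]
    if not results:
        raise CursorBackendError("no result object in cursor-agent output")
    final = results[-1]  # the last result object wins in a stream
    if final.get("is_error"):
        raise CursorBackendError(f"cursor-agent reported an error: {final.get('result')!r}")
    return str(final["result"]), final.get("session_id")
```

A caller would run the argv with `subprocess.run(argv, cwd=repo_root, capture_output=True, text=True, timeout=...)` and also treat a non-zero exit or a timeout as a failure. Reading `session_id` back from the first call is what keeps the session resume-only: the id always comes from Cursor, never from flowctl.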