feat(review): Cursor review backend (cursor-agent CLI) — fn-74 by gmickel · Pull Request #184 · gmickel/flow-next

gmickel · 2026-06-30T05:10:58Z

TL;DR

Adds cursor as a fourth cross-model review backend (alongside rp / codex / copilot), shelling out to Cursor's cursor-agent CLI in headless read-only mode (-p --output-format json --trust --mode ask, cwd=repo_root). Reviews are Cursor-billed (your existing subscription, no separate API key) and reach models the others can't in one place — gpt-5.5-high (1M ctx, the default), the gpt-5.3-codex family, composer-2.5, claude-opus-4-8-thinking-high. A parity port of the copilot backend (fn-28) — no new review features, same verdict grammar, receipt schema, session-resume, and validator/deep-pass shapes — wired through /flow-next:impl-review, /flow-next:plan-review, /flow-next:spec-completion-review, and /flow-next:setup.

Spec: fn-74 · 4 tasks, all done, completion-review SHIP · rebased onto main @ 2.4.0 · full suite 1334 passing. No version bump (rides the next release; entry under ## Unreleased).

Acceptance coverage (R1–R14 — all covered)

| R-ID | What | Task |
|---|---|---|
| R1 | `cursor` in `BACKEND_REGISTRY`/`VALID_BACKENDS`; `review-backend` resolves it | `.1` |
| R2 | spec grammar: `cursor:<model>` valid, effort rejected (model-yes/effort-no) | `.1` |
| R3 | `run_cursor_exec` (flags, `cwd=repo_root`, resume-only, JSON parse, timeout, oversized-prompt error) | `.1` |
| R4 | `flowctl cursor check` (honors `is_error`, not just exit code) | `.1` |
| R5 | `cursor impl-review` writes `mode:"cursor"` receipt, no `effort` key | `.2` |
| R6 | `plan-review` / `completion-review` / `validate` / `deep-pass` dispatch | `.2` |
| R7 | session resume only when prior receipt `mode == "cursor"` | `.2` |
| R8 | read-only: `--mode ask` asserted + optional live clean-tree smoke test | `.1` flag / `.2` live |
| R9 | skills route `cursor` + every `--review=…` string includes it | `.3` |
| R10 | `flow-next-setup` accepts `cursor` / `cursor:<model>` | `.3` |
| R11 | tests (`test_cursor_run_exec` / `test_cursor_review_commands` / `test_backend_spec`) | `.1`, `.2` |
| R12 | `sync-codex.sh` regenerated; cursor surfaces in the Codex mirror | `.3` |
| R13 | docs chain: repo + flow-next.dev + AI×SDLC + GF + vault | `.4` |
| R14 | impl/completion receipts carry copilot's rigor fields and assert `effort` absent | `.2` |
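
The R2 spec grammar (model-yes / effort-no) can be illustrated with a minimal sketch. The function name `parse_cursor_spec`, the use of `ValueError`, and the fallback to the default for a bare `cursor` are assumptions made for illustration and are not taken from flowctl.py; only the `cursor:<model>` shape, the rejected effort segment, and the `gpt-5.5-high` default come from this PR.

```python
DEFAULT_CURSOR_MODEL = "gpt-5.5-high"  # default model named in the PR description


def parse_cursor_spec(spec: str) -> dict:
    """Parse 'cursor' or 'cursor:<model>' (hypothetical helper, not flowctl's own)."""
    parts = spec.split(":")
    if parts[0] != "cursor":
        # A foreign backend must not run under the cursor commands.
        raise ValueError(f"not a cursor spec: {spec!r}")
    if len(parts) > 2:
        # Cursor specs are model-yes / effort-no: a third segment is rejected.
        raise ValueError(f"cursor specs take no effort segment: {spec!r}")
    model = parts[1] if len(parts) == 2 and parts[1] else DEFAULT_CURSOR_MODEL
    # Deliberately no "effort" key, mirroring the receipt shape described in R5/R14.
    return {"backend": "cursor", "model": model}
```

Under these assumptions `cursor:gpt-5.5-high` parses, while `cursor:gpt-5.5-high:high` and any non-cursor spec raise.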

Critical changes

plugins/flow-next/scripts/flowctl.py (+ byte-identical .flow/bin/flowctl.py, ~+1.2k) — BACKEND_REGISTRY["cursor"] (10 models, efforts: None, default gpt-5.5-high); require_cursor / get_cursor_version / run_cursor_exec / _parse_cursor_result; cmd_cursor_check (+--skip-probe); the five cmd_cursor_* review handlers + elif backend == "cursor" dispatch in the shared validator/deep-pass; and the cursor-only --spec guard (rejects a non-cursor spec across resolve/validator/deep).
New workflow-cursor.md ×2 — flow-next-impl-review/ and flow-next-spec-completion-review/; plus a Cursor section in flow-next-plan-review/workflow.md, the Phase-0 dispatch rows, and --review=rp|codex|copilot|cursor|none across the review skills + command hints.
Tests — test_cursor_run_exec.py, test_cursor_review_commands.py, test_cursor_clean_tree.py (live, gated on cursor-agent), test_backend_spec.py cursor cases. Suite: 1334 OK / 2 skipped.
Codex mirror — plugins/flow-next/codex/** regenerated via scripts/sync-codex.sh.
Docs — docs/flowctl.md (new cursor backend section + config enum), README.md, GLOSSARY.md, docs/skills.md, docs/teams.md, CHANGELOG.md (## Unreleased). Downstream (committed in their own repos): flow-next.dev (cursor row coming→shipped), AI×SDLC, GrowthFactors microsite, Obsidian vault.

Decisions & memory

Triage LLM judge stays codex|copilot — cursor reviews use the deterministic whitelist by default (zero extra dependency); a cursor user who enables FLOW_TRIAGE_LLM=1 also needs codex/copilot present.
Session model is resume-only — first call omits --resume and persists Cursor's generated session_id; never fabricate a first-call id.
Doc-drift closed — the GrowthFactors cross-model-review spec already advertised "Cursor via its cursor-agent headless CLI"; this makes that published claim true.
Memory left behind: bug/integration/adding-a-review-backend-sweep-all-2026-06-29 — adding a backend means sweeping every enumeration site (config table, stage lists, prose), not just the obvious lists.
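
The resume-only session model can be sketched as an argv builder. This is an illustration under stated assumptions: the helper name `build_cursor_argv`, the flag ordering, and the `--resume <id>` (space-separated) form are guesses; the flags themselves (`-p --output-format json --trust --mode ask`) and the positional prompt are the ones listed in this PR.

```python
from typing import List, Optional


def build_cursor_argv(prompt: str, session_id: Optional[str] = None) -> List[str]:
    """Assemble a headless, read-only cursor-agent invocation (hypothetical helper)."""
    argv = ["cursor-agent", "-p", "--output-format", "json", "--trust", "--mode", "ask"]
    if session_id:
        # Resume-only: pass an id only if Cursor generated it on an earlier call.
        # The first call omits --resume entirely; an id is never fabricated.
        argv += ["--resume", session_id]  # assumed flag form
    argv.append(prompt)  # the prompt travels as one positional argument, not stdin
    return argv
```

The caller would run this with `cwd=repo_root` and persist the `session_id` from the JSON result for the next round.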

Review trail

Each task passed codex impl-review SHIP; codex completion-review SHIP after fixing 3 introduced bugs (cursor check ignoring is_error; cursor --spec accepting a non-cursor backend; skill examples missing required --base/--files) + 5 regression tests. Later completion-review rounds churned on behaviors verified identical in codex/copilot (the resolve-fallback) — left as parity, not cursor-introduced regressions.

Where to look

plugins/flow-next/scripts/flowctl.py — BACKEND_REGISTRY cursor entry, run_cursor_exec, cmd_cursor_*, and the --spec cursor guard.
plugins/flow-next/skills/flow-next-impl-review/workflow-cursor.md — the per-backend workflow (resume-only, mode:"cursor" receipt, no-effort anti-patterns).
plugins/flow-next/tests/test_cursor_review_commands.py — receipt-parity + the is_error / non-cursor---spec regression cases.

Generated by /flow-next:make-pr from fn-74-cursor-review-backend-cursor-agent-cli against main. Parity port of fn-28 (copilot); rebased onto 2.4.0.

…n-review SHIP 4 sequential tasks (parity port: code → wire → document); flowctl core split into proof (.1) + commands (.2). codex plan-review SHIP after 2 rounds — folded its findings: scoped R1 to review-backend's real sources, explicit large-prompt error (no silent argv read-back), R8 clean-tree moved to a real .2 live test, docs-site version bump release-only, R14 parity limited to rigor fields + effort-absent assertion. Standalone (no spec deps); soft fn-54 workflow*.md note. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

…+ check + tests

- Add `cursor` to BACKEND_REGISTRY (model-yes / effort-no shape, default gpt-5.5-high); VALID_BACKENDS derives; `review-backend` reports cursor from config.json + FLOW_REVIEW_BACKEND (R1, R2)
- require_cursor / get_cursor_version / run_cursor_exec: positional-argv prompt (NOT stdin), resume-only session (first call omits --resume, captures minted id), cwd=repo_root, --mode ask --trust, NO --effort, explicit prompt-too-large raise, non-zero on is_error/timeout/CLI-failure (R3)
- _parse_cursor_result: single-object + streaming JSON-lines, empty/unparseable → backend failure (never false SHIP)
- `cursor check [--skip-probe]` subcommand + cmd_cursor_check → {available, version, authed} text + --json, schema-aligned to copilot (R4)
- Tests: test_cursor_run_exec.py (success/is_error/timeout/first-call-omits-resume/resume-passes-id/cwd=repo_root/mode-ask-flag/prompt-too-large) + test_backend_spec.py cursor cases; full suite 1271 green (R11)
- Sync byte-identical dogfood copy .flow/bin/flowctl.py

Task: fn-74-cursor-review-backend-cursor-agent-cli.1
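
The parsing rule described for `_parse_cursor_result` (accept a single JSON object or streaming JSON-lines, and treat empty or unparseable output as a backend failure rather than a verdict) might look like the sketch below. The names `parse_cursor_output` and `CursorBackendError`, and the choice to keep the last parseable object of a stream, are assumptions for illustration only.

```python
import json
from typing import Any, Dict, Optional


class CursorBackendError(RuntimeError):
    """Raised when cursor-agent output cannot be trusted (hypothetical type)."""


def parse_cursor_output(stdout: str) -> Dict[str, Any]:
    text = stdout.strip()
    if not text:
        raise CursorBackendError("empty output")
    obj: Optional[Any] = None
    try:
        obj = json.loads(text)  # single-object form
    except json.JSONDecodeError:
        # Streaming form: one JSON object per line; keep the last one that parses.
        for line in text.splitlines():
            try:
                candidate = json.loads(line)
            except json.JSONDecodeError:
                continue
            if isinstance(candidate, dict):
                obj = candidate
    if not isinstance(obj, dict):
        raise CursorBackendError("unparseable output")
    if obj.get("is_error"):
        raise CursorBackendError("cursor-agent reported is_error")
    return obj
```

Raising on every doubtful case is what keeps a garbled response from ever being read as SHIP.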

…deep + own-mode mode:cursor receipts (fn-74.2)

- 3 handlers (cmd_cursor_impl_review/_plan_review/_completion_review) + _resolve_cursor_review_spec, mirroring copilot but resume-only + no effort
- validate/deep-pass dispatch via new 'elif backend == cursor' branches in the shared _run_validator_pass/_run_deep_pass spines + cmd_cursor_validate/_deep_pass wrappers
- receipts: mode:cursor, spec:cursor:<model>, model, NO effort key; carry copilot rigor fields (suppressed/introduced-vs-pre_existing/unaddressed)
- own-mode resume guard: pass --resume only when the prior receipt has mode==cursor (cross-backend receipt ⇒ fresh session; no uuid fabrication, resume-only)
- 5 cursor review subcommands wired into the subparser (cursor:<model> spec, no effort); triage --backend choices unchanged (codex|copilot)
- tests: test_cursor_review_commands.py (handler/dispatch/resume-guard/rigor parity/effort-absent) + test_cursor_clean_tree.py (R8 live smoke, gated on cursor-agent; ran real review, tree clean)
- .flow/bin/flowctl.py kept byte-identical (dogfood parity)

Task: fn-74-cursor-review-backend-cursor-agent-cli.2
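
The own-mode resume guard can be sketched as a small receipt check. This is a hedged illustration: the helper name `session_to_resume` and the assumption that the receipt is a JSON file carrying a `session_id` key are not confirmed by this PR; only the `mode == "cursor"` condition is.

```python
import json
from pathlib import Path
from typing import Optional


def session_to_resume(receipt_path: Path) -> Optional[str]:
    """Return a session id to resume, or None to start a fresh session."""
    try:
        receipt = json.loads(receipt_path.read_text())
    except (OSError, ValueError):
        return None  # no receipt, or an unreadable one: start fresh
    if not isinstance(receipt, dict) or receipt.get("mode") != "cursor":
        # A receipt written by another backend never seeds a cursor session.
        return None
    session_id = receipt.get("session_id")  # assumed receipt field
    return session_id if isinstance(session_id, str) and session_id else None
```

A `None` result means the next call omits `--resume` and Cursor mints a new session.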

… — recovered from lost-worker Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

…flow-cursor.md x2, --review literals, review.backend (fn-74.3)

- new workflow-cursor.md in impl-review + spec-completion-review (mirror copilot; model-yes/effort-no, --mode ask read-only, resume-only session, mode==cursor receipt, no effort key)
- plan-review: Cursor Backend block (SKILL) + Cursor Backend Workflow section (workflow.md) + anti-patterns
- impl-review/spec-completion workflow-common.md: cursor dispatch-table row + `cursor)` deep-pass/validate branches
- cursor added to every --review=rp|codex|copilot|none literal across the 8 hand-edited files (+ backend-at-a-glance, critical-rules, re-review)
- flow-next-setup: HAVE_CURSOR detect + Cursor CLI review option + cursor answer-mapping + power-user spec note (cursor:gpt-5.5-high, no :effort)
- scripts/sync-codex.sh re-run: 6 codex-mirror copies regenerated; R2 ask-block injection verified clean (genuine ask sites only); validators + full python suite green

Task: fn-74-cursor-review-backend-cursor-agent-cli.3
Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

…GELOG + skills/teams sweep (fn-74.4)

- flowctl.md: cursor in cmd list + new cursor backend section (resume-only sessions, model-yes/effort-no, --mode ask cwd=repo_root, triage note) + review-backend grammar example
- README.md: 3 backend lists (adversarial gates, blind-spots, impl-review cmd) add Cursor
- GLOSSARY.md: cross-model-review backends add Cursor (cursor)
- CHANGELOG.md: ## Unreleased cursor review backend entry (no version bump, batched)
- skills.md + teams.md: stale RepoPrompt/Codex/Copilot enumerations add Cursor

Task: fn-74-cursor-review-backend-cursor-agent-cli.4

…entry — fn-74.4 Records the fn-74.4 doc sweep (repo + flow-next.dev + AI×SDLC + GF microsite + vault), mirroring the fn-68.6 downstream-coverage precedent. The downstream surfaces are committed in their own repos (out of this repo's reviewable diff). Task: fn-74-cursor-review-backend-cursor-agent-cli.4

…late (fn-74.4)

- flowctl.md config table: review.backend was 'rp, codex, none' (stale, omitted copilot too) → 'rp, codex, copilot, cursor, none' + spec-form note (codex impl-review finding)
- flowctl.md config-set example comment: same enum fix
- setup usage.md template: review.backend comment rp|codex|copilot|none → +cursor

Task: fn-74-cursor-review-backend-cursor-agent-cli.4

- teams.md stage-[6] 'Backends:' list rp/codex/copilot/none → +cursor + cursor:gpt-5.5-high spec-form note (codex impl-review finding) Task: fn-74-cursor-review-backend-cursor-agent-cli.4

…r enum (fn-74.4)

- sync-codex.sh regen: codex mirror setup usage.md template picks up cursor in review.backend enum (R12)
- .flow/usage.md dogfood copy matches canonical template (test_dogfood_template_parity)

Task: fn-74-cursor-review-backend-cursor-agent-cli.4

…emory (codex impl-review SHIP) Task: fn-74-cursor-review-backend-cursor-agent-cli.4

…+ skill args) (fn-74)

Three introduced findings from codex completion-review:

- cmd_cursor_check now honors is_error: a returncode==0 + is_error:true probe is NOT authed (R4); it was checking the exit code only.
- cursor commands reject a non-cursor --spec: the validator/deep dispatch branches + _resolve_cursor_review_spec now enforce parsed.backend=="cursor", so `cursor impl-review --spec codex:...` errors instead of running cursor-agent with a foreign model + serializing spec:"codex:..." under mode:"cursor" (R5/R6/R14).
- skill examples carry required args: cursor impl-review --base, cursor plan-review --files.

+5 regression tests; codex mirror regenerated; .flow/bin/flowctl.py byte-identical.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
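
The `is_error` fix to the auth probe can be illustrated as follows. The helper name `probe_is_authed` is hypothetical, and treating unparseable probe output as "not authed" is an assumption; the rule that a zero exit code with `is_error: true` is not authed is the one described above.

```python
import json


def probe_is_authed(returncode: int, stdout: str) -> bool:
    """Decide auth status from a probe run (illustrative, not flowctl's code)."""
    if returncode != 0:
        return False
    try:
        payload = json.loads(stdout)
    except ValueError:
        return False  # assumption: output we cannot parse is not proof of auth
    # Exit code 0 is not enough: the JSON body can still flag an error.
    return isinstance(payload, dict) and not payload.get("is_error", False)
```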

…|cursor|none (fn-74)

completion-review R9: the two command argument-hints were stale (rp|codex|export, predating copilot); align with spec-completion-review/epic-review. R6 (cursor plan/completion pass None to the resolve helper) is verified codex+copilot parity (same call sites, not a cursor-introduced regression) and is left as-is.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

Round 1: 3 introduced bugs (cursor check is_error; cursor --spec backend guard; skill examples missing --base/--files) → fixed + 5 regression tests. Rounds 2-3 churned on verified codex/copilot PARITY behaviors (plan/completion resolve fallback) + self-contradictory hint guidance — not cursor-introduced regressions. Receipt verdict SHIP; suite green (1291). Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

cursor · 2026-06-30T05:11:10Z

PR Summary

Medium Risk
Large changes to review gate infrastructure in flowctl (subprocess, sessions, spec resolution, prompt assembly); mistakes could break SHIP/NEEDS_WORK flows or mis-route backends, but there is no auth or payment surface.

Overview
Introduces cursor as a cross-model review backend: BACKEND_REGISTRY, run_cursor_exec (read-only --mode ask, resume-only sessions, JSON parse, argv size limits), flowctl cursor check and the impl/plan/completion/validate/deep-pass commands, with receipts stamped mode: "cursor" and no separate effort field.

Review prompts shift to agentic context — get_embedded_file_contents is removed; codex/copilot/cursor builds rely on diff + specs and instruct reviewers to read the repo from disk. Codex and Copilot subprocess calls set cwd=repo_root so paths resolve consistently.

Copilot session handling is unified on both platforms: first call uses --session-id, later calls --resume (Copilot ≥1.0.61 made --resume resume-only on POSIX too). Auth probe uses --session-id instead of --resume. Default model strings move toward gpt-5.5; Copilot registry drops unavailable gpt-5.2 models.

resolve_review_spec gains optional spec_id for epic-scoped plan/completion reviews, optional return_source, and coercion so explicit flowctl codex|copilot commands don’t inherit a foreign env/config default backend. Cursor always coerces non-cursor resolved specs to the cursor default.

Cursor-only prompt budgeting (fit_cursor_diff_to_budget, fit_cursor_prompt_to_budget) keeps positional argv under ~30k chars. .flow/.gitignore adds pilot-runs/; a memory note documents sweeping all backend enumeration sites when adding backends.

^{Reviewed by Cursor Bugbot for commit c38f1dc. Bugbot is set up for automated code reviews on this repo. Configure here.}

cursor · 2026-06-30T05:11:11Z

Bugbot couldn't run - usage limit reached

Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit.

A user or team admin can review and increase usage limits in the Cursor dashboard.

(requestId: serverGenReqId_fa14b892-d467-432d-a7dc-aa2944bebad1)

…ackend toggle + auto-normalized keys, not part of fn-74 Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5dbe249b54

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Coerce the no-`--spec` resolve fallback to cursor: `_resolve_cursor_review_spec` now returns the cursor default when `resolve_review_spec` hands back a non-cursor backend (e.g. config `review.backend=codex`), so an explicit `--review=cursor` never runs cursor-agent with a foreign model or stamps `spec:"codex:"` under `mode:"cursor"`.
- Fail closed on oversized prompts via a non-zero return tuple instead of a raised ValueError, so cursor command handlers hit their `exit_code != 0` cleanup (drop stale receipt + structured error) rather than leaking a traceback.
- Tests updated (oversized → non-zero return) + 2 fallback-coercion cases. Suite 1336 OK.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
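
The fail-closed behaviour for oversized prompts can be sketched like this. Everything about the shapes here is assumed for illustration: the `(output, exit_code, error)` tuple, the exit code `2`, the stubbed success path, and the handler name. The 30,000-character cap and the "non-zero return, then drop the stale receipt" flow are what this PR describes.

```python
import os
from typing import Tuple

CURSOR_ARGV_PROMPT_MAX = 30_000  # cap named in this PR


def run_cursor_exec_sketch(prompt: str) -> Tuple[str, int, str]:
    """Return (output, exit_code, error) instead of raising on oversized prompts."""
    if len(prompt) >= CURSOR_ARGV_PROMPT_MAX:
        return "", 2, "prompt too large for positional argv"
    # Stand-in for the real subprocess call, which is out of scope here.
    return '{"is_error": false}', 0, ""


def handle_review(prompt: str, receipt_path: str) -> dict:
    output, exit_code, error = run_cursor_exec_sketch(prompt)
    if exit_code != 0:
        # Cleanup path: a failed run must not leave an old receipt behind.
        if os.path.exists(receipt_path):
            os.remove(receipt_path)
        return {"ok": False, "error": error}
    return {"ok": True, "output": output}
```

Returning a tuple keeps the failure on the same code path as any other backend error, so the handler's cleanup runs without a traceback.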

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ae2f7e291

…ts — fn-74 Root cause of PR #184's "prompt too large": cmd_cursor_{impl,plan,completion}_review embedded full changed-file CONTENTS up to a 500KB budget (get_embedded_file_contents, FLOW_CURSOR_EMBED_MAX_BYTES). A changed flowctl.py (~270KB ×2) produced a ~538KB prompt — far over cursor's 30KB positional-argv cap — so cursor reviews failed on any diff touching a non-trivial file, even a tiny one. cursor-agent is AGENTIC: it runs read-only (`--mode ask`) with cwd=repo_root and reads files from disk itself. So we stop embedding file contents (embedded_content="", files_embedded=False across all 3 cursor handlers) and pass the diff + pointers; the reviewer reads what it needs. Validated: a real cursor review now accepts the prompt (previously rejected) — only blocked by the Cursor account usage limit, not argv. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 46864c8bbb

…— fn-74 Per the agentic-review principle (CLAUDE.md: "the host agent IS the intelligence"; backends run with file access — codex sandbox cwd=repo_root, copilot --add-dir, cursor --mode ask). All three CLI backends now pass the diff + pointers and read changed files from disk instead of embedding up to a 500KB budget. This is the existing budget-overflow "read from disk" path, now always-on: smaller/cheaper prompts, no argv blow-ups, parity with rp (RepoPrompt Builder already selects context agentically). All 9 review sites (codex/copilot/cursor impl/plan/completion) updated; the get_embedded_file_contents helper is now unused (removed in a follow-up). Suite 1336 OK. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 136b1e90e3

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be51d52f93

… fn-74 (Finding B) Source-aware coercion completing Finding A. `resolve_review_spec` gains an opt-in `return_source` reporting which precedence rung produced the spec (task/epic/env/ config/hint). The codex/copilot/cursor resolve-helpers now coerce a non-self backend to their own default ONLY when it came from an env/config DEFAULT (e.g. `review.backend=rp`), while still honoring a deliberate per-task/per-epic cross-backend `review` spec (documented behavior). Previously codex/copilot passed the config default straight through, so `flowctl copilot impl-review` under `review.backend=rp` stamped `spec:"rp"` / `model:gpt-5.2` on the receipt while running gpt-5.5; cursor already coerced but unconditionally (overrode per-task specs too — now aligned). Verified end-to-end: copilot review under config=rp now records `copilot:gpt-5.5:high` (was `rp`/`gpt-5.2`). +5 tests (return_source tagging; codex/copilot coerce-config + honor-per-task; cursor per-task-honored). Suite 1345 OK; byte-parity; mirror in sync. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
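
The source-aware coercion in Finding B can be sketched as a single decision. The function name `coerce_to_own_backend` and its signature are hypothetical; the rule (coerce a foreign backend only when it came from an env/config default, honor task- or epic-level specs) and the source labels are taken from the description above.

```python
DEFAULT_SOURCES = {"env", "config"}  # precedence rungs that are mere defaults


def coerce_to_own_backend(spec: str, source: str, own_backend: str, own_default: str) -> str:
    """Pick the spec an explicit `flowctl <backend> ...` command should record."""
    backend = spec.split(":", 1)[0]
    if backend != own_backend and source in DEFAULT_SOURCES:
        # A repo-wide default for another backend must not leak into the receipt.
        return own_default
    # Same backend, or a deliberate per-task/per-epic cross-backend spec: honor it.
    return spec
```

For example, a copilot review under config `review.backend=rp` would record `copilot:gpt-5.5:high`, while a per-task spec naming another backend passes through unchanged.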

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 472bfcf1b8

…t hardening (fn-74)

Five P2s from the codex auto-reviewer, all verified valid (several exposed by the no-embed course change):

- T4: run_codex_exec / run_copilot_exec now run with cwd=repo_root (like cursor). The no-embed change made reviewers read repo-relative files from disk, which broke when launched from a subdir. run_codex_exec gained a repo_root param; all callers pass it. Verified: codex impl-review from a subdir → SHIP.
- T7: cursor embeds the diff in its positional-argv prompt (CURSOR_ARGV_PROMPT_MAX 30k); a static cap couldn't fit because spec/template overhead varies. New fit_cursor_diff_to_budget() sizes the diff to the budget left under the cap (drops it entirely if the diff-less prompt already overflows; cursor reads files from disk). Verified: a 129KB-diff review now fits (29,733 chars) and produces a verdict.
- T1: cursor/codex/copilot validate + deep-pass now use the source-aware coercion (were calling resolve_review_spec directly, bypassing Finding B).
- T6: Ralph config.env FLOW_COPILOT_MODEL gpt-5.2 -> gpt-5.5 (1.0.65 rejects gpt-5.2); catalog comment realigned to the registry.
- T5: /flow-next:work can route cursor/copilot reviews: REVIEW_MODE enum + worker Phase-4 gate + --review allowlist broadened (work routes via configured-backend passthrough to impl-review, which already supports both); sync-codex worker-prompt heredoc updated.

Stale (T2: fn-74 tasks already done) + pre-existing (T3: per-spec default_review on plan-review, all backends) replied, not changed. Suite 1345 OK; byte-parity; mirror sync.

Known limitation: cursor completion-review of a very-large-spec epic can overflow the argv cap on spec/template alone (separate from diff embedding); it fails closed cleanly.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f475d84b9

…empty task arg (fn-74)

Two findings from the re-review of 7f475d8:

- NEW-1 (gap in the T7 dynamic diff-sizing): the re-review preamble was prepended AFTER fit_cursor_diff_to_budget computed the budget from the diff-less prompt, so a RESUMED cursor review could still exceed CURSOR_ARGV_PROMPT_MAX. The impl + completion handlers now detect is_rereview + build the preamble BEFORE sizing and reserve it in the budget base (and prepend it to the final prompt). Verified: first / task-resume / standalone-resume all stay at 29,733 < cap.
- NEW-2: the impl-review workflow files passed a quoted empty "$TASK_ID" positional for branch/standalone reviews, which flowctl rejects as "Invalid task ID:" instead of entering standalone mode. All three backend workflows (codex/copilot/cursor) now build the command via an args array that omits the task arg when TASK_ID is empty.

Suite 1345 OK; byte-parity; codex mirror synced.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
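
The dynamic diff sizing, with the NEW-1 ordering fix applied, can be sketched as below. This is a minimal illustration and not the actual `fit_cursor_diff_to_budget`: the function name, its parameters, and the truncation note text are assumptions. What it demonstrates is the point of the fix: the re-review preamble is counted in the budget base before the diff is sized.

```python
CURSOR_ARGV_PROMPT_MAX = 30_000  # cap named in this PR
TRUNCATION_NOTE = "\n[diff truncated: read the changed files from disk]\n"  # assumed wording


def fit_diff_to_budget(diffless_prompt: str, diff: str, preamble: str = "",
                       cap: int = CURSOR_ARGV_PROMPT_MAX) -> str:
    """Return as much of `diff` as fits strictly under `cap` with everything else."""
    # Reserve the preamble up front; sizing first and prepending later overflows.
    budget = cap - 1 - len(preamble) - len(diffless_prompt)
    if budget <= 0:
        return ""  # the diff-less prompt already fills the cap: drop the diff
    if len(diff) <= budget:
        return diff
    keep = budget - len(TRUNCATION_NOTE)
    return diff[:keep] + TRUNCATION_NOTE if keep > 0 else ""
```

The final prompt would then be `preamble + diffless_prompt` with the fitted diff embedded, and its total length stays under the cap on first and resumed calls alike.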

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8b6847f3a

…-review — fn-74 Codex re-review finding: the worker gated on REVIEW_MODE but invoked `/flow-next:impl-review` bare, so a one-off `work --review=<backend>` override that differs from repo config was lost — impl-review re-resolved the backend from config, ignoring the override. The worker now passes `--review=$REVIEW_MODE` on both the initial and the NEEDS_WORK re-invocation (REVIEW_MODE already holds the per-run resolved backend, config OR override). Completes the T5 work-routing fix. Mirror synced; suite 1345 OK. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a8d809581

…be (fn-74)

Two completeness gaps from the re-review of 3a8d809:

- NEW-A: setup offered Cursor but Ralph init (flow-next-ralph-init) only branched rp/codex/copilot, so a Cursor user who inits Ralph got autonomous prompts that never used cursor. Mirrored the copilot handling: HAVE_CURSOR detection + a Cursor option in SKILL.md, and real per-backend cursor branches in prompt_work/plan/completion.md + config.env examples (+ a FLOW_CURSOR_MODEL runtime block). Codex mirror regenerated.
- NEW-B: cmd_copilot_check probed with a fresh --resume=<uuid>, which copilot 1.0.65 (resume-only) rejects -> false "auth failed" with valid credentials. The probe now uses --session-id (create), matching run_copilot_exec.

Suite 1345 OK; byte-parity; mirror synced.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 513caf135b

Follow-on to the Ralph-init cursor wiring (cursor Bugbot finding): ralph.sh's verify_receipt gate only fired for rp/codex/copilot, so a WORK_REVIEW=cursor (or plan/completion) Ralph loop could mark a task done WITHOUT enforcing a valid review receipt — a safety hole. Added cursor to the 3 gate conditions (plan/work/completion) plus the display labels and the "Sending via …" UI functions. verify_receipt itself is backend-generic (validates the verdict), so no further change. Suite 1345 OK; mirror synced. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40bef68dba

…sk/diff) — fn-74 Root fix for the recurring cursor argv-overflow class. cursor delivers its whole review prompt as one positional argv arg (CURSOR_ARGV_PROMPT_MAX=30k); the diff was budgeted (T7) but plan/completion still embedded the FULL spec + task markdown unbounded, so a large spec (30k+) overflowed even with no diff. New fit_cursor_prompt_to_budget() is the final backstop on all 3 cursor handlers: if the assembled prompt exceeds the cap it prepends a "read the .flow spec/task + changed files from disk" header and truncates the embedded body (preserving the trailing rubric/verdict grammar), so cursor reviews of arbitrarily large specs/diffs always fit and the reviewer reads full context from disk instead of failing closed. codex/copilot unchanged (no argv limit); validate/deep-pass excluded (small resumed payloads). Verified: fn-74's own ~21KB spec plan-review 46909->29700 chars, VERDICT=SHIP. Suite 1350 OK (+5 CursorPromptArgvCap tests); byte-parity. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
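
The final backstop can be sketched as a header-plus-truncation step that keeps the trailing rubric intact. This is an illustration, not the real `fit_cursor_prompt_to_budget`: the function name, the header wording, and the assumption that the prompt arrives already split into a body and a rubric are all hypothetical.

```python
CURSOR_ARGV_PROMPT_MAX = 30_000  # cap named in this PR
READ_FROM_DISK_HEADER = (  # assumed wording
    "NOTE: context below was truncated to fit. "
    "Read the .flow spec/task and the changed files from disk.\n\n"
)


def fit_prompt_to_budget(body: str, rubric: str, cap: int = CURSOR_ARGV_PROMPT_MAX) -> str:
    """Return a prompt strictly shorter than `cap`, preserving the trailing rubric."""
    prompt = body + rubric
    if len(prompt) < cap:
        return prompt  # already fits: pass through untouched
    room = cap - 1 - len(READ_FROM_DISK_HEADER) - len(rubric)
    if room < 0:
        raise ValueError("header and rubric alone exceed the cap")
    # Trim only the embedded body; the verdict grammar at the end must survive.
    return READ_FROM_DISK_HEADER + body[:room] + rubric
```

Because the reviewer runs read-only with `cwd=repo_root`, the truncated body costs context in the prompt but not access to the full files.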

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b48a4dc0a4

…— fn-74 fit_cursor_prompt_to_budget passed a prompt through at len <= CURSOR_ARGV_PROMPT_MAX, but run_cursor_exec rejects len >= the cap — so a prompt of EXACTLY 30000 chars slipped the backstop and still failed closed ("prompt too large", dropped receipt). Changed the passthrough to strictly `< CURSOR_ARGV_PROMPT_MAX` so exactly-cap prompts get trimmed to under it; docstring clarified to match run_cursor_exec's >= rejection. +1 boundary regression test (exactly-at-cap is trimmed, verdict preserved). Suite 1351 OK; byte-parity. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
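
The boundary mismatch is easy to reproduce in isolation. The three predicates below are hypothetical stand-ins for the comparisons named in this commit message (the executor's `>=` rejection and the backstop's old `<=` and new `<` passthrough); they are not the functions themselves.

```python
CURSOR_ARGV_PROMPT_MAX = 30_000


def executor_rejects(prompt: str) -> bool:
    return len(prompt) >= CURSOR_ARGV_PROMPT_MAX  # rejection rule described above


def old_passthrough(prompt: str) -> bool:
    return len(prompt) <= CURSOR_ARGV_PROMPT_MAX  # before the fix


def new_passthrough(prompt: str) -> bool:
    return len(prompt) < CURSOR_ARGV_PROMPT_MAX  # after the fix


exactly_at_cap = "x" * CURSOR_ARGV_PROMPT_MAX
# Old behaviour: the backstop waves the prompt through, then the executor rejects it.
bug_reproduced = old_passthrough(exactly_at_cap) and executor_rejects(exactly_at_cap)
# New behaviour: the same prompt is sent to the trimming path instead.
fixed = not new_passthrough(exactly_at_cap)
```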

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5511fafb53

…argv cap — fn-74 The general cursor argv backstop (fit_cursor_prompt_to_budget) was applied to the 3 primary review handlers but NOT to _run_validator_pass / _run_deep_pass, which render the findings payload and call run_cursor_exec directly. A verbose findings JSONL could overflow CURSOR_ARGV_PROMPT_MAX and fail the optional validator/deep phase of an otherwise-valid review. Added the fit backstop before both run_cursor_exec calls, so ALL 5 cursor argv dispatches (impl/plan/completion/validate/deep) are now bounded. Suite 1351 OK. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 7556a71.

flowctl's `done` writes runtime state to a separate `.state.json` (uncommitted) and never touches the definition, so the committed fn-74 task definitions kept their plan-time `status: todo` even though the work shipped (flowctl reports done via the state merge; `ready` returns none). A fresh clone reads the committed definition, so this advertised the landed cursor-backend tasks as todo/backlog. Set the 4 definitions to done — consistent with the clean landed-spec precedent (e.g. fn-1's task definitions are committed `done`). Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

…letion reviews — fn-74 (T3) Last PR #184 review finding. `flowctl <backend> plan-review <spec>` (and completion-review) ignored a per-spec default_review (set via `spec set-backend`) because the handlers pass task_id=None and resolve_review_spec only discovered default_review via a task->spec lookup. resolve_review_spec gains an optional spec_id; when invoked spec-scoped (no task) it reads the spec's default_review directly (source "epic", same precedence, before env/config). The 3 resolve helpers thread spec_id through (keeping the source-aware coercion — "epic" is honored, not coerced); the 6 plan/completion handlers pass spec_id=epic_id (the 3 impl handlers, which have a real task, are unchanged). Pre-existing across all backends; now fixed uniformly. +2 tests; suite 1353 OK; byte-parity. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
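A rough sketch of the spec-scoped lookup described above. The dictionary shapes, the `spec` key on a task, and the `env_value` / `config_value` parameters are assumptions for illustration; the commit message only establishes that a spec-scoped call (no task) now reads the spec's `default_review` directly, reports source `"epic"`, and that this sits before env/config in precedence.

```python
from typing import Dict, Optional, Tuple


def resolve_review_spec(
    task_id: Optional[str],
    spec_id: Optional[str],
    tasks: Dict[str, dict],
    specs: Dict[str, dict],
    env_value: Optional[str] = None,
    config_value: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Return (review_spec, source). Data shapes here are hypothetical."""
    if task_id is not None:
        task = tasks.get(task_id, {})
        if task.get("review"):
            return task["review"], "task"
        # Task-scoped call: discover the owning spec through the task.
        spec_id = spec_id or task.get("spec")
    if spec_id is not None:
        # Spec-scoped call (plan/completion review): read default_review directly.
        default = specs.get(spec_id, {}).get("default_review")
        if default:
            return default, "epic"
    if env_value:
        return env_value, "env"
    if config_value:
        return config_value, "config"
    return None, "none"
```

Before the fix, a call with `task_id=None` had no way to reach the second branch, which is why plan and completion reviews fell through to env/config.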

cursor · 2026-07-01T00:18:31Z

Bugbot couldn't run - usage limit reached

Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit.

A user or team admin can review and increase usage limits in the Cursor dashboard.

(requestId: serverGenReqId_8e7a788c-6035-444d-900f-a24493db8c7d)

…v/config — fn-74 Codex retest (dogfooding the final PR state) flagged a Major issue: _resolve_cursor_review_spec honored a stored per-task/per-epic cross-backend `review: codex:...` (source task/epic), but `flowctl cursor` always shells cursor-agent and Cursor's model names are format-specific (`gpt-5.5-high`, not `gpt-5.5`), so it would pass a foreign `--model` and fail — the same problem the explicit `--spec` guard rejects. Cursor now coerces ANY non-cursor resolved spec to the cursor default regardless of source (a `cursor:<model>` spec is still honored). codex/copilot stay lenient (OpenAI-style model names cross over). Test updated (per-task cross-backend honors→coerces). Suite 1353 OK; byte-parity. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
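The coercion rule can be illustrated in a few lines. The helper name `coerce_for_cursor` and the exact default string are assumptions for this sketch (the PR description names `gpt-5.5-high` as the default model); the real logic lives in `_resolve_cursor_review_spec`.

```python
from typing import Optional

# Assumed default spec string, built from the default model named in the PR description.
CURSOR_DEFAULT_SPEC = "cursor:gpt-5.5-high"


def coerce_for_cursor(resolved_spec: Optional[str]) -> str:
    """Honor cursor specs; coerce anything else to the cursor default.

    The source of the spec (task, epic, env, config) is deliberately ignored,
    matching the "regardless of source" rule in the commit message.
    """
    if resolved_spec and (
        resolved_spec == "cursor" or resolved_spec.startswith("cursor:")
    ):
        return resolved_spec
    return CURSOR_DEFAULT_SPEC
```

For example, a stored `codex:gpt-5.5` would be replaced by the cursor default instead of being passed to `cursor-agent` as a foreign `--model` value.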

cursor · 2026-07-01T01:11:55Z

Bugbot couldn't run - usage limit reached

(requestId: serverGenReqId_286a5ece-bfc4-4315-80e5-ff724e9a7723)

…disk-read cue — fn-74 Codex retest edge: when a huge spec/template leaves no argv budget for the diff, fit_cursor_diff_to_budget dropped it to "" — and if the diff-less prompt still fit under the cap, fit_cursor_prompt_to_budget added no disk-read header, so cursor could review branch changes with no diff AND no cue to read the changed files. The drop case now emits a short read-from-disk pointer in the <diff_content> slot (never empty); if that pushes the prompt over the cap, the prompt-level backstop's header still covers it. +1 test. Suite 1354 OK; byte-parity. Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
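A minimal sketch of the "never empty" drop case. The pointer text, the truncation marker, and the `MIN_USEFUL_DIFF_CHARS` threshold are hypothetical; the commit message only states that a dropped diff is now replaced by a short read-from-disk pointer instead of an empty string.

```python
# Hypothetical wording and threshold, chosen for illustration only.
DIFF_POINTER = "[diff omitted: no argv budget left; read the changed files from disk]"
DIFF_TRUNCATED_MARK = "\n[... diff truncated; read the rest from disk ...]"
MIN_USEFUL_DIFF_CHARS = 200


def fit_diff_to_budget(diff: str, budget: int) -> str:
    """Fit a diff into `budget` characters without ever silently dropping it."""
    if not diff or len(diff) <= budget:
        return diff
    if budget < MIN_USEFUL_DIFF_CHARS:
        # Drop case: emit a pointer even if it exceeds the remaining budget.
        # A later prompt-level backstop is expected to handle any overflow.
        return DIFF_POINTER
    keep = budget - len(DIFF_TRUNCATED_MARK)
    return diff[:keep] + DIFF_TRUNCATED_MARK
```

Returning the pointer even when the budget is zero or negative is intentional in this sketch: the reviewer always gets a cue to look at the changed files.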

cursor · 2026-07-01T01:23:36Z

Bugbot couldn't run - usage limit reached

(requestId: serverGenReqId_bbac9b69-2980-42bf-85df-ac00f25561e0)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c38f1dc63e

chatgpt-codex-connector · 2026-07-01T01:35:51Z

 |------------|------|
 | `codex`    | [workflow-codex.md](workflow-codex.md) |
 | `copilot`  | [workflow-copilot.md](workflow-copilot.md) |
+| `cursor`   | [workflow-cursor.md](workflow-cursor.md) |


Route cursor task overrides before dispatch

When a task sets `review: cursor:<model>` but the project default is still another backend, Phase 0 still derives `BACKEND` from `flowctl review-backend`, which only reads env/config and cannot see the task id. This new cursor branch is therefore skipped; the selected codex/copilot/rp workflow runs instead, and the flowctl command can end up using Cursor's model string (for example `gpt-5.5-high`) with the wrong CLI or ignoring the task override entirely. Resolve the backend with task/spec context before this dispatch, or otherwise force the invoked workflow to match the stored cursor override.
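One way to read the suggestion is sketched below: derive the backend from the task's stored override first, and only fall back to the project default when no override exists. The function names and the idea of expressing the skill's dispatch in Python are assumptions for illustration; the workflow file names are the three shown in the diff above.

```python
from typing import Optional

# Workflow files taken from the routing table in the diff; other backends omitted.
WORKFLOWS = {
    "codex": "workflow-codex.md",
    "copilot": "workflow-copilot.md",
    "cursor": "workflow-cursor.md",
}


def backend_of(spec: str) -> str:
    """Extract the backend name from a spec such as 'cursor:gpt-5.5-high'."""
    return spec.split(":", 1)[0]


def select_workflow(task_review: Optional[str], project_backend: str) -> str:
    """Pick the workflow file, letting a task-level override win over the default."""
    backend = backend_of(task_review) if task_review else project_backend
    if backend not in WORKFLOWS:
        raise ValueError(f"no workflow known for backend {backend!r}")
    return WORKFLOWS[backend]
```

With this ordering, a task carrying `review: cursor:gpt-5.5-high` routes to the cursor workflow even when the project default is `codex`.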

gmickel added 16 commits June 30, 2026 00:07

chore(flow): fn-74.1 done-summary + evidence (codex impl-review SHIP)

af7008a

Task: fn-74-cursor-review-backend-cursor-agent-cli.1

chore(flow): fn-74.2 done-summary + evidence (codex impl-review SHIP)…

0dd4205

… — recovered from lost-worker Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

chore(flow): fn-74.3 done-summary + evidence (codex impl-review SHIP)

3d58677

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

docs(review): teams.md backend enumeration adds cursor — fn-74.4

a468214

- teams.md stage-[6] 'Backends:' list rp/codex/copilot/none → +cursor + cursor:gpt-5.5-high spec-form note (codex impl-review finding) Task: fn-74-cursor-review-backend-cursor-agent-cli.4

chore(flow): fn-74.4 done-summary + evidence + review-backend sweep m…

ca3511d

…emory (codex impl-review SHIP) Task: fn-74-cursor-review-backend-cursor-agent-cli.4

chore(flow): drop incidental .flow/config.json drift — local review.b…

6e502a9

…ackend toggle + auto-normalized keys, not part of fn-74 Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/scripts/flowctl.py Outdated

Comment thread plugins/flow-next/scripts/flowctl.py Outdated

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/scripts/flowctl.py Outdated

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread .flow/tasks/fn-74-cursor-review-backend-cursor-agent-cli.1.json Outdated

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/scripts/flowctl.py Outdated

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/skills/flow-next-setup/workflow.md

Comment thread plugins/flow-next/skills/flow-next-ralph-init/templates/config.env Outdated

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/scripts/flowctl.py Outdated

cursor Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/scripts/flowctl.py Outdated

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/skills/flow-next-impl-review/workflow-cursor.md Outdated

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/agents/worker.md Outdated

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/skills/flow-next-setup/workflow.md

Comment thread plugins/flow-next/scripts/flowctl.py

cursor Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/skills/flow-next-ralph-init/SKILL.md

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/skills/flow-next-ralph-init/templates/config.env

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/scripts/flowctl.py

cursor Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/scripts/flowctl.py

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/scripts/flowctl.py

chatgpt-codex-connector Bot reviewed Jun 30, 2026

Comment thread .flow/tasks/fn-74-cursor-review-backend-cursor-agent-cli.1.json Outdated

cursor Bot reviewed Jun 30, 2026

Comment thread plugins/flow-next/scripts/flowctl.py

gmickel added 2 commits July 1, 2026 01:29

chatgpt-codex-connector Bot reviewed Jul 1, 2026
