feat(review): Cursor review backend (cursor-agent CLI) — fn-74#184

Open: gmickel wants to merge 35 commits into main from fn-74-cursor-review-backend-cursor-agent-cli

@gmickel gmickel commented Jun 30, 2026

Owner

TL;DR

Adds cursor as a fourth cross-model review backend (alongside rp / codex / copilot), shelling out to Cursor's cursor-agent CLI in headless read-only mode (-p --output-format json --trust --mode ask, cwd=repo_root). Reviews are Cursor-billed (your existing subscription, no separate API key) and reach models the others can't in one place — gpt-5.5-high (1M ctx, the default), the gpt-5.3-codex family, composer-2.5, claude-opus-4-8-thinking-high. A parity port of the copilot backend (fn-28) — no new review features, same verdict grammar, receipt schema, session-resume, and validator/deep-pass shapes — wired through /flow-next:impl-review, /flow-next:plan-review, /flow-next:spec-completion-review, and /flow-next:setup.
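The invocation described above can be sketched roughly as follows. This is a minimal illustration, not the code in flowctl.py: `build_cursor_argv` and `run_review` are hypothetical names, and the `--model` flag is an assumption. Only `-p --output-format json --trust --mode ask`, `--resume`, the positional prompt, and `cwd=repo_root` come from this PR.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Optional


def build_cursor_argv(prompt: str, model: str, session_id: Optional[str] = None) -> list[str]:
    """Assemble a headless, read-only cursor-agent command line (hypothetical helper)."""
    argv = ["cursor-agent", "-p", "--output-format", "json", "--trust", "--mode", "ask"]
    argv += ["--model", model]  # assumption: model is selected with --model
    if session_id:
        # Resume-only: the first call omits --resume entirely.
        argv += ["--resume", session_id]
    argv.append(prompt)  # the prompt travels as a positional argument, not on stdin
    return argv


def run_review(prompt: str, model: str, repo_root: Path, timeout_s: int = 600):
    """Run the review from the repository root so relative paths resolve."""
    return subprocess.run(
        build_cursor_argv(prompt, model),
        cwd=repo_root,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```

Keeping argv construction separate from the subprocess call makes the flag set testable without cursor-agent installed.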

Spec: fn-74 · 4 tasks, all done, completion-review SHIP · rebased onto main @ 2.4.0 · full suite 1334 passing. No version bump (rides the next release; entry under ## Unreleased).

Acceptance coverage (R1–R14 — all covered)

| R-ID | What | Task |
| --- | --- | --- |
| R1 | cursor in BACKEND_REGISTRY/VALID_BACKENDS; review-backend resolves it | .1 |
| R2 | spec grammar: `cursor:<model>` valid, effort rejected (model-yes/effort-no) | .1 |
| R3 | run_cursor_exec (flags, cwd=repo_root, resume-only, JSON parse, timeout, oversized-prompt error) | .1 |
| R4 | flowctl cursor check (honors is_error, not just exit code) | .1 |
| R5 | cursor impl-review writes mode:"cursor" receipt, no effort key | .2 |
| R6 | plan-review / completion-review / validate / deep-pass dispatch | .2 |
| R7 | session resume only when prior receipt mode == "cursor" | .2 |
| R8 | read-only: --mode ask asserted + optional live clean-tree smoke test | .1 flag / .2 live |
| R9 | skills route cursor + every --review=… string includes it | .3 |
| R10 | flow-next-setup accepts cursor / `cursor:<model>` | .3 |
| R11 | tests (test_cursor_run_exec / test_cursor_review_commands / test_backend_spec) | .1, .2 |
| R12 | sync-codex.sh regenerated; cursor surfaces in the Codex mirror | .3 |
| R13 | docs chain: repo + flow-next.dev + AI×SDLC + GF + vault | .4 |
| R14 | impl/completion receipts carry copilot's rigor fields and assert effort absent | .2 |
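The model-yes/effort-no grammar in R2 might look like the sketch below. `parse_cursor_spec` is a hypothetical name and the error messages are illustrative; the real parser in flowctl.py may validate model names against the registry as well, which this sketch does not attempt.

```python
from typing import Tuple

DEFAULT_CURSOR_MODEL = "gpt-5.5-high"  # default named in the PR description


def parse_cursor_spec(spec: str) -> Tuple[str, str]:
    """Parse 'cursor' or 'cursor:<model>'; reject any effort suffix (hypothetical helper)."""
    parts = spec.split(":")
    if parts[0] != "cursor":
        raise ValueError(f"not a cursor spec: {spec!r}")
    if len(parts) == 1:
        return "cursor", DEFAULT_CURSOR_MODEL
    if len(parts) == 2 and parts[1]:
        return "cursor", parts[1]
    # Three or more segments would be an effort field, which cursor does not take.
    raise ValueError(f"cursor specs take a model but no effort: {spec!r}")
```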

Critical changes

  • plugins/flow-next/scripts/flowctl.py (+ byte-identical .flow/bin/flowctl.py, ~+1.2k) — BACKEND_REGISTRY["cursor"] (10 models, efforts: None, default gpt-5.5-high); require_cursor / get_cursor_version / run_cursor_exec / _parse_cursor_result; cmd_cursor_check (+--skip-probe); the five cmd_cursor_* review handlers + elif backend == "cursor" dispatch in the shared validator/deep-pass; and the cursor-only --spec guard (rejects a non-cursor spec across resolve/validator/deep).
  • New workflow-cursor.md ×2: flow-next-impl-review/ and flow-next-spec-completion-review/; plus a Cursor section in flow-next-plan-review/workflow.md, the Phase-0 dispatch rows, and --review=rp|codex|copilot|cursor|none across the review skills + command hints.
  • Tests: test_cursor_run_exec.py, test_cursor_review_commands.py, test_cursor_clean_tree.py (live, gated on cursor-agent), test_backend_spec.py cursor cases. Suite: 1334 OK / 2 skipped.
  • Codex mirror: plugins/flow-next/codex/** regenerated via scripts/sync-codex.sh.
  • Docs: docs/flowctl.md (new cursor backend section + config enum), README.md, GLOSSARY.md, docs/skills.md, docs/teams.md, CHANGELOG.md (## Unreleased). Downstream (committed in their own repos): flow-next.dev (cursor row coming→shipped), AI×SDLC, GrowthFactors microsite, Obsidian vault.

Decisions & memory

  • Triage LLM judge stays codex|copilot — cursor reviews use the deterministic whitelist by default (zero extra dependency); a cursor user who enables FLOW_TRIAGE_LLM=1 also needs codex/copilot present.
  • Session model is resume-only — first call omits --resume and persists Cursor's generated session_id; never fabricate a first-call id.
  • Doc-drift closed — the GrowthFactors cross-model-review spec already advertised "Cursor via its cursor-agent headless CLI"; this makes that published claim true.
  • Memory left behind: bug/integration/adding-a-review-backend-sweep-all-2026-06-29 — adding a backend means sweeping every enumeration site (config table, stage lists, prose), not just the obvious lists.
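The resume-only session rule can be sketched as a small guard. This is an illustration under assumptions: `resume_session_id` is a hypothetical name, and the `session_id` receipt key is assumed; only the `mode == "cursor"` condition and the "never fabricate a first-call id" rule come from this PR.

```python
from typing import Optional


def resume_session_id(prior_receipt: Optional[dict]) -> Optional[str]:
    """Return a session id to pass via --resume, or None for a fresh session."""
    if not prior_receipt:
        return None  # first call: omit --resume and let Cursor generate the id
    if prior_receipt.get("mode") != "cursor":
        return None  # receipt from another backend: start fresh
    # Assumption: the receipt stores Cursor's generated id under "session_id".
    return prior_receipt.get("session_id") or None
```

Returning `None` instead of minting a UUID keeps the first call honest: the id that gets persisted is always one Cursor produced.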

Review trail

Each task passed codex impl-review SHIP; codex completion-review SHIP after fixing 3 introduced bugs (cursor check ignoring is_error; cursor --spec accepting a non-cursor backend; skill examples missing required --base/--files) + 5 regression tests. Later completion-review rounds churned on behaviors verified identical in codex/copilot (the resolve-fallback) — left as parity, not cursor-introduced regressions.

Where to look

  1. plugins/flow-next/scripts/flowctl.py: BACKEND_REGISTRY cursor entry, run_cursor_exec, cmd_cursor_*, and the --spec cursor guard.
  2. plugins/flow-next/skills/flow-next-impl-review/workflow-cursor.md: the per-backend workflow (resume-only, mode:"cursor" receipt, no-effort anti-patterns).
  3. plugins/flow-next/tests/test_cursor_review_commands.py: receipt-parity + the is_error / non-cursor --spec regression cases.

Generated by /flow-next:make-pr from fn-74-cursor-review-backend-cursor-agent-cli against main. Parity port of fn-28 (copilot); rebased onto 2.4.0.

gmickel added 16 commits June 30, 2026 00:07
…n-review SHIP

4 sequential tasks (parity port: code → wire → document); flowctl core split
into proof (.1) + commands (.2). codex plan-review SHIP after 2 rounds — folded
its findings: scoped R1 to review-backend's real sources, explicit large-prompt
error (no silent argv read-back), R8 clean-tree moved to a real .2 live test,
docs-site version bump release-only, R14 parity limited to rigor fields +
effort-absent assertion. Standalone (no spec deps); soft fn-54 workflow*.md note.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
…+ check + tests

- Add `cursor` to BACKEND_REGISTRY (model-yes / effort-no shape, default
  gpt-5.5-high); VALID_BACKENDS derives; `review-backend` reports cursor from
  config.json + FLOW_REVIEW_BACKEND (R1, R2)
- require_cursor / get_cursor_version / run_cursor_exec — positional-argv prompt
  (NOT stdin), resume-only session (first call omits --resume, captures minted
  id), cwd=repo_root, --mode ask --trust, NO --effort, explicit prompt-too-large
  raise, non-zero on is_error/timeout/CLI-failure (R3)
- _parse_cursor_result: single-object + streaming JSON-lines, empty/unparseable
  → backend failure (never false SHIP)
- `cursor check [--skip-probe]` subcommand + cmd_cursor_check → {available,
  version, authed} text + --json, schema-aligned to copilot (R4)
- Tests: test_cursor_run_exec.py (success/is_error/timeout/first-call-omits-
  resume/resume-passes-id/cwd=repo_root/mode-ask-flag/prompt-too-large) +
  test_backend_spec.py cursor cases; full suite 1271 green (R11)
- Sync byte-identical dogfood copy .flow/bin/flowctl.py
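The result-parsing behavior listed above (single object or streaming JSON lines, with empty or unparseable output treated as a backend failure) could be sketched like this. `parse_cursor_result` is a hypothetical stand-in for `_parse_cursor_result`, and choosing the last JSON object in a stream is an assumption.

```python
import json
from typing import Optional


def parse_cursor_result(stdout: str) -> Optional[dict]:
    """Return the result object, or None when the output cannot be trusted."""
    text = stdout.strip()
    if not text:
        return None  # empty output is a backend failure, never a verdict
    try:
        obj = json.loads(text)  # single-object form
        return obj if isinstance(obj, dict) else None
    except json.JSONDecodeError:
        pass
    last = None
    for line in text.splitlines():  # streaming JSON-lines form
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise between events
        if isinstance(obj, dict):
            last = obj  # assumption: the final object carries the result
    return last
```

A caller would treat `None` as a non-zero backend failure so that a garbled response can never be read as SHIP.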

Task: fn-74-cursor-review-backend-cursor-agent-cli.1
Task: fn-74-cursor-review-backend-cursor-agent-cli.1
…deep + own-mode mode:cursor receipts — fn-74.2

- 3 handlers (cmd_cursor_impl_review/_plan_review/_completion_review) +
  _resolve_cursor_review_spec, mirroring copilot but resume-only + no effort
- validate/deep-pass dispatch via new 'elif backend == cursor' branches in
  the shared _run_validator_pass/_run_deep_pass spines + cmd_cursor_validate/
  _deep_pass wrappers
- receipts: mode:cursor, spec:cursor:<model>, model, NO effort key; carry
  copilot rigor fields (suppressed/introduced-vs-pre_existing/unaddressed)
- own-mode resume guard: pass --resume only when prior receipt mode==cursor
  (cross-backend receipt ⇒ fresh session; no uuid fabrication; resume-only)
- 5 cursor review subcommands wired into the subparser (cursor:<model> spec,
  no effort); triage --backend choices unchanged (codex|copilot)
- tests: test_cursor_review_commands.py (handler/dispatch/resume-guard/rigor
  parity/effort-absent) + test_cursor_clean_tree.py (R8 live smoke, gated on
  cursor-agent; ran real review, tree clean)
- .flow/bin/flowctl.py kept byte-identical (dogfood parity)

Task: fn-74-cursor-review-backend-cursor-agent-cli.2
…flow-cursor.md x2, --review literals, review.backend — fn-74.3

- new workflow-cursor.md in impl-review + spec-completion-review (mirror copilot; model-yes/effort-no, --mode ask read-only, resume-only session, mode==cursor receipt, no effort key)
- plan-review: Cursor Backend block (SKILL) + Cursor Backend Workflow section (workflow.md) + anti-patterns
- impl-review/spec-completion workflow-common.md: cursor dispatch-table row + cursor) deep-pass/validate branches
- cursor added to every --review=rp|codex|copilot|none literal across the 8 hand-edited files (+ backend-at-a-glance, critical-rules, re-review)
- flow-next-setup: HAVE_CURSOR detect + Cursor CLI review option + cursor answer-mapping + power-user spec note (cursor:gpt-5.5-high, no :effort)
- scripts/sync-codex.sh re-run: 6 codex-mirror copies regenerated; R2 ask-block injection verified clean (genuine ask sites only); validators + full python suite green

Task: fn-74-cursor-review-backend-cursor-agent-cli.3

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
…GELOG + skills/teams sweep — fn-74.4

- flowctl.md: cursor in cmd list + new cursor backend section (resume-only sessions, model-yes/effort-no, --mode ask cwd=repo_root, triage note) + review-backend grammar example
- README.md: 3 backend lists (adversarial gates, blind-spots, impl-review cmd) add Cursor
- GLOSSARY.md: cross-model-review backends add Cursor (cursor)
- CHANGELOG.md: ## Unreleased cursor review backend entry (no version bump, batched)
- skills.md + teams.md: stale RepoPrompt/Codex/Copilot enumerations add Cursor

Task: fn-74-cursor-review-backend-cursor-agent-cli.4
…entry — fn-74.4

Records the fn-74.4 doc sweep (repo + flow-next.dev + AI×SDLC + GF microsite + vault),
mirroring the fn-68.6 downstream-coverage precedent. The downstream surfaces are
committed in their own repos (out of this repo's reviewable diff).

Task: fn-74-cursor-review-backend-cursor-agent-cli.4
…late — fn-74.4

- flowctl.md config table: review.backend was 'rp, codex, none' (stale, omitted copilot too) → 'rp, codex, copilot, cursor, none' + spec-form note (codex impl-review finding)
- flowctl.md config-set example comment: same enum fix
- setup usage.md template: review.backend comment rp|codex|copilot|none → +cursor

Task: fn-74-cursor-review-backend-cursor-agent-cli.4
- teams.md stage-[6] 'Backends:' list rp/codex/copilot/none → +cursor + cursor:gpt-5.5-high spec-form note (codex impl-review finding)

Task: fn-74-cursor-review-backend-cursor-agent-cli.4
…r enum — fn-74.4

- sync-codex.sh regen: codex mirror setup usage.md template picks up cursor in review.backend enum (R12)
- .flow/usage.md dogfood copy matches canonical template (test_dogfood_template_parity)

Task: fn-74-cursor-review-backend-cursor-agent-cli.4
…emory (codex impl-review SHIP)

Task: fn-74-cursor-review-backend-cursor-agent-cli.4
…+ skill args) — fn-74

Three introduced findings from codex completion-review:
- cmd_cursor_check now honors is_error: a returncode==0 + is_error:true probe
  is NOT authed (R4) — was checking exit code only.
- cursor commands reject a non-cursor --spec: the validator/deep dispatch
  branches + _resolve_cursor_review_spec now enforce parsed.backend=="cursor",
  so `cursor impl-review --spec codex:...` errors instead of running cursor-agent
  with a foreign model + serializing spec:"codex:..." under mode:"cursor" (R5/R6/R14).
- skill examples carry required args: cursor impl-review --base, cursor plan-review --files.
+5 regression tests; codex mirror regenerated; .flow/bin/flowctl.py byte-identical.
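The first finding amounts to checking the parsed payload as well as the exit code. A minimal sketch, assuming a hypothetical `probe_is_authed` helper and treating an unparseable payload as not authed:

```python
from typing import Optional


def probe_is_authed(returncode: int, payload: Optional[dict]) -> bool:
    """An auth probe passes only if the CLI exited 0 AND did not report is_error."""
    if returncode != 0:
        return False
    if payload is None:
        return False  # assumption: unparseable probe output counts as failure
    return payload.get("is_error") is not True
```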

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
…|cursor|none — fn-74

completion-review R9: the two command argument-hints were stale (rp|codex|export,
predating copilot); align with spec-completion-review/epic-review. R6 (cursor plan/
completion pass None to the resolve helper) is verified codex+copilot parity — same
call sites, not a cursor-introduced regression — left as-is.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
Round 1: 3 introduced bugs (cursor check is_error; cursor --spec backend guard;
skill examples missing --base/--files) → fixed + 5 regression tests.
Rounds 2-3 churned on verified codex/copilot PARITY behaviors (plan/completion
resolve fallback) + self-contradictory hint guidance — not cursor-introduced
regressions. Receipt verdict SHIP; suite green (1291).

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
@cursor

cursor Bot commented Jun 30, 2026

PR Summary

Medium Risk
Large changes to review gate infrastructure in flowctl (subprocess, sessions, spec resolution, prompt assembly); mistakes could break SHIP/NEEDS_WORK flows or mis-route backends, but there is no auth or payment surface.

Overview
Introduces cursor as a cross-model review backend: BACKEND_REGISTRY, run_cursor_exec (read-only --mode ask, resume-only sessions, JSON parse, argv size limits), flowctl cursor check and the impl/plan/completion/validate/deep-pass commands, with receipts stamped mode: "cursor" and no separate effort field.

Review prompts shift to agentic context: get_embedded_file_contents is removed; codex/copilot/cursor builds rely on diff + specs and instruct reviewers to read the repo from disk. Codex and Copilot subprocess calls set cwd=repo_root so paths resolve consistently.

Copilot session handling is unified on both platforms: first call uses --session-id, later calls --resume (Copilot ≥1.0.61 made --resume resume-only on POSIX too). Auth probe uses --session-id instead of --resume. Default model strings move toward gpt-5.5; Copilot registry drops unavailable gpt-5.2 models.

resolve_review_spec gains optional spec_id for epic-scoped plan/completion reviews, optional return_source, and coercion so explicit flowctl codex|copilot commands don’t inherit a foreign env/config default backend. Cursor always coerces non-cursor resolved specs to the cursor default.

Cursor-only prompt budgeting (fit_cursor_diff_to_budget, fit_cursor_prompt_to_budget) keeps positional argv under ~30k chars. .flow/.gitignore adds pilot-runs/; a memory note documents sweeping all backend enumeration sites when adding backends.

Reviewed by Cursor Bugbot for commit c38f1dc. Bugbot is set up for automated code reviews on this repo. Configure here.

@cursor

cursor Bot commented Jun 30, 2026

Bugbot couldn't run - usage limit reached

Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit.

A user or team admin can review and increase usage limits in the Cursor dashboard.

(requestId: serverGenReqId_fa14b892-d467-432d-a7dc-aa2944bebad1)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5dbe249b54

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/flow-next/scripts/flowctl.py Outdated
Comment thread plugins/flow-next/scripts/flowctl.py Outdated
- Coerce the no-`--spec` resolve fallback to cursor: `_resolve_cursor_review_spec`
  now returns the cursor default when `resolve_review_spec` hands back a non-cursor
  backend (e.g. config `review.backend=codex`), so an explicit `--review=cursor`
  never runs cursor-agent with a foreign model or stamps `spec:"codex:"` under
  `mode:"cursor"`.
- Fail closed on oversized prompts via a non-zero return tuple instead of a raised
  ValueError, so cursor command handlers hit their `exit_code != 0` cleanup (drop
  stale receipt + structured error) rather than leaking a traceback.
- Tests updated (oversized → non-zero return) + 2 fallback-coercion cases. Suite 1336 OK.
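The fail-closed change can be illustrated as follows. This is a sketch under assumptions: the `(stdout, exit_code, stderr)` tuple shape, the exit code value, and both function names are hypothetical; the 30k cap and the "drop stale receipt on non-zero exit" behavior are from this PR.

```python
from pathlib import Path
from typing import Tuple

CURSOR_ARGV_PROMPT_MAX = 30_000  # positional-argv cap cited in this PR


def run_cursor_exec_sketch(prompt: str) -> Tuple[str, int, str]:
    """Return a non-zero tuple for oversized prompts instead of raising."""
    if len(prompt) > CURSOR_ARGV_PROMPT_MAX:
        msg = f"prompt too large ({len(prompt)} > {CURSOR_ARGV_PROMPT_MAX} chars)"
        return "", 2, msg
    return "{}", 0, ""  # stand-in for the real subprocess call


def handle_review(prompt: str, receipt_path: Path) -> int:
    """Handler cleanup runs on any non-zero exit, including the oversized case."""
    stdout, exit_code, stderr = run_cursor_exec_sketch(prompt)
    if exit_code != 0:
        receipt_path.unlink(missing_ok=True)  # drop the stale receipt
        return exit_code
    receipt_path.write_text(stdout)
    return 0
```

Because the oversized case travels the same `exit_code != 0` path as every other failure, no caller needs a separate `try/except` for it.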

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ae2f7e291

Comment thread plugins/flow-next/scripts/flowctl.py Outdated
…ts — fn-74

Root cause of PR #184's "prompt too large": cmd_cursor_{impl,plan,completion}_review
embedded full changed-file CONTENTS up to a 500KB budget (get_embedded_file_contents,
FLOW_CURSOR_EMBED_MAX_BYTES). A changed flowctl.py (~270KB ×2) produced a ~538KB
prompt — far over cursor's 30KB positional-argv cap — so cursor reviews failed on
any diff touching a non-trivial file, even a tiny one.

cursor-agent is AGENTIC: it runs read-only (`--mode ask`) with cwd=repo_root and
reads files from disk itself. So we stop embedding file contents (embedded_content="",
files_embedded=False across all 3 cursor handlers) and pass the diff + pointers; the
reviewer reads what it needs. Validated: a real cursor review now accepts the prompt
(previously rejected) — only blocked by the Cursor account usage limit, not argv.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 46864c8bbb

Comment thread .flow/tasks/fn-74-cursor-review-backend-cursor-agent-cli.1.json Outdated
…— fn-74

Per the agentic-review principle (CLAUDE.md: "the host agent IS the intelligence";
backends run with file access — codex sandbox cwd=repo_root, copilot --add-dir,
cursor --mode ask). All three CLI backends now pass the diff + pointers and read
changed files from disk instead of embedding up to a 500KB budget. This is the
existing budget-overflow "read from disk" path, now always-on: smaller/cheaper
prompts, no argv blow-ups, parity with rp (RepoPrompt Builder already selects
context agentically). All 9 review sites (codex/copilot/cursor impl/plan/completion)
updated; the get_embedded_file_contents helper is now unused (removed in a follow-up).
Suite 1336 OK.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 136b1e90e3

Comment thread plugins/flow-next/scripts/flowctl.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be51d52f93

Comment thread plugins/flow-next/skills/flow-next-setup/workflow.md
Comment thread plugins/flow-next/skills/flow-next-ralph-init/templates/config.env Outdated
… fn-74 (Finding B)

Source-aware coercion completing Finding A. `resolve_review_spec` gains an opt-in
`return_source` reporting which precedence rung produced the spec (task/epic/env/
config/hint). The codex/copilot/cursor resolve-helpers now coerce a non-self backend
to their own default ONLY when it came from an env/config DEFAULT (e.g.
`review.backend=rp`), while still honoring a deliberate per-task/per-epic cross-backend
`review` spec (documented behavior). Previously codex/copilot passed the config default
straight through, so `flowctl copilot impl-review` under `review.backend=rp` stamped
`spec:"rp"` / `model:gpt-5.2` on the receipt while running gpt-5.5; cursor already
coerced but unconditionally (overrode per-task specs too — now aligned).

Verified end-to-end: copilot review under config=rp now records `copilot:gpt-5.5:high`
(was `rp`/`gpt-5.2`). +5 tests (return_source tagging; codex/copilot coerce-config +
honor-per-task; cursor per-task-honored). Suite 1345 OK; byte-parity; mirror in sync.
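The source-aware coercion described in this commit can be sketched as a small decision function. `coerce_for_backend` is a hypothetical name; the source labels (task/epic/env/config/hint) come from the commit message, and honoring every non-default source (including `hint`) is an assumption.

```python
DEFAULT_SOURCES = frozenset({"env", "config"})  # rungs that only express a default


def coerce_for_backend(spec_backend: str, source: str, own_backend: str) -> str:
    """Decide which backend an explicit `flowctl <own_backend> ...` command records."""
    if spec_backend == own_backend:
        return spec_backend
    if source in DEFAULT_SOURCES:
        # A foreign backend that is merely the env/config default is coerced.
        return own_backend
    # A deliberate per-task or per-epic cross-backend spec is still honored.
    return spec_backend
```

For example, `coerce_for_backend("rp", "config", "copilot")` yields `"copilot"`, which matches the receipt fix described above, while a per-task `codex` spec passes through unchanged.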

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 472bfcf1b8

Comment thread plugins/flow-next/scripts/flowctl.py Outdated
…t hardening — fn-74

Five P2s from the codex auto-reviewer, all verified valid (several exposed by the
no-embed course change):
- T4: run_codex_exec / run_copilot_exec now run with cwd=repo_root (like cursor) — the
  no-embed change made reviewers read repo-relative files from disk, which broke when
  launched from a subdir. run_codex_exec gained a repo_root param; all callers pass it.
  Verified: codex impl-review from a subdir → SHIP.
- T7: cursor embeds the diff in its positional-argv prompt (CURSOR_ARGV_PROMPT_MAX 30k);
  a static cap couldn't fit because spec/template overhead varies. New
  fit_cursor_diff_to_budget() sizes the diff to the budget left under the cap (drops it
  entirely if the diff-less prompt already overflows; cursor reads files from disk).
  Verified: a 129KB-diff review now fits (29,733 chars) and produces a verdict.
- T1: cursor/codex/copilot validate + deep-pass now use the source-aware coercion (were
  calling resolve_review_spec directly, bypassing Finding B).
- T6: Ralph config.env FLOW_COPILOT_MODEL gpt-5.2 -> gpt-5.5 (1.0.65 rejects gpt-5.2);
  catalog comment realigned to the registry.
- T5: /flow-next:work can route cursor/copilot reviews — REVIEW_MODE enum + worker
  Phase-4 gate + --review allowlist broadened (work routes via configured-backend
  passthrough to impl-review, which already supports both); sync-codex worker-prompt
  heredoc updated.

Stale (T2: fn-74 tasks already done) + pre-existing (T3: per-spec default_review on
plan-review, all backends) replied, not changed. Suite 1345 OK; byte-parity; mirror sync.

Known limitation: cursor completion-review of a very-large-spec epic can overflow the
argv cap on spec/template alone (separate from diff embedding) — fails closed cleanly.
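The dynamic diff sizing in T7 might be sketched as below. This is not the body of `fit_cursor_diff_to_budget`: the function name here, the truncation marker text, and the head-of-diff truncation strategy are assumptions. The cap value and the "drop the diff entirely if the diff-less prompt already overflows" rule come from the commit message.

```python
CURSOR_ARGV_PROMPT_MAX = 30_000  # cap on the positional-argv prompt, in characters


def fit_diff_to_budget(base_prompt: str, diff: str, cap: int = CURSOR_ARGV_PROMPT_MAX) -> str:
    """Size the diff to whatever room the diff-less prompt leaves under the cap."""
    budget = cap - len(base_prompt)
    if budget <= 0:
        return ""  # diff-less prompt already overflows: drop the diff entirely
    if len(diff) <= budget:
        return diff
    marker = "\n[diff truncated; read changed files from disk]\n"  # illustrative text
    keep = budget - len(marker)
    if keep <= 0:
        return ""
    return diff[:keep] + marker
```

Computing the budget from the actual base prompt, rather than using a static diff cap, is what lets the result track varying spec and template overhead. It does not help when the base prompt alone exceeds the cap, which is the known limitation noted above.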

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
Comment thread plugins/flow-next/scripts/flowctl.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f475d84b9

Comment thread plugins/flow-next/skills/flow-next-impl-review/workflow-cursor.md Outdated
…empty task arg — fn-74

Two findings from the re-review of 7f475d8:
- NEW-1 (gap in the T7 dynamic diff-sizing): the re-review preamble was prepended AFTER
  fit_cursor_diff_to_budget computed the budget from the diff-less prompt, so a RESUMED
  cursor review could still exceed CURSOR_ARGV_PROMPT_MAX. The impl + completion handlers
  now detect is_rereview + build the preamble BEFORE sizing and reserve it in the budget
  base (and prepend it to the final prompt). Verified: first / task-resume /
  standalone-resume all stay at 29,733 < cap.
- NEW-2: the impl-review workflow files passed a quoted empty "$TASK_ID" positional for
  branch/standalone reviews, which flowctl rejects as "Invalid task ID:" instead of
  entering standalone mode. All three backend workflows (codex/copilot/cursor) now build
  the command via an args array that omits the task arg when TASK_ID is empty.
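The NEW-2 fix is made in the workflow files, but the same idea can be shown in Python for illustration. `build_impl_review_cmd` is a hypothetical helper, and the position of the task argument relative to `--base` is an assumption.

```python
from __future__ import annotations


def build_impl_review_cmd(backend: str, base: str, task_id: str = "") -> list[str]:
    """Build the command as a list so an empty task id is omitted, not passed as ''."""
    args = ["flowctl", backend, "impl-review"]
    if task_id:
        args.append(task_id)  # task-scoped review
    # With no task id the positional is absent, so flowctl enters standalone mode.
    args += ["--base", base]
    return args
```

Passing a quoted empty string would still occupy the positional slot, which is what triggered the "Invalid task ID:" rejection described above.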

Suite 1345 OK; byte-parity; codex mirror synced.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8b6847f3a

Comment thread plugins/flow-next/agents/worker.md Outdated
…-review — fn-74

Codex re-review finding: the worker gated on REVIEW_MODE but invoked
`/flow-next:impl-review` bare, so a one-off `work --review=<backend>` override that
differs from repo config was lost — impl-review re-resolved the backend from config,
ignoring the override. The worker now passes `--review=$REVIEW_MODE` on both the initial
and the NEEDS_WORK re-invocation (REVIEW_MODE already holds the per-run resolved backend,
config OR override). Completes the T5 work-routing fix. Mirror synced; suite 1345 OK.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a8d809581

Comment thread plugins/flow-next/skills/flow-next-setup/workflow.md
Comment thread plugins/flow-next/scripts/flowctl.py
…be — fn-74

Two completeness gaps from the re-review of 3a8d809:
- NEW-A: setup offered Cursor but Ralph init (flow-next-ralph-init) only branched
  rp/codex/copilot — a Cursor user who inits Ralph got autonomous prompts that never
  used cursor. Mirrored the copilot handling: HAVE_CURSOR detection + a Cursor option
  in SKILL.md, and real per-backend cursor branches in prompt_work/plan/completion.md +
  config.env examples (+ a FLOW_CURSOR_MODEL runtime block). Codex mirror regenerated.
- NEW-B: cmd_copilot_check probed with a fresh --resume=<uuid>, which copilot 1.0.65
  (resume-only) rejects -> false "auth failed" with valid credentials. The probe now
  uses --session-id (create), matching run_copilot_exec.

Suite 1345 OK; byte-parity; mirror synced.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
Comment thread plugins/flow-next/skills/flow-next-ralph-init/SKILL.md

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 513caf135b

Comment thread plugins/flow-next/skills/flow-next-ralph-init/templates/config.env
Follow-on to the Ralph-init cursor wiring (cursor Bugbot finding): ralph.sh's
verify_receipt gate only fired for rp/codex/copilot, so a WORK_REVIEW=cursor (or
plan/completion) Ralph loop could mark a task done WITHOUT enforcing a valid review
receipt — a safety hole. Added cursor to the 3 gate conditions (plan/work/completion)
plus the display labels and the "Sending via …" UI functions. verify_receipt itself is
backend-generic (validates the verdict), so no further change. Suite 1345 OK; mirror synced.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40bef68dba

Comment thread plugins/flow-next/scripts/flowctl.py
…sk/diff) — fn-74

Root fix for the recurring cursor argv-overflow class. cursor delivers its whole review
prompt as one positional argv arg (CURSOR_ARGV_PROMPT_MAX=30k); the diff was budgeted
(T7) but plan/completion still embedded the FULL spec + task markdown unbounded, so a
large spec (30k+) overflowed even with no diff. New fit_cursor_prompt_to_budget() is the
final backstop on all 3 cursor handlers: if the assembled prompt exceeds the cap it
prepends a "read the .flow spec/task + changed files from disk" header and truncates the
embedded body (preserving the trailing rubric/verdict grammar), so cursor reviews of
arbitrarily large specs/diffs always fit and the reviewer reads full context from disk
instead of failing closed. codex/copilot unchanged (no argv limit); validate/deep-pass
excluded (small resumed payloads). Verified: fn-74's own ~21KB spec plan-review
46909->29700 chars, VERDICT=SHIP. Suite 1350 OK (+5 CursorPromptArgvCap tests); byte-parity.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
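
The backstop described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in flowctl.py: the function name `fit_prompt_to_budget`, the split of the prompt into `body` and `tail`, and the header and marker wording are all assumptions. Only the cap name `CURSOR_ARGV_PROMPT_MAX` and its 30k value come from the commit message.

```python
CURSOR_ARGV_PROMPT_MAX = 30_000  # cap as stated in the commit message

# Hypothetical wording; the real header text is not shown in this thread.
DISK_READ_HEADER = (
    "NOTE: embedded context was truncated to fit the argv cap. "
    "Read the .flow spec/task and the changed files from disk.\n\n"
)
TRUNCATION_MARK = "\n[... truncated ...]\n"  # hypothetical marker


def fit_prompt_to_budget(body: str, tail: str, cap: int = CURSOR_ARGV_PROMPT_MAX) -> str:
    """Return a prompt strictly shorter than `cap`, keeping `tail` intact.

    `body` is the embedded spec/task/diff text; `tail` is the trailing
    rubric and verdict grammar that must survive truncation.
    """
    prompt = body + tail
    if len(prompt) < cap:
        return prompt  # already fits: pass through unchanged
    # Space left for the body once header, marker, and tail are reserved.
    room = cap - 1 - len(DISK_READ_HEADER) - len(TRUNCATION_MARK) - len(tail)
    if room < 0:
        raise ValueError("rubric/verdict tail alone exceeds the cap")
    return DISK_READ_HEADER + body[:room] + TRUNCATION_MARK + tail
```

Truncating from the end of the body rather than the end of the whole prompt is what keeps the verdict grammar parseable; the header tells the reviewer where to find the context that was cut.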
Comment thread plugins/flow-next/scripts/flowctl.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b48a4dc0a4

Comment thread plugins/flow-next/scripts/flowctl.py
…— fn-74

fit_cursor_prompt_to_budget passed a prompt through at len <= CURSOR_ARGV_PROMPT_MAX,
but run_cursor_exec rejects len >= the cap — so a prompt of EXACTLY 30000 chars slipped
the backstop and still failed closed ("prompt too large", dropped receipt). Changed the
passthrough to strictly `< CURSOR_ARGV_PROMPT_MAX` so exactly-cap prompts get trimmed to
under it; docstring clarified to match run_cursor_exec's >= rejection. +1 boundary
regression test (exactly-at-cap is trimmed, verdict preserved). Suite 1351 OK; byte-parity.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
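
The off-by-one is easiest to see with the two predicates side by side. The sketch below uses hypothetical helper names (`exec_rejects`, `passes_through_old`, `passes_through_new`) and a local `CAP` standing in for `CURSOR_ARGV_PROMPT_MAX`; it only mirrors the comparisons named in the commit message.

```python
CAP = 30_000  # stands in for CURSOR_ARGV_PROMPT_MAX


def exec_rejects(prompt: str) -> bool:
    # Rejection as described for run_cursor_exec: len >= cap fails closed.
    return len(prompt) >= CAP


def passes_through_old(prompt: str) -> bool:
    # Pre-fix passthrough: prompts with len <= cap skipped trimming.
    return len(prompt) <= CAP


def passes_through_new(prompt: str) -> bool:
    # Post-fix passthrough: only prompts strictly under the cap skip trimming.
    return len(prompt) < CAP


at_cap = "x" * CAP
# Old behaviour: the backstop lets the prompt through, then exec rejects it.
gap_before_fix = passes_through_old(at_cap) and exec_rejects(at_cap)
# New behaviour: the same prompt is sent for trimming instead.
gap_after_fix = passes_through_new(at_cap) and exec_rejects(at_cap)
```

With the strict comparison, the set of prompts that skip trimming and the set that `run_cursor_exec` rejects no longer overlap.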

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5511fafb53

Comment thread .flow/tasks/fn-74-cursor-review-backend-cursor-agent-cli.1.json Outdated
…argv cap — fn-74

The general cursor argv backstop (fit_cursor_prompt_to_budget) was applied to the 3
primary review handlers but NOT to _run_validator_pass / _run_deep_pass, which render
the findings payload and call run_cursor_exec directly. A verbose findings JSONL could
overflow CURSOR_ARGV_PROMPT_MAX and fail the optional validator/deep phase of an
otherwise-valid review. Added the fit backstop before both run_cursor_exec calls, so ALL
5 cursor argv dispatches (impl/plan/completion/validate/deep) are now bounded. Suite 1351 OK.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 7556a71. Configure here.

Comment thread plugins/flow-next/scripts/flowctl.py
gmickel added 2 commits July 1, 2026 01:29
flowctl's `done` writes runtime state to a separate `.state.json` (uncommitted) and never
touches the definition, so the committed fn-74 task definitions kept their plan-time
`status: todo` even though the work shipped (flowctl reports done via the state merge;
`ready` returns none). A fresh clone reads the committed definition, so this advertised the
landed cursor-backend tasks as todo/backlog. Set the 4 definitions to done — consistent
with the clean landed-spec precedent (e.g. fn-1's task definitions are committed `done`).

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
…letion reviews — fn-74 (T3)

Last PR #184 review finding. `flowctl <backend> plan-review <spec>` (and completion-review)
ignored a per-spec default_review (set via `spec set-backend`) because the handlers pass
task_id=None and resolve_review_spec only discovered default_review via a task->spec lookup.
resolve_review_spec gains an optional spec_id; when invoked spec-scoped (no task) it reads the
spec's default_review directly (source "epic", same precedence, before env/config). The 3
resolve helpers thread spec_id through (keeping the source-aware coercion — "epic" is honored,
not coerced); the 6 plan/completion handlers pass spec_id=epic_id (the 3 impl handlers, which
have a real task, are unchanged). Pre-existing across all backends; now fixed uniformly.
+2 tests; suite 1353 OK; byte-parity.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
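
A rough sketch of the spec-scoped lookup, under stated assumptions: the in-memory `TASKS` and `SPECS` dicts, the `fn-demo` ids, the `env`/`config` parameters, and the task-before-epic ordering are illustrative only. The commit message confirms just that the spec's `default_review` is read directly when no task is given, with source `"epic"`, ahead of env/config.

```python
from typing import Optional, Tuple

# Hypothetical stand-ins for the stored task and spec definitions.
TASKS = {"fn-demo.1": {"spec": "fn-demo", "review": None}}
SPECS = {"fn-demo": {"default_review": "cursor:gpt-5.5-high"}}


def resolve_review_spec(
    task_id: Optional[str] = None,
    spec_id: Optional[str] = None,
    env: Optional[str] = None,
    config: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Return (review_spec, source), checking task, epic, env, config in order."""
    task = TASKS.get(task_id) if task_id else None
    if task and task.get("review"):
        return task["review"], "task"
    # With a task, follow it to its spec; without one, use the explicit spec_id.
    epic_id = task["spec"] if task else spec_id
    epic = SPECS.get(epic_id) if epic_id else None
    if epic and epic.get("default_review"):
        return epic["default_review"], "epic"
    if env:
        return env, "env"
    if config:
        return config, "config"
    return None, "none"
```

Before the fix, the equivalent of calling this with `task_id=None` and no `spec_id` fell straight through to env/config, which is why a per-spec default was ignored by plan and completion reviews.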
@cursor

cursor Bot commented Jul 1, 2026

Bugbot couldn't run - usage limit reached

Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit.

A user or team admin can review and increase usage limits in the Cursor dashboard.

(requestId: serverGenReqId_8e7a788c-6035-444d-900f-a24493db8c7d)

…v/config — fn-74

Codex retest (dogfooding the final PR state) flagged a Major issue: _resolve_cursor_review_spec
honored a stored per-task/per-epic cross-backend `review: codex:...` (source task/epic), but
`flowctl cursor` always shells cursor-agent and Cursor's model names are format-specific
(`gpt-5.5-high`, not `gpt-5.5`), so it would pass a foreign `--model` and fail — the same problem
the explicit `--spec` guard rejects. Cursor now coerces ANY non-cursor resolved spec to the
cursor default regardless of source (a `cursor:<model>` spec is still honored). codex/copilot stay
lenient (OpenAI-style model names cross over). Test updated (per-task cross-backend honors→coerces).
Suite 1353 OK; byte-parity.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
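
The coercion rule can be sketched in a few lines. The helper name `coerce_to_cursor` and the handling of a bare `cursor` spec are assumptions; the default model `gpt-5.5-high` is the one named in the PR description.

```python
from typing import Optional

CURSOR_DEFAULT_SPEC = "cursor:gpt-5.5-high"  # default model per the PR description


def coerce_to_cursor(resolved: Optional[str]) -> str:
    """Honor a cursor spec; coerce anything else to the cursor default.

    The source of the resolved spec (task, epic, env, config) is deliberately
    ignored: a foreign model name would fail under cursor-agent either way.
    """
    if resolved and (resolved == "cursor" or resolved.startswith("cursor:")):
        return resolved  # bare "cursor" passthrough is an assumption
    return CURSOR_DEFAULT_SPEC
```

For example, a stored `codex:gpt-5.5` resolves to `cursor:gpt-5.5-high`, while `cursor:composer-2.5` is passed through unchanged.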
@cursor

cursor Bot commented Jul 1, 2026

Bugbot couldn't run - usage limit reached

Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit.

A user or team admin can review and increase usage limits in the Cursor dashboard.

(requestId: serverGenReqId_286a5ece-bfc4-4315-80e5-ff724e9a7723)

…disk-read cue — fn-74

Codex retest edge: when a huge spec/template leaves no argv budget for the diff,
fit_cursor_diff_to_budget dropped it to "" — and if the diff-less prompt still fit under
the cap, fit_cursor_prompt_to_budget added no disk-read header, so cursor could review
branch changes with no diff AND no cue to read the changed files. The drop case now emits
a short read-from-disk pointer in the <diff_content> slot (never empty); if that pushes
the prompt over the cap, the prompt-level backstop's header still covers it. +1 test.
Suite 1354 OK; byte-parity.

Claude-Session: https://claude.ai/code/session_01PCBrK1UKXt1b9oWZordedJ
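
A minimal sketch of the "never empty" drop case, assuming a simplified signature: the name `fit_diff_to_budget`, the `budget` parameter, and the pointer wording are hypothetical, and the real fit_cursor_diff_to_budget may truncate differently.

```python
# Hypothetical pointer text placed in the <diff_content> slot.
DIFF_DROPPED_POINTER = (
    "[diff omitted: no argv budget left; read the changed files from disk]"
)


def fit_diff_to_budget(diff: str, budget: int) -> str:
    """Fit `diff` into `budget` characters, never returning "" for a dropped diff."""
    if not diff:
        return diff  # nothing to review; leave the slot as it is
    if budget <= 0:
        # Previously this case returned "", leaving the reviewer with no cue.
        return DIFF_DROPPED_POINTER
    if len(diff) <= budget:
        return diff
    return diff[:budget]  # partial diff; simple head truncation for illustration
```

The pointer may itself exceed the remaining budget; per the commit message, that case is left to the prompt-level backstop, whose header carries the same read-from-disk instruction.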
@cursor

cursor Bot commented Jul 1, 2026

Bugbot couldn't run - usage limit reached

Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit.

A user or team admin can review and increase usage limits in the Cursor dashboard.

(requestId: serverGenReqId_bbac9b69-2980-42bf-85df-ac00f25561e0)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c38f1dc63e

|------------|------|
| `codex` | [workflow-codex.md](workflow-codex.md) |
| `copilot` | [workflow-copilot.md](workflow-copilot.md) |
| `cursor` | [workflow-cursor.md](workflow-cursor.md) |

P2: Route cursor task overrides before dispatch

When a task sets review: cursor:<model> but the project default is still another backend, Phase 0 still derives BACKEND from flowctl review-backend, which only reads env/config and cannot see the task id. This new cursor branch is therefore skipped; the selected codex/copilot/rp workflow runs instead, and the flowctl command can end up using Cursor's model string (for example gpt-5.5-high) with the wrong CLI or ignoring the task override entirely. Resolve the backend with task/spec context before this dispatch, or otherwise force the invoked workflow to match the stored cursor override.
