Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
83 changes: 83 additions & 0 deletions docs/BACKLOG/open.md
Original file line number Diff line number Diff line change
Expand Up @@ -347,3 +347,86 @@ Checks:
symbols) and confirm the filtered finding count = 0

---

## 56. Fix the `--rework-pr` workflow in `solve_issues.py`

Labels: `kind/bug`, `theme/solver`, `area/cost-cap`, `priority/2`

Priority: `2`

The `--rework-pr` workflow in `solve_issues.py` is broken for
follow-up-edits. Measured across this session (2026-06-24/25):

- **Run 1** (PR #425, `opencode/deepseek-v4-flash-free` via
`--model opencode`): OpenRouter 400 Bad Request. opencode-serve
1.14.28 rejects the slug in this code path (the same model
worked in the normal solve path for the same PR's first push).
- **Run 2** (PR #425, `mistral/mistral-large-latest` via
`--model openrouter_direct`): same 400. Not the model — the
request shape.
- **Run 3** (PR #425, `openai/gpt-4o-mini` via
`--model openrouter_direct`): no 400, model produced valid
patch JSON, but output truncated mid-JSON by the 4096-token
output cap. Worker reports `status: no_patches`.
- **Run 4** (PR #425, `openai/gpt-4o` via
`--model openrouter_direct`): no 400, full patch JSON produced,
but the worker-patcher (likely `git apply`) rejected it 25x with
"Keine gueltigen Patches in der Eingabe". Most likely cause:
the model wrote a diff against the pre-PR-433 code, but the
worker-patcher applies against the current branch tip (which
already has PR #433's commits). Mismatch.
- **Run 5** (PR #437, `openai/gpt-4o` via
`--model openrouter_direct`): same `patches_failed` /
`worker_exit_code: 3` pattern. Confirms the failure is not
PR-specific.

`/tmp/compare_temperatures.py` (still on disk in this session)
can reproduce the 400 / no_patches / patches_failed modes
deterministically.

**Most likely root cause for Run 4/5 (the patches_failed case):**
the rework-pr prompt gives the model the PR diff as "what to fix"
but does not anchor the model to "your diff must apply cleanly on
top of the current branch tip". The model writes a
rewrite-from-scratch diff and the patcher rejects it. Fix would
need to change the rework prompt in `solve_issues.py` (out of
scope for the §54 review_pr.py work).

**Secondary root cause for Runs 1/2 (the OpenRouter 400):**
unknown — both `deepseek-v4-flash-free` and `mistral-large-latest`
are listed as valid OpenRouter slugs. Could be a request-shape
mismatch in `_gh_api`-equivalent call inside `solve_issues.py`,
or could be a transient OpenRouter API issue. Needs a 3-run
reproduction in a controlled environment before any fix is
designed.

Suggested scope:
- run a deterministic reproduction with the existing
`/tmp/compare_temperatures.py` against 3 different PRs to
characterise the failure surface
- read the rework-pr prompt construction in `solve_issues.py`
and identify the anchoring gap (model doesn't know about the
current branch tip's existing commits)
- design a fix that:
- explicitly passes the current branch tip SHA + the
already-existing commits on the PR branch to the model
- asks the model for an **incremental** diff against that
base, not a full rewrite
- verifies each produced patch with `git apply --check`
*before* the worker marks the run as successful
- add unit tests in `tests/test_solve_issues.py` for the
rework-pr prompt shape (mock at the openrouter-call boundary)

Touches: `scripts/solve_issues.py`, `tests/test_solve_issues.py`

Checks:
- `git diff --check`
- `python -m unittest tests.test_solve_issues -v`
- `python -m unittest discover -s tests`
- new unit test that exercises the rework-pr prompt shape with a
realistic PR-diff + current-branch-tip context
- 3-run reproduction against 3 different real PRs from this repo:
all 3 should now succeed or fail with a clear, actionable error
(not the current opaque `patches_failed` 25x retry loop)

---
12 changes: 12 additions & 0 deletions prompts/rework_pr.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,19 @@ provided that must be addressed by making additional changes on the same branch.
- **PR #{pr_number}** in `{owner}/{repo}`
- **Base branch:** `{base_branch}`
- **Head branch:** `{head_branch}` (the branch you will modify)
- **Current head commit:** `{head_sha}`
- **Reviewers:** {reviewer_usernames}

## Current Branch State

The PR branch already contains the following commits, newest first:

{existing_commits_list}

Your patch MUST be a minimal incremental diff that applies cleanly on top of
commit `{head_sha}`. Do not rewrite files from scratch against older PR or base
branch content. Reference files by their current post-`{head_sha}` content.

## PR Diff (current state vs base)

```
Expand All @@ -29,6 +40,7 @@ The following review comments must be addressed:
3. Preserve the existing branch name — push follow-up commits to `{head_branch}`.
4. If a reviewer comment is unclear, use your best judgment to resolve it.
5. After applying changes, verify the code is syntactically correct.
6. Return only an incremental patch for the current branch tip `{head_sha}`.

## Constraints

Expand Down
12 changes: 12 additions & 0 deletions scripts/validation/github_client.py
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,18 @@ def get_pull_request(self, repo: str, number: int) -> PullRequestInfo | None:
merge_commit_sha=(pr.get("merge_commit_sha") or None),
)

def get_pull_request_commits(self, repo: str, number: int) -> list[dict[str, Any]]:
"""Return commit metadata for a pull request, newest commit last."""
resp = self.session.get(
f"{self.BASE}/repos/{self.owner}/{repo}/pulls/{number}/commits",
params={"per_page": 100},
)
if resp.status_code == 404:
return []
self._raise_for_status(resp, f"get PR commits: {repo}#{number}")
data = resp.json()
return data if isinstance(data, list) else []

def get_ci_status(self, repo: str, ref: str) -> CiStatus:
resp = self.session.get(
f"{self.BASE}/repos/{self.owner}/{repo}/commits/{ref}/status",
Expand Down
49 changes: 46 additions & 3 deletions scripts/validation/rework.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
_REPO_ROOT = Path(__file__).resolve().parents[2]
REWORK_PROMPT_PATH = _REPO_ROOT / "prompts" / "rework_pr.md"
REWORK_COMMIT_MESSAGE_PREFIX = "rework: apply review feedback"
DEFAULT_REWORK_MAX_TOKENS = 16384


def _load_rework_prompt_template() -> str:
Expand All @@ -55,6 +56,8 @@ def _build_rework_prompt(
head_branch: str,
diff: str,
review_threads: list[ReviewThread],
head_sha: str = "",
existing_commits_list: str = "",
) -> str:
"""Build the final rework prompt by filling in the template."""
reviewer_usernames = ", ".join(
Expand All @@ -73,12 +76,44 @@ def _build_rework_prompt(
repo=repo,
base_branch=base_branch,
head_branch=head_branch,
head_sha=head_sha or "(unknown)",
existing_commits_list=existing_commits_list or "(commit list unavailable)",
reviewer_usernames=reviewer_usernames,
diff=diff,
review_threads=review_text,
)


def _format_pr_commits_for_prompt(commits: list[dict[str, Any]], *, limit: int = 12) -> str:
"""Format PR commits for the rework prompt, newest first."""
if not commits:
return "(no existing PR commits returned by GitHub)"

lines: list[str] = []
for item in reversed(commits[-limit:]):
sha = str(item.get("sha") or "")[:12] or "unknown"
commit = item.get("commit") if isinstance(item.get("commit"), dict) else {}
message = str(commit.get("message") or "").splitlines()[0].strip()
if not message:
message = "(no commit message)"
lines.append(f"- {sha} {message}")
if len(commits) > limit:
lines.append(f"- ... {len(commits) - limit} older commit(s) omitted")
return "\n".join(lines)


def _rework_max_tokens_from_env(default: int = DEFAULT_REWORK_MAX_TOKENS) -> int:
"""Return the OpenRouter completion cap for PR rework runs."""
raw = os.getenv("OPENROUTER_REWORK_MAX_TOKENS")
if not raw:
return default
try:
value = int(raw)
except ValueError:
return default
return value if value > 0 else default


def _run_worker_via_subprocess(
prompt: str,
repo_dir: str,
Expand All @@ -104,14 +139,14 @@ def _run_worker_via_subprocess(
api_key=openrouter_key,
model=model,
request_timeout_seconds=timeout_seconds,
use_structured_output=True,
use_structured_output=False,
)

try:
response_text, usage = worker.generate_with_usage(
prompt=prompt,
prompt=worker.build_patch_prompt(prompt, structured=False),
temperature=0.3,
max_tokens=8192,
max_tokens=_rework_max_tokens_from_env(),
timeout=timeout_seconds,
)
except Exception as exc:
Expand Down Expand Up @@ -261,6 +296,12 @@ def run_pr_rework(
except RuntimeError:
review_threads = []

try:
existing_commits = client.get_pull_request_commits(repo, pr_number)
except RuntimeError:
existing_commits = []
existing_commits_list = _format_pr_commits_for_prompt(existing_commits)

# 3. Build prompt
try:
template = _load_rework_prompt_template()
Expand All @@ -285,6 +326,8 @@ def run_pr_rework(
head_branch=head_branch,
diff=diff,
review_threads=review_threads,
head_sha=pr_info.head_sha,
existing_commits_list=existing_commits_list,
)

# Dry-run: print metadata and prompt, return early
Expand Down
17 changes: 17 additions & 0 deletions tests/test_validation/test_github_client.py
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,23 @@ def test_get_pull_request_returns_none_on_404(self):
pr = self.client.get_pull_request("repo", 999)
self.assertIsNone(pr)

def test_get_pull_request_commits_returns_list(self):
commits = [
{"sha": "abc123", "commit": {"message": "first"}},
{"sha": "def456", "commit": {"message": "second"}},
]
self.session.get.return_value = self._mock_response(200, commits)

result = self.client.get_pull_request_commits("repo", 10)

self.assertEqual(result, commits)
self.session.get.assert_called_once()
self.assertIn("/pulls/10/commits", self.session.get.call_args.args[0])

def test_get_pull_request_commits_returns_empty_on_404(self):
self.session.get.return_value = self._mock_response(404)
self.assertEqual(self.client.get_pull_request_commits("repo", 999), [])

def test_get_ci_status_returns_status(self):
self.session.get.return_value = self._mock_response(200, {
"state": "success", "total_count": 3, "statuses": [
Expand Down
Loading
Loading