feat(core,cli): auto-compact follow-up — /compress instructions, PreCompact hook plumb, plan/subagent attachments#4688
Conversation
📋 Review SummaryThis PR implements three follow-up improvements to the auto-compact subsystem: (1) 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
pomelo-nwu
left a comment
There was a problem hiding this comment.
Stage 1: Template Gate — ❌ Changes Requested
This PR does not follow the required PR template. The body uses custom headings (## Summary, ## Why these three together, ## Test plan) instead of the standard template headings.
Missing required headings:
## What this PR does(replaced by## Summary)## Why it's needed(replaced by## Why these three together)## Reviewer Test Planwith### How to verifyand### Evidence (Before & After)subsections### Tested onOS table## Risk & Scope## Linked Issues<details><summary>中文说明</summary>bilingual section
The content itself is substantive and well-written — please restructure it to match the template at .github/pull_request_template.md. The test plan checklist items should go under ## Reviewer Test Plan, the "Why these three together" rationale under ## Why it's needed, etc.
Please reformat and push the updated description, then this gate will pass automatically on re-triage.
中文说明
Stage 1: 模板检查 — ❌ 需要修改
本 PR 未使用要求的 PR 模板。正文使用了自定义标题(## Summary、## Why these three together、## Test plan),而非标准模板标题。
缺少的必需标题:
## What this PR does(被## Summary替代)## Why it's needed(被## Why these three together替代)## Reviewer Test Plan及其子标题### How to verify和### Evidence (Before & After)### Tested on操作系统表格## Risk & Scope## Linked Issues<details><summary>中文说明</summary>双语段落
内容本身质量不错——请按 .github/pull_request_template.md 重新组织。测试计划清单项应放入 ## Reviewer Test Plan,"Why these three together" 放入 ## Why it's needed,以此类推。
请重新格式化并更新 PR 描述,重新 triage 时此检查将自动通过。
Stage 1: Gate
中文说明
|
Stage 2: Review + TestCode Review
Test Results
No critical or high-confidence blockers. Code is well-structured, tests are thorough (covering XML injection, overflow caps, whitespace edge cases, NOOP guard ordering, instruction merge order). 中文说明代码审查
测试结果
无 critical 或高置信度 blocker。代码结构良好,测试覆盖全面(涵盖 XML 注入、溢出上限、边界情况、NOOP 守卫顺序、指令合并顺序)。 |
Stage 3: Final Decision
Verdict: APPROVED ✅ 中文说明
结论:APPROVED ✅ |
pomelo-nwu
left a comment
There was a problem hiding this comment.
Stage 3: Final Decision
- Need real? Yes — three genuine gaps: users can't steer long-session summaries toward what matters, PreCompact hooks are configured but can't influence the summary they're meant to shape, plan-mode constraint and background subagent state silently vanish after compaction
- Code simple? Yes — clean 4-layer parameter threading, no unnecessary abstractions, well-bounded defensive measures (XML escaping, snapshot caps, whitespace flattening)
- Best approach? Yes — bundling justified (same two core files, would otherwise produce three near-identical review cycles); implementation matches motivation point-for-point
- Confident to merge? Yes — backward-compatible (all new params trailing-optional), blast radius limited to the compression hot path, 282 tests pass, typecheck clean
Verdict: APPROVED ✅
中文说明
- 需求真实? 是——三个实际缺口:用户无法引导长会话摘要聚焦重点、PreCompact hook 配了但无法影响摘要、plan mode 约束和后台 subagent 状态在压缩后静默丢失
- 代码简洁? 是——清晰的 4 层参数透传、无多余抽象、防御性措施有良好边界(XML 转义、快照上限、空白扁平化)
- 最优方案? 是——打包合理(同样两个核心文件,否则会产生三次几乎一样的 review);实现与动机逐点对应
- 有信心合并? 是——向后兼容(所有新参数尾部 optional)、影响范围限于压缩热路径、282 个测试通过、typecheck 无错
结论:APPROVED ✅
Stale: author restructured PR body to match template. Re-triaged and approved.
There was a problem hiding this comment.
Pull request overview
This PR extends the auto-compaction pipeline to better preserve “session state” and allow user steering, closing gaps vs. the claude-code reference flow.
Changes:
- Adds
/compress <focus instructions>plumbing end-to-end into the compression side-query system prompt and PreCompact hook. - Enhances post-compact restoration attachments to include a plan-mode reminder and a bounded snapshot of active subagent tasks.
- Adds unit tests and an interactive integration smoke test for the new wiring.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/services/postCompactAttachments.ts | Adds plan-mode reminder + <background-tasks> snapshot rendering in post-compact attachments. |
| packages/core/src/services/postCompactAttachments.test.ts | Adds coverage for plan-mode reminder and subagent snapshot formatting/escaping/bounding. |
| packages/core/src/services/chatCompressionService.ts | Threads custom instructions + PreCompact hook context into the compression prompt; wires plan mode and subagent snapshot into post-compact history. |
| packages/core/src/services/chatCompressionService.test.ts | Adds tests for custom-instructions plumbing, hook ordering, and attachment wiring. |
| packages/core/src/core/geminiChat.ts | Extends TryCompressOptions to carry customInstructions into the compression service. |
| packages/core/src/core/client.ts | Extends tryCompressChat to accept customInstructions and forwards via options bag. |
| packages/core/src/core/client.test.ts | Updates/extends tests for new tryCompressChat signature and forwarding behavior. |
| packages/cli/src/ui/commands/compressCommand.ts | Parses trailing /compress args, trims, caps to 2000 chars, and forwards to core client. |
| packages/cli/src/ui/commands/compressCommand.test.ts | Adds tests for trimming/undefined behavior and 2000-char cap. |
| integration-tests/interactive/context-compress-interactive.test.ts | Adds a skipped-on-win32 interactive smoke test ensuring /compress instructions plumbing reaches telemetry. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
wenshao
left a comment
There was a problem hiding this comment.
Typecheck finding (not inline): integration-tests/interactive/context-compress-interactive.test.ts — parameter data implicitly has any type (TS7006) at lines 37, 83, and 125. Only line 125 is in this PR's diff; the other two are pre-existing. Fix: (data: string) =>.
— qwen3.7-max via Qwen Code /review
Extends /compress to take a trailing instruction string (max 2000 chars) that is passed through tryCompressChat → tryCompress → CompressOptions and appended to the compression side-query system prompt as an "Additional Instructions:" block. Mirrors claude-code /compact <text>. Empty / whitespace-only args fall back to the prior behaviour.
Adds two optional ComposePostCompactOptions: - planModeActive: when true, emits a <plan-mode-active> reminder so the post-compact agent does not forget destructive tools remain gated. - runningSubagents: when non-empty, emits a <background-tasks> block listing each running/paused task by id, status, and description. Both blocks are spliced into the merged user attachment Content before file/image restorations. XML-significant characters in descriptions are escaped to prevent an adversarial subagent description from closing the wrapper tag. Wiring at the call site arrives in the next commit.
…chments ChatCompressionService.compress now passes: - planModeActive: derived from config.getApprovalMode() === ApprovalMode.PLAN - runningSubagents: filtered from BackgroundTaskRegistry to agent-kind tasks in 'running' or 'paused' state into composePostCompactHistory. Adds collectActiveSubagents() helper that returns [] when the registry is absent so older SDK consumers without it keep working.
…est /compress - chatCompressionService: read PreCompact hook output via result.getAdditionalContext() — the wrapper returns DefaultHookOutput (not the raw AggregatedHookResult), and the accessor sanitises < / > consistently with every other call-site in the repo. - Test mocks now return a DefaultHookOutput-shaped stub via a tiny makeHookOutput() helper rather than the aggregator shape. - New integration smoke test for `/compress focus on the scientist mentioned` exercising the args plumbing end-to-end.
Code-review follow-up. Pathological sessions with hundreds of backgrounded agents could otherwise produce a multi-KB block. Newest 30 rows are kept (highest startTime); older ones are summarised on a trailing line so the model knows the snapshot is partial.
…ovalMode in tests Second code-review pass found two real issues: 1. Subagent descriptions containing `\n`/`\r`/`\t` would split across multiple lines inside the `<background-tasks>` bullet list, letting the second line read as a sibling row (or worse, an orphan paragraph between two `- [..]` entries). Flatten whitespace before the slice so each task stays on one line. 2. The plan-mode wiring tests passed `'plan'` / `'auto-edit'` as plain strings instead of `ApprovalMode.PLAN` / `ApprovalMode.AUTO_EDIT`. Source code compares against the enum; a future enum value change would have silently passed the tests. Import and use the enum.
…ode tool names Round 3 code review surfaced two issues: 1. PreCompact hook fired BEFORE the curatedHistory.length < 2 guard, so a single-message session would trigger any hook side effects (transcript dump, external notification, etc.) and then NOOP. Move the hook fire below the guard so hooks only run when compression is actually possible. New regression test asserts the contract. 2. PLAN_MODE_REMINDER_TEXT said "shell mutations" but the real qwen-code tool is `run_shell_command` (tool-names.ts:26). Use the verbatim tool names so a future rename is grep-discoverable.
…, extract reminder builder Code-review follow-ups on post-compact attachments: - Replace the local 3-char escapeForXmlText with the shared 5-char escapeXml from utils/xml.ts, and apply it to the subagent id and status as well as the description. Subagent ids derive from a user-configurable subagentConfig.name, so an unescaped `<`/`&` there could close the <background-tasks> wrapper or forge sibling markup. - PLAN_MODE_REMINDER_TEXT now interpolates ToolNames.WRITE_FILE / .EDIT / .SHELL instead of retyping the names, so a future rename stays in sync. - Extract buildStateReminderParts() as the single source of truth for the plan-mode + subagent reminder blocks, used by both composePostCompactHistory and (next commit) its catch-fallback so the two paths can't drift.
…inders on fallback; cap hook context Three code-review fixes in the compaction service: - collectActiveSubagents now also requires isBackgrounded — foreground agents are the parent's synchronously-awaited tool call and don't belong in a <background-tasks> roster. Mirrors getRunningBackgroundCount. - The composePostCompactHistory catch-fallback re-applies the plan-mode + subagent reminders via the shared buildStateReminderParts (pure, no I/O), so a restoration failure no longer silently drops plan-mode enforcement and the subagent roster. - The PreCompact hook's additionalContext is capped at MAX_HOOK_INSTRUCTIONS_CHARS before entering the side-query prompt, closing the unbounded-input hole the user-text cap was meant to prevent. The fallback and hook-cap fixes have RED-verified regression tests.
…test pty typing - /compress now emits an INFO notice (interactive), a stream message (acp), and a prefixed return message (non-interactive) when the instruction string exceeds MAX_COMPRESS_INSTRUCTIONS_CHARS, so the silent 2000-char clip is no longer invisible to the user. - Annotate the three `ptyProcess.onData((data: string) => ...)` callbacks in the compress integration test to clear the TS7006 implicit-any the reviewer's typecheck flagged (fixed all three occurrences, not only the one inside this PR's diff).
7b2c17c to
2af0022
Compare
Review round addressed — 3 commits (
|
| Finding | Outcome |
|---|---|
collectActiveSubagents includes foreground agents (copilot) |
✅ Fixed 23f5a40 — now requires isBackgrounded |
| Subagent id interpolated without escaping (copilot) | ✅ Fixed 295cf40 — shared escapeXml on id/status/description |
| [Critical] Fallback path drops plan-mode/subagent reminders | ✅ Fixed 23f5a40 — shared buildStateReminderParts re-applied in catch; RED-verified test |
[Critical] hookExtraInstructions uncapped |
✅ Fixed 23f5a40 — capped at MAX_HOOK_INSTRUCTIONS_CHARS (4000); RED-verified test |
escapeForXmlText duplicates shared escapeXml |
✅ Fixed 295cf40 — local helper removed |
PLAN_MODE_REMINDER_TEXT hardcodes tool names |
✅ Fixed 295cf40 — uses ToolNames.* |
| Silent 2000-char truncation has no UI warning | ✅ Fixed 2af0022 — notice across interactive/acp/non-interactive |
Typecheck TS7006 implicit-any on onData (review body) |
✅ Fixed 2af0022 — annotated all three onData((data: string) => …) callbacks |
| acp branch untested (suggestion) | ❌ Declined — out of scope; the acp branch pre-existed and was untested before this change, and the new doCompress plumbing is exercised by the interactive-path cases. Better as its own change. |
The two [Critical] fixes each ship with a regression test verified RED before the fix (break source → test fails → restore → passes). Branch rebased onto latest main; core suite (9820) + typecheck green locally.
wenshao
left a comment
There was a problem hiding this comment.
LGTM ✅ — well-structured PR with thorough test coverage (144 unit tests, all passing). The three-part design (/compress instructions, PreCompact hook plumbing, plan-mode + subagent attachments) is clean, the edge-case handling (XML escaping, whitespace flattening, overflow caps) is solid, and the catch-fallback path correctly preserves plan-mode enforcement.
Low-confidence items for future consideration (not blockers):
collectActiveSubagents(config)in the catch-fallback block (line ~607) could theoretically throw ifregistry.getAll()fails, defeating the degraded-success design. Wrapping it in its own try/catch or makingcollectActiveSubagentsitself swallow errors would harden the fallback path.- User
/compresstext is injected into the side-query system prompt without escaping, while hook and subagent paths both use sanitization. Consider wrapping in a<user-instructions>tag withescapeXml()for consistency. MAX_COMPRESS_INSTRUCTIONS_CHARSis only enforced in the CLI layer. The core service accepts any length — future callers bypassing the CLI could inject unbounded text.- ACP and headless truncation-notice paths lack test coverage (only interactive mode is tested).
- Catch-fallback
extraHistorylayout (2 entries) meansslice(2)under-counts token cost vs. the normal path (3+ entries). buildCompressionSystemPromptcallsuserInstructions.trim()twice — minor redundant allocation.
— qwen3.7-max via Qwen Code /review
Verification Report — PR #4688Commit: Test Results
Test File Breakdown
Execution Environment
VerdictAll 6 checks pass. 287 total tests (277 core + 10 cli) cover the changed chatCompressionService, postCompactAttachments, client, and compressCommand files. No regressions detected. |
What this PR does
Closes three remaining gaps in qwen-code's auto-compact subsystem versus the claude-code reference implementation, identified after PR #4599 (summary + restoration attachments) and PR #4345 (three-tier threshold ladder).
First,
/compressnow accepts a trailing focus instruction string (/compress focus on the auth bug), threaded throughtryCompressChat → tryCompress → ChatCompressionService.compressand appended to the compression side-query system prompt as anAdditional Instructions:block. The string is capped at 2000 characters because the side-query has no input-PTL retry today.Second, the existing
PreCompacthook now receives the user's/compressinstructions, and anyadditionalContextit returns viahookSpecificOutputis merged back into the side-query prompt. User text comes first; hook text is appended (matches claude-code'smergeHookInstructionssemantics). The hook fires only after thecuratedHistory.length < 2early-NOOP guard so hooks with side effects don't trigger on single-message sessions.DefaultHookOutput.getAdditionalContext()is the accessor used, consistent with every other hook-output consumer in the repo.Third,
composePostCompactHistorynow restores two reminder blocks before the existing file/image attachments: a<plan-mode-active>block whengetApprovalMode() === ApprovalMode.PLAN, and a<background-tasks>snapshot listingrunning/pausedagent tasks (viaBackgroundTaskRegistry.getAll()). The snapshot is bounded — descriptions are flattened of newlines, XML-escaped, capped at 200 chars per row and 30 rows total with an overflow notice. Both blocks reduce to no-ops when their source is empty, so sessions without plan mode or background agents see no change in attachment shape.Why it's needed
These three live in the same compaction hot path. Without
/compressinstructions, users have no way to steer a long-session summary toward what they care about. Without PreCompact hook plumbing, hooks can be configured but can't actually influence the summary they were intended to influence. Without plan-mode and subagent attachments, a long session that compacts in plan mode silently loses the constraint (the model resumes thinking it's free to mutate files) and forgets about backgrounded subagents (the model resumes thinking they completed).Bundling them into a single PR keeps the review round trip consistent with prior auto-compact PRs (#4345, #4599) — all three touch the same two source files (
chatCompressionService.ts,postCompactAttachments.ts) and would otherwise produce three near-duplicate review cycles over the same code.Reviewer Test Plan
How to verify
Local unit tests:
Headless end-to-end smoke (requires
OPENAI_API_KEYenv):Interactive sanity check:
/compress focus on the scientist mentionedafter a model turn — the COMPRESSION transcript entry should render and thechat_compressiontelemetry event should fire.Evidence (Before & After)
N/A — this PR has no user-visible TUI change. New behaviour is entirely under the
/compresscommand path and inside post-compact attachment XML blocks the model sees but the user does not. Verification is via the unit and integration tests listed above (10 new tests across two suites, plus 1 headless smoke).Tested on
Environment (optional)
node 22.x, monoreponpm testandvitest run --root ./integration-testsagainst the real OpenAI provider for the smoke test.Risk & Scope
customInstructionssilently truncates without warning the user — chosen over implementing a warning path because the side-query has no input-PTL recovery yet (separate gap, tracked but out of scope here). If a user genuinely needs a longer focus directive, they will not learn that their input was clipped.tryCompressChat's new 4th parameter andTryCompressOptions.customInstructionsare trailing optional fields, backwards-compatible with existing SDK consumers.ComposePostCompactOptions.planModeActiveandrunningSubagentsare likewise optional and default to no-op behaviour. TheBackgroundTaskRegistryaccess is optional-chained, so SDK harnesses without a registry impl continue to work.Linked Issues
None — direct follow-up to #4599 and #4345.
中文说明
本 PR 做了什么
收口 qwen-code auto-compact 子系统相对 claude-code 参考实现仍存在的三处 gap,跟在 PR #4599(summary + restoration attachments)与 PR #4345(三阶梯阈值)之后。
第一,
/compress现在接受尾部 focus 指令字符串(/compress focus on the auth bug),通过tryCompressChat → tryCompress → ChatCompressionService.compress透传,作为Additional Instructions:块拼接到压缩 side-query 的 system prompt 上。指令被截断到 2000 字符,因为 qwen-code 的 side-query 当前没有 input-PTL 重试。第二,既有的
PreCompacthook 现在能收到用户的/compress指令,hook 通过hookSpecificOutput返回的additionalContext也被合并回 side-query prompt。用户文本在前,hook 文本附加在后(与 claude-code 的mergeHookInstructions语义一致)。Hook 在curatedHistory.length < 2早退 NOOP 检查之后才触发,这样有副作用的 hook 不会因单条消息会话被白白调用。读取 hook 输出统一用DefaultHookOutput.getAdditionalContext(),跟仓库里所有其他 hook 消费者保持一致。第三,
composePostCompactHistory在既有的 file/image attachment 之前注入两个 reminder 块:当getApprovalMode() === ApprovalMode.PLAN时注入<plan-mode-active>块,当BackgroundTaskRegistry.getAll()含running/pausedagent 任务时注入<background-tasks>快照。快照有边界 — 描述会先 flatten 换行、做 XML 转义、按 200 字符截断单行,按 30 行总数截断列表(超出加 overflow 行)。两个块在源为空时都降级为 no-op,所以非 plan mode、无后台 agent 的会话 attachment 形状不变。为什么需要
三件事都在同一个 compaction 热路径里。没有
/compress指令,用户无法引导长会话摘要聚焦到自己关心的点。PreCompact hook 链路不打通,用户即使配了 hook 也无法影响它本应影响的摘要。没有 plan-mode / subagent attachment,长会话在 plan mode 下压缩后会静默丢失约束(模型恢复后以为可以随意修改文件)、忘记后台 subagent(模型恢复后以为它们已完成)。打成一个 PR 而非三个,是为了和此前的 auto-compact PR(#4345、#4599)对齐 review 周期 — 三处都改同样两个源文件(
chatCompressionService.ts、postCompactAttachments.ts),拆三个 PR 会让 review 在同一段代码上来回三次。Reviewer 测试方案
验证方式
本地单元测试:
Headless 端到端冒烟(需
OPENAI_API_KEY环境变量):交互手验:一次 model 轮次后
/compress focus on the scientist mentioned— COMPRESSION 历史条目应渲染,chat_compression遥测事件应触发。Before & After 证据
N/A — 本 PR 没有用户可见的 TUI 改动。新行为完全在
/compress命令路径与 post-compact attachment 的 XML 块里,模型能看到但用户看不到。验证依赖上述单元 + 集成测试(两套件新增 10 个测试 + 1 个 headless 冒烟)。测试环境
环境(可选)
node 22.x,monoreponpm test与vitest run --root ./integration-tests,冒烟测试连真 OpenAI provider。风险与范围
customInstructions的 2000 字符 cap 是静默截断,没有提示用户 — 选这个是因为 side-query 当前无 input-PTL 恢复(单独 gap,本 PR 不在范围)。如果用户真的需要更长的 focus 指令,他不会知道自己的输入被裁剪了。tryCompressChat新增第 4 个参数和TryCompressOptions.customInstructions都是尾部 optional 字段,对已有 SDK 消费者向后兼容。ComposePostCompactOptions.planModeActive和runningSubagents也都是 optional 且默认 no-op。BackgroundTaskRegistry访问走 optional chaining,SDK harness 没注册 registry 仍能跑。关联 issue
无 — 直接接续 #4599 与 #4345。