Merged
29 changes: 28 additions & 1 deletion CHANGELOG.md
@@ -10,6 +10,32 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.9.18] - 2026-05-26

### Fixed

- **Claude Code native `Agent` delegation is disabled inside managed
roles.** CoreRoom now starts Claude Code with `--disallowedTools Agent`
and the permission hook hard-denies `Agent`, so peer work must go
through `@role: <brief>` where parent turns, lifecycle, interrupts,
cost, and evidence stay host-controlled.
- **`cr cost` normalizes Claude Code cumulative session totals.** Claude
Code reports `total_cost_usd` as a session total, not a per-turn
increment; CoreRoom now converts those samples to monotonic deltas per
role/session before aggregating.
- **Root turns no longer render as sub-agent spawns.** Public user/host
turns without a parent turn are ignored by the spawn lifecycle tracker,
fixing footer noise such as `@host spawning`.
- **Terminal text selection works by default.** The live room no longer
enables mouse capture unless `COREROOM_MOUSE_CAPTURE=1` is set, so users
can select and copy transcript text normally.
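
The `cr cost` fix above can be illustrated with a minimal sketch of turning cumulative session totals into per-sample deltas. The `CostNormalizer` type and its `record` method are hypothetical names used only for illustration, not CoreRoom's actual code. The handling of a total that goes backwards (counted as zero while the high-water mark is kept) is an assumption; the changelog only says the deltas are monotonic.

```rust
use std::collections::HashMap;

/// Hypothetical helper: converts cumulative `total_cost_usd` samples
/// into per-sample deltas, tracked per (role, session) pair.
#[derive(Default)]
pub struct CostNormalizer {
    /// Highest cumulative total seen so far for each (role, session).
    last_total: HashMap<(String, String), f64>,
}

impl CostNormalizer {
    /// Records one cumulative sample and returns the cost it adds.
    pub fn record(&mut self, role: &str, session: &str, total_cost_usd: f64) -> f64 {
        let key = (role.to_owned(), session.to_owned());
        let previous = self.last_total.get(&key).copied().unwrap_or(0.0);
        // Assumption: a sample at or below the previous total adds nothing,
        // and the stored high-water mark is left unchanged.
        if total_cost_usd <= previous {
            return 0.0;
        }
        self.last_total.insert(key, total_cost_usd);
        total_cost_usd - previous
    }
}
```

Summing the returned deltas per role then gives a total that does not double count earlier turns of the same session, which is the failure mode the fix describes.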

### Changed

- **Engine-native delegation is documented as a threat-model invariant.**
The architecture and README now state that CoreRoom-owned delegation is
`@role: <brief>`, not an engine-native subagent plane.
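
For readers unfamiliar with the `@role: <brief>` form, the following is a rough sketch of how such a delegation line could be recognized. `parse_delegation_line` is a hypothetical function, not CoreRoom's parser, and the allowed role-name characters are an assumption. It accepts the two shapes the docs mention (`@backend: <brief>` and `@backend @ci: <brief>`) and rejects plain mentions.

```rust
/// Hypothetical sketch: parses a delegation line such as
/// `@backend: fix the flaky test` or `@backend @ci: rerun the suite`.
/// Returns the target roles and the brief, or `None` for any other line.
pub fn parse_delegation_line(line: &str) -> Option<(Vec<String>, String)> {
    // A delegation line must start with a role mention.
    if !line.starts_with('@') {
        return None;
    }
    // Everything before the first colon is the role list; the rest is the brief.
    let (head, brief) = line.split_once(':')?;
    let brief = brief.trim();
    if brief.is_empty() {
        return None;
    }
    let mut roles = Vec::new();
    for token in head.split_whitespace() {
        // Every token before the colon must be an `@name` mention.
        let name = token.strip_prefix('@')?;
        // Assumption: role names are ASCII alphanumerics, `-`, or `_`.
        let valid = !name.is_empty()
            && name
                .chars()
                .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
        if !valid {
            return None;
        }
        roles.push(name.to_owned());
    }
    Some((roles, brief.to_owned()))
}
```

A mention in running prose, such as "Will check with @security and @frontend.", yields `None` here, so only explicit delegation lines would create dispatcher-owned child turns.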

## [0.9.17] - 2026-05-26

### Added
@@ -1674,7 +1700,8 @@ API stability, not feature completeness.
- **No timestamps in CREP events.** `cr cost --since` honors the log
file's mtime only; per-event timestamps land in v0.2.

-[Unreleased]: https://github.com/spytensor/CoreRoom/compare/v0.9.17...HEAD
+[Unreleased]: https://github.com/spytensor/CoreRoom/compare/v0.9.18...HEAD
+[0.9.18]: https://github.com/spytensor/CoreRoom/compare/v0.9.17...v0.9.18
[0.9.17]: https://github.com/spytensor/CoreRoom/compare/v0.9.16...v0.9.17
[0.9.16]: https://github.com/spytensor/CoreRoom/compare/v0.9.15...v0.9.16
[0.9.15]: https://github.com/spytensor/CoreRoom/compare/v0.9.14...v0.9.15
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "coreroom"
-version = "0.9.17"
+version = "0.9.18"
edition = "2021"
rust-version = "1.88"
authors = ["Charlie Zhu <chaojie.zhu.cn@gmail.com>"]
13 changes: 10 additions & 3 deletions README.md
@@ -61,6 +61,10 @@ delegation line like `@x: <brief>` in its reply.
Claude Code is gated by a CoreRoom-injected PreToolUse hook; Codex and
Gemini approval support follows each engine's native protocol and is shown
only when CoreRoom can supervise it.
- **Control-plane delegation.** CoreRoom disables Claude Code's native
`Agent` delegation tool inside managed roles. Peer work must route through
`@role: <brief>` so parent turns, lifecycle, cost, interrupts, and evidence
stay visible to `@host`.

## Design docs

@@ -119,7 +123,7 @@ Disable that with `COREROOM_NO_UPDATE_CHECK=1` or
<summary>Don't have npm? Direct binary install.</summary>

```bash
-TAG=v0.9.17
+TAG=v0.9.18
ARCH=$(uname -m); case "$ARCH" in arm64|aarch64) ARCH=aarch64 ;; *) ARCH=x86_64 ;; esac
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
curl -fsSL "https://github.com/spytensor/CoreRoom/releases/download/${TAG}/cr-${TAG}-${OS}-${ARCH}.tar.gz" \
@@ -243,6 +247,9 @@ Useful commands:
replays the full event log when you need to audit what happened. Set
`COREROOM_VERBOSE_TOOLS=1` to opt the live REPL back into the full
per-tool trace stream when you need it inline.
- The live room leaves terminal mouse capture off by default so transcript
text can be selected and copied normally. Set `COREROOM_MOUSE_CAPTURE=1`
if you prefer mouse-wheel routing inside the TUI.
- Permission prompts appear only while a decision is needed. Successful
once-only allows clear the prompt and stay out of the chat stream; session
approvals and denials remain visible because they change what the role can
@@ -254,9 +261,9 @@ Useful commands:
| ---------- | ------------------ | ----- | ------ |
| Prompt isolation | system-prompt file | MCP base instructions | requires `--system-instruction-file` |
| Tool trace events | proposed + executed | exec notifications when emitted | stream-json tool_use/tool_result |
-| Cost reporting | per turn | — | — |
+| Cost reporting | normalized from session total | — | — |
| Budget enforcement | native cap | — | — |
-| Permission gating | `ask` / `auto` / `bypass` via PreToolUse hook | `ask` / `auto` / `bypass` via MCP approval bridge in live REPL | explicit `bypass` only |
+| Permission gating | `ask` / `auto` / `bypass` via PreToolUse hook; native `Agent` delegation disabled | `ask` / `auto` / `bypass` via MCP approval bridge in live REPL | explicit `bypass` only |

`cr cost` excludes unsupported engines from the numeric total and marks them
with `—`. This is deliberate: older builds displayed `$0.00` for engines
8 changes: 8 additions & 0 deletions data/splash_content.toml
@@ -15,6 +15,14 @@ items = [
"/journal <role> captures today's lessons-learned",
]

[[whats_new]]
version = "0.9.18"
items = [
"Claude Agent delegation is disabled inside CoreRoom roles; use @role routing",
"cr cost now normalizes Claude Code cumulative session totals",
"terminal text selection works by default; mouse capture is opt-in",
]

[[whats_new]]
version = "0.9.17"
items = [
14 changes: 10 additions & 4 deletions docs/architecture.md
@@ -285,9 +285,9 @@ partial scope coverage is a blocker.
### Claude Code adapter

- Spawn: `claude --print --input-format=stream-json --output-format=stream-json
---verbose --dangerously-skip-permissions --append-system-prompt-file <priors>
---settings <hooks-config>` when permission mode is `ask` or `auto`;
-`bypass` omits the hook settings.
+--verbose --dangerously-skip-permissions --disallowedTools Agent
+--append-system-prompt-file <priors> --settings <hooks-config>` when
+permission mode is `ask` or `auto`; `bypass` omits the hook settings.
- Input: stream-json messages on stdin. `content` must be array of blocks
(`[{"type":"text","text":"…"}]`), not bare string.
- Output: stream-json on stdout (`system`, `assistant`, `result`,
@@ -300,8 +300,13 @@ partial scope coverage is a blocker.
`/deny <tool>` update the session policy file read by that hook. In
Claude Code's non-interactive stream mode, `ask` is represented as a
safe denial in `permission_denials`; the user can `/allow` and retry.
- Native delegation guard: Claude Code's `Agent` tool is disabled for
managed CoreRoom roles. Peer work must use `@role: <brief>` so the
dispatcher owns parent turns, lifecycle, cost attribution, interrupts,
and tracker evidence.
- Session ID: extracted from `system.subtype="init"` event at start.
-- Cost: per-turn `result.total_cost_usd`. Wrapper aggregates per role per day.
+- Cost: `result.total_cost_usd` is a cumulative Claude Code session sample;
+`cr cost` normalizes monotonic samples before aggregating per role.
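
The native delegation guard in the adapter notes above has two layers: the spawn flag and the hook. A minimal sketch of the hook-side decision might look like the following. `HookDecision` and `decide` are hypothetical names, and checking `Agent` before the session allow list is an assumption drawn from the "hard-denies" wording, not from CoreRoom's hook source.

```rust
/// Tool name the diff disables for managed roles.
const NATIVE_DELEGATION_TOOL: &str = "Agent";

/// Hypothetical outcome of a permission check for one tool call.
#[derive(Debug, PartialEq, Eq)]
pub enum HookDecision {
    Allow,
    Ask,
    Deny,
}

/// Hypothetical policy check: `allowed` and `denied` stand in for the
/// session policy that `/allow <tool>` and `/deny <tool>` update.
pub fn decide(tool: &str, allowed: &[&str], denied: &[&str]) -> HookDecision {
    // Assumption: the native delegation tool is rejected before any
    // session policy is consulted, so an allow entry cannot re-enable it.
    if tool == NATIVE_DELEGATION_TOOL {
        return HookDecision::Deny;
    }
    if denied.iter().any(|t| *t == tool) {
        return HookDecision::Deny;
    }
    if allowed.iter().any(|t| *t == tool) {
        return HookDecision::Allow;
    }
    // No recorded decision: the user has to be asked.
    HookDecision::Ask
}
```

Keeping the check in the hook as well as in `--disallowedTools` means a single missed flag at spawn time would not reopen the delegation path.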

### Codex adapter

@@ -499,6 +504,7 @@ Raw CREP JSONL per role per session. Never auto-loaded — used for forensics,
| Patch directory bloat | Hard 50-cap per role + FIFO archive at v0.1 |
| Routing loops (`@a` ↔ `@b` ↔ `@a`) | Dispatcher-owned routing state. Auto-router only acts on explicit delegation lines that start with `@role:`, and skips self-delegation (`@a` delegating to itself), unknown roles (`@<not-running>`), and ungrounded turns (tool calls were systematically denied → reply is a guess). User-origin depth is 0; each auto-route child is parent depth + 1; default max hop depth is 5. Fan-out and queued-turn limits are separate; chains also end when the queue drains or the user halts (`Ctrl-C` × 2 or `/halt`). |
| Permission gate fail-open | Hook script defaults to deny on any error; wrapper supervises hook process and treats non-zero exit without decision-file as deny |
| Engine-native delegation escape | Claude Code `Agent` is disallowed at spawn time and denied by the CoreRoom hook so sub-work cannot bypass `@role` routing, lifecycle, cost, or host interrupts |
| Concurrency / SIGINT mid-tool | Each role's tool calls wrapped in `.coreroom/locks/<role>.inflight`. On startup, stale inflight markers put the role in recovery mode (no new tool calls until user acknowledges) |
| Token cost runaway | User halts with `Ctrl-C` × 2 or `/halt`; cost per turn is surfaced in the WorkCard so runaway behavior is visible |
| Role identity drift over months | v0.2 `cr review` diffs journal-self vs priors-self and surfaces contradictions |
32 changes: 19 additions & 13 deletions docs/threat-model.md
@@ -70,83 +70,89 @@ architecture amendment before implementation.
route. Auto-routing only acts on explicit delegation lines accepted by the
parser, such as `@backend: <brief>` or `@backend @ci: <brief>`.

-4. Peer output is quoted evidence, not a command channel.
+4. Engine-native delegation is not a CoreRoom route.
+Managed roles must not use engine-native subagent tools such as Claude
+Code `Agent`. Those tools create work and cost outside dispatcher-owned
+parent turns, lifecycle, interrupts, and evidence. Peer work must route
+through `@role: <brief>`.
+
+5. Peer output is quoted evidence, not a command channel.
Cross-role payloads are treated as data from the sending role. A receiving
role can use that content as context, but embedded instructions inside the
quote do not override its kernel, priors, or current user request.

-5. Current-thread evidence is required for peer claims.
+6. Current-thread evidence is required for peer claims.
A role may claim consensus, approval, review completion, or "merged
perspectives" only from current-thread peer evidence surfaced by the
runtime, such as peer-quote envelopes, current turn ids, or user-pasted
current-thread text. Memory, priors, journals, and resumed engine context
are not enough.

-6. Editable logs are not enforcement state.
+7. Editable logs are not enforcement state.
`.coreroom/messages.jsonl` supports replay and audit, but live safety
decisions must come from runtime-owned state or explicit user commands.
Future budget enforcement must not trust a mutable log total.

-7. Permission policy is visible and resettable.
+8. Permission policy is visible and resettable.
Existing allow/deny decisions must be visible at startup and through
`/permissions`. Review or release workflows that require fresh attention
should use `/permissions clear` and, when stale engine context matters,
`/fresh`.

-8. Resume is convenience, not provenance.
+9. Resume is convenience, not provenance.
Resuming an engine session may carry useful context, but it also carries
stale claims. `cr` must surface resumed roles and the clean-start controls.
Release reviews, audits, and incident work should prefer `cr start --fresh`
or `/fresh` unless the user intentionally wants continuity.

-9. Tier 0 is inline.
+10. Tier 0 is inline.
Tier 0/read-only review may inspect files and commands needed for evidence,
but it does not write hidden `.coreroom/` review artifacts. Persistent
evidence, cross-model review, or release sign-off belongs in Tier 1.

-10. Authority-scoped veto is explicit.
+11. Authority-scoped veto is explicit.
A role can block plan advancement only when all of these are true: the
role has a validated authority scope in configuration, the plan artifact
declares an intersecting scope, and the role records an explicit review
decision for the current plan SHA. Model prose, stale resumed context, or
editable logs cannot create authority, expand scope, reject a plan, or
override a rejection.

-11. User override is a command, not a claim.
+12. User override is a command, not a claim.
A scoped veto can be overruled only by an explicit user command with a
reason. The override is recorded in the gate ledger and CREP audit trail.
Text emitted by a role, transcript replay, or a journal entry may explain
the override after the fact, but cannot substitute for it.

-12. Host-led control is visible and confirmable.
+13. Host-led control is visible and confirmable.
`@host` is the highest in-room coordination authority, but host output is
still model text. Persistent project state changes require explicit user
confirmation or a visible command path. Non-host roles cannot create
WorkOrders, register sources, update trackers, prepare completion claims,
or close evidence gaps by prose.

-13. WorkOrders bind state; they do not prove state.
+14. WorkOrders bind state; they do not prove state.
A WorkOrder can link a GitHub Issue, gate thread, branch, PR, tracker row,
and evidence expectations, but it is still a local project file. GitHub
Issue creation or binding requires confirmation. Binding an existing issue
must not silently mutate the issue body, labels, milestone, or comments.
Completion still depends on external evidence and tracker closure.

-14. Source Registry is pinned context, not prompt memory.
+15. Source Registry is pinned context, not prompt memory.
Project sources must carry pins, trust levels, owners, visible roles,
purpose, and refresh policy before they can be used for WorkOrder context.
Registering or re-pinning a source requires confirmation. Remote and
external sources must never silently refresh. Adding a source does not
mount it into role knowledge or make it part of a ContextPack.

-15. ContextPacks are scoped selections.
+16. ContextPacks are scoped selections.
A ContextPack can select path/range or snapshot references from registered
sources for specific target roles. It must not imply that all project
sources are loaded into every role. Stale pins and unpinned selected
sources must be surfaced before delegation; they are not hidden evidence
of freshness.

-16. Evidence Packets are structured claims.
+17. Evidence Packets are structured claims.
Evidence Packets can support host PR summaries, but completion still
depends on required fields being present and tracker state being updated.
Model prose alone cannot satisfy changed-file, command, test, review, risk,
2 changes: 1 addition & 1 deletion npm/package.json
@@ -1,6 +1,6 @@
{
"name": "@spytensor/coreroom",
-"version": "0.9.17",
+"version": "0.9.18",
"description": "CoreRoom is the Engineering Control Room for AI Agents: host-led, GitHub-gated AI-assisted software engineering control.",
"keywords": [
"cli",
29 changes: 19 additions & 10 deletions src/adapter/cc.rs
@@ -11,6 +11,7 @@
//! ```text
//! claude --print --input-format=stream-json --output-format=stream-json \
//! --include-hook-events --verbose --dangerously-skip-permissions \
//! --disallowedTools Agent \
//! --append-system-prompt-file=<priors> [--model=<model>]
//! ```
//!
@@ -52,6 +53,7 @@ use crate::turn::{TurnId, LEGACY_TURN_ID};
/// event outbound queue. Sized for typical interactive usage; can be
/// revisited if back-pressure becomes a real problem.
const CHANNEL_CAPACITY: usize = 64;
const NATIVE_DELEGATION_TOOL: &str = "Agent";

/// Adapter that drives the Claude Code CLI.
#[derive(Debug, Clone)]
@@ -105,16 +107,9 @@ impl EngineAdapter for CcAdapter {

let mut tempfiles = Vec::new();
let mut cmd = Command::new(&self.claude_path);
-cmd.arg("--print")
-.arg("--input-format=stream-json")
-.arg("--output-format=stream-json")
-.arg("--include-hook-events")
-.arg("--verbose")
-.arg("--dangerously-skip-permissions")
-.arg(format!(
-"--append-system-prompt-file={}",
-config.priors_path.display()
-));
+for arg in base_claude_args(&config.priors_path) {
+cmd.arg(arg);
+}
// Per amendment A-006: if the REPL handed us a session id
// saved by a previous `cr start`, ask cc to resume that
// conversation instead of opening a fresh one. cc tracks
@@ -235,6 +230,20 @@ impl EngineAdapter for CcAdapter {
}
}

fn base_claude_args(priors_path: &Path) -> Vec<String> {
vec![
"--print".to_owned(),
"--input-format=stream-json".to_owned(),
"--output-format=stream-json".to_owned(),
"--include-hook-events".to_owned(),
"--verbose".to_owned(),
"--dangerously-skip-permissions".to_owned(),
"--disallowedTools".to_owned(),
NATIVE_DELEGATION_TOOL.to_owned(),
format!("--append-system-prompt-file={}", priors_path.display()),
]
}

#[derive(Debug, Clone)]
struct ActiveTurn {
turn_id: TurnId,
19 changes: 19 additions & 0 deletions src/adapter/cc/tests.rs
@@ -3,6 +3,7 @@ use crate::turn::TurnId;
use pretty_assertions::assert_eq;
use serde_json::json;
use std::collections::HashSet;
use std::path::Path;

#[test]
fn fingerprint_is_stable_for_same_input() {
@@ -17,6 +18,24 @@ fn fingerprint_changes_with_content() {
assert_ne!(fingerprint("a"), fingerprint("b"));
}

#[test]
fn base_claude_args_disallow_native_agent_delegation() {
let args = base_claude_args(Path::new("/tmp/priors.md"));
let disallow_idx = args
.iter()
.position(|arg| arg == "--disallowedTools")
.expect("claude args should include a native delegation guard");
assert_eq!(
args.get(disallow_idx + 1).map(String::as_str),
Some("Agent")
);
assert!(
args.iter()
.any(|arg| arg == "--append-system-prompt-file=/tmp/priors.md"),
"priors must still be appended: {args:?}"
);
}

#[test]
fn parse_mentions_picks_up_simple_names() {
let text = "Will check with @security and @frontend.";