
docs(roadmap): reconcile gate-state numbers to CASE_STUDY#13

Open
adithya0597 wants to merge 3 commits into main from docs/reconcile-roadmap-calibration

Conversation

@adithya0597

Owner

The ROADMAP "Gate state" paragraph cited private-spine run numbers that contradicted both 03-evals/CASE_STUDY_calibration.md (the public authoritative calibration record) and the ROADMAP's own Near-item-2 / G3 sections (which already said "negative weight"). Reconciled the Status paragraph to the CASE_STUDY.

| Was (private-run) | Now (CASE_STUDY) | Source |
|---|---|---|
| judge κ=1.0 | κ unmeasurable (8/10 deterministic, N=10 smoke test) | CASE_STUDY:29,72 |
| sufficiency weight positive | negative (−3.167 → −4.089, anti-correlated) | CASE_STUDY:25,57 |
| selective gain −3.0pp | +10.0pp, stable across 3 fits | CASE_STUDY:26,56 |
| "(the private spine, 2026-06-12)" | "(recorded in CASE_STUDY_calibration.md)" | |

Also disclosure-positive: removes private-spine figures from a public doc. Corrected story: the gate refused not on accuracy (it gained +10pp) but because the sufficiency signal refit negative — the gain came from confidence alone. Doc-only; no code change.
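
For readers unfamiliar with the "selective gain" figure, here is a minimal sketch of how a gain in percentage points could be computed. It assumes the gain is defined as accuracy on the gate-accepted subset minus accuracy over all items; that definition, the function name `selective_gain_pp`, and the toy data are illustrative assumptions, not taken from CASE_STUDY_calibration.md or the repo's code.

```python
from typing import Sequence


def selective_gain_pp(correct: Sequence[bool], accepted: Sequence[bool]) -> float:
    """Accuracy on accepted items minus accuracy on all items, in percentage points.

    Hypothetical helper: the definition of "selective gain" is assumed here,
    not confirmed by the case study.
    """
    if len(correct) != len(accepted):
        raise ValueError("correct and accepted must have the same length")
    if not correct:
        raise ValueError("need at least one item")

    # Keep only the answers the gate let through.
    kept = [c for c, a in zip(correct, accepted) if a]
    if not kept:
        raise ValueError("gate accepted no items; selective accuracy is undefined")

    baseline = sum(correct) / len(correct)   # accuracy with no gate
    selective = sum(kept) / len(kept)        # accuracy on accepted subset
    return 100.0 * (selective - baseline)


if __name__ == "__main__":
    # Toy data only: 6 of 10 answers correct overall, 4 of 5 accepted answers correct.
    correct = [True, True, True, True, False, True, True, False, False, False]
    accepted = [True, True, True, True, True, False, False, False, False, False]
    print(f"{selective_gain_pp(correct, accepted):+.1f}pp")  # prints "+20.0pp"
```

Under this assumed definition, a positive gain says only that the accepted subset is more accurate than the full set. It does not say which input signal produced the improvement, which is why the refit sign of the sufficiency weight has to be checked separately.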

🤖 Generated with Claude Code

adithya0597 and others added 3 commits June 25, 2026 14:17
…ive run)

The Status "Gate state" paragraph cited private-spine run numbers that
contradicted both 03-evals/CASE_STUDY_calibration.md (the public authoritative
record) and the ROADMAP's own Near-item-2 / G3 sections. Reconciled to the
CASE_STUDY:
  - judge "κ=1.0"            -> κ unmeasurable (8/10 deterministic, N=10 smoke test)
  - sufficiency weight "positive" -> negative (-3.167 -> -4.089, anti-correlated)
  - selective gain "-3.0pp"  -> +10.0pp, stable across three fits
  - "(the private spine, 2026-06-12)" -> "(recorded in CASE_STUDY_calibration.md)"

Also disclosure-positive: removes private-run figures from the public doc. The
corrected story: the gate refused not on accuracy (it gained +10pp) but because
the sufficiency signal itself refit negative (gain came from confidence alone) --
the exact gap 11b's _support_coverage rewrite must close. Doc-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MU8Dnc1C9Q63xd63Gy4QKr
Conductor's setup script symlinks the monorepo ~/.claude harness into
each fresh worktree (gitignored, never committed). Add ignore rules so
the symlink and the personal .conductor/settings.local.toml stay out of
this public repo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ettings

- CLAUDE.md: project instructions (layer boundaries, core invariants, stage)
  so Conductor / Claude Code workspaces inherit repo guidance.
- .conductor/setup.sh: per-worktree bootstrap — shared local Neo4j (--wait on
  the compose healthcheck) + per-worktree venv + deps. Environment-ready, not
  demo-seeded.
- .conductor/settings.toml: run_mode=concurrent + setup hook.

Machine-specific config stays in gitignored .conductor/settings.local.toml.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>