Crucible

What survives the test, ships — evidence-gated execution for Claude Code.

Live site: crucible.withagents.dev
Field journal entry: (coming)


A Claude Code plugin that turns task execution into a scientific procedure. Every change-producing run emits a reproducible evidence package, and completion is forbidden unless every Mandatory Success Criterion (MSC) is backed by an inspectable artifact and a quorum of independent Oracles approves.

In one sentence: Crucible is the gate between "I did the work" and "the work is done."
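A minimal sketch of that completion rule: every Mandatory Success Criterion (MSC) backed by an existing artifact, plus an Oracle quorum. It assumes each MSC maps to a list of plain file paths under a project root, and that each verdict records its author, an APPROVE/BLOCK decision, and a citation. The `Verdict` type, the `gate` function, and the 2-of-3 quorum default are hypothetical (the README does not state the threshold). This is not the logic of Crucible's actual `gate.py`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Verdict:
    author: str     # agent that issued the verdict
    decision: str   # "APPROVE" or "BLOCK"
    citation: str   # evidence path the verdict points at ("" if none)


def gate(msc_evidence: dict[str, list[str]], verdicts: list[Verdict],
         producer: str, root: Path, quorum: int = 2) -> dict:
    """COMPLETE only if every MSC cites existing artifacts, no cited BLOCK
    exists, and at least `quorum` distinct independent agents APPROVE."""
    # Every MSC needs at least one cited artifact, and all of them must exist.
    failed = [msc for msc, paths in msc_evidence.items()
              if not paths or not all((root / p).exists() for p in paths)]
    # No self-review: the producing agent's own verdicts are discarded.
    independent = [v for v in verdicts if v.author != producer]
    # Cite or refuse: an uncited verdict does not count either way.
    cited = [v for v in independent if v.citation]
    approvals = {v.author for v in cited if v.decision == "APPROVE"}
    blocked = any(v.decision == "BLOCK" for v in cited)
    complete = not failed and not blocked and len(approvals) >= quorum
    return {
        "overall": "COMPLETE" if complete else "REFUSED",
        "failed_mscs": failed,
        "approvals": sorted(approvals),
    }
```

The sketch treats any cited BLOCK as a veto, matching the README's statement that Crucible refuses when oracles BLOCK; a self-approval from the producer never counts toward the quorum.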

What's in the box

Component	Count
Slash commands (`/crucible:*`)	19 — three tiers: orchestration, authoring, inspection
Skills	12 — codebase-analysis, docs-research, planning, skill-enrichment (NEW v0.4), validation, evidence-indexing, session-log-audit, oracle-review, completion-gate, enable, disable, setup
Subagents	11 — planner, codebase-analyst, docs-researcher, skill-discoverer (NEW v0.4), validator, 3 reviewers, 3 oracles
Hooks	4 — SessionStart, PreToolUse, PostToolUse, Stop
Bin scripts	4 — hook handlers (read JSON stdin, exit 2 to block)
Setup scripts	2 — CLAUDE.md installer + progress tracker
Skill scripts	3 — `gate.py` (completion gate), `build_index.py` (evidence indexer), `discover_skills.py` (NEW v0.4: skill-enrichment)
Rule templates	4 — Iron-Rule, Cite-or-Refuse, Cite-Paths, No-Self-Review

Iron-Rule violations: 0. Crucible was itself built under its own discipline — the build evidence package lives in ../evidence/ of this repo.

Why it exists

LLM-driven engineering systems routinely declare success without proof. They claim a feature works because the code looks right; they claim a refactor is safe because it compiles; they emit "Done!" while leaving silent test failures and missing migrations behind. This isn't adversarial — it's the default failure mode of a context-bounded system trained to produce coherent text. Coherent text is not evidence.

Crucible removes the option to fake completion. Three moves at the plugin layer:

Hooks watch every tool use. PreToolUse rejects writes to test files, mocks, stubs, fixtures. Stop refuses session end unless evidence/completion-gate/report.json shows overall=COMPLETE.
Verdicts cite paths or are invalid. Every PASS / FAIL / APPROVE / BLOCK must point to a specific file (and ideally line range). Prose isn't a citation.
Independence is structural, not advisory. The agent that produced an artifact may not also approve it. Three reviewers in isolation. Three Oracles in isolation. The synthesizer aggregates raw verdicts; it never rewrites them.
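A minimal sketch of the two hooks in the first move, written against Claude Code's hook contract: the event arrives as JSON on stdin with fields such as `hook_event_name`, `tool_name`, and `tool_input`, and exit code 2 blocks the action and returns stderr to Claude. The forbidden-path patterns, the function names, and the assumption that `evidence/` sits at the project root are illustrative only; this is not the code in Crucible's `bin/` scripts.

```python
import json
import os
import re
import sys
from pathlib import Path

# Hypothetical RL-1 patterns; Crucible's real list may differ.
FORBIDDEN = re.compile(
    r"(^|/)(tests?|__mocks__|mocks?|stubs?|fixtures?)(/|$)"  # directories
    r"|(^|/)test_[^/]*$"                                     # test_app.py
    r"|_test\.\w+$"                                          # app_test.go
    r"|\.(test|spec)\.\w+$",                                 # app.test.ts
    re.IGNORECASE,
)
WRITE_TOOLS = {"Write", "Edit", "MultiEdit"}


def pre_tool_use(event: dict, project_dir: str = "") -> tuple[int, str]:
    """Block writes that would create test scaffolding (exit code 2)."""
    if event.get("tool_name") not in WRITE_TOOLS:
        return 0, ""
    path = event.get("tool_input", {}).get("file_path", "")
    prefix = project_dir.rstrip("/") + "/"
    if project_dir and path.startswith(prefix):
        path = path[len(prefix):]  # match against the project-relative path
    if FORBIDDEN.search(path):
        return 2, f"RL-1 Iron Rule: write to test/mock/stub/fixture path blocked: {path}"
    return 0, ""


def stop(project_dir: str) -> tuple[int, str]:
    """Refuse to end the session unless the gate report says COMPLETE."""
    report = Path(project_dir) / "evidence" / "completion-gate" / "report.json"
    try:
        data = json.loads(report.read_text())
    except (OSError, json.JSONDecodeError):
        data = {}
    overall = data.get("overall") if isinstance(data, dict) else None
    if overall == "COMPLETE":
        return 0, ""
    return 2, f"Completion gate not passed (overall={overall}); see {report}"


def main() -> None:
    """Hook entry point: read the JSON event from stdin, exit 2 to block."""
    event = json.load(sys.stdin)
    project_dir = os.environ.get("CLAUDE_PROJECT_DIR", event.get("cwd", "."))
    if event.get("hook_event_name") == "Stop":
        code, message = stop(project_dir)
    else:
        code, message = pre_tool_use(event, project_dir)
    if message:
        print(message, file=sys.stderr)  # on exit 2, stderr is shown to Claude
    sys.exit(code)
```

`main()` is left uncalled so the functions can be imported and exercised directly; a real hook script would call it at module level. Path matching is deliberately blunt: it judges where a file lives, not what it contains.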

When Crucible says COMPLETE, an outside reviewer with only evidence/ can independently verify. When it refuses, the refusal is structured, machine-readable, and remediable.
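One way an outside reviewer could check an `evidence/` directory without trusting the agent is a content-hash manifest. The sketch below assumes a flat mapping from relative paths to SHA-256 digests; `build_index`, `verify`, and the manifest shape are hypothetical and not necessarily what Crucible's `build_index.py` emits.

```python
import hashlib
from pathlib import Path


def build_index(evidence: Path) -> dict[str, str]:
    """Map every file under evidence/ (relative POSIX path) to its SHA-256."""
    return {
        p.relative_to(evidence).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(evidence.rglob("*"))
        if p.is_file()
    }


def verify(evidence: Path, index: dict[str, str]) -> list[str]:
    """Return paths that are missing, changed, or unindexed; empty means verified."""
    current = build_index(evidence)
    problems = [p for p, digest in index.items() if current.get(p) != digest]
    problems += [p for p in current if p not in index]
    return sorted(problems)
```

In practice the manifest would be stored outside the tree it describes (or excluded from the walk) so that it does not index itself.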

Quick start

# 1. install (once per machine)
claude plugin marketplace add krzemienski/crucible
claude plugin install crucible@crucible-local

# 2. set up (once per project; /crucible:* commands run inside a Claude Code session)
cd my-project
/crucible:setup --local

# 3. work
/crucible:forge "Add /healthz endpoint that returns {status:ok}"

If /crucible:forge refuses:

/crucible:remediate          # auto-generates delta plan from REFUSAL.md
/crucible:forge              # retry

Or use /crucible:autopilot <task> to run forge → remediate → forge automatically, for up to 3 attempts.
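The autopilot loop can be sketched as a bounded retry. This assumes `forge` returns a gate report with an `overall` field and `remediate` turns a refusal into a delta plan; both callables, their signatures, and the report shape are hypothetical stand-ins for the slash commands, not their real interfaces.

```python
from typing import Callable, Optional

Report = dict  # e.g. {"overall": "COMPLETE" or "REFUSED", "failed_mscs": [...]}


def autopilot(task: str,
              forge: Callable[[str, Optional[dict]], Report],
              remediate: Callable[[Report], dict],
              max_attempts: int = 3) -> tuple[Report, int]:
    """Forge; on refusal, remediate and retry, for at most max_attempts forges."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    plan: Optional[dict] = None
    for attempt in range(1, max_attempts + 1):
        report = forge(task, plan)
        if report["overall"] == "COMPLETE" or attempt == max_attempts:
            return report, attempt  # last report is still REFUSED if we ran out
        plan = remediate(report)  # delta plan aimed at the failing criteria only
```

Remediation is skipped after the final attempt, since there is no further forge to consume its plan.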

If you're stuck and need out:

/crucible:disable             # clean opt-out
touch .crucible/disabled      # nuclear opt-out
CRUCIBLE_DISABLE=1 claude     # one-shell escape

Documentation

Doc	When to read it
`docs/OVERVIEW.md`	Architecture, philosophy, evidence model, gate sequence, quorum mechanics, refusal protocol — the conceptual reference
`docs/USAGE.md`	Per-command reference (all 19), per-skill reference (all 12), per-subagent reference (all 11), three worked walkthroughs, refusal recovery playbook, FAQ
`docs/CRUCIBLE-CLAUDE-MD.md`	The canonical CLAUDE.md fragment that `/crucible:setup` installs
`INSTALL.md`	Three install paths, prerequisites, troubleshooting, activation lifecycle
`CHANGELOG.md`	Release history (v0.1.0 → v0.4.0)

For "what does X actually do?" questions, run:

/crucible:explain forge          # DAG of any pipeline
/crucible:doctor                 # 9-check installation health
/crucible:status                 # current gate state

Command tiers (at a glance)

Tier 1 — Orchestration (the conductors)

/crucible:forge · /crucible:autopilot · /crucible:remediate · /crucible:resume · /crucible:trial

/crucible:forge is the 80% case: codebase-analysis → docs-research → planning → oracle plan-review → execute → validation → evidence-indexing → 3-reviewer consensus → 3-oracle quorum → completion-gate.
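The forge sequence above is linear, so it can be sketched as an ordered list of stages that halts at the first failure. The stage names come from the sequence; modelling each stage as a callable returning True or False, and assuming any failing stage ends the run with a refusal, are simplifications for illustration, not how the command is implemented.

```python
from typing import Callable

STAGES = [
    "codebase-analysis", "docs-research", "planning", "oracle plan-review",
    "execute", "validation", "evidence-indexing",
    "3-reviewer consensus", "3-oracle quorum", "completion-gate",
]


def run_forge(steps: dict[str, Callable[[], bool]]) -> tuple[str, list[str]]:
    """Run stages in order and stop at the first one that fails."""
    done: list[str] = []
    for name in STAGES:
        if not steps[name]():
            return f"REFUSED at {name}", done
        done.append(name)
    return "COMPLETE", done
```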

Tier 2 — Authoring (extend Crucible itself)

/crucible:setup · /crucible:stack-new · /crucible:skill-new · /crucible:agent-new · /crucible:rule-new · /crucible:hook-new · /crucible:command-new

Tier 3 — Inspection (read-only)

/crucible:doctor · /crucible:status · /crucible:explain · /crucible:fix · /crucible:graph

Tier 0 — Activation primitives (composed by Tier 1)

/crucible:enable · /crucible:disable · /crucible:planning · /crucible:validation · /crucible:codebase-analysis · /crucible:documentation-research · /crucible:evidence-indexing · /crucible:session-log-audit · /crucible:oracle-review · /crucible:completion-gate · /crucible:plan-and-execute · /crucible:validate · /crucible:audit

Full reference — including allowed flags, refusal modes, and worked examples — lives in docs/USAGE.md.

The four iron rules

These are installed into your project's CLAUDE.md by /crucible:setup, between a pair of begin/end markers (idempotent, with backup).

RL-1 — Iron Rule (no mocks). Validation runs against real systems only. Forbidden: mocks, stubs, fakes, fixtures, test files, test frameworks, hand-written "expected" output presented as actual output.
RL-2 — Cite or Refuse. Every verdict cites a specific evidence file path. Prose isn't a citation.
RL-3 — No Self-Review. The agent that produced an artifact may not also review or approve it. Independence is structural.
RL-4 — Cite Paths. Citations must be maximally specific: file:lineN-M ideal; bare directory only if the whole dir IS the artifact; subtree paths invalid.
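RL-2 and RL-4 are mechanical enough to check in code. A minimal sketch, assuming citations are written as `path`, `path:N`, or `path:N-M` (optionally `path:lineN-M`) relative to the project root; the syntax, the grade names, and `grade_citation` are hypothetical. It ranks specificity and rejects dangling references, but does not implement the subtree-path rule.

```python
import re
from pathlib import Path

# Assumed citation syntax: "path", "path:N", "path:N-M", or "path:lineN-M".
CITATION = re.compile(r"^(?P<path>[^:]+?)(?::(?:line)?(?P<start>\d+)(?:-(?P<end>\d+))?)?$")


def grade_citation(citation: str, root: Path) -> str:
    """Return 'line-range', 'file', 'directory', or 'invalid' for one citation."""
    m = CITATION.match(citation.strip())
    if not m:
        return "invalid"
    target = root / m["path"]
    if m["start"]:
        start = int(m["start"])
        end = int(m["end"] or start)
        if not target.is_file() or not 1 <= start <= end:
            return "invalid"
        n_lines = len(target.read_text().splitlines())
        return "line-range" if end <= n_lines else "invalid"
    if target.is_file():
        return "file"
    if target.is_dir():
        return "directory"  # acceptable only when the whole dir is the artifact
    return "invalid"  # prose or a path that does not exist
```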

The canonical fragment with full text: docs/CRUCIBLE-CLAUDE-MD.md.

Activation is opt-in

A user-scope install does not enforce in every project. Hooks are silent no-ops unless ${CLAUDE_PROJECT_DIR}/.crucible/active exists. This was a deliberate v0.1.1 fix after the original design broke unrelated workflows.

Three escape hatches if you ever need out:

Method	Scope
`/crucible:disable`	This project (clean opt-out)
`touch .crucible/disabled`	This project (overrides active)
`CRUCIBLE_DISABLE=1 claude`	One shell session
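The activation rules reduce to a small precedence check that each hook could run before doing anything else. A minimal sketch, assuming the environment variable must equal `1` exactly and that the hook already knows the project directory; `crucible_enforcing` is a hypothetical name, not a function in Crucible's scripts.

```python
import os
from pathlib import Path
from typing import Mapping


def crucible_enforcing(project_dir: str, env: Mapping[str, str] = os.environ) -> bool:
    """True only when a hook should enforce; every other case is a silent no-op."""
    if env.get("CRUCIBLE_DISABLE") == "1":
        return False  # one-shell escape hatch
    marker_dir = Path(project_dir) / ".crucible"
    if (marker_dir / "disabled").exists():
        return False  # overrides .crucible/active
    return (marker_dir / "active").exists()  # opt-in: no marker, no enforcement
```

A hook would call this first and exit 0 immediately whenever it returns False.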

The Stop-hook refusal message lists the escape hatches inline so you never have to remember.

Full lifecycle: INSTALL.md.

Refusal is a feature

When evidence is missing or oracles BLOCK, Crucible writes a structured REFUSAL.md and stops. There is no override flag. There is no force-complete. A refusal is not a bug — it's the system functioning correctly.

The refusal lists exactly which MSCs failed, with cited evidence paths and a machine-readable delta plan. Run /crucible:remediate and Crucible auto-generates a focused fix plan that targets only the failing criteria.
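Targeting only the failing criteria amounts to filtering per-criterion results. A sketch, assuming each result is a dict with `msc`, `status`, `evidence`, and `reason` keys; those field names, `delta_plan`, and the JSON layout are hypothetical, not the actual REFUSAL.md schema.

```python
import json


def delta_plan(results: list[dict]) -> str:
    """Build a JSON delta plan that targets only the failing criteria."""
    failing = [r for r in results if r["status"] != "PASS"]
    plan = {
        "targets": [
            {"msc": r["msc"], "cited": r.get("evidence", []), "reason": r.get("reason", "")}
            for r in failing
        ],
        # Passing criteria are listed so remediation knows not to touch them.
        "untouched": sorted(r["msc"] for r in results if r["status"] == "PASS"),
    }
    return json.dumps(plan, indent=2)
```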

If autopilot still exits REFUSED after reaching --max-attempts, the remaining cited gaps are real defects in the underlying system, not transient agent failures. Take the refusal to the team that owns the system.

Recovery playbook: docs/USAGE.md#8-refusal-recovery-playbook.

Status

Version	Date	Highlights
0.3.0	2026-04-27	Comprehensive docs (`docs/OVERVIEW.md`, `docs/USAGE.md`); README rewritten as top-of-funnel; setup-mechanism shipped (`scripts/`, `skills/setup/`, `templates/rules/`) — this release
0.2.1	2026-04-26	Documented 14 commands shipped silently in 0.2.0
0.2.0	2026-04-25	PRD gap remediation (16/21 closed); 5 new top-level commands; 4 declarative rules
0.1.1	2026-04-25	Critical opt-in fix: hooks were enforcing globally and breaking unrelated projects
0.1.0	2026-04-25	Initial release

Full history: CHANGELOG.md.

License

Build provenance

Crucible was itself built under its own discipline. Sixteen verification gates (VG-0 through VG-15) plus reviewer-consensus + oracle-quorum gated its delivery. The full build evidence package — including 26 cited upstream sources, dual-path install receipts, four robust trials, session-log audits with line citations, three independent reviewer reports, and three Oracle audit reports — lives at ../evidence/ (in this repo, alongside the plugin).

This is the longest possible answer to "does it work?" — the system you're about to install was held to the same standard it imposes on yours.
