diff --git a/.conductor/settings.toml b/.conductor/settings.toml
new file mode 100644
index 0000000..97e64ae
--- /dev/null
+++ b/.conductor/settings.toml
@@ -0,0 +1,5 @@
+"$schema" = "https://conductor.build/schemas/settings.repo.schema.json"
+
+[scripts]
+run_mode = "concurrent"
+setup = "bash .conductor/setup.sh"
diff --git a/.conductor/setup.sh b/.conductor/setup.sh
new file mode 100755
index 0000000..c273746
--- /dev/null
+++ b/.conductor/setup.sh
@@ -0,0 +1,22 @@
+#!/usr/bin/env bash
+# Conductor workspace bootstrap for builder-guild.
+# Starts the shared local Neo4j instance, then creates a per-worktree venv and installs deps.
+# Environment-ready, NOT demo-seeded: does not run etl.py / demo_seed.py — seed explicitly when needed.
+set -euo pipefail
+cd "$(dirname "$0")/.."
+
+# Shared Neo4j for local development. The compose file uses a fixed container name (cb-neo4j) and
+# fixed host ports (7474/7687), so Conductor worktrees share one local DB instance. -p is for
+# consistency, not isolation. --wait blocks until the compose healthcheck reports healthy.
+(
+  cd 01-context
+  docker compose -p builder-guild up -d --wait
+)
+
+# Per-worktree venv + neo4j driver + write/read smoke test against localhost:7687.
+bash 01-context/setup_a2.sh
+
+# Project dependencies for this worktree's venv.
+./01-context/.venv/bin/python -m pip install -q -r requirements.txt -r requirements-dev.txt
+
+echo "CONDUCTOR_SETUP_DONE"
diff --git a/.gitignore b/.gitignore
index f4a3290..9b77904 100644
--- a/.gitignore
+++ b/.gitignore
@@ -55,3 +55,8 @@ audits/
 # Beads issue tracker — local project management; issues.jsonl references git-ignored audits/ and
 # internal decisions, so keep the whole tracker out of the public repo.
 .beads/
+
+# Conductor: monorepo harness is symlinked into worktrees at setup, never committed (public repo).
+# Pattern has NO trailing slash so it also ignores a SYMLINK named .claude (git treats symlinks as files).
+.claude
+.conductor/settings.local.toml
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..7caac2d
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,69 @@
+# Builder Guild — Project Instructions
+
+## What this repo is
+Builder Guild is a graph-primary, bi-temporal, role-scoped knowledge base for AI agents, with a calibrated evaluation layer. The hard product boundary is:
+- `01-context/` = online context + retrieval + enforcement
+- `02-agents/` = agent consumer layer / trust boundary
+- `03-evals/` = offline evaluation and calibration
+
+Do not blur those layers. Online enforcement lives in `01-context`. Offline evaluation lives in `03-evals` and must not become a live decision signal.
+
+## Stage (do not overstate — see README status)
+- `01-context` runs end-to-end on the synthetic demo graph, with CI gating an import smoke plus the ingest/write engine and invariant sweeps on a real Neo4j.
+- `03-evals` has run its first real calibration, which correctly **refused** to certify autonomy.
+- `02-agents` is the design + contract + a demo consumer; fleet orchestration is **roadmap**.
+
+Describe no layer as more finished than that.
+
+## Core invariants
+- Never let an LLM write a fact.
+- Namespace on node and edge.
+- Bi-temporal by default.
+- No secrets in commits.
+- Current truth, role-scoped truth, and temporal truth are separate concepts. Do not collapse them.
+- Uncalibrated roles are suggest-only. Do not imply autonomy is leased unless `CALIBRATED[role]` is actually true in code.
+
+## Repo-specific rules
+- Facts enter through deterministic writes only (`MERGE` / `MATCH ... SET` style ETL).
+- LLM-generated content belongs only in the recall layer, not in entity properties or factual edges.
+- Any new node type or relation must carry `namespace` and preserve read isolation.
+- Functional edges supersede old truth; additive edges accumulate. Check `01-context/schema/relations.yaml` before changing write semantics.
+- If you touch retrieval, preserve honest role separation:
+  - keyword = exact IDs / names
+  - graph = structural truth
+  - vector = fuzzy recall
+- If you touch serving, preserve the abstain / execution boundary. Passing the gate is not enough when the role is uncalibrated.
+
+## What to read first by task
+- For overall product and status: `README.md`
+- For invariants and contribution constraints: `CONTRIBUTING.md`
+- For near/mid/later priorities: `docs/ROADMAP.md`
+- For schema and write semantics: `01-context/ONTOLOGY_SCHEMA.md`, `01-context/schema/relations.yaml`
+- For retrieval design: `01-context/HYBRID_RETRIEVAL_ARCHITECTURE.md`, `01-context/RETRIEVAL.md`, `01-context/PAGEINDEX_PILOT.md`
+- For agent trust boundary: `02-agents/AGENT_ARCHITECTURE.md`
+- For calibration status and why autonomy is still blocked: `03-evals/CASE_STUDY_calibration.md`, `03-evals/CONTEXT_EVALS.md`
+
+## Engineering expectations
+- Keep changes focused and explain why.
+- If you touch a write path, verify determinism and idempotency.
+- If you touch isolation, verify subject + edge + object scoping, not just one surface.
+- If you touch temporal behavior, verify as-of semantics explicitly.
+- If you touch retrieval, preserve namespace safety and disclose fallback behavior honestly.
+- If you touch calibration or autonomy, separate mechanism from grant. Code may revoke autonomy; it should not silently grant it.
+
+## Things to avoid
+- Do not treat roadmap items as shipped.
+- Do not call GraphRAG / Corrective RAG the default live path unless the code path actually proves it.
+- Do not add provenance / explainability systems to `01-context` casually; current priority is trust/calibration and temporal evidence.
+- Do not move logic from `03-evals` into `01-context` just because it improves scores offline.
+- Do not weaken namespace isolation for convenience.
+
+## Testing / verification mindset
+Before claiming a change is done:
+- run the narrowest relevant proof
+- verify the right layer changed
+- confirm no invariant was broken
+- if behavior is only documented, say that; if it was executed, say what command proved it
+
+## Keep this repo publishable
+Public and AGPL-licensed. No secrets (env vars only — see `.env.example`), no personal or operator notes, no cross-project context. Durable project guidance only.
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
index b27f8ea..732fb1b 100644
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -13,10 +13,14 @@ RAG-pattern coverage (see `01-context/RETRIEVAL.md` + `HYBRID_RETRIEVAL_ARCHITEC
 - **Multimodal RAG** — ✗ NOT built (text-only) → gap **G2** below.
 
 **Gate state: suggest-only.** `01-context/src/abstain.py` has `CALIBRATED=False` with provisional
-weights, so every decision routes to a human — autonomy is not leased. The last calibration run
-(the private spine, 2026-06-12 — *last-measured, not re-run since*) reported judge κ=1.0 but a
-FAILED coverage gate (sufficiency refit weight positive yet selective gain −3.0pp), so the gate
-correctly refuses to certify → trust track **G3** below.
+weights, so every decision routes to a human — autonomy is not leased. The calibration run recorded
+in `03-evals/CASE_STUDY_calibration.md` (N=10 human-validated golden, 6-namespace graph, external $0
+judge) measured a **+10.0pp selective gain** over confidence-alone (stable across three fits) yet
+**refused to certify** — because the sufficiency proxy refit with a **negative weight** (anti-correlated,
+−3.167 → −4.089: the selective gain came from confidence alone), the serving layer was only 40% correct
+on its own golden before the fixes (80% after), and **judge κ was unmeasurable** (8/10 items scored
+deterministically — N=10 is a smoke test). The gate correctly refuses on a broken sufficiency signal →
+trust track **G3** below.
 
 ## Near