5 changes: 5 additions & 0 deletions .conductor/settings.toml
@@ -0,0 +1,5 @@
"$schema" = "https://conductor.build/schemas/settings.repo.schema.json"

[scripts]
run_mode = "concurrent"
setup = "bash .conductor/setup.sh"
22 changes: 22 additions & 0 deletions .conductor/setup.sh
@@ -0,0 +1,22 @@
#!/usr/bin/env bash
# Conductor workspace bootstrap for builder-guild.
# Starts the shared local Neo4j instance, then creates a per-worktree venv and installs deps.
# Environment-ready, NOT demo-seeded: does not run etl.py / demo_seed.py — seed explicitly when needed.
set -euo pipefail
cd "$(dirname "$0")/.."

# Shared Neo4j for local development. The compose file uses a fixed container name (cb-neo4j) and
# fixed host ports (7474/7687), so Conductor worktrees share one local DB instance. -p is for
# consistency, not isolation. --wait blocks until the compose healthcheck reports healthy.
(
cd 01-context
docker compose -p builder-guild up -d --wait
)

# Per-worktree venv + neo4j driver + write/read smoke test against localhost:7687.
bash 01-context/setup_a2.sh

# Project dependencies for this worktree's venv.
./01-context/.venv/bin/python -m pip install -q -r requirements.txt -r requirements-dev.txt

echo "CONDUCTOR_SETUP_DONE"
5 changes: 5 additions & 0 deletions .gitignore
@@ -55,3 +55,8 @@ audits/
# Beads issue tracker — local project management; issues.jsonl references git-ignored audits/ and
# internal decisions, so keep the whole tracker out of the public repo.
.beads/
+
+# Conductor: monorepo harness is symlinked into worktrees at setup, never committed (public repo).
+# Pattern has NO trailing slash so it also ignores a SYMLINK named .claude (git treats symlinks as files).
+.claude
+.conductor/settings.local.toml
69 changes: 69 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,69 @@
# Builder Guild — Project Instructions

## What this repo is
Builder Guild is a graph-primary, bi-temporal, role-scoped knowledge base for AI agents, with a calibrated evaluation layer. The hard product boundary is:
- `01-context/` = online context + retrieval + enforcement
- `02-agents/` = agent consumer layer / trust boundary
- `03-evals/` = offline evaluation and calibration

Do not blur those layers. Online enforcement lives in `01-context`. Offline evaluation lives in `03-evals` and must not become a live decision signal.

## Stage (do not overstate — see README status)
- `01-context` runs end-to-end on the synthetic demo graph, with CI gating an import smoke plus the ingest/write engine and invariant sweeps on a real Neo4j.
- `03-evals` has run its first real calibration, which correctly **refused** to certify autonomy.
- `02-agents` is the design + contract + a demo consumer; fleet orchestration is **roadmap**.

Describe no layer as more finished than that.

## Core invariants
- Never let an LLM write a fact.
- Namespace on node and edge.
- Bi-temporal by default.
- No secrets in commits.
- Current truth, role-scoped truth, and temporal truth are separate concepts. Do not collapse them.
- Uncalibrated roles are suggest-only. Do not imply autonomy is leased unless `CALIBRATED[role]` is actually true in code.

## Repo-specific rules
- Facts enter through deterministic writes only (`MERGE` / `MATCH ... SET` style ETL).
- LLM-generated content belongs only in the recall layer, not in entity properties or factual edges.
- Any new node type or relation must carry `namespace` and preserve read isolation.
- Functional edges supersede old truth; additive edges accumulate. Check `01-context/schema/relations.yaml` before changing write semantics.
- If you touch retrieval, preserve honest role separation:
  - keyword = exact IDs / names
  - graph = structural truth
  - vector = fuzzy recall
- If you touch serving, preserve the abstain / execution boundary. Passing the gate is not enough when the role is uncalibrated.

## What to read first by task
- For overall product and status: `README.md`
- For invariants and contribution constraints: `CONTRIBUTING.md`
- For near/mid/later priorities: `docs/ROADMAP.md`
- For schema and write semantics: `01-context/ONTOLOGY_SCHEMA.md`, `01-context/schema/relations.yaml`
- For retrieval design: `01-context/HYBRID_RETRIEVAL_ARCHITECTURE.md`, `01-context/RETRIEVAL.md`, `01-context/PAGEINDEX_PILOT.md`
- For agent trust boundary: `02-agents/AGENT_ARCHITECTURE.md`
- For calibration status and why autonomy is still blocked: `03-evals/CASE_STUDY_calibration.md`, `03-evals/CONTEXT_EVALS.md`

## Engineering expectations
- Keep changes focused and explain why.
- If you touch a write path, verify determinism and idempotency.
- If you touch isolation, verify subject + edge + object scoping, not just one surface.
- If you touch temporal behavior, verify as-of semantics explicitly.
- If you touch retrieval, preserve namespace safety and disclose fallback behavior honestly.
- If you touch calibration or autonomy, separate mechanism from grant. Code may revoke autonomy; it should not silently grant it.

## Things to avoid
- Do not treat roadmap items as shipped.
- Do not call GraphRAG / Corrective RAG the default live path unless the code path actually proves it.
- Do not add provenance / explainability systems to `01-context` casually; current priority is trust/calibration and temporal evidence.
- Do not move logic from `03-evals` into `01-context` just because it improves scores offline.
- Do not weaken namespace isolation for convenience.

## Testing / verification mindset
Before claiming a change is done:
- run the narrowest relevant proof
- verify the right layer changed
- confirm no invariant was broken
- if behavior is only documented, say that; if it was executed, say what command proved it

## Keep this repo publishable
Public and AGPL-licensed. No secrets (env vars only — see `.env.example`), no personal or operator notes, no cross-project context. Durable project guidance only.
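The write-semantics rules in CLAUDE.md (functional edges supersede, additive edges accumulate, namespace on node and edge, subject + edge + object scoping, explicit as-of reads) can be illustrated with a minimal in-memory sketch. It is not the project's Neo4j ETL: the relation names `OWNS` and `TAGGED`, the `Graph` and `Edge` classes, and the `write`/`read` helpers are all hypothetical. Only valid time is modelled; the transaction-time axis of a full bi-temporal model is omitted for brevity.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical relation classes; the real registry is 01-context/schema/relations.yaml.
FUNCTIONAL = {"OWNS"}    # one current object per subject: a new write supersedes the old edge
ADDITIVE = {"TAGGED"}    # edges accumulate side by side


@dataclass
class Edge:
    subject: str
    rel: str
    obj: str
    namespace: str                   # every edge carries its own namespace
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current" (valid time only)


@dataclass
class Graph:
    nodes: dict[str, str]            # node id -> namespace (every node carries one too)
    edges: list[Edge] = field(default_factory=list)

    def write(self, subject: str, rel: str, obj: str, namespace: str, as_of: date) -> None:
        """Deterministic, idempotent write in the spirit of MERGE / MATCH ... SET."""
        if rel not in FUNCTIONAL | ADDITIVE:
            raise ValueError(f"unknown relation {rel!r}")
        for e in self.edges:
            if (e.subject, e.rel, e.obj, e.namespace) == (subject, rel, obj, namespace) and e.valid_to is None:
                return  # same fact already current: re-running the write changes nothing
        if rel in FUNCTIONAL:
            for e in self.edges:
                if (e.subject, e.rel, e.namespace) == (subject, rel, namespace) and e.valid_to is None:
                    e.valid_to = as_of  # close the old truth instead of deleting it
        self.edges.append(Edge(subject, rel, obj, namespace, as_of))

    def read(self, subject: str, rel: str, namespaces: set[str], as_of: date) -> list[str]:
        """As-of read that scopes subject, edge, AND object to the caller's namespaces."""
        hits = []
        for e in self.edges:
            if (e.subject, e.rel) != (subject, rel):
                continue
            in_scope = (
                self.nodes[e.subject] in namespaces
                and e.namespace in namespaces
                and self.nodes[e.obj] in namespaces
            )
            current_then = e.valid_from <= as_of and (e.valid_to is None or as_of < e.valid_to)
            if in_scope and current_then:
                hits.append(e.obj)
        return sorted(hits)
```

Superseding by closing `valid_to` rather than deleting is what keeps temporal truth separate from current truth: a read as of an earlier date still sees the old owner, and checking all three namespaces means an edge written in an allowed namespace cannot leak an object from a disallowed one.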
12 changes: 8 additions & 4 deletions docs/ROADMAP.md
@@ -13,10 +13,14 @@ RAG-pattern coverage (see `01-context/RETRIEVAL.md` + `HYBRID_RETRIEVAL_ARCHITEC
- **Multimodal RAG** — ✗ NOT built (text-only) → gap **G2** below.

**Gate state: suggest-only.** `01-context/src/abstain.py` has `CALIBRATED=False` with provisional
-weights, so every decision routes to a human — autonomy is not leased. The last calibration run
-(the private spine, 2026-06-12 — *last-measured, not re-run since*) reported judge κ=1.0 but a
-FAILED coverage gate (sufficiency refit weight positive yet selective gain −3.0pp), so the gate
-correctly refuses to certify → trust track **G3** below.
+weights, so every decision routes to a human — autonomy is not leased. The calibration run recorded
+in `03-evals/CASE_STUDY_calibration.md` (N=10 human-validated golden, 6-namespace graph, external $0
+judge) measured a **+10.0pp selective gain** over confidence-alone (stable across three fits) yet
+**refused to certify** — because the sufficiency proxy refit with a **negative weight** (anti-correlated,
+−3.167 → −4.089: the selective gain came from confidence alone), the serving layer was only 40% correct
+on its own golden before the fixes (80% after), and **judge κ was unmeasurable** (8/10 items scored
+deterministically — N=10 is a smoke test). The gate correctly refuses on a broken sufficiency signal →
+trust track **G3** below.

## Near

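The gate state described in this hunk (suggest-only while uncalibrated, and passing the gate is not enough on its own) reduces to a small routing rule. This is a hypothetical sketch: `route`, `Decision`, the action names, and the per-role dict form of `CALIBRATED` are assumptions (the source shows `CALIBRATED=False` in `01-context/src/abstain.py` and refers to `CALIBRATED[role]` in CLAUDE.md), and it says nothing about how the real weights or sufficiency signal are computed.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical per-role flags. Empty by default: no role is calibrated, so nothing executes.
CALIBRATED: dict[str, bool] = {}


@dataclass(frozen=True)
class Decision:
    action: str  # "abstain", "suggest" (route to a human), or "execute"
    reason: str


def route(role: str, passes_gate: bool, calibrated: Mapping[str, bool] = CALIBRATED) -> Decision:
    """Gate first, then the calibration grant; passing the gate alone never executes."""
    if not passes_gate:
        return Decision("abstain", "gate not passed")
    if not calibrated.get(role, False):
        # Missing or False both mean suggest-only: the grant must be explicit.
        return Decision("suggest", f"role {role!r} is uncalibrated")
    return Decision("execute", "gate passed and role calibrated")
```

Treating a missing role as `False` keeps mechanism and grant separate: the routing code can only reach `execute` once a calibration result has explicitly set the flag, matching the rule that code may revoke autonomy but should not silently grant it.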