vs-code-foundry — backlog

Last Updated: 2026-06-08 (v0.5.0 dynamic perspective dispatch built on feature/v0.5-dynamic-perspective-dispatch, pending win11 MS1-MS5 validation + tag: roster.py + perspective_policy sibling + proposer/steelman + challenger/analyst cross-vendor recast; #48 + config-driven model selection closed by design)
Prior Update: 2026-06-07 (v0.4.0 env-adaptive cascade shipped: #46/#47 marked done pending win11 M1-M3 validation; filed #48 Tier-2 analyst re-point after gemini CLI retires 2026-06-18)
Prior Update: 2026-06-05 (added #46 first-run env scan + #47 capability-adaptive cascade content — user requirement: prod has no CLIs; VS Code 1.123 research ingested into new .wiki/)
Prior Update: 2026-05-23 (self-review — full-cascade architecture review of foundry → forge → bob → alf → pa (+kit) against stated intent + against sibling internal-rnd. Read-only analysis; no flow files modified. Review doc at docs/reviews/2026-05-23-cascade-architecture-review.md. 13 gaps (V1–V13); 9 filed as tasks #35–#43 (V9→#33, V12→#7 cross-referenced not re-filed; V10/V11/V13 bundled into #43). Defining finding: this fork dropped internal-rnd's entire enforcement engine — NO gates.py/claims.py/dual-verdict/ledger; all persona hard rules are prose, and progress/contract-map.yaml+.sig are HMAC-signed but read/verified by NOTHING (dead code / false guarantee). Trade-off is reasonable for a Copilot fork (more legible — bob 172 lines vs 573; better platform maturity — CI, Windows hardening, MCP conformance) but cascade correctness now rests entirely on prompt discipline + IDE host safety. Top moves: #35 decide contract-map fate (wire or delete), #36 telemetry rollup on existing actions.jsonl, #37 windows-latest CI. Uses THIS repo's additive alf formula (I+E+U−Eff), not internal-rnd's multiplicative one. No code changed.)
Prior Update: 2026-05-22 (added #24-#34 from VS Code 1.121 / Copilot April-2026 release review)

Active Tasks

#	Task	Priority	Status	Notes
1	Validate v0.1.2 on real Windows 11 + VS Code 1.107+	high	open	The M3 capability-shape question gets answered live by VS Code's response to our dual-declaration (`tools.tasks: true` AND top-level `tasks: {}`). Run on a hardened enterprise machine to validate the cmd→ps1→py wrapper chain. F1 fix (sys.executable everywhere) means this should now succeed even on hardened machines with no `python3` alias.
2	Flesh out reference stub: `skills/vs-code-copilot-foundry/references/architecture.md`	medium	open	Currently 4-line placeholder. Should cover: foundry-server internals (job manager, skill registry, resource layer), persona-to-server data flow, sampling vs subprocess delegation trade-off.
3	Flesh out reference stub: `skills/vs-code-copilot-foundry/references/byok-setup.md`	medium	open	Currently placeholder. Should cover: VS Code 1.117+ BYOK setup for Business/Enterprise, plugging Anthropic / OpenAI / Google keys directly, how it integrates with foundry persona `model:` chains.
4	Flesh out reference stub: `skills/vs-code-copilot-foundry/references/custom-agents-spec.md`	medium	open	Currently placeholder. Should cover: `.agent.md` frontmatter schema, handoff buttons, model fallback chains, target field, agents allowlist, runSubagent semantics, depth=5 cap, cost-tier constraint.
5	Flesh out reference stub: `skills/vs-code-copilot-foundry/references/setup-copilot-cli.md`	medium	open	Currently placeholder. Should cover: `copilot --agent foundry -p ...` headless mode, `/agent` slash command, `/fleet` parallel orchestration, model selection in CLI vs picker.
6	Flesh out reference stub: `skills/vs-code-copilot-foundry/references/troubleshooting.md`	medium	open	Currently placeholder. Should consolidate all the "common issues" tables from INSTALL.md and SKILL.md + setup-vscode-chat.md into one canonical troubleshooting reference. Add Zone Identifier section, mcp.json troubleshooting, persona auto-discovery debugging.
7	Write `foundry_design` coordinator script	medium	open	v0.1.x returns a hint asking the caller (`@forge`) to do parallel codex+gemini calls manually. v1.2 should ship `foundry-server/coordinator_design.py` that runs the parallel delegations, aggregates, returns a synthesized design hypothesis to forge.
8	Build `foundry-cli` helper binary	medium	open	Currently config edits happen by hand-editing `~/.vs-code-foundry/config.json`. Ship a small CLI: `foundry-cli enable claude-bridge`, `foundry-cli redeploy --workspace <path>`, `foundry-cli status`, `foundry-cli logs`. Install to `~/.vs-code-foundry/bin/`.
9	Extract foundry-server modules if it grows	low	open	`foundry_server.py` is ~1080 LOC currently. If it grows past ~1300 LOC, split into `foundry_server.py` (protocol), `foundry_jobs.py` (job manager), `foundry_skills.py` (skill registry), `foundry_resources.py` (MCP resources). Tests follow.
10	Optional: thin VS Code extension wrapping foundry-server	low / v2	open	Use `mcpServerDefinitionProviders` to ship foundry-server as part of an extension, eliminating manual `.vscode/mcp.json` editing for users. ~1000-1500 LOC TS, 5-7 dev-days, Marketplace publication. Defer until real-user-demand-signal is clear.
13	Sign the PowerShell scripts with a code-signing certificate	low	open / future	For environments that enforce `AllSigned`. Requires acquiring a code-signing cert; cost + maintenance burden. Defer until a real user requests it.
14	Public-repo decision (currently PRIVATE)	open	future	If/when going public: scrub paths, polish README for public audience, add Marketplace/installer landing pages, decide LICENSE. Currently MIT but not yet committed publicly.
15	Real-fleet user feedback gathering	low	future	Once 5+ real users have installed foundry, gather telemetry on which personas + tools are actually used. Drives the v1.2 polish backlog.
16	Remove v0.1.x backward-compat symlink at `~/.vs-code-foundry/skills/vs-code-copilot-foundry`	low	open / v0.3.x	The v0.2.0 installer creates this symlink so anyone with hand-coded absolute paths or custom MCP clients pointing at the old location keeps working. Removable in v0.3.x once enough time has passed for consumers to migrate. Drop the migration block in `install_skill_family()` and document the removal in CHANGELOG.
17	Schema-validate one SKILL.md from the family against the published agentskills.io schema as part of installer smoke	low	open	Defensive guard against schema drift between foundry's installed skills and MS Copilot's parser expectations. Both use agentskills.io; this would catch any future divergence. ~30 LOC in `installer/install-foundry.py` smoke_test() or a new helper.
18	Real Windows validation of v0.2.0 (Developer Mode + non-admin symlink path)	medium	done (superseded)	Closed 2026-05-21 by the win11-laptop install of v0.3.1+. Confirmed: (a) WinError 1314 warning fires cleanly on non-Developer-Mode + try/except continues correctly; (b) `~/.copilot/skills/` canonical location works regardless; (c) `Path.home()` honors `USERPROFILE`. Three subsequent fixes (v0.3.1 / v0.3.2 / v0.3.3) shipped same day from findings. v0.3.3 full re-install still pending real-machine validation — tracked as #23.
23	Real Windows validation of v0.3.3 full re-install path	medium	open	Re-run `install-foundry.cmd -Yes -PrefixPath %USERPROFILE%\.vs-code-foundry` on win11-laptop after `git pull` + `git checkout v0.3.3`. Confirm: (a) 6 personas land in `~/.copilot/agents/` AND `~/.vs-code-foundry/agents/`; (b) no `DeprecationWarning: datetime.utcnow()`; (c) personal-os smoke leg completes with `status: PASS` (no UnicodeEncodeError on report-print); (d) `Developer: Reload Window` in VS Code → `@` autocomplete shows all 6 personas. Then workspace-install round (`-WorkspacePath C:\dev\vs-code-foundry`) → confirm `.vscode\mcp.json` registers both servers + `@foundry foundry_health` + `@kit kit_health` both respond. File REPLY thread in `remote_claude.md`.
19	Installer auto-creates workspace dir when `--workspace` points at a missing path	low	open	Discovered during WP-14 smoke 2026-05-18: when `--workspace <path>` points at a non-existent dir, `install-foundry.py` prints `ERROR: Workspace path is not a directory` but proceeds anyway. `install_workspace_skills_scaffold()` creates the dir (and a `.github/skills/README.md`), then `install_workspace()` short-circuits because the dir didn't pre-exist. Net result: partial workspace install (just the skills scaffold). Pre-existing v0.2.0 behavior, not v0.3.0 regression. Fix: have `install_workspace()` also `Path.mkdir(parents=True, exist_ok=True)` before writing files, OR have `_validate_workspace_path()` mkdir as part of its check. Either way the misleading ERROR log line should be downgraded to INFO + auto-creation message.
20	Document Claude Code agent name collision in INSTALL.md / AGENTS.md R-rules	low	done (v0.5.1)	Discovered 2026-05-21 on win11-laptop: VS Code Copilot scans both `~/.copilot/agents/` (foundry) and `~/.claude/agents/` (Claude Code) per MS Agent Skills spec → `alf`/`bob`/`pa` name dupes in Copilot Chat's `@` autocomplete. Closed in v0.5.1 (`feature/v0.5.1-installer-agent-reconcile`): (a) INSTALL.md "Agent scopes & dedup" + "Known interactions (#20)" sections shipped; (b) AGENTS.md R2 carve-out wording (the installer may do a metadata `exists()` only on `~/.claude/agents/{bob,alf,pa}.md` for the warning — no content read/write/delete); (c) the installer now DETECTS the collision at install time and prints a metadata-only warning (`warn_claude_collision()`), and the unrelated foundry-internal `2× bob` duplication is fixed by the reconcile engine + `--agent-scope` default global. NOTE: the `~/.claude/` name collision itself is a cross-tool fact that foundry only warns about (never deletes Claude's files); real win11 smoke of the warning is tracked in `remote_claude.md`.
21	Installer "user-level-only" mode should warn that `kit_*` tools won't function without workspace MCP wiring	low	open	Discovered 2026-05-21: when user answered `Install to current workspace? n`, persona files (post-v0.3.3) reach `~/.copilot/agents/` and Copilot Chat shows `@kit`. But `kit_health` / `kit_status` / etc. all fail because `personal-os-server` isn't registered in any workspace's `.vscode/mcp.json`. User saw `@kit` appearing but tools nonfunctional — confusing. Fix: when `--with-personal-os` is ON and workspace install is being skipped, emit a clear INFO line at end of install explaining the limitation + how to wire a workspace later.
22	Smoke runner status line on Windows: avoid printing dynamic content that could exceed cp1252 even with reconfigure	low	open	v0.3.2 fixed the immediate UnicodeEncodeError with `reconfigure(encoding="utf-8", errors="replace")`, but the underlying issue is that log lines accumulated across all 14 KIT tool exercises CAN contain user data with non-cp1252 chars (e.g., file paths with em-dashes, sample task titles). Long-term: audit smoke_runner.py's log accumulation for any string interpolation that could surface user content. The v0.3.2 reconfigure should hold as belt-and-braces; this task is preventive.
24	Evaluate Agent Host Protocol (AHP) for foundry	medium	open / v0.4 eval	VS Code 1.121 introduced AHP (microsoft.github.io/agent-host-protocol/) plus Remote Agents (Preview): agent sessions coordinated across SSH / Dev Tunnels, with a lightweight "agent host" process that survives client disconnection. Foundry today is local-stdio MCP only. Decide one of: (a) ignore for v0.x (foundry stays local), (b) prototype `@bob` / `@kit` running on a remote AHP host in v0.4, (c) declare no-fit. Inputs: AHP spec, multi-client coordination story, whether foundry-server can host as AHP server, whether stdio MCP still works inside an AHP host. Deliverable: 1-page decision doc in `docs/plans/`.
25	Document Claude-agent permission settings in `agents/bob.agent.md` + INSTALL.md	medium	open	VS Code 1.121 added `github.copilot.chat.claudeAgent.allowAutoPermissions` (Auto Mode — execute without permission prompts but with background safety checks) and `github.copilot.chat.claudeAgent.allowDangerouslySkipPermissions` (unrestricted). `@bob` is the persona that writes code; users currently get prompted for every edit. Document both settings in `agents/bob.agent.md` body (recommend `allowAutoPermissions` for trusted workspaces, never `allowDangerouslySkipPermissions`) and add an INSTALL.md "Autonomy modes" subsection. R6 (no `--no-verify`) still holds — these settings affect prompts only, not commit policy.
26	Add `chat.utilityModel` / `chat.utilitySmallModel` recommendations to BYOK reference stub	low	open	Folds into task #3 (BYOK reference stub). VS Code 1.121 added two settings to override the default model for general utility flows (titles, summaries, commit messages, rename suggestions) and lightweight utility tasks. Orthogonal to persona `model:` chains (R9) — these are user-tier. Recommend cheap small models here to cut token cost; suggest `claude-haiku-4-5-20251001` for `chat.utilitySmallModel` and `gemini-flash-line` or `gpt-5.4-mini` (subject to user BYOK) for `chat.utilityModel`.
27	Verify persona `model:` display-names resolve under all 6 BYOK providers	medium	open	April-2026 Copilot release: Business/Enterprise BYOK now covers OpenRouter, Microsoft Foundry, Google, Anthropic, OpenAI, and other Chat-Completions/Responses/Messages-compatible endpoints. R9 mandates display-name strings (e.g., `'Claude Opus 4.7 (anthropic)'`). Smoke-test that each of the 6 personas' `model:` declarations actually resolve when the workspace BYOK admin policy is set. Also note: the Insiders Custom Endpoint Provider replaces the deprecated `customoai` provider — update `references/byok-setup.md` (task #3). Output: a `docs/byok-matrix.md` mapping each persona × each BYOK provider → resolved model.
28	Audit `JobManager.spawn` for `VSCODE_AGENT` env-var awareness	medium	open	VS Code 1.121 sets `VSCODE_AGENT` on agent-initiated terminals so CLIs can detect agent context and switch to machine-readable output. foundry-server's `JobManager.spawn` and the future `coordinator_design.py` (task #7) run subprocesses (`foundry_codex` etc.) that could benefit. Two-part fix: (a) `JobManager.spawn` passes `VSCODE_AGENT=1` to its subprocess env (so child CLIs know they're under an agent — they already are; just be explicit), (b) when foundry-server itself sees `VSCODE_AGENT` set by VS Code Copilot, log a single INFO line at startup confirming agent context.
29	Test `foundry_codex` long-running jobs vs VS Code background-terminal auto-cleanup	medium	open	VS Code 1.121: background terminals created by chat agents auto-dispose upon command completion. foundry-server's `JobManager` spawns its own subprocess directly (not via the chat terminal) so the disposal shouldn't reach it, but the new behavior changes the affordance landscape. Smoke-test: start a long-running `foundry_codex` job, switch chat context, verify the job continues + result is retrievable via `foundry_jobs_get`. Add to `tests/test_foundry_server.py` if reproducible.
30	Evaluate deferring `foundry_search_skills` to Copilot's local semantic index	low / v2	open	April-2026: semantic indexing now works in all workspaces; agents can run grep-style search across GitHub repos/orgs via the new `githubTextSearch` tool; experimental `/chronicle` queries chat history (`github.copilot.chat.localIndex.enabled`). foundry-server's `SkillRegistry` / `foundry_search_skills` is glob+regex based. Evaluate whether to (a) keep glob (simple, stdlib), (b) defer to Copilot's local index when present, (c) emit both. Decision constrained by R8 stdlib-only.
31	Optional OTel emission from `foundry_` and `kit_` tool handlers	low / v2	open	VS Code 1.121 ships prebuilt Azure Managed Grafana dashboard visualizing agent operations / token usage / chat sessions / tool calls / per-model latency. If foundry-server emitted OpenTelemetry spans per tool call, users' dashboards would show foundry tool usage natively. Tension with R8 stdlib-only — `opentelemetry-api` would be foundry's first dep. Possible compromise: env-gated optional dep (`pip install vs-code-foundry[otel]`); off by default; foundry-server runs without it if not installed. Defer until a user requests dashboard integration.
32	Review `agents/bob.agent.md` for stale text vs new in-chat diff visualization	low	open	April-2026: code changes display as inline diffs directly in chat threads. `@bob` persona body may contain instructions like "your edits will be applied silently" or "the user can't see the diff" that contradict the new affordance. Grep `agents/bob.agent.md` for such language and update — the new affordance is better UX so any stale defensive text just reads weird now. ~10 LOC change at most.
33	Document terminal read/write security boundary in INSTALL.md "Security" section	medium	open	April-2026: agents gained read/write capabilities to any open terminal in VS Code Copilot. This is orthogonal to foundry-server's R2 boundary (which governs what foundry-server itself reads), but it changes the user's threat model — any other agent in the same workspace can now read any open terminal that foundry-server's subprocesses (`foundry_codex` etc.) are running in. INSTALL.md needs a new subsection alongside the Claude Code agent name collision (task #20) covering: (a) what VS Code's new terminal access means, (b) a recommendation to isolate sensitive workspaces, (c) a note that `VSCODE_AGENT` being set is the signal.
34	Update `@forge` persona body to emit mermaid for design exploration	low	open	VS Code 1.121: built-in `Mermaid Markdown Features` extension renders mermaid code blocks in Markdown preview, notebooks, AND chats, with pan/zoom support. `@forge` does design exploration in chat — emitting mermaid sequence diagrams, component graphs, state machines would substantially improve the design conversation. Update `agents/forge.agent.md` body to encourage mermaid output for: component relationship graphs, sequence diagrams during cross-CLI deliberation, state machines for cascade transitions. No tools change needed.
35	Self-review V2 — decide the fate of the signed `progress/contract-map.yaml` (wire it or remove it)	high	open	Architecture review 2026-05-23 (`docs/reviews/2026-05-23-cascade-architecture-review.md`). `progress/contract-map.yaml` + `.sig` exist (HMAC-signed, "rev 2 covers both servers", AGENTS.md 245-248) but a full grep of `foundry-server/`, `agents/`, and tests finds NO runtime read, NO signature verification, NO gate consuming them — vestigial copy of internal-rnd's pattern, never wired in. A signed artifact nothing verifies implies integrity the system doesn't enforce. Decision: (a) wire a real verify step bob/forge must pass before execution (gives the fork one genuine mechanical gate, partially closes V1), OR (b) delete the artifact + document in AGENTS.md/PROJECT.md that vs-code-foundry is prompt-discipline-only. Cheapest high-value item; do first. alf score 6.
36	Self-review V4 — build efficacy-telemetry rollup on the existing `actions.jsonl` substrate	medium	open	Architecture review 2026-05-23. No metric exists for bob-PARTIAL rate, Codex/Gemini false-positive rate (forge/alf both cite "~60% FP" with zero measurement behind it), user-override rate, or test-failure-at-completion rate. BUT `foundry_log_action` + `~/.vs-code-foundry/actions.jsonl` already log actions — substrate is there, only the rollup/metric layer is missing. Ship a `foundry-cli metrics` (or MCP `foundry_metrics` read-only tool) that aggregates actions.jsonl into the above rates. Makes the "triple-model coverage is the value-add" claim falsifiable. Highest ROI for making every other claim measurable. alf score 8 (MODERATE).
37	Self-review V7 — add a `windows-latest` CI job (riskiest platform, least automated coverage)	medium	open	Architecture review 2026-05-23. CI (#11) runs ubuntu+macos × py3.10/3.11/3.12 but EXCLUDES Windows (tests use POSIX paths `/bin/cat`, `/tmp`). Yet Windows is where bugs keep surfacing: cp1252 console (#22), py3.14 (v0.3.2), WinError 1314 symlink (#18), datetime.utcnow (#23). The most-targeted enterprise platform is the only one without a CI job. Fix: add `@unittest.skipIf(os.name == 'nt', 'POSIX-only paths')` (or tempfile/shutil.which abstractions) to the POSIX-path tests, then add a `windows-latest` leg to `.github/workflows/ci.yml`. Distinct from #23 (one-off manual re-validation) — this AUTOMATES it. alf score 8 (MODERATE).
38	Self-review V3 — add an independent verification pass to the design→build arc	medium	open	Architecture review 2026-05-23. Unlike internal-rnd's cold-context dual-verdict, here bob SELF-reports COMPLETE/PARTIAL/FAILED (`bob.agent.md` Step 5); the only objective check is "run the test suite" (Step 4); "report PARTIAL honestly" is prose (line 156). `@alf` can review bob's output but only runs on explicit user invocation — it is NOT in the build path. A padded COMPLETE is caught only if a human or separately-invoked alf looks. Candidate: make forge's bob-handoff include an auto-alf-review step on completion (button → alf), OR a lightweight `foundry_verify` MCP tool that re-runs tests + diffs against the WP plan independently of bob's self-report. alf score 8 (MODERATE).
39	Self-review V8 — no SAST/secrets gate on bob-generated code	medium	open	Architecture review 2026-05-23. bob writes code with only an OPTIONAL `foundry_codex` review (bob line 143). No secrets scan, no SAST, no pre-commit security gate. internal-rnd has a pre-push secrets-scan + the S038 SAST batch in flight (#109/#112 there); none exists here. The Copilot fork ships code to user workspaces with no security floor beyond "ask Codex if you feel like it." Candidate: port internal-rnd's `scripts/secrets-scan.{sh,py}` as a foundry pre-commit hook the installer can wire, + optional bandit/semgrep invocation via a `foundry_sast` tool. Coordinate with cross-repo-review.md (internal-rnd S038 is the upstream source). alf score 7 (MINOR, just under threshold).
40	Self-review V1 — no mechanical enforcement floor; all persona hard rules are prose (keystone)	high	open	Architecture review 2026-05-23. The defining structural gap. internal-rnd's thesis ("convert prose rules to subprocess gates because LLMs drift") is dropped entirely here — bob's "Hard rules" (lines 147-157) are unenforced prose with no backstop if the model ignores them. Intrinsic to the Copilot platform choice to a degree (no easy mid-agent subprocess gates), but it means cascade correctness == model instruction-following on a given turn, with no floor. Root cause beneath V2/V3/V5. Not a single fix — a direction: decide how much mechanical floor this fork wants. Minimum viable floor = wire the contract-map (V2/#35) + an independent verify (V3/#38) + a security gate (V8/#39). Revisit scope after #35 and #36 land. alf score 8 (CRITICAL-structural; additive formula compresses it).
41	Self-review V5 — autonomy fully delegated to IDE host; foundry has no backstop of its own	low	open	Architecture review 2026-05-23. `bob.agent.md` 118-129 cedes the permission model to VS Code 1.121 `claudeAgent.allowAutoPermissions` ("execute without prompts but with background safety checks") — but those checks are COPILOT'S, not foundry's. If a user flips `allowDangerouslySkipPermissions`, nothing foundry-side catches a destructive action. internal-rnd had gates as a host-independent backstop. Low-urgency given the warning text already discourages the dangerous setting, but worth a documented "foundry provides no independent safety check; you are trusting the IDE host" note in INSTALL.md Security section (folds with #33). alf score 6.
42	Self-review V6 — fork drift from internal-rnd is structural and one-directional	low	open	Architecture review 2026-05-23. AGENTS.md R2 accepts persona drift by design ("this is a fork"). Consequence: internal-rnd's rigor advances (gate system, dual-verdict, S038 security batch) and NONE flows here automatically; `cross-repo-review.md` is a manual queue. This fork falls progressively behind the original on rigor unless human-pollinated. Not necessarily wrong — but make it a tracked cadence: a periodic (monthly?) pass over internal-rnd's `history.md` + `cross-repo-review.md` Outbound entries to decide what to adopt. Otherwise drift compounds silently. alf score 6.
43	Self-review V10/V11/V13 — minor hygiene: read-only invariant, version drift, bob rollback	low	open	Architecture review 2026-05-23. Three MINOR items bundled. V10: personal-os→foundry "read-only" (R13) is SQLite `mode=ro` convention, not enforced — add a test asserting no rw open path exists across personal-os reads (score 5). V11: VERSION constant must be hand-synced across `foundry_server.py:34` + `install-foundry.py` ("pre-existing drift acknowledged" in v0.1.2 notes) — single-source it (e.g. read from one `VERSION` file) so a release can't ship mismatched self-reported versions (score 5). V13: bob runs sequentially in one chat with no `.bob-checkpoint.md` equivalent; an interrupted multi-WP run loses progress state (per-WP commits mitigate partially) — consider a lightweight resume protocol (score 5).
44	@testbed forge cycle continuation — finish design sections 3-9 + write canonical design doc + spawn bob	high	open / v0.4	Forge cycle paused 2026-05-23 at sections 1 (Goal) + 2 (Approach) user-approved. WIP captured at `docs/plans/2026-05-23-testbed-design-WIP.md`. Decisions frozen: B+C+D modalities (A visual deferred to v0.5 per R8 stdlib-only); terminal testbed (no auto-fix loop, max 1 user-approved recheck); separate `testbed-server` (3rd MCP server, mirrors personal-os-server per R13); HMAC freshness gate on `progress/contract-map.yaml.sig`; opt-in dev-server policy. Do NOT re-spawn the triple-model design team — full convergence captured in WIP doc (Claude challenger NEEDS-REWORK 4 CRITICAL, Codex challenger NEEDS-REWORK 3 CRITICAL, Gemini analyst 8 research areas). Continue with: section 3 (Components — files + line-count), 4 (Data flow — verdict pipeline), 5 (Error handling — INCONCLUSIVE / INFRA-FAILED vs DELIVERED-BROKEN), 6 (Testing — stdlib-unittest mocking strategy for browser tools), 7 (Performance — modality run-time budgets), 8 (Open questions), 9 (WP plan for bob). Per forge protocol pause after each section. Then write canonical design at `docs/plans/2026-05-23-testbed-and-contract-loop-design.md`, generate signed contract map via `component-contract-mapping` skill, run G1 verify, spawn bob.
45	Implement testbed-server (v0.4) — Phase 1 B+C+D modalities, terminal posture, dedicated 3rd MCP server	high	open / v0.4	Blocked on #44 (design doc + contract map). Phase 1 scope (per user decisions captured in #44 WIP): new `testbed-server/` Python stdlib MCP server (~6 tools, mirrors `foundry-server/` shape) + new persona `agents/testbed.agent.md` + new skill `skills/vs-code-copilot-foundry/references/visual-testing-playbook.md`. Cascade integration: @bob auto-invokes when HMAC-fresh ledger artifacts present (Step 4 verification gate); @kit routes via `kit_queue_prompt` to target-workspace @bob/@pa (no direct invocation); @alf delegates rendered-behavior URL targets; @forge gains optional "Preview with @testbed" handoff button. R-rule update: R4 combined ≤40 tool cap stays under (~15 foundry + 14 personal-os + ~6 testbed = ~35); R13 extends to three-server topology. New tests: ~25 in `testbed-server/tests/test_testbed_server.py` + ~5 cascade-edge tests in `foundry-server/tests/test_foundry_server.py`. Installer (`install-foundry.py`) gains `--with-testbed` flag (default ON), registers third server in `.vscode/mcp.json`. R11 cross-repo-review.md entry: informational only (internal-rnd has the analogous `visual-arbiter` / `verification-arbiter` but vs-code-foundry's testbed is side-loaded with different boundaries — not a port request).
46	First-run environment scan + capabilities manifest — foundry-native env-adoption for no-CLI prod environments	high	done (pending win11 M1-M3 validation) 2026-06-07	Shipped in v0.4.0 (`docs/plans/2026-06-06-env-adaptive-cascade-design.md`): `foundry_env.py` scanner + `capabilities.json` schema v1 + `foundry_capabilities` MCP tool + first-start daemon scan + installer manifest write + `refresh_capabilities.py` + verifier G13. M1-M3 manual smoke on win11-laptop still gates the v0.4.0 tag (see `remote_claude.md`). User requirement 2026-06-05. vs-code-foundry sits on top of agent-foundry-derived content that is CLI-flavored, but prod environments have NO CLIs (no claude/codex/agy, possibly no copilot binary) — VS Code + Copilot only. First run (install-time AND first foundry-server start — prod users may never re-run the installer) must scan the environment: copilot/codex/gemini/claude CLI presence+version, Python, git, network posture. Output: `~/.vs-code-foundry/capabilities.json` (tier + per-tool availability, analogous to `~/.claude/state/inventory.json` from the env-adoption skill — port the pattern, not the code). Consumers: `foundry_health` (extend existing probes), new read-only `foundry_capabilities` MCP tool (R4 budget: 16/25 after add), installer deploy decisions, #47 adaptation layer. Wiki analysis: `.wiki/wiki/comparisons/foundry-multi-model-orchestration-options.md` §Environment-tier availability.
47	Capability-adaptive cascade content — no-CLI degradation paths in personas/skills	high	done (pending win11 M1-M3 validation) 2026-06-07	Shipped in v0.4.0: cascade personas (foundry/forge/bob/alf) got `'agent'` tool + `## Capability routing` + baked `## Capability floor (Tier 0)` + mechanism-conditional hard rules + tier banner; 2 Tier-0 worker personas (challenger/analyst, `(copilot)` model arrays, subagent-only); `TestPersonaTripwires` mechanical gates. Chose adaptation mode (c) hybrid → resolved to static-floor + manifest-hint + live-tool (no install-time templating). M1-M3 win11 smoke gates the tag. Original requirement: re-route by tier — Tier 2 (CLIs) = current design; Tier 1/0 = native `runSubagent` fan-out with per-worker `model:` pins from Copilot's three-vendor catalog (triple-model challenge SURVIVES with zero CLIs). HARD boundary held: adaptation rewrote foundry's OWN shipped copies only, NEVER `~/.claude/` (R2). User requirement 2026-06-05; depends on #46.
48	Re-point the Tier-2 analyst lane after the gemini CLI retires 2026-06-18	medium	closed by design (v0.5.0) 2026-06-08	Resolved by the v0.5 dynamic perspective dispatch (`docs/plans/2026-06-08-dynamic-perspective-dispatch-design.md`). The analyst lane is no longer hard-pinned to a single CLI delegate: the analyst STANCE is dispatched dynamically from the `perspective_policy` plan (the roster), and on a host without a live external CLI it falls to the Tier-0 native floor `runSubagent('analyst', { model: <plan model> })` — design-decision (b), now the primary path. `foundry_capabilities._recommended_routing` already treats `avail("gemini") or avail("agy")` as the Tier-2 analyst signal (unchanged in v0.5), and `agy` is probed by the manifest, so the 2026-06-18 gemini-CLI cutover degrades cleanly with zero code change (analyst stance routes to the floor or to `agy` when present). Subprocess delegate swap to `agy -p` (option (a)) remains an OPTIONAL future polish, not a blocker — file a fresh narrow task if/when desired.
49	Config-driven model selection (was tracked in `docs/models.md`)	medium	closed by design (v0.5.0) 2026-06-08	The old `docs/models.md` "make Layer 3 config-driven" item (hardcoded reviewer model names → read from `config.json`) is subsumed by the v0.5 `model_roster`. Model selection is now dispatch-time + roster-driven: `roster.py`'s `DEFAULT_ROSTER` + `read_roster(home)` deep-merges a `model_roster` block from `config.json`, and `resolve_perspectives` projects the `(stance, model, angle)` plan — one config edit swaps models with no code change or test run, and a reinstall preserves the edit (deep-merge, not clobber). `docs/models.md` is now a thin pointer. R15 codifies the principle.
50	v0.6 — vs-code-foundry as the VS Code BRIDGE to agent-foundry (kill the "2 sets" maintenance; thin bridge + generated subagent-team floor)	high	design brief ready 2026-06-08	DESIGN BRIEF → `docs/plans/2026-06-08-v0.6-agent-foundry-bridge-design-BRIEF.md`; start a fresh `forge` COMPLEX cycle on it. User direction: agent-foundry = single source of truth (176 skills + flows + canonical agents, "where main dev lives"); vs-code-foundry = thin Copilot BRIDGE (personas bridge the claude/codex/agy CLI use case). ADAPTIVE (standalone floor preserved). GENERATE the Tier-0 floor personas from agent-foundry (build-time adapter) to drive native VS Code 1.123 subagent TEAMS; DELEGATE to agent-foundry's CLI flows when present; ROUTE skills to the 176. Measured this session: the "2 sets" pain is the PERSONAS (bob 624L↔226L, 5 shared lines, drifting); skills barely duplicated; the 1338L server is legitimate bridge. Open sub-Qs (for the design team): generator transform; the enforcement-engine re-import decision (entangled with #35/#40 — its own phase); the agent collision (#20); the R2 amendment. Likely phased: P1 generator+floor / P2 delegation+skills / P3 enforcement. Revises the 2026-05-11 "deliberately separate" directive (governance — reflect in AGENTS.md R2 + cross-repo-review). 1.123 currency verified live 2026-06-08.
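
Sketch for #35 option (a): a minimal idea of what a "real verify step" for `progress/contract-map.yaml` + `.sig` could look like, stdlib-only per R8. Everything specific here is an assumption, not the repo's actual scheme: that the `.sig` file holds a hex HMAC-SHA256 digest of the raw YAML bytes, that the key is supplied by the caller, and the function name `verify_contract_map` itself. The backlog only records that the artifact is "HMAC-signed".

```python
import hashlib
import hmac
from pathlib import Path


def verify_contract_map(map_path: Path, sig_path: Path, key: bytes) -> bool:
    """Hypothetical gate: True only if sig_path matches an HMAC of map_path.

    Assumes (unconfirmed) a hex HMAC-SHA256 digest over the raw file bytes.
    """
    try:
        payload = map_path.read_bytes()
        recorded = sig_path.read_bytes().strip()
    except OSError:
        # A missing or unreadable artifact fails closed.
        return False
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, recorded)
```

A gate of this shape would be called before bob/forge execution and would refuse to proceed on `False`. Where the key lives and which persona step calls it are exactly the open design questions in #35.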

Done

#	Task	Done date	Notes
0	Initial repo creation + v0.1.0 ship	2026-05-11	5 personas + foundry-server + tests + installer + templates + skill family — see history.md 2026-05-11
0a	v0.1.1: Windows installer + cross-platform Python fix + R&D docs moved in	2026-05-11	install-foundry.cmd / .ps1 enterprise-hardened, install-foundry.py uses sys.executable for mcp.json, design.md + spec-review.md + 5 research briefs added
0b	Belt-and-braces guards in `~/.claude/publish-config.json` to prevent accidental publication via agent-foundry pipeline	2026-05-11	Added `_vs_code_foundry_separation_doc` + 2 new exclusions
0c	Initial release tags pushed	2026-05-11	v0.1.0 + v0.1.1
F1	Fix installer hardcoding `python3` instead of `sys.executable` (Codex F1, critical)	2026-05-12	install-foundry.py 3 sites + test_foundry_server.py 2 sites — see history.md 2026-05-12. Side-fixed test suite portability to Windows.
F2	Fix `JobManager.spawn` closing stdin before `communicate()` (Codex F2, critical — foundry_codex fully broken)	2026-05-12	Stored stdin on `Job.stdin_data`; watcher passes it via `communicate(input=...)`. Tightened no-stdin branch from inherited to DEVNULL. New regression test `TestJobManagerStdin`.
F3	Fix `foundry://` path traversal in `resources/read` (Codex F3, moderate — local info-disclosure)	2026-05-12	New helper `_safe_resolve_under` with separator-rejection + `Path.relative_to` containment check. Both agent + design handlers route through it. 3 new tests in `TestResourceTraversal`.
v0.1.2	Release of codex-verified bugfix bundle	2026-05-12	4 commits on `feature/v0.1.2-codex-bundle` (WP1 F2, WP2 F3, WP3 F1, WP4 version+docs); PR opened; user merges manually. VERSION constants bumped to "0.1.2" in BOTH `foundry_server.py:34` AND `install-foundry.py:144` (pre-existing drift acknowledged).
F4	Fix `foundry_health` bypassing Claude-skills dual-opt-in (Codex F4 — correctly identified)	2026-05-13	The original tasks.md F4 description was a hallucination (talked about `_check_cli("claude")` over-probing — that's not what the code does). The REAL Codex F4 was about `SkillRegistry` reading `~/.claude/skills/` whenever `read_claude_skills: true` in config, bypassing the per-call dual-opt-in documented in R2. Fix: drop the constructor flag, make `include_claude` per-call, gate at the tool layer with `caller_param AND config_flag`. Regression: new `TestClaudeSkillsDualOptIn` class with 4 tests covering the full matrix.
F5	Relax R1 PowerShell line cap from 200 to 250 with rationale	2026-05-13	AGENTS.md R1 row updated. Two-tier rule: 250 for read-only verifiers / installers with no complex flow; 200 if the script has conditional logic / functions / non-trivial loops. `verify-foundry-setup.ps1` (227 lines, `Write-Host` formatting only) is now under cap; no code change required there.
F6	Fix `foundry_design` coordinator hardcoding `python3` (same bug as F1, dead code in v0.1.x)	2026-05-13	`foundry_server.py:758` — `cmd[0]` `"python3"` → `sys.executable`. Surrounding `if coordinator.exists():` block is unreachable in v0.1.x (the script doesn't ship; tasks.md #7), but fixed proactively so F1's bug doesn't resurface when the script lands. Post-fix grep confirms no live-code `"python3"` strings remain anywhere.
v0.1.3	Release of cleanup bundle	2026-05-13	4 commits on `feature/v0.1.3-cleanup-bundle` (WP1 F4, WP2 F5, WP3 F6, WP4 release prep). VERSION bumped to `"0.1.3"` in both `foundry_server.py:34` and `install-foundry.py:154`. AGENTS.md R7 test count refreshed from "19 unit tests" to "27 unit tests (as of v0.1.3)".
11	GitHub Actions CI: run the test suite on push + PR	2026-05-13	`.github/workflows/ci.yml`. Matrix: ubuntu-latest + macos-latest × Python 3.10 / 3.11 / 3.12 (6 jobs). Steps: syntax validation, JSON template validation, run `python -m unittest tests.test_foundry_server -v` from `foundry-server/`, installer smoke with `--skip-smoke`. Windows is excluded from the matrix because existing tests use POSIX-only paths (`/bin/cat`, `/tmp`); Windows-side validation is intentionally manual via `installer/install-foundry.cmd`. When Windows tests are added later (e.g., platform-skip decorators), extend the matrix.
12	Add `CHANGELOG.md` to repo root	2026-05-13	Backfilled v0.1.0 through v0.1.3 from history.md + git tag dates. Follows Keep a Changelog (https://keepachangelog.com/en/1.1.0/). Going forward, every release tag adds a section. Tag references at the bottom link each version to its GitHub release.
MS-align	Microsoft Agent Skills standardization — align foundry's skill install path with the MS-standardized `~/.copilot/skills/`, add workspace `.github/skills/` scaffold, refresh docs	2026-05-15	Triggered by Microsoft's 2026-05-13 announcement of Agent Skills in Visual Studio (Insiders) standardizing the agentskills.io spec on four canonical paths. v0.1.x global path `~/.vs-code-foundry/skills/` was off-standard; this work moves the canonical location to `~/.copilot/skills/` so VS Copilot / VS Code Copilot auto-discover the skill family natively. Backward-compat symlink at the v0.1.x path created automatically by the installer; removable in v0.3.x (tasks #16). F4 dual-opt-in for `~/.claude/skills/` reads is preserved and re-tested across the path move.
v0.2.0	Release of MS Agent Skills alignment bundle	2026-05-15	6 commits on `feature/v0.2.0-ms-agent-skills-alignment` (WP1 server relocate, WP2 tests, WP3 installer, WP4 templates, WP5 docs, WP6 version bumps). VERSION bumped to `"0.2.0"` in both `foundry_server.py:34` and `install-foundry.py:154`. AGENTS.md R7 test count refreshed from "27 unit tests" to "30 unit tests (as of v0.2.0)". `skill["source"]` field flips `"foundry"` → `"copilot"` (cosmetic breaking-change-class, no current consumer). Tracking follow-ups: tasks #16 (drop symlink in v0.3.x), #17 (schema-validate against agentskills.io), #18 (real Windows validation of symlink path).
v0.3.0	Personal-os merge (subtree from joogy06/vs-code-personal-os)	2026-05-18	16+ WPs on `feature/v0.3.0-personal-os-merge`. Subtree-merged 6 commits from vs-code-personal-os under `personal-os/`. Hoisted `kit.agent.md` to root `agents/` (6 personas total). AGENTS.md reconciled to R1-R13 (foundry R1-R11 + personal-os R1-R12 + new R13 two-server topology). Two MCP servers ship by default — `foundry-server` (15 tools, 33 tests) and `personal-os-server` (14 tools, 43 tests) — combined 76 tests. Installer gains `--with-personal-os` (default ON), `--migrate-legacy`, fail-fast on legacy `~/.vs-code-personal-os/`. Verifier extends to 12 G-checks. Source repo `joogy06/vs-code-personal-os` archived 2026-05-18 (tombstone period: README banner + pinned issue, then archive).
v0.3.4	v0.3.4-followups merged (PR #8)	2026-06-08	PR #8 (`feature/v0.3.4-copilot-1.121-followups`) merged to main as `3919959` at the start of the v0.4 branch logistics. Carried the VS Code 1.121 doc edits (#20/#21/#25/#34), the 1.121/April-2026 backlog (#24-#34), the cascade self-review tasks (#35-#43), the @testbed WIP doc, and AGENTS.md Project HARD-RULEs. Merged but NOT tagged — v0.3.4 release ceremony still pending.
wiki	`.wiki/` project knowledge base created	2026-06-05	`project-v1` embedded wiki (own git repo, gitignored here per process-assistant-ai convention). `wiki-v00` bootstrap + `wiki-v01` ingest: 6 pages, 12 raw sources, lint 10.0/10. Documents VS Code 1.123 (subagents, /fleet, session sync, agent taxonomy) + the five-mechanism multi-model-orchestration analysis for the foundry cascade. Registered in `~/.wiki-registry.yaml`; bound via root `.wiki-link` (role: specific, auto_consult on).
v0.4.0	env-adaptive cascade #46/#47 merged (PR #9)	2026-06-08	Full forge cycle (triple-model design) → 3 serial bob spawns (DAG-flattened after bob's first spawn halted with no Agent tool) → 12 WP commits → PR #9 merged as `b266b9f`. Shipped `foundry_env.py` scanner, `capabilities.json` schema v1, `foundry_capabilities` read-only MCP tool (16 visible/17 schemas) + first-start daemon scan, `refresh_capabilities.py`, Tier-0 capability-routing + baked floor in 4 cascade personas, 2 worker personas (challenger/analyst), `TestPersonaTripwires`, verifier G13 + G3 6-coordinator fix, AGENTS.md R14, persona model bumps to Opus 4.8/GPT-5.5/Gemini 3.5 Flash. 136 tests green. Design `docs/plans/2026-06-06-env-adaptive-cascade-design.md`; contract map rev 3 (signed). Merged but tag GATED on win11 M1-M3 smoke (see `remote_claude.md`).
v0.5.0	dynamic perspective dispatch — MERGED to main (PR #10)	2026-06-08	Full triple-model forge cycle (4 Claude approaches + Claude challenger + agy `SERVED_BY=gemini-3.5-flash` + Codex adversarial) → signed contract map rev 4 (G1 PASS) → bob direct-serial-executed 7 WPs (Agent tool unavailable to spawned bob, same as v0.4.0). Shipped `roster.py` (`(stance×model×angle)` resolver), `perspective_policy` SIBLING key on `foundry_capabilities` (no new tool; `recommended_routing` frozen byte-identical), `proposer`/`steelman` NEW stance workers + `challenger`/`analyst` cross-vendor model-agnostic recast, `## Perspective dispatch` on the 4 coordinators (3-fact diversity honesty), config deep-merge + installer config-preserve, R15. 179 tests green (136+43; forge re-verified independently). #48 + #49 closed by design. PR #10 MERGED to main `2dc8d5b`; feature branches tidied (only `main` remains); NOT tagged — v0.5.0 tag gated on win11 MS1-MS5 smoke (`remote_claude.md`). Design `docs/plans/2026-06-08-dynamic-perspective-dispatch-design.md`.
v0.5.1	installer agent reconcile — safe persona dedup/cleanup-on-run (PR #11)	2026-06-08	Fixes the `2× bob` dup (installer deployed personas to `~/.copilot/agents/` AND `<ws>/.github/agents/`; Copilot scans both). Streamlined forge cycle + Codex/agy safety review (caught data-loss holes → hardened). Shipped the `agent_reconcile` engine in `install-foundry.py`: hash-verified ownership (delete only if `sha256` matches the bytes last written), deploy-before-delete, `.foundry-backup/`, fail-closed `installed-agents.json` manifest, `KNOWN_PERSONA_HASHES` first-run migration, `--agent-scope {global,workspace}` default global, FOUNDRY_SENTINEL in all 10 personas, metadata-only `~/.claude/` collision warning (closes #20), `.ps1`/`.cmd` flag forwarding (R1), verifier G14. 198 tests green (155+43; 19 destructive-path `TestAgentReconcile`); forge independently verified the safety end-to-end on the real CLI (dupe cleaned+backed-up; edited/user/Claude files preserved). Contract map N/A (existing_component_extension). PR #11 MERGED to main `be98007`; NOT tagged — win11 smoke (run installer → one @bob) gates it. Design `docs/plans/2026-06-08-installer-agent-reconcile-design.md`.
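
The hash-verified ownership rule from v0.5.1 (delete a persona file only if its `sha256` still matches the bytes last written, and back it up first) can be sketched roughly as follows. This is an illustrative sketch under assumptions, not the `agent_reconcile` engine itself: the function names and the flat backup layout are hypothetical, and the real engine also handles the manifest, deploy-before-delete ordering, and first-run migration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hex digest of the file's current bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def remove_if_owned(path, recorded_hash, backup_dir):
    """Delete `path` only if its bytes still match what the installer last wrote.

    Returns True if the file was backed up and removed, False if it was left alone.
    """
    if not path.is_file():
        return False
    if recorded_hash is None or sha256_of(path) != recorded_hash:
        # Unknown or user-edited file: fail closed and keep it.
        return False
    backup_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, backup_dir / path.name)  # back up before deleting
    path.unlink()
    return True
```

The point of comparing against a recorded hash rather than a filename is that a file the user has edited since install no longer matches, so it is never treated as installer-owned.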

Personal-os roadmap (post-merge)

Items inherited from Phases 2-6 of the pre-merge personal-os/tasks.md, now tracked under foundry's release cycle. See personal-os/tasks.md for the full pre-merge phase plan with verification commands.

#	Task	Priority	Status	Notes
[personal-os] P2-1	Windows hardened installer for personal-os subsystem	high	open	`installer/install-personal-os.cmd` + `.ps1` hardened wrappers (R1) on top of the v0.3.0 shim. Must reuse foundry's post-fix `cmd parens-in-echo` + `$PSScriptRoot` patterns (cross-repo R11 reference). Smoke on a clean Win 11 Pro VM.
[personal-os] P2-2	Startup-folder shortcut creator	medium	open	PowerShell COM `WScript.Shell.CreateShortcut` writing `%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\personal-os.lnk` pointing at `code.exe %USERPROFILE%\personal-os-workspace`. Optional Start Menu Programs shortcut for manual launch.
[personal-os] P2-3	Zone-Identifier ADS detection + remediation in personal-os installer	medium	open	Mirror foundry's pattern from `installer/install-foundry.cmd`.
[personal-os] P3-1	Foundry-side `@pa` gains `pa_inbox_list` reading personal-os outbox envelopes	high	open	Cross-workspace surfacing — without this, KIT's outbox writes are inert until the user runs `kit_outbox_show` from personal-os. Foundry-side: each foundry workspace writes `~/.vs-code-foundry/status/<workspace-id>.json` on agent transitions (~50 LOC). `@pa` surfaces queued KIT prompts as the first chat response on workspace open and writes escalations to `~/.vs-code-foundry/personal-os/inbox/<this-workspace>/*.json`.
[personal-os] P3-2	KIT-side: handle resolved prompts moved to `outbox/<ws>/consumed/`	medium	open	After foundry-side P3-1 lands.
[personal-os] P4-1	Confluence bridge (`bridges/confluence/`)	medium	open	Stdlib `urllib` REST client. Bidirectional sync; API tokens from `~/.vs-code-foundry/personal-os/secrets.json` (chmod 600). Port shape from Claude `pa-server` `pa_sync_confluence`.
[personal-os] P4-2	Jira bridge (`bridges/jira/`)	medium	open	Pull assigned issues into `kit.db`; push status on transition. Port `pa_resolve_conflict` pattern (prefer remote on conflict, log to `events.jsonl`).
[personal-os] P4-3	`kit_confluence_` / `kit_jira_` MCP tools (or fold into existing 14 — re-eval R4 tool budget)	medium	open	Combined two-server tool count must stay ≤ 40 (R4 says ≤ 25 per server-equivalent; foundry sits at 15, personal-os at 14, both well-budgeted for bridges).
[personal-os] P5	Optional VSIX plugin (status bar, auto-inject queued prompts)	low / v2	open	Trigger: ONLY if Phase 2-4 prove KIT is too passive. Cross-CLI deliberation 2026-05-12 chose to skip VSIX for v0.1; revisit only with evidence.
[personal-os] P6	Coach layer — Pillar D Phase-1.5 expansion (passive pattern detection over `events.jsonl` tool-arg histograms, ≥ 3 obs across ≥ 2 sessions, surface as "KIT noticed" with confirm-before-save)	low	open	Per the pre-merge personal-os Phase 6 plan.
[personal-os] HK	Wire wiki binding when `.wiki/` gets its first real page	low	deferred	Two-file fix: `.wiki-link` + `~/.wiki-registry.yaml` entry mirroring the `process-assistant-ai` shape. Triggered when first ADR / playbook lands.
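
The P6 detection threshold (≥ 3 observations across ≥ 2 sessions over tool-arg histograms) could look roughly like the sketch below. The event schema is an assumption: each event is taken to be a dict with hypothetical `session`, `tool`, and `args` keys, which may not match the real `events.jsonl` records, and `detect_patterns` is an illustrative name.

```python
import json
from collections import defaultdict

MIN_OBSERVATIONS = 3  # ">= 3 obs" from the P6 row
MIN_SESSIONS = 2      # "across >= 2 sessions" from the P6 row


def detect_patterns(events):
    """Return sorted (tool, canonical_args) pairs that meet both thresholds.

    Assumes each event is a dict with `session`, `tool`, and `args` keys
    (hypothetical schema; adapt to the actual events.jsonl layout).
    """
    counts = defaultdict(int)
    sessions = defaultdict(set)
    for event in events:
        # Canonicalise args so that key order does not split the histogram.
        key = (event["tool"], json.dumps(event["args"], sort_keys=True))
        counts[key] += 1
        sessions[key].add(event["session"])
    return sorted(
        key for key in counts
        if counts[key] >= MIN_OBSERVATIONS and len(sessions[key]) >= MIN_SESSIONS
    )
```

Anything returned here would still go through the confirm-before-save step; the sketch only covers the counting.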

Notes on priority levels

high: blocks v1.0 production-readiness; address before next release
medium: completes the v1.x feature set; nice-to-have but not blocking
low: polish / quality-of-life; no urgency
v2 / future: requires significant new scope; not in v1 trajectory
