vs-code-foundry — backlog

Last Updated: 2026-06-08 (v0.5.0 dynamic perspective dispatch built on feature/v0.5-dynamic-perspective-dispatch, pending win11 MS1-MS5 validation + tag: roster.py + perspective_policy sibling + proposer/steelman + challenger/analyst cross-vendor recast; #48 + config-driven model selection closed by design)

Prior Update: 2026-06-07 (v0.4.0 env-adaptive cascade shipped: #46/#47 marked done pending win11 M1-M3 validation; filed #48 Tier-2 analyst re-point after gemini CLI retires 2026-06-18)

Prior Update: 2026-06-05 (added #46 first-run env scan + #47 capability-adaptive cascade content; user requirement: prod has no CLIs; VS Code 1.123 research ingested into new .wiki/)

Prior Update: 2026-05-23 (self-review: full-cascade architecture review of foundry → forge → bob → alf → pa (+kit) against stated intent + against sibling internal-rnd. Read-only analysis; no flow files modified. Review doc at docs/reviews/2026-05-23-cascade-architecture-review.md. 13 gaps (V1–V13); 9 filed as tasks #35–#43 (V9→#33, V12→#7 cross-referenced not re-filed; V10/V11/V13 bundled into #43). Defining finding: this fork dropped internal-rnd's entire enforcement engine (NO gates.py/claims.py/dual-verdict/ledger); all persona hard rules are prose, and progress/contract-map.yaml+.sig are HMAC-signed but read/verified by NOTHING (dead code / false guarantee). Trade-off is reasonable for a Copilot fork (more legible: bob 172 lines vs 573; better platform maturity: CI, Windows hardening, MCP conformance) but cascade correctness now rests entirely on prompt discipline + IDE host safety. Top moves: #35 decide contract-map fate (wire or delete), #36 telemetry rollup on existing actions.jsonl, #37 windows-latest CI. Uses THIS repo's additive alf formula (I+E+U−Eff), not internal-rnd's multiplicative one. No code changed.)

Prior Update: 2026-05-22 (added #24-#34 from VS Code 1.121 / Copilot April-2026 release review)

Active Tasks

# Task Priority Status Notes
1 Validate v0.1.2 on real Windows 11 + VS Code 1.107+ high open The M3 capability-shape question gets answered live by VS Code's response to our dual-declaration (tools.tasks: true AND top-level tasks: {}). Run on a hardened enterprise machine to validate the cmd→ps1→py wrapper chain. F1 fix (sys.executable everywhere) means this should now succeed even on hardened machines with no python3 alias.
2 Flesh out reference stub: skills/vs-code-copilot-foundry/references/architecture.md medium open Currently 4-line placeholder. Should cover: foundry-server internals (job manager, skill registry, resource layer), persona-to-server data flow, sampling vs subprocess delegation trade-off.
3 Flesh out reference stub: skills/vs-code-copilot-foundry/references/byok-setup.md medium open Currently placeholder. Should cover: VS Code 1.117+ BYOK setup for Business/Enterprise, plugging Anthropic / OpenAI / Google keys directly, how it integrates with foundry persona model: chains.
4 Flesh out reference stub: skills/vs-code-copilot-foundry/references/custom-agents-spec.md medium open Currently placeholder. Should cover: .agent.md frontmatter schema, handoff buttons, model fallback chains, target field, agents allowlist, runSubagent semantics, depth=5 cap, cost-tier constraint.
5 Flesh out reference stub: skills/vs-code-copilot-foundry/references/setup-copilot-cli.md medium open Currently placeholder. Should cover: copilot --agent foundry -p ... headless mode, /agent slash command, /fleet parallel orchestration, model selection in CLI vs picker.
6 Flesh out reference stub: skills/vs-code-copilot-foundry/references/troubleshooting.md medium open Currently placeholder. Should consolidate all the "common issues" tables from INSTALL.md and SKILL.md + setup-vscode-chat.md into one canonical troubleshooting reference. Add Zone Identifier section, mcp.json troubleshooting, persona auto-discovery debugging.
7 Write foundry_design coordinator script medium open v0.1.x returns a hint asking the caller (@forge) to do parallel codex+gemini calls manually. v1.2 should ship foundry-server/coordinator_design.py that runs the parallel delegations, aggregates, returns a synthesized design hypothesis to forge.
8 Build foundry-cli helper binary medium open Currently config edits happen by hand-editing ~/.vs-code-foundry/config.json. Ship a small CLI: foundry-cli enable claude-bridge, foundry-cli redeploy --workspace <path>, foundry-cli status, foundry-cli logs. Install to ~/.vs-code-foundry/bin/.
9 Extract foundry-server modules if it grows low open foundry_server.py is ~1080 LOC currently. If it grows past ~1300 LOC, split into foundry_server.py (protocol), foundry_jobs.py (job manager), foundry_skills.py (skill registry), foundry_resources.py (MCP resources). Tests follow.
10 Optional: thin VS Code extension wrapping foundry-server low / v2 open Use mcpServerDefinitionProviders to ship foundry-server as part of an extension, eliminating manual .vscode/mcp.json editing for users. ~1000-1500 LOC TS, 5-7 dev-days, Marketplace publication. Defer until real-user-demand-signal is clear.
13 Sign the PowerShell scripts with a code-signing certificate low open / future For environments that enforce AllSigned. Requires acquiring a code-signing cert; cost + maintenance burden. Defer until a real user requests it.
14 Public-repo decision (currently PRIVATE) open future If/when going public: scrub paths, polish README for public audience, add Marketplace/installer landing pages, decide LICENSE. Currently MIT but not yet committed publicly.
15 Real-fleet user feedback gathering low future Once 5+ real users have installed foundry, gather telemetry on which personas + tools are actually used. Drives the v1.2 polish backlog.
16 Remove v0.1.x backward-compat symlink at ~/.vs-code-foundry/skills/vs-code-copilot-foundry low open / v0.3.x The v0.2.0 installer creates this symlink so anyone with hand-coded absolute paths or custom MCP clients pointing at the old location keeps working. Removable in v0.3.x once enough time has passed for consumers to migrate. Drop the migration block in install_skill_family() and document the removal in CHANGELOG.
17 Schema-validate one SKILL.md from the family against the published agentskills.io schema as part of installer smoke low open Defensive guard against schema drift between foundry's installed skills and MS Copilot's parser expectations. Both use agentskills.io; this would catch any future divergence. ~30 LOC in installer/install-foundry.py smoke_test() or a new helper.
18 Real Windows validation of v0.2.0 (Developer Mode + non-admin symlink path) medium done (superseded) Closed 2026-05-21 by the win11-laptop install of v0.3.1+. Confirmed: (a) WinError 1314 warning fires cleanly on non-Developer-Mode + try/except continues correctly; (b) ~/.copilot/skills/ canonical location works regardless; (c) Path.home() honors USERPROFILE. Three subsequent fixes (v0.3.1 / v0.3.2 / v0.3.3) shipped same day from findings. v0.3.3 full re-install still pending real-machine validation — tracked as #23.
23 Real Windows validation of v0.3.3 full re-install path medium open Re-run install-foundry.cmd -Yes -PrefixPath %USERPROFILE%\.vs-code-foundry on win11-laptop after git pull + git checkout v0.3.3. Confirm: (a) 6 personas land in ~/.copilot/agents/ AND ~/.vs-code-foundry/agents/; (b) no DeprecationWarning: datetime.utcnow(); (c) personal-os smoke leg completes with status: PASS (no UnicodeEncodeError on report-print); (d) Developer: Reload Window in VS Code → @ autocomplete shows all 6 personas. Then workspace-install round (-WorkspacePath C:\dev\vs-code-foundry) → confirm .vscode\mcp.json registers both servers + @foundry foundry_health + @kit kit_health both respond. File REPLY thread in remote_claude.md.
19 Installer auto-creates workspace dir when --workspace points at a missing path low open Discovered during WP-14 smoke 2026-05-18: when --workspace <path> points at a non-existent dir, install-foundry.py prints ERROR: Workspace path is not a directory but proceeds anyway. install_workspace_skills_scaffold() creates the dir (and a .github/skills/README.md), then install_workspace() short-circuits because the dir didn't pre-exist. Net result: partial workspace install (just the skills scaffold). Pre-existing v0.2.0 behavior, not v0.3.0 regression. Fix: have install_workspace() also Path.mkdir(parents=True, exist_ok=True) before writing files, OR have _validate_workspace_path() mkdir as part of its check. Either way the misleading ERROR log line should be downgraded to INFO + auto-creation message.
20 Document Claude Code agent name collision in INSTALL.md / AGENTS.md R-rules low done (v0.5.1) Discovered 2026-05-21 on win11-laptop: VS Code Copilot scans both ~/.copilot/agents/ (foundry) and ~/.claude/agents/ (Claude Code) per MS Agent Skills spec → alf/bob/pa name dupes in Copilot Chat's @ autocomplete. Closed in v0.5.1 (feature/v0.5.1-installer-agent-reconcile): (a) INSTALL.md "Agent scopes & dedup" + "Known interactions (#20)" sections shipped; (b) AGENTS.md R2 carve-out wording (the installer may do a metadata exists() only on ~/.claude/agents/{bob,alf,pa}.md for the warning — no content read/write/delete); (c) the installer now DETECTS the collision at install time and prints a metadata-only warning (warn_claude_collision()), and the unrelated foundry-internal 2× bob duplication is fixed by the reconcile engine + --agent-scope default global. NOTE: the ~/.claude/ name collision itself is a cross-tool fact that foundry only warns about (never deletes Claude's files); real win11 smoke of the warning is tracked in remote_claude.md.
21 Installer "user-level-only" mode should warn that kit_* tools won't function without workspace MCP wiring low open Discovered 2026-05-21: when user answered Install to current workspace? n, persona files (post-v0.3.3) reach ~/.copilot/agents/ and Copilot Chat shows @kit. But kit_health / kit_status / etc. all fail because personal-os-server isn't registered in any workspace's .vscode/mcp.json. User saw @kit appearing but tools nonfunctional — confusing. Fix: when --with-personal-os is ON and workspace install is being skipped, emit a clear INFO line at end of install explaining the limitation + how to wire a workspace later.
22 Smoke runner status line on Windows: avoid printing dynamic content that could exceed cp1252 even with reconfigure low open v0.3.2 fixed the immediate UnicodeEncodeError by reconfigure(utf-8, errors=replace), but the underlying issue is that log lines accumulating across all 14 KIT tool exercises CAN contain user-data with non-cp1252 chars (e.g., file paths with em-dashes, sample task titles). Long-term: audit smoke_runner.py's log accumulation for any string interpolation that could surface user content. Belt-and-braces with the v0.3.2 reconfigure should hold; this is preventive.
24 Evaluate Agent Host Protocol (AHP) for foundry medium open / v0.4 eval VS Code 1.121 introduced AHP (microsoft.github.io/agent-host-protocol/) plus Remote Agents (Preview): agent sessions coordinated across SSH / Dev Tunnels, with a lightweight "agent host" process that survives client disconnection. Foundry today is local-stdio MCP only. Decide one of: (a) ignore for v0.x (foundry stays local), (b) prototype @bob / @kit running on a remote AHP host in v0.4, (c) declare no-fit. Inputs: AHP spec, multi-client coordination story, whether foundry-server can host as AHP server, whether stdio MCP still works inside an AHP host. Deliverable: 1-page decision doc in docs/plans/.
25 Document Claude-agent permission settings in agents/bob.agent.md + INSTALL.md medium open VS Code 1.121 added github.copilot.chat.claudeAgent.allowAutoPermissions (Auto Mode — execute without permission prompts but with background safety checks) and github.copilot.chat.claudeAgent.allowDangerouslySkipPermissions (unrestricted). @bob is the persona that writes code; users currently get prompted for every edit. Document both settings in agents/bob.agent.md body (recommend allowAutoPermissions for trusted workspaces, never allowDangerouslySkipPermissions) and add an INSTALL.md "Autonomy modes" subsection. R6 (no --no-verify) still holds — these settings affect prompts only, not commit policy.
26 Add chat.utilityModel / chat.utilitySmallModel recommendations to BYOK reference stub low open Folds into task #3 (BYOK reference stub). VS Code 1.121 added two settings to override the default model for general utility flows (titles, summaries, commit messages, rename suggestions) and lightweight utility tasks. Orthogonal to persona model: chains (R9) — these are user-tier. Recommend cheap small models here to cut token cost; suggest claude-haiku-4-5-20251001 for chat.utilitySmallModel and gemini-flash-line or gpt-5.4-mini (subject to user BYOK) for chat.utilityModel.
27 Verify persona model: display-names resolve under all 6 BYOK providers medium open April-2026 Copilot release: Business/Enterprise BYOK now covers OpenRouter, Microsoft Foundry, Google, Anthropic, OpenAI, and other Chat-Completions/Responses/Messages-compatible endpoints. R9 mandates display-name strings (e.g., 'Claude Opus 4.7 (anthropic)'). Smoke-test that each of the 6 personas' model: declarations actually resolve when the workspace BYOK admin policy is set. Also note: the Insiders Custom Endpoint Provider replaces the deprecated customoai provider — update references/byok-setup.md (task #3). Output: a docs/byok-matrix.md mapping each persona × each BYOK provider → resolved model.
28 Audit JobManager.spawn for VSCODE_AGENT env-var awareness medium open VS Code 1.121 sets VSCODE_AGENT on agent-initiated terminals so CLIs can detect agent context and switch to machine-readable output. foundry-server's JobManager.spawn and the future coordinator_design.py (task #7) run subprocesses (foundry_codex etc.) that could benefit. Two-part fix: (a) JobManager.spawn passes VSCODE_AGENT=1 to its subprocess env (so child CLIs know they're under an agent — they already are; just be explicit), (b) when foundry-server itself sees VSCODE_AGENT set by VS Code Copilot, log a single INFO line at startup confirming agent context.
29 Test foundry_codex long-running jobs vs VS Code background-terminal auto-cleanup medium open VS Code 1.121: background terminals created by chat agents auto-dispose upon command completion. foundry-server's JobManager spawns its own subprocess directly (not via the chat terminal) so the disposal shouldn't reach it, but the new behavior changes the affordance landscape. Smoke-test: start a long-running foundry_codex job, switch chat context, verify the job continues + result is retrievable via foundry_jobs_get. Add to tests/test_foundry_server.py if reproducible.
30 Evaluate deferring foundry_search_skills to Copilot's local semantic index low / v2 open April-2026: semantic indexing now works in all workspaces; agents can run grep-style search across GitHub repos/orgs via the new githubTextSearch tool; experimental /chronicle queries chat history (github.copilot.chat.localIndex.enabled). foundry-server's SkillRegistry / foundry_search_skills is glob+regex based. Evaluate whether to (a) keep glob (simple, stdlib), (b) defer to Copilot's local index when present, (c) emit both. Decision constrained by R8 stdlib-only.
31 Optional OTel emission from foundry_* and kit_* tool handlers low / v2 open VS Code 1.121 ships prebuilt Azure Managed Grafana dashboard visualizing agent operations / token usage / chat sessions / tool calls / per-model latency. If foundry-server emitted OpenTelemetry spans per tool call, users' dashboards would show foundry tool usage natively. Tension with R8 stdlib-only — opentelemetry-api would be foundry's first dep. Possible compromise: env-gated optional dep (pip install vs-code-foundry[otel]); off by default; foundry-server runs without it if not installed. Defer until a user requests dashboard integration.
32 Review agents/bob.agent.md for stale text vs new in-chat diff visualization low open April-2026: code changes display as inline diffs directly in chat threads. @bob persona body may contain instructions like "your edits will be applied silently" or "the user can't see the diff" that contradict the new affordance. Grep agents/bob.agent.md for such language and update — the new affordance is better UX so any stale defensive text just reads weird now. ~10 LOC change at most.
33 Document terminal read/write security boundary in INSTALL.md "Security" section medium open April-2026: agents gained read/write capabilities to any open terminal in VS Code Copilot. This is orthogonal to foundry-server's R2 boundary (which governs what foundry-server itself reads), but it changes the user's threat model — any other agent in the same workspace can now read terminals foundry-server's subprocesses (foundry_codex etc.) are running in. INSTALL.md needs a new subsection alongside the Claude Code agent name collision (task #20) covering: (a) what VS Code's new terminal access means, (b) recommend isolating sensitive workspaces, (c) note that VSCODE_AGENT being set is the signal.
34 Update @forge persona body to emit mermaid for design exploration low open VS Code 1.121: built-in Mermaid Markdown Features extension renders mermaid code blocks in Markdown preview, notebooks, AND chats, with pan/zoom support. @forge does design exploration in chat — emitting mermaid sequence diagrams, component graphs, state machines would substantially improve the design conversation. Update agents/forge.agent.md body to encourage mermaid output for: component relationship graphs, sequence diagrams during cross-CLI deliberation, state machines for cascade transitions. No tools change needed.
35 Self-review V2 — decide the fate of the signed progress/contract-map.yaml (wire it or remove it) high open Architecture review 2026-05-23 (docs/reviews/2026-05-23-cascade-architecture-review.md). progress/contract-map.yaml + .sig exist (HMAC-signed, "rev 2 covers both servers", AGENTS.md 245-248) but a full grep of foundry-server/, agents/, and tests finds NO runtime read, NO signature verification, NO gate consuming them — vestigial copy of internal-rnd's pattern, never wired in. A signed artifact nothing verifies implies integrity the system doesn't enforce. Decision: (a) wire a real verify step bob/forge must pass before execution (gives the fork one genuine mechanical gate, partially closes V1), OR (b) delete the artifact + document in AGENTS.md/PROJECT.md that vs-code-foundry is prompt-discipline-only. Cheapest high-value item; do first. alf score 6.
36 Self-review V4 — build efficacy-telemetry rollup on the existing actions.jsonl substrate medium open Architecture review 2026-05-23. No metric exists for bob-PARTIAL rate, Codex/Gemini false-positive rate (forge/alf both cite "~60% FP" with zero measurement behind it), user-override rate, or test-failure-at-completion rate. BUT foundry_log_action + ~/.vs-code-foundry/actions.jsonl already log actions — substrate is there, only the rollup/metric layer is missing. Ship a foundry-cli metrics (or MCP foundry_metrics read-only tool) that aggregates actions.jsonl into the above rates. Makes the "triple-model coverage is the value-add" claim falsifiable. Highest ROI for making every other claim measurable. alf score 8 (MODERATE).
37 Self-review V7 — add a windows-latest CI job (riskiest platform, least automated coverage) medium open Architecture review 2026-05-23. CI (#11) runs ubuntu+macos × py3.10/3.11/3.12 but EXCLUDES Windows (tests use POSIX paths /bin/cat, /tmp). Yet Windows is where bugs keep surfacing: cp1252 console (#22), py3.14 (v0.3.2), WinError 1314 symlink (#18), datetime.utcnow (#23). The most-targeted enterprise platform is the only one without a CI job. Fix: add @unittest.skipIf(os.name=='nt') (or tempfile/shutil.which abstractions) to the POSIX-path tests, then add a windows-latest leg to .github/workflows/ci.yml. Distinct from #23 (one-off manual re-validation) — this AUTOMATES it. alf score 8 (MODERATE).
38 Self-review V3 — add an independent verification pass to the design→build arc medium open Architecture review 2026-05-23. Unlike internal-rnd's cold-context dual-verdict, here bob SELF-reports COMPLETE/PARTIAL/FAILED (bob.agent.md Step 5); the only objective check is "run the test suite" (Step 4); "report PARTIAL honestly" is prose (line 156). @alf can review bob's output but only runs on explicit user invocation — it is NOT in the build path. A padded COMPLETE is caught only if a human or separately-invoked alf looks. Candidate: make forge's bob-handoff include an auto-alf-review step on completion (button → alf), OR a lightweight foundry_verify MCP tool that re-runs tests + diffs against the WP plan independently of bob's self-report. alf score 8 (MODERATE).
39 Self-review V8 — no SAST/secrets gate on bob-generated code medium open Architecture review 2026-05-23. bob writes code with only an OPTIONAL foundry_codex review (bob line 143). No secrets scan, no SAST, no pre-commit security gate. internal-rnd has a pre-push secrets-scan + the S038 SAST batch in flight (#109/#112 there); none exists here. The Copilot fork ships code to user workspaces with no security floor beyond "ask Codex if you feel like it." Candidate: port internal-rnd's scripts/secrets-scan.{sh,py} as a foundry pre-commit hook the installer can wire, + optional bandit/semgrep invocation via a foundry_sast tool. Coordinate with cross-repo-review.md (internal-rnd S038 is the upstream source). alf score 7 (MINOR, just under threshold).
40 Self-review V1 — no mechanical enforcement floor; all persona hard rules are prose (keystone) high open Architecture review 2026-05-23. The defining structural gap. internal-rnd's thesis ("convert prose rules to subprocess gates because LLMs drift") is dropped entirely here — bob's "Hard rules" (lines 147-157) are unenforced prose with no backstop if the model ignores them. Intrinsic to the Copilot platform choice to a degree (no easy mid-agent subprocess gates), but it means cascade correctness == model instruction-following on a given turn, with no floor. Root cause beneath V2/V3/V5. Not a single fix — a direction: decide how much mechanical floor this fork wants. Minimum viable floor = wire the contract-map (V2/#35) + an independent verify (V3/#38) + a security gate (V8/#39). Revisit scope after #35 and #36 land. alf score 8 (CRITICAL-structural; additive formula compresses it).
41 Self-review V5 — autonomy fully delegated to IDE host; foundry has no backstop of its own low open Architecture review 2026-05-23. bob.agent.md 118-129 cedes the permission model to VS Code 1.121 claudeAgent.allowAutoPermissions ("execute without prompts but with background safety checks") — but those checks are COPILOT'S, not foundry's. If a user flips allowDangerouslySkipPermissions, nothing foundry-side catches a destructive action. internal-rnd had gates as a host-independent backstop. Low-urgency given the warning text already discourages the dangerous setting, but worth a documented "foundry provides no independent safety check; you are trusting the IDE host" note in INSTALL.md Security section (folds with #33). alf score 6.
42 Self-review V6 — fork drift from internal-rnd is structural and one-directional low open Architecture review 2026-05-23. AGENTS.md R2 accepts persona drift by design ("this is a fork"). Consequence: internal-rnd's rigor advances (gate system, dual-verdict, S038 security batch) and NONE flows here automatically; cross-repo-review.md is a manual queue. This fork falls progressively behind the original on rigor unless human-pollinated. Not necessarily wrong — but make it a tracked cadence: a periodic (monthly?) pass over internal-rnd's history.md + cross-repo-review.md Outbound entries to decide what to adopt. Otherwise drift compounds silently. alf score 6.
43 Self-review V10/V11/V13 — minor hygiene: read-only invariant, version drift, bob rollback low open Architecture review 2026-05-23. Three MINOR items bundled. V10: personal-os→foundry "read-only" (R13) is SQLite mode=ro convention, not enforced — add a test asserting no rw open path exists across personal-os reads (score 5). V11: VERSION constant must be hand-synced across foundry_server.py:34 + install-foundry.py ("pre-existing drift acknowledged" in v0.1.2 notes) — single-source it (e.g. read from one VERSION file) so a release can't ship mismatched self-reported versions (score 5). V13: bob runs sequentially in one chat with no .bob-checkpoint.md equivalent; an interrupted multi-WP run loses progress state (per-WP commits mitigate partially) — consider a lightweight resume protocol (score 5).
44 @testbed forge cycle continuation — finish design sections 3-9 + write canonical design doc + spawn bob high open / v0.4 Forge cycle paused 2026-05-23 at sections 1 (Goal) + 2 (Approach) user-approved. WIP captured at docs/plans/2026-05-23-testbed-design-WIP.md. Decisions frozen: B+C+D modalities (A visual deferred to v0.5 per R8 stdlib-only); terminal testbed (no auto-fix loop, max 1 user-approved recheck); separate testbed-server (3rd MCP server, mirrors personal-os-server per R13); HMAC freshness gate on progress/contract-map.yaml.sig; opt-in dev-server policy. Do NOT re-spawn the triple-model design team — full convergence captured in WIP doc (Claude challenger NEEDS-REWORK 4 CRITICAL, Codex challenger NEEDS-REWORK 3 CRITICAL, Gemini analyst 8 research areas). Continue with: section 3 (Components — files + line-count), 4 (Data flow — verdict pipeline), 5 (Error handling — INCONCLUSIVE / INFRA-FAILED vs DELIVERED-BROKEN), 6 (Testing — stdlib-unittest mocking strategy for browser tools), 7 (Performance — modality run-time budgets), 8 (Open questions), 9 (WP plan for bob). Per forge protocol pause after each section. Then write canonical design at docs/plans/2026-05-23-testbed-and-contract-loop-design.md, generate signed contract map via component-contract-mapping skill, run G1 verify, spawn bob.
45 Implement testbed-server (v0.4) — Phase 1 B+C+D modalities, terminal posture, dedicated 3rd MCP server high open / v0.4 Blocked on #44 (design doc + contract map). Phase 1 scope (per user decisions captured in #44 WIP): new testbed-server/ Python stdlib MCP server (~6 tools, mirrors foundry-server/ shape) + new persona agents/testbed.agent.md + new skill skills/vs-code-copilot-foundry/references/visual-testing-playbook.md. Cascade integration: @bob auto-invokes when HMAC-fresh ledger artifacts present (Step 4 verification gate); @kit routes via kit_queue_prompt to target-workspace @bob/@pa (no direct invocation); @alf delegates rendered-behavior URL targets; @forge gains optional "Preview with @testbed" handoff button. R-rule update: R4 combined ≤40 tool cap stays under (~15 foundry + 14 personal-os + ~6 testbed = ~35); R13 extends to three-server topology. New tests: ~25 in testbed-server/tests/test_testbed_server.py + ~5 cascade-edge tests in foundry-server/tests/test_foundry_server.py. Installer (install-foundry.py) gains --with-testbed flag (default ON), registers third server in .vscode/mcp.json. R11 cross-repo-review.md entry: informational only (internal-rnd has the analogous visual-arbiter / verification-arbiter but vs-code-foundry's testbed is side-loaded with different boundaries — not a port request).
46 First-run environment scan + capabilities manifest — foundry-native env-adoption for no-CLI prod environments high done (pending win11 M1-M3 validation) 2026-06-07 Shipped in v0.4.0 (docs/plans/2026-06-06-env-adaptive-cascade-design.md): foundry_env.py scanner + capabilities.json schema v1 + foundry_capabilities MCP tool + first-start daemon scan + installer manifest write + refresh_capabilities.py + verifier G13. M1-M3 manual smoke on win11-laptop still gates the v0.4.0 tag (see remote_claude.md). User requirement 2026-06-05. vs-code-foundry sits on top of agent-foundry-derived content that is CLI-flavored, but prod environments have NO CLIs (no claude/codex/agy, possibly no copilot binary) — VS Code + Copilot only. First run (install-time AND first foundry-server start — prod users may never re-run the installer) must scan the environment: copilot/codex/gemini/claude CLI presence+version, Python, git, network posture. Output: ~/.vs-code-foundry/capabilities.json (tier + per-tool availability, analogous to ~/.claude/state/inventory.json from the env-adoption skill — port the pattern, not the code). Consumers: foundry_health (extend existing probes), new read-only foundry_capabilities MCP tool (R4 budget: 16/25 after add), installer deploy decisions, #47 adaptation layer. Wiki analysis: .wiki/wiki/comparisons/foundry-multi-model-orchestration-options.md §Environment-tier availability.
47 Capability-adaptive cascade content — no-CLI degradation paths in personas/skills high done (pending win11 M1-M3 validation) 2026-06-07 Shipped in v0.4.0: cascade personas (foundry/forge/bob/alf) got 'agent' tool + ## Capability routing + baked ## Capability floor (Tier 0) + mechanism-conditional hard rules + tier banner; 2 Tier-0 worker personas (challenger/analyst, (copilot) model arrays, subagent-only); TestPersonaTripwires mechanical gates. Chose adaptation mode (c) hybrid → resolved to static-floor + manifest-hint + live-tool (no install-time templating). M1-M3 win11 smoke gates the tag. Original requirement: re-route by tier — Tier 2 (CLIs) = current design; Tier 1/0 = native runSubagent fan-out with per-worker model: pins from Copilot's three-vendor catalog (triple-model challenge SURVIVES with zero CLIs). HARD boundary held: adaptation rewrote foundry's OWN shipped copies only, NEVER ~/.claude/ (R2). User requirement 2026-06-05; depends on #46.
48 Re-point the Tier-2 analyst lane after the gemini CLI retires 2026-06-18 medium closed by design (v0.5.0) 2026-06-08 Resolved by the v0.5 dynamic perspective dispatch (docs/plans/2026-06-08-dynamic-perspective-dispatch-design.md). The analyst lane is no longer hard-pinned to a single CLI delegate: the analyst STANCE is dispatched dynamically from the perspective_policy plan (the roster), and on a host without a live external CLI it falls to the Tier-0 native floor runSubagent('analyst', { model: <plan model> }) — design-decision (b), now the primary path. foundry_capabilities._recommended_routing already treats avail("gemini") or avail("agy") as the Tier-2 analyst signal (unchanged in v0.5), and agy is probed by the manifest, so the 2026-06-18 gemini-CLI cutover degrades cleanly with zero code change (analyst stance routes to the floor or to agy when present). Subprocess delegate swap to agy -p (option (a)) remains an OPTIONAL future polish, not a blocker — file a fresh narrow task if/when desired.
49 Config-driven model selection (was tracked in docs/models.md) medium closed by design (v0.5.0) 2026-06-08 The old docs/models.md "make Layer 3 config-driven" item (hardcoded reviewer model names → read from config.json) is subsumed by the v0.5 model_roster. Model selection is now dispatch-time + roster-driven: roster.py's DEFAULT_ROSTER + read_roster(home) deep-merges a model_roster block from config.json, and resolve_perspectives projects the (stance, model, angle) plan — one config edit swaps models with no code change or test run, and a reinstall preserves the edit (deep-merge, not clobber). docs/models.md is now a thin pointer. R15 codifies the principle.
50 v0.6 — vs-code-foundry as the VS Code BRIDGE to agent-foundry (kill the "2 sets" maintenance; thin bridge + generated subagent-team floor) high design brief ready 2026-06-08 DESIGN BRIEF → docs/plans/2026-06-08-v0.6-agent-foundry-bridge-design-BRIEF.md; start a fresh forge COMPLEX cycle on it. User direction: agent-foundry = single source of truth (176 skills + flows + canonical agents, "where main dev lives"); vs-code-foundry = thin Copilot BRIDGE (personas bridge the claude/codex/agy CLI use case). ADAPTIVE (standalone floor preserved). GENERATE the Tier-0 floor personas from agent-foundry (build-time adapter) to drive native VS Code 1.123 subagent TEAMS; DELEGATE to agent-foundry's CLI flows when present; ROUTE skills to the 176. Measured this session: the "2 sets" pain is the PERSONAS (bob 624L↔226L, 5 shared lines, drifting); skills barely duplicated; the 1338L server is legitimate bridge. Open sub-Qs (for the design team): generator transform; the enforcement-engine re-import decision (entangled with #35/#40 — its own phase); the agent collision (#20); the R2 amendment. Likely phased: P1 generator+floor / P2 delegation+skills / P3 enforcement. Revises the 2026-05-11 "deliberately separate" directive (governance — reflect in AGENTS.md R2 + cross-repo-review). 1.123 currency verified live 2026-06-08.
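
Task #35 option (a) asks for a real verify step over progress/contract-map.yaml and its signature. The following is only a minimal sketch of what such a gate could look like, not the project's implementation. It assumes the `.sig` file holds a hex HMAC-SHA256 digest of the raw YAML bytes and that the caller supplies the key. The backlog states neither the signature format nor where the key lives, and the names `sign_bytes` and `verify_contract_map` are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sign_bytes(payload: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 digest (ASSUMED signature format)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_contract_map(map_path: Path, sig_path: Path, key: bytes) -> bool:
    """Hypothetical gate: True only if the recorded signature matches the map.

    Any read failure counts as a failed verification, so a missing
    artifact cannot pass silently.
    """
    try:
        payload = map_path.read_bytes()
        recorded = sig_path.read_bytes().strip()
    except OSError:
        return False
    expected = sign_bytes(payload, key).encode("ascii")
    # compare_digest runs in constant time, avoiding a timing side channel.
    return hmac.compare_digest(expected, recorded)
```

A caller such as a pre-execution check could refuse to proceed when this returns False. That would give the fork the single mechanical gate the task describes, while option (b), deleting the artifact, stays equally valid.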

Done

# Task Done date Notes
0 Initial repo creation + v0.1.0 ship 2026-05-11 5 personas + foundry-server + tests + installer + templates + skill family — see history.md 2026-05-11
0a v0.1.1: Windows installer + cross-platform Python fix + R&D docs moved in 2026-05-11 install-foundry.cmd / .ps1 enterprise-hardened, install-foundry.py uses sys.executable for mcp.json, design.md + spec-review.md + 5 research briefs added
0b Belt-and-braces guards in ~/.claude/publish-config.json to prevent accidental publication via agent-foundry pipeline 2026-05-11 Added _vs_code_foundry_separation_doc + 2 new exclusions
0c Initial release tags pushed 2026-05-11 v0.1.0 + v0.1.1
F1 Fix installer hardcoding python3 instead of sys.executable (Codex F1, critical) 2026-05-12 install-foundry.py 3 sites + test_foundry_server.py 2 sites — see history.md 2026-05-12. Side-fixed test suite portability to Windows.
F2 Fix JobManager.spawn closing stdin before communicate() (Codex F2, critical — foundry_codex fully broken) 2026-05-12 Stored stdin on Job.stdin_data; watcher passes it via communicate(input=...). Tightened no-stdin branch from inherited to DEVNULL. New regression test TestJobManagerStdin.
F3 Fix foundry:// path traversal in resources/read (Codex F3, moderate — local info-disclosure) 2026-05-12 New helper _safe_resolve_under with separator-rejection + Path.relative_to containment check. Both agent + design handlers route through it. 3 new tests in TestResourceTraversal.
v0.1.2 Release of codex-verified bugfix bundle 2026-05-12 4 commits on feature/v0.1.2-codex-bundle (WP1 F2, WP2 F3, WP3 F1, WP4 version+docs); PR opened; user merges manually. VERSION constants bumped to "0.1.2" in BOTH foundry_server.py:34 AND install-foundry.py:144 (pre-existing drift acknowledged).
F4 Fix foundry_health bypassing Claude-skills dual-opt-in (Codex F4 — correctly identified) 2026-05-13 The original tasks.md F4 description was a hallucination (talked about _check_cli("claude") over-probing — that's not what the code does). The REAL Codex F4 was about SkillRegistry reading ~/.claude/skills/ whenever read_claude_skills: true in config, bypassing the per-call dual-opt-in documented in R2. Fix: drop the constructor flag, make include_claude per-call, gate at the tool layer with caller_param AND config_flag. Regression: new TestClaudeSkillsDualOptIn class with 4 tests covering the full matrix.
F5 Relax R1 PowerShell line cap from 200 to 250 with rationale 2026-05-13 AGENTS.md R1 row updated. Two-tier rule: 250 for read-only verifiers / installers with no complex flow; 200 if the script has conditional logic / functions / non-trivial loops. verify-foundry-setup.ps1 (227 lines, Write-Host formatting only) is now under cap; no code change required there.
F6 Fix foundry_design coordinator hardcoding python3 (same bug as F1, dead code in v0.1.x) 2026-05-13 foundry_server.py:758: cmd[0] changed from "python3" to sys.executable. The surrounding if coordinator.exists(): block is unreachable in v0.1.x (the script doesn't ship; tasks.md #7), but it was fixed proactively so F1's bug doesn't resurface when the script lands. Post-fix grep confirms no live-code "python3" strings remain anywhere.
v0.1.3 Release of cleanup bundle 2026-05-13 4 commits on feature/v0.1.3-cleanup-bundle (WP1 F4, WP2 F5, WP3 F6, WP4 release prep). VERSION bumped to "0.1.3" in both foundry_server.py:34 and install-foundry.py:154. AGENTS.md R7 test count refreshed from "19 unit tests" to "27 unit tests (as of v0.1.3)".
11 GitHub Actions CI: run the test suite on push + PR 2026-05-13 .github/workflows/ci.yml. Matrix: ubuntu-latest + macos-latest × Python 3.10 / 3.11 / 3.12 (6 jobs). Steps: syntax validation, JSON template validation, run python -m unittest tests.test_foundry_server -v from foundry-server/, installer smoke with --skip-smoke. Windows is excluded from the matrix because existing tests use POSIX-only paths (/bin/cat, /tmp); Windows-side validation is intentionally manual via installer/install-foundry.cmd. When Windows tests are added later (e.g., platform-skip decorators), extend the matrix.
12 Add CHANGELOG.md to repo root 2026-05-13 Backfilled v0.1.0 through v0.1.3 from history.md + git tag dates. Follows Keep a Changelog (https://keepachangelog.com/en/1.1.0/). Going forward, every release tag adds a section. Tag references at the bottom link each version to its GitHub release.
MS-align Microsoft Agent Skills standardization — align foundry's skill install path with the MS-standardized ~/.copilot/skills/, add workspace .github/skills/ scaffold, refresh docs 2026-05-15 Triggered by Microsoft's 2026-05-13 announcement of Agent Skills in Visual Studio (Insiders) standardizing the agentskills.io spec on four canonical paths. v0.1.x global path ~/.vs-code-foundry/skills/ was off-standard; this work moves the canonical location to ~/.copilot/skills/ so VS Copilot / VS Code Copilot auto-discover the skill family natively. Backward-compat symlink at the v0.1.x path created automatically by the installer; removable in v0.3.x (tasks #16). F4 dual-opt-in for ~/.claude/skills/ reads is preserved and re-tested across the path move.
v0.2.0 Release of MS Agent Skills alignment bundle 2026-05-15 6 commits on feature/v0.2.0-ms-agent-skills-alignment (WP1 server relocate, WP2 tests, WP3 installer, WP4 templates, WP5 docs, WP6 version bumps). VERSION bumped to "0.2.0" in both foundry_server.py:34 and install-foundry.py:154. AGENTS.md R7 test count refreshed from "27 unit tests" to "30 unit tests (as of v0.2.0)". skill["source"] field flips from "foundry" to "copilot" (cosmetic breaking-change-class, no current consumer). Tracking follow-ups: tasks #16 (drop symlink in v0.3.x), #17 (schema-validate against agentskills.io), #18 (real Windows validation of symlink path).
v0.3.0 Personal-os merge (subtree from joogy06/vs-code-personal-os) 2026-05-18 16+ WPs on feature/v0.3.0-personal-os-merge. Subtree-merged 6 commits from vs-code-personal-os under personal-os/. Hoisted kit.agent.md to root agents/ (6 personas total). AGENTS.md reconciled to R1-R13 (foundry R1-R11 + personal-os R1-R12 + new R13 two-server topology). Two MCP servers ship by default — foundry-server (15 tools, 33 tests) and personal-os-server (14 tools, 43 tests) — combined 76 tests. Installer gains --with-personal-os (default ON), --migrate-legacy, fail-fast on legacy ~/.vs-code-personal-os/. Verifier extends to 12 G-checks. Source repo joogy06/vs-code-personal-os archived 2026-05-18 (tombstone period: README banner + pinned issue, then archive).
v0.3.4 v0.3.4-followups merged (PR #8) 2026-06-08 PR #8 (feature/v0.3.4-copilot-1.121-followups) merged to main as 3919959 at the start of the v0.4 branch logistics. Carried the VS Code 1.121 doc edits (#20/#21/#25/#34), the 1.121/April-2026 backlog (#24-#34), the cascade self-review tasks (#35-#43), the @testbed WIP doc, and AGENTS.md Project HARD-RULEs. Merged but NOT tagged — v0.3.4 release ceremony still pending.
wiki .wiki/ project knowledge base created 2026-06-05 project-v1 embedded wiki (own git repo, gitignored here per process-assistant-ai convention). wiki-v00 bootstrap + wiki-v01 ingest: 6 pages, 12 raw sources, lint 10.0/10. Documents VS Code 1.123 (subagents, /fleet, session sync, agent taxonomy) + the five-mechanism multi-model-orchestration analysis for the foundry cascade. Registered in ~/.wiki-registry.yaml; bound via root .wiki-link (role: specific, auto_consult on).
v0.4.0 env-adaptive cascade #46/#47 merged (PR #9) 2026-06-08 Full forge cycle (triple-model design) → 3 serial bob spawns (DAG-flattened after bob's first spawn halted with no Agent tool) → 12 WP commits → PR #9 merged as b266b9f. Shipped foundry_env.py scanner, capabilities.json schema v1, foundry_capabilities read-only MCP tool (16 visible/17 schemas) + first-start daemon scan, refresh_capabilities.py, Tier-0 capability-routing + baked floor in 4 cascade personas, 2 worker personas (challenger/analyst), TestPersonaTripwires, verifier G13 + G3 6-coordinator fix, AGENTS.md R14, persona model bumps to Opus 4.8/GPT-5.5/Gemini 3.5 Flash. 136 tests green. Design docs/plans/2026-06-06-env-adaptive-cascade-design.md; contract map rev 3 (signed). Merged but tag GATED on win11 M1-M3 smoke (see remote_claude.md).
v0.5.0 dynamic perspective dispatch — MERGED to main (PR #10) 2026-06-08 Full triple-model forge cycle (4 Claude approaches + Claude challenger + agy SERVED_BY=gemini-3.5-flash + Codex adversarial) → signed contract map rev 4 (G1 PASS) → bob direct-serial-executed 7 WPs (Agent tool unavailable to spawned bob, same as v0.4.0). Shipped roster.py ((stance×model×angle) resolver), perspective_policy SIBLING key on foundry_capabilities (no new tool; recommended_routing frozen byte-identical), proposer/steelman NEW stance workers + challenger/analyst cross-vendor model-agnostic recast, ## Perspective dispatch on the 4 coordinators (3-fact diversity honesty), config deep-merge + installer config-preserve, R15. 179 tests green (136+43; forge re-verified independently). #48 + #49 closed by design. PR #10 MERGED to main 2dc8d5b; feature branches tidied (only main remains); NOT tagged — v0.5.0 tag gated on win11 MS1-MS5 smoke (remote_claude.md). Design docs/plans/2026-06-08-dynamic-perspective-dispatch-design.md.
v0.5.1 installer agent reconcile — safe persona dedup/cleanup-on-run (PR #11) 2026-06-08 Fixes the 2× bob dup (installer deployed personas to ~/.copilot/agents/ AND <ws>/.github/agents/; Copilot scans both). Streamlined forge cycle + Codex/agy safety review (caught data-loss holes → hardened). Shipped the agent_reconcile engine in install-foundry.py: hash-verified ownership (delete only if sha256 matches the bytes last written), deploy-before-delete, .foundry-backup/, fail-closed installed-agents.json manifest, KNOWN_PERSONA_HASHES first-run migration, --agent-scope {global,workspace} default global, FOUNDRY_SENTINEL in all 10 personas, metadata-only ~/.claude/ collision warning (closes #20), .ps1/.cmd flag forwarding (R1), verifier G14. 198 tests green (155+43; 19 destructive-path TestAgentReconcile); forge independently verified the safety end-to-end on the real CLI (dupe cleaned+backed-up; edited/user/Claude files preserved). Contract map N/A (existing_component_extension). PR #11 MERGED to main be98007; NOT tagged — win11 smoke (run installer → one @bob) gates it. Design docs/plans/2026-06-08-installer-agent-reconcile-design.md.
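
The hash-verified ownership rule from v0.5.1 (delete a persona only if its sha256 still matches the bytes the installer last wrote, and back it up first) can be sketched as follows. This is an illustrative reduction under stated assumptions: `sha256_of` and `remove_if_owned` are hypothetical names, and the real agent_reconcile engine also handles the manifest, deploy-before-delete ordering, and first-run migration, none of which appear here.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex digest of the file's current bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def remove_if_owned(path: Path, recorded_hash: str, backup_dir: Path) -> bool:
    """Delete path only when its bytes match the hash recorded at install time.

    Returns True when the file was backed up and removed, False when it was
    left alone (missing, user-edited, or never written by the installer).
    """
    if not path.is_file():
        return False
    if sha256_of(path) != recorded_hash:
        # Content changed since the installer wrote it: treat as user-owned.
        return False
    backup_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, backup_dir / path.name)  # back up before deleting
    path.unlink()
    return True
```

The point of comparing hashes rather than filenames is that an edited bob.agent.md no longer matches the recorded digest, so the cleanup pass skips it instead of destroying user changes.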

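The F3 fix pairs separator rejection with a Path.relative_to containment check. A minimal sketch of that combination is shown below; the function name mirrors the `_safe_resolve_under` helper mentioned in the row, but the signature, the None-on-reject convention, and the exact rejection list are assumptions for illustration rather than the code in foundry_server.py.

```python
from pathlib import Path
from typing import Optional


def safe_resolve_under(root: Path, name: str) -> Optional[Path]:
    """Resolve name beneath root, or return None if it could escape root."""
    # Step 1: reject anything that is not a plain single path component.
    if not name or name in (".", "..") or any(c in name for c in ("/", "\\", "\x00")):
        return None
    root_resolved = root.resolve()
    candidate = (root_resolved / name).resolve()
    # Step 2: containment check; relative_to raises ValueError when the
    # resolved candidate (e.g. via a symlink) lands outside root.
    try:
        candidate.relative_to(root_resolved)
    except ValueError:
        return None
    return candidate
```

Running both checks is deliberate: separator rejection stops obvious `../` inputs early, while the containment check catches cases the string test cannot see, such as a symlink inside the root that points elsewhere.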
Personal-os roadmap (post-merge)

Items inherited from the pre-merge personal-os/tasks.md Phase 2-5. Now tracked under foundry's release cycle. See personal-os/tasks.md for the full pre-merge phase plan with verification commands.

# Task Priority Status Notes
[personal-os] P2-1 Windows hardened installer for personal-os subsystem high open installer/install-personal-os.cmd + .ps1 hardened wrappers (R1) on top of the v0.3.0 shim. Must reuse foundry's post-fix cmd parens-in-echo + $PSScriptRoot patterns (cross-repo R11 reference). Smoke on a clean Win 11 Pro VM.
[personal-os] P2-2 Startup-folder shortcut creator medium open PowerShell COM WScript.Shell.CreateShortcut writing %APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\personal-os.lnk pointing at code.exe %USERPROFILE%\personal-os-workspace. Optional Start Menu Programs shortcut for manual launch.
[personal-os] P2-3 Zone-Identifier ADS detection + remediation in personal-os installer medium open Mirror foundry's pattern from installer/install-foundry.cmd.
[personal-os] P3-1 Foundry-side @pa gains pa_inbox_list reading personal-os outbox envelopes high open Cross-workspace surfacing — without this, KIT's outbox writes are inert until the user runs kit_outbox_show from personal-os. Foundry-side: each foundry workspace writes ~/.vs-code-foundry/status/<workspace-id>.json on agent transitions (~50 LOC). @pa surfaces queued KIT prompts as the first chat response on workspace open and writes escalations to ~/.vs-code-foundry/personal-os/inbox/<this-workspace>/*.json.
[personal-os] P3-2 KIT-side: handle resolved prompts moved to outbox/<ws>/consumed/ medium open After foundry-side P3-1 lands.
[personal-os] P4-1 Confluence bridge (bridges/confluence/) medium open Stdlib urllib REST client. Bidirectional sync; API tokens from ~/.vs-code-foundry/personal-os/secrets.json (chmod 600). Port shape from Claude pa-server pa_sync_confluence.
[personal-os] P4-2 Jira bridge (bridges/jira/) medium open Pull assigned issues into kit.db; push status on transition. Port pa_resolve_conflict pattern (prefer remote on conflict, log to events.jsonl).
[personal-os] P4-3 kit_confluence_* / kit_jira_* MCP tools (or fold into existing 14 — re-eval R4 tool budget) medium open Combined two-server tool count must stay ≤ 40 (R4 says ≤ 25 per server-equivalent; foundry sits at 15, personal-os at 14, both well-budgeted for bridges).
[personal-os] P5 Optional VSIX plugin (status bar, auto-inject queued prompts) low / v2 open Trigger: ONLY if Phase 2-4 prove KIT is too passive. Cross-CLI deliberation 2026-05-12 chose to skip VSIX for v0.1; revisit only with evidence.
[personal-os] P6 Coach layer — Pillar D Phase-1.5 expansion (passive pattern detection over events.jsonl tool-arg histograms, ≥ 3 obs across ≥ 2 sessions, surface as "KIT noticed" with confirm-before-save) low open Per the pre-merge personal-os Phase 6 plan.
[personal-os] HK Wire wiki binding when .wiki/ gets its first real page low deferred Two-file fix: .wiki-link + ~/.wiki-registry.yaml entry mirroring the process-assistant-ai shape. Triggered when first ADR / playbook lands.
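
The P6 threshold (≥ 3 observations across ≥ 2 sessions over events.jsonl tool-arg histograms) can be sketched as a simple counting pass. The event field names `tool`, `args`, and `session` are assumptions; the actual events.jsonl schema is not described in this backlog, so treat this as a shape for the rule rather than a drop-in detector.

```python
import json
from collections import defaultdict
from typing import Iterable, List, Tuple


def detect_patterns(
    lines: Iterable[str], min_obs: int = 3, min_sessions: int = 2
) -> List[Tuple[str, str]]:
    """Return (tool, canonical-args) pairs seen often enough to surface."""
    counts = defaultdict(int)
    sessions = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate partially written or corrupt lines
        if not isinstance(event, dict):
            continue
        tool, session = event.get("tool"), event.get("session")
        if tool is None or session is None:
            continue
        # Canonicalise args so key order does not split one pattern into two.
        key = (str(tool), json.dumps(event.get("args", {}), sort_keys=True))
        counts[key] += 1
        sessions[key].add(str(session))
    return sorted(
        key
        for key in counts
        if counts[key] >= min_obs and len(sessions[key]) >= min_sessions
    )
```

Anything returned here would only be a candidate for a "KIT noticed" prompt; the confirm-before-save step named in the task stays outside this function.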

Notes on priority levels

  • high: blocks v1.0 production-readiness; address before next release
  • medium: completes the v1.x feature set; nice-to-have but not blocking
  • low: polish / quality-of-life; no urgency
  • v2 / future: requires significant new scope; not in v1 trajectory