
Implement: Backfill: per-route store swapping for the experiment switcher (#145) #265

Merged
ealt merged 15 commits into main from impl/issue-145-per-route-store-swap
Jun 2, 2026

ealt commented Jun 2, 2026

Closes #145. Implements docs/plans/issue-145-per-route-store-swap.md.

Summary

  • Makes the Phase 12c experiment switcher load-bearing: every per-experiment web-ui route now resolves the active experiment per-request (resolve_active_context) and operates against its store / config / integrator-repo, instead of the startup-bound app.state.store / experiment_id / experiment_config. "Select experiment Y" now actually changes the data on ideator / executor / evaluator / all /admin/* per-experiment pages.
  • Reference-impl web-ui only — no spec / wire / JSON-schema / Pydantic / conformance change (Decision 10/11). Single-experiment / no-control-plane deployments are observably unchanged (the resolve fast path does zero validation work).
  • New StoreFactory (per-(experiment_id, role) StoreClient views over one shared httpx.Client, JIT worker-credential bootstrap reusing eden_service_common.auth.bootstrap_worker_credential), a top-nav switcher dropdown, a form_experiment_id switch-mid-form guard, per-experiment config-dir + integrator-repo, and the four credential-bootstrap postures (§3.2).
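A minimal sketch of the per-request resolution described above, not the actual `resolve_active_context` implementation. `ActiveContext`, `list_registered`, and `build_context` are hypothetical names introduced for illustration; the sketch also assumes that both the no-control-plane case and the default-experiment case take the fast path with no validation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ActiveContext:
    """Hypothetical bundle of what a route needs for one experiment."""
    experiment_id: str
    store: Any
    config: Any


class StaleSelection(Exception):
    """The session's selected experiment is no longer registered."""


def resolve_active_context_sketch(
    *,
    default: ActiveContext,
    selected_id: Optional[str],
    list_registered: Optional[Callable[[], set]],
    build_context: Callable[[str], ActiveContext],
) -> ActiveContext:
    # Fast path 1: no control plane configured (single-experiment posture).
    if list_registered is None:
        return default
    # Fast path 2 (assumption): nothing selected, or the default is selected.
    if selected_id is None or selected_id == default.experiment_id:
        return default
    # Non-default selection: validate it against the registered experiments.
    if selected_id not in list_registered():
        raise StaleSelection(selected_id)
    return build_context(selected_id)
```

In this shape a route handler never reads startup-bound state directly; it always asks for the context and gets either the default or the selected experiment's view.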

Landed as waves W1–W6 (see commit history + CHANGELOG [Unreleased]).

What this does NOT cover

Fresh-operator walkthrough

  • Single-experiment surface (the unchanged-behavior guarantee) verified via the full web-ui suite including the real-subprocess e2e tests (test_e2e_real_subprocess, test_admin_e2e, test_executor_e2e, test_evaluator_e2e, test_admin_workers_e2e, test_admin_groups_e2e) — these spawn the actual python -m eden_web_ui against a real task-store-server and drive claim→draft→submit / admin-reclaim through real HTTP.
  • Live multi-experiment switcher walkthrough NOT performed — this environment has no running control-plane + multi-experiment Compose stack. The switcher / resolution / posture-C-D credential paths are covered by test_admin_experiments_routes.py + test_resolve_active.py + test_store_factory.py (fakes + the real control-plane server over httpx.MockTransport), but a live operator click-through of "register 2 experiments → switch → observe data follows" is deferred to the multi-experiment Compose smoke (Backfill: compose-smoke-multi-experiment CI job (Phase 12c deferral) #147). Notes: single-experiment behavior passed cleanly; multi-experiment is unit/integration-covered but not live-walked.

Test plan

  • uv run ruff check . — clean
  • uv run pyright reference/services/web-ui/ — 0 errors
  • python3 scripts/check-complexity.py — clean (0 blocking)
  • python3 scripts/check-rename-discipline.py — clean
  • python3 scripts/spec-xref-check.py — clean (no spec edits)
  • markdownlint (CI command) — 0 errors
  • uv run pytest -q reference/services/web-ui/tests/ — green (667 → 669 with new tests); the one full-suite test_e2e_real_subprocess flake and 2 eden-checkpoint failures both pass in isolation (e2e-under-load / test-ordering artifacts, not in this diff's packages)
  • uv run pytest -q (full reference suite): 1990 passed / 221 skipped / 2 failed (eden-checkpoint; both pass in isolation)
  • codex-review (implementation profile) — converged at round 2 (3 Bugs + 2 Risks round 0, all addressed; round 2 = no remaining Bug/Risk); record under docs/plans/review/issue-145/impl/
  • bash reference/compose/healthcheck/smoke.sh / smoke-subprocess.sh: NOT run (no docker in this environment); the compose changes (--experiment-config-dir / --credentials-dir / --repo-root flags + bind-mounts + setup-experiment dirs) are single-experiment-additive and should be exercised by CI's compose-smoke jobs before merge.

Note: the wave + codex-review commits on this branch are unsigned — the 1Password SSH-signing agent was locked during most of the implementation. They can be re-signed (git rebase --exec 'git commit --amend --no-edit -S' main) before merge if signature verification is required.

Related issues

@ealt ealt enabled auto-merge (squash) June 2, 2026 03:05

ealt commented Jun 2, 2026

codex-review: converged (implementation profile, 3 rounds; record under docs/plans/review/issue-145/impl/). Round 0 raised 3 Bugs + 2 Risks (config-fallback contract, Posture C/D 401 classification, per-experiment repo durability, credential-staleness eviction, Posture-D switcher visibility) — all addressed; round 1 confirmed 4/5 + 1 new startup-robustness Risk (control-plane bootstrap now degrades instead of crashing); round 2 found no remaining Bug/Risk. New follow-up filed during review: #270 (split executor.py, which crossed 800 SLOC once #137 + #145 both landed — carries a reviewed # slop-allow-file).

Merged origin/main (the branch was behind by #110/#137/#157/#168/#140-plan). Conflicts resolved in CHANGELOG.md, docs/roadmap.md, routes/_helpers.py (unified on main's StorageNotFound alias + urlencode), and routes/evaluator.py (kept both my config=active.config threading and #137's view/resolver/variant_id redesign). Re-validated: full web-ui suite 676 passed; ruff / pyright / complexity-gate / rename-discipline / markdownlint clean. PR is now MERGEABLE.

Note: commits are unsigned (the 1Password SSH-signing agent was locked during the work; pushes went through during unlock windows). Re-sign before merge if signature verification is required.

@ealt ealt force-pushed the impl/issue-145-per-route-store-swap branch from 59498e9 to 7b52430 Compare June 2, 2026 20:03
ealt and others added 11 commits June 2, 2026 20:03
…ial plumbing

Introduces the per-experiment store-vending substrate that lets every
web-ui route operate against the operator's selected experiment:

- store_factory.py: live StoreFactory (per-(experiment_id, role)
  StoreClient views over one shared httpx.Client; JIT worker-credential
  bootstrap via BearerCache) + StaticStoreFactory (single pre-built
  store for the single-experiment / test path).
- credentials.py: deployment-scoped control-plane credential bootstrap
  (Posture C) + credential-dir resolution.
- routes/_helpers.py: resolve_active_experiment / active_config /
  resolve_active_context with the StaleSelection / ControlPlaneUnreachable
  / MissingAdminToken / config-missing exception taxonomy and the
  unseeded (registered-but-not-seeded) classification.
- app.py: build/accept a store_factory, move experiment_id to a
  per-request template context processor, add experiment_config_dir +
  config cache; lifespan closes the factory. Legacy store/admin_store
  kwargs retained for the W3 fixture sweep.
- cli.py: build the live StoreFactory; new --credential-dir /
  --experiment-config-dir / --control-plane-worker-id flags;
  control-plane client uses the bootstrapped deployment-scoped credential.

All 629 existing web-ui tests pass unchanged via the StaticStoreFactory
compat path; adds test_store_factory.py + test_resolve_active.py.

Refs #145 (plan W1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
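
For readers skimming the commit above, the per-(experiment_id, role) vending can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in store_factory.py: `StoreFactorySketch`, `store_for`, and `build_view` are hypothetical names, and the real factory additionally shares one httpx.Client and bootstraps worker credentials just in time.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, Tuple


class StoreFactorySketch:
    """Caches one store view per (experiment_id, role) pair."""

    def __init__(self, build_view: Callable[[str, str], Any]) -> None:
        # build_view is a hypothetical hook that would wrap the shared
        # HTTP client with the experiment's base path and credential.
        self._build_view = build_view
        self._views: Dict[Tuple[str, str], Any] = {}

    def store_for(self, experiment_id: str, role: str) -> Any:
        key = (experiment_id, role)
        if key not in self._views:
            # Built lazily on first use, then reused for later requests.
            self._views[key] = self._build_view(experiment_id, role)
        return self._views[key]
```

The role labels used in the test below ("worker", "admin") are only examples of how views might be keyed.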
…modules

Every per-experiment route handler now resolves the active experiment
per-request via resolve_active_context(request) instead of reading the
startup-bound request.app.state.store / experiment_config / admin_store /
experiment_id. With control_plane=None (single-experiment / test posture)
the helper returns the deployment default, so behavior is identical and
the existing suite validates the refactor unchanged.

- ideator / executor / evaluator: resolve + thread experiment_id (executor
  starting-variant) and config (evaluator render/parse helpers) through.
- admin observability / actions / work_refs / index / workers / groups:
  resolve store + admin_store; actor attribution stays app.state.worker_id
  (until #140). Narrowed payload casts where tightening store: Any → Store
  surfaced pre-existing union-access types.
- index.py: resolve store.

Credential-dir fix: the web-ui is a worker host, so resolve_credential_dir
now honors the common --credentials-dir / $EDEN_WORKER_CREDENTIALS_DIR
(with --credential-dir / $EDEN_CREDENTIAL_DIR as override, XDG as final
fallback). Without this the per-experiment BearerCache wrote to a shared
XDG path and leaked stale credentials across ephemeral task-stores —
breaking the admin-workers / admin-groups real-subprocess e2e tests
(stale token → reissue → 404 → startup crash).

Full web-ui suite green (651 incl. e2e); ruff + pyright clean.

Refs #145 (plan W2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
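
The credential-dir precedence described in the commit above could look roughly like this. It is a sketch only: the ordering of flag before environment variable within each pair is an assumption, and the XDG fallback location (`$XDG_DATA_HOME/eden/credentials`) is a placeholder rather than the path the real resolver uses.

```python
from __future__ import annotations

from pathlib import Path
from typing import Mapping, Optional


def resolve_credential_dir_sketch(
    credential_dir_flag: Optional[str],
    credentials_dir_flag: Optional[str],
    env: Mapping[str, str],
) -> Path:
    # Override pair first, then the common worker-host pair.
    candidates = [
        credential_dir_flag,                      # --credential-dir
        env.get("EDEN_CREDENTIAL_DIR"),
        credentials_dir_flag,                     # --credentials-dir
        env.get("EDEN_WORKER_CREDENTIALS_DIR"),
    ]
    for candidate in candidates:
        if candidate:
            return Path(candidate)
    # Final fallback: an XDG-style per-user directory (placeholder layout).
    xdg_base = env.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(xdg_base) / "eden" / "credentials"
```

Passing the environment in as a mapping keeps the function easy to test; a caller would normally pass `os.environ`.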
…ke_app

make_app now takes store_factory as its sole store dependency (legacy
store= / admin_store= kwargs and app.state.store / admin_store are gone;
experiment_config / experiment_id stay as the single-experiment config
source + resolve fast-path default). The CLI passes its live
StoreFactory; tests build a StaticStoreFactory via the new
conftest._one_experiment_factory helper. All direct make_app /
make_web_ui_app call sites across the test suite converted.

Also fixes AdminGateMiddleware (missed in W2): it read app.state.store
for the admins-group check. It now resolves the active experiment's
worker store via resolve_active_context, so the admin gate follows the
operator's experiment selection. Deployment-scoped admin pages
(/admin/experiments, /admin/control) gate against the default experiment
and are exempt from per-experiment resolution (they are the redirect
target for resolution failures, so resolving them per-experiment loops).

Full web-ui suite green (651); ruff + pyright clean.

Refs #145 (plan W3).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- base.html gains a top-nav switcher dropdown: a no-JS <details> listing
  every registered experiment as a CSRF-protected POST to
  /admin/experiments/{E}/select, keyed off the session selection (shows
  "Active: <id>" / "Default: <id>", highlights the active row). Hidden in
  no-control-plane deployments.
- switcher_context template context processor + a 5s in-process TTL cache
  on list_experiments (§3.7) so the per-render dropdown doesn't hammer the
  control plane.
- form_experiment_guard (§3.6): every worker submit form carries a hidden
  form_experiment_id; the ideator/executor/evaluator submit handlers
  discard a submission whose form was rendered against a different
  experiment than the now-active one and redirect with a clear banner,
  rather than silently writing to the wrong experiment.
- The dashboard renders the active-experiment resolution-failure banners
  (stale-selection / control-plane-unreachable / cannot-bootstrap-credential
  / task-store-unreachable / config-missing / config-invalid /
  switched-mid-form) that the per-route redirects target.
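
The 5s in-process TTL cache mentioned above can be illustrated with a small sketch. `TTLCache` and its `fetch` / `clock` parameters are hypothetical names; the real cache around list_experiments may be structured differently.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Caches a single fetched value for ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[], T],
        ttl_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[T] = None
        self._expires_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cold or expired: hit the control plane once and restart the TTL.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value  # type: ignore[return-value]
```

With this shape, every page render inside the TTL window reuses the cached list, so the dropdown costs at most one control-plane read per window.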

Full web-ui suite green (660); ruff + pyright clean.

Refs #145 (plan W4).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
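
The switch-mid-form guard from the commit above reduces to one comparison. The sketch below is illustrative: `check_form_experiment` is a hypothetical name, and treating a missing hidden field as a mismatch is an assumption, not confirmed behavior.

```python
from typing import Mapping, Optional


def check_form_experiment(
    form: Mapping[str, str], active_experiment_id: str
) -> Optional[str]:
    """Return a banner key if the submission must be discarded, else None."""
    if form.get("form_experiment_id") == active_experiment_id:
        return None
    # The form was rendered against a different (or unknown) experiment.
    return "switched-mid-form"
```

A submit handler would call this before writing anything and redirect with the returned banner instead of committing the submission.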
The executor module's local bare clone is now per-experiment. A new
RepoMaterializer vends per-experiment clones under
<repo-path-parent>/<experiment_id>.git (cloned from the Forgejo remote
with the per-experiment URL substituted from the --forgejo-url org base,
fetched on each access); a repo_for(request, experiment_id) helper
returns the startup-materialized app.state.repo for the deployment
default (single-experiment deployments unchanged) and the materialized
clone for non-default experiments. The executor submit + draft-render,
the admin work-refs list/delete, and the admin dashboard now resolve the
active experiment's repo.

Full web-ui suite green (667); ruff + pyright clean. Adds
test_per_experiment_repo.py.

Refs #145 (plan W5).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
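
The path selection behind repo_for can be sketched as below. This covers only where a clone lives, not the cloning or fetching; `repo_path_for` is a hypothetical helper, and the optional `repo_root` parameter reflects the --repo-root flag that a later commit on this branch introduces.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def repo_path_for(
    experiment_id: str,
    *,
    default_id: str,
    default_repo_path: Path,
    repo_root: Optional[Path] = None,
) -> Path:
    # The deployment default keeps using the startup-materialized repo.
    if experiment_id == default_id:
        return default_repo_path
    # Non-default experiments get <root>/<experiment_id>.git, where root
    # falls back to the parent of the default repo path.
    root = repo_root if repo_root is not None else default_repo_path.parent
    return root / f"{experiment_id}.git"
```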
- glossary: add experiment switcher / selected experiment / active
  experiment (active_experiment_id) / StoreFactory / active store.
- user-guide §12: rewrite for the control-plane + switcher multi-experiment
  path (selection now changes data); separate-stacks isolation kept as 12.2.
- docs/operations/web-ui-multi-experiment.md (new) + README index: the
  switcher, the four credential-bootstrap postures, per-experiment config
  / repo layout, config-drift caveat.
- compose web-ui: --experiment-config-dir + web-ui-configs bind-mount +
  explicit --credentials-dir (the new resolver otherwise falls back to a
  non-persisted in-container XDG path).
- setup-experiment.sh: create web-ui-configs/ + copy each experiment's
  config to <data-root>/web-ui-configs/<id>.yaml.
- CHANGELOG [Unreleased] entry (closes the 12c §3.6 deferral); roadmap row.
- Deferrals filed: #259 (config wire endpoint), #260 (resolve cache),
  #261 (v1 switcher affordances), #262 (admin-form guard); #147 (multi-exp
  smoke) unchanged.

Refs #145 (plan W6).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… gate

The per-experiment wiring pushed make_app to 106 lines and submit_idea to
101 (threshold 100). Extracted _install_healthz_and_error_handlers(app,
templates) from make_app, and a _collect_idea_form_fields(form) helper
(also DRYs the identical block in add_row). Behavior-preserving; web-ui
suite green; complexity-gate clean (0 blocking).

Refs #145.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses all 5 codex round-0 findings (record under
docs/plans/review/issue-145/impl/):

- Bug 1: active_config no longer silently reuses the default experiment's
  config for a non-default experiment with no --experiment-config-dir
  (raises ExperimentConfigMissing → config-missing redirect).
  --experiment-config is now optional in control-plane mode;
  _resolve_default_config validates the posture + fails fast on a
  default-config/config-dir mismatch. make_app.experiment_config is now
  ExperimentConfig | None.
- Bug 2: resolve_active_experiment catches Unauthorized on the seed probe,
  evicts the cached credential, re-bootstraps once, and raises
  MissingAdminToken (cannot-bootstrap-credential) on a persistent 401 —
  the Decision-8 Posture C/D ladder (a 401 is never inferred as unseeded).
- Bug 3: per-experiment clones move to a durable --repo-root
  (<repo-root>/<id>.git); Compose bind-mounts web-ui-repos + passes the
  flag (the parent-of-repo-path default lands on the non-durable
  container fs in Compose).
- Risk 4: StoreFactory.evict(experiment_id) clears the cached bearer +
  clients, wired into the 401 recovery so reseed/reissue self-heals.
- Risk 5: switcher hidden (not an empty dropdown) when control-plane reads
  are unavailable with a cold cache (Posture D / CP outage).

New tests: config-missing redirect, 401→evict→MissingAdminToken,
switcher-hidden-on-cp-error. Web-ui suite green in isolation; ruff /
pyright clean.

Refs #145.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
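
The 401 recovery ladder from Bug 2 and Risk 4 above, reduced to a sketch. The exception classes here are local stand-ins for the real ones, and `probe`, `evict`, and `bootstrap` are hypothetical callables standing in for the seed probe, StoreFactory.evict, and the credential bootstrap.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class Unauthorized(Exception):
    """Stand-in for the store client's 401 error."""


class MissingAdminToken(Exception):
    """Stand-in for the cannot-bootstrap-credential failure."""


def probe_with_recovery(
    probe: Callable[[], T],
    evict: Callable[[], None],
    bootstrap: Callable[[], None],
) -> T:
    try:
        return probe()
    except Unauthorized:
        # First 401: drop the cached credential and try a fresh one, once.
        evict()
        bootstrap()
    try:
        return probe()
    except Unauthorized as exc:
        # Persistent 401: surface it as a credential problem, never as
        # "experiment is unseeded".
        raise MissingAdminToken("cannot-bootstrap-credential") from exc
```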
…ashes

Round-1 verdict: 4/5 round-0 findings resolved. Round-2 fixes:

- New Risk: _build_control_plane_client now catches transport / control-plane
  WireError (not just RuntimeError) — a control-plane outage or rejection at
  startup degrades to the Posture-D banners + hidden-switcher posture instead
  of aborting web-ui startup.
- Finding 4 (default-experiment credential staleness): scoped to non-default
  (the default fast path stays zero-overhead, matching pre-#145 behavior) and
  folded into #260; not a regression.

ruff / pyright clean; control-plane + resolve tests green.

Refs #145.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Commits the impl-stage codex-review iteration record under
docs/plans/review/issue-145/impl/ (durable *.md; regenerable
*.jsonl/*.stderr/prompt.txt gitignored per top-level .gitignore).
Review converged after 3 rounds (0 → 1 → 2): round 0 raised 3 Bugs +
2 Risks, all addressed; round 1 confirmed 4/5 + 1 new Risk; round 2
confirmed no remaining Bug/Risk findings.

Refs #145.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
W2's incoming side removed variant_id from _parse_evaluator_submit_form,
but the function body's nested call to _maybe_bundle_evaluator_artifact
still references variant_id. Reinstate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ealt ealt force-pushed the impl/issue-145-per-route-store-swap branch from 7b52430 to 450fb76 Compare June 2, 2026 20:03
ealt and others added 4 commits June 2, 2026 13:18
…xt additions

#145 added ~16 SLOC to executor.py from threading resolve_active_context
through each handler, pushing the file from 800 to 816 SLOC. Per-resource
split is a separate refactor candidate (cousin of F-3); not in scope here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tch check

The 8 compose smokes all append fields (ideation_policy / max_quiescent_iterations)
to the mounted experiment-config.yaml *after* setup-experiment has already copied
the pre-append config into the web-ui's --experiment-config-dir. The codex-round-1
"fail fast on default-config/config-dir mismatch" check then saw the two configs
differ and exited the web-ui (rc=1) → container unhealthy → every compose-smoke /
compose-e2e job failed at `up --wait`.

The check was over-strict: the deployment-default experiment ALWAYS resolves its
config from --experiment-config (active_config's default branch returns
app.state.experiment_config and never reads <config-dir>/<default>.yaml), so a
divergent default entry in the config-dir is harmless. Removed the mismatch
fail-fast; the posture validation (single-exp requires --experiment-config;
control-plane mode requires config OR config-dir) stays.

Verified: `bash reference/compose/healthcheck/smoke.sh` → PASS (web-ui healthy,
quiescence reached, all assertions pass). ruff + pyright clean.

Refs #145.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Records the round-3 review of the compose-smoke regression fix (removing the
over-strict default-config/config-dir mismatch check). Codex confirms it is
acceptable and does not reintroduce round-0 Bug 1 — the load-bearing fix
(non-default experiments never fall back to the default config; raise
ExperimentConfigMissing) lives in active_config, which is unchanged. No
remaining Bug/Risk. Review converged across 4 rounds (0→1→2→3).

Refs #145.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ealt commented Jun 2, 2026

Fixed the compose-smoke regression (all 8 compose-smoke + compose-e2e jobs were red: "container eden-web-ui is unhealthy").

Root cause: the fix for codex round-0 Bug 1 added a "fail fast on default-config / config-dir mismatch" check in _resolve_default_config. Every compose smoke appends fields (ideation_policy / max_quiescent_iterations) to the mounted experiment-config.yaml after setup-experiment has already copied the pre-append config into the web-ui --experiment-config-dir. The check then saw the two differ and exited the web-ui (rc=1) → unhealthy → every stack-bringing smoke failed at up --wait. All 8 smokes append, so they shared this one root cause.

Fix (0e0275c): removed the mismatch fail-fast. It was over-strict — the deployment-default experiment always resolves its config from --experiment-config (active_config's default branch returns app.state.experiment_config and never reads <config-dir>/<default>.yaml), so a divergent default entry in the config-dir is harmless. The real Bug-1 concern (silent default→non-default config fallback) is still prevented by active_config raising ExperimentConfigMissing.
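
A sketch of the config lookup rule described above, showing why a divergent default entry in the config-dir is never read. This is not the real active_config: it returns raw YAML text instead of a parsed ExperimentConfig, assumes the default config was supplied, and uses a local stand-in for ExperimentConfigMissing.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


class ExperimentConfigMissing(Exception):
    """Stand-in for the error behind the config-missing redirect."""


def active_config_sketch(
    experiment_id: str,
    *,
    default_id: str,
    default_config: str,
    config_dir: Optional[Path],
) -> str:
    # Default branch: always the startup config, never <config-dir>/<default>.yaml.
    if experiment_id == default_id:
        return default_config
    # Non-default: must come from the config-dir; no fallback to the default.
    if config_dir is None:
        raise ExperimentConfigMissing(experiment_id)
    path = config_dir / f"{experiment_id}.yaml"
    if not path.is_file():
        raise ExperimentConfigMissing(experiment_id)
    return path.read_text(encoding="utf-8")
```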

Verified locally: smoke.sh PASS, smoke-subprocess.sh PASS (web-ui healthy, quiescence reached, all assertions pass). codex-review re-confirmed (round 3): acceptable, does not reintroduce Bug 1; converged across 4 rounds.

rename-discipline is green (the earlier failure was a stale read of the pre-merge head); the branch is current with main (no _helpers.py conflict — that too was a stale rebase artifact). CI is re-running the smokes now.

@ealt ealt merged commit 28a1256 into main Jun 2, 2026
19 checks passed
Development

Successfully merging this pull request may close these issues.

Backfill: per-route store swapping for the experiment switcher (Phase 12c §3.6 deferral)