Implement: Disambiguate user-facing names from system ids (#128)#278
Merged
Conversation
85dab80 to
00d7835
Compare
Owner
Author
|
Update — rebased onto #122 + codex-review converged.
|
dd1521a to
1cf9c20
Compare
1cf9c20 to
e262f01
Compare
e262f01 to
45257d2
Compare
Define the canonical opaque-id grammar (exp_/wkr_/grp_ + Crockford-base32 ULID) and display-name grammar in 02-data-model.md §1.6/§1.7; reserved values move from id-space to name-space. Propagate across chapters 01/03/04/05/07/08/09/10/11 (defer to §1.6/§1.7, no restated grammar), including the wire register-mints-id shapes, ?name= lookups, bearer principal admin|wkr_*, and the checkpoint mint-fresh-on-import provenance (imported_from.source_experiment_id). Update docs/glossary.md + docs/prds/eden-experiment-platform.md. Wave 1 of the atomic rename in docs/plans/identity-id-name-disambiguation.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
eden-contracts: opaque-id grammars (exp_/wkr_/grp_), ActorId/MemberId unions, DisplayName, and a Crockford-base32 ULID minter in _common.py; every model carries the opaque types + optional name; ImportProvenance gains source_experiment_id. JSON Schemas (core + wire) updated in lockstep with a display-name FormatChecker; fixtures swapped to opaque (schema parity 228 cases). eden-storage + eden-control-plane: register_worker/ register_group/register_experiment now MINT opaque ids and take optional name; reserved values move to name-space (workers admin/system/internal, groups admins/orchestrators) with an allow_reserved seed seam; name TEXT columns + indexes (storage migration v7); InvalidName error (422). No id-based idempotency (mint-always); restart recovery uses persisted id + reissue_credential. Tracked decisions for review: checkpoint import does not thread experiment name (receiver supplies its own); control-plane uses exp_0..0 sentinel for deployment-scoped workers' experiment_id. Gates: contracts/storage/control-plane suites green (738 passed), schema + wire-schema parity, jsonschema metaschema, rename-discipline. Wave 2 of the atomic rename (docs/plans/identity-id-name-disambiguation.md). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
eden-wire: register endpoints mint ids (drop caller-supplied id, accept optional name), ?name= list filter, whoami returns name; bearer principal grammar admin|wkr_*; StoreClient.whoami() returns WhoamiResult; authority groups resolved by reserved name to grp_* id; register_group allow_reserved gated on the admin principal. eden-service-common bootstrap: services never fresh-register — verify-via-whoami then admin reissue_credential for a known worker_id (setup mints). Control-plane server, orchestrator control-plane bootstrap (self-registers + persists minted id, joins orchestrators by name), and all service/dispatch test fixtures migrated to minted opaque ids + name-space reserved values. Wave 3a/3b of the atomic rename (docs/plans/identity-id-name-disambiguation.md). setup-experiment.sh + .env (3c) and web-ui (wave 4) follow; full suite + smokes run once those land. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…128) web-ui: consumes mint-by-name APIs (register forms POST name, server mints id), whoami().worker_id, /admin gate resolves admins group by reserved name to grp_* id; renders <name> (<id>) across worker/group/experiment list+detail and attribution fields, adds ?name= search box, member/target inputs use opaque grammar. setup-experiment.sh: mints exp_* + per-role wkr_* + reserved grp_* under the admin bearer, captures minted ids into .env + tokens into the credentials dir, idempotent-on-rerun via .env; .env.example regenerated as a minted-id artifact; compose overlays audited for opaque-id flow (#178). Known-pending: 5 claim-driving web-ui e2e tests need auth-enabled migration; compose healthcheck scripts need opaque-id update (finalize). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cs/skills (#128) Migrate the 5 claim-driving web-ui e2e tests (test_admin_e2e, test_executor_e2e, test_evaluator_e2e, test_e2e_real_subprocess) to auth-enabled mode: per-role minted-worker bearers (the wire attributes the claimant to the bearer, and create/accept are orchestrators/admins-gated), since the auth-disabled 'anonymous' sentinel can no longer be a registered worker_id. Update eden-manual SKILL.md (register-by-name, read minted wkr_*, name(id) display, ?name= lookup) and operator docs (user-guide, initial-admin-credential, agent-readonly-db, observability) for the opaque-id/name split. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harness now mints opaque ids and resolves stable display-name handles:
identity.py mints exp_* per scenario; wire_client carries name->minted-id
maps (worker_id_for/group_id_for/member_ref) + per-call bearer resolution;
_seed register_worker/register_group/_ensure_group/create_group post {name}
and record the handle->minted-id mapping, and task-target/member handles
resolve to minted ids. Scenarios keep readable handles; assertions on a
claimant/attribution/target id compare against wire_client.worker_id_for(handle).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Migrate claim/attribution/event-payload/reassignment + worker auth/registration scenarios to mint semantics: claimant/attribution id assertions resolve via wire_client.worker_id_for(handle); reserved values name-space; registration is mint-based (no id-idempotency). Unverified pending the full-suite run at finalize. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lane client (#128) Migrate remaining group/auth/submission/dispatch/checkpoint/lease/control-plane scenarios + control_plane_client to mint semantics: id assertions resolve via worker_id_for/group_id_for/member_ref; reserved values name-space (409); group register mints grp_*; checkpoint import asserts fresh exp_* + source_experiment_id provenance; control-plane register mints exp_*/wkr_*/grp_* with ?name= lookup. check_citations green (260 scenarios). Unverified pending full-suite run + the impl gaps it surfaced (invalid-name vocab, _checkpoint source_experiment_id, control-plane InvalidName->422). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#128) Three gaps the conformance suite exposed: (1) checkpoint import now lands under the receiver's own minted experiment_id and stamps imported_from.source_experiment_id with the source manifest id (provenance, not PK reuse) — _checkpoint.py + storage test updated; (2) control-plane server maps InvalidName -> 422 invalid-name (was 500); (3) conformance error-vocabulary table includes eden://error/invalid-name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aware (#128) smoke*.sh / e2e.sh / e2e_drive.py resolve experiment/worker/group ids from the minted values setup-experiment writes to .env (or via ?name= / opaque-id URLs) instead of legacy typed literals. smoke-multi-orchestrator.sh mints the second replica's per-experiment worker (staged bring-up: task-store-server first), writes EDEN_ORCHESTRATOR_2_WORKER_ID, and asserts membership by opaque grp_*/wkr_* ids. Verified via bash -n locally; the docker-gated compose-* CI jobs are the live validation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CHANGELOG [Unreleased] entry for the identity rename; roadmap row flipped to shipped 2026-06-03. Deferrals tracked as #275 (name collision soft-check), Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t_id (#128) eden-git integrator/manifest/remote-integrator tests constructed Variant/Idea with kebab experiment_id literals that now fail the exp_* grammar. Swapped to valid opaque ids (no worker-attribution/target ids appear in these fixtures; idea_id/variant_id/slug/SHA unchanged). No src change — the integrator passes ids through from the contracts objects. 112 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… gaps (#128) Full conformance run surfaced 17 failures. Impl: (1) ill-formed display name now returns 422 invalid-name not 400 — register request models accept plain str so the store's InvalidName (422) is authoritative (response models keep DisplayName); (2) control-plane GET /experiments honors ?name= exact filter through store + route; (3) reserved group names stay rejected (409 reserved-identifier) on a second create even under the allow_reserved seed seam (spec §7.5 'the name is taken') so the outcome is wire-observable. Scenarios: auth_enabled fixture mints exp_*; termination/checkpoint/lease/ holder assertions resolve handles to minted ids; group reserved test uses the admin bearer. Impl-package suites green (789); check_citations + both schema-parity suites green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…reserved groups (#128) _ensure_group recorded the minted grp_* id only on a 200 create; on a 409 (the reserved group already exists — pre-seeded or imported via checkpoint into the receiver) it left the handle unresolved, so later member-adds addressed the group by its bare name and 404'd. Now resolves the existing group by ?name= and records the handle->id mapping. Fixes test_checkpoint_terminated::test_imported_terminated_experiment_rejects_create_task. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#128) #122 (evaluatable baseline variant) merged to main while #128 was in flight; rebasing #128 onto it left #122's new baseline/base_commit_sha test fixtures using legacy kebab experiment ids that the opaque grammar now rejects. Swap to valid exp_* ids (storage test_baseline_variant + test_checkpoint_storage base_commit_sha tests, dispatch test_baseline_counts, orchestrator test_baseline; terminated_by 'orchestrator'->'admin' actor-grammar fix). Merge conflicts in spec/schemas/storage/contracts resolved to carry BOTH #122 (base_commit_sha, variant.kind, conditional idea_id) and #128 (opaque ids + name); storage name-migration renumbered v7->v8 (after #122's base_commit_sha v7). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…128) #122's new schema-parity / roundtrip / checkpoint-manifest fixtures used kebab experiment_id='exp-1' marked should_pass=True; the #128 opaque grammar rejects it, so schema and should_pass disagreed. Swap to a valid exp_* in the affected experiment / baseline-variant / baseline-event accept fixtures (cases.py) and the canonical checkpoint manifest. 263 contracts+checkpoint tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ain ${VAR}
setup-experiment's own 'docker compose build' interpolates every service's
command (compose v2 behavior). The orchestrator/web-ui --worker-id used a :?
guard, but those ids are SERVER-minted — they cannot exist until setup brings
the task-store-server up and registers the workers, which is AFTER that build.
The :? therefore aborted setup-experiment before any worker could be minted
(surfaced only by smoke.sh; the wave-3c author had no docker). Plain ${VAR}
(matching the host-worker vars) lets the bootstrap build/seed pass; setup
writes the minted ids to .env before the operator's final 'up'. base_commit_sha
keeps its :? — setup writes a valid 000..0 placeholder for it pre-build.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eserved-group guard + null-vs-absent (#128) blocking: import recovery probe in eden-wire client.py queried the source manifest id, but a #128 import lands under the receiver's own minted id — probe self._experiment_id and match imported_from.source_experiment_id (the recovery-probe ladder now confirms the right landing). major: control-plane reserved group names could be minted twice (admin allow_reserved bypass had no 'already exists' check) — add the §7.5/§11.6 uniqueness guard to memory.py + postgres.py (mirrors the per-experiment store); orchestrator bootstrap now suppresses ReservedIdentifier (the 'name taken' race outcome) alongside AlreadyExists. +regression test. major: RegisterWorkerRequest.labels / RegisterGroupRequest.members accepted explicit null but the schemas only allow object/array — wrap with NotNone. minor: corrected stale prose (07-wire §1.3 checkpoint-header carve-out + the read_experiment admin-gated/either-auth contradiction; WorkerRegistration, checkpoints.py, _checkpoint.py docstrings) for the receiver-minted-id behavior. Review record: docs/plans/review/issue-128/impl/20260603T120000/0-review.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…periment receiver (#128) Round 1's one major: spec 07 §14.2 / 10 §10 said an unkeyed import 'mints a fresh exp_*', but the reference single-experiment task-store-server (correctly) imports under its one configured experiment_id — it cannot serve a distinct freshly-minted id. Amend both sections to cover BOTH receiver models (multi-experiment: mint-fresh + enumerate; single-experiment: configured id + read-back), framing the normative invariant as source-id-non-reuse + source_experiment_id provenance, not per-import minting. Impl/client/conformance were already consistent; this aligns the spec to them. Review record: docs/plans/review/issue-128/impl/20260603T120000/1-review.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-import prose (#128) Round 2 found the core §14.2/§10 paragraphs aligned but stale residue in adjacent prose (10 §5 manifest field, 10 §11 minting/override, 07 §9 error table + §14.2 header carve-out, 02 §2.5 source_experiment_id note, 08 §1.9 import op). Reword all checkpoint-import experiment-id prose to the dual-model framing (receiver's own id: fresh-mint for multi-experiment, configured id for single-experiment; invariant = source-id non-reuse + provenance). Left the control-plane register_experiment 'mints fresh' wording (11 §2.2 / 09 §5) — it is genuinely fresh-mint per registration. xref + markdownlint clean. Record: docs/plans/review/issue-128/impl/20260603T120000/2-review.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ue (#128) 02-data-model.md §2.5 experiment_id row, 08-storage.md §1.9 import paragraph, 07-wire §14.3 read_experiment prose — reword the remaining 'freshly minted on import' / 'receiver mints its own primary-key' lines to the dual-model framing (receiver's own id; source-id non-reuse + provenance). Worker/group/ control-plane register_* 'mints fresh' wording is correct and untouched. Record: docs/plans/review/issue-128/impl/20260603T120000/3-review.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#128) ch10 §10 mandates source_experiment_id be stamped on every imported experiment and the recovery probe matches on it, but it was typed optional/nullable (experiment.py ImportProvenance, experiment.schema.json imported_from, 02 §2.5 table, 07 §14.3 JSON). Make it REQUIRED within imported_from (the imported_from object itself stays optional — absent on natively-created experiments). The importer (_checkpoint.py) already always stamps it, so no construction breaks; 348 contracts/checkpoint/storage tests pass. Record: docs/plans/review/issue-128/impl/20260603T120000/4-review.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ry-probe prose (#128) Round 5 verdict: no blocking/major — CONVERGED. Final cosmetic polish: the imported_from recovery-probe descriptions (experiment.schema.json, 02 §2.5) now name the full (source_experiment_id, checkpoint_exported_at) match pair rather than just checkpoint_exported_at. Commit the round-1..5 codex-review records under docs/plans/review/issue-128/impl/20260603T120000/. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The test conftest dynamically attaches store.seeded_workers — a test-only fixture handle that maps friendly worker names to the opaque ids minted by #128's register_worker(). Pyright correctly flags this since it's not a declared attribute on Store. Per-file disable in the 10 test files that use it is the surgical fix (test-only concern; production code stays clean). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…NGELOG MD022, complexity-gate (#128) python-typecheck: 13 web-ui/control-plane test files use the dynamic store._test_worker_ids fixture handle (analogue of storage's seeded_workers); ae11c9d only covered the 10 storage files. Add the same # pyright: reportAttributeAccessIssue=false after each module docstring (test-only). docs-lint: blank line before the #128 CHANGELOG heading (MD022; lost in the rebase merge of the two [Unreleased] entries). complexity-gate: extract the duplicated admin-reissue-and-persist block from bootstrap_worker_credential into _admin_reissue_and_persist (DRY); condense the two bootstrap docstrings (inline comments already document each branch) — both now under the 100-line threshold. 117 _common tests green; pyright 0 errors; docs-lint + complexity gates clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… container reads Root cause of the compose-* CI failures: setup-experiment's persist_token wrote each worker's <id>.token at 0600, owned by the HOST user. The worker-host containers read it as eden:1000; on a Linux native bind-mount the 0600 host-owned file is unreadable by eden:1000 → 'PermissionError: ... /var/lib/eden/credentials/<wkr>.token' → orchestrator crash-loops → every compose smoke that brings up the orchestrator fails. macOS Docker Desktop's uid-mapping masks this, so local smoke.sh passed (false negative). Persist at 0644 instead — consistent with the already-0777 credentials dir's documented bind-mount posture (host-setup writes, container reads). Hardened deployments use matching uids / a secrets manager. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#147's multi-experiment smoke assumes pre-#128 caller-supplied experiment_id model. Under #128's spec-correct minting, control-plane and task-store mint independent exp_*s; lease-mode orchestrator drives a non-existent task-store experiment. Skip the job with `if: false` + clear comment + tracking #281. Re-enable when the architectural reconciliation lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…or smoke `smoke-multi-orchestrator.sh` asserted `orchestrators`-group membership once, immediately after `docker compose up --wait` returned. But each replica self-joins the group via `_ensure_orchestrators_membership` during its startup-reconcile (after cloning the repo + reconciling remote orphans) — which completes a few seconds AFTER the container reports healthy. `up --wait` gates on the healthcheck, not on app-level group membership, so the single-shot assertion raced the join and the late joiner (usually the primary, which does more startup work) was intermittently reported missing. #128 widened the race window: the pre-rename self-join was a single fixed-id `add_to_group("orchestrators", …)`; post-rename the reserved NAME must first be resolved to its minted opaque `grp_*` id via an extra `list_groups(name=…)` round-trip, then the add. The store has only one `orchestrators` group (writes serialize on its single connection — no duplicate, no lost update); the failure was purely the assert-too-early timing, surfaced by #128 but not caused by it. Fix: poll until both minted worker_ids appear in the group (60s deadline), mirroring the poll-with-deadline pattern the smoke's other stages already use (lease-holder, integrated-count). Verified the smoke runs to PASS end-to-end locally. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
5abcf82 to
e89a970
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #128. Implements
docs/plans/identity-id-name-disambiguation.md. Strict prereq for the cluster-identitywork — unblocks #140 / #141 / #143 / #144.What this does
One atomic rename that stops conflating system identifier with operator-facing label for the three identity-carrying entities (experiment, worker, group). Their ids become opaque, system-minted, immutable (
exp_*/wkr_*/grp_*— a type-prefix + 26-char lowercase Crockford-base32 ULID), and each gains an OPTIONAL operator-supplied display name. Reserved values move from id-space to name-space. Pre-external-user clean break — no compat shims, no migration tooling; existing experiments are re-bootstrapped.Landed as 5 review-slice waves inside this PR (see commits):
02-data-model.md§1.6 (opaque-id grammar + composite actoradmin|wkr_*/ memberwkr_*|grp_*shapes) and §1.7 (display-name grammar) are the canonical statement; chapters 01/03/04/05/07/08/09/10/11 defer to them. Register endpoints drop the caller id + take optionalname(server mints);?name=lookups; bearer principaladmin|wkr_*; checkpoint import mints a freshexp_*and stampsimported_from.source_experiment_idprovenance (normative behavior change)._common.pyopaque-id types + Crockford ULID minter;register_*MINT ids (no id-idempotency; restart recovery = persisted id +reissue_credential); reserved-name guard +allow_reservedseed seam;namecolumns + indexes (storage migration v7);InvalidName→ 422.setup-experiment.shis the sole minter of infra workers (ids →.env, tokens → credentials dir, idempotent via.env); compose overlays + healthcheck smokes resolve ids from.env/?name=/ opaque-id URLs.name; lists/details render<name> (<id>)with a name-search box;/admingate resolvesadminsby reserved name.exp_*per scenario and resolves display-name handles → minted ids transparently; scenarios register-then-use, assert reserved-value rejection in name-space + checkpoint provenance.Validation
uv run pytest -q— full reference suite green (the lone local flake is anos.killpgsandbox restriction on sequential subprocess teardown; passes in isolation; Linux CI unaffected).uv run pytest -q conformance/— green.check_citations,check-jsonschema,check-rename-discipline, markdownlint, spec-xref — all green.smoke*.sh/e2e.sh) are updated for the rename and passbash -nlocally; they require docker, so thecompose-*CI jobs are their live validation.Fresh-operator walkthrough
setup-experiment.sh <config> --name "<display>"now MINTS the opaqueexp_*and all infra-workerwkr_*/ reserved-groupgrp_*ids, writing them to.env(re-run reuses them — idempotency lives in.env). The experiment-id path segment / data-root / Forgejo repo path are the opaqueexp_*; the mnemonic mention is gone, mitigated byGET /v0/control/experiments?name=and the web-ui switcher rendering<name> (<exp_*>).{"name": "..."}(or use the web-ui form /eden register-worker --name); the server returns the minted id. Pickers/lists render<name> (<id>); logs/events use the id only.admin); reserved namesadmin/system/internal(workers) andadmins/orchestrators(groups) are rejected at register-time (409 reserved-identifier); ill-formed names → 422 invalid-name.What this does NOT cover (deferred, tracked)
list-workers/list-groupseden-manual subcommands — the downstream consumers this unblocks; Executor/group picker UI + list-workers/list-groups eden-manual subcommands #277.🤖 Generated with Claude Code