Fix #273: align prose + manifest to schema/model 'evaluation' naming by ealt · Pull Request #279 · ealt/eden

ealt · 2026-06-03T23:26:44Z

Summary

Why: The variant evaluation-payload field is `evaluation` in variant.schema.json, the eden_contracts.Variant model, and the storage/wire impl, but the spec prose spelled it `metrics` across eight chapters plus the integrator-manifest table. Two names for one field is a readability/onboarding hazard and a latent parity trap: the schema-parity job only checks schema↔model (both already `evaluation`), so the prose `metrics` naming was unguarded.
Option 1 (selected during triage): align prose + manifest to the schema/model `evaluation` naming, keeping the wire + on-tree manifest key stable. The reference integrator already emitted `evaluation` in .eden/variants/<id>/evaluation.json (see _manifest.py), so this is a prose-and-docstring correction, not a manifest-shape change.
Scope was wider than the issue's two named locations. The issue named 02-data-model.md §9.1 and 06-integrator.md §4.2, but a full audit (per the AGENTS.md "spec inter-chapter restatement is a conflict surface" pitfall) found the same field-name spelling restated across eight chapters (02/03/04/05/06/07/08/10). Renaming only the two named spots would have relocated the drift, so every backtick'd field reference is renamed in lockstep.
Latent conformance bug fixed: test_evaluator_submission.py asserted `variant.get("metrics") is None` in two evaluation_error scenarios. Because the wire field is `evaluation`, `variant.get("metrics")` is always None, so those assertions passed trivially and never tested the no-graft guarantee they claimed. Corrected to `variant.get("evaluation")`.
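The vacuous assertion can be reproduced in isolation. This is a minimal sketch, not the actual test code: the dict shapes, the `id` and `score` keys, and the helper names are hypothetical, and only the `metrics` vs `evaluation` key names come from this PR.

```python
# Hypothetical wire payloads. The real field is "evaluation"; "metrics" never exists.
variant_after_error = {"id": "v1", "evaluation": None}            # no graft happened
variant_with_graft = {"id": "v2", "evaluation": {"score": 0.9}}   # a graft the test should catch


def old_check(variant: dict) -> bool:
    # Pre-fix assertion body: reads a key that is never present on the wire.
    return variant.get("metrics") is None


def new_check(variant: dict) -> bool:
    # Post-fix assertion body: reads the field the wire actually carries.
    return variant.get("evaluation") is None


# The old check passes for both payloads, so it cannot detect a graft.
vacuous_results = [old_check(variant_after_error), old_check(variant_with_graft)]
# The corrected check passes only when nothing was grafted.
corrected_results = [new_check(variant_after_error), new_check(variant_with_graft)]
```

Under these assumptions, `vacuous_results` is `[True, True]` while `corrected_results` is `[True, False]`: only the corrected check distinguishes the two payloads.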

What this does NOT cover

`baseline.metrics` config block (02-data-model.md §2.7, experiment-config.schema.json, eden_contracts.config.BaselineConfig.metrics): deliberately keeps the `metrics` name. It is a distinct config field that writes into `variant.evaluation`; renaming it would touch the config surface, which Option 1 explicitly keeps stable. This is an intentional boundary, not a deferral.
Plain-English "metric" concept uses (metric values §1.3, metric names in the evaluation schema §8, "objective over metrics"): left untouched. A metric is a real domain concept; the field that holds the metrics is `evaluation`.
docs/conformance-coverage.md: a non-CI-enforced generated snapshot, last regenerated at #112 ("Issue #99: Per-claim MUST/SHOULD audit (chapters 02/03/04/05/06/07/08/09)") and drifted ~40 keyword lines since #122 ("Evaluatable baseline variant (seed becomes a kind=baseline Variant)"). Regenerating it in this PR would dump unrelated #122-era churn into a focused rename, so it is left for routine regeneration (it will pick up the renamed prose then). No tracked deferral: the doc self-documents that it is generated and re-runnable.
No new runtime behavior: the only impl changes are docstrings + one human-readable validate_acceptance reason string. No wire/schema/model field changes.
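The boundary above (rename backtick'd field references, keep `baseline.metrics`, leave plain-English "metric" alone) can be expressed as a small audit helper. This is an illustrative sketch only: the regex, the `stale_field_refs` name, and the sample lines are assumptions, not the tooling used for this PR's audit.

```python
import re

# Matches a backticked `metrics` or `<prefix>.metrics` reference.
# Plain-English uses of "metric(s)" carry no backticks and are ignored.
FIELD_REF = re.compile(r"`(?:[\w.]+\.)?metrics`")

# The config block that intentionally keeps its name.
ALLOWED = {"`baseline.metrics`"}


def stale_field_refs(line: str) -> list[str]:
    """Return backticked `metrics` field references that should read `evaluation`."""
    return [ref for ref in FIELD_REF.findall(line) if ref not in ALLOWED]


# Hypothetical prose lines, one per case named in this section.
sample_lines = [
    "The `metrics` field holds the results.",   # stale field reference
    "Set `baseline.metrics` in the config.",    # intentional, kept
    "An objective over metrics.",               # plain-English concept
    "`variant.metrics` is set on success.",     # stale dotted reference
]
audit = [stale_field_refs(line) for line in sample_lines]
```

With these sample lines, only the first and last entries report a stale reference; the config block and the plain-English use pass through untouched.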

Fresh-operator walkthrough

N/A — internal change only (spec prose, docstrings, one error-message string, test data). No operator-facing surface changes. The 07-wire-protocol.md §11 reference-helper endpoint path was corrected to /validate/evaluation to match the impl, which already exposed that path — no behavior change.

Test plan

All literal pre-push gates from AGENTS.md "Commands" run locally:

Related issues

Closes #273: Spec/impl drift: variant evaluation-payload field is `evaluation` (schema/model) vs `metrics` (02-data-model §9.1 prose + integrator manifest)
Refs #122 (Evaluatable baseline variant: seed becomes a kind=baseline Variant): surfaced this drift (its CHANGELOG entry's "Pre-existing drift surfaced (tracked)" note)

🤖 Generated with Claude Code

The variant evaluation-payload field is `evaluation` in variant.schema.json, the eden_contracts.Variant model, and the storage/wire impl, but the spec prose spelled it `metrics` across eight chapters plus the integrator-manifest table.

Option 1 from the issue: rename the prose + manifest to `evaluation`, keeping the wire and on-tree manifest key stable (the reference integrator already emitted `evaluation` in evaluation.json).

A full audit (per the AGENTS.md inter-chapter-restatement pitfall) found the drift was wider than the issue's two named locations: every backtick'd field reference is renamed in lockstep across spec chapters 02/03/04/05/06/07/08/10, the variant.schema.json evaluated_by description, impl docstrings, and the validate_acceptance reason string.

Also fixes a latent conformance bug: test_evaluator_submission.py asserted `variant.get("metrics") is None`, which was always None on the wire (field is `evaluation`) and so never tested the no-graft guarantee it claimed.

The baseline.metrics config block keeps its name (distinct config field that writes into variant.evaluation; out of scope for Option 1). Plain-English "metric" concept uses are untouched. docs/conformance-coverage.md (a stale, non-CI-enforced generated snapshot) is left for routine regeneration to avoid unrelated #122-era churn in this focused rename.

Closes #273.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ealt force-pushed the impl/issue-273-evaluation-metrics-drift branch from c2f0519 to d654cb6 Compare June 4, 2026 02:42

ealt merged commit edf2a83 into main Jun 4, 2026
39 of 40 checks passed
