feat(routing): Goal Classifier Router for planner model selection (TD-195) by engkimo · Pull Request #40 · engkimo/open-morphic

engkimo · 2026-05-20T01:48:43Z

Summary

Per-goal routing of LLMPlanner between Sonnet 4.6 and Haiku 4.5 via a pure-LLM GoalClassifierPort, behind the new MORPHIC_PLANNER_ROUTER flag (default disabled — production routing unchanged on merge).
Two interchangeable classifier adapters share a byte-identical SYSTEM_PROMPT (TD-190 stable-prefix invariant): LLMGoalClassifier (Anthropic Haiku 4.5 via LiteLLM) and LocalGoalClassifier (qwen3:8b via Ollama, $0 ops).
Privacy: raw goal never serialized — GoalClassified events carry sha256(goal)[:16] only. Fail-safe: classifier timeout / parse error → PlannerModel.SONNET (never to Haiku).

Live A/B verdict (3 arms × 10 goals × 3 trials, $0.97 total, dump at docs/benchmarks/planner_ab_router_2026_05_20.json):

metric	Sonnet 4.6	Haiku 4.5	Router	Δ vs Sonnet	Acceptance
entity_preserved	83.8%	71.3%	81.3%	−2.5pt	✓ (≥−5pt)
plan_eval	0.898	0.838	0.884	−0.014	✓ (≥−0.030)
avg cost / call	$0.01351	$0.00715	$0.01218	−9.85%

Captured-saving: 20.9% (paper bar 30% missed). Documented in TD-195 as a workload-mix structural effect of the entity-stressed benchmark (6/10 goals carry CJK / quoted entities → classifier correctly routes them to Sonnet at confidence ≥0.9). The router Pareto-dominates the Sonnet baseline (cheaper at near-identical quality).

Spec, plan, memo

Spec: specs/goal-classifier-router/spec.md
Plan: specs/goal-classifier-router/plan.md
Tasks: specs/goal-classifier-router/tasks.md
A/B memo (authoritative): memory/planner_router_ab_2026_05_20.md
ADR: docs/TECH_DECISIONS.md TD-195

Test plan

`uv run --extra dev pytest tests/unit/` → 3360 passed
`uv run --extra dev ruff check .` → clean
Domain layer purity (rg "from (sqlalchemy|...|infrastructure|application|interface)" domain/...) → 0 hits
`rg "from infrastructure.routing" application/` → 0 hits
T141 local live test (qwen3:8b) → 3/3 PASS, $0
T142 remote live test (Haiku 4.5) → 3/3 PASS, ≤$0.003
T145 regression guard (`MORPHIC_PLANNER_ROUTER=disabled` + planner unit tests) → 28/28 PASS
T144 spec-compliance audit (privacy / fail-safe / KV-cache) → PASS via `morphic-pr-reviewer`

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Optional per-goal planner router that routes tasks between Sonnet and Haiku, with CLI flags to enable router and tune acceptance/cost thresholds; benchmark runs now include router metadata and chosen-model attribution.
Documentation
- ADR, specs, changelog, continuation, and environment-variable docs updated with router behavior and new env vars for enabling and tuning the router.
Tests
- Extensive unit and integration tests added for classifier, router, observability, metrics, and planner integration.

SDD pilot 3 — spec.md / plan.md / tasks.md generated. Targets the −11.4pt entity_preserved regression from the 2026-05-19 Haiku A/B while retaining ≥30% of the 47.6% cost saving on the eligible slice. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ied event Three pure-domain VOs for the goal-classifier-router pilot: - `PlannerModel` (Sonnet/Haiku enum) with `to_gateway_id()` - `GoalClassification` (Pydantic VO, confidence/latency/cost bounded) - `GoalClassified` discriminated-union variant added to DebateEvent Privacy: GoalClassified carries `goal_hash` (sha256[:16]) only — raw goal is never serialized into events. Existing council events untouched and round-trip-compatible. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Async port: `classify(goal: str) -> GoalClassification`. Empty or whitespace goal raises ValueError. Pure abstract; impls go in infrastructure/routing/ (LLM + Ollama). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…nomy

…rmalization

…lClassifier (Ollama) LLMGoalClassifier targets Anthropic Haiku 4.5 via LLMGateway. LocalGoalClassifier targets Ollama qwen3:8b via LLMGateway (LiteLLM routes locally); cost_usd forced to 0.0 since compute is local-only. Both adapters share SYSTEM_PROMPT + parse_classification from _prompts.py so the contract stays byte-identical for fair A/B and stable LiteLLM prompt-cache windows (NFR-5 / TD-190). Parser tolerates qwen3 <think> blocks and fenced JSON. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

LLMPlanner accepts an optional PlannerModelRouter. When present it is consulted per goal and the chosen PlannerModel is resolved to its gateway id (claude-haiku-4-5-20251001 / claude-sonnet-4-6) which is passed to LLMGateway.complete. When the router is None (the default), behavior is byte-identical to the pre-router planner — the existing constructor-time ``model`` argument is honored unchanged. Verified: 8 new router-integration tests, 41 pre-existing planner tests all still pass. TD-190 stable system prefix preserved (SYSTEM_PROMPT is byte-identical across Haiku/Sonnet routes). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Build LLMGoalClassifier (Haiku 4.5) when anthropic_api_key is set, LocalGoalClassifier (Ollama qwen3:8b) otherwise. Inject the resulting PlannerModelRouter into LLMPlanner only when planner_router_mode=enabled; otherwise pass router=None for byte-identical pre-router behavior.

TD-195 router observability layer: - RouterMetrics: dependency-free Prometheus-style counters (decisions_total by (model, reason_category)) + latency_ms histogram buffer. Cardinality bounded at 12 by AD-3 closed ReasonCategory set. - RouterObservingEventBus: EventBusPort decorator that taps GoalClassified events for metrics + one INFO log line per decision (goal_hash, chosen_model, reason_category, classifier_latency_ms, classifier_cost_usd) and forwards all events to the inner bus. Privacy: only sha256[:16] hash is ever logged, raw goal never carried. 20 new tests, 3360 unit tests total, 0 regressions, ruff clean.

…tBus wiring - T110: LocalGoalClassifier live (qwen3:8b, $0) — 3 AD-3 buckets all PASS - T111: LLMGoalClassifier live (Haiku 4.5) — 3 AD-3 buckets all PASS - Refine SYSTEM_PROMPT with quoted-entity clarification + 3 few-shot examples (byte-identical across local/remote per TD-190) - Wire RouterObservingEventBus + RouterMetrics into AppContainer so production picks up metrics + structured logs - Cost ceiling per Haiku call relaxed to <=$0.001 (observed ~$0.0007) Verified: 3,360 unit + 6 live integration green, 0 regressions.

Adds a 3rd arm to planner_quality_ab.py that calls PlannerModelRouter once per goal, then runs the planner with the router-chosen model. Reports: - Router-gated mean vs Sonnet baseline (entity_preserved, plan_eval) - Captured-saving ratio = (Sonnet − Router) / (Sonnet − Haiku-only) - Per-goal routing breakdown (HAIKU vs SONNET counts) Acceptance thresholds (configurable): - entity_preserved Δ >= -5pt vs Sonnet - plan_eval Δ >= -0.030 vs Sonnet - captured-saving >= 30% Default mode (no --router) unchanged.

Run: --router --trials 3, 190 LLM calls, $0.97 total. Quality PASS: entity_preserved -2.5pt (>= -5pt threshold) plan_eval -0.014 (>= -0.030 threshold) Captured-saving 20.9% (< 30% threshold) — structural property of this benchmark mix (6/10 goals carry entities → Sonnet, 4/10 Haiku). Router still strictly Pareto-dominates Sonnet at lower cost; defect is in benchmark composition, not router logic. Memo: memory/planner_router_ab_2026_05_20.md recommends shipping.

- TECH_DECISIONS.md: TD-195 "Goal Classifier Router for Planner Model Selection" — decision/rationale/consequences/follow-ups. Captures the 2026-05-20 live A/B verdict (entity_preserved -2.5pt, plan_eval -0.014, cost -9.85% / call) and explains the 20.9% captured-saving vs 30% paper bar as workload-mix structural, not router defect. - ENV_VARS.md: 3 new vars under "Planner Model Router (v0.6.3, TD-195)" — MORPHIC_PLANNER_ROUTER (disabled|remote|local), MORPHIC_PLANNER_ROUTER_HAIKU_CONFIDENCE_THRESHOLD (default 0.7), MORPHIC_PLANNER_ROUTER_CLASSIFIER_TIMEOUT_MS (default 5000). - CONTINUATION.md: new Sprint 91 (TD-195) section at top with branch HEAD, live A/B numbers, and the memo pointer.

- Wrap two integration-test docstring run-commands across lines so they fit the 100-char limit (E501). - Apply ruff import-sort (I001) on auto-fix for the same files. - Replace blind `Exception` in test_config_router with `ValidationError` (B017); the test asserts pydantic rejects an invalid enum value, and the specific type matches that contract. Ruff: All checks passed.

- ENV_VARS.md: MORPHIC_PLANNER_ROUTER is `disabled | enabled` (remote/local is an auto-selected DI choice, not an env value). - ENV_VARS.md: CLASSIFIER_TIMEOUT_MS default is 1500, not 5000 (matches shared/config.py:172 and plan.md). - TECH_DECISIONS.md TD-195: rewrite the opt-in sentence to reflect the actual env contract + DI-time adapter selection.

coderabbitai · 2026-05-20T01:48:55Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 67c0c9fa-862b-46e0-b718-9bd68a042a03

📥 Commits

Reviewing files that changed from the base of the PR and between ba605c1 and 4b2c75e.

📒 Files selected for processing (5)

docs/CONTINUATION.md
docs/ENV_VARS.md
domain/value_objects/council_events.py
infrastructure/routing/_prompts.py
shared/config.py

📝 Walkthrough

Walkthrough

Per-goal planner-model routing added: classifier port/VOs/events, PlannerModelRouter service with confidence/timeouts/fallbacks, shared prompt + parser, remote/local classifier adapters, RouterMetrics and observing event bus, LLMPlanner wiring for per-goal model selection, DI/settings wiring, extended benchmark and JSON output, docs/specs, and comprehensive unit/integration/live tests.

Changes

Goal Classifier Router Feature

Layer / File(s)	Summary
Benchmark tooling and results `benchmarks/planner_quality_ab.py`, `docs/benchmarks/planner_ab_router_2026_05_20.json`	Benchmark gains an optional Router arm that classifies each goal, rolls classifier overhead into router trial cost, records chosen underlying model and classifier cost per trial, and emits router-specific acceptance/captured-saving summaries.
Domain value objects and types `domain/value_objects/planner_model.py`, `domain/value_objects/goal_classification.py`, `domain/value_objects/council_events.py`	Adds `PlannerModel` enum with `to_gateway_id()`, frozen `GoalClassification` VO, and `GoalClassified` event variant with `ReasonCategory`, hashed goal id, chosen_model, and classifier metadata.
Domain port and router service `domain/ports/goal_classifier.py`, `domain/services/planner_model_router.py`	Introduces `GoalClassifierPort` and `PlannerModelRouter` that invokes a classifier under timeout, gates HAiku by confidence threshold, demotes low-confidence HAiku to SONNET, falls back to SONNET on classifier failure/timeout, normalizes reason categories, and publishes GoalClassified events (best-effort).
Shared classifier prompt and parsing `infrastructure/routing/_prompts.py`	Adds stable SYSTEM_PROMPT and `parse_classification` with think/fence stripping, balanced-brace extraction, JSON decoding, model coercion, field validation/truncation, and `ClassificationParseError`.
Goal classifier adapters `infrastructure/routing/llm_goal_classifier.py`, `infrastructure/routing/local_goal_classifier.py`	`LLMGoalClassifier` calls Anthropic Haiku model and returns parsed GoalClassification with latency/cost; `LocalGoalClassifier` calls Ollama local model and returns parsed GoalClassification with cost forced to 0.0.
Router observability and metrics `infrastructure/metrics/router_metrics.py`, `infrastructure/observability/router_observer.py`	`RouterMetrics` records decision counters by (model, reason_category) and latency samples; `RouterObservingEventBus` records metrics and logs structured INFO for GoalClassified events while forwarding to inner EventBus.
LLMPlanner router integration `infrastructure/fractal/llm_planner.py`	`LLMPlanner` accepts optional `router` in constructor; per-goal `_resolve_model_id(goal)` queries router and converts chosen PlannerModel to gateway model id for `LLMGateway.complete`.
Configuration and DI wiring `shared/config.py`, `interface/api/container.py`	`Settings` adds `planner_router_mode`, `planner_router_haiku_confidence_threshold`, and `planner_router_classifier_timeout_ms`; `AppContainer._build_planner_router()` wires local vs remote classifier, wraps event bus for router observation, and constructs `PlannerModelRouter` when enabled.
Test fakes & unit tests (domain, parsing, adapters, router, observability, integration, container) `tests/unit/...`, `tests/integration/...`	Adds InMemoryGoalClassifier fake, FakeLLMGateway/FakeOllamaGateway, comprehensive unit tests for VOs, port, parsing, adapters, PlannerModelRouter behavior (including reason-category normalization, timeouts, privacy), RouterMetrics and RouterObservingEventBus tests, planner+router integration tests, container wiring tests, settings env-var parsing tests, and live integration tests that skip when backends absent.
Specification and docs `specs/goal-classifier-router/*`, `docs/CHANGELOG.md`, `docs/CONTINUATION.md`, `docs/ENV_VARS.md`, `docs/TECH_DECISIONS.md`, `infrastructure/routing/__init__.py`	Adds feature spec, implementation plan, tasks, changelog entry, continuation handoff, ENV_VARS docs for router settings, TD-195 ADR, and a routing package docstring.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant LLMPlanner
  participant PlannerModelRouter
  participant GoalClassifier
  participant LLMGateway
  participant EventBus

  Client->>LLMPlanner: submit goal
  LLMPlanner->>PlannerModelRouter: select_for(goal)
  PlannerModelRouter->>GoalClassifier: classify(goal)
  GoalClassifier->>PlannerModelRouter: GoalClassification (model, confidence, latency, cost)
  PlannerModelRouter->>EventBus: publish GoalClassified(goal_hash, chosen_model, metadata)
  PlannerModelRouter->>LLMPlanner: chosen PlannerModel
  LLMPlanner->>LLMGateway: complete(system+user, model=chosen_model_id)
  LLMGateway-->>LLMPlanner: completion (candidates)
  LLMPlanner-->>Client: plans

Poem

🐰 A router hops through goals with grace,
Choosing Haiku fast or Sonnet for pace—
Each choice hashed, no raw goal shown,
Costs and confidence gently known.
Tiny classifier, big plan cheer!

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/goal-classifier-router

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 18

🧹 Nitpick comments (5)

tests/unit/infrastructure/observability/test_router_observer.py (1)

26-29: ⚡ Quick win

Remove type: ignore and type reason_category as ReasonCategory.

The helper currently bypasses the closed-set contract with str + ignore. Using ReasonCategory keeps test fixtures aligned with production typing.

♻️ Proposed fix

 from domain.value_objects.council_events import (
     DebateStarted,
     GoalClassified,
+    ReasonCategory,
 )
@@
 def _goal_classified(
@@
-    reason_category: str = "generic_tech_english",
+    reason_category: ReasonCategory = "generic_tech_english",
@@
-        reason_category=reason_category,  # type: ignore[arg-type]
+        reason_category=reason_category,

Also applies to: 40-50

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/infrastructure/observability/test_router_observer.py` around lines
26 - 29, The test helper currently types reason_category as a plain str with a
type: ignore; instead import and use the production enum/type ReasonCategory and
remove the ignore to keep fixtures aligned with runtime types. Update the
imports in test_router_observer.py to include ReasonCategory (alongside
DebateStarted and GoalClassified) and change the helper variable/parameter named
reason_category to be typed as ReasonCategory (remove the # type: ignore).
Ensure any test fixtures or factory calls that construct reason_category use a
valid ReasonCategory member.

infrastructure/metrics/router_metrics.py (1)

46-51: ⚡ Quick win

Prevent mutation of internal metric state via public properties.

Returning internal defaultdict/list exposes mutable state, so callers can modify counters/latency outside record(). Prefer defensive copies (or immutable views).

♻️ Proposed fix

 from collections import defaultdict
 from collections.abc import Mapping
+from types import MappingProxyType
@@
     `@property`
     def decisions_total(self) -> Mapping[tuple[str, str], int]:
-        return self._decisions
+        return MappingProxyType(dict(self._decisions))
@@
     `@property`
     def latency_samples(self) -> list[int]:
-        return self._latency
+        return list(self._latency)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infrastructure/metrics/router_metrics.py` around lines 46 - 51, The public
accessors decisions_total and latency_samples expose internal mutable state
(_decisions and _latency); change them to return defensive copies (e.g., a
shallow copy of the mapping and a copy of the list or an immutable view) so
callers cannot mutate internal counters or latency samples; locate the methods
decisions_total and latency_samples and modify their return values to return
copies (or tuples/frozendict if available) instead of returning _decisions and
_latency directly, leaving record() and internal mutation logic unchanged.

tests/unit/infrastructure/routing/test_prompts_parser.py (1)

110-115: ⚡ Quick win

Add a regression test for reason values containing braces.

Please add a case like "reason": "token {date}" to ensure valid JSON with brace characters inside strings keeps parsing successfully.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/infrastructure/routing/test_prompts_parser.py` around lines 110 -
115, Add a regression test that ensures JSON-encoded reasons containing brace
characters parse correctly: create a new test (e.g.,
test_reason_with_braces_parses) or extend
test_long_reason_is_truncated_not_rejected to set raw =
f'{{"model":"haiku","confidence":0.9,"reason":"token {{date}}"}}' (or similar
with braces) and call parse_classification(raw, latency_ms=10, cost_usd=0.0),
then assert result.reason contains the expected string (and/or length
constraints). Reference parse_classification and the raw variable so the test
verifies parsing of braces inside the "reason" field.

benchmarks/planner_quality_ab.py (1)

438-441: ⚡ Quick win

Normalize classifier overhead precision before writing trial rows.

At Line 439, raw float carry-through creates artifacts like 0.0006969999999999999 in dumped benchmark records. Round once, then reuse the rounded value for classifier_cost_usd and cost_usd.

Proposed patch

-                    row.classifier_cost_usd = cls_cost
+                    row.classifier_cost_usd = round(cls_cost, 6)
                     # Roll the per-goal classifier overhead into the router cost.
-                    row.cost_usd = round(row.cost_usd + cls_cost, 6)
+                    row.cost_usd = round(row.cost_usd + row.classifier_cost_usd, 6)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/planner_quality_ab.py` around lines 438 - 441, The code currently
assigns the raw float cls_cost to row.classifier_cost_usd and then adds the
unrounded cls_cost to row.cost_usd, causing floating-point artifacts; round
cls_cost once (e.g., rounded_cls_cost = round(cls_cost, 6)) and then set
row.classifier_cost_usd = rounded_cls_cost and row.cost_usd = round(row.cost_usd
+ rounded_cls_cost, 6) so both stored values use the same normalized precision;
update the lines handling cls_cost, row.classifier_cost_usd and row.cost_usd
accordingly.

tests/unit/domain/test_council_events_goal_classified.py (1)

38-50: ⚡ Quick win

Test does not actually prove discriminator immutability

The method name says “fixed”, but this only checks the default serialized value. Add a negative-path assertion that invalid kind is rejected at validation time.

Suggested test tightening

+import pytest
+from pydantic import ValidationError
@@
     def test_kind_discriminator_is_fixed(self) -> None:
@@
         dumped = ev.model_dump()
         assert dumped["kind"] == "goal_classified"
+
+        with pytest.raises(ValidationError):
+            GoalClassified.model_validate(
+                {
+                    "kind": "debate_started",
+                    "goal_hash": _hash16(),
+                    "chosen_model": PlannerModel.SONNET,
+                    "confidence": 0.5,
+                    "reason_category": "low_confidence",
+                    "classifier_latency_ms": 180,
+                    "classifier_cost_usd": 0.0002,
+                }
+            )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/domain/test_council_events_goal_classified.py` around lines 38 -
50, The test test_kind_discriminator_is_fixed currently only checks the
serialized default; extend it to assert that the discriminator is immutable by
attempting to create a GoalClassified with an invalid kind and expecting a
validation failure: call the GoalClassified constructor (or model_validate) with
kind set to an incorrect string and assert that pydantic.errors.ValidationError
(or pydantic.ValidationError) is raised; keep the existing positive assertion
(dumped["kind"] == "goal_classified") and add this negative-path assertion to
prove the Literal discriminator rejects invalid values.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/planner_quality_ab.py`:
- Around line 504-506: The JSON dump call using
Path(args.dump).write_text(json.dumps(dump_payload, indent=2,
ensure_ascii=False)) produces non-deterministic key ordering; update the call to
include sort_keys=True so json.dumps(dump_payload, indent=2, ensure_ascii=False,
sort_keys=True) to produce deterministic, stable output for dump_payload and
reduce diff noise.

In `@docs/CONTINUATION.md`:
- Around line 27-28: The doc entry for MORPHIC_PLANNER_ROUTER is incorrect:
update the description to use the config contract values `disabled|enabled`
(default `disabled`, opt-in `enabled`) instead of `remote`/`local`; ensure the
text for MORPHIC_PLANNER_ROUTER in CONTINUATION.md matches the spec and other
environment docs and adjust any nearby examples or usage guidance to reference
`enabled` rather than `remote`.

In `@docs/ENV_VARS.md`:
- Line 51: The doc line for MORPHIC_PLANNER_ROUTER incorrectly states adapter
selection depends only on ANTHROPIC_API_KEY; update the description to explain
that selection uses the MORPHIC_PLANNER_ROUTER setting plus the LOCAL_FIRST flag
and runtime budget signals: when LOCAL_FIRST is true the local adapter is
preferred, otherwise if ANTHROPIC_API_KEY is present the remote adapter is used
unless budget signals force a local fallback, and if no key is present the local
adapter is used; mention these three determinants (MORPHIC_PLANNER_ROUTER,
ANTHROPIC_API_KEY, LOCAL_FIRST) and include a brief note about budget-triggered
overrides.

In `@docs/TECH_DECISIONS.md`:
- Around line 8126-8127: Update the TD reference for the cache metric: locate
the text "cache_hit_rate (TD-189)" and replace TD-189 with the correct decision
ID that documents the cache_hit_rate work (e.g., "cache_hit_rate (TD-190)");
ensure the token "cache_hit_rate" and the old "(TD-189)" are the ones changed so
traceability points to the correct TD entry.

In `@domain/value_objects/council_events.py`:
- Line 74: The goal_hash Field currently only enforces length and can accept
non-hex values; tighten validation on the goal_hash attribute in the
CouncilEvent value object by requiring a 16-character lowercase hex string
(e.g., add a regex constraint or a Pydantic validator for goal_hash such as
validating against ^[0-9a-f]{16}$) so it matches the sha256(goal)[:16] contract;
update the Field/validator on the goal_hash symbol in
domain/value_objects/council_events.py accordingly and ensure error messages
reflect invalid hex input.
- Around line 66-80: GoalClassified is currently a mutable Pydantic model
(inherits BaseModel) but must be immutable; update the class definition for
GoalClassified to make it a frozen/immutable Pydantic v2 model by adding the
Pydantic config (e.g. model_config = {"frozen": True}) inside the class so
instances cannot be mutated after creation; keep all existing fields (goal_hash,
chosen_model, confidence, reason_category, classifier_latency_ms,
classifier_cost_usd, classified_at) and their validators intact while ensuring
immutability for the domain value object.

In `@infrastructure/routing/__init__.py`:
- Around line 1-6: This module's first line is the module docstring but is
missing the required future import; add the line "from __future__ import
annotations" as the very first statement in infrastructure.routing.__init__.py
(before the module docstring) so the file complies with the repo-wide rule; no
other code changes needed—the rest of the module (LLMGoalClassifier,
LocalGoalClassifier, and the GoalClassifierPort-compatible definitions) can
remain unchanged.

In `@infrastructure/routing/_prompts.py`:
- Around line 64-83: The current _extract_json_blob uses the brittle
_FIRST_OBJECT_RE = re.compile(r"\{.*?\}", re.DOTALL) which can stop at a brace
inside a quoted string and cause false ClassificationParseError; replace the
regex-based extraction in _extract_json_blob with a JSON-aware scan: find the
first '{' in cleaned (after applying _THINK_BLOCK_RE and _JSON_FENCE_RE
handling), then iterate char-by-char tracking nesting depth, an in_string flag
and escape characters to find the matching closing brace for the top-level
object, and return that substring; keep raising ClassificationParseError when no
opening brace is found or the scan ends without closing the object, and preserve
existing references to _THINK_BLOCK_RE, _JSON_FENCE_RE and
ClassificationParseError.

In `@shared/config.py`:
- Around line 167-174: The two config fields
planner_router_haiku_confidence_threshold and
planner_router_classifier_timeout_ms accept invalid values; update their Field
definitions to enforce bounds (e.g., set
planner_router_haiku_confidence_threshold to accept only 0.0–1.0 using le=1.0
and ge=0.0, and set planner_router_classifier_timeout_ms to require a positive
integer using gt=0 or ge=1) while preserving their validation_alias names so
invalid runtime values raise validation errors instead of silently
misconfiguring routing.

In `@specs/goal-classifier-router/plan.md`:
- Line 185: The markdown contains a fenced code block that opens with triple
backticks (```) but omits a language hint; update that opening fence to include
a language identifier (for example change ``` to ```text or ```markdown) so the
block is lint-compliant and satisfies markdownlint rules for fenced code block
language specification.

In `@specs/goal-classifier-router/spec.md`:
- Around line 130-133: The markdown table row containing the cell
"sqlalchemy|fastapi|litellm|..." is being parsed as extra columns; update that
cell to escape the pipe characters or wrap the entire regex in backticks and
escape the pipes (e.g., use `sqlalchemy\|fastapi\|litellm\|...`) so the table
renders as a single cell and MD056 is avoided; locate the table row with the
"Framework imports in new domain files" metric and apply this change.

In `@specs/goal-classifier-router/tasks.md`:
- Line 43: The markdown line using sha256(goal)[:16] is being parsed as
reversed-link syntax; update the text so the literal is wrapped in inline code
backticks (e.g., `sha256(goal)[:16]`) where `goal_hash` is described to prevent
MD011 markdownlint errors and ensure the raw goal isn't exposed.
- Line 129: The fenced code block opener currently uses plain backticks (```);
add a language tag (for example ```text) to the code fence so the markdown
linter (MD040) passes and formatting remains stable—update the triple-backtick
opener for the block starting at the current fence to include a language
identifier such as text or bash.

In `@tests/integration/test_goal_classifier_local_live.py`:
- Around line 122-123: The assertion serializes the event with
event.model_dump_json() without deterministic ordering; change the serialization
to use a deterministic JSON dump (e.g., call
event.model_dump_json(sort_keys=True) or otherwise ensure JSON is produced with
sort_keys=True) so the payload variable is stable for the privacy assertion;
update the same pattern in test_goal_classifier_remote_live.py as well.

In `@tests/integration/test_goal_classifier_remote_live.py`:
- Around line 113-114: The assertion uses non-deterministic model_dump_json();
change it to produce deterministic JSON by importing json and serializing the
model dump with json.dumps(..., sort_keys=True) (e.g., replace
event.model_dump_json() with json.dumps(event.model_dump(), sort_keys=True)) so
the payload string is stable for the privacy check that compares goal against
the GoalClassified payload.

In `@tests/unit/domain/test_council_events_goal_classified.py`:
- Line 60: The test uses non-deterministic JSON serialization via raw =
original.model_dump_json(); change these calls to pass deterministic sorting
(e.g., original.model_dump_json(sort_keys=True)) so keys are ordered
consistently; update the three occurrences referenced (the call that assigns raw
at the shown location and the other two occurrences at the same file) to include
sort_keys=True on model_dump_json or, if that method is unavailable in your
model class, replace with json.dumps(original.model_dump(), sort_keys=True).

In `@tests/unit/infrastructure/observability/test_router_observer.py`:
- Around line 9-10: Update the module docstring in
tests/unit/infrastructure/observability/test_router_observer.py to remove the
incorrect claim that inner-bus errors are "swallowed"; instead state that events
are forwarded to the inner bus and that errors from the inner bus are propagated
(this aligns with the test asserting propagation). Reference the module-level
docstring near the RouterObserver/router_observer tests and change the phrase
"Forwards the event to the inner bus (best-effort, errors swallowed)." to
language indicating that inner-bus errors are propagated.

In `@tests/unit/infrastructure/test_llm_planner_router_integration.py`:
- Around line 52-63: The JSON payload helper currently returns json.dumps([...])
without deterministic ordering; modify the call to json.dumps in the helper that
returns the test payload so it passes sort_keys=True (e.g., change json.dumps([
... ]) to json.dumps([ ... ], sort_keys=True)) to ensure deterministic
serialization per repo policy.

---

Nitpick comments:
In `@benchmarks/planner_quality_ab.py`:
- Around line 438-441: The code currently assigns the raw float cls_cost to
row.classifier_cost_usd and then adds the unrounded cls_cost to row.cost_usd,
causing floating-point artifacts; round cls_cost once (e.g., rounded_cls_cost =
round(cls_cost, 6)) and then set row.classifier_cost_usd = rounded_cls_cost and
row.cost_usd = round(row.cost_usd + rounded_cls_cost, 6) so both stored values
use the same normalized precision; update the lines handling cls_cost,
row.classifier_cost_usd and row.cost_usd accordingly.

In `@infrastructure/metrics/router_metrics.py`:
- Around line 46-51: The public accessors decisions_total and latency_samples
expose internal mutable state (_decisions and _latency); change them to return
defensive copies (e.g., a shallow copy of the mapping and a copy of the list or
an immutable view) so callers cannot mutate internal counters or latency
samples; locate the methods decisions_total and latency_samples and modify their
return values to return copies (or tuples/frozendict if available) instead of
returning _decisions and _latency directly, leaving record() and internal
mutation logic unchanged.

In `@tests/unit/domain/test_council_events_goal_classified.py`:
- Around line 38-50: The test test_kind_discriminator_is_fixed currently only
checks the serialized default; extend it to assert that the discriminator is
immutable by attempting to create a GoalClassified with an invalid kind and
expecting a validation failure: call the GoalClassified constructor (or
model_validate) with kind set to an incorrect string and assert that
pydantic.errors.ValidationError (or pydantic.ValidationError) is raised; keep
the existing positive assertion (dumped["kind"] == "goal_classified") and add
this negative-path assertion to prove the Literal discriminator rejects invalid
values.

In `@tests/unit/infrastructure/observability/test_router_observer.py`:
- Around line 26-29: The test helper currently types reason_category as a plain
str with a type: ignore; instead import and use the production enum/type
ReasonCategory and remove the ignore to keep fixtures aligned with runtime
types. Update the imports in test_router_observer.py to include ReasonCategory
(alongside DebateStarted and GoalClassified) and change the helper
variable/parameter named reason_category to be typed as ReasonCategory (remove
the # type: ignore). Ensure any test fixtures or factory calls that construct
reason_category use a valid ReasonCategory member.

In `@tests/unit/infrastructure/routing/test_prompts_parser.py`:
- Around line 110-115: Add a regression test that ensures JSON-encoded reasons
containing brace characters parse correctly: create a new test (e.g.,
test_reason_with_braces_parses) or extend
test_long_reason_is_truncated_not_rejected to set raw =
f'{{"model":"haiku","confidence":0.9,"reason":"token {{date}}"}}' (or similar
with braces) and call parse_classification(raw, latency_ms=10, cost_usd=0.0),
then assert result.reason contains the expected string (and/or length
constraints). Reference parse_classification and the raw variable so the test
verifies parsing of braces inside the "reason" field.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d41bf214-f0bd-41f9-a503-a665a7b3e483

📥 Commits

Reviewing files that changed from the base of the PR and between 706c89d and ba605c1.

📒 Files selected for processing (47)

benchmarks/planner_quality_ab.py
docs/CHANGELOG.md
docs/CONTINUATION.md
docs/ENV_VARS.md
docs/TECH_DECISIONS.md
docs/benchmarks/planner_ab_router_2026_05_20.json
domain/ports/goal_classifier.py
domain/services/planner_model_router.py
domain/value_objects/council_events.py
domain/value_objects/goal_classification.py
domain/value_objects/planner_model.py
infrastructure/fractal/llm_planner.py
infrastructure/metrics/__init__.py
infrastructure/metrics/router_metrics.py
infrastructure/observability/__init__.py
infrastructure/observability/router_observer.py
infrastructure/routing/__init__.py
infrastructure/routing/_prompts.py
infrastructure/routing/llm_goal_classifier.py
infrastructure/routing/local_goal_classifier.py
interface/api/container.py
shared/config.py
specs/goal-classifier-router/plan.md
specs/goal-classifier-router/spec.md
specs/goal-classifier-router/tasks.md
tests/integration/test_goal_classifier_local_live.py
tests/integration/test_goal_classifier_remote_live.py
tests/unit/application/_fakes/in_memory_goal_classifier.py
tests/unit/application/_fakes/test_in_memory_goal_classifier.py
tests/unit/domain/test_council_events_goal_classified.py
tests/unit/domain/test_goal_classification.py
tests/unit/domain/test_goal_classifier_port.py
tests/unit/domain/test_planner_model.py
tests/unit/domain/test_planner_model_router.py
tests/unit/infrastructure/metrics/__init__.py
tests/unit/infrastructure/metrics/test_router_metrics.py
tests/unit/infrastructure/observability/__init__.py
tests/unit/infrastructure/observability/test_router_observer.py
tests/unit/infrastructure/routing/__init__.py
tests/unit/infrastructure/routing/test_llm_goal_classifier.py
tests/unit/infrastructure/routing/test_local_goal_classifier.py
tests/unit/infrastructure/routing/test_prompts_parser.py
tests/unit/infrastructure/test_llm_planner_router_integration.py
tests/unit/interface/api/__init__.py
tests/unit/interface/api/test_container_router_wiring.py
tests/unit/interface/test_fractal_container_wiring.py
tests/unit/shared/test_config_router.py

coderabbitai · 2026-05-20T01:54:57Z

        Path(args.dump).write_text(
-            json.dumps(
-                {
-                    "judge": JUDGE,
-                    "trials": args.trials,
-                    "rows": [r.__dict__ for r in rows],
-                    "summary": {
-                        "sonnet": sonnet_sum.__dict__,
-                        "haiku": haiku_sum.__dict__,
-                    },
-                    "total_cost_usd": round(total_cost, 6),
-                },
-                indent=2,
-                ensure_ascii=False,
-            )
+            json.dumps(dump_payload, indent=2, ensure_ascii=False)
        )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add deterministic key ordering to benchmark JSON dumps.

At Lines 504-506, serialization omits sort_keys=True, which makes dump ordering unstable across runs and creates avoidable diff noise.

Proposed patch

Path(args.dump).write_text( - json.dumps(dump_payload, indent=2, ensure_ascii=False) + json.dumps(dump_payload, indent=2, ensure_ascii=False, sort_keys=True) )

As per coding guidelines, JSON/YAML serialization must use sort_keys=True for deterministic output.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmarks/planner_quality_ab.py` around lines 504 - 506, The JSON dump call using Path(args.dump).write_text(json.dumps(dump_payload, indent=2, ensure_ascii=False)) produces non-deterministic key ordering; update the call to include sort_keys=True so json.dumps(dump_payload, indent=2, ensure_ascii=False, sort_keys=True) to produce deterministic, stable output for dump_payload and reduce diff noise.

coderabbitai · 2026-05-20T01:54:58Z

+existing observability dashboard alongside `cache_hit_rate` (TD-189) so
+that classifier latency, decisions_total, and fallback rate are


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the TD reference for cache_hit_rate follow-up.

Line 8126 references cache_hit_rate (TD-189), but this doc does not include TD-189; cache-hit-rate work appears under TD-188/TD-190/TD-193. Please correct the TD number to avoid traceability confusion.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/TECH_DECISIONS.md` around lines 8126 - 8127, Update the TD reference for the cache metric: locate the text "cache_hit_rate (TD-189)" and replace TD-189 with the correct decision ID that documents the cache_hit_rate work (e.g., "cache_hit_rate (TD-190)"); ensure the token "cache_hit_rate" and the old "(TD-189)" are the ones changed so traceability points to the correct TD entry.

coderabbitai · 2026-05-20T01:54:58Z

+    payload = event.model_dump_json()
+    assert goal not in payload, "Raw goal leaked into GoalClassified payload"


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, find the file and check if it exists find . -name "test_goal_classifier_local_live.py" -type f

Repository: engkimo/open-morphic

Length of output: 119

🏁 Script executed:

# Check the specific lines mentioned in the review git ls-files | grep -i "test_goal_classifier_local_live"

Repository: engkimo/open-morphic

Length of output: 117

🏁 Script executed:

# Get file size to understand if we can read it fully wc -l tests/integration/test_goal_classifier_local_live.py 2>/dev/null || echo "File not found"

Repository: engkimo/open-morphic

Length of output: 121

🏁 Script executed:

cat -n tests/integration/test_goal_classifier_local_live.py

Repository: engkimo/open-morphic

Length of output: 5352

🏁 Script executed:

# Check if there's a remote live test file mentioned in review find . -name "*goal_classifier*remote*" -o -name "*goal_classifier*live*" | head -20

Repository: engkimo/open-morphic

Length of output: 175

🏁 Script executed:

# Search for other uses of model_dump_json() in test files to see patterns rg "model_dump_json" tests/ -A 2 -B 2 --max-count=10

Repository: engkimo/open-morphic

Length of output: 4076

🏁 Script executed:

# Check if json module is imported elsewhere in test files for serialization rg "json\.dumps.*sort_keys" . --type py | head -20

Repository: engkimo/open-morphic

Length of output: 638

🏁 Script executed:

# Verify Pydantic v2 model_dump_json() signature to confirm it doesn't support sort_keys rg "model_dump_json" . --type py -B 2 -A 2 | grep -E "(sort_keys|indent)" | head -10

Repository: engkimo/open-morphic

Length of output: 46

🏁 Script executed:

# Check test_council_events_goal_classified.py to see how it handles similar privacy checks sed -n '50,80p' tests/unit/domain/test_council_events_goal_classified.py

Repository: engkimo/open-morphic

Length of output: 1366

🏁 Script executed:

# Check the remote live test file for the same pattern sed -n '120,125p' tests/integration/test_goal_classifier_remote_live.py

Repository: engkimo/open-morphic

Length of output: 46

🏁 Script executed:

# Check if there are imports of json module in these test files head -50 tests/integration/test_goal_classifier_local_live.py | grep -E "^import|^from"

Repository: engkimo/open-morphic

Length of output: 909

Use deterministic JSON serialization in the privacy assertion.

The payload serialization should explicitly use sort_keys=True to comply with the coding guideline requiring deterministic JSON output.

Proposed fix

+import json ... - payload = event.model_dump_json() + payload = json.dumps( + event.model_dump(mode="json"), + sort_keys=True, + ensure_ascii=False, + )

Per the coding guidelines, all JSON serialization across **/*.py (including tests) must use sort_keys=True for deterministic output. The same issue exists in test_goal_classifier_remote_live.py and should be fixed identically.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

payload = event.model_dump_json()

assert goal not in payload, "Raw goal leaked into GoalClassified payload"

import json

...

payload = json.dumps(

event.model_dump(mode="json"),

sort_keys=True,

ensure_ascii=False,

)

assert goal not in payload, "Raw goal leaked into GoalClassified payload"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/test_goal_classifier_local_live.py` around lines 122 - 123, The assertion serializes the event with event.model_dump_json() without deterministic ordering; change the serialization to use a deterministic JSON dump (e.g., call event.model_dump_json(sort_keys=True) or otherwise ensure JSON is produced with sort_keys=True) so the payload variable is stable for the privacy assertion; update the same pattern in test_goal_classifier_remote_live.py as well.

coderabbitai · 2026-05-20T01:54:58Z

+    payload = event.model_dump_json()
+    assert goal not in payload, "Raw goal leaked into GoalClassified payload"


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, locate the file and verify it exists find . -name "test_goal_classifier_remote_live.py" -type f

Repository: engkimo/open-morphic

Length of output: 120

🏁 Script executed:

# Read the file to check the exact contents at lines 113-114 and surrounding context cat -n ./tests/integration/test_goal_classifier_remote_live.py | head -120 | tail -20

Repository: engkimo/open-morphic

Length of output: 972

🏁 Script executed:

# Check the imports and the beginning of the file, including line 20 cat -n ./tests/integration/test_goal_classifier_remote_live.py | head -30

Repository: engkimo/open-morphic

Length of output: 1209

🏁 Script executed:

# Check for all JSON serialization patterns in the file cat -n ./tests/integration/test_goal_classifier_remote_live.py | grep -E "model_dump|json|JSON"

Repository: engkimo/open-morphic

Length of output: 109

🏁 Script executed:

# Let's check the broader context of the test and also verify if there are other similar patterns in related test files # First check if there's a similar test file (local version) find . -name "*goal_classifier*" -type f | grep -E "\.py$"

Repository: engkimo/open-morphic

Length of output: 613

🏁 Script executed:

# Check the local test file for comparison - look for similar JSON serialization patterns grep -n "model_dump_json\|json.dumps" ./tests/integration/test_goal_classifier_local_live.py | head -20

Repository: engkimo/open-morphic

Length of output: 106

🏁 Script executed:

# Check if there are examples in the codebase where sort_keys=True is used rg "sort_keys\s*=\s*True" --type py -A 2 -B 2

Repository: engkimo/open-morphic

Length of output: 2857

🏁 Script executed:

# Check the Pydantic version and model_dump usage in the codebase rg "model_dump" --type py -B 1 -A 1 | head -40

Repository: engkimo/open-morphic

Length of output: 3044

🏁 Script executed:

# Check if json is already imported in the file grep -n "^import json" ./tests/integration/test_goal_classifier_remote_live.py

Repository: engkimo/open-morphic

Length of output: 46

🏁 Script executed:

# Check the full import section of the file cat -n ./tests/integration/test_goal_classifier_remote_live.py | head -50

Repository: engkimo/open-morphic

Length of output: 2039

🏁 Script executed:

# Verify the exact test function and full context around the issue cat -n ./tests/integration/test_goal_classifier_remote_live.py | sed -n '85,120p'

Repository: engkimo/open-morphic

Length of output: 1538

Add deterministic JSON serialization to the privacy invariant assertion.

The test payload serialization at line 113 must use sort_keys=True per the coding guideline requiring deterministic JSON output. Import json and replace model_dump_json() with an explicit sort.

Proposed fix

+import json import os import pytest

# Privacy invariant: raw goal MUST NOT appear in the event payload. - payload = event.model_dump_json() + payload = json.dumps( + event.model_dump(mode="json"), + sort_keys=True, + ensure_ascii=False, + )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/test_goal_classifier_remote_live.py` around lines 113 - 114, The assertion uses non-deterministic model_dump_json(); change it to produce deterministic JSON by importing json and serializing the model dump with json.dumps(..., sort_keys=True) (e.g., replace event.model_dump_json() with json.dumps(event.model_dump(), sort_keys=True)) so the payload string is stable for the privacy check that compares goal against the GoalClassified payload.

coderabbitai · 2026-05-20T01:54:58Z

+            classifier_latency_ms=1500,
+            classifier_cost_usd=0.0,
+        )
+        raw = original.model_dump_json()


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use deterministic JSON serialization in tests

These serializations should explicitly enforce sorted keys to meet deterministic output requirements.

Deterministic serialization patch

- raw = original.model_dump_json() + raw = json.dumps(original.model_dump(mode="json"), sort_keys=True) @@ - raw = abandoned.model_dump_json() + raw = json.dumps(abandoned.model_dump(mode="json"), sort_keys=True) @@ - payload = json.loads(ev.model_dump_json()) + serialized = json.dumps(ev.model_dump(mode="json"), sort_keys=True) + payload = json.loads(serialized)

As per coding guidelines, "JSON/YAML serialization must use sort_keys=True for deterministic output".

Also applies to: 72-72, 86-86

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/domain/test_council_events_goal_classified.py` at line 60, The test uses non-deterministic JSON serialization via raw = original.model_dump_json(); change these calls to pass deterministic sorting (e.g., original.model_dump_json(sort_keys=True)) so keys are ordered consistently; update the three occurrences referenced (the call that assigns raw at the shown location and the other two occurrences at the same file) to include sort_keys=True on model_dump_json or, if that method is unavailable in your model class, replace with json.dumps(original.model_dump(), sort_keys=True).

coderabbitai · 2026-05-20T01:54:58Z

+3. Forwards the event to the inner bus (best-effort, errors swallowed).
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the error-handling claim in the module docstring.

The docstring says inner-bus errors are swallowed, but this suite validates propagation. Align wording to avoid a misleading contract.

📝 Proposed fix

-3. Forwards the event to the inner bus (best-effort, errors swallowed). +3. Forwards the event to the inner bus (errors propagate).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

3. Forwards the event to the inner bus (best-effort, errors swallowed).

3. Forwards the event to the inner bus (errors propagate).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/infrastructure/observability/test_router_observer.py` around lines 9 - 10, Update the module docstring in tests/unit/infrastructure/observability/test_router_observer.py to remove the incorrect claim that inner-bus errors are "swallowed"; instead state that events are forwarded to the inner bus and that errors from the inner bus are propagated (this aligns with the test asserting propagation). Reference the module-level docstring near the RouterObserver/router_observer tests and change the phrase "Forwards the event to the inner bus (best-effort, errors swallowed)." to language indicating that inner-bus errors are propagated.

coderabbitai · 2026-05-20T01:54:58Z

+    return json.dumps(
+        [
+            {
+                "description": "Step 1: do something",
+                "is_terminal": True,
+                "score": 0.9,
+                "condition": None,
+                "input_artifacts": {},
+                "output_artifacts": {},
+            }
+        ]
+    )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use deterministic JSON serialization in test payload helper.

Add sort_keys=True to keep serialization deterministic per repo policy.

🔧 Proposed fix

def _sample_payload() -> str: return json.dumps( [ { "description": "Step 1: do something", "is_terminal": True, "score": 0.9, "condition": None, "input_artifacts": {}, "output_artifacts": {}, } - ] + ], + sort_keys=True, )

As per coding guidelines, "JSON/YAML serialization must use sort_keys=True for deterministic output".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

return json.dumps(

[

{

"description": "Step 1: do something",

"is_terminal": True,

"score": 0.9,

"condition": None,

"input_artifacts": {},

"output_artifacts": {},

}

]

)

return json.dumps(

[

{

"description": "Step 1: do something",

"is_terminal": True,

"score": 0.9,

"condition": None,

"input_artifacts": {},

"output_artifacts": {},

}

],

sort_keys=True,

)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/infrastructure/test_llm_planner_router_integration.py` around lines 52 - 63, The JSON payload helper currently returns json.dumps([...]) without deterministic ordering; modify the call to json.dumps in the helper that returns the test payload so it passes sort_keys=True (e.g., change json.dumps([ ... ]) to json.dumps([ ... ], sort_keys=True)) to ensure deterministic serialization per repo policy.

- domain/value_objects/council_events: freeze GoalClassified VO + enforce 16-hex pattern on goal_hash (immutability + privacy contract). - infrastructure/routing/_prompts: replace non-greedy regex with balanced brace scanner so JSON strings containing '}' parse correctly. - shared/config: bound planner_router_haiku_confidence_threshold to [0.0, 1.0] and planner_router_classifier_timeout_ms to > 0; reject silent misconfig. - docs/CONTINUATION + docs/ENV_VARS: correct opt-in value to `enabled` (was stale `remote`/`local`) and document adapter-selection priority (LOCAL_FIRST → ANTHROPIC_API_KEY+budget → off). 3,360 unit tests pass; ruff clean.

- specs/goal-classifier-router/tasks.md: mark all 43 boxes complete (T001-T154 shipped via PR #40). - docs/CHANGELOG: enrich Unreleased entry with PR #40, merge date, A/B headline numbers, and pointer to the authoritative memo.

- Promote Unreleased CHANGELOG entry to v0.6.3 (2026-05-22), rolling up 7 PRs since v0.6.2: TD-195 router (#40), plan-evaluator fix (#39), Haiku A/B (#38), Haiku pricing fix (#37), cost-sim harness (#36), TD-189 step 5 CLI (#35). - Bump pyproject + main.py + version consistency test + uv.lock. - Refresh CLAUDE.md version banner to 0.6.3 / 2026-05-22.

Cosmetic-only cleanup of 9 residual Minor items from PR #40 review: - benchmarks/planner_quality_ab.py: add sort_keys=True to JSON dump - docs/TECH_DECISIONS.md: fix cache_hit_rate TD ref (TD-189 → TD-193) - infrastructure/routing/__init__.py: add `from __future__ import annotations` - specs/goal-classifier-router/plan.md: tag fenced block with `text` - specs/goal-classifier-router/spec.md: escape pipes in table cell (MD056) - specs/goal-classifier-router/tasks.md: backtick sha256(...) literal (MD011) + language tag on fenced block (MD040) - tests/unit/domain/test_council_events_goal_classified.py: switch to json.dumps(model_dump(mode="json"), sort_keys=True) in 3 spots - tests/unit/infrastructure/observability/test_router_observer.py: align docstring with actual contract (errors propagate, not swallowed) - tests/unit/infrastructure/test_llm_planner_router_integration.py: add sort_keys=True to _sample_payload helper No behavioral changes. 22 touched unit tests pass; ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

engkimo and others added 18 commits May 19, 2026 23:00

feat(domain): T020-T021 GoalClassifierPort ABC

9e30401

Async port: `classify(goal: str) -> GoalClassification`. Empty or whitespace goal raises ValueError. Pure abstract; impls go in infrastructure/routing/ (LLM + Ollama). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

test(_fakes): T030 InMemoryGoalClassifier port-compliant fake

679c292

refactor(domain): align ReasonCategory Literal with plan.md AD-3 taxo…

5ff44e9

…nomy

feat(domain): T040-T041 PlannerModelRouter with AD-2 gating + AD-3 no…

4a2be8d

…rmalization

feat(infrastructure): T050-T051 shared classifier prompt + parser

ac496a4

feat(config): T090 add planner router settings (MORPHIC_PLANNER_ROUTER)

840b3d3

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

engkimo merged commit 1bcbbe3 into main May 21, 2026
5 of 6 checks passed

engkimo deleted the feature/goal-classifier-router branch May 21, 2026 06:01

		existing observability dashboard alongside `cache_hit_rate` (TD-189) so
		that classifier latency, decisions_total, and fallback rate are

		payload = event.model_dump_json()
		assert goal not in payload, "Raw goal leaked into GoalClassified payload"

		3. Forwards the event to the inner bus (best-effort, errors swallowed).

Conversation

engkimo commented May 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Spec, plan, memo

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

engkimo commented May 20, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 20, 2026 •

edited

Loading