fix(plan-evaluator): unwrap arbitrarily-nested JSON list payloads #39
Live A/B run (haiku_planner_ab_2026_05_19) observed the judge model
returning `[[{...}]]` 3 times. The single-level guard only popped one
layer, then `data.get` blew up on the inner list and the call fell
back to (0.5, 0.5, 1.0). Replace `if` with `while` so any depth is
unwrapped, and raise on non-dict so the existing exception handler
takes the fallback path explicitly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
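
A minimal sketch of the difference between the two guards, assuming the judge reply has already been parsed with `json.loads`. The helper names `unwrap_once` and `unwrap_all`, the `score` key, and the handling of empty lists are assumptions for illustration; they are not taken from `llm_plan_evaluator.py`.

```python
import json


def unwrap_once(data):
    """Old behaviour (sketch): pop at most one list layer."""
    if isinstance(data, list) and data:
        data = data[0]
    return data


def unwrap_all(data):
    """New behaviour (sketch): pop list layers until a non-list remains."""
    while isinstance(data, list) and data:
        data = data[0]
    if not isinstance(data, dict):
        # Raising here lets the caller's existing exception handler
        # take the fallback path explicitly.
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data


payload = json.loads('[[{"score": 0.9}]]')  # hypothetical judge output
print(type(unwrap_once(payload)).__name__)  # still a list, so .get would fail
print(unwrap_all(payload))                  # the inner dict
```

With the single-level guard the caller is left holding a list, which is why `data.get` failed; the loop keeps popping until it reaches the object or something that is not a dict.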
- Promote Unreleased CHANGELOG entry to v0.6.3 (2026-05-22), rolling up 7 PRs since v0.6.2: TD-195 router (#40), plan-evaluator fix (#39), Haiku A/B (#38), Haiku pricing fix (#37), cost-sim harness (#36), TD-189 step 5 CLI (#35).
- Bump pyproject + main.py + version consistency test + uv.lock.
- Refresh CLAUDE.md version banner to 0.6.3 / 2026-05-22.
Summary
- Replace the `if` guard with a `while` loop to unwrap any depth of nested-list JSON returned by the judge model
- Raise `ValueError` on non-dict so the existing exception handler routes cleanly into the fallback path
- Add `test_nested_list_payload_unwrapped` covering the `[[{...}]]` shape observed live

Context
During the Sonnet-vs-Haiku planner A/B benchmark (see `memory/haiku_planner_ab_2026_05_19.md`), the Sonnet judge returned `[[{...}]]` on 3/60 calls. The existing `if isinstance(data, list)` only popped one layer, then `data.get` raised `AttributeError` and the call fell back to (0.5, 0.5, 1.0), marginally hurting the Haiku score in the A/B.

Test plan
- `pytest tests/unit/infrastructure/test_llm_plan_evaluator.py -v`: 17/17 pass (including new regression)
- `ruff check infrastructure/fractal/llm_plan_evaluator.py tests/unit/infrastructure/test_llm_plan_evaluator.py`: clean

🤖 Generated with Claude Code
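
The fallback routing described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `llm_plan_evaluator.py`: the function name `evaluate`, the keys `score_a`, `score_b`, `score_c`, and the exact exceptions caught are hypothetical; only the fallback tuple (0.5, 0.5, 1.0) and the `[[{...}]]` shape come from the PR.

```python
import json

FALLBACK = (0.5, 0.5, 1.0)  # fallback tuple quoted in the PR description


def evaluate(raw: str) -> tuple[float, float, float]:
    """Parse a judge reply; return FALLBACK on any malformed payload (sketch)."""
    try:
        data = json.loads(raw)  # json.JSONDecodeError subclasses ValueError
        while isinstance(data, list) and data:
            data = data[0]
        if not isinstance(data, dict):
            raise ValueError("judge payload is not a JSON object")
        # Hypothetical field names; the real evaluator's keys are not shown in the PR.
        return (
            float(data["score_a"]),
            float(data["score_b"]),
            float(data["score_c"]),
        )
    except (ValueError, KeyError, TypeError):
        return FALLBACK


def check_nested_list_payload() -> None:
    """Regression-style check for the doubly nested shape observed live."""
    raw = '[[{"score_a": 0.9, "score_b": 0.8, "score_c": 0.7}]]'
    assert evaluate(raw) == (0.9, 0.8, 0.7)
    # A payload that never reaches a dict takes the fallback path.
    assert evaluate('[["oops"]]') == FALLBACK


check_nested_list_payload()
```

Raising `ValueError` inside the `try` keeps a single exit point for every malformed shape, so a nested list no longer depends on an incidental `AttributeError` to reach the fallback.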