fix(plan-evaluator): unwrap arbitrarily-nested JSON list payloads by engkimo · Pull Request #39 · engkimo/open-morphic

engkimo · 2026-05-19T03:19:32Z

Summary

Replace single-level if guard with while loop to unwrap any depth of nested-list JSON returned by the judge model
Raise ValueError on non-dict so the existing exception handler routes cleanly into the fallback path
Add regression test test_nested_list_payload_unwrapped covering the [[{...}]] shape observed live

Context

During the Sonnet-vs-Haiku planner A/B benchmark (see memory/haiku_planner_ab_2026_05_19.md), the Sonnet judge returned [[{...}]] on 3/60 calls. The existing if isinstance(data, list) only popped one layer, then data.get raised AttributeError and the call fell back to (0.5, 0.5, 1.0) — marginally hurting the Haiku score in the A/B.

Test plan

pytest tests/unit/infrastructure/test_llm_plan_evaluator.py -v — 17/17 pass (including new regression)
ruff check infrastructure/fractal/llm_plan_evaluator.py tests/unit/infrastructure/test_llm_plan_evaluator.py — clean
CI green

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

Bug Fixes
- Enhanced plan evaluation JSON parsing to properly unwrap nested array structures and enforce stricter validation of decoded JSON objects, improving reliability when handling complex LLM responses.
Tests
- Added regression test to verify correct extraction and processing of plan evaluation scores from nested JSON array payloads.

Live A/B run (haiku_planner_ab_2026_05_19) observed the judge model returning `[[{...}]]` 3 times. The single-level guard only popped one layer, then `data.get` blew up on the inner list and the call fell back to (0.5, 0.5, 1.0). Replace `if` with `while` so any depth is unwrapped, and raise on non-dict so the existing exception handler takes the fallback path explicitly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai · 2026-05-19T03:19:43Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f76ba764-eaa9-46e8-b480-a00af65769e5

📥 Commits

Reviewing files that changed from the base of the PR and between 989eb6c and 23a9295.

📒 Files selected for processing (2)

infrastructure/fractal/llm_plan_evaluator.py
tests/unit/infrastructure/test_llm_plan_evaluator.py

📝 Walkthrough

Walkthrough

LLMPlanEvaluator._parse_evaluation now loops through and unwraps nested JSON arrays to extract the first element, then validates the result is a dictionary, raising ValueError if not. A regression test confirms nested-list JSON payloads (e.g., [[{...}]]) are properly unwrapped and evaluated without fallback behavior.

Changes

JSON array unwrapping and validation

Layer / File(s)	Summary
JSON array unwrapping logic and validation test `infrastructure/fractal/llm_plan_evaluator.py`, `tests/unit/infrastructure/test_llm_plan_evaluator.py`	`_parse_evaluation` adds loop-based unwrapping of nested JSON arrays and enforces dict validation with `ValueError` on mismatch. Regression test `test_nested_list_payload_unwrapped` verifies nested-list JSON (`[[{...}]]`) is unwrapped and scores extracted without fallback triggering.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

A nested array wrapped tight,
Now unwrapped to see the light,
Validation guards the parsed way,
No more fallback on malformed spray! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately and concisely describes the main change: fixing the plan evaluator to handle arbitrarily-nested JSON list payloads, which aligns directly with the core logic changes (replacing single-level if guard with while loop for unwrapping nested lists).
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/plan-evaluator-nested-list

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

- Promote Unreleased CHANGELOG entry to v0.6.3 (2026-05-22), rolling up 7 PRs since v0.6.2: TD-195 router (#40), plan-evaluator fix (#39), Haiku A/B (#38), Haiku pricing fix (#37), cost-sim harness (#36), TD-189 step 5 CLI (#35). - Bump pyproject + main.py + version consistency test + uv.lock. - Refresh CLAUDE.md version banner to 0.6.3 / 2026-05-22.

engkimo merged commit 706c89d into main May 19, 2026
6 checks passed

engkimo deleted the fix/plan-evaluator-nested-list branch May 19, 2026 03:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(plan-evaluator): unwrap arbitrarily-nested JSON list payloads#39

fix(plan-evaluator): unwrap arbitrarily-nested JSON list payloads#39
engkimo merged 1 commit into
mainfrom
fix/plan-evaluator-nested-list

engkimo commented May 19, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 19, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

engkimo commented May 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Context

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

engkimo commented May 19, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 19, 2026 •

edited

Loading