Skip to content

fix(plan-evaluator): unwrap arbitrarily-nested JSON list payloads#39

Merged
engkimo merged 1 commit into
mainfrom
fix/plan-evaluator-nested-list
May 19, 2026
Merged

fix(plan-evaluator): unwrap arbitrarily-nested JSON list payloads#39
engkimo merged 1 commit into
mainfrom
fix/plan-evaluator-nested-list

Conversation

@engkimo

@engkimo engkimo commented May 19, 2026

Copy link
Copy Markdown
Owner

Summary

  • Replace single-level if guard with while loop to unwrap any depth of nested-list JSON returned by the judge model
  • Raise ValueError on non-dict so the existing exception handler routes cleanly into the fallback path
  • Add regression test test_nested_list_payload_unwrapped covering the [[{...}]] shape observed live

Context

During the Sonnet-vs-Haiku planner A/B benchmark (see memory/haiku_planner_ab_2026_05_19.md), the Sonnet judge returned [[{...}]] on 3/60 calls. The existing if isinstance(data, list) only popped one layer, then data.get raised AttributeError and the call fell back to (0.5, 0.5, 1.0) — marginally hurting the Haiku score in the A/B.

Test plan

  • pytest tests/unit/infrastructure/test_llm_plan_evaluator.py -v — 17/17 pass (including new regression)
  • ruff check infrastructure/fractal/llm_plan_evaluator.py tests/unit/infrastructure/test_llm_plan_evaluator.py — clean
  • CI green

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Enhanced plan evaluation JSON parsing to properly unwrap nested array structures and enforce stricter validation of decoded JSON objects, improving reliability when handling complex LLM responses.
  • Tests

    • Added regression test to verify correct extraction and processing of plan evaluation scores from nested JSON array payloads.

Review Change Stack

Live A/B run (haiku_planner_ab_2026_05_19) observed the judge model
returning `[[{...}]]` 3 times. The single-level guard only popped one
layer, then `data.get` blew up on the inner list and the call fell
back to (0.5, 0.5, 1.0). Replace `if` with `while` so any depth is
unwrapped, and raise on non-dict so the existing exception handler
takes the fallback path explicitly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 19, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f76ba764-eaa9-46e8-b480-a00af65769e5

📥 Commits

Reviewing files that changed from the base of the PR and between 989eb6c and 23a9295.

📒 Files selected for processing (2)
  • infrastructure/fractal/llm_plan_evaluator.py
  • tests/unit/infrastructure/test_llm_plan_evaluator.py

📝 Walkthrough

Walkthrough

LLMPlanEvaluator._parse_evaluation now loops through and unwraps nested JSON arrays to extract the first element, then validates the result is a dictionary, raising ValueError if not. A regression test confirms nested-list JSON payloads (e.g., [[{...}]]) are properly unwrapped and evaluated without fallback behavior.

Changes

JSON array unwrapping and validation

Layer / File(s) Summary
JSON array unwrapping logic and validation test
infrastructure/fractal/llm_plan_evaluator.py, tests/unit/infrastructure/test_llm_plan_evaluator.py
_parse_evaluation adds loop-based unwrapping of nested JSON arrays and enforces dict validation with ValueError on mismatch. Regression test test_nested_list_payload_unwrapped verifies nested-list JSON ([[{...}]]) is unwrapped and scores extracted without fallback triggering.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

A nested array wrapped tight,
Now unwrapped to see the light,
Validation guards the parsed way,
No more fallback on malformed spray! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately and concisely describes the main change: fixing the plan evaluator to handle arbitrarily-nested JSON list payloads, which aligns directly with the core logic changes (replacing single-level if guard with while loop for unwrapping nested lists).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/plan-evaluator-nested-list

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@engkimo engkimo merged commit 706c89d into main May 19, 2026
6 checks passed
@engkimo engkimo deleted the fix/plan-evaluator-nested-list branch May 19, 2026 03:24
engkimo added a commit that referenced this pull request May 22, 2026
- Promote Unreleased CHANGELOG entry to v0.6.3 (2026-05-22), rolling up
  7 PRs since v0.6.2: TD-195 router (#40), plan-evaluator fix (#39),
  Haiku A/B (#38), Haiku pricing fix (#37), cost-sim harness (#36),
  TD-189 step 5 CLI (#35).
- Bump pyproject + main.py + version consistency test + uv.lock.
- Refresh CLAUDE.md version banner to 0.6.3 / 2026-05-22.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant