fix: clamp grader scores to open interval (0, 1) to pass Phase 2 validation
The OpenEnv validator requires task scores to be strictly between 0 and 1.
ScoreGrader was returning exactly 0.0 (empty trajectory or all-negative
rewards) and exactly 1.0 (perfect agent), causing all three tasks to fail
the score-range check and the "at least 3 tasks with graders" check.
Changed clamping bounds from [0.0, 1.0] to [_SCORE_MIN, _SCORE_MAX]
where _SCORE_MIN = 1e-9 and _SCORE_MAX = 1 - 1e-9.
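
A minimal sketch of the clamping described above, for illustration only.
The constants _SCORE_MIN and _SCORE_MAX come from this commit; the helper
name clamp_score is hypothetical and may not match what ScoreGrader
actually does internally.

```python
# Bounds taken from the commit message; both lie strictly inside (0, 1).
_SCORE_MIN = 1e-9
_SCORE_MAX = 1 - 1e-9


def clamp_score(raw: float) -> float:
    """Clamp a raw score into [_SCORE_MIN, _SCORE_MAX].

    Hypothetical helper: the result is never exactly 0.0 or 1.0,
    so it satisfies a strict 0 < score < 1 check.
    """
    return max(_SCORE_MIN, min(_SCORE_MAX, raw))


if __name__ == "__main__":
    # Edge cases named in the commit: empty or all-negative trajectory, perfect agent.
    for raw in (-3.0, 0.0, 0.5, 1.0):
        print(raw, clamp_score(raw))
```

Scores already inside the bounds pass through unchanged; only the
endpoints and out-of-range values are moved.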
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>