Halt after `max_consecutive_errored_rollouts` by rasdani · Pull Request #2835 · PrimeIntellect-ai/prime-rl

rasdani · 2026-06-18T14:57:23Z

Summary

Add orchestrator.max_consecutive_errored_rollouts with default 10 and None disable support.

Stop the training after a configurable number of rollouts failed in a row.

Prevents spamming sandboxes on a persistent sandbox service outage.

Validation

uv run pytest tests/unit/orchestrator/test_orchestrator_error_guard.py
uv run pytest tests/unit/test_configs.py -k max_consecutive_errored_rollouts
uv run ruff check docs/training.md packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py src/prime_rl/orchestrator/orchestrator.py tests/unit/test_configs.py tests/unit/orchestrator/test_orchestrator_error_guard.py
git diff --check

Known Non-Scope Failure

uv run pytest tests/unit/test_configs.py currently fails at test_shared_output_dir_propagates_through_cli; that existing CLI --output-dir propagation issue is intentionally not fixed in this PR.

Note

Low Risk
Localized orchestrator safety rail with default-on threshold and disable via None; no auth or training math changes.

Overview
Adds orchestrator.max_consecutive_errored_rollouts (default 10, None to disable) so long runs fail fast when env rollouts keep dying after verifiers retries are exhausted.

The orchestrator counts consecutive terminal errored train and eval rollouts in the main loop before TrainSink / EvalSink handling; successful rollouts reset the streak, and synthetic Cancelled markers from off-policy cleanup are ignored. Hitting the limit raises RuntimeError with a warning log per error. Docs and config validation tests cover the new knob; unit tests exercise streak logic and main-loop ordering.

^{Reviewed by Cursor Bugbot for commit 749b193. Bugbot is set up for automated code reviews on this repo. Configure here.}

mikasenghaas

hmm looks a bit dangerous, esp the default. in prod we have had hours of outages and then training resumed as the service came back which was acc quite nice

mikasenghaas · 2026-06-18T18:36:35Z

mikasenghaas · 2026-06-18T18:36:43Z

mikasenghaas · 2026-06-18T18:37:11Z

    max_off_policy_steps: int = Field(8, ge=0)
    """Maximum policies allowed to generate a single rollout. Rollouts generated more than ``max_off_policy_steps`` ahead of training are discarded. Higher values yield better throughput at the cost of off-policy noise."""

+    max_consecutive_errored_rollouts: int | None = Field(10, ge=1)


i think 10 is a bit rough here?

scared of cancelling runs on intermittent errors

what do you suggest?

samsja

can we remove tests ?

rasdani · 2026-06-18T22:58:11Z

@codex remove all tests this PR added

chatgpt-codex-connector · 2026-06-18T23:00:30Z

Summary

Removed the config tests added for max_consecutive_errored_rollouts, restoring the surrounding tests/unit/test_configs.py flow from trainer token-export coverage directly into existing inference DP-rank config coverage. tests/unit/test_configs.pyL170-L176
Deleted tests/unit/orchestrator/test_orchestrator_error_guard.py, which contained the orchestrator error-guard tests requested for removal.
Committed the changes on the current branch: 209b807 Remove rollout error guard tests.
Created a PR record titled: Remove tests added by rollout error guard PR.

Testing

✅ git diff --check
⚠️ uv run ruff check tests/unit/test_configs.py — could not run because uv failed dependency resolution for local dependency deps/research-environments/environments/aime2024 after warning about parsing exclude-newer = "7 days".

View task →

Halt on consecutive errored rollouts

749b193

rasdani force-pushed the halt-consecutive-errored-rollouts branch from a668787 to 749b193 Compare June 18, 2026 15:13

rasdani changed the title ~~Halt on consecutive errored rollouts~~ Halt after max_consecutive_errored_rollouts Jun 18, 2026

rasdani marked this pull request as ready for review June 18, 2026 15:15

rasdani requested review from mikasenghaas and samsja June 18, 2026 15:17

mikasenghaas reviewed Jun 18, 2026

View reviewed changes

samsja reviewed Jun 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Halt after `max_consecutive_errored_rollouts`#2835

Halt after `max_consecutive_errored_rollouts`#2835
rasdani wants to merge 1 commit into
mainfrom
halt-consecutive-errored-rollouts

rasdani commented Jun 18, 2026 •

edited

Loading

Uh oh!

mikasenghaas left a comment •

edited

Loading

Uh oh!

mikasenghaas Jun 18, 2026

Uh oh!

mikasenghaas Jun 18, 2026

Uh oh!

mikasenghaas Jun 18, 2026

Uh oh!

mikasenghaas Jun 18, 2026

Uh oh!

rasdani Jun 18, 2026

Uh oh!

samsja left a comment

Uh oh!

rasdani commented Jun 18, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

rasdani commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Known Non-Scope Failure

Uh oh!

mikasenghaas left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mikasenghaas Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

mikasenghaas Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

mikasenghaas Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

mikasenghaas Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

rasdani Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

samsja left a comment

Choose a reason for hiding this comment

Uh oh!

rasdani commented Jun 18, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rasdani commented Jun 18, 2026 •

edited

Loading

mikasenghaas left a comment •

edited

Loading