Codex/orchestrator r3 memory by samsja · Pull Request #2853 · PrimeIntellect-ai/prime-rl

samsja · 2026-06-22T18:29:11Z

Note

Medium Risk
Touches orchestrator batching, transport, filters, and monitoring on compacted trajectories; production paths change memory/release behavior but debug flags are opt-in.

Overview
Adds orchestrator debug (orchestrator.debug) so you can stress the rollout pipeline without full inference/trainer stacks: optional no-op inference, no trainer (noop rollout transport + local policy version bumps), fake tokenizer, and RSS logging. The RL launcher skips inference/trainer processes and GPU allocation when those flags are set.
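
As a rough illustration of how opt-in debug flags like these could gate which processes a launcher starts, here is a minimal sketch. The class name `OrchestratorDebug`, its field names, and `processes_to_launch` are all hypothetical and are not taken from the PR; the real `orchestrator.debug` schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrchestratorDebug:
    """Hypothetical debug flags; every flag is off by default (opt-in)."""

    noop_inference: bool = False  # skip real inference calls
    no_trainer: bool = False      # no trainer process; bump policy version locally
    fake_tokenizer: bool = False  # avoid loading a real tokenizer
    log_rss: bool = False         # log process resident set size


def processes_to_launch(debug: OrchestratorDebug) -> list[str]:
    """Return the process names a launcher would start for these flags."""
    procs = ["orchestrator"]
    if not debug.noop_inference:
        procs.append("inference")
    if not debug.no_trainer:
        procs.append("trainer")
    return procs
```

With both `noop_inference` and `no_trainer` set, only the orchestrator remains, which mirrors the description of skipping inference/trainer processes and GPU allocation.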

Memory-focused changes for heavy R3 / long trajectories: after interleaving, raw trajectory token arrays and routed-expert payloads are compacted to length/metadata summaries; batches drop held references after send/logging; malloc_trim and optional process memory logging run per step. Training samples can carry a scalar completion_temperature instead of per-token lists.
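
The compaction and trim steps described above might look roughly like the following sketch. The field names (`prompt_ids`, `completion_ids`, `routed_experts`) and both helper names are assumptions made for illustration; only `malloc_trim` itself is a real glibc function, and the helper falls back to a no-op on platforms that lack it.

```python
import ctypes
import ctypes.util

# Hypothetical names of the bulky per-token fields to summarize.
BULKY_FIELDS = ("prompt_ids", "completion_ids", "routed_experts")


def compact_trajectory(step: dict) -> dict:
    """Replace bulky per-token arrays with `<name>_len` summaries."""
    compact = {}
    for key, value in step.items():
        if key in BULKY_FIELDS and value is not None:
            compact[f"{key}_len"] = len(value)
        else:
            compact[key] = value
    return compact


def trim_heap() -> bool:
    """Ask glibc to return freed heap pages to the OS.

    Returns True if malloc_trim was called, False if it is unavailable.
    """
    name = ctypes.util.find_library("c")
    if name is None:
        return False
    try:
        trim = ctypes.CDLL(name).malloc_trim
    except (OSError, AttributeError):
        return False  # non-glibc platform
    trim.argtypes = [ctypes.c_size_t]
    trim.restype = ctypes.c_int
    trim(0)
    return True
```

Dropping the Python references alone does not guarantee that resident memory shrinks, because the allocator may keep freed pages; that is why a trim call after each step is a plausible companion to compaction.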

Compatibility updates so behavior stays correct on compacted payloads: rollout filters read from rollout.samples when raw tokens are pruned; monitors and length helpers understand *_len fields; trainer packing accepts compact temperatures.
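
To make the compatibility idea concrete, the sketch below shows one way a length helper could accept either a raw array or its `*_len` summary, and how a scalar temperature could be expanded back to per-token values at packing time. The helper names `field_len` and `expand_temperature` are hypothetical and not taken from the PR.

```python
from typing import Sequence, Union


def field_len(record: dict, name: str) -> int:
    """Length of `name`, read from the raw array or from `<name>_len`."""
    value = record.get(name)
    if value is not None:
        return len(value)
    return int(record[f"{name}_len"])


def expand_temperature(
    temperature: Union[float, Sequence[float]], num_tokens: int
) -> list[float]:
    """Accept a scalar or a per-token list and return a per-token list."""
    if isinstance(temperature, (int, float)):
        return [float(temperature)] * num_tokens
    temps = [float(t) for t in temperature]
    if len(temps) != num_tokens:
        raise ValueError("per-token temperature length does not match token count")
    return temps
```

Callers written against a helper like `field_len` do not need to know whether a payload has already been compacted.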

Ships a fake-r3-trajectory debug env (deterministic long multi-turn rollouts with optional routed experts) wired into the envs extra for memory repro/testing.
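
A deterministic fake rollout generator in the spirit of this debug env could be sketched as follows. This is not the env shipped in the PR; the function name, parameters, and output layout are assumptions chosen to show how seeding makes long multi-turn rollouts reproducible.

```python
import random


def fake_trajectory(
    seed: int,
    num_turns: int = 3,
    tokens_per_turn: int = 8,
    vocab_size: int = 1000,
    num_experts: int = 0,
    top_k: int = 2,
) -> list[dict]:
    """Build a multi-turn rollout whose contents depend only on the arguments."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    turns = []
    for turn in range(num_turns):
        token_ids = [rng.randrange(vocab_size) for _ in range(tokens_per_turn)]
        turn_data = {"turn": turn, "token_ids": token_ids}
        if num_experts > 0:
            # Optional routed-expert payload: top_k distinct expert ids per token.
            turn_data["routed_experts"] = [
                rng.sample(range(num_experts), top_k) for _ in token_ids
            ]
        turns.append(turn_data)
    return turns
```

Because the same seed always yields the same payload sizes, memory measurements from repeated runs are directly comparable.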

Reviewed by Cursor Bugbot for commit 10ed89d. Bugbot is set up for automated code reviews on this repo.

samsja added 8 commits June 13, 2026 22:46

add orchestrator fake R3 debug mode

80b0a66

prune duplicate train rollout payloads

52b9ed8

release train batch memory after send

88b9f20

prune raw routed payloads during interleave

0cfa7fa

Make train memory cleanup unconditional

54ad443

Polish train trace pruning review feedback

b016e2b

Support compacted token lengths in advantage shaping

391f6c9

Merge remote-tracking branch 'origin/main' into codex/orchestrator-r3…

10ed89d

…-memory # Conflicts: # src/prime_rl/trainer/batch.py # tests/unit/orchestrator/test_advantage.py # tests/unit/orchestrator/test_batch.py

samsja marked this pull request as ready for review June 22, 2026 19:11

samsja wants to merge 8 commits into main from codex/orchestrator-r3-memory
