Codex/orchestrator r3 memory #2853

Open
samsja wants to merge 8 commits into main from codex/orchestrator-r3-memory


samsja commented Jun 22, 2026


Note

Medium Risk
Touches orchestrator batching, transport, filters, and monitoring on compacted trajectories. Production code paths change memory-release behavior, while the debug flags are opt-in.

Overview
Adds an orchestrator debug mode (orchestrator.debug) for stress-testing the rollout pipeline without the full inference and trainer stacks. It offers optional no-op inference, a no-trainer mode (a no-op rollout transport plus local policy-version bumps), a fake tokenizer, and RSS logging. When these flags are set, the RL launcher skips the inference/trainer processes and their GPU allocation.
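A minimal sketch of how opt-in debug flags could gate what the launcher starts. The field names (noop_inference, no_trainer, fake_tokenizer, log_rss), the plan_processes helper, and LocalPolicyVersion are hypothetical illustrations, not the PR's actual orchestrator.debug schema or launcher code.

```python
from dataclasses import dataclass, field


@dataclass
class OrchestratorDebugConfig:
    # Hypothetical field names; the real orchestrator.debug schema may differ.
    noop_inference: bool = False  # return dummy completions instead of calling an inference server
    no_trainer: bool = False      # no-op rollout transport; policy version is bumped locally
    fake_tokenizer: bool = False  # skip loading a real tokenizer
    log_rss: bool = False         # log process RSS every step


@dataclass
class LaunchPlan:
    processes: list = field(default_factory=list)
    gpus: dict = field(default_factory=dict)


def plan_processes(debug: OrchestratorDebugConfig, inference_gpus: int, trainer_gpus: int) -> LaunchPlan:
    """Decide which processes to start and how many GPUs each gets (hypothetical helper)."""
    plan = LaunchPlan(processes=["orchestrator"])
    if not debug.noop_inference:
        plan.processes.append("inference")
        plan.gpus["inference"] = inference_gpus
    if not debug.no_trainer:
        plan.processes.append("trainer")
        plan.gpus["trainer"] = trainer_gpus
    return plan


class LocalPolicyVersion:
    """Stand-in for trainer checkpoints in no-trainer mode: the version advances once per step."""

    def __init__(self) -> None:
        self.version = 0

    def on_step_done(self) -> int:
        self.version += 1
        return self.version


plan = plan_processes(OrchestratorDebugConfig(noop_inference=True, no_trainer=True), 4, 4)
print(plan.processes)  # ['orchestrator']
```

With both flags set, only the orchestrator is planned and no GPUs are reserved, which is what allows the rollout pipeline to be stressed on its own.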

Memory-focused changes for heavy R3 / long-trajectory runs: after interleaving, raw trajectory token arrays and routed-expert payloads are compacted into length/metadata summaries; batches drop their held references after sending and logging; and malloc_trim plus optional process-memory logging run every step. Training samples can carry a scalar completion_temperature instead of a per-token list.
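A rough sketch of the compaction and per-step release ideas, assuming trajectories are plain dicts. The field names in HEAVY_FIELDS, the completion_temperatures key, and the helpers compact_trajectory, release_memory, and current_rss_bytes are hypothetical. malloc_trim is a real glibc function, reached here through ctypes on a best-effort basis, and the RSS reader relies on Linux's /proc/self/statm.

```python
import ctypes
import ctypes.util
import gc
import os
from typing import Any, Optional

# Hypothetical names for large per-token payloads carried by a trajectory.
HEAVY_FIELDS = ("prompt_ids", "completion_ids", "completion_logprobs", "routed_experts")


def compact_trajectory(traj: dict[str, Any]) -> dict[str, Any]:
    """Replace large per-token arrays with `<name>_len` summaries (does not mutate `traj`)."""
    compact = {k: v for k, v in traj.items() if k not in HEAVY_FIELDS}
    for name in HEAVY_FIELDS:
        if name in traj:
            compact[f"{name}_len"] = len(traj[name])
    # Collapse per-token temperatures to one scalar when every token used the same value.
    temps = compact.pop("completion_temperatures", None)
    if temps is not None:
        if temps and all(t == temps[0] for t in temps):
            compact["completion_temperature"] = float(temps[0])
        else:
            compact["completion_temperatures"] = temps  # keep the list when values differ
    return compact


def release_memory() -> bool:
    """Best effort: collect garbage, then ask glibc to hand freed heap pages back to the OS."""
    gc.collect()
    libc_path = ctypes.util.find_library("c")
    if libc_path is None:
        return False
    try:
        libc = ctypes.CDLL(libc_path)
    except OSError:
        return False
    malloc_trim = getattr(libc, "malloc_trim", None)  # glibc-specific; absent on e.g. macOS
    if malloc_trim is None:
        return False
    malloc_trim(0)  # pad=0: leave only the minimum free space at the top of the heap
    return True


def current_rss_bytes() -> Optional[int]:
    """Resident set size read from /proc/self/statm (Linux only); None elsewhere."""
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")
```

In this sketch the caller is expected to drop its own references to the raw trajectory (for example with `del`) after compaction and sending; otherwise gc.collect() and malloc_trim have nothing to reclaim.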

Compatibility updates so behavior stays correct on compacted payloads: rollout filters read from rollout.samples when the raw tokens have been pruned; monitors and length helpers understand the *_len fields; and trainer packing accepts compact (scalar) temperatures.
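A minimal sketch of what "understanding compacted payloads" can mean for helpers downstream of compaction. The payload keys (completion_ids, completion_ids_len, samples) and the functions field_len, completion_token_ids, and expand_temperatures are hypothetical illustrations, not the PR's actual filter, monitor, or packing code.

```python
from typing import Any, Sequence, Union


def field_len(payload: dict[str, Any], name: str) -> int:
    """Length of a per-token field, whether still raw or compacted to `<name>_len`."""
    if name in payload:
        return len(payload[name])
    if f"{name}_len" in payload:
        return int(payload[f"{name}_len"])
    raise KeyError(f"neither {name!r} nor {name + '_len'!r} is present")


def completion_token_ids(rollout: dict[str, Any]) -> list[int]:
    """Token ids a filter should inspect; fall back to rollout['samples'] once raw arrays are pruned."""
    if "completion_ids" in rollout:
        return list(rollout["completion_ids"])
    ids: list[int] = []
    for sample in rollout.get("samples", []):
        ids.extend(sample["completion_ids"])
    return ids


def expand_temperatures(temperature: Union[float, Sequence[float]], num_tokens: int) -> list[float]:
    """Accept either a scalar completion temperature or a per-token list when packing."""
    if isinstance(temperature, (int, float)):
        return [float(temperature)] * num_tokens
    temps = [float(t) for t in temperature]
    if len(temps) != num_tokens:
        raise ValueError(f"expected {num_tokens} temperatures, got {len(temps)}")
    return temps
```

Expanding the scalar only at packing time keeps the per-sample payload small while the trainer still receives one temperature per token.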

Ships a fake-r3-trajectory debug environment (deterministic, long multi-turn rollouts with optional routed experts), wired into the envs extra for memory reproduction and testing.
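A small sketch of a deterministic fake trajectory generator in the spirit of this environment. The function name, its parameters, and the routed-experts shape (tokens × layers × top-k expert ids) are assumptions for illustration, not the environment's actual interface.

```python
import random
from typing import Any, Optional


def fake_r3_trajectory(
    seed: int,
    num_turns: int = 8,
    tokens_per_turn: int = 2048,
    vocab_size: int = 32_000,
    num_layers: Optional[int] = None,
    num_experts: int = 64,
    top_k: int = 8,
) -> list[dict[str, Any]]:
    """Hypothetical generator: long multi-turn rollout, routed experts only if num_layers is set."""
    rng = random.Random(seed)  # private seeded RNG, so the same seed always yields the same rollout
    turns = []
    for turn in range(num_turns):
        token_ids = [rng.randrange(vocab_size) for _ in range(tokens_per_turn)]
        step: dict[str, Any] = {"turn": turn, "completion_ids": token_ids}
        if num_layers is not None:
            # One distinct top-k expert selection per (token, layer); assumed payload shape.
            step["routed_experts"] = [
                [rng.sample(range(num_experts), top_k) for _ in range(num_layers)]
                for _ in range(tokens_per_turn)
            ]
        turns.append(step)
    return turns
```

Because the routed-experts payload grows as tokens × layers × top_k, raising num_layers is a cheap way to reproduce memory pressure without running a real MoE model.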

Reviewed by Cursor Bugbot for commit 10ed89d. Bugbot is set up for automated code reviews on this repo.

samsja marked this pull request as ready for review on June 22, 2026, 19:11