feat: add Qwen3-30B router-replay debug config by mikasenghaas · Pull Request #2847 · PrimeIntellect-ai/prime-rl

mikasenghaas · 2026-06-21T04:51:57Z

Summary

Add configs/debug/r3.toml, a Qwen3-30B-A3B-Thinking-2507 RL debug config with router replay enabled (trainer.enable_router_replay = true, paired with inference.enable_return_routed_experts = true).
Ported from the rlm5/qwen30b-debug.toml config used on the perf/r3 experiment line, updated to the current config schema and cleaned up.

Schema migration (vs. the old config)

orchestrator.rollouts_per_example → orchestrator.group_size.
Removed fields that no longer exist in the schema: max_async_level, orchestrator.use_token_client, orchestrator.use_renderer, [orchestrator.buffer] (online_difficulty_filtering).
Dropped the explicit [orchestrator.client] block: the orchestrator now sets the X-Session-ID header from trajectory_id automatically.
Dropped name from [inference.model]: the shared top-level [model] name now propagates to all sub-configs.
Eval is disabled by omitting the [orchestrator.eval] block (the schema now rejects env = []).

Debug cleanup

Removed the one-off perf-experiment fields and their narration comments: use_process_pool (round 4), dump_raw_rollouts (round 5), gather_chunk_size (round 6).
Removed [trainer.experimental.token_export], a debug-only per-token JSONL rollout export.

Verification

Validated that the config loads against RLConfig (same path as tests/unit/test_configs.py::test_load_configs):

PARSED OK
trainer.enable_router_replay = True
inference.enable_return_routed_experts = True
orchestrator.group_size = 8
student.client.extra_headers_from_state = {'X-Session-ID': 'trajectory_id'}
inference.model.name (propagated) = Qwen/Qwen3-30B-A3B-Thinking-2507
renderer.name = qwen3

🤖 Generated with Claude Code


mikasenghaas and others added 4 commits June 20, 2026 21:51

feat: add Qwen3-30B router-replay debug config

9215c2c

Ports rlm5/qwen30b-debug.toml (perf/r3 line) into configs/debug/r3.toml, updated to the current config schema and stripped of one-off debug fields.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat: rename wandb project to r3-debug

b0e8fa1

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat: set max_steps to 100 and drop explicit output_dir

e28f857

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat: set max_inflight_rollouts to 512

a5f2849

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mikasenghaas wants to merge 4 commits into main from feat/r3-debug-config