
feat: add Qwen3-30B router-replay debug config#2847

Draft
mikasenghaas wants to merge 4 commits into main from feat/r3-debug-config

@mikasenghaas


Summary

  • Add configs/debug/r3.toml — a Qwen3-30B-A3B (Thinking) RL debug config with router replay enabled (trainer.enable_router_replay = true, paired with inference.enable_return_routed_experts = true).
  • Ported from the rlm5/qwen30b-debug.toml config used on the perf/r3 experiment line, updated to the current config schema and cleaned up.
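The Summary pairs trainer.enable_router_replay with inference.enable_return_routed_experts. Our reading is that the trainer replays the expert routing recorded during inference, so turning on the first flag without the second leaves nothing to replay. The following is a minimal sketch of that pairing as a standalone check on an already-parsed config (a plain nested dict). `check_router_replay` is a hypothetical helper, not something RLConfig is known to expose.

```python
from typing import Any


def check_router_replay(cfg: dict[str, Any]) -> None:
    """Hypothetical check: router replay in the trainer needs routed experts from inference."""
    replay = cfg.get("trainer", {}).get("enable_router_replay", False)
    returns_experts = cfg.get("inference", {}).get("enable_return_routed_experts", False)
    if replay and not returns_experts:
        raise ValueError(
            "trainer.enable_router_replay requires inference.enable_return_routed_experts"
        )


# The pairing used in configs/debug/r3.toml passes the check.
check_router_replay({
    "trainer": {"enable_router_replay": True},
    "inference": {"enable_return_routed_experts": True},
})
```

Keeping the check on plain dicts means it can run before any schema validation.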

Schema migration (vs. the old config)

  • Renamed orchestrator.rollouts_per_example → orchestrator.group_size.
  • Removed fields that no longer exist in the schema: max_async_level, orchestrator.use_token_client, orchestrator.use_renderer, [orchestrator.buffer] (online_difficulty_filtering).
  • Dropped the explicit [orchestrator.client] block — X-Session-ID = trajectory_id is now auto-set by the orchestrator.
  • Dropped name from [inference.model] — the shared top-level [model] name now propagates to all sub-configs.
  • Eval is disabled by omitting the [orchestrator.eval] block (the schema now rejects env = []).
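The migration bullets can be read as a mechanical transformation of the parsed old config. The following is a minimal sketch on plain dicts, with several stated assumptions. `migrate_config` is hypothetical and is not a script in the repo. `max_async_level` is assumed to sit at the top level of the old file. The [orchestrator.client] block is dropped wholesale, which is only safe when it held nothing beyond the X-Session-ID header.

```python
import copy
from typing import Any

# Fields the PR lists as removed from the schema. Placing max_async_level at
# the top level is an assumption about the old file's layout.
REMOVED_TOP_LEVEL = ("max_async_level",)
# "client" is redundant rather than removed: X-Session-ID is now auto-set.
DROPPED_ORCHESTRATOR = ("use_token_client", "use_renderer", "buffer", "client")


def migrate_config(old: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: rewrite an old-schema config dict into the new schema."""
    new = copy.deepcopy(old)  # leave the caller's dict untouched
    for key in REMOVED_TOP_LEVEL:
        new.pop(key, None)

    orch = new.get("orchestrator", {})
    if "rollouts_per_example" in orch:
        orch["group_size"] = orch.pop("rollouts_per_example")
    for key in DROPPED_ORCHESTRATOR:
        orch.pop(key, None)
    # Eval is now disabled by omitting the block; the schema rejects env = [].
    if orch.get("eval", {}).get("env") == []:
        del orch["eval"]

    # The top-level [model] name propagates, so a matching copy is redundant.
    inf_model = new.get("inference", {}).get("model", {})
    top_name = new.get("model", {}).get("name")
    if top_name is not None and inf_model.get("name") == top_name:
        del inf_model["name"]
    return new
```

The sketch leaves out the one-off perf fields from the debug cleanup, because their location in the old file is not stated here.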

Debug cleanup

  • Removed the one-off perf-experiment fields and their narration comments: use_process_pool (round 4), dump_raw_rollouts (round 5), gather_chunk_size (round 6).
  • Removed [trainer.experimental.token_export] (per-token JSONL rollout export — debug-only).

Verification

Validated that the config loads against RLConfig (same path as tests/unit/test_configs.py::test_load_configs):

PARSED OK
trainer.enable_router_replay = True
inference.enable_return_routed_experts = True
orchestrator.group_size = 8
student.client.extra_headers_from_state = {'X-Session-ID': 'trajectory_id'}
inference.model.name (propagated) = Qwen/Qwen3-30B-A3B-Thinking-2507
renderer.name = qwen3

🤖 Generated with Claude Code

mikasenghaas and others added 4 commits June 20, 2026 21:51
Ports rlm5/qwen30b-debug.toml (perf/r3 line) into configs/debug/r3.toml,
updated to the current config schema and stripped of one-off debug fields.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>