18 Jun 08:31

PanAndy

7f9d4d3

v0.3.0 Latest

ROLL v0.3.0 Release Notes

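Hello everyone! Thank you for your interest in ROLL. ROLL v0.3.0 has been released. It adds major features including Video RLVR, AgentRunner 2.0, MTP training, Router Replay, and Multi-Teacher OPD; adds OpenTelemetry observability support; strengthens mcore_adapter; and extends NPU/AMD hardware support. Below is a summary of recent updates. We will continue iterating on ROLL, and you are welcome to join the ROLL community.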

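🚀 Highlights

Added Video/Audio RLVR training support (Video-R1 reward)
Added the AgentRunner 2.0 abstraction, which decouples agent interaction logic and supports more flexible multi-turn agent scenarios
Added RemoteBatch, a lazy data transfer mechanism that optimizes cross-Worker transfer of large-scale image/video/long_context logits
Added MoE Router Replay (R3), which reuses routing decisions from the rollout phase when training MoE models
Added support for Qwen3.5/3.6 MTP (Multi-Token Prediction) SFT/RL training
Added OpenTelemetry distributed tracing for end-to-end observability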

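🚀 Major New Features

Pipeline

Added a Video/Audio RLVR Pipeline that supports reinforcement learning training for video and audio understanding (Qwen3Omni series models)
Added Multi-Teacher On-Policy Distillation support, [documentation](docs_roll/i18n/zh-Hans/docusaurus-plugin-content-docs/current/User Guides/Pipeline/on_policy_distill_pipeline_start.md)
Added LLM-as-Judge Server mode, which supports deploying the judge service independently, example config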

Agent Native 2.0

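Added the AgentRunner abstraction, which decouples "how the agent interacts with the environment" from "training sample construction", [design doc](docs_roll/i18n/zh-Hans/docusaurus-plugin-content-docs/current/User Guides/Agentic/agent_runner.md)
Added ProxyEnvManager / MessageTracker to support more complex agent interaction patterns, [design doc](docs_roll/i18n/zh-Hans/docusaurus-plugin-content-docs/current/User Guides/Agentic/prefix_aggregation.md)
Added Atropos environment integration, example config
Added OpenReward environment integration, example config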

mcore_adapter

Added MTP (Multi-Token Prediction) training support (standalone and joint modes), [usage doc](docs_roll/i18n/zh-Hans/docusaurus-plugin-content-docs/current/User Guides/Advanced Features/mtp_training.md)
Added Router Replay (R3), which reuses rollout routing decisions during MoE training to reduce recomputation overhead, [usage doc](docs_roll/docs/User Guides/Advanced Features/router_replay.md)
Added a Fused Entropy CE kernel that accelerates cross-entropy computation when TP=1
Added PP Stage Compile Warmup, a compilation warmup for pipeline parallelism
Optimized VLM sequence packing for the Qwen3.5/3.6 series
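
To illustrate the Router Replay idea listed above, here is a minimal pure-Python sketch. It is not the mcore_adapter implementation: it assumes a plain top-k router, and the names `top_k_experts`, `gate_weights`, `route`, and the `replay_ids` parameter are hypothetical. Expert indices chosen during rollout are recorded and passed back at training time, so training uses the same experts even when its router logits differ slightly.

```python
import math


def top_k_experts(router_logits, k):
    """Return the indices of the k largest logits (ties go to the lower index)."""
    order = sorted(range(len(router_logits)), key=lambda i: (-router_logits[i], i))
    return sorted(order[:k])


def gate_weights(router_logits, expert_ids):
    """Softmax restricted to the selected experts."""
    peak = max(router_logits[i] for i in expert_ids)
    exps = [math.exp(router_logits[i] - peak) for i in expert_ids]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k, replay_ids=None):
    """Select experts; when replay_ids is given, reuse it instead of running top-k."""
    expert_ids = replay_ids if replay_ids is not None else top_k_experts(router_logits, k)
    return expert_ids, gate_weights(router_logits, expert_ids)


# Rollout phase: record which experts the router picked for this token.
rollout_logits = [2.0, 1.3, 1.2, -1.0]
rollout_ids, _ = route(rollout_logits, k=2)

# Training phase: logits differ slightly, so a fresh top-k would pick other experts.
train_logits = [2.0, 1.2, 1.3, -1.0]
fresh_ids, _ = route(train_logits, k=2)
replayed_ids, replayed_weights = route(train_logits, k=2, replay_ids=rollout_ids)

print(rollout_ids, fresh_ids, replayed_ids)
```

In this sketch the gate weights are still computed from the training-time logits; only the expert selection is replayed.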

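RemoteBatch Transfer Optimization

Added a lazy data transfer backend for efficient transfer of large-scale data in image/video/long_context scenarios, [usage doc](docs_roll/i18n/zh-Hans/docusaurus-plugin-content-docs/current/User Guides/Advanced Features/remote_batch_transfer.md)
Optimized storage management between Ray Workers based on TransferQueue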

Observability

Added OpenTelemetry integration with distributed tracing support, [usage doc](docs_roll/i18n/zh-Hans/docusaurus-plugin-content-docs/current/User Guides/Advanced Features/opentelemetry_tracing.md)
Added an OTEL Receiver for end-to-end tracing across pipeline stages

FSDP2

Added a Qwen3 MoE patch to support FSDP2 training of MoE models
Improved LoRA model support
FSDP2 / EP parallelism support

Docker

Added NPU A2/A3 Docker images
Added AMD torch2.8.0/torch2.10 Docker images

Hardware

NPU: added A2/A3 support, fixed FSDP2-related issues, and added end-to-end Ascend documentation
AMD: added torch2.10 support and optimized ROCm parameter synchronization

Models

Added support for the Qwen3.5 Dense (27B) / MoE (35B-A3, 122B-A10, 397B-A17) series models
Provided example configs at multiple model sizes for Megatron + FSDP2

Performance Optimization

do_checkpoint pin_memory optimization
GC optimization
Low-memory checkpoint conversion

Bug Fix

Fixed occasional port conflicts between sglang and vllm
Fixed reward worker metrics exposure
Fixed a vllm GDN attention crash with mixed decode/spec-decode (vllm < 0.17.2)

Deprecated

DeepSpeed Strategy (third_party code has been removed)
Wan RewardRL (RL training for generative models is being refactored)

TODOs

Multi Agent support
Full-vocabulary version of Multi-Teacher OPD

09 Mar 10:09

PanAndy

v0.2.1

2eba7c3

v0.2.1

Hello everyone! Thank you for your interest in ROLL.
ROLL has recently received a large set of new features. Below is a summary of the latest updates. We will continue iterating on ROLL, and you are welcome to join the ROLL community.
#366

🚀 Highlights

Rollout has been refactored to be scheduled by a router, with support for sglang-router.
Added training support for [On-Policy Distillation](docs_roll/i18n/zh-Hans/docusaurus-plugin-content-docs/current/User Guides/Pipeline/on_policy_distill_pipeline_start.md).
Added support for the Qwen3.5 model family: Dense / MoE.

🚀 Major New Features

Rollout
- Router scheduling refactor
  - Refactored the sglang strategy to support both engine and server modes.
  - Refactored schedulers (rlvr DynamicScheduler / agentic RolloutScheduler) so that scheduling is now uniformly provided by the Router.
  - Migrated the original LoadBalancer and RequestScheduler to PromptAffinityRouter and EnvAffinityRouter.
  - Added support for sglang-router.
Pipeline recipes
- Added On-Policy Distillation training support.
Models
- Added support for Qwen3.5 Dense/MoE series models.
Docker
- Updated images/environments: torch 2.10, vLLM 0.16.0 nightly, vLLM 0.15.1, mcore 0.16.0.
Bug fixes
- Set VLLM_USE_FLASHINFER_SAMPLER=0 by default for vLLM on Torch 2.8.0 to mitigate overly repetitive responses.
- Fixed occasional port conflicts between sglang and vLLM.
- Fixed sglang multi-node failures when infer_dp > 1.
- Fixed reward worker metrics exposure.
- Fixed a get_node_ip cache issue in model download that could cause deadlock/timeouts.
- Fixed FSDP2 DCE save when CPU offloading is enabled.
- Fixed casting during FSDP2 model initialization.
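
For readers who want to mirror the `VLLM_USE_FLASHINFER_SAMPLER=0` default from the bug-fix list above in their own launcher, the following is a small sketch. It is an assumption about how such a default could be applied, not ROLL's code: the helper name `apply_sampler_default` is hypothetical, and the Torch version is passed in as a string so the example runs without importing torch. An explicit user setting is left untouched.

```python
import os


def apply_sampler_default(torch_version, env=None):
    """Set VLLM_USE_FLASHINFER_SAMPLER=0 on Torch 2.8.0 unless the user already set it."""
    env = os.environ if env is None else env
    if torch_version.startswith("2.8.0"):
        # setdefault keeps any value the user exported explicitly.
        env.setdefault("VLLM_USE_FLASHINFER_SAMPLER", "0")
    return env.get("VLLM_USE_FLASHINFER_SAMPLER")


# A plain dict stands in for os.environ in this example.
print(apply_sampler_default("2.8.0+cu128", env={}))
print(apply_sampler_default("2.8.0", env={"VLLM_USE_FLASHINFER_SAMPLER": "1"}))
```

In a real launcher the variable would need to be set before the vLLM engine starts.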

04 Feb 09:04

PanAndy

v0.2.0

3077bef

v0.2.0 release

Hello everyone! Thank you for your interest in ROLL.
ROLL has recently been updated with a large number of new features. Below is a summary of recent updates. We will continue iterating on ROLL, and you are welcome to join the ROLL community.

🚀 Highlights:

New model support: Qwen3-VL, Qwen3-MoE-VL, Qwen3-Omni, GLM-4.7
Partial GPU overlap between Agentic training and Rollout, which switches idle training GPUs to Rollout
DynamicSamplingScheduler coroutine refactoring
New: FSDP2 Strategy
Training supports sequence packing and dynamic batching

🚀 Major New Features:

Rollout
- DynamicSamplingScheduler coroutine refactoring
- Custom rollout pre/post-processing, supporting dynamic sampling params, multi-stage generation, and ThinkingBudget control
- Sglang: Strategy refactoring, supporting server mode, native onload/offload, inflight FP8 quant rollout, and cross-machine multi-node deployment
- vLLM: DP/EP support; supports vllm==0.12.0
- Provides the AgentNative Rollout paradigm (AgentNativeStepEnvManager + SokobanNativeEnv), with context fully managed by the env
- Async Rollout Hang Detect: added asynchronous Rollout hang detection to quickly locate problematic envs
- Supports rollout dump & mock, making precision alignment between the forward and train phases more efficient
- Agentic pipeline supports train-val/rollout overlap
Training
- FSDP2
- Megatron supports LoRA; LoRA RL blog: https://macaron.im/mindlab/research/building-trillion-parameter-reasoning-rl-with-10-gpus
- Save model parameters in HF format online during Megatron training
- Support FP8 training for Megatron Strategy
- Sequence packing, with a refined loss_func interface definition
- Dynamic batching
- Add DeepSpeed SFT support
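
As a rough picture of what the sequence packing item above refers to, here is a minimal sketch that groups variable-length samples into packs under a token budget using first-fit decreasing. This is a generic bin-packing heuristic chosen for illustration, not necessarily the strategy ROLL uses; `pack_sequences` and `max_tokens` are hypothetical names.

```python
def pack_sequences(lengths, max_tokens):
    """Group sample indices into packs whose total length stays within max_tokens.

    Uses first-fit decreasing: longest samples are placed first, each into the
    first pack that still has room.
    """
    packs = []  # each entry is [remaining_capacity, [sample indices]]
    for idx in sorted(range(len(lengths)), key=lambda i: -lengths[i]):
        n = lengths[idx]
        if n > max_tokens:
            raise ValueError(f"sample {idx} has {n} tokens, above the {max_tokens} budget")
        for pack in packs:
            if pack[0] >= n:
                pack[0] -= n
                pack[1].append(idx)
                break
        else:
            # No existing pack has room, so open a new one.
            packs.append([max_tokens - n, [idx]])
    return [indices for _, indices in packs]


lengths = [7, 3, 5, 2, 4, 1]
packed = pack_sequences(lengths, max_tokens=8)
print(packed)
```

Six samples totalling 22 tokens fit into three packs of at most 8 tokens each, instead of six rows padded to length 7.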
Model Update implementation optimization: eliminate inter-machine redundancy, overlap weight conversion with NCCL broadcast, optimize host-to-device transfer, and change synchronization across multiple PP stages from serial to lock-based simultaneous synchronization
Asynchronous Feature
- Partial GPU overlap between training and Rollout, which switches idle training GPUs to Rollout, report: https://arxiv.org/abs/2512.24873
- Agentic off-policy loss with importance sampling (IS) correction
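
The IS correction mentioned above can be pictured with the following sketch of truncated importance sampling. It is an assumption about the general technique, not ROLL's loss: the function names and the `cap` value are hypothetical. Each token's loss is reweighted by the ratio between the training policy's probability and the rollout policy's probability, capped to limit variance.

```python
import math


def truncated_is_weights(train_logprobs, rollout_logprobs, cap=2.0):
    """Per-token ratio pi_train / pi_rollout, truncated at cap."""
    return [
        min(math.exp(t - r), cap)
        for t, r in zip(train_logprobs, rollout_logprobs)
    ]


def is_corrected_loss(per_token_losses, weights):
    """Mean of the per-token losses after importance reweighting."""
    weighted = [w * loss for w, loss in zip(weights, per_token_losses)]
    return sum(weighted) / len(weighted)


# Token probabilities under the rollout engine and under the trainer.
rollout_lp = [math.log(0.5), math.log(0.1), math.log(0.4)]
train_lp = [math.log(0.5), math.log(0.4), math.log(0.2)]

weights = truncated_is_weights(train_lp, rollout_lp, cap=2.0)
loss = is_corrected_loss([1.0, 1.0, 1.0], weights)
print(weights, loss)
```

The second token's raw ratio is 4, so it is truncated to the cap of 2; in an autograd framework the weights would typically be treated as constants when backpropagating.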
Pipeline recipe
- VLM image tool use: DeepEyes, with tool invocation overlapped with reward calculation
Models: New model support for Qwen3-VL, Qwen3-MoE-VL, Qwen3-Omni-Thinker, GLM-4.7

08 Dec 08:37

PanAndy

v0.1.3

e1695f2

release flag for v0.1.3

🚀 Highlights:

(feat): support Qwen3VL, mcore_adapter and examples.
(feat): Add optimization for computing ref_logprobs and old_logprobs.
(feat): support vllm beam_search.
(feat): Add support for Qwen-3-next on AMD GPUs.
(feat): support sglang==0.5.4, vllm==0.11.1, torch2.8.0.

🚀 Major New Features:

Agentic
- (fix): fix agentic val get_batch state in redundancy env.
- (feat): agentic-spec actor worker.
- (feat): add infer_log_probs in agentic.
- (feat): refactor agentic norm like LitePPO.
- (feat): add agentic profile metrics.
Models and Backends
- (feat): support vllm beam_search.
- (feat): Add support for Qwen-3-next on AMD GPUs.
- (feat): support offload nccl to save gpu memory. Thanks to slime.
- (feat): support sglang 0.5.4.
- (feat): sglang support dp-attention.
- (feat): add enable_reference option. #250
- (feat): add enable_old_logprobs, opt old log probs by cache.
- (feat): support Qwen3VL, mcore_adapter and examples yaml. #190
- (feat): add sequence packing for sft pipeline and distill pipeline, optimize memory usage during top-k logits computation.
bug fix, refactor
- (fix): update math rule reward worker with thinking. #281
- (feat): set RAY_CGRAPH_get_timeout=600.
- (fix): fix train infer ratio/diff mean & add train infer ratio/diff token/seq mask & add rollout importance sampling. #242 #273
- (fix): ensure compatibility with transformers version check for causal mask update.
- (fix): fix vllm 0.11.0 import for torch 2.8.0.
- (fix): fix tokenizer mismatch between policy and reward model in llm judge reward worker. #91
- (fix): fix bugs in data fetching for face embeddings for wan_module.
- (fix): vllm _generate_standard missing prompt_token_ids input args in vllm >0.11.0. #189
- (fix): vllm add missing argument is_lora in function update_parameter. #233
- (fix): fix bugs with metrics recording in the DPO pipeline.
- (fix): update image loading logic for byte data in rlvr_vlm_pipeline.py
- (fix): add alive check. #253

Releases: alibaba/ROLL
