Features and Roadmap#
Coming in v0.4#
Megatron Inference - Fast Day-0 inference support for new Megatron models, avoiding weight conversion
Async RL - Support for asynchronous rollouts and replay buffers for off-policy training, enabling fully asynchronous GRPO (a generic replay-buffer sketch follows this list)
Vision-Language Models (VLMs) - Support for SFT and GRPO on VLMs through the DTensor path
Improved Native Performance - Faster training for native PyTorch models
Improved Large MoE Performance - Better Megatron Core training and generation performance for large MoE models
End-to-End FP8 Low-Precision Training - Support for Megatron Core FP8 training and FP8 vLLM generation
Megatron Bridge Integration - Integrate Megatron Bridge to enable training features from Megatron Core
NeMo Automodel Integration - Integrate NeMo Automodel to power the DTensor path
New Models - gpt-oss
Expand Algorithms - DAPO, GSPO, On-policy Distillation
GB200 - Add container support for GB200
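To make the Async RL item above concrete, here is a minimal, generic sketch of an off-policy replay buffer that decouples rollout generation from training. All names and structure are hypothetical illustrations, not this library's actual API.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO buffer of rollouts for off-policy training.

    Generic illustration only: names and structure are hypothetical,
    not this library's actual API.
    """

    def __init__(self, capacity: int):
        # Oldest rollouts are evicted automatically once capacity is hit.
        self.buffer = deque(maxlen=capacity)

    def add(self, rollout: dict) -> None:
        # Asynchronous rollout workers push completed trajectories here.
        self.buffer.append(rollout)

    def sample(self, batch_size: int) -> list[dict]:
        # The trainer draws batches without waiting on new rollouts,
        # which is what decouples generation from training.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

Fully asynchronous GRPO then amounts to rollout workers writing into such a buffer while the learner samples from it on its own schedule.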
Available Now#
Distributed Training - Ray-based infrastructure
Environment Support and Isolation - Support for multi-environment training and dependency isolation between components
Worker Isolation - Process isolation between RL actors, so no global state leaks between them (see the Ray actor sketch below)
Learning Algorithms - GRPO/GSPO, SFT, and DPO (the GRPO advantage computation is sketched below)
Multi-Turn RL - Multi-turn generation and training for RL with tool use, games, etc.
Advanced Parallelism with DTensor - PyTorch FSDP2 with tensor parallelism (TP), context parallelism (CP), and sequence parallelism (SP) for efficient training
Larger Model Support with Longer Sequences - High-performance parallelism with Megatron Core (TP/PP/CP/SP/EP/FSDP)
MoE Models - Support for DeepSeekV3 and Qwen-3 MoE models (Megatron)
Sequence Packing - Sequence packing in both DTensor and Megatron Core for significant training throughput gains (see the packing sketch below)
Fast Generation - vLLM backend for optimized inference (usage example below)
Hugging Face Integration - Works with 1B–70B models (Qwen, Llama); a minimal loading example follows
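The Ray-based infrastructure and worker isolation items above rest on one Ray property: each actor runs in its own OS process, so module-level state (CUDA contexts, environment variables, imported frameworks) cannot leak between workers. A minimal sketch using the standard Ray API (the class name is illustrative, not one of this library's classes):

```python
import ray

ray.init()

@ray.remote
class RolloutWorker:
    """Each Ray actor lives in its own process; anything stored here
    is private to that process."""

    def __init__(self, worker_id: int):
        self.worker_id = worker_id

    def ping(self) -> str:
        return f"worker {self.worker_id} alive in its own process"

# Two actors, two isolated processes; no shared global state.
workers = [RolloutWorker.remote(i) for i in range(2)]
print(ray.get([w.ping.remote() for w in workers]))
```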
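For the GRPO entry under Learning Algorithms, the core idea is compact enough to show directly: advantages are computed by normalizing rewards within a group of completions sampled for the same prompt, which removes the need for a learned critic. A minimal sketch of this standard formulation:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages as used in GRPO.

    `rewards` has shape (num_prompts, group_size): each row holds the
    scalar rewards of several sampled completions for the same prompt.
    Each reward is normalized against its own group's statistics.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.5, 0.5],
                        [0.2, 0.9, 0.9, 0.0]])
print(grpo_advantages(rewards))
```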
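The sequence-packing gains come from filling each batch row with several short sequences instead of padding. A first-fit-decreasing sketch of the idea (illustrative only; a real implementation also needs attention masks so packed sequences cannot attend to one another):

```python
def pack_sequences(lengths: list[int], max_len: int) -> list[list[int]]:
    """Greedy first-fit-decreasing packing: place each sequence (by index)
    into the first bin with room, minimizing wasted padding per row."""
    bins: list[list[int]] = []        # indices of sequences per bin
    bin_space: list[int] = []         # remaining token budget per bin
    for idx, length in sorted(enumerate(lengths), key=lambda x: -x[1]):
        for b, space in enumerate(bin_space):
            if length <= space:
                bins[b].append(idx)
                bin_space[b] -= length
                break
        else:
            # No existing bin fits; open a new one.
            bins.append([idx])
            bin_space.append(max_len - length)
    return bins

# Example: pack five sequences into rows of at most 8 tokens.
print(pack_sequences([7, 2, 5, 3, 1], max_len=8))  # [[0, 4], [2, 3], [1]]
```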
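The fast-generation path builds on vLLM. The library drives vLLM internally; this standalone snippet just shows the backend API in isolation (the model ID is an arbitrary example):

```python
from vllm import LLM, SamplingParams

# Load any Hugging Face checkpoint into vLLM's optimized engine.
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

outputs = llm.generate(["Explain sequence packing in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```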
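Hugging Face integration means checkpoints are addressed by their Hub IDs. A minimal standalone loading example with transformers (any supported causal-LM ID works in place of the one shown):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # arbitrary example; any 1B-70B Qwen/Llama checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
```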