Supported Models#
This directory contains family-organized documentation for models supported by Megatron Bridge. Each model page covers supported variants, Hugging Face <-> Megatron Bridge conversion, training recipe links, and model-specific notes.
Family Index#
| Family | Model documentation |
|---|---|
| Bailing | |
| DeepSeek | |
| Falcon | |
| Gemma | |
| GLM | |
| GPT-OSS | |
| Kimi | |
| Llama | |
| MiniMax | |
| Mistral | |
| Xiaomi-MiMo | |
| Moonlight | |
| Nemotron | Llama Nemotron, Nemotron H and Nemotron Nano v2, Nemotron-3 Nano, Nemotron-3 Super, Nemotron Nano V2 VL, Nemotron-3 Nano Omni |
| OLMoE | |
| Qwen | Qwen, Qwen3-MoE, Qwen3-Next, Qwen2.5-VL, Qwen3-VL, Qwen3.5 / 3.6, Qwen2-Audio, Qwen2.5-Omni, Qwen3-Omni, Qwen3-ASR |
| Sarvam | |
| StepFun | |
Model Documentation Structure#
Each model documentation page typically includes:
- Model Overview: Architecture and key features
- Available Variants: Supported model sizes and configurations
- Conversion Examples: Converting between Hugging Face and Megatron formats
- Training Recipes: Links to training configurations and examples
- Architecture Details: Model-specific features and configurations
Model Support Overview#
Decoder-Only and Hybrid Backbones#
- Model families: Bailing, DeepSeek, Falcon, Gemma, GLM, GPT-OSS, Kimi, Llama, MiniMax, Mistral, Moonlight, Nemotron, OLMoE, Qwen, Sarvam, StepFun, and Xiaomi-MiMo
- MoE and hybrid variants include Bailing, DeepSeek, GLM, GPT-OSS, MiniMax, Nemotron-3, OLMoE, Qwen3-MoE, Qwen3-Next, and Sarvam
Multimodal Variants#
- Gemma 3 VL and Gemma 4 VL
- GLM-4.5V
- Kimi-K2.5-VL
- Ministral 3
- Nemotron Nano V2 VL and Nemotron-3 Nano Omni
- Qwen2-Audio, Qwen2.5-VL, Qwen2.5-Omni, Qwen3-VL, Qwen3.5 / 3.6, Qwen3-Omni, and Qwen3-ASR
Conversion Support#
All model pages document support for one or both conversion directions:
- Hugging Face -> Megatron Bridge: Load pretrained weights for training
- Megatron Bridge -> Hugging Face: Export trained models for deployment
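To make the two directions concrete, here is a minimal, library-free sketch of a round trip. It is not Megatron Bridge's API or its real weight mapping: the parameter names, the helper functions `hf_to_megatron` and `megatron_to_hf`, and the plain row-wise concatenation of Q, K, and V are all illustrative assumptions, and the toy ignores details such as grouped-query attention.

```python
# Toy round trip between a Hugging Face-style and a Megatron-style state dict.
# Names and the row-wise QKV concatenation are illustrative assumptions,
# not the mapping Megatron Bridge actually applies.

HF_QKV = ("q_proj", "k_proj", "v_proj")
FUSED_NAME = "self_attention.linear_qkv.weight"  # hypothetical fused name


def hf_to_megatron(hf_state):
    """Import direction: fuse separate q/k/v matrices into one QKV matrix."""
    fused = []
    for part in HF_QKV:
        fused.extend(hf_state[f"self_attn.{part}.weight"])
    return {FUSED_NAME: fused}


def megatron_to_hf(mg_state, rows_per_part):
    """Export direction: split the fused QKV matrix back into q/k/v."""
    fused = mg_state[FUSED_NAME]
    hf_state = {}
    for i, part in enumerate(HF_QKV):
        start = i * rows_per_part
        hf_state[f"self_attn.{part}.weight"] = fused[start:start + rows_per_part]
    return hf_state


# Matrices are plain lists of rows so the sketch needs no dependencies.
hf_weights = {
    "self_attn.q_proj.weight": [[1.0, 2.0], [3.0, 4.0]],
    "self_attn.k_proj.weight": [[5.0, 6.0], [7.0, 8.0]],
    "self_attn.v_proj.weight": [[9.0, 10.0], [11.0, 12.0]],
}

megatron_weights = hf_to_megatron(hf_weights)     # Hugging Face -> Megatron
round_trip = megatron_to_hf(megatron_weights, 2)  # Megatron -> Hugging Face
```

A page that documents both directions implies that this kind of round trip reproduces the original weights; a page that documents only one direction makes no such promise.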
Conversion features:
- Automatic architecture detection
- Parallelism-aware conversion (TP/PP/VPP/CP/EP)
- Streaming and memory-efficient transfers
- Verification mechanisms for conversion accuracy
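Three of these features can be illustrated with a small, dependency-free sketch: splitting a weight across tensor-parallel ranks, gathering it back one tensor at a time, and checking the result against a reference. The helpers `shard_rows`, `stream_gathered`, and `max_abs_diff` are hypothetical names invented for this example, and row-wise sharding is only one of the layouts a real conversion would have to handle.

```python
# Toy model of parallelism-aware, streaming, verified conversion.
# All helper names are hypothetical; this is not Megatron Bridge code.


def shard_rows(matrix, tp_size):
    """Split a weight matrix (list of rows) into tp_size equal row shards."""
    if len(matrix) % tp_size != 0:
        raise ValueError("row count must be divisible by tp_size")
    step = len(matrix) // tp_size
    return [matrix[r * step:(r + 1) * step] for r in range(tp_size)]


def stream_gathered(sharded_state):
    """Yield (name, full_matrix) one tensor at a time, so the full exported
    state dict never has to be held in memory at once."""
    for name, shards in sharded_state.items():
        yield name, [row for shard in shards for row in shard]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("shape mismatch")
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


reference = {
    "mlp.weight": [[0.5, -1.0], [2.0, 0.25], [1.5, 3.0], [-2.0, 4.0]],
}

# Shard for tensor parallel size 2, as if each rank held one shard.
sharded = {name: shard_rows(w, tp_size=2) for name, w in reference.items()}

# Gather tensor by tensor and verify each against the reference weights.
report = {
    name: max_abs_diff(full, reference[name])
    for name, full in stream_gathered(sharded)
}
```

In this toy the difference is exactly zero because gathering only concatenates rows. A real verification step would more plausibly compare against a small tolerance, since dtype casts can introduce rounding.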
Refer to the Bridge Guide for detailed conversion instructions.