Supported Models

This directory contains family-organized documentation for models supported by Megatron Bridge. Each model page covers supported variants, Hugging Face <-> Megatron Bridge conversion, training recipe links, and model-specific notes.

Family Index

Family	Model documentation
Bailing	Ling 2.0
DeepSeek	DeepSeek V2, DeepSeek V3, DeepSeek V4
Falcon	Falcon H1
Gemma	Gemma, Gemma 2, Gemma 3, Gemma 3 VL, Gemma 4 VL
GLM	GLM 4.5, GLM-4.5V, GLM-4.7 / 4.7-Flash, GLM-5 / 5.1
GPT-OSS	GPT OSS
Kimi	Kimi K2, Kimi-K2.5-VL
Llama	Llama 2, Llama 3
MiniMax	MiniMax-M2 / M2.5 / M2.7, MiniMax-M3
Mistral	Mistral, Ministral 3
Xiaomi-MiMo	Xiaomi-MiMo
Moonlight	Moonlight
Nemotron	Llama Nemotron, Nemotron H and Nemotron Nano v2, Nemotron-3 Nano, Nemotron-3 Super, Nemotron Nano V2 VL, Nemotron-3 Nano Omni
OLMoE	OLMoE
Qwen	Qwen, Qwen3-MoE, Qwen3-Next, Qwen2.5-VL, Qwen3-VL, Qwen3.5 / 3.6, Qwen2-Audio, Qwen2.5-Omni, Qwen3-Omni, Qwen3-ASR
Sarvam	Sarvam
StepFun	Step-3.5-Flash

Model Documentation Structure

Each model documentation page typically includes:

Model Overview - Architecture and key features
Available Variants - Supported model sizes and configurations
Conversion Examples - Converting between Hugging Face and Megatron formats
Training Recipes - Links to training configurations and examples
Architecture Details - Model-specific features and configurations

Model Support Overview

Decoder-Only and Hybrid Backbones

Bailing, DeepSeek, Falcon, Gemma, GLM, GPT-OSS, Kimi, Llama, MiniMax, Mistral, Moonlight, Nemotron, OLMoE, Qwen, Sarvam, StepFun, and Xiaomi-MiMo
MoE and hybrid variants including Bailing, DeepSeek, GLM, GPT-OSS, MiniMax, Nemotron-3, OLMoE, Qwen3-MoE, Qwen3-Next, and Sarvam

Multimodal Variants

Gemma 3 VL and Gemma 4 VL
GLM-4.5V
Kimi-K2.5-VL
Ministral 3
Nemotron Nano V2 VL and Nemotron-3 Nano Omni
Qwen2-Audio, Qwen2.5-VL, Qwen2.5-Omni, Qwen3-VL, Qwen3.5 / 3.6, Qwen3-Omni, and Qwen3-ASR

Conversion Support

All model pages document support for one or both conversion directions:

Hugging Face -> Megatron Bridge: Load pretrained weights for training
Megatron Bridge -> Hugging Face: Export trained models for deployment
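To make the two directions concrete, here is a minimal sketch of an import followed by an export on toy weights stored as nested Python lists. The parameter names, the `RENAMES` table, and the plain row-stacking of Q, K, and V are hypothetical simplifications chosen for illustration (real Megatron checkpoints use their own naming and interleave Q/K/V by query group); none of this is Megatron Bridge's API. The converters are generators so that only one tensor needs to be handled at a time, and `max_abs_diff` stands in for a round-trip accuracy check.

```python
from typing import Dict, Iterator, List, Tuple

Matrix = List[List[float]]

# Hypothetical one-to-one renames (HF name -> Megatron-style name).
RENAMES = {
    "model.embed_tokens.weight": "embedding.word_embeddings.weight",
    "lm_head.weight": "output_layer.weight",
}
QKV_PARTS = ("q_proj", "k_proj", "v_proj")


def hf_to_megatron(hf: Dict[str, Matrix]) -> Iterator[Tuple[str, Matrix]]:
    """Import: yield converted tensors one at a time instead of building a full copy."""
    for hf_name, mg_name in RENAMES.items():
        yield mg_name, hf[hf_name]
    # Simplified fusion: stack the Q, K, and V rows into one matrix.
    fused: Matrix = []
    for part in QKV_PARTS:
        fused.extend(hf[f"attn.{part}.weight"])
    yield "attn.linear_qkv.weight", fused


def megatron_to_hf(
    mg: Dict[str, Matrix], qkv_rows: Tuple[int, int, int]
) -> Iterator[Tuple[str, Matrix]]:
    """Export: undo each import step, splitting the fused QKV by known row counts."""
    for hf_name, mg_name in RENAMES.items():
        yield hf_name, mg[mg_name]
    fused = mg["attn.linear_qkv.weight"]
    start = 0
    for part, rows in zip(QKV_PARTS, qkv_rows):
        yield f"attn.{part}.weight", fused[start:start + rows]
        start += rows


def max_abs_diff(a: Dict[str, Matrix], b: Dict[str, Matrix]) -> float:
    """Verification: largest element-wise difference between two state dicts."""
    assert a.keys() == b.keys(), "parameter names differ"
    worst = 0.0
    for name in a:
        assert len(a[name]) == len(b[name]), f"row count differs for {name}"
        for row_a, row_b in zip(a[name], b[name]):
            assert len(row_a) == len(row_b), f"column count differs for {name}"
            for x, y in zip(row_a, row_b):
                worst = max(worst, abs(x - y))
    return worst


# Toy checkpoint; K and V have fewer rows than Q, as with grouped-query attention.
hf = {
    "model.embed_tokens.weight": [[0.1, 0.2], [0.3, 0.4]],
    "lm_head.weight": [[0.5, 0.6], [0.7, 0.8]],
    "attn.q_proj.weight": [[1.0, 2.0], [3.0, 4.0]],
    "attn.k_proj.weight": [[5.0, 6.0]],
    "attn.v_proj.weight": [[7.0, 8.0]],
}
# A real converter would write each yielded tensor out before pulling the next;
# dict() materializes everything here only to keep the example short.
megatron = dict(hf_to_megatron(hf))
restored = dict(megatron_to_hf(megatron, qkv_rows=(2, 1, 1)))
print(max_abs_diff(hf, restored))  # 0.0
```

Because the export applies the exact inverse of each import step, a lossless round trip should report a difference of 0.0; a nonzero value would point at a mismatched name mapping or split size.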

Conversion features:

Automatic architecture detection
Parallelism-aware conversion across tensor, pipeline, virtual pipeline, context, and expert parallelism (TP/PP/VPP/CP/EP)
Streaming and memory-efficient transfers
Verification mechanisms for conversion accuracy
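Parallelism-aware conversion means each rank receives only its slice of every weight. The sketch below assumes Megatron-style tensor parallelism on a `[out, in]` weight: column-parallel layers (such as the fused QKV projection and the first MLP layer) are split along the output dimension, and row-parallel layers (such as the attention output projection and the second MLP layer) along the input dimension. The expert helper assumes experts are assigned to EP ranks in contiguous blocks. All function names are hypothetical, and pipeline, virtual pipeline, and context parallelism are not shown.

```python
from typing import List

Matrix = List[List[float]]


def split_column_parallel(weight: Matrix, tp_size: int) -> List[Matrix]:
    """Split a [out, in] weight along the output (row) dimension, one chunk per TP rank."""
    out_dim = len(weight)
    if out_dim % tp_size:
        raise ValueError(f"output dim {out_dim} not divisible by TP size {tp_size}")
    chunk = out_dim // tp_size
    return [weight[r * chunk:(r + 1) * chunk] for r in range(tp_size)]


def split_row_parallel(weight: Matrix, tp_size: int) -> List[Matrix]:
    """Split a [out, in] weight along the input (column) dimension, one chunk per TP rank."""
    in_dim = len(weight[0])
    if in_dim % tp_size:
        raise ValueError(f"input dim {in_dim} not divisible by TP size {tp_size}")
    chunk = in_dim // tp_size
    return [[row[r * chunk:(r + 1) * chunk] for row in weight] for r in range(tp_size)]


def gather_column_parallel(shards: List[Matrix]) -> Matrix:
    """Export direction: concatenate the row blocks back into the full weight."""
    return [row for shard in shards for row in shard]


def gather_row_parallel(shards: List[Matrix]) -> Matrix:
    """Export direction: rejoin each row from its column blocks."""
    return [[x for shard in shards for x in shard[i]] for i in range(len(shards[0]))]


def experts_on_rank(num_experts: int, ep_size: int, ep_rank: int) -> range:
    """Global expert indices held by one EP rank, assuming a contiguous block layout."""
    if num_experts % ep_size:
        raise ValueError(f"{num_experts} experts not divisible by EP size {ep_size}")
    per_rank = num_experts // ep_size
    return range(ep_rank * per_rank, (ep_rank + 1) * per_rank)


w = [[float(4 * i + j) for j in range(4)] for i in range(4)]  # 4x4 toy weight
col = split_column_parallel(w, tp_size=2)
row = split_row_parallel(w, tp_size=2)
assert gather_column_parallel(col) == w
assert gather_row_parallel(row) == w
print([list(experts_on_rank(8, 4, r)) for r in range(4)])
# [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Gathering the shards and comparing against the original weight is one simple way to check that a split was applied along the right dimension.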

Refer to the Bridge Guide for detailed conversion instructions.
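Hugging Face checkpoints record their model class in the `architectures` list of `config.json`, which gives a converter a natural key for the automatic architecture detection listed above. The sketch below assumes a simple lookup table; `ARCH_TO_FAMILY` and `detect_family` are hypothetical names, and the table covers only a few families from the index for illustration. It is not Megatron Bridge's actual registry.

```python
import json
from typing import Dict

# Hypothetical lookup: HF `architectures` entry -> family in the index above.
ARCH_TO_FAMILY: Dict[str, str] = {
    "LlamaForCausalLM": "Llama",
    "Qwen3MoeForCausalLM": "Qwen",
    "DeepseekV3ForCausalLM": "DeepSeek",
    "GptOssForCausalLM": "GPT-OSS",
}


def detect_family(config_json: str) -> str:
    """Return the family of the first recognized entry in config["architectures"]."""
    config = json.loads(config_json)
    for arch in config.get("architectures", []):
        if arch in ARCH_TO_FAMILY:
            return ARCH_TO_FAMILY[arch]
    raise ValueError(f"unsupported architectures: {config.get('architectures')}")


print(detect_family('{"architectures": ["LlamaForCausalLM"], "hidden_size": 4096}'))  # Llama
```

Failing loudly on an unknown architecture is preferable to guessing, since a wrong mapping would silently produce misnamed or misshaped weights.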
