# Supported Models
This directory contains documentation for all models supported by Megatron Bridge, including Large Language Models (LLMs) and Vision Language Models (VLMs). Each model's documentation includes architecture details, Hugging Face ↔ Megatron Bridge conversion examples, and links to training recipes.
## Model Categories
Megatron Bridge supports two main categories of models:
### 🔤 Large Language Models (LLMs)
Text-only models for language understanding and generation tasks.
| Category | Model Count | Documentation |
|---|---|---|
| Large Language Models | 13 models | |
**Supported LLM Families:**

- DeepSeek (V2, V3)
- Gemma (2, 3)
- GLM-4.5
- GPT-OSS
- LLaMA (3, Nemotron)
- Mistral
- Moonlight
- Nemotron-H
- OLMoE
- Qwen (2, 2.5, 3, 3 MoE, 3-Next)
### 🖼️ Vision Language Models (VLMs)
Multimodal models that combine vision and language capabilities.
| Category | Model Count | Documentation |
|---|---|---|
| Vision Language Models | 4 models | |
**Supported VLM Families:**

- Gemma 3 VL
- Nemotron Nano V2 VL
- Qwen (2.5 VL, 3 VL)
## Model Documentation Structure
Each model's documentation page typically includes:
- **Model Overview** - Architecture and key features
- **Available Variants** - Supported model sizes and configurations
- **Conversion Examples** - Converting between Hugging Face and Megatron formats
- **Training Recipes** - Links to training configurations and examples
- **Architecture Details** - Model-specific features and configurations
## Common Tasks by Model Type
### For LLMs
**Training:**

- Pretraining on large corpora
- Supervised fine-tuning (SFT)
- Parameter-efficient fine-tuning (PEFT/LoRA)
- Preference optimization (DPO)

**Deployment:**

- Export to Hugging Face format
- Integration with inference engines
- Model serving and deployment

**Use Cases:**

- Text generation
- Question answering
- Conversational AI
- Code generation
### For VLMs
**Training:**

- Multimodal pretraining
- Vision-language alignment
- Fine-tuning on visual tasks

**Deployment:**

- Export to Hugging Face format
- Multimodal inference

**Use Cases:**

- Image captioning
- Visual question answering
- Document understanding
- Multimodal reasoning
## Model Support Overview
### By Architecture Type
**Decoder-Only (Autoregressive):**

- GPT-style models (GPT-OSS)
- LLaMA family (LLaMA 3, LLaMA Nemotron)
- Qwen family (Qwen 2, 2.5, 3, 3-Next)
- Gemma family (Gemma 2, 3)
- DeepSeek family (DeepSeek V2, V3)
- Mistral, Moonlight, Nemotron-H, GLM-4.5

**Mixture-of-Experts (MoE):**

- Qwen 3 MoE, Qwen 3-Next
- DeepSeek V2, V3
- OLMoE

**Vision-Language (Multimodal):**

- Gemma 3 VL
- Qwen 2.5 VL, Qwen 3 VL
- Nemotron Nano V2 VL
### By Provider
**Meta/LLaMA:**

- LLaMA 3

**NVIDIA:**

- LLaMA Nemotron
- Nemotron-H
- Nemotron Nano V2 VL

**Alibaba Cloud:**

- Qwen (2, 2.5, 3, 3 MoE, 3-Next)
- Qwen VL (2.5, 3)

**Google:**

- Gemma (2, 3)
- Gemma 3 VL

**DeepSeek:**

- DeepSeek (V2, V3)

**Other:**

- Mistral AI (Mistral)
- GLM-4.5
- GPT-OSS
- Moonlight
- OLMoE
## Conversion Support
All models support bidirectional conversion (a minimal sketch follows this list):

- **Hugging Face → Megatron Bridge**: Load pretrained weights for training
- **Megatron Bridge → Hugging Face**: Export trained models for deployment
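For orientation, here is a minimal round-trip sketch. It assumes the `AutoBridge` entry point covered in the Bridge Guide; the checkpoint ID and exact method names below are illustrative, so consult the guide for the precise API in your installed version.

```python
# Minimal round-trip sketch. AutoBridge usage follows the Bridge Guide;
# the model ID and exact method names here are illustrative assumptions.
from megatron.bridge import AutoBridge

# Hugging Face -> Megatron Bridge: load pretrained weights for training
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B")
provider = bridge.to_megatron_provider()
model = provider.provide_distributed_model(wrap_with_ddp=False)

# ... pretrain, fine-tune, or run PEFT on the Megatron model here ...

# Megatron Bridge -> Hugging Face: export the trained model for deployment
bridge.save_hf_pretrained(model, "./llama-3.2-1b-hf")
```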
Conversion features:

- Automatic architecture detection
- Parallelism-aware conversion across tensor, pipeline, virtual pipeline, context, and expert parallelism (TP/PP/VPP/CP/EP); see the sketch after this list
- Streaming, memory-efficient weight transfers
- Verification mechanisms for conversion accuracy
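Because conversion is parallelism-aware, weights can be sharded across ranks as they are loaded rather than first materialized on a single device. A hedged sketch, assuming the model provider exposes the usual Megatron parallelism settings (the attribute names below are assumptions):

```python
from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B")
provider = bridge.to_megatron_provider()

# Parallelism-aware conversion: request a sharded layout up front so each
# rank receives only its own partition of the weights. These attribute
# names mirror standard Megatron settings and are assumptions here.
provider.tensor_model_parallel_size = 2    # TP
provider.pipeline_model_parallel_size = 1  # PP

model = provider.provide_distributed_model(wrap_with_ddp=False)
```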
Refer to the Bridge Guide for detailed conversion instructions.
Ready to explore? Choose a model category above, or return to the main documentation.