# Supported Models
This directory contains documentation for all models supported by Megatron Bridge, including Large Language Models (LLMs) and Vision Language Models (VLMs). Each model's documentation includes architecture details, Hugging Face ↔ Megatron Bridge conversion examples, and links to training recipes.
## Model Categories
Megatron Bridge supports two main categories of models:
### 🔤 Large Language Models (LLMs)
Text-only models for language understanding and generation tasks.
| Category | Model Count | Documentation |
|---|---|---|
| Large Language Models | 13 models | |
**Supported LLM Families:**

- DeepSeek (V2, V3)
- Gemma (2, 3)
- GLM-4.5
- GPT-OSS
- LLaMA (3, Nemotron)
- Mistral
- Moonlight
- Nemotron-H
- OLMoE
- Qwen (2, 2.5, 3, 3 MoE, 3-Next)
### 🖼️ Vision Language Models (VLMs)
Multimodal models that combine vision and language capabilities.
| Category | Model Count | Documentation |
|---|---|---|
| Vision Language Models | 4 models | |
**Supported VLM Families:**

- Gemma 3 VL
- Nemotron Nano V2 VL
- Qwen (2.5 VL, 3 VL)
## Model Documentation Structure
Each model's documentation page typically includes:
- **Model Overview** - Architecture and key features
- **Available Variants** - Supported model sizes and configurations
- **Conversion Examples** - Converting between Hugging Face and Megatron formats
- **Training Recipes** - Links to training configurations and examples
- **Architecture Details** - Model-specific features and configurations
## Common Tasks by Model Type
### For LLMs
**Training:**

- Pretraining on large corpora
- Supervised fine-tuning (SFT)
- Parameter-efficient fine-tuning (PEFT/LoRA)
- Preference optimization (DPO)

**Deployment:**

- Export to Hugging Face format
- Integration with inference engines
- Model serving and deployment

**Use Cases:**

- Text generation
- Question answering
- Conversational AI
- Code generation
### For VLMs
**Training:**

- Multimodal pretraining
- Vision-language alignment
- Fine-tuning on visual tasks

**Deployment:**

- Export to Hugging Face format
- Multimodal inference

**Use Cases:**

- Image captioning
- Visual question answering
- Document understanding
- Multimodal reasoning
## Model Support Overview
### By Architecture Type
**Decoder-Only (Autoregressive):**

- GPT-style models (GPT-OSS)
- LLaMA family (LLaMA 3, LLaMA Nemotron)
- Qwen family (Qwen 2, 2.5, 3, 3-Next)
- Gemma family (Gemma 2, 3)
- DeepSeek family (DeepSeek V2, V3)
- Mistral, Moonlight, Nemotron-H, GLM-4.5

**Mixture-of-Experts (MoE):**

- Qwen 3 MoE, Qwen 3-Next
- DeepSeek V2, V3
- OLMoE

**Vision-Language (Multimodal):**

- Gemma 3 VL
- Qwen 2.5 VL, Qwen 3 VL
- Nemotron Nano V2 VL
### By Provider
**Meta/LLaMA:**

- LLaMA 3

**NVIDIA:**

- LLaMA Nemotron
- Nemotron-H
- Nemotron Nano V2 VL

**Alibaba Cloud:**

- Qwen (2, 2.5, 3, 3 MoE, 3-Next)
- Qwen VL (2.5, 3)

**Google:**

- Gemma (2, 3)
- Gemma 3 VL

**DeepSeek:**

- DeepSeek (V2, V3)

**Other:**

- Mistral AI (Mistral)
- GLM-4.5
- GPT-OSS
- Moonlight
- OLMoE
## Conversion Support
All models support bidirectional conversion (a minimal sketch follows this list):

- **Hugging Face → Megatron Bridge**: Load pretrained weights for training
- **Megatron Bridge → Hugging Face**: Export trained models for deployment
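For orientation, here is a minimal round-trip sketch. It assumes the `AutoBridge` entry point covered in the Bridge Guide; the checkpoint ID and exact method names below are illustrative, so consult the guide for the precise API in your installed version.

```python
# Minimal round-trip sketch. AutoBridge usage follows the Bridge Guide;
# the model ID and exact method names here are illustrative assumptions.
from megatron.bridge import AutoBridge

# Hugging Face -> Megatron Bridge: load pretrained weights for training
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B")
provider = bridge.to_megatron_provider()
model = provider.provide_distributed_model(wrap_with_ddp=False)

# ... pretrain, fine-tune, or run PEFT on the Megatron model here ...

# Megatron Bridge -> Hugging Face: export the trained model for deployment
bridge.save_hf_pretrained(model, "./llama-3.2-1b-hf")
```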
Conversion features:

- Automatic architecture detection
- Parallelism-aware conversion across tensor, pipeline, virtual pipeline, context, and expert parallelism (TP/PP/VPP/CP/EP); see the sketch after this list
- Streaming, memory-efficient weight transfers
- Verification mechanisms for conversion accuracy
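Because conversion is parallelism-aware, weights can be sharded across ranks as they are loaded rather than first materialized on a single device. A hedged sketch, assuming the model provider exposes the usual Megatron parallelism settings (the attribute names below are assumptions):

```python
from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B")
provider = bridge.to_megatron_provider()

# Parallelism-aware conversion: request a sharded layout up front so each
# rank receives only its own partition of the weights. These attribute
# names mirror standard Megatron settings and are assumptions here.
provider.tensor_model_parallel_size = 2    # TP
provider.pipeline_model_parallel_size = 1  # PP

model = provider.provide_distributed_model(wrap_with_ddp=False)
```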
Refer to the Bridge Guide for detailed conversion instructions.
Ready to explore? Choose a model category above, or return to the main documentation.