Large Language Models
This directory contains documentation for the Large Language Models (LLMs) supported by Megatron Bridge. Each model's documentation page includes examples for converting to and from the 🤗 Hugging Face format, along with links to training recipes.
Available Models
Megatron Bridge supports the following LLM families:
| Model | Description |
|---|---|
| DeepSeek V2 | DeepSeek V2 model family |
| DeepSeek V3 | DeepSeek V3 model family |
| Gemma 2 | Google Gemma 2 models |
| Gemma 3 | Google Gemma 3 models |
| GLM-4.5 | GLM-4.5 model family |
| GPT-OSS | Open-source GPT-style models |
| LLaMA 3 | Meta LLaMA 3 models |
| LLaMA Nemotron | NVIDIA LLaMA Nemotron models |
| Mistral | Mistral AI models |
| Moonlight | Moonlight model family |
| Nemotron-3 | NVIDIA Nemotron-3 models |
| Nemotron-H | NVIDIA Nemotron-H models |
| OLMoE | OLMoE (Open Language Model - Mixture of Experts) |
| Qwen | Alibaba Cloud Qwen model family |
Model Documentation Structure
Each model's documentation page typically includes:
- Model Overview: architecture and key features
- Available Variants: supported model sizes and configurations
- Conversion Examples: converting between Hugging Face and Megatron formats
- Training Recipes: links to training configurations and examples
- Architecture Details: model-specific features and configurations
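To give a feel for what the conversion examples on the individual model pages look like, here is a minimal sketch of a Hugging Face to Megatron round trip. It is an illustration under assumptions, not the authoritative workflow: the `AutoBridge` calls (`from_hf_pretrained`, `to_megatron_provider`, `provide_distributed_model`, `save_hf_pretrained`) are taken from the Megatron Bridge API as we understand it and may differ between releases, the model ID `meta-llama/Llama-3.2-1B` is only an example, and `export_dir_for` and `roundtrip` are hypothetical helpers introduced for this sketch. Consult the page for your model family for the exact, supported commands.

```python
from pathlib import Path


def export_dir_for(hf_model_id: str, root: str = "./hf_exports") -> Path:
    """Hypothetical helper: derive a local export directory from a Hugging Face model ID."""
    # "meta-llama/Llama-3.2-1B" -> <root>/llama-3.2-1b
    return Path(root) / hf_model_id.split("/")[-1].lower()


def roundtrip(hf_model_id: str) -> Path:
    """Load a Hugging Face checkpoint into Megatron, then export it back.

    Assumes Megatron Bridge is installed and a suitable GPU environment is
    available. The method names below are assumptions about the AutoBridge API.
    """
    # Imported lazily so this file can be loaded without Megatron Bridge installed.
    from megatron.bridge import AutoBridge

    # Hugging Face -> Megatron: build a bridge, then a Megatron model provider.
    bridge = AutoBridge.from_hf_pretrained(hf_model_id)
    provider = bridge.to_megatron_provider()
    model = provider.provide_distributed_model(wrap_with_ddp=False)

    # Megatron -> Hugging Face: write the weights back out in Hugging Face format.
    out_dir = export_dir_for(hf_model_id)
    bridge.save_hf_pretrained(model, str(out_dir))
    return out_dir
```

A call such as `roundtrip("meta-llama/Llama-3.2-1B")` would then, under the assumptions above, leave a Hugging Face style checkpoint in the directory returned by `export_dir_for`.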
Ready to explore? Choose a model from the list above or return to the main documentation.