Large Language Models

This directory contains documentation for the Large Language Models (LLMs) supported by Megatron Bridge. Each model's documentation page includes examples of converting to and from 🤗 Hugging Face format, along with links to training recipes.

Available Models

Megatron Bridge supports the following LLM families:

Model	Documentation	Description
DeepSeek V2	deepseek-v2.md	DeepSeek V2 model family
DeepSeek V3	deepseek-v3.md	DeepSeek V3 model family
Gemma 2	gemma2.md	Google Gemma 2 models
Gemma 3	gemma3.md	Google Gemma 3 models
GLM-4.5	glm45.md	GLM-4.5 model family
GPT-OSS	gpt-oss.md	OpenAI GPT-OSS open-weight models
LLaMA 3	llama3.md	Meta LLaMA 3 models
LLaMA Nemotron	llama-nemotron.md	NVIDIA LLaMA Nemotron models
Mistral	mistral.md	Mistral AI models
Moonlight	moonlight.md	Moonlight model family
Nemotron-3	nemotron3.md	NVIDIA Nemotron-3 models
Nemotron-H	nemotronh.md	NVIDIA Nemotron-H models
OLMoE	olmoe.md	OLMoE (Open Language Model - Mixture of Experts)
Qwen	qwen.md	Alibaba Cloud Qwen model family
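
For scripts that need to link to these pages, the table above can be expressed as a simple lookup. This is a minimal sketch, not part of Megatron Bridge: the `MODEL_DOCS` dictionary and the `find_doc` helper are hypothetical names introduced here for illustration, and the file names are copied from the table.

```python
from typing import Optional

# Model family -> documentation page, copied from the table above.
# MODEL_DOCS is a hypothetical name, not a Megatron Bridge API.
MODEL_DOCS = {
    "DeepSeek V2": "deepseek-v2.md",
    "DeepSeek V3": "deepseek-v3.md",
    "Gemma 2": "gemma2.md",
    "Gemma 3": "gemma3.md",
    "GLM-4.5": "glm45.md",
    "GPT-OSS": "gpt-oss.md",
    "LLaMA 3": "llama3.md",
    "LLaMA Nemotron": "llama-nemotron.md",
    "Mistral": "mistral.md",
    "Moonlight": "moonlight.md",
    "Nemotron-3": "nemotron3.md",
    "Nemotron-H": "nemotronh.md",
    "OLMoE": "olmoe.md",
    "Qwen": "qwen.md",
}


def find_doc(name: str) -> Optional[str]:
    """Return the documentation page for a model family, or None.

    Matching ignores case and surrounding whitespace (hypothetical helper).
    """
    wanted = name.strip().casefold()
    for family, page in MODEL_DOCS.items():
        if family.casefold() == wanted:
            return page
    return None
```

For example, `find_doc("llama 3")` resolves to the `llama3.md` page, while a family that is not in the table yields `None`.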

Model Documentation Structure

Each model documentation page typically includes:

Model Overview - Architecture and key features
Available Variants - Supported model sizes and configurations
Conversion Examples - Converting between Hugging Face and Megatron formats
Training Recipes - Links to training configurations and examples
Architecture Details - Model-specific features and configurations

Ready to explore? Choose a model from the list above or return to the main documentation.