Large Language Models

This directory contains documentation for the Large Language Models (LLMs) supported by Megatron Bridge. Each model's page includes examples for converting checkpoints to and from 🤗 Hugging Face format, along with links to training recipes.

Available Models

Megatron Bridge supports the following LLM families:

| Model | Documentation | Description |
|---|---|---|
| DeepSeek V2 | deepseek-v2.md | DeepSeek V2 model family |
| DeepSeek V3 | deepseek-v3.md | DeepSeek V3 model family |
| Gemma 2 | gemma2.md | Google Gemma 2 models |
| Gemma 3 | gemma3.md | Google Gemma 3 models |
| GLM-4.5 | glm45.md | GLM-4.5 model family |
| GPT-OSS | gpt-oss.md | OpenAI GPT-OSS open-weight models |
| LLaMA 3 | llama3.md | Meta LLaMA 3 models |
| LLaMA Nemotron | llama-nemotron.md | NVIDIA LLaMA Nemotron models |
| Mistral | mistral.md | Mistral AI models |
| Moonlight | moonlight.md | Moonlight model family |
| Nemotron-3 | nemotron3.md | NVIDIA Nemotron-3 models |
| Nemotron-H | nemotronh.md | NVIDIA Nemotron-H models |
| OLMoE | olmoe.md | OLMoE (Open Language Model - Mixture of Experts) |
| Qwen | qwen.md | Alibaba Cloud Qwen model family |

Quick Navigation

I want to...

🔍 Find a specific model → Browse the model list above or use the index page

🔄 Convert models between formats → Each model page includes examples for converting between Hugging Face and Megatron formats

🚀 Get started with training → See Training Documentation for training guides

📚 Understand model architecture → Each model page documents architecture-specific features and configurations

🔧 Add support for a new model → Refer to Adding New Models

Model Documentation Structure

Each model's documentation page typically includes:

  1. Model Overview - Architecture and key features

  2. Available Variants - Supported model sizes and configurations

  3. Conversion Examples - Converting between Hugging Face and Megatron formats

  4. Training Recipes - Links to training configurations and examples

  5. Architecture Details - Model-specific features and configurations


Ready to explore? Choose a model from the list above or return to the main documentation.