Megatron Bridge Documentation#

Welcome to the Megatron Bridge documentation! This guide helps you navigate the documentation to find exactly what you need for converting, training, and working with large language models (LLMs) and vision language models (VLMs).

πŸš€ Quick Start Paths#

I want to#

πŸƒβ€β™‚οΈ Get started with model conversion β†’ Start with Bridge Guide for Hugging Face ↔ Megatron conversion

⚑ Understand parallelisms and performance β†’ Jump to Parallelisms Guide and Performance Guide

πŸš€ Start training a model β†’ See Training Documentation for comprehensive training guides

πŸ“š Find model documentation β†’ Browse Supported Models for LLMs or Vision Language Models for VLMs

πŸ”§ Migrate from NeMo 2 or Megatron-LM β†’ Check NeMo 2 Migration Guide or Megatron-LM Migration Guide

πŸ“Š Use training recipes β†’ Read Recipe Usage for pre-configured training recipes

πŸ”Œ Add support for a new model β†’ Refer to Adding New Models

πŸ“‹ Check version information β†’ See Releases Documentation for versions, changelog, and known issues


πŸ‘₯ Documentation by Role#

For ML Engineers & Researchers#

Start with the Bridge Guide for Hugging Face ↔ Megatron conversion, then browse Large Language Models and Vision Language Models for model-specific documentation.

For Training Engineers#

Head to the Training Documentation, the Parallelisms Guide, and the Performance Guide.

For Model Developers#

See Adding New Models to extend model support and Bridge Tech Details to understand the bridge internals.

For DevOps & Platform Teams#

Check the Releases Documentation for software versions, the changelog, and known issues.


πŸ“š Complete Documentation Index#

Getting Started#

| Document | Purpose | When to Read |
|---|---|---|
| Bridge Guide | Hugging Face ↔ Megatron conversion guide | First time converting models |
| Bridge Tech Details | Technical details of the bridge system | Understanding bridge internals |
| Parallelisms Guide | Data and model parallelism strategies | Setting up distributed training |
| Performance Summary | Quick performance reference | Looking up benchmark numbers at a glance |
| Performance Guide | Comprehensive performance optimization | Optimizing training performance |

Model Support#

| Document | Purpose | When to Read |
|---|---|---|
| Large Language Models | Documentation for supported LLMs | Working with LLMs |
| Vision Language Models | Documentation for supported VLMs | Working with VLMs |
| Adding New Models | Guide for adding model support | Extending model support |

Training and Customization#

| Document | Purpose | When to Read |
|---|---|---|
| Training Documentation | Comprehensive training guides | Setting up and customizing training |
| Configuration Container Overview | Central training configuration | Understanding training configuration |
| Entry Points | Training entry points and execution | Understanding training flow |
| Training Loop Settings | Training loop parameters | Configuring training parameters |
| Optimizer & Scheduler | Optimization configuration | Setting up optimizers |
| Mixed Precision | Mixed precision training | Reducing memory usage |
| PEFT | Parameter-efficient fine-tuning | Fine-tuning with limited resources |
| Checkpointing | Checkpoint management | Saving and resuming training |
| Logging | Logging and monitoring | Monitoring training progress |
| Profiling | Performance profiling | Identifying bottlenecks |
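
To see how these configuration topics fit together, here is an illustrative config fragment. This is a sketch only: the field names below are assumptions modeled on common Megatron-style options, not the actual schema — the Configuration Container Overview documents the real fields.

```yaml
# Illustrative only — field names are assumptions, not the real ConfigContainer schema.
model:
  tensor_model_parallel_size: 2    # see the Parallelisms Guide
  pipeline_model_parallel_size: 1
train:
  train_iters: 10000               # see Training Loop Settings
  global_batch_size: 256
optimizer:
  lr: 3.0e-4                       # see Optimizer & Scheduler
  lr_decay_style: cosine
mixed_precision: bf16              # see Mixed Precision
checkpoint:
  save_interval: 1000              # see Checkpointing
logging:
  log_interval: 10                 # see Logging
```

Each top-level block corresponds to one of the documents in the table above, which is why they are typically read together when setting up a run.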

Recipes and Workflows#

| Document | Purpose | When to Read |
|---|---|---|
| Recipe Usage | Using pre-configured training recipes | Quick training setup |
| Bridge RL Integration | Reinforcement learning integration | RL training workflows |

Migration Guides#

| Document | Purpose | When to Read |
|---|---|---|
| NeMo 2 Migration Guide | Migrating from NeMo 2 | Upgrading from NeMo 2 |
| Megatron-LM Migration Guide | Migrating from Megatron-LM | Upgrading from Megatron-LM |

Reference#

| Document | Purpose | When to Read |
|---|---|---|
| API Documentation | Complete API reference | Building integrations |
| Releases Documentation | Version history and known issues | Checking versions, troubleshooting |
| Documentation Guide | Contributing to documentation | Writing or improving docs |


πŸ—ΊοΈ Common Reading Paths#

πŸ†• First-Time Users#

  1. Bridge Guide (10 min - understand conversion)

  2. Parallelisms Guide (15 min - understand distributed training)

  3. Training Documentation (choose your training path)

  4. Recipe Usage (5 min - use pre-configured recipes)

πŸ”§ Setting Up Training#

  1. Training Documentation (overview of training system)

  2. Configuration Container Overview (understand configuration)

  3. Entry Points (how training starts)

  4. Training Loop Settings (configure parameters)

  5. Logging (set up monitoring)

⚑ Performance Optimization#

  1. Performance Guide (comprehensive optimization strategies)

  2. Performance Summary (quick reference)

  3. Mixed Precision (reduce memory usage)

  4. Communication Overlap (optimize distributed training)

  5. Activation Recomputation (reduce memory footprint)

  6. Profiling (identify bottlenecks)

πŸ”„ Model Conversion Workflow#

  1. Bridge Guide (conversion basics)

  2. Bridge Tech Details (technical details)

  3. Supported Models or Vision Language Models (model-specific guides)

  4. Adding New Models (extend support)

πŸ”§ Customization and Extension#

  1. Training Documentation (training customization)

  2. PEFT (parameter-efficient fine-tuning)

  3. Distillation (knowledge distillation)

  4. Adding New Models (add model support)

  5. Bridge RL Integration (RL workflows)

πŸ“¦ Migration Paths#

  1. NeMo 2 Migration Guide (from NeMo 2)

  2. Megatron-LM Migration Guide (from Megatron-LM)

  3. Training Documentation (new training system)


πŸ“ Directory Structure#

Main Documentation#

  • Guides - Core guides for parallelisms, performance, recipes, and migration

  • Bridge Documentation - Hugging Face ↔ Megatron conversion guides

  • Model Documentation - Supported model families and architectures

Subdirectories#

models/#

  • llm/ - Large Language Model documentation

    • Individual model guides (Qwen, LLaMA, Mistral, etc.)

    • Conversion examples and training recipes

  • vlm/ - Vision Language Model documentation

    • VLM model guides (Qwen VL, Gemma VL, etc.)

    • Multimodal model support

training/#

  • Configuration - ConfigContainer, entry points, training loop settings

  • Optimization - Optimizer, scheduler, mixed precision, communication overlap

  • Performance - Attention optimizations, activation recomputation, CPU offloading

  • Monitoring - Logging, profiling, checkpointing, resiliency

  • Advanced - PEFT, packed sequences, distillation

releases/#

  • Software Versions - Current versions and dependencies

  • Changelog - Release history and changes

  • Known Issues - Bugs, limitations, and workarounds


πŸ”— How Documents Connect#

```mermaid
graph TD
    A[README.md<br/>Start Here] --> B[Bridge Guide<br/>Model Conversion]
    A --> C[Training Docs<br/>Training Setup]
    A --> D[Models<br/>Model Support]

    B --> E[Bridge Tech Details<br/>Technical Deep Dive]
    B --> F[Supported Models<br/>Model-Specific Guides]

    C --> G[Config Container<br/>Configuration]
    C --> H[Performance Guide<br/>Optimization]
    C --> I[Parallelisms<br/>Distributed Training]

    G --> J[Training Loop<br/>Training Parameters]
    G --> K[Optimizer & Scheduler<br/>Optimization Setup]

    H --> L[Mixed Precision<br/>Memory Efficiency]
    H --> M[Communication Overlap<br/>Performance]

    I --> N[Data Parallelism<br/>DDP]
    I --> O[Model Parallelism<br/>TP/PP/VPP]

    D --> P[LLM Models<br/>Language Models]
    D --> Q[VLM Models<br/>Vision Language Models]

    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style C fill:#e8f5e8
    style D fill:#fff3e0
    style H fill:#fce4ec
    style I fill:#e0f2f1
```

🀝 Getting Help#

  • GitHub Issues: Report bugs or request features

  • Documentation Issues: Found something unclear? Let us know!

  • Community: Join discussions and share experiences


Ready to get started? Choose your path above or dive into the Bridge Guide for model conversion! πŸš€