Megatron Bridge

bridge.diffusion.recipes

Subpackages

  • bridge.diffusion.recipes.wan
    • bridge.diffusion.recipes.wan.wan
  • bridge.diffusion.recipes.flux
    • bridge.diffusion.recipes.flux.flux

Copyright © 2026, NVIDIA Corporation.