Megatron Core

`core.transformer.moe`

Submodules

core.transformer.moe.grouped_gemm_util
core.transformer.moe.moe_utils
core.transformer.moe.router_replay
core.transformer.moe.fused_a2a
core.transformer.moe.shared_experts
core.transformer.moe.moe_layer
core.transformer.moe.upcycling_utils
core.transformer.moe.token_dispatcher
core.transformer.moe.router
core.transformer.moe.experts

Copyright © 2026, NVIDIA Corporation.