Installation#
System Requirements#
Hardware#
Recommended: NVIDIA Turing architecture or later
FP8 Support: Requires NVIDIA Hopper, Ada, or Blackwell GPUs
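The FP8 requirement above maps to CUDA compute capability: Hopper is 9.x, Ada is 8.9, and Blackwell is 10.x and later. A minimal sketch of that check, assuming the standard NVIDIA capability numbering (the helper name is illustrative, not part of Megatron Core):

```python
# Sketch: map CUDA compute capability to the FP8 support rule stated above.
# Hopper = 9.x, Ada = 8.9, Blackwell = 10.x/12.x; earlier parts lack FP8.
def supports_fp8(cc_major, cc_minor=0):
    """Return True if a GPU with this compute capability supports FP8."""
    return cc_major >= 9 or (cc_major == 8 and cc_minor == 9)

print(supports_fp8(9, 0))  # Hopper H100 -> True
print(supports_fp8(8, 0))  # Ampere A100 -> False
```

On an installed system, the `(major, minor)` pair can be read with `torch.cuda.get_device_capability()`.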
Software#
Python: >= 3.10 (3.12 recommended)
PyTorch: >= 2.6.0
CUDA Toolkit: Latest stable version
Prerequisites#
Install uv, a fast Python package installer:
curl -LsSf https://astral.sh/uv/install.sh | sh
Option A: Pip Install (Recommended)#
Install the latest stable release from PyPI:
uv pip install megatron-core
To include optional training dependencies (Weights & Biases, SentencePiece, HF Transformers):
uv pip install "megatron-core[training]"
For all extras including Transformer Engine, first install the build-time dependencies, then install the extras without build isolation:
uv pip install --group build
uv pip install --no-build-isolation "megatron-core[training,dev]"
Note
--no-build-isolation requires build dependencies to be pre-installed in the environment. torch is needed because several [dev] packages (mamba-ssm, nv-grouped-gemm, transformer-engine) import it at build time to compile CUDA kernels. Expect this step to take 20+ minutes depending on your hardware. If you prefer pre-built binaries, the NGC Container ships with these pre-compiled.
Warning
Building from source can consume a large amount of memory. By default the build runs one compiler job per CPU core, which may cause out-of-memory failures on machines with many cores. To limit parallel compilation jobs, set the MAX_JOBS environment variable before installing (e.g. MAX_JOBS=4).
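One way to pick a MAX_JOBS value is to cap parallel jobs by both CPU count and available memory. In this sketch the ~2 GB-per-compile-job figure is an assumption for illustration, not an official number; tune it for your toolchain:

```python
# Sketch: suggest a MAX_JOBS value for a from-source build.
# gb_per_job=2 is an assumed per-job memory budget, not an official figure.
import os

def suggest_max_jobs(ram_gb, gb_per_job=2):
    """Cap parallel compile jobs by both CPU count and available RAM."""
    cpus = os.cpu_count() or 1
    return max(1, min(cpus, ram_gb // gb_per_job))

print(suggest_max_jobs(ram_gb=16))
```

The result can be exported as MAX_JOBS before running the install commands above.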
Tip
For a lighter set of development dependencies without Transformer Engine and ModelOpt, use [lts] instead of [dev]: uv pip install --no-build-isolation "megatron-core[training,lts]". The [lts] and [dev] extras are mutually exclusive.
To clone the repository for examples:
git clone https://github.com/NVIDIA/Megatron-LM.git
Option B: Install from Source#
For development or to run the latest unreleased code:
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
uv pip install -e .
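After an editable install, you can verify that the package resolves to the cloned working tree rather than a site-packages copy. A stdlib-only sketch (the helper name is illustrative) that degrades gracefully when the package is not importable yet:

```python
# Sketch: locate where the megatron package resolves from, if anywhere.
# After `uv pip install -e .` this should point into the cloned repo.
import importlib.util

def locate_megatron():
    """Return the filesystem origin of the megatron package, or None."""
    spec = importlib.util.find_spec("megatron")
    return spec.origin if spec else None

origin = locate_megatron()
print(origin or "megatron is not importable in this environment")
```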
To install with all development dependencies (includes Transformer Engine; requires the build dependencies to be pre-installed):
uv pip install --group build
uv pip install --no-build-isolation -e ".[training,dev]"
Tip
If the build runs out of memory, limit parallel compilation jobs with MAX_JOBS=4 uv pip install --no-build-isolation -e ".[training,dev]".
Option C: NGC Container#
For a pre-configured environment with all dependencies pre-installed (PyTorch, CUDA, cuDNN, NCCL, Transformer Engine), use the PyTorch NGC Container.
We recommend using the previous month’s NGC container rather than the latest one to ensure compatibility with the current Megatron Core release and testing matrix.
docker run --gpus all -it --rm \
-v /path/to/dataset:/workspace/dataset \
-v /path/to/checkpoints:/workspace/checkpoints \
-e PIP_CONSTRAINT= \
nvcr.io/nvidia/pytorch:26.01-py3
Note
The NGC PyTorch container constrains the Python environment globally via PIP_CONSTRAINT. The -e PIP_CONSTRAINT= flag above unsets this so that Megatron Core and its dependencies install correctly.
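Before installing inside the container, you can confirm the constraint was actually cleared. A small sketch (the helper name is illustrative): an unset or empty PIP_CONSTRAINT means pip will not apply the NGC constraint file.

```python
# Sketch: verify PIP_CONSTRAINT is unset or empty inside the container.
import os

def constraint_cleared():
    """Return True when PIP_CONSTRAINT is unset or empty."""
    return os.environ.get("PIP_CONSTRAINT", "") == ""

print("constraint cleared" if constraint_cleared() else
      f"PIP_CONSTRAINT still set: {os.environ['PIP_CONSTRAINT']!r}")
```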
Then install Megatron Core inside the container (torch is already available in the NGC image):
pip install uv
uv pip install --no-build-isolation "megatron-core[training,dev]"
You are now ready to run training. See Your First Training Run for next steps.
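As a final sanity check before your first run, this sketch reports the installed megatron-core version using only the standard library, with a friendly message when the package is missing:

```python
# Sketch: report the installed megatron-core version, or note its absence.
from importlib import metadata

def installed_version(dist="megatron-core"):
    """Return the installed version string for dist, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None

print(installed_version() or "megatron-core is not installed")
```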