Installation
Prerequisites
Linux x86_64
NVIDIA Driver supporting CUDA 12.1 or later.
cuDNN 9.3 or later.
If the CUDA Toolkit headers are not available at runtime in a standard installation path, e.g. within CUDA_HOME, set NVTE_CUDA_INCLUDE_PATH in the environment.
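For example, if the CUDA headers were installed under a non-standard prefix, the variable can be exported before installation (the path below is purely illustrative; substitute your own):

```shell
# Hypothetical non-standard CUDA prefix; adjust to your system.
export NVTE_CUDA_INCLUDE_PATH=/opt/cuda/include
```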
Transformer Engine in NGC Containers
Transformer Engine library is preinstalled in the PyTorch container in versions 22.09 and later on NVIDIA GPU Cloud.
pip - from PyPI
Transformer Engine can be installed directly from PyPI, e.g.
pip3 install --no-build-isolation transformer_engine[pytorch]
To obtain the Python bindings for Transformer Engine, the required frameworks must be specified explicitly as extra dependencies in a comma-separated list (e.g. [jax,pytorch]). Transformer Engine ships wheels for the core library; source distributions are shipped for the JAX and PyTorch extensions.
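For instance, to request the bindings for both supported frameworks in a single install, list both extras:

```shell
# Install the core library plus the JAX and PyTorch extensions from PyPI.
pip3 install --no-build-isolation transformer_engine[jax,pytorch]
```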
pip - from GitHub
Additional Prerequisites
Installation (stable release)
Execute the following command to install the latest stable version of Transformer Engine:
pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@stable
This will automatically detect if any supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable NVTE_FRAMEWORK to a comma-separated list (e.g. NVTE_FRAMEWORK=jax,pytorch).
Installation (development build)
Warning
While the development build of Transformer Engine may contain new features not yet available in the official build, it is unsupported and not recommended for general use.
Execute the following command to install the latest development build of Transformer Engine:
pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@main
This will automatically detect if any supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable NVTE_FRAMEWORK to a comma-separated list (e.g. NVTE_FRAMEWORK=jax,pytorch). To only build the framework-agnostic C++ API, set NVTE_FRAMEWORK=none.
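As a sketch, building only the framework-agnostic C++ API amounts to exporting the variable before invoking pip:

```shell
# Skip all framework bindings and build only the C++ API.
export NVTE_FRAMEWORK=none
```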
In order to install a specific PR, execute (after changing NNN to the PR number):
pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@refs/pull/NNN/merge
Installation (from source)
Execute the following commands to install Transformer Engine from source:
# Clone repository, checkout stable branch, clone submodules
git clone --branch stable --recursive https://github.com/NVIDIA/TransformerEngine.git
cd TransformerEngine
export NVTE_FRAMEWORK=pytorch # Optionally set framework
pip3 install --no-build-isolation . # Build and install
If the Git repository has already been cloned, make sure to also clone the submodules:
git submodule update --init --recursive
Extra dependencies for testing can be installed by setting the “test” option:
pip3 install --no-build-isolation .[test]
To build the C++ extensions with debug symbols, e.g. with the -g flag:
pip3 install --no-build-isolation . --global-option=--debug
Troubleshooting
Common Issues and Solutions:
ABI Compatibility Issues:
Symptoms: ImportError with undefined symbols when importing transformer_engine
Solution: Ensure PyTorch and Transformer Engine are built with the same C++ ABI setting. Rebuild PyTorch from source with matching ABI.
Context: If you’re using PyTorch built with a different C++ ABI than your system’s default, you may encounter these undefined symbol errors. This is particularly common with pip-installed PyTorch outside of containers.
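One way to check which ABI your PyTorch build uses (assuming PyTorch is importable) is PyTorch's own helper:

```shell
# Prints True for the new (CXX11) C++ ABI, False for the old pre-CXX11 ABI.
python3 -c "import torch; print(torch.compiled_with_cxx11_abi())"
```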
Missing Headers or Libraries:
Symptoms: CMake errors about missing headers (cudnn.h, cublas_v2.h, filesystem, etc.)
Solution: Install missing development packages or set environment variables to point to correct locations:
export CUDA_PATH=/path/to/cuda
export CUDNN_PATH=/path/to/cudnn
If CMake can’t find a C++ compiler, set the CXX environment variable. Ensure all paths are correctly set before installation.
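For example, CMake can be pointed at a specific compiler by exporting CXX (the path below is illustrative):

```shell
# Tell CMake which C++ compiler to use; adjust the path for your system.
export CXX=/usr/bin/g++
```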
Build Resource Issues:
Symptoms: Compilation hangs, system freezes, or out-of-memory errors
Solution: Limit parallel builds:
MAX_JOBS=1 NVTE_BUILD_THREADS_PER_JOB=1 pip install ...
Verbose Build Logging:
For detailed build logs to help diagnose issues:
cd transformer_engine
pip install -v -v -v --no-build-isolation .