Quick Start Guide#
The CUTLASS DSL 4.4 release currently supports Linux and Python 3.10 - 3.13 only. To install CUTLASS DSLs (limited to CuTe DSL for now), use the following command
Installation#
Before installing the latest version, you need to uninstall any previous CUTLASS DSL Installation.
pip uninstall nvidia-cutlass-dsl nvidia-cutlass-dsl-libs-base nvidia-cutlass-dsl-libs-cu13 -y
To ensure compatibility with the examples and code on GitHub, use the setup.sh file from the corresponding commit in the repository.
git clone https://github.com/NVIDIA/cutlass.git
# For CUDA Toolkit 12.9:
./cutlass/python/CuTeDSL/setup.sh --cu12
# For CUDA Toolkit 13.1:
./cutlass/python/CuTeDSL/setup.sh --cu13
If you just want to try out the last known stable release of the CUTLASS DSL (may not be compatible with the latest examples and code), run:
# For CUDA Toolkit 12.9:
pip install nvidia-cutlass-dsl
# For CUDA Toolkit 13.1:
pip install nvidia-cutlass-dsl[cu13]
The nvidia-cutlass-dsl wheel includes everything needed to generate GPU kernels. It requires
the same NVIDIA driver version as the corresponding CUDA Toolkit
(CUDA Toolkit 12.9 or CUDA Toolkit 13.1).
Recommended Dependencies#
To run examples and begin development, we recommend installing:
pip install torch jupyter
We recommend installing JAX with CUDA support at version 0.8.1 to run JAX examples.
Recommended Python environment variables for jupyter notebooks#
We recommend setting the following environment variable when running jupyter notebooks.
export PYTHONUNBUFFERED=1