cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

$D = \alpha op(A) * op(B) + \beta op(C)$

where $op(A)$ and $op(B)$ denote in-place operations such as transpose or non-transpose.
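
To make the formula concrete, the following is a minimal CPU reference sketch in plain C++, not the cuSPARSELt API. The names `Matrix`, `Op`, and `reference_matmul` are hypothetical helpers introduced only for this illustration. The sketch assumes dense row-major storage and treats $op(C)$ as the identity. A reference of this kind may be useful for checking results produced on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper types for this sketch; not part of the cuSPARSELt API.
enum class Op { NonTranspose, Transpose };

struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;  // dense, row-major, rows * cols elements

    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Shape and element access of op(M) without materializing a transposed copy.
std::size_t op_rows(const Matrix& m, Op op) { return op == Op::Transpose ? m.cols : m.rows; }
std::size_t op_cols(const Matrix& m, Op op) { return op == Op::Transpose ? m.rows : m.cols; }
float op_at(const Matrix& m, Op op, std::size_t r, std::size_t c) {
    return op == Op::Transpose ? m.at(c, r) : m.at(r, c);
}

// CPU reference for D = alpha * op(A) * op(B) + beta * C.
// Assumption: op(C) is the identity in this sketch.
Matrix reference_matmul(float alpha, const Matrix& a, Op op_a,
                        const Matrix& b, Op op_b,
                        float beta, const Matrix& c) {
    const std::size_t m = op_rows(a, op_a);
    const std::size_t k = op_cols(a, op_a);
    const std::size_t n = op_cols(b, op_b);
    assert(op_rows(b, op_b) == k);       // inner dimensions must match
    assert(c.rows == m && c.cols == n);  // C must have the shape of D

    Matrix d{m, n, std::vector<float>(m * n, 0.0f)};
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += op_at(a, op_a, i, p) * op_at(b, op_b, p, j);
            }
            d.data[i * n + j] = alpha * acc + beta * c.at(i, j);
        }
    }
    return d;
}
```

For example, with $A = [[1, 2], [3, 4]]$, $B = [[5, 6], [7, 8]]$, $C$ filled with ones, $\alpha = 2$, $\beta = 1$, and no transposes, this reference yields $D = [[39, 45], [87, 101]]$.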

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

Download: developer.nvidia.com/cusparselt/downloads

Key Features

NVIDIA Sparse MMA Tensor Core support
Mixed-precision computation support:
- FP16 input/output, FP32 Tensor Core accumulate
- BFLOAT16 input/output, FP32 Tensor Core accumulate
- INT8 input/output, INT32 Tensor Core compute
- FP32 input/output, TF32 Tensor Core compute
- TF32 input/output, TF32 Tensor Core compute
Matrix pruning and compression functionalities
Auto-tuning functionality (see cusparseLtMatmulSearch())
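
As an illustration of what pruning and compression mean here, the sketch below assumes the 2:4 structured-sparsity pattern used by Sparse Tensor Cores on the NVIDIA Ampere architecture, in which at most two of every four consecutive values are nonzero. It is a CPU-side illustration in plain C++ and not the library's own routine. The names `Compressed24`, `prune_and_compress_2_4`, and `decompress_2_4` are hypothetical, and the storage layout (kept values plus one small index per value) is a simplification chosen for readability.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical compressed form: two kept values per group of four,
// plus the position (0..3) of each kept value inside its group.
struct Compressed24 {
    std::vector<float> values;
    std::vector<std::uint8_t> indices;
};

// Keep the two largest-magnitude values in every group of four consecutive
// elements and drop the other two. Ties keep the earlier element.
Compressed24 prune_and_compress_2_4(const std::vector<float>& row) {
    assert(row.size() % 4 == 0);
    Compressed24 out;
    for (std::size_t g = 0; g < row.size(); g += 4) {
        // Position of the largest magnitude in the group.
        std::size_t first = 0;
        for (std::size_t i = 1; i < 4; ++i) {
            if (std::fabs(row[g + i]) > std::fabs(row[g + first])) first = i;
        }
        // Position of the largest magnitude among the remaining three.
        std::size_t second = (first == 0) ? 1 : 0;
        for (std::size_t i = 0; i < 4; ++i) {
            if (i != first && std::fabs(row[g + i]) > std::fabs(row[g + second])) second = i;
        }
        // Store the kept pair in increasing position order.
        const std::size_t lo = std::min(first, second);
        const std::size_t hi = std::max(first, second);
        out.values.push_back(row[g + lo]);
        out.values.push_back(row[g + hi]);
        out.indices.push_back(static_cast<std::uint8_t>(lo));
        out.indices.push_back(static_cast<std::uint8_t>(hi));
    }
    return out;
}

// Expand the compressed form back to a dense row; dropped positions become zero.
std::vector<float> decompress_2_4(const Compressed24& c) {
    assert(c.values.size() == c.indices.size());
    std::vector<float> row(c.values.size() * 2, 0.0f);
    for (std::size_t k = 0; k < c.values.size(); ++k) {
        const std::size_t group = k / 2;
        row[group * 4 + c.indices[k]] = c.values[k];
    }
    return row;
}
```

The compressed form holds half as many values as the dense row, which is the property that lets sparse hardware skip the dropped elements. In this sketch the selection rule is magnitude-based; other selection criteria are possible.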

Support

Supported SM Architectures: SM 8.0, SM 8.6
Supported OSes: Linux, Windows
Supported CPU Architectures: x86_64, Arm64

Prerequisites

CUDA Toolkit 11.2 (or above) and a compatible driver (see CUDA Driver Release Notes).
Dependencies: cudart, nvrtc, cusparse.h header
