cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

$D = \alpha op(A) * op(B) + \beta op(C)$

where $op(A)$ and $op(B)$ refer to in-place operations such as transpose or non-transpose.
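To make the semantics of the formula concrete, the following is a minimal CPU reference sketch in C++. It is not the cuSPARSELt API: the names `Matrix`, `Op`, `opAt`, and `referenceMatmul` are hypothetical helpers introduced only for illustration, and the sketch assumes dense, row-major `float` storage with no sparsity or Tensor Core acceleration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper types for illustration; not part of cuSPARSELt.
enum class Op { NonTranspose, Transpose };

// Dense row-major matrix stored in a flat vector.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Shape and element access for op(M) without physically transposing M.
inline std::size_t opRows(const Matrix& m, Op op) {
    return op == Op::Transpose ? m.cols : m.rows;
}
inline std::size_t opCols(const Matrix& m, Op op) {
    return op == Op::Transpose ? m.rows : m.cols;
}
inline float opAt(const Matrix& m, Op op, std::size_t i, std::size_t j) {
    return op == Op::Transpose ? m.at(j, i) : m.at(i, j);
}

// Computes D = alpha * op(A) * op(B) + beta * op(C) with naive triple loops.
Matrix referenceMatmul(float alpha, const Matrix& A, Op opA,
                       const Matrix& B, Op opB,
                       float beta, const Matrix& C, Op opC) {
    const std::size_t m = opRows(A, opA);
    const std::size_t k = opCols(A, opA);
    const std::size_t n = opCols(B, opB);
    assert(opRows(B, opB) == k);                          // inner dimensions must agree
    assert(opRows(C, opC) == m && opCols(C, opC) == n);   // op(C) must match D's shape

    Matrix D(m, n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += opAt(A, opA, i, p) * opAt(B, opB, p, j);
            }
            D.at(i, j) = alpha * acc + beta * opAt(C, opC, i, j);
        }
    }
    return D;
}
```

A reference of this kind is mainly useful for checking the output of an accelerated routine on small inputs; it says nothing about how the library organizes the computation internally.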

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

Download: developer.nvidia.com/cusparselt/downloads

Key Features

NVIDIA Ampere architecture Sparse MMA tensor core support
Mixed-precision support:
- FP16 inputs/output, FP32 Tensor Core accumulate
- BFLOAT16 inputs/output, FP32 Tensor Core accumulate
- INT8 inputs/output, INT32 Tensor Core compute
Memory Layouts: row-major, column-major
Matrix pruning and compression functionalities
Auto-tuning functionality (see cusparseLtMatmulSearch())
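The pruning and compression features listed above can be illustrated with a small CPU sketch. Sparse Tensor Cores on the NVIDIA Ampere architecture operate on 2:4 structured sparsity, meaning at most two nonzero values in every group of four consecutive elements. The C++ code below assumes a simple magnitude-based pruning rule and a toy compressed format (kept values plus their positions within each group). The names `prune2to4`, `Compressed2to4`, `compress2to4`, and `decompress2to4` are hypothetical, and the layout shown is not the library's internal compressed format.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Zero the two smallest-magnitude values in each group of four elements.
// Assumption: magnitude-based selection; other pruning criteria are possible.
void prune2to4(std::vector<float>& row) {
    assert(row.size() % 4 == 0);
    for (std::size_t g = 0; g < row.size(); g += 4) {
        bool dropped[4] = {false, false, false, false};
        for (int d = 0; d < 2; ++d) {
            int victim = -1;
            for (int i = 0; i < 4; ++i) {
                if (dropped[i]) continue;
                if (victim < 0 || std::fabs(row[g + i]) < std::fabs(row[g + victim])) {
                    victim = i;
                }
            }
            dropped[victim] = true;
            row[g + victim] = 0.0f;
        }
    }
}

// Toy compressed form: two values per group plus each value's position (0..3).
struct Compressed2to4 {
    std::vector<float> values;
    std::vector<std::uint8_t> indices;
};

// Compress a row that already satisfies the 2:4 constraint.
Compressed2to4 compress2to4(const std::vector<float>& pruned) {
    assert(pruned.size() % 4 == 0);
    Compressed2to4 out;
    for (std::size_t g = 0; g < pruned.size(); g += 4) {
        int kept = 0;
        // First keep the nonzero entries of the group.
        for (int i = 0; i < 4; ++i) {
            if (pruned[g + i] != 0.0f) {
                assert(kept < 2);  // input must contain at most 2 nonzeros per group
                out.values.push_back(pruned[g + i]);
                out.indices.push_back(static_cast<std::uint8_t>(i));
                ++kept;
            }
        }
        // Pad with explicit zeros so every group stores exactly two slots.
        for (int i = 0; i < 4 && kept < 2; ++i) {
            if (pruned[g + i] == 0.0f) {
                out.values.push_back(0.0f);
                out.indices.push_back(static_cast<std::uint8_t>(i));
                ++kept;
            }
        }
    }
    return out;
}

// Expand the toy compressed form back to a dense row.
std::vector<float> decompress2to4(const Compressed2to4& c) {
    std::vector<float> dense(c.values.size() * 2, 0.0f);
    for (std::size_t k = 0; k < c.values.size(); ++k) {
        dense[(k / 2) * 4 + c.indices[k]] = c.values[k];
    }
    return dense;
}
```

The compressed form stores half as many values as the dense row, plus a small index per value that needs only two bits. That is the basic reason a 2:4 sparse operand takes less memory and bandwidth than its dense counterpart.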

Support

Supported SM Architectures: SM 8.0
Supported OSes: Linux
Supported CPU Architectures: x86_64

Other platforms will be added in future releases.

Prerequisites

CUDA 11.0 toolkit and compatible driver (see CUDA Driver Release Notes).
Dependencies: cudart, cusparse.h header
