cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:
\[ D = \alpha \, op(A) \cdot op(B) + \beta \, op(C) \]
where op(A)/op(B) refers to in-place operations such as transpose/non-transpose.
The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
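As a point of reference, the general matrix-matrix operation described above can be sketched on the CPU with dense storage. This is a minimal illustration, not cuSPARSELt code: the names `Op`, `at`, and `reference_matmul` are hypothetical, storage is assumed row-major, and C is assumed to be used without transposition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; none of this is cuSPARSELt API.
enum class Op { NonTranspose, Transpose };

// Element (i, j) of op(M), where M is stored row-major with `cols` columns.
inline float at(const std::vector<float>& m, std::size_t cols, Op op,
                std::size_t i, std::size_t j) {
    return op == Op::NonTranspose ? m[i * cols + j] : m[j * cols + i];
}

// Computes D = alpha * op(A) * op(B) + beta * C.
// op(A) is m x k, op(B) is k x n, C and D are m x n (assumption: C is not transposed).
std::vector<float> reference_matmul(const std::vector<float>& A, Op opA,
                                    const std::vector<float>& B, Op opB,
                                    const std::vector<float>& C,
                                    std::size_t m, std::size_t n, std::size_t k,
                                    float alpha, float beta) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    // Number of columns of the matrices as they are actually stored.
    const std::size_t a_cols = (opA == Op::NonTranspose) ? k : m;
    const std::size_t b_cols = (opB == Op::NonTranspose) ? n : k;
    std::vector<float> D(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += at(A, a_cols, opA, i, p) * at(B, b_cols, opB, p, j);
            }
            D[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
    return D;
}
```

A dense reference like this is mainly useful for checking the numerical result of an accelerated sparse path on small inputs.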
NVIDIA Sparse MMA tensor core support
FP16 inputs/output, FP32 Tensor Core accumulate
BFLOAT16 inputs/output, FP32 Tensor Core accumulate
INT8 inputs/output, INT32 Tensor Core compute
Memory Layouts: row-major, column-major
Matrix pruning and compression functionalities
Auto-tuning functionality (see cusparseLtMatmulSearch())
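To make the pruning and compression feature more concrete, the sketch below shows the idea behind 2:4 structured sparsity on the CPU: in every group of four consecutive values, the two with the largest magnitude are kept and the other two are zeroed, after which only the kept values and their positions need to be stored. This is an illustration under stated assumptions, not the library's implementation. The names `Compressed24`, `top2`, `prune_2_4`, `compress_2_4`, and `decompress_2_4` are hypothetical, and the layout shown is not the library's actual compressed format, which should be treated as opaque.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative container only; not the library's compressed representation.
struct Compressed24 {
    std::vector<float> values;        // two kept values per group of four
    std::vector<std::uint8_t> index;  // position (0..3) of each kept value in its group
};

// Positions (ascending) of the two largest-magnitude entries among g[0..3].
// Ties are resolved in favor of the lower position.
std::array<std::size_t, 2> top2(const float* g) {
    std::size_t first = 0;
    for (std::size_t i = 1; i < 4; ++i)
        if (std::fabs(g[i]) > std::fabs(g[first])) first = i;
    std::size_t second = (first == 0) ? 1 : 0;
    for (std::size_t i = 0; i < 4; ++i)
        if (i != first && std::fabs(g[i]) > std::fabs(g[second])) second = i;
    return {std::min(first, second), std::max(first, second)};
}

// Zero the two smallest-magnitude entries in every group of four.
void prune_2_4(std::vector<float>& row) {
    assert(row.size() % 4 == 0);
    for (std::size_t g = 0; g < row.size(); g += 4) {
        const auto keep = top2(&row[g]);
        for (std::size_t i = 0; i < 4; ++i)
            if (i != keep[0] && i != keep[1]) row[g + i] = 0.0f;
    }
}

// Store two values plus their positions for every group of an already pruned row.
Compressed24 compress_2_4(const std::vector<float>& row) {
    assert(row.size() % 4 == 0);
    Compressed24 out;
    for (std::size_t g = 0; g < row.size(); g += 4) {
        const auto keep = top2(&row[g]);
        for (std::size_t pos : keep) {
            out.values.push_back(row[g + pos]);
            out.index.push_back(static_cast<std::uint8_t>(pos));
        }
    }
    return out;
}

// Expand the compressed form back to a dense row with zeros in the pruned slots.
std::vector<float> decompress_2_4(const Compressed24& c) {
    std::vector<float> row(c.values.size() * 2, 0.0f);
    for (std::size_t s = 0; s < c.values.size(); ++s)
        row[(s / 2) * 4 + c.index[s]] = c.values[s];
    return row;
}
```

The compressed form holds half as many values as the dense row, plus a small position record per kept value (two bits would suffice; a byte is used here for readability).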
Supported SM Architectures:
Supported CPU Architectures:
Other platforms will be added in future releases.