cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:
$$D = \alpha \, op(A) \cdot op(B) + \beta \, op(C)$$
where op(A)/op(B) refers to in-place operations such as transpose/non-transpose.
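To make the semantics of the operation concrete, here is a minimal reference sketch in plain Python. It assumes the usual GEMM form D = alpha * op(A) * op(B) + beta * C, with op(C) taken as the identity for simplicity. The helper names (`transpose`, `op`, `reference_matmul`) are hypothetical and are not part of the cuSPARSELt API; the code only illustrates what result the library is expected to produce, not how it computes it.

```python
def transpose(m):
    """Return the transpose of a row-major matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def op(m, trans):
    """op(M): the matrix itself, or its transpose when trans is True."""
    return transpose(m) if trans else m


def reference_matmul(alpha, a, b, beta, c, trans_a=False, trans_b=False):
    """Compute D = alpha * op(A) * op(B) + beta * C on plain Python lists."""
    a, b = op(a, trans_a), op(b, trans_b)
    rows, inner, cols = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions of op(A) and op(B) must match"
    return [
        [alpha * sum(a[i][k] * b[k][j] for k in range(inner)) + beta * c[i][j]
         for j in range(cols)]
        for i in range(rows)
    ]


# A is the sparse operand: two zeros in every group of four values.
A = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 3.0, 0.0, 4.0]]
B = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0],
     [7.0, 8.0]]
C = [[1.0, 1.0],
     [1.0, 1.0]]

D = reference_matmul(2.0, A, B, 0.5, C)
print(D)  # [[22.5, 28.5], [74.5, 88.5]]
```

A dense reference like this can be useful as a correctness check against the output of a GPU implementation on small inputs.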
The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
Download: developer.nvidia.com/cusparselt/downloads
Key Features
NVIDIA Ampere architecture Sparse MMA tensor core support
Mixed-precision support:
FP16 inputs/output, FP32 Tensor Core accumulate
BFLOAT16 inputs/output, FP32 Tensor Core accumulate
INT8 inputs/output, INT32 Tensor Core compute
Memory Layouts: row-major, column-major
Matrix pruning and compression functionalities
Auto-tuning functionality (see cusparseLtMatmulSearch())
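The pruning and compression features listed above can be illustrated with a small CPU-side sketch. It assumes the 2:4 structured sparsity pattern used by Ampere sparse tensor cores (at most two nonzero values in every group of four) and a magnitude-based pruning rule. The function names and the values-plus-positions layout are hypothetical: the compressed format actually used by cuSPARSELt is internal to the library and is not reproduced here.

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude values in each group of four."""
    assert len(row) % 4 == 0, "row length must be a multiple of 4"
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes; ties keep the earlier index.
        keep = sorted(sorted(range(4), key=lambda i: -abs(group[i]))[:2])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def compress_2_4(pruned_row):
    """Pack a 2:4-pruned row into kept values plus their in-group positions."""
    values, positions = [], []
    for start in range(0, len(pruned_row), 4):
        group = pruned_row[start:start + 4]
        kept = [i for i in range(4) if group[i] != 0.0]
        assert len(kept) <= 2, "group violates the 2:4 pattern"
        # Pad with zero-valued positions so each group stores exactly two entries.
        spare = [i for i in range(4) if i not in kept]
        kept = sorted(kept + spare[:2 - len(kept)])
        values.extend(group[i] for i in kept)
        positions.extend(kept)
    return values, positions


def decompress_2_4(values, positions):
    """Rebuild the dense pruned row from the packed representation."""
    row = []
    for g in range(0, len(values), 2):
        group = [0.0] * 4
        for v, p in zip(values[g:g + 2], positions[g:g + 2]):
            group[p] = v
        row.extend(group)
    return row


row = [0.1, -0.9, 0.4, 0.05, 0.7, 0.0, -0.2, 0.3]
pruned = prune_2_4(row)
values, positions = compress_2_4(pruned)
print(pruned)     # [0.0, -0.9, 0.4, 0.0, 0.7, 0.0, 0.0, 0.3]
print(values)     # [-0.9, 0.4, 0.7, 0.3]
print(positions)  # [1, 2, 0, 3]
```

The packed form stores half as many values as the dense row plus a small position index per value, which is the general idea behind the storage savings of structured sparsity.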
Support
Supported SM Architectures:
SM 8.0
Supported OSes:
Linux
Supported CPU Architectures:
x86_64
Support for other platforms will be added in future releases.
Prerequisites
CUDA 11.0 toolkit and compatible driver (see CUDA Driver Release Notes).
Dependencies: cudart, cusparse.h header