cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:
$$D = \mathrm{Activation}(\alpha \, op(A) \cdot op(B) + \beta \, op(C) + \mathrm{bias})$$
where op(A)/op(B) refers to in-place operations such as transpose/non-transpose.
The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
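As a point of reference for the operation described above, the following is a minimal CPU sketch of D = alpha * op(A) * op(B) + beta * C. It is not cuSPARSELt code: the `Matrix`, `Op`, `op_at`, and `reference_matmul` names are hypothetical helpers introduced for illustration, the matrices are dense and row-major, C is taken untransposed, and any epilogue (activation, bias) is left out. A routine like this may be handy for checking GPU results on small inputs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper types for this sketch; they are not part of cuSPARSELt.
enum class Op { NonTranspose, Transpose };

struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;  // row-major storage, rows * cols elements

    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Element (r, c) of op(M). The transpose is applied by swapping the indices,
// so no transposed copy of M is ever materialized.
inline float op_at(const Matrix& m, Op op, std::size_t r, std::size_t c) {
    return op == Op::Transpose ? m.at(c, r) : m.at(r, c);
}

// Computes D = alpha * op(A) * op(B) + beta * C on the CPU.
// op(A) must be m x k, op(B) must be k x n, and C must be m x n.
Matrix reference_matmul(float alpha, const Matrix& A, Op opA,
                        const Matrix& B, Op opB,
                        float beta, const Matrix& C) {
    const std::size_t m = C.rows;
    const std::size_t n = C.cols;
    const std::size_t k = (opA == Op::Transpose) ? A.rows : A.cols;

    Matrix D{m, n, std::vector<float>(m * n, 0.0f)};
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += op_at(A, opA, i, p) * op_at(B, opB, p, j);
            }
            D.data[i * n + j] = alpha * acc + beta * C.at(i, j);
        }
    }
    return D;
}
```

The sketch accumulates in FP32 throughout; the mixed-precision modes of the library accumulate differently, so comparisons against GPU output should use a tolerance rather than exact equality.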
Download: developer.nvidia.com/cusparselt/downloads
Key Features
NVIDIA Sparse MMA tensor core support
Mixed-precision computation support:
  FP16 input/output, FP32 Tensor Core accumulate
  BFLOAT16 input/output, FP32 Tensor Core accumulate
  INT8 input/output, INT32 Tensor Core compute
  FP32 input/output, TF32 Tensor Core compute
  TF32 input/output, TF32 Tensor Core compute
Matrix pruning and compression functionalities
Auto-tuning functionality (see cusparseLtMatmulSearch())
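To make the pruning and compression feature more concrete, here is a small CPU sketch of the idea. It assumes the 2:4 structured-sparsity pattern used by Ampere sparse Tensor Cores for 16-bit types (at most two nonzeros in every group of four consecutive values), which the list above does not spell out. The function names, the magnitude-based pruning rule, and the values-plus-positions layout are all hypothetical choices for illustration; the library performs these steps on the GPU and its actual compressed layout may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Zeroes the two smallest-magnitude values in every group of four.
// Assumes values.size() is a multiple of 4; trailing elements are left alone.
void prune_2_of_4(std::vector<float>& values) {
    for (std::size_t g = 0; g + 4 <= values.size(); g += 4) {
        bool dropped[4] = {false, false, false, false};
        for (int pass = 0; pass < 2; ++pass) {
            int victim = -1;
            for (int i = 0; i < 4; ++i) {
                if (dropped[i]) continue;
                if (victim < 0 ||
                    std::fabs(values[g + i]) < std::fabs(values[g + victim])) {
                    victim = i;
                }
            }
            dropped[victim] = true;
            values[g + victim] = 0.0f;
        }
    }
}

// Hypothetical compressed form: two kept values per group plus the
// position (0..3) each value had inside its group.
struct Compressed2of4 {
    std::vector<float> values;
    std::vector<std::uint8_t> positions;
};

// Compresses an already pruned vector (at most two nonzeros per group).
// Groups with fewer than two nonzeros are padded with explicit zeros so
// that every group contributes exactly two slots.
Compressed2of4 compress_2_of_4(const std::vector<float>& pruned) {
    Compressed2of4 out;
    for (std::size_t g = 0; g + 4 <= pruned.size(); g += 4) {
        int kept = 0;
        for (int i = 0; i < 4 && kept < 2; ++i) {
            const int still_needed = 2 - kept;
            const int positions_left = 4 - i;
            const bool nonzero = pruned[g + i] != 0.0f;
            // Keep nonzeros; keep a zero only when skipping it would leave
            // too few positions to fill both slots.
            if (nonzero || still_needed >= positions_left) {
                out.values.push_back(pruned[g + i]);
                out.positions.push_back(static_cast<std::uint8_t>(i));
                ++kept;
            }
        }
    }
    return out;
}
```

With this layout the value storage is halved and each kept value needs only two bits of position metadata, which is the general reason a 2:4 pattern can be stored and multiplied more cheaply than a dense matrix.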
Support
Supported SM Architectures: SM 8.0, SM 8.6
Supported OSes: Linux, Windows
Supported CPU Architectures: x86_64, Arm64
Prerequisites
CUDA 11.2 toolkit (or above) and compatible driver (see CUDA Driver Release Notes).
Dependencies: cudart, nvrtc, cusparse.h header
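The toolkit requirement above can be checked programmatically. The sketch below assumes the usual CUDA runtime version encoding (major * 1000 + minor * 10, so CUDA 11.2 is 11020), which is the form of the `CUDART_VERSION` macro and of the value written by `cudaRuntimeGetVersion()`. The helper names are hypothetical, and the code deliberately avoids including any CUDA header so it compiles on a host without the toolkit.

```cpp
// Hypothetical helpers for this sketch; they are not part of CUDA or cuSPARSELt.
struct CudaVersion {
    int major_version;
    int minor_version;
};

// Decodes an integer in the CUDA runtime encoding
// (major * 1000 + minor * 10), e.g. 11020 -> 11.2.
constexpr CudaVersion decode_cuda_version(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// True when the encoded toolkit version is CUDA 11.2 or newer,
// the minimum listed in the prerequisites.
constexpr bool meets_minimum_toolkit(int encoded) {
    return encoded >= 11020;
}
```

In a real build the argument would typically come from `CUDART_VERSION` at compile time or from `cudaRuntimeGetVersion()` at run time; driver compatibility is a separate check that this sketch does not cover.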