cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication¶
NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

D = Activation(α·op(A)·op(B) + β·op(C) + bias)·scale

where op(X) refers to in-place operations such as transpose/non-transpose, and α, β are scalars.
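The operation above can be sketched numerically. The following is a minimal NumPy illustration of the semantics only, not the cuSPARSELt C API; ReLU stands in for the activation, op() is assumed to have been applied by the caller, and the per-row bias broadcast is an assumption for illustration:

```python
import numpy as np

def matmul_epilogue(A, B, C, alpha=1.0, beta=0.0, bias=None, scale=1.0,
                    activation=lambda x: np.maximum(x, 0.0)):
    """Sketch of D = Activation(alpha*op(A)@op(B) + beta*op(C) + bias)*scale.

    Illustrative only: op() (transpose/non-transpose) is assumed to be
    applied by the caller, and ReLU stands in for the activation."""
    acc = alpha * (A @ B) + beta * C
    if bias is not None:
        acc = acc + bias[:, None]  # assumed per-row bias vector, broadcast across columns
    return activation(acc) * scale

A = np.array([[1.0, -2.0],
              [0.0,  3.0]])
B = np.eye(2)
C = np.zeros((2, 2))
D = matmul_epilogue(A, B, C)  # ReLU zeroes the negative entry of A @ B
```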
The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
Download: developer.nvidia.com/cusparselt/downloads
Provide Feedback: Math-Libs-Feedback@nvidia.com
Examples: cuSPARSELt Example 1, cuSPARSELt Example 2
Blog post: Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt
Key Features¶
NVIDIA Sparse MMA tensor core support
Mixed-precision computation support:
FP16 input/output, FP32 Tensor Core accumulate
BFLOAT16 input/output, FP32 Tensor Core accumulate
INT8 input/output, INT32 Tensor Core compute
FP32 input/output, TF32 Tensor Core compute
TF32 input/output, TF32 Tensor Core compute
Matrix pruning and compression functionalities
Activation functions, bias vector, and output scaling
Batched computation (multiple matrices in a single run)
GEMM Split-K mode
Auto-tuning functionality (see cusparseLtMatmulSearch())
NVTX ranging and logging functionalities
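The pruning feature listed above targets the 2:4 structured-sparsity pattern of NVIDIA Sparse Tensor Cores: in every group of four consecutive elements along a row, at most two are nonzero. A minimal NumPy sketch of one plausible pruning heuristic (keep the two largest magnitudes per group); the library itself performs pruning and compression on the GPU via its own APIs:

```python
import numpy as np

def prune_2_4(M):
    """Zero out the two smallest-magnitude values in each group of four
    consecutive elements along each row (2:4 structured sparsity).

    Illustrative sketch only; not the library's pruning implementation."""
    out = M.copy()
    rows, cols = M.shape
    assert cols % 4 == 0, "row length must be a multiple of 4"
    groups = out.reshape(rows, cols // 4, 4)  # view into out
    # Indices of the two smallest magnitudes within each group of four
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
    np.put_along_axis(groups, drop, 0.0, axis=-1)
    return out

M = np.array([[4.0, 1.0, -3.0, 0.5, 2.0, -2.5, 0.1, 0.2]])
P = prune_2_4(M)  # each 4-wide group keeps its two largest-magnitude entries
```

After pruning, a compression step can store only the kept values plus their positions, roughly halving the memory traffic for the sparse operand.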
Support¶
Supported SM Architectures: SM 8.0, SM 8.6
Supported OSes: Linux, Windows
Supported CPU Architectures: x86_64, Arm64
Prerequisites¶
CUDA 11.2 toolkit (or later) and a compatible driver (see CUDA Driver Release Notes).
Dependencies: cudart, cusparse.h header