cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

D = alpha * op(A) * op(B) + beta * C

where op(A)/op(B) refers to in-place operations such as transpose/non-transpose, and alpha, beta are scalars.
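As a purely illustrative reference, the dense form of this operation can be written in a few lines of NumPy. This is not the cuSPARSELt API (which operates on GPU buffers with a structured-sparse operand); it only shows the math the library accelerates.

```python
import numpy as np

# Reference (dense) form of the operation cuSPARSELt accelerates:
#   D = alpha * op(A) @ op(B) + beta * C
# where op() is transpose or non-transpose. In cuSPARSELt, A or B
# would be a 2:4 structured-sparse matrix on the GPU; here everything
# is dense NumPy, purely to illustrate the semantics.

def gemm_reference(A, B, C, alpha=1.0, beta=0.0,
                   trans_a=False, trans_b=False):
    opA = A.T if trans_a else A
    opB = B.T if trans_b else B
    return alpha * (opA @ opB) + beta * C

A = np.arange(6, dtype=np.float32).reshape(2, 3)
B = np.ones((3, 2), dtype=np.float32)
C = np.eye(2, dtype=np.float32)
D = gemm_reference(A, B, C, alpha=2.0, beta=1.0)
```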
The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
Download: developer.nvidia.com/cusparselt-downloads
Provide Feedback: Math-Libs-Feedback@nvidia.com
Examples: cuSPARSELt Example 1, cuSPARSELt Example 2
Blog post: Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt
Key Features
NVIDIA Sparse MMA tensor core support
Mixed-precision computation support:
FP16 input/output, FP32 Tensor Core accumulate
BFLOAT16 input/output, FP32 Tensor Core accumulate
INT8 input/output, INT32 Tensor Core compute
INT8 input, FP16 output, INT32 Tensor Core compute
FP32 input/output, TF32 Tensor Core compute
TF32 input/output, TF32 Tensor Core compute
Matrix pruning and compression functionalities
Activation functions, bias vector, and output scaling
Batched computation (multiple matrices in a single run)
GEMM Split-K mode
Auto-tuning functionality (see cusparseLtMatmulSearch())
NVTX ranging and logging functionalities
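The pruning feature above targets the 2:4 structured-sparsity pattern supported by NVIDIA Sparse MMA tensor cores: in every group of 4 consecutive values along a row, at most 2 are nonzero. A minimal NumPy sketch of such a pruning pass is shown below; the magnitude-based keep-the-2-largest policy is an assumption for illustration, not necessarily the exact algorithm cuSPARSELt's pruning step uses.

```python
import numpy as np

# Sketch of 2:4 structured-sparsity pruning: in every group of 4
# consecutive values along a row, zero out the 2 smallest-magnitude
# entries so at most 2 nonzeros remain per group. Illustrative only;
# cuSPARSELt performs pruning and compression on the GPU.

def prune_2_4(mat):
    out = mat.astype(np.float32).copy()
    rows, cols = out.shape
    assert cols % 4 == 0, "row length must be a multiple of 4"
    for r in range(rows):
        for c in range(0, cols, 4):
            group = out[r, c:c + 4]
            drop = np.argsort(np.abs(group))[:2]  # 2 smallest magnitudes
            group[drop] = 0.0
    return out

M = np.array([[1.0, -3.0, 0.5, 2.0, 4.0, 0.1, -0.2, 5.0]])
P = prune_2_4(M)  # each group of 4 now holds exactly 2 nonzeros
```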
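The Split-K mode listed above partitions the reduction dimension K of the GEMM so that independent slices can be computed in parallel and then summed. A NumPy sketch of the idea, under the assumption of a simple even split of K:

```python
import numpy as np

# Sketch of the Split-K idea: the reduction dimension K of A @ B is
# split into slices, each slice's partial product is computed
# independently (on the GPU, by separate thread blocks), and the
# partials are summed in a final reduction. Illustrative only.

def splitk_matmul(A, B, splits=2):
    K = A.shape[1]
    bounds = np.linspace(0, K, splits + 1, dtype=int)
    # each slice is an independent partial GEMM over a chunk of K
    partials = [A[:, lo:hi] @ B[lo:hi, :]
                for lo, hi in zip(bounds[:-1], bounds[1:])]
    return sum(partials)  # reduce the partial results

A = np.arange(8, dtype=np.float32).reshape(2, 4)
B = np.arange(12, dtype=np.float32).reshape(4, 3)
result = splitk_matmul(A, B, splits=2)
```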
Support

Supported SM architectures: SM 8.0, SM 8.6, SM 8.9
Supported OSes: Linux, Windows
Supported CPU architectures: x86_64, Arm64