cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

D = α · op(A) · op(B) + β · op(C)

where op(A) and op(B) refer to in-place operations such as transpose or non-transpose.
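
To make the formula concrete, the following is a minimal dense CPU reference written in plain C++. It is a sketch, not the library's implementation: it does not call cuSPARSELt, it ignores sparsity, and it makes the simplifying assumptions that every matrix is stored row-major and that C is used without transposition. The names Op, element_of_op, and reference_matmul are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: none of these names belong to the cuSPARSELt API.
enum class Op { NonTranspose, Transpose };

// Returns element (i, j) of op(M), where M is stored row-major
// with `stored_cols` columns.
inline float element_of_op(const std::vector<float>& M, std::size_t stored_cols,
                           Op op, std::size_t i, std::size_t j) {
    return op == Op::NonTranspose ? M[i * stored_cols + j]
                                  : M[j * stored_cols + i];
}

// Computes D = alpha * op(A) * op(B) + beta * C on the CPU.
// op(A) is m x k, op(B) is k x n, C and D are m x n.
// Assumption: C is taken as-is (no transpose) to keep the sketch short.
std::vector<float> reference_matmul(float alpha,
                                    const std::vector<float>& A, Op opA,
                                    const std::vector<float>& B, Op opB,
                                    float beta,
                                    const std::vector<float>& C,
                                    std::size_t m, std::size_t n, std::size_t k) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);

    // The stored column count depends on whether the operand is transposed.
    const std::size_t a_cols = (opA == Op::NonTranspose) ? k : m;
    const std::size_t b_cols = (opB == Op::NonTranspose) ? n : k;

    std::vector<float> D(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;  // accumulator for the inner product
            for (std::size_t p = 0; p < k; ++p) {
                acc += element_of_op(A, a_cols, opA, i, p) *
                       element_of_op(B, b_cols, opB, p, j);
            }
            D[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
    return D;
}
```

A reference of this kind is mainly useful for checking the output of a GPU routine on small inputs; it says nothing about how the library schedules the computation on Tensor Cores.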

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.


Key Features

  • NVIDIA Ampere architecture Sparse MMA tensor core support

  • Mixed-precision support:

    • FP16 inputs/output, FP32 Tensor Core accumulate

    • BFLOAT16 inputs/output, FP32 Tensor Core accumulate

    • INT8 inputs/output, INT32 Tensor Core accumulate

  • Memory Layouts: row-major, column-major

  • Matrix pruning and compression functionalities

  • Auto-tuning functionality (see cusparseLtMatmulSearch())


  • Supported SM Architectures: SM 8.0

  • Supported OSes: Linux

  • Supported CPU Architectures: x86_64

Other platforms will be added in future releases.
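
The pruning feature listed under Key Features can be illustrated with a small host-side sketch. It assumes the 2:4 structured-sparsity pattern used by NVIDIA Ampere sparse Tensor Cores, in which at most two of every four consecutive values are nonzero, and it applies a simple magnitude-based rule: in each group of four values along a row, keep the two with the largest absolute value and zero the rest. This is not the library's pruning routine, and prune_2_of_4 is a hypothetical helper written only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not a cuSPARSELt function.
// Returns a copy of the row-major matrix M (rows x cols) in which every
// group of four consecutive values in a row keeps only its two
// largest-magnitude entries. Ties are resolved in favor of the lower index.
std::vector<float> prune_2_of_4(const std::vector<float>& M,
                                std::size_t rows, std::size_t cols) {
    assert(M.size() == rows * cols);
    assert(cols % 4 == 0);  // assumption: rows split evenly into groups of 4

    std::vector<float> out(M);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; c += 4) {
            float* g = &out[r * cols + c];

            // Index of the largest-magnitude value in the group.
            std::size_t first = 0;
            for (std::size_t q = 1; q < 4; ++q) {
                if (std::fabs(g[q]) > std::fabs(g[first])) first = q;
            }

            // Index of the second-largest magnitude (must differ from first).
            std::size_t second = (first == 0) ? 1 : 0;
            for (std::size_t q = 0; q < 4; ++q) {
                if (q != first && std::fabs(g[q]) > std::fabs(g[second])) second = q;
            }

            // Zero the two values that were not selected.
            for (std::size_t q = 0; q < 4; ++q) {
                if (q != first && q != second) g[q] = 0.0f;
            }
        }
    }
    return out;
}
```

For instance, under this rule the row {1, -5, 3, 2} becomes {0, -5, 3, 0}. A pruned matrix in this form can then be stored compactly, since only half of the values and their positions within each group need to be kept.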