###################################################################################
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
###################################################################################

**NVIDIA cuSPARSELt** is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

    D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias) \cdot scale

where :math:`op(A)/op(B)` refers to in-place operations such as transpose/non-transpose, and :math:`\alpha, \beta, scale` are scalars.

The *cuSPARSELt APIs* allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

**Download:** `developer.nvidia.com/cusparselt/downloads`_

**Provide Feedback:** `Math-Libs-Feedback@nvidia.com`_

**Examples**:
`cuSPARSELt Example 1`_,
`cuSPARSELt Example 2`_

**Blog post**:

- `Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt`_
- `Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines`__
- `Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture`__

================================================================================
Key Features
================================================================================

* *NVIDIA Sparse MMA tensor core* support
* Mixed-precision computation support:

    +--------------+----------------+-----------------+-------------+
    | Input A/B    | Input C        | Output D        | Compute     |
    +==============+================+=================+=============+
    | `FP32`       | `FP32`         | `FP32`          | `FP32`      |
    +--------------+----------------+-----------------+-------------+
    | `FP16`       | `FP16`         | `FP16`          | `FP32`      |
    +              +                +                 +-------------+
    |              |                |                 | `FP16`      |
    +--------------+----------------+-----------------+-------------+
    | `BF16`       | `BF16`         | `BF16`          | `FP32`      |
    +--------------+----------------+-----------------+-------------+
    | `INT8`       | `INT8`         | `INT8`          | `INT32`     |
    +              +----------------+-----------------+             +
    |              | `INT32`        | `INT32`         |             |
    +              +----------------+-----------------+             +
    |              | `FP16`         | `FP16`          |             |
    +              +----------------+-----------------+             +
    |              | `BF16`         | `BF16`          |             |
    +--------------+----------------+-----------------+-------------+
    | `E4M3`       | `FP16`         | `E4M3`          | `FP32`      |
    +              +----------------+-----------------+             +
    |              | `BF16`         | `E4M3`          |             |
    +              +----------------+-----------------+             +
    |              | `FP16`         | `FP16`          |             |
    +              +----------------+-----------------+             +
    |              | `BF16`         | `BF16`          |             |
    +              +----------------+-----------------+             +
    |              | `FP32`         | `FP32`          |             |
    +--------------+----------------+-----------------+-------------+
    | `E5M2`       | `FP16`         | `E5M2`          | `FP32`      |
    +              +----------------+-----------------+             +
    |              | `BF16`         | `E5M2`          |             |
    +              +----------------+-----------------+             +
    |              | `FP16`         | `FP16`          |             |
    +              +----------------+-----------------+             +
    |              | `BF16`         | `BF16`          |             |
    +              +----------------+-----------------+             +
    |              | `FP32`         | `FP32`          |             |
    +--------------+----------------+-----------------+-------------+

* Matrix pruning and compression functionalities
* Activation functions, bias vector, and output scaling
* Batched computation (multiple matrices in a single run)
* GEMM Split-K mode
* Auto-tuning functionality (see :ref:`cusparseLtMatmulSearch()`)
* NVTX ranging and Logging functionalities

================================================================================
Support
================================================================================

* *Supported SM Architectures*: `SM 8.0`, `SM 8.6`, `SM 8.9`, `SM 9.0`
* *Supported CPU architectures and operating systems*:

    +------------+--------------------+
    | OS         | CPU archs          |
    +============+====================+
    | `Windows`  | `x86_64`           |
    +------------+--------------------+
    | `Linux`    | `x86_64`, `Arm64`  |
    +------------+--------------------+

================================================================================
Index
================================================================================

.. toctree::
   :maxdepth: 2

   release_notes
   getting_started
   types
   functions
   logging
   license
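
As a reading aid for the operation defined at the top of this page, the following is a minimal host-side sketch in plain Python of what the formula computes. It does not use or represent the cuSPARSELt API. The names ``reference_matmul``, ``transpose``, ``matmul``, and ``relu`` are hypothetical helpers introduced only for illustration. The sketch also assumes that the bias vector holds one entry per output row, that :math:`op(C)` is the identity, and that ReLU stands in for the activation function.

.. code-block:: python

    def transpose(m):
        """Return the transpose of a row-major list-of-lists matrix."""
        return [list(row) for row in zip(*m)]

    def matmul(a, b):
        """Dense matrix product of two list-of-lists matrices."""
        cols = transpose(b)
        return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

    def relu(x):
        """Example activation (assumption: any elementwise function would do)."""
        return x if x > 0 else 0.0

    def reference_matmul(A, B, C, alpha=1.0, beta=0.0, bias=None, scale=1.0,
                         trans_a=False, trans_b=False, activation=None):
        """Compute D = Activation(alpha*op(A)*op(B) + beta*C + bias) * scale."""
        op_a = transpose(A) if trans_a else A
        op_b = transpose(B) if trans_b else B
        prod = matmul(op_a, op_b)
        D = []
        for i, row in enumerate(prod):
            out_row = []
            for j, p in enumerate(row):
                v = alpha * p + beta * C[i][j]
                if bias is not None:
                    v += bias[i]          # assumption: one bias entry per output row
                if activation is not None:
                    v = activation(v)
                out_row.append(v * scale)  # output scaling is applied last
            D.append(out_row)
        return D

    # A contains zero entries; a sparse library would skip them, this sketch does not.
    A = [[1.0, 0.0, 2.0, 0.0],
         [0.0, 3.0, 0.0, -1.0]]
    B = [[1.0, 2.0],
         [0.0, 1.0],
         [-1.0, 0.0],
         [2.0, 2.0]]
    C = [[1.0, 1.0],
         [1.0, 1.0]]

    D = reference_matmul(A, B, C, alpha=2.0, beta=1.0, bias=[0.5, -0.5],
                         scale=0.5, activation=relu)
    print(D)  # [[0.0, 2.75], [0.0, 1.25]]

The order of operations follows the formula: the scaled product and the scaled ``C`` are summed, the bias is added, the activation is applied, and the result is multiplied by ``scale``. A dense sketch like this is only useful for checking small results by hand.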