************************************************************************************
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
************************************************************************************

**NVIDIA cuSPARSELt** is a high-performance CUDA library dedicated to general
matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

    D = \alpha op(A) * op(B) + \beta op(C)

where :math:`op(A)/op(B)` refers to in-place operations such as
transpose/non-transpose.

The *cuSPARSELt APIs* allow flexibility in the algorithm/operation selection,
epilogue, and matrix characteristics, including memory layout, alignment, and
data types.

**Download:** `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

============
Key Features
============

* *NVIDIA Sparse MMA tensor core* support
* Mixed-precision support:

  * `FP16` inputs/output, `FP32` Tensor Core accumulate
  * `BFLOAT16` inputs/output, `FP32` Tensor Core accumulate
  * `INT8` inputs/output, `INT32` Tensor Core compute

* Memory layouts: row-major, column-major
* Matrix pruning and compression functionalities
* Auto-tuning functionality (see :ref:`cusparseLtMatmulSearch()`)

=======
Support
=======

* *Supported SM Architectures*: `SM 8.0`
* *Supported OSes*: `Linux`
* *Supported CPU Architectures*: `x86_64`

*Other platforms will be added in future releases.*

=============
Prerequisites
=============

* `CUDA 11.0 toolkit <https://developer.nvidia.com/cuda-downloads>`_ and compatible
  driver (see `CUDA Driver Release Notes <https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html>`_).
* *Dependencies*: `cudart`, `cusparse.h` header

=====
Index
=====

.. toctree::
   :maxdepth: 2

   getting_started
   types
   functions
   license
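
=======
Example
=======

As a point of reference for the operation defined above, the following is a
minimal, CPU-only sketch (not the cuSPARSELt API) of what
:math:`D = \alpha op(A) * op(B) + \beta op(C)` computes for dense, row-major
`FP32` matrices. The `ref_matmul` and `op` helpers are hypothetical names
introduced here for illustration only, and :math:`op(C)` is taken as
non-transpose for simplicity.

.. code-block:: cpp

    // Plain CPU reference for D = alpha * op(A) * op(B) + beta * C
    // (all matrices dense and row-major; cuSPARSELt itself operates on a
    // structured-sparse operand on the GPU).
    #include <cstdio>
    #include <vector>

    // Element (i, j) of op(X), where op(X) has shape rows x cols.
    // If `transpose` is true, X is stored as a cols x rows row-major matrix.
    static float op(const std::vector<float>& X, int rows, int cols,
                    int i, int j, bool transpose) {
        return transpose ? X[j * rows + i] : X[i * cols + j];
    }

    // D (m x n) = alpha * op(A) (m x k) * op(B) (k x n) + beta * C (m x n)
    void ref_matmul(const std::vector<float>& A, bool transA,
                    const std::vector<float>& B, bool transB,
                    const std::vector<float>& C, std::vector<float>& D,
                    int m, int n, int k, float alpha, float beta) {
        for (int i = 0; i < m; ++i)
            for (int j = 0; j < n; ++j) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p)
                    acc += op(A, m, k, i, p, transA) * op(B, k, n, p, j, transB);
                D[i * n + j] = alpha * acc + beta * C[i * n + j];
            }
    }

    int main() {
        // 2x2 example: D = 1.0 * A * B + 0.0 * C  ->  [[19, 22], [43, 50]]
        std::vector<float> A = {1, 2, 3, 4};
        std::vector<float> B = {5, 6, 7, 8};
        std::vector<float> C(4, 0.0f), D(4, 0.0f);
        ref_matmul(A, false, B, false, C, D, 2, 2, 2, 1.0f, 0.0f);
        std::printf("%g %g\n%g %g\n", D[0], D[1], D[2], D[3]);
        return 0;
    }

When using cuSPARSELt itself, the equivalent operation is configured and
executed through the library's descriptor- and plan-based APIs documented in
the pages listed in the Index above.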