###################################################################################
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
###################################################################################

**NVIDIA cuSPARSELt** is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

   D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias) \cdot scale

where :math:`op(A)/op(B)` refers to in-place operations such as transpose/non-transpose, and :math:`\alpha, \beta, scale` are scalars.
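
As a rough illustration of the formula above, the following CPU-only sketch evaluates it for small dense matrices. It does not call cuSPARSELt and does not reflect how the library computes the result. The ``Matrix`` type and the ``op_at`` and ``reference_matmul`` helpers are hypothetical names introduced here. The sketch also assumes a ReLU activation, a non-transposed :math:`C`, and a bias vector with one entry per output row.

.. code-block:: cpp

   #include <algorithm>
   #include <cstddef>
   #include <vector>

   // Hypothetical row-major dense matrix type, used only for this illustration.
   struct Matrix {
       std::size_t rows;
       std::size_t cols;
       std::vector<float> data;

       Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}

       float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
       float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
   };

   // Element (i, j) of op(M): M itself, or its transpose when `transpose` is set.
   inline float op_at(const Matrix& m, bool transpose, std::size_t i, std::size_t j) {
       return transpose ? m.at(j, i) : m.at(i, j);
   }

   // D = ReLU(alpha * op(A) * op(B) + beta * C + bias) * scale
   // Assumptions: ReLU as the activation, C used without transposition,
   // and bias[i] added to every element of output row i.
   Matrix reference_matmul(float alpha, const Matrix& A, bool transA,
                           const Matrix& B, bool transB,
                           float beta, const Matrix& C,
                           const std::vector<float>& bias, float scale) {
       const std::size_t m = transA ? A.cols : A.rows;  // rows of op(A)
       const std::size_t k = transA ? A.rows : A.cols;  // shared dimension
       const std::size_t n = transB ? B.rows : B.cols;  // columns of op(B)
       Matrix D(m, n);
       for (std::size_t i = 0; i < m; ++i) {
           for (std::size_t j = 0; j < n; ++j) {
               float acc = 0.0f;
               for (std::size_t p = 0; p < k; ++p) {
                   acc += op_at(A, transA, i, p) * op_at(B, transB, p, j);
               }
               const float pre = alpha * acc + beta * C.at(i, j) + bias[i];
               D.at(i, j) = std::max(pre, 0.0f) * scale;  // activation, then scaling
           }
       }
       return D;
   }

Note that the scaling is applied after the activation, matching the position of :math:`scale` outside the parentheses in the formula.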

The *cuSPARSELt APIs* allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

**Download:** `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

**Provide Feedback:** `Math-Libs-Feedback@nvidia.com <mailto:Math-Libs-Feedback@nvidia.com?subject=cuSPARSELt-Feedback>`_

**Examples**:
`cuSPARSELt Example 1 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/spmma>`_,
`cuSPARSELt Example 2 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/spmma2>`_

**Blog post**:
`Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt <https://developer.nvidia.com/blog/exploiting-ampere-structured-sparsity-with-cusparselt/>`_

================================================================================
Key Features
================================================================================

* *NVIDIA Sparse MMA tensor core* support
* Mixed-precision computation support:

  * `FP16` input/output, `FP32` Tensor Core accumulate
  * `BFLOAT16` input/output, `FP32` Tensor Core accumulate
  * `INT8` input/output, `INT32` Tensor Core compute
  * `INT8` input, `FP16` output, `INT32` Tensor Core compute
  * `FP32` input/output, `TF32` Tensor Core compute
  * `TF32` input/output, `TF32` Tensor Core compute

* Matrix pruning and compression functionalities
* Activation functions, bias vector, and output scaling
* Batched computation (multiple matrices in a single run)
* GEMM Split-K mode
* Auto-tuning functionality (see :ref:`cusparseLtMatmulSearch() <cusparseLtMatmulSearch-label>`)
* NVTX ranging and logging functionalities
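
To give a feel for what matrix pruning means in this context, the sketch below prunes a row of values on the CPU. It assumes the 2:4 structured-sparsity pattern (at most two nonzero values in every group of four consecutive values) associated with Ampere Sparse Tensor Cores, which this page does not state explicitly. The function names ``prune_2_4`` and ``is_2_4_sparse`` are hypothetical, and keeping the two largest magnitudes is a simple selection rule chosen for illustration. The library's own pruning routines may select values differently.

.. code-block:: cpp

   #include <algorithm>
   #include <cmath>
   #include <cstddef>
   #include <vector>

   // In every complete group of four consecutive values, keep the two entries
   // with the largest magnitude and set the other two to zero.
   // Trailing values that do not fill a group of four are left untouched.
   void prune_2_4(std::vector<float>& values) {
       for (std::size_t g = 0; g + 4 <= values.size(); g += 4) {
           std::size_t idx[4] = {0, 1, 2, 3};
           // Order the four offsets by descending absolute value.
           std::sort(idx, idx + 4, [&](std::size_t a, std::size_t b) {
               return std::fabs(values[g + a]) > std::fabs(values[g + b]);
           });
           values[g + idx[2]] = 0.0f;
           values[g + idx[3]] = 0.0f;
       }
   }

   // Check that every complete group of four holds at most two nonzero values.
   bool is_2_4_sparse(const std::vector<float>& values) {
       for (std::size_t g = 0; g + 4 <= values.size(); g += 4) {
           int nonzeros = 0;
           for (std::size_t t = 0; t < 4; ++t) {
               if (values[g + t] != 0.0f) {
                   ++nonzeros;
               }
           }
           if (nonzeros > 2) {
               return false;
           }
       }
       return true;
   }

A pruned matrix of this shape can be stored in compressed form, because only half of the values in each group, plus their positions, need to be kept.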

================================================================================
Support
================================================================================

* *Supported SM Architectures*: `SM 8.0`, `SM 8.6`, `SM 8.9`
* *Supported OSes*: `Linux`, `Windows`
* *Supported CPU Architectures*: `x86_64`, `Arm64`

================================================================================
Index
================================================================================

.. toctree::
   :maxdepth: 2

   release_notes
   getting_started
   types
   functions
   logging
   license