************************************************************************************
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
************************************************************************************

**NVIDIA cuSPARSELt** is a high-performance CUDA library dedicated to general
matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

    D = \alpha op(A) * op(B) + \beta op(C)

where :math:`op(A)/op(B)` refers to in-place operations such as
transpose/non-transpose.

The *cuSPARSELt APIs* allow flexibility in the algorithm/operation selection,
epilogue, and matrix characteristics, including memory layout, alignment, and
data types.

**Download:** `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

============
Key Features
============

* *NVIDIA Sparse MMA tensor core* support
* Mixed-precision support:

  * `FP16` inputs/output, `FP32` Tensor Core accumulate
  * `BFLOAT16` inputs/output, `FP32` Tensor Core accumulate
  * `INT8` inputs/output, `INT32` Tensor Core compute

* Memory layouts: row-major, column-major
* Matrix pruning and compression functionalities
* Auto-tuning functionality (see :ref:`cusparseLtMatmulSearch()`)

=======
Support
=======

* *Supported SM Architectures*: `SM 8.0`
* *Supported OSes*: `Linux`
* *Supported CPU Architectures*: `x86_64`

*Other platforms will be added in future releases.*

=============
Prerequisites
=============

* `CUDA 11.0 toolkit <https://developer.nvidia.com/cuda-downloads>`_ and compatible
  driver (see `CUDA Driver Release Notes <https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html>`_).
* *Dependencies*: `cudart`, `cusparse.h` header

=====
Index
=====

.. toctree::
   :maxdepth: 2

   getting_started
   types
   functions
   license
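
=======
Example
=======

As a point of reference for the operation defined above, the following is a
minimal, CPU-only sketch (not the cuSPARSELt API) of what
:math:`D = \alpha op(A) * op(B) + \beta op(C)` computes for dense, row-major
`FP32` matrices. The `ref_matmul` and `op` helpers are hypothetical names
introduced here for illustration only, and :math:`op(C)` is taken as
non-transpose for simplicity.

.. code-block:: cpp

    // Plain CPU reference for D = alpha * op(A) * op(B) + beta * C
    // (all matrices dense and row-major; cuSPARSELt itself operates on a
    // structured-sparse operand on the GPU).
    #include <cstdio>
    #include <vector>

    // Element (i, j) of op(X), where op(X) has shape rows x cols.
    // If `transpose` is true, X is stored as a cols x rows row-major matrix.
    static float op(const std::vector<float>& X, int rows, int cols,
                    int i, int j, bool transpose) {
        return transpose ? X[j * rows + i] : X[i * cols + j];
    }

    // D (m x n) = alpha * op(A) (m x k) * op(B) (k x n) + beta * C (m x n)
    void ref_matmul(const std::vector<float>& A, bool transA,
                    const std::vector<float>& B, bool transB,
                    const std::vector<float>& C, std::vector<float>& D,
                    int m, int n, int k, float alpha, float beta) {
        for (int i = 0; i < m; ++i)
            for (int j = 0; j < n; ++j) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p)
                    acc += op(A, m, k, i, p, transA) * op(B, k, n, p, j, transB);
                D[i * n + j] = alpha * acc + beta * C[i * n + j];
            }
    }

    int main() {
        // 2x2 example: D = 1.0 * A * B + 0.0 * C  ->  [[19, 22], [43, 50]]
        std::vector<float> A = {1, 2, 3, 4};
        std::vector<float> B = {5, 6, 7, 8};
        std::vector<float> C(4, 0.0f), D(4, 0.0f);
        ref_matmul(A, false, B, false, C, D, 2, 2, 2, 1.0f, 0.0f);
        std::printf("%g %g\n%g %g\n", D[0], D[1], D[2], D[3]);
        return 0;
    }

When using cuSPARSELt itself, the equivalent operation is configured and
executed through the library's descriptor- and plan-based APIs documented in
the pages listed in the Index above.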