################################################################################
cuSPARSELt Functions
################################################################################


.. role::  raw-html(raw)
    :format: html

.. |handle| replace:: ``handle``
.. _handle: types.html#cusparselthandle-t

.. |order| replace:: ``order``
.. _order: https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-generic-enum-layout

.. |valueType| replace:: ``valueType``
.. _valueType: https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-generic-enum-data-types

.. |sparsity| replace:: ``sparsity``
.. _sparsity: types.html#cusparseltsparsity-t

.. |computeType| replace:: ``computeType``
.. _computeType: types.html#cusparsecomputetype

.. |matDescr| replace:: ``matDescr``
.. _matDescr: types.html#cusparseltmatdescriptor-t

.. |sparseMatDescr| replace:: ``sparseMatDescr``
.. _sparseMatDescr: types.html#cusparseltmatdescriptor-t

.. |matmulDescr| replace:: ``matmulDescr``
.. _matMulDescr: types.html#cusparseltmatmuldescriptor-t

.. |matAttribute| replace:: ``matAttribute``
.. _matAttribute: types.html#cusparseltmatdescattribute-t

.. |matmulAttribute| replace:: ``matmulAttribute``
.. _matmulAttribute: types.html#cusparseltmatmuldescattribute-t

.. |opA| replace:: ``opA``
.. _opA: https://docs.nvidia.com/cuda/cusparse/index.html#cusparseOperation_t

.. |opB| replace:: ``opB``
.. _opB: https://docs.nvidia.com/cuda/cusparse/index.html#cusparseOperation_t

.. |matA| replace:: ``matA``
.. _matA: types.html#cusparseltmatdescriptor-t

.. |matB| replace:: ``matB``
.. _matB: types.html#cusparseltmatdescriptor-t

.. |matC| replace:: ``matC``
.. _matC: types.html#cusparseltmatdescriptor-t

.. |matD| replace:: ``matD``
.. _matD: types.html#cusparseltmatdescriptor-t

.. |algSelection| replace:: ``algSelection``
.. _algSelection: types.html#cusparseltmatmulalgselection-t

.. |alg| replace:: ``alg``
.. _alg: types.html#cusparseltmatmulalg-t

.. |attribute| replace:: ``attribute``
.. _attribute: types.html#cusparseltmatmulalgattribute-t

.. |plan| replace:: ``plan``
.. _plan: types.html#cusparseltmatmulplan-t

.. |pruneAlg| replace:: ``pruneAlg``
.. _pruneAlg: types.html#cusparseltprunealg-t

.. |CUSPARSE_COMPUTE_32I| replace:: ``CUSPARSE_COMPUTE_32I``
.. _CUSPARSE_COMPUTE_32I: types.html#cusparsecomputetype

.. |CUSPARSE_COMPUTE_32F| replace:: ``CUSPARSE_COMPUTE_32F``
.. _CUSPARSE_COMPUTE_32F: types.html#cusparsecomputetype

.. |CUSPARSE_COMPUTE_16F| replace:: ``CUSPARSE_COMPUTE_16F``
.. _CUSPARSE_COMPUTE_16F: types.html#cusparsecomputetype

.. |CUSPARSELT_MATMUL_SEARCH_ITERATIONS| replace:: ``CUSPARSELT_MATMUL_SEARCH_ITERATIONS``
.. _CUSPARSELT_MATMUL_SEARCH_ITERATIONS: types.html#cusparseltmatmulalgattribute-t

.. |CUSPARSELT_MATMUL_ALG_CONFIG_ID| replace:: ``CUSPARSELT_MATMUL_ALG_CONFIG_ID``
.. _CUSPARSELT_MATMUL_ALG_CONFIG_ID: types.html#cusparseltmatmulalgattribute-t


================================================================================
Library Management Functions
================================================================================

.. _cusparseLtInit-label:

--------------------------------------------------------------------------------
:code:`cusparseLtInit`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtInit(cusparseLtHandle_t* handle)

| The function initializes the cuSPARSELt library handle (``cusparseLtHandle_t``), which holds the cuSPARSELt library context. It allocates lightweight hardware resources on the host and must be called before any other cuSPARSELt library call. Calling any cuSPARSELt function that takes a ``cusparseLtHandle_t`` without first calling ``cusparseLtInit()`` returns an error.
| The cuSPARSELt library context is tied to the current CUDA device. To use the library on multiple devices, create one cuSPARSELt handle for each device.

+-----------+--------+--------+------------------------------------------------+
| Parameter | Memory | In/Out | Description                                    |
+===========+========+========+================================================+
| |handle|_ | Host   | OUT    | cuSPARSELt library handle                      |
+-----------+--------+--------+------------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.

----

.. _cusparseLtDestroy-label:

--------------------------------------------------------------------------------
:code:`cusparseLtDestroy`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtDestroy(const cusparseLtHandle_t* handle)

| The function releases the hardware resources used by the cuSPARSELt library. It must be the last cuSPARSELt call made with a given handle.
| Calling any cuSPARSELt function that takes a ``cusparseLtHandle_t`` after ``cusparseLtDestroy()`` returns an error.

+-----------+--------+--------+------------------------------------------------+
| Parameter | Memory | In/Out | Description                                    |
+===========+========+========+================================================+
| |handle|_ | Host   | IN     | cuSPARSELt library handle                      |
+-----------+--------+--------+------------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.
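
Because ``cusparseLtInit()`` must precede every other call on a handle and ``cusparseLtDestroy()`` must be the last one, C++ code can tie the two calls to the lifetime of an object. The following is a minimal sketch of that pattern, not part of cuSPARSELt: ``HandleGuard`` is a hypothetical class, and to keep it compilable without ``cusparseLt.h`` the handle type and the init/destroy functions are template parameters.

.. code-block:: cpp

   #include <cassert>
   #include <stdexcept>
   #include <utility>

   // Hypothetical RAII guard: init runs once in the constructor, destroy runs
   // exactly once in the destructor. Handle, InitFn, and DestroyFn are template
   // parameters so this sketch does not depend on cusparseLt.h.
   template <typename Handle, typename InitFn, typename DestroyFn>
   class HandleGuard {
   public:
       HandleGuard(InitFn init, DestroyFn destroy) : destroy_(std::move(destroy)) {
           // init returns true on success; on failure nothing is destroyed.
           if (!init(&handle_)) {
               throw std::runtime_error("handle initialization failed");
           }
       }

       // Last call made with this handle.
       ~HandleGuard() { destroy_(&handle_); }

       // A copy would destroy the same handle twice.
       HandleGuard(const HandleGuard&) = delete;
       HandleGuard& operator=(const HandleGuard&) = delete;

       Handle* get() { return &handle_; }

   private:
       Handle handle_{};
       DestroyFn destroy_;
   };

With the library available, ``Handle`` would be ``cusparseLtHandle_t``, the init callable would wrap ``cusparseLtInit`` and compare its status with ``CUSPARSE_STATUS_SUCCESS``, and the destroy callable would wrap ``cusparseLtDestroy``.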

----

--------------------------------------------------------------------------------
:code:`cusparseLtGetVersion`
--------------------------------------------------------------------------------

.. code-block:: cpp

    cusparseStatus_t
    cusparseLtGetVersion(const cusparseLtHandle_t* handle,
                         int*                      version)

| This function returns the version number of the cuSPARSELt library.

+-----------+--------+--------+------------------------------------------------+
| Parameter | Memory | In/Out | Description                                    |
+===========+========+========+================================================+
| |handle|_ | Host   | IN     | cuSPARSELt library handle                      |
+-----------+--------+--------+------------------------------------------------+
| `version` | Host   | OUT    | The version number of the library              |
+-----------+--------+--------+------------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.

----

--------------------------------------------------------------------------------
:code:`cusparseLtGetProperty`
--------------------------------------------------------------------------------

.. code-block:: cpp

    cusparseStatus_t
    cusparseLtGetProperty(libraryPropertyType propertyType,
                          int*                value)

| The function returns the value of the requested property. See the `libraryPropertyType` table below for the supported properties.

+----------------+--------+--------+------------------------------------------------+
| Parameter      | Memory | In/Out | Description                                    |
+================+========+========+================================================+
| `propertyType` | Host   | IN     | Requested property                             |
+----------------+--------+--------+------------------------------------------------+
| `value`        | Host   | OUT    | Value of the requested property                |
+----------------+--------+--------+------------------------------------------------+

`libraryPropertyType` (defined in `library_types.h`):

+-----------------+------------------------------------------------------------+
| Value           | Meaning                                                    |
+=================+============================================================+
| `MAJOR_VERSION` | Enumerator to query the major version                      |
+-----------------+------------------------------------------------------------+
| `MINOR_VERSION` | Enumerator to query the minor version                      |
+-----------------+------------------------------------------------------------+
| `PATCH_LEVEL`   | Number to identify the patch level                         |
+-----------------+------------------------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.
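
A typical use of ``cusparseLtGetProperty`` is to check at run time that the loaded library is at least a required version. The sketch below shows only the comparison step; it assumes the three integers were already obtained by calling ``cusparseLtGetProperty`` with ``MAJOR_VERSION``, ``MINOR_VERSION``, and ``PATCH_LEVEL``. ``LibVersion`` and ``at_least`` are hypothetical helpers, not part of cuSPARSELt.

.. code-block:: cpp

   #include <cassert>
   #include <tuple>

   // Hypothetical holder for the values returned by cusparseLtGetProperty
   // for MAJOR_VERSION, MINOR_VERSION, and PATCH_LEVEL.
   struct LibVersion {
       int major_version;
       int minor_version;
       int patch_level;
   };

   // Lexicographic comparison: major first, then minor, then patch level.
   inline bool at_least(const LibVersion& have, const LibVersion& need) {
       return std::tie(have.major_version, have.minor_version, have.patch_level) >=
              std::tie(need.major_version, need.minor_version, need.patch_level);
   }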

----

================================================================================
Matrix Descriptor Functions
================================================================================

.. _cusparseLtDenseDescriptorInit-label:

--------------------------------------------------------------------------------
:code:`cusparseLtDenseDescriptorInit`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtDenseDescriptorInit(const cusparseLtHandle_t*  handle,
                                 cusparseLtMatDescriptor_t* matDescr,
                                 int64_t                    rows,
                                 int64_t                    cols,
                                 int64_t                    ld,
                                 uint32_t                   alignment,
                                 cudaDataType               valueType,
                                 cusparseOrder_t            order)

The function initializes the descriptor of a *dense* matrix.

+--------------+--------+--------+----------------------------+-----------------------------------------------------------------------------------------------+
| Parameter    | Memory | In/Out | Description                | Possible Values                                                                               |
+==============+========+========+============================+===============================================================================================+
| |handle|_    | Host   | IN     | cuSPARSELt library handle  |                                                                                               |
+--------------+--------+--------+----------------------------+-----------------------------------------------------------------------------------------------+
| |matDescr|_  | Host   | OUT    | Dense matrix description   |                                                                                               |
+--------------+--------+--------+----------------------------+-----------------------------------------------------------------------------------------------+
| `rows`       | Host   | IN     | Number of rows             |                                                                                               |
+--------------+--------+--------+----------------------------+-----------------------------------------------------------------------------------------------+
| `cols`       | Host   | IN     | Number of columns          |                                                                                               |
+--------------+--------+--------+----------------------------+-----------------------------------------------------------------------------------------------+
| `ld`         | Host   | IN     | Leading dimension          | :raw-html:`&ge;` rows if column-major, :raw-html:`&ge;` cols if row-major                     |
+--------------+--------+--------+----------------------------+-----------------------------------------------------------------------------------------------+
| `alignment`  | Host   | IN     | Memory alignment in bytes  | Multiple of 16                                                                                |
+--------------+--------+--------+----------------------------+-----------------------------------------------------------------------------------------------+
| |valueType|_ | Host   | IN     | Data type of the matrix    | `CUDA_R_32F`, `CUDA_R_16F`, `CUDA_R_16BF`, `CUDA_R_8I`, `CUDA_R_8F_E4M3`, `CUDA_R_8F_E5M2`    |
+--------------+--------+--------+----------------------------+-----------------------------------------------------------------------------------------------+
| |order|_     | Host   | IN     | Memory layout              | `CUSPARSE_ORDER_COL`, `CUSPARSE_ORDER_ROW`                                                    |
+--------------+--------+--------+----------------------------+-----------------------------------------------------------------------------------------------+

**Constraints**:

    * `rows`, `cols`, and `ld` must be multiples of

        - 16 if `valueType` is `CUDA_R_8I`, `CUDA_R_8F_E4M3` or `CUDA_R_8F_E5M2`
        - 8 if `valueType` is `CUDA_R_16F` or `CUDA_R_16BF`
        - 4 if `valueType` is `CUDA_R_32F`

    * The total size of the matrix cannot exceed:

        - :math:`2^{32}-1` elements for `CUDA_R_8I`, `CUDA_R_8F_E4M3` or `CUDA_R_8F_E5M2`
        - :math:`2^{31}-1` elements for `CUDA_R_16F` or `CUDA_R_16BF`
        - :math:`2^{30}-1` elements for `CUDA_R_32F`

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.
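
The constraints above can be checked on the host before calling ``cusparseLtDenseDescriptorInit``, which makes a rejected descriptor easier to diagnose. The following is a minimal sketch: ``ElemType`` and ``Order`` are hypothetical stand-ins for ``cudaDataType`` and ``cusparseOrder_t``, and ``dense_descriptor_ok`` is not a library function. It assumes the size limit applies to the padded storage (``ld`` times the number of columns for column-major, or rows for row-major), which is at least as strict as counting ``rows * cols``.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   // Hypothetical stand-ins for cudaDataType and cusparseOrder_t values.
   enum class ElemType { R_32F, R_16F, R_16BF, R_8I, R_8F_E4M3, R_8F_E5M2 };
   enum class Order { Col, Row };

   // Required multiple for rows, cols, and ld of a dense matrix.
   inline int64_t dense_multiple(ElemType t) {
       switch (t) {
           case ElemType::R_32F: return 4;
           case ElemType::R_16F:
           case ElemType::R_16BF: return 8;
           default: return 16;  // 8-bit integer and FP8 types
       }
   }

   // Largest allowed element count: 2^30-1, 2^31-1, or 2^32-1.
   inline int64_t max_elements(ElemType t) {
       switch (t) {
           case ElemType::R_32F: return (int64_t{1} << 30) - 1;
           case ElemType::R_16F:
           case ElemType::R_16BF: return (int64_t{1} << 31) - 1;
           default: return (int64_t{1} << 32) - 1;
       }
   }

   inline bool dense_descriptor_ok(int64_t rows, int64_t cols, int64_t ld,
                                   uint32_t alignment, ElemType type, Order order) {
       const int64_t m = dense_multiple(type);
       if (rows <= 0 || cols <= 0) return false;
       if (rows % m != 0 || cols % m != 0 || ld % m != 0) return false;
       // ld >= rows if column-major, ld >= cols if row-major.
       const int64_t minLd = (order == Order::Col) ? rows : cols;
       if (ld < minLd) return false;
       if (alignment == 0 || alignment % 16 != 0) return false;
       // Padded storage is ld x (number of columns or rows); dividing avoids overflow.
       const int64_t vectors = (order == Order::Col) ? cols : rows;
       return ld <= max_elements(type) / vectors;
   }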

----

.. _cusparseLtStructuredDescriptorInit-label:

--------------------------------------------------------------------------------
:code:`cusparseLtStructuredDescriptorInit`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtStructuredDescriptorInit(const cusparseLtHandle_t*  handle,
                                      cusparseLtMatDescriptor_t* matDescr,
                                      int64_t                    rows,
                                      int64_t                    cols,
                                      int64_t                    ld,
                                      uint32_t                   alignment,
                                      cudaDataType               valueType,
                                      cusparseOrder_t            order,
                                      cusparseLtSparsity_t       sparsity)

The function initializes the descriptor of a *structured* matrix.

+--------------+--------+--------+----------------------------+---------------------------------------------------------------------------------------------+
| Parameter    | Memory | In/Out | Description                | Possible Values                                                                             |
+==============+========+========+============================+=============================================================================================+
| |handle|_    | Host   | IN     | cuSPARSELt library handle  |                                                                                             |
+--------------+--------+--------+----------------------------+---------------------------------------------------------------------------------------------+
| |matDescr|_  | Host   | OUT    | Sparse matrix description  |                                                                                             |
+--------------+--------+--------+----------------------------+---------------------------------------------------------------------------------------------+
| `rows`       | Host   | IN     | Number of rows             |                                                                                             |
+--------------+--------+--------+----------------------------+---------------------------------------------------------------------------------------------+
| `cols`       | Host   | IN     | Number of columns          |                                                                                             |
+--------------+--------+--------+----------------------------+---------------------------------------------------------------------------------------------+
| `ld`         | Host   | IN     | Leading dimension          | :raw-html:`&ge;` rows if column-major, :raw-html:`&ge;` cols if row-major                   |
+--------------+--------+--------+----------------------------+---------------------------------------------------------------------------------------------+
| `alignment`  | Host   | IN     | Memory alignment in bytes  | Multiple of 16                                                                              |
+--------------+--------+--------+----------------------------+---------------------------------------------------------------------------------------------+
| |valueType|_ | Host   | IN     | Data type of the matrix    | `CUDA_R_32F`, `CUDA_R_16F`, `CUDA_R_16BF`, `CUDA_R_8I`, `CUDA_R_8F_E4M3`, `CUDA_R_8F_E5M2`  |
+--------------+--------+--------+----------------------------+---------------------------------------------------------------------------------------------+
| |order|_     | Host   | IN     | Memory layout              | `CUSPARSE_ORDER_COL`, `CUSPARSE_ORDER_ROW`                                                  |
+--------------+--------+--------+----------------------------+---------------------------------------------------------------------------------------------+
| |sparsity|_  | Host   | IN     | Matrix sparsity ratio      | `CUSPARSELT_SPARSITY_50_PERCENT`                                                            |
+--------------+--------+--------+----------------------------+---------------------------------------------------------------------------------------------+

**Constraints**:

    * `rows`, `cols`, and `ld` must be multiples of

        - 32 if `valueType` is `CUDA_R_8I`, `CUDA_R_8F_E4M3` or `CUDA_R_8F_E5M2`
        - 16 if `valueType` is `CUDA_R_16F` or `CUDA_R_16BF`
        - 8 if `valueType` is `CUDA_R_32F`

    * The total size of the matrix cannot exceed:

        - :math:`2^{32}-1` elements for `CUDA_R_8I`, `CUDA_R_8F_E4M3` or `CUDA_R_8F_E5M2`
        - :math:`2^{31}-1` elements for `CUDA_R_16F` or `CUDA_R_16BF`
        - :math:`2^{30}-1` elements for `CUDA_R_32F`

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.
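
Since the structured multiples are twice the dense ones, dimensions that are not already valid must be padded before ``cusparseLtStructuredDescriptorInit`` accepts them. A minimal sketch, assuming the caller zero-fills the extra rows and columns; ``ElemType``, ``structured_multiple``, and ``round_up`` are hypothetical helpers, with ``ElemType`` standing in for ``cudaDataType``.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   // Hypothetical stand-in for the cudaDataType values accepted by
   // cusparseLtStructuredDescriptorInit.
   enum class ElemType { R_32F, R_16F, R_16BF, R_8I, R_8F_E4M3, R_8F_E5M2 };

   // Required multiple for rows, cols, and ld of a structured matrix.
   inline int64_t structured_multiple(ElemType t) {
       switch (t) {
           case ElemType::R_32F: return 8;
           case ElemType::R_16F:
           case ElemType::R_16BF: return 16;
           default: return 32;  // 8-bit integer and FP8 types
       }
   }

   // Smallest multiple of m that is >= n.
   inline int64_t round_up(int64_t n, int64_t m) {
       assert(n >= 0 && m > 0);
       return (n + m - 1) / m * m;
   }

For example, a 100 x 100 column-major ``CUDA_R_16F`` matrix would be padded to 112 x 112 with ``ld = 112``.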

----

.. _cusparseLtMatDescriptorDestroy-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatDescriptorDestroy`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtMatDescriptorDestroy(const cusparseLtMatDescriptor_t* matDescr)

| The function releases the resources used by an instance of a matrix descriptor. After this call, neither the matrix descriptor nor any matmul descriptor or plan initialized with it can be used.

+-----------------+--------+--------+------------------------------------------+
| Parameter       | Memory | In/Out | Description                              |
+=================+========+========+==========================================+
| |matDescr|_     | Host   | IN     | Matrix descriptor                        |
+-----------------+--------+--------+------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.

----

.. _cusparseLtMatDescSetAttribute-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatDescSetAttribute`
--------------------------------------------------------------------------------

.. code-block:: cpp

    cusparseStatus_t
    cusparseLtMatDescSetAttribute(const cusparseLtHandle_t*    handle,
                                  cusparseLtMatDescriptor_t*   matmulDescr,
                                  cusparseLtMatDescAttribute_t matAttribute,
                                  const void*                  data,
                                  size_t                       dataSize)

| The function sets the value of the specified attribute of a matrix descriptor, such as the number of batches and the stride between them.

.. table::
    :widths: 10 8 8 34 20

    +-----------------+--------+--------+-------------------------------------------------------------------+-------------------------------------------------------------+
    | Parameter       | Memory | In/Out | Description                                                       | Possible Values                                             |
    +=================+========+========+===================================================================+=============================================================+
    | |handle|_       | Host   | IN     | cuSPARSELt library handle                                         |                                                             |
    +-----------------+--------+--------+-------------------------------------------------------------------+-------------------------------------------------------------+
    | |matmulDescr|_  | Host   | OUT    | Matrix descriptor                                                 |                                                             |
    +-----------------+--------+--------+-------------------------------------------------------------------+-------------------------------------------------------------+
    | |matAttribute|_ | Host   | IN     | Attribute to set                                                  | `CUSPARSELT_MAT_NUM_BATCHES`, `CUSPARSELT_MAT_BATCH_STRIDE` |
    +-----------------+--------+--------+-------------------------------------------------------------------+-------------------------------------------------------------+
    | `data`          | Host   | IN     | Pointer to the value to which the specified attribute will be set |                                                             |
    +-----------------+--------+--------+-------------------------------------------------------------------+-------------------------------------------------------------+
    | `dataSize`      | Host   | IN     | Size in bytes of the attribute value used for verification        |                                                             |
    +-----------------+--------+--------+-------------------------------------------------------------------+-------------------------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.
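
For a batch of same-shaped matrices stored back to back, the batch stride is the footprint of one matrix. The sketch below computes it under two assumptions to verify against the ``cusparseLtMatDescAttribute_t`` reference: that ``CUSPARSELT_MAT_BATCH_STRIDE`` is expressed in elements rather than bytes, and that one matrix occupies ``ld`` times the number of columns (column-major) or rows (row-major). ``packed_batch_stride`` and ``batch_offset`` are hypothetical helpers; the computed value would be passed by address in ``data``, with ``dataSize`` set to the size of that variable.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   enum class Order { Col, Row };  // hypothetical stand-in for cusparseOrder_t

   // Elements occupied by one matrix: ld x cols (column-major) or ld x rows (row-major).
   inline int64_t packed_batch_stride(int64_t rows, int64_t cols, int64_t ld, Order order) {
       assert(ld >= ((order == Order::Col) ? rows : cols));
       const int64_t vectors = (order == Order::Col) ? cols : rows;
       return ld * vectors;
   }

   // Element offset of the first entry of matrix `batch` in the batched buffer.
   inline int64_t batch_offset(int64_t batch, int64_t stride) {
       assert(batch >= 0);
       return batch * stride;
   }

To turn an element offset into a byte offset, multiply it by the element size, for example 2 bytes for ``CUDA_R_16F``.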

----

.. _cusparseLtMatDescGetAttribute-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatDescGetAttribute`
--------------------------------------------------------------------------------

.. code-block:: cpp

    cusparseStatus_t
    cusparseLtMatDescGetAttribute(const cusparseLtHandle_t*        handle,
                                  const cusparseLtMatDescriptor_t* matmulDescr,
                                  cusparseLtMatDescAttribute_t     matAttribute,
                                  void*                            data,
                                  size_t                           dataSize)

| The function gets the value of the specified attribute of a matrix descriptor, such as the number of batches and the stride between them.

.. table::
    :widths: 10 8 8 34 20

    +-----------------+--------+--------+--------------------------------------------------------------------------+-------------------------------------------------------------+
    | Parameter       | Memory | In/Out | Description                                                              | Possible Values                                             |
    +=================+========+========+==========================================================================+=============================================================+
    | |handle|_       | Host   | IN     | cuSPARSELt library handle                                                |                                                             |
    +-----------------+--------+--------+--------------------------------------------------------------------------+-------------------------------------------------------------+
    | |matmulDescr|_  | Host   | IN     | Matrix descriptor                                                        |                                                             |
    +-----------------+--------+--------+--------------------------------------------------------------------------+-------------------------------------------------------------+
    | |matAttribute|_ | Host   | IN     | Attribute to retrieve                                                    | `CUSPARSELT_MAT_NUM_BATCHES`, `CUSPARSELT_MAT_BATCH_STRIDE` |
    +-----------------+--------+--------+--------------------------------------------------------------------------+-------------------------------------------------------------+
    | `data`          | Host   | OUT    | Memory address containing the attribute value retrieved by this function |                                                             |
    +-----------------+--------+--------+--------------------------------------------------------------------------+-------------------------------------------------------------+
    | `dataSize`      | Host   | IN     | Size in bytes of the attribute value used for verification               |                                                             |
    +-----------------+--------+--------+--------------------------------------------------------------------------+-------------------------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.
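
The ``data``/``dataSize`` pair lets the library reject a destination buffer whose size does not match the attribute. In C++, a small template can derive ``dataSize`` from the destination type so the two never drift apart. The sketch is hypothetical throughout: ``read_attribute`` imitates the size check on a plain value instead of calling the library, and ``get_attr`` is a wrapper one might write around the real call.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <cstring>

   // Hypothetical stand-in for the library side: copy a stored attribute into
   // `data` only if `dataSize` matches the attribute's size exactly.
   inline bool read_attribute(const void* stored, std::size_t storedSize,
                              void* data, std::size_t dataSize) {
       if (data == nullptr || dataSize != storedSize) return false;  // size verification
       std::memcpy(data, stored, dataSize);
       return true;
   }

   // Typed wrapper: `get` receives (void* data, size_t dataSize), and dataSize is
   // always sizeof(T), so it cannot disagree with the destination variable.
   template <typename T, typename GetFn>
   bool get_attr(GetFn get, T* out) {
       return get(static_cast<void*>(out), sizeof(T));
   }

With the library, ``get`` could be a lambda that calls ``cusparseLtMatDescGetAttribute(&handle, &matDescr, attr, data, size)`` and compares the result with ``CUSPARSE_STATUS_SUCCESS``.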

----

================================================================================
Matmul Descriptor Functions
================================================================================

.. _cusparseLtMatmulDescriptorInit-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulDescriptorInit`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtMatmulDescriptorInit(const cusparseLtHandle_t*        handle,
                                  cusparseLtMatmulDescriptor_t*    matmulDescr,
                                  cusparseOperation_t              opA,
                                  cusparseOperation_t              opB,
                                  const cusparseLtMatDescriptor_t* matA,
                                  const cusparseLtMatDescriptor_t* matB,
                                  const cusparseLtMatDescriptor_t* matC,
                                  const cusparseLtMatDescriptor_t* matD,
                                  cusparseComputeType              computeType)

The function initializes the *matrix multiplication* descriptor.

.. table::
    :widths: 10 8 8 28 26

    +----------------+--------+--------+-------------------------------------------+-------------------------------------------------------------------------------------------------------+
    | Parameter      | Memory | In/Out | Description                               | Possible Values                                                                                       |
    +================+========+========+===========================================+=======================================================================================================+
    | |handle|_      | Host   | IN     | cuSPARSELt library handle                 |                                                                                                       |
    +----------------+--------+--------+-------------------------------------------+-------------------------------------------------------------------------------------------------------+
    | |matmulDescr|_ | Host   | OUT    | Matrix multiplication descriptor          |                                                                                                       |
    +----------------+--------+--------+-------------------------------------------+-------------------------------------------------------------------------------------------------------+
    | |opA|_         | Host   | IN     | Operation applied to the matrix `A`       | `CUSPARSE_OPERATION_NON_TRANSPOSE`, `CUSPARSE_OPERATION_TRANSPOSE`                                    |
    +----------------+--------+--------+-------------------------------------------+-------------------------------------------------------------------------------------------------------+
    | |opB|_         | Host   | IN     | Operation applied to the matrix `B`       | `CUSPARSE_OPERATION_NON_TRANSPOSE`, `CUSPARSE_OPERATION_TRANSPOSE`                                    |
    +----------------+--------+--------+-------------------------------------------+-------------------------------------------------------------------------------------------------------+
    | |matA|_        | Host   | IN     | Structured or dense matrix descriptor `A` |                                                                                                       |
    +----------------+--------+--------+-------------------------------------------+-------------------------------------------------------------------------------------------------------+
    | |matB|_        | Host   | IN     | Structured or dense matrix descriptor `B` |                                                                                                       |
    +----------------+--------+--------+-------------------------------------------+-------------------------------------------------------------------------------------------------------+
    | |matC|_        | Host   | IN     | Dense matrix descriptor `C`               |                                                                                                       |
    +----------------+--------+--------+-------------------------------------------+-------------------------------------------------------------------------------------------------------+
    | |matD|_        | Host   | IN     | Dense matrix descriptor `D`               |                                                                                                       |
    +----------------+--------+--------+-------------------------------------------+-------------------------------------------------------------------------------------------------------+
    | |computeType|_ | Host   | IN     | Compute precision                         | `CUSPARSE_COMPUTE_32I`, `CUSPARSE_COMPUTE_32F`, `CUSPARSE_COMPUTE_16F`                                |
    +----------------+--------+--------+-------------------------------------------+-------------------------------------------------------------------------------------------------------+

The structured matrix descriptor can be used for |matA|_ or |matB|_, but not both.


**Constraints**:

    * See :ref:`cusparseLtMatmul() <cusparseLtMatmul-label>` for the supported data types.

    * `CUDA_R_8I`, `CUDA_R_8F_E4M3` and `CUDA_R_8F_E5M2` data type only supports (the opposite if `B` is structured):

        * ``opA/opB = TN`` if the matrix orders are ``orderA/orderB = Col/Col``
        * ``opA/opB = NT`` if the matrix orders are ``orderA/orderB = Row/Row``
        * ``opA/opB = NN`` if the matrix orders are ``orderA/orderB = Row/Col``
        * ``opA/opB = TT`` if the matrix orders are ``orderA/orderB = Col/Row``

    * `C` and `D` must have the same leading dimension and memory layout (see `cusparseOrder_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-generic-enum-layout>`_ for the available memory layouts).
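
Read together, the four 8-bit combinations above follow a single rule: in each of them, ``op(A)`` ends up stored row-major and ``op(B)`` column-major. The C++ sketch below encodes that rule for the case where `A` is the structured matrix. The ``Order`` and ``Op`` enums and ``requiredOps8Bit()`` are hypothetical illustrations rather than cuSPARSELt types, and the `B`-structured case is deliberately not modeled.

.. code-block:: cpp

    #include <utility>

    // Hypothetical stand-ins for cusparseOrder_t / cusparseOperation_t values,
    // used only for this illustration.
    enum class Order { Row, Col };
    enum class Op { N, T };

    // Returns the only (opA, opB) pair accepted for the 8-bit data types
    // when A is the structured matrix (the B-structured case is not modeled):
    //   Col/Col -> TN, Row/Row -> NT, Row/Col -> NN, Col/Row -> TT.
    // A is transposed exactly when it is column-major, and B is transposed
    // exactly when it is row-major.
    std::pair<Op, Op> requiredOps8Bit(Order orderA, Order orderB) {
        Op opA = (orderA == Order::Col) ? Op::T : Op::N;
        Op opB = (orderB == Order::Row) ? Op::T : Op::N;
        return {opA, opB};
    }

For example, ``requiredOps8Bit(Order::Col, Order::Col)`` yields ``{Op::T, Op::N}``, which is the ``TN`` case from the list.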

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.

----

.. _cusparseLtMatmulDescSetAttribute-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulDescSetAttribute`
--------------------------------------------------------------------------------

.. code-block:: cpp

    cusparseStatus_t
    cusparseLtMatmulDescSetAttribute(const cusparseLtHandle_t*       handle,
                                     cusparseLtMatmulDescriptor_t*   matmulDescr,
                                     cusparseLtMatmulDescAttribute_t matmulAttribute,
                                     const void*                     data,
                                     size_t                          dataSize)

| The function sets the value of the specified attribute of the matrix multiplication descriptor, such as the activation function or the bias.

.. table::
    :widths: 10 8 8 28 26

    +--------------------+--------+--------+------------------------------------+--------------------------------------------------+
    | Parameter          | Memory | In/Out | Description                        | Possible Values                                  |
    +====================+========+========+====================================+==================================================+
    | |handle|_          | Host   | IN     | cuSPARSELt library handle          |                                                  |
    +--------------------+--------+--------+------------------------------------+--------------------------------------------------+
    | |matmulDescr|_     | Host   | OUT    | Matrix multiplication descriptor   |                                                  |
    +--------------------+--------+--------+------------------------------------+--------------------------------------------------+
    | |matmulAttribute|_ | Host   | IN     | Attribute to set                   | `CUSPARSELT_MATMUL_ACTIVATION_RELU`,             |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND`,  |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD`,   |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_ACTIVATION_GELU`,             |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_ACTIVATION_GELU_SCALING`,     |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING`,        |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_BETA_VECTOR_SCALING`,         |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_BIAS_POINTER`,                |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_BIAS_STRIDE`                  |
    +--------------------+--------+--------+------------------------------------+--------------------------------------------------+
    | `data`             | Host   | IN     | Pointer to the value to which the  |                                                  |
    |                    |        |        | specified attribute will be set    |                                                  |
    +--------------------+--------+--------+------------------------------------+--------------------------------------------------+
    | `dataSize`         | Host   | IN     | Size in bytes of the attribute     |                                                  |
    |                    |        |        | value used for verification        |                                                  |
    +--------------------+--------+--------+------------------------------------+--------------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.
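
The `data`/`dataSize` pair follows a copy-in contract: the caller passes the address of a host variable together with its size, and the size is used for verification so that a value of the wrong type can be rejected. The following is a minimal host-only C++ mock of that contract, not the library's implementation. ``MockMatmulDesc``, ``mockSetAttribute()``, and the ``Attr``/``Status`` enums are hypothetical, and the attribute types (an ``int`` flag for ReLU and a ``float`` upper bound) are assumptions rather than values taken from this page.

.. code-block:: cpp

    #include <cstddef>
    #include <cstring>

    // Hypothetical attribute IDs and status codes for this mock only;
    // they are NOT cuSPARSELt enumerators.
    enum class Attr { ReluEnable, ReluUpperBound };
    enum class Status { Success, InvalidValue };

    // Hypothetical descriptor holding two attributes (assumed types: int, float).
    struct MockMatmulDesc {
        int   reluEnable     = 0;
        float reluUpperBound = 0.0f;
    };

    // Copies dataSize bytes from the caller's variable into the attribute,
    // but only after verifying that dataSize matches the attribute's size.
    Status mockSetAttribute(MockMatmulDesc* desc, Attr attr,
                            const void* data, std::size_t dataSize) {
        if (desc == nullptr || data == nullptr) return Status::InvalidValue;
        void*       dst      = nullptr;
        std::size_t expected = 0;
        switch (attr) {
            case Attr::ReluEnable:
                dst = &desc->reluEnable;     expected = sizeof(desc->reluEnable);     break;
            case Attr::ReluUpperBound:
                dst = &desc->reluUpperBound; expected = sizeof(desc->reluUpperBound); break;
        }
        if (dst == nullptr || dataSize != expected) return Status::InvalidValue;  // size check
        std::memcpy(dst, data, dataSize);
        return Status::Success;
    }

In this mock, passing ``&upperBound, sizeof(double)`` for the ``float`` attribute returns ``Status::InvalidValue``, while ``&upperBound, sizeof(upperBound)`` succeeds.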

----

.. _cusparseLtMatmulDescGetAttribute-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulDescGetAttribute`
--------------------------------------------------------------------------------

.. code-block:: cpp

    cusparseStatus_t
    cusparseLtMatmulDescGetAttribute(const cusparseLtHandle_t*           handle,
                                     const cusparseLtMatmulDescriptor_t* matmulDescr,
                                     cusparseLtMatmulDescAttribute_t     matmulAttribute,
                                     void*                               data,
                                     size_t                              dataSize)

| The function gets the value of the specified attribute of the matrix multiplication descriptor, such as the activation function or the bias.

.. table::
    :widths: 10 8 8 28 26

    +--------------------+--------+--------+------------------------------------+--------------------------------------------------+
    | Parameter          | Memory | In/Out | Description                        | Possible Values                                  |
    +====================+========+========+====================================+==================================================+
    | |handle|_          | Host   | IN     | cuSPARSELt library handle          |                                                  |
    +--------------------+--------+--------+------------------------------------+--------------------------------------------------+
    | |matmulDescr|_     | Host   | IN     | Matrix multiplication descriptor   |                                                  |
    +--------------------+--------+--------+------------------------------------+--------------------------------------------------+
    | |matmulAttribute|_ | Host   | IN     | Attribute to retrieve              | `CUSPARSELT_MATMUL_ACTIVATION_RELU`,             |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND`,  |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD`,   |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_ACTIVATION_GELU`,             |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_ACTIVATION_GELU_SCALING`,     |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING`,        |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_BETA_VECTOR_SCALING`,         |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_BIAS_POINTER`,                |
    |                    |        |        |                                    | `CUSPARSELT_MATMUL_BIAS_STRIDE`                  |
    +--------------------+--------+--------+------------------------------------+--------------------------------------------------+
    | `data`             | Host   | OUT    | Memory address containing the      |                                                  |
    |                    |        |        | attribute value retrieved by this  |                                                  |
    |                    |        |        | function                           |                                                  |
    +--------------------+--------+--------+------------------------------------+--------------------------------------------------+
    | `dataSize`         | Host   | IN     | Size in bytes of the attribute     |                                                  |
    |                    |        |        | value used for verification        |                                                  |
    +--------------------+--------+--------+------------------------------------+--------------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.
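
The getter mirrors the setter: the caller owns the output buffer and states its size, and the value is copied out only when that size matches. A minimal host-only C++ mock of this copy-out contract follows. ``MockMatmulDesc``, ``mockGetAttribute()``, the ``Attr``/``Status`` enums, the attribute types, and the stored values are all hypothetical and do not come from cuSPARSELt.

.. code-block:: cpp

    #include <cstddef>
    #include <cstring>

    // Hypothetical attribute IDs and status codes for this mock only.
    enum class Attr { ReluEnable, ReluUpperBound };
    enum class Status { Success, InvalidValue };

    // Hypothetical descriptor with two previously set attributes.
    struct MockMatmulDesc {
        int   reluEnable     = 1;
        float reluUpperBound = 6.0f;
    };

    // Copies the attribute into the caller's buffer after checking that the
    // buffer size (dataSize) matches the attribute's size.
    Status mockGetAttribute(const MockMatmulDesc* desc, Attr attr,
                            void* data, std::size_t dataSize) {
        if (desc == nullptr || data == nullptr) return Status::InvalidValue;
        const void* src      = nullptr;
        std::size_t expected = 0;
        switch (attr) {
            case Attr::ReluEnable:
                src = &desc->reluEnable;     expected = sizeof(desc->reluEnable);     break;
            case Attr::ReluUpperBound:
                src = &desc->reluUpperBound; expected = sizeof(desc->reluUpperBound); break;
        }
        if (src == nullptr || dataSize != expected) return Status::InvalidValue;  // size check
        std::memcpy(data, src, dataSize);
        return Status::Success;
    }

Reading the upper bound into a ``float`` with ``sizeof(float)`` succeeds in this mock; a buffer whose stated size differs from the attribute's size is rejected and left unchanged.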

----

================================================================================
Matmul Algorithm Functions
================================================================================

.. _cusparseLtMatmulAlgSelectionInit-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulAlgSelectionInit`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtMatmulAlgSelectionInit(const cusparseLtHandle_t*           handle,
                                    cusparseLtMatmulAlgSelection_t*     algSelection,
                                    const cusparseLtMatmulDescriptor_t* matmulDescr,
                                    cusparseLtMatmulAlg_t               alg)

The function initializes the *algorithm selection* descriptor.

+-----------------+--------+--------+----------------------------------+---------------------------------+
| Parameter       | Memory | In/Out | Description                      | Possible Values                 |
+=================+========+========+==================================+=================================+
| |handle|_       | Host   | IN     | cuSPARSELt library handle        |                                 |
+-----------------+--------+--------+----------------------------------+---------------------------------+
| |algSelection|_ | Host   | OUT    | Algorithm selection descriptor   |                                 |
+-----------------+--------+--------+----------------------------------+---------------------------------+
| |matmulDescr|_  | Host   | IN     | Matrix multiplication descriptor |                                 |
+-----------------+--------+--------+----------------------------------+---------------------------------+
| |alg|_          | Host   | IN     | Algorithm mode                   | `CUSPARSELT_MATMUL_ALG_DEFAULT` |
+-----------------+--------+--------+----------------------------------+---------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.

----

.. _cusparseLtMatmulAlgSetAttribute-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulAlgSetAttribute`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtMatmulAlgSetAttribute(const cusparseLtHandle_t*       handle,
                                   cusparseLtMatmulAlgSelection_t* algSelection,
                                   cusparseLtMatmulAlgAttribute_t  attribute,
                                   const void*                     data,
                                   size_t                          dataSize)

The function sets the value of the specified attribute of the algorithm selection descriptor.

.. table::
    :widths: 10 8 8 28 26

    +-----------------+--------+--------+--------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Parameter       | Memory | In/Out | Description                                                        | Possible Values                                                                                                                                                                                                     |
    +=================+========+========+====================================================================+=====================================================================================================================================================================================================================+
    | |handle|_       | Host   | IN     | cuSPARSELt library handle                                          |                                                                                                                                                                                                                     |
    +-----------------+--------+--------+--------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | |algSelection|_ | Host   | OUT    | Algorithm selection descriptor                                     |                                                                                                                                                                                                                     |
    +-----------------+--------+--------+--------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | |attribute|_    | Host   | IN     | The attribute to set                                               | `CUSPARSELT_MATMUL_ALG_CONFIG_ID`, `CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID`, `CUSPARSELT_MATMUL_SEARCH_ITERATIONS`, `CUSPARSELT_MATMUL_SPLIT_K`, `CUSPARSELT_MATMUL_SPLIT_K_MODE`, `CUSPARSELT_MATMUL_SPLIT_K_BUFFERS` |
    +-----------------+--------+--------+--------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | `data`          | Host   | IN     | Pointer to the value to which the specified attribute will be set  |                                                                                                                                                                                                                     |
    +-----------------+--------+--------+--------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | `dataSize`      | Host   | IN     | Size in bytes of the attribute value used for verification         |                                                                                                                                                                                                                     |
    +-----------------+--------+--------+--------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.
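
One way to combine ``CUSPARSELT_MATMUL_ALG_CONFIG_ID`` with ``CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID`` is an exhaustive search: read the maximum ID, time each configuration, and keep the fastest. The C++ sketch below shows only the selection logic. It assumes that valid IDs run from ``0`` to ``maxId - 1`` and uses a caller-supplied ``timeMs`` callback in place of the real steps (setting the attribute, running the multiplication, and timing it). ``pickFastestConfig()`` is a hypothetical helper, not a cuSPARSELt function.

.. code-block:: cpp

    #include <functional>
    #include <limits>

    // Hypothetical helper. Assumes valid config IDs are 0 .. maxId-1, where
    // maxId would be read via CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID, and that
    // timeMs(id) stands in for "set CONFIG_ID = id, run, and measure".
    int pickFastestConfig(int maxId, const std::function<double(int)>& timeMs) {
        int    bestId   = -1;
        double bestTime = std::numeric_limits<double>::infinity();
        for (int id = 0; id < maxId; ++id) {
            double t = timeMs(id);
            if (t < bestTime) {  // strict '<' keeps the first fastest ID on ties
                bestTime = t;
                bestId   = id;
            }
        }
        return bestId;  // -1 when maxId <= 0
    }

Keeping the search separate from the timing callback lets the same logic be exercised on the host with fake timings before wiring it to real measurements.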

----

.. _cusparseLtMatmulAlgGetAttribute-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulAlgGetAttribute`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtMatmulAlgGetAttribute(const cusparseLtHandle_t*             handle,
                                   const cusparseLtMatmulAlgSelection_t* algSelection,
                                   cusparseLtMatmulAlgAttribute_t        attribute,
                                   void*                                 data,
                                   size_t                                dataSize)

The function returns the value of the queried attribute of the algorithm selection descriptor.

.. table::
    :widths: 10 8 8 28 26

    +-----------------+--------+--------+--------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Parameter       | Memory | In/Out | Description                                                              | Possible Values                                                                                                                                                                                                     |
    +=================+========+========+==========================================================================+=====================================================================================================================================================================================================================+
    | |handle|_       | Host   | IN     | cuSPARSELt library handle                                                |                                                                                                                                                                                                                     |
    +-----------------+--------+--------+--------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | |algSelection|_ | Host   | IN     | Algorithm selection descriptor                                           |                                                                                                                                                                                                                     |
    +-----------------+--------+--------+--------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | |attribute|_    | Host   | IN     | The attribute that will be retrieved by this function                    | `CUSPARSELT_MATMUL_ALG_CONFIG_ID`, `CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID`, `CUSPARSELT_MATMUL_SEARCH_ITERATIONS`, `CUSPARSELT_MATMUL_SPLIT_K`, `CUSPARSELT_MATMUL_SPLIT_K_MODE`, `CUSPARSELT_MATMUL_SPLIT_K_BUFFERS` |
    +-----------------+--------+--------+--------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | `data`          | Host   | OUT    | Memory address containing the attribute value retrieved by this function |                                                                                                                                                                                                                     |
    +-----------------+--------+--------+--------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | `dataSize`      | Host   | IN     | Size in bytes of the attribute value used for verification               |                                                                                                                                                                                                                     |
    +-----------------+--------+--------+--------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.

----

================================================================================
Matmul Functions
================================================================================

.. _cusparseLtMatmulGetWorkspace-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulGetWorkspace`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtMatmulGetWorkspace(const cusparseLtHandle_t*     handle,
                                const cusparseLtMatmulPlan_t* plan,
                                size_t*                       workspaceSize)

The function determines the required workspace size associated with the selected algorithm.

+-----------------+--------+--------+-----------------------------------+
| Parameter       | Memory | In/Out | Description                       |
+=================+========+========+===================================+
| |handle|_       | Host   | IN     | cuSPARSELt library handle         |
+-----------------+--------+--------+-----------------------------------+
| |plan|_         | Host   | IN     | Matrix multiplication plan        |
+-----------------+--------+--------+-----------------------------------+
| `workspaceSize` | Host   | OUT    | Workspace size in bytes           |
+-----------------+--------+--------+-----------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.
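
When several plans run one after another, a common pattern is to allocate one workspace large enough for all of them instead of one buffer per plan. The hypothetical helper below computes that size from the values returned by ``cusparseLtMatmulGetWorkspace``. It assumes the plans never execute concurrently, because concurrent calls sharing one workspace could overwrite each other's scratch data.

.. code-block:: cpp

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Hypothetical helper, not part of cuSPARSELt: given the per-plan sizes
    // reported by cusparseLtMatmulGetWorkspace, return the size of a single
    // buffer that every plan can reuse. Assumes the plans run one at a time.
    std::size_t sharedWorkspaceBytes(const std::vector<std::size_t>& perPlanBytes) {
        std::size_t bytes = 0;
        for (std::size_t b : perPlanBytes) {
            bytes = std::max(bytes, b);  // the largest requirement wins
        }
        return bytes;  // 0 when no plan reported a workspace requirement
    }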

----

.. _cusparseLtMatmulPlanInit-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulPlanInit`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtMatmulPlanInit(const cusparseLtHandle_t*             handle,
                            cusparseLtMatmulPlan_t*               plan,
                            const cusparseLtMatmulDescriptor_t*   matmulDescr,
                            const cusparseLtMatmulAlgSelection_t* algSelection)

The function initializes the matrix multiplication plan from the matrix multiplication descriptor and the algorithm selection descriptor.

+-----------------+--------+--------+-----------------------------------+
| Parameter       | Memory | In/Out | Description                       |
+=================+========+========+===================================+
| |handle|_       | Host   | IN     | cuSPARSELt library handle         |
+-----------------+--------+--------+-----------------------------------+
| |plan|_         | Host   | OUT    | Matrix multiplication plan        |
+-----------------+--------+--------+-----------------------------------+
| |matmulDescr|_  | Host   | IN     | Matrix multiplication descriptor  |
+-----------------+--------+--------+-----------------------------------+
| |algSelection|_ | Host   | IN     | Algorithm selection descriptor    |
+-----------------+--------+--------+-----------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.

----

.. _cusparseLtMatmulPlanDestroy-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulPlanDestroy`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtMatmulPlanDestroy(const cusparseLtMatmulPlan_t* plan)

| The function releases the resources used by an instance of the matrix multiplication plan. This function must be the last call made on a given plan instance.
| Calling any cuSPARSELt function that uses a ``cusparseLtMatmulPlan_t`` after ``cusparseLtMatmulPlanDestroy()`` returns an error.

+-----------------+--------+--------+-----------------------------------+
| Parameter       | Memory | In/Out | Description                       |
+=================+========+========+===================================+
| |plan|_         | Host   | IN     | Matrix multiplication plan        |
+-----------------+--------+--------+-----------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.
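
The lifecycle rule above can be illustrated with a host-only C++ mock: once the plan is destroyed, later calls fail instead of touching released state, and a small RAII guard keeps the destroy call last even on early returns. ``MockPlan``, the ``mock*`` functions, ``PlanGuard``, and ``Status`` are hypothetical; in real code the guard would call ``cusparseLtMatmulPlanDestroy()``, and the mock checks only stand in for the library's error handling.

.. code-block:: cpp

    // Hypothetical status codes and plan state for this mock only.
    enum class Status { Success, InvalidValue };

    struct MockPlan {
        bool initialized = false;
    };

    Status mockPlanInit(MockPlan* plan) {
        if (plan == nullptr) return Status::InvalidValue;
        plan->initialized = true;
        return Status::Success;
    }

    // Releases the plan; any later use, including a second destroy, fails.
    Status mockPlanDestroy(MockPlan* plan) {
        if (plan == nullptr || !plan->initialized) return Status::InvalidValue;
        plan->initialized = false;
        return Status::Success;
    }

    Status mockMatmul(const MockPlan* plan) {
        if (plan == nullptr || !plan->initialized) return Status::InvalidValue;
        return Status::Success;  // a real call would launch the computation here
    }

    // RAII guard: destroys the plan when the guard goes out of scope.
    class PlanGuard {
    public:
        explicit PlanGuard(MockPlan* plan) : plan_(plan) {}
        ~PlanGuard() { if (plan_ != nullptr) mockPlanDestroy(plan_); }
        PlanGuard(const PlanGuard&) = delete;
        PlanGuard& operator=(const PlanGuard&) = delete;

    private:
        MockPlan* plan_;
    };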

----

.. _cusparseLtMatmul-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatmul`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtMatmul(const cusparseLtHandle_t*     handle,
                    const cusparseLtMatmulPlan_t* plan,
                    const void*                   alpha,
                    const void*                   d_A,
                    const void*                   d_B,
                    const void*                   beta,
                    const void*                   d_C,
                    void*                         d_D,
                    void*                         workspace,
                    cudaStream_t*                 streams,
                    int32_t                       numStreams)

The function computes the product of matrices `A` and `B` and produces the output matrix `D` according to the following operation:

.. math::

   D = \mathrm{Activation}\left(\alpha \, \mathrm{op}(A) \cdot \mathrm{op}(B) + \beta \, \mathrm{op}(C) + \mathrm{bias}\right) \cdot \mathrm{scale}

| where `A`, `B`, and `C` are input matrices, and :math:`\alpha` and :math:`\beta` are input scalars or vectors of scalars (device-side pointers).
| **Note**: The function currently supports only the case where `D` has the same shape as `C`.
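
As a reading aid, the host-side sketch below evaluates the formula above with plain loops. It is a naive reference written for illustration, not the cuSPARSELt implementation, and it assumes row-major storage, ``CUSPARSE_OPERATION_NON_TRANSPOSE`` for both operands, scalar :math:`\alpha`, :math:`\beta`, and `scale`, a `bias` vector with one value per row of `D`, and ReLU as the activation. The function name ``reference_matmul`` is hypothetical.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <cstddef>
   #include <vector>

   // Hypothetical naive reference (not the cuSPARSELt implementation) for
   //   D = Activation(alpha * A * B + beta * C + bias) * scale
   // Assumptions: row-major storage, no transposition, scalar alpha, beta and
   // scale, one bias entry per row of D, and ReLU as the activation.
   std::vector<float> reference_matmul(const std::vector<float>& A,     // m x k
                                       const std::vector<float>& B,     // k x n
                                       const std::vector<float>& C,     // m x n
                                       const std::vector<float>& bias,  // m
                                       int m, int n, int k, float alpha,
                                       float beta, float scale) {
       assert(A.size() == static_cast<std::size_t>(m) * k);
       assert(B.size() == static_cast<std::size_t>(k) * n);
       assert(C.size() == static_cast<std::size_t>(m) * n);
       assert(bias.size() == static_cast<std::size_t>(m));
       std::vector<float> D(static_cast<std::size_t>(m) * n);
       for (int i = 0; i < m; ++i) {
           for (int j = 0; j < n; ++j) {
               float acc = 0.0f;  // dot product of row i of A and column j of B
               for (int p = 0; p < k; ++p)
                   acc += A[i * k + p] * B[p * n + j];
               float v = alpha * acc + beta * C[i * n + j] + bias[i];
               v = std::max(v, 0.0f);  // ReLU (assumed activation)
               D[i * n + j] = v * scale;
           }
       }
       return D;
   }

Such a reference can help validate the output of :ref:`cusparseLtMatmul() <cusparseLtMatmul-label>` on small problems after copying `d_D` back to the host, keeping in mind that reduced-precision data types will not match bit for bit.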

+-----------------+--------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Parameter       | Memory | In/Out | Description                                                                                                                                                                                                             |
+=================+========+========+=========================================================================================================================================================================================================================+
| |handle|_       | Host   | IN     | cuSPARSELt library handle                                                                                                                                                                                               |
+-----------------+--------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| |plan|_         | Host   | IN     | Matrix multiplication plan                                                                                                                                                                                              |
+-----------------+--------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `alpha`         | Host   | IN     | :math:`\alpha` scalar/vector of scalars used for multiplication (`float` data type)                                                                                                                                     |
+-----------------+--------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `d_A`           | Device | IN     | Pointer to the structured or dense matrix `A`                                                                                                                                                                           |
+-----------------+--------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `d_B`           | Device | IN     | Pointer to the structured or dense matrix `B`                                                                                                                                                                           |
+-----------------+--------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `beta`          | Host   | IN     | :math:`\beta` scalar/vector of scalars used for multiplication (`float` data type). It can have a `NULL` value only if `CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING` is set without `CUSPARSELT_MATMUL_BETA_VECTOR_SCALING`  |
+-----------------+--------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `d_C`           | Device | IN     | Pointer to the dense matrix `C`                                                                                                                                                                                         |
+-----------------+--------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `d_D`           | Device | OUT    | Pointer to the dense matrix `D`                                                                                                                                                                                         |
+-----------------+--------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `workspace`     | Device | IN     | Pointer to workspace                                                                                                                                                                                                    |
+-----------------+--------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `streams`       | Host   | IN     | Pointer to CUDA stream array for the computation                                                                                                                                                                        |
+-----------------+--------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| `numStreams`    | Host   | IN     | Number of CUDA streams in `streams`                                                                                                                                                                                     |
+-----------------+--------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

**Supported data types:**


+--------------------+-------------------+-------------------+-------------------------------+
| Input A/B          | Input C           | Output D          | Compute                       |
+====================+===================+===================+===============================+
| `CUDA_R_32F`       | `CUDA_R_32F`                          | |CUSPARSE_COMPUTE_32F|_       |
+--------------------+-------------------+-------------------+-------------------------------+
| `CUDA_R_16F`       | `CUDA_R_16F`                          | |CUSPARSE_COMPUTE_32F|_       |
+                    +                                       +-------------------------------+
|                    |                                       | |CUSPARSE_COMPUTE_16F|_       |
+--------------------+-------------------+-------------------+-------------------------------+
| `CUDA_R_16BF`      | `CUDA_R_16BF`                         | |CUSPARSE_COMPUTE_32F|_       |
+--------------------+-------------------+-------------------+-------------------------------+
| `CUDA_R_8I`        | `CUDA_R_8I`                           | |CUSPARSE_COMPUTE_32I|_       |
+                    +-------------------+-------------------+                               +
|                    | `CUDA_R_32I`                          |                               |
+                    +-------------------+-------------------+                               +
|                    | `CUDA_R_16F`                          |                               |
+                    +-------------------+-------------------+                               +
|                    | `CUDA_R_16BF`                         |                               |
+--------------------+-------------------+-------------------+-------------------------------+
| `CUDA_R_8F_E4M3`   | `CUDA_R_16F`      | `CUDA_R_8F_E4M3`  | |CUSPARSE_COMPUTE_32F|_       |
+                    +-------------------+-------------------+                               +
|                    | `CUDA_R_16BF`     | `CUDA_R_8F_E4M3`  |                               |
+                    +-------------------+-------------------+                               +
|                    | `CUDA_R_16F`                          |                               |
+                    +-------------------+-------------------+                               +
|                    | `CUDA_R_16BF`                         |                               |
+                    +-------------------+-------------------+                               +
|                    | `CUDA_R_32F`                          |                               |
+--------------------+-------------------+-------------------+-------------------------------+
| `CUDA_R_8F_E5M2`   | `CUDA_R_16F`      | `CUDA_R_8F_E5M2`  | |CUSPARSE_COMPUTE_32F|_       |
+                    +-------------------+-------------------+                               +
|                    | `CUDA_R_16BF`     | `CUDA_R_8F_E5M2`  |                               |
+                    +-------------------+-------------------+                               +
|                    | `CUDA_R_16F`                          |                               |
+                    +-------------------+-------------------+                               +
|                    | `CUDA_R_16BF`                         |                               |
+                    +-------------------+-------------------+                               +
|                    | `CUDA_R_32F`                          |                               |
+--------------------+-------------------+-------------------+-------------------------------+
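
The table can also be encoded as data so that an unsupported configuration is rejected on the host before any descriptor is created. The sketch below is a hypothetical helper (``TypeCombo`` and ``is_supported_combo`` are not part of cuSPARSELt); it compares enumerator names as strings only so that it compiles without the CUDA headers, whereas real code would compare ``cudaDataType_t`` and ``cusparseComputeType`` values.

.. code-block:: cpp

   #include <cassert>
   #include <string>

   // Hypothetical encoding of the "Supported data types" table above.
   // Enumerator names are stored as strings only to keep the sketch free of
   // CUDA/cuSPARSELt headers.
   struct TypeCombo {
       const char* ab;       // Input A/B
       const char* c;        // Input C
       const char* d;        // Output D
       const char* compute;  // Compute type
   };

   static const TypeCombo kSupportedCombos[] = {
       {"CUDA_R_32F",     "CUDA_R_32F",  "CUDA_R_32F",     "CUSPARSE_COMPUTE_32F"},
       {"CUDA_R_16F",     "CUDA_R_16F",  "CUDA_R_16F",     "CUSPARSE_COMPUTE_32F"},
       {"CUDA_R_16F",     "CUDA_R_16F",  "CUDA_R_16F",     "CUSPARSE_COMPUTE_16F"},
       {"CUDA_R_16BF",    "CUDA_R_16BF", "CUDA_R_16BF",    "CUSPARSE_COMPUTE_32F"},
       {"CUDA_R_8I",      "CUDA_R_8I",   "CUDA_R_8I",      "CUSPARSE_COMPUTE_32I"},
       {"CUDA_R_8I",      "CUDA_R_32I",  "CUDA_R_32I",     "CUSPARSE_COMPUTE_32I"},
       {"CUDA_R_8I",      "CUDA_R_16F",  "CUDA_R_16F",     "CUSPARSE_COMPUTE_32I"},
       {"CUDA_R_8I",      "CUDA_R_16BF", "CUDA_R_16BF",    "CUSPARSE_COMPUTE_32I"},
       {"CUDA_R_8F_E4M3", "CUDA_R_16F",  "CUDA_R_8F_E4M3", "CUSPARSE_COMPUTE_32F"},
       {"CUDA_R_8F_E4M3", "CUDA_R_16BF", "CUDA_R_8F_E4M3", "CUSPARSE_COMPUTE_32F"},
       {"CUDA_R_8F_E4M3", "CUDA_R_16F",  "CUDA_R_16F",     "CUSPARSE_COMPUTE_32F"},
       {"CUDA_R_8F_E4M3", "CUDA_R_16BF", "CUDA_R_16BF",    "CUSPARSE_COMPUTE_32F"},
       {"CUDA_R_8F_E4M3", "CUDA_R_32F",  "CUDA_R_32F",     "CUSPARSE_COMPUTE_32F"},
       {"CUDA_R_8F_E5M2", "CUDA_R_16F",  "CUDA_R_8F_E5M2", "CUSPARSE_COMPUTE_32F"},
       {"CUDA_R_8F_E5M2", "CUDA_R_16BF", "CUDA_R_8F_E5M2", "CUSPARSE_COMPUTE_32F"},
       {"CUDA_R_8F_E5M2", "CUDA_R_16F",  "CUDA_R_16F",     "CUSPARSE_COMPUTE_32F"},
       {"CUDA_R_8F_E5M2", "CUDA_R_16BF", "CUDA_R_16BF",    "CUSPARSE_COMPUTE_32F"},
       {"CUDA_R_8F_E5M2", "CUDA_R_32F",  "CUDA_R_32F",     "CUSPARSE_COMPUTE_32F"},
   };

   // Returns true if (A/B, C, D, compute) matches one row of the table.
   bool is_supported_combo(const std::string& ab, const std::string& c,
                           const std::string& d, const std::string& compute) {
       for (const TypeCombo& t : kSupportedCombos)
           if (ab == t.ab && c == t.c && d == t.d && compute == t.compute)
               return true;
       return false;
   }

For example, ``is_supported_combo("CUDA_R_8I", "CUDA_R_32I", "CUDA_R_32I", "CUSPARSE_COMPUTE_32I")`` returns ``true``, while `CUDA_R_16BF` inputs combined with ``CUSPARSE_COMPUTE_16F`` are rejected.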


The *structured matrix* `A` or `B` (before compression) must satisfy the following constraints, which depend on the operation applied to it:

* For ``op = CUSPARSE_OPERATION_NON_TRANSPOSE``

    - `CUDA_R_16F`, `CUDA_R_16BF`, `CUDA_R_8I`, `CUDA_R_8F_E4M3`, `CUDA_R_8F_E5M2`: each row must have at most two non-zero values (that is, at least two zeros) in every group of four consecutive elements
    - `CUDA_R_32F`: each row must have at most one non-zero value (that is, at least one zero) in every group of two consecutive elements

* For ``op = CUSPARSE_OPERATION_TRANSPOSE``

    - `CUDA_R_16F`, `CUDA_R_16BF`, `CUDA_R_8I`, `CUDA_R_8F_E4M3`, `CUDA_R_8F_E5M2`: each column must have at most two non-zero values (that is, at least two zeros) in every group of four consecutive elements
    - `CUDA_R_32F`: each column must have at most one non-zero value (that is, at least one zero) in every group of two consecutive elements
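
The rules above can be checked on the host with a few loops, which is convenient when debugging a custom pruning step. The sketch below is illustrative only: ``satisfies_structured_sparsity`` is a hypothetical name, and it assumes a row-major matrix whose scanned dimension is a multiple of the group size, with groups starting at multiples of that size. On the device, :ref:`cusparseLtSpMMAPruneCheck() <cusparseLtSpMMAPruneCheck-label>` performs the library's own check.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <vector>

   // Hypothetical host-side check (not a cuSPARSELt API) of the constraint
   // above on a row-major rows x cols matrix M:
   //   group = 4, minZeros = 2 -> 16-bit, 8-bit and FP8 types
   //   group = 2, minZeros = 1 -> CUDA_R_32F
   // alongRows = true scans each row (CUSPARSE_OPERATION_NON_TRANSPOSE);
   // alongRows = false scans each column (CUSPARSE_OPERATION_TRANSPOSE).
   // Groups are assumed to start at multiples of `group`.
   bool satisfies_structured_sparsity(const std::vector<float>& M, int rows,
                                      int cols, bool alongRows, int group,
                                      int minZeros) {
       assert(M.size() == static_cast<std::size_t>(rows) * cols);
       const int lines = alongRows ? rows : cols;   // rows or columns to scan
       const int length = alongRows ? cols : rows;  // elements in each line
       assert(group > 0 && length % group == 0);
       for (int line = 0; line < lines; ++line) {
           for (int start = 0; start < length; start += group) {
               int zeros = 0;
               for (int e = start; e < start + group; ++e) {
                   const int r = alongRows ? line : e;
                   const int c = alongRows ? e : line;
                   if (M[static_cast<std::size_t>(r) * cols + c] == 0.0f)
                       ++zeros;
               }
               if (zeros < minZeros)
                   return false;
           }
       }
       return true;
   }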

For maximum performance, `int8`, `e4m3`, and `e5m2` kernels should run at high SM clocks.

The correctness of the pruning result (matrix `A`/`B`) can be checked with the function :ref:`cusparseLtSpMMAPruneCheck() <cusparseLtSpMMAPruneCheck-label>`.

**Constraints:**

    * All pointers must be aligned to 16 bytes
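
A simple way to catch misaligned pointers before the call is to test the address on the host. The helper below is a hypothetical sketch, not a cuSPARSELt function; it only inspects the numeric value of the address, so it works for both host and device pointers.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   // Hypothetical helper: returns true if `p` satisfies the 16-byte alignment
   // required for the pointers passed to cusparseLtMatmul().
   inline bool is_aligned_16(const void* p) {
       return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
   }

Allocations returned by ``cudaMalloc()`` are aligned to at least 256 bytes, so misalignment typically comes from offsets into a larger buffer (for example, a sub-matrix view). Asserting ``is_aligned_16()`` on `d_A`, `d_B`, `d_C`, `d_D`, and `workspace` before the call catches such cases early.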

**Properties**

    * The routine requires no extra storage
    * The routine supports asynchronous execution with respect to `streams[0]`
    * Provides deterministic (bit-wise) results for each run

`cusparseLtMatmul` supports the following `optimizations <https://docs.nvidia.com/cuda/cusparse/index.html#optimization-notes>`_:

    * CUDA graph capture
    * Hardware Memory Compression

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.

----

.. _cusparseLtMatmulSearch-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulSearch`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtMatmulSearch(const cusparseLtHandle_t* handle,
                          cusparseLtMatmulPlan_t*   plan,
                          const void*               alpha,
                          const void*               d_A,
                          const void*               d_B,
                          const void*               beta,
                          const void*               d_C,
                          void*                     d_D,
                          void*                     workspace,
                          cudaStream_t*             streams,
                          int32_t                   numStreams)

| The function evaluates all available algorithms for the matrix multiplication and automatically updates the `plan` by selecting the fastest one. It is intended for auto-tuning when the same operation is repeated many times with different inputs.
| Apart from the differences listed below, the function behaves like :ref:`cusparseLtMatmul() <cusparseLtMatmul-label>`.

 * The function is *NOT* asynchronous with respect to `streams[0]` (*blocking call*).
 * The number of iterations for the evaluation can be set with :ref:`cusparseLtMatmulAlgSetAttribute() <cusparseLtMatmulAlgSetAttribute-label>` and |CUSPARSELT_MATMUL_SEARCH_ITERATIONS|_.
 * The ID of the selected algorithm can be retrieved with :ref:`cusparseLtMatmulAlgGetAttribute() <cusparseLtMatmulAlgGetAttribute-label>` and |CUSPARSELT_MATMUL_ALG_CONFIG_ID|_.
 * The function also searches for the optimal combination of :ref:`Split-K parameters <cusparseLtMatmulAlgAttribute_t-label>`. The selected values can be retrieved with :ref:`cusparseLtMatmulAlgGetAttribute() <cusparseLtMatmulAlgGetAttribute-label>`.
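
Conceptually, the search is an exhaustive auto-tuning loop: each candidate configuration is timed over a number of iterations and the fastest one is kept. The sketch below shows that strategy in isolation. It is not the library's implementation, and ``select_fastest_config`` and the ``timeOneRun`` callback (standing in for launching and timing one configuration, for example with CUDA events) are hypothetical.

.. code-block:: cpp

   #include <cassert>
   #include <functional>
   #include <limits>

   // Hypothetical sketch of exhaustive auto-tuning: time every candidate
   // configuration over `iterations` runs and return the ID with the lowest
   // average time. `timeOneRun(config)` is an assumed callback, not part of
   // the cuSPARSELt API.
   int select_fastest_config(int numConfigs, int iterations,
                             const std::function<double(int)>& timeOneRun) {
       assert(numConfigs > 0 && iterations > 0);
       int best = 0;
       double bestAvg = std::numeric_limits<double>::infinity();
       for (int config = 0; config < numConfigs; ++config) {
           double total = 0.0;
           for (int it = 0; it < iterations; ++it)
               total += timeOneRun(config);
           const double avg = total / iterations;
           if (avg < bestAvg) {  // strict '<' keeps the lowest ID on ties
               bestAvg = avg;
               best = config;
           }
       }
       return best;
   }

With cuSPARSELt this loop is not written by hand: calling :ref:`cusparseLtMatmulSearch() <cusparseLtMatmulSearch-label>` once during setup and then reusing the updated `plan` for every subsequent :ref:`cusparseLtMatmul() <cusparseLtMatmul-label>` call amortizes the cost of the search.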

----

================================================================================
Helper Functions
================================================================================

.. _cusparseLtSpMMAPrune-label:

--------------------------------------------------------------------------------
:code:`cusparseLtSpMMAPrune`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtSpMMAPrune(const cusparseLtHandle_t*           handle,
                        const cusparseLtMatmulDescriptor_t* matmulDescr,
                        const void*                         d_in,
                        void*                               d_out,
                        cusparseLtPruneAlg_t                pruneAlg,
                        cudaStream_t                        stream)

The function prunes a dense matrix `d_in` according to the specified algorithm `pruneAlg`.

.. table::
    :widths: 10 8 8 28 26

    +-----------------+--------+--------+-----------------------------------+---------------------------------------------------------------+
    | Parameter       | Memory | In/Out | Description                       | Possible Values                                               |
    +=================+========+========+===================================+===============================================================+
    | |handle|_       | Host   | IN     | cuSPARSELt library handle         |                                                               |
    +-----------------+--------+--------+-----------------------------------+---------------------------------------------------------------+
    | |matmulDescr|_  | Host   | IN     | Matrix multiplication descriptor  |                                                               |
    +-----------------+--------+--------+-----------------------------------+---------------------------------------------------------------+
    | `d_in`          | Device | IN     | Pointer to the dense matrix       |                                                               |
    +-----------------+--------+--------+-----------------------------------+---------------------------------------------------------------+
    | `d_out`         | Device | OUT    | Pointer to the pruned matrix      |                                                               |
    +-----------------+--------+--------+-----------------------------------+---------------------------------------------------------------+
    | |pruneAlg|_     | Host   | IN     | Pruning algorithm                 | `CUSPARSELT_PRUNE_SPMMA_TILE`, `CUSPARSELT_PRUNE_SPMMA_STRIP` |
    +-----------------+--------+--------+-----------------------------------+---------------------------------------------------------------+
    | `stream`        | Host   | IN     | CUDA stream for the computation   |                                                               |
    +-----------------+--------+--------+-----------------------------------+---------------------------------------------------------------+
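
To illustrate what strip pruning does, the host-side sketch below handles the 2:4 case on a row-major matrix: in every strip of four consecutive elements along a row, it keeps the two values with the largest magnitude (which maximizes the L1 norm of the strip) and zeroes the other two. This is an assumption-based illustration, not the ``CUSPARSELT_PRUNE_SPMMA_STRIP`` implementation; the strip orientation, the tie-breaking rule, and the name ``prune_strip_2of4`` are choices made for the example.

.. code-block:: cpp

   #include <cassert>
   #include <cmath>
   #include <cstddef>
   #include <vector>

   // Hypothetical host-side illustration of 2:4 strip pruning (not the
   // cuSPARSELt implementation). For each 1x4 strip along a row of a
   // row-major rows x cols matrix, keep the two largest-magnitude values and
   // zero the other two. Ties favour the lower column index.
   std::vector<float> prune_strip_2of4(const std::vector<float>& in, int rows,
                                       int cols) {
       assert(cols % 4 == 0);
       assert(in.size() == static_cast<std::size_t>(rows) * cols);
       std::vector<float> out(in.size(), 0.0f);
       for (int r = 0; r < rows; ++r) {
           for (int s = 0; s < cols; s += 4) {
               const float* strip = &in[static_cast<std::size_t>(r) * cols + s];
               // Index of the largest |value| in the strip.
               int first = 0;
               for (int i = 1; i < 4; ++i)
                   if (std::fabs(strip[i]) > std::fabs(strip[first])) first = i;
               // Index of the largest |value| among the remaining three.
               int second = (first == 0) ? 1 : 0;
               for (int i = 0; i < 4; ++i)
                   if (i != first && std::fabs(strip[i]) > std::fabs(strip[second]))
                       second = i;
               float* dst = &out[static_cast<std::size_t>(r) * cols + s];
               dst[first] = strip[first];
               dst[second] = strip[second];
           }
       }
       return out;
   }

For `CUDA_R_32F`, the analogous rule keeps one value in every pair of consecutive elements, matching the 1:2 constraint described for :ref:`cusparseLtMatmul() <cusparseLtMatmul-label>`.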

**Properties**

    * The routine requires no extra storage
    * The routine supports asynchronous execution with respect to `stream`
    * Provides deterministic (bit-wise) results for each run

`cusparseLtSpMMAPrune()` supports the following `optimizations <https://docs.nvidia.com/cuda/cusparse/index.html#optimization-notes>`_:

    * CUDA graph capture
    * Hardware Memory Compression

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.

----

.. _cusparseLtSpMMAPrune2-label:

--------------------------------------------------------------------------------
:code:`cusparseLtSpMMAPrune2 [DEPRECATED]`
--------------------------------------------------------------------------------

.. code-block:: cpp

    cusparseStatus_t
    cusparseLtSpMMAPrune2(const cusparseLtHandle_t*        handle,
                          const cusparseLtMatDescriptor_t* sparseMatDescr,
                          int                              isSparseA,
                          cusparseOperation_t              op,
                          const void*                      d_in,
                          void*                            d_out,
                          cusparseLtPruneAlg_t             pruneAlg,
                          cudaStream_t                     stream);

The function prunes a dense matrix `d_in` according to the specified algorithm `pruneAlg`.

.. table::
    :widths: 10 8 8 34 20

    +-------------------+--------+--------+----------------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | Parameter         | Memory | In/Out | Description                                                                                  | Possible Values                                                    |
    +===================+========+========+==============================================================================================+====================================================================+
    | |handle|_         | Host   | IN     | cuSPARSELt library handle                                                                    |                                                                    |
    +-------------------+--------+--------+----------------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | |sparseMatDescr|_ | Host   | IN     | Structured (sparse) matrix descriptor                                                        |                                                                    |
    +-------------------+--------+--------+----------------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | `isSparseA`       | Host   | IN     | **true** if the structured matrix is `matA` (first operand), **false** if it is `matB`       | `0` **false**, **true** otherwise                                  |
    +-------------------+--------+--------+----------------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | `op`              | Host   | IN     | Operation that will be applied to the structured (sparse) matrix in the multiplication       | `CUSPARSE_OPERATION_NON_TRANSPOSE`, `CUSPARSE_OPERATION_TRANSPOSE` |
    +-------------------+--------+--------+----------------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | `d_in`            | Device | IN     | Pointer to the dense matrix                                                                  |                                                                    |
    +-------------------+--------+--------+----------------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | `d_out`           | Device | OUT    | Pointer to the pruned matrix                                                                 |                                                                    |
    +-------------------+--------+--------+----------------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | |pruneAlg|_       | Host   | IN     | Pruning algorithm                                                                            | `CUSPARSELT_PRUNE_SPMMA_TILE`, `CUSPARSELT_PRUNE_SPMMA_STRIP`      |
    +-------------------+--------+--------+----------------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | `stream`          | Host   | IN     | CUDA stream for the computation                                                              |                                                                    |
    +-------------------+--------+--------+----------------------------------------------------------------------------------------------+--------------------------------------------------------------------+

If `CUSPARSELT_PRUNE_SPMMA_TILE` is used, `isSparseA` and `op` are not relevant.
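
A plausible reason is the shape of the tile result: it keeps exactly two values in every row *and* every column of each 4x4 tile, so the pruned matrix satisfies the 2:4 constraint along both dimensions, whichever operand it becomes and whether or not it is transposed. The host-side sketch below illustrates this for a single tile by brute force over the 90 masks with two set entries per row and per column, keeping the mask that preserves the largest sum of magnitudes (assumed here to be the selection criterion). It is an illustration under those assumptions, not the library's implementation, and ``prune_tile_4x4`` is a hypothetical name.

.. code-block:: cpp

   #include <array>
   #include <cassert>
   #include <cmath>

   using Tile = std::array<float, 16>;  // row-major 4x4 tile

   // Hypothetical illustration (not the cuSPARSELt implementation): among all
   // 16-bit masks with exactly two set bits per row and per column (90 masks),
   // keep the one that preserves the largest sum of |values|.
   Tile prune_tile_4x4(const Tile& in) {
       int bestMask = -1;
       float bestNorm = -1.0f;
       for (int mask = 0; mask < (1 << 16); ++mask) {
           bool valid = true;
           for (int i = 0; i < 4 && valid; ++i) {
               int rowCount = 0, colCount = 0;
               for (int j = 0; j < 4; ++j) {
                   rowCount += (mask >> (i * 4 + j)) & 1;  // row i
                   colCount += (mask >> (j * 4 + i)) & 1;  // column i
               }
               valid = (rowCount == 2 && colCount == 2);
           }
           if (!valid) continue;
           float norm = 0.0f;  // L1 norm of the kept entries
           for (int e = 0; e < 16; ++e)
               if ((mask >> e) & 1) norm += std::fabs(in[e]);
           if (norm > bestNorm) {  // ties keep the smaller mask value
               bestNorm = norm;
               bestMask = mask;
           }
       }
       Tile out{};  // zero-initialized
       for (int e = 0; e < 16; ++e)
           if ((bestMask >> e) & 1) out[e] = in[e];
       return out;
   }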

The function has the same properties as :ref:`cusparseLtSpMMAPrune() <cusparseLtSpMMAPrune-label>`.

----

.. _cusparseLtSpMMAPruneCheck-label:

--------------------------------------------------------------------------------
:code:`cusparseLtSpMMAPruneCheck`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtSpMMAPruneCheck(const cusparseLtHandle_t*           handle,
                             const cusparseLtMatmulDescriptor_t* matmulDescr,
                             const void*                         d_in,
                             int*                                d_valid,
                             cudaStream_t                        stream)

The function checks the correctness of the pruning structure for a given matrix.

+-----------------+--------+--------+----------------------------------------------+
| Parameter       | Memory | In/Out | Description                                  |
+=================+========+========+==============================================+
| |handle|_       | Host   | IN     | cuSPARSELt library handle                    |
+-----------------+--------+--------+----------------------------------------------+
| |matmulDescr|_  | Host   | IN     | Matrix multiplication descriptor             |
+-----------------+--------+--------+----------------------------------------------+
| `d_in`          | Device | IN     | Pointer to the matrix to check               |
+-----------------+--------+--------+----------------------------------------------+
| `d_valid`       | Device | OUT    | Validation results (`0` correct, `1` wrong)  |
+-----------------+--------+--------+----------------------------------------------+
| `stream`        | Host   | IN     | CUDA stream for the computation              |
+-----------------+--------+--------+----------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.

----


.. _cusparseLtSpMMAPruneCheck2-label:

--------------------------------------------------------------------------------
:code:`cusparseLtSpMMAPruneCheck2 [DEPRECATED]`
--------------------------------------------------------------------------------

.. code-block:: cpp

    cusparseStatus_t
    cusparseLtSpMMAPruneCheck2(const cusparseLtHandle_t*        handle,
                               const cusparseLtMatDescriptor_t* sparseMatDescr,
                               int                              isSparseA,
                               cusparseOperation_t              op,
                               const void*                      d_in,
                               int*                             d_valid,
                               cudaStream_t                     stream)

The function checks the correctness of the pruning structure for a given matrix.

+---------------------+--------+--------+------------------------------------------------------------------------------------------+----------------------------------------------------------------------+
| Parameter           | Memory | In/Out | Description                                                                              | Possible Values                                                      |
+=====================+========+========+==========================================================================================+======================================================================+
| |handle|_           | Host   | IN     | cuSPARSELt library handle                                                                |                                                                      |
+---------------------+--------+--------+------------------------------------------------------------------------------------------+----------------------------------------------------------------------+
| |sparseMatDescr|_   | Host   | IN     | Structured (sparse) matrix descriptor                                                    |                                                                      |
+---------------------+--------+--------+------------------------------------------------------------------------------------------+----------------------------------------------------------------------+
| `isSparseA`         | Host   | IN     | **true** if the structured matrix is `matA` (first operand), **false** if it is `matB`   |  `0`: **false**, `!= 0`: **true**                                    |
+---------------------+--------+--------+------------------------------------------------------------------------------------------+----------------------------------------------------------------------+
| `op`                | Host   | IN     | Operation that will be applied to the structured (sparse) matrix in the multiplication   |  `CUSPARSE_OPERATION_NON_TRANSPOSE`, `CUSPARSE_OPERATION_TRANSPOSE`  |
+---------------------+--------+--------+------------------------------------------------------------------------------------------+----------------------------------------------------------------------+
| `d_in`              | Device | IN     | Pointer to the matrix to check                                                           |                                                                      |
+---------------------+--------+--------+------------------------------------------------------------------------------------------+----------------------------------------------------------------------+
| `d_valid`           | Device | OUT    | Validation results (`0` correct, `1` wrong)                                              |                                                                      |
+---------------------+--------+--------+------------------------------------------------------------------------------------------+----------------------------------------------------------------------+
| `stream`            | Host   | IN     | CUDA stream for the computation                                                          |                                                                      |
+---------------------+--------+--------+------------------------------------------------------------------------------------------+----------------------------------------------------------------------+

The function has the same properties as :ref:`cusparseLtSpMMAPruneCheck() <cusparseLtSpMMAPruneCheck-label>`.

----

.. _cusparseLtSpMMACompressedSize-label:

--------------------------------------------------------------------------------
:code:`cusparseLtSpMMACompressedSize`
--------------------------------------------------------------------------------

.. code-block:: cpp

  cusparseStatus_t
  cusparseLtSpMMACompressedSize(const cusparseLtHandle_t*     handle,
                                const cusparseLtMatmulPlan_t* plan,
                                size_t*                       compressedSize,
                                size_t*                       compressBufferSize)

The function returns the sizes, in bytes, of the *compressed* matrix and of the buffer needed for the compression; both must be allocated before calling :ref:`cusparseLtSpMMACompress() <cusparseLtSpMMACompress-label>`.

+-----------------------+--------+--------+----------------------------------------------------------------+
| Parameter             | Memory | In/Out | Description                                                    |
+=======================+========+========+================================================================+
| |handle|_             | Host   | IN     | cuSPARSELt library handle                                      |
+-----------------------+--------+--------+----------------------------------------------------------------+
| |plan|_               | Host   | IN     | Matrix plan descriptor                                         |
+-----------------------+--------+--------+----------------------------------------------------------------+
| `compressedSize`      | Host   | OUT    | Size in bytes for the compressed matrix                        |
+-----------------------+--------+--------+----------------------------------------------------------------+
| `compressBufferSize`  | Host   | OUT    | Size in bytes for the buffer needed for the matrix compression |
+-----------------------+--------+--------+----------------------------------------------------------------+

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.

----


.. _cusparseLtSpMMACompressedSize2-label:

--------------------------------------------------------------------------------
:code:`cusparseLtSpMMACompressedSize2 [DEPRECATED]`
--------------------------------------------------------------------------------

.. code-block:: cpp

  cusparseStatus_t
  cusparseLtSpMMACompressedSize2(const cusparseLtHandle_t*        handle,
                                 const cusparseLtMatDescriptor_t* sparseMatDescr,
                                 size_t*                          compressedSize,
                                 size_t*                          compressBufferSize)

The function provides the sizes, in bytes, of the *compressed* matrix and of the temporary compression buffer. Both must be allocated before calling :ref:`cusparseLtSpMMACompress2() <cusparseLtSpMMACompress2-label>`. The function must be called after :ref:`cusparseLtMatmulPlanInit() <cusparseLtMatmulPlanInit-label>`.

+-----------------------+--------+--------+----------------------------------------------------------------+
| Parameter             | Memory | In/Out | Description                                                    |
+=======================+========+========+================================================================+
| |handle|_             | Host   | IN     | cuSPARSELt library handle                                      |
+-----------------------+--------+--------+----------------------------------------------------------------+
| |sparseMatDescr|_     | Host   | IN     | Structured (sparse) matrix descriptor                          |
+-----------------------+--------+--------+----------------------------------------------------------------+
| `compressedSize`      | Host   | OUT    | Size in bytes of the compressed matrix                         |
+-----------------------+--------+--------+----------------------------------------------------------------+
| `compressBufferSize`  | Host   | OUT    | Size in bytes for the buffer needed for the matrix compression |
+-----------------------+--------+--------+----------------------------------------------------------------+

The function has the same properties as :ref:`cusparseLtSpMMACompressedSize() <cusparseLtSpMMACompressedSize-label>`.

----

.. _cusparseLtSpMMACompress-label:

--------------------------------------------------------------------------------
:code:`cusparseLtSpMMACompress`
--------------------------------------------------------------------------------

.. code-block:: cpp

   cusparseStatus_t
   cusparseLtSpMMACompress(const cusparseLtHandle_t*     handle,
                           const cusparseLtMatmulPlan_t* plan,
                           const void*                   d_dense,
                           void*                         d_compressed,
                           void*                         d_compressed_buffer,
                           cudaStream_t                  stream)

The function compresses a dense matrix `d_dense`. The *compressed* matrix is intended to be used as either the first operand `A` or the second operand `B` of the :ref:`cusparseLtMatmul() <cusparseLtMatmul-label>` function.

+-----------------------+--------+--------+---------------------------------------------------+
| Parameter             | Memory | In/Out | Description                                       |
+=======================+========+========+===================================================+
| |handle|_             | Host   | IN     | cuSPARSELt library handle                         |
+-----------------------+--------+--------+---------------------------------------------------+
| |plan|_               | Host   | IN     | Matrix multiplication plan                        |
+-----------------------+--------+--------+---------------------------------------------------+
| `d_dense`             | Device | IN     | Pointer to the dense matrix                       |
+-----------------------+--------+--------+---------------------------------------------------+
| `d_compressed`        | Device | OUT    | Pointer to the *compressed* matrix                |
+-----------------------+--------+--------+---------------------------------------------------+
| `d_compressed_buffer` | Device | OUT    | Pointer to temporary buffer for the compression   |
+-----------------------+--------+--------+---------------------------------------------------+
| `stream`              | Host   | IN     | CUDA stream for the computation                   |
+-----------------------+--------+--------+---------------------------------------------------+

**Properties**

    * The routine supports asynchronous execution with respect to `stream`
    * Provides deterministic (bit-wise) results for each run
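Because the routine is asynchronous with respect to `stream`, `d_compressed` is only safe to read, and `d_compressed_buffer` only safe to release, after the work queued on `stream` has completed. The sketch below assumes the buffers were sized with :ref:`cusparseLtSpMMACompressedSize() <cusparseLtSpMMACompressedSize-label>` and, following the *temporary buffer* description in the parameter table, that `d_compressed_buffer` is no longer needed once compression finishes. `compressAndFreeWorkspace` is a hypothetical helper name, and reporting a failed `cudaStreamSynchronize()` as `CUSPARSE_STATUS_EXECUTION_FAILED` is a choice of this sketch.

.. code-block:: cpp

    #include <cuda_runtime.h>
    #include <cusparseLt.h>

    // Hypothetical helper: compress, wait for completion, then free the workspace.
    // On failure the workspace is left for the caller to free.
    cusparseStatus_t compressAndFreeWorkspace(const cusparseLtHandle_t*     handle,
                                              const cusparseLtMatmulPlan_t* plan,
                                              const void*                   d_dense,
                                              void*                         d_compressed,
                                              void*                         d_compressed_buffer,
                                              cudaStream_t                  stream) {
        // Only enqueues the compression; it may still be running when this returns.
        cusparseStatus_t status = cusparseLtSpMMACompress(handle, plan, d_dense, d_compressed,
                                                          d_compressed_buffer, stream);
        if (status != CUSPARSE_STATUS_SUCCESS)
            return status;
        // Block the host until every operation queued on `stream` has finished.
        if (cudaStreamSynchronize(stream) != cudaSuccess)
            return CUSPARSE_STATUS_EXECUTION_FAILED;
        // No pending work references the workspace any more, so it can be freed.
        cudaFree(d_compressed_buffer);
        return CUSPARSE_STATUS_SUCCESS;
    }

Synchronizing the whole stream is the simplest correct choice; recording a `cudaEvent_t` right after the call and waiting on it with `cudaEventSynchronize()` would let the host wait for the compression alone.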

`cusparseLtSpMMACompress()` supports the following `optimizations <https://docs.nvidia.com/cuda/cusparse/index.html#optimization-notes>`_:

    * CUDA graph capture
    * Hardware Memory Compression

See `cusparseStatus_t <https://docs.nvidia.com/cuda/cusparse/index.html#cusparseStatus_t>`_ for the description of the return status.

----

.. _cusparseLtSpMMACompress2-label:

--------------------------------------------------------------------------------
:code:`cusparseLtSpMMACompress2 [DEPRECATED]`
--------------------------------------------------------------------------------

.. code-block:: cpp

    cusparseStatus_t
    cusparseLtSpMMACompress2(const cusparseLtHandle_t*        handle,
                             const cusparseLtMatDescriptor_t* sparseMatDescr,
                             int                              isSparseA,
                             cusparseOperation_t              op,
                             const void*                      d_dense,
                             void*                            d_compressed,
                             void*                            d_compressed_buffer,
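                             cudaStream_t                     stream)

The function compresses a dense matrix `d_dense` described by `sparseMatDescr`. The *compressed* matrix is intended to be used as the first operand `A` (`isSparseA` true) or the second operand `B` (`isSparseA` false) of the :ref:`cusparseLtMatmul() <cusparseLtMatmul-label>` function.

.. table::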
    :widths: 10 8 8 34 20

    +-----------------------+--------+--------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | Parameter             | Memory | In/Out | Description                                                                             | Possible Values                                                    |
    +=======================+========+========+=========================================================================================+====================================================================+
    | |handle|_             | Host   | IN     | cuSPARSELt library handle                                                               |                                                                    |
    +-----------------------+--------+--------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | |sparseMatDescr|_     | Host   | IN     | Structured (sparse) matrix descriptor                                                   |                                                                    |
    +-----------------------+--------+--------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | `isSparseA`           | Host   | IN     | Specifies whether the structured (sparse) matrix is the first operand (`matA`)          | `0` **false** (sparse matrix is `matB`), **true** otherwise        |
    +-----------------------+--------+--------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | `op`                  | Host   | IN     | Operation that will be applied to the structured (sparse) matrix in the multiplication  | `CUSPARSE_OPERATION_NON_TRANSPOSE`, `CUSPARSE_OPERATION_TRANSPOSE` |
    +-----------------------+--------+--------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | `d_dense`             | Device | IN     | Pointer to the dense matrix                                                             |                                                                    |
    +-----------------------+--------+--------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | `d_compressed`        | Device | OUT    | Pointer to the *compressed* matrix                                                      |                                                                    |
    +-----------------------+--------+--------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | `d_compressed_buffer` | Device | OUT    | Pointer to temporary buffer for the compression                                         |                                                                    |
    +-----------------------+--------+--------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------+
    | `stream`              | Host   | IN     | CUDA stream for the computation                                                         |                                                                    |
    +-----------------------+--------+--------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------+

The function has the same properties as :ref:`cusparseLtSpMMACompress() <cusparseLtSpMMACompress-label>`.
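For code that still relies on the deprecated interface, the sketch below chains :ref:`cusparseLtSpMMACompressedSize2() <cusparseLtSpMMACompressedSize2-label>` and `cusparseLtSpMMACompress2()`: the sizes come from the structured matrix descriptor, and the operand position and operation are passed explicitly. It assumes that `sparseMatDescr` describes a non-transposed structured matrix `A` (hence `isSparseA = 1` and `CUSPARSE_OPERATION_NON_TRANSPOSE`, both illustrative choices) and that :ref:`cusparseLtMatmulPlanInit() <cusparseLtMatmulPlanInit-label>` has already been called. `compressSparseALegacy` is a hypothetical helper name; depending on the cuSPARSELt version, the deprecated calls may emit compiler warnings.

.. code-block:: cpp

    #include <cstddef>
    #include <cuda_runtime.h>
    #include <cusparseLt.h>

    // Hypothetical helper built on the deprecated *2 entry points. On success,
    // *d_compressed holds the compressed A and must later be released with cudaFree().
    cusparseStatus_t compressSparseALegacy(const cusparseLtHandle_t*        handle,
                                           const cusparseLtMatDescriptor_t* sparseMatDescr,
                                           const void*                      d_dense,
                                           void**                           d_compressed,
                                           cudaStream_t                     stream) {
        size_t compressedSize = 0, compressBufferSize = 0;
        cusparseStatus_t status = cusparseLtSpMMACompressedSize2(
            handle, sparseMatDescr, &compressedSize, &compressBufferSize);
        if (status != CUSPARSE_STATUS_SUCCESS)
            return status;

        void* d_buffer = nullptr;  // temporary compression workspace
        if (cudaMalloc(d_compressed, compressedSize) != cudaSuccess) {
            *d_compressed = nullptr;
            return CUSPARSE_STATUS_ALLOC_FAILED;
        }
        if (cudaMalloc(&d_buffer, compressBufferSize) != cudaSuccess) {
            cudaFree(*d_compressed);
            *d_compressed = nullptr;
            return CUSPARSE_STATUS_ALLOC_FAILED;
        }

        status = cusparseLtSpMMACompress2(handle, sparseMatDescr,
                                          /*isSparseA=*/1,  // the structured matrix is matA
                                          CUSPARSE_OPERATION_NON_TRANSPOSE,
                                          d_dense, *d_compressed, d_buffer, stream);
        // Wait for the asynchronous compression before releasing its workspace.
        cudaError_t syncErr = cudaStreamSynchronize(stream);
        cudaFree(d_buffer);
        if (status == CUSPARSE_STATUS_SUCCESS && syncErr != cudaSuccess)
            status = CUSPARSE_STATUS_EXECUTION_FAILED;
        if (status != CUSPARSE_STATUS_SUCCESS) {
            cudaFree(*d_compressed);
            *d_compressed = nullptr;
        }
        return status;
    }

Migrating to the non-deprecated functions replaces `sparseMatDescr`, `isSparseA`, and `op` with a single `plan` argument, as in :ref:`cusparseLtSpMMACompress() <cusparseLtSpMMACompress-label>`.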