Generic API Functions

cusparseAxpby()

cusparseStatus_t
cusparseAxpby(cusparseHandle_t          handle,
              const void*               alpha,
              cusparseConstSpVecDescr_t vecX,  // non-const descriptor supported
              const void*               beta,
              cusparseDnVecDescr_t      vecY)

The function computes the scaled sum of a sparse vector vecX and a dense vector vecY.

\(\mathbf{Y} = \alpha\mathbf{X} + \beta\mathbf{Y}\)

In other words,

for i=0 to n-1
    Y[i] = beta * Y[i]
for i=0 to nnz-1
    Y[X_indices[i]] += alpha * X_values[i]

| Param. | Memory | In/out | Meaning |
|--------|--------|--------|---------|
| handle | HOST | IN | Handle to the cuSPARSE library context |
| alpha | HOST or DEVICE | IN | \(\alpha\) scalar used for multiplication, of compute type |
| vecX | HOST | IN | Sparse vector X |
| beta | HOST or DEVICE | IN | \(\beta\) scalar used for multiplication, of compute type |
| vecY | HOST | IN/OUT | Dense vector Y |

cusparseAxpby supports the following index type for representing the sparse vector vecX:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

cusparseAxpby supports the following data types:

Uniform-precision computation:

X/Y/compute: CUDA_R_32F, CUDA_R_64F, CUDA_C_32F, CUDA_C_64F

Mixed-precision computation:

| X/Y | compute |
|-----|---------|
| CUDA_R_16F | CUDA_R_32F |
| CUDA_R_16BF | CUDA_R_32F |
| CUDA_C_16F [DEPRECATED] | CUDA_C_32F |
| CUDA_C_16BF [DEPRECATED] | CUDA_C_32F |

cusparseAxpby() has the following constraints:

  • The arrays representing the sparse vector vecX must be aligned to 16 bytes

cusparseAxpby() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • Provides deterministic (bit-wise) results for each run if the sparse vector vecX indices are distinct

  • The routine allows indices of vecX to be unsorted

cusparseAxpby() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseAxpby for a code example.
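A minimal sketch of the call sequence (hypothetical sizes; it assumes dX_indices, dX_values, and dY are already-populated device arrays and omits error checking):

#include <cusparse.h>            // cuSPARSE generic API
#include <cuda_runtime.h>

int64_t size = 1024, nnz = 128;  // placeholder dimensions
float   alpha = 3.0f, beta = 2.0f;

cusparseHandle_t     handle;
cusparseSpVecDescr_t vecX;
cusparseDnVecDescr_t vecY;
cusparseCreate(&handle);
// sparse vector X: nnz values with 32-bit indices; dense vector Y: size values
cusparseCreateSpVec(&vecX, size, nnz, dX_indices, dX_values,
                    CUSPARSE_INDEX_32I, CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
cusparseCreateDnVec(&vecY, size, dY, CUDA_R_32F);

cusparseAxpby(handle, &alpha, vecX, &beta, vecY);  // Y = alpha*X + beta*Y, no workspace

cusparseDestroySpVec(vecX);
cusparseDestroyDnVec(vecY);
cusparseDestroy(handle);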


cusparseGather()

cusparseStatus_t
cusparseGather(cusparseHandle_t          handle,
               cusparseConstDnVecDescr_t vecY,  // non-const descriptor supported
               cusparseSpVecDescr_t      vecX)

The function gathers the elements of the dense vector vecY into the sparse vector vecX

In other words,

for i=0 to nnz-1
    X_values[i] = Y[X_indices[i]]

| Param. | Memory | In/out | Meaning |
|--------|--------|--------|---------|
| handle | HOST | IN | Handle to the cuSPARSE library context |
| vecX | HOST | OUT | Sparse vector X |
| vecY | HOST | IN | Dense vector Y |

cusparseGather supports the following index type for representing the sparse vector vecX:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

cusparseGather supports the following data types:

X/Y: CUDA_R_16F, CUDA_R_16BF, CUDA_R_32F, CUDA_R_64F, CUDA_C_16F [DEPRECATED], CUDA_C_16BF [DEPRECATED], CUDA_C_32F, CUDA_C_64F

cusparseGather() has the following constraints:

  • The arrays representing the sparse vector vecX must be aligned to 16 bytes

cusparseGather() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • Provides deterministic (bit-wise) results for each run if the sparse vector vecX indices are distinct

  • The routine allows indices of vecX to be unsorted

cusparseGather() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseGather for a code example.
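A minimal sketch, reusing the descriptor setup from the cusparseAxpby example above (handle, vecX, vecY assumed created):

// X_values[i] = Y[X_indices[i]] for i = 0..nnz-1; no extra workspace needed
cusparseGather(handle, vecY, vecX);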


cusparseScatter()

cusparseStatus_t
cusparseScatter(cusparseHandle_t          handle,
                cusparseConstSpVecDescr_t vecX,  // non-const descriptor supported
                cusparseDnVecDescr_t      vecY)

The function scatters the elements of the sparse vector vecX into the dense vector vecY

In other words,

for i=0 to nnz-1
    Y[X_indices[i]] = X_values[i]

| Param. | Memory | In/out | Meaning |
|--------|--------|--------|---------|
| handle | HOST | IN | Handle to the cuSPARSE library context |
| vecX | HOST | IN | Sparse vector X |
| vecY | HOST | OUT | Dense vector Y |

cusparseScatter supports the following index type for representing the sparse vector vecX:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

cusparseScatter supports the following data types:

X/Y: CUDA_R_8I, CUDA_R_16F, CUDA_R_16BF, CUDA_R_32F, CUDA_R_64F, CUDA_C_16F [DEPRECATED], CUDA_C_16BF [DEPRECATED], CUDA_C_32F, CUDA_C_64F

cusparseScatter() has the following constraints:

  • The arrays representing the sparse vector vecX must be aligned to 16 bytes

cusparseScatter() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • Provides deterministic (bit-wise) results for each run if the sparse vector vecX indices are distinct

  • The routine allows indices of vecX to be unsorted

cusparseScatter() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseScatter for a code example.
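A minimal sketch with the same descriptor setup as the cusparseAxpby example above:

// Y[X_indices[i]] = X_values[i] for i = 0..nnz-1; no extra workspace needed
cusparseScatter(handle, vecX, vecY);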


cusparseRot() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseRot(cusparseHandle_t     handle,
            const void*          c_coeff,
            const void*          s_coeff,
            cusparseSpVecDescr_t vecX,
            cusparseDnVecDescr_t vecY)

The function applies the Givens rotation matrix

\(G = \begin{bmatrix} c & s \\ {- s} & c \\ \end{bmatrix}\)

to a sparse vector vecX and a dense vector vecY

In other words,

for i=0 to nnz-1
    tmp             = Y[X_indices[i]]
    Y[X_indices[i]] = c * tmp         - s * X_values[i]
    X_values[i]     = c * X_values[i] + s * tmp

| Param. | Memory | In/out | Meaning |
|--------|--------|--------|---------|
| handle | HOST | IN | Handle to the cuSPARSE library context |
| c_coeff | HOST or DEVICE | IN | cosine element of the rotation matrix |
| vecX | HOST | IN/OUT | Sparse vector X |
| s_coeff | HOST or DEVICE | IN | sine element of the rotation matrix |
| vecY | HOST | IN/OUT | Dense vector Y |

cusparseRot supports the following index type for representing the sparse vector vecX:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

cusparseRot supports the following data types:

Uniform-precision computation:

X/Y/compute: CUDA_R_32F, CUDA_R_64F, CUDA_C_32F, CUDA_C_64F

Mixed-precision computation:

| X/Y | compute |
|-----|---------|
| CUDA_R_16F | CUDA_R_32F |
| CUDA_R_16BF | CUDA_R_32F |
| CUDA_C_16F [DEPRECATED] | CUDA_C_32F |
| CUDA_C_16BF [DEPRECATED] | CUDA_C_32F |

cusparseRot() has the following constraints:

  • The arrays representing the sparse vector vecX must be aligned to 16 bytes

cusparseRot() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • Provides deterministic (bit-wise) results for each run if the sparse vector vecX indices are distinct

cusparseRot() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseRot for a code example.


cusparseSpVV()

cusparseStatus_t
cusparseSpVV_bufferSize(cusparseHandle_t          handle,
                        cusparseOperation_t       opX,
                        cusparseConstSpVecDescr_t vecX,  // non-const descriptor supported
                        cusparseConstDnVecDescr_t vecY,  // non-const descriptor supported
                        void*                     result,
                        cudaDataType              computeType,
                        size_t*                   bufferSize)
cusparseStatus_t
cusparseSpVV(cusparseHandle_t          handle,
             cusparseOperation_t       opX,
             cusparseConstSpVecDescr_t vecX,  // non-const descriptor supported
             cusparseConstDnVecDescr_t vecY,  // non-const descriptor supported
             void*                     result,
             cudaDataType              computeType,
             void*                     externalBuffer)

The function computes the inner product of a sparse vector vecX and a dense vector vecY

\(result = op\left(\mathbf{X}\right) \cdot \mathbf{Y}\)

In other words,

result = 0;
for i=0 to nnz-1
    result += op(X_values[i]) * Y[X_indices[i]]

\(\text{op}(X) = \begin{cases} X & \text{if op(X) == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ \overline{X} & \text{if op(X) == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)

The function cusparseSpVV_bufferSize() returns the size of the workspace needed by cusparseSpVV()

| Param. | Memory | In/out | Meaning |
|--------|--------|--------|---------|
| handle | HOST | IN | Handle to the cuSPARSE library context |
| opX | HOST | IN | Operation op(X): non-transpose or conjugate transpose |
| vecX | HOST | IN | Sparse vector X |
| vecY | HOST | IN | Dense vector Y |
| result | HOST or DEVICE | OUT | The resulting dot product |
| computeType | HOST | IN | Datatype in which the computation is executed |
| bufferSize | HOST | OUT | Number of bytes of workspace needed by cusparseSpVV |
| externalBuffer | DEVICE | IN | Pointer to a workspace buffer of at least bufferSize bytes |

cusparseSpVV supports the following index type for representing the sparse vector vecX:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

The data type combinations currently supported for cusparseSpVV are listed below:

Uniform-precision computation:

X/Y/computeType: CUDA_R_32F, CUDA_R_64F, CUDA_C_32F, CUDA_C_64F

Mixed-precision computation:

| X/Y | computeType/result | Notes |
|-----|--------------------|-------|
| CUDA_R_8I | CUDA_R_32I | |
| CUDA_R_8I | CUDA_R_32F | |
| CUDA_R_16F | CUDA_R_32F | |
| CUDA_R_16BF | CUDA_R_32F | |
| CUDA_C_16F | CUDA_C_32F | [DEPRECATED] |
| CUDA_C_16BF | CUDA_C_32F | [DEPRECATED] |

cusparseSpVV() has the following constraints:

  • The arrays representing the sparse vector vecX must be aligned to 16 bytes

cusparseSpVV() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • Provides deterministic (bit-wise) results for each run if the sparse vector vecX indices are distinct

  • The routine allows indices of vecX to be unsorted

cusparseSpVV() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseSpVV for a code example.
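A minimal sketch of the two-phase pattern (descriptors as in the cusparseAxpby example above; with the default pointer mode, result is written to host memory):

float  result;
size_t bufferSize = 0;
void*  dBuffer    = NULL;

// query the workspace size, then run the actual computation
cusparseSpVV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                        vecX, vecY, &result, CUDA_R_32F, &bufferSize);
cudaMalloc(&dBuffer, bufferSize);
cusparseSpVV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
             vecX, vecY, &result, CUDA_R_32F, dBuffer);
cudaFree(dBuffer);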


cusparseSpMV()

cusparseStatus_t
cusparseSpMV_bufferSize(cusparseHandle_t          handle,
                        cusparseOperation_t       opA,
                        const void*               alpha,
                        cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                        cusparseConstDnVecDescr_t vecX,  // non-const descriptor supported
                        const void*               beta,
                        cusparseDnVecDescr_t      vecY,
                        cudaDataType              computeType,
                        cusparseSpMVAlg_t         alg,
                        size_t*                   bufferSize)
cusparseStatus_t
cusparseSpMV_preprocess(cusparseHandle_t          handle,
                        cusparseOperation_t       opA,
                        const void*               alpha,
                        cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                        cusparseConstDnVecDescr_t vecX,  // non-const descriptor supported
                        const void*               beta,
                        cusparseDnVecDescr_t      vecY,
                        cudaDataType              computeType,
                        cusparseSpMVAlg_t         alg,
                        void*                     externalBuffer)
cusparseStatus_t
cusparseSpMV(cusparseHandle_t          handle,
             cusparseOperation_t       opA,
             const void*               alpha,
             cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
             cusparseConstDnVecDescr_t vecX,  // non-const descriptor supported
             const void*               beta,
             cusparseDnVecDescr_t      vecY,
             cudaDataType              computeType,
             cusparseSpMVAlg_t         alg,
             void*                     externalBuffer)

This function performs the multiplication of a sparse matrix matA and a dense vector vecX

\(\mathbf{Y} = \alpha op\left( \mathbf{A} \right) \cdot \mathbf{X} + \beta\mathbf{Y}\)

where

  • op(A) is a sparse matrix of size \(m \times k\)

  • X is a dense vector of size \(k\)

  • Y is a dense vector of size \(m\)

  • \(\alpha\) and \(\beta\) are scalars

Also, for matrix A

\(\text{op}(A) = \begin{cases} A & \text{if op(A) == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if op(A) == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if op(A) == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)

The function cusparseSpMV_bufferSize() returns the size of the workspace needed by cusparseSpMV_preprocess() and cusparseSpMV()

| Param. | Memory | In/out | Meaning |
|--------|--------|--------|---------|
| handle | HOST | IN | Handle to the cuSPARSE library context |
| opA | HOST | IN | Operation op(A) |
| alpha | HOST or DEVICE | IN | \(\alpha\) scalar used for multiplication of type computeType |
| matA | HOST | IN | Sparse matrix A |
| vecX | HOST | IN | Dense vector X |
| beta | HOST or DEVICE | IN | \(\beta\) scalar used for multiplication of type computeType |
| vecY | HOST | IN/OUT | Dense vector Y |
| computeType | HOST | IN | Datatype in which the computation is executed |
| alg | HOST | IN | Algorithm for the computation |
| bufferSize | HOST | OUT | Number of bytes of workspace needed by cusparseSpMV |
| externalBuffer | DEVICE | IN | Pointer to a workspace buffer of at least bufferSize bytes |

The sparse matrix formats currently supported are listed below:

  • CUSPARSE_FORMAT_COO

  • CUSPARSE_FORMAT_CSR

  • CUSPARSE_FORMAT_CSC

  • CUSPARSE_FORMAT_SLICED_ELL

cusparseSpMV supports the following index type for representing the sparse matrix matA:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

cusparseSpMV supports the following data types:

Uniform-precision computation:

A/X/Y/computeType: CUDA_R_32F, CUDA_R_64F, CUDA_C_32F, CUDA_C_64F

Mixed-precision computation:

| A/X | Y | computeType | Notes |
|-----|---|-------------|-------|
| CUDA_R_8I | CUDA_R_32I | CUDA_R_32I | |
| CUDA_R_8I | CUDA_R_32F | CUDA_R_32F | |
| CUDA_R_16F | CUDA_R_16F | CUDA_R_32F | |
| CUDA_R_16BF | CUDA_R_16BF | CUDA_R_32F | |
| CUDA_C_16F | CUDA_C_16F | CUDA_C_32F | [DEPRECATED] |
| CUDA_C_16BF | CUDA_C_16BF | CUDA_C_32F | [DEPRECATED] |

| A | X/Y/computeType |
|---|-----------------|
| CUDA_R_32F | CUDA_R_64F |

Mixed Regular/Complex computation:

| A | X/Y/computeType |
|---|-----------------|
| CUDA_R_32F | CUDA_C_32F |
| CUDA_R_64F | CUDA_C_64F |

NOTE: CUDA_R_16F, CUDA_R_16BF, CUDA_C_16F, and CUDA_C_16BF data types always imply mixed-precision computation.

cusparseSpMV() supports the following algorithms:

| Algorithm | Notes |
|-----------|-------|
| CUSPARSE_SPMV_ALG_DEFAULT | Default algorithm for any sparse matrix format. |
| CUSPARSE_SPMV_COO_ALG1 | Default algorithm for the COO sparse matrix format. May produce slightly different results during different runs with the same input parameters. |
| CUSPARSE_SPMV_COO_ALG2 | Provides deterministic (bit-wise) results for each run. If opA != CUSPARSE_OPERATION_NON_TRANSPOSE, it is identical to CUSPARSE_SPMV_COO_ALG1. |
| CUSPARSE_SPMV_CSR_ALG1 | Default algorithm for the CSR/CSC sparse matrix formats. May produce slightly different results during different runs with the same input parameters. |
| CUSPARSE_SPMV_CSR_ALG2 | Provides deterministic (bit-wise) results for each run. If opA != CUSPARSE_OPERATION_NON_TRANSPOSE, it is identical to CUSPARSE_SPMV_CSR_ALG1. |
| CUSPARSE_SPMV_SELL_ALG1 | Default algorithm for the Sliced Ellpack sparse matrix format. Provides deterministic (bit-wise) results for each run. |

Performance notes:

  • CUSPARSE_SPMV_COO_ALG1 and CUSPARSE_SPMV_CSR_ALG1 provide higher performance than CUSPARSE_SPMV_COO_ALG2 and CUSPARSE_SPMV_CSR_ALG2.

  • In general, opA == CUSPARSE_OPERATION_NON_TRANSPOSE is about 3x faster than opA != CUSPARSE_OPERATION_NON_TRANSPOSE.

  • Using cusparseSpMV_preprocess() helps improve the performance of cusparseSpMV() for the CSR format. It is beneficial when cusparseSpMV() is run multiple times with the same matrix (cusparseSpMV_preprocess() is executed only once).

cusparseSpMV() has the following properties:

  • The routine requires extra storage for CSR/CSC format (all algorithms) and for COO format with CUSPARSE_SPMV_COO_ALG2 algorithm.

  • Provides deterministic (bit-wise) results for each run only for CUSPARSE_SPMV_COO_ALG2 and CUSPARSE_SPMV_CSR_ALG2 algorithms, and opA == CUSPARSE_OPERATION_NON_TRANSPOSE.

  • The routine supports asynchronous execution.

  • compute-sanitizer could report false race conditions for this routine when beta == 0. This is for optimization purposes and does not affect the correctness of the computation.

  • The routine allows the indices of matA to be unsorted.

cusparseSpMV() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseSpMV CSR and cusparseSpMV COO for a code example.
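A minimal CSR sketch (it assumes dA_csrOffsets, dA_columns, dA_values, dX, dY are populated device arrays, num_rows/num_cols/nnz describe matA, handle is created, and error checking is omitted):

cusparseSpMatDescr_t matA;
cusparseDnVecDescr_t vecX, vecY;
float  alpha = 1.0f, beta = 0.0f;
size_t bufferSize = 0;
void*  dBuffer    = NULL;

cusparseCreateCsr(&matA, num_rows, num_cols, nnz,
                  dA_csrOffsets, dA_columns, dA_values,
                  CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                  CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
cusparseCreateDnVec(&vecX, num_cols, dX, CUDA_R_32F);
cusparseCreateDnVec(&vecY, num_rows, dY, CUDA_R_32F);

cusparseSpMV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA,
                        vecX, &beta, vecY, CUDA_R_32F,
                        CUSPARSE_SPMV_ALG_DEFAULT, &bufferSize);
cudaMalloc(&dBuffer, bufferSize);
// optional: amortized over repeated cusparseSpMV() calls with the same matrix
cusparseSpMV_preprocess(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA,
                        vecX, &beta, vecY, CUDA_R_32F,
                        CUSPARSE_SPMV_ALG_DEFAULT, dBuffer);
cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA,
             vecX, &beta, vecY, CUDA_R_32F,
             CUSPARSE_SPMV_ALG_DEFAULT, dBuffer);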


cusparseSpSV()

cusparseStatus_t
cusparseSpSV_createDescr(cusparseSpSVDescr_t* spsvDescr);

cusparseStatus_t
cusparseSpSV_destroyDescr(cusparseSpSVDescr_t spsvDescr);
cusparseStatus_t
cusparseSpSV_bufferSize(cusparseHandle_t          handle,
                        cusparseOperation_t       opA,
                        const void*               alpha,
                        cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                        cusparseConstDnVecDescr_t vecX,  // non-const descriptor supported
                        cusparseDnVecDescr_t      vecY,
                        cudaDataType              computeType,
                        cusparseSpSVAlg_t         alg,
                        cusparseSpSVDescr_t       spsvDescr,
                        size_t*                   bufferSize)
cusparseStatus_t
cusparseSpSV_analysis(cusparseHandle_t          handle,
                      cusparseOperation_t       opA,
                      const void*               alpha,
                      cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                      cusparseConstDnVecDescr_t vecX,  // non-const descriptor supported
                      cusparseDnVecDescr_t      vecY,
                      cudaDataType              computeType,
                      cusparseSpSVAlg_t         alg,
                      cusparseSpSVDescr_t       spsvDescr,
                      void*                     externalBuffer)
cusparseStatus_t
cusparseSpSV_solve(cusparseHandle_t          handle,
                   cusparseOperation_t       opA,
                   const void*               alpha,
                   cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                   cusparseConstDnVecDescr_t vecX,  // non-const descriptor supported
                   cusparseDnVecDescr_t      vecY,
                   cudaDataType              computeType,
                   cusparseSpSVAlg_t         alg,
                   cusparseSpSVDescr_t       spsvDescr)
cusparseStatus_t
cusparseSpSV_updateMatrix(cusparseHandle_t     handle,
                          cusparseSpSVDescr_t  spsvDescr,
                          void*                newValues,
                          cusparseSpSVUpdate_t updatePart)

The function solves a system of linear equations whose coefficients are represented in a sparse triangular matrix:

\(op\left( \mathbf{A} \right) \cdot \mathbf{Y} = \alpha\mathbf{X}\)

where

  • op(A) is a sparse square matrix of size \(m \times m\)

  • X is a dense vector of size \(m\)

  • Y is a dense vector of size \(m\)

  • \(\alpha\) is a scalar

Also, for matrix A

\(\text{op}(A) = \begin{cases} A & \text{if op(A) == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if op(A) == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if op(A) == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)

The function cusparseSpSV_bufferSize() returns the size of the workspace needed by cusparseSpSV_analysis() and cusparseSpSV_solve(). The function cusparseSpSV_analysis() performs the analysis phase, while cusparseSpSV_solve() executes the solve phase for a sparse triangular linear system. The opaque data structure spsvDescr is used to share information among all functions. The function cusparseSpSV_updateMatrix() updates spsvDescr with new matrix values.

The routine supports arbitrary sparsity for the input matrix, but only the upper or lower triangular part is taken into account in the computation.

NOTE: all parameters must be consistent across cusparseSpSV API calls, and the matrix descriptors and externalBuffer must not be modified between cusparseSpSV_analysis() and cusparseSpSV_solve(). The function cusparseSpSV_updateMatrix() can be used to update the values of the sparse matrix stored inside the opaque data structure spsvDescr.

| Param. | Memory | In/out | Meaning |
|--------|--------|--------|---------|
| handle | HOST | IN | Handle to the cuSPARSE library context |
| opA | HOST | IN | Operation op(A) |
| alpha | HOST or DEVICE | IN | \(\alpha\) scalar used for multiplication of type computeType |
| matA | HOST | IN | Sparse matrix A |
| vecX | HOST | IN | Dense vector X |
| vecY | HOST | IN/OUT | Dense vector Y |
| computeType | HOST | IN | Datatype in which the computation is executed |
| alg | HOST | IN | Algorithm for the computation |
| bufferSize | HOST | OUT | Number of bytes of workspace needed by cusparseSpSV_analysis() and cusparseSpSV_solve() |
| externalBuffer | DEVICE | IN/OUT | Pointer to a workspace buffer of at least bufferSize bytes. It is used by cusparseSpSV_analysis() and cusparseSpSV_solve() |
| spsvDescr | HOST | IN/OUT | Opaque descriptor for storing internal data used across the three steps |

The sparse matrix formats currently supported are listed below:

  • CUSPARSE_FORMAT_CSR

  • CUSPARSE_FORMAT_COO

  • CUSPARSE_FORMAT_SLICED_ELL

cusparseSpSV() supports the following shapes and properties:

  • CUSPARSE_FILL_MODE_LOWER and CUSPARSE_FILL_MODE_UPPER fill modes

  • CUSPARSE_DIAG_TYPE_NON_UNIT and CUSPARSE_DIAG_TYPE_UNIT diagonal types

The fill mode and diagonal type can be set by cusparseSpMatSetAttribute()

cusparseSpSV() supports the following index type for representing the sparse matrix matA:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

cusparseSpSV() supports the following data types:

Uniform-precision computation:

A/X/Y/computeType: CUDA_R_32F, CUDA_R_64F, CUDA_C_32F, CUDA_C_64F

cusparseSpSV() supports the following algorithms:

| Algorithm | Notes |
|-----------|-------|
| CUSPARSE_SPSV_ALG_DEFAULT | Default algorithm |

cusparseSpSV() has the following properties:

  • The routine requires extra storage for the analysis phase, which is proportional to the number of non-zero entries of the sparse matrix

  • Provides deterministic (bit-wise) results for each run for the solving phase cusparseSpSV_solve()

  • The routine supports in-place operation

  • The cusparseSpSV_solve() routine supports asynchronous execution

  • cusparseSpSV_bufferSize() and cusparseSpSV_analysis() routines accept NULL for vecX and vecY

  • The routine allows the indices of matA to be unsorted

cusparseSpSV() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

cusparseSpSV_updateMatrix() updates the sparse matrix after the analysis phase has been called. This function supports the following update strategies (updatePart):

| Strategy | Notes |
|----------|-------|
| CUSPARSE_SPSV_UPDATE_GENERAL | Updates the sparse matrix values with the values of the newValues array |
| CUSPARSE_SPSV_UPDATE_DIAGONAL | Updates the diagonal part of the matrix with the diagonal values stored in the newValues array. That is, newValues holds the new diagonal values only |

See cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseSpSV CSR and cuSPARSE Library Samples - cusparseSpSV COO for code examples.
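A minimal sketch for a lower-triangular CSR matrix (matA, vecX, vecY created as in the cusparseSpMV example above; error checking omitted):

cusparseSpSVDescr_t spsvDescr;
cusparseFillMode_t  fillmode = CUSPARSE_FILL_MODE_LOWER;
cusparseDiagType_t  diagtype = CUSPARSE_DIAG_TYPE_NON_UNIT;
float  alpha = 1.0f;
size_t bufferSize = 0;
void*  dBuffer    = NULL;

// declare which triangle of matA is used and its diagonal type
cusparseSpMatSetAttribute(matA, CUSPARSE_SPMAT_FILL_MODE, &fillmode, sizeof(fillmode));
cusparseSpMatSetAttribute(matA, CUSPARSE_SPMAT_DIAG_TYPE, &diagtype, sizeof(diagtype));

cusparseSpSV_createDescr(&spsvDescr);
cusparseSpSV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA,
                        vecX, vecY, CUDA_R_32F, CUSPARSE_SPSV_ALG_DEFAULT,
                        spsvDescr, &bufferSize);
cudaMalloc(&dBuffer, bufferSize);
cusparseSpSV_analysis(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA,
                      vecX, vecY, CUDA_R_32F, CUSPARSE_SPSV_ALG_DEFAULT,
                      spsvDescr, dBuffer);
cusparseSpSV_solve(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA,
                   vecX, vecY, CUDA_R_32F, CUSPARSE_SPSV_ALG_DEFAULT, spsvDescr);
cusparseSpSV_destroyDescr(spsvDescr);
cudaFree(dBuffer);  // keep the buffer alive until the solve has completed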


cusparseSpMM()

cusparseStatus_t
cusparseSpMM_bufferSize(cusparseHandle_t          handle,
                        cusparseOperation_t       opA,
                        cusparseOperation_t       opB,
                        const void*               alpha,
                        cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                        cusparseConstDnMatDescr_t matB,  // non-const descriptor supported
                        const void*               beta,
                        cusparseDnMatDescr_t      matC,
                        cudaDataType              computeType,
                        cusparseSpMMAlg_t         alg,
                        size_t*                   bufferSize)
cusparseStatus_t
cusparseSpMM_preprocess(cusparseHandle_t          handle,
                        cusparseOperation_t       opA,
                        cusparseOperation_t       opB,
                        const void*               alpha,
                        cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                        cusparseConstDnMatDescr_t matB,  // non-const descriptor supported
                        const void*               beta,
                        cusparseDnMatDescr_t      matC,
                        cudaDataType              computeType,
                        cusparseSpMMAlg_t         alg,
                        void*                     externalBuffer)
cusparseStatus_t
cusparseSpMM(cusparseHandle_t          handle,
             cusparseOperation_t       opA,
             cusparseOperation_t       opB,
             const void*               alpha,
             cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
             cusparseConstDnMatDescr_t matB,  // non-const descriptor supported
             const void*               beta,
             cusparseDnMatDescr_t      matC,
             cudaDataType              computeType,
             cusparseSpMMAlg_t         alg,
             void*                     externalBuffer)

The function performs the multiplication of a sparse matrix matA and a dense matrix matB.

\(\mathbf{C} = \alpha op\left( \mathbf{A} \right) \cdot op\left( \mathbf{B} \right) + \beta\mathbf{C}\)

where

  • op(A) is a sparse matrix of size \(m \times k\)

  • op(B) is a dense matrix of size \(k \times n\)

  • C is a dense matrix of size \(m \times n\)

  • \(\alpha\) and \(\beta\) are scalars

The routine can be also used to perform the multiplication of a dense matrix and a sparse matrix by switching the dense matrices layout:

\(\mathbf{C}_{C} = \mathbf{B}_{C} \cdot \mathbf{A} + \beta\mathbf{C}_{C}\;\rightarrow\;\mathbf{C}_{R} = \mathbf{A}^{T} \cdot \mathbf{B}_{R} + \beta\mathbf{C}_{R}\)

where \(\mathbf{B}_{C}\) , \(\mathbf{C}_{C}\) indicate column-major layout, while \(\mathbf{B}_{R}\) , \(\mathbf{C}_{R}\) refer to row-major layout

Also, for matrix A and B

\(\text{op}(A) = \begin{cases} A & \text{if op(A) == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if op(A) == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if op(A) == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)

\(\text{op}(B) = \begin{cases} B & \text{if op(B) == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ B^{T} & \text{if op(B) == CUSPARSE_OPERATION_TRANSPOSE} \\ B^{H} & \text{if op(B) == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)

When using the (conjugate) transpose of the sparse matrix A, this routine may produce slightly different results during different runs with the same input parameters.

The function cusparseSpMM_bufferSize() returns the size of the workspace needed by cusparseSpMM()

The function cusparseSpMM_preprocess() can be called before cusparseSpMM() to speed up the actual computation. It is useful when cusparseSpMM() is called multiple times with the same sparsity pattern (matA). The values of the matrices (matA, matB, matC) can change arbitrarily. It provides performance advantages when used with CUSPARSE_SPMM_CSR_ALG1 or CUSPARSE_SPMM_CSR_ALG3; for all other formats and algorithms it has no effect.

| Param. | Memory | In/out | Meaning |
|--------|--------|--------|---------|
| handle | HOST | IN | Handle to the cuSPARSE library context |
| opA | HOST | IN | Operation op(A) |
| opB | HOST | IN | Operation op(B) |
| alpha | HOST or DEVICE | IN | \(\alpha\) scalar used for multiplication of type computeType |
| matA | HOST | IN | Sparse matrix A |
| matB | HOST | IN | Dense matrix B |
| beta | HOST or DEVICE | IN | \(\beta\) scalar used for multiplication of type computeType |
| matC | HOST | IN/OUT | Dense matrix C |
| computeType | HOST | IN | Datatype in which the computation is executed |
| alg | HOST | IN | Algorithm for the computation |
| bufferSize | HOST | OUT | Number of bytes of workspace needed by cusparseSpMM |
| externalBuffer | DEVICE | IN | Pointer to a workspace buffer of at least bufferSize bytes |

cusparseSpMM supports the following sparse matrix formats:

  • CUSPARSE_FORMAT_COO

  • CUSPARSE_FORMAT_CSR

  • CUSPARSE_FORMAT_CSC

  • CUSPARSE_FORMAT_BLOCKED_ELL

(1) COO/CSR/CSC FORMATS

cusparseSpMM supports the following index type for representing the sparse matrix matA:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

cusparseSpMM supports the following data types:

Uniform-precision computation:

A/B/C/computeType: CUDA_R_32F, CUDA_R_64F, CUDA_C_32F, CUDA_C_64F

Mixed-precision computation:

| A/B | C | computeType | Notes |
|-----|---|-------------|-------|
| CUDA_R_8I | CUDA_R_32I | CUDA_R_32I | |
| CUDA_R_8I | CUDA_R_32F | CUDA_R_32F | |
| CUDA_R_16F | CUDA_R_16F | CUDA_R_32F | |
| CUDA_R_16BF | CUDA_R_16BF | CUDA_R_32F | |
| CUDA_C_16F | CUDA_C_16F | CUDA_C_32F | [DEPRECATED] |
| CUDA_C_16BF | CUDA_C_16BF | CUDA_C_32F | [DEPRECATED] |

NOTE: CUDA_R_16F, CUDA_R_16BF, CUDA_C_16F, and CUDA_C_16BF data types always imply mixed-precision computation.

cusparseSpMM supports the following algorithms:

Algorithm

Notes

CUSPARSE_SPMM_ALG_DEFAULT

Default algorithm for any sparse matrix format

CUSPARSE_SPMM_COO_ALG1

Algorithm 1 for COO sparse matrix format

  • May provide better performance for a small number of non-zero entries

  • Provides the best performance with column-major layout

  • It supports batched computation

  • May produce slightly different results during different runs with the same input parameters

CUSPARSE_SPMM_COO_ALG2

Algorithm 2 for COO sparse matrix format

  • It provides deterministic results

  • Provides the best performance with column-major layout

  • In general, slower than Algorithm 1

  • It supports batched computation

  • It requires additional memory

  • If opA != CUSPARSE_OPERATION_NON_TRANSPOSE, it is identical to CUSPARSE_SPMM_COO_ALG1

CUSPARSE_SPMM_COO_ALG3

Algorithm 3 for COO sparse matrix format

  • May provide better performance for a large number of non-zero entries

  • May produce slightly different results during different runs with the same input parameters

CUSPARSE_SPMM_COO_ALG4

Algorithm 4 for COO sparse matrix format

  • Provides better performance with row-major layout

  • It supports batched computation

  • May produce slightly different results during different runs with the same input parameters

CUSPARSE_SPMM_CSR_ALG1

Algorithm 1 for CSR/CSC sparse matrix format

  • Provides the best performance with column-major layout

  • It supports batched computation

  • It requires additional memory

  • May produce slightly different results during different runs with the same input parameters

CUSPARSE_SPMM_CSR_ALG2

Algorithm 2 for CSR/CSC sparse matrix format

  • Provides the best performance with row-major layout

  • It supports batched computation

  • It requires additional memory

  • May produce slightly different results during different runs with the same input parameters

CUSPARSE_SPMM_CSR_ALG3

Algorithm 3 for CSR/CSC sparse matrix format

  • It provides deterministic results

  • It requires additional memory

  • It supports only opA == CUSPARSE_OPERATION_NON_TRANSPOSE

  • It does not support opB == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE

  • It does not support CUDA_C_16F and CUDA_C_16BF data types

Performance notes:

  • Row-major layout provides higher performance than column-major

  • CUSPARSE_SPMM_COO_ALG4 and CUSPARSE_SPMM_CSR_ALG2 should be used with row-major layout, while CUSPARSE_SPMM_COO_ALG1, CUSPARSE_SPMM_COO_ALG2, CUSPARSE_SPMM_COO_ALG3, and CUSPARSE_SPMM_CSR_ALG1 with column-major layout

  • For beta != 1, the output matrix is scaled before the actual computation

  • For n == 1, the routine uses cusparseSpMV() as a fallback

All cusparseSpMM() algorithms except CUSPARSE_SPMM_CSR_ALG3 support the following batch modes:

  • \(C_{i} = A \cdot B_{i}\)

  • \(C_{i} = A_{i} \cdot B\)

  • \(C_{i} = A_{i} \cdot B_{i}\)

The number of batches and their strides can be set by using cusparseCooSetStridedBatch, cusparseCsrSetStridedBatch, and cusparseDnMatSetStridedBatch. The maximum number of batches for cusparseSpMM() is 65,535.

cusparseSpMM() has the following properties:

  • The routine requires no extra storage for CUSPARSE_SPMM_COO_ALG1, CUSPARSE_SPMM_COO_ALG3, CUSPARSE_SPMM_COO_ALG4

  • The routine supports asynchronous execution

  • Provides deterministic (bit-wise) results for each run only for CUSPARSE_SPMM_COO_ALG2 and CUSPARSE_SPMM_CSR_ALG3 algorithms

  • compute-sanitizer could report false race conditions for this routine. This is for optimization purposes and does not affect the correctness of the computation

  • The routine allows the indices of matA to be unsorted

cusparseSpMM() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

Please visit cuSPARSE Library Samples - cusparseSpMM CSR and cusparseSpMM COO for a code example. For batched computation please visit cusparseSpMM CSR Batched and cusparseSpMM COO Batched.
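A minimal CSR sketch with row-major dense operands (matA as in the cusparseSpMV example above; dB, dC device arrays and m, k, n, ldb, ldc placeholders; error checking omitted):

cusparseDnMatDescr_t matB, matC;
float  alpha = 1.0f, beta = 0.0f;
size_t bufferSize = 0;
void*  dBuffer    = NULL;

// B is k x n, C is m x n; row-major order pairs well with CUSPARSE_SPMM_CSR_ALG2
cusparseCreateDnMat(&matB, k, n, ldb, dB, CUDA_R_32F, CUSPARSE_ORDER_ROW);
cusparseCreateDnMat(&matC, m, n, ldc, dC, CUDA_R_32F, CUSPARSE_ORDER_ROW);

cusparseSpMM_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                        CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, matB,
                        &beta, matC, CUDA_R_32F, CUSPARSE_SPMM_ALG_DEFAULT,
                        &bufferSize);
cudaMalloc(&dBuffer, bufferSize);
cusparseSpMM(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
             CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, matB,
             &beta, matC, CUDA_R_32F, CUSPARSE_SPMM_ALG_DEFAULT, dBuffer);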

(2) BLOCKED-ELLPACK FORMAT

cusparseSpMM supports the following data types for CUSPARSE_FORMAT_BLOCKED_ELL format and the following GPU architectures for exploiting NVIDIA Tensor Cores:

| A/B | C | computeType | opB | Compute Capability |
|-----|---|-------------|-----|--------------------|
| CUDA_R_16F | CUDA_R_16F | CUDA_R_16F | N, T | 70 |
| CUDA_R_16F | CUDA_R_16F | CUDA_R_32F | N, T | 70 |
| CUDA_R_16F | CUDA_R_32F | CUDA_R_32F | N, T | 70 |
| CUDA_R_8I | CUDA_R_32I | CUDA_R_32I | N column-major, T row-major | 75 |
| CUDA_R_16BF | CUDA_R_16BF | CUDA_R_32F | N, T | 80 |
| CUDA_R_16BF | CUDA_R_32F | CUDA_R_32F | N, T | 80 |
| CUDA_R_32F | CUDA_R_32F | CUDA_R_32F | N, T | 80 |
| CUDA_R_64F | CUDA_R_64F | CUDA_R_64F | N, T | 80 |

cusparseSpMM supports the following algorithms with CUSPARSE_FORMAT_BLOCKED_ELL format:

Algorithm

Notes

CUSPARSE_SPMM_ALG_DEFAULT

Default algorithm for any sparse matrix format

CUSPARSE_SPMM_BLOCKED_ELL_ALG1

Default algorithm for Blocked-ELL format

Performance notes:

  • Blocked-ELL SpMM provides the best performance with Power-of-2 Block-Sizes.

  • Large Block-Sizes (e.g. ≥ 64) provide the best performance.

The function has the following limitations:

  • The pointer mode must be equal to CUSPARSE_POINTER_MODE_HOST

  • Only opA == CUSPARSE_OPERATION_NON_TRANSPOSE is supported.

  • opB == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE is not supported.

  • Only CUSPARSE_INDEX_32I is supported.

Please visit cuSPARSE Library Samples - cusparseSpMM Blocked-ELL for a code example.

See cusparseStatus_t for the description of the return status.


cusparseSpMMOp()

cusparseStatus_t CUSPARSEAPI
cusparseSpMMOp_createPlan(cusparseHandle_t          handle,
                          cusparseSpMMOpPlan_t*     plan,
                          cusparseOperation_t       opA,
                          cusparseOperation_t       opB,
                          cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                          cusparseConstDnMatDescr_t matB,  // non-const descriptor supported
                          cusparseDnMatDescr_t      matC,
                          cudaDataType              computeType,
                          cusparseSpMMOpAlg_t       alg,
                          const void*               addOperationNvvmBuffer,
                          size_t                    addOperationBufferSize,
                          const void*               mulOperationNvvmBuffer,
                          size_t                    mulOperationBufferSize,
                          const void*               epilogueNvvmBuffer,
                          size_t                    epilogueBufferSize,
                          size_t*                   SpMMWorkspaceSize)

cusparseStatus_t
cusparseSpMMOp_destroyPlan(cusparseSpMMOpPlan_t plan)
cusparseStatus_t
cusparseSpMMOp(cusparseSpMMOpPlan_t plan,
               void*                externalBuffer)

NOTE 1: NVRTC and nvJitLink are not currently available on Arm64 Android platforms.

NOTE 2: The routine does not support Android and Tegra platforms except Judy (sm87).

Experimental: The function performs the multiplication of a sparse matrix matA and a dense matrix matB with custom operators.

\(C^{\prime}_{ij} = \text{epilogue}\left( \bigoplus_{k} \left( \text{op}(A)_{ik} \otimes \text{op}(B)_{kj} \right),\; C_{ij} \right)\)

where

  • op(A) is a sparse matrix of size \(m \times k\)

  • op(B) is a dense matrix of size \(k \times n\)

  • C is a dense matrix of size \(m \times n\)

  • \(\oplus\) , \(\otimes\) , and \(\text{epilogue}\) are custom add, mul, and epilogue operators respectively.

Also, for matrix A and B

\(\text{op}(A) = \begin{cases} A & \text{if op(A) == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if op(A) == CUSPARSE_OPERATION_TRANSPOSE} \\ \end{cases}\)

\(\text{op}(B) = \begin{cases} B & \text{if op(B) == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ B^{T} & \text{if op(B) == CUSPARSE_OPERATION_TRANSPOSE} \\ \end{cases}\)

Only opA == CUSPARSE_OPERATION_NON_TRANSPOSE is currently supported

The function cusparseSpMMOp_createPlan() returns the size of the workspace and the compiled kernel needed by cusparseSpMMOp()

| Param. | Memory | In/out | Meaning |
|--------|--------|--------|---------|
| handle | HOST | IN | Handle to the cuSPARSE library context |
| opA | HOST | IN | Operation op(A) |
| opB | HOST | IN | Operation op(B) |
| matA | HOST | IN | Sparse matrix A |
| matB | HOST | IN | Dense matrix B |
| matC | HOST | IN/OUT | Dense matrix C |
| computeType | HOST | IN | Datatype in which the computation is executed |
| alg | HOST | IN | Algorithm for the computation |
| addOperationNvvmBuffer | HOST | IN | Pointer to the NVVM buffer containing the custom add operator |
| addOperationBufferSize | HOST | IN | Size in bytes of addOperationNvvmBuffer |
| mulOperationNvvmBuffer | HOST | IN | Pointer to the NVVM buffer containing the custom mul operator |
| mulOperationBufferSize | HOST | IN | Size in bytes of mulOperationNvvmBuffer |
| epilogueNvvmBuffer | HOST | IN | Pointer to the NVVM buffer containing the custom epilogue operator |
| epilogueBufferSize | HOST | IN | Size in bytes of epilogueNvvmBuffer |
| SpMMWorkspaceSize | HOST | OUT | Number of bytes of workspace needed by cusparseSpMMOp |

The operators must have the following signature and return type

__device__ <computetype> add_op(<computetype> value1, <computetype> value2);

__device__ <computetype> mul_op(<computetype> value1, <computetype> value2);

__device__ <computetype> epilogue(<computetype> value1, <computetype> value2);

<computetype> is one of float, double, cuComplex, cuDoubleComplex, or int.
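For instance, with computeType CUDA_R_32F the three operators might look as follows (a sketch; these device functions are compiled separately, e.g. with NVRTC, into the NVVM buffers passed to cusparseSpMMOp_createPlan(), as shown in the linked sample):

__device__ float add_op(float value1, float value2) { return value1 + value2; }

__device__ float mul_op(float value1, float value2) { return value1 * value2; }

// value1 is the accumulated sum, value2 the existing C entry; this epilogue discards C
__device__ float epilogue(float value1, float value2) { return value1; }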

cusparseSpMMOp supports the following sparse matrix formats:

  • CUSPARSE_FORMAT_CSR

cusparseSpMMOp supports the following index type for representing the sparse matrix matA:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

cusparseSpMMOp supports the following data types:

Uniform-precision computation:

A/B/C/computeType: CUDA_R_32F, CUDA_R_64F, CUDA_C_32F, CUDA_C_64F

Mixed-precision computation:

| A/B | C | computeType |
|-----|---|-------------|
| CUDA_R_8I | CUDA_R_32I | CUDA_R_32I |
| CUDA_R_8I | CUDA_R_32F | CUDA_R_32F |
| CUDA_R_16F | CUDA_R_16F | CUDA_R_32F |
| CUDA_R_16BF | CUDA_R_16BF | CUDA_R_32F |

cusparseSpMMOp supports the following algorithms:

| Algorithm | Notes |
|-----------|-------|
| CUSPARSE_SPMM_OP_ALG_DEFAULT | Default algorithm for any sparse matrix format |

Performance notes:

  • Row-major layout provides higher performance than column-major.

cusparseSpMMOp() has the following properties:

  • The routine requires extra storage

  • The routine supports asynchronous execution

  • Provides deterministic (bit-wise) results for each run

  • The routine allows the indices of matA to be unsorted

cusparseSpMMOp() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

Please visit cuSPARSE Library Samples - cusparseSpMMOp for a code example.

See cusparseStatus_t for the description of the return status.


cusparseSpSM()

cusparseStatus_t
cusparseSpSM_createDescr(cusparseSpSMDescr_t* spsmDescr);
cusparseStatus_t
cusparseSpSM_destroyDescr(cusparseSpSMDescr_t spsmDescr);
cusparseStatus_t
cusparseSpSM_bufferSize(cusparseHandle_t     handle,
                        cusparseOperation_t  opA,
                        cusparseOperation_t  opB,
                        const void*          alpha,
                        cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                        cusparseConstDnMatDescr_t matB,  // non-const descriptor supported
                        cusparseDnMatDescr_t matC,
                        cudaDataType         computeType,
                        cusparseSpSMAlg_t    alg,
                        cusparseSpSMDescr_t  spsmDescr,
                        size_t*              bufferSize)
cusparseStatus_t
cusparseSpSM_analysis(cusparseHandle_t          handle,
                      cusparseOperation_t       opA,
                      cusparseOperation_t       opB,
                      const void*               alpha,
                      cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                      cusparseConstDnMatDescr_t matB,  // non-const descriptor supported
                      cusparseDnMatDescr_t      matC,
                      cudaDataType              computeType,
                      cusparseSpSMAlg_t         alg,
                      cusparseSpSMDescr_t       spsmDescr,
                      void*                     externalBuffer)
cusparseStatus_t
cusparseSpSM_solve(cusparseHandle_t          handle,
                   cusparseOperation_t       opA,
                   cusparseOperation_t       opB,
                   const void*               alpha,
                   cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                   cusparseConstDnMatDescr_t matB,  // non-const descriptor supported
                   cusparseDnMatDescr_t      matC,
                   cudaDataType              computeType,
                   cusparseSpSMAlg_t         alg,
                   cusparseSpSMDescr_t       spsmDescr)
cusparseStatus_t
cusparseSpSM_updateMatrix(cusparseHandle_t     handle,
                          cusparseSpSMDescr_t  spsmDescr,
                          void*                newValues,
                          cusparseSpSMUpdate_t updatePart)

The function solves a system of linear equations whose coefficients are represented in a sparse triangular matrix:

\(op\left( \mathbf{A} \right) \cdot \mathbf{C} = \mathbf{\alpha}op\left( \mathbf{B} \right)\)

where

  • op(A) is a sparse square matrix of size \(m \times m\)

  • op(B) is a dense matrix of size \(m \times n\)

  • C is a dense matrix of size \(m \times n\)

  • \(\alpha\) is a scalar

Also, for matrix A

\(\text{op}(A) = \begin{cases} A & \text{if op(A) == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if op(A) == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if op(A) == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)

\(\text{op}(B) = \begin{cases} B & \text{if op(B) == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ B^{T} & \text{if op(B) == CUSPARSE_OPERATION_TRANSPOSE} \\ \end{cases}\)

The function cusparseSpSM_bufferSize() returns the size of the workspace needed by cusparseSpSM_analysis() and cusparseSpSM_solve(). The function cusparseSpSM_analysis() performs the analysis phase, while cusparseSpSM_solve() executes the solve phase for a sparse triangular linear system. The opaque data structure spsmDescr is used to share information among all functions. The function cusparseSpSM_updateMatrix() updates spsmDescr with new matrix values.

The routine supports arbitrary sparsity for the input matrix, but only the upper or lower triangular part is taken into account in the computation.

cusparseSpSM_bufferSize() requires a buffer size for the analysis phase that is proportional to the number of non-zero entries of the sparse matrix

The externalBuffer is stored into spsmDescr and used by cusparseSpSM_solve(). For this reason, the device memory buffer must be deallocated only after cusparseSpSM_solve()

NOTE: all parameters must be consistent across cusparseSpSM API calls and the matrix descriptions and externalBuffer must not be modified between cusparseSpSM_analysis() and cusparseSpSM_solve()

| Param. | Memory | In/out | Meaning |
|--------|--------|--------|---------|
| handle | HOST | IN | Handle to the cuSPARSE library context |
| opA | HOST | IN | Operation op(A) |
| opB | HOST | IN | Operation op(B) |
| alpha | HOST or DEVICE | IN | \(\alpha\) scalar used for multiplication of type computeType |
| matA | HOST | IN | Sparse matrix A |
| matB | HOST | IN | Dense matrix B |
| matC | HOST | IN/OUT | Dense matrix C |
| computeType | HOST | IN | Datatype in which the computation is executed |
| alg | HOST | IN | Algorithm for the computation |
| bufferSize | HOST | OUT | Number of bytes of workspace needed by cusparseSpSM_analysis() and cusparseSpSM_solve() |
| externalBuffer | DEVICE | IN/OUT | Pointer to a workspace buffer of at least bufferSize bytes. It is used by cusparseSpSM_analysis() and cusparseSpSM_solve() |
| spsmDescr | HOST | IN/OUT | Opaque descriptor for storing internal data used across the three steps |

The sparse matrix formats currently supported are listed below:

  • CUSPARSE_FORMAT_CSR

  • CUSPARSE_FORMAT_COO

cusparseSpSM() supports the following shapes and properties:

  • CUSPARSE_FILL_MODE_LOWER and CUSPARSE_FILL_MODE_UPPER fill modes

  • CUSPARSE_DIAG_TYPE_NON_UNIT and CUSPARSE_DIAG_TYPE_UNIT diagonal types

The fill mode and diagonal type can be set by cusparseSpMatSetAttribute()

cusparseSpSM() supports the following index type for representing the sparse matrix matA:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

cusparseSpSM() supports the following data types:

Uniform-precision computation:

A/B/C/computeType: CUDA_R_32F, CUDA_R_64F, CUDA_C_32F, CUDA_C_64F

cusparseSpSM() supports the following algorithms:

| Algorithm | Notes |
|-----------|-------|
| CUSPARSE_SPSM_ALG_DEFAULT | Default algorithm |

cusparseSpSM() has the following properties:

  • The routine requires no extra storage

  • Provides deterministic (bit-wise) results for each run for the solving phase cusparseSpSM_solve()

  • The cusparseSpSM_solve() routine supports asynchronous execution

  • The routine supports in-place operation. The same device pointer must be provided to the values parameter of the dense matrices matB and matC. All other dense matrix descriptor parameters (e.g., order) can be set independently

  • cusparseSpSM_bufferSize() and cusparseSpSM_analysis() accept matB and matC descriptors whose values arrays are NULL; the descriptors themselves must not be NULL

  • The routine allows the indices of matA to be unsorted

cusparseSpSM() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

cusparseSpSM_updateMatrix() updates the sparse matrix after the analysis phase has been called. This function supports the following update strategies (updatePart):

| Strategy | Notes |
|----------|-------|
| CUSPARSE_SPSM_UPDATE_GENERAL | Updates the sparse matrix values with the values of the newValues array |
| CUSPARSE_SPSM_UPDATE_DIAGONAL | Updates the diagonal part of the matrix with the diagonal values stored in the newValues array. That is, newValues holds the new diagonal values only |

See cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseSpSM CSR and cuSPARSE Library Samples - cusparseSpSM COO for code examples.
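A minimal sketch (matA with fill mode and diagonal type set via cusparseSpMatSetAttribute() as in the cusparseSpSV example above; matB, matC dense matrix descriptors already created; error checking omitted):

cusparseSpSMDescr_t spsmDescr;
float  alpha = 1.0f;
size_t bufferSize = 0;
void*  dBuffer    = NULL;

cusparseSpSM_createDescr(&spsmDescr);
cusparseSpSM_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                        CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, matB,
                        matC, CUDA_R_32F, CUSPARSE_SPSM_ALG_DEFAULT,
                        spsmDescr, &bufferSize);
cudaMalloc(&dBuffer, bufferSize);
cusparseSpSM_analysis(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                      CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, matB,
                      matC, CUDA_R_32F, CUSPARSE_SPSM_ALG_DEFAULT,
                      spsmDescr, dBuffer);
cusparseSpSM_solve(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                   CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, matB,
                   matC, CUDA_R_32F, CUSPARSE_SPSM_ALG_DEFAULT, spsmDescr);
cusparseSpSM_destroyDescr(spsmDescr);
cudaFree(dBuffer);  // the buffer is used by the solve phase, free it only afterwards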


cusparseSDDMM()

cusparseStatus_t
cusparseSDDMM_bufferSize(cusparseHandle_t          handle,
                         cusparseOperation_t       opA,
                         cusparseOperation_t       opB,
                         const void*               alpha,
                         cusparseConstDnMatDescr_t matA,  // non-const descriptor supported
                         cusparseConstDnMatDescr_t matB,  // non-const descriptor supported
                         const void*               beta,
                         cusparseSpMatDescr_t      matC,
                         cudaDataType              computeType,
                         cusparseSDDMMAlg_t        alg,
                         size_t*                   bufferSize)
cusparseStatus_t
cusparseSDDMM_preprocess(cusparseHandle_t          handle,
                         cusparseOperation_t       opA,
                         cusparseOperation_t       opB,
                         const void*               alpha,
                         cusparseConstDnMatDescr_t matA,  // non-const descriptor supported
                         cusparseConstDnMatDescr_t matB,  // non-const descriptor supported
                         const void*               beta,
                         cusparseSpMatDescr_t      matC,
                         cudaDataType              computeType,
                         cusparseSDDMMAlg_t        alg,
                         void*                     externalBuffer)
cusparseStatus_t
cusparseSDDMM(cusparseHandle_t          handle,
              cusparseOperation_t       opA,
              cusparseOperation_t       opB,
              const void*               alpha,
              cusparseConstDnMatDescr_t matA,  // non-const descriptor supported
              cusparseConstDnMatDescr_t matB,  // non-const descriptor supported
              const void*               beta,
              cusparseSpMatDescr_t      matC,
              cudaDataType              computeType,
              cusparseSDDMMAlg_t        alg,
              void*                     externalBuffer)

This function performs the multiplication of matA and matB, followed by an element-wise multiplication with the sparsity pattern of matC. Formally, it performs the following operation:

\(\mathbf{C} = \alpha({op}(\mathbf{A}) \cdot {op}(\mathbf{B})) \circ {spy}(\mathbf{C}) + \beta\mathbf{C}\)

where

  • op(A) is a dense matrix of size \(m \times k\)

  • op(B) is a dense matrix of size \(k \times n\)

  • C is a sparse matrix of size \(m \times n\)

  • \(\alpha\) and \(\beta\) are scalars

  • \(\circ\) denotes the Hadamard (entry-wise) matrix product, and \({spy}\left( \mathbf{C} \right)\) is the structural sparsity pattern matrix of C defined as:

    \({spy}(\mathbf{C})_{ij} = \begin{cases} 1 & {\text{if}\,\mathbf{C}_{ij}\,\text{is an entry stored in}\,\texttt{matC}} \\ 0 & \text{otherwise} \\ \end{cases}.\)

Also, for matrix A and B

\(\text{op}(A) = \begin{cases} A & \text{if op(A) == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if op(A) == CUSPARSE_OPERATION_TRANSPOSE} \\ \end{cases}\)

\(\text{op}(B) = \begin{cases} B & \text{if op(B) == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ B^{T} & \text{if op(B) == CUSPARSE_OPERATION_TRANSPOSE} \\ \end{cases}\)

The function cusparseSDDMM_bufferSize() returns the size of the workspace needed by cusparseSDDMM or cusparseSDDMM_preprocess.

The function cusparseSDDMM_preprocess() can be called before cusparseSDDMM() to speed up the actual computation. It is useful when cusparseSDDMM() is called multiple times with the same sparsity pattern (matC). The values of the dense matrices (matA, matB) can change arbitrarily.

| Param. | Memory | In/out | Meaning |
|--------|--------|--------|---------|
| handle | HOST | IN | Handle to the cuSPARSE library context |
| opA | HOST | IN | Operation op(A) |
| opB | HOST | IN | Operation op(B) |
| alpha | HOST or DEVICE | IN | \(\alpha\) scalar used for multiplication of type computeType |
| matA | HOST | IN | Dense matrix matA |
| matB | HOST | IN | Dense matrix matB |
| beta | HOST or DEVICE | IN | \(\beta\) scalar used for multiplication of type computeType |
| matC | HOST | IN/OUT | Sparse matrix matC |
| computeType | HOST | IN | Datatype in which the computation is executed |
| alg | HOST | IN | Algorithm for the computation |
| bufferSize | HOST | OUT | Number of bytes of workspace needed by cusparseSDDMM |
| externalBuffer | DEVICE | IN | Pointer to a workspace buffer of at least bufferSize bytes |

Currently supported sparse matrix formats:

  • CUSPARSE_FORMAT_CSR

  • CUSPARSE_FORMAT_BSR

cusparseSDDMM() supports the following index types for representing the sparse matrix matC:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

The data type combinations currently supported for cusparseSDDMM are listed below:

Uniform-precision computation:

A/B/C/computeType: CUDA_R_32F, CUDA_R_64F, CUDA_C_32F, CUDA_C_64F

Mixed-precision computation:

| A/B | C | computeType |
|-----|---|-------------|
| CUDA_R_16F | CUDA_R_32F | CUDA_R_32F |
| CUDA_R_16F | CUDA_R_16F | CUDA_R_32F |

cusparseSDDMM for CUSPARSE_FORMAT_BSR also supports the following mixed-precision computation:

| A/B | C | computeType |
|-----|---|-------------|
| CUDA_R_16BF | CUDA_R_32F | CUDA_R_32F |
| CUDA_R_16BF | CUDA_R_16BF | CUDA_R_32F |

NOTE: CUDA_R_16F and CUDA_R_16BF data types always imply mixed-precision computation.

cusparseSDDMM() for CUSPARSE_FORMAT_BSR supports block sizes of 2, 4, 8, 16, 32, 64, and 128.

cusparseSDDMM() supports the following algorithms:

| Algorithm | Notes |
|-----------|-------|
| CUSPARSE_SDDMM_ALG_DEFAULT | Default algorithm. It supports batched computation. |

Performance notes: cusparseSDDMM() for CUSPARSE_FORMAT_CSR provides the best performance when matA and matB satisfy:

  • matA:

    • matA is in row-major order and opA is CUSPARSE_OPERATION_NON_TRANSPOSE, or

    • matA is in col-major order and opA is not CUSPARSE_OPERATION_NON_TRANSPOSE

  • matB:

    • matB is in col-major order and opB is CUSPARSE_OPERATION_NON_TRANSPOSE, or

    • matB is in row-major order and opB is not CUSPARSE_OPERATION_NON_TRANSPOSE

cusparseSDDMM() for CUSPARSE_FORMAT_BSR provides the best performance when matA and matB satisfy:

  • matA:

    • matA is in row-major order and opA is CUSPARSE_OPERATION_NON_TRANSPOSE, or

    • matA is in col-major order and opA is not CUSPARSE_OPERATION_NON_TRANSPOSE

  • matB:

    • matB is in row-major order and opB is CUSPARSE_OPERATION_NON_TRANSPOSE, or

    • matB is in col-major order and opB is not CUSPARSE_OPERATION_NON_TRANSPOSE

cusparseSDDMM() supports the following batch modes:

  • \(C_{i} = (A \cdot B) \circ C_{i}\)

  • \(C_{i} = \left( A_{i} \cdot B \right) \circ C_{i}\)

  • \(C_{i} = \left( A \cdot B_{i} \right) \circ C_{i}\)

  • \(C_{i} = \left( A_{i} \cdot B_{i} \right) \circ C_{i}\)

The number of batches and their strides can be set by using cusparseCsrSetStridedBatch and cusparseDnMatSetStridedBatch. The maximum number of batches for cusparseSDDMM() is 65,535.

cusparseSDDMM() has the following properties:

  • The routine requires no extra storage

  • Provides deterministic (bit-wise) results for each run

  • The routine supports asynchronous execution

  • The routine allows the indices of matC to be unsorted

cusparseSDDMM() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseSDDMM for a code example. For batched computation please visit cusparseSDDMM CSR Batched.
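A minimal CSR sketch (dA, dB are dense device arrays; dC_offsets, dC_columns, dC_values hold the pattern and values of matC; m, k, n, lda, ldb, nnz are placeholders; error checking omitted). The dense orders below follow the performance notes above:

cusparseDnMatDescr_t matA, matB;  // dense inputs
cusparseSpMatDescr_t matC;        // sparse output with a fixed pattern
float  alpha = 1.0f, beta = 0.0f;
size_t bufferSize = 0;
void*  dBuffer    = NULL;

cusparseCreateDnMat(&matA, m, k, lda, dA, CUDA_R_32F, CUSPARSE_ORDER_ROW);
cusparseCreateDnMat(&matB, k, n, ldb, dB, CUDA_R_32F, CUSPARSE_ORDER_COL);
cusparseCreateCsr(&matC, m, n, nnz, dC_offsets, dC_columns, dC_values,
                  CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                  CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);

cusparseSDDMM_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                         CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, matB,
                         &beta, matC, CUDA_R_32F, CUSPARSE_SDDMM_ALG_DEFAULT,
                         &bufferSize);
cudaMalloc(&dBuffer, bufferSize);
// optional: pays off when the sparsity pattern of matC is reused
cusparseSDDMM_preprocess(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                         CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, matB,
                         &beta, matC, CUDA_R_32F, CUSPARSE_SDDMM_ALG_DEFAULT,
                         dBuffer);
cusparseSDDMM(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
              CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, matB,
              &beta, matC, CUDA_R_32F, CUSPARSE_SDDMM_ALG_DEFAULT, dBuffer);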


cusparseSpGEMM()

cusparseStatus_t
cusparseSpGEMM_createDescr(cusparseSpGEMMDescr_t* descr)

cusparseStatus_t
cusparseSpGEMM_destroyDescr(cusparseSpGEMMDescr_t descr)
cusparseStatus_t
cusparseSpGEMM_workEstimation(cusparseHandle_t          handle,
                              cusparseOperation_t       opA,
                              cusparseOperation_t       opB,
                              const void*               alpha,
                              cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                              cusparseConstSpMatDescr_t matB,  // non-const descriptor supported
                              const void*               beta,
                              cusparseSpMatDescr_t      matC,
                              cudaDataType              computeType,
                              cusparseSpGEMMAlg_t       alg,
                              cusparseSpGEMMDescr_t     spgemmDescr,
                              size_t*                   bufferSize1,
                              void*                     externalBuffer1)

cusparseStatus_t
cusparseSpGEMM_getNumProducts(cusparseSpGEMMDescr_t spgemmDescr,
                              int64_t*              num_prods)

cusparseStatus_t
cusparseSpGEMM_estimateMemory(cusparseHandle_t          handle,
                              cusparseOperation_t       opA,
                              cusparseOperation_t       opB,
                              const void*               alpha,
                              cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                              cusparseConstSpMatDescr_t matB,  // non-const descriptor supported
                              const void*               beta,
                              cusparseSpMatDescr_t      matC,
                              cudaDataType              computeType,
                              cusparseSpGEMMAlg_t       alg,
                              cusparseSpGEMMDescr_t     spgemmDescr,
                              float                     chunk_fraction,
                              size_t*                   bufferSize3,
                              void*                     externalBuffer3,
                              size_t*                   bufferSize2)

cusparseStatus_t
cusparseSpGEMM_compute(cusparseHandle_t          handle,
                       cusparseOperation_t       opA,
                       cusparseOperation_t       opB,
                       const void*               alpha,
                       cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                       cusparseConstSpMatDescr_t matB,  // non-const descriptor supported
                       const void*               beta,
                       cusparseSpMatDescr_t      matC,
                       cudaDataType              computeType,
                       cusparseSpGEMMAlg_t       alg,
                       cusparseSpGEMMDescr_t     spgemmDescr,
                       size_t*                   bufferSize2,
                       void*                     externalBuffer2)

cusparseStatus_t
cusparseSpGEMM_copy(cusparseHandle_t          handle,
                    cusparseOperation_t       opA,
                    cusparseOperation_t       opB,
                    const void*               alpha,
                    cusparseConstSpMatDescr_t matA,  // non-const descriptor supported
                    cusparseConstSpMatDescr_t matB,  // non-const descriptor supported
                    const void*               beta,
                    cusparseSpMatDescr_t      matC,
                    cudaDataType              computeType,
                    cusparseSpGEMMAlg_t       alg,
                    cusparseSpGEMMDescr_t     spgemmDescr)

This function performs the multiplication of two sparse matrices matA and matB.

\(\mathbf{C^{\prime}} = \alpha op\left( \mathbf{A} \right) \cdot op\left( \mathbf{B} \right) + \beta\mathbf{C}\)

where \(\alpha,\) \(\beta\) are scalars, and \(\mathbf{C},\) \(\mathbf{C^{\prime}}\) have the same sparsity pattern.

The functions cusparseSpGEMM_workEstimation(), cusparseSpGEMM_estimateMemory(), and cusparseSpGEMM_compute() are used for both determining the buffer size and performing the actual computation.
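A minimal sketch of the default call sequence is shown below. It assumes that handle, opA, opB, alpha, beta, computeType, and the CSR descriptors matA, matB, matC (the latter created with zero non-zeros and NULL array pointers) already exist, and that values use CUDA_R_32F; error checking is omitted.

// Minimal sketch of the default (ALG1) SpGEMM flow; error checks omitted
cusparseSpGEMMDescr_t spgemmDesc;
cusparseSpGEMM_createDescr(&spgemmDesc);

// Step 1: work estimation (query the workspace size, allocate, run)
size_t bufferSize1 = 0;
void*  dBuffer1    = NULL;
cusparseSpGEMM_workEstimation(handle, opA, opB, &alpha, matA, matB, &beta,
                              matC, computeType, CUSPARSE_SPGEMM_DEFAULT,
                              spgemmDesc, &bufferSize1, NULL);
cudaMalloc(&dBuffer1, bufferSize1);
cusparseSpGEMM_workEstimation(handle, opA, opB, &alpha, matA, matB, &beta,
                              matC, computeType, CUSPARSE_SPGEMM_DEFAULT,
                              spgemmDesc, &bufferSize1, dBuffer1);

// Step 2: compute (query the second workspace size, allocate, run)
size_t bufferSize2 = 0;
void*  dBuffer2    = NULL;
cusparseSpGEMM_compute(handle, opA, opB, &alpha, matA, matB, &beta, matC,
                       computeType, CUSPARSE_SPGEMM_DEFAULT, spgemmDesc,
                       &bufferSize2, NULL);
cudaMalloc(&dBuffer2, bufferSize2);
cusparseSpGEMM_compute(handle, opA, opB, &alpha, matA, matB, &beta, matC,
                       computeType, CUSPARSE_SPGEMM_DEFAULT, spgemmDesc,
                       &bufferSize2, dBuffer2);

// Step 3: allocate the output arrays from the computed size and copy out
int64_t rowsC, colsC, nnzC;
cusparseSpMatGetSize(matC, &rowsC, &colsC, &nnzC);
int   *dC_offsets, *dC_columns;
float *dC_values;                                  // assumes CUDA_R_32F data
cudaMalloc((void**)&dC_offsets, (rowsC + 1) * sizeof(int));
cudaMalloc((void**)&dC_columns, nnzC * sizeof(int));
cudaMalloc((void**)&dC_values,  nnzC * sizeof(float));
cusparseCsrSetPointers(matC, dC_offsets, dC_columns, dC_values);
cusparseSpGEMM_copy(handle, opA, opB, &alpha, matA, matB, &beta, matC,
                    computeType, CUSPARSE_SPGEMM_DEFAULT, spgemmDesc);

cusparseSpGEMM_destroyDescr(spgemmDesc);
cudaFree(dBuffer1);
cudaFree(dBuffer2);

The remaining parameters (bufferSize3, externalBuffer3, chunk_fraction) are only exercised by CUSPARSE_SPGEMM_ALG2 and CUSPARSE_SPGEMM_ALG3, described in the algorithm table below.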

Param.

Memory

In/out

Meaning

handle

HOST

IN

Handle to the cuSPARSE library context

opA

HOST

IN

Operation op(A)

opB

HOST

IN

Operation op(B)

alpha

HOST or DEVICE

IN

\(\alpha\) scalar used for multiplication

matA

HOST

IN

Sparse matrix A

matB

HOST

IN

Sparse matrix B

beta

HOST or DEVICE

IN

\(\beta\) scalar used for multiplication

matC

HOST

IN/OUT

Sparse matrix C

computeType

HOST

IN

Enumerator specifying the datatype in which the computation is executed

alg

HOST

IN

Enumerator specifying the algorithm for the computation

spgemmDescr

HOST

IN/OUT

Opaque descriptor for storing internal data used across the three steps

num_prods

HOST

OUT

Pointer to a 64-bit integer that stores the number of intermediate products calculated by cusparseSpGEMM_workEstimation

chunk_fraction

HOST

IN

The fraction of total intermediate products being computed in a chunk. Used by CUSPARSE_SPGEMM_ALG3 only. Value is in range (0,1].

bufferSize1

HOST

IN/OUT

Number of bytes of workspace requested by cusparseSpGEMM_workEstimation

bufferSize2

HOST

IN/OUT

Number of bytes of workspace requested by cusparseSpGEMM_compute

bufferSize3

HOST

IN/OUT

Number of bytes of workspace requested by cusparseSpGEMM_estimateMemory

externalBuffer1

DEVICE

IN

Pointer to workspace buffer needed by cusparseSpGEMM_workEstimation and cusparseSpGEMM_compute

externalBuffer2

DEVICE

IN

Pointer to workspace buffer needed by cusparseSpGEMM_compute and cusparseSpGEMM_copy

externalBuffer3

DEVICE

IN

Pointer to workspace buffer needed by cusparseSpGEMM_estimateMemory

Currently, the function has the following limitations:

  • Only 32-bit indices (CUSPARSE_INDEX_32I) are supported

  • Only the CSR format (CUSPARSE_FORMAT_CSR) is supported

  • Only opA, opB equal to CUSPARSE_OPERATION_NON_TRANSPOSE are supported

The data type combinations currently supported for cusparseSpGEMM are listed below:

Uniform-precision computation:

A/B/C/computeType

CUDA_R_16F [DEPRECATED]

CUDA_R_16BF [DEPRECATED]

CUDA_R_32F

CUDA_R_64F

CUDA_C_16F [DEPRECATED]

CUDA_C_16BF [DEPRECATED]

CUDA_C_32F

CUDA_C_64F

The cusparseSpGEMM() routine supports the following algorithms:

Algorithm

Notes

CUSPARSE_SPGEMM_DEFAULT

Default algorithm. Currently, it is CUSPARSE_SPGEMM_ALG1.

CUSPARSE_SPGEMM_ALG1

Algorithm 1

  • Invokes cusparseSpGEMM_compute twice. The first invocation provides an upper bound on the memory required for the computation.

  • The required memory is generally several times larger than the memory actually used.

  • The user can provide an arbitrary buffer size bufferSize2 in the second invocation. If it is not sufficient, the routine returns the CUSPARSE_STATUS_INSUFFICIENT_RESOURCES status.

  • Provides better performance than other algorithms.

  • Provides deterministic (bit-wise) results for each run.

CUSPARSE_SPGEMM_ALG2

Algorithm 2

  • Invokes cusparseSpGEMM_estimateMemory to get the amount of memory required for the computation.

  • Requires less memory for the computation than Algorithm 1.

  • Performance is lower than Algorithm 1 and higher than Algorithm 3.

  • Provides deterministic (bit-wise) results for each run.

CUSPARSE_SPGEMM_ALG3

Algorithm 3

  • Computes the intermediate products in chunks, one chunk at a time.

  • Invokes cusparseSpGEMM_estimateMemory to get the amount of memory required for the computation.

  • The user can control the amount of required memory by changing the chunk size via chunk_fraction (see the sketch after this table).

  • The chunk size is a fraction of total intermediate products: chunk_fraction * (*num_prods).

  • Provides deterministic (bit-wise) results for each run.
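A minimal sketch of the extra steps that CUSPARSE_SPGEMM_ALG3 inserts between cusparseSpGEMM_workEstimation() and cusparseSpGEMM_compute() follows. It reuses the assumptions of the previous sketch; the NULL-query convention shown for cusparseSpGEMM_estimateMemory() is an assumption modeled on the two-call pattern of the other steps.

// Sketch of the ALG3-specific steps; assumes cusparseSpGEMM_workEstimation()
// has already completed with a valid externalBuffer1. Error checks omitted.
int64_t numProds = 0;
cusparseSpGEMM_getNumProducts(spgemmDesc, &numProds);

float  chunk_fraction = 0.2f;   // each chunk covers 20% of the numProds products
size_t bufferSize3 = 0, bufferSize2 = 0;
void*  dBuffer3 = NULL;

// First call: query the size of the temporary estimation buffer
cusparseSpGEMM_estimateMemory(handle, opA, opB, &alpha, matA, matB, &beta,
                              matC, computeType, CUSPARSE_SPGEMM_ALG3,
                              spgemmDesc, chunk_fraction, &bufferSize3,
                              NULL, NULL);
cudaMalloc(&dBuffer3, bufferSize3);
// Second call: fills bufferSize2, the workspace for cusparseSpGEMM_compute
cusparseSpGEMM_estimateMemory(handle, opA, opB, &alpha, matA, matB, &beta,
                              matC, computeType, CUSPARSE_SPGEMM_ALG3,
                              spgemmDesc, chunk_fraction, &bufferSize3,
                              dBuffer3, &bufferSize2);
cudaFree(dBuffer3);             // no longer needed after the estimation
// ... continue with cusparseSpGEMM_compute(), cusparseSpMatGetSize(), and
//     cusparseSpGEMM_copy() as in the default flow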

cusparseSpGEMM() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine allows the indices of matA and matB to be unsorted

  • The routine guarantees the indices of matC to be sorted

cusparseSpGEMM() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseSpGEMM for a code example for CUSPARSE_SPGEMM_DEFAULT and CUSPARSE_SPGEMM_ALG1, and cuSPARSE Library Samples - memory-optimized cusparseSpGEMM for a code example for CUSPARSE_SPGEMM_ALG2 and CUSPARSE_SPGEMM_ALG3.


cusparseSpGEMMreuse()

cusparseStatus_t
cusparseSpGEMM_createDescr(cusparseSpGEMMDescr_t* descr)

cusparseStatus_t
cusparseSpGEMM_destroyDescr(cusparseSpGEMMDescr_t descr)

cusparseStatus_t
cusparseSpGEMMreuse_workEstimation(cusparseHandle_t      handle,
                                   cusparseOperation_t   opA,
                                   cusparseOperation_t   opB,
                                   cusparseSpMatDescr_t  matA,  // non-const descriptor supported
                                   cusparseSpMatDescr_t  matB,  // non-const descriptor supported
                                   cusparseSpMatDescr_t  matC,
                                   cusparseSpGEMMAlg_t   alg,
                                   cusparseSpGEMMDescr_t spgemmDescr,
                                   size_t*               bufferSize1,
                                   void*                 externalBuffer1)

cusparseStatus_t
cusparseSpGEMMreuse_nnz(cusparseHandle_t      handle,
                        cusparseOperation_t   opA,
                        cusparseOperation_t   opB,
                        cusparseSpMatDescr_t  matA,  // non-const descriptor supported
                        cusparseSpMatDescr_t  matB,  // non-const descriptor supported
                        cusparseSpMatDescr_t  matC,
                        cusparseSpGEMMAlg_t   alg,
                        cusparseSpGEMMDescr_t spgemmDescr,
                        size_t*               bufferSize2,
                        void*                 externalBuffer2,
                        size_t*               bufferSize3,
                        void*                 externalBuffer3,
                        size_t*               bufferSize4,
                        void*                 externalBuffer4)

cusparseStatus_t
cusparseSpGEMMreuse_copy(cusparseHandle_t      handle,
                         cusparseOperation_t   opA,
                         cusparseOperation_t   opB,
                         cusparseSpMatDescr_t  matA,  // non-const descriptor supported
                         cusparseSpMatDescr_t  matB,  // non-const descriptor supported
                         cusparseSpMatDescr_t  matC,
                         cusparseSpGEMMAlg_t   alg,
                         cusparseSpGEMMDescr_t spgemmDescr,
                         size_t*               bufferSize5,
                         void*                 externalBuffer5)

cusparseStatus_t
cusparseSpGEMMreuse_compute(cusparseHandle_t      handle,
                            cusparseOperation_t   opA,
                            cusparseOperation_t   opB,
                            const void*           alpha,
                            cusparseSpMatDescr_t  matA,  // non-const descriptor supported
                            cusparseSpMatDescr_t  matB,  // non-const descriptor supported
                            const void*           beta,
                            cusparseSpMatDescr_t  matC,
                            cudaDataType          computeType,
                            cusparseSpGEMMAlg_t   alg,
                            cusparseSpGEMMDescr_t spgemmDescr)

This function performs the multiplication of two sparse matrices matA and matB where the structure of the output matrix matC can be reused for multiple computations with different values.

\(\mathbf{C^{\prime}} = \alpha op\left( \mathbf{A} \right) \cdot op\left( \mathbf{B} \right) + \beta\mathbf{C}\)

where \(\alpha\) and \(\beta\) are scalars.

The functions cusparseSpGEMMreuse_workEstimation(), cusparseSpGEMMreuse_nnz(), and cusparseSpGEMMreuse_copy() are used for determining the buffer size and performing the actual computation.

Note: the output CSR matrix matC of cusparseSpGEMMreuse() is sorted by column indices.

MEMORY REQUIREMENT: cusparseSpGEMMreuse requires keeping all intermediate products in memory in order to reuse the structure of the output matrix. In general, however, the number of intermediate products is orders of magnitude higher than the number of non-zero entries. To minimize the memory requirement, the routine uses multiple buffers that can be deallocated once they are no longer needed. If the number of intermediate products exceeds 2^31 - 1, the routine returns the CUSPARSE_STATUS_INSUFFICIENT_RESOURCES status.
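A condensed sketch of the whole sequence is shown below, under the same assumptions as the cusparseSpGEMM() sketch above (existing handle, operations, scalars, computeType, and CSR descriptors with CUDA_R_32F values). The points at which buffers are released are indicative only; see the linked sample for the exact buffer lifetimes.

// Sketch of the SpGEMMreuse flow; error checks omitted. Each step is issued
// twice: first with a NULL buffer to query its size, then to execute.
cusparseSpGEMMDescr_t spgemmDesc;
cusparseSpGEMM_createDescr(&spgemmDesc);

size_t bs1 = 0, bs2 = 0, bs3 = 0, bs4 = 0, bs5 = 0;
void *dB1 = NULL, *dB2 = NULL, *dB3 = NULL, *dB4 = NULL, *dB5 = NULL;

// Structure phase (run once)
cusparseSpGEMMreuse_workEstimation(handle, opA, opB, matA, matB, matC,
                                   CUSPARSE_SPGEMM_DEFAULT, spgemmDesc,
                                   &bs1, NULL);
cudaMalloc(&dB1, bs1);
cusparseSpGEMMreuse_workEstimation(handle, opA, opB, matA, matB, matC,
                                   CUSPARSE_SPGEMM_DEFAULT, spgemmDesc,
                                   &bs1, dB1);

cusparseSpGEMMreuse_nnz(handle, opA, opB, matA, matB, matC,
                        CUSPARSE_SPGEMM_DEFAULT, spgemmDesc,
                        &bs2, NULL, &bs3, NULL, &bs4, NULL);
cudaMalloc(&dB2, bs2); cudaMalloc(&dB3, bs3); cudaMalloc(&dB4, bs4);
cusparseSpGEMMreuse_nnz(handle, opA, opB, matA, matB, matC,
                        CUSPARSE_SPGEMM_DEFAULT, spgemmDesc,
                        &bs2, dB2, &bs3, dB3, &bs4, dB4);
cudaFree(dB1); cudaFree(dB2);   // indicative release point

// Allocate the output arrays once the structure of matC is known
int64_t rowsC, colsC, nnzC;
cusparseSpMatGetSize(matC, &rowsC, &colsC, &nnzC);
int *dC_offsets, *dC_columns;
float *dC_values;
cudaMalloc((void**)&dC_offsets, (rowsC + 1) * sizeof(int));
cudaMalloc((void**)&dC_columns, nnzC * sizeof(int));
cudaMalloc((void**)&dC_values,  nnzC * sizeof(float));
cusparseCsrSetPointers(matC, dC_offsets, dC_columns, dC_values);

cusparseSpGEMMreuse_copy(handle, opA, opB, matA, matB, matC,
                         CUSPARSE_SPGEMM_DEFAULT, spgemmDesc, &bs5, NULL);
cudaMalloc(&dB5, bs5);
cusparseSpGEMMreuse_copy(handle, opA, opB, matA, matB, matC,
                         CUSPARSE_SPGEMM_DEFAULT, spgemmDesc, &bs5, dB5);
cudaFree(dB3);                  // indicative release point

// Numeric phase: repeat for each new set of A/B values that keeps the
// same sparsity pattern
cusparseSpGEMMreuse_compute(handle, opA, opB, &alpha, matA, matB, &beta,
                            matC, computeType, CUSPARSE_SPGEMM_DEFAULT,
                            spgemmDesc);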

Currently, the function has the following limitations:

  • Only 32-bit indices (CUSPARSE_INDEX_32I) are supported

  • Only the CSR format (CUSPARSE_FORMAT_CSR) is supported

  • Only opA, opB equal to CUSPARSE_OPERATION_NON_TRANSPOSE are supported

The data type combinations currently supported for cusparseSpGEMMreuse are listed below.

Uniform-precision computation:

A/B/C/computeType

CUDA_R_32F

CUDA_R_64F

CUDA_C_16F [DEPRECATED]

CUDA_C_16BF [DEPRECATED]

CUDA_C_32F

CUDA_C_64F

Mixed-precision computation: [DEPRECATED]

A/B

C

computeType

CUDA_R_16F

CUDA_R_16F

CUDA_R_32F

CUDA_R_16BF

CUDA_R_16BF

CUDA_R_32F

The cusparseSpGEMMreuse() routine supports the following algorithms:

Algorithm

Notes

CUSPARSE_SPGEMM_DEFAULT / CUSPARSE_SPGEMM_CSR_ALG_NONDETERMINITIC

Default algorithm. Provides deterministic (bit-wise) structure for the output matrix for each run, while value computation is not deterministic.

CUSPARSE_SPGEMM_CSR_ALG_DETERMINITIC

Provides deterministic (bit-wise) structure for the output matrix and value computation for each run.

cusparseSpGEMMreuse() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine allows the indices of matA and matB to be unsorted

  • The routine guarantees the indices of matC to be sorted

cusparseSpGEMMreuse() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

Refer to cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseSpGEMMreuse for a code example.


cusparseSparseToDense()

cusparseStatus_t
cusparseSparseToDense_bufferSize(cusparseHandle_t           handle,
                                 cusparseConstSpMatDescr_t  matA,  // non-const descriptor supported
                                 cusparseDnMatDescr_t       matB,
                                 cusparseSparseToDenseAlg_t alg,
                                 size_t*                    bufferSize)

cusparseStatus_t
cusparseSparseToDense(cusparseHandle_t           handle,
                      cusparseConstSpMatDescr_t  matA,  // non-const descriptor supported
                      cusparseDnMatDescr_t       matB,
                      cusparseSparseToDenseAlg_t alg,
                      void*                      buffer)

The function converts the sparse matrix matA in CSR, CSC, or COO format into its dense representation matB. Blocked-ELL is not currently supported.

The function cusparseSparseToDense_bufferSize() returns the size of the workspace needed by cusparseSparseToDense().
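A minimal sketch of the two calls is shown below, assuming handle, matA (sparse), and matB (dense, with its values buffer already allocated) exist; error checking is omitted.

// Sketch: convert an existing sparse matrix to dense; error checks omitted
size_t bufferSize = 0;
void*  dBuffer    = NULL;
cusparseSparseToDense_bufferSize(handle, matA, matB,
                                 CUSPARSE_SPARSETODENSE_ALG_DEFAULT,
                                 &bufferSize);
cudaMalloc(&dBuffer, bufferSize);
cusparseSparseToDense(handle, matA, matB,
                      CUSPARSE_SPARSETODENSE_ALG_DEFAULT, dBuffer);
cudaFree(dBuffer);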

Param.

Memory

In/out

Meaning

handle

HOST

IN

Handle to the cuSPARSE library context

matA

HOST

IN

Sparse matrix A

matB

HOST

OUT

Dense matrix B

alg

HOST

IN

Algorithm for the computation

bufferSize

HOST

OUT

Number of bytes of workspace needed by cusparseSparseToDense()

buffer

DEVICE

IN

Pointer to workspace buffer

cusparseSparseToDense() supports the following index type for representing the sparse matrix matA:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

cusparseSparseToDense() supports the following data types:

A/B

CUDA_R_8I

CUDA_R_16F

CUDA_R_16BF

CUDA_R_32F

CUDA_R_64F

CUDA_C_16F [DEPRECATED]

CUDA_C_16BF [DEPRECATED]

CUDA_C_32F

CUDA_C_64F

cusparseSparseToDense() supports the following algorithm:

Algorithm

Notes

CUSPARSE_SPARSETODENSE_ALG_DEFAULT

Default algorithm

cusparseSparseToDense() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • Provides deterministic (bit-wise) results for each run

  • The routine allows the indices of matA to be unsorted

cusparseSparseToDense() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseSparseToDense for a code example.


cusparseDenseToSparse()

cusparseStatus_t
cusparseDenseToSparse_bufferSize(cusparseHandle_t           handle,
                                 cusparseConstDnMatDescr_t  matA,  // non-const descriptor supported
                                 cusparseSpMatDescr_t       matB,
                                 cusparseDenseToSparseAlg_t alg,
                                 size_t*                    bufferSize)

cusparseStatus_t
cusparseDenseToSparse_analysis(cusparseHandle_t           handle,
                               cusparseConstDnMatDescr_t  matA,  // non-const descriptor supported
                               cusparseSpMatDescr_t       matB,
                               cusparseDenseToSparseAlg_t alg,
                               void*                      buffer)

cusparseStatus_t
cusparseDenseToSparse_convert(cusparseHandle_t           handle,
                              cusparseConstDnMatDescr_t  matA,  // non-const descriptor supported
                              cusparseSpMatDescr_t       matB,
                              cusparseDenseToSparseAlg_t alg,
                              void*                      buffer)

The function converts the dense matrix matA into a sparse matrix matB in CSR, CSC, COO, or Blocked-ELL format.

The function cusparseDenseToSparse_bufferSize() returns the size of the workspace needed by cusparseDenseToSparse_analysis().

The function cusparseDenseToSparse_analysis() updates the number of non-zero elements in the sparse matrix descriptor matB. The user is responsible for allocating the memory required by the sparse matrix:

  • Row indices and values arrays for CSC; column indices and values arrays for CSR

  • Row, column, value arrays for COO

  • Column (ellColInd), value (ellValue) arrays for Blocked-ELL

Finally, cusparseDenseToSparse_convert() is called to fill the arrays allocated in the previous step, as sketched below.
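A minimal sketch of the three steps for a CSR output, assuming handle, matA (dense), and matB (a CSR descriptor created with zero non-zeros) already exist and that values use CUDA_R_32F; error checking is omitted.

// Sketch of the three-step dense-to-CSR conversion; error checks omitted
size_t bufferSize = 0;
void*  dBuffer    = NULL;
cusparseDenseToSparse_bufferSize(handle, matA, matB,
                                 CUSPARSE_DENSETOSPARSE_ALG_DEFAULT,
                                 &bufferSize);
cudaMalloc(&dBuffer, bufferSize);

// Step 1: compute the number of non-zero elements of matB
cusparseDenseToSparse_analysis(handle, matA, matB,
                               CUSPARSE_DENSETOSPARSE_ALG_DEFAULT, dBuffer);
int64_t rowsB, colsB, nnzB;
cusparseSpMatGetSize(matB, &rowsB, &colsB, &nnzB);

// Step 2: allocate the CSR arrays and attach them to the descriptor
int *dB_offsets, *dB_columns;
float *dB_values;
cudaMalloc((void**)&dB_offsets, (rowsB + 1) * sizeof(int));
cudaMalloc((void**)&dB_columns, nnzB * sizeof(int));
cudaMalloc((void**)&dB_values,  nnzB * sizeof(float));
cusparseCsrSetPointers(matB, dB_offsets, dB_columns, dB_values);

// Step 3: fill the arrays
cusparseDenseToSparse_convert(handle, matA, matB,
                              CUSPARSE_DENSETOSPARSE_ALG_DEFAULT, dBuffer);
cudaFree(dBuffer);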

Param.

Memory

In/out

Meaning

handle

HOST

IN

Handle to the cuSPARSE library context

matA

HOST

IN

Dense matrix A

matB

HOST

OUT

Sparse matrix B

alg

HOST

IN

Algorithm for the computation

bufferSize

HOST

OUT

Number of bytes of workspace needed by cusparseDenseToSparse_analysis()

buffer

DEVICE

IN

Pointer to workspace buffer

cusparseDenseToSparse() supports the following index type for representing the sparse matrix matB:

  • 32-bit indices (CUSPARSE_INDEX_32I)

  • 64-bit indices (CUSPARSE_INDEX_64I)

cusparseDenseToSparse() supports the following data types:

A/B

CUDA_R_8I

CUDA_R_16F

CUDA_R_16BF

CUDA_R_32F

CUDA_R_64F

CUDA_C_16F [DEPRECATED]

CUDA_C_16BF [DEPRECATED]

CUDA_C_32F

CUDA_C_64F

cusparseDenseToSparse() supports the following algorithm:

Algorithm

Notes

CUSPARSE_DENSETOSPARSE_ALG_DEFAULT

Default algorithm

cusparseDenseToSparse() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • Provides deterministic (bit-wise) results for each run

  • The routine does not guarantee the indices of matB to be sorted

cusparseDenseToSparse() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.

Please visit cuSPARSE Library Samples - cusparseDenseToSparse (CSR) and cuSPARSE Library Samples - cusparseDenseToSparse (Blocked-ELL) for code examples.