Generic API Functions#

nvpl_sparse_spmv()#

nvpl_sparse_status_t
nvpl_sparse_spmv_create_descr(nvpl_sparse_spmv_descr_t* descr)

nvpl_sparse_status_t
nvpl_sparse_spmv_destroy_descr(nvpl_sparse_spmv_descr_t descr)

nvpl_sparse_status_t
nvpl_sparse_spmv_buffer_size(nvpl_sparse_handle_t        handle,
                     nvpl_sparse_operation_t             op_A,
                     const void*                         alpha,
                     nvpl_sparse_const_sp_mat_descr_t    mat_A,
                     nvpl_sparse_const_dn_vec_descr_t    vec_X,
                     const void*                         beta,
                     nvpl_sparse_dn_vec_descr_t          vec_Y,
                     nvpl_sparse_dn_vec_descr_t          vec_Z,
                     nvpl_sparse_data_type_t             compute_type,
                     nvpl_sparse_spmv_alg_t              alg,
                     nvpl_sparse_spmv_descr_t            spmv_descr,
                     size_t*                             buffer_size)

nvpl_sparse_status_t
nvpl_sparse_spmv_analysis(nvpl_sparse_handle_t           handle,
                     nvpl_sparse_operation_t             op_A,
                     const void*                         alpha,
                     nvpl_sparse_const_sp_mat_descr_t    mat_A,
                     nvpl_sparse_const_dn_vec_descr_t    vec_X,
                     const void*                         beta,
                     nvpl_sparse_dn_vec_descr_t          vec_Y,
                     nvpl_sparse_dn_vec_descr_t          vec_Z,
                     nvpl_sparse_data_type_t             compute_type,
                     nvpl_sparse_spmv_alg_t              alg,
                     nvpl_sparse_spmv_descr_t            spmv_descr,
                     void*                               external_buffer)

nvpl_sparse_status_t
nvpl_sparse_spmv(nvpl_sparse_handle_t        handle,
          nvpl_sparse_operation_t            op_A,
          const void*                        alpha,
          nvpl_sparse_const_sp_mat_descr_t   mat_A,
          nvpl_sparse_const_dn_vec_descr_t   vec_X,
          const void*                        beta,
          nvpl_sparse_dn_vec_descr_t         vec_Y,
          nvpl_sparse_dn_vec_descr_t         vec_Z,
          nvpl_sparse_data_type_t            compute_type,
          nvpl_sparse_spmv_alg_t             alg,
          nvpl_sparse_spmv_descr_t           spmv_descr)

This function multiplies a sparse matrix mat_A by a dense vector vec_X and adds a scaled dense vector vec_Y:

\(\mathbf{Z} = \alpha op\left( \mathbf{A} \right) \cdot \mathbf{X} + \beta\mathbf{Y}\)

where

op(A) is a sparse matrix of size \(m \times k\)
X is a dense vector of size \(k\)
Y is a dense vector of size \(m\)
Z is a dense vector of size \(m\)
\(\alpha\) and \(\beta\) are scalars

Also, for matrix A

\(\text{op}(A) = \begin{cases} A & \text{if}\; op = {\small{\texttt{NVPL_SPARSE_OPERATION_NON_TRANSPOSE}}} \\ A^{T} & \text{if}\; op = {\small{\texttt{NVPL_SPARSE_OPERATION_TRANSPOSE}}} \\ A^{H} & \text{if}\; op = {\small{\texttt{NVPL_SPARSE_OPERATION_CONJUGATE_TRANSPOSE}}} \\ \end{cases}\)
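The formula above can be sketched as a plain C reference loop over a CSR matrix. This is an illustration of the arithmetic only, not the library's implementation: `ref_op_t`, `ref_csr_spmv`, and the raw-array CSR layout (zero-based, real double precision) are hypothetical names and assumptions introduced here. Such a routine can be handy for checking results on small inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two real-valued cases of op(A). */
typedef enum { REF_OP_NON_TRANSPOSE, REF_OP_TRANSPOSE } ref_op_t;

/* Reference Z = alpha * op(A) * X + beta * Y for a CSR matrix A
 * (rows x cols, zero-based indices). Z may alias Y, but not X. */
void ref_csr_spmv(ref_op_t op, int rows, int cols,
                  const int *row_offsets, const int *col_indices,
                  const double *values, double alpha,
                  const double *x, double beta,
                  const double *y, double *z)
{
    assert(rows >= 0 && cols >= 0);

    /* op(A) is m x k: m equals rows for A and cols for A^T. */
    int m = (op == REF_OP_NON_TRANSPOSE) ? rows : cols;

    /* Start from beta * Y. Each y[i] is read before z[i] is written,
     * so passing the same array for Y and Z is safe. */
    for (int i = 0; i < m; ++i)
        z[i] = beta * y[i];

    /* Accumulate alpha * op(A) * X one stored entry a(r, c) at a time. */
    for (int r = 0; r < rows; ++r) {
        for (int p = row_offsets[r]; p < row_offsets[r + 1]; ++p) {
            int c = col_indices[p];
            if (op == REF_OP_NON_TRANSPOSE)
                z[r] += alpha * values[p] * x[c];
            else
                z[c] += alpha * values[p] * x[r];
        }
    }
}
```

For the 2 x 3 matrix with rows (1, 0, 2) and (0, 3, 0), X = (1, 1, 1), Y = (10, 20), alpha = 2 and beta = 1, the non-transposed call yields Z = (16, 26).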

Routine usage#

To use this routine, you should:

Create a descriptor using nvpl_sparse_spmv_create_descr(). The opaque data structure spmv_descr is used to share information among all functions.
Call nvpl_sparse_spmv_buffer_size() to get the size of the workspace needed by nvpl_sparse_spmv_analysis().
Allocate a workspace buffer of at least buffer_size bytes. The buffer must remain valid until the execution (nvpl_sparse_spmv()) is complete, and should not be modified between the analysis and execution steps.
Call nvpl_sparse_spmv_analysis() to perform the analysis.
Call nvpl_sparse_spmv() to perform the multiplication. This step can be performed multiple times with different dense vectors vec_X and vec_Y and output vector vec_Z.
Destroy the descriptor using nvpl_sparse_spmv_destroy_descr().

The analysis step is optional: nvpl_sparse_spmv() can be called directly after creating the descriptor. Skipping the analysis might degrade performance, so it is recommended to keep it.

Note

If nvpl_sparse_spmv_analysis() has been called, the operation type (op_A) and the number of OpenMP threads used during analysis must match exactly those used when nvpl_sparse_spmv() is later called with the same descriptor. Calling nvpl_sparse_spmv() with a different op_A or a different thread count than was used during analysis returns NVPL_SPARSE_STATUS_INVALID_VALUE.

Parameters#

Param.	In/out	Meaning
`handle`	IN	Handle to the NVPL Sparse library context
`op_A`	IN	Operation `op(A)`
`alpha`	IN	\(\alpha\) scalar used for multiplication of type `compute_type`
`mat_A`	IN	Sparse matrix `A`
`vec_X`	IN	Dense vector `X`
`beta`	IN	\(\beta\) scalar used for multiplication of type `compute_type`
`vec_Y`	IN	Dense vector `Y`.
`vec_Z`	OUT	Dense vector `Z`. May alias `vec_Y`.
`compute_type`	IN	Datatype in which the computation is executed
`alg`	IN	Algorithm for the computation
`buffer_size`	OUT	Number of bytes of workspace needed by `nvpl_sparse_spmv_analysis()` and `nvpl_sparse_spmv()`
`external_buffer`	IN/OUT	Pointer to a workspace buffer of at least `buffer_size` bytes used by `nvpl_sparse_spmv_analysis()` and `nvpl_sparse_spmv()`
`spmv_descr`	IN/OUT	Opaque descriptor for storing internal data used across the three steps

Supported types and formats#

The sparse matrix formats currently supported are listed below:

NVPL_SPARSE_FORMAT_COO
NVPL_SPARSE_FORMAT_CSR
NVPL_SPARSE_FORMAT_CSC
NVPL_SPARSE_FORMAT_SLICED_ELL

nvpl_sparse_spmv() supports the following index types for representing the sparse matrix mat_A:

32-bit indices (NVPL_SPARSE_INDEX_32I)
64-bit indices (NVPL_SPARSE_INDEX_64I)

nvpl_sparse_spmv() supports the following data types:

Uniform-precision computation:

`A`/`X`/`Y`/`Z`/`compute_type`
`NVPL_SPARSE_R_32F`
`NVPL_SPARSE_R_64F`
`NVPL_SPARSE_C_32F`
`NVPL_SPARSE_C_64F`

nvpl_sparse_spmv() supports the following algorithms:

Algorithm	Notes
`NVPL_SPARSE_SPMV_ALG_DEFAULT`	Default algorithm for any sparse matrix format.
`NVPL_SPARSE_SPMV_COO_ALG1`	Default algorithm for COO sparse matrix format. May produce slightly different results during different runs with the same input parameters.
`NVPL_SPARSE_SPMV_CSR_ALG1`	Algorithm for CSR/CSC sparse matrix format with fast analysis step. Provides deterministic (bit-wise) results for each run.
`NVPL_SPARSE_SPMV_CSR_ALG2`	Algorithm for CSR/CSC sparse matrix format with more extensive analysis step. Can yield better performance than `NVPL_SPARSE_SPMV_CSR_ALG1` at the cost of increased analysis time. Provides deterministic (bit-wise) results for each run.
`NVPL_SPARSE_SPMV_SELL_ALG1`	Default algorithm for Sliced Ellpack sparse matrix format. Provides deterministic (bit-wise) results for each run.
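The table distinguishes algorithms with bit-wise deterministic results from one whose results may vary slightly between runs. The source does not state the cause. A common reason in parallel reductions generally, offered here as an assumption, is that floating-point addition is not associative, so the order in which partial sums are combined can change the rounded result. The minimal sketch below (hypothetical helper names) sums the same three values in two orders.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulate v[0..n-1] from the first element to the last. */
double sum_forward(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Accumulate the same values from the last element to the first. */
double sum_backward(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double s = 0.0;
    for (size_t i = n; i > 0; --i)
        s += v[i - 1];
    return s;
}
```

With the input (1e30, -1e30, 1.0), the forward sum is 1.0 because the large terms cancel first, while the backward sum is 0.0 because 1.0 is absorbed when added to -1e30.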

Performance#

NVPL_SPARSE_SPMV_CSR_ALG1 provides higher performance than NVPL_SPARSE_SPMV_COO_ALG1.
If the analysis step is omitted, the performance might be degraded.
nvpl_sparse_spmv_create_descr() allocates a small amount of memory for the descriptor but does not perform any expensive computations.

Notes#

The routine allows the CSR column indices (or CSC row indices) of mat_A to be unsorted.

See nvpl_sparse_status_t for the description of the return status.

nvpl_sparse_spsv()#

nvpl_sparse_status_t
nvpl_sparse_spsv_create_descr(nvpl_sparse_spsv_descr_t* descr)

nvpl_sparse_status_t
nvpl_sparse_spsv_destroy_descr(nvpl_sparse_spsv_descr_t descr)

nvpl_sparse_status_t
nvpl_sparse_spsv_buffer_size(nvpl_sparse_handle_t        handle,
                     nvpl_sparse_operation_t             op_A,
                     const void*                         alpha,
                     nvpl_sparse_const_sp_mat_descr_t    mat_A,
                     nvpl_sparse_const_dn_vec_descr_t    vec_X,
                     nvpl_sparse_dn_vec_descr_t          vec_Y,
                     nvpl_sparse_data_type_t             compute_type,
                     nvpl_sparse_spsv_alg_t              alg,
                     nvpl_sparse_spsv_descr_t            spsv_descr,
                     size_t*                             buffer_size)

nvpl_sparse_status_t
nvpl_sparse_spsv_analysis(nvpl_sparse_handle_t           handle,
                   nvpl_sparse_operation_t               op_A,
                   const void*                           alpha,
                   nvpl_sparse_const_sp_mat_descr_t      mat_A,
                   nvpl_sparse_const_dn_vec_descr_t      vec_X,
                   nvpl_sparse_dn_vec_descr_t            vec_Y,
                   nvpl_sparse_data_type_t               compute_type,
                   nvpl_sparse_spsv_alg_t                alg,
                   nvpl_sparse_spsv_descr_t              spsv_descr,
                   void*                                 external_buffer)

nvpl_sparse_status_t
nvpl_sparse_spsv_solve(nvpl_sparse_handle_t              handle,
                nvpl_sparse_operation_t                  op_A,
                const void*                              alpha,
                nvpl_sparse_const_sp_mat_descr_t         mat_A,
                nvpl_sparse_const_dn_vec_descr_t         vec_X,
                nvpl_sparse_dn_vec_descr_t               vec_Y,
                nvpl_sparse_data_type_t                  compute_type,
                nvpl_sparse_spsv_alg_t                   alg,
                nvpl_sparse_spsv_descr_t                 spsv_descr)

The function solves a system of linear equations whose coefficients are represented in a sparse triangular matrix:

\(op\left( \mathbf{A} \right) \cdot \mathbf{Y} = \alpha\mathbf{X}\)

where

op(A) is a sparse square matrix of size \(m \times m\)
X is a dense vector of size \(m\)
Y is a dense vector of size \(m\)
\(\alpha\) is a scalar

Also, for matrix A

\(\text{op}(A) = \begin{cases} A & \text{if}\; op = {\small{\texttt{NVPL_SPARSE_OPERATION_NON_TRANSPOSE}}} \\ A^{T} & \text{if}\; op = {\small{\texttt{NVPL_SPARSE_OPERATION_TRANSPOSE}}} \\ A^{H} & \text{if}\; op = {\small{\texttt{NVPL_SPARSE_OPERATION_CONJUGATE_TRANSPOSE}}} \\ \end{cases}\)
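For the case op(A) = A with a lower-triangular matrix and a non-unit diagonal, the system above reduces to forward substitution. The following C sketch shows that arithmetic on raw CSR arrays. It is a reference illustration under stated assumptions (zero-based indices, real double precision, hypothetical function name `ref_csr_lower_solve`), not the library's algorithm.

```c
#include <assert.h>

/* Reference forward substitution: solves A * Y = alpha * X where A is an
 * m x m CSR matrix treated as lower triangular with a non-unit diagonal.
 * Entries above the diagonal are ignored. Column indices within a row may
 * be in any order. Y may alias X (in-place solve).
 * Returns 0 on success, -1 if a diagonal entry is missing or zero. */
int ref_csr_lower_solve(int m, const int *row_offsets, const int *col_indices,
                        const double *values, double alpha,
                        const double *x, double *y)
{
    assert(m >= 0);
    for (int i = 0; i < m; ++i) {
        double sum = alpha * x[i];   /* right-hand side for row i */
        double diag = 0.0;
        for (int p = row_offsets[i]; p < row_offsets[i + 1]; ++p) {
            int j = col_indices[p];
            if (j < i)
                sum -= values[p] * y[j];   /* y[j] is already solved */
            else if (j == i)
                diag = values[p];
        }
        if (diag == 0.0)
            return -1;
        y[i] = sum / diag;
    }
    return 0;
}
```

For A with rows (2, 0) and (1, 4), X = (2, 9) and alpha = 2, the solution is Y = (2, 4).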

Routine usage#

To use this routine, you should:

Create a descriptor using nvpl_sparse_spsv_create_descr(). The opaque data structure spsv_descr is used to share information among all functions.
Call nvpl_sparse_spsv_buffer_size() to get the size of the workspace needed by nvpl_sparse_spsv_analysis().
Allocate a workspace buffer of at least buffer_size bytes. The buffer must remain valid until the execution (nvpl_sparse_spsv_solve()) is complete, and should not be modified between the analysis and execution steps.
Call nvpl_sparse_spsv_analysis() to perform the analysis.
Call nvpl_sparse_spsv_solve() to execute the solve phase. This step can be performed multiple times with different right-hand-side vectors.
Destroy the descriptor using nvpl_sparse_spsv_destroy_descr().

The analysis step is mandatory for this routine and cannot be omitted.

The function nvpl_sparse_spsv_update_matrix() can be used to update spsv_descr with new matrix values.

All parameters must be consistent across nvpl_sparse_spsv API calls and the matrix descriptors.

Parameters#

Param.	In/out	Meaning
`handle`	IN	Handle to the NVPL Sparse library context
`op_A`	IN	Operation `op(A)`
`alpha`	IN	\(\alpha\) scalar used for multiplication of type `compute_type`
`mat_A`	IN	Sparse matrix `A`
`vec_X`	IN	Dense vector `X`.
`vec_Y`	IN/OUT	Dense vector `Y`. May alias `vec_X`.
`compute_type`	IN	Datatype in which the computation is executed
`alg`	IN	Algorithm for the computation
`buffer_size`	OUT	Number of bytes of workspace needed by `nvpl_sparse_spsv_analysis()` and `nvpl_sparse_spsv_solve()`
`external_buffer`	IN/OUT	Pointer to a workspace buffer of at least `buffer_size` bytes. It is used by `nvpl_sparse_spsv_analysis()` and `nvpl_sparse_spsv_solve()`
`spsv_descr`	IN/OUT	Opaque descriptor for storing internal data used across the three steps

Matrix update#

nvpl_sparse_status_t
nvpl_sparse_spsv_update_matrix(nvpl_sparse_handle_t       handle,
                        nvpl_sparse_spsv_descr_t          spsv_descr,
                        void*                             new_values,
                        nvpl_sparse_spsv_update_t         update_part)

nvpl_sparse_spsv_update_matrix() updates the sparse matrix after the analysis phase has been called. This function supports the following update strategies (update_part):

Strategy	Notes
`NVPL_SPARSE_SPSV_UPDATE_GENERAL`	Updates the sparse matrix values with values of `new_values` array
`NVPL_SPARSE_SPSV_UPDATE_DIAGONAL`	Updates the diagonal part of the matrix with diagonal values stored in `new_values` array. That is, `new_values` has the new diagonal values only
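To make the difference between the two strategies concrete, the sketch below applies each kind of update to a raw CSR values array. It is only an illustration of what `new_values` is expected to contain: the enum, the function name, and the assumption that a diagonal update supplies one value per row in row order are hypothetical and are not taken from the library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the two update strategies. */
typedef enum { REF_UPDATE_GENERAL, REF_UPDATE_DIAGONAL } ref_update_t;

/* Overwrites CSR values in place and returns the number of entries written.
 * GENERAL:  new_values holds nnz entries in the same order as values.
 * DIAGONAL: new_values holds m entries, one per row (assumed layout). */
int ref_csr_update_values(int m, int nnz,
                          const int *row_offsets, const int *col_indices,
                          double *values, const double *new_values,
                          ref_update_t part)
{
    assert(m >= 0 && nnz >= 0);
    if (part == REF_UPDATE_GENERAL) {
        memcpy(values, new_values, (size_t)nnz * sizeof(double));
        return nnz;
    }
    int written = 0;
    for (int i = 0; i < m; ++i) {
        for (int p = row_offsets[i]; p < row_offsets[i + 1]; ++p) {
            if (col_indices[p] == i) {   /* stored diagonal entry of row i */
                values[p] = new_values[i];
                ++written;
            }
        }
    }
    return written;
}
```

In both cases the sparsity pattern (row offsets and column indices) is left untouched; only numerical values change.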

Supported types and formats#

The sparse matrix formats currently supported are listed below:

NVPL_SPARSE_FORMAT_CSR
NVPL_SPARSE_FORMAT_COO
NVPL_SPARSE_FORMAT_SLICED_ELL

nvpl_sparse_spsv supports the following shapes and properties:

NVPL_SPARSE_FILL_MODE_LOWER and NVPL_SPARSE_FILL_MODE_UPPER fill modes
NVPL_SPARSE_DIAG_TYPE_NON_UNIT and NVPL_SPARSE_DIAG_TYPE_UNIT diagonal types
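The effect of the fill mode and diagonal type can be illustrated with a reference backward substitution for an upper-triangular CSR matrix. The sketch assumes the conventional meaning of these attributes, which the text above does not spell out: with an upper fill mode only entries on or above the diagonal are referenced, and with a unit diagonal type the diagonal is treated as 1 regardless of stored values. The function name and flag are hypothetical.

```c
#include <assert.h>

/* Reference backward substitution: solves A * Y = alpha * X where A is an
 * m x m CSR matrix treated as upper triangular. Entries below the diagonal
 * are ignored. If unit_diag is nonzero, the diagonal is taken to be 1 and
 * stored diagonal values are not read (assumed semantics).
 * Returns 0 on success, -1 if a required diagonal entry is missing or zero. */
int ref_csr_upper_solve(int m, const int *row_offsets, const int *col_indices,
                        const double *values, int unit_diag, double alpha,
                        const double *x, double *y)
{
    assert(m >= 0);
    for (int i = m - 1; i >= 0; --i) {
        double sum = alpha * x[i];
        double diag = unit_diag ? 1.0 : 0.0;
        for (int p = row_offsets[i]; p < row_offsets[i + 1]; ++p) {
            int j = col_indices[p];
            if (j > i)
                sum -= values[p] * y[j];   /* y[j] is already solved */
            else if (j == i && !unit_diag)
                diag = values[p];
        }
        if (diag == 0.0)
            return -1;
        y[i] = sum / diag;
    }
    return 0;
}
```

For A with rows (2, 1) and (0, 4), X = (4, 8) and alpha = 1, the non-unit solve gives Y = (1, 2), while the unit-diagonal solve gives Y = (-4, 8).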

nvpl_sparse_spsv supports the following index types for representing the sparse matrix mat_A:

32-bit indices (NVPL_SPARSE_INDEX_32I)
64-bit indices (NVPL_SPARSE_INDEX_64I)

nvpl_sparse_spsv supports the following data types:

Uniform-precision computation:

`A`/`X`/`Y`/`compute_type`
`NVPL_SPARSE_R_32F`
`NVPL_SPARSE_R_64F`
`NVPL_SPARSE_C_32F`
`NVPL_SPARSE_C_64F`

nvpl_sparse_spsv supports the following algorithms:

Algorithm	Notes
`NVPL_SPARSE_SPSV_ALG_DEFAULT`	Default algorithm

Performance#

nvpl_sparse_spsv_create_descr() allocates a small amount of memory for the descriptor but does not perform any expensive computations.

Notes#

The routine requires extra storage (see nvpl_sparse_spsv_buffer_size()) for the analysis phase, proportional to the number of nonzero entries of the sparse matrix.
The solve phase nvpl_sparse_spsv_solve() provides deterministic (bit-wise) results for each run.
The routine supports in-place operation.
The nvpl_sparse_spsv_buffer_size() and nvpl_sparse_spsv_analysis() routines accept NULL for vec_X and vec_Y.
The routine allows the CSR column indices (or CSC row indices) of mat_A to be unsorted.

See nvpl_sparse_status_t for the description of the return status.

nvpl_sparse_spmm()#

nvpl_sparse_status_t
nvpl_sparse_spmm_create_descr(nvpl_sparse_spmm_descr_t* descr)

nvpl_sparse_status_t
nvpl_sparse_spmm_destroy_descr(nvpl_sparse_spmm_descr_t descr)

nvpl_sparse_status_t
nvpl_sparse_spmm_buffer_size(nvpl_sparse_handle_t           handle,
                        nvpl_sparse_operation_t             op_A,
                        nvpl_sparse_operation_t             op_B,
                        const void*                         alpha,
                        nvpl_sparse_const_sp_mat_descr_t    mat_A,
                        nvpl_sparse_const_dn_mat_descr_t    mat_B,
                        const void*                         beta,
                        nvpl_sparse_const_dn_mat_descr_t    mat_C,
                        nvpl_sparse_dn_mat_descr_t          mat_D,
                        nvpl_sparse_data_type_t             compute_type,
                        nvpl_sparse_spmm_alg_t              alg,
                        nvpl_sparse_spmm_descr_t            spmm_descr,
                        size_t*                             buffer_size)

nvpl_sparse_status_t
nvpl_sparse_spmm_analysis(nvpl_sparse_handle_t              handle,
                        nvpl_sparse_operation_t             op_A,
                        nvpl_sparse_operation_t             op_B,
                        const void*                         alpha,
                        nvpl_sparse_const_sp_mat_descr_t    mat_A,
                        nvpl_sparse_const_dn_mat_descr_t    mat_B,
                        const void*                         beta,
                        nvpl_sparse_const_dn_mat_descr_t    mat_C,
                        nvpl_sparse_dn_mat_descr_t          mat_D,
                        nvpl_sparse_data_type_t             compute_type,
                        nvpl_sparse_spmm_alg_t              alg,
                        nvpl_sparse_spmm_descr_t            spmm_descr,
                        void*                               external_buffer)

nvpl_sparse_status_t
nvpl_sparse_spmm(nvpl_sparse_handle_t          handle,
            nvpl_sparse_operation_t            op_A,
            nvpl_sparse_operation_t            op_B,
            const void*                        alpha,
            nvpl_sparse_const_sp_mat_descr_t   mat_A,
            nvpl_sparse_const_dn_mat_descr_t   mat_B,
            const void*                        beta,
            nvpl_sparse_const_dn_mat_descr_t   mat_C,
            nvpl_sparse_dn_mat_descr_t         mat_D,
            nvpl_sparse_data_type_t            compute_type,
            nvpl_sparse_spmm_alg_t             alg,
            nvpl_sparse_spmm_descr_t           spmm_descr)

This function multiplies a sparse matrix mat_A by a dense matrix mat_B and adds a scaled dense matrix mat_C:

\(\mathbf{D} = \alpha op_A\left( \mathbf{A} \right) \cdot op_B\left( \mathbf{B} \right) + \beta\mathbf{C}\)

where

\(op_A(A)\) is a sparse matrix of size \(m \times k\)
\(op_B(B)\) is a dense matrix of size \(k \times n\)
\(C\) is a dense matrix of size \(m \times n\)
\(D\) is a dense matrix of size \(m \times n\)
\(\alpha\) and \(\beta\) are scalars

Also, for matrix A:

\(op_A(A) = \begin{cases} A & \text{if}\; op_A = {\small{\texttt{NVPL_SPARSE_OPERATION_NON_TRANSPOSE}}} \\ A^{T} & \text{if}\; op_A = {\small{\texttt{NVPL_SPARSE_OPERATION_TRANSPOSE}}} \\ A^{H} & \text{if}\; op_A = {\small{\texttt{NVPL_SPARSE_OPERATION_CONJUGATE_TRANSPOSE}}} \\ \end{cases}\)

and similarly for matrix B:

\(op_B(B) = \begin{cases} B & \text{if}\; op_B = {\small{\texttt{NVPL_SPARSE_OPERATION_NON_TRANSPOSE}}} \\ B^{T} & \text{if}\; op_B = {\small{\texttt{NVPL_SPARSE_OPERATION_TRANSPOSE}}} \\ B^{H} & \text{if}\; op_B = {\small{\texttt{NVPL_SPARSE_OPERATION_CONJUGATE_TRANSPOSE}}} \\ \end{cases}\)
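For the simplest case, where both operations are non-transpose, the product above can be sketched as a reference loop in plain C. The sketch assumes zero-based CSR arrays for A, real double precision, and row-major dense matrices with leading dimension n. These layout choices and the name `ref_csr_spmm` are assumptions made for illustration, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Reference D = alpha * A * B + beta * C.
 * A: m x k CSR matrix (k is implied by the column indices).
 * B: k x n, C and D: m x n, all dense and row-major with leading dimension n.
 * D may alias C, but not B. */
void ref_csr_spmm(int m, int n,
                  const int *row_offsets, const int *col_indices,
                  const double *values, double alpha,
                  const double *b, double beta,
                  const double *c, double *d)
{
    assert(m >= 0 && n >= 0);
    for (int i = 0; i < m; ++i) {
        double *drow = d + (size_t)i * (size_t)n;
        const double *crow = c + (size_t)i * (size_t)n;

        /* Row i of D starts as beta * (row i of C). */
        for (int j = 0; j < n; ++j)
            drow[j] = beta * crow[j];

        /* Add alpha * a(i, col) * (row col of B) for each stored entry. */
        for (int p = row_offsets[i]; p < row_offsets[i + 1]; ++p) {
            double a = alpha * values[p];
            const double *brow = b + (size_t)col_indices[p] * (size_t)n;
            for (int j = 0; j < n; ++j)
                drow[j] += a * brow[j];
        }
    }
}
```

For A with rows (1, 0, 2) and (0, 3, 0), B with rows (1, 2), (3, 4), (5, 6), C filled with ones, alpha = 1 and beta = 2, the result D has rows (13, 16) and (11, 14).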

The function nvpl_sparse_spmm_buffer_size() computes the size of the workspace needed by nvpl_sparse_spmm_analysis() and nvpl_sparse_spmm().

Routine usage#

To use this routine, you should:

Create a descriptor using nvpl_sparse_spmm_create_descr(). The opaque data structure spmm_descr is used to share information among all functions.
Call nvpl_sparse_spmm_buffer_size() to get the size of the workspace needed by nvpl_sparse_spmm_analysis().
Allocate a workspace buffer of at least buffer_size bytes. The buffer must remain valid until the execution (nvpl_sparse_spmm()) is complete, and should not be modified between the analysis and execution steps.
Call nvpl_sparse_spmm_analysis() to perform the analysis.
Call nvpl_sparse_spmm() to perform the multiplication. This step can be performed multiple times with different dense matrices mat_B, mat_C and output dense matrix mat_D.
Destroy the descriptor using nvpl_sparse_spmm_destroy_descr().

The analysis step is mandatory for this routine and cannot be omitted.

Parameters#

Param.	In/out	Meaning
`handle`	IN	Handle to the NVPL Sparse library context
`op_A`	IN	Operation `op_A(A)`
`op_B`	IN	Operation `op_B(B)`
`alpha`	IN	\(\alpha\) scalar used for multiplication of type `compute_type`
`mat_A`	IN	Sparse matrix `A`
`mat_B`	IN	Dense matrix `B`
`beta`	IN	\(\beta\) scalar used for multiplication of type `compute_type`
`mat_C`	IN	Dense matrix `C`
`mat_D`	OUT	Dense matrix `D`. Can be aliased with `mat_C`.
`compute_type`	IN	Datatype in which the computation is executed
`alg`	IN	Algorithm for the computation
`buffer_size`	OUT	Number of bytes of workspace needed by `nvpl_sparse_spmm_analysis()` and `nvpl_sparse_spmm()`
`external_buffer`	IN/OUT	Pointer to a workspace buffer of at least `buffer_size` bytes used by `nvpl_sparse_spmm_analysis()` and `nvpl_sparse_spmm()`
`spmm_descr`	IN/OUT	Opaque descriptor for storing internal data used across the three steps

Supported types and formats#

The sparse matrix formats currently supported are listed below:

NVPL_SPARSE_FORMAT_COO
NVPL_SPARSE_FORMAT_CSR
NVPL_SPARSE_FORMAT_CSC

nvpl_sparse_spmm() supports the following index types for representing the sparse matrix mat_A:

32-bit indices (NVPL_SPARSE_INDEX_32I)
64-bit indices (NVPL_SPARSE_INDEX_64I)

nvpl_sparse_spmm() supports the following data types in uniform-precision computation:

`A`/`B`/`C`/`D`/`compute_type`
`NVPL_SPARSE_R_32F`
`NVPL_SPARSE_R_64F`
`NVPL_SPARSE_C_32F`
`NVPL_SPARSE_C_64F`

nvpl_sparse_spmm() supports the following algorithms:

Algorithm	Notes
`NVPL_SPARSE_SPMM_ALG_DEFAULT`	Default algorithm for any sparse matrix format.
`NVPL_SPARSE_SPMM_COO_ALG1`	Default algorithm for COO sparse matrix format.
`NVPL_SPARSE_SPMM_CSR_ALG1`	Provides the best performance for CSR when `op_A = NVPL_SPARSE_OPERATION_NON_TRANSPOSE` and for CSC when `op_A = NVPL_SPARSE_OPERATION_TRANSPOSE`. Produces deterministic (bit-wise) results for each run in these cases. In other cases may produce slightly different results during different runs with the same input parameters.
`NVPL_SPARSE_SPMM_CSR_ALG2`	Provides the best performance for CSR when `op_A != NVPL_SPARSE_OPERATION_NON_TRANSPOSE` and for CSC when `op_A = NVPL_SPARSE_OPERATION_NON_TRANSPOSE`. Produces deterministic (bit-wise) results for each run.
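The guidance in the table can be encoded as a small selection helper. The sketch below uses local, hypothetical enums as stand-ins for the library's format, operation, and algorithm constants, and falls back to the default algorithm for the one combination the table does not cover (CSC with conjugate transpose).

```c
#include <assert.h>

/* Hypothetical stand-ins for the library's enumerations. */
typedef enum { REF_FMT_COO, REF_FMT_CSR, REF_FMT_CSC } ref_format_t;
typedef enum { REF_OP_N, REF_OP_T, REF_OP_C } ref_op_t;
typedef enum {
    REF_SPMM_ALG_DEFAULT,
    REF_SPMM_COO_ALG1,
    REF_SPMM_CSR_ALG1,
    REF_SPMM_CSR_ALG2
} ref_spmm_alg_t;

/* Picks the algorithm the table describes as best for (format, op_A). */
ref_spmm_alg_t ref_pick_spmm_alg(ref_format_t fmt, ref_op_t op_a)
{
    assert(op_a == REF_OP_N || op_a == REF_OP_T || op_a == REF_OP_C);
    switch (fmt) {
    case REF_FMT_COO:
        return REF_SPMM_COO_ALG1;
    case REF_FMT_CSR:
        /* ALG1 for non-transpose, ALG2 for every other operation. */
        return (op_a == REF_OP_N) ? REF_SPMM_CSR_ALG1 : REF_SPMM_CSR_ALG2;
    case REF_FMT_CSC:
        if (op_a == REF_OP_T)
            return REF_SPMM_CSR_ALG1;
        if (op_a == REF_OP_N)
            return REF_SPMM_CSR_ALG2;
        return REF_SPMM_ALG_DEFAULT;   /* not covered by the table */
    }
    return REF_SPMM_ALG_DEFAULT;
}
```

When bit-wise reproducibility matters more than speed, the determinism notes in the table should take precedence over this performance-oriented choice.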

Performance#

nvpl_sparse_spmm_create_descr() allocates a small amount of memory for the descriptor but does not perform any expensive computations.

Notes#

nvpl_sparse_spmm() has the following properties:

nvpl_sparse_spmm_buffer_size() and nvpl_sparse_spmm_analysis() must be called before nvpl_sparse_spmm(). Otherwise, the routine returns NVPL_SPARSE_STATUS_NOT_SUPPORTED.
For COO format, the routine requires the entries of mat_A to be sorted by row index.
The routine allows the CSR column indices (or CSC row indices) of mat_A to be unsorted.
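Because the COO path expects entries ordered by row, it can be useful to validate the input before building the matrix descriptor. The helper below is a minimal sketch with a hypothetical name. It assumes 32-bit row indices held in a plain array and treats "sorted by row" as non-decreasing row indices.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the COO row indices are in non-decreasing order, else 0.
 * An empty matrix (nnz == 0) counts as sorted. */
int ref_coo_rows_sorted(const int *row_indices, size_t nnz)
{
    assert(row_indices != NULL || nnz == 0);
    for (size_t i = 1; i < nnz; ++i) {
        if (row_indices[i - 1] > row_indices[i])
            return 0;
    }
    return 1;
}
```

If the check fails, the entries need to be reordered by row, with the column indices and values permuted in the same way, before the matrix is used.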

See nvpl_sparse_status_t for the description of the return status.