Generic API Functions

nvpl_sparse_spmv()

nvpl_sparse_status_t
nvpl_sparse_spmv_create_descr(nvpl_sparse_spmv_descr_t* descr)

nvpl_sparse_status_t
nvpl_sparse_spmv_destroy_descr(nvpl_sparse_spmv_descr_t descr)

nvpl_sparse_status_t
nvpl_sparse_spmv_buffer_size(nvpl_sparse_handle_t             handle,
                             nvpl_sparse_operation_t          op_A,
                             const void*                      alpha,
                             nvpl_sparse_const_sp_mat_descr_t mat_A,
                             nvpl_sparse_const_dn_vec_descr_t vec_X,
                             const void*                      beta,
                             nvpl_sparse_dn_vec_descr_t       vec_Y,
                             nvpl_sparse_dn_vec_descr_t       vec_Z,
                             nvpl_sparse_data_type_t          compute_type,
                             nvpl_sparse_spmv_alg_t           alg,
                             nvpl_sparse_spmv_descr_t         spmv_descr,
                             size_t*                          buffer_size)

nvpl_sparse_status_t
nvpl_sparse_spmv_analysis(nvpl_sparse_handle_t             handle,
                          nvpl_sparse_operation_t          op_A,
                          const void*                      alpha,
                          nvpl_sparse_const_sp_mat_descr_t mat_A,
                          nvpl_sparse_const_dn_vec_descr_t vec_X,
                          const void*                      beta,
                          nvpl_sparse_dn_vec_descr_t       vec_Y,
                          nvpl_sparse_dn_vec_descr_t       vec_Z,
                          nvpl_sparse_data_type_t          compute_type,
                          nvpl_sparse_spmv_alg_t           alg,
                          nvpl_sparse_spmv_descr_t         spmv_descr,
                          void*                            external_buffer)

nvpl_sparse_status_t
nvpl_sparse_spmv(nvpl_sparse_handle_t             handle,
                 nvpl_sparse_operation_t          op_A,
                 const void*                      alpha,
                 nvpl_sparse_const_sp_mat_descr_t mat_A,
                 nvpl_sparse_const_dn_vec_descr_t vec_X,
                 const void*                      beta,
                 nvpl_sparse_dn_vec_descr_t       vec_Y,
                 nvpl_sparse_dn_vec_descr_t       vec_Z,
                 nvpl_sparse_data_type_t          compute_type,
                 nvpl_sparse_spmv_alg_t           alg,
                 nvpl_sparse_spmv_descr_t         spmv_descr)

This function multiplies the sparse matrix mat_A by the dense vector vec_X and adds the scaled dense vector vec_Y:

\(\mathbf{Z} = \alpha op\left( \mathbf{A} \right) \cdot \mathbf{X} + \beta\mathbf{Y}\)

where

  • op(A) is a sparse matrix of size \(m \times k\)

  • X is a dense vector of size \(k\)

  • Y is a dense vector of size \(m\)

  • Z is a dense vector of size \(m\)

  • \(\alpha\) and \(\beta\) are scalars

Also, for matrix A

\(\text{op}(A) = \begin{cases} A & \text{if}\; op = {\small{\texttt{NVPL_SPARSE_OPERATION_NON_TRANSPOSE}}} \\ A^{T} & \text{if}\; op = {\small{\texttt{NVPL_SPARSE_OPERATION_TRANSPOSE}}} \\ A^{H} & \text{if}\; op = {\small{\texttt{NVPL_SPARSE_OPERATION_CONJUGATE_TRANSPOSE}}} \\ \end{cases}\)
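
As a reading aid, the formula above can be sketched as a plain C reference loop over a CSR matrix. This is not the NVPL Sparse implementation and calls no library function: the name ref_csr_spmv, the zero-based int CSR arrays, the double-precision real data, and the integer transpose flag are all assumptions made for illustration. The conjugate-transpose case is left out because it coincides with the transpose for real data.

```c
#include <assert.h>

/* Hypothetical helper, not part of NVPL Sparse.
 * Computes z = alpha * op(A) * x + beta * y for a real CSR matrix A
 * with `rows` rows and `cols` columns (zero-based indices).
 * transpose == 0 selects op(A) = A, any other value selects op(A) = A^T.
 * z may alias y, but must not alias x. */
void ref_csr_spmv(int transpose, int rows, int cols, double alpha,
                  const int *row_ptr, const int *col_ind, const double *val,
                  const double *x, double beta, const double *y, double *z)
{
    assert(row_ptr && col_ind && val && x && y && z);

    /* Length of y and z: rows of op(A). */
    int out_len = transpose ? cols : rows;

    /* Start from the beta * y term. */
    for (int i = 0; i < out_len; ++i)
        z[i] = beta * y[i];

    /* Accumulate alpha * op(A) * x one stored entry at a time. */
    for (int i = 0; i < rows; ++i) {
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            int j = col_ind[p];
            if (transpose)
                z[j] += alpha * val[p] * x[i];  /* (A^T)(j,i) = A(i,j) */
            else
                z[i] += alpha * val[p] * x[j];
        }
    }
}
```

Because each stored entry is handled independently, this loop gives the same mathematical result whether or not the column indices within a row are sorted.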

Routine usage

To use this routine, you should:

  1. Create a descriptor using nvpl_sparse_spmv_create_descr(). The opaque data structure spmv_descr is used to share information among all functions.

  2. Call nvpl_sparse_spmv_buffer_size() to get the size of the workspace needed by nvpl_sparse_spmv_analysis().

  3. Allocate a workspace buffer of at least buffer_size bytes. The buffer must remain valid until the execution (nvpl_sparse_spmv()) is complete, and should not be modified between the analysis and execution steps.

  4. Call nvpl_sparse_spmv_analysis() to perform the analysis.

  5. Call nvpl_sparse_spmv() to perform the multiplication. This step can be performed multiple times with different right hand side vectors vec_Y and output vectors vec_Z.

  6. Destroy the descriptor using nvpl_sparse_spmv_destroy_descr().

Optionally, you can omit the analysis step and call nvpl_sparse_spmv() directly after creating the descriptor. Doing so might degrade performance, so keeping the analysis step is recommended.

Note

When nvpl_sparse_spmv_analysis() is called, the operation type (op_A) and the number of OpenMP threads must match exactly when nvpl_sparse_spmv() is called later with the same descriptor. Calling nvpl_sparse_spmv() with a different op_A or a different thread count than was used during analysis will return NVPL_SPARSE_STATUS_INVALID_VALUE.

Parameters

| Param. | In/out | Meaning |
|---|---|---|
| handle | IN | Handle to the NVPL Sparse library context |
| op_A | IN | Operation op(A) |
| alpha | IN | \(\alpha\) scalar used for multiplication, of type compute_type |
| mat_A | IN | Sparse matrix A |
| vec_X | IN | Dense vector X |
| beta | IN | \(\beta\) scalar used for multiplication, of type compute_type |
| vec_Y | IN | Dense vector Y |
| vec_Z | OUT | Dense vector Z. Can alias vec_Y. |
| compute_type | IN | Data type in which the computation is executed |
| alg | IN | Algorithm for the computation |
| buffer_size | OUT | Number of bytes of workspace needed by nvpl_sparse_spmv_analysis() and nvpl_sparse_spmv() |
| external_buffer | IN/OUT | Pointer to a workspace buffer of at least buffer_size bytes used by nvpl_sparse_spmv_analysis() and nvpl_sparse_spmv() |
| spmv_descr | IN/OUT | Opaque descriptor for storing internal data used across the three steps |

Supported types and formats

The sparse matrix formats currently supported are listed below:

  • NVPL_SPARSE_FORMAT_COO

  • NVPL_SPARSE_FORMAT_CSR

  • NVPL_SPARSE_FORMAT_CSC

  • NVPL_SPARSE_FORMAT_SLICED_ELL

nvpl_sparse_spmv() supports the following index types for representing the sparse matrix mat_A:

  • 32-bit indices (NVPL_SPARSE_INDEX_32I)

  • 64-bit indices (NVPL_SPARSE_INDEX_64I)

nvpl_sparse_spmv() supports the following data types:

Uniform-precision computation:

| A/X/Y/Z/compute_type |
|---|
| NVPL_SPARSE_R_32F |
| NVPL_SPARSE_R_64F |
| NVPL_SPARSE_C_32F |
| NVPL_SPARSE_C_64F |

nvpl_sparse_spmv() supports the following algorithms:

| Algorithm | Notes |
|---|---|
| NVPL_SPARSE_SPMV_ALG_DEFAULT | Default algorithm for any sparse matrix format. |
| NVPL_SPARSE_SPMV_COO_ALG1 | Default algorithm for COO sparse matrix format. May produce slightly different results during different runs with the same input parameters. |
| NVPL_SPARSE_SPMV_CSR_ALG1 | Algorithm for CSR/CSC sparse matrix format with fast analysis step. Provides deterministic (bit-wise) results for each run. |
| NVPL_SPARSE_SPMV_CSR_ALG2 | Algorithm for CSR/CSC sparse matrix format with more extensive analysis step. Can yield better performance than NVPL_SPARSE_SPMV_CSR_ALG1 at the cost of increased analysis time. Provides deterministic (bit-wise) results for each run. |
| NVPL_SPARSE_SPMV_SELL_ALG1 | Default algorithm for Sliced Ellpack sparse matrix format. Provides deterministic (bit-wise) results for each run. |
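
The text does not say why NVPL_SPARSE_SPMV_COO_ALG1 can give slightly different results between runs. One common cause in parallel sums, assumed here purely for illustration, is that floating-point addition is not associative, so a different accumulation order can round differently. The plain C sketch below shows only that effect; the helper names are hypothetical and no NVPL Sparse function is involved.

```c
/* Hypothetical helpers: add the same three floats in two different orders.
 * `volatile` forces every partial sum to be rounded to float precision. */
float sum_left_to_right(const float v[3])
{
    volatile float s = v[0];
    s = s + v[1];
    s = s + v[2];
    return s;
}

float sum_right_to_left(const float v[3])
{
    volatile float s = v[2];
    s = s + v[1];
    s = s + v[0];
    return s;
}
```

For the input {1e8f, -1e8f, 1.0f}, the first order cancels the large terms before adding 1.0f, while the second order absorbs 1.0f into -1e8f and loses it, so the two functions return different values.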

Performance

  • NVPL_SPARSE_SPMV_CSR_ALG1 provides higher performance than NVPL_SPARSE_SPMV_COO_ALG1.

  • If the analysis step is omitted, the performance might be degraded.

  • nvpl_sparse_spmv_create_descr() allocates a small amount of memory for the descriptor but does not perform any expensive computations.

Notes

  • The routine allows the CSR column indices (or CSC row indices) of mat_A to be unsorted.

See nvpl_sparse_status_t for the description of the return status.


nvpl_sparse_spsv()

nvpl_sparse_status_t
nvpl_sparse_spsv_create_descr(nvpl_sparse_spsv_descr_t* descr)

nvpl_sparse_status_t
nvpl_sparse_spsv_destroy_descr(nvpl_sparse_spsv_descr_t descr)

nvpl_sparse_status_t
nvpl_sparse_spsv_buffer_size(nvpl_sparse_handle_t             handle,
                             nvpl_sparse_operation_t          op_A,
                             const void*                      alpha,
                             nvpl_sparse_const_sp_mat_descr_t mat_A,
                             nvpl_sparse_const_dn_vec_descr_t vec_X,
                             nvpl_sparse_dn_vec_descr_t       vec_Y,
                             nvpl_sparse_data_type_t          compute_type,
                             nvpl_sparse_spsv_alg_t           alg,
                             nvpl_sparse_spsv_descr_t         spsv_descr,
                             size_t*                          buffer_size)

nvpl_sparse_status_t
nvpl_sparse_spsv_analysis(nvpl_sparse_handle_t             handle,
                          nvpl_sparse_operation_t          op_A,
                          const void*                      alpha,
                          nvpl_sparse_const_sp_mat_descr_t mat_A,
                          nvpl_sparse_const_dn_vec_descr_t vec_X,
                          nvpl_sparse_dn_vec_descr_t       vec_Y,
                          nvpl_sparse_data_type_t          compute_type,
                          nvpl_sparse_spsv_alg_t           alg,
                          nvpl_sparse_spsv_descr_t         spsv_descr,
                          void*                            external_buffer)

nvpl_sparse_status_t
nvpl_sparse_spsv_solve(nvpl_sparse_handle_t             handle,
                       nvpl_sparse_operation_t          op_A,
                       const void*                      alpha,
                       nvpl_sparse_const_sp_mat_descr_t mat_A,
                       nvpl_sparse_const_dn_vec_descr_t vec_X,
                       nvpl_sparse_dn_vec_descr_t       vec_Y,
                       nvpl_sparse_data_type_t          compute_type,
                       nvpl_sparse_spsv_alg_t           alg,
                       nvpl_sparse_spsv_descr_t         spsv_descr)

The function solves a system of linear equations whose coefficients are represented in a sparse triangular matrix:

\(op\left( \mathbf{A} \right) \cdot \mathbf{Y} = \alpha\mathbf{X}\)

where

  • op(A) is a sparse square matrix of size \(m \times m\)

  • X is a dense vector of size \(m\)

  • Y is a dense vector of size \(m\)

  • \(\alpha\) is a scalar

Also, for matrix A

\(\text{op}(A) = \begin{cases} A & \text{if}\; op = {\small{\texttt{NVPL_SPARSE_OPERATION_NON_TRANSPOSE}}} \\ A^{T} & \text{if}\; op = {\small{\texttt{NVPL_SPARSE_OPERATION_TRANSPOSE}}} \\ A^{H} & \text{if}\; op = {\small{\texttt{NVPL_SPARSE_OPERATION_CONJUGATE_TRANSPOSE}}} \\ \end{cases}\)
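
To make the solve concrete, the system above can be sketched as forward substitution on a lower-triangular CSR matrix with op(A) = A. This is a plain C reference under stated assumptions, not the NVPL Sparse algorithm: the name ref_csr_lower_solve, the zero-based int indices, the double-precision real data, and the integer unit_diag flag are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper, not part of NVPL Sparse.
 * Solves A * y = alpha * x for a lower-triangular m-by-m CSR matrix A.
 * unit_diag != 0 treats every diagonal entry as 1 and ignores stored ones.
 * y may alias x (in-place solve).
 * Returns 0 on success, -1 if a needed diagonal entry is missing or zero. */
int ref_csr_lower_solve(int m, double alpha,
                        const int *row_ptr, const int *col_ind,
                        const double *val, int unit_diag,
                        const double *x, double *y)
{
    assert(row_ptr && col_ind && val && x && y);

    for (int i = 0; i < m; ++i) {
        /* x[i] is read before y[i] is written, which permits aliasing. */
        double sum = alpha * x[i];
        double diag = unit_diag ? 1.0 : 0.0;

        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            int j = col_ind[p];
            if (j < i)
                sum -= val[p] * y[j];   /* unknowns solved in earlier rows */
            else if (j == i && !unit_diag)
                diag = val[p];
            /* entries with j > i lie outside the lower triangle: ignored */
        }

        if (diag == 0.0)
            return -1;
        y[i] = sum / diag;
    }
    return 0;
}
```

For A = [[2, 0], [1, 4]], alpha = 1 and x = (2, 9), the loop yields y = (1, 2). An upper-triangular matrix would use the same idea with rows visited from last to first.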

Routine usage

To use this routine, you should:

  1. Create a descriptor using nvpl_sparse_spsv_create_descr(). The opaque data structure spsv_descr is used to share information among all functions.

  2. Call nvpl_sparse_spsv_buffer_size() to get the size of the workspace needed by nvpl_sparse_spsv_analysis().

  3. Allocate a workspace buffer of at least buffer_size bytes. The buffer must remain valid until the execution (nvpl_sparse_spsv_solve()) is complete, and should not be modified between the analysis and execution steps.

  4. Call nvpl_sparse_spsv_analysis() to perform the analysis.

  5. Call nvpl_sparse_spsv_solve() to execute the solve phase. This step can be performed multiple times with different right hand side vectors.

  6. Destroy the descriptor using nvpl_sparse_spsv_destroy_descr().

The analysis step is mandatory for this routine and cannot be omitted.

The function nvpl_sparse_spsv_update_matrix() can be used to update spsv_descr with new matrix values.

All parameters must be consistent across nvpl_sparse_spsv API calls and the matrix descriptors.

Parameters

| Param. | In/out | Meaning |
|---|---|---|
| handle | IN | Handle to the NVPL Sparse library context |
| op_A | IN | Operation op(A) |
| alpha | IN | \(\alpha\) scalar used for multiplication, of type compute_type |
| mat_A | IN | Sparse matrix A |
| vec_X | IN | Dense vector X |
| vec_Y | IN/OUT | Dense vector Y. Can alias vec_X. |
| compute_type | IN | Data type in which the computation is executed |
| alg | IN | Algorithm for the computation |
| buffer_size | OUT | Number of bytes of workspace needed by nvpl_sparse_spsv_analysis() and nvpl_sparse_spsv_solve() |
| external_buffer | IN/OUT | Pointer to a workspace buffer of at least buffer_size bytes used by nvpl_sparse_spsv_analysis() and nvpl_sparse_spsv_solve() |
| spsv_descr | IN/OUT | Opaque descriptor for storing internal data used across the three steps |

Matrix update

nvpl_sparse_status_t
nvpl_sparse_spsv_update_matrix(nvpl_sparse_handle_t             handle,
                               nvpl_sparse_spsv_descr_t         spsv_descr,
                               void*                            new_values,
                               nvpl_sparse_spsv_update_t        update_part)

nvpl_sparse_spsv_update_matrix() updates the sparse matrix values after the analysis phase has been called. This function supports the following update strategies (update_part):

| Strategy | Notes |
|---|---|
| NVPL_SPARSE_SPSV_UPDATE_GENERAL | Updates the sparse matrix values with the values of the new_values array. |
| NVPL_SPARSE_SPSV_UPDATE_DIAGONAL | Updates the diagonal part of the matrix with the diagonal values stored in the new_values array. That is, new_values holds the new diagonal values only. |
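
The effect of a diagonal-only update can be pictured with a small plain C sketch on a CSR values array. It is only an illustration of the idea and does not call nvpl_sparse_spsv_update_matrix(): the helper name is hypothetical, and the layout of new_diag (one value per row, in row order) is an assumption that the text above does not confirm.

```c
#include <assert.h>

/* Hypothetical helper, not part of NVPL Sparse.
 * Overwrites the stored diagonal entries of an m-by-m CSR matrix.
 * ASSUMPTION: new_diag[i] is the new value of A(i,i), one entry per row.
 * Off-diagonal values are left untouched.
 * Returns the number of diagonal entries that were found and replaced. */
int ref_csr_update_diagonal(int m, const int *row_ptr, const int *col_ind,
                            double *val, const double *new_diag)
{
    assert(row_ptr && col_ind && val && new_diag);

    int updated = 0;
    for (int i = 0; i < m; ++i) {
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            if (col_ind[p] == i) {
                val[p] = new_diag[i];
                ++updated;
                break;  /* at most one diagonal entry per row is expected */
            }
        }
    }
    return updated;
}
```

A general update, by contrast, would replace every stored value rather than only the entries on the diagonal.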

Supported types and formats

The sparse matrix formats currently supported are listed below:

  • NVPL_SPARSE_FORMAT_CSR

  • NVPL_SPARSE_FORMAT_COO

  • NVPL_SPARSE_FORMAT_SLICED_ELL

nvpl_sparse_spsv() supports the following shapes and properties:

  • NVPL_SPARSE_FILL_MODE_LOWER and NVPL_SPARSE_FILL_MODE_UPPER fill modes

  • NVPL_SPARSE_DIAG_TYPE_NON_UNIT and NVPL_SPARSE_DIAG_TYPE_UNIT diagonal types

nvpl_sparse_spsv() supports the following index types for representing the sparse matrix mat_A:

  • 32-bit indices (NVPL_SPARSE_INDEX_32I)

  • 64-bit indices (NVPL_SPARSE_INDEX_64I)

nvpl_sparse_spsv supports the following data types:

Uniform-precision computation:

| A/X/Y/compute_type |
|---|
| NVPL_SPARSE_R_32F |
| NVPL_SPARSE_R_64F |
| NVPL_SPARSE_C_32F |
| NVPL_SPARSE_C_64F |

nvpl_sparse_spsv supports the following algorithms:

| Algorithm | Notes |
|---|---|
| NVPL_SPARSE_SPSV_ALG_DEFAULT | Default algorithm |

Performance

  • nvpl_sparse_spsv_create_descr() allocates a small amount of memory for the descriptor but does not perform any expensive computations.

Notes

  • The routine requires extra storage (see nvpl_sparse_spsv_buffer_size()) for the analysis phase, proportional to the number of non-zero entries of the sparse matrix.

  • The solving phase nvpl_sparse_spsv_solve() provides deterministic (bit-wise) results for each run.

  • The routine supports in-place operation.

  • The nvpl_sparse_spsv_buffer_size() and nvpl_sparse_spsv_analysis() routines accept NULL for vec_X and vec_Y.

  • The routine allows the CSR column indices (or CSC row indices) of mat_A to be unsorted.

See nvpl_sparse_status_t for the description of the return status.


nvpl_sparse_spmm()

nvpl_sparse_status_t
nvpl_sparse_spmm_create_descr(nvpl_sparse_spmm_descr_t* descr)

nvpl_sparse_status_t
nvpl_sparse_spmm_destroy_descr(nvpl_sparse_spmm_descr_t descr)

nvpl_sparse_status_t
nvpl_sparse_spmm_buffer_size(nvpl_sparse_handle_t             handle,
                             nvpl_sparse_operation_t          op_A,
                             nvpl_sparse_operation_t          op_B,
                             const void*                      alpha,
                             nvpl_sparse_const_sp_mat_descr_t mat_A,
                             nvpl_sparse_const_dn_mat_descr_t mat_B,
                             const void*                      beta,
                             nvpl_sparse_const_dn_mat_descr_t mat_C,
                             nvpl_sparse_dn_mat_descr_t       mat_D,
                             nvpl_sparse_data_type_t          compute_type,
                             nvpl_sparse_spmm_alg_t           alg,
                             nvpl_sparse_spmm_descr_t         spmm_descr,
                             size_t*                          buffer_size)

nvpl_sparse_status_t
nvpl_sparse_spmm_analysis(nvpl_sparse_handle_t             handle,
                          nvpl_sparse_operation_t          op_A,
                          nvpl_sparse_operation_t          op_B,
                          const void*                      alpha,
                          nvpl_sparse_const_sp_mat_descr_t mat_A,
                          nvpl_sparse_const_dn_mat_descr_t mat_B,
                          const void*                      beta,
                          nvpl_sparse_const_dn_mat_descr_t mat_C,
                          nvpl_sparse_dn_mat_descr_t       mat_D,
                          nvpl_sparse_data_type_t          compute_type,
                          nvpl_sparse_spmm_alg_t           alg,
                          nvpl_sparse_spmm_descr_t         spmm_descr,
                          void*                            external_buffer)

nvpl_sparse_status_t
nvpl_sparse_spmm(nvpl_sparse_handle_t             handle,
                 nvpl_sparse_operation_t          op_A,
                 nvpl_sparse_operation_t          op_B,
                 const void*                      alpha,
                 nvpl_sparse_const_sp_mat_descr_t mat_A,
                 nvpl_sparse_const_dn_mat_descr_t mat_B,
                 const void*                      beta,
                 nvpl_sparse_const_dn_mat_descr_t mat_C,
                 nvpl_sparse_dn_mat_descr_t       mat_D,
                 nvpl_sparse_data_type_t          compute_type,
                 nvpl_sparse_spmm_alg_t           alg,
                 nvpl_sparse_spmm_descr_t         spmm_descr)

This function multiplies the sparse matrix mat_A by the dense matrix mat_B and adds the scaled dense matrix mat_C:

\(\mathbf{D} = \alpha op_A\left( \mathbf{A} \right) \cdot op_B\left( \mathbf{B} \right) + \beta\mathbf{C}\)

where

  • \(op_A(A)\) is a sparse matrix of size \(m \times k\)

  • \(op_B(B)\) is a dense matrix of size \(k \times n\)

  • \(C\) is a dense matrix of size \(m \times n\)

  • \(D\) is a dense matrix of size \(m \times n\)

  • \(\alpha\) and \(\beta\) are scalars

Also, for matrix A:

\(op_A(A) = \begin{cases} A & \text{if}\; op_A = {\small{\texttt{NVPL_SPARSE_OPERATION_NON_TRANSPOSE}}} \\ A^{T} & \text{if}\; op_A = {\small{\texttt{NVPL_SPARSE_OPERATION_TRANSPOSE}}} \\ A^{H} & \text{if}\; op_A = {\small{\texttt{NVPL_SPARSE_OPERATION_CONJUGATE_TRANSPOSE}}} \\ \end{cases}\)

and similarly for matrix B:

\(op_B(B) = \begin{cases} B & \text{if}\; op_B = {\small{\texttt{NVPL_SPARSE_OPERATION_NON_TRANSPOSE}}} \\ B^{T} & \text{if}\; op_B = {\small{\texttt{NVPL_SPARSE_OPERATION_TRANSPOSE}}} \\ B^{H} & \text{if}\; op_B = {\small{\texttt{NVPL_SPARSE_OPERATION_CONJUGATE_TRANSPOSE}}} \\ \end{cases}\)
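
The product defined above can be sketched in plain C for the simplest case, where both op_A and op_B are non-transpose. This is a reference loop for reading the formula, not the NVPL Sparse implementation: the name ref_csr_spmm, the zero-based int CSR arrays, the double-precision real data, and the densely packed row-major layout of B, C and D are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper, not part of NVPL Sparse.
 * Computes D = alpha * A * B + beta * C where
 *   A is an m-by-k CSR matrix (zero-based indices),
 *   B is k-by-n, C and D are m-by-n.
 * ASSUMPTION: dense matrices are row-major and tightly packed,
 * so element (r, c) of an n-column matrix is stored at [r * n + c].
 * D may alias C, but must not alias B. */
void ref_csr_spmm(int m, int n, double alpha,
                  const int *row_ptr, const int *col_ind, const double *val,
                  const double *B, double beta, const double *C, double *D)
{
    assert(row_ptr && col_ind && val && B && C && D);

    for (int i = 0; i < m; ++i) {
        /* Start row i of D from the beta * C term. */
        for (int c = 0; c < n; ++c)
            D[i * n + c] = beta * C[i * n + c];

        /* Each stored A(i,j) adds alpha * A(i,j) * B(j,:) to row i of D. */
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            int j = col_ind[p];
            double a = alpha * val[p];
            for (int c = 0; c < n; ++c)
                D[i * n + c] += a * B[j * n + c];
        }
    }
}
```

The number of columns k of A does not appear as a parameter because the loop only visits stored entries; it is implied by the largest column index and by the row count of B.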

The function nvpl_sparse_spmm_buffer_size() computes the size of the workspace needed by nvpl_sparse_spmm_analysis() and nvpl_sparse_spmm().

Routine usage

To use this routine, you should:

  1. Create a descriptor using nvpl_sparse_spmm_create_descr(). The opaque data structure spmm_descr is used to share information among all functions.

  2. Call nvpl_sparse_spmm_buffer_size() to get the size of the workspace needed by nvpl_sparse_spmm_analysis().

  3. Allocate a workspace buffer of at least buffer_size bytes. The buffer must remain valid until the execution (nvpl_sparse_spmm()) is complete, and should not be modified between the analysis and execution steps.

  4. Call nvpl_sparse_spmm_analysis() to perform the analysis.

  5. Call nvpl_sparse_spmm() to perform the multiplication. This step can be performed multiple times with different dense matrices mat_B, mat_C and output dense matrix mat_D.

  6. Destroy the descriptor using nvpl_sparse_spmm_destroy_descr().

The analysis step is mandatory for this routine and cannot be omitted.

Parameters

| Param. | In/out | Meaning |
|---|---|---|
| handle | IN | Handle to the NVPL Sparse library context |
| op_A | IN | Operation op_A(A) |
| op_B | IN | Operation op_B(B) |
| alpha | IN | \(\alpha\) scalar used for multiplication, of type compute_type |
| mat_A | IN | Sparse matrix A |
| mat_B | IN | Dense matrix B |
| beta | IN | \(\beta\) scalar used for multiplication, of type compute_type |
| mat_C | IN | Dense matrix C |
| mat_D | OUT | Dense matrix D. Can alias mat_C. |
| compute_type | IN | Data type in which the computation is executed |
| alg | IN | Algorithm for the computation |
| buffer_size | OUT | Number of bytes of workspace needed by nvpl_sparse_spmm_analysis() and nvpl_sparse_spmm() |
| external_buffer | IN/OUT | Pointer to a workspace buffer of at least buffer_size bytes used by nvpl_sparse_spmm_analysis() and nvpl_sparse_spmm() |
| spmm_descr | IN/OUT | Opaque descriptor for storing internal data used across the three steps |

Supported types and formats

The sparse matrix formats currently supported are listed below:

  • NVPL_SPARSE_FORMAT_COO

  • NVPL_SPARSE_FORMAT_CSR

  • NVPL_SPARSE_FORMAT_CSC

nvpl_sparse_spmm() supports the following index types for representing the sparse matrix mat_A:

  • 32-bit indices (NVPL_SPARSE_INDEX_32I)

  • 64-bit indices (NVPL_SPARSE_INDEX_64I)

nvpl_sparse_spmm() supports the following data types in uniform-precision computation:

| A/B/C/D/compute_type |
|---|
| NVPL_SPARSE_R_32F |
| NVPL_SPARSE_R_64F |
| NVPL_SPARSE_C_32F |
| NVPL_SPARSE_C_64F |

nvpl_sparse_spmm() supports the following algorithms:

| Algorithm | Notes |
|---|---|
| NVPL_SPARSE_SPMM_ALG_DEFAULT | Default algorithm for any sparse matrix format. |
| NVPL_SPARSE_SPMM_COO_ALG1 | Default algorithm for COO sparse matrix format. |
| NVPL_SPARSE_SPMM_CSR_ALG1 | Provides the best performance for CSR when op_A = NVPL_SPARSE_OPERATION_NON_TRANSPOSE and for CSC when op_A = NVPL_SPARSE_OPERATION_TRANSPOSE. Produces deterministic (bit-wise) results for each run in these cases. In other cases, may produce slightly different results during different runs with the same input parameters. |
| NVPL_SPARSE_SPMM_CSR_ALG2 | Provides the best performance for CSR when op_A != NVPL_SPARSE_OPERATION_NON_TRANSPOSE and for CSC when op_A = NVPL_SPARSE_OPERATION_NON_TRANSPOSE. Produces deterministic (bit-wise) results for each run. |

Performance

  • nvpl_sparse_spmm_create_descr() allocates a small amount of memory for the descriptor but does not perform any expensive computations.

Notes

nvpl_sparse_spmm() has the following properties:

  • nvpl_sparse_spmm_buffer_size() and nvpl_sparse_spmm_analysis() must be called before nvpl_sparse_spmm(). Otherwise, the routine returns NVPL_SPARSE_STATUS_NOT_SUPPORTED.

  • For COO format, the routine requires the entries of mat_A to be sorted by row index.

  • The routine allows the CSR column indices (or CSC row indices) of mat_A to be unsorted.
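
Since the COO requirement above is easy to violate by accident, a caller might verify it before building the matrix descriptor. The check below is a minimal plain C sketch; the helper name and the int index type are assumptions, and it only tests that row indices are non-decreasing, which is the ordering the note describes.

```c
#include <assert.h>

/* Hypothetical helper, not part of NVPL Sparse.
 * Returns 1 if the COO row indices are in non-decreasing order, else 0.
 * Column indices within a row are not inspected. */
int coo_rows_sorted(const int *row_ind, int nnz)
{
    assert(row_ind || nnz == 0);

    for (int p = 1; p < nnz; ++p) {
        if (row_ind[p - 1] > row_ind[p])
            return 0;
    }
    return 1;
}
```

For example, the row index sequence 0, 0, 1, 2 passes the check, while 0, 2, 1 does not and would need to be sorted first.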

See nvpl_sparse_status_t for the description of the return status.