cuSPARSE Level 2 Function Reference

This chapter describes the sparse linear algebra functions that perform operations between sparse matrices and dense vectors.

cusparse<t>bsrmv()

cusparseStatus_t
cusparseSbsrmv(cusparseHandle_t         handle,
               cusparseDirection_t      dir,
               cusparseOperation_t      trans,
               int                      mb,
               int                      nb,
               int                      nnzb,
               const float*             alpha,
               const cusparseMatDescr_t descr,
               const float*             bsrVal,
               const int*               bsrRowPtr,
               const int*               bsrColInd,
               int                      blockDim,
               const float*             x,
               const float*             beta,
               float*                   y)

cusparseStatus_t
cusparseDbsrmv(cusparseHandle_t         handle,
               cusparseDirection_t      dir,
               cusparseOperation_t      trans,
               int                      mb,
               int                      nb,
               int                      nnzb,
               const double*            alpha,
               const cusparseMatDescr_t descr,
               const double*            bsrVal,
               const int*               bsrRowPtr,
               const int*               bsrColInd,
               int                      blockDim,
               const double*            x,
               const double*            beta,
               double*                  y)

cusparseStatus_t
cusparseCbsrmv(cusparseHandle_t         handle,
               cusparseDirection_t      dir,
               cusparseOperation_t      trans,
               int                      mb,
               int                      nb,
               int                      nnzb,
               const cuComplex*         alpha,
               const cusparseMatDescr_t descr,
               const cuComplex*         bsrVal,
               const int*               bsrRowPtr,
               const int*               bsrColInd,
               int                      blockDim,
               const cuComplex*         x,
               const cuComplex*         beta,
               cuComplex*               y)

cusparseStatus_t
cusparseZbsrmv(cusparseHandle_t         handle,
               cusparseDirection_t      dir,
               cusparseOperation_t      trans,
               int                      mb,
               int                      nb,
               int                      nnzb,
               const cuDoubleComplex*   alpha,
               const cusparseMatDescr_t descr,
               const cuDoubleComplex*   bsrVal,
               const int*               bsrRowPtr,
               const int*               bsrColInd,
               int                      blockDim,
               const cuDoubleComplex*   x,
               const cuDoubleComplex*   beta,
               cuDoubleComplex*         y)

This function performs the matrix-vector operation

\(\text{y} = \alpha \ast \text{op}(A) \ast \text{x} + \beta \ast \text{y}\)

where \(A\text{ is an }(mb \ast blockDim) \times (nb \ast blockDim)\) sparse matrix that is defined in BSR storage format by the three arrays bsrVal, bsrRowPtr, and bsrColInd); x and y are vectors; \(\alpha\text{ and }\beta\) are scalars; and

\(\text{op}(A) = \begin{cases} A & \text{if trans == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if trans == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if trans == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)

bsrmv() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Several comments on bsrmv():

  • Only blockDim > 1 is supported

  • Only CUSPARSE_OPERATION_NON_TRANSPOSE is supported, that is

\(\text{y} = \alpha \ast A \ast \text{x} + \beta{} \ast \text{y}\)

  • Only CUSPARSE_MATRIX_TYPE_GENERAL is supported.

  • The size of vector x should be \((nb \ast blockDim)\) at least, and the size of vector y should be \((mb \ast blockDim)\) at least; otherwise, the kernel may return CUSPARSE_STATUS_EXECUTION_FAILED because of an out-of-bounds array.

For example, suppose the user has a CSR format and wants to try bsrmv(), the following code demonstrates how to use csr2bsr() conversion and bsrmv() multiplication in single precision.

// Suppose that A is m x n sparse matrix represented by CSR format,
// hx is a host vector of size n, and hy is also a host vector of size m.
// m and n are not multiple of blockDim.
// step 1: transform CSR to BSR with column-major order
int base, nnz;
int nnzb;
cusparseDirection_t dirA = CUSPARSE_DIRECTION_COLUMN;
int mb = (m + blockDim-1)/blockDim;
int nb = (n + blockDim-1)/blockDim;
cudaMalloc((void**)&bsrRowPtrC, sizeof(int) *(mb+1));
cusparseXcsr2bsrNnz(handle, dirA, m, n,
        descrA, csrRowPtrA, csrColIndA, blockDim,
        descrC, bsrRowPtrC, &nnzb);
cudaMalloc((void**)&bsrColIndC, sizeof(int)*nnzb);
cudaMalloc((void**)&bsrValC, sizeof(float)*(blockDim*blockDim)*nnzb);
cusparseScsr2bsr(handle, dirA, m, n,
        descrA, csrValA, csrRowPtrA, csrColIndA, blockDim,
        descrC, bsrValC, bsrRowPtrC, bsrColIndC);
// step 2: allocate vector x and vector y large enough for bsrmv
cudaMalloc((void**)&x, sizeof(float)*(nb*blockDim));
cudaMalloc((void**)&y, sizeof(float)*(mb*blockDim));
cudaMemcpy(x, hx, sizeof(float)*n, cudaMemcpyHostToDevice);
cudaMemcpy(y, hy, sizeof(float)*m, cudaMemcpyHostToDevice);
// step 3: perform bsrmv
cusparseSbsrmv(handle, dirA, transA, mb, nb, nnzb, &alpha,
   descrC, bsrValC, bsrRowPtrC, bsrColIndC, blockDim, x, &beta, y);

Input

handle

handle to the cuSPARSE library context.

dir

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

trans

the operation \(\text{op}(A)\) . Only CUSPARSE_OPERATION_NON_TRANSPOSE is supported.

mb

number of block rows of matrix \(A\).

nb

number of block columns of matrix \(A\).

nnzb

number of nonzero blocks of matrix \(A\).

alpha

<type> scalar used for multiplication.

descr

the descriptor of matrix \(A\). The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL. Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

bsrVal

<type> array of nnz\(( =\)csrRowPtrA(mb)\(-\)csrRowPtrA(0)\()\) nonzero blocks of matrix \(A\).

bsrRowPtr

integer array of mb\(+ 1\) elements that contains the start of every block row and the end of the last block row plus one.

bsrColInd

integer array of nnz\(( =\)csrRowPtrA(mb)\(-\)csrRowPtrA(0)\()\) column indices of the nonzero blocks of matrix \(A\).

blockDim

block dimension of sparse matrix \(A\), larger than zero.

x

<type> vector of \(nb \ast blockDim\) elements.

beta

<type> scalar used for multiplication. If beta is zero, y does not have to be a valid input.

y

<type> vector of \(mb \ast blockDim\) elements.

Output

y

<type> updated vector.

See cusparseStatus_t for the description of the return status.

cusparse<t>bsrxmv() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseSbsrxmv(cusparseHandle_t         handle,
                cusparseDirection_t      dir,
                cusparseOperation_t      trans,
                int                      sizeOfMask,
                int                      mb,
                int                      nb,
                int                      nnzb,
                const float*             alpha,
                const cusparseMatDescr_t descr,
                const float*             bsrVal,
                const int*               bsrMaskPtr,
                const int*               bsrRowPtr,
                const int*               bsrEndPtr,
                const int*               bsrColInd,
                int                      blockDim,
                const float*             x,
                const float*             beta,
                float*                   y)

cusparseStatus_t
cusparseDbsrxmv(cusparseHandle_t         handle,
                cusparseDirection_t      dir,
                cusparseOperation_t      trans,
                int                      sizeOfMask,
                int                      mb,
                int                      nb,
                int                      nnzb,
                const double*            alpha,
                const cusparseMatDescr_t descr,
                const double*            bsrVal,
                const int*               bsrMaskPtr,
                const int*               bsrRowPtr,
                const int*               bsrEndPtr,
                const int*               bsrColInd,
                int                      blockDim,
                const double*            x,
                const double*            beta,
                double*                  y)

cusparseStatus_t
cusparseCbsrxmv(cusparseHandle_t         handle,
                cusparseDirection_t      dir,
                cusparseOperation_t      trans,
                int                      sizeOfMask,
                int                      mb,
                int                      nb,
                int                      nnzb,
                const cuComplex*         alpha,
                const cusparseMatDescr_t descr,
                const cuComplex*         bsrVal,
                const int*               bsrMaskPtr,
                const int*               bsrRowPtr,
                const int*               bsrEndPtr,
                const int*               bsrColInd,
                int                      blockDim,
                const cuComplex*         x,
                const cuComplex*         beta,
                cuComplex*               y)

cusparseStatus_t
cusparseZbsrxmv(cusparseHandle_t         handle,
                cusparseDirection_t      dir,
                cusparseOperation_t      trans,
                int                      sizeOfMask,
                int                      mb,
                int                      nb,
                int                      nnzb,
                const cuDoubleComplex*   alpha,
                const cusparseMatDescr_t descr,
                const cuDoubleComplex*   bsrVal,
                const int*               bsrMaskPtr,
                const int*               bsrRowPtr,
                const int*               bsrEndPtr,
                const int*               bsrColInd,
                int                      blockDim,
                const cuDoubleComplex*   x,
                const cuDoubleComplex*   beta,
                cuDoubleComplex*         y)

This function performs a bsrmv and a mask operation

\(\text{y(mask)} = (\alpha \ast \text{op}(A) \ast \text{x} + \beta \ast \text{y})\text{(mask)}\)

where \(A\text{ is an }(mb \ast blockDim) \times (nb \ast blockDim)\) sparse matrix that is defined in BSRX storage format by the four arrays bsrVal, bsrRowPtr, bsrEndPtr, and bsrColInd); x and y are vectors; \(\alpha\text{~and~}\beta\) are scalars; and

\(\text{op}(A) = \begin{cases} A & \text{if trans == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if trans == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if trans == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)

The mask operation is defined by array bsrMaskPtr which contains updated block row indices of \(y\) . If row \(i\) is not specified in bsrMaskPtr, then bsrxmv() does not touch row block \(i\) of \(A\) and \(y\) .

For example, consider the \(2 \times 3\) block matrix \(A\):

\(\begin{matrix} {A = \begin{bmatrix} A_{11} & A_{12} & O \\ A_{21} & A_{22} & A_{23} \\ \end{bmatrix}} \\ \end{matrix}\)

and its one-based BSR format (three vector form) is

\(\begin{matrix} \text{bsrVal} & = & \begin{bmatrix} A_{11} & A_{12} & A_{21} & A_{22} & A_{23} \\ \end{bmatrix} \\ \text{bsrRowPtr} & = & \begin{bmatrix} {1\phantom{.0}} & {3\phantom{.0}} & 6 \\ \end{bmatrix} \\ \text{bsrColInd} & = & \begin{bmatrix} {1\phantom{.0}} & {2\phantom{.0}} & {1\phantom{.0}} & {2\phantom{.0}} & 3 \\ \end{bmatrix} \\ \end{matrix}\)

Suppose we want to do the following bsrmv operation on a matrix \(\overset{¯}{A}\) which is slightly different from \(A\) .

\(\begin{bmatrix} y_{1} \\ y_{2} \\ \end{bmatrix}:=alpha \ast (\widetilde{A} = \begin{bmatrix} O & O & O \\ O & A_{22} & O \\ \end{bmatrix}) \ast \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \\ \end{bmatrix} + \begin{bmatrix} y_{1} \\ {beta \ast y_{2}} \\ \end{bmatrix}\)

We don’t need to create another BSR format for the new matrix \(\overset{¯}{A}\) , all that we should do is to keep bsrVal and bsrColInd unchanged, but modify bsrRowPtr and add an additional array bsrEndPtr which points to the last nonzero elements per row of \(\overset{¯}{A}\) plus 1.

For example, the following bsrRowPtr and bsrEndPtr can represent matrix \(\overset{¯}{A}\) :

\(\begin{matrix} \text{bsrRowPtr} & = & \begin{bmatrix} {1\phantom{.0}} & 4 \\ \end{bmatrix} \\ \text{bsrEndPtr} & = & \begin{bmatrix} {1\phantom{.0}} & 5 \\ \end{bmatrix} \\ \end{matrix}\)

Further we can use a mask operator (specified by array bsrMaskPtr) to update particular block row indices of \(y\) only because \(y_{1}\) is never changed. In this case, bsrMaskPtr\(=\) [2] and sizeOfMask=1.

The mask operator is equivalent to the following operation:

\(\begin{bmatrix} ? \\ y_{2} \\ \end{bmatrix}:=alpha \ast \begin{bmatrix} ? & ? & ? \\ O & A_{22} & O \\ \end{bmatrix} \ast \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \\ \end{bmatrix} + beta \ast \begin{bmatrix} ? \\ y_{2} \\ \end{bmatrix}\)

If a block row is not present in the bsrMaskPtr, then no calculation is performed on that row, and the corresponding value in y is unmodified. The question mark “?” is used to inidcate row blocks not in bsrMaskPtr.

In this case, first row block is not present in bsrMaskPtr, so bsrRowPtr[0] and bsrEndPtr[0] are not touched also.

\(\begin{matrix} \text{bsrRowPtr} & = & \begin{bmatrix} {?\phantom{.0}} & 4 \\ \end{bmatrix} \\ \text{bsrEndPtr} & = & \begin{bmatrix} {?\phantom{.0}} & 5 \\ \end{bmatrix} \\ \end{matrix}\)

bsrxmv() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

A couple of comments on bsrxmv():

  • Only blockDim > 1 is supported

  • Only CUSPARSE_OPERATION_NON_TRANSPOSE and CUSPARSE_MATRIX_TYPE_GENERAL are supported.

  • Parameters bsrMaskPtr, bsrRowPtr, bsrEndPtr and bsrColInd are consistent with base index, either one-based or zero-based. The above example is one-based.

Input

handle

handle to the cuSPARSE library context.

dir

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

trans

the operation \(\text{op}(A)\) . Only CUSPARSE_OPERATION_NON_TRANSPOSE is supported.

sizeOfMask

number of updated block rows of \(y\).

mb

number of block rows of matrix \(A\).

nb

number of block columns of matrix \(A\).

nnzb

number of nonzero blocks of matrix \(A\).

alpha

<type> scalar used for multiplication.

descr

the descriptor of matrix \(A\). The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL. Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

bsrVal

<type> array of nnz nonzero blocks of matrix \(A\).

bsrMaskPtr

integer array of sizeOfMask elements that contains the indices corresponding to updated block rows.

bsrRowPtr

integer array of mb elements that contains the start of every block row.

bsrEndPtr

integer array of mb elements that contains the end of the every block row plus one.

bsrColInd

integer array of nnzb column indices of the nonzero blocks of matrix \(A\).

blockDim

block dimension of sparse matrix \(A\), larger than zero.

x

<type> vector of \(nb \ast blockDim\) elements.

beta

<type> scalar used for multiplication. If beta is zero, y does not have to be a valid input.

y

<type> vector of \(mb \ast blockDim\) elements.

See cusparseStatus_t for the description of the return status.

cusparse<t>bsrsv2_bufferSize() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseSbsrsv2_bufferSize(cusparseHandle_t         handle,
                           cusparseDirection_t      dirA,
                           cusparseOperation_t      transA,
                           int                      mb,
                           int                      nnzb,
                           const cusparseMatDescr_t descrA,
                           float*                   bsrValA,
                           const int*               bsrRowPtrA,
                           const int*               bsrColIndA,
                           int                      blockDim,
                           bsrsv2Info_t             info,
                           int*                     pBufferSizeInBytes)

cusparseStatus_t
cusparseDbsrsv2_bufferSize(cusparseHandle_t         handle,
                           cusparseDirection_t      dirA,
                           cusparseOperation_t      transA,
                           int                      mb,
                           int                      nnzb,
                           const cusparseMatDescr_t descrA,
                           double*                  bsrValA,
                           const int*               bsrRowPtrA,
                           const int*               bsrColIndA,
                           int                      blockDim,
                           bsrsv2Info_t             info,
                           int*                     pBufferSizeInBytes)

cusparseStatus_t
cusparseCbsrsv2_bufferSize(cusparseHandle_t         handle,
                           cusparseDirection_t      dirA,
                           cusparseOperation_t      transA,
                           int                      mb,
                           int                      nnzb,
                           const cusparseMatDescr_t descrA,
                           cuComplex*               bsrValA,
                           const int*               bsrRowPtrA,
                           const int*               bsrColIndA,
                           int                      blockDim,
                           bsrsv2Info_t             info,
                           int*                     pBufferSizeInBytes)

cusparseStatus_t
cusparseZbsrsv2_bufferSize(cusparseHandle_t         handle,
                           cusparseDirection_t      dirA,
                           cusparseOperation_t      transA,
                           int                      mb,
                           int                      nnzb,
                           const cusparseMatDescr_t descrA,
                           cuDoubleComplex*         bsrValA,
                           const int*               bsrRowPtrA,
                           const int*               bsrColIndA,
                           int                      blockDim,
                           bsrsv2Info_t             info,
                           int*                     pBufferSizeInBytes)

This function returns size of the buffer used in bsrsv2, a new sparse triangular linear system op(A)*y =\(\alpha\)x.

A is an (mb*blockDim)x(mb*blockDim) sparse matrix that is defined in BSR storage format by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA); x and y are the right-hand-side and the solution vectors; \(\alpha\) is a scalar; and

\(\text{op}(A) = \begin{cases} A & \text{if trans == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if trans == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if trans == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)

Although there are six combinations in terms of parameter trans and the upper (lower) triangular part of A, bsrsv2_bufferSize() returns the maximum size buffer among these combinations. The buffer size depends on the dimensions mb, blockDim, and the number of nonzero blocks of the matrix nnzb. If the user changes the matrix, it is necessary to call bsrsv2_bufferSize() again to have the correct buffer size; otherwise a segmentation fault may occur.

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

handle

handle to the cuSPARSE library context.

dirA

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

transA

the operation \(\text{op}(A)\) .

mb

number of block rows of matrix A.

nnzb

number of nonzero blocks of matrix A.

descrA

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, while the supported diagonal types are CUSPARSE_DIAG_TYPE_UNIT and CUSPARSE_DIAG_TYPE_NON_UNIT.

bsrValA

<type> array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) nonzero blocks of matrix A.

bsrRowPtrA

integer array of mb\(+ 1\) elements that contains the start of every block row and the end of the last block row plus one.

bsrColIndA

integer array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) column indices of the nonzero blocks of matrix A.

blockDim

block dimension of sparse matrix A; must be larger than zero.

Output

info

record of internal states based on different algorithms.

pBufferSizeInBytes

number of bytes of the buffer used in the bsrsv2_analysis() and bsrsv2_solve().

See cusparseStatus_t for the description of the return status.

cusparse<t>bsrsv2_analysis() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseSbsrsv2_analysis(cusparseHandle_t         handle,
                         cusparseDirection_t      dirA,
                         cusparseOperation_t      transA,
                         int                      mb,
                         int                      nnzb,
                         const cusparseMatDescr_t descrA,
                         const float*             bsrValA,
                         const int*               bsrRowPtrA,
                         const int*               bsrColIndA,
                         int                      blockDim,
                         bsrsv2Info_t             info,
                         cusparseSolvePolicy_t    policy,
                         void*                    pBuffer)

cusparseStatus_t
cusparseDbsrsv2_analysis(cusparseHandle_t         handle,
                         cusparseDirection_t      dirA,
                         cusparseOperation_t      transA,
                         int                      mb,
                         int                      nnzb,
                         const cusparseMatDescr_t descrA,
                         const double*            bsrValA,
                         const int*               bsrRowPtrA,
                         const int*               bsrColIndA,
                         int                      blockDim,
                         bsrsv2Info_t             info,
                         cusparseSolvePolicy_t    policy,
                         void*                    pBuffer)

cusparseStatus_t
cusparseDbsrsv2_analysis(cusparseHandle_t         handle,
                         cusparseDirection_t      dirA,
                         cusparseOperation_t      transA,
                         int                      mb,
                         int                      nnzb,
                         const cusparseMatDescr_t descrA,
                         const cuComplex*         bsrValA,
                         const int*               bsrRowPtrA,
                         const int*               bsrColIndA,
                         int                      blockDim,
                         bsrsv2Info_t             info,
                         cusparseSolvePolicy_t    policy,
                         void*                    pBuffer)

cusparseStatus_t
cusparseZbsrsv2_analysis(cusparseHandle_t         handle,
                         cusparseDirection_t      dirA,
                         cusparseOperation_t      transA,
                         int                      mb,
                         int                      nnzb,
                         const cusparseMatDescr_t descrA,
                         const cuDoubleComplex*   bsrValA,
                         const int*               bsrRowPtrA,
                         const int*               bsrColIndA,
                         int                      blockDim,
                         bsrsv2Info_t             info,
                         cusparseSolvePolicy_t    policy,
                         void*                    pBuffer)

This function performs the analysis phase of bsrsv2, a new sparse triangular linear system op(A)*y =\(\alpha\)x.

A is an (mb*blockDim)x(mb*blockDim) sparse matrix that is defined in BSR storage format by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA); x and y are the right-hand side and the solution vectors; \(\alpha\) is a scalar; and

\(\text{op}(A) = \begin{cases} A & \text{if trans == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if trans == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if trans == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)

The block of BSR format is of size blockDim*blockDim, stored as column-major or row-major as determined by parameter dirA, which is either CUSPARSE_DIRECTION_COLUMN or CUSPARSE_DIRECTION_ROW. The matrix type must be CUSPARSE_MATRIX_TYPE_GENERAL, and the fill mode and diagonal type are ignored.

It is expected that this function will be executed only once for a given matrix and a particular operation type.

This function requires a buffer size returned by bsrsv2_bufferSize(). The address of pBuffer must be multiple of 128 bytes. If it is not, CUSPARSE_STATUS_INVALID_VALUE is returned.

Function bsrsv2_analysis() reports a structural zero and computes level information, which stored in the opaque structure info. The level information can extract more parallelism for a triangular solver. However bsrsv2_solve() can be done without level information. To disable level information, the user needs to specify the policy of the triangular solver as CUSPARSE_SOLVE_POLICY_NO_LEVEL.

Function bsrsv2_analysis() always reports the first structural zero, even when parameter policy is CUSPARSE_SOLVE_POLICY_NO_LEVEL. No structural zero is reported if CUSPARSE_DIAG_TYPE_UNIT is specified, even if block A(j,j) is missing for some j. The user needs to call cusparseXbsrsv2_zeroPivot() to know where the structural zero is.

It is the user’s choice whether to call bsrsv2_solve() if bsrsv2_analysis() reports a structural zero. In this case, the user can still call bsrsv2_solve(), which will return a numerical zero at the same position as a structural zero. However the result x is meaningless.

  • This function requires temporary extra storage that is allocated internally

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

Input

handle

handle to the cuSPARSE library context.

dirA

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

transA

the operation \(\text{op}(A)\) .

mb

number of block rows of matrix A.

nnzb

number of nonzero blocks of matrix A.

descrA

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, while the supported diagonal types are CUSPARSE_DIAG_TYPE_UNIT and CUSPARSE_DIAG_TYPE_NON_UNIT.

bsrValA

<type> array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) nonzero blocks of matrix A.

bsrRowPtrA

integer array of mb\(+ 1\) elements that contains the start of every block row and the end of the last block row plus one.

bsrColIndA

integer array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) column indices of the nonzero blocks of matrix A.

blockDim

block dimension of sparse matrix A, larger than zero.

info

structure initialized using cusparseCreateBsrsv2Info().

policy

the supported policies are CUSPARSE_SOLVE_POLICY_NO_LEVEL and CUSPARSE_SOLVE_POLICY_USE_LEVEL.

pBuffer

buffer allocated by the user, the size is return by bsrsv2_bufferSize().

Output

info

structure filled with information collected during the analysis phase (that should be passed to the solve phase unchanged).

See cusparseStatus_t for the description of the return status.

cusparse<t>bsrsv2_solve() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseSbsrsv2_solve(cusparseHandle_t         handle,
                      cusparseDirection_t      dirA,
                      cusparseOperation_t      transA,
                      int                      mb,
                      int                      nnzb,
                      const float*             alpha,
                      const cusparseMatDescr_t descrA,
                      const float*             bsrValA,
                      const int*               bsrRowPtrA,
                      const int*               bsrColIndA,
                      int                      blockDim,
                      bsrsv2Info_t             info,
                      const float*             x,
                      float*                   y,
                      cusparseSolvePolicy_t    policy,
                      void*                    pBuffer)

cusparseStatus_t
cusparseDbsrsv2_solve(cusparseHandle_t         handle,
                      cusparseDirection_t      dirA,
                      cusparseOperation_t      transA,
                      int                      mb,
                      int                      nnzb,
                      const double*            alpha,
                      const cusparseMatDescr_t descrA,
                      const double*            bsrValA,
                      const int*               bsrRowPtrA,
                      const int*               bsrColIndA,
                      int                      blockDim,
                      bsrsv2Info_t             info,
                      const double*            x,
                      double*                  y,
                      cusparseSolvePolicy_t    policy,
                      void*                    pBuffer)

cusparseStatus_t
cusparseCbsrsv2_solve(cusparseHandle_t         handle,
                      cusparseDirection_t      dirA,
                      cusparseOperation_t      transA,
                      int                      mb,
                      int                      nnzb,
                      const cuComplex*         alpha,
                      const cusparseMatDescr_t descrA,
                      const cuComplex*         bsrValA,
                      const int*               bsrRowPtrA,
                      const int*               bsrColIndA,
                      int                      blockDim,
                      bsrsv2Info_t             info,
                      const cuComplex*         x,
                      cuComplex*               y,
                      cusparseSolvePolicy_t    policy,
                      void*                    pBuffer)

cusparseStatus_t
cusparseZbsrsv2_solve(cusparseHandle_t         handle,
                      cusparseDirection_t      dirA,
                      cusparseOperation_t      transA,
                      int                      mb,
                      int                      nnzb,
                      const cuDoubleComplex*   alpha,
                      const cusparseMatDescr_t descrA,
                      const cuDoubleComplex*   bsrValA,
                      const int*               bsrRowPtrA,
                      const int*               bsrColIndA,
                      int                      blockDim,
                      bsrsv2Info_t             info,
                      const cuDoubleComplex*   x,
                      cuDoubleComplex*         y,
                      cusparseSolvePolicy_t    policy,
                      void*                    pBuffer)

This function performs the solve phase of bsrsv2, a new sparse triangular linear system op(A)*y =\(\alpha\)x.

A is an (mb*blockDim)x(mb*blockDim) sparse matrix that is defined in BSR storage format by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA); x and y are the right-hand-side and the solution vectors; \(\alpha\) is a scalar; and

\(\text{op}(A) = \begin{cases} A & \text{if trans == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if trans == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if trans == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)

The block in BSR format is of size blockDim*blockDim, stored as column-major or row-major as determined by parameter dirA, which is either CUSPARSE_DIRECTION_COLUMN or CUSPARSE_DIRECTION_ROW. The matrix type must be CUSPARSE_MATRIX_TYPE_GENERAL, and the fill mode and diagonal type are ignored. Function bsrsv02_solve() can support an arbitrary blockDim.

This function may be executed multiple times for a given matrix and a particular operation type.

This function requires a buffer size returned by bsrsv2_bufferSize(). The address of pBuffer must be multiple of 128 bytes. If it is not, CUSPARSE_STATUS_INVALID_VALUE is returned.

Although bsrsv2_solve() can be done without level information, the user still needs to be aware of consistency. If bsrsv2_analysis() is called with policy CUSPARSE_SOLVE_POLICY_USE_LEVEL, bsrsv2_solve() can be run with or without levels. On the other hand, if bsrsv2_analysis() is called with CUSPARSE_SOLVE_POLICY_NO_LEVEL, bsrsv2_solve() can only accept CUSPARSE_SOLVE_POLICY_NO_LEVEL; otherwise, CUSPARSE_STATUS_INVALID_VALUE is returned.

The level information may not improve the performance, but may spend extra time doing analysis. For example, a tridiagonal matrix has no parallelism. In this case, CUSPARSE_SOLVE_POLICY_NO_LEVEL performs better than CUSPARSE_SOLVE_POLICY_USE_LEVEL. If the user has an iterative solver, the best approach is to do bsrsv2_analysis() with CUSPARSE_SOLVE_POLICY_USE_LEVEL once. Then do bsrsv2_solve() with CUSPARSE_SOLVE_POLICY_NO_LEVEL in the first run, and with CUSPARSE_SOLVE_POLICY_USE_LEVEL in the second run, and pick the fastest one to perform the remaining iterations.

Function bsrsv02_solve() has the same behavior as csrsv02_solve(). That is, bsr2csr(bsrsv02(A)) = csrsv02(bsr2csr(A)). The numerical zero of csrsv02_solve() means there exists some zero A(j,j). The numerical zero of bsrsv02_solve() means there exists some block A(j,j) that is not invertible.

Function bsrsv2_solve() reports the first numerical zero, including a structural zero. No numerical zero is reported if CUSPARSE_DIAG_TYPE_UNIT is specified, even if A(j,j) is not invertible for some j. The user needs to call cusparseXbsrsv2_zeroPivot() to know where the numerical zero is.

The function supports the following properties if pBuffer != NULL

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

For example, suppose L is a lower triangular matrix with unit diagonal, then the following code solves L*y=x by level information.

// Suppose that L is m x m sparse matrix represented by BSR format,
// The number of block rows/columns is mb, and
// the number of nonzero blocks is nnzb.
// L is lower triangular with unit diagonal.
// Assumption:
// - dimension of matrix L is m(=mb*blockDim),
// - matrix L has nnz(=nnzb*blockDim*blockDim) nonzero elements,
// - handle is already created by cusparseCreate(),
// - (d_bsrRowPtr, d_bsrColInd, d_bsrVal) is BSR of L on device memory,
// - d_x is right hand side vector on device memory.
// - d_y is solution vector on device memory.
// - d_x and d_y are of size m.
cusparseMatDescr_t descr = 0;
bsrsv2Info_t info = 0;
int pBufferSize;
void *pBuffer = 0;
int structural_zero;
int numerical_zero;
const double alpha = 1.;
const cusparseSolvePolicy_t policy = CUSPARSE_SOLVE_POLICY_USE_LEVEL;
const cusparseOperation_t trans = CUSPARSE_OPERATION_NON_TRANSPOSE;
const cusparseDirection_t dir = CUSPARSE_DIRECTION_COLUMN;

// step 1: create a descriptor which contains
// - matrix L is base-1
// - matrix L is lower triangular
// - matrix L has unit diagonal, specified by parameter CUSPARSE_DIAG_TYPE_UNIT
//   (L may not have all diagonal elements.)
cusparseCreateMatDescr(&descr);
cusparseSetMatIndexBase(descr, CUSPARSE_INDEX_BASE_ONE);
cusparseSetMatFillMode(descr, CUSPARSE_FILL_MODE_LOWER);
cusparseSetMatDiagType(descr, CUSPARSE_DIAG_TYPE_UNIT);

// step 2: create a empty info structure
cusparseCreateBsrsv2Info(&info);

// step 3: query how much memory used in bsrsv2, and allocate the buffer
cusparseDbsrsv2_bufferSize(handle, dir, trans, mb, nnzb, descr,
    d_bsrVal, d_bsrRowPtr, d_bsrColInd, blockDim, &pBufferSize);

// pBuffer returned by cudaMalloc is automatically aligned to 128 bytes.
cudaMalloc((void**)&pBuffer, pBufferSize);

// step 4: perform analysis
cusparseDbsrsv2_analysis(handle, dir, trans, mb, nnzb, descr,
    d_bsrVal, d_bsrRowPtr, d_bsrColInd, blockDim,
    info, policy, pBuffer);
// L has unit diagonal, so no structural zero is reported.
status = cusparseXbsrsv2_zeroPivot(handle, info, &structural_zero);
if (CUSPARSE_STATUS_ZERO_PIVOT == status){
   printf("L(%d,%d) is missing\n", structural_zero, structural_zero);
}

// step 5: solve L*y = x
cusparseDbsrsv2_solve(handle, dir, trans, mb, nnzb, &alpha, descr,
   d_bsrVal, d_bsrRowPtr, d_bsrColInd, blockDim, info,
   d_x, d_y, policy, pBuffer);
// L has unit diagonal, so no numerical zero is reported.
status = cusparseXbsrsv2_zeroPivot(handle, info, &numerical_zero);
if (CUSPARSE_STATUS_ZERO_PIVOT == status){
   printf("L(%d,%d) is zero\n", numerical_zero, numerical_zero);
}

// step 6: free resources
cudaFree(pBuffer);
cusparseDestroyBsrsv2Info(info);
cusparseDestroyMatDescr(descr);
cusparseDestroy(handle);

Input

handle

handle to the cuSPARSE library context.

dirA

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

transA

the operation \(\text{op}(A)\).

mb

number of block rows and block columns of matrix A.

alpha

<type> scalar used for multiplication.

descrA

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, while the supported diagonal types are CUSPARSE_DIAG_TYPE_UNIT and CUSPARSE_DIAG_TYPE_NON_UNIT.

bsrValA

<type> array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) nonzero blocks of matrix A.

bsrRowPtrA

integer array of mb\(+ 1\) elements that contains the start of every block row and the end of the last block row plus one.

bsrColIndA

integer array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) column indices of the nonzero blocks of matrix A.

blockDim

block dimension of sparse matrix A, larger than zero.

info

structure with information collected during the analysis phase (that should have been passed to the solve phase unchanged).

x

<type> right-hand-side vector of size m.

policy

the supported policies are CUSPARSE_SOLVE_POLICY_NO_LEVEL and CUSPARSE_SOLVE_POLICY_USE_LEVEL.

pBuffer

buffer allocated by the user, the size is returned by bsrsv2_bufferSize().

Output

y

<type> solution vector of size m.

See cusparseStatus_t for the description of the return status.

cusparseXbsrsv2_zeroPivot() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseXbsrsv2_zeroPivot(cusparseHandle_t handle,
                          bsrsv2Info_t     info,
                          int*             position)

If the returned error code is CUSPARSE_STATUS_ZERO_PIVOT, position=j means A(j,j) is either structural zero or numerical zero (singular block). Otherwise position=-1.

The position can be 0-based or 1-based, the same as the matrix.

Function cusparseXbsrsv2_zeroPivot() is a blocking call. It calls cudaDeviceSynchronize() to make sure all previous kernels are done.

The position can be in the host memory or device memory. The user can set the proper mode with cusparseSetPointerMode().

  • The routine requires no extra storage

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

Input

handle

handle to the cuSPARSE library context.

info

info contains a structural zero or numerical zero if the user already called bsrsv2_analysis() or bsrsv2_solve().

Output

position

if no structural or numerical zero, position is -1; otherwise if A(j,j) is missing or U(j,j) is zero, position=j.

See cusparseStatus_t for the description of the return status

cusparse<t>gemvi()

cusparseStatus_t
cusparseSgemvi_bufferSize(cusparseHandle_t    handle,
                          cusparseOperation_t transA,
                          int                 m,
                          int                 n,
                          int                 nnz,
                          int*                pBufferSize)

cusparseStatus_t
cusparseDgemvi_bufferSize(cusparseHandle_t    handle,
                          cusparseOperation_t transA,
                          int                 m,
                          int                 n,
                          int                 nnz,
                          int*                pBufferSize)

cusparseStatus_t
cusparseCgemvi_bufferSize(cusparseHandle_t    handle,
                          cusparseOperation_t transA,
                          int                 m,
                          int                 n,
                          int                 nnz,
                          int*                pBufferSize)

cusparseStatus_t
cusparseZgemvi_bufferSize(cusparseHandle_t    handle,
                          cusparseOperation_t transA,
                          int                 m,
                          int                 n,
                          int                 nnz,
                          int*                pBufferSize)
cusparseStatus_t
cusparseSgemvi(cusparseHandle_t     handle,
               cusparseOperation_t  transA,
               int                  m,
               int                  n,
               const float*         alpha,
               const float*         A,
               int                  lda,
               int                  nnz,
               const float*         x,
               const int*           xInd,
               const float*         beta,
               float*               y,
               cusparseIndexBase_t  idxBase,
               void*                pBuffer)

cusparseStatus_t
cusparseDgemvi(cusparseHandle_t     handle,
               cusparseOperation_t  transA,
               int                  m,
               int                  n,
               const double*        alpha,
               const double*        A,
               int                  lda,
               int                  nnz,
               const double*        x,
               const int*           xInd,
               const double*        beta,
               double*              y,
               cusparseIndexBase_t  idxBase,
               void*                pBuffer)

cusparseStatus_t
cusparseCgemvi(cusparseHandle_t     handle,
               cusparseOperation_t  transA,
               int                  m,
               int                  n,
               const cuComplex*     alpha,
               const cuComplex*     A,
               int                  lda,
               int                  nnz,
               const cuComplex*     x,
               const int*           xInd,
               const cuComplex*     beta,
               cuComplex*           y,
               cusparseIndexBase_t  idxBase,
               void*                pBuffer)

cusparseStatus_t
cusparseZgemvi(cusparseHandle_t       handle,
               cusparseOperation_t    transA,
               int                    m,
               int                    n,
               const cuDoubleComplex* alpha,
               const cuDoubleComplex* A,
               int                    lda,
               int                    nnz,
               const cuDoubleComplex* x,
               const int*             xInd,
               const cuDoubleComplex* beta,
               cuDoubleComplex*       y,
               cusparseIndexBase_t    idxBase,
               void*                  pBuffer)

This function performs the matrix-vector operation

\(\text{y} = \alpha \ast \text{op}(A) \ast \text{x} + \beta \ast \text{y}\)

A is an m×n dense matrix and a sparse vector x that is defined in a sparse storage format by the two arrays xVal, xInd of length nnz, and y is a dense vector; \(\alpha\)and \(\beta\)are scalars; and

\(\text{op}(A) = \begin{cases} A & \text{if trans == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if trans == CUSPARSE_OPERATION_TRANSPOSE} \\ \end{cases}\)

To simplify the implementation, we have not (yet) optimized the transpose multiple case. We recommend the following for users interested in this case.

  1. Convert the matrix from CSR to CSC format using one of the csr2csc() functions. Notice that by interchanging the rows and columns of the result you are implicitly transposing the matrix.

  2. Call the gemvi() function with the cusparseOperation_t parameter set to CUSPARSE_OPERATION_NON_TRANSPOSE and with the interchanged rows and columns of the matrix stored in CSC format. This (implicitly) multiplies the vector by the transpose of the matrix in the original CSR format.

    • The routine requires no extra storage

    • The routine supports asynchronous execution

    • The routine supports CUDA graph capture

The function cusparse<t>gemvi_bufferSize() returns the size of buffer used in cusparse<t>gemvi().

Input

handle

Handle to the cuSPARSE library context.

trans

The operation \(\text{op}(A)\).

m

Number of rows of matrix A.

n

Number of columns of matrix A.

alpha

<type> scalar used for multiplication.

A

The pointer to dense matrix A.

lda

Size of the leading dimension of A.

nnz

Number of nonzero elements of vector x.

x

<type> sparse vector of nnz elements of size n if \(\text{op}(A)=A\), and size m if \(\text{op}(A)=A^{T}\).

xInd

Indices of non-zero values in x.

beta

<type> scalar used for multiplication. If beta is zero, y does not have to be a valid input.

y

<type> dense vector of m elements if \(\text{op}(A)=A\), and n elements if \(\text{op}(A)=A^{T}\).

idxBase

0 or 1, for 0 based or 1 based indexing, respectively.

pBufferSize

Number of elements needed the buffer used in cusparse<t>gemvi().

pBuffer

Working space buffer.

Output

y

<type> updated dense vector.

See cusparseStatus_t for the description of the return status.