cuSPARSE Level 3 Function Reference

This chapter describes sparse linear algebra functions that perform operations between sparse and (usually tall) dense matrices.

cusparse<t>bsrmm()

cusparseStatus_t
cusparseSbsrmm(cusparseHandle_t         handle,
               cusparseDirection_t      dirA,
               cusparseOperation_t      transA,
               cusparseOperation_t      transB,
               int                      mb,
               int                      n,
               int                      kb,
               int                      nnzb,
               const float*             alpha,
               const cusparseMatDescr_t descrA,
               const float*             bsrValA,
               const int*               bsrRowPtrA,
               const int*               bsrColIndA,
               int                      blockDim,
               const float*             B,
               int                      ldb,
               const float*             beta,
               float*                   C,
               int                      ldc)

cusparseStatus_t
cusparseDbsrmm(cusparseHandle_t         handle,
               cusparseDirection_t      dirA,
               cusparseOperation_t      transA,
               cusparseOperation_t      transB,
               int                      mb,
               int                      n,
               int                      kb,
               int                      nnzb,
               const double*            alpha,
               const cusparseMatDescr_t descrA,
               const double*            bsrValA,
               const int*               bsrRowPtrA,
               const int*               bsrColIndA,
               int                      blockDim,
               const double*            B,
               int                      ldb,
               const double*            beta,
               double*                  C,
               int                      ldc)

cusparseStatus_t
cusparseCbsrmm(cusparseHandle_t         handle,
               cusparseDirection_t      dirA,
               cusparseOperation_t      transA,
               cusparseOperation_t      transB,
               int                      mb,
               int                      n,
               int                      kb,
               int                      nnzb,
               const cuComplex*         alpha,
               const cusparseMatDescr_t descrA,
               const cuComplex*         bsrValA,
               const int*               bsrRowPtrA,
               const int*               bsrColIndA,
               int                      blockDim,
               const cuComplex*         B,
               int                      ldb,
               const cuComplex*         beta,
               cuComplex*               C,
               int                      ldc)

cusparseStatus_t
cusparseZbsrmm(cusparseHandle_t         handle,
               cusparseDirection_t      dirA,
               cusparseOperation_t      transA,
               cusparseOperation_t      transB,
               int                      mb,
               int                      n,
               int                      kb,
               int                      nnzb,
               const cuDoubleComplex*   alpha,
               const cusparseMatDescr_t descrA,
               const cuDoubleComplex*   bsrValA,
               const int*               bsrRowPtrA,
               const int*               bsrColIndA,
               int                      blockDim,
               const cuDoubleComplex*   B,
               int                      ldb,
               const cuDoubleComplex*   beta,
               cuDoubleComplex*         C,
               int                      ldc)

This function performs one of the following matrix-matrix operations:

\(C = \alpha \ast \text{op}(A) \ast \text{op}(B) + \beta \ast C\)

A is an mb×kb sparse matrix that is defined in BSR storage format by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA; B and C are dense matrices; \(\alpha\text{~and~}\beta\) are scalars; and

\(\text{op}(A) = \begin{cases} A & \text{if transA == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if transA == CUSPARSE_OPERATION_TRANSPOSE (not\ supported)} \\ A^{H} & \text{if transA == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE (not supported)} \\ \end{cases}\)

and

\(\text{op}(B) = \begin{cases} B & \text{if transB == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ B^{T} & \text{if transB == CUSPARSE_OPERATION_TRANSPOSE} \\ B^{H} & \text{if transB == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE (not supported)} \\ \end{cases}\)

The function has the following limitations:

  • Only CUSPARSE_MATRIX_TYPE_GENERAL matrix type is supported

  • Only blockDim > 1 is supported

  • if blockDim ≤ 4, then max(mb)/max(n) = 524,272

  • if 4 < blockDim ≤ 8, then max(mb) = 524,272, max(n) = 262,136

  • if blockDim > 8, then m < 65,535 and max(n) = 262,136

The motivation of transpose(B) is to improve memory access of matrix B. The computational pattern of A*transpose(B) with matrix B in column-major order is equivalent to A*B with matrix B in row-major order.

In practice, no operation in an iterative solver or eigenvalue solver uses A*transpose(B). However, we can perform A*transpose(transpose(B)) which is the same as A*B. For example, suppose A is mb*kb, B is k*n and C is m*n, the following code shows usage of cusparseDbsrmm().

// A is mb*kb, B is k*n and C is m*n
    const int m = mb*blockSize;
    const int k = kb*blockSize;
    const int ldb_B = k; // leading dimension of B
    const int ldc   = m; // leading dimension of C
// perform C:=alpha*A*B + beta*C
    cusparseSetMatType(descrA, CUSPARSE_MATRIX_TYPE_GENERAL );
    cusparseDbsrmm(cusparse_handle,
               CUSPARSE_DIRECTION_COLUMN,
               CUSPARSE_OPERATION_NON_TRANSPOSE,
               CUSPARSE_OPERATION_NON_TRANSPOSE,
               mb, n, kb, nnzb, alpha,
               descrA, bsrValA, bsrRowPtrA, bsrColIndA, blockSize,
               B, ldb_B,
               beta, C, ldc);

Instead of using A*B, our proposal is to transpose B to Bt by first calling cublas<t>geam(), and then to perform A*transpose(Bt).

// step 1: Bt := transpose(B)
    const int m = mb*blockSize;
    const int k = kb*blockSize;
    double *Bt;
    const int ldb_Bt = n; // leading dimension of Bt
    cudaMalloc((void**)&Bt, sizeof(double)*ldb_Bt*k);
    double one  = 1.0;
    double zero = 0.0;
    cublasSetPointerMode(cublas_handle, CUBLAS_POINTER_MODE_HOST);
    cublasDgeam(cublas_handle, CUBLAS_OP_T, CUBLAS_OP_T,
        n, k, &one, B, int ldb_B, &zero, B, int ldb_B, Bt, ldb_Bt);

// step 2: perform C:=alpha*A*transpose(Bt) + beta*C
    cusparseDbsrmm(cusparse_handle,
               CUSPARSE_DIRECTION_COLUMN,
               CUSPARSE_OPERATION_NON_TRANSPOSE,
               CUSPARSE_OPERATION_TRANSPOSE,
               mb, n, kb, nnzb, alpha,
               descrA, bsrValA, bsrRowPtrA, bsrColIndA, blockSize,
               Bt, ldb_Bt,
               beta, C, ldc);

bsrmm() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

handle

handle to the cuSPARSE library context.

dir

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

transA

the operation op(A).

transB

the operation op(B).

mb

number of block rows of sparse matrix A.

n

number of columns of dense matrix op(B) and A.

kb

number of block columns of sparse matrix A.

nnzb

number of non-zero blocks of sparse matrix A.

alpha

<type> scalar used for multiplication.

descrA

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL. Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

bsrValA

<type> array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) nonzero blocks of matrix A.

bsrRowPtrA

integer array of mb\(+ 1\) elements that contains the start of every block row and the end of the last block row plus one.

bsrColIndA

integer array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) column indices of the nonzero blocks of matrix A.

blockDim

block dimension of sparse matrix A, larger than zero.

B

array of dimensions (ldb, n) if op(B)=B and (ldb,                                           k) otherwise.

ldb

leading dimension of B. If op(B)=B, it must be at least \(\max\text{(1,\ k)}\) If op(B) != B, it must be at least max(1,                                           n).

beta

<type> scalar used for multiplication. If beta is zero, C does not have to be a valid input.

C

array of dimensions (ldc, n).

ldc

leading dimension of C. It must be at least \(\max\text{(1,\ m)}\) if op(A)=A and at least \(\max\text{(1,\ k)}\) otherwise.

Output

C

<type> updated array of dimensions (ldc, n).

See cusparseStatus_t for the description of the return status

cusparse<t>bsrsm2_bufferSize() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseSbsrsm2_bufferSize(cusparseHandle_t         handle,
                           cusparseDirection_t      dirA,
                           cusparseOperation_t      transA,
                           cusparseOperation_t      transX,
                           int                      mb,
                           int                      n,
                           int                      nnzb,
                           const cusparseMatDescr_t descrA,
                           float*                   bsrSortedValA,
                           const int*               bsrSortedRowPtrA,
                           const int*               bsrSortedColIndA,
                           int                      blockDim,
                           bsrsm2Info_t             info,
                           int*                     pBufferSizeInBytes)

cusparseStatus_t
cusparseDbsrsm2_bufferSize(cusparseHandle_t         handle,
                           cusparseDirection_t      dirA,
                           cusparseOperation_t      transA,
                           cusparseOperation_t      transX,
                           int                      mb,
                           int                      n,
                           int                      nnzb,
                           const cusparseMatDescr_t descrA,
                           double*                  bsrSortedValA,
                           const int*               bsrSortedRowPtrA,
                           const int*               bsrSortedColIndA,
                           int                      blockDim,
                           bsrsm2Info_t             info,
                           int*                     pBufferSizeInBytes)

cusparseStatus_t
cusparseCbsrsm2_bufferSize(cusparseHandle_t         handle,
                           cusparseDirection_t      dirA,
                           cusparseOperation_t      transA,
                           cusparseOperation_t      transX,
                           int                      mb,
                           int                      n,
                           int                      nnzb,
                           const cusparseMatDescr_t descrA,
                           cuComplex*               bsrSortedValA,
                           const int*               bsrSortedRowPtrA,
                           const int*               bsrSortedColIndA,
                           int                      blockDim,
                           bsrsm2Info_t             info,
                           int*                     pBufferSizeInBytes)

cusparseStatus_t
cusparseZbsrsm2_bufferSize(cusparseHandle_t         handle,
                           cusparseDirection_t      dirA,
                           cusparseOperation_t      transA,
                           cusparseOperation_t      transX,
                           int                      mb,
                           int                      n,
                           int                      nnzb,
                           const cusparseMatDescr_t descrA,
                           cuDoubleComplex*         bsrSortedValA,
                           const int*               bsrSortedRowPtrA,
                           const int*               bsrSortedColIndA,
                           int                      blockDim,
                           bsrsm2Info_t             info,
                           int*                     pBufferSizeInBytes)

This function returns size of buffer used in bsrsm2(), a new sparse triangular linear system op(A)*op(X)=\(\alpha\)op(B).

A is an (mb*blockDim)x(mb*blockDim) sparse matrix that is defined in BSR storage format by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA); B and X are the right-hand-side and the solution matrices; \(\alpha\) is a scalar; and

\(\text{op}(A) == \text{CUSPARSE_OPERATION_NON_TRANSPOSE}\)

Although there are six combinations in terms of parameter trans and the upper (and lower) triangular part of A, bsrsm2_bufferSize() returns the maximum size of the buffer among these combinations. The buffer size depends on dimension mb,blockDim and the number of nonzeros of the matrix, nnzb. If the user changes the matrix, it is necessary to call bsrsm2_bufferSize() again to get the correct buffer size, otherwise a segmentation fault may occur.

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

handle

handle to the cuSPARSE library context.

dirA

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

transA

the operation op(A).

transX

the operation op(X).

mb

number of block rows of matrix A.

n

number of columns of matrix op(B) and op(X).

nnzb

number of nonzero blocks of matrix A.

descrA

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, while the supported diagonal types are CUSPARSE_DIAG_TYPE_UNIT and CUSPARSE_DIAG_TYPE_NON_UNIT.

bsrValA

<type> array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) nonzero blocks of matrix A.

bsrRowPtrA

integer array of mb\(+ 1\) elements that contains the start of every block row and the end of the last block row plus one.

bsrColIndA

integer array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) column indices of the nonzero blocks of matrix A.

blockDim

block dimension of sparse matrix A; larger than zero.

Output

info

record internal states based on different algorithms.

pBufferSizeInBytes

number of bytes of the buffer used in bsrsm2_analysis() and bsrsm2_solve().

See cusparseStatus_t for the description of the return status

cusparse<t>bsrsm2_analysis() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseSbsrsm2_analysis(cusparseHandle_t         handle,
                         cusparseDirection_t      dirA,
                         cusparseOperation_t      transA,
                         cusparseOperation_t      transX,
                         int                      mb,
                         int                      n,
                         int                      nnzb,
                         const cusparseMatDescr_t descrA,
                         const float*             bsrSortedVal,
                         const int*               bsrSortedRowPtr,
                         const int*               bsrSortedColInd,
                         int                      blockDim,
                         bsrsm2Info_t             info,
                         cusparseSolvePolicy_t    policy,
                         void*                    pBuffer)

cusparseStatus_t
cusparseDbsrsm2_analysis(cusparseHandle_t         handle,
                         cusparseDirection_t      dirA,
                         cusparseOperation_t      transA,
                         cusparseOperation_t      transX,
                         int                      mb,
                         int                      n,
                         int                      nnzb,
                         const cusparseMatDescr_t descrA,
                         const double*            bsrSortedVal,
                         const int*               bsrSortedRowPtr,
                         const int*               bsrSortedColInd,
                         int                      blockDim,
                         bsrsm2Info_t             info,
                         cusparseSolvePolicy_t    policy,
                         void*                    pBuffer)

cusparseStatus_t
cusparseCbsrsm2_analysis(cusparseHandle_t         handle,
                         cusparseDirection_t      dirA,
                         cusparseOperation_t      transA,
                         cusparseOperation_t      transX,
                         int                      mb,
                         int                      n,
                         int                      nnzb,
                         const cusparseMatDescr_t descrA,
                         const cuComplex*         bsrSortedVal,
                         const int*               bsrSortedRowPtr,
                         const int*               bsrSortedColInd,
                         int                      blockDim,
                         bsrsm2Info_t             info,
                         cusparseSolvePolicy_t    policy,
                         void*                    pBuffer)

cusparseStatus_t
cusparseZbsrsm2_analysis(cusparseHandle_t         handle,
                         cusparseDirection_t      dirA,
                         cusparseOperation_t      transA,
                         cusparseOperation_t      transX,
                         int                      mb,
                         int                      n,
                         int                      nnzb,
                         const cusparseMatDescr_t descrA,
                         const cuDoubleComplex*   bsrSortedVal,
                         const int*               bsrSortedRowPtr,
                         const int*               bsrSortedColInd,
                         int                      blockDim,
                         bsrsm2Info_t             info,
                         cusparseSolvePolicy_t    policy,
                         void*                    pBuffer)

This function performs the analysis phase of bsrsm2(), a new sparse triangular linear system op(A)*op(X) =\(\alpha\)op(B).

A is an (mb*blockDim)x(mb*blockDim) sparse matrix that is defined in BSR storage format by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA); B and X are the right-hand-side and the solution matrices; \(\alpha\) is a scalar; and

\(\text{op}(A) == \text{CUSPARSE_OPERATION_NON_TRANSPOSE}\)

and

\(\text{op}(X) = \begin{cases} X & \text{if transX == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ X^{T} & \text{if transX == CUSPARSE_OPERATION_TRANSPOSE} \\ X^{H} & \text{if transX == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE (not supported)} \\ \end{cases}\)

and op(B) and op(X) are equal.

The block of BSR format is of size blockDim*blockDim, stored in column-major or row-major as determined by parameter dirA, which is either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN. The matrix type must be CUSPARSE_MATRIX_TYPE_GENERAL, and the fill mode and diagonal type are ignored.

It is expected that this function will be executed only once for a given matrix and a particular operation type.

This function requires the buffer size returned by bsrsm2_bufferSize(). The address of pBuffer must be multiple of 128 bytes. If not, CUSPARSE_STATUS_INVALID_VALUE is returned.

Function bsrsm2_analysis() reports a structural zero and computes the level information stored in opaque structure info. The level information can extract more parallelism during a triangular solver. However bsrsm2_solve() can be done without level information. To disable level information, the user needs to specify the policy of the triangular solver as CUSPARSE_SOLVE_POLICY_NO_LEVEL.

Function bsrsm2_analysis() always reports the first structural zero, even if the parameter policy is CUSPARSE_SOLVE_POLICY_NO_LEVEL. Besides, no structural zero is reported if CUSPARSE_DIAG_TYPE_UNIT is specified, even if block A(j,j) is missing for some j. The user must call cusparseXbsrsm2_query_zero_pivot() to know where the structural zero is.

If bsrsm2_analysis() reports a structural zero, the solve will return a numerical zero in the same position as the structural zero but this result X is meaningless.

  • This function requires temporary extra storage that is allocated internally

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

Input

handle

handle to the cuSPARSE library context.

dirA

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

transA

the operation op(A).

transX

the operation op(B) and op(X).

mb

number of block rows of matrix A.

n

number of columns of matrix op(B) and op(X).

nnzb

number of non-zero blocks of matrix A.

descrA

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, while the supported diagonal types are CUSPARSE_DIAG_TYPE_UNIT and CUSPARSE_DIAG_TYPE_NON_UNIT.

bsrValA

<type> array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) nonzero blocks of matrix A.

bsrRowPtrA

integer array of mb\(+ 1\) elements that contains the start of every block row and the end of the last block row plus one.

bsrColIndA

integer array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) column indices of the nonzero blocks of matrix A.

blockDim

block dimension of sparse matrix A; larger than zero.

info

structure initialized using cusparseCreateBsrsm2Info.

policy

The supported policies are CUSPARSE_SOLVE_POLICY_NO_LEVEL and CUSPARSE_SOLVE_POLICY_USE_LEVEL.

pBuffer

buffer allocated by the user; the size is return by bsrsm2_bufferSize().

Output

info

structure filled with information collected during the analysis phase (that should be passed to the solve phase unchanged).

See cusparseStatus_t for the description of the return status

cusparse<t>bsrsm2_solve() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseSbsrsm2_solve(cusparseHandle_t         handle,
                      cusparseDirection_t      dirA,
                      cusparseOperation_t      transA,
                      cusparseOperation_t      transX,
                      int                      mb,
                      int                      n,
                      int                      nnzb,
                      const float*             alpha,
                      const cusparseMatDescr_t descrA,
                      const float*             bsrSortedVal,
                      const int*               bsrSortedRowPtr,
                      const int*               bsrSortedColInd,
                      int                      blockDim,
                      bsrsm2Info_t             info,
                      const float*             B,
                      int                      ldb,
                      float*                   X,
                      int                      ldx,
                      cusparseSolvePolicy_t    policy,
                      void*                    pBuffer)

cusparseStatus_t
cusparseDbsrsm2_solve(cusparseHandle_t         handle,
                      cusparseDirection_t      dirA,
                      cusparseOperation_t      transA,
                      cusparseOperation_t      transX,
                      int                      mb,
                      int                      n,
                      int                      nnzb,
                      const double*            alpha,
                      const cusparseMatDescr_t descrA,
                      const double*            bsrSortedVal,
                      const int*               bsrSortedRowPtr,
                      const int*               bsrSortedColInd,
                      int                      blockDim,
                      bsrsm2Info_t             info,
                      const double*            B,
                      int                      ldb,
                      double*                  X,
                      int                      ldx,
                      cusparseSolvePolicy_t    policy,
                      void*                    pBuffer)

cusparseStatus_t
cusparseCbsrsm2_solve(cusparseHandle_t         handle,
                      cusparseDirection_t      dirA,
                      cusparseOperation_t      transA,
                      cusparseOperation_t      transX,
                      int                      mb,
                      int                      n,
                      int                      nnzb,
                      const cuComplex*         alpha,
                      const cusparseMatDescr_t descrA,
                      const cuComplex*         bsrSortedVal,
                      const int*               bsrSortedRowPtr,
                      const int*               bsrSortedColInd,
                      int                      blockDim,
                      bsrsm2Info_t             info,
                      const cuComplex*         B,
                      int                      ldb,
                      cuComplex*               X,
                      int                      ldx,
                      cusparseSolvePolicy_t    policy,
                      void*                    pBuffer)

cusparseStatus_t
cusparseZbsrsm2_solve(cusparseHandle_t         handle,
                      cusparseDirection_t      dirA,
                      cusparseOperation_t      transA,
                      cusparseOperation_t      transX,
                      int                      mb,
                      int                      n,
                      int                      nnzb,
                      const cuDoubleComplex*   alpha,
                      const cusparseMatDescr_t descrA,
                      const cuDoubleComplex*   bsrSortedVal,
                      const int*               bsrSortedRowPtr,
                      const int*               bsrSortedColInd,
                      int                      blockDim,
                      bsrsm2Info_t             info,
                      const cuDoubleComplex*   B,
                      int                      ldb,
                      cuDoubleComplex*         X,
                      int                      ldx,
                      cusparseSolvePolicy_t    policy,
                      void*                    pBuffer)

This function performs the solve phase of the solution of a sparse triangular linear system:

\(\text{op}(A) \ast \text{op(X)} = \alpha \ast \text{op(B)}\)

A is an (mb*blockDim)x(mb*blockDim) sparse matrix that is defined in BSR storage format by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA); B and X are the right-hand-side and the solution matrices; \(\alpha\) is a scalar, and

\(\text{op}(A) == \text{CUSPARSE_OPERATION_NON_TRANSPOSE}\)

and

\(\text{op}(X) = \begin{cases} X & \text{if transX == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ X^{T} & \text{if transX == CUSPARSE_OPERATION_TRANSPOSE} \\ X^{H} & \text{not supported} \\ \end{cases}\)

Only op(A)=A is supported.

op(B) and op(X) must be performed in the same way. In other words, if op(B)=B, op(X)=X.

The block of BSR format is of size blockDim*blockDim, stored as column-major or row-major as determined by parameter dirA, which is either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN. The matrix type must be CUSPARSE_MATRIX_TYPE_GENERAL, and the fill mode and diagonal type are ignored. Function bsrsm02_solve() can support an arbitrary blockDim.

This function may be executed multiple times for a given matrix and a particular operation type.

This function requires the buffer size returned by bsrsm2_bufferSize(). The address of pBuffer must be multiple of 128 bytes. If it is not, CUSPARSE_STATUS_INVALID_VALUE is returned.

Although bsrsm2_solve() can be done without level information, the user still needs to be aware of consistency. If bsrsm2_analysis() is called with policy CUSPARSE_SOLVE_POLICY_USE_LEVEL, bsrsm2_solve() can be run with or without levels. On the other hand, if bsrsm2_analysis() is called with CUSPARSE_SOLVE_POLICY_NO_LEVEL, bsrsm2_solve() can only accept CUSPARSE_SOLVE_POLICY_NO_LEVEL; otherwise, CUSPARSE_STATUS_INVALID_VALUE is returned.

Function bsrsm02_solve() has the same behavior as bsrsv02_solve(), reporting the first numerical zero, including a structural zero. The user must call cusparseXbsrsm2_query_zero_pivot() to know where the numerical zero is.

The motivation of transpose(X) is to improve the memory access of matrix X. The computational pattern of transpose(X) with matrix X in column-major order is equivalent to X with matrix X in row-major order.

In-place is supported and requires that B and X point to the same memory block, and ldb=ldx.

The function supports the following properties if pBuffer != NULL:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

handle

handle to the cuSPARSE library context.

dirA

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

transA

the operation op(A).

transX

the operation op(B) and op(X).

mb

number of block rows of matrix A.

n

number of columns of matrix op(B) and op(X).

nnzb

number of non-zero blocks of matrix A.

alpha

<type> scalar used for multiplication.

descrA

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, while the supported diagonal types are CUSPARSE_DIAG_TYPE_UNIT and CUSPARSE_DIAG_TYPE_NON_UNIT.

bsrValA

<type> array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) non-zero blocks of matrix A.

bsrRowPtrA

integer array of mb\(+ 1\) elements that contains the start of every block row and the end of the last block row plus one.

bsrColIndA

integer array of nnzb\(( =\)bsrRowPtrA(mb)\(-\)bsrRowPtrA(0)\()\) column indices of the nonzero blocks of matrix A.

blockDim

block dimension of sparse matrix A; larger than zero.

info

structure initialized using cusparseCreateBsrsm2Info().

B

<type> right-hand-side array.

ldb

leading dimension of B. If op(B)=B, ldb >= (mb*blockDim); otherwise, ldb >= n.

ldx

leading dimension of X. If op(X)=X, then ldx >= (mb*blockDim). otherwise ldx >= n.

policy

the supported policies are CUSPARSE_SOLVE_POLICY_NO_LEVEL and CUSPARSE_SOLVE_POLICY_USE_LEVEL.

pBuffer

buffer allocated by the user; the size is returned by bsrsm2_bufferSize().

Output

X

<type> solution array with leading dimensions ldx.

See cusparseStatus_t for the description of the return status.

cusparseXbsrsm2_zeroPivot() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseXbsrsm2_zeroPivot(cusparseHandle_t handle,
                          bsrsm2Info_t     info,
                          int*             position)

If the returned error code is CUSPARSE_STATUS_ZERO_PIVOT, position=j means A(j,j) is either a structural zero or a numerical zero (singular block). Otherwise position=-1.

The position can be 0-base or 1-base, the same as the matrix.

Function cusparseXbsrsm2_zeroPivot() is a blocking call. It calls cudaDeviceSynchronize() to make sure all previous kernels are done.

The position can be in the host memory or device memory. The user can set the proper mode with cusparseSetPointerMode().

  • The routine requires no extra storage

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

Input

handle

handle to the cuSPARSE library context.

info

info contains a structural zero or a numerical zero if the user already called bsrsm2_analysis() or bsrsm2_solve().

Output

position

if no structural or numerical zero, position is -1; otherwise, if A(j,j) is missing or U(j,j) is zero, position=j.

See cusparseStatus_t for the description of the return status.