cuSPARSE Level 2 Function Reference
This chapter describes the sparse linear algebra functions that perform operations between sparse matrices and dense vectors.
cusparse<t>bsrmv()
cusparseStatus_t
cusparseSbsrmv(cusparseHandle_t handle,
cusparseDirection_t dir,
cusparseOperation_t trans,
int mb,
int nb,
int nnzb,
const float* alpha,
const cusparseMatDescr_t descr,
const float* bsrVal,
const int* bsrRowPtr,
const int* bsrColInd,
int blockDim,
const float* x,
const float* beta,
float* y)
cusparseStatus_t
cusparseDbsrmv(cusparseHandle_t handle,
cusparseDirection_t dir,
cusparseOperation_t trans,
int mb,
int nb,
int nnzb,
const double* alpha,
const cusparseMatDescr_t descr,
const double* bsrVal,
const int* bsrRowPtr,
const int* bsrColInd,
int blockDim,
const double* x,
const double* beta,
double* y)
cusparseStatus_t
cusparseCbsrmv(cusparseHandle_t handle,
cusparseDirection_t dir,
cusparseOperation_t trans,
int mb,
int nb,
int nnzb,
const cuComplex* alpha,
const cusparseMatDescr_t descr,
const cuComplex* bsrVal,
const int* bsrRowPtr,
const int* bsrColInd,
int blockDim,
const cuComplex* x,
const cuComplex* beta,
cuComplex* y)
cusparseStatus_t
cusparseZbsrmv(cusparseHandle_t handle,
cusparseDirection_t dir,
cusparseOperation_t trans,
int mb,
int nb,
int nnzb,
const cuDoubleComplex* alpha,
const cusparseMatDescr_t descr,
const cuDoubleComplex* bsrVal,
const int* bsrRowPtr,
const int* bsrColInd,
int blockDim,
const cuDoubleComplex* x,
const cuDoubleComplex* beta,
cuDoubleComplex* y)
This function performs the matrix-vector operation
\(\text{y} = \alpha \ast \text{op}(A) \ast \text{x} + \beta \ast \text{y}\) |
where \(A\text{ is an }(mb \ast blockDim) \times (nb \ast blockDim)\) sparse matrix that is defined in BSR storage format by the three arrays bsrVal
, bsrRowPtr
, and bsrColInd
); x
and y
are vectors; \(\alpha\text{ and }\beta\) are scalars; and
\(\text{op}(A) = \begin{cases} A & \text{if trans == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if trans == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if trans == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)
bsrmv()
has the following properties:
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Several comments on bsrmv()
:
Only
blockDim > 1
is supportedOnly
CUSPARSE_OPERATION_NON_TRANSPOSE
is supported, that is
\(\text{y} = \alpha \ast A \ast \text{x} + \beta{} \ast \text{y}\) |
Only
CUSPARSE_MATRIX_TYPE_GENERAL
is supported.The size of vector
x
should be \((nb \ast blockDim)\) at least, and the size of vectory
should be \((mb \ast blockDim)\) at least; otherwise, the kernel may returnCUSPARSE_STATUS_EXECUTION_FAILED
because of an out-of-bounds array.
For example, suppose the user has a CSR format and wants to try bsrmv()
, the following code demonstrates how to use csr2bsr()
conversion and bsrmv()
multiplication in single precision.
// Suppose that A is m x n sparse matrix represented by CSR format,
// hx is a host vector of size n, and hy is also a host vector of size m.
// m and n are not multiple of blockDim.
// step 1: transform CSR to BSR with column-major order
int base, nnz;
int nnzb;
cusparseDirection_t dirA = CUSPARSE_DIRECTION_COLUMN;
int mb = (m + blockDim-1)/blockDim;
int nb = (n + blockDim-1)/blockDim;
cudaMalloc((void**)&bsrRowPtrC, sizeof(int) *(mb+1));
cusparseXcsr2bsrNnz(handle, dirA, m, n,
descrA, csrRowPtrA, csrColIndA, blockDim,
descrC, bsrRowPtrC, &nnzb);
cudaMalloc((void**)&bsrColIndC, sizeof(int)*nnzb);
cudaMalloc((void**)&bsrValC, sizeof(float)*(blockDim*blockDim)*nnzb);
cusparseScsr2bsr(handle, dirA, m, n,
descrA, csrValA, csrRowPtrA, csrColIndA, blockDim,
descrC, bsrValC, bsrRowPtrC, bsrColIndC);
// step 2: allocate vector x and vector y large enough for bsrmv
cudaMalloc((void**)&x, sizeof(float)*(nb*blockDim));
cudaMalloc((void**)&y, sizeof(float)*(mb*blockDim));
cudaMemcpy(x, hx, sizeof(float)*n, cudaMemcpyHostToDevice);
cudaMemcpy(y, hy, sizeof(float)*m, cudaMemcpyHostToDevice);
// step 3: perform bsrmv
cusparseSbsrmv(handle, dirA, transA, mb, nb, nnzb, &alpha,
descrC, bsrValC, bsrRowPtrC, bsrColIndC, blockDim, x, &beta, y);
Input
|
handle to the cuSPARSE library context. |
|
storage format of blocks, either |
|
the operation \(\text{op}(A)\) . Only |
|
number of block rows of matrix \(A\). |
|
number of block columns of matrix \(A\). |
|
number of nonzero blocks of matrix \(A\). |
|
<type> scalar used for multiplication. |
|
the descriptor of matrix \(A\). The supported matrix type is |
|
<type> array of |
|
integer array of |
|
integer array of |
|
block dimension of sparse matrix \(A\), larger than zero. |
|
<type> vector of \(nb \ast blockDim\) elements. |
|
<type> scalar used for multiplication. If |
|
<type> vector of \(mb \ast blockDim\) elements. |
Output
|
<type> updated vector. |
See cusparseStatus_t for the description of the return status.
cusparse<t>bsrxmv() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseSbsrxmv(cusparseHandle_t handle,
cusparseDirection_t dir,
cusparseOperation_t trans,
int sizeOfMask,
int mb,
int nb,
int nnzb,
const float* alpha,
const cusparseMatDescr_t descr,
const float* bsrVal,
const int* bsrMaskPtr,
const int* bsrRowPtr,
const int* bsrEndPtr,
const int* bsrColInd,
int blockDim,
const float* x,
const float* beta,
float* y)
cusparseStatus_t
cusparseDbsrxmv(cusparseHandle_t handle,
cusparseDirection_t dir,
cusparseOperation_t trans,
int sizeOfMask,
int mb,
int nb,
int nnzb,
const double* alpha,
const cusparseMatDescr_t descr,
const double* bsrVal,
const int* bsrMaskPtr,
const int* bsrRowPtr,
const int* bsrEndPtr,
const int* bsrColInd,
int blockDim,
const double* x,
const double* beta,
double* y)
cusparseStatus_t
cusparseCbsrxmv(cusparseHandle_t handle,
cusparseDirection_t dir,
cusparseOperation_t trans,
int sizeOfMask,
int mb,
int nb,
int nnzb,
const cuComplex* alpha,
const cusparseMatDescr_t descr,
const cuComplex* bsrVal,
const int* bsrMaskPtr,
const int* bsrRowPtr,
const int* bsrEndPtr,
const int* bsrColInd,
int blockDim,
const cuComplex* x,
const cuComplex* beta,
cuComplex* y)
cusparseStatus_t
cusparseZbsrxmv(cusparseHandle_t handle,
cusparseDirection_t dir,
cusparseOperation_t trans,
int sizeOfMask,
int mb,
int nb,
int nnzb,
const cuDoubleComplex* alpha,
const cusparseMatDescr_t descr,
const cuDoubleComplex* bsrVal,
const int* bsrMaskPtr,
const int* bsrRowPtr,
const int* bsrEndPtr,
const int* bsrColInd,
int blockDim,
const cuDoubleComplex* x,
const cuDoubleComplex* beta,
cuDoubleComplex* y)
This function performs a bsrmv
and a mask operation
\(\text{y(mask)} = (\alpha \ast \text{op}(A) \ast \text{x} + \beta \ast \text{y})\text{(mask)}\) |
where \(A\text{ is an }(mb \ast blockDim) \times (nb \ast blockDim)\) sparse matrix that is defined in BSRX storage format by the four arrays bsrVal
, bsrRowPtr
, bsrEndPtr
, and bsrColInd
); x
and y
are vectors; \(\alpha\text{~and~}\beta\) are scalars; and
\(\text{op}(A) = \begin{cases} A & \text{if trans == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if trans == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if trans == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)
The mask operation is defined by array bsrMaskPtr
which contains updated block row indices of \(y\) . If row \(i\) is not specified in bsrMaskPtr
, then bsrxmv()
does not touch row block \(i\) of \(A\) and \(y\) .
For example, consider the \(2 \times 3\) block matrix \(A\):
\(\begin{matrix} {A = \begin{bmatrix} A_{11} & A_{12} & O \\ A_{21} & A_{22} & A_{23} \\ \end{bmatrix}} \\ \end{matrix}\) |
and its one-based BSR format (three vector form) is
\(\begin{matrix} \text{bsrVal} & = & \begin{bmatrix} A_{11} & A_{12} & A_{21} & A_{22} & A_{23} \\ \end{bmatrix} \\ \text{bsrRowPtr} & = & \begin{bmatrix} {1\phantom{.0}} & {3\phantom{.0}} & 6 \\ \end{bmatrix} \\ \text{bsrColInd} & = & \begin{bmatrix} {1\phantom{.0}} & {2\phantom{.0}} & {1\phantom{.0}} & {2\phantom{.0}} & 3 \\ \end{bmatrix} \\ \end{matrix}\) |
Suppose we want to do the following bsrmv
operation on a matrix \(\overset{¯}{A}\) which is slightly different from \(A\) .
\(\begin{bmatrix} y_{1} \\ y_{2} \\ \end{bmatrix}:=alpha \ast (\widetilde{A} = \begin{bmatrix} O & O & O \\ O & A_{22} & O \\ \end{bmatrix}) \ast \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \\ \end{bmatrix} + \begin{bmatrix} y_{1} \\ {beta \ast y_{2}} \\ \end{bmatrix}\) |
We don’t need to create another BSR format for the new matrix \(\overset{¯}{A}\) , all that we should do is to keep bsrVal
and bsrColInd
unchanged, but modify bsrRowPtr
and add an additional array bsrEndPtr
which points to the last nonzero elements per row of \(\overset{¯}{A}\) plus 1.
For example, the following bsrRowPtr
and bsrEndPtr
can represent matrix \(\overset{¯}{A}\) :
\(\begin{matrix} \text{bsrRowPtr} & = & \begin{bmatrix} {1\phantom{.0}} & 4 \\ \end{bmatrix} \\ \text{bsrEndPtr} & = & \begin{bmatrix} {1\phantom{.0}} & 5 \\ \end{bmatrix} \\ \end{matrix}\) |
Further we can use a mask operator (specified by array bsrMaskPtr
) to update particular block row indices of \(y\) only because \(y_{1}\) is never changed. In this case, bsrMaskPtr
\(=\) [2] and sizeOfMask
=1.
The mask operator is equivalent to the following operation:
\(\begin{bmatrix} ? \\ y_{2} \\ \end{bmatrix}:=alpha \ast \begin{bmatrix} ? & ? & ? \\ O & A_{22} & O \\ \end{bmatrix} \ast \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \\ \end{bmatrix} + beta \ast \begin{bmatrix} ? \\ y_{2} \\ \end{bmatrix}\) |
If a block row is not present in the bsrMaskPtr
, then no calculation is performed on that row, and the corresponding value in y
is unmodified. The question mark “?” is used to inidcate row blocks not in bsrMaskPtr
.
In this case, first row block is not present in bsrMaskPtr
, so bsrRowPtr[0]
and bsrEndPtr[0]
are not touched also.
\(\begin{matrix} \text{bsrRowPtr} & = & \begin{bmatrix} {?\phantom{.0}} & 4 \\ \end{bmatrix} \\ \text{bsrEndPtr} & = & \begin{bmatrix} {?\phantom{.0}} & 5 \\ \end{bmatrix} \\ \end{matrix}\) |
bsrxmv()
has the following properties:
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
A couple of comments on bsrxmv()
:
Only
blockDim > 1
is supportedOnly
CUSPARSE_OPERATION_NON_TRANSPOSE
andCUSPARSE_MATRIX_TYPE_GENERAL
are supported.Parameters
bsrMaskPtr
,bsrRowPtr
,bsrEndPtr
andbsrColInd
are consistent with base index, either one-based or zero-based. The above example is one-based.
Input
|
handle to the cuSPARSE library context. |
|
storage format of blocks, either |
|
the operation \(\text{op}(A)\) . Only |
|
number of updated block rows of \(y\). |
|
number of block rows of matrix \(A\). |
|
number of block columns of matrix \(A\). |
|
number of nonzero blocks of matrix \(A\). |
|
<type> scalar used for multiplication. |
|
the descriptor of matrix \(A\). The supported matrix type is |
|
<type> array of |
|
integer array of |
|
integer array of |
|
integer array of |
|
integer array of |
|
block dimension of sparse matrix \(A\), larger than zero. |
|
<type> vector of \(nb \ast blockDim\) elements. |
|
<type> scalar used for multiplication. If |
|
<type> vector of \(mb \ast blockDim\) elements. |
See cusparseStatus_t for the description of the return status.
cusparse<t>bsrsv2_bufferSize() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseSbsrsv2_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
int mb,
int nnzb,
const cusparseMatDescr_t descrA,
float* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
bsrsv2Info_t info,
int* pBufferSizeInBytes)
cusparseStatus_t
cusparseDbsrsv2_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
int mb,
int nnzb,
const cusparseMatDescr_t descrA,
double* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
bsrsv2Info_t info,
int* pBufferSizeInBytes)
cusparseStatus_t
cusparseCbsrsv2_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
int mb,
int nnzb,
const cusparseMatDescr_t descrA,
cuComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
bsrsv2Info_t info,
int* pBufferSizeInBytes)
cusparseStatus_t
cusparseZbsrsv2_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
int mb,
int nnzb,
const cusparseMatDescr_t descrA,
cuDoubleComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
bsrsv2Info_t info,
int* pBufferSizeInBytes)
This function returns size of the buffer used in bsrsv2
, a new sparse triangular linear system op(A)*y =
\(\alpha\)x
.
A
is an (mb*blockDim)x(mb*blockDim)
sparse matrix that is defined in BSR storage format by the three arrays bsrValA
, bsrRowPtrA
, and bsrColIndA
); x
and y
are the right-hand-side and the solution vectors; \(\alpha\) is a scalar; and
\(\text{op}(A) = \begin{cases} A & \text{if trans == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if trans == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if trans == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)
Although there are six combinations in terms of parameter trans
and the upper (lower) triangular part of A
, bsrsv2_bufferSize()
returns the maximum size buffer among these combinations. The buffer size depends on the dimensions mb
, blockDim
, and the number of nonzero blocks of the matrix nnzb
. If the user changes the matrix, it is necessary to call bsrsv2_bufferSize()
again to have the correct buffer size; otherwise a segmentation fault may occur.
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
|
handle to the cuSPARSE library context. |
|
storage format of blocks, either |
|
the operation \(\text{op}(A)\) . |
|
number of block rows of matrix |
|
number of nonzero blocks of matrix |
|
the descriptor of matrix |
|
<type> array of |
|
integer array of |
|
integer array of |
|
block dimension of sparse matrix A; must be larger than zero. |
Output
|
record of internal states based on different algorithms. |
|
number of bytes of the buffer used in the |
See cusparseStatus_t for the description of the return status.
cusparse<t>bsrsv2_analysis() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseSbsrsv2_analysis(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
int mb,
int nnzb,
const cusparseMatDescr_t descrA,
const float* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
bsrsv2Info_t info,
cusparseSolvePolicy_t policy,
void* pBuffer)
cusparseStatus_t
cusparseDbsrsv2_analysis(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
int mb,
int nnzb,
const cusparseMatDescr_t descrA,
const double* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
bsrsv2Info_t info,
cusparseSolvePolicy_t policy,
void* pBuffer)
cusparseStatus_t
cusparseDbsrsv2_analysis(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
int mb,
int nnzb,
const cusparseMatDescr_t descrA,
const cuComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
bsrsv2Info_t info,
cusparseSolvePolicy_t policy,
void* pBuffer)
cusparseStatus_t
cusparseZbsrsv2_analysis(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
int mb,
int nnzb,
const cusparseMatDescr_t descrA,
const cuDoubleComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
bsrsv2Info_t info,
cusparseSolvePolicy_t policy,
void* pBuffer)
This function performs the analysis phase of bsrsv2
, a new sparse triangular linear system op(A)*y =
\(\alpha\)x
.
A
is an (mb*blockDim)x(mb*blockDim)
sparse matrix that is defined in BSR storage format by the three arrays bsrValA
, bsrRowPtrA
, and bsrColIndA
); x
and y
are the right-hand side and the solution vectors; \(\alpha\) is a scalar; and
\(\text{op}(A) = \begin{cases} A & \text{if trans == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if trans == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if trans == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)
The block of BSR format is of size blockDim*blockDim
, stored as column-major or row-major as determined by parameter dirA
, which is either CUSPARSE_DIRECTION_COLUMN
or CUSPARSE_DIRECTION_ROW
. The matrix type must be CUSPARSE_MATRIX_TYPE_GENERAL
, and the fill mode and diagonal type are ignored.
It is expected that this function will be executed only once for a given matrix and a particular operation type.
This function requires a buffer size returned by bsrsv2_bufferSize()
. The address of pBuffer
must be multiple of 128 bytes. If it is not, CUSPARSE_STATUS_INVALID_VALUE
is returned.
Function bsrsv2_analysis()
reports a structural zero and computes level information, which stored in the opaque structure info
. The level information can extract more parallelism for a triangular solver. However bsrsv2_solve()
can be done without level information. To disable level information, the user needs to specify the policy of the triangular solver as CUSPARSE_SOLVE_POLICY_NO_LEVEL
.
Function bsrsv2_analysis()
always reports the first structural zero, even when parameter policy
is CUSPARSE_SOLVE_POLICY_NO_LEVEL
. No structural zero is reported if CUSPARSE_DIAG_TYPE_UNIT
is specified, even if block A(j,j)
is missing for some j
. The user needs to call cusparseXbsrsv2_zeroPivot()
to know where the structural zero is.
It is the user’s choice whether to call bsrsv2_solve()
if bsrsv2_analysis()
reports a structural zero. In this case, the user can still call bsrsv2_solve()
, which will return a numerical zero at the same position as a structural zero. However the result x
is meaningless.
This function requires temporary extra storage that is allocated internally
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
Input
|
handle to the cuSPARSE library context. |
|
storage format of blocks, either |
|
the operation \(\text{op}(A)\) . |
|
number of block rows of matrix |
|
number of nonzero blocks of matrix |
|
the descriptor of matrix |
|
<type> array of |
|
integer array of |
|
integer array of |
|
block dimension of sparse matrix A, larger than zero. |
|
structure initialized using |
|
the supported policies are |
|
buffer allocated by the user, the size is return by |
Output
|
structure filled with information collected during the analysis phase (that should be passed to the solve phase unchanged). |
See cusparseStatus_t for the description of the return status.
cusparse<t>bsrsv2_solve() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseSbsrsv2_solve(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
int mb,
int nnzb,
const float* alpha,
const cusparseMatDescr_t descrA,
const float* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
bsrsv2Info_t info,
const float* x,
float* y,
cusparseSolvePolicy_t policy,
void* pBuffer)
cusparseStatus_t
cusparseDbsrsv2_solve(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
int mb,
int nnzb,
const double* alpha,
const cusparseMatDescr_t descrA,
const double* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
bsrsv2Info_t info,
const double* x,
double* y,
cusparseSolvePolicy_t policy,
void* pBuffer)
cusparseStatus_t
cusparseCbsrsv2_solve(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
int mb,
int nnzb,
const cuComplex* alpha,
const cusparseMatDescr_t descrA,
const cuComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
bsrsv2Info_t info,
const cuComplex* x,
cuComplex* y,
cusparseSolvePolicy_t policy,
void* pBuffer)
cusparseStatus_t
cusparseZbsrsv2_solve(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
int mb,
int nnzb,
const cuDoubleComplex* alpha,
const cusparseMatDescr_t descrA,
const cuDoubleComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
bsrsv2Info_t info,
const cuDoubleComplex* x,
cuDoubleComplex* y,
cusparseSolvePolicy_t policy,
void* pBuffer)
This function performs the solve phase of bsrsv2
, a new sparse triangular linear system op(A)*y =
\(\alpha\)x
.
A
is an (mb*blockDim)x(mb*blockDim)
sparse matrix that is defined in BSR storage format by the three arrays bsrValA
, bsrRowPtrA
, and bsrColIndA
); x
and y
are the right-hand-side and the solution vectors; \(\alpha\) is a scalar; and
\(\text{op}(A) = \begin{cases} A & \text{if trans == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if trans == CUSPARSE_OPERATION_TRANSPOSE} \\ A^{H} & \text{if trans == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE} \\ \end{cases}\)
The block in BSR format is of size blockDim*blockDim
, stored as column-major or row-major as determined by parameter dirA
, which is either CUSPARSE_DIRECTION_COLUMN
or CUSPARSE_DIRECTION_ROW
. The matrix type must be CUSPARSE_MATRIX_TYPE_GENERAL
, and the fill mode and diagonal type are ignored. Function bsrsv02_solve()
can support an arbitrary blockDim
.
This function may be executed multiple times for a given matrix and a particular operation type.
This function requires a buffer size returned by bsrsv2_bufferSize()
. The address of pBuffer
must be multiple of 128 bytes. If it is not, CUSPARSE_STATUS_INVALID_VALUE
is returned.
Although bsrsv2_solve()
can be done without level information, the user still needs to be aware of consistency. If bsrsv2_analysis()
is called with policy CUSPARSE_SOLVE_POLICY_USE_LEVEL
, bsrsv2_solve()
can be run with or without levels. On the other hand, if bsrsv2_analysis()
is called with CUSPARSE_SOLVE_POLICY_NO_LEVEL
, bsrsv2_solve()
can only accept CUSPARSE_SOLVE_POLICY_NO_LEVEL
; otherwise, CUSPARSE_STATUS_INVALID_VALUE
is returned.
The level information may not improve the performance, but may spend extra time doing analysis. For example, a tridiagonal matrix has no parallelism. In this case, CUSPARSE_SOLVE_POLICY_NO_LEVEL
performs better than CUSPARSE_SOLVE_POLICY_USE_LEVEL
. If the user has an iterative solver, the best approach is to do bsrsv2_analysis()
with CUSPARSE_SOLVE_POLICY_USE_LEVEL
once. Then do bsrsv2_solve()
with CUSPARSE_SOLVE_POLICY_NO_LEVEL
in the first run, and with CUSPARSE_SOLVE_POLICY_USE_LEVEL
in the second run, and pick the fastest one to perform the remaining iterations.
Function bsrsv02_solve()
has the same behavior as csrsv02_solve()
. That is, bsr2csr(bsrsv02(A)) = csrsv02(bsr2csr(A))
. The numerical zero of csrsv02_solve()
means there exists some zero A(j,j)
. The numerical zero of bsrsv02_solve()
means there exists some block A(j,j)
that is not invertible.
Function bsrsv2_solve()
reports the first numerical zero, including a structural zero. No numerical zero is reported if CUSPARSE_DIAG_TYPE_UNIT
is specified, even if A(j,j)
is not invertible for some j
. The user needs to call cusparseXbsrsv2_zeroPivot()
to know where the numerical zero is.
The function supports the following properties if pBuffer != NULL
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
For example, suppose L is a lower triangular matrix with unit diagonal, then the following code solves L*y=x
by level information.
// Suppose that L is m x m sparse matrix represented by BSR format,
// The number of block rows/columns is mb, and
// the number of nonzero blocks is nnzb.
// L is lower triangular with unit diagonal.
// Assumption:
// - dimension of matrix L is m(=mb*blockDim),
// - matrix L has nnz(=nnzb*blockDim*blockDim) nonzero elements,
// - handle is already created by cusparseCreate(),
// - (d_bsrRowPtr, d_bsrColInd, d_bsrVal) is BSR of L on device memory,
// - d_x is right hand side vector on device memory.
// - d_y is solution vector on device memory.
// - d_x and d_y are of size m.
cusparseMatDescr_t descr = 0;
bsrsv2Info_t info = 0;
int pBufferSize;
void *pBuffer = 0;
int structural_zero;
int numerical_zero;
const double alpha = 1.;
const cusparseSolvePolicy_t policy = CUSPARSE_SOLVE_POLICY_USE_LEVEL;
const cusparseOperation_t trans = CUSPARSE_OPERATION_NON_TRANSPOSE;
const cusparseDirection_t dir = CUSPARSE_DIRECTION_COLUMN;
// step 1: create a descriptor which contains
// - matrix L is base-1
// - matrix L is lower triangular
// - matrix L has unit diagonal, specified by parameter CUSPARSE_DIAG_TYPE_UNIT
// (L may not have all diagonal elements.)
cusparseCreateMatDescr(&descr);
cusparseSetMatIndexBase(descr, CUSPARSE_INDEX_BASE_ONE);
cusparseSetMatFillMode(descr, CUSPARSE_FILL_MODE_LOWER);
cusparseSetMatDiagType(descr, CUSPARSE_DIAG_TYPE_UNIT);
// step 2: create a empty info structure
cusparseCreateBsrsv2Info(&info);
// step 3: query how much memory used in bsrsv2, and allocate the buffer
cusparseDbsrsv2_bufferSize(handle, dir, trans, mb, nnzb, descr,
d_bsrVal, d_bsrRowPtr, d_bsrColInd, blockDim, &pBufferSize);
// pBuffer returned by cudaMalloc is automatically aligned to 128 bytes.
cudaMalloc((void**)&pBuffer, pBufferSize);
// step 4: perform analysis
cusparseDbsrsv2_analysis(handle, dir, trans, mb, nnzb, descr,
d_bsrVal, d_bsrRowPtr, d_bsrColInd, blockDim,
info, policy, pBuffer);
// L has unit diagonal, so no structural zero is reported.
status = cusparseXbsrsv2_zeroPivot(handle, info, &structural_zero);
if (CUSPARSE_STATUS_ZERO_PIVOT == status){
printf("L(%d,%d) is missing\n", structural_zero, structural_zero);
}
// step 5: solve L*y = x
cusparseDbsrsv2_solve(handle, dir, trans, mb, nnzb, &alpha, descr,
d_bsrVal, d_bsrRowPtr, d_bsrColInd, blockDim, info,
d_x, d_y, policy, pBuffer);
// L has unit diagonal, so no numerical zero is reported.
status = cusparseXbsrsv2_zeroPivot(handle, info, &numerical_zero);
if (CUSPARSE_STATUS_ZERO_PIVOT == status){
printf("L(%d,%d) is zero\n", numerical_zero, numerical_zero);
}
// step 6: free resources
cudaFree(pBuffer);
cusparseDestroyBsrsv2Info(info);
cusparseDestroyMatDescr(descr);
cusparseDestroy(handle);
Input
|
handle to the cuSPARSE library context. |
|
storage format of blocks, either |
|
the operation \(\text{op}(A)\). |
|
number of block rows and block columns of matrix |
|
<type> scalar used for multiplication. |
|
the descriptor of matrix |
|
<type> array of |
|
integer array of |
|
integer array of |
|
block dimension of sparse matrix |
|
structure with information collected during the analysis phase (that should have been passed to the solve phase unchanged). |
|
<type> right-hand-side vector of size |
|
the supported policies are |
|
buffer allocated by the user, the size is returned by |
Output
|
<type> solution vector of size |
See cusparseStatus_t for the description of the return status.
cusparseXbsrsv2_zeroPivot() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseXbsrsv2_zeroPivot(cusparseHandle_t handle,
bsrsv2Info_t info,
int* position)
If the returned error code is CUSPARSE_STATUS_ZERO_PIVOT
, position=j
means A(j,j)
is either structural zero or numerical zero (singular block). Otherwise position=-1
.
The position
can be 0-based or 1-based, the same as the matrix.
Function cusparseXbsrsv2_zeroPivot()
is a blocking call. It calls cudaDeviceSynchronize()
to make sure all previous kernels are done.
The position
can be in the host memory or device memory. The user can set the proper mode with cusparseSetPointerMode()
.
The routine requires no extra storage
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
Input
|
handle to the cuSPARSE library context. |
|
|
Output
|
if no structural or numerical zero, |
See cusparseStatus_t for the description of the return status
cusparse<t>gemvi()
cusparseStatus_t
cusparseSgemvi_bufferSize(cusparseHandle_t handle,
cusparseOperation_t transA,
int m,
int n,
int nnz,
int* pBufferSize)
cusparseStatus_t
cusparseDgemvi_bufferSize(cusparseHandle_t handle,
cusparseOperation_t transA,
int m,
int n,
int nnz,
int* pBufferSize)
cusparseStatus_t
cusparseCgemvi_bufferSize(cusparseHandle_t handle,
cusparseOperation_t transA,
int m,
int n,
int nnz,
int* pBufferSize)
cusparseStatus_t
cusparseZgemvi_bufferSize(cusparseHandle_t handle,
cusparseOperation_t transA,
int m,
int n,
int nnz,
int* pBufferSize)
cusparseStatus_t
cusparseSgemvi(cusparseHandle_t handle,
cusparseOperation_t transA,
int m,
int n,
const float* alpha,
const float* A,
int lda,
int nnz,
const float* x,
const int* xInd,
const float* beta,
float* y,
cusparseIndexBase_t idxBase,
void* pBuffer)
cusparseStatus_t
cusparseDgemvi(cusparseHandle_t handle,
cusparseOperation_t transA,
int m,
int n,
const double* alpha,
const double* A,
int lda,
int nnz,
const double* x,
const int* xInd,
const double* beta,
double* y,
cusparseIndexBase_t idxBase,
void* pBuffer)
cusparseStatus_t
cusparseCgemvi(cusparseHandle_t handle,
cusparseOperation_t transA,
int m,
int n,
const cuComplex* alpha,
const cuComplex* A,
int lda,
int nnz,
const cuComplex* x,
const int* xInd,
const cuComplex* beta,
cuComplex* y,
cusparseIndexBase_t idxBase,
void* pBuffer)
cusparseStatus_t
cusparseZgemvi(cusparseHandle_t handle,
cusparseOperation_t transA,
int m,
int n,
const cuDoubleComplex* alpha,
const cuDoubleComplex* A,
int lda,
int nnz,
const cuDoubleComplex* x,
const int* xInd,
const cuDoubleComplex* beta,
cuDoubleComplex* y,
cusparseIndexBase_t idxBase,
void* pBuffer)
This function performs the matrix-vector operation
\(\text{y} = \alpha \ast \text{op}(A) \ast \text{x} + \beta \ast \text{y}\) |
A
is an m×n
dense matrix and a sparse vector x
that is defined in a sparse storage format by the two arrays xVal, xInd
of length nnz
, and y
is a dense vector; \(\alpha\)and \(\beta\)are scalars; and
\(\text{op}(A) = \begin{cases} A & \text{if trans == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if trans == CUSPARSE_OPERATION_TRANSPOSE} \\ \end{cases}\)
To simplify the implementation, we have not (yet) optimized the transpose multiple case. We recommend the following for users interested in this case.
Convert the matrix from CSR to CSC format using one of the
csr2csc()
functions. Notice that by interchanging the rows and columns of the result you are implicitly transposing the matrix.Call the
gemvi()
function with thecusparseOperation_t
parameter set toCUSPARSE_OPERATION_NON_TRANSPOSE
and with the interchanged rows and columns of the matrix stored in CSC format. This (implicitly) multiplies the vector by the transpose of the matrix in the original CSR format.The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
The function cusparse<t>gemvi_bufferSize()
returns the size of buffer used in cusparse<t>gemvi()
.
Input
|
Handle to the cuSPARSE library context. |
|
The operation \(\text{op}(A)\). |
|
Number of rows of matrix |
|
Number of columns of matrix |
|
<type> scalar used for multiplication. |
|
The pointer to dense matrix |
|
Size of the leading dimension of |
|
Number of nonzero elements of vector |
|
<type> sparse vector of |
|
Indices of non-zero values in |
|
<type> scalar used for multiplication. If |
|
<type> dense vector of |
|
0 or 1, for 0 based or 1 based indexing, respectively. |
|
Number of elements needed the buffer used in |
|
Working space buffer. |
Output
|
<type> updated dense vector. |
See cusparseStatus_t for the description of the return status.