cuSPARSE Level 3 Function Reference
This chapter describes sparse linear algebra functions that perform operations between sparse and (usually tall) dense matrices.
cusparse<t>bsrmm()
cusparseStatus_t
cusparseSbsrmm(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transB,
int mb,
int n,
int kb,
int nnzb,
const float* alpha,
const cusparseMatDescr_t descrA,
const float* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
const float* B,
int ldb,
const float* beta,
float* C,
int ldc)
cusparseStatus_t
cusparseDbsrmm(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transB,
int mb,
int n,
int kb,
int nnzb,
const double* alpha,
const cusparseMatDescr_t descrA,
const double* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
const double* B,
int ldb,
const double* beta,
double* C,
int ldc)
cusparseStatus_t
cusparseCbsrmm(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transB,
int mb,
int n,
int kb,
int nnzb,
const cuComplex* alpha,
const cusparseMatDescr_t descrA,
const cuComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
const cuComplex* B,
int ldb,
const cuComplex* beta,
cuComplex* C,
int ldc)
cusparseStatus_t
cusparseZbsrmm(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transB,
int mb,
int n,
int kb,
int nnzb,
const cuDoubleComplex* alpha,
const cusparseMatDescr_t descrA,
const cuDoubleComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
const cuDoubleComplex* B,
int ldb,
const cuDoubleComplex* beta,
cuDoubleComplex* C,
int ldc)
This function performs one of the following matrix-matrix operations:
\(C = \alpha \ast \text{op}(A) \ast \text{op}(B) + \beta \ast C\) |
A
is an mb×kb
sparse matrix that is defined in BSR storage format by the three arrays bsrValA
, bsrRowPtrA
, and bsrColIndA
; B
and C
are dense matrices; \(\alpha\text{~and~}\beta\) are scalars; and
\(\text{op}(A) = \begin{cases} A & \text{if transA == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ A^{T} & \text{if transA == CUSPARSE_OPERATION_TRANSPOSE (not\ supported)} \\ A^{H} & \text{if transA == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE (not supported)} \\ \end{cases}\)
and
\(\text{op}(B) = \begin{cases} B & \text{if transB == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ B^{T} & \text{if transB == CUSPARSE_OPERATION_TRANSPOSE} \\ B^{H} & \text{if transB == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE (not supported)} \\ \end{cases}\)
The function has the following limitations:
Only
CUSPARSE_MATRIX_TYPE_GENERAL
matrix type is supportedOnly
blockDim > 1
is supportedif
blockDim
≤ 4, then max(mb)/max(n) = 524,272if 4 <
blockDim
≤ 8, then max(mb) = 524,272, max(n) = 262,136if
blockDim
> 8, then m < 65,535 and max(n) = 262,136
The motivation of transpose(B)
is to improve memory access of matrix B
. The computational pattern of A*transpose(B)
with matrix B
in column-major order is equivalent to A*B
with matrix B
in row-major order.
In practice, no operation in an iterative solver or eigenvalue solver uses A*transpose(B)
. However, we can perform A*transpose(transpose(B))
which is the same as A*B
. For example, suppose A
is mb*kb
, B
is k*n
and C
is m*n
, the following code shows usage of cusparseDbsrmm()
.
// A is mb*kb, B is k*n and C is m*n
const int m = mb*blockSize;
const int k = kb*blockSize;
const int ldb_B = k; // leading dimension of B
const int ldc = m; // leading dimension of C
// perform C:=alpha*A*B + beta*C
cusparseSetMatType(descrA, CUSPARSE_MATRIX_TYPE_GENERAL );
cusparseDbsrmm(cusparse_handle,
CUSPARSE_DIRECTION_COLUMN,
CUSPARSE_OPERATION_NON_TRANSPOSE,
CUSPARSE_OPERATION_NON_TRANSPOSE,
mb, n, kb, nnzb, alpha,
descrA, bsrValA, bsrRowPtrA, bsrColIndA, blockSize,
B, ldb_B,
beta, C, ldc);
Instead of using A*B
, our proposal is to transpose B
to Bt
by first calling cublas<t>geam()
, and then to perform A*transpose(Bt)
.
// step 1: Bt := transpose(B)
const int m = mb*blockSize;
const int k = kb*blockSize;
double *Bt;
const int ldb_Bt = n; // leading dimension of Bt
cudaMalloc((void**)&Bt, sizeof(double)*ldb_Bt*k);
double one = 1.0;
double zero = 0.0;
cublasSetPointerMode(cublas_handle, CUBLAS_POINTER_MODE_HOST);
cublasDgeam(cublas_handle, CUBLAS_OP_T, CUBLAS_OP_T,
n, k, &one, B, int ldb_B, &zero, B, int ldb_B, Bt, ldb_Bt);
// step 2: perform C:=alpha*A*transpose(Bt) + beta*C
cusparseDbsrmm(cusparse_handle,
CUSPARSE_DIRECTION_COLUMN,
CUSPARSE_OPERATION_NON_TRANSPOSE,
CUSPARSE_OPERATION_TRANSPOSE,
mb, n, kb, nnzb, alpha,
descrA, bsrValA, bsrRowPtrA, bsrColIndA, blockSize,
Bt, ldb_Bt,
beta, C, ldc);
bsrmm()
has the following properties:
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
|
handle to the cuSPARSE library context. |
|
storage format of blocks, either |
|
the operation |
|
the operation |
|
number of block rows of sparse matrix |
|
number of columns of dense matrix |
|
number of block columns of sparse matrix |
|
number of non-zero blocks of sparse matrix |
|
<type> scalar used for multiplication. |
|
the descriptor of matrix |
|
<type> array of |
|
integer array of |
|
integer array of |
|
block dimension of sparse matrix |
|
array of dimensions |
|
leading dimension of |
|
<type> scalar used for multiplication. If |
|
array of dimensions |
|
leading dimension of |
Output
|
<type> updated array of dimensions |
See cusparseStatus_t for the description of the return status
cusparse<t>bsrsm2_bufferSize() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseSbsrsm2_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transX,
int mb,
int n,
int nnzb,
const cusparseMatDescr_t descrA,
float* bsrSortedValA,
const int* bsrSortedRowPtrA,
const int* bsrSortedColIndA,
int blockDim,
bsrsm2Info_t info,
int* pBufferSizeInBytes)
cusparseStatus_t
cusparseDbsrsm2_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transX,
int mb,
int n,
int nnzb,
const cusparseMatDescr_t descrA,
double* bsrSortedValA,
const int* bsrSortedRowPtrA,
const int* bsrSortedColIndA,
int blockDim,
bsrsm2Info_t info,
int* pBufferSizeInBytes)
cusparseStatus_t
cusparseCbsrsm2_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transX,
int mb,
int n,
int nnzb,
const cusparseMatDescr_t descrA,
cuComplex* bsrSortedValA,
const int* bsrSortedRowPtrA,
const int* bsrSortedColIndA,
int blockDim,
bsrsm2Info_t info,
int* pBufferSizeInBytes)
cusparseStatus_t
cusparseZbsrsm2_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transX,
int mb,
int n,
int nnzb,
const cusparseMatDescr_t descrA,
cuDoubleComplex* bsrSortedValA,
const int* bsrSortedRowPtrA,
const int* bsrSortedColIndA,
int blockDim,
bsrsm2Info_t info,
int* pBufferSizeInBytes)
This function returns size of buffer used in bsrsm2()
, a new sparse triangular linear system op(A)*op(X)=
\(\alpha\)op(B)
.
A
is an (mb*blockDim)x(mb*blockDim)
sparse matrix that is defined in BSR storage format by the three arrays bsrValA
, bsrRowPtrA
, and bsrColIndA
); B
and X
are the right-hand-side and the solution matrices; \(\alpha\) is a scalar; and
\(\text{op}(A) == \text{CUSPARSE_OPERATION_NON_TRANSPOSE}\)
Although there are six combinations in terms of parameter trans
and the upper (and lower) triangular part of A
, bsrsm2_bufferSize()
returns the maximum size of the buffer among these combinations. The buffer size depends on dimension mb,blockDim
and the number of nonzeros of the matrix, nnzb
. If the user changes the matrix, it is necessary to call bsrsm2_bufferSize()
again to get the correct buffer size, otherwise a segmentation fault may occur.
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
|
handle to the cuSPARSE library context. |
|
storage format of blocks, either |
|
the operation |
|
the operation |
|
number of block rows of matrix |
|
number of columns of matrix |
|
number of nonzero blocks of matrix |
|
the descriptor of matrix |
|
<type> array of |
|
integer array of |
|
integer array of |
|
block dimension of sparse matrix |
Output
|
record internal states based on different algorithms. |
|
number of bytes of the buffer used in |
See cusparseStatus_t for the description of the return status
cusparse<t>bsrsm2_analysis() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseSbsrsm2_analysis(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transX,
int mb,
int n,
int nnzb,
const cusparseMatDescr_t descrA,
const float* bsrSortedVal,
const int* bsrSortedRowPtr,
const int* bsrSortedColInd,
int blockDim,
bsrsm2Info_t info,
cusparseSolvePolicy_t policy,
void* pBuffer)
cusparseStatus_t
cusparseDbsrsm2_analysis(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transX,
int mb,
int n,
int nnzb,
const cusparseMatDescr_t descrA,
const double* bsrSortedVal,
const int* bsrSortedRowPtr,
const int* bsrSortedColInd,
int blockDim,
bsrsm2Info_t info,
cusparseSolvePolicy_t policy,
void* pBuffer)
cusparseStatus_t
cusparseCbsrsm2_analysis(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transX,
int mb,
int n,
int nnzb,
const cusparseMatDescr_t descrA,
const cuComplex* bsrSortedVal,
const int* bsrSortedRowPtr,
const int* bsrSortedColInd,
int blockDim,
bsrsm2Info_t info,
cusparseSolvePolicy_t policy,
void* pBuffer)
cusparseStatus_t
cusparseZbsrsm2_analysis(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transX,
int mb,
int n,
int nnzb,
const cusparseMatDescr_t descrA,
const cuDoubleComplex* bsrSortedVal,
const int* bsrSortedRowPtr,
const int* bsrSortedColInd,
int blockDim,
bsrsm2Info_t info,
cusparseSolvePolicy_t policy,
void* pBuffer)
This function performs the analysis phase of bsrsm2()
, a new sparse triangular linear system op(A)*op(X) =
\(\alpha\)op(B)
.
A
is an (mb*blockDim)x(mb*blockDim)
sparse matrix that is defined in BSR storage format by the three arrays bsrValA
, bsrRowPtrA
, and bsrColIndA
); B
and X
are the right-hand-side and the solution matrices; \(\alpha\) is a scalar; and
\(\text{op}(A) == \text{CUSPARSE_OPERATION_NON_TRANSPOSE}\)
and
\(\text{op}(X) = \begin{cases} X & \text{if transX == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ X^{T} & \text{if transX == CUSPARSE_OPERATION_TRANSPOSE} \\ X^{H} & \text{if transX == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE (not supported)} \\ \end{cases}\)
and op(B)
and op(X)
are equal.
The block of BSR format is of size blockDim*blockDim
, stored in column-major or row-major as determined by parameter dirA
, which is either CUSPARSE_DIRECTION_ROW
or CUSPARSE_DIRECTION_COLUMN
. The matrix type must be CUSPARSE_MATRIX_TYPE_GENERAL
, and the fill mode and diagonal type are ignored.
It is expected that this function will be executed only once for a given matrix and a particular operation type.
This function requires the buffer size returned by bsrsm2_bufferSize()
. The address of pBuffer
must be multiple of 128 bytes. If not, CUSPARSE_STATUS_INVALID_VALUE
is returned.
Function bsrsm2_analysis()
reports a structural zero and computes the level information stored in opaque structure info
. The level information can extract more parallelism during a triangular solver. However bsrsm2_solve()
can be done without level information. To disable level information, the user needs to specify the policy of the triangular solver as CUSPARSE_SOLVE_POLICY_NO_LEVEL
.
Function bsrsm2_analysis()
always reports the first structural zero, even if the parameter policy
is CUSPARSE_SOLVE_POLICY_NO_LEVEL
. Besides, no structural zero is reported if CUSPARSE_DIAG_TYPE_UNIT
is specified, even if block A(j,j)
is missing for some j
. The user must call cusparseXbsrsm2_query_zero_pivot()
to know where the structural zero is.
If bsrsm2_analysis()
reports a structural zero, the solve will return a numerical zero in the same position as the structural zero but this result X
is meaningless.
This function requires temporary extra storage that is allocated internally
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
Input
|
handle to the cuSPARSE library context. |
|
storage format of blocks, either |
|
the operation |
|
the operation |
|
number of block rows of matrix |
|
number of columns of matrix |
|
number of non-zero blocks of matrix |
|
the descriptor of matrix |
|
<type> array of |
|
integer array of |
|
integer array of |
|
block dimension of sparse matrix |
|
structure initialized using |
|
The supported policies are |
|
buffer allocated by the user; the size is return by |
Output
|
structure filled with information collected during the analysis phase (that should be passed to the solve phase unchanged). |
See cusparseStatus_t for the description of the return status
cusparse<t>bsrsm2_solve() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseSbsrsm2_solve(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transX,
int mb,
int n,
int nnzb,
const float* alpha,
const cusparseMatDescr_t descrA,
const float* bsrSortedVal,
const int* bsrSortedRowPtr,
const int* bsrSortedColInd,
int blockDim,
bsrsm2Info_t info,
const float* B,
int ldb,
float* X,
int ldx,
cusparseSolvePolicy_t policy,
void* pBuffer)
cusparseStatus_t
cusparseDbsrsm2_solve(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transX,
int mb,
int n,
int nnzb,
const double* alpha,
const cusparseMatDescr_t descrA,
const double* bsrSortedVal,
const int* bsrSortedRowPtr,
const int* bsrSortedColInd,
int blockDim,
bsrsm2Info_t info,
const double* B,
int ldb,
double* X,
int ldx,
cusparseSolvePolicy_t policy,
void* pBuffer)
cusparseStatus_t
cusparseCbsrsm2_solve(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transX,
int mb,
int n,
int nnzb,
const cuComplex* alpha,
const cusparseMatDescr_t descrA,
const cuComplex* bsrSortedVal,
const int* bsrSortedRowPtr,
const int* bsrSortedColInd,
int blockDim,
bsrsm2Info_t info,
const cuComplex* B,
int ldb,
cuComplex* X,
int ldx,
cusparseSolvePolicy_t policy,
void* pBuffer)
cusparseStatus_t
cusparseZbsrsm2_solve(cusparseHandle_t handle,
cusparseDirection_t dirA,
cusparseOperation_t transA,
cusparseOperation_t transX,
int mb,
int n,
int nnzb,
const cuDoubleComplex* alpha,
const cusparseMatDescr_t descrA,
const cuDoubleComplex* bsrSortedVal,
const int* bsrSortedRowPtr,
const int* bsrSortedColInd,
int blockDim,
bsrsm2Info_t info,
const cuDoubleComplex* B,
int ldb,
cuDoubleComplex* X,
int ldx,
cusparseSolvePolicy_t policy,
void* pBuffer)
This function performs the solve phase of the solution of a sparse triangular linear system:
\(\text{op}(A) \ast \text{op(X)} = \alpha \ast \text{op(B)}\) |
A
is an (mb*blockDim)x(mb*blockDim)
sparse matrix that is defined in BSR storage format by the three arrays bsrValA
, bsrRowPtrA
, and bsrColIndA
); B
and X
are the right-hand-side and the solution matrices; \(\alpha\) is a scalar, and
\(\text{op}(A) == \text{CUSPARSE_OPERATION_NON_TRANSPOSE}\)
and
\(\text{op}(X) = \begin{cases} X & \text{if transX == CUSPARSE_OPERATION_NON_TRANSPOSE} \\ X^{T} & \text{if transX == CUSPARSE_OPERATION_TRANSPOSE} \\ X^{H} & \text{not supported} \\ \end{cases}\)
Only op(A)=A
is supported.
op(B)
and op(X)
must be performed in the same way. In other words, if op(B)=B
, op(X)=X
.
The block of BSR format is of size blockDim*blockDim
, stored as column-major or row-major as determined by parameter dirA
, which is either CUSPARSE_DIRECTION_ROW
or CUSPARSE_DIRECTION_COLUMN
. The matrix type must be CUSPARSE_MATRIX_TYPE_GENERAL
, and the fill mode and diagonal type are ignored. Function bsrsm02_solve()
can support an arbitrary blockDim
.
This function may be executed multiple times for a given matrix and a particular operation type.
This function requires the buffer size returned by bsrsm2_bufferSize()
. The address of pBuffer
must be multiple of 128 bytes. If it is not, CUSPARSE_STATUS_INVALID_VALUE
is returned.
Although bsrsm2_solve()
can be done without level information, the user still needs to be aware of consistency. If bsrsm2_analysis()
is called with policy CUSPARSE_SOLVE_POLICY_USE_LEVEL
, bsrsm2_solve()
can be run with or without levels. On the other hand, if bsrsm2_analysis()
is called with CUSPARSE_SOLVE_POLICY_NO_LEVEL
, bsrsm2_solve()
can only accept CUSPARSE_SOLVE_POLICY_NO_LEVEL
; otherwise, CUSPARSE_STATUS_INVALID_VALUE
is returned.
Function bsrsm02_solve()
has the same behavior as bsrsv02_solve()
, reporting the first numerical zero, including a structural zero. The user must call cusparseXbsrsm2_query_zero_pivot()
to know where the numerical zero is.
The motivation of transpose(X)
is to improve the memory access of matrix X
. The computational pattern of transpose(X)
with matrix X
in column-major order is equivalent to X
with matrix X
in row-major order.
In-place is supported and requires that B
and X
point to the same memory block, and ldb=ldx
.
The function supports the following properties if pBuffer != NULL
:
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
|
handle to the cuSPARSE library context. |
|
storage format of blocks, either |
|
the operation |
|
the operation |
|
number of block rows of matrix |
|
number of columns of matrix |
|
number of non-zero blocks of matrix |
|
<type> scalar used for multiplication. |
|
the descriptor of matrix |
|
<type> array of |
|
integer array of |
|
integer array of |
|
block dimension of sparse matrix |
|
structure initialized using |
|
<type> right-hand-side array. |
|
leading dimension of |
|
leading dimension of |
|
the supported policies are |
|
buffer allocated by the user; the size is returned by |
Output
|
<type> solution array with leading dimensions |
See cusparseStatus_t for the description of the return status.
cusparseXbsrsm2_zeroPivot() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseXbsrsm2_zeroPivot(cusparseHandle_t handle,
bsrsm2Info_t info,
int* position)
If the returned error code is CUSPARSE_STATUS_ZERO_PIVOT
, position=j
means A(j,j)
is either a structural zero or a numerical zero (singular block). Otherwise position=-1
.
The position
can be 0-base or 1-base, the same as the matrix.
Function cusparseXbsrsm2_zeroPivot()
is a blocking call. It calls cudaDeviceSynchronize()
to make sure all previous kernels are done.
The position
can be in the host memory or device memory. The user can set the proper mode with cusparseSetPointerMode()
.
The routine requires no extra storage
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
Input
|
handle to the cuSPARSE library context. |
|
|
Output
|
if no structural or numerical zero, |
See cusparseStatus_t for the description of the return status.