cuSPARSE Extra Function Reference
This chapter describes the extra routines used to manipulate sparse matrices.
cusparse<t>csrgeam2()
cusparseStatus_t
cusparseScsrgeam2_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
const float* alpha,
const cusparseMatDescr_t descrA,
int nnzA,
const float* csrSortedValA,
const int* csrSortedRowPtrA,
const int* csrSortedColIndA,
const float* beta,
const cusparseMatDescr_t descrB,
int nnzB,
const float* csrSortedValB,
const int* csrSortedRowPtrB,
const int* csrSortedColIndB,
const cusparseMatDescr_t descrC,
const float* csrSortedValC,
const int* csrSortedRowPtrC,
const int* csrSortedColIndC,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseDcsrgeam2_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
const double* alpha,
const cusparseMatDescr_t descrA,
int nnzA,
const double* csrSortedValA,
const int* csrSortedRowPtrA,
const int* csrSortedColIndA,
const double* beta,
const cusparseMatDescr_t descrB,
int nnzB,
const double* csrSortedValB,
const int* csrSortedRowPtrB,
const int* csrSortedColIndB,
const cusparseMatDescr_t descrC,
const double* csrSortedValC,
const int* csrSortedRowPtrC,
const int* csrSortedColIndC,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseCcsrgeam2_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
const cuComplex* alpha,
const cusparseMatDescr_t descrA,
int nnzA,
const cuComplex* csrSortedValA,
const int* csrSortedRowPtrA,
const int* csrSortedColIndA,
const cuComplex* beta,
const cusparseMatDescr_t descrB,
int nnzB,
const cuComplex* csrSortedValB,
const int* csrSortedRowPtrB,
const int* csrSortedColIndB,
const cusparseMatDescr_t descrC,
const cuComplex* csrSortedValC,
const int* csrSortedRowPtrC,
const int* csrSortedColIndC,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseZcsrgeam2_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
const cuDoubleComplex* alpha,
const cusparseMatDescr_t descrA,
int nnzA,
const cuDoubleComplex* csrSortedValA,
const int* csrSortedRowPtrA,
const int* csrSortedColIndA,
const cuDoubleComplex* beta,
const cusparseMatDescr_t descrB,
int nnzB,
const cuDoubleComplex* csrSortedValB,
const int* csrSortedRowPtrB,
const int* csrSortedColIndB,
const cusparseMatDescr_t descrC,
const cuDoubleComplex* csrSortedValC,
const int* csrSortedRowPtrC,
const int* csrSortedColIndC,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseXcsrgeam2Nnz(cusparseHandle_t handle,
int m,
int n,
const cusparseMatDescr_t descrA,
int nnzA,
const int* csrSortedRowPtrA,
const int* csrSortedColIndA,
const cusparseMatDescr_t descrB,
int nnzB,
const int* csrSortedRowPtrB,
const int* csrSortedColIndB,
const cusparseMatDescr_t descrC,
int* csrSortedRowPtrC,
int* nnzTotalDevHostPtr,
void* workspace)
cusparseStatus_t
cusparseScsrgeam2(cusparseHandle_t handle,
int m,
int n,
const float* alpha,
const cusparseMatDescr_t descrA,
int nnzA,
const float* csrSortedValA,
const int* csrSortedRowPtrA,
const int* csrSortedColIndA,
const float* beta,
const cusparseMatDescr_t descrB,
int nnzB,
const float* csrSortedValB,
const int* csrSortedRowPtrB,
const int* csrSortedColIndB,
const cusparseMatDescr_t descrC,
float* csrSortedValC,
int* csrSortedRowPtrC,
int* csrSortedColIndC,
void* pBuffer)
cusparseStatus_t
cusparseDcsrgeam2(cusparseHandle_t handle,
int m,
int n,
const double* alpha,
const cusparseMatDescr_t descrA,
int nnzA,
const double* csrSortedValA,
const int* csrSortedRowPtrA,
const int* csrSortedColIndA,
const double* beta,
const cusparseMatDescr_t descrB,
int nnzB,
const double* csrSortedValB,
const int* csrSortedRowPtrB,
const int* csrSortedColIndB,
const cusparseMatDescr_t descrC,
double* csrSortedValC,
int* csrSortedRowPtrC,
int* csrSortedColIndC,
void* pBuffer)
cusparseStatus_t
cusparseCcsrgeam2(cusparseHandle_t handle,
int m,
int n,
const cuComplex* alpha,
const cusparseMatDescr_t descrA,
int nnzA,
const cuComplex* csrSortedValA,
const int* csrSortedRowPtrA,
const int* csrSortedColIndA,
const cuComplex* beta,
const cusparseMatDescr_t descrB,
int nnzB,
const cuComplex* csrSortedValB,
const int* csrSortedRowPtrB,
const int* csrSortedColIndB,
const cusparseMatDescr_t descrC,
cuComplex* csrSortedValC,
int* csrSortedRowPtrC,
int* csrSortedColIndC,
void* pBuffer)
cusparseStatus_t
cusparseZcsrgeam2(cusparseHandle_t handle,
int m,
int n,
const cuDoubleComplex* alpha,
const cusparseMatDescr_t descrA,
int nnzA,
const cuDoubleComplex* csrSortedValA,
const int* csrSortedRowPtrA,
const int* csrSortedColIndA,
const cuDoubleComplex* beta,
const cusparseMatDescr_t descrB,
int nnzB,
const cuDoubleComplex* csrSortedValB,
const int* csrSortedRowPtrB,
const int* csrSortedColIndB,
const cusparseMatDescr_t descrC,
cuDoubleComplex* csrSortedValC,
int* csrSortedRowPtrC,
int* csrSortedColIndC,
void* pBuffer)
This function performs the following matrix-matrix operation:

\(C = \alpha \ast A + \beta \ast B\)

where A, B, and C are m×n sparse matrices (defined in CSR storage format by the three arrays csrValA|csrValB|csrValC, csrRowPtrA|csrRowPtrB|csrRowPtrC, and csrColIndA|csrColIndB|csrColIndC respectively), and \(\alpha\) and \(\beta\) are scalars. Since A and B have different sparsity patterns, cuSPARSE adopts a two-step approach to complete sparse matrix C. In the first step, the user allocates csrRowPtrC of m+1 elements and uses function cusparseXcsrgeam2Nnz() to determine csrRowPtrC and the total number of nonzero elements. In the second step, the user gathers nnzC (the number of nonzero elements of matrix C) from either (nnzC=*nnzTotalDevHostPtr) or (nnzC=csrRowPtrC(m)-csrRowPtrC(0)), allocates csrValC and csrColIndC of nnzC elements respectively, and finally calls function cusparse[S|D|C|Z]csrgeam2() to complete matrix C.
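As a small illustration of the CSR convention used above (this example is not part of the library reference; the values are chosen arbitrarily), consider m = n = 3 with zero-based indexing and \(\alpha = \beta = 1\):

\[
A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix},\quad
B = \begin{pmatrix} 0 & 5 & 0 \\ 6 & 0 & 0 \\ 0 & 0 & 7 \end{pmatrix},\quad
C = A + B = \begin{pmatrix} 1 & 5 & 2 \\ 6 & 3 & 0 \\ 0 & 0 & 11 \end{pmatrix}
\]

with

csrValA = [1, 2, 3, 4], csrRowPtrA = [0, 2, 3, 4], csrColIndA = [0, 2, 1, 2]
csrValB = [5, 6, 7], csrRowPtrB = [0, 1, 2, 3], csrColIndB = [1, 0, 2]
csrValC = [1, 5, 2, 6, 3, 11], csrRowPtrC = [0, 3, 5, 6], csrColIndC = [0, 1, 2, 0, 1, 2]

so that nnzC = csrRowPtrC(3) - csrRowPtrC(0) = 6, which is the value reported by cusparseXcsrgeam2Nnz() in the first step.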
The general procedure is as follows:
int baseC, nnzC;
/* alpha, nnzTotalDevHostPtr point to host memory */
size_t bufferSizeInBytes;
char *buffer = NULL;
int *nnzTotalDevHostPtr = &nnzC;
cusparseSetPointerMode(handle, CUSPARSE_POINTER_MODE_HOST);
cudaMalloc((void**)&csrRowPtrC, sizeof(int)*(m+1));
/* prepare buffer */
cusparseScsrgeam2_bufferSizeExt(handle, m, n,
    alpha,
    descrA, nnzA,
    csrValA, csrRowPtrA, csrColIndA,
    beta,
    descrB, nnzB,
    csrValB, csrRowPtrB, csrColIndB,
    descrC,
    csrValC, csrRowPtrC, csrColIndC,
    &bufferSizeInBytes);
cudaMalloc((void**)&buffer, sizeof(char)*bufferSizeInBytes);
/* step 1: determine csrRowPtrC and the number of nonzero elements of C */
cusparseXcsrgeam2Nnz(handle, m, n,
    descrA, nnzA, csrRowPtrA, csrColIndA,
    descrB, nnzB, csrRowPtrB, csrColIndB,
    descrC, csrRowPtrC, nnzTotalDevHostPtr,
    buffer);
if (NULL != nnzTotalDevHostPtr){
    nnzC = *nnzTotalDevHostPtr;
}else{
    cudaMemcpy(&nnzC,  csrRowPtrC+m, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&baseC, csrRowPtrC,   sizeof(int), cudaMemcpyDeviceToHost);
    nnzC -= baseC;
}
/* step 2: allocate csrColIndC and csrValC, then compute C */
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnzC);
cudaMalloc((void**)&csrValC,    sizeof(float)*nnzC);
cusparseScsrgeam2(handle, m, n,
    alpha,
    descrA, nnzA,
    csrValA, csrRowPtrA, csrColIndA,
    beta,
    descrB, nnzB,
    csrValB, csrRowPtrB, csrColIndB,
    descrC,
    csrValC, csrRowPtrC, csrColIndC,
    buffer);
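The fragment above assumes that the handle, the matrix descriptors, and the device-side CSR arrays of A and B already exist. A minimal sketch of that surrounding setup and cleanup, under the assumption that all matrices use zero-based indexing, might look as follows:

cusparseHandle_t handle;
cusparseMatDescr_t descrA, descrB, descrC;
cusparseCreate(&handle);
cusparseCreateMatDescr(&descrA);
cusparseCreateMatDescr(&descrB);
cusparseCreateMatDescr(&descrC);
/* csrgeam2 only supports CUSPARSE_MATRIX_TYPE_GENERAL; use one consistent index base */
cusparseSetMatType(descrA, CUSPARSE_MATRIX_TYPE_GENERAL);
cusparseSetMatIndexBase(descrA, CUSPARSE_INDEX_BASE_ZERO);
cusparseSetMatType(descrB, CUSPARSE_MATRIX_TYPE_GENERAL);
cusparseSetMatIndexBase(descrB, CUSPARSE_INDEX_BASE_ZERO);
cusparseSetMatType(descrC, CUSPARSE_MATRIX_TYPE_GENERAL);
cusparseSetMatIndexBase(descrC, CUSPARSE_INDEX_BASE_ZERO);

/* ... allocate and fill csrValA/csrRowPtrA/csrColIndA and csrValB/csrRowPtrB/csrColIndB
   on the device, then run the two-step procedure shown above ... */

/* cleanup once C is no longer needed */
cudaFree(buffer);
cudaFree(csrValC);
cudaFree(csrRowPtrC);
cudaFree(csrColIndC);
cusparseDestroyMatDescr(descrA);
cusparseDestroyMatDescr(descrB);
cusparseDestroyMatDescr(descrC);
cusparseDestroy(handle);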
Several comments on csrgeam2():

- The other three combinations, NT, TN, and TT, are not supported by cuSPARSE. In order to do any one of the three, the user should use the routine csr2csc() to convert \(A\) | \(B\) to \(A^{T}\) | \(B^{T}\).
- Only CUSPARSE_MATRIX_TYPE_GENERAL is supported. If either A or B is symmetric or Hermitian, then the user must extend the matrix to a full one and reconfigure the MatrixType field of the descriptor to CUSPARSE_MATRIX_TYPE_GENERAL.
- If the sparsity pattern of matrix C is known, the user can skip the call to function cusparseXcsrgeam2Nnz(). For example, suppose that the user has an iterative algorithm which updates A and B iteratively but keeps their sparsity patterns. The user can call function cusparseXcsrgeam2Nnz() once to set up the sparsity pattern of C, then call only function cusparse[S|D|C|Z]csrgeam2() in each iteration (see the sketch after this list).
- The pointers alpha and beta must be valid.
- When alpha or beta is zero, it is not considered a special case by cuSPARSE. The sparsity pattern of C is independent of the values of alpha and beta. If the user wants \(C = 0 \times A + 1 \times B^{T}\), then csr2csc() is better than csrgeam2().
- csrgeam2() is the same as csrgeam() except that csrgeam2() requires an explicit buffer, where csrgeam() allocates the buffer internally.
- This function requires temporary extra storage that is allocated internally.
- The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available.
- The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available.
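As a sketch of the reuse pattern from the third comment (assuming buffer, csrRowPtrC, and nnzTotalDevHostPtr were prepared as in the general procedure above, that only the values of A and B change between iterations, and that maxIter is a placeholder loop bound):

/* sparsity patterns of A and B are fixed, so the structure of C is computed once */
cusparseXcsrgeam2Nnz(handle, m, n,
    descrA, nnzA, csrRowPtrA, csrColIndA,
    descrB, nnzB, csrRowPtrB, csrColIndB,
    descrC, csrRowPtrC, nnzTotalDevHostPtr,
    buffer);
nnzC = *nnzTotalDevHostPtr;
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnzC);
cudaMalloc((void**)&csrValC,    sizeof(float)*nnzC);
for (int iter = 0; iter < maxIter; iter++) {
    /* ... update csrValA and csrValB for this iteration (patterns unchanged) ... */
    cusparseScsrgeam2(handle, m, n,
        alpha,
        descrA, nnzA, csrValA, csrRowPtrA, csrColIndA,
        beta,
        descrB, nnzB, csrValB, csrRowPtrB, csrColIndB,
        descrC, csrValC, csrRowPtrC, csrColIndC,
        buffer);
    /* csrValC now holds the values of C for this iteration; its sparsity pattern is unchanged */
}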
Input
handle             | handle to the cuSPARSE library context.
m                  | number of rows of sparse matrices A, B, and C.
n                  | number of columns of sparse matrices A, B, and C.
alpha              | <type> scalar used for multiplication.
descrA             | the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL.
nnzA               | number of nonzero elements of sparse matrix A.
csrSortedValA      | <type> array of nnzA nonzero elements of matrix A.
csrSortedRowPtrA   | integer array of m+1 elements that contains the start of every row of A and the end of the last row plus one.
csrSortedColIndA   | integer array of nnzA column indices of the nonzero elements of matrix A.
beta               | <type> scalar used for multiplication.
descrB             | the descriptor of matrix B. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL.
nnzB               | number of nonzero elements of sparse matrix B.
csrSortedValB      | <type> array of nnzB nonzero elements of matrix B.
csrSortedRowPtrB   | integer array of m+1 elements that contains the start of every row of B and the end of the last row plus one.
csrSortedColIndB   | integer array of nnzB column indices of the nonzero elements of matrix B.
descrC             | the descriptor of matrix C. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL.
Output
csrSortedValC      | <type> array of nnzC nonzero elements of matrix C.
csrSortedRowPtrC   | integer array of m+1 elements that contains the start of every row of C and the end of the last row plus one.
csrSortedColIndC   | integer array of nnzC column indices of the nonzero elements of matrix C.
nnzTotalDevHostPtr | total number of nonzero elements in device or host memory. It is equal to (nnzTotalDevHostPtr = csrRowPtrC(m) - csrRowPtrC(0)).
See cusparseStatus_t for the description of the return status.
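Every routine in this section returns a cusparseStatus_t, so callers normally check the status of each call. A minimal, hedged sketch of such a check (the CHECK_CUSPARSE macro name is illustrative and not part of the library):

#include <stdio.h>
#include <stdlib.h>
#include <cusparse.h>

/* Illustrative helper (not part of cuSPARSE): abort on any non-success status. */
#define CHECK_CUSPARSE(call)                                              \
    do {                                                                  \
        cusparseStatus_t status_ = (call);                                \
        if (status_ != CUSPARSE_STATUS_SUCCESS) {                         \
            fprintf(stderr, "cuSPARSE error %d at %s:%d\n",               \
                    (int)status_, __FILE__, __LINE__);                    \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

/* Example usage:
   CHECK_CUSPARSE(cusparseXcsrgeam2Nnz(handle, m, n,
                                       descrA, nnzA, csrRowPtrA, csrColIndA,
                                       descrB, nnzB, csrRowPtrB, csrColIndB,
                                       descrC, csrRowPtrC, nnzTotalDevHostPtr,
                                       buffer));
*/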