cuSPARSE Format Conversion Reference
This chapter describes the conversion routines between different sparse and dense storage formats.
coosort, csrsort, cscsort, and csru2csr are sorting routines that do not allocate memory internally (no malloc inside); the following table estimates the buffer size they require.
routine | approximate maximum problem size (nnz)
coosort | 125M
csrsort / cscsort | 100M
csru2csr | 71M for 'd' and 55M for 'z'
cusparse<t>bsr2csr()
cusparseStatus_t
cusparseSbsr2csr(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
const cusparseMatDescr_t descrA,
const float* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
const cusparseMatDescr_t descrC,
float* csrValC,
int* csrRowPtrC,
int* csrColIndC)
cusparseStatus_t
cusparseDbsr2csr(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
const cusparseMatDescr_t descrA,
const double* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
const cusparseMatDescr_t descrC,
double* csrValC,
int* csrRowPtrC,
int* csrColIndC)
cusparseStatus_t
cusparseCbsr2csr(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
const cusparseMatDescr_t descrA,
const cuComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
const cusparseMatDescr_t descrC,
cuComplex* csrValC,
int* csrRowPtrC,
int* csrColIndC)
cusparseStatus_t
cusparseZbsr2csr(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
const cusparseMatDescr_t descrA,
const cuDoubleComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int blockDim,
const cusparseMatDescr_t descrC,
cuDoubleComplex* csrValC,
int* csrRowPtrC,
int* csrColIndC)
This function converts a sparse matrix in BSR format (that is defined by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA) into a sparse matrix in CSR format that is defined by arrays csrValC, csrRowPtrC, and csrColIndC.
Let m(=mb*blockDim)
be the number of rows of A
and n(=nb*blockDim)
be the number of columns of A
, then A
and C
are m*n
sparse matrices. The BSR format of A
contains nnzb(=bsrRowPtrA[mb] - bsrRowPtrA[0])
nonzero blocks, whereas the sparse matrix A
contains nnz(=nnzb*blockDim*blockDim)
elements. The user must allocate enough space for arrays csrRowPtrC
, csrColIndC
, and csrValC
. The requirements are as follows:
csrRowPtrC
of m+1
elements
csrValC
of nnz
elements
csrColIndC
of nnz
elements
The general procedure is as follows:
// Given BSR format (bsrRowPtrA, bsrcolIndA, bsrValA) and
// blocks of BSR format are stored in column-major order.
cusparseDirection_t dir = CUSPARSE_DIRECTION_COLUMN;
int m = mb*blockDim;
int nnzb = bsrRowPtrA[mb] - bsrRowPtrA[0]; // number of blocks
int nnz = nnzb * blockDim * blockDim; // number of elements
cudaMalloc((void**)&csrRowPtrC, sizeof(int)*(m+1));
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnz);
cudaMalloc((void**)&csrValC, sizeof(float)*nnz);
cusparseSbsr2csr(handle, dir, mb, nb,
descrA,
bsrValA, bsrRowPtrA, bsrColIndA,
blockDim,
descrC,
csrValC, csrRowPtrC, csrColIndC);
The routine requires no extra storage
The routine supports asynchronous execution if blockDim != 1 or the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if blockDim != 1 or the Stream Ordered Memory Allocator is available
Input
parameter | description
handle | handle to the cuSPARSE library context.
dir | storage format of blocks: either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.
mb | number of block rows of sparse matrix A.
nb | number of block columns of sparse matrix A.
descrA | the descriptor of matrix A.
bsrValA | <type> array of nnzb*blockDim*blockDim nonzero elements of matrix A.
bsrRowPtrA | integer array of mb+1 elements that contains the start of every block row and the end of the last block row plus one.
bsrColIndA | integer array of nnzb column indices of the nonzero blocks of matrix A.
blockDim | block dimension of sparse matrix A.
descrC | the descriptor of matrix C.
Output
parameter | description
csrValC | <type> array of nnz(=nnzb*blockDim*blockDim) nonzero elements of matrix C.
csrRowPtrC | integer array of m+1 elements that contains the start of every row and the end of the last row plus one.
csrColIndC | integer array of nnz column indices of the nonzero elements of matrix C.
See cusparseStatus_t for the description of the return status.
cusparse<t>gebsr2gebsc()
cusparseStatus_t
cusparseSgebsr2gebsc_bufferSize(cusparseHandle_t handle,
int mb,
int nb,
int nnzb,
const float* bsrVal,
const int* bsrRowPtr,
const int* bsrColInd,
int rowBlockDim,
int colBlockDim,
int* pBufferSize)
cusparseStatus_t
cusparseDgebsr2gebsc_bufferSize(cusparseHandle_t handle,
int mb,
int nb,
int nnzb,
const double* bsrVal,
const int* bsrRowPtr,
const int* bsrColInd,
int rowBlockDim,
int colBlockDim,
int* pBufferSize)
cusparseStatus_t
cusparseCgebsr2gebsc_bufferSize(cusparseHandle_t handle,
int mb,
int nb,
int nnzb,
const cuComplex* bsrVal,
const int* bsrRowPtr,
const int* bsrColInd,
int rowBlockDim,
int colBlockDim,
int* pBufferSize)
cusparseStatus_t
cusparseZgebsr2gebsc_bufferSize(cusparseHandle_t handle,
int mb,
int nb,
int nnzb,
const cuDoubleComplex* bsrVal,
const int* bsrRowPtr,
const int* bsrColInd,
int rowBlockDim,
int colBlockDim,
int* pBufferSize)
cusparseStatus_t
cusparseSgebsr2gebsc(cusparseHandle_t handle,
int mb,
int nb,
int nnzb,
const float* bsrVal,
const int* bsrRowPtr,
const int* bsrColInd,
int rowBlockDim,
int colBlockDim,
float* bscVal,
int* bscRowInd,
int* bscColPtr,
cusparseAction_t copyValues,
cusparseIndexBase_t baseIdx,
void* pBuffer)
cusparseStatus_t
cusparseDgebsr2gebsc(cusparseHandle_t handle,
int mb,
int nb,
int nnzb,
const double* bsrVal,
const int* bsrRowPtr,
const int* bsrColInd,
int rowBlockDim,
int colBlockDim,
double* bscVal,
int* bscRowInd,
int* bscColPtr,
cusparseAction_t copyValues,
cusparseIndexBase_t baseIdx,
void* pBuffer)
cusparseStatus_t
cusparseCgebsr2gebsc(cusparseHandle_t handle,
int mb,
int nb,
int nnzb,
const cuComplex* bsrVal,
const int* bsrRowPtr,
const int* bsrColInd,
int rowBlockDim,
int colBlockDim,
cuComplex* bscVal,
int* bscRowInd,
int* bscColPtr,
cusparseAction_t copyValues,
cusparseIndexBase_t baseIdx,
void* pBuffer)
cusparseStatus_t
cusparseZgebsr2gebsc(cusparseHandle_t handle,
int mb,
int nb,
int nnzb,
const cuDoubleComplex* bsrVal,
const int* bsrRowPtr,
const int* bsrColInd,
int rowBlockDim,
int colBlockDim,
cuDoubleComplex* bscVal,
int* bscRowInd,
int* bscColPtr,
cusparseAction_t copyValues,
cusparseIndexBase_t baseIdx,
void* pBuffer)
This function can be seen as the same as csr2csc()
when each block of size rowBlockDim*colBlockDim
is regarded as a scalar.
The sparsity pattern of the resulting matrix can also be seen as the transpose of the original sparse matrix, but the memory layout of the blocks does not change.
The user must call gebsr2gebsc_bufferSize()
to determine the size of the buffer required by gebsr2gebsc()
, allocate the buffer, and pass the buffer pointer to gebsr2gebsc()
.
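A minimal sketch of the call sequence is shown below (single precision); it assumes that handle, mb, nb, nnzb, rowBlockDim, colBlockDim and the device arrays bsrVal, bsrRowPtr, and bsrColInd already describe a valid general BSR matrix:
// query the workspace size, allocate the output BSC arrays, then convert
int bufferSize = 0;
void *pBuffer = NULL;
float *bscVal; int *bscRowInd, *bscColPtr;
cusparseSgebsr2gebsc_bufferSize(handle, mb, nb, nnzb,
                                bsrVal, bsrRowPtr, bsrColInd,
                                rowBlockDim, colBlockDim, &bufferSize);
cudaMalloc(&pBuffer, bufferSize);
cudaMalloc((void**)&bscVal,    sizeof(float)*nnzb*rowBlockDim*colBlockDim);
cudaMalloc((void**)&bscRowInd, sizeof(int)*nnzb);
cudaMalloc((void**)&bscColPtr, sizeof(int)*(nb+1));
cusparseSgebsr2gebsc(handle, mb, nb, nnzb,
                     bsrVal, bsrRowPtr, bsrColInd,
                     rowBlockDim, colBlockDim,
                     bscVal, bscRowInd, bscColPtr,
                     CUSPARSE_ACTION_NUMERIC, CUSPARSE_INDEX_BASE_ZERO,
                     pBuffer);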
The routine requires no extra storage if
pBuffer != NULL
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
Input
parameter | description
handle | handle to the cuSPARSE library context.
mb | number of block rows of sparse matrix A.
nb | number of block columns of sparse matrix A.
nnzb | number of nonzero blocks of matrix A.
bsrVal | <type> array of nnzb*rowBlockDim*colBlockDim nonzero elements of matrix A.
bsrRowPtr | integer array of mb+1 elements that contains the start of every block row and the end of the last block row plus one.
bsrColInd | integer array of nnzb block column indices of the nonzero blocks of matrix A.
rowBlockDim | number of rows within a block of A.
colBlockDim | number of columns within a block of A.
copyValues | CUSPARSE_ACTION_SYMBOLIC or CUSPARSE_ACTION_NUMERIC.
baseIdx | CUSPARSE_INDEX_BASE_ZERO or CUSPARSE_INDEX_BASE_ONE.
pBufferSize | host pointer containing the number of bytes of the buffer used in gebsr2gebsc().
pBuffer | buffer allocated by the user; the size is returned by gebsr2gebsc_bufferSize().
Output
parameter | description
bscVal | <type> array of nnzb*rowBlockDim*colBlockDim nonzero elements of matrix A; updated only if copyValues is CUSPARSE_ACTION_NUMERIC.
bscRowInd | integer array of nnzb block row indices of the nonzero blocks of matrix A.
bscColPtr | integer array of nb+1 elements that contains the start of every block column and the end of the last block column plus one.
See cusparseStatus_t for the description of the return status.
cusparse<t>gebsr2gebsr()
cusparseStatus_t
cusparseSgebsr2gebsr_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
int nnzb,
const cusparseMatDescr_t descrA,
const float* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDimA,
int colBlockDimA,
int rowBlockDimC,
int colBlockDimC,
int* pBufferSize)
cusparseStatus_t
cusparseDgebsr2gebsr_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
int nnzb,
const cusparseMatDescr_t descrA,
const double* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDimA,
int colBlockDimA,
int rowBlockDimC,
int colBlockDimC,
int* pBufferSize)
cusparseStatus_t
cusparseCgebsr2gebsr_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
int nnzb,
const cusparseMatDescr_t descrA,
const cuComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDimA,
int colBlockDimA,
int rowBlockDimC,
int colBlockDimC,
int* pBufferSize)
cusparseStatus_t
cusparseZgebsr2gebsr_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
int nnzb,
const cusparseMatDescr_t descrA,
const cuDoubleComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDimA,
int colBlockDimA,
int rowBlockDimC,
int colBlockDimC,
int* pBufferSize)
cusparseStatus_t
cusparseXgebsr2gebsrNnz(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
int nnzb,
const cusparseMatDescr_t descrA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDimA,
int colBlockDimA,
const cusparseMatDescr_t descrC,
int* bsrRowPtrC,
int rowBlockDimC,
int colBlockDimC,
int* nnzTotalDevHostPtr,
void* pBuffer)
cusparseStatus_t
cusparseSgebsr2gebsr(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
int nnzb,
const cusparseMatDescr_t descrA,
const float* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDimA,
int colBlockDimA,
const cusparseMatDescr_t descrC,
float* bsrValC,
int* bsrRowPtrC,
int* bsrColIndC,
int rowBlockDimC,
int colBlockDimC,
void* pBuffer)
cusparseStatus_t
cusparseDgebsr2gebsr(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
int nnzb,
const cusparseMatDescr_t descrA,
const double* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDimA,
int colBlockDimA,
const cusparseMatDescr_t descrC,
double* bsrValC,
int* bsrRowPtrC,
int* bsrColIndC,
int rowBlockDimC,
int colBlockDimC,
void* pBuffer)
cusparseStatus_t
cusparseCgebsr2gebsr(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
int nnzb,
const cusparseMatDescr_t descrA,
const cuComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDimA,
int colBlockDimA,
const cusparseMatDescr_t descrC,
cuComplex* bsrValC,
int* bsrRowPtrC,
int* bsrColIndC,
int rowBlockDimC,
int colBlockDimC,
void* pBuffer)
cusparseStatus_t
cusparseZgebsr2gebsr(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
int nnzb,
const cusparseMatDescr_t descrA,
const cuDoubleComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDimA,
int colBlockDimA,
const cusparseMatDescr_t descrC,
cuDoubleComplex* bsrValC,
int* bsrRowPtrC,
int* bsrColIndC,
int rowBlockDimC,
int colBlockDimC,
void* pBuffer)
This function converts a sparse matrix in general BSR format that is defined by the three arrays bsrValA
, bsrRowPtrA
, and bsrColIndA
into a sparse matrix in another general BSR format that is defined by arrays bsrValC
, bsrRowPtrC
, and bsrColIndC
.
If rowBlockDimA=1
and colBlockDimA=1
, cusparse[S|D|C|Z]gebsr2gebsr()
is the same as cusparse[S|D|C|Z]csr2gebsr()
.
If rowBlockDimC=1
and colBlockDimC=1
, cusparse[S|D|C|Z]gebsr2gebsr()
is the same as cusparse[S|D|C|Z]gebsr2csr()
.
A
is an m*n
sparse matrix where m(=mb*rowBlockDim)
is the number of rows of A
, and n(=nb*colBlockDim)
is the number of columns of A
. The general BSR format of A
contains nnzb(=bsrRowPtrA[mb] - bsrRowPtrA[0])
nonzero blocks. The matrix C
is also general BSR format with a different block size, rowBlockDimC*colBlockDimC
. If m
is not a multiple of rowBlockDimC
, or n
is not a multiple of colBlockDimC
, zeros are filled in. The number of block rows of C
is mc(=(m+rowBlockDimC-1)/rowBlockDimC)
. The number of block columns of C
is nc(=(n+colBlockDimC-1)/colBlockDimC)
. The number of nonzero blocks of C
is nnzc
.
The implementation adopts a two-step approach to do the conversion. First, the user allocates bsrRowPtrC
of mc+1
elements and uses function cusparseXgebsr2gebsrNnz()
to determine the number of nonzero block columns per block row of matrix C
. Second, the user gathers nnzc
(number of non-zero block columns of matrix C
) from either (nnzc=*nnzTotalDevHostPtr)
or (nnzc=bsrRowPtrC[mc]-bsrRowPtrC[0])
and allocates bsrValC
of nnzc*rowBlockDimC*colBlockDimC
elements and bsrColIndC
of nnzc
integers. Finally the function cusparse[S|D|C|Z]gebsr2gebsr()
is called to complete the conversion.
The user must call gebsr2gebsr_bufferSize()
to determine the size of the buffer required by gebsr2gebsr()
, allocate the buffer, and pass the buffer pointer to gebsr2gebsr()
.
The general procedure is as follows:
// Given general BSR format (bsrRowPtrA, bsrColIndA, bsrValA) and
// blocks of BSR format are stored in column-major order.
cusparseDirection_t dir = CUSPARSE_DIRECTION_COLUMN;
int base, nnzc;
int m = mb*rowBlockDimA;
int n = nb*colBlockDimA;
int mc = (m+rowBlockDimC-1)/rowBlockDimC;
int nc = (n+colBlockDimC-1)/colBlockDimC;
int bufferSize;
void *pBuffer;
cusparseSgebsr2gebsr_bufferSize(handle, dir, mb, nb, nnzb,
descrA, bsrValA, bsrRowPtrA, bsrColIndA,
rowBlockDimA, colBlockDimA,
rowBlockDimC, colBlockDimC,
&bufferSize);
cudaMalloc((void**)&pBuffer, bufferSize);
cudaMalloc((void**)&bsrRowPtrC, sizeof(int)*(mc+1));
// nnzTotalDevHostPtr points to host memory
int *nnzTotalDevHostPtr = &nnzc;
cusparseXgebsr2gebsrNnz(handle, dir, mb, nb, nnzb,
descrA, bsrRowPtrA, bsrColIndA,
rowBlockDimA, colBlockDimA,
descrC, bsrRowPtrC,
rowBlockDimC, colBlockDimC,
nnzTotalDevHostPtr,
pBuffer);
if (NULL != nnzTotalDevHostPtr){
nnzc = *nnzTotalDevHostPtr;
}else{
cudaMemcpy(&nnzc, bsrRowPtrC+mc, sizeof(int), cudaMemcpyDeviceToHost);
cudaMemcpy(&base, bsrRowPtrC, sizeof(int), cudaMemcpyDeviceToHost);
nnzc -= base;
}
cudaMalloc((void**)&bsrColIndC, sizeof(int)*nnzc);
cudaMalloc((void**)&bsrValC, sizeof(float)*(rowBlockDimC*colBlockDimC)*nnzc);
cusparseSgebsr2gebsr(handle, dir, mb, nb, nnzb,
descrA, bsrValA, bsrRowPtrA, bsrColIndA,
rowBlockDimA, colBlockDimA,
descrC, bsrValC, bsrRowPtrC, bsrColIndC,
rowBlockDimC, colBlockDimC,
pBuffer);
The routines require no extra storage if
pBuffer != NULL
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routines do not support CUDA graph capture
Input
parameter | description
handle | handle to the cuSPARSE library context.
dir | storage format of blocks: either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.
mb | number of block rows of sparse matrix A.
nb | number of block columns of sparse matrix A.
nnzb | number of nonzero blocks of matrix A.
descrA | the descriptor of matrix A.
bsrValA | <type> array of nnzb*rowBlockDimA*colBlockDimA nonzero elements of matrix A.
bsrRowPtrA | integer array of mb+1 elements that contains the start of every block row and the end of the last block row plus one.
bsrColIndA | integer array of nnzb block column indices of the nonzero blocks of matrix A.
rowBlockDimA | number of rows within a block of A.
colBlockDimA | number of columns within a block of A.
descrC | the descriptor of matrix C.
rowBlockDimC | number of rows within a block of C.
colBlockDimC | number of columns within a block of C.
pBufferSize | host pointer containing the number of bytes of the buffer used in gebsr2gebsr().
pBuffer | buffer allocated by the user; the size is returned by gebsr2gebsr_bufferSize().
Output
parameter | description
bsrValC | <type> array of nnzc*rowBlockDimC*colBlockDimC nonzero elements of matrix C.
bsrRowPtrC | integer array of mc+1 elements that contains the start of every block row and the end of the last block row plus one.
bsrColIndC | integer array of nnzc block column indices of the nonzero blocks of matrix C.
nnzTotalDevHostPtr | total number of nonzero blocks of matrix C.
See cusparseStatus_t for the description of the return status.
cusparse<t>gebsr2csr()
cusparseStatus_t
cusparseSgebsr2csr(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
const cusparseMatDescr_t descrA,
const float* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDim,
int colBlockDim,
const cusparseMatDescr_t descrC,
float* csrValC,
int* csrRowPtrC,
int* csrColIndC)
cusparseStatus_t
cusparseDgebsr2csr(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
const cusparseMatDescr_t descrA,
const double* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDim,
int colBlockDim,
const cusparseMatDescr_t descrC,
double* csrValC,
int* csrRowPtrC,
int* csrColIndC)
cusparseStatus_t
cusparseCgebsr2csr(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
const cusparseMatDescr_t descrA,
const cuComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDim,
int colBlockDim,
const cusparseMatDescr_t descrC,
cuComplex* csrValC,
int* csrRowPtrC,
int* csrColIndC)
cusparseStatus_t
cusparseZgebsr2csr(cusparseHandle_t handle,
cusparseDirection_t dir,
int mb,
int nb,
const cusparseMatDescr_t descrA,
const cuDoubleComplex* bsrValA,
const int* bsrRowPtrA,
const int* bsrColIndA,
int rowBlockDim,
int colBlockDim,
const cusparseMatDescr_t descrC,
cuDoubleComplex* csrValC,
int* csrRowPtrC,
int* csrColIndC)
This function converts a sparse matrix in general BSR format that is defined by the three arrays bsrValA
, bsrRowPtrA
, and bsrColIndA
into a sparse matrix in CSR format that is defined by arrays csrValC
, csrRowPtrC
, and csrColIndC
.
Let m(=mb*rowBlockDim)
be number of rows of A
and n(=nb*colBlockDim)
be number of columns of A
, then A
and C
are m*n
sparse matrices. The general BSR format of A
contains nnzb(=bsrRowPtrA[mb] - bsrRowPtrA[0])
non-zero blocks, whereas sparse matrix A
contains nnz(=nnzb*rowBlockDim*colBlockDim)
elements. The user must allocate enough space for arrays csrRowPtrC
, csrColIndC
, and csrValC
. The requirements are as follows:
csrRowPtrC
of m+1
elements
csrValC
of nnz
elements
csrColIndC
of nnz
elements
The general procedure is as follows:
// Given general BSR format (bsrRowPtrA, bsrColIndA, bsrValA) and
// blocks of BSR format are stored in column-major order.
cusparseDirection_t dir = CUSPARSE_DIRECTION_COLUMN;
int m = mb*rowBlockDim;
int n = nb*colBlockDim;
int nnzb = bsrRowPtrA[mb] - bsrRowPtrA[0]; // number of blocks
int nnz = nnzb * rowBlockDim * colBlockDim; // number of elements
cudaMalloc((void**)&csrRowPtrC, sizeof(int)*(m+1));
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnz);
cudaMalloc((void**)&csrValC, sizeof(float)*nnz);
cusparseSgebsr2csr(handle, dir, mb, nb,
descrA,
bsrValA, bsrRowPtrA, bsrColIndA,
rowBlockDim, colBlockDim,
descrC,
csrValC, csrRowPtrC, csrColIndC);
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
parameter | description
handle | handle to the cuSPARSE library context.
dir | storage format of blocks: either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.
mb | number of block rows of sparse matrix A.
nb | number of block columns of sparse matrix A.
descrA | the descriptor of matrix A.
bsrValA | <type> array of nnzb*rowBlockDim*colBlockDim nonzero elements of matrix A.
bsrRowPtrA | integer array of mb+1 elements that contains the start of every block row and the end of the last block row plus one.
bsrColIndA | integer array of nnzb block column indices of the nonzero blocks of matrix A.
rowBlockDim | number of rows within a block of A.
colBlockDim | number of columns within a block of A.
descrC | the descriptor of matrix C.
Output
parameter | description
csrValC | <type> array of nnz(=nnzb*rowBlockDim*colBlockDim) nonzero elements of matrix C.
csrRowPtrC | integer array of m+1 elements that contains the start of every row and the end of the last row plus one.
csrColIndC | integer array of nnz column indices of the nonzero elements of matrix C.
See cusparseStatus_t for the description of the return status.
cusparse<t>csr2gebsr()
cusparseStatus_t
cusparseScsr2gebsr_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dir,
int m,
int n,
const cusparseMatDescr_t descrA,
const float* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
int rowBlockDim,
int colBlockDim,
int* pBufferSize)
cusparseStatus_t
cusparseDcsr2gebsr_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dir,
int m,
int n,
const cusparseMatDescr_t descrA,
const double* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
int rowBlockDim,
int colBlockDim,
int* pBufferSize)
cusparseStatus_t
cusparseCcsr2gebsr_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dir,
int m,
int n,
const cusparseMatDescr_t descrA,
const cuComplex* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
int rowBlockDim,
int colBlockDim,
int* pBufferSize)
cusparseStatus_t
cusparseZcsr2gebsr_bufferSize(cusparseHandle_t handle,
cusparseDirection_t dir,
int m,
int n,
const cusparseMatDescr_t descrA,
const cuDoubleComplex* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
int rowBlockDim,
int colBlockDim,
int* pBufferSize)
cusparseStatus_t
cusparseXcsr2gebsrNnz(cusparseHandle_t handle,
cusparseDirection_t dir,
int m,
int n,
const cusparseMatDescr_t descrA,
const int* csrRowPtrA,
const int* csrColIndA,
const cusparseMatDescr_t descrC,
int* bsrRowPtrC,
int rowBlockDim,
int colBlockDim,
int* nnzTotalDevHostPtr,
void* pBuffer)
cusparseStatus_t
cusparseScsr2gebsr(cusparseHandle_t handle,
cusparseDirection_t dir,
int m,
int n,
const cusparseMatDescr_t descrA,
const float* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const cusparseMatDescr_t descrC,
float* bsrValC,
int* bsrRowPtrC,
int* bsrColIndC,
int rowBlockDim,
int colBlockDim,
void* pBuffer)
cusparseStatus_t
cusparseDcsr2gebsr(cusparseHandle_t handle,
cusparseDirection_t dir,
int m,
int n,
const cusparseMatDescr_t descrA,
const double* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const cusparseMatDescr_t descrC,
double* bsrValC,
int* bsrRowPtrC,
int* bsrColIndC,
int rowBlockDim,
int colBlockDim,
void* pBuffer)
cusparseStatus_t
cusparseCcsr2gebsr(cusparseHandle_t handle,
cusparseDirection_t dir,
int m,
int n,
const cusparseMatDescr_t descrA,
const cuComplex* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const cusparseMatDescr_t descrC,
cuComplex* bsrValC,
int* bsrRowPtrC,
int* bsrColIndC,
int rowBlockDim,
int colBlockDim,
void* pBuffer)
cusparseStatus_t
cusparseZcsr2gebsr(cusparseHandle_t handle,
cusparseDirection_t dir,
int m,
int n,
const cusparseMatDescr_t descrA,
const cuDoubleComplex* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const cusparseMatDescr_t descrC,
cuDoubleComplex* bsrValC,
int* bsrRowPtrC,
int* bsrColIndC,
int rowBlockDim,
int colBlockDim,
void* pBuffer)
This function converts a sparse matrix A
in CSR format (that is defined by arrays csrValA
, csrRowPtrA
, and csrColIndA
) into a sparse matrix C
in general BSR format (that is defined by the three arrays bsrValC
, bsrRowPtrC
, and bsrColIndC
).
The matrix A is an m*n
sparse matrix and matrix C
is a (mb*rowBlockDim)*(nb*colBlockDim)
sparse matrix, where mb(=(m+rowBlockDim-1)/rowBlockDim)
is the number of block rows of C
, and nb(=(n+colBlockDim-1)/colBlockDim)
is the number of block columns of C
.
The block of C
is of size rowBlockDim*colBlockDim
. If m is not a multiple of rowBlockDim, or n is not a multiple of colBlockDim, zeros are filled in.
The implementation adopts a two-step approach to do the conversion. First, the user allocates bsrRowPtrC
of mb+1
elements and uses function cusparseXcsr2gebsrNnz()
to determine the number of nonzero block columns per block row. Second, the user gathers nnzb
(number of nonzero block columns of matrix C
) from either (nnzb=*nnzTotalDevHostPtr)
or (nnzb=bsrRowPtrC[mb]-bsrRowPtrC[0])
and allocates bsrValC
of nnzb*rowBlockDim*colBlockDim
elements and bsrColIndC
of nnzb
integers. Finally function cusparse[S|D|C|Z]csr2gebsr()
is called to complete the conversion.
The user must obtain the size of the buffer required by csr2gebsr()
by calling csr2gebsr_bufferSize()
, allocate the buffer, and pass the buffer pointer to csr2gebsr()
.
The general procedure is as follows:
// Given CSR format (csrRowPtrA, csrColIndA, csrValA) and
// blocks of BSR format are stored in column-major order.
cusparseDirection_t dir = CUSPARSE_DIRECTION_COLUMN;
int base, nnzb;
int mb = (m + rowBlockDim-1)/rowBlockDim;
int nb = (n + colBlockDim-1)/colBlockDim;
int bufferSize;
void *pBuffer;
cusparseScsr2gebsr_bufferSize(handle, dir, m, n,
descrA, csrValA, csrRowPtrA, csrColIndA,
rowBlockDim, colBlockDim,
&bufferSize);
cudaMalloc((void**)&pBuffer, bufferSize);
cudaMalloc((void**)&bsrRowPtrC, sizeof(int) *(mb+1));
// nnzTotalDevHostPtr points to host memory
int *nnzTotalDevHostPtr = &nnzb;
cusparseXcsr2gebsrNnz(handle, dir, m, n,
descrA, csrRowPtrA, csrColIndA,
descrC, bsrRowPtrC, rowBlockDim, colBlockDim,
nnzTotalDevHostPtr,
pBuffer);
if (NULL != nnzTotalDevHostPtr){
nnzb = *nnzTotalDevHostPtr;
}else{
cudaMemcpy(&nnzb, bsrRowPtrC+mb, sizeof(int), cudaMemcpyDeviceToHost);
cudaMemcpy(&base, bsrRowPtrC, sizeof(int), cudaMemcpyDeviceToHost);
nnzb -= base;
}
cudaMalloc((void**)&bsrColIndC, sizeof(int)*nnzb);
cudaMalloc((void**)&bsrValC, sizeof(float)*(rowBlockDim*colBlockDim)*nnzb);
cusparseScsr2gebsr(handle, dir, m, n,
descrA,
csrValA, csrRowPtrA, csrColIndA,
descrC,
bsrValC, bsrRowPtrC, bsrColIndC,
rowBlockDim, colBlockDim,
pBuffer);
The routine cusparseXcsr2gebsrNnz()
has the following properties:
The routine requires no extra storage
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
The routine cusparse<t>csr2gebsr()
has the following properties:
The routine requires no extra storage if
pBuffer != NULL
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
parameter | description
handle | handle to the cuSPARSE library context.
dir | storage format of blocks: either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.
m | number of rows of sparse matrix A.
n | number of columns of sparse matrix A.
descrA | the descriptor of matrix A.
csrValA | <type> array of nnz nonzero elements of matrix A.
csrRowPtrA | integer array of m+1 elements that contains the start of every row and the end of the last row plus one.
csrColIndA | integer array of nnz column indices of the nonzero elements of matrix A.
descrC | the descriptor of matrix C.
rowBlockDim | number of rows within a block of C.
colBlockDim | number of columns within a block of C.
pBuffer | buffer allocated by the user; the size is returned by csr2gebsr_bufferSize().
Output
parameter | description
bsrValC | <type> array of nnzb*rowBlockDim*colBlockDim nonzero elements of matrix C.
bsrRowPtrC | integer array of mb+1 elements that contains the start of every block row and the end of the last block row plus one.
bsrColIndC | integer array of nnzb block column indices of the nonzero blocks of matrix C.
nnzTotalDevHostPtr | total number of nonzero blocks of matrix C.
See cusparseStatus_t for the description of the return status.
cusparse<t>coo2csr()
cusparseStatus_t
cusparseXcoo2csr(cusparseHandle_t handle,
const int* cooRowInd,
int nnz,
int m,
int* csrRowPtr,
cusparseIndexBase_t idxBase)
This function converts the array containing the uncompressed row indices (corresponding to COO format) into an array of compressed row pointers (corresponding to CSR format).
It can also be used to convert the array containing the uncompressed column indices (corresponding to COO format) into an array of column pointers (corresponding to CSC format).
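For illustration, a minimal sketch of the call is given below; it assumes that handle exists, cooRowInd is a device array of nnz row indices sorted by row, and m is the number of rows:
int *csrRowPtr;
cudaMalloc((void**)&csrRowPtr, sizeof(int)*(m+1));
// compress the nnz row indices into m+1 row pointers
cusparseXcoo2csr(handle, cooRowInd, nnz, m, csrRowPtr, CUSPARSE_INDEX_BASE_ZERO);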
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
parameter | description
handle | handle to the cuSPARSE library context.
cooRowInd | integer array of nnz uncompressed row indices.
nnz | number of non-zeros of the sparse matrix (that is also the length of array cooRowInd).
m | number of rows of matrix A.
idxBase | CUSPARSE_INDEX_BASE_ZERO or CUSPARSE_INDEX_BASE_ONE.
Output
parameter | description
csrRowPtr | integer array of m+1 elements that contains the start of every row and the end of the last row plus one.
See cusparseStatus_t for the description of the return status.
cusparse<t>csr2coo()
cusparseStatus_t
cusparseXcsr2coo(cusparseHandle_t handle,
const int* csrRowPtr,
int nnz,
int m,
int* cooRowInd,
cusparseIndexBase_t idxBase)
This function converts the array containing the compressed row pointers (corresponding to CSR format) into an array of uncompressed row indices (corresponding to COO format).
It can also be used to convert the array containing the compressed column indices (corresponding to CSC format) into an array of uncompressed column indices (corresponding to COO format).
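For illustration, a minimal sketch of the call is given below; it assumes that handle exists and csrRowPtr is a device array of m+1 row pointers:
int *cooRowInd;
cudaMalloc((void**)&cooRowInd, sizeof(int)*nnz);
// expand the m+1 row pointers into nnz uncompressed row indices
cusparseXcsr2coo(handle, csrRowPtr, nnz, m, cooRowInd, CUSPARSE_INDEX_BASE_ZERO);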
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
parameter | description
handle | handle to the cuSPARSE library context.
csrRowPtr | integer array of m+1 elements that contains the start of every row and the end of the last row plus one.
nnz | number of nonzeros of the sparse matrix (that is also the length of array cooRowInd).
m | number of rows of matrix A.
idxBase | CUSPARSE_INDEX_BASE_ZERO or CUSPARSE_INDEX_BASE_ONE.
Output
parameter | description
cooRowInd | integer array of nnz uncompressed row indices.
See cusparseStatus_t for the description of the return status.
cusparseCsr2cscEx2()
cusparseStatus_t
cusparseCsr2cscEx2_bufferSize(cusparseHandle_t handle,
int m,
int n,
int nnz,
const void* csrVal,
const int* csrRowPtr,
const int* csrColInd,
void* cscVal,
int* cscColPtr,
int* cscRowInd,
cudaDataType valType,
cusparseAction_t copyValues,
cusparseIndexBase_t idxBase,
cusparseCsr2CscAlg_t alg,
size_t* bufferSize)
cusparseStatus_t
cusparseCsr2cscEx2(cusparseHandle_t handle,
int m,
int n,
int nnz,
const void* csrVal,
const int* csrRowPtr,
const int* csrColInd,
void* cscVal,
int* cscColPtr,
int* cscRowInd,
cudaDataType valType,
cusparseAction_t copyValues,
cusparseIndexBase_t idxBase,
cusparseCsr2CscAlg_t alg,
void* buffer)
This function converts a sparse matrix in CSR format (that is defined by the three arrays csrVal
, csrRowPtr
, and csrColInd
) into a sparse matrix in CSC format (that is defined by arrays cscVal
, cscRowInd
, and cscColPtr
). The resulting matrix can also be seen as the transpose of the original sparse matrix. Notice that this routine can also be used to convert a matrix in CSC format into a matrix in CSR format.
The routine requires extra storage proportional to the number of nonzero values nnz
. It always produces the same output matrix (the result is deterministic).
It is executed asynchronously with respect to the host, and it may return control to the application on the host before the result is ready.
The function cusparseCsr2cscEx2_bufferSize()
returns the size of the workspace needed by cusparseCsr2cscEx2()
. The user needs to allocate a buffer of this size and pass it to cusparseCsr2cscEx2() as an argument.
If nnz == 0
, then csrColInd
, csrVal
, cscVal
, and cscRowInd
could have NULL
value. In this case, cscColPtr
is set to idxBase
for all values.
If m == 0
or n == 0
, the pointers are not checked and the routine returns CUSPARSE_STATUS_SUCCESS
.
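A minimal sketch of the two-call sequence is shown below (single precision, algorithm CUSPARSE_CSR2CSC_ALG1); it assumes that handle and the device arrays csrVal, csrRowPtr, and csrColInd of an m x n matrix with nnz nonzeros already exist:
size_t bufferSize = 0;
void *buffer = NULL;
float *cscVal; int *cscColPtr, *cscRowInd;
cudaMalloc((void**)&cscVal,    sizeof(float)*nnz);
cudaMalloc((void**)&cscColPtr, sizeof(int)*(n+1));
cudaMalloc((void**)&cscRowInd, sizeof(int)*nnz);
// query the workspace size, then run the conversion
cusparseCsr2cscEx2_bufferSize(handle, m, n, nnz,
                              csrVal, csrRowPtr, csrColInd,
                              cscVal, cscColPtr, cscRowInd,
                              CUDA_R_32F, CUSPARSE_ACTION_NUMERIC,
                              CUSPARSE_INDEX_BASE_ZERO,
                              CUSPARSE_CSR2CSC_ALG1, &bufferSize);
cudaMalloc(&buffer, bufferSize);
cusparseCsr2cscEx2(handle, m, n, nnz,
                   csrVal, csrRowPtr, csrColInd,
                   cscVal, cscColPtr, cscRowInd,
                   CUDA_R_32F, CUSPARSE_ACTION_NUMERIC,
                   CUSPARSE_INDEX_BASE_ZERO,
                   CUSPARSE_CSR2CSC_ALG1, buffer);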
Input
parameter | description
handle | Handle to the cuSPARSE library context.
m | Number of rows of the CSR input matrix; number of columns of the CSC output matrix.
n | Number of columns of the CSR input matrix; number of rows of the CSC output matrix.
nnz | Number of nonzero elements of the CSR and CSC matrices.
csrVal | Value array of size nnz.
csrRowPtr | Integer array of size m + 1.
csrColInd | Integer array of size nnz.
cscVal | Value array of size nnz.
cscColPtr | Integer array of size n + 1.
cscRowInd | Integer array of size nnz.
valType | Value type for both CSR and CSC matrices.
copyValues | CUSPARSE_ACTION_SYMBOLIC or CUSPARSE_ACTION_NUMERIC.
idxBase | Index base: CUSPARSE_INDEX_BASE_ZERO or CUSPARSE_INDEX_BASE_ONE.
alg | Algorithm implementation; see cusparseCsr2CscAlg_t.
bufferSize | Number of bytes of workspace needed by cusparseCsr2cscEx2().
buffer | Pointer to workspace buffer.
cusparseCsr2cscEx2() supports real and complex data types, selected through the valType parameter (cudaDataType).
cusparseCsr2cscEx2()
supports the following algorithms (cusparseCsr2CscAlg_t
):
Algorithm | Notes
CUSPARSE_CSR2CSC_ALG_DEFAULT, CUSPARSE_CSR2CSC_ALG1 | Default algorithm
Action | Notes
CUSPARSE_ACTION_SYMBOLIC | Compute the “structure” of the CSC output matrix (offsets, row indices)
CUSPARSE_ACTION_NUMERIC | Compute the “structure” of the CSC output matrix and copy the values
cusparseCsr2cscEx2()
has the following properties:
The routine requires no extra storage
The routine supports asynchronous execution
cusparseCsr2cscEx2()
supports the following optimizations:
CUDA graph capture
Hardware Memory Compression
See cusparseStatus_t for the description of the return status.
cusparse<t>nnz()
cusparseStatus_t
cusparseSnnz(cusparseHandle_t handle,
cusparseDirection_t dirA,
int m,
int n,
const cusparseMatDescr_t descrA,
const float* A,
int lda,
int* nnzPerRowColumn,
int* nnzTotalDevHostPtr)
cusparseStatus_t
cusparseDnnz(cusparseHandle_t handle,
cusparseDirection_t dirA,
int m,
int n,
const cusparseMatDescr_t descrA,
const double* A,
int lda,
int* nnzPerRowColumn,
int* nnzTotalDevHostPtr)
cusparseStatus_t
cusparseCnnz(cusparseHandle_t handle,
cusparseDirection_t dirA,
int m,
int n,
const cusparseMatDescr_t descrA,
const cuComplex* A,
int lda,
int* nnzPerRowColumn,
int* nnzTotalDevHostPtr)
cusparseStatus_t
cusparseZnnz(cusparseHandle_t handle,
cusparseDirection_t dirA,
int m,
int n,
const cusparseMatDescr_t descrA,
const cuDoubleComplex* A,
int lda,
int* nnzPerRowColumn,
int* nnzTotalDevHostPtr)
This function computes the number of nonzero elements per row or column and the total number of nonzero elements in a dense matrix.
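A minimal sketch counting nonzeros per row is shown below (single precision); it assumes that handle, descrA, and the dense device array A with leading dimension lda already exist:
int *nnzPerRow, nnzTotal;
cudaMalloc((void**)&nnzPerRow, sizeof(int)*m);
// nnzPerRow[i] receives the number of nonzeros in row i;
// nnzTotal is returned on the host (host pointer mode assumed)
cusparseSnnz(handle, CUSPARSE_DIRECTION_ROW, m, n,
             descrA, A, lda, nnzPerRow, &nnzTotal);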
This function requires temporary extra storage that is allocated internally
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
Input
parameter | description
handle | handle to the cuSPARSE library context.
dirA | direction that specifies whether to count nonzero elements by CUSPARSE_DIRECTION_ROW or by CUSPARSE_DIRECTION_COLUMN.
m | number of rows of matrix A.
n | number of columns of matrix A.
descrA | the descriptor of matrix A.
A | array of dimensions (lda, n).
lda | leading dimension of dense array A.
Output
parameter | description
nnzPerRowColumn | array of size m or n containing the number of nonzero elements per row or column, respectively.
nnzTotalDevHostPtr | total number of nonzero elements in device or host memory.
See cusparseStatus_t for the description of the return status.
cusparseCreateIdentityPermutation() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseCreateIdentityPermutation(cusparseHandle_t handle,
int n,
int* p);
This function creates an identity map. The output parameter p represents the map as p = 0:1:(n-1).
This function is typically used with coosort
, csrsort
, cscsort
.
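For example, a minimal sketch that builds a permutation of size nnz for a subsequent sort call (assuming handle and nnz already exist):
int *P;
cudaMalloc((void**)&P, sizeof(int)*nnz);
cusparseCreateIdentityPermutation(handle, nnz, P); // P = {0, 1, ..., nnz-1}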
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
parameter | description
handle | handle to the cuSPARSE library context.
n | size of the map.
Output
parameter | description
p | integer array of dimensions n.
See cusparseStatus_t for the description of the return status.
cusparseXcoosort()
cusparseStatus_t
cusparseXcoosort_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnz,
const int* cooRows,
const int* cooCols,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseXcoosortByRow(cusparseHandle_t handle,
int m,
int n,
int nnz,
int* cooRows,
int* cooCols,
int* P,
void* pBuffer)
cusparseStatus_t
cusparseXcoosortByColumn(cusparseHandle_t handle,
int m,
int n,
int nnz,
int* cooRows,
int* cooCols,
int* P,
void* pBuffer);
This function sorts a matrix in COO format in-place. The user can sort either by row or by column.
A
is an m×n
sparse matrix that is defined in COO storage format by the three arrays cooVals
, cooRows
, and cooCols
.
There is no assumption on the index base of the matrix. coosort uses a stable sort on signed integers, so the values of cooRows or cooCols can be negative.
This function coosort() requires the buffer size returned by coosort_bufferSizeExt(). The address of pBuffer must be a multiple of 128 bytes. If it is not, CUSPARSE_STATUS_INVALID_VALUE is returned.
The parameter P
is both input and output. If the user wants to compute sorted cooVal
, P
must be set as 0:1:(nnz-1) before coosort()
, and after coosort()
, new sorted value array satisfies cooVal_sorted = cooVal(P)
.
Remark: the dimensions m and n are not used. If the user does not know the value of m or n, just pass any positive value. This usually happens when the user only reads a COO array first and needs to determine the dimensions m and n later.
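A minimal sketch of sorting by row and then reordering the values is given below; handle and the device arrays cooRows, cooCols, cooVals, cooVals_sorted (double precision, hypothetical names) are assumed to exist:
size_t bufferSize = 0;
void *pBuffer = NULL;
int *P = NULL;
cusparseXcoosort_bufferSizeExt(handle, m, n, nnz, cooRows, cooCols, &bufferSize);
cudaMalloc(&pBuffer, bufferSize);
cudaMalloc((void**)&P, sizeof(int)*nnz);
cusparseCreateIdentityPermutation(handle, nnz, P);
cusparseXcoosortByRow(handle, m, n, nnz, cooRows, cooCols, P, pBuffer);
// reorder the values: cooVals_sorted = cooVals(P)
cusparseDgthr(handle, nnz, cooVals, cooVals_sorted, P, CUSPARSE_INDEX_BASE_ZERO);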
The routine requires no extra storage if
pBuffer != NULL
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
Input
parameter | description
handle | handle to the cuSPARSE library context.
m | number of rows of matrix A.
n | number of columns of matrix A.
nnz | number of nonzero elements of matrix A.
cooRows | integer array of nnz unsorted row indices of A.
cooCols | integer array of nnz unsorted column indices of A.
P | integer array of nnz unsorted map indices.
pBuffer | buffer allocated by the user; the size is returned by coosort_bufferSizeExt().
Output
parameter | description
cooRows | integer array of nnz sorted row indices of A.
cooCols | integer array of nnz sorted column indices of A.
P | integer array of nnz sorted map indices.
pBufferSizeInBytes | number of bytes of the buffer.
See cusparseStatus_t for the description of the return status
Please visit cuSPARSE Library Samples - cusparseXcoosortByRow for a code example.
cusparseXcsrsort()
cusparseStatus_t
cusparseXcsrsort_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnz,
const int* csrRowPtr,
const int* csrColInd,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseXcsrsort(cusparseHandle_t handle,
int m,
int n,
int nnz,
const cusparseMatDescr_t descrA,
const int* csrRowPtr,
int* csrColInd,
int* P,
void* pBuffer)
This function sorts a matrix in CSR format. The stable sort is performed in-place.
The matrix type is regarded as CUSPARSE_MATRIX_TYPE_GENERAL
implicitly. In other words, any symmetric property is ignored.
This function csrsort() requires the buffer size returned by csrsort_bufferSizeExt(). The address of pBuffer must be a multiple of 128 bytes. If it is not, CUSPARSE_STATUS_INVALID_VALUE is returned.
The parameter P
is both input and output. If the user wants to compute sorted csrVal
, P
must be set as 0:1:(nnz-1) before csrsort()
, and after csrsort()
, new sorted value array satisfies csrVal_sorted = csrVal(P)
.
The general procedure is as follows:
// A is a 3x3 sparse matrix, base-0
// | 1 2 3 |
// A = | 4 5 6 |
// | 7 8 9 |
const int m = 3;
const int n = 3;
const int nnz = 9;
// contents of the device arrays (illustration):
// csrRowPtr[m+1] = { 0, 3, 6, 9 }
// csrColInd[nnz] = { 2, 1, 0, 0, 2, 1, 1, 2, 0 }
// csrVal[nnz]    = { 3, 2, 1, 4, 6, 5, 8, 9, 7 }
size_t pBufferSizeInBytes = 0;
void *pBuffer = NULL;
int *P = NULL;
// step 1: allocate buffer
cusparseXcsrsort_bufferSizeExt(handle, m, n, nnz, csrRowPtr, csrColInd, &pBufferSizeInBytes);
cudaMalloc( &pBuffer, sizeof(char)* pBufferSizeInBytes);
// step 2: setup permutation vector P to identity
cudaMalloc( (void**)&P, sizeof(int)*nnz);
cusparseCreateIdentityPermutation(handle, nnz, P);
// step 3: sort CSR format
cusparseXcsrsort(handle, m, n, nnz, descrA, csrRowPtr, csrColInd, P, pBuffer);
// step 4: gather sorted csrVal
cusparseDgthr(handle, nnz, csrVal, csrVal_sorted, P, CUSPARSE_INDEX_BASE_ZERO);
The routine requires no extra storage if
pBuffer != NULL
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
Input
parameter | description
handle | handle to the cuSPARSE library context.
m | number of rows of matrix A.
n | number of columns of matrix A.
nnz | number of nonzero elements of matrix A.
csrRowPtr | integer array of m+1 elements that contains the start of every row and the end of the last row plus one.
csrColInd | integer array of nnz unsorted column indices of A.
P | integer array of nnz unsorted map indices.
pBuffer | buffer allocated by the user; the size is returned by csrsort_bufferSizeExt().
Output
parameter | description
csrColInd | integer array of nnz sorted column indices of A.
P | integer array of nnz sorted map indices.
pBufferSizeInBytes | number of bytes of the buffer.
See cusparseStatus_t for the description of the return status.
cusparseXcscsort()
cusparseStatus_t
cusparseXcscsort_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnz,
const int* cscColPtr,
const int* cscRowInd,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseXcscsort(cusparseHandle_t handle,
int m,
int n,
int nnz,
const cusparseMatDescr_t descrA,
const int* cscColPtr,
int* cscRowInd,
int* P,
void* pBuffer)
This function sorts a matrix in CSC format. The stable sort is performed in-place.
The matrix type is regarded as CUSPARSE_MATRIX_TYPE_GENERAL
implicitly. In other words, any symmetric property is ignored.
This function cscsort() requires the buffer size returned by cscsort_bufferSizeExt(). The address of pBuffer must be a multiple of 128 bytes. If it is not, CUSPARSE_STATUS_INVALID_VALUE is returned.
The parameter P
is both input and output. If the user wants to compute sorted cscVal
, P
must be set as 0:1:(nnz-1) before cscsort()
, and after cscsort()
, new sorted value array satisfies cscVal_sorted = cscVal(P)
.
The general procedure is as follows:
// A is a 3x3 sparse matrix, base-0
// | 1 2 |
// A = | 4 0 |
// | 0 8 |
const int m = 3;
const int n = 2;
const int nnz = 4;
// contents of the device arrays (illustration):
// cscColPtr[n+1] = { 0, 2, 4 }
// cscRowInd[nnz] = { 1, 0, 2, 0 }
// cscVal[nnz]    = { 4.0, 1.0, 8.0, 2.0 }
size_t pBufferSizeInBytes = 0;
void *pBuffer = NULL;
int *P = NULL;
// step 1: allocate buffer
cusparseXcscsort_bufferSizeExt(handle, m, n, nnz, cscColPtr, cscRowInd, &pBufferSizeInBytes);
cudaMalloc( &pBuffer, sizeof(char)* pBufferSizeInBytes);
// step 2: setup permutation vector P to identity
cudaMalloc( (void**)&P, sizeof(int)*nnz);
cusparseCreateIdentityPermutation(handle, nnz, P);
// step 3: sort CSC format
cusparseXcscsort(handle, m, n, nnz, descrA, cscColPtr, cscRowInd, P, pBuffer);
// step 4: gather sorted cscVal
cusparseDgthr(handle, nnz, cscVal, cscVal_sorted, P, CUSPARSE_INDEX_BASE_ZERO);
The routine requires no extra storage if
pBuffer != NULL
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
Input
parameter | description
handle | handle to the cuSPARSE library context.
m | number of rows of matrix A.
n | number of columns of matrix A.
nnz | number of nonzero elements of matrix A.
cscColPtr | integer array of n+1 elements that contains the start of every column and the end of the last column plus one.
cscRowInd | integer array of nnz unsorted row indices of A.
P | integer array of nnz unsorted map indices.
pBuffer | buffer allocated by the user; the size is returned by cscsort_bufferSizeExt().
Output
parameter | description
cscRowInd | integer array of nnz sorted row indices of A.
P | integer array of nnz sorted map indices.
pBufferSizeInBytes | number of bytes of the buffer.
See cusparseStatus_t for the description of the return status.
cusparseXcsru2csr() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseCreateCsru2csrInfo(csru2csrInfo_t *info);
cusparseStatus_t
cusparseDestroyCsru2csrInfo(csru2csrInfo_t info);
cusparseStatus_t
cusparseScsru2csr_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnz,
float* csrVal,
const int* csrRowPtr,
int* csrColInd,
csru2csrInfo_t info,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseDcsru2csr_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnz,
double* csrVal,
const int* csrRowPtr,
int* csrColInd,
csru2csrInfo_t info,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseCcsru2csr_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnz,
cuComplex* csrVal,
const int* csrRowPtr,
int* csrColInd,
csru2csrInfo_t info,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseZcsru2csr_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnz,
cuDoubleComplex* csrVal,
const int* csrRowPtr,
int* csrColInd,
csru2csrInfo_t info,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseScsru2csr(cusparseHandle_t handle,
int m,
int n,
int nnz,
const cusparseMatDescr_t descrA,
float* csrVal,
const int* csrRowPtr,
int* csrColInd,
csru2csrInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseDcsru2csr(cusparseHandle_t handle,
int m,
int n,
int nnz,
const cusparseMatDescr_t descrA,
double* csrVal,
const int* csrRowPtr,
int* csrColInd,
csru2csrInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseCcsru2csr(cusparseHandle_t handle,
int m,
int n,
int nnz,
const cusparseMatDescr_t descrA,
cuComplex* csrVal,
const int* csrRowPtr,
int* csrColInd,
csru2csrInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseZcsru2csr(cusparseHandle_t handle,
int m,
int n,
int nnz,
const cusparseMatDescr_t descrA,
cuDoubleComplex* csrVal,
const int* csrRowPtr,
int* csrColInd,
csru2csrInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseScsr2csru(cusparseHandle_t handle,
int m,
int n,
int nnz,
const cusparseMatDescr_t descrA,
float* csrVal,
const int* csrRowPtr,
int* csrColInd,
csru2csrInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseDcsr2csru(cusparseHandle_t handle,
int m,
int n,
int nnz,
const cusparseMatDescr_t descrA,
double* csrVal,
const int* csrRowPtr,
int* csrColInd,
csru2csrInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseCcsr2csru(cusparseHandle_t handle,
int m,
int n,
int nnz,
const cusparseMatDescr_t descrA,
cuComplex* csrVal,
const int* csrRowPtr,
int* csrColInd,
csru2csrInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseZcsr2csru(cusparseHandle_t handle,
int m,
int n,
int nnz,
const cusparseMatDescr_t descrA,
cuDoubleComplex* csrVal,
const int* csrRowPtr,
int* csrColInd,
csru2csrInfo_t info,
void* pBuffer)
This function converts unsorted CSR format to sorted CSR format, and vice versa. The operation is in-place.
This function is a wrapper of csrsort and gthr. The use case is the following scenario.
Suppose the user has a matrix A in unsorted CSR format and has implemented custom code (CPU or GPU kernels) that relies on this special ordering (for example, diagonal first, then lower triangle, then upper triangle). The user wants to convert the matrix to sorted CSR format when calling the cuSPARSE library, and then convert it back to the special ordering for the custom kernels. For example, suppose the user wants to solve a linear system Ax=b by the following iterative scheme
\(x^{(k+1)} = x^{(k)} + L^{-1}(b - Ax^{(k)})\)
The code heavily uses SpMV and the triangular solve. Assume that the user has an in-house design of SpMV (sparse matrix-vector multiplication) based on the special ordering of A. However, the user wants to use the cuSPARSE library for the triangular solver. Then the following scheme can work.
do
    step 1: compute the residual vector r = b - A*x(k) by the in-house SpMV
    step 2: B := sort(A), and L is the lower triangular part of B
            (sort A only once and keep the permutation vector)
    step 3: solve z = L^(-1) * (b - A*x(k)) by cusparseXcsrsv
    step 4: add the correction x(k+1) = x(k) + z
    step 5: A := unsort(B) (use the permutation vector to get the unsorted CSR back)
until convergence
The requirements of step 2 and step 5 are:
In-place operation.
The permutation vector P is hidden in an opaque structure.
No cudaMalloc inside the conversion routine. Instead, the user has to provide the buffer explicitly.
The conversion between unsorted CSR and sorted CSR may be needed several times, but the function only generates the permutation vector P once.
The function is based on csrsort, gather and scatter operations.
The operation is called csru2csr
, which means unsorted CSR to sorted CSR. Also we provide the inverse operation, called csr2csru
.
In order to keep the permutation vector invisible, we need an opaque structure called csru2csrInfo
. Then two functions (cusparseCreateCsru2csrInfo
, cusparseDestroyCsru2csrInfo
) are used to initialize and to destroy the opaque structure.
cusparse[S|D|C|Z]csru2csr_bufferSizeExt returns the size of the buffer. The permutation vector P is also allocated inside csru2csrInfo. The lifetime of the permutation vector is the same as the lifetime of csru2csrInfo.
cusparse[S|D|C|Z]csru2csr performs the forward transformation from unsorted CSR to sorted CSR. The first call uses csrsort to generate the permutation vector P, and subsequent calls use P to perform the transformation.
cusparse[S|D|C|Z]csr2csru performs the backward transformation from sorted CSR to unsorted CSR; P is used to recover the unsorted form.
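A minimal sketch of the round trip is shown below (single precision); handle, descrA, and the device CSR arrays csrVal, csrRowPtr, csrColInd are assumed to exist:
csru2csrInfo_t info;
size_t bufferSize = 0;
void *pBuffer = NULL;
cusparseCreateCsru2csrInfo(&info);
cusparseScsru2csr_bufferSizeExt(handle, m, n, nnz,
                                csrVal, csrRowPtr, csrColInd,
                                info, &bufferSize);
cudaMalloc(&pBuffer, bufferSize);
// unsorted CSR -> sorted CSR (the first call also builds the hidden permutation)
cusparseScsru2csr(handle, m, n, nnz, descrA,
                  csrVal, csrRowPtr, csrColInd, info, pBuffer);
// ... call cuSPARSE routines that require sorted CSR here ...
// sorted CSR -> original unsorted CSR
cusparseScsr2csru(handle, m, n, nnz, descrA,
                  csrVal, csrRowPtr, csrColInd, info, pBuffer);
cudaFree(pBuffer);
cusparseDestroyCsru2csrInfo(info);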
The routine cusparse<t>csru2csr()
has the following properties:
The routine requires no extra storage if
pBuffer != NULL
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
The routine cusparse<t>csr2csru()
has the following properties if pBuffer != NULL
:
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
The following tables describe the parameters of csru2csr_bufferSizeExt(), csru2csr(), and csr2csru().
Input
parameter | description
handle | handle to the cuSPARSE library context.
m | number of rows of matrix A.
n | number of columns of matrix A.
nnz | number of nonzero elements of matrix A.
descrA | the descriptor of matrix A.
csrVal | <type> array of nnz unsorted nonzero elements of matrix A.
csrRowPtr | integer array of m+1 elements that contains the start of every row and the end of the last row plus one.
csrColInd | integer array of nnz unsorted column indices of A.
info | opaque structure initialized using cusparseCreateCsru2csrInfo().
pBuffer | buffer allocated by the user; the size is returned by csru2csr_bufferSizeExt().
Output
parameter | description
csrVal | <type> array of nnz sorted nonzero elements of matrix A.
csrColInd | integer array of nnz sorted column indices of A.
pBufferSizeInBytes | number of bytes of the buffer.
See cusparseStatus_t for the description of the return status.
cusparseXpruneDense2csr() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseHpruneDense2csr_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
const __half* A,
int lda,
const __half* threshold,
const cusparseMatDescr_t descrC,
const __half* csrValC,
const int* csrRowPtrC,
const int* csrColIndC,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseSpruneDense2csr_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
const float* A,
int lda,
const float* threshold,
const cusparseMatDescr_t descrC,
const float* csrValC,
const int* csrRowPtrC,
const int* csrColIndC,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseDpruneDense2csr_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
const double* A,
int lda,
const double* threshold,
const cusparseMatDescr_t descrC,
const double* csrValC,
const int* csrRowPtrC,
const int* csrColIndC,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseHpruneDense2csrNnz(cusparseHandle_t handle,
int m,
int n,
const __half* A,
int lda,
const __half* threshold,
const cusparseMatDescr_t descrC,
int* csrRowPtrC,
int* nnzTotalDevHostPtr,
void* pBuffer)
cusparseStatus_t
cusparseSpruneDense2csrNnz(cusparseHandle_t handle,
int m,
int n,
const float* A,
int lda,
const float* threshold,
const cusparseMatDescr_t descrC,
int* csrRowPtrC,
int* nnzTotalDevHostPtr,
void* pBuffer)
cusparseStatus_t
cusparseDpruneDense2csrNnz(cusparseHandle_t handle,
int m,
int n,
const double* A,
int lda,
const double* threshold,
const cusparseMatDescr_t descrC,
int* csrRowPtrC,
int* nnzTotalDevHostPtr,
void* pBuffer)
cusparseStatus_t
cusparseHpruneDense2csr(cusparseHandle_t handle,
int m,
int n,
const __half* A,
int lda,
const __half* threshold,
const cusparseMatDescr_t descrC,
__half* csrValC,
const int* csrRowPtrC,
int* csrColIndC,
void* pBuffer)
cusparseStatus_t
cusparseSpruneDense2csr(cusparseHandle_t handle,
int m,
int n,
const float* A,
int lda,
const float* threshold,
const cusparseMatDescr_t descrC,
float* csrValC,
const int* csrRowPtrC,
int* csrColIndC,
void* pBuffer)
cusparseStatus_t
cusparseDpruneDense2csr(cusparseHandle_t handle,
int m,
int n,
const double* A,
int lda,
const double* threshold,
const cusparseMatDescr_t descrC,
double* csrValC,
const int* csrRowPtrC,
int* csrColIndC,
void* pBuffer)
This function prunes a dense matrix to a sparse matrix with CSR format.
Given a dense matrix A
and a non-negative value threshold
, the function returns a sparse matrix C
, defined by
\(C(i,j) = A(i,j) \quad \text{if } |A(i,j)| > \text{threshold}\)
The implementation adopts a two-step approach to do the conversion. First, the user allocates csrRowPtrC
of m+1
elements and uses function pruneDense2csrNnz()
to determine the number of nonzero elements per row. Second, the user gathers nnzC
(number of nonzeros of matrix C
) from either (nnzC=*nnzTotalDevHostPtr)
or (nnzC=csrRowPtrC[m]-csrRowPtrC[0])
and allocates csrValC
of nnzC
elements and csrColIndC
of nnzC
integers. Finally function pruneDense2csr()
is called to complete the conversion.
The user must obtain the size of the buffer required by pruneDense2csr()
by calling pruneDense2csr_bufferSizeExt()
, allocate the buffer, and pass the buffer pointer to pruneDense2csr()
.
The chapter Examples of prune provides a simple example of pruneDense2csr().
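A minimal sketch of the two-step conversion is given below (single precision); handle, descrC, and the dense device array A (leading dimension lda) are assumed to exist, and threshold is a hypothetical host value:
size_t bufferSize = 0;
void *pBuffer = NULL;
float threshold = 1e-5f;              // hypothetical drop threshold (host value)
int *csrRowPtrC, *csrColIndC = NULL, nnzC;
float *csrValC = NULL;
cudaMalloc((void**)&csrRowPtrC, sizeof(int)*(m+1));
cusparseSpruneDense2csr_bufferSizeExt(handle, m, n, A, lda, &threshold,
                                      descrC, csrValC, csrRowPtrC, csrColIndC,
                                      &bufferSize);
cudaMalloc(&pBuffer, bufferSize);
// step 1: count nonzeros per row and the total
cusparseSpruneDense2csrNnz(handle, m, n, A, lda, &threshold,
                           descrC, csrRowPtrC, &nnzC, pBuffer);
// step 2: allocate the value/column arrays and complete the conversion
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnzC);
cudaMalloc((void**)&csrValC,    sizeof(float)*nnzC);
cusparseSpruneDense2csr(handle, m, n, A, lda, &threshold,
                        descrC, csrValC, csrRowPtrC, csrColIndC, pBuffer);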
The routine cusparse<t>pruneDense2csrNnz()
has the following properties:
This function requires temporary extra storage that is allocated internally
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
The routine cusparse<t>pruneDense2csr()
has the following properties:
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
parameter | description
handle | handle to the cuSPARSE library context.
m | number of rows of matrix A.
n | number of columns of matrix A.
A | array of dimension (lda, n).
lda | leading dimension of A.
threshold | a value to drop the entries of A.
descrC | the descriptor of matrix C.
pBuffer | buffer allocated by the user; the size is returned by pruneDense2csr_bufferSizeExt().
Output
parameter | description
nnzTotalDevHostPtr | total number of nonzeros of matrix C.
csrValC | <type> array of nnzC nonzero elements of matrix C.
csrRowPtrC | integer array of m+1 elements that contains the start of every row and the end of the last row plus one.
csrColIndC | integer array of nnzC column indices of matrix C.
pBufferSizeInBytes | number of bytes of the buffer.
See cusparseStatus_t for the description of the return status.
cusparseXpruneCsr2csr() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseHpruneCsr2csr_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const __half* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const __half* threshold,
const cusparseMatDescr_t descrC,
const __half* csrValC,
const int* csrRowPtrC,
const int* csrColIndC,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseSpruneCsr2csr_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const float* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const float* threshold,
const cusparseMatDescr_t descrC,
const float* csrValC,
const int* csrRowPtrC,
const int* csrColIndC,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseDpruneCsr2csr_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const double* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const double* threshold,
const cusparseMatDescr_t descrC,
const double* csrValC,
const int* csrRowPtrC,
const int* csrColIndC,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseHpruneCsr2csrNnz(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const __half* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const __half* threshold,
const cusparseMatDescr_t descrC,
int* csrRowPtrC,
int* nnzTotalDevHostPtr,
void* pBuffer)
cusparseStatus_t
cusparseSpruneCsr2csrNnz(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const float* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const float* threshold,
const cusparseMatDescr_t descrC,
int* csrRowPtrC,
int* nnzTotalDevHostPtr,
void* pBuffer)
cusparseStatus_t
cusparseDpruneCsr2csrNnz(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const double* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const double* threshold,
const cusparseMatDescr_t descrC,
int* csrRowPtrC,
int* nnzTotalDevHostPtr,
void* pBuffer)
cusparseStatus_t
cusparseHpruneCsr2csr(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const __half* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const __half* threshold,
const cusparseMatDescr_t descrC,
__half* csrValC,
const int* csrRowPtrC,
int* csrColIndC,
void* pBuffer)
cusparseStatus_t
cusparseSpruneCsr2csr(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const float* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const float* threshold,
const cusparseMatDescr_t descrC,
float* csrValC,
const int* csrRowPtrC,
int* csrColIndC,
void* pBuffer)
cusparseStatus_t
cusparseDpruneCsr2csr(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const double* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
const double* threshold,
const cusparseMatDescr_t descrC,
double* csrValC,
const int* csrRowPtrC,
int* csrColIndC,
void* pBuffer)
This function prunes a sparse matrix in CSR format to another sparse matrix in CSR format.
Given a sparse matrix A
and a non-negative value threshold
, the function returns a sparse matrix C
, defined by
\(C(i,j) = A(i,j) \quad \text{if } |A(i,j)| > \text{threshold}\)
The implementation adopts a two-step approach to do the conversion. First, the user allocates csrRowPtrC
of m+1
elements and uses function pruneCsr2csrNnz()
to determine the number of nonzero elements per row. Second, the user gathers nnzC
(number of nonzeros of matrix C
) from either (nnzC=*nnzTotalDevHostPtr)
or (nnzC=csrRowPtrC[m]-csrRowPtrC[0])
and allocates csrValC
of nnzC
elements and csrColIndC
of nnzC
integers. Finally function pruneCsr2csr()
is called to complete the conversion.
The user must obtain the size of the buffer required by pruneCsr2csr()
by calling pruneCsr2csr_bufferSizeExt()
, allocate the buffer, and pass the buffer pointer to pruneCsr2csr()
.
The chapter Examples of prune provides a simple example of pruneCsr2csr().
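A minimal sketch of the two-step conversion is given below (single precision); handle, descrA, descrC, and the device CSR arrays csrValA, csrRowPtrA, csrColIndA are assumed to exist, and threshold is a hypothetical host value:
size_t bufferSize = 0;
void *pBuffer = NULL;
float threshold = 1e-5f;              // hypothetical drop threshold (host value)
int *csrRowPtrC, *csrColIndC = NULL, nnzC;
float *csrValC = NULL;
cudaMalloc((void**)&csrRowPtrC, sizeof(int)*(m+1));
cusparseSpruneCsr2csr_bufferSizeExt(handle, m, n, nnzA, descrA,
                                    csrValA, csrRowPtrA, csrColIndA, &threshold,
                                    descrC, csrValC, csrRowPtrC, csrColIndC,
                                    &bufferSize);
cudaMalloc(&pBuffer, bufferSize);
// step 1: count nonzeros per row and the total
cusparseSpruneCsr2csrNnz(handle, m, n, nnzA, descrA,
                         csrValA, csrRowPtrA, csrColIndA, &threshold,
                         descrC, csrRowPtrC, &nnzC, pBuffer);
// step 2: allocate the value/column arrays and complete the conversion
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnzC);
cudaMalloc((void**)&csrValC,    sizeof(float)*nnzC);
cusparseSpruneCsr2csr(handle, m, n, nnzA, descrA,
                      csrValA, csrRowPtrA, csrColIndA, &threshold,
                      descrC, csrValC, csrRowPtrC, csrColIndC, pBuffer);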
The routine cusparse<t>pruneCsr2csrNnz()
has the following properties:
This function requires temporary extra storage that is allocated internally
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
The routine cusparse<t>pruneCsr2csr()
has the following properties:
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
parameter | description
handle | handle to the cuSPARSE library context.
m | number of rows of matrix A.
n | number of columns of matrix A.
nnzA | number of nonzeros of matrix A.
descrA | the descriptor of matrix A.
csrValA | <type> array of nnzA nonzero elements of matrix A.
csrRowPtrA | integer array of m+1 elements that contains the start of every row and the end of the last row plus one.
csrColIndA | integer array of nnzA column indices of matrix A.
threshold | a value to drop the entries of A.
descrC | the descriptor of matrix C.
pBuffer | buffer allocated by the user; the size is returned by pruneCsr2csr_bufferSizeExt().
Output
parameter | description
nnzTotalDevHostPtr | total number of nonzeros of matrix C.
csrValC | <type> array of nnzC nonzero elements of matrix C.
csrRowPtrC | integer array of m+1 elements that contains the start of every row and the end of the last row plus one.
csrColIndC | integer array of nnzC column indices of matrix C.
pBufferSizeInBytes | number of bytes of the buffer.
See cusparseStatus_t for the description of the return status.
cusparseXpruneDense2csrPercentage() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseHpruneDense2csrByPercentage_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
const __half* A,
int lda,
float percentage,
const cusparseMatDescr_t descrC,
const __half* csrValC,
const int* csrRowPtrC,
const int* csrColIndC,
pruneInfo_t info,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseSpruneDense2csrByPercentage_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
const float* A,
int lda,
float percentage,
const cusparseMatDescr_t descrC,
const float* csrValC,
const int* csrRowPtrC,
const int* csrColIndC,
pruneInfo_t info,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseDpruneDense2csrByPercentage_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
const double* A,
int lda,
float percentage,
const cusparseMatDescr_t descrC,
const double* csrValC,
const int* csrRowPtrC,
const int* csrColIndC,
pruneInfo_t info,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseHpruneDense2csrNnzByPercentage(cusparseHandle_t handle,
int m,
int n,
const __half* A,
int lda,
float percentage,
const cusparseMatDescr_t descrC,
int* csrRowPtrC,
int* nnzTotalDevHostPtr,
pruneInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseSpruneDense2csrNnzByPercentage(cusparseHandle_t handle,
int m,
int n,
const float* A,
int lda,
float percentage,
const cusparseMatDescr_t descrC,
int* csrRowPtrC,
int* nnzTotalDevHostPtr,
pruneInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseDpruneDense2csrNnzByPercentage(cusparseHandle_t handle,
int m,
int n,
const double* A,
int lda,
float percentage,
const cusparseMatDescr_t descrC,
int* csrRowPtrC,
int* nnzTotalDevHostPtr,
pruneInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseHpruneDense2csrByPercentage(cusparseHandle_t handle,
int m,
int n,
const __half* A,
int lda,
float percentage,
const cusparseMatDescr_t descrC,
__half* csrValC,
const int* csrRowPtrC,
int* csrColIndC,
pruneInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseSpruneDense2csrByPercentage(cusparseHandle_t handle,
int m,
int n,
const float* A,
int lda,
float percentage,
const cusparseMatDescr_t descrC,
float* csrValC,
const int* csrRowPtrC,
int* csrColIndC,
pruneInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseDpruneDense2csrByPercentage(cusparseHandle_t handle,
int m,
int n,
const double* A,
int lda,
float percentage,
const cusparseMatDescr_t descrC,
double* csrValC,
const int* csrRowPtrC,
int* csrColIndC,
pruneInfo_t info,
void* pBuffer)
This function prunes a dense matrix to a sparse matrix by percentage.
Given a dense matrix A and a non-negative value percentage, the function computes the sparse matrix C by the following three steps:
Step 1: sort the absolute values of A in ascending order.
\(\begin{matrix} {key\ :=\ sort(\ |A|\ )} \\ \end{matrix}\)
Step 2: choose the threshold according to the parameter percentage.
\(\begin{matrix} {pos\ =\ ceil(m*n*(percentage/100))\ -\ 1} \\ {pos\ =\ min(pos,\ m*n-1)} \\ {pos\ =\ max(pos,\ 0)} \\ {threshold\ =\ key\lbrack pos\rbrack} \\ \end{matrix}\)
Step 3: call pruneDense2csr() with the parameter threshold.
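As an illustration (a hypothetical example, not taken from the library documentation), suppose m*n = 8 and percentage = 25. Then
\(\begin{matrix} {pos\ =\ ceil(8*(25/100))\ -\ 1\ =\ 1} \\ {threshold\ =\ key\lbrack 1\rbrack} \\ \end{matrix}\)
so at least the two smallest-magnitude entries, key[0] and key[1], satisfy |A(i,j)| <= threshold and are pruned, leaving roughly 75% of the entries in C (any further entries tied with key[1] are also dropped).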
The implementation adopts a two-step approach to do the conversion. First, the user allocates csrRowPtrC of m+1 elements and uses the function pruneDense2csrNnzByPercentage() to determine the number of nonzero columns per row. Second, the user gathers nnzC (the number of nonzeros of matrix C) from either (nnzC=*nnzTotalDevHostPtr) or (nnzC=csrRowPtrC[m]-csrRowPtrC[0]), allocates csrValC of nnzC elements and csrColIndC of nnzC integers, and finally calls pruneDense2csrByPercentage() to complete the conversion.
The user must obtain the size of the buffer required by pruneDense2csrByPercentage() by calling pruneDense2csrByPercentage_bufferSizeExt(), allocate the buffer, and pass the buffer pointer to pruneDense2csrByPercentage().
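Assuming the handle, the dense matrix A (with leading dimension lda), and descrC are already set up, the general procedure might look as follows. This is a minimal single-precision sketch (error checking omitted) that assumes the default host pointer mode and that the buffer-size query tolerates the not-yet-allocated csrValC/csrColIndC pointers being NULL:
// Keep roughly the largest (100 - percentage)% of the entries of the dense matrix A.
float  percentage = 50.0f;           // example value: drop the smaller half (assumed)
pruneInfo_t info;
cusparseCreatePruneInfo(&info);

int    nnzC;
int   *csrRowPtrC, *csrColIndC = NULL;
float *csrValC = NULL;
void  *pBuffer = NULL;
size_t bufferSize = 0;

cudaMalloc((void**)&csrRowPtrC, sizeof(int)*(m+1));

// query the working buffer
cusparseSpruneDense2csrByPercentage_bufferSizeExt(handle, m, n, A, lda, percentage,
                                                  descrC, csrValC, csrRowPtrC, csrColIndC,
                                                  info, &bufferSize);
cudaMalloc(&pBuffer, bufferSize);

// step 1: compute csrRowPtrC and the number of nonzeros of C
cusparseSpruneDense2csrNnzByPercentage(handle, m, n, A, lda, percentage,
                                       descrC, csrRowPtrC, &nnzC, info, pBuffer);

// step 2: allocate csrColIndC/csrValC and complete the conversion
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnzC);
cudaMalloc((void**)&csrValC,   sizeof(float)*nnzC);
cusparseSpruneDense2csrByPercentage(handle, m, n, A, lda, percentage,
                                    descrC, csrValC, csrRowPtrC, csrColIndC,
                                    info, pBuffer);

cusparseDestroyPruneInfo(info);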
Remark 1: the value of percentage must not be greater than 100. Otherwise, CUSPARSE_STATUS_INVALID_VALUE is returned.
Remark 2: the zeros of A are not ignored; all entries are sorted, including zeros. This is different from pruneCsr2csrByPercentage().
The Examples of prune chapter provides a simple example of pruneDense2csrNnzByPercentage().
The routine cusparse<t>pruneDense2csrNnzByPercentage()
has the following properties:
This function requires temporary extra storage that is allocated internally
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
The routine cusparse<t>pruneDense2csrByPercentage()
has the following properties:
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
| handle | handle to the cuSPARSE library context. |
| m | number of rows of matrix A. |
| n | number of columns of matrix A. |
| A | array of dimension (lda, n). |
| lda | leading dimension of A. |
| percentage | percentage, with 0 <= percentage <= 100. |
| descrC | the descriptor of matrix C. |
| pBuffer | buffer allocated by the user; the size is returned by pruneDense2csrByPercentage_bufferSizeExt(). |
Output
| nnzTotalDevHostPtr | total number of nonzeros of matrix C. |
| csrValC | <type> array of nnzC nonzero elements of matrix C. |
| csrRowPtrC | integer array of m+1 elements that contains the start of every row and the end of the last row plus one. |
| csrColIndC | integer array of nnzC column indices of the nonzero elements of matrix C. |
| pBufferSizeInBytes | number of bytes of the buffer. |
See cusparseStatus_t for the description of the return status.
cusparseXpruneCsr2csrByPercentage() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseHpruneCsr2csrByPercentage_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const __half* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
float percentage,
const cusparseMatDescr_t descrC,
const __half* csrValC,
const int* csrRowPtrC,
const int* csrColIndC,
pruneInfo_t info,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseSpruneCsr2csrByPercentage_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const float* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
float percentage,
const cusparseMatDescr_t descrC,
const float* csrValC,
const int* csrRowPtrC,
const int* csrColIndC,
pruneInfo_t info,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseDpruneCsr2csrByPercentage_bufferSizeExt(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const double* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
float percentage,
const cusparseMatDescr_t descrC,
const double* csrValC,
const int* csrRowPtrC,
const int* csrColIndC,
pruneInfo_t info,
size_t* pBufferSizeInBytes)
cusparseStatus_t
cusparseHpruneCsr2csrNnzByPercentage(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const __half* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
float percentage,
const cusparseMatDescr_t descrC,
int* csrRowPtrC,
int* nnzTotalDevHostPtr,
pruneInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseSpruneCsr2csrNnzByPercentage(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const float* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
float percentage,
const cusparseMatDescr_t descrC,
int* csrRowPtrC,
int* nnzTotalDevHostPtr,
pruneInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseDpruneCsr2csrNnzByPercentage(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const double* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
float percentage,
const cusparseMatDescr_t descrC,
int* csrRowPtrC,
int* nnzTotalDevHostPtr,
pruneInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseHpruneCsr2csrByPercentage(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const __half* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
float percentage,
const cusparseMatDescr_t descrC,
__half* csrValC,
const int* csrRowPtrC,
int* csrColIndC,
pruneInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseSpruneCsr2csrByPercentage(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const float* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
float percentage,
const cusparseMatDescr_t descrC,
float* csrValC,
const int* csrRowPtrC,
int* csrColIndC,
pruneInfo_t info,
void* pBuffer)
cusparseStatus_t
cusparseDpruneCsr2csrByPercentage(cusparseHandle_t handle,
int m,
int n,
int nnzA,
const cusparseMatDescr_t descrA,
const double* csrValA,
const int* csrRowPtrA,
const int* csrColIndA,
float percentage,
const cusparseMatDescr_t descrC,
double* csrValC,
const int* csrRowPtrC,
int* csrColIndC,
pruneInfo_t info,
void* pBuffer)
This function prunes a sparse matrix to a sparse matrix by percentage.
Given a sparse matrix A and a non-negative value percentage, the function computes the sparse matrix C by the following three steps:
Step 1: sort the absolute values of A in ascending order.
\(\begin{matrix} {key\ :=\ sort(\ |csrValA|\ )} \\ \end{matrix}\)
Step 2: choose the threshold according to the parameter percentage.
\(\begin{matrix} {pos\ =\ ceil(nnzA*(percentage/100))\ -\ 1} \\ {pos\ =\ min(pos,\ nnzA-1)} \\ {pos\ =\ max(pos,\ 0)} \\ {threshold\ =\ key\lbrack pos\rbrack} \\ \end{matrix}\)
Step 3: call pruneCsr2csr() with the parameter threshold.
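For example (a hypothetical illustration, not taken from the library documentation), with nnzA = 12 and percentage = 50, pos = ceil(12*(50/100)) - 1 = 5 and threshold = key[5], so at least the six smallest-magnitude nonzeros of A are dropped.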
The implementation adopts a two-step approach to do the conversion. First, the user allocates csrRowPtrC of m+1 elements and uses the function pruneCsr2csrNnzByPercentage() to determine the number of nonzero columns per row. Second, the user gathers nnzC (the number of nonzeros of matrix C) from either (nnzC=*nnzTotalDevHostPtr) or (nnzC=csrRowPtrC[m]-csrRowPtrC[0]), allocates csrValC of nnzC elements and csrColIndC of nnzC integers, and finally calls pruneCsr2csrByPercentage() to complete the conversion.
The user must obtain the size of the buffer required by pruneCsr2csrByPercentage() by calling pruneCsr2csrByPercentage_bufferSizeExt(), allocate the buffer, and pass the buffer pointer to pruneCsr2csrByPercentage().
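Assuming the handle, the input CSR matrix (descrA, csrValA, csrRowPtrA, csrColIndA), and descrC are already set up, the general procedure might look as follows. This is a minimal single-precision sketch (error checking omitted) that assumes the default host pointer mode and that the buffer-size query tolerates the not-yet-allocated csrValC/csrColIndC pointers being NULL:
// Keep roughly the largest (100 - percentage)% of the nonzeros of the CSR matrix A.
float  percentage = 20.0f;           // example value: drop the smallest 20% of |csrValA| (assumed)
pruneInfo_t info;
cusparseCreatePruneInfo(&info);

int    nnzC;
int   *csrRowPtrC, *csrColIndC = NULL;
float *csrValC = NULL;
void  *pBuffer = NULL;
size_t bufferSize = 0;

cudaMalloc((void**)&csrRowPtrC, sizeof(int)*(m+1));

// query the working buffer
cusparseSpruneCsr2csrByPercentage_bufferSizeExt(handle, m, n, nnzA, descrA,
                                                csrValA, csrRowPtrA, csrColIndA,
                                                percentage, descrC,
                                                csrValC, csrRowPtrC, csrColIndC,
                                                info, &bufferSize);
cudaMalloc(&pBuffer, bufferSize);

// step 1: compute csrRowPtrC and the number of nonzeros of C
cusparseSpruneCsr2csrNnzByPercentage(handle, m, n, nnzA, descrA,
                                     csrValA, csrRowPtrA, csrColIndA,
                                     percentage, descrC,
                                     csrRowPtrC, &nnzC, info, pBuffer);

// step 2: allocate csrColIndC/csrValC and complete the conversion
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnzC);
cudaMalloc((void**)&csrValC,   sizeof(float)*nnzC);
cusparseSpruneCsr2csrByPercentage(handle, m, n, nnzA, descrA,
                                  csrValA, csrRowPtrA, csrColIndA,
                                  percentage, descrC,
                                  csrValC, csrRowPtrC, csrColIndC,
                                  info, pBuffer);

cusparseDestroyPruneInfo(info);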
Remark 1: the value of percentage must not be greater than 100. Otherwise, CUSPARSE_STATUS_INVALID_VALUE is returned.
The Examples of prune chapter provides a simple example of pruneCsr2csrByPercentage().
The routine cusparse<t>pruneCsr2csrNnzByPercentage()
has the following properties:
This function requires temporary extra storage that is allocated internally
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
The routine cusparse<t>pruneCsr2csrByPercentage()
has the following properties:
The routine requires no extra storage
The routine supports asynchronous execution
The routine supports CUDA graph capture
Input
| handle | handle to the cuSPARSE library context. |
| m | number of rows of matrix A. |
| n | number of columns of matrix A. |
| nnzA | number of nonzeros of matrix A. |
| descrA | the descriptor of matrix A. |
| csrValA | <type> array of nnzA nonzero elements of matrix A. |
| csrRowPtrA | integer array of m+1 elements that contains the start of every row and the end of the last row plus one. |
| csrColIndA | integer array of nnzA column indices of the nonzero elements of matrix A. |
| percentage | percentage, with 0 <= percentage <= 100. |
| descrC | the descriptor of matrix C. |
| pBuffer | buffer allocated by the user; the size is returned by pruneCsr2csrByPercentage_bufferSizeExt(). |
Output
| nnzTotalDevHostPtr | total number of nonzeros of matrix C. |
| csrValC | <type> array of nnzC nonzero elements of matrix C. |
| csrRowPtrC | integer array of m+1 elements that contains the start of every row and the end of the last row plus one. |
| csrColIndC | integer array of nnzC column indices of the nonzero elements of matrix C. |
| pBufferSizeInBytes | number of bytes of the buffer. |
See cusparseStatus_t for the description of the return status.
cusparse<t>nnz_compress() [DEPRECATED]
> The routine will be removed in the next major release
cusparseStatus_t
cusparseSnnz_compress(cusparseHandle_t handle,
int m,
const cusparseMatDescr_t descr,
const float* csrValA,
const int* csrRowPtrA,
int* nnzPerRow,
int* nnzC,
float tol)
cusparseStatus_t
cusparseDnnz_compress(cusparseHandle_t handle,
int m,
const cusparseMatDescr_t descr,
const double* csrValA,
const int* csrRowPtrA,
int* nnzPerRow,
int* nnzC,
double tol)
cusparseStatus_t
cusparseCnnz_compress(cusparseHandle_t handle,
int m,
const cusparseMatDescr_t descr,
const cuComplex* csrValA,
const int* csrRowPtrA,
int* nnzPerRow,
int* nnzC,
cuComplex tol)
cusparseStatus_t
cusparseZnnz_compress(cusparseHandle_t handle,
int m,
const cusparseMatDescr_t descr,
const cuDoubleComplex* csrValA,
const int* csrRowPtrA,
int* nnzPerRow,
int* nnzC,
cuDoubleComplex tol)
This function is the first step in converting a matrix from CSR format to compressed CSR format.
Given a sparse matrix A and a non-negative value threshold (the parameter tol), the function returns nnzPerRow (the number of nonzero columns per row) and nnzC (the total number of nonzeros) of a sparse matrix C, defined by
\(\begin{matrix} {{C(i,j)} = {A(i,j)}} & \text{if\ |A(i,j)|\ >\ threshold} \\ \end{matrix}\)
A key assumption for the cuComplex and cuDoubleComplex cases is that the tolerance is given by its real part: for example, for tol = 1e-8 + 0*i only the real (x) component of the struct is used.
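A minimal single-precision sketch of this first step (error checking omitted; the default host pointer mode is assumed so that nnzC can be read directly on the host):
// count, per row and in total, the entries of A whose absolute value is greater than tol
float tol = 1e-8f;                   // example tolerance (assumed)
int   nnzC;
int  *nnzPerRow;
cudaMalloc((void**)&nnzPerRow, sizeof(int)*m);

cusparseSnnz_compress(handle, m, descr, csrValA, csrRowPtrA,
                      nnzPerRow, &nnzC, tol);

// second step (a separate routine, not shown here): allocate the compressed arrays of
// nnzC elements plus a row pointer of m+1 elements, then call cusparse<t>csr2csr_compress()
// to produce the compressed matrix C.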
This function requires temporary extra storage that is allocated internally
The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available
The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available
Input
| handle | handle to the cuSPARSE library context. |
| m | number of rows of matrix A. |
| descr | the descriptor of matrix A. |
| csrValA | CSR noncompressed values array. |
| csrRowPtrA | the corresponding input noncompressed row pointer. |
| tol | non-negative tolerance used to decide which entries are dropped: entries with absolute value less than or equal to tol are removed. |
Output
| nnzPerRow | this array contains, per row, the number of elements whose absolute values are greater than tol. |
| nnzC | host/device pointer to the total number of elements whose absolute values are greater than tol. |
See cusparseStatus_t for the description of the return status.