cuSPARSE Format Conversion Reference

This chapter describes the conversion routines between different sparse and dense storage formats.

coosort, csrsort, cscsort, and csru2csr are sorting routines without malloc inside, the following table estimates the buffer size.

routine

buffer size

maximum problem size if buffer is limited by 2GB

coosort

> 16*n bytes

125M

csrsort or cscsort

> 20*n bytes

100M

csru2csr

'd' > 28*n bytes ; 'z' > 36*n bytes

71M for ‘d’ and 55M for ‘z’

cusparse<t>bsr2csr()

cusparseStatus_t
cusparseSbsr2csr(cusparseHandle_t         handle,
                 cusparseDirection_t      dir,
                 int                      mb,
                 int                      nb,
                 const cusparseMatDescr_t descrA,
                 const float*             bsrValA,
                 const int*               bsrRowPtrA,
                 const int*               bsrColIndA,
                 int                      blockDim,
                 const cusparseMatDescr_t descrC,
                 float*                   csrValC,
                 int*                     csrRowPtrC,
                 int*                     csrColIndC)

cusparseStatus_t
cusparseDbsr2csr(cusparseHandle_t         handle,
                 cusparseDirection_t      dir,
                 int                      mb,
                 int                      nb,
                 const cusparseMatDescr_t descrA,
                 const double*            bsrValA,
                 const int*               bsrRowPtrA,
                 const int*               bsrColIndA,
                 int                      blockDim,
                 const cusparseMatDescr_t descrC,
                 double*                  csrValC,
                 int*                     csrRowPtrC,
                 int*                     csrColIndC)

cusparseStatus_t
cusparseCbsr2csr(cusparseHandle_t         handle,
                 cusparseDirection_t      dir,
                 int                      mb,
                 int                      nb,
                 const cusparseMatDescr_t descrA,
                 const cuComplex*         bsrValA,
                 const int*               bsrRowPtrA,
                 const int*               bsrColIndA,
                 int                      blockDim,
                 const cusparseMatDescr_t descrC,
                 cuComplex*               csrValC,
                 int*                     csrRowPtrC,
                 int*                     csrColIndC)

cusparseStatus_t
cusparseZbsr2csr(cusparseHandle_t         handle,
                 cusparseDirection_t      dir,
                 int                      mb,
                 int                      nb,
                 const cusparseMatDescr_t descrA,
                 const cuDoubleComplex*   bsrValA,
                 const int*               bsrRowPtrA,
                 const int*               bsrColIndA,
                 int                      blockDim,
                 const cusparseMatDescr_t descrC,
                 cuDoubleComplex*         csrValC,
                 int*                     csrRowPtrC,
                 int*                     csrColIndC)

This function converts a sparse matrix in BSR format that is defined by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA) into a sparse matrix in CSR format that is defined by arrays csrValC, csrRowPtrC, and csrColIndC.

Let m(=mb*blockDim) be the number of rows of A and n(=nb*blockDim) be number of columns of A, then A and C are m*n sparse matrices. The BSR format of A contains nnzb(=bsrRowPtrA[mb] - bsrRowPtrA[0]) nonzero blocks, whereas the sparse matrix A contains nnz(=nnzb*blockDim*blockDim) elements. The user must allocate enough space for arrays csrRowPtrC, csrColIndC, and csrValC. The requirements are as follows:

csrRowPtrC of m+1 elements

csrValC of nnz elements

csrColIndC of nnz elements

The general procedure is as follows:

// Given BSR format (bsrRowPtrA, bsrcolIndA, bsrValA) and
// blocks of BSR format are stored in column-major order.
cusparseDirection_t dir = CUSPARSE_DIRECTION_COLUMN;
int m = mb*blockDim;
int nnzb = bsrRowPtrA[mb] - bsrRowPtrA[0]; // number of blocks
int nnz  = nnzb * blockDim * blockDim; // number of elements
cudaMalloc((void**)&csrRowPtrC, sizeof(int)*(m+1));
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnz);
cudaMalloc((void**)&csrValC, sizeof(float)*nnz);
cusparseSbsr2csr(handle, dir, mb, nb,
        descrA,
        bsrValA, bsrRowPtrA, bsrColIndA,
        blockDim,
        descrC,
        csrValC, csrRowPtrC, csrColIndC);
  • The routine requires no extra storage

  • The routine supports asynchronous execution if blockDim != 1 or the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if blockDim != 1 or the Stream Ordered Memory Allocator is available

Input

handle

handle to the cuSPARSE library context.

dir

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

mb

number of block rows of sparse matrix A.

nb

number of block columns of sparse matrix A.

descrA

the descriptor of matrix A.

bsrValA

<type> array of nnzb*blockDim*blockDim nonzero elements of matrix A.

bsrRowPtrA

integer array of mb+1 elements that contains the start of every block row and the end of the last block row plus one of matrix A.

bsrColIndA

integer array of nnzb column indices of the nonzero blocks of matrix A.

blockDim

block dimension of sparse matrix A.

descrC

the descriptor of matrix C.

Output

csrValC

<type> array of nnz(=csrRowPtrC[m]-csrRowPtrC[0]) nonzero elements of matrix C.

csrRowPtrC

integer array of m+1 elements that contains the start of every row and the end of the last row plus one of matrix C.

csrColIndC

integer array of nnz column indices of the nonzero elements of matrix C.

See cusparseStatus_t for the description of the return status.

cusparse<t>gebsr2gebsc()

cusparseStatus_t
cusparseSgebsr2gebsc_bufferSize(cusparseHandle_t handle,
                               int               mb,
                               int               nb,
                               int               nnzb,
                               const float*      bsrVal,
                               const int*        bsrRowPtr,
                               const int*        bsrColInd,
                               int               rowBlockDim,
                               int               colBlockDim,
                               int*              pBufferSize)

cusparseStatus_t
cusparseDgebsr2gebsc_bufferSize(cusparseHandle_t handle,
                               int               mb,
                               int               nb,
                               int               nnzb,
                               const double*     bsrVal,
                               const int*        bsrRowPtr,
                               const int*        bsrColInd,
                               int               rowBlockDim,
                               int               colBlockDim,
                               int*              pBufferSize)

cusparseStatus_t
cusparseCgebsr2gebsc_bufferSize(cusparseHandle_t handle,
                               int               mb,
                               int               nb,
                               int               nnzb,
                               const cuComplex*  bsrVal,
                               const int*        bsrRowPtr,
                               const int*        bsrColInd,
                               int               rowBlockDim,
                               int               colBlockDim,
                               int*              pBufferSize)

cusparseStatus_t
cusparseZgebsr2gebsc_bufferSize(cusparseHandle_t      handle,
                               int                    mb,
                               int                    nb,
                               int                    nnzb,
                               const cuDoubleComplex* bsrVal,
                               const int*             bsrRowPtr,
                               const int*             bsrColInd,
                               int                    rowBlockDim,
                               int                    colBlockDim,
                               int*                   pBufferSize)
cusparseStatus_t
cusparseSgebsr2gebsc(cusparseHandle_t    handle,
                     int                 mb,
                     int                 nb,
                     int                 nnzb,
                     const float*        bsrVal,
                     const int*          bsrRowPtr,
                     const int*          bsrColInd,
                     int                 rowBlockDim,
                     int                 colBlockDim,
                     float*              bscVal,
                     int*                bscRowInd,
                     int*                bscColPtr,
                     cusparseAction_t    copyValues,
                     cusparseIndexBase_t baseIdx,
                     void*               pBuffer)

cusparseStatus_t
cusparseDgebsr2gebsc(cusparseHandle_t    handle,
                     int                 mb,
                     int                 nb,
                     int                 nnzb,
                     const double*       bsrVal,
                     const int*          bsrRowPtr,
                     const int*          bsrColInd,
                     int                 rowBlockDim,
                     int                 colBlockDim,
                     double*             bscVal,
                     int*                bscRowInd,
                     int*                bscColPtr,
                     cusparseAction_t    copyValues,
                     cusparseIndexBase_t baseIdx,
                     void*               pBuffer)

cusparseStatus_t
cusparseCgebsr2gebsc(cusparseHandle_t    handle,
                     int                 mb,
                     int                 nb,
                     int                 nnzb,
                     const cuComplex*    bsrVal,
                     const int*          bsrRowPtr,
                     const int*          bsrColInd,
                     int                 rowBlockDim,
                     int                 colBlockDim,
                     cuComplex*          bscVal,
                     int*                bscRowInd,
                     int*                bscColPtr,
                     cusparseAction_t    copyValues,
                     cusparseIndexBase_t baseIdx,
                     void*               pBuffer)

cusparseStatus_t
cusparseZgebsr2gebsc(cusparseHandle_t       handle,
                     int                    mb,
                     int                    nb,
                     int                    nnzb,
                     const cuDoubleComplex* bsrVal,
                     const int*             bsrRowPtr,
                     const int*             bsrColInd,
                     int                    rowBlockDim,
                     int                    colBlockDim,
                     cuDoubleComplex*       bscVal,
                     int*                   bscRowInd,
                     int*                   bscColPtr,
                     cusparseAction_t       copyValues,
                     cusparseIndexBase_t    baseIdx,
                     void*                  pBuffer)

This function can be seen as the same as csr2csc() when each block of size rowBlockDim*colBlockDim is regarded as a scalar.

This sparsity pattern of the result matrix can also be seen as the transpose of the original sparse matrix, but the memory layout of a block does not change.

The user must call gebsr2gebsc_bufferSize() to determine the size of the buffer required by gebsr2gebsc(), allocate the buffer, and pass the buffer pointer to gebsr2gebsc().

  • The routine requires no extra storage if pBuffer != NULL

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

Input

handle

handle to the cuSPARSE library context.

mb

number of block rows of sparse matrix A.

nb

number of block columns of sparse matrix A.

nnzb

number of nonzero blocks of matrix A.

bsrVal

<type> array of nnzb*rowBlockDim*colBlockDim nonzero elements of matrix A.

bsrRowPtr

integer array of mb+1 elements that contains the start of every block row and the end of the last block row plus one.

bsrColInd

integer array of nnzb column indices of the non-zero blocks of matrix A.

rowBlockDim

number of rows within a block of A.

colBlockDim

number of columns within a block of A.

copyValues

CUSPARSE_ACTION_SYMBOLIC or CUSPARSE_ACTION_NUMERIC.

baseIdx

CUSPARSE_INDEX_BASE_ZERO or CUSPARSE_INDEX_BASE_ONE.

pBufferSize

host pointer containing number of bytes of the buffer used in gebsr2gebsc().

pBuffer

buffer allocated by the user; the size is return by gebsr2gebsc_bufferSize().

Output

bscVal

<type> array of nnzb*rowBlockDim*colBlockDim non-zero elements of matrix A. It is only filled-in if copyValues is set to CUSPARSE_ACTION_NUMERIC.

bscRowInd

integer array of nnzb row indices of the non-zero blocks of matrix A.

bscColPtr

integer array of nb+1 elements that contains the start of every block column and the end of the last block column plus one.

See cusparseStatus_t for the description of the return status.

cusparse<t>gebsr2gebsr()

cusparseStatus_t
cusparseSgebsr2gebsr_bufferSize(cusparseHandle_t         handle,
                                cusparseDirection_t      dir,
                                int                      mb,
                                int                      nb,
                                int                      nnzb,
                                const cusparseMatDescr_t descrA,
                                const float*             bsrValA,
                                const int*               bsrRowPtrA,
                                const int*               bsrColIndA,
                                int                      rowBlockDimA,
                                int                      colBlockDimA,
                                int                      rowBlockDimC,
                                int                      colBlockDimC,
                                int*                     pBufferSize)

cusparseStatus_t
cusparseDgebsr2gebsr_bufferSize(cusparseHandle_t         handle,
                                cusparseDirection_t      dir,
                                int                      mb,
                                int                      nb,
                                int                      nnzb,
                                const cusparseMatDescr_t descrA,
                                const double*            bsrValA,
                                const int*               bsrRowPtrA,
                                const int*               bsrColIndA,
                                int                      rowBlockDimA,
                                int                      colBlockDimA,
                                int                      rowBlockDimC,
                                int                      colBlockDimC,
                                int*                     pBufferSize)

cusparseStatus_t
cusparseCgebsr2gebsr_bufferSize(cusparseHandle_t         handle,
                                cusparseDirection_t      dir,
                                int                      mb,
                                int                      nb,
                                int                      nnzb,
                                const cusparseMatDescr_t descrA,
                                const cuComplex*         bsrValA,
                                const int*               bsrRowPtrA,
                                const int*               bsrColIndA,
                                int                      rowBlockDimA,
                                int                      colBlockDimA,
                                int                      rowBlockDimC,
                                int                      colBlockDimC,
                                int*                     pBufferSize)

cusparseStatus_t
cusparseZgebsr2gebsr_bufferSize(cusparseHandle_t         handle,
                                cusparseDirection_t      dir,
                                int                      mb,
                                int                      nb,
                                int                      nnzb,
                                const cusparseMatDescr_t descrA,
                                const cuDoubleComplex*   bsrValA,
                                const int*               bsrRowPtrA,
                                const int*               bsrColIndA,
                                int                      rowBlockDimA,
                                int                      colBlockDimA,
                                int                      rowBlockDimC,
                                int                      colBlockDimC,
                                int*                     pBufferSize)
cusparseStatus_t
cusparseXgebsr2gebsrNnz(cusparseHandle_t         handle,
                        cusparseDirection_t      dir,
                        int                      mb,
                        int                      nb,
                        int                      nnzb,
                        const cusparseMatDescr_t descrA,
                        const int*               bsrRowPtrA,
                        const int*               bsrColIndA,
                        int                      rowBlockDimA,
                        int                      colBlockDimA,
                        const cusparseMatDescr_t descrC,
                        int*                     bsrRowPtrC,
                        int                      rowBlockDimC,
                        int                      colBlockDimC,
                        int*                     nnzTotalDevHostPtr,
                        void*                    pBuffer)

cusparseStatus_t
cusparseSgebsr2gebsr(cusparseHandle_t         handle,
                     cusparseDirection_t      dir,
                     int                      mb,
                     int                      nb,
                     int                      nnzb,
                     const cusparseMatDescr_t descrA,
                     const float*             bsrValA,
                     const int*               bsrRowPtrA,
                     const int*               bsrColIndA,
                     int                      rowBlockDimA,
                     int                      colBlockDimA,
                     const cusparseMatDescr_t descrC,
                     float*                   bsrValC,
                     int*                     bsrRowPtrC,
                     int*                     bsrColIndC,
                     int                      rowBlockDimC,
                     int                      colBlockDimC,
                     void*                    pBuffer)

cusparseStatus_t
cusparseDgebsr2gebsr(cusparseHandle_t         handle,
                     cusparseDirection_t      dir,
                     int                      mb,
                     int                      nb,
                     int                      nnzb,
                     const cusparseMatDescr_t descrA,
                     const double*            bsrValA,
                     const int*               bsrRowPtrA,
                     const int*               bsrColIndA,
                     int                      rowBlockDimA,
                     int                      colBlockDimA,
                     const cusparseMatDescr_t descrC,
                     double*                  bsrValC,
                     int*                     bsrRowPtrC,
                     int*                     bsrColIndC,
                     int                      rowBlockDimC,
                     int                      colBlockDimC,
                     void*                    pBuffer)

cusparseStatus_t
cusparseCgebsr2gebsr(cusparseHandle_t         handle,
                     cusparseDirection_t      dir,
                     int                      mb,
                     int                      nb,
                     int                      nnzb,
                     const cusparseMatDescr_t descrA,
                     const cuComplex*         bsrValA,
                     const int*               bsrRowPtrA,
                     const int*               bsrColIndA,
                     int                      rowBlockDimA,
                     int                      colBlockDimA,
                     const cusparseMatDescr_t descrC,
                     cuComplex*               bsrValC,
                     int*                     bsrRowPtrC,
                     int*                     bsrColIndC,
                     int                      rowBlockDimC,
                     int                      colBlockDimC,
                     void*                    pBuffer)

cusparseStatus_t
cusparseZgebsr2gebsr(cusparseHandle_t         handle,
                     cusparseDirection_t      dir,
                     int                      mb,
                     int                      nb,
                     int                      nnzb,
                     const cusparseMatDescr_t descrA,
                     const cuDoubleComplex*   bsrValA,
                     const int*               bsrRowPtrA,
                     const int*               bsrColIndA,
                     int                      rowBlockDimA,
                     int                      colBlockDimA,
                     const cusparseMatDescr_t descrC,
                     cuDoubleComplex*         bsrValC,
                     int*                     bsrRowPtrC,
                     int*                     bsrColIndC,
                     int                      rowBlockDimC,
                     int                      colBlockDimC,
                     void*                    pBuffer)

This function converts a sparse matrix in general BSR format that is defined by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA into a sparse matrix in another general BSR format that is defined by arrays bsrValC, bsrRowPtrC, and bsrColIndC.

If rowBlockDimA=1 and colBlockDimA=1, cusparse[S|D|C|Z]gebsr2gebsr() is the same as cusparse[S|D|C|Z]csr2gebsr().

If rowBlockDimC=1 and colBlockDimC=1, cusparse[S|D|C|Z]gebsr2gebsr() is the same as cusparse[S|D|C|Z]gebsr2csr().

A is an m*n sparse matrix where m(=mb*rowBlockDim) is the number of rows of A, and n(=nb*colBlockDim) is the number of columns of A. The general BSR format of A contains nnzb(=bsrRowPtrA[mb] -                               bsrRowPtrA[0]) nonzero blocks. The matrix C is also general BSR format with a different block size, rowBlockDimC*colBlockDimC. If m is not a multiple of rowBlockDimC, or n is not a multiple of colBlockDimC, zeros are filled in. The number of block rows of C is mc(=(m+rowBlockDimC-1)/rowBlockDimC). The number of block rows of C is nc(=(n+colBlockDimC-1)/colBlockDimC). The number of nonzero blocks of C is nnzc.

The implementation adopts a two-step approach to do the conversion. First, the user allocates bsrRowPtrC of mc+1 elements and uses function cusparseXgebsr2gebsrNnz() to determine the number of nonzero block columns per block row of matrix C. Second, the user gathers nnzc (number of non-zero block columns of matrix C) from either (nnzc=*nnzTotalDevHostPtr) or (nnzc=bsrRowPtrC[mc]-bsrRowPtrC[0]) and allocates bsrValC of nnzc*rowBlockDimC*colBlockDimC elements and bsrColIndC of nnzc integers. Finally the function cusparse[S|D|C|Z]gebsr2gebsr() is called to complete the conversion.

The user must call gebsr2gebsr_bufferSize() to know the size of the buffer required by gebsr2gebsr(), allocate the buffer, and pass the buffer pointer to gebsr2gebsr().

The general procedure is as follows:

// Given general BSR format (bsrRowPtrA, bsrColIndA, bsrValA) and
// blocks of BSR format are stored in column-major order.
cusparseDirection_t dir = CUSPARSE_DIRECTION_COLUMN;
int base, nnzc;
int m = mb*rowBlockDimA;
int n = nb*colBlockDimA;
int mc = (m+rowBlockDimC-1)/rowBlockDimC;
int nc = (n+colBlockDimC-1)/colBlockDimC;
int bufferSize;
void *pBuffer;
cusparseSgebsr2gebsr_bufferSize(handle, dir, mb, nb, nnzb,
    descrA, bsrValA, bsrRowPtrA, bsrColIndA,
    rowBlockDimA, colBlockDimA,
    rowBlockDimC, colBlockDimC,
    &bufferSize);
cudaMalloc((void**)&pBuffer, bufferSize);
cudaMalloc((void**)&bsrRowPtrC, sizeof(int)*(mc+1));
// nnzTotalDevHostPtr points to host memory
int *nnzTotalDevHostPtr = &nnzc;
cusparseXgebsr2gebsrNnz(handle, dir, mb, nb, nnzb,
    descrA, bsrRowPtrA, bsrColIndA,
    rowBlockDimA, colBlockDimA,
    descrC, bsrRowPtrC,
    rowBlockDimC, colBlockDimC,
    nnzTotalDevHostPtr,
    pBuffer);
if (NULL != nnzTotalDevHostPtr){
    nnzc = *nnzTotalDevHostPtr;
}else{
    cudaMemcpy(&nnzc, bsrRowPtrC+mc, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&base, bsrRowPtrC, sizeof(int), cudaMemcpyDeviceToHost);
    nnzc -= base;
}
cudaMalloc((void**)&bsrColIndC, sizeof(int)*nnzc);
cudaMalloc((void**)&bsrValC, sizeof(float)*(rowBlockDimC*colBlockDimC)*nnzc);
cusparseSgebsr2gebsr(handle, dir, mb, nb, nnzb,
    descrA, bsrValA, bsrRowPtrA, bsrColIndA,
    rowBlockDimA, colBlockDimA,
    descrC, bsrValC, bsrRowPtrC, bsrColIndC,
    rowBlockDimC, colBlockDimC,
    pBuffer);
  • The routines require no extra storage if pBuffer != NULL

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routines do not support CUDA graph capture

Input

handle

handle to the cuSPARSE library context.

dir

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

mb

number of block rows of sparse matrix A.

nb

number of block columns of sparse matrix A.

nnzb

number of nonzero blocks of matrix A.

descrA

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL. Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

bsrValA

<type> array of nnzb*rowBlockDimA*colBlockDimA non-zero elements of matrix A.

bsrRowPtrA

integer array of mb+1 elements that contains the start of every block row and the end of the last block row plus one of matrix A.

bsrColIndA

integer array of nnzb column indices of the non-zero blocks of matrix A.

rowBlockDimA

number of rows within a block of A.

colBlockDimA

number of columns within a block of A.

descrC

the descriptor of matrix C. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL. Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

rowBlockDimC

number of rows within a block of C.

colBlockDimC

number of columns within a block of C.

pBufferSize

host pointer containing number of bytes of the buffer used in gebsr2gebsr().

pBuffer

buffer allocated by the user; the size is return by gebsr2gebsr_bufferSize().

Output

bsrValC

<type> array of nnzc*rowBlockDimC*colBlockDimC non-zero elements of matrix C.

bsrRowPtrC

integer array of mc+1 elements that contains the start of every block row and the end of the last block row plus one of matrix C.

bsrColIndC

integer array of nnzc block column indices of the nonzero blocks of matrix C.

nnzTotalDevHostPtr

total number of nonzero blocks of C. *nnzTotalDevHostPtr is the same as bsrRowPtrC[mc]-bsrRowPtrC[0].

See cusparseStatus_t for the description of the return status.

cusparse<t>gebsr2csr()

cusparseStatus_t
cusparseSgebsr2csr(cusparseHandle_t         handle,
                   cusparseDirection_t      dir,
                   int                      mb,
                   int                      nb,
                   const cusparseMatDescr_t descrA,
                   const float*             bsrValA,
                   const int*               bsrRowPtrA,
                   const int*               bsrColIndA,
                   int                      rowBlockDim,
                   int                      colBlockDim,
                   const cusparseMatDescr_t descrC,
                   float*                   csrValC,
                   int*                     csrRowPtrC,
                   int*                     csrColIndC)

cusparseStatus_t
cusparseDgebsr2csr(cusparseHandle_t         handle,
                   cusparseDirection_t      dir,
                   int                      mb,
                   int                      nb,
                   const cusparseMatDescr_t descrA,
                   const double*            bsrValA,
                   const int*               bsrRowPtrA,
                   const int*               bsrColIndA,
                   int                      rowBlockDim,
                   int                      colBlockDim,
                   const cusparseMatDescr_t descrC,
                   double*                  csrValC,
                   int*                     csrRowPtrC,
                   int*                     csrColIndC)

cusparseStatus_t
cusparseCgebsr2csr(cusparseHandle_t         handle,
                   cusparseDirection_t      dir,
                   int                      mb,
                   int                      nb,
                   const cusparseMatDescr_t descrA,
                   const cuComplex*         bsrValA,
                   const int*               bsrRowPtrA,
                   const int*               bsrColIndA,
                   int                      rowBlockDim,
                   int                      colBlockDim,
                   const cusparseMatDescr_t descrC,
                   cuComplex*               csrValC,
                   int*                     csrRowPtrC,
                   int*                     csrColIndC)

cusparseStatus_t
cusparseZgebsr2csr(cusparseHandle_t         handle,
                   cusparseDirection_t      dir,
                   int                      mb,
                   int                      nb,
                   const cusparseMatDescr_t descrA,
                   const cuDoubleComplex*   bsrValA,
                   const int*               bsrRowPtrA,
                   const int*               bsrColIndA,
                   int                      rowBlockDim,
                   int                      colBlockDim,
                   const cusparseMatDescr_t descrC,
                   cuDoubleComplex*         csrValC,
                   int*                     csrRowPtrC,
                   int*                     csrColIndC)

This function converts a sparse matrix in general BSR format that is defined by the three arrays bsrValA, bsrRowPtrA, and bsrColIndA into a sparse matrix in CSR format that is defined by arrays csrValC, csrRowPtrC, and csrColIndC.

Let m(=mb*rowBlockDim) be number of rows of A and n(=nb*colBlockDim) be number of columns of A, then A and C are m*n sparse matrices. The general BSR format of A contains nnzb(=bsrRowPtrA[mb] -                               bsrRowPtrA[0]) non-zero blocks, whereas sparse matrix A contains nnz(=nnzb*rowBlockDim*colBlockDim) elements. The user must allocate enough space for arrays csrRowPtrC, csrColIndC, and csrValC. The requirements are as follows:

csrRowPtrC of m+1 elements

csrValC of nnz elements

csrColIndC of nnz elements

The general procedure is as follows:

// Given general BSR format (bsrRowPtrA, bsrColIndA, bsrValA) and
// blocks of BSR format are stored in column-major order.
cusparseDirection_t dir = CUSPARSE_DIRECTION_COLUMN;
int m = mb*rowBlockDim;
int n = nb*colBlockDim;
int nnzb = bsrRowPtrA[mb] - bsrRowPtrA[0]; // number of blocks
int nnz  = nnzb * rowBlockDim * colBlockDim; // number of elements
cudaMalloc((void**)&csrRowPtrC, sizeof(int)*(m+1));
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnz);
cudaMalloc((void**)&csrValC, sizeof(float)*nnz);
cusparseSgebsr2csr(handle, dir, mb, nb,
        descrA,
        bsrValA, bsrRowPtrA, bsrColIndA,
        rowBlockDim, colBlockDim,
        descrC,
        csrValC, csrRowPtrC, csrColIndC);
  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

handle

handle to the cuSPARSE library context.

dir

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

mb

number of block rows of sparse matrix A.

nb

number of block columns of sparse matrix A.

descrA

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL. Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

bsrValA

<type> array of nnzb*rowBlockDim*colBlockDim non-zero elements of matrix A.

bsrRowPtrA

integer array of mb+1 elements that contains the start of every block row and the end of the last block row plus one of matrix A.

bsrColIndA

integer array of nnzb column indices of the non-zero blocks of matrix A.

rowBlockDim

number of rows within a block of A.

colBlockDim

number of columns within a block of A.

descrC

the descriptor of matrix C. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL. Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

Output

csrValC

<type> array of nnz non-zero elements of matrix C.

csrRowPtrC

integer array of m+1 elements that contains the start of every row and the end of the last row plus one of matrix C.

csrColIndC

integer array of nnz column indices of the non-zero elements of matrix C.

See cusparseStatus_t for the description of the return status.

cusparse<t>csr2gebsr()

cusparseStatus_t
cusparseScsr2gebsr_bufferSize(cusparseHandle_t         handle,
                              cusparseDirection_t      dir,
                              int                      m,
                              int                      n,
                              const cusparseMatDescr_t descrA,
                              const float*             csrValA,
                              const int*               csrRowPtrA,
                              const int*               csrColIndA,
                              int                      rowBlockDim,
                              int                      colBlockDim,
                              int*                     pBufferSize)

cusparseStatus_t
cusparseDcsr2gebsr_bufferSize(cusparseHandle_t         handle,
                              cusparseDirection_t      dir,
                              int                      m,
                              int                      n,
                              const cusparseMatDescr_t descrA,
                              const double*            csrValA,
                              const int*               csrRowPtrA,
                              const int*               csrColIndA,
                              int                      rowBlockDim,
                              int                      colBlockDim,
                              int*                     pBufferSize)

cusparseStatus_t
cusparseCcsr2gebsr_bufferSize(cusparseHandle_t         handle,
                              cusparseDirection_t      dir,
                              int                      m,
                              int                      n,
                              const cusparseMatDescr_t descrA,
                              const cuComplex*         csrValA,
                              const int*               csrRowPtrA,
                              const int*               csrColIndA,
                              int                      rowBlockDim,
                              int                      colBlockDim,
                              int*                     pBufferSize)

cusparseStatus_t
cusparseZcsr2gebsr_bufferSize(cusparseHandle_t         handle,
                              cusparseDirection_t      dir,
                              int                      m,
                              int                      n,
                              const cusparseMatDescr_t descrA,
                              const cuDoubleComplex*   csrValA,
                              const int*               csrRowPtrA,
                              const int*               csrColIndA,
                              int                      rowBlockDim,
                              int                      colBlockDim,
                              int*                     pBufferSize)
cusparseStatus_t
cusparseXcsr2gebsrNnz(cusparseHandle_t         handle,
                      cusparseDirection_t      dir,
                      int                      m,
                      int                      n,
                      const cusparseMatDescr_t descrA,
                      const int*               csrRowPtrA,
                      const int*               csrColIndA,
                      const cusparseMatDescr_t descrC,
                      int*                     bsrRowPtrC,
                      int                      rowBlockDim,
                      int                      colBlockDim,
                      int*                     nnzTotalDevHostPtr,
                      void*                    pBuffer)

cusparseStatus_t
cusparseScsr2gebsr(cusparseHandle_t         handle,
                   cusparseDirection_t      dir,
                   int                      m,
                   int                      n,
                   const cusparseMatDescr_t descrA,
                   const float*             csrValA,
                   const int*               csrRowPtrA,
                   const int*               csrColIndA,
                   const cusparseMatDescr_t descrC,
                   float*                   bsrValC,
                   int*                     bsrRowPtrC,
                   int*                     bsrColIndC,
                   int                      rowBlockDim,
                   int                      colBlockDim,
                   void*                    pBuffer)

cusparseStatus_t
cusparseDcsr2gebsr(cusparseHandle_t         handle,
                   cusparseDirection_t      dir,
                   int                      m,
                   int                      n,
                   const cusparseMatDescr_t descrA,
                   const double*            csrValA,
                   const int*               csrRowPtrA,
                   const int*               csrColIndA,
                   const cusparseMatDescr_t descrC,
                   double*                  bsrValC,
                   int*                     bsrRowPtrC,
                   int*                     bsrColIndC,
                   int                      rowBlockDim,
                   int                      colBlockDim,
                   void*                    pBuffer)

cusparseStatus_t
cusparseCcsr2gebsr(cusparseHandle_t         handle,
                   cusparseDirection_t      dir,
                   int                      m,
                   int                      n,
                   const cusparseMatDescr_t descrA,
                   const cuComplex*         csrValA,
                   const int*               csrRowPtrA,
                   const int*               csrColIndA,
                   const cusparseMatDescr_t descrC,
                   cuComplex*               bsrValC,
                   int*                     bsrRowPtrC,
                   int*                     bsrColIndC,
                   int                      rowBlockDim,
                   int                      colBlockDim,
                   void*                    pBuffer)

cusparseStatus_t
cusparseZcsr2gebsr(cusparseHandle_t         handle,
                   cusparseDirection_t      dir,
                   int                      m,
                   int                      n,
                   const cusparseMatDescr_t descrA,
                   const cuDoubleComplex*   csrValA,
                   const int*               csrRowPtrA,
                   const int*               csrColIndA,
                   const cusparseMatDescr_t descrC,
                   cuDoubleComplex*         bsrValC,
                   int*                     bsrRowPtrC,
                   int*                     bsrColIndC,
                   int                      rowBlockDim,
                   int                      colBlockDim,
                   void*                    pBuffer)

This function converts a sparse matrix A in CSR format (that is defined by arrays csrValA, csrRowPtrA, and csrColIndA) into a sparse matrix C in general BSR format (that is defined by the three arrays bsrValC, bsrRowPtrC, and bsrColIndC).

The matrix A is a m*n sparse matrix and matrix C is a (mb*rowBlockDim)*(nb*colBlockDim) sparse matrix, where mb(=(m+rowBlockDim-1)/rowBlockDim) is the number of block rows of C, and nb(=(n+colBlockDim-1)/colBlockDim) is the number of block columns of C.

The block of C is of size rowBlockDim*colBlockDim. If m is not multiple of rowBlockDim or n is not multiple of colBlockDim, zeros are filled in.

The implementation adopts a two-step approach to do the conversion. First, the user allocates bsrRowPtrC of mb+1 elements and uses function cusparseXcsr2gebsrNnz() to determine the number of nonzero block columns per block row. Second, the user gathers nnzb (number of nonzero block columns of matrix C) from either (nnzb=*nnzTotalDevHostPtr) or (nnzb=bsrRowPtrC[mb]-bsrRowPtrC[0]) and allocates bsrValC of nnzb*rowBlockDim*colBlockDim elements and bsrColIndC of nnzb integers. Finally function cusparse[S|D|C|Z]csr2gebsr() is called to complete the conversion.

The user must obtain the size of the buffer required by csr2gebsr() by calling csr2gebsr_bufferSize(), allocate the buffer, and pass the buffer pointer to csr2gebsr().

The general procedure is as follows:

// Given CSR format (csrRowPtrA, csrColIndA, csrValA) and
// blocks of BSR format are stored in column-major order.
cusparseDirection_t dir = CUSPARSE_DIRECTION_COLUMN;
int base, nnzb;
int mb = (m + rowBlockDim-1)/rowBlockDim;
int nb = (n + colBlockDim-1)/colBlockDim;
int bufferSize;
void *pBuffer;
cusparseScsr2gebsr_bufferSize(handle, dir, m, n,
    descrA, csrValA, csrRowPtrA, csrColIndA,
    rowBlockDim, colBlockDim,
    &bufferSize);
cudaMalloc((void**)&pBuffer, bufferSize);
cudaMalloc((void**)&bsrRowPtrC, sizeof(int) *(mb+1));
// nnzTotalDevHostPtr points to host memory
int *nnzTotalDevHostPtr = &nnzb;
cusparseXcsr2gebsrNnz(handle, dir, m, n,
    descrA, csrRowPtrA, csrColIndA,
    descrC, bsrRowPtrC, rowBlockDim, colBlockDim,
    nnzTotalDevHostPtr,
    pBuffer);
if (NULL != nnzTotalDevHostPtr){
    nnzb = *nnzTotalDevHostPtr;
}else{
    cudaMemcpy(&nnzb, bsrRowPtrC+mb, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&base, bsrRowPtrC, sizeof(int), cudaMemcpyDeviceToHost);
    nnzb -= base;
}
cudaMalloc((void**)&bsrColIndC, sizeof(int)*nnzb);
cudaMalloc((void**)&bsrValC, sizeof(float)*(rowBlockDim*colBlockDim)*nnzb);
cusparseScsr2gebsr(handle, dir, m, n,
        descrA,
        csrValA, csrRowPtrA, csrColIndA,
        descrC,
        bsrValC, bsrRowPtrC, bsrColIndC,
        rowBlockDim, colBlockDim,
        pBuffer);

The routine cusparseXcsr2gebsrNnz() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

The routine cusparse<t>csr2gebsr() has the following properties:

  • The routine requires no extra storage if pBuffer != NULL

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

handle

handle to the cuSPARSE library context.

dir

storage format of blocks, either CUSPARSE_DIRECTION_ROW or CUSPARSE_DIRECTION_COLUMN.

m

number of rows of sparse matrix A.

n

number of columns of sparse matrix A.

descrA

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL. Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

csrValA

<type> array of nnz nonzero elements of matrix A.

csrRowPtrA

integer array of m+1 elements that contains the start of every row and the end of the last row plus one of matrix A.

csrColIndA

integer array of nnz column indices of the nonzero elements of matrix A.

descrC

the descriptor of matrix C. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL. Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

rowBlockDim

number of rows within a block of C.

colBlockDim

number of columns within a block of C.

pBuffer

buffer allocated by the user, the size is return by csr2gebsr_bufferSize().

Output

bsrValC

<type> array of nnzb*rowBlockDim*colBlockDim nonzero elements of matrix C.

bsrRowPtrC

integer array of mb+1 elements that contains the start of every block row and the end of the last block row plus one of matrix C.

bsrColIndC

integer array of nnzb column indices of the nonzero blocks of matrix C.

nnzTotalDevHostPtr

total number of nonzero blocks of matrix C. Pointer nnzTotalDevHostPtr can point to a device memory or host memory.

See cusparseStatus_t for the description of the return status.

cusparse<t>coo2csr()

cusparseStatus_t
cusparseXcoo2csr(cusparseHandle_t   handle,
                 const int*         cooRowInd,
                 int                nnz,
                 int                m,
                 int*               csrRowPtr,
                 cusparseIndexBase_t idxBase)

This function converts the array containing the uncompressed row indices (corresponding to COO format) into an array of compressed row pointers (corresponding to CSR format).

It can also be used to convert the array containing the uncompressed column indices (corresponding to COO format) into an array of column pointers (corresponding to CSC format).

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

handle

handle to the cuSPARSE library context.

cooRowInd

integer array of nnz uncompressed row indices.

nnz

number of non-zeros of the sparse matrix (that is also the length of array cooRowInd).

m

number of rows of matrix A.

idxBase

CUSPARSE_INDEX_BASE_ZERO or CUSPARSE_INDEX_BASE_ONE.

Output

csrRowPtr

integer array of m+1 elements that contains the start of every row and the end of the last row plus one.

See cusparseStatus_t for the description of the return status.

cusparse<t>csr2coo()

cusparseStatus_t
cusparseXcsr2coo(cusparseHandle_t    handle,
                 const int*          csrRowPtr,
                 int                 nnz,
                 int                 m,
                 int*                cooRowInd,
                 cusparseIndexBase_t idxBase)

This function converts the array containing the compressed row pointers (corresponding to CSR format) into an array of uncompressed row indices (corresponding to COO format).

It can also be used to convert the array containing the compressed column indices (corresponding to CSC format) into an array of uncompressed column indices (corresponding to COO format).

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

handle

handle to the cuSPARSE library context.

csrRowPtr

integer array of m+1 elements that contains the start of every row and the end of the last row plus one.

nnz

number of nonzeros of the sparse matrix (that is also the length of array cooRowInd).

m

number of rows of matrix A.

idxBase

CUSPARSE_INDEX_BASE_ZERO or CUSPARSE_INDEX_BASE_ONE.

Output

cooRowInd

integer array of nnz uncompressed row indices.

See cusparseStatus_t for the description of the return status.

cusparseCsr2cscEx2()

cusparseStatus_t
cusparseCsr2cscEx2_bufferSize(cusparseHandle_t     handle,
                              int                  m,
                              int                  n,
                              int                  nnz,
                              const void*          csrVal,
                              const int*           csrRowPtr,
                              const int*           csrColInd,
                              void*                cscVal,
                              int*                 cscColPtr,
                              int*                 cscRowInd,
                              cudaDataType         valType,
                              cusparseAction_t     copyValues,
                              cusparseIndexBase_t  idxBase,
                              cusparseCsr2CscAlg_t alg,
                              size_t*              bufferSize)
cusparseStatus_t
cusparseCsr2cscEx2(cusparseHandle_t     handle,
                   int                  m,
                   int                  n,
                   int                  nnz,
                   const void*          csrVal,
                   const int*           csrRowPtr,
                   const int*           csrColInd,
                   void*                cscVal,
                   int*                 cscColPtr,
                   int*                 cscRowInd,
                   cudaDataType         valType,
                   cusparseAction_t     copyValues,
                   cusparseIndexBase_t  idxBase,
                   cusparseCsr2CscAlg_t alg,
                   void*                buffer)

This function converts a sparse matrix in CSR format (that is defined by the three arrays csrVal, csrRowPtr, and csrColInd) into a sparse matrix in CSC format (that is defined by arrays cscVal, cscRowInd, and cscColPtr). The resulting matrix can also be seen as the transpose of the original sparse matrix. Notice that this routine can also be used to convert a matrix in CSC format into a matrix in CSR format.

The routine requires extra storage proportional to the number of nonzero values nnz. It provides in output always the same matrix.

It is executed asynchronously with respect to the host, and it may return control to the application on the host before the result is ready.

The function cusparseCsr2cscEx2_bufferSize() returns the size of the workspace needed by cusparseCsr2cscEx2(). User needs to allocate a buffer of this size and give that buffer to cusparseCsr2cscEx2() as an argument.

If nnz == 0, then csrColInd, csrVal, cscVal, and cscRowInd could have NULL value. In this case, cscColPtr is set to idxBase for all values.

If m == 0 or n == 0, the pointers are not checked and the routine returns CUSPARSE_STATUS_SUCCESS.

Input

handle

Handle to the cuSPARSE library context

m

Number of rows of the CSR input matrix; number of columns of the CSC ouput matrix

n

Number of columns of the CSR input matrix; number of rows of the CSC ouput matrix

nnz

Number of nonzero elements of the CSR and CSC matrices

csrVal

Value array of size nnz of the CSR matrix; of same type as valType

csrRowPtr

Integer array of size m + 1 that containes the CSR row offsets

csrColInd

Integer array of size nnz that containes the CSR column indices

cscVal

Value array of size nnz of the CSC matrix; of same type as valType

cscColPtr

Integer array of size n + 1 that containes the CSC column offsets

cscRowInd

Integer array of size nnz that containes the CSC row indices

valType

Value type for both CSR and CSC matrices

copyValues

CUSPARSE_ACTION_SYMBOLIC or CUSPARSE_ACTION_NUMERIC

idxBase

Index base CUSPARSE_INDEX_BASE_ZERO or CUSPARSE_INDEX_BASE_ONE

alg

Algorithm implementation. see cusparseCsr2CscAlg_t for possible values.

bufferSize

Number of bytes of workspace needed by cusparseCsr2cscEx2()

buffer

Pointer to workspace buffer

cusparseCsr2cscEx2() supports the following data types:

X/Y

CUDA_R_8I

CUDA_R_16F

CUDA_R_16BF

CUDA_R_32F

CUDA_R_64F

CUDA_C_16F [DEPRECATED]

CUDA_C_16BF [DEPRECATED]

CUDA_C_32F

CUDA_C_64F

cusparseCsr2cscEx2() supports the following algorithms (cusparseCsr2CscAlg_t):

Algorithm

Notes

CUSPARSE_CSR2CSC_ALG_DEFAULT, CUSPARSE_CSR2CSC_ALG1

Default algorithm

Action

Notes

CUSPARSE_ACTION_SYMBOLIC

Compute the “structure” of the CSC output matrix (offset, row indices)

CUSPARSE_ACTION_NUMERIC

Compute the “structure” of the CSC output matrix and copy the values

cusparseCsr2cscEx2() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

cusparseCsr2cscEx2() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.

cusparse<t>nnz()

cusparseStatus_t
cusparseSnnz(cusparseHandle_t        handle,
             cusparseDirection_t     dirA,
             int                      m,
             int                      n,
             const cusparseMatDescr_t descrA,
             const float*             A,
             int                      lda,
             int*                     nnzPerRowColumn,
             int*                     nnzTotalDevHostPtr)

cusparseStatus_t
cusparseDnnz(cusparseHandle_t        handle,
             cusparseDirection_t     dirA,
             int                      m,
             int                      n,
             const cusparseMatDescr_t descrA,
             const double*            A,
             int                      lda,
             int*                     nnzPerRowColumn,
             int*                     nnzTotalDevHostPtr)

cusparseStatus_t
cusparseCnnz(cusparseHandle_t        handle,
             cusparseDirection_t     dirA,
             int                      m,
             int                      n,
             const cusparseMatDescr_t descrA,
             const cuComplex*         A,
             int                      lda,
             int*                     nnzPerRowColumn,
             int*                     nnzTotalDevHostPtr)

cusparseStatus_t
cusparseZnnz(cusparseHandle_t        handle,
             cusparseDirection_t     dirA,
             int                      m,
             int                      n,
             const cusparseMatDescr_t descrA,
             const cuDoubleComplex*   A,
             int                      lda,
             int*                     nnzPerRowColumn,
             int*                     nnzTotalDevHostPtr)

This function computes the number of nonzero elements per row or column and the total number of nonzero elements in a dense matrix.

  • This function requires temporary extra storage that is allocated internally

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

Input

handle

handle to the cuSPARSE library context.

dirA

direction that specifies whether to count nonzero elements by CUSPARSE_DIRECTION_ROW or by CUSPARSE_DIRECTION_COLUMN.

m

number of rows of matrix A.

n

number of columns of matrix A.

descrA

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL. Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

A

array of dimensions (lda, n).

lda

leading dimension of dense array A.

Output

nnzPerRowColumn

array of size m or n containing the number of nonzero elements per row or column, respectively.

nnzTotalDevHostPtr

total number of nonzero elements in device or host memory.

See cusparseStatus_t for the description of the return status.

cusparseCreateIdentityPermutation() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseCreateIdentityPermutation(cusparseHandle_t handle,
                                  int              n,
                                  int*             p);

This function creates an identity map. The output parameter p represents such map by p = 0:1:(n-1).

This function is typically used with coosort, csrsort, cscsort.

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

parameter

device or host

description

handle

host

handle to the cuSPARSE library context.

n

host

size of the map.

Output

parameter

device or host

description

p

device

integer array of dimensions n.

See cusparseStatus_t for the description of the return status.

cusparseXcoosort()

cusparseStatus_t
cusparseXcoosort_bufferSizeExt(cusparseHandle_t handle,
                               int              m,
                               int              n,
                               int              nnz,
                               const int*       cooRows,
                               const int*       cooCols,
                               size_t*          pBufferSizeInBytes)
cusparseStatus_t
cusparseXcoosortByRow(cusparseHandle_t handle,
                      int              m,
                      int              n,
                      int              nnz,
                      int*             cooRows,
                      int*             cooCols,
                      int*             P,
                      void*            pBuffer)

cusparseStatus_t
cusparseXcoosortByColumn(cusparseHandle_t handle,
                         int              m,
                         int              n,
                         int              nnz,
                         int*             cooRows,
                         int*             cooCols,
                         int*             P,
                         void*            pBuffer);

This function sorts COO format. The sorting is in-place. Also the user can sort by row or sort by column.

A is an m×n sparse matrix that is defined in COO storage format by the three arrays cooVals, cooRows, and cooCols.

There is no assumption for the base index of the matrix. coosort uses stable sort on signed integer, so the value of cooRows or cooCols can be negative.

This function coosort() requires buffer size returned by coosort_bufferSizeExt(). The address of pBuffer must be multiple of 128 bytes. If not, CUSPARSE_STATUS_INVALID_VALUE is returned.

The parameter P is both input and output. If the user wants to compute sorted cooVal, P must be set as 0:1:(nnz-1) before coosort(), and after coosort(), new sorted value array satisfies cooVal_sorted = cooVal(P).

Remark: the dimension m and n are not used. If the user does not know the value of m or n, just passes a value positive. This usually happens if the user only reads a COO array first and needs to decide the dimension m or n later.

  • The routine requires no extra storage if pBuffer != NULL

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

Input

parameter

device or host

description

handle

host

handle to the cuSPARSE library context.

m

host

number of rows of matrix A.

n

host

number of columns of matrix A.

nnz

host

number of nonzero elements of matrix A.

cooRows

device

integer array of nnz unsorted row indices of A.

cooCols

device

integer array of nnz unsorted column indices of A.

P

device

integer array of nnz unsorted map indices. To construct cooVal, the user has to set P=0:1:(nnz-1).

pBuffer

device

buffer allocated by the user; the size is returned by coosort_bufferSizeExt().

Output

parameter

device or host

description

cooRows

device

integer array of nnz sorted row indices of A.

cooCols

device

integer array of nnz sorted column indices of A.

P

device

integer array of nnz sorted map indices.

pBufferSizeInBytes

host

number of bytes of the buffer.

See cusparseStatus_t for the description of the return status

Please visit cuSPARSE Library Samples - cusparseXcoosortByRow for a code example.

cusparseXcsrsort()

cusparseStatus_t
cusparseXcsrsort_bufferSizeExt(cusparseHandle_t handle,
                               int              m,
                               int              n,
                               int              nnz,
                               const int*       csrRowPtr,
                               const int*       csrColInd,
                               size_t*          pBufferSizeInBytes)
cusparseStatus_t
cusparseXcsrsort(cusparseHandle_t         handle,
                 int                      m,
                 int                      n,
                 int                      nnz,
                 const cusparseMatDescr_t descrA,
                 const int*               csrRowPtr,
                 int*                     csrColInd,
                 int*                     P,
                 void*                    pBuffer)

This function sorts CSR format. The stable sorting is in-place.

The matrix type is regarded as CUSPARSE_MATRIX_TYPE_GENERAL implicitly. In other words, any symmetric property is ignored.

This function csrsort() requires buffer size returned by csrsort_bufferSizeExt(). The address of pBuffer must be multiple of 128 bytes. If not, CUSPARSE_STATUS_INVALID_VALUE is returned.

The parameter P is both input and output. If the user wants to compute sorted csrVal, P must be set as 0:1:(nnz-1) before csrsort(), and after csrsort(), new sorted value array satisfies csrVal_sorted = csrVal(P).

The general procedure is as follows:

// A is a 3x3 sparse matrix, base-0
//     | 1 2 3 |
// A = | 4 5 6 |
//     | 7 8 9 |
const int m = 3;
const int n = 3;
const int nnz = 9;
csrRowPtr[m+1] = { 0, 3, 6, 9}; // on device
csrColInd[nnz] = { 2, 1, 0, 0, 2,1, 1, 2, 0}; // on device
csrVal[nnz] = { 3, 2, 1, 4, 6, 5, 8, 9, 7}; // on device
size_t pBufferSizeInBytes = 0;
void *pBuffer = NULL;
int *P = NULL;

// step 1: allocate buffer
cusparseXcsrsort_bufferSizeExt(handle, m, n, nnz, csrRowPtr, csrColInd, &pBufferSizeInBytes);
cudaMalloc( &pBuffer, sizeof(char)* pBufferSizeInBytes);

// step 2: setup permutation vector P to identity
cudaMalloc( (void**)&P, sizeof(int)*nnz);
cusparseCreateIdentityPermutation(handle, nnz, P);

// step 3: sort CSR format
cusparseXcsrsort(handle, m, n, nnz, descrA, csrRowPtr, csrColInd, P, pBuffer);

// step 4: gather sorted csrVal
cusparseDgthr(handle, nnz, csrVal, csrVal_sorted, P, CUSPARSE_INDEX_BASE_ZERO);
  • The routine requires no extra storage if pBuffer != NULL

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

Input

parameter

device or host

description

handle

host

handle to the cuSPARSE library context.

m

host

number of rows of matrix A.

n

host

number of columns of matrix A.

nnz

host

number of nonzero elements of matrix A.

csrRowsPtr

device

integer array of m+1 elements that contains the start of every row and the end of the last row plus one.

csrColInd

device

integer array of nnz unsorted column indices of A.

P

device

integer array of nnz unsorted map indices. To construct csrVal, the user has to set P=0:1:(nnz-1).

pBuffer

device

buffer allocated by the user; the size is returned by csrsort_bufferSizeExt().

Output

parameter

device or host

description

csrColInd

device

integer array of nnz sorted column indices of A.

P

device

integer array of nnz sorted map indices.

pBufferSizeInBytes

host

number of bytes of the buffer.

See cusparseStatus_t for the description of the return status.

cusparseXcscsort()

cusparseStatus_t
cusparseXcscsort_bufferSizeExt(cusparseHandle_t handle,
                               int              m,
                               int              n,
                               int              nnz,
                               const int*       cscColPtr,
                               const int*       cscRowInd,
                               size_t*          pBufferSizeInBytes)
cusparseStatus_t
cusparseXcscsort(cusparseHandle_t         handle,
                 int                      m,
                 int                      n,
                 int                      nnz,
                 const cusparseMatDescr_t descrA,
                 const int*               cscColPtr,
                 int*                     cscRowInd,
                 int*                     P,
                 void*                    pBuffer)

This function sorts CSC format. The stable sorting is in-place.

The matrix type is regarded as CUSPARSE_MATRIX_TYPE_GENERAL implicitly. In other words, any symmetric property is ignored.

This function cscsort() requires buffer size returned by cscsort_bufferSizeExt(). The address of pBuffer must be multiple of 128 bytes. If not, CUSPARSE_STATUS_INVALID_VALUE is returned.

The parameter P is both input and output. If the user wants to compute sorted cscVal, P must be set as 0:1:(nnz-1) before cscsort(), and after cscsort(), new sorted value array satisfies cscVal_sorted = cscVal(P).

The general procedure is as follows:

// A is a 3x3 sparse matrix, base-0
//     | 1 2  |
// A = | 4 0  |
//     | 0 8  |
const int m = 3;
const int n = 2;
const int nnz = 4;
cscColPtr[n+1] = { 0, 2, 4}; // on device
cscRowInd[nnz] = { 1, 0, 2, 0}; // on device
cscVal[nnz]    = { 4.0, 1.0, 8.0, 2.0 }; // on device
size_t pBufferSizeInBytes = 0;
void *pBuffer = NULL;
int *P = NULL;

// step 1: allocate buffer
cusparseXcscsort_bufferSizeExt(handle, m, n, nnz, cscColPtr, cscRowInd, &pBufferSizeInBytes);
cudaMalloc( &pBuffer, sizeof(char)* pBufferSizeInBytes);

// step 2: setup permutation vector P to identity
cudaMalloc( (void**)&P, sizeof(int)*nnz);
cusparseCreateIdentityPermutation(handle, nnz, P);

// step 3: sort CSC format
cusparseXcscsort(handle, m, n, nnz, descrA, cscColPtr, cscRowInd, P, pBuffer);

// step 4: gather sorted cscVal
cusparseDgthr(handle, nnz, cscVal, cscVal_sorted, P, CUSPARSE_INDEX_BASE_ZERO);
  • The routine requires no extra storage if pBuffer != NULL

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

Input

parameter

device or host

description

handle

host

handle to the cuSPARSE library context.

m

host

number of rows of matrix A.

n

host

number of columns of matrix A.

nnz

host

number of nonzero elements of matrix A.

cscColPtr

device

integer array of n+1 elements that contains the start of every column and the end of the last column plus one.

cscRowInd

device

integer array of nnz unsorted row indices of A.

P

device

integer array of nnz unsorted map indices. To construct cscVal, the user has to set P=0:1:(nnz-1).

pBuffer

device

buffer allocated by the user; the size is returned by cscsort_bufferSizeExt().

Output

parameter

device or host

description

cscRowInd

device

integer array of nnz sorted row indices of A.

P

device

integer array of nnz sorted map indices.

pBufferSizeInBytes

host

number of bytes of the buffer.

See cusparseStatus_t for the description of the return status.

cusparseXcsru2csr() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseCreateCsru2csrInfo(csru2csrInfo_t *info);

cusparseStatus_t
cusparseDestroyCsru2csrInfo(csru2csrInfo_t info);

cusparseStatus_t
cusparseScsru2csr_bufferSizeExt(cusparseHandle_t handle,
                                int              m,
                                int              n,
                                int              nnz,
                                float*           csrVal,
                                const int*       csrRowPtr,
                                int*             csrColInd,
                                csru2csrInfo_t   info,
                                size_t*          pBufferSizeInBytes)

cusparseStatus_t
cusparseDcsru2csr_bufferSizeExt(cusparseHandle_t handle,
                                int              m,
                                int              n,
                                int              nnz,
                                double*          csrVal,
                                const int*       csrRowPtr,
                                int*             csrColInd,
                                csru2csrInfo_t   info,
                                size_t*          pBufferSizeInBytes)

cusparseStatus_t
cusparseCcsru2csr_bufferSizeExt(cusparseHandle_t handle,
                                int              m,
                                int              n,
                                int              nnz,
                                cuComplex*       csrVal,
                                const int*       csrRowPtr,
                                int*             csrColInd,
                                csru2csrInfo_t   info,
                                size_t*          pBufferSizeInBytes)

cusparseStatus_t
cusparseZcsru2csr_bufferSizeExt(cusparseHandle_t handle,
                                int              m,
                                int              n,
                                int              nnz,
                                cuDoubleComplex* csrVal,
                                const int*       csrRowPtr,
                                int*             csrColInd,
                                csru2csrInfo_t   info,
                                size_t*          pBufferSizeInBytes)
cusparseStatus_t
cusparseScsru2csr(cusparseHandle_t         handle,
                  int                      m,
                  int                      n,
                  int                      nnz,
                  const cusparseMatDescr_t descrA,
                  float*                   csrVal,
                  const int*               csrRowPtr,
                  int*                     csrColInd,
                  csru2csrInfo_t           info,
                  void*                    pBuffer)

cusparseStatus_t
cusparseDcsru2csr(cusparseHandle_t         handle,
                  int                      m,
                  int                      n,
                  int                      nnz,
                  const cusparseMatDescr_t descrA,
                  double*                  csrVal,
                  const int*               csrRowPtr,
                  int*                     csrColInd,
                  csru2csrInfo_t           info,
                  void*                    pBuffer)

cusparseStatus_t
cusparseCcsru2csr(cusparseHandle_t         handle,
                  int                      m,
                  int                      n,
                  int                      nnz,
                  const cusparseMatDescr_t descrA,
                  cuComplex*               csrVal,
                  const int*               csrRowPtr,
                  int*                     csrColInd,
                  csru2csrInfo_t           info,
                  void*                    pBuffer)

cusparseStatus_t
cusparseZcsru2csr(cusparseHandle_t         handle,
                  int                      m,
                  int                      n,
                  int                      nnz,
                  const cusparseMatDescr_t descrA,
                  cuDoubleComplex*         csrVal,
                  const int*               csrRowPtr,
                  int*                     csrColInd,
                  csru2csrInfo_t           info,
                  void*                    pBuffer)
cusparseStatus_t
cusparseScsr2csru(cusparseHandle_t         handle,
                  int                      m,
                  int                      n,
                  int                      nnz,
                  const cusparseMatDescr_t descrA,
                  float*                   csrVal,
                  const int*               csrRowPtr,
                  int*                     csrColInd,
                  csru2csrInfo_t           info,
                  void*                    pBuffer)

cusparseStatus_t
cusparseDcsr2csru(cusparseHandle_t         handle,
                  int                      m,
                  int                      n,
                  int                      nnz,
                  const cusparseMatDescr_t descrA,
                  double*                  csrVal,
                  const int*               csrRowPtr,
                  int*                     csrColInd,
                  csru2csrInfo_t           info,
                  void*                    pBuffer)

cusparseStatus_t
cusparseCcsr2csru(cusparseHandle_t         handle,
                  int                      m,
                  int                      n,
                  int                      nnz,
                  const cusparseMatDescr_t descrA,
                  cuComplex*               csrVal,
                  const int*               csrRowPtr,
                  int*                     csrColInd,
                  csru2csrInfo_t           info,
                  void*                    pBuffer)

cusparseStatus_t
cusparseZcsr2csru(cusparseHandle_t         handle,
                  int                      m,
                  int                      n,
                  int                      nnz,
                  const cusparseMatDescr_t descrA,
                  cuDoubleComplex*         csrVal,
                  const int*               csrRowPtr,
                  int*                     csrColInd,
                  csru2csrInfo_t           info,
                  void*                    pBuffer)

This function transfers unsorted CSR format to CSR format, and vice versa. The operation is in-place.

This function is a wrapper of csrsort and gthr. The usecase is the following scenario.

If the user has a matrix A of CSR format which is unsorted, and implements his own code (which can be CPU or GPU kernel) based on this special order (for example, diagonal first, then lower triangle, then upper triangle), and wants to convert it to CSR format when calling CUSPARSE library, and then convert it back when doing something else on his/her kernel. For example, suppose the user wants to solve a linear system Ax=b by the following iterative scheme

\(x^{(k+1)} = x^{(k)} + L^{(-1)}*(b - Ax^{(k)})\)

The code heavily uses SpMV and triangular solve. Assume that the user has an in-house design of SpMV (Sparse Matrix-Vector multiplication) based on special order of A. However the user wants to use the cuSPARSE library for triangular solver. Then the following code can work.

do     step 1: compute residual vector r = b - A x (k) by in-house SpMV     step 2: B := sort(A), and L is lower triangular part of B     (only sort A once and keep the permutation vector)     step 3: solve z = L (-1) * ( b - A x (k) ) by cusparseXcsrsv     step 4: add correction x (k+1) = x (k) + z     step 5: A := unsort(B)     (use permutation vector to get back the unsorted CSR) until convergence

The requirements of step 2 and step 5 are

  1. In-place operation.

  2. The permutation vector P is hidden in an opaque structure.

  3. No cudaMalloc inside the conversion routine. Instead, the user has to provide the buffer explicitly.

  4. The conversion between unsorted CSR and sorted CSR may needs several times, but the function only generates the permutation vector P once.

  5. The function is based on csrsort, gather and scatter operations.

The operation is called csru2csr, which means unsorted CSR to sorted CSR. Also we provide the inverse operation, called csr2csru.

In order to keep the permutation vector invisible, we need an opaque structure called csru2csrInfo. Then two functions (cusparseCreateCsru2csrInfo, cusparseDestroyCsru2csrInfo) are used to initialize and to destroy the opaque structure.

cusparse[S|D|C|Z]csru2csr_bufferSizeExt returns the size of the buffer. The permutation vector P is also allcated inside csru2csrInfo. The lifetime of the permutation vector is the same as the lifetime of csru2csrInfo.

cusparse[S|D|C|Z]csru2csr performs forward transformation from unsorted CSR to sorted CSR. First call uses csrsort to generate the permutation vector P, and subsequent call uses P to do transformation.

cusparse[S|D|C|Z]csr2csru performs backward transformation from sorted CSR to unsorted CSR. P is used to get unsorted form back.

The routine cusparse<t>csru2csr() has the following properties:

  • The routine requires no extra storage if pBuffer != NULL

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

The routine cusparse<t>csr2csru() has the following properties if pBuffer != NULL:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

The following tables describe parameters of csr2csru_bufferSizeExt and csr2csru.

Input

parameter

device or host

description

handle

host

handle to the cuSPARSE library context.

m

host

number of rows of matrix A.

n

host

number of columns of matrix A.

nnz

host

number of nonzero elements of matrix A.

descrA

host

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

csrVal

device

<type> array of nnz unsorted nonzero elements of matrix A.

csrRowsPtr

device

integer array of m+1 elements that contains the start of every row and the end of the last row plus one.

csrColInd

device

integer array of nnz unsorted column indices of A.

info

host

opaque structure initialized using cusparseCreateCsru2csrInfo().

pBuffer

device

buffer allocated by the user; the size is returned by csru2csr_bufferSizeExt().

Output

parameter

device or host

description

csrVal

device

<type> array of nnz sorted nonzero elements of matrix A.

csrColInd

device

integer array of nnz sorted column indices of A.

pBufferSizeInBytes

host

number of bytes of the buffer.

See cusparseStatus_t for the description of the return status.

cusparseXpruneDense2csr() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseHpruneDense2csr_bufferSizeExt(cusparseHandle_t          handle,
                                       int                      m,
                                       int                      n,
                                       const __half*            A,
                                       int                      lda,
                                       const __half*            threshold,
                                       const cusparseMatDescr_t descrC,
                                       const __half*            csrValC,
                                       const int*               csrRowPtrC,
                                       const int*               csrColIndC,
                                       size_t*                  pBufferSizeInBytes)

cusparseStatus_t
cusparseSpruneDense2csr_bufferSizeExt(cusparseHandle_t          handle,
                                       int                      m,
                                       int                      n,
                                       const float*             A,
                                       int                      lda,
                                       const float*             threshold,
                                       const cusparseMatDescr_t descrC,
                                       const float*             csrValC,
                                       const int*               csrRowPtrC,
                                       const int*               csrColIndC,
                                       size_t*                  pBufferSizeInBytes)

cusparseStatus_t
cusparseDpruneDense2csr_bufferSizeExt(cusparseHandle_t          handle,
                                       int                      m,
                                       int                      n,
                                       const double*            A,
                                       int                      lda,
                                       const double*            threshold,
                                       const cusparseMatDescr_t descrC,
                                       const double*            csrValC,
                                       const int*               csrRowPtrC,
                                       const int*               csrColIndC,
                                       size_t*                  pBufferSizeInBytes)
cusparseStatus_t
cusparseHpruneDense2csrNnz(cusparseHandle_t         handle,
                           int                      m,
                           int                      n,
                           const __half*            A,
                           int                      lda,
                           const __half*            threshold,
                           const cusparseMatDescr_t descrC,
                           int*                     csrRowPtrC,
                           int*                     nnzTotalDevHostPtr,
                           void*                    pBuffer)

cusparseStatus_t
cusparseSpruneDense2csrNnz(cusparseHandle_t         handle,
                           int                      m,
                           int                      n,
                           const float*             A,
                           int                      lda,
                           const float*             threshold,
                           const cusparseMatDescr_t descrC,
                           int*                     csrRowPtrC,
                           int*                     nnzTotalDevHostPtr,
                           void*                    pBuffer)

cusparseStatus_t
cusparseDpruneDense2csrNnz(cusparseHandle_t         handle,
                           int                      m,
                           int                      n,
                           const double*            A,
                           int                      lda,
                           const double*            threshold,
                           const cusparseMatDescr_t descrC,
                           int*                     csrRowPtrC,
                           int*                     nnzTotalDevHostPtr,
                           void*                    pBuffer)
cusparseStatus_t
cusparseHpruneDense2csr(cusparseHandle_t         handle,
                        int                      m,
                        int                      n,
                        const __half*            A,
                        int                      lda,
                        const __half*            threshold,
                        const cusparseMatDescr_t descrC,
                        __half*                  csrValC,
                        const int*               csrRowPtrC,
                        int*                     csrColIndC,
                        void*                    pBuffer)

cusparseStatus_t
cusparseSpruneDense2csr(cusparseHandle_t         handle,
                        int                      m,
                        int                      n,
                        const float*             A,
                        int                      lda,
                        const float*             threshold,
                        const cusparseMatDescr_t descrC,
                        float*                   csrValC,
                        const int*               csrRowPtrC,
                        int*                     csrColIndC,
                        void*                    pBuffer)

cusparseStatus_t
cusparseDpruneDense2csr(cusparseHandle_t         handle,
                        int                      m,
                        int                      n,
                        const double*            A,
                        int                      lda,
                        const double*            threshold,
                        const cusparseMatDescr_t descrC,
                        double*                  csrValC,
                        const int*               csrRowPtrC,
                        int*                     csrColIndC,
                        void*                    pBuffer)

This function prunes a dense matrix to a sparse matrix with CSR format.

Given a dense matrix A and a non-negative value threshold, the function returns a sparse matrix C, defined by

\(\begin{matrix} {{C(i,j)} = {A(i,j)}} & \text{if\ |A(i,j)|\ >\ threshold} \\ \end{matrix}\)

The implementation adopts a two-step approach to do the conversion. First, the user allocates csrRowPtrC of m+1 elements and uses function pruneDense2csrNnz() to determine the number of nonzeros columns per row. Second, the user gathers nnzC (number of nonzeros of matrix C) from either (nnzC=*nnzTotalDevHostPtr) or (nnzC=csrRowPtrC[m]-csrRowPtrC[0]) and allocates csrValC of nnzC elements and csrColIndC of nnzC integers. Finally function pruneDense2csr() is called to complete the conversion.

The user must obtain the size of the buffer required by pruneDense2csr() by calling pruneDense2csr_bufferSizeExt(), allocate the buffer, and pass the buffer pointer to pruneDense2csr().

Examples of prune chapter provides a simple example of pruneDense2csr().

The routine cusparse<t>pruneDense2csrNnz() has the following properties:

  • This function requires temporary extra storage that is allocated internally

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

The routine cusparse<t>DpruneDense2csr() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

parameter

device or host

description

handle

host

handle to the cuSPARSE library context.

m

host

number of rows of matrix A.

n

host

number of columns of matrix A.

A

device

array of dimension (lda, n).

lda

device

leading dimension of A. It must be at least max(1, m).

threshold

host or device

a value to drop the entries of A. threshold can point to a device memory or host memory.

descrC

host

the descriptor of matrix C. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

pBuffer

device

buffer allocated by the user; the size is returned by pruneDense2csr_bufferSizeExt().

Output

parameter

device or host

description

nnzTotalDevHostPtr

device or host

total number of nonzero of matrix C. nnzTotalDevHostPtr can point to a device memory or host memory.

csrValC

device

<type> array of nnzC nonzero elements of matrix C.

csrRowsPtrC

device

integer array of m+1 elements that contains the start of every row and the end of the last row plus one.

csrColIndC

device

integer array of nnzC column indices of C.

pBufferSizeInBytes

host

number of bytes of the buffer.

See cusparseStatus_t for the description of the return status.

cusparseXpruneCsr2csr() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseHpruneCsr2csr_bufferSizeExt(cusparseHandle_t         handle,
                                    int                      m,
                                    int                      n,
                                    int                      nnzA,
                                    const cusparseMatDescr_t descrA,
                                    const __half*            csrValA,
                                    const int*               csrRowPtrA,
                                    const int*               csrColIndA,
                                    const __half*            threshold,
                                    const cusparseMatDescr_t descrC,
                                    const __half*            csrValC,
                                    const int*               csrRowPtrC,
                                    const int*               csrColIndC,
                                    size_t*                  pBufferSizeInBytes)

cusparseStatus_t
cusparseSpruneCsr2csr_bufferSizeExt(cusparseHandle_t         handle,
                                    int                      m,
                                    int                      n,
                                    int                      nnzA,
                                    const cusparseMatDescr_t descrA,
                                    const float*             csrValA,
                                    const int*               csrRowPtrA,
                                    const int*               csrColIndA,
                                    const float*             threshold,
                                    const cusparseMatDescr_t descrC,
                                    const float*             csrValC,
                                    const int*               csrRowPtrC,
                                    const int*               csrColIndC,
                                    size_t*                  pBufferSizeInBytes)

cusparseStatus_t
cusparseDpruneCsr2csr_bufferSizeExt(cusparseHandle_t         handle,
                                    int                      m,
                                    int                      n,
                                    int                      nnzA,
                                    const cusparseMatDescr_t descrA,
                                    const double*            csrValA,
                                    const int*               csrRowPtrA,
                                    const int*               csrColIndA,
                                    const double*            threshold,
                                    const cusparseMatDescr_t descrC,
                                    const double*            csrValC,
                                    const int*               csrRowPtrC,
                                    const int*               csrColIndC,
                                    size_t*                  pBufferSizeInBytes)
cusparseStatus_t
cusparseHpruneCsr2csrNnz(cusparseHandle_t         handle,
                         int                      m,
                         int                      n,
                         int                      nnzA,
                         const cusparseMatDescr_t descrA,
                         const __half*            csrValA,
                         const int*               csrRowPtrA,
                         const int*               csrColIndA,
                         const __half*            threshold,
                         const cusparseMatDescr_t descrC,
                         int*                     csrRowPtrC,
                         int*                     nnzTotalDevHostPtr,
                         void*                    pBuffer)

cusparseStatus_t
cusparseSpruneCsr2csrNnz(cusparseHandle_t         handle,
                         int                      m,
                         int                      n,
                         int                      nnzA,
                         const cusparseMatDescr_t descrA,
                         const float*             csrValA,
                         const int*               csrRowPtrA,
                         const int*               csrColIndA,
                         const float*             threshold,
                         const cusparseMatDescr_t descrC,
                         int*                     csrRowPtrC,
                         int*                     nnzTotalDevHostPtr,
                         void*                    pBuffer)

cusparseStatus_t
cusparseDpruneCsr2csrNnz(cusparseHandle_t         handle,
                         int                      m,
                         int                      n,
                         int                      nnzA,
                         const cusparseMatDescr_t descrA,
                         const double*            csrValA,
                         const int*               csrRowPtrA,
                         const int*               csrColIndA,
                         const double*            threshold,
                         const cusparseMatDescr_t descrC,
                         int*                     csrRowPtrC,
                         int*                     nnzTotalDevHostPtr,
                         void*                    pBuffer)
cusparseStatus_t
cusparseHpruneCsr2csr(cusparseHandle_t         handle,
                      int                      m,
                      int                      n,
                      int                      nnzA,
                      const cusparseMatDescr_t descrA,
                      const __half*            csrValA,
                      const int*               csrRowPtrA,
                      const int*               csrColIndA,
                      const __half*            threshold,
                      const cusparseMatDescr_t descrC,
                      __half*                  csrValC,
                      const int*               csrRowPtrC,
                      int*                     csrColIndC,
                      void*                    pBuffer)

cusparseStatus_t
cusparseSpruneCsr2csr(cusparseHandle_t         handle,
                      int                      m,
                      int                      n,
                      int                      nnzA,
                      const cusparseMatDescr_t descrA,
                      const float*             csrValA,
                      const int*               csrRowPtrA,
                      const int*               csrColIndA,
                      const float*             threshold,
                      const cusparseMatDescr_t descrC,
                      float*                   csrValC,
                      const int*               csrRowPtrC,
                      int*                     csrColIndC,
                      void*                    pBuffer)

cusparseStatus_t
cusparseDpruneCsr2csr(cusparseHandle_t         handle,
                      int                      m,
                      int                      n,
                      int                      nnzA,
                      const cusparseMatDescr_t descrA,
                      const double*            csrValA,
                      const int*               csrRowPtrA,
                      const int*               csrColIndA,
                      const double*            threshold,
                      const cusparseMatDescr_t descrC,
                      double*                  csrValC,
                      const int*               csrRowPtrC,
                      int*                     csrColIndC,
                      void*                    pBuffer)

This function prunes a sparse matrix to a sparse matrix with CSR format.

Given a sparse matrix A and a non-negative value threshold, the function returns a sparse matrix C, defined by

\(\begin{matrix} {{C(i,j)} = {A(i,j)}} & \text{if\ |A(i,j)|\ >\ threshold} \\ \end{matrix}\)

The implementation adopts a two-step approach to do the conversion. First, the user allocates csrRowPtrC of m+1 elements and uses function pruneCsr2csrNnz() to determine the number of nonzeros columns per row. Second, the user gathers nnzC (number of nonzeros of matrix C) from either (nnzC=*nnzTotalDevHostPtr) or (nnzC=csrRowPtrC[m]-csrRowPtrC[0]) and allocates csrValC of nnzC elements and csrColIndC of nnzC integers. Finally function pruneCsr2csr() is called to complete the conversion.

The user must obtain the size of the buffer required by pruneCsr2csr() by calling pruneCsr2csr_bufferSizeExt(), allocate the buffer, and pass the buffer pointer to pruneCsr2csr().

Examples of prune chapter provides a simple example of pruneCsr2csr().

The routine cusparse<t>pruneCsr2csrNnz() has the following properties:

  • This function requires temporary extra storage that is allocated internally

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

The routine cusparse<t>pruneCsr2csr() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

parameter

device or host

description

handle

host

handle to the cuSPARSE library context.

m

host

number of rows of matrix A.

n

host

number of columns of matrix A.

nnzA

host

number of nonzeros of matrix A.

descrA

host

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

csrValA

device

<type> array of nnzA nonzero elements of matrix A.

csrRowsPtrA

device

integer array of m+1 elements that contains the start of every row and the end of the last row plus one.

csrColIndA

device

integer array of nnzA column indices of A.

threshold

host or device

a value to drop the entries of A. threshold can point to a device memory or host memory.

descrC

host

the descriptor of matrix C. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

pBuffer

device

buffer allocated by the user; the size is returned by pruneCsr2csr_bufferSizeExt().

Output

parameter

device or host

description

nnzTotalDevHostPtr

device or host

total number of nonzero of matrix C. nnzTotalDevHostPtr can point to a device memory or host memory.

csrValC

device

<type> array of nnzC nonzero elements of matrix C.

csrRowsPtrC

device

integer array of m+1 elements that contains the start of every row and the end of the last row plus one.

csrColIndC

device

integer array of nnzC column indices of C.

pBufferSizeInBytes

host

number of bytes of the buffer.

See cusparseStatus_t for the description of the return status.

cusparseXpruneDense2csrPercentage() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseHpruneDense2csrByPercentage_bufferSizeExt(cusparseHandle_t         handle,
                                                  int                      m,
                                                  int                      n,
                                                  const __half*            A,
                                                  int                      lda,
                                                  float                    percentage,
                                                  const cusparseMatDescr_t descrC,
                                                  const __half*            csrValC,
                                                  const int*               csrRowPtrC,
                                                  const int*               csrColIndC,
                                                  pruneInfo_t              info,
                                                  size_t*                  pBufferSizeInBytes)

cusparseStatus_t
cusparseSpruneDense2csrByPercentage_bufferSizeExt(cusparseHandle_t         handle,
                                                  int                      m,
                                                  int                      n,
                                                  const float*             A,
                                                  int                      lda,
                                                  float                    percentage,
                                                  const cusparseMatDescr_t descrC,
                                                  const float*             csrValC,
                                                  const int*               csrRowPtrC,
                                                  const int*               csrColIndC,
                                                  pruneInfo_t              info,
                                                  size_t*                  pBufferSizeInBytes)

cusparseStatus_t
cusparseDpruneDense2csrByPercentage_bufferSizeExt(cusparseHandle_t         handle,
                                                  int                      m,
                                                  int                      n,
                                                  const double*            A,
                                                  int                      lda,
                                                  float                    percentage,
                                                  const cusparseMatDescr_t descrC,
                                                  const double*            csrValC,
                                                  const int*               csrRowPtrC,
                                                  const int*               csrColIndC,
                                                  pruneInfo_t              info,
                                                  size_t*                  pBufferSizeInBytes)
cusparseStatus_t
cusparseHpruneDense2csrNnzByPercentage(cusparseHandle_t         handle,
                                       int                      m,
                                       int                      n,
                                       const __half*            A,
                                       int                      lda,
                                       float                    percentage,
                                       const cusparseMatDescr_t descrC,
                                       int*                     csrRowPtrC,
                                       int*                     nnzTotalDevHostPtr,
                                       pruneInfo_t              info,
                                       void*                    pBuffer)

cusparseStatus_t
cusparseSpruneDense2csrNnzByPercentage(cusparseHandle_t         handle,
                                       int                      m,
                                       int                      n,
                                       const float*             A,
                                       int                      lda,
                                       float                    percentage,
                                       const cusparseMatDescr_t descrC,
                                       int*                     csrRowPtrC,
                                       int*                     nnzTotalDevHostPtr,
                                       pruneInfo_t              info,
                                       void*                    pBuffer)

cusparseStatus_t
cusparseDpruneDense2csrNnzByPercentage(cusparseHandle_t         handle,
                                       int                      m,
                                       int                      n,
                                       const double*            A,
                                       int                      lda,
                                       float                    percentage,
                                       const cusparseMatDescr_t descrC,
                                       int*                     csrRowPtrC,
                                       int*                     nnzTotalDevHostPtr,
                                       pruneInfo_t              info,
                                       void*                    pBuffer)
cusparseStatus_t
cusparseHpruneDense2csrByPercentage(cusparseHandle_t         handle,
                                    int                      m,
                                    int                      n,
                                    const __half*            A,
                                    int                      lda,
                                    float                    percentage,
                                    const cusparseMatDescr_t descrC,
                                    __half*                  csrValC,
                                    const int*               csrRowPtrC,
                                    int*                     csrColIndC,
                                    pruneInfo_t              info,
                                    void*                    pBuffer)

cusparseStatus_t
cusparseSpruneDense2csrByPercentage(cusparseHandle_t         handle,
                                    int                      m,
                                    int                      n,
                                    const float*             A,
                                    int                      lda,
                                    float                    percentage,
                                    const cusparseMatDescr_t descrC,
                                    float*                   csrValC,
                                    const int*               csrRowPtrC,
                                    int*                     csrColIndC,
                                    pruneInfo_t              info,
                                    void*                    pBuffer)

cusparseStatus_t
cusparseDpruneDense2csrByPercentage(cusparseHandle_t         handle,
                                    int                      m,
                                    int                      n,
                                    const double*            A,
                                    int                      lda,
                                    float                    percentage,
                                    const cusparseMatDescr_t descrC,
                                    double*                  csrValC,
                                    const int*               csrRowPtrC,
                                    int*                     csrColIndC,
                                    pruneInfo_t              info,
                                    void*                    pBuffer)

This function prunes a dense matrix to a sparse matrix by percentage.

Given a dense matrix A and a non-negative value percentage, the function computes sparse matrix C by the following three steps:

Step 1: sort absolute value of A in ascending order.

\(\begin{matrix} {key\ :=\ sort(\ |A|\ )} \\ \end{matrix}\)

Step 2: choose threshold by the parameter percentage

\(\begin{matrix} {pos\ =\ ceil(m*n*(percentage/100))\ -\ 1} \\ {pos\ =\ min(pos,\ m*n-1)} \\ {pos\ =\ max(pos,\ 0)} \\ {threshold\ =\ key\lbrack pos\rbrack} \\ \end{matrix}\)

Step 3: call pruneDense2csr() by with the parameter threshold.

The implementation adopts a two-step approach to do the conversion. First, the user allocates csrRowPtrC of m+1 elements and uses function pruneDense2csrNnzByPercentage() to determine the number of nonzeros columns per row. Second, the user gathers nnzC (number of nonzeros of matrix C) from either (nnzC=*nnzTotalDevHostPtr) or (nnzC=csrRowPtrC[m]-csrRowPtrC[0]) and allocates csrValC of nnzC elements and csrColIndC of nnzC integers. Finally function pruneDense2csrByPercentage() is called to complete the conversion.

The user must obtain the size of the buffer required by pruneDense2csrByPercentage() by calling pruneDense2csrByPercentage_bufferSizeExt(), allocate the buffer, and pass the buffer pointer to pruneDense2csrByPercentage().

Remark 1: the value of percentage must be not greater than 100. Otherwise, CUSPARSE_STATUS_INVALID_VALUE is returned.

Remark 2: the zeros of A are not ignored. All entries are sorted, including zeros. This is different from pruneCsr2csrByPercentage()

Examples of prune chapter provides a simple example of pruneDense2csrNnzByPercentage().

The routine cusparse<t>pruneDense2csrNnzByPercentage() has the following properties:

  • This function requires temporary extra storage that is allocated internally

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

The routine cusparse<t>pruneDense2csrByPercentage() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

parameter

device or host

description

handle

host

handle to the cuSPARSE library context.

m

host

number of rows of matrix A.

n

host

number of columns of matrix A.

A

device

array of dimension (lda, n).

lda

device

leading dimension of A. It must be at least max(1, m).

percentage

host

percentage <=100 and percentage >= 0

descrC

host

the descriptor of matrix C. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

pBuffer

device

buffer allocated by the user; the size is returned by pruneDense2csrByPercentage_bufferSizeExt().

Output

parameter

device or host

description

nnzTotalDevHostPtr

device or host

total number of nonzero of matrix C. nnzTotalDevHostPtr can point to a device memory or host memory.

csrValC

device

<type> array of nnzC nonzero elements of matrix C.

csrRowsPtrC

device

integer array of m+1 elements that contains the start of every row and the end of the last row plus one.

csrColIndC

device

integer array of nnzC column indices of C.

pBufferSizeInBytes

host

number of bytes of the buffer.

See cusparseStatus_t for the description of the return status.

cusparseXpruneCsr2csrByPercentage() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseHpruneCsr2csrByPercentage_bufferSizeExt(cusparseHandle_t         handle,
                                                int                      m,
                                                int                      n,
                                                int                      nnzA,
                                                const cusparseMatDescr_t descrA,
                                                const __half*            csrValA,
                                                const int*               csrRowPtrA,
                                                const int*               csrColIndA,
                                                float                    percentage,
                                                const cusparseMatDescr_t descrC,
                                                const __half*            csrValC,
                                                const int*               csrRowPtrC,
                                                const int*               csrColIndC,
                                                pruneInfo_t              info,
                                                size_t*                  pBufferSizeInBytes)

cusparseStatus_t
cusparseSpruneCsr2csrByPercentage_bufferSizeExt(cusparseHandle_t         handle,
                                                int                      m,
                                                int                      n,
                                                int                      nnzA,
                                                const cusparseMatDescr_t descrA,
                                                const float*             csrValA,
                                                const int*               csrRowPtrA,
                                                const int*               csrColIndA,
                                                float                    percentage,
                                                const cusparseMatDescr_t descrC,
                                                const float*             csrValC,
                                                const int*               csrRowPtrC,
                                                const int*               csrColIndC,
                                                pruneInfo_t              info,
                                                size_t*                  pBufferSizeInBytes)

cusparseStatus_t
cusparseDpruneCsr2csrByPercentage_bufferSizeExt(cusparseHandle_t         handle,
                                                int                      m,
                                                int                      n,
                                                int                      nnzA,
                                                const cusparseMatDescr_t descrA,
                                                const double*            csrValA,
                                                const int*               csrRowPtrA,
                                                const int*               csrColIndA,
                                                float                    percentage,
                                                const cusparseMatDescr_t descrC,
                                                const double*            csrValC,
                                                const int*               csrRowPtrC,
                                                const int*               csrColIndC,
                                                pruneInfo_t              info,
                                                size_t*                  pBufferSizeInBytes)
cusparseStatus_t
cusparseHpruneCsr2csrNnzByPercentage(cusparseHandle_t        handle,
                                    int                      m,
                                    int                      n,
                                    int                      nnzA,
                                    const cusparseMatDescr_t descrA,
                                    const __half*            csrValA,
                                    const int*               csrRowPtrA,
                                    const int*               csrColIndA,
                                    float                    percentage,
                                    const cusparseMatDescr_t descrC,
                                    int*                     csrRowPtrC,
                                    int*                     nnzTotalDevHostPtr,
                                    pruneInfo_t              info,
                                    void*                    pBuffer)

cusparseStatus_t
cusparseSpruneCsr2csrNnzByPercentage(cusparseHandle_t        handle,
                                    int                      m,
                                    int                      n,
                                    int                      nnzA,
                                    const cusparseMatDescr_t descrA,
                                    const float*             csrValA,
                                    const int*               csrRowPtrA,
                                    const int*               csrColIndA,
                                    float                    percentage,
                                    const cusparseMatDescr_t descrC,
                                    int*                     csrRowPtrC,
                                    int*                     nnzTotalDevHostPtr,
                                    pruneInfo_t              info,
                                    void*                    pBuffer)

cusparseStatus_t
cusparseDpruneCsr2csrNnzByPercentage(cusparseHandle_t        handle,
                                    int                      m,
                                    int                      n,
                                    int                      nnzA,
                                    const cusparseMatDescr_t descrA,
                                    const double*            csrValA,
                                    const int*               csrRowPtrA,
                                    const int*               csrColIndA,
                                    float                    percentage,
                                    const cusparseMatDescr_t descrC,
                                    int*                     csrRowPtrC,
                                    int*                     nnzTotalDevHostPtr,
                                    pruneInfo_t              info,
                                    void*                    pBuffer)
cusparseStatus_t
cusparseHpruneCsr2csrByPercentage(cusparseHandle_t         handle,
                                  int                      m,
                                  int                      n,
                                  int                      nnzA,
                                  const cusparseMatDescr_t descrA,
                                  const __half*            csrValA,
                                  const int*               csrRowPtrA,
                                  const int*               csrColIndA,
                                  float                    percentage,
                                  const cusparseMatDescr_t descrC,
                                  __half*                  csrValC,
                                  const int*               csrRowPtrC,
                                  int*                     csrColIndC,
                                  pruneInfo_t              info,
                                  void*                    pBuffer)

cusparseStatus_t
cusparseSpruneCsr2csrByPercentage(cusparseHandle_t         handle,
                                  int                      m,
                                  int                      n,
                                  int                      nnzA,
                                  const cusparseMatDescr_t descrA,
                                  const float*             csrValA,
                                  const int*               csrRowPtrA,
                                  const int*               csrColIndA,
                                  float                    percentage,
                                  const cusparseMatDescr_t descrC,
                                  float*                   csrValC,
                                  const int*               csrRowPtrC,
                                  int*                     csrColIndC,
                                  pruneInfo_t              info,
                                  void*                    pBuffer)

cusparseStatus_t
cusparseDpruneCsr2csrByPercentage(cusparseHandle_t         handle,
                                  int                      m,
                                  int                      n,
                                  int                      nnzA,
                                  const cusparseMatDescr_t descrA,
                                  const double*            csrValA,
                                  const int*               csrRowPtrA,
                                  const int*               csrColIndA,
                                  float                    percentage,
                                  const cusparseMatDescr_t descrC,
                                  double*                  csrValC,
                                  const int*               csrRowPtrC,
                                  int*                     csrColIndC,
                                  pruneInfo_t              info,
                                  void*                    pBuffer)

This function prunes a sparse matrix to a sparse matrix by percentage.

Given a sparse matrix A and a non-negative value percentage, the function computes sparse matrix C by the following three steps:

Step 1: sort absolute value of A in ascending order.

\(\begin{matrix} {key\ :=\ sort(\ |csrValA|\ )} \\ \end{matrix}\)

Step 2: choose threshold by the parameter percentage

\(\begin{matrix} {pos\ =\ ceil(nnzA*(percentage/100))\ -\ 1} \\ {pos\ =\ min(pos,\ nnzA-1)} \\ {pos\ =\ max(pos,\ 0)} \\ {threshold\ =\ key\lbrack pos\rbrack} \\ \end{matrix}\)

Step 3: call pruneCsr2csr() by with the parameter threshold.

The implementation adopts a two-step approach to do the conversion. First, the user allocates csrRowPtrC of m+1 elements and uses function pruneCsr2csrNnzByPercentage() to determine the number of nonzeros columns per row. Second, the user gathers nnzC (number of nonzeros of matrix C) from either (nnzC=*nnzTotalDevHostPtr) or (nnzC=csrRowPtrC[m]-csrRowPtrC[0]) and allocates csrValC of nnzC elements and csrColIndC of nnzC integers. Finally function pruneCsr2csrByPercentage() is called to complete the conversion.

The user must obtain the size of the buffer required by pruneCsr2csrByPercentage() by calling pruneCsr2csrByPercentage_bufferSizeExt(), allocate the buffer, and pass the buffer pointer to pruneCsr2csrByPercentage().

Remark 1: the value of percentage must be not greater than 100. Otherwise, CUSPARSE_STATUS_INVALID_VALUE is returned.

Examples of prune chapter provides a simple example of pruneCsr2csrByPercentage().

The routine cusparse<t>pruneCsr2csrNnzByPercentage() has the following properties:

  • This function requires temporary extra storage that is allocated internally

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

The routine cusparse<t>pruneCsr2csrByPercentage() has the following properties:

  • The routine requires no extra storage

  • The routine supports asynchronous execution

  • The routine supports CUDA graph capture

Input

parameter

device or host

description

handle

host

handle to the cuSPARSE library context.

m

host

number of rows of matrix A.

n

host

number of columns of matrix A.

nnzA

host

number of nonzeros of matrix A.

descrA

host

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

csrValA

device

<type> array of nnzA nonzero elements of matrix A.

csrRowsPtrA

device

integer array of m+1 elements that contains the start of every row and the end of the last row plus one.

csrColIndA

device

integer array of nnzA column indices of A.

percentage

host

percentage <=100 and percentage >= 0

descrC

host

the descriptor of matrix C. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL, Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

pBuffer

device

buffer allocated by the user; the size is returned by pruneCsr2csrByPercentage_bufferSizeExt().

Output

parameter

device or host

description

nnzTotalDevHostPtr

device or host

total number of nonzero of matrix C. nnzTotalDevHostPtr can point to a device memory or host memory.

csrValC

device

<type> array of nnzC nonzero elements of matrix C.

csrRowsPtrC

device

integer array of m+1 elements that contains the start of every row and the end of the last row plus one.

csrColIndC

device

integer array of nnzC column indices of C.

pBufferSizeInBytes

host

number of bytes of the buffer.

See cusparseStatus_t for the description of the return status.

cusparse<t>nnz_compress() [DEPRECATED]

> The routine will be removed in the next major release

cusparseStatus_t
cusparseSnnz_compress(cusparseHandle_t         handle,
                      int                      m,
                      const cusparseMatDescr_t descr,
                      const float*             csrValA,
                      const int*               csrRowPtrA,
                      int*                     nnzPerRow,
                      int*                     nnzC,
                      float                    tol)

cusparseStatus_t
cusparseDnnz_compress(cusparseHandle_t         handle,
                      int                      m,
                      const cusparseMatDescr_t descr,
                      const double*            csrValA,
                      const int*               csrRowPtrA,
                      int*                     nnzPerRow,
                      int*                     nnzC,
                      double                   tol)

cusparseStatus_t
cusparseCnnz_compress(cusparseHandle_t         handle,
                      int                      m,
                      const cusparseMatDescr_t descr,
                      const cuComplex*         csrValA,
                      const int*               csrRowPtrA,
                      int*                     nnzPerRow,
                      int*                     nnzC,
                      cuComplex                tol)

cusparseStatus_t
cusparseZnnz_compress(cusparseHandle_t         handle,
                      int                      m,
                      const cusparseMatDescr_t descr,
                      const cuDoubleComplex*   csrValA,
                      const int*               csrRowPtrA,
                      int*                     nnzPerRow,
                      int*                     nnzC,
                      cuDoubleComplex          tol)

This function is the step one to convert from csr format to compressed csr format.

Given a sparse matrix A and a non-negative value threshold, the function returns nnzPerRow(the number of nonzeros columns per row) and nnzC(the total number of nonzeros) of a sparse matrix C, defined by

\(\begin{matrix} {{C(i,j)} = {A(i,j)}} & \text{if\ |A(i,j)|\ >\ threshold} \\ \end{matrix}\)

A key assumption for the cuComplex and cuDoubleComplex case is that this tolerance is given as the real part. For example tol = 1e-8 + 0*i and we extract cureal, that is the x component of this struct.

  • This function requires temporary extra storage that is allocated internally

  • The routine supports asynchronous execution if the Stream Ordered Memory Allocator is available

  • The routine supports CUDA graph capture if the Stream Ordered Memory Allocator is available

Input

handle

handle to the cuSPARSE library context.

m

number of rows of matrix A.

descrA

the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL. Also, the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE.

csrValA

csr noncompressed values array

csrRowPtrA

the corresponding input noncompressed row pointer.

tol

non-negative tolerance to determine if a number less than or equal to it.

Output

nnzPerRow

this array contains the number of elements whose absolute values are greater than tol per row.

nnzC

host/device pointer of the total number of elements whose absolute values are greater than tol.

See cusparseStatus_t for the description of the return status.