cuSPARSELt Functions

Library Management Functions

cusparseLtInit

cusparseStatus_t
cusparseLtInit(cusparseLtHandle_t* handle)
The function initializes the cuSPARSELt library handle (cusparseLtHandle_t), which holds the cuSPARSELt library context. It allocates light hardware resources on the host and must be called before any other cuSPARSELt library call. Calling any cusparseLt function that uses cusparseLtHandle_t without a prior call to cusparseLtInit() returns an error.
The cuSPARSELt library context is tied to the current CUDA device. To use the library on multiple devices, one cuSPARSELt handle should be created for each device.

Parameter   Memory   In/Out   Description
---------   ------   ------   -----------
handle      Host     OUT      cuSPARSELt library handle

See cusparseStatus_t for the description of the return status.


cusparseLtDestroy

cusparseStatus_t
cusparseLtDestroy(const cusparseLtHandle_t* handle)
The function releases hardware resources used by the cuSPARSELt library. It must be the last call made with a particular cuSPARSELt handle.
Calling any cusparseLt function that uses cusparseLtHandle_t after cusparseLtDestroy() returns an error.

Parameter   Memory   In/Out   Description
---------   ------   ------   -----------
handle      Host     IN       cuSPARSELt library handle

See cusparseStatus_t for the description of the return status.


Matmul Functions

cusparseLtDenseDescriptorInit

cusparseStatus_t
cusparseLtDenseDescriptorInit(const cusparseLtHandle_t*  handle,
                              cusparseLtMatDescriptor_t* matDescr,
                              int64_t                    rows,
                              int64_t                    cols,
                              int64_t                    ld,
                              uint32_t                   alignment,
                              cudaDataType               valueType,
                              cusparseOrder_t            order)

The function initializes the descriptor of a dense matrix.

Parameter   Memory   In/Out   Description
---------   ------   ------   -----------
handle      Host     IN       cuSPARSELt library handle
matDescr    Host     OUT      Dense matrix descriptor
rows        Host     IN       Number of rows
cols        Host     IN       Number of columns
ld          Host     IN       Leading dimension
alignment   Host     IN       Memory alignment in bytes
valueType   Host     IN       Data type of the matrix
order       Host     IN       Memory layout

See cusparseStatus_t for the description of the return status.
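
The rows, cols, ld, and order parameters together determine where each element lives in memory. The sketch below shows the usual dense-matrix addressing convention; it assumes that ld is measured in elements and that the library follows the standard row-major/column-major rules. The enum layout_t and the three helper functions are hypothetical names introduced for illustration and are not part of cuSPARSELt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cusparseOrder_t, so the sketch compiles
   without CUDA headers. */
typedef enum { LAYOUT_ROW_MAJOR, LAYOUT_COL_MAJOR } layout_t;

/* Smallest valid leading dimension: the row length for row-major storage,
   the column length for column-major storage (assumed convention). */
int64_t min_leading_dim(layout_t order, int64_t rows, int64_t cols) {
    return (order == LAYOUT_ROW_MAJOR) ? cols : rows;
}

/* Linear offset, in elements, of entry (i, j). */
int64_t element_offset(layout_t order, int64_t ld, int64_t i, int64_t j) {
    assert(i >= 0 && j >= 0 && ld > 0);
    return (order == LAYOUT_ROW_MAJOR) ? i * ld + j : i + j * ld;
}

/* Number of elements the buffer must hold for a rows x cols matrix
   stored with leading dimension ld. */
int64_t storage_elements(layout_t order, int64_t rows, int64_t cols, int64_t ld) {
    assert(ld >= min_leading_dim(order, rows, cols));
    return (order == LAYOUT_ROW_MAJOR) ? rows * ld : cols * ld;
}
```

For example, with ld = 8 the entry (2, 3) sits at offset 2 * 8 + 3 in row-major storage and at offset 2 + 3 * 8 in column-major storage.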


cusparseLtStructuredDescriptorInit

cusparseStatus_t
cusparseLtStructuredDescriptorInit(const cusparseLtHandle_t*  handle,
                                   cusparseLtMatDescriptor_t* matDescr,
                                   int64_t                    rows,
                                   int64_t                    cols,
                                   int64_t                    ld,
                                   uint32_t                   alignment,
                                   cudaDataType               valueType,
                                   cusparseOrder_t            order,
                                   cusparseLtSparsity_t       sparsity)

The function initializes the descriptor of a structured matrix.

Parameter   Memory   In/Out   Description
---------   ------   ------   -----------
handle      Host     IN       cuSPARSELt library handle
matDescr    Host     OUT      Structured matrix descriptor
rows        Host     IN       Number of rows
cols        Host     IN       Number of columns
ld          Host     IN       Leading dimension
alignment   Host     IN       Memory alignment in bytes
valueType   Host     IN       Data type of the matrix
order       Host     IN       Memory layout
sparsity    Host     IN       Matrix sparsity ratio

Sparsity ratio

Value                            Description
------------------------------   -----------
CUSPARSELT_SPARSITY_50_PERCENT   50% Sparsity Ratio (2:4 Sparse MMA)

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulDescriptorInit

cusparseStatus_t CUSPARSELT_API
cusparseLtMatmulDescriptorInit(const cusparseLtHandle_t*        handle,
                               cusparseLtMatmulDescriptor_t*    matMulDescr,
                               cusparseOperation_t              opA,
                               cusparseOperation_t              opB,
                               const cusparseLtMatDescriptor_t* matA,
                               const cusparseLtMatDescriptor_t* matB,
                               const cusparseLtMatDescriptor_t* matC,
                               const cusparseLtMatDescriptor_t* matD,
                               cusparseComputeType              computeType)

The function initializes the matrix multiplication descriptor.

Parameter     Memory   In/Out   Description
-----------   ------   ------   -----------
handle        Host     IN       cuSPARSELt library handle
matMulDescr   Host     OUT      Matrix multiplication descriptor
opA           Host     IN       Operation applied to the matrix A
opB           Host     IN       Operation applied to the matrix B
matA          Host     IN       Matrix A descriptor
matB          Host     IN       Matrix B descriptor
matC          Host     IN       Matrix C descriptor
matD          Host     IN       Matrix D descriptor
computeType   Host     IN       Compute precision

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulAlgSelectionInit

cusparseStatus_t
cusparseLtMatmulAlgSelectionInit(const cusparseLtHandle_t*           handle,
                                 cusparseLtMatmulAlgSelection_t*     algSelection,
                                 const cusparseLtMatmulDescriptor_t* matmulDescr,
                                 cusparseLtMatmulAlg_t               alg)

The function initializes the algorithm selection descriptor.

Parameter      Memory   In/Out   Description
------------   ------   ------   -----------
handle         Host     IN       cuSPARSELt library handle
algSelection   Host     OUT      Algorithm selection descriptor
matmulDescr    Host     IN       Matrix multiplication descriptor
alg            Host     IN       Algorithm mode

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulAlgSetAttribute

cusparseStatus_t
cusparseLtMatmulAlgSetAttribute(const cusparseLtHandle_t*       handle,
                                cusparseLtMatmulAlgSelection_t* algSelection,
                                cusparseLtMatmulAlgAttribute_t  attribute,
                                const void*                     data,
                                size_t                          dataSize)

The function sets the value of the specified attribute belonging to the algorithm selection descriptor.

Parameter      Memory   In/Out   Description
------------   ------   ------   -----------
handle         Host     IN       cuSPARSELt library handle
algSelection   Host     OUT      Algorithm selection descriptor
attribute      Host     IN       The attribute that will be set by this function
data           Host     IN       Pointer to the value to which the specified attribute will be set
dataSize       Host     IN       Size in bytes of the attribute value used for verification

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulAlgGetAttribute

cusparseStatus_t
cusparseLtMatmulAlgGetAttribute(const cusparseLtHandle_t*             handle,
                                const cusparseLtMatmulAlgSelection_t* algSelection,
                                cusparseLtMatmulAlgAttribute_t        attribute,
                                void*                                 data,
                                size_t                                dataSize)

The function returns the value of the queried attribute belonging to the algorithm selection descriptor.

Parameter      Memory   In/Out   Description
------------   ------   ------   -----------
handle         Host     IN       cuSPARSELt library handle
algSelection   Host     IN       Algorithm selection descriptor
attribute      Host     IN       The attribute that will be retrieved by this function
data           Host     OUT      Memory address containing the attribute value retrieved by this function
dataSize       Host     IN       Size in bytes of the attribute value used for verification

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulGetWorkspace

cusparseStatus_t
cusparseLtMatmulGetWorkspace(const cusparseLtHandle_t*             handle,
                             const cusparseLtMatmulAlgSelection_t* algSelection,
                             size_t*                               workspaceSize)

The function determines the workspace size required by the selected algorithm.

Parameter       Memory   In/Out   Description
-------------   ------   ------   -----------
handle          Host     IN       cuSPARSELt library handle
algSelection    Host     IN       Algorithm selection descriptor
workspaceSize   Host     OUT      Workspace size in bytes

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulPlanInit

cusparseStatus_t
cusparseLtMatmulPlanInit(const cusparseLtHandle_t*             handle,
                         cusparseLtMatmulPlan_t*               plan,
                         const cusparseLtMatmulDescriptor_t*   matmulDescr,
                         const cusparseLtMatmulAlgSelection_t* algSelection,
                         size_t                                workspaceSize)

The function initializes the matrix multiplication plan.

Parameter       Memory   In/Out   Description
-------------   ------   ------   -----------
handle          Host     IN       cuSPARSELt library handle
plan            Host     OUT      Matrix multiplication plan
matmulDescr     Host     IN       Matrix multiplication descriptor
algSelection    Host     IN       Algorithm selection descriptor
workspaceSize   Host     IN       Workspace size in bytes

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulPlanDestroy

cusparseStatus_t
cusparseLtMatmulPlanDestroy(const cusparseLtMatmulPlan_t* plan)
The function releases the resources used by an instance of the matrix multiplication plan. It must be the last call made with a specific plan instance.
Calling any cusparseLt function that uses cusparseLtMatmulPlan_t after cusparseLtMatmulPlanDestroy() returns an error.

Parameter   Memory   In/Out   Description
---------   ------   ------   -----------
plan        Host     IN       Matrix multiplication plan

See cusparseStatus_t for the description of the return status.


cusparseLtMatmul

cusparseStatus_t
cusparseLtMatmul(const cusparseLtHandle_t*     handle,
                 const cusparseLtMatmulPlan_t* plan,
                 const void*                   alpha,
                 const void*                   d_A,
                 const void*                   d_B,
                 const void*                   beta,
                 const void*                   d_C,
                 void*                         d_D,
                 void*                         workspace,
                 cudaStream_t*                 streams,
                 int32_t                       numStreams)

The function computes the matrix multiplication of matrices A and B to produce the output matrix D, according to the following operation:

D = \alpha op(A) * op(B) + \beta op(C)

where A, B, and C are input matrices, and \alpha and \beta are input scalars.
Note: The function currently only supports the case where C == D
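
A plain host-side reference of the operation above can be handy for checking results on small problems. The sketch below is not how the library computes the product; it assumes row-major storage, no transposition of A or B, and single-precision data, and the function name reference_matmul is hypothetical. Passing the same buffer for C and D mirrors the C == D case mentioned in the note.

```c
#include <assert.h>
#include <stddef.h>

/* Host reference: D = alpha * A * B + beta * C.
   Assumptions of this sketch: row-major storage, no transposition,
   A is m x k, B is k x n, C and D are m x n. C and D may alias. */
void reference_matmul(int m, int n, int k,
                      float alpha, const float *A, const float *B,
                      float beta, const float *C, float *D) {
    assert(m > 0 && n > 0 && k > 0);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[(size_t)i * k + p] * B[(size_t)p * n + j];
            }
            /* C is read before D is written, so aliasing is safe. */
            size_t idx = (size_t)i * n + j;
            D[idx] = alpha * acc + beta * C[idx];
        }
    }
}
```

With A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], alpha = 2, beta = 1, and C = D initialized to ones, the first output element is 2 * (1 * 5 + 2 * 7) + 1.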

Parameter    Memory   In/Out   Description
----------   ------   ------   -----------
handle       Host     IN       cuSPARSELt library handle
plan         Host     IN       Matrix multiplication plan
alpha        Host     IN       \alpha scalar used for multiplication (float data type)
d_A          Device   IN       Pointer to the structured matrix A
d_B          Device   IN       Pointer to the dense matrix B
beta         Host     IN       \beta scalar used for multiplication (float data type)
d_C          Device   IN       Pointer to the dense matrix C
d_D          Device   OUT      Pointer to the dense matrix D
workspace    Device   IN       Pointer to workspace
streams      Host     IN       Pointer to CUDA stream array for the computation
numStreams   Host     IN       Number of CUDA streams in streams

Datatypes Supported:

Input         Output        Compute
-----------   -----------   --------------------
CUDA_R_16F    CUDA_R_16F    CUSPARSE_COMPUTE_16F
CUDA_R_16BF   CUDA_R_16BF   CUSPARSE_COMPUTE_16F
CUDA_R_8I     CUDA_R_8I     CUSPARSE_COMPUTE_32I

The structured matrix A (compressed) must respect the following constraints:

  • For opA = CUSPARSE_OPERATION_NON_TRANSPOSE, each row must have at least two zero values in every four elements

  • For opA = CUSPARSE_OPERATION_TRANSPOSE, each column must have at least two zero values in every four elements

The correctness of the pruning result (matrix A) can be checked with the function cusparseLtSpMMAPruneCheck().

Limitations:

  • All pointers must be aligned to 16 bytes

  • For CUDA_R_16F and CUDA_R_16BF data types, the total size of the matrix cannot exceed 2^{31}-1 elements

  • For CUDA_R_8I data type, the total size of the matrix cannot exceed 2^{32}-1 elements

  • CUDA_R_8I data type only supports:

    • opA/opB = TN if the matrix orders are orderA/orderB = Col/Col

    • opA/opB = NT if the matrix orders are orderA/orderB = Row/Row

    • opA/opB = NN if the matrix orders are orderA/orderB = Row/Col

    • opA/opB = TT if the matrix orders are orderA/orderB = Col/Row

  • Given A of size m \times k, B of size k \times n, and C of size m \times n (regardless of opA and opB)

    • k must be a multiple of 32

    • For CUDA_R_16F and CUDA_R_16BF data types, m and n must be multiples of 8

    • For CUDA_R_8I, m and n must be multiples of 16
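
The limitations listed above can be checked on the host before any descriptor is created. The following is a minimal sketch of such a pre-flight check, written only from the rules in the list. The enums value_type_t and order_t and all four function names are hypothetical stand-ins, used so that the code compiles without CUDA headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for cudaDataType and cusparseOrder_t. */
typedef enum { TYPE_R_16F, TYPE_R_16BF, TYPE_R_8I } value_type_t;
typedef enum { ORDER_ROW, ORDER_COL } order_t;

/* Every pointer must be aligned to 16 bytes. */
bool is_aligned_16(const void *ptr) {
    return ((uintptr_t)ptr % 16u) == 0;
}

/* Shape rules for A (m x k), B (k x n), C (m x n): k is a multiple of 32;
   m and n are multiples of 8 (16F/16BF) or 16 (8I). */
bool shape_supported(value_type_t type, int64_t m, int64_t n, int64_t k) {
    if (m <= 0 || n <= 0 || k <= 0) return false;
    if (k % 32 != 0) return false;
    int64_t mn_multiple = (type == TYPE_R_8I) ? 16 : 8;
    return (m % mn_multiple == 0) && (n % mn_multiple == 0);
}

/* Element-count limit for one matrix: 2^31 - 1 (16F/16BF) or 2^32 - 1 (8I).
   The division avoids overflow in rows * cols. */
bool size_supported(value_type_t type, int64_t rows, int64_t cols) {
    if (rows <= 0 || cols <= 0) return false;
    int64_t limit = (type == TYPE_R_8I) ? (INT64_C(1) << 32) - 1
                                        : (INT64_C(1) << 31) - 1;
    return rows <= limit / cols;
}

/* CUDA_R_8I layout rule, condensed from the four listed combinations:
   A is transposed exactly when it is column-major, and
   B is transposed exactly when it is row-major. */
bool int8_layout_supported(bool transA, bool transB,
                           order_t orderA, order_t orderB) {
    return transA == (orderA == ORDER_COL) && transB == (orderB == ORDER_ROW);
}
```

For instance, m = 8, n = 16, k = 64 passes the shape check for CUDA_R_16F but not for CUDA_R_8I, because 8 is not a multiple of 16.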

Properties

  • The routine requires no extra storage

  • The routine supports asynchronous execution with respect to streams[0]

  • Provides deterministic (bit-wise) results for each run

cusparseLtMatmul supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulSearch

cusparseStatus_t
cusparseLtMatmulSearch(const cusparseLtHandle_t* handle,
                       cusparseLtMatmulPlan_t*   plan,
                       const void*               alpha,
                       const void*               d_A,
                       const void*               d_B,
                       const void*               beta,
                       const void*               d_C,
                       void*                     d_D,
                       void*                     workspace,
                       cudaStream_t*             streams,
                       int32_t                   numStreams)
The function evaluates all available algorithms for the matrix multiplication and automatically updates the plan by selecting the fastest one. It is intended for auto-tuning when the same operation is repeated multiple times over different inputs.
The function behaves the same as cusparseLtMatmul().

Helper Functions

cusparseLtSpMMAPrune

cusparseStatus_t
cusparseLtSpMMAPrune(const cusparseLtHandle_t*           handle,
                     const cusparseLtMatmulDescriptor_t* matmulDescr,
                     const void*                         d_in,
                     void*                               d_out,
                     cusparseLtPruneAlg_t                pruneAlg,
                     cudaStream_t                        stream)

The function prunes a dense matrix d_in according to the specified algorithm pruneAlg.

Parameter     Memory   In/Out   Description
-----------   ------   ------   -----------
handle        Host     IN       cuSPARSELt library handle
matmulDescr   Host     IN       Matrix multiplication descriptor
d_in          Device   IN       Pointer to the dense matrix
d_out         Device   OUT      Pointer to the pruned matrix
pruneAlg      Host     IN       Pruning algorithm
stream        Host     IN       CUDA stream for the computation

Pruning Algorithms

Value                          Description
----------------------------   -----------
CUSPARSELT_PRUNE_SPMMA_TILE    Zero out eight values in a 4x4 tile to maximize the L1-norm of the resulting tile, under the constraint of selecting exactly two elements for each row and column
CUSPARSELT_PRUNE_SPMMA_STRIP   Zero out two values in a 1x4 strip to maximize the L1-norm of the resulting strip. The strip direction is chosen according to the operation opA specified in matmulDescr

Properties

  • The routine requires no extra storage

  • The routine supports asynchronous execution with respect to stream

  • Provides deterministic (bit-wise) results for each run

cusparseLtSpMMAPrune supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.
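
To make the strip rule concrete, the following host-side sketch applies the idea behind CUSPARSELT_PRUNE_SPMMA_STRIP to a row-major float matrix: in each 1x4 strip along a row it keeps the two values of largest magnitude, which maximizes the L1-norm of what remains, and zeroes the other two. It is an illustration only, not the library's implementation. The function names are hypothetical, only the row direction is covered, and how the library breaks ties between equal magnitudes is not documented here, so the tie-breaking below is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value helper, written out to avoid a libm dependency. */
static float magnitude(float x) {
    return (x < 0.0f) ? -x : x;
}

/* Host reference for 1x4 strip pruning along rows of a row-major
   rows x cols matrix (cols must be a multiple of 4). in and out may alias. */
void prune_strip_rows(const float *in, float *out, size_t rows, size_t cols) {
    assert(cols % 4 == 0);
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; c += 4) {
            const float *src = in + r * cols + c;
            float *dst = out + r * cols + c;

            /* Index of the largest magnitude in the strip. */
            size_t first = 0;
            for (size_t t = 1; t < 4; ++t) {
                if (magnitude(src[t]) > magnitude(src[first])) first = t;
            }
            /* Index of the largest magnitude among the remaining three. */
            size_t second = (first == 0) ? 1 : 0;
            for (size_t t = 0; t < 4; ++t) {
                if (t != first && magnitude(src[t]) > magnitude(src[second])) {
                    second = t;
                }
            }
            /* Keep the two selected values, zero the other two. */
            for (size_t t = 0; t < 4; ++t) {
                dst[t] = (t == first || t == second) ? src[t] : 0.0f;
            }
        }
    }
}
```

Applied to the strip (1, -4, 3, 0.5), the sketch keeps -4 and 3 and writes (0, -4, 3, 0).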


cusparseLtSpMMAPruneCheck

cusparseStatus_t
cusparseLtSpMMAPruneCheck(const cusparseLtHandle_t*           handle,
                          const cusparseLtMatmulDescriptor_t* matmulDescr,
                          const void*                         d_in,
                          int*                                valid,
                          cudaStream_t                        stream)

The function checks the correctness of the pruning structure for a given matrix.

Parameter     Memory   In/Out   Description
-----------   ------   ------   -----------
handle        Host     IN       cuSPARSELt library handle
matmulDescr   Host     IN       Matrix multiplication descriptor
d_in          Device   IN       Pointer to the matrix to check
valid         Device   OUT      Validation results (0 correct, 1 wrong)
stream        Host     IN       CUDA stream for the computation

See cusparseStatus_t for the description of the return status.
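
For small test matrices, the 2:4 structure can also be verified on the host. The sketch below assumes a row-major float matrix and checks the row direction only: every group of four consecutive elements in a row may hold at most two non-zero values. The function name is hypothetical, and the code is a reference for the pattern, not a description of what the library does internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every group of four consecutive elements in each row
   of a row-major rows x cols matrix has at most two non-zero values.
   cols must be a multiple of 4. */
bool is_2to4_sparse_rows(const float *mat, size_t rows, size_t cols) {
    assert(cols % 4 == 0);
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; c += 4) {
            int nonzeros = 0;
            for (size_t t = 0; t < 4; ++t) {
                if (mat[r * cols + c + t] != 0.0f) ++nonzeros;
            }
            if (nonzeros > 2) return false;
        }
    }
    return true;
}
```

Note that the return convention differs from the valid output documented above: this helper returns true for a correct structure, whereas valid reports 0 for correct and 1 for wrong.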


cusparseLtSpMMACompressedSize

cusparseStatus_t
cusparseLtSpMMACompressedSize(const cusparseLtHandle_t*     handle,
                              const cusparseLtMatmulPlan_t* plan,
                              size_t*                       compressedSize)

The function provides the size of the compressed matrix to be allocated before calling cusparseLtSpMMACompress().

Parameter        Memory   In/Out   Description
--------------   ------   ------   -----------
handle           Host     IN       cuSPARSELt library handle
plan             Host     IN       Matrix multiplication plan
compressedSize   Host     OUT      Size in bytes of the compressed matrix

See cusparseStatus_t for the description of the return status.


cusparseLtSpMMACompress

cusparseStatus_t
cusparseLtSpMMACompress(const cusparseLtHandle_t*     handle,
                        const cusparseLtMatmulPlan_t* plan,
                        const void*                   d_dense,
                        void*                         d_compressed,
                        cudaStream_t                  stream)

The function compresses a dense matrix d_dense. The compressed matrix is intended to be used as the first operand A in the cusparseLtMatmul() function.

Parameter      Memory   In/Out   Description
------------   ------   ------   -----------
handle         Host     IN       cuSPARSELt library handle
plan           Host     IN       Matrix multiplication plan
d_dense        Device   IN       Pointer to the dense matrix
d_compressed   Device   OUT      Pointer to the compressed matrix
stream         Host     IN       CUDA stream for the computation

Properties

  • The routine requires no extra storage

  • The routine supports asynchronous execution with respect to stream

  • Provides deterministic (bit-wise) results for each run

cusparseLtSpMMACompress supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.