cuSPARSELt Functions

Library Management Functions

cusparseLtInit

cusparseStatus_t
cusparseLtInit(cusparseLtHandle_t* handle)

The function initializes the cuSPARSELt library handle (cusparseLtHandle_t), which holds the cuSPARSELt library context. It allocates light hardware resources on the host and must be called before any other cuSPARSELt library call. Calling any cusparseLt function that uses cusparseLtHandle_t without a prior call to cusparseLtInit() returns an error.
The cuSPARSELt library context is tied to the current CUDA device. To use the library on multiple devices, create one cuSPARSELt handle for each device.

Parameter | Memory | In/Out | Description
handle | Host | OUT | cuSPARSELt library handle

See cusparseStatus_t for the description of the return status.


cusparseLtDestroy

cusparseStatus_t
cusparseLtDestroy(const cusparseLtHandle_t* handle)

The function releases the hardware resources used by the cuSPARSELt library. It must be the last call made with a particular cuSPARSELt handle.
Calling any cusparseLt function that uses cusparseLtHandle_t after cusparseLtDestroy() returns an error.

Parameter | Memory | In/Out | Description
handle | Host | IN | cuSPARSELt library handle

See cusparseStatus_t for the description of the return status.


cusparseLtGetVersion

cusparseStatus_t
cusparseLtGetVersion(const cusparseLtHandle_t* handle,
                     int*                      version)

This function returns the version number of the cuSPARSELt library.

Parameter | Memory | In/Out | Description
handle | Host | IN | cuSPARSELt library handle
version | Host | OUT | The version number of the library

See cusparseStatus_t for the description of the return status.


cusparseLtGetProperty

cusparseStatus_t
cusparseLtGetProperty(libraryPropertyType propertyType,
                      int*                value)

The function returns the value of the requested property. Refer to libraryPropertyType for supported types.

Parameter | Memory | In/Out | Description
propertyType | Host | IN | Requested property
value | Host | OUT | Value of the requested property

libraryPropertyType (defined in library_types.h):

Value | Meaning
MAJOR_VERSION | Enumerator to query the major version
MINOR_VERSION | Enumerator to query the minor version
PATCH_LEVEL | Number to identify the patch level

See cusparseStatus_t for the description of the return status.


Matrix Descriptor Functions

cusparseLtDenseDescriptorInit

cusparseStatus_t
cusparseLtDenseDescriptorInit(const cusparseLtHandle_t*  handle,
                              cusparseLtMatDescriptor_t* matDescr,
                              int64_t                    rows,
                              int64_t                    cols,
                              int64_t                    ld,
                              uint32_t                   alignment,
                              cudaDataType               valueType,
                              cusparseOrder_t            order)

The function initializes the descriptor of a dense matrix.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
matDescr | Host | OUT | Dense matrix descriptor |
rows | Host | IN | Number of rows |
cols | Host | IN | Number of columns |
ld | Host | IN | Leading dimension | rows if column-major, cols if row-major
alignment | Host | IN | Memory alignment in bytes | Multiple of 16
valueType | Host | IN | Data type of the matrix | CUDA_R_32F, CUDA_R_16F, CUDA_R_16BF, CUDA_R_8I, CUDA_R_8F_E4M3, CUDA_R_8F_E5M2
order | Host | IN | Memory layout | CUSPARSE_ORDER_COL, CUSPARSE_ORDER_ROW

Constraints:

  • rows, cols, and ld must be a multiple of

    • 16 if valueType is CUDA_R_8I, CUDA_R_8F_E4M3 or CUDA_R_8F_E5M2

    • 8 if valueType is CUDA_R_16F or CUDA_R_16BF

    • 4 if valueType is CUDA_R_32F

See cusparseStatus_t for the description of the return status.
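
How order and ld together locate an element can be sketched on the host. This is a minimal illustration, assuming the conventional meaning of a leading dimension (the stride between consecutive columns in column-major storage, or between consecutive rows in row-major storage). The names layout_t and element_offset are hypothetical and are not part of the cuSPARSELt API.

```c
#include <stdint.h>

/* Hypothetical stand-in for cusparseOrder_t (not the library type). */
typedef enum { LAYOUT_COL_MAJOR, LAYOUT_ROW_MAJOR } layout_t;

/* Linear offset, in elements, of entry (row, col) for a matrix stored with
   leading dimension ld. Assumes the conventional stride interpretation of ld. */
static int64_t element_offset(int64_t row, int64_t col, int64_t ld,
                              layout_t layout) {
    return (layout == LAYOUT_COL_MAJOR) ? row + col * ld   /* columns are ld apart */
                                        : row * ld + col;  /* rows are ld apart */
}
```

For example, with ld = 8, entry (2, 3) sits at offset 26 in column-major storage and at offset 19 in row-major storage.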


cusparseLtStructuredDescriptorInit

cusparseStatus_t
cusparseLtStructuredDescriptorInit(const cusparseLtHandle_t*  handle,
                                   cusparseLtMatDescriptor_t* matDescr,
                                   int64_t                    rows,
                                   int64_t                    cols,
                                   int64_t                    ld,
                                   uint32_t                   alignment,
                                   cudaDataType               valueType,
                                   cusparseOrder_t            order,
                                   cusparseLtSparsity_t       sparsity)

The function initializes the descriptor of a structured matrix.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
matDescr | Host | OUT | Structured (sparse) matrix descriptor |
rows | Host | IN | Number of rows |
cols | Host | IN | Number of columns |
ld | Host | IN | Leading dimension | rows if column-major, cols if row-major
alignment | Host | IN | Memory alignment in bytes | Multiple of 16
valueType | Host | IN | Data type of the matrix | CUDA_R_32F, CUDA_R_16F, CUDA_R_16BF, CUDA_R_8I, CUDA_R_8F_E4M3, CUDA_R_8F_E5M2
order | Host | IN | Memory layout | CUSPARSE_ORDER_COL, CUSPARSE_ORDER_ROW
sparsity | Host | IN | Matrix sparsity ratio | CUSPARSELT_SPARSITY_50_PERCENT

Constraints:

  • rows, cols, and ld must be a multiple of

    • 32 if valueType is CUDA_R_8I, CUDA_R_8F_E4M3 or CUDA_R_8F_E5M2

    • 16 if valueType is CUDA_R_16F or CUDA_R_16BF

    • 8 if valueType is CUDA_R_32F

See cusparseStatus_t for the description of the return status.
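
The size constraints of cusparseLtDenseDescriptorInit() and cusparseLtStructuredDescriptorInit() can be checked on the host before a descriptor is created. The sketch below is only an illustration of the two constraint lists. The enum elem_width_t is a hypothetical stand-in that groups the cudaDataType values by element width, and the function names are not part of the library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical grouping of valueType by element width:
   8-bit  = CUDA_R_8I, CUDA_R_8F_E4M3, CUDA_R_8F_E5M2
   16-bit = CUDA_R_16F, CUDA_R_16BF
   32-bit = CUDA_R_32F */
typedef enum { ELEM_8BIT, ELEM_16BIT, ELEM_32BIT } elem_width_t;

/* Multiple that rows, cols, and ld must satisfy. The structured lists use
   twice the dense values (32/16/8 versus 16/8/4). */
static int64_t required_multiple(elem_width_t width, bool structured) {
    int64_t dense = (width == ELEM_8BIT) ? 16 : (width == ELEM_16BIT) ? 8 : 4;
    return structured ? 2 * dense : dense;
}

/* True if rows, cols, and ld are positive and all multiples of the
   required value for this element width and descriptor kind. */
static bool dims_are_valid(int64_t rows, int64_t cols, int64_t ld,
                           elem_width_t width, bool structured) {
    int64_t m = required_multiple(width, structured);
    return rows > 0 && cols > 0 && ld > 0 &&
           rows % m == 0 && cols % m == 0 && ld % m == 0;
}
```

For instance, a 24 x 32 CUDA_R_16F matrix with ld = 24 passes the dense check (multiple of 8) but fails the structured one (multiple of 16).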


cusparseLtMatDescriptorDestroy

cusparseStatus_t
cusparseLtMatDescriptorDestroy(const cusparseLtMatDescriptor_t* matDescr)

The function releases the resources used by an instance of a matrix descriptor. After this call, the matrix descriptor, the matmul descriptor, and the plan can no longer be used.

Parameter | Memory | In/Out | Description
matDescr | Host | IN | Matrix descriptor

See cusparseStatus_t for the description of the return status.


cusparseLtMatDescSetAttribute

cusparseStatus_t
cusparseLtMatDescSetAttribute(const cusparseLtHandle_t*    handle,
                              cusparseLtMatDescriptor_t*   matmulDescr,
                              cusparseLtMatDescAttribute_t matAttribute,
                              const void*                  data,
                              size_t                       dataSize)

The function sets the value of the specified attribute belonging to the matrix descriptor, such as the number of batches and their stride.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
matmulDescr | Host | OUT | Matrix descriptor |
matAttribute | Host | IN | Attribute to set | CUSPARSELT_MAT_NUM_BATCHES, CUSPARSELT_MAT_BATCH_STRIDE
data | Host | IN | Pointer to the value to which the specified attribute will be set |
dataSize | Host | IN | Size in bytes of the attribute value used for verification |

See cusparseStatus_t for the description of the return status.


cusparseLtMatDescGetAttribute

cusparseStatus_t
cusparseLtMatDescGetAttribute(const cusparseLtHandle_t*        handle,
                              const cusparseLtMatDescriptor_t* matmulDescr,
                              cusparseLtMatDescAttribute_t     matAttribute,
                              void*                            data,
                              size_t                           dataSize)

The function gets the value of the specified attribute belonging to the matrix descriptor, such as the number of batches and their stride.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
matmulDescr | Host | IN | Matrix descriptor |
matAttribute | Host | IN | Attribute to retrieve | CUSPARSELT_MAT_NUM_BATCHES, CUSPARSELT_MAT_BATCH_STRIDE
data | Host | OUT | Memory address containing the attribute value retrieved by this function |
dataSize | Host | IN | Size in bytes of the attribute value used for verification |

See cusparseStatus_t for the description of the return status.


Matmul Descriptor Functions

cusparseLtMatmulDescriptorInit

cusparseStatus_t
cusparseLtMatmulDescriptorInit(const cusparseLtHandle_t*        handle,
                               cusparseLtMatmulDescriptor_t*    matmulDescr,
                               cusparseOperation_t              opA,
                               cusparseOperation_t              opB,
                               const cusparseLtMatDescriptor_t* matA,
                               const cusparseLtMatDescriptor_t* matB,
                               const cusparseLtMatDescriptor_t* matC,
                               const cusparseLtMatDescriptor_t* matD,
                               cusparseComputeType              computeType)

The function initializes the matrix multiplication descriptor.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
matmulDescr | Host | OUT | Matrix multiplication descriptor |
opA | Host | IN | Operation applied to the matrix A | CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE
opB | Host | IN | Operation applied to the matrix B | CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE
matA | Host | IN | Structured or dense matrix descriptor A |
matB | Host | IN | Structured or dense matrix descriptor B |
matC | Host | IN | Dense matrix descriptor C |
matD | Host | IN | Dense matrix descriptor D |
computeType | Host | IN | Compute precision | CUSPARSE_COMPUTE_32I, CUSPARSE_COMPUTE_32F, CUSPARSE_COMPUTE_16F

The structured matrix descriptor can be used for matA or matB, but not both.

Constraints:

  • See cusparseLtMatmul() for the supported data types.

  • The CUDA_R_8I, CUDA_R_8F_E4M3, and CUDA_R_8F_E5M2 data types support only the following combinations when A is structured (the opposite if B is structured):

    • opA/opB = TN if the matrix orders are orderA/orderB = Col/Col

    • opA/opB = NT if the matrix orders are orderA/orderB = Row/Row

    • opA/opB = NN if the matrix orders are orderA/orderB = Row/Col

    • opA/opB = TT if the matrix orders are orderA/orderB = Col/Row

  • C and D must have the same leading dimension and memory layout (see cusparseOrder_t for different memory layouts).

  • The maximum number of elements for each dimension (rows and columns) of matrices C and D is limited to 2097120.
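
The four 8-bit operation/order combinations listed above follow a simple pattern: A is either non-transposed and row-major or transposed and column-major, while B is either non-transposed and column-major or transposed and row-major. The sketch below encodes that pattern for the case where A is the structured operand. The enum and function names are hypothetical stand-ins for cusparseOperation_t and cusparseOrder_t, and the case where B is structured is deliberately not covered.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for cusparseOperation_t and cusparseOrder_t. */
typedef enum { OP_N, OP_T } lt_op;
typedef enum { ORDER_COL, ORDER_ROW } lt_order;

/* True only for the four listed combinations (A structured):
   TN with Col/Col, NT with Row/Row, NN with Row/Col, TT with Col/Row. */
static bool is_supported_8bit_layout(lt_op opA, lt_op opB,
                                     lt_order orderA, lt_order orderB) {
    bool a_ok = (opA == OP_N) == (orderA == ORDER_ROW); /* N+Row or T+Col */
    bool b_ok = (opB == OP_N) == (orderB == ORDER_COL); /* N+Col or T+Row */
    return a_ok && b_ok;
}
```

Of the sixteen possible opA/opB/orderA/orderB combinations, this predicate accepts exactly the four in the list.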

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulDescSetAttribute

cusparseStatus_t
cusparseLtMatmulDescSetAttribute(const cusparseLtHandle_t*       handle,
                                 cusparseLtMatmulDescriptor_t*   matmulDescr,
                                 cusparseLtMatmulDescAttribute_t matmulAttribute,
                                 const void*                     data,
                                 size_t                          dataSize)

The function sets the value of the specified attribute belonging to the matrix multiplication descriptor, such as the activation function and bias.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
matmulDescr | Host | OUT | Matrix multiplication descriptor |
matmulAttribute | Host | IN | Attribute to set | CUSPARSELT_MATMUL_ACTIVATION_RELU, CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND, CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD, CUSPARSELT_MATMUL_ACTIVATION_GELU, CUSPARSELT_MATMUL_ACTIVATION_GELU_SCALING, CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING, CUSPARSELT_MATMUL_BETA_VECTOR_SCALING, CUSPARSELT_MATMUL_BIAS_POINTER, CUSPARSELT_MATMUL_BIAS_STRIDE
data | Host | IN | Pointer to the value to which the specified attribute will be set |
dataSize | Host | IN | Size in bytes of the attribute value used for verification |

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulDescGetAttribute

cusparseStatus_t
cusparseLtMatmulDescGetAttribute(const cusparseLtHandle_t*           handle,
                                 const cusparseLtMatmulDescriptor_t* matmulDescr,
                                 cusparseLtMatmulDescAttribute_t     matmulAttribute,
                                 void*                               data,
                                 size_t                              dataSize)

The function gets the value of the specified attribute belonging to the matrix multiplication descriptor, such as the activation function and bias.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
matmulDescr | Host | IN | Matrix multiplication descriptor |
matmulAttribute | Host | IN | Attribute to retrieve | CUSPARSELT_MATMUL_ACTIVATION_RELU, CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND, CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD, CUSPARSELT_MATMUL_ACTIVATION_GELU, CUSPARSELT_MATMUL_ACTIVATION_GELU_SCALING, CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING, CUSPARSELT_MATMUL_BETA_VECTOR_SCALING, CUSPARSELT_MATMUL_BIAS_POINTER, CUSPARSELT_MATMUL_BIAS_STRIDE
data | Host | OUT | Memory address containing the attribute value retrieved by this function |
dataSize | Host | IN | Size in bytes of the attribute value used for verification |

See cusparseStatus_t for the description of the return status.


Matmul Algorithm Functions

cusparseLtMatmulAlgSelectionInit

cusparseStatus_t
cusparseLtMatmulAlgSelectionInit(const cusparseLtHandle_t*           handle,
                                 cusparseLtMatmulAlgSelection_t*     algSelection,
                                 const cusparseLtMatmulDescriptor_t* matmulDescr,
                                 cusparseLtMatmulAlg_t               alg)

The function initializes the algorithm selection descriptor.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
algSelection | Host | OUT | Algorithm selection descriptor |
matmulDescr | Host | IN | Matrix multiplication descriptor |
alg | Host | IN | Algorithm mode | CUSPARSELT_MATMUL_ALG_DEFAULT

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulAlgSetAttribute

cusparseStatus_t
cusparseLtMatmulAlgSetAttribute(const cusparseLtHandle_t*       handle,
                                cusparseLtMatmulAlgSelection_t* algSelection,
                                cusparseLtMatmulAlgAttribute_t  attribute,
                                const void*                     data,
                                size_t                          dataSize)

The function sets the value of the specified attribute belonging to the algorithm selection descriptor.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
algSelection | Host | OUT | Algorithm selection descriptor |
attribute | Host | IN | The attribute to set | CUSPARSELT_MATMUL_ALG_CONFIG_ID, CUSPARSELT_MATMUL_SEARCH_ITERATIONS, CUSPARSELT_MATMUL_SPLIT_K, CUSPARSELT_MATMUL_SPLIT_K_MODE, CUSPARSELT_MATMUL_SPLIT_K_BUFFERS
data | Host | IN | Pointer to the value to which the specified attribute will be set |
dataSize | Host | IN | Size in bytes of the attribute value used for verification |

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulAlgGetAttribute

cusparseStatus_t
cusparseLtMatmulAlgGetAttribute(const cusparseLtHandle_t*             handle,
                                const cusparseLtMatmulAlgSelection_t* algSelection,
                                cusparseLtMatmulAlgAttribute_t        attribute,
                                void*                                 data,
                                size_t                                dataSize)

The function returns the value of the queried attribute belonging to the algorithm selection descriptor.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
algSelection | Host | IN | Algorithm selection descriptor |
attribute | Host | IN | The attribute that will be retrieved by this function | CUSPARSELT_MATMUL_ALG_CONFIG_ID, CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID, CUSPARSELT_MATMUL_SEARCH_ITERATIONS, CUSPARSELT_MATMUL_SPLIT_K, CUSPARSELT_MATMUL_SPLIT_K_MODE, CUSPARSELT_MATMUL_SPLIT_K_BUFFERS
data | Host | OUT | Memory address containing the attribute value retrieved by this function |
dataSize | Host | IN | Size in bytes of the attribute value used for verification |

See cusparseStatus_t for the description of the return status.


Matmul Functions

cusparseLtMatmulGetWorkspace

cusparseStatus_t
cusparseLtMatmulGetWorkspace(const cusparseLtHandle_t*     handle,
                             const cusparseLtMatmulPlan_t* plan,
                             size_t*                       workspaceSize)

The function determines the workspace size required by the selected algorithm.

Parameter | Memory | In/Out | Description
handle | Host | IN | cuSPARSELt library handle
plan | Host | IN | Matrix multiplication plan
workspaceSize | Host | OUT | Workspace size in bytes

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulPlanInit

cusparseStatus_t
cusparseLtMatmulPlanInit(const cusparseLtHandle_t*             handle,
                         cusparseLtMatmulPlan_t*               plan,
                         const cusparseLtMatmulDescriptor_t*   matmulDescr,
                         const cusparseLtMatmulAlgSelection_t* algSelection)

The function initializes the matrix multiplication plan from the matrix multiplication descriptor and the algorithm selection descriptor.

Parameter | Memory | In/Out | Description
handle | Host | IN | cuSPARSELt library handle
plan | Host | OUT | Matrix multiplication plan
matmulDescr | Host | IN | Matrix multiplication descriptor
algSelection | Host | IN | Algorithm selection descriptor

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulPlanDestroy

cusparseStatus_t
cusparseLtMatmulPlanDestroy(const cusparseLtMatmulPlan_t* plan)

The function releases the resources used by an instance of the matrix multiplication plan. It must be the last call made with a specific plan instance.
Calling any cusparseLt function that uses cusparseLtMatmulPlan_t after cusparseLtMatmulPlanDestroy() returns an error.

Parameter | Memory | In/Out | Description
plan | Host | IN | Matrix multiplication plan

See cusparseStatus_t for the description of the return status.


cusparseLtMatmul

cusparseStatus_t
cusparseLtMatmul(const cusparseLtHandle_t*     handle,
                 const cusparseLtMatmulPlan_t* plan,
                 const void*                   alpha,
                 const void*                   d_A,
                 const void*                   d_B,
                 const void*                   beta,
                 const void*                   d_C,
                 void*                         d_D,
                 void*                         workspace,
                 cudaStream_t*                 streams,
                 int32_t                       numStreams)

The function computes the matrix multiplication of matrices A and B to produce the output matrix D, according to the following operation:

D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias)

where A, B, and C are input matrices, and \alpha and \beta are input scalars or vectors of scalars (device-side pointers).
As described in cusparseLtMatmulDescriptorInit(), exactly one of the input matrices A or B must have structured sparsity, and the corresponding d_A or d_B pointer must be the output of cusparseLtSpMMACompress() or cusparseLtSpMMACompress2().
Note: The function currently supports only the case where D has the same shape as C.

Parameter | Memory | In/Out | Description
handle | Host | IN | cuSPARSELt library handle
plan | Host | IN | Matrix multiplication plan
alpha | Host/Device | IN | \alpha scalar/vector of scalars used for multiplication (float data type). If alpha is a scalar, it must be a pointer to host memory; otherwise, a pointer to device memory
d_A | Device | IN | Pointer to the structured or dense matrix A
d_B | Device | IN | Pointer to the structured or dense matrix B
beta | Host/Device | IN | \beta scalar/vector of scalars used for multiplication (float data type). It can have a NULL value only if CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING is set without CUSPARSELT_MATMUL_BETA_VECTOR_SCALING. If beta is a scalar, it must be a pointer to host memory; otherwise, a pointer to device memory
d_C | Device | IN | Pointer to the dense matrix C
d_D | Device | OUT | Pointer to the dense matrix D
workspace | Device | IN | Pointer to workspace
streams | Host | IN | Pointer to CUDA stream array for the computation
numStreams | Host | IN | Number of CUDA streams in streams

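Supported data types:

Input A/B | Input C | Output D | Compute
CUDA_R_32F | CUDA_R_32F | CUDA_R_32F | CUSPARSE_COMPUTE_32F
CUDA_R_16F | CUDA_R_16F | CUDA_R_16F | CUSPARSE_COMPUTE_32F, CUSPARSE_COMPUTE_16F
CUDA_R_16BF | CUDA_R_16BF | CUDA_R_16BF | CUSPARSE_COMPUTE_32F
CUDA_R_8I | CUDA_R_8I | CUDA_R_8I | CUSPARSE_COMPUTE_32I
CUDA_R_8I | CUDA_R_32I | CUDA_R_32I | CUSPARSE_COMPUTE_32I
CUDA_R_8I | CUDA_R_16F | CUDA_R_16F | CUSPARSE_COMPUTE_32I
CUDA_R_8I | CUDA_R_16BF | CUDA_R_16BF | CUSPARSE_COMPUTE_32I
CUDA_R_8F_E4M3 | CUDA_R_16F | CUDA_R_8F_E4M3 | CUSPARSE_COMPUTE_32F
CUDA_R_8F_E4M3 | CUDA_R_16BF | CUDA_R_8F_E4M3 | CUSPARSE_COMPUTE_32F
CUDA_R_8F_E4M3 | CUDA_R_16F | CUDA_R_16F | CUSPARSE_COMPUTE_32F
CUDA_R_8F_E4M3 | CUDA_R_16BF | CUDA_R_16BF | CUSPARSE_COMPUTE_32F
CUDA_R_8F_E4M3 | CUDA_R_32F | CUDA_R_32F | CUSPARSE_COMPUTE_32F
CUDA_R_8F_E5M2 | CUDA_R_16F | CUDA_R_8F_E5M2 | CUSPARSE_COMPUTE_32F
CUDA_R_8F_E5M2 | CUDA_R_16BF | CUDA_R_8F_E5M2 | CUSPARSE_COMPUTE_32F
CUDA_R_8F_E5M2 | CUDA_R_16F | CUDA_R_16F | CUSPARSE_COMPUTE_32F
CUDA_R_8F_E5M2 | CUDA_R_16BF | CUDA_R_16BF | CUSPARSE_COMPUTE_32F
CUDA_R_8F_E5M2 | CUDA_R_32F | CUDA_R_32F | CUSPARSE_COMPUTE_32F

For a detailed list of which GPU Compute Capabilities support which data type combinations, see Key Features.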

Constraints:

  • All pointers must be aligned to 16 bytes
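
A quick host-side way to test the 16-byte requirement is to inspect the numeric value of the pointer. The helper below is a minimal sketch with a hypothetical name. It only inspects the address and works the same way for host and device pointers, since it never dereferences them.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the address stored in ptr is a multiple of 16 bytes.
   The pointer is never dereferenced, so device pointers are fine. */
static bool is_aligned_16(const void *ptr) {
    return ((uintptr_t)ptr % 16u) == 0;
}
```

Allocations returned by cudaMalloc() are typically aligned well beyond 16 bytes, so a failing check usually points to an offset into a larger buffer (for example a sub-matrix or a batch slice) rather than to the allocation itself.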

Properties

  • The routine requires no extra storage

  • The routine supports asynchronous execution with respect to streams[0]

  • Provides deterministic (bit-wise) results for each run

cusparseLtMatmul supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.
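
The operation computed by cusparseLtMatmul() can be mirrored by a small host reference, which is useful for validating results on tiny inputs. The sketch below makes several simplifying assumptions that are not taken from the library: op(A) = A and op(B) = B, row-major storage with tightly packed leading dimensions, scalar alpha and beta, float data throughout, ReLU as the activation, and a bias vector with one entry per row of D. The function name is hypothetical.

```c
#include <stddef.h>

/* Host reference for D = ReLU(alpha * A * B + beta * C + bias).
   A is m x k, B is k x n, C and D are m x n, all row-major and packed.
   Assumption: bias has m entries (one per row of D); pass NULL for no bias.
   C and D may alias, since each C entry is read before D is written. */
static void matmul_reference(size_t m, size_t n, size_t k,
                             float alpha, const float *A, const float *B,
                             float beta, const float *C,
                             const float *bias, float *D) {
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];   /* dot product of row i and column j */
            }
            float value = alpha * acc + beta * C[i * n + j];
            if (bias != NULL) {
                value += bias[i];
            }
            D[i * n + j] = value > 0.0f ? value : 0.0f;  /* ReLU */
        }
    }
}
```

Because the library may accumulate in a different order or precision, a comparison against such a reference should normally use a tolerance rather than exact equality.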


cusparseLtMatmulSearch

cusparseStatus_t
cusparseLtMatmulSearch(const cusparseLtHandle_t* handle,
                       cusparseLtMatmulPlan_t*   plan,
                       const void*               alpha,
                       const void*               d_A,
                       const void*               d_B,
                       const void*               beta,
                       const void*               d_C,
                       void*                     d_D,
                       void*                     workspace,
                       cudaStream_t*             streams,
                       int32_t                   numStreams)

The function evaluates all available algorithms for the matrix multiplication described by plan and automatically updates the cusparseLtMatmulAlgSelection_t used to initialize the plan by selecting the fastest one. The functionality is intended to be used for auto-tuning purposes when the same operation is repeated multiple times over different inputs.
The function behaves similarly to cusparseLtMatmul(), with the difference that d_D values may accumulate if the operation is performed in-place (d_C=d_D).

Parameter | Memory | In/Out | Description
handle | Host | IN | cuSPARSELt library handle
plan | Host | OUT | Matrix multiplication plan
alpha | Host | IN | \alpha scalar/vector of scalars used for multiplication (float data type)
d_A | Device | IN | Pointer to the structured or dense matrix A
d_B | Device | IN | Pointer to the structured or dense matrix B
beta | Host | IN | \beta scalar/vector of scalars used for multiplication (float data type). It can have a NULL value only if CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING is set without CUSPARSELT_MATMUL_BETA_VECTOR_SCALING
d_C | Device | IN | Pointer to the dense matrix C
d_D | Device | OUT | Pointer to the dense matrix D
workspace | Device | IN | Pointer to workspace
streams | Host | IN | Pointer to CUDA stream array for the computation
numStreams | Host | IN | Number of CUDA streams in streams


Helper Functions

cusparseLtSpMMAPrune

cusparseStatus_t
cusparseLtSpMMAPrune(const cusparseLtHandle_t*           handle,
                     const cusparseLtMatmulDescriptor_t* matmulDescr,
                     const void*                         d_in,
                     void*                               d_out,
                     cusparseLtPruneAlg_t                pruneAlg,
                     cudaStream_t                        stream)

The function prunes a dense matrix d_in according to the specified algorithm pruneAlg.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
matmulDescr | Host | IN | Matrix multiplication descriptor |
d_in | Device | IN | Pointer to the dense matrix |
d_out | Device | OUT | Pointer to the pruned matrix |
pruneAlg | Host | IN | Pruning algorithm | CUSPARSELT_PRUNE_SPMMA_TILE, CUSPARSELT_PRUNE_SPMMA_STRIP
stream | Host | IN | CUDA stream for the computation |

Properties

  • The routine requires no extra storage

  • The routine supports asynchronous execution with respect to stream

  • Provides deterministic (bit-wise) results for each run

cusparseLtSpMMAPrune() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.
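
To make the idea of pruning concrete, the following host-side sketch zeroes the two smallest-magnitude values in every group of four consecutive row elements of an int8 matrix, which yields a 50% structured pattern. This is an illustrative magnitude-based scheme written for this section. It is not the library's implementation and does not claim to reproduce either pruneAlg value. The function names are hypothetical, and the matrix is assumed to be row-major with a column count divisible by four.

```c
#include <stddef.h>
#include <stdint.h>

/* Absolute value computed in int so that -128 does not overflow. */
static int magnitude(int8_t v) {
    return v < 0 ? -(int)v : (int)v;
}

/* Keep the two largest-magnitude entries of each group of four consecutive
   elements in every row and set the other two to zero (in place). */
static void prune_2_4_rows(int8_t *matrix, size_t rows, size_t cols) {
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c + 4 <= cols; c += 4) {
            int8_t *group = &matrix[r * cols + c];
            size_t first = 0, second = 1;  /* indices of largest, second largest */
            if (magnitude(group[1]) > magnitude(group[0])) {
                first = 1;
                second = 0;
            }
            for (size_t i = 2; i < 4; ++i) {
                if (magnitude(group[i]) > magnitude(group[first])) {
                    second = first;
                    first = i;
                } else if (magnitude(group[i]) > magnitude(group[second])) {
                    second = i;
                }
            }
            for (size_t i = 0; i < 4; ++i) {
                if (i != first && i != second) {
                    group[i] = 0;
                }
            }
        }
    }
}
```

Applied to the row {1, -5, 3, 2, 0, 0, 7, -7}, the sketch produces {0, -5, 3, 0, 0, 0, 7, -7}.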


cusparseLtSpMMAPrune2 [DEPRECATED]

cusparseStatus_t
cusparseLtSpMMAPrune2(const cusparseLtHandle_t*        handle,
                      const cusparseLtMatDescriptor_t* sparseMatDescr,
                      int                              isSparseA,
                      cusparseOperation_t              op,
                      const void*                      d_in,
                      void*                            d_out,
                      cusparseLtPruneAlg_t             pruneAlg,
                      cudaStream_t                     stream)

The function prunes a dense matrix d_in according to the specified algorithm pruneAlg.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
sparseMatDescr | Host | IN | Structured (sparse) matrix descriptor |
isSparseA | Host | IN | Specify if the structured (sparse) matrix is in the first position (matA or matB) | 0: false, non-zero: true
op | Host | IN | Operation that will be applied to the structured (sparse) matrix in the multiplication | CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE
d_in | Device | IN | Pointer to the dense matrix |
d_out | Device | OUT | Pointer to the pruned matrix |
pruneAlg | Host | IN | Pruning algorithm | CUSPARSELT_PRUNE_SPMMA_TILE, CUSPARSELT_PRUNE_SPMMA_STRIP
stream | Host | IN | CUDA stream for the computation |

If CUSPARSELT_PRUNE_SPMMA_TILE is used, isSparseA and op are not relevant.

The function has the same properties as cusparseLtSpMMAPrune().


cusparseLtSpMMAPruneCheck

cusparseStatus_t
cusparseLtSpMMAPruneCheck(const cusparseLtHandle_t*           handle,
                          const cusparseLtMatmulDescriptor_t* matmulDescr,
                          const void*                         d_in,
                          int*                                d_valid,
                          cudaStream_t                        stream)

The function checks the correctness of the pruning structure for a given matrix. Data pruned with cusparseLtSpMMAPrune() is guaranteed to be correct, so this function can be skipped in that case.

Parameter | Memory | In/Out | Description
handle | Host | IN | cuSPARSELt library handle
matmulDescr | Host | IN | Matrix multiplication descriptor
d_in | Device | IN | Pointer to the matrix to check
d_valid | Device | OUT | Validation results (0 correct, 1 wrong)
stream | Host | IN | CUDA stream for the computation

See cusparseStatus_t for the description of the return status.


cusparseLtSpMMAPruneCheck2 [DEPRECATED]

cusparseStatus_t
cusparseLtSpMMAPruneCheck2(const cusparseLtHandle_t*        handle,
                           const cusparseLtMatDescriptor_t* sparseMatDescr,
                           int                              isSparseA,
                           cusparseOperation_t              op,
                           const void*                      d_in,
                           int*                             d_valid,
                           cudaStream_t                     stream)

The function checks the correctness of the pruning structure for a given matrix.

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
sparseMatDescr | Host | IN | Structured (sparse) matrix descriptor |
isSparseA | Host | IN | Specify if the structured (sparse) matrix is in the first position (matA or matB) | 0: false, non-zero: true
op | Host | IN | Operation that will be applied to the structured (sparse) matrix in the multiplication | CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE
d_in | Device | IN | Pointer to the matrix to check |
d_valid | Device | OUT | Validation results (0 correct, 1 wrong) |
stream | Host | IN | CUDA stream for the computation |

The function has the same properties as cusparseLtSpMMAPruneCheck().


cusparseLtSpMMACompressedSize

cusparseStatus_t
cusparseLtSpMMACompressedSize(const cusparseLtHandle_t*     handle,
                              const cusparseLtMatmulPlan_t* plan,
                              size_t*                       compressedSize,
                              size_t*                       compressBufferSize)

The function provides the size of the compressed matrix to be allocated before calling cusparseLtSpMMACompress().

Parameter | Memory | In/Out | Description
handle | Host | IN | cuSPARSELt library handle
plan | Host | IN | Matrix multiplication plan
compressedSize | Host | OUT | Size in bytes for the compressed matrix
compressBufferSize | Host | OUT | Size in bytes for the buffer needed for the matrix compression

See cusparseStatus_t for the description of the return status.


cusparseLtSpMMACompressedSize2 [DEPRECATED]

cusparseStatus_t
cusparseLtSpMMACompressedSize2(const cusparseLtHandle_t*        handle,
                               const cusparseLtMatDescriptor_t* sparseMatDescr,
                               size_t*                          compressedSize,
                               size_t*                          compressBufferSize)

The function provides the size of the compressed matrix to be allocated before calling cusparseLtSpMMACompress2(). It must be called after cusparseLtMatmulPlanInit().

Parameter | Memory | In/Out | Description
handle | Host | IN | cuSPARSELt library handle
sparseMatDescr | Host | IN | Structured (sparse) matrix descriptor
compressedSize | Host | OUT | Size in bytes of the compressed matrix
compressBufferSize | Host | OUT | Size in bytes for the buffer needed for the matrix compression

The function has the same properties as cusparseLtSpMMACompressedSize().


cusparseLtSpMMACompress

cusparseStatus_t
cusparseLtSpMMACompress(const cusparseLtHandle_t*     handle,
                        const cusparseLtMatmulPlan_t* plan,
                        const void*                   d_dense,
                        void*                         d_compressed,
                        void*                         d_compressed_buffer,
                        cudaStream_t                  stream)

The function compresses a dense matrix d_dense. The compressed matrix is intended to be used as the first/second operand A/B in the cusparseLtMatmul() or cusparseLtMatmulSearch() function.

The input matrix d_dense must be pruned either with cusparseLtSpMMAPrune() or with a custom function. The pruned data must respect the following constraints, which depend on the operation applied to this matrix in cusparseLtMatmul(), as defined by the cusparseLtMatmulDescriptor_t created in cusparseLtMatmulDescriptorInit():

  • For op = CUSPARSE_OPERATION_NON_TRANSPOSE

    • CUDA_R_16F, CUDA_R_16BF, CUDA_R_8I, CUDA_R_8F_E4M3, CUDA_R_8F_E5M2: each row must have at most two non-zero values in every four elements

    • CUDA_R_32F: each row must have at most one non-zero value in every two elements

  • For op = CUSPARSE_OPERATION_TRANSPOSE

    • CUDA_R_16F, CUDA_R_16BF, CUDA_R_8I, CUDA_R_8F_E4M3, CUDA_R_8F_E5M2: each column must have at most two non-zero values in every four elements

    • CUDA_R_32F: each column must have at most one non-zero value in every two elements

int8, e4m3 and e5m2 kernels should run at high SM clocks to maximize performance.

The correctness of the pruning result (matrix A/B) can be checked with the function cusparseLtSpMMAPruneCheck(). Note that pruning with cusparseLtSpMMAPrune() is guaranteed to be correct.

Parameter | Memory | In/Out | Description
handle | Host | IN | cuSPARSELt library handle
plan | Host | IN | Matrix multiplication plan
d_dense | Device | IN | Pointer to the dense matrix
d_compressed | Device | OUT | Pointer to the compressed matrix
d_compressed_buffer | Device | OUT | Pointer to temporary buffer for the compression
stream | Host | IN | CUDA stream for the computation

Properties

  • The routine supports asynchronous execution with respect to stream

  • Provides deterministic (bit-wise) results for each run

cusparseLtSpMMACompress() must be called again each time the algorithm ID (CUSPARSELT_MATMUL_ALG_CONFIG_ID) is updated, for example with cusparseLtMatmulAlgSetAttribute().

cusparseLtSpMMACompress() supports the following optimizations:

  • CUDA graph capture

  • Hardware Memory Compression

See cusparseStatus_t for the description of the return status.
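
For custom pruning code, the 16-bit and 8-bit row constraint can be verified on the host before compression. The sketch below takes the constraint to mean the 2:4 pattern, in which each aligned group of four consecutive row elements holds no more than two non-zero values (an assumption about the intended meaning). It handles only the non-transposed, row-major, int8 case, and the function name is hypothetical. It does not replace cusparseLtSpMMAPruneCheck().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if every aligned group of four consecutive elements in each row of a
   packed row-major int8 matrix contains at most two non-zero values. */
static bool rows_satisfy_2_4(const int8_t *matrix, size_t rows, size_t cols) {
    if (cols % 4 != 0) {
        return false;  /* groups of four must tile each row exactly */
    }
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; c += 4) {
            int nonzeros = 0;
            for (size_t i = 0; i < 4; ++i) {
                if (matrix[r * cols + c + i] != 0) {
                    ++nonzeros;
                }
            }
            if (nonzeros > 2) {
                return false;
            }
        }
    }
    return true;
}
```

A row such as {0, -5, 3, 0, 0, 0, 7, -7} passes, while {1, 2, 3, 0} fails because its single group holds three non-zero values.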


cusparseLtSpMMACompress2 [DEPRECATED]

cusparseStatus_t
cusparseLtSpMMACompress2(const cusparseLtHandle_t*        handle,
                         const cusparseLtMatDescriptor_t* sparseMatDescr,
                         int                              isSparseA,
                         cusparseOperation_t              op,
                         const void*                      d_dense,
                         void*                            d_compressed,
                         void*                            d_compressed_buffer,
                         cudaStream_t                     stream)

Parameter | Memory | In/Out | Description | Possible Values
handle | Host | IN | cuSPARSELt library handle |
sparseMatDescr | Host | IN | Structured (sparse) matrix descriptor |
isSparseA | Host | IN | Specify if the structured (sparse) matrix is in the first position (matA or matB) | 0: false, non-zero: true
op | Host | IN | Operation that will be applied to the structured (sparse) matrix in the multiplication | CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE
d_dense | Device | IN | Pointer to the dense matrix |
d_compressed | Device | OUT | Pointer to the compressed matrix |
d_compressed_buffer | Device | OUT | Pointer to temporary buffer for the compression |
stream | Host | IN | CUDA stream for the computation |

The function has the same properties as cusparseLtSpMMACompress().