cuSPARSELt Data Types

Opaque Data Structures

cusparseLtHandle_t

The cusparseLtHandle_t structure holds the cuSPARSELt library context (device properties, system information, etc.).
The handle must be initialized with cusparseLtInit() and destroyed with cusparseLtDestroy().

cusparseLtMatDescriptor_t

The cusparseLtMatDescriptor_t structure captures the shape and characteristics of a matrix.

cusparseLtMatmulDescriptor_t

The cusparseLtMatmulDescriptor_t structure holds the description of the matrix multiplication operation.
It is initialized with the cusparseLtMatmulDescriptorInit() function.

cusparseLtMatmulAlgSelection_t

The cusparseLtMatmulAlgSelection_t structure holds the description of the matrix multiplication algorithm.
It is initialized with the cusparseLtMatmulAlgSelectionInit() function.

cusparseLtMatmulPlan_t

The cusparseLtMatmulPlan_t structure holds the matrix multiplication execution plan, namely all the information necessary to execute the cusparseLtMatmul() operation.
It is initialized with cusparseLtMatmulPlanInit() and destroyed with cusparseLtMatmulPlanDestroy().

Enumerators

cusparseLtSparsity_t

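The cusparseLtSparsity_t enumerator specifies the sparsity ratio of the structured matrix, defined as

\text{sparsity ratio} = \frac{\mathit{nnz}}{\mathit{num\_rows} \times \mathit{num\_cols}}

where nnz is the number of nonzero elements in the matrix.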

Value: CUSPARSELT_SPARSITY_50_PERCENT
Description: 50% sparsity ratio (2:4 for half, bfloat16, and int; 1:2 for tf32 and float)

The sparsity property is used in the cusparseLtStructuredDescriptorInit() function.
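As a concrete reading of the ratio defined in this section, the following minimal C sketch computes nnz / (num_rows * num_cols) for a dense buffer and checks the 2:4 pattern, meaning at most two nonzeros in each group of four consecutive elements. The helper names sparsity_ratio and is_2_to_4_sparse are hypothetical and not part of cuSPARSELt. Row-major storage, with groups running along rows, is an assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a cuSPARSELt function):
   nnz / (num_rows * num_cols) for a dense row-major matrix `a`. */
double sparsity_ratio(const float *a, size_t num_rows, size_t num_cols)
{
    assert(num_rows > 0 && num_cols > 0);
    size_t nnz = 0;
    for (size_t i = 0; i < num_rows * num_cols; ++i)
        if (a[i] != 0.0f)
            ++nnz;
    return (double)nnz / (double)(num_rows * num_cols);
}

/* Hypothetical helper: returns 1 if every group of four consecutive
   elements in each row holds at most two nonzeros (2:4 pattern),
   0 otherwise. Assumes row-major storage and num_cols divisible by 4. */
int is_2_to_4_sparse(const float *a, size_t num_rows, size_t num_cols)
{
    assert(num_cols % 4 == 0);
    for (size_t r = 0; r < num_rows; ++r) {
        for (size_t c = 0; c < num_cols; c += 4) {
            int nonzeros = 0;
            for (size_t k = 0; k < 4; ++k)
                if (a[r * num_cols + c + k] != 0.0f)
                    ++nonzeros;
            if (nonzeros > 2)
                return 0;
        }
    }
    return 1;
}
```

For a 2x4 matrix with rows {1, 0, 2, 0} and {0, 3, 0, 4}, the ratio is 4/8 = 0.5 and the 2:4 check passes.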

cusparseComputeType

The cusparseComputeType enumerator specifies the compute precision mode of the matrix multiplication.

Value: CUSPARSE_COMPUTE_16F
Description:
- Default mode for 16-bit floating-point precision
- All computations and intermediate storage ensure at least 16-bit precision
- Tensor Cores will be used whenever possible

Value: CUSPARSE_COMPUTE_32I
Description:
- Default mode for 32-bit integer precision
- All computations and intermediate storage ensure at least 32-bit integer precision
- Tensor Cores will be used whenever possible

Value: CUSPARSE_COMPUTE_TF32_FAST
Description:
- Default mode for 32-bit floating-point precision
- The inputs are assumed to be directly representable in TensorFloat-32 precision; 32-bit floating-point values are truncated to TensorFloat-32 before the computation
- All computations and intermediate storage ensure at least TensorFloat-32 precision
- Tensor Cores will be used whenever possible

Value: CUSPARSE_COMPUTE_TF32
Description:
- All computations and intermediate storage ensure at least TensorFloat-32 precision
- The inputs are rounded to TensorFloat-32 precision. This mode is slower than CUSPARSE_COMPUTE_TF32_FAST but may provide more accurate results
- Tensor Cores will be used whenever possible

The compute precision is used in the cusparseLtMatmulDescriptorInit() function.
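The two TensorFloat-32 modes differ in how a 32-bit float is reduced to TF32, which keeps the sign bit, the 8 exponent bits, and the top 10 of the 23 mantissa bits. The following minimal C sketch contrasts truncation with rounding on the host. The function names are hypothetical and not part of the library. Round-to-nearest-even is an assumption, since this section does not state which rounding mode is used, and NaN or infinity inputs are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float without violating aliasing rules. */
static uint32_t float_to_bits(float x)
{
    uint32_t b;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&b, &x, sizeof b);
    return b;
}

static float bits_to_float(uint32_t b)
{
    float x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Truncation: clear the 13 low mantissa bits that TF32 does not keep. */
float tf32_truncate(float x)
{
    return bits_to_float(float_to_bits(x) & 0xFFFFE000u);
}

/* Rounding: round-to-nearest-even on the 13 dropped bits
   (the rounding mode is an assumption of this sketch). */
float tf32_round(float x)
{
    uint32_t b = float_to_bits(x);
    uint32_t lsb = (b >> 13) & 1u;   /* lowest mantissa bit that is kept */
    b += 0x0FFFu + lsb;              /* carries into bit 13 when rounding up */
    return bits_to_float(b & 0xFFFFE000u);
}
```

For x = 1 + 2^-11 + 2^-12, truncation yields 1.0 while rounding yields 1 + 2^-10, which is closer to x. This illustrates why the rounded mode can be more accurate at some extra cost.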

cusparseLtMatmulAlg_t

The cusparseLtMatmulAlg_t enumerator specifies the algorithm for matrix-matrix multiplication.

Value: CUSPARSELT_MATMUL_ALG_DEFAULT
Description: Default algorithm

The algorithm enumerator is used in the cusparseLtMatmulAlgSelectionInit() function.

cusparseLtMatmulAlgAttribute_t

The cusparseLtMatmulAlgAttribute_t enumerator specifies the matrix multiplication algorithm attributes.

Value: CUSPARSELT_MATMUL_ALG_CONFIG_ID
Description: Algorithm ID (set and query)

Value: CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID
Description: Algorithm ID limit (query only)

Value: CUSPARSELT_MATMUL_SEARCH_ITERATIONS
Description: Number of iterations (kernel launches per algorithm) for cusparseLtMatmulSearch() (default: 10)

The algorithm attribute enumerator is used in the cusparseLtMatmulAlgGetAttribute() and cusparseLtMatmulAlgSetAttribute() functions.

cusparseLtPruneAlg_t

The cusparseLtPruneAlg_t enumerator specifies the pruning algorithm to apply to the structured matrix before compression.

Value: CUSPARSELT_PRUNE_SPMMA_TILE
Description:
- half, bfloat16, int8: Zero out eight values in a 4x4 tile to maximize the L1-norm of the resulting tile, under the constraint that exactly two elements are kept in each row and each column
- float, tf32: Zero out two values in a 2x2 tile to maximize the L1-norm of the resulting tile, under the constraint that exactly one element is kept in each row and each column

Value: CUSPARSELT_PRUNE_SPMMA_STRIP
Description:
- half, bfloat16, int8: Zero out two values in a 1x4 strip to maximize the L1-norm of the resulting strip
- float, tf32: Zero out one value in a 1x2 strip to maximize the L1-norm of the resulting strip
- The strip direction is chosen according to the operation op and the matrix layout applied to the structured (sparse) matrix

The pruning algorithm is used in the cusparseLtSpMMAPrune() function.
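The selection rules for the 1x4 strip and the 4x4 tile can be sketched on the host as follows. This is a minimal illustration of the criteria described in this section, not the library's implementation. The function names are hypothetical, and the tie-breaking behavior (the first candidate found wins) is an assumption. The tile version enumerates all 6^4 combinations of per-row masks and keeps only those with exactly two kept elements per column.

```c
#include <assert.h>

static float abs_f(float x) { return x < 0.0f ? -x : x; }

/* 1x4 strip: keep the two entries of largest magnitude and zero the
   other two, which maximizes the L1 norm of the resulting strip. */
void prune_strip_1x4(float s[4])
{
    int first = 0, second = -1;
    for (int i = 1; i < 4; ++i)
        if (abs_f(s[i]) > abs_f(s[first]))
            first = i;
    for (int i = 0; i < 4; ++i)
        if (i != first && (second < 0 || abs_f(s[i]) > abs_f(s[second])))
            second = i;
    for (int i = 0; i < 4; ++i)
        if (i != first && i != second)
            s[i] = 0.0f;
}

/* The six ways to keep two of four elements in a row, as 4-bit masks. */
static const unsigned ROW_MASKS[6] = {0x3, 0x5, 0x6, 0x9, 0xA, 0xC};

/* 4x4 tile: try every mask with exactly two kept elements per row and
   per column, keep the one with the largest L1 norm, zero the rest. */
void prune_tile_4x4(float t[4][4])
{
    unsigned best[4] = {0, 0, 0, 0};
    float best_norm = -1.0f;
    for (int code = 0; code < 6 * 6 * 6 * 6; ++code) {
        unsigned m[4];
        int col_count[4] = {0, 0, 0, 0};
        float norm = 0.0f;
        int c = code;
        for (int r = 0; r < 4; ++r) {
            m[r] = ROW_MASKS[c % 6];   /* decode the mask for row r */
            c /= 6;
            for (int j = 0; j < 4; ++j)
                if (m[r] & (1u << j)) {
                    ++col_count[j];
                    norm += abs_f(t[r][j]);
                }
        }
        int valid = 1;
        for (int j = 0; j < 4; ++j)
            if (col_count[j] != 2)
                valid = 0;
        if (valid && norm > best_norm) {
            best_norm = norm;
            for (int r = 0; r < 4; ++r)
                best[r] = m[r];
        }
    }
    assert(best_norm >= 0.0f);   /* at least one valid mask always exists */
    for (int r = 0; r < 4; ++r)
        for (int j = 0; j < 4; ++j)
            if (!(best[r] & (1u << j)))
                t[r][j] = 0.0f;
}
```

For the strip {1, -5, 3, 2}, the sketch keeps -5 and 3 and zeroes the other two entries. The strip rule is a greedy per-strip choice, while the tile rule adds the column constraint and therefore needs a search over the whole tile.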