cuSPARSELt Data Types#

Opaque Data Structures#

`cusparseLtHandle_t`#

The structure holds the cuSPARSELt library context (device properties, system information, etc.).

The handle must be initialized and destroyed with the cusparseLtInit() and cusparseLtDestroy() functions, respectively.

`cusparseLtMatDescriptor_t`#

The structure captures the shape and characteristics of a matrix.

It is initialized with the cusparseLtDenseDescriptorInit() or cusparseLtStructuredDescriptorInit() function and destroyed with the cusparseLtMatDescriptorDestroy() function.

`cusparseLtMatmulDescriptor_t`#

The structure holds the description of the matrix multiplication operation.

It is initialized with cusparseLtMatmulDescriptorInit() function.

`cusparseLtMatmulAlgSelection_t`#

The structure holds the description of the matrix multiplication algorithm.

It is initialized with cusparseLtMatmulAlgSelectionInit() function.

`cusparseLtMatmulPlan_t`#

The structure holds the matrix multiplication execution plan, namely all the information necessary to execute the cusparseLtMatmul() operation.

It is initialized and destroyed with the cusparseLtMatmulPlanInit() and cusparseLtMatmulPlanDestroy() functions, respectively.

Enumerators#

`cusparseLtSparsity_t`#

The enumerator specifies the sparsity ratio of the structured matrix as:

$\text{sparsity ratio} = \frac{nnz}{num\_rows \times num\_cols}$

Value	Description
`CUSPARSELT_SPARSITY_50_PERCENT`	50% Sparsity Ratio: - paired 4:8 for `e2m1` - 2:4 for `half`, `bfloat16`, `int`, `int8`, `e4m3`, `e5m2` - 1:2 for `float`

The sparsity property is used in the cusparseLtStructuredDescriptorInit() function.
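
As a rough illustration of the ratio above and of the 2:4 pattern, the following CPU-only sketch counts nonzeros and checks each group of four elements. The function names `sparsity_ratio` and `is_2_4_sparse` are hypothetical helpers, not part of cuSPARSELt, and the sketch assumes an `int8` row-major matrix whose groups of four run along each row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ratio nnz / (num_rows * num_cols), following the formula above.
 * Hypothetical helper, not a cuSPARSELt function. */
double sparsity_ratio(const int8_t *m, size_t num_rows, size_t num_cols)
{
    assert(num_rows > 0 && num_cols > 0);
    size_t nnz = 0;
    for (size_t i = 0; i < num_rows * num_cols; ++i)
        if (m[i] != 0)
            ++nnz;
    return (double)nnz / (double)(num_rows * num_cols);
}

/* Returns 1 if every group of 4 consecutive elements in each row of a
 * row-major matrix holds at most 2 nonzeros (the 2:4 pattern), else 0.
 * Assumption: groups run along rows; the library's actual grouping
 * depends on the layout and operation. */
int is_2_4_sparse(const int8_t *m, size_t num_rows, size_t num_cols)
{
    assert(num_cols % 4 == 0);
    for (size_t r = 0; r < num_rows; ++r) {
        for (size_t c = 0; c < num_cols; c += 4) {
            int nnz = 0;
            for (size_t k = 0; k < 4; ++k)
                nnz += (m[r * num_cols + c + k] != 0);
            if (nnz > 2)
                return 0;
        }
    }
    return 1;
}
```

For the 2x4 matrix `{1, 0, 2, 0, 0, 3, 0, 4}`, `sparsity_ratio` returns 0.5 and `is_2_4_sparse` returns 1.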

`cusparseComputeType`#

The enumerator specifies the compute precision modes of the matrix multiplication.

Value	Description
`CUSPARSE_COMPUTE_32I`	- Element-wise multiplication of matrix A and B, and accumulation of the intermediate values are performed with 32-bit integer precision. - Alpha and beta coefficients, and epilogue are performed with single precision floating-point. - Tensor Cores will be used whenever possible.
`CUSPARSE_COMPUTE_32F`	- Element-wise multiplication of matrix A and B, and accumulation of the intermediate values are performed with single precision floating-point. - Alpha and beta coefficients, and epilogue are performed with single precision floating-point. - Tensor Cores will be used whenever possible.
`CUSPARSE_COMPUTE_16F`	- Element-wise multiplication of matrix A and B, and accumulation of the intermediate values are performed with half precision floating-point. - Alpha and beta coefficients, and epilogue are performed with single precision floating-point. - Tensor Cores will be used whenever possible.

The compute precision is used in the cusparseLtMatmulDescriptorInit() function.
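
To make the split between accumulation precision and epilogue precision concrete, here is a minimal CPU reference for a single output element in the style of `CUSPARSE_COMPUTE_32I`. The function `dot_compute_32i` is a hypothetical illustration, not a library call, and it does not model output rounding, saturation, or Tensor Core behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One output element computed the way the CUSPARSE_COMPUTE_32I row
 * describes: products and accumulation in 32-bit integers, then alpha
 * and beta applied in single precision. Hypothetical helper. */
float dot_compute_32i(const int8_t *a_row, const int8_t *b_col, size_t k,
                      float alpha, float beta, float c)
{
    assert(a_row != NULL && b_col != NULL);
    int32_t acc = 0;
    for (size_t i = 0; i < k; ++i)
        acc += (int32_t)a_row[i] * (int32_t)b_col[i]; /* widen before multiply */
    return alpha * (float)acc + beta * c;             /* float epilogue */
}
```

The widening cast matters: a product such as 100 * 100 does not fit in 8 bits, but it accumulates without overflow in the 32-bit integer accumulator.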

`cusparseLtMatDescAttribute_t`#

The enumerator specifies the additional attributes of a matrix descriptor

Value	Description
`CUSPARSELT_MAT_NUM_BATCHES`	Number of matrices in a batch
`CUSPARSELT_MAT_BATCH_STRIDE`	Stride between consecutive matrices in a batch expressed in terms of matrix elements

The matrix descriptor attribute enumerator is used in the cusparseLtMatDescSetAttribute() and cusparseLtMatDescGetAttribute() functions.
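
Because `CUSPARSELT_MAT_BATCH_STRIDE` is expressed in matrix elements rather than bytes, the start of each matrix in a batch can be located as sketched below. Both helper names are hypothetical and only illustrate the arithmetic; they are not cuSPARSELt functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element index where matrix `batch_index` starts, given a stride
 * expressed in matrix elements (not bytes). Hypothetical helper. */
int64_t batch_element_offset(int64_t batch_index, int64_t batch_stride)
{
    assert(batch_index >= 0 && batch_stride >= 0);
    return batch_index * batch_stride;
}

/* Same position as a byte offset, for elements of `elem_size` bytes. */
int64_t batch_byte_offset(int64_t batch_index, int64_t batch_stride,
                          size_t elem_size)
{
    return batch_element_offset(batch_index, batch_stride) * (int64_t)elem_size;
}
```

With a stride of 64 elements and 2-byte elements, the third matrix (index 2) starts at element 128, which is byte 256.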

`cusparseLtMatmulDescAttribute_t`#

The enumerator specifies the additional attributes of a matrix multiplication descriptor

Value	Type	Default Value	Description
`CUSPARSELT_MATMUL_ACTIVATION_RELU`	`int` 0: false, true otherwise	`false`	ReLU activation function
`CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND`	`float`	`inf`	Upper bound of the ReLU activation function
`CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD`	`float`	`0.0f`	Lower threshold of the ReLU activation function
`CUSPARSELT_MATMUL_ACTIVATION_GELU`	`int` 0: false, true otherwise	`false`	Enable/Disable GeLU activation function. The GeLU activation function is available only with: - `INT8` input, `INT8` output, `INT32` Tensor Core compute kernels - `E4M3` input, `E4M3/BF16` output, `FP32` Tensor Core compute kernels - `E5M2` input, `E5M2/BF16` output, `FP32` Tensor Core compute kernels - `E2M1` input, `E2M1/BF16` output, `FP32` Tensor Core compute kernels
`CUSPARSELT_MATMUL_ACTIVATION_GELU_SCALING`	`float`	`1.0f`	Scaling coefficient for the GeLU activation function. It implies `CUSPARSELT_MATMUL_ACTIVATION_GELU`
`CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING`	`int` 0: false, true otherwise	`false`	Enable/Disable alpha vector (per-channel) scaling
`CUSPARSELT_MATMUL_BETA_VECTOR_SCALING`	`int` 0: false, true otherwise	`false`	Enable/Disable beta vector (per-channel) scaling. `CUSPARSELT_MATMUL_BETA_VECTOR_SCALING` implies `CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING`
`CUSPARSELT_MATMUL_BIAS_POINTER`	`void*`	`NULL` (disabled)	Bias pointer. The bias vector size must equal the number of rows of the output matrix (D). The data type of the bias vector is the same as that of matrix C, except in the following cases, in which the data type of the bias is `FP32`: - `INT8` input, `INT8/INT32` output, `INT32` Tensor Core compute kernels - `INT8` input, `FP16/BF16` output, `INT32` Tensor Core compute kernels on pre-`SM 9.0`
`CUSPARSELT_MATMUL_BIAS_STRIDE`	`int64_t`	`0` (disabled)	Bias stride between consecutive bias vectors. `0` means broadcast the first bias vector
`CUSPARSELT_MATMUL_SPARSE_MAT_POINTER`	`void*`	`NULL` (disabled)	Pointer to the pruned sparse matrix.
`CUSPARSELT_MATMUL_A_SCALE_MODE`	`cusparseLtMatmulMatrixScale_t`	`CUSPARSELT_MATMUL_SCALE_NONE`	Scaling mode that defines how the matrix scaling factor for matrix A is interpreted.
`CUSPARSELT_MATMUL_B_SCALE_MODE`	`cusparseLtMatmulMatrixScale_t`	`CUSPARSELT_MATMUL_SCALE_NONE`	Scaling mode that defines how the matrix scaling factor for matrix B is interpreted.
`CUSPARSELT_MATMUL_C_SCALE_MODE`	`cusparseLtMatmulMatrixScale_t`	`CUSPARSELT_MATMUL_SCALE_NONE`	Scaling mode that defines how the matrix scaling factor for matrix C is interpreted.
`CUSPARSELT_MATMUL_D_SCALE_MODE`	`cusparseLtMatmulMatrixScale_t`	`CUSPARSELT_MATMUL_SCALE_NONE`	Scaling mode that defines how the matrix scaling factor for matrix D is interpreted.
`CUSPARSELT_MATMUL_D_OUT_SCALE_MODE`	`cusparseLtMatmulMatrixScale_t`	`CUSPARSELT_MATMUL_SCALE_NONE`	Scaling mode that defines how the output matrix scaling factor for matrix D is interpreted.
`CUSPARSELT_MATMUL_A_SCALE_POINTER`	`void*`	`NULL`	Pointer to the scale factor value that converts data in matrix A to the compute data type range. The scaling factor must have the same type as the compute type. If not specified, the scaling factor is assumed to be 1.
`CUSPARSELT_MATMUL_B_SCALE_POINTER`	`void*`	`NULL`	Equivalent to `CUSPARSELT_MATMUL_A_SCALE_POINTER` for matrix B.
`CUSPARSELT_MATMUL_C_SCALE_POINTER`	`void*`	`NULL`	Equivalent to `CUSPARSELT_MATMUL_A_SCALE_POINTER` for matrix C. Currently not used.
`CUSPARSELT_MATMUL_D_SCALE_POINTER`	`void*`	`NULL`	Equivalent to `CUSPARSELT_MATMUL_A_SCALE_POINTER` for matrix D.
`CUSPARSELT_MATMUL_D_OUT_SCALE_POINTER`	`void*`	`NULL`	Device pointer to the scale factors that are used to convert data in matrix D to the compute data type range. The scaling factor value type is defined by the scaling mode (see `CUSPARSELT_MATMUL_D_OUT_SCALE_MODE`).

where the ReLU activation function is defined as:

[Figure: ReLU activation function]

CUSPARSELT_MATMUL_SPARSE_MAT_POINTER provides more flexibility for cusparseLtMatmulSearch() to select the best algorithm. The referenced memory cannot be modified until cusparseLtMatmulSearch() is called.

The matrix multiplication descriptor attribute enumerator is used in the cusparseLtMatmulDescSetAttribute() and cusparseLtMatmulDescGetAttribute() functions.

`cusparseLtMatmulAlg_t`#

The enumerator specifies the algorithm for matrix-matrix multiplication

Value	Description
`CUSPARSELT_MATMUL_ALG_DEFAULT`	Default algorithm

The algorithm enumerator is used in the cusparseLtMatmulAlgSelectionInit() function.

`cusparseLtMatmulAlgAttribute_t`#

The enumerator specifies the matrix multiplication algorithm attributes

Value	Description	Possible Values
`CUSPARSELT_MATMUL_ALG_CONFIG_ID`	Algorithm ID	[0, MAX) (see `CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID`)
`CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID`	Algorithm ID limit (query only)
`CUSPARSELT_MATMUL_SEARCH_ITERATIONS`	Number of iterations (kernel launches per algorithm) for cusparseLtMatmulSearch()	> 0 (default=5)
`CUSPARSELT_MATMUL_SPLIT_K`	Split-K factor (number of slices)	On pre-`SM 9.0`, [1, K], 1: Split-K disabled (default=not set); on `SM 9.0` and `SM 10.0`, -1 (segment-K enabled) or 1 (segment-K disabled)
`CUSPARSELT_MATMUL_SPLIT_K_MODE`	Number of kernels for the Split-K algorithm	See cusparseLtSplitKMode_t
`CUSPARSELT_MATMUL_SPLIT_K_BUFFERS`	Device memory buffers to store partial results for the reduction	On pre-`SM 9.0`, [0, SplitK - 1]; on `SM 9.0`, `SM 10.0`, and `SM 12.0`, 0

The algorithm attribute enumerator is used in the cusparseLtMatmulAlgGetAttribute() and cusparseLtMatmulAlgSetAttribute() functions.

Split-K parameters allow users to split the GEMM computation along the K dimension so that more CTAs are created and SM utilization improves when the M or N dimensions are small. However, this comes at the cost of reducing the partial results of the K slices into the final result. The cusparseLtMatmulSearch() function can be used to find the optimal combination of Split-K parameters.

Segment-K is a split-K method on SM 9.0 that uses warp-specialized persistent CTAs for enhanced efficiency and replaces the traditional split-K method.

Because the validity of the split-K attributes CUSPARSELT_MATMUL_SPLIT_K, CUSPARSELT_MATMUL_SPLIT_K_MODE, and CUSPARSELT_MATMUL_SPLIT_K_BUFFERS varies across platforms, it is recommended to keep their default values unless you have prior knowledge of the target platform. For optimal performance, users should invoke the auto-tuning API cusparseLtMatmulSearch() to determine the best algorithm and attributes.
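
The trade-off behind Split-K can be sketched on the CPU for a single dot product: the K dimension is cut into slices, each slice produces a partial sum, and the partial sums are then reduced. The function `dot_split_k` is a hypothetical illustration under the assumption of equal-sized slices; it says nothing about how the library actually schedules CTAs.

```c
#include <assert.h>
#include <stddef.h>

/* Computes a dot product of length k by splitting it into `split_k`
 * slices, accumulating each slice separately, then reducing the
 * partial sums. Hypothetical helper, not a cuSPARSELt function. */
float dot_split_k(const float *a, const float *b, size_t k, size_t split_k)
{
    assert(split_k >= 1 && split_k <= k);
    size_t slice_len = (k + split_k - 1) / split_k; /* ceil(k / split_k) */
    float result = 0.0f;
    for (size_t s = 0; s < split_k; ++s) {
        size_t begin = s * slice_len;
        size_t end = (begin + slice_len < k) ? begin + slice_len : k;
        float partial = 0.0f; /* on a GPU, each slice could map to its own CTA */
        for (size_t i = begin; i < end; ++i)
            partial += a[i] * b[i];
        result += partial;    /* the extra reduction that Split-K pays for */
    }
    return result;
}
```

With `split_k` equal to 1 the loop degenerates to an ordinary dot product; larger values expose more independent work at the price of the final reduction.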

`cusparseLtSplitKMode_t`#

The enumerator specifies the Split-K mode values corresponding to CUSPARSELT_MATMUL_SPLIT_K_MODE attribute in cusparseLtMatmulAlgAttribute_t

Value	Description
`CUSPARSELT_SPLIT_K_MODE_ONE_KERNEL`	Use a single kernel for Split-K. It is the default value on pre-`SM 10.0` and `SM 12.0` [1].
`CUSPARSELT_SPLIT_K_MODE_TWO_KERNELS`	Use two kernels for Split-K: one GPU kernel to do the GEMM and another to do the final reduction. Valid on pre-`SM 10.0` and `SM 12.0` [1].
`CUSPARSELT_SPLITK`	Use split-K decomposition. Valid on `SM 10.0` and `SM 12.0` [2].
`CUSPARSELT_DATAPARALLEL`	No splitting along the K dimension. Valid on `SM 10.0` and `SM 12.0` [2].
`CUSPARSELT_STREAMK`	Use stream-K decomposition. Valid on `SM 10.0` and `SM 12.0` [2].
`CUSPARSELT_HEURISTIC`	Use a heuristic to determine the decomposition mode. It is the default value on `SM 10.0` and `SM 12.0` [2].

`cusparseLtPruneAlg_t`#

The enumerator specifies the pruning algorithm to apply to the structured matrix before the compression

Value	Description
`CUSPARSELT_PRUNE_SPMMA_TILE`	- `e2m1`: Zero-out 16 paired values in an 8x4 (row-major) or 4x8 (column-major) tile to maximize the L1-norm of the resulting tile, under the constraint of selecting exactly two elements or two pairs of elements for each row and column - `half`, `bfloat16`, `int8`, `e4m3`, `e5m2`: Zero-out eight values in a 4x4 tile to maximize the L1-norm of the resulting tile, under the constraint of selecting exactly two elements for each row and column - `float`: Zero-out two values in a 2x2 tile to maximize the L1-norm of the resulting tile, under the constraint of selecting exactly one element for each row and column
`CUSPARSELT_PRUNE_SPMMA_STRIP`	- `e2m1`: Zero-out four paired values in a 1x8 strip to maximize the L1-norm of the resulting strip - `half`, `bfloat16`, `int8`, `e4m3`, `e5m2`: Zero-out two values in a 1x4 strip to maximize the L1-norm of the resulting strip - `float`: Zero-out one value in a 1x2 strip to maximize the L1-norm of the resulting strip. The strip direction is chosen according to the operation `op` and matrix layout applied to the structured (sparse) matrix.

The pruning algorithm is used in the cusparseLtSpMMAPrune() function.
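
The strip rule for `int8` can be illustrated with a small CPU sketch: within each 1x4 strip, keeping the two largest magnitudes and zeroing the other two maximizes the L1-norm of what remains. The function `prune_strip_1x4` is a hypothetical helper, not the library's implementation, and its tie-breaking (the earlier element wins) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Zeroes the two smallest-magnitude values in each 1x4 strip of `row`,
 * keeping the two largest so the strip's L1-norm is maximized.
 * Hypothetical helper; ties keep the earlier element (an assumption). */
void prune_strip_1x4(int8_t *row, size_t n)
{
    assert(n % 4 == 0);
    for (size_t base = 0; base < n; base += 4) {
        /* Track the positions of the largest and second-largest magnitudes. */
        size_t first = 0, second = 1;
        if (abs(row[base + 1]) > abs(row[base + 0])) {
            first = 1;
            second = 0;
        }
        for (size_t k = 2; k < 4; ++k) {
            int mag = abs(row[base + k]);
            if (mag > abs(row[base + first])) {
                second = first;
                first = k;
            } else if (mag > abs(row[base + second])) {
                second = k;
            }
        }
        /* Zero everything except the two kept positions. */
        for (size_t k = 0; k < 4; ++k)
            if (k != first && k != second)
                row[base + k] = 0;
    }
}
```

Applied to `{1, -7, 3, 2, 0, 5, -5, 9}`, the sketch leaves `{0, -7, 3, 0, 0, 5, 0, 9}`, so each strip of four ends with exactly two zeros.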

`cusparseLtMatmulMatrixScale_t`#

The enumerator specifies scaling mode that defines how scaling factor pointers are interpreted.

Value	Description
`CUSPARSELT_MATMUL_SCALE_NONE`	Scaling is disabled. This is the default and the only valid value for matrices that are not using narrow data types.
`CUSPARSELT_MATMUL_MATRIX_SCALE_SCALAR_32F`	Scaling factors are single-precision scalars applied to the whole matrix. This is the only valid value for `CUSPARSELT_MATMUL_D_SCALE_MODE` when the D matrix uses a narrow precision data type.
`CUSPARSELT_MATMUL_MATRIX_SCALE_VEC32_UE4M3`	Scaling factors are tensors that contain a dedicated scaling factor stored as an 8-bit `CUDA_R_8F_UE4M3` value for each 32-element block in the innermost dimension of the corresponding data matrix.
`CUSPARSELT_MATMUL_MATRIX_SCALE_VEC64_UE8M0`	Scaling factors are tensors that contain a dedicated scaling factor stored as an 8-bit `CUDA_R_8F_UE8M0` value for each 64-element block in the innermost dimension of the corresponding data matrix.

Note: cusparseLtMatmulMatrixScale_t is introduced so that narrow-precision data (E4M3 and E2M1) can be scaled or dequantized before, and potentially quantized after, computations. See 1D Block Scaling for FP8 and FP4 Data Types for more details. The translation from row and column indices to linear offset, as well as the arrangement of multiple blocks, is the same as in cuBLASLt. The only difference from cuBLASLt is the block size: in cuSPARSELt a single tile of scaling factors is applied to a 128x128 block when the scaling mode is CUSPARSELT_MATMUL_MATRIX_SCALE_VEC32_UE4M3 and to a 128x256 block when it is CUSPARSELT_MATMUL_MATRIX_SCALE_VEC64_UE8M0.
