cuSPARSELt Data Types

Opaque Data Structures

cusparseLtHandle_t

The structure holds the cuSPARSELt library context (device properties, system information, etc.). The handle must be initialized and destroyed with the cusparseLtInit() and cusparseLtDestroy() functions, respectively.
cusparseLtMatDescriptor_t

The structure captures the shape and characteristics of a matrix. It is initialized with the cusparseLtDenseDescriptorInit() or cusparseLtStructuredDescriptorInit() functions and destroyed with cusparseLtMatDescriptorDestroy().

cusparseLtMatmulDescriptor_t

The structure holds the description of the matrix multiplication operation. It is initialized with the cusparseLtMatmulDescriptorInit() function.

cusparseLtMatmulAlgSelection_t

The structure holds the description of the matrix multiplication algorithm. It is initialized with the cusparseLtMatmulAlgSelectionInit() function.

cusparseLtMatmulPlan_t

The structure holds the matrix multiplication execution plan, namely all the information necessary to execute the cusparseLtMatmul() operation. It is initialized and destroyed with the cusparseLtMatmulPlanInit() and cusparseLtMatmulPlanDestroy() functions, respectively.
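The structures above are typically created and torn down in a fixed order. The sketch below is an outline only, using the init/destroy functions named in this section; arguments are elided and error checking is omitted, and exact parameter lists vary across cuSPARSELt versions, so consult the individual function references:

```c
cusparseLtHandle_t             handle;
cusparseLtMatDescriptor_t      matA, matB, matC;
cusparseLtMatmulDescriptor_t   matmul;
cusparseLtMatmulAlgSelection_t alg_sel;
cusparseLtMatmulPlan_t         plan;

cusparseLtInit(&handle);                                       /* library context */
cusparseLtStructuredDescriptorInit(/* &handle, &matA, ... */); /* sparse operand  */
cusparseLtDenseDescriptorInit(/* &handle, &matB, ... */);      /* dense operands  */
cusparseLtDenseDescriptorInit(/* &handle, &matC, ... */);
cusparseLtMatmulDescriptorInit(/* &handle, &matmul, ... */);   /* the operation   */
cusparseLtMatmulAlgSelectionInit(/* &handle, &alg_sel, &matmul, ... */);
cusparseLtMatmulPlanInit(/* &handle, &plan, &matmul, &alg_sel, ... */);

/* ... cusparseLtMatmul() calls ... */

cusparseLtMatmulPlanDestroy(&plan);
cusparseLtMatDescriptorDestroy(&matA);
cusparseLtMatDescriptorDestroy(&matB);
cusparseLtMatDescriptorDestroy(&matC);
cusparseLtDestroy(&handle);
```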
Enumerators

cusparseLtSparsity_t

The enumerator specifies the sparsity ratio of the structured matrix.

| Value | Description |
|---|---|
| CUSPARSELT_SPARSITY_50_PERCENT | 50% sparsity ratio: 2:4 for 16-bit floating-point and 8-bit integer data types, 1:2 for 32-bit floating-point (TF32) data |

The sparsity property is used in the cusparseLtStructuredDescriptorInit() function.
cusparseComputeType

The enumerator specifies the compute precision mode of the matrix multiplication.

| Value | Description |
|---|---|
| CUSPARSE_COMPUTE_16F | Default mode for 16-bit floating-point precision. All computations and intermediate storage ensure at least 16-bit precision. Tensor Cores will be used whenever possible |
| CUSPARSE_COMPUTE_32I | Default mode for 32-bit integer precision. All computations and intermediate storage ensure at least 32-bit integer precision. Tensor Cores will be used whenever possible |
| CUSPARSE_COMPUTE_TF32_FAST | Default mode for 32-bit floating-point precision. The inputs are supposed to be directly represented in TensorFloat-32 precision: the 32-bit floating-point values are truncated to TensorFloat-32 before the computation. All computations and intermediate storage ensure at least TensorFloat-32 precision. Tensor Cores will be used whenever possible |
| CUSPARSE_COMPUTE_TF32 | All computations and intermediate storage ensure at least TensorFloat-32 precision. The inputs are rounded to TensorFloat-32 precision. This mode is slower than CUSPARSE_COMPUTE_TF32_FAST. Tensor Cores will be used whenever possible |

The compute precision is used in the cusparseLtMatmulDescriptorInit() function.
cusparseLtMatDescAttribute_t

The enumerator specifies the additional attributes of a matrix descriptor.

| Value | Description |
|---|---|
| CUSPARSELT_MAT_NUM_BATCHES | Number of matrices in a batch |
| CUSPARSELT_MAT_BATCH_STRIDE | Stride between consecutive matrices in a batch, expressed in number of matrix elements |

The attribute enumerator is used in the cusparseLtMatDescSetAttribute() and cusparseLtMatDescGetAttribute() functions.
cusparseLtMatmulDescAttribute_t

The enumerator specifies the additional attributes of a matrix multiplication descriptor.

| Value | Type | Default Value | Description |
|---|---|---|---|
| CUSPARSELT_MATMUL_ACTIVATION_RELU | int | 0 (disabled) | ReLU activation function |
| CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND | float | +infinity | Upper bound of the ReLU activation function |
| CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD | float | 0.0 | Lower threshold of the ReLU activation function |
| CUSPARSELT_MATMUL_ACTIVATION_GELU | int | 0 (disabled) | Enable/Disable GeLU activation function |
| CUSPARSELT_MATMUL_ACTIVATION_GELU_SCALING | float | 1.0 | Scaling coefficient for the GeLU activation function. It implies CUSPARSELT_MATMUL_ACTIVATION_GELU |
| CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING | int | 0 (disabled) | Enable/Disable alpha vector (per-channel) scaling |
| CUSPARSELT_MATMUL_BETA_VECTOR_SCALING | int | 0 (disabled) | Enable/Disable beta vector (per-channel) scaling |
| CUSPARSELT_MATMUL_BIAS_POINTER | void* | NULL | Bias pointer. The bias vector size must equal the number of rows of the output matrix (D) |
| CUSPARSELT_MATMUL_BIAS_STRIDE | int64_t | 0 | Bias stride between consecutive bias vectors |

where the ReLU activation function is defined as:

ReLU(x) = 0 if x <= threshold, x if threshold < x <= upperbound, upperbound if x > upperbound

The GeLU activation function is available only with INT8 input/output and INT32 Tensor Core compute kernels.

CUSPARSELT_MATMUL_BETA_VECTOR_SCALING implies CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING.
cusparseLtMatmulAlg_t

The enumerator specifies the algorithm for matrix-matrix multiplication.

| Value | Description |
|---|---|
| CUSPARSELT_MATMUL_ALG_DEFAULT | Default algorithm |

The algorithm enumerator is used in the cusparseLtMatmulAlgSelectionInit() function.
cusparseLtMatmulAlgAttribute_t

The enumerator specifies the matrix multiplication algorithm attributes.

| Value | Description | Possible Values |
|---|---|---|
| CUSPARSELT_MATMUL_ALG_CONFIG_ID | Algorithm ID | [0, MAX] (see CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID) |
| CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID | Algorithm ID limit (query only) |  |
| CUSPARSELT_MATMUL_SEARCH_ITERATIONS | Number of iterations (kernel launches per algorithm) for cusparseLtMatmulSearch() | > 0 (default = 5) |
| CUSPARSELT_MATMUL_SPLIT_K | Split-K factor (number of slices) | [1, K], 1: Split-K disabled (default = not set) |
| CUSPARSELT_MATMUL_SPLIT_K_MODE | Number of kernels for the Split-K algorithm | Values of cusparseLtSplitKMode_t |
| CUSPARSELT_MATMUL_SPLIT_K_BUFFERS | Device memory buffers to store partial results for the reduction | [1, SplitK - 1] |

The algorithm attribute enumerator is used in the cusparseLtMatmulAlgGetAttribute() and cusparseLtMatmulAlgSetAttribute() functions.

Split-K parameters allow users to split the GEMM computation along the K dimension, so that more CTAs are created, improving SM utilization when the M or N dimensions are small. However, this comes with the cost of reducing the partial results of the K slices into the final result. The cusparseLtMatmulSearch() function can be used to find the optimal combination of Split-K parameters.
cusparseLtSplitKMode_t

The enumerator specifies the Split-K mode values corresponding to the CUSPARSELT_MATMUL_SPLIT_K_MODE attribute in cusparseLtMatmulAlgAttribute_t.

| Value | Description | Details |
|---|---|---|
| CUSPARSELT_SPLIT_K_MODE_ONE_KERNEL | Use a single kernel for Split-K | Use the same GEMM kernel to do the final reduction |
| CUSPARSELT_SPLIT_K_MODE_TWO_KERNELS | Use two kernels for Split-K | Launch another GPU kernel to do the final reduction |
cusparseLtPruneAlg_t

The enumerator specifies the pruning algorithm to apply to the structured matrix before the compression.

| Value | Description |
|---|---|
| CUSPARSELT_PRUNE_SPMMA_TILE | Zero out values in a 4x4 tile so that the 50% structured-sparsity pattern holds in every direction (rows and columns) |
| CUSPARSELT_PRUNE_SPMMA_STRIP | Zero out two values in every 1x4 strip. The strip direction is chosen according to the operation |

The pruning algorithm is used in the cusparseLtSpMMAPrune() function.