cuSPARSELt Data Types
Opaque Data Structures
cusparseLtHandle_t
The structure holds the cuSPARSELt library context (device properties, system information, etc.). The handle must be initialized and destroyed with the cusparseLtInit() and cusparseLtDestroy() functions, respectively.
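A minimal sketch of the handle lifecycle follows; the structure of the program is illustrative, and in practice every call returns a cusparseStatus_t that should be checked:

```cpp
#include <cusparseLt.h>   // cuSPARSELt types and entry points

int main() {
    cusparseLtHandle_t handle;

    // Initialize the library context; this must precede any other
    // cuSPARSELt call that takes the handle.
    cusparseLtInit(&handle);

    // ... build descriptors, plans, and call cusparseLtMatmul() here ...

    // Release the resources associated with the context.
    cusparseLtDestroy(&handle);
    return 0;
}
```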
cusparseLtMatDescriptor_t
The structure captures the shape and characteristics of a matrix. It is initialized with the cusparseLtDenseDescriptorInit() or cusparseLtStructuredDescriptorInit() functions and destroyed with the cusparseLtMatDescriptorDestroy() function.
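For example, a dense operand descriptor might be set up as in the sketch below; the dimensions, 16-byte alignment, __half element type, and row-major order are illustrative choices, and describe_dense_matrix is a hypothetical helper:

```cpp
#include <cstdint>
#include <cusparseLt.h>

void describe_dense_matrix(const cusparseLtHandle_t* handle) {
    constexpr int64_t  rows      = 1024;
    constexpr int64_t  cols      = 512;
    constexpr int64_t  ld        = cols;   // leading dimension for row-major storage
    constexpr uint32_t alignment = 16;     // pointer alignment, in bytes

    cusparseLtMatDescriptor_t mat;
    // A dense (non-structured) operand is described by its shape, leading
    // dimension, element type, and memory order.
    cusparseLtDenseDescriptorInit(handle, &mat,
                                  rows, cols, ld, alignment,
                                  CUDA_R_16F,           // __half elements
                                  CUSPARSE_ORDER_ROW);

    // ... use the descriptor in cusparseLtMatmulDescriptorInit() ...

    cusparseLtMatDescriptorDestroy(&mat);
}
```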
cusparseLtMatmulDescriptor_t
The structure holds the description of the matrix multiplication operation. It is initialized with the cusparseLtMatmulDescriptorInit() function.
cusparseLtMatmulAlgSelection_t
The structure holds the description of the matrix multiplication algorithm. It is initialized with the cusparseLtMatmulAlgSelectionInit() function.
cusparseLtMatmulPlan_t
The structure holds the matrix multiplication execution plan, namely all the information necessary to execute the cusparseLtMatmul() operation. It is initialized and destroyed with the cusparseLtMatmulPlanInit() and cusparseLtMatmulPlanDestroy() functions, respectively.
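Putting the opaque types together, the usual construction order is handle, matrix descriptors, matmul descriptor, algorithm selection, plan, as in the sketch below. The build_plan helper, the operand types, and the non-transposed operations are assumptions for illustration; note also that the exact argument list of cusparseLtMatmulPlanInit() has changed across cuSPARSELt releases (earlier versions also took a workspace size), so the call shown assumes a recent release:

```cpp
#include <cusparseLt.h>

// Assumes the handle and the A/B/C/D matrix descriptors were already
// initialized (A as structured, B/C/D as dense).
void build_plan(const cusparseLtHandle_t*        handle,
                const cusparseLtMatDescriptor_t* matA,
                const cusparseLtMatDescriptor_t* matB,
                const cusparseLtMatDescriptor_t* matC,
                const cusparseLtMatDescriptor_t* matD) {
    cusparseLtMatmulDescriptor_t   matmul;
    cusparseLtMatmulAlgSelection_t alg_sel;
    cusparseLtMatmulPlan_t         plan;

    // Describe the operation D = alpha * op(A) * op(B) + beta * C.
    cusparseLtMatmulDescriptorInit(handle, &matmul,
                                   CUSPARSE_OPERATION_NON_TRANSPOSE,
                                   CUSPARSE_OPERATION_NON_TRANSPOSE,
                                   matA, matB, matC, matD,
                                   CUSPARSE_COMPUTE_16F);

    // Select the default algorithm for this operation.
    cusparseLtMatmulAlgSelectionInit(handle, &alg_sel, &matmul,
                                     CUSPARSELT_MATMUL_ALG_DEFAULT);

    // Bind the descriptor and the algorithm selection into an executable plan.
    cusparseLtMatmulPlanInit(handle, &plan, &matmul, &alg_sel);

    // ... query the workspace size and call cusparseLtMatmul() ...

    cusparseLtMatmulPlanDestroy(&plan);
}
```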
Enumerators
cusparseLtSparsity_t
The enumerator specifies the sparsity ratio of the structured matrix.

| Value | Description |
|---|---|
| CUSPARSELT_SPARSITY_50_PERCENT | 50% Sparsity Ratio: 2:4 for 8-bit and 16-bit data types, 1:2 for 32-bit (TF32) data |
The sparsity property is used in the cusparseLtStructuredDescriptorInit() function.
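A sketch of a structured (sparse) operand using the 50% sparsity ratio follows; the dimensions, alignment, element type, and layout are illustrative, and describe_structured_matrix is a hypothetical helper:

```cpp
#include <cstdint>
#include <cusparseLt.h>

void describe_structured_matrix(const cusparseLtHandle_t* handle) {
    constexpr int64_t  rows      = 1024;
    constexpr int64_t  cols      = 1024;
    constexpr int64_t  ld        = cols;   // row-major leading dimension
    constexpr uint32_t alignment = 16;

    cusparseLtMatDescriptor_t matA;
    // Same parameters as a dense descriptor, plus the sparsity ratio that the
    // matrix must satisfy after pruning/compression.
    cusparseLtStructuredDescriptorInit(handle, &matA,
                                       rows, cols, ld, alignment,
                                       CUDA_R_16F,
                                       CUSPARSE_ORDER_ROW,
                                       CUSPARSELT_SPARSITY_50_PERCENT);

    // ... prune, compress, and use the descriptor in a matmul descriptor ...

    cusparseLtMatDescriptorDestroy(&matA);
}
```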
cusparseComputeType
The enumerator specifies the compute precision modes of the matrix multiplication.

| Value | Description |
|---|---|
| CUSPARSE_COMPUTE_16F | Default mode for 16-bit floating-point precision. All computations and intermediate storage ensure at least 16-bit precision. Tensor Cores will be used whenever possible |
| CUSPARSE_COMPUTE_32I | Default mode for 32-bit integer precision. All computations and intermediate storage ensure at least 32-bit integer precision. Tensor Cores will be used whenever possible |
| CUSPARSE_COMPUTE_TF32_FAST | Default mode for 32-bit floating-point precision. The inputs are supposed to be directly represented in TensorFloat-32 precision; the 32-bit floating-point values are truncated to TensorFloat-32 before the computation. All computations and intermediate storage ensure at least TensorFloat-32 precision. Tensor Cores will be used whenever possible |
| CUSPARSE_COMPUTE_TF32 | All computations and intermediate storage ensure at least TensorFloat-32 precision. The inputs are rounded to TensorFloat-32 precision; this mode is slower than CUSPARSE_COMPUTE_TF32_FAST but may provide more accurate results. Tensor Cores will be used whenever possible |
The compute precision is used in the cusparseLtMatmulDescriptorInit() function.
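The compute type has to be compatible with the operand data types; for instance, 8-bit integer operands accumulate in 32-bit integers. A sketch under that assumption (init_int8_matmul is a hypothetical helper and the non-transposed operations are illustrative):

```cpp
#include <cusparseLt.h>

// Pair CUDA_R_8I operand descriptors with 32-bit integer accumulation.
void init_int8_matmul(const cusparseLtHandle_t*        handle,
                      const cusparseLtMatDescriptor_t* matA,   // int8, structured
                      const cusparseLtMatDescriptor_t* matB,   // int8, dense
                      const cusparseLtMatDescriptor_t* matC,
                      const cusparseLtMatDescriptor_t* matD,
                      cusparseLtMatmulDescriptor_t*    matmul) {
    cusparseLtMatmulDescriptorInit(handle, matmul,
                                   CUSPARSE_OPERATION_NON_TRANSPOSE,
                                   CUSPARSE_OPERATION_NON_TRANSPOSE,
                                   matA, matB, matC, matD,
                                   CUSPARSE_COMPUTE_32I);
}
```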
cusparseLtMatDescAttribute_t
The enumerator specifies the additional attributes of a matrix descriptor.

| Value | Description |
|---|---|
| CUSPARSELT_MAT_NUM_BATCHES | Number of matrices in a batch |
| CUSPARSELT_MAT_BATCH_STRIDE | Stride between consecutive matrices in a batch, expressed in terms of matrix elements |
The attribute enumerator is used in the cusparseLtMatDescSetAttribute() and cusparseLtMatDescGetAttribute() functions.
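For instance, a batched operand could be described as in the sketch below; the batch count and stride are illustrative, set_batch_attributes is a hypothetical helper, and the int and int64_t value types are assumptions:

```cpp
#include <cstdint>
#include <cusparseLt.h>

void set_batch_attributes(const cusparseLtHandle_t*  handle,
                          cusparseLtMatDescriptor_t* mat,
                          int64_t                    rows,
                          int64_t                    cols) {
    int     num_batches  = 8;            // matrices in the batch
    int64_t batch_stride = rows * cols;  // distance between matrices, in elements

    cusparseLtMatDescSetAttribute(handle, mat,
                                  CUSPARSELT_MAT_NUM_BATCHES,
                                  &num_batches, sizeof(num_batches));
    cusparseLtMatDescSetAttribute(handle, mat,
                                  CUSPARSELT_MAT_BATCH_STRIDE,
                                  &batch_stride, sizeof(batch_stride));

    // Read an attribute back to confirm it was applied.
    int check = 0;
    cusparseLtMatDescGetAttribute(handle, mat,
                                  CUSPARSELT_MAT_NUM_BATCHES,
                                  &check, sizeof(check));
}
```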
cusparseLtMatmulDescAttribute_t
The enumerator specifies the additional attributes of a matrix multiplication descriptor.

| Value | Type | Default Value | Description |
|---|---|---|---|
| CUSPARSELT_MATMUL_ACTIVATION_RELU | int | false | ReLU activation function |
| CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND | float | +inf | Upper bound of the ReLU activation function |
| CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD | float | 0.0f | Lower threshold of the ReLU activation function |
| CUSPARSELT_MATMUL_ACTIVATION_GELU | int | false | Enable/Disable GeLU activation function |
| CUSPARSELT_MATMUL_ACTIVATION_GELU_SCALING | float | 1.0f | Scaling coefficient for the GeLU activation function. It implies CUSPARSELT_MATMUL_ACTIVATION_GELU |
| CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING | int | false | Enable/Disable alpha vector (per-channel) scaling |
| CUSPARSELT_MATMUL_BETA_VECTOR_SCALING | int | false | Enable/Disable beta vector (per-channel) scaling |
| CUSPARSELT_MATMUL_BIAS_POINTER | void* | NULL | Bias pointer. The bias vector size must be equal to the number of rows of the output matrix (D) |
| CUSPARSELT_MATMUL_BIAS_STRIDE | int64_t | 0 | Bias stride between consecutive bias vectors |
where the ReLU activation function is defined as ReLU(x) = 0 for x < threshold, x for threshold <= x <= upperbound, and upperbound for x > upperbound, with threshold and upperbound given by the CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD and CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND attributes.
The GeLU activation function is available only with INT8 input/output, INT32 Tensor Core compute kernels.

CUSPARSELT_MATMUL_BETA_VECTOR_SCALING implies CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING.
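As an illustration, the ReLU epilogue and a bias vector could be configured on an existing matmul descriptor as sketched below; set_relu_and_bias is a hypothetical helper, the upper bound of 6.0 is an arbitrary example, and d_bias is assumed to be a device buffer with one element per row of D:

```cpp
#include <cusparseLt.h>

void set_relu_and_bias(const cusparseLtHandle_t*     handle,
                       cusparseLtMatmulDescriptor_t* matmul,
                       void*                         d_bias) {
    // Enable the ReLU epilogue and clamp its output at an upper bound.
    int   relu_on    = 1;
    float relu_upper = 6.0f;
    cusparseLtMatmulDescSetAttribute(handle, matmul,
                                     CUSPARSELT_MATMUL_ACTIVATION_RELU,
                                     &relu_on, sizeof(relu_on));
    cusparseLtMatmulDescSetAttribute(handle, matmul,
                                     CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND,
                                     &relu_upper, sizeof(relu_upper));

    // Attach the bias vector (length = number of rows of D).
    cusparseLtMatmulDescSetAttribute(handle, matmul,
                                     CUSPARSELT_MATMUL_BIAS_POINTER,
                                     &d_bias, sizeof(d_bias));
}
```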
cusparseLtMatmulAlg_t
The enumerator specifies the algorithm for matrix-matrix multiplication.

| Value | Description |
|---|---|
| CUSPARSELT_MATMUL_ALG_DEFAULT | Default algorithm |
The algorithm enumerator is used in the cusparseLtMatmulAlgSelectionInit() function.
cusparseLtMatmulAlgAttribute_t
The enumerator specifies the matrix multiplication algorithm attributes.

| Value | Description |
|---|---|
| CUSPARSELT_MATMUL_ALG_CONFIG_ID | Algorithm ID (set and query) |
| CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID | Algorithm ID limit (query only) |
| CUSPARSELT_MATMUL_SEARCH_ITERATIONS | Number of iterations (kernel launches per algorithm) for cusparseLtMatmulSearch(), default=10 |
| CUSPARSELT_MATMUL_SPLIT_K | Split-K factor, default=not set. Valid range: [1, K]. A value of 1 means the Split-K feature is disabled |
| CUSPARSELT_MATMUL_SPLIT_K_MODE | Number of kernels to call for Split-K. Values are specified in cusparseLtSplitKMode_t |
| CUSPARSELT_MATMUL_SPLIT_K_BUFFERS | Device memory buffers to store partial results for the reduction. The valid range is [1, SplitK - 1] |
The algorithm attribute enumerator is used in the cusparseLtMatmulAlgGetAttribute() and cusparseLtMatmulAlgSetAttribute() functions. Split-K parameters allow users to split the GEMM computation along the K dimension so that more CTAs are created, improving SM utilization when the M or N dimensions are small. However, this comes at the cost of reducing the partial results of the K slices into the final result. The cusparseLtMatmulSearch() function can be used to find the optimal combination of Split-K parameters when CUSPARSELT_MATMUL_SPLIT_K=1.
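For example, Split-K could be configured on an algorithm selection object as sketched below; configure_split_k is a hypothetical helper and the factor and buffer count are illustrative values within the ranges above:

```cpp
#include <cusparseLt.h>

void configure_split_k(const cusparseLtHandle_t*       handle,
                       cusparseLtMatmulAlgSelection_t* alg_sel) {
    int split_k         = 4;  // split the K dimension into 4 slices
    int split_k_buffers = 3;  // partial-result buffers, in [1, split_k - 1]

    cusparseLtMatmulAlgSetAttribute(handle, alg_sel,
                                    CUSPARSELT_MATMUL_SPLIT_K,
                                    &split_k, sizeof(split_k));
    cusparseLtMatmulAlgSetAttribute(handle, alg_sel,
                                    CUSPARSELT_MATMUL_SPLIT_K_BUFFERS,
                                    &split_k_buffers, sizeof(split_k_buffers));

    // Other attributes are queried the same way, e.g. the algorithm ID limit.
    int max_id = 0;
    cusparseLtMatmulAlgGetAttribute(handle, alg_sel,
                                    CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID,
                                    &max_id, sizeof(max_id));
}
```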
cusparseLtSplitKMode_t
The enumerator specifies the Split-K mode values corresponding to the CUSPARSELT_MATMUL_SPLIT_K_MODE attribute in cusparseLtMatmulAlgAttribute_t.
| Value | Description | Details |
|---|---|---|
| CUSPARSELT_SPLIT_K_MODE_ONE_KERNEL | Use a single kernel for Split-K | Use the same GEMM kernel to do the final reduction |
| CUSPARSELT_SPLIT_K_MODE_TWO_KERNELS | Use two kernels for Split-K | Launch another GPU kernel to do the final reduction |
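The mode is selected through the same attribute mechanism; a sketch choosing the two-kernel reduction (use_two_kernel_split_k is a hypothetical helper):

```cpp
#include <cusparseLt.h>

void use_two_kernel_split_k(const cusparseLtHandle_t*       handle,
                            cusparseLtMatmulAlgSelection_t* alg_sel) {
    // Reduce the partial results of the K slices in a separate kernel.
    cusparseLtSplitKMode_t mode = CUSPARSELT_SPLIT_K_MODE_TWO_KERNELS;
    cusparseLtMatmulAlgSetAttribute(handle, alg_sel,
                                    CUSPARSELT_MATMUL_SPLIT_K_MODE,
                                    &mode, sizeof(mode));
}
```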
cusparseLtPruneAlg_t
The enumerator specifies the pruning algorithm to apply to the structured matrix before the compression.

| Value | Description |
|---|---|
| CUSPARSELT_PRUNE_SPMMA_TILE | Tile-based pruning: zeroes out values in each 4x4 tile, keeping the nonzero elements that maximize the L1-norm, so that the structured-sparsity constraint holds in both the row and column directions |
| CUSPARSELT_PRUNE_SPMMA_STRIP | Strip-based pruning: zeroes out two values in each 1x4 strip, keeping the nonzero elements that maximize the L1-norm. The strip direction is chosen according to the operation |
The pruning algorithm is used in the cusparseLtSpMMAPrune() function.
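A sketch of pruning the A operand in place with the tile algorithm, followed by an optional validity check; prune_operand is a hypothetical helper and d_A is assumed to be a device buffer matching the structured descriptor referenced by the matmul descriptor:

```cpp
#include <cuda_runtime_api.h>
#include <cusparseLt.h>

void prune_operand(const cusparseLtHandle_t*           handle,
                   const cusparseLtMatmulDescriptor_t* matmul,
                   void*                               d_A,    // dense values in device memory
                   cudaStream_t                        stream) {
    // Zero out entries of A so that it satisfies the structured-sparsity
    // constraint required by the matmul descriptor (in-place prune).
    cusparseLtSpMMAPrune(handle, matmul, d_A, d_A,
                         CUSPARSELT_PRUNE_SPMMA_TILE, stream);

    // Optionally verify the result before compressing and executing.
    int* d_valid  = nullptr;
    int  is_valid = 1;
    cudaMalloc(reinterpret_cast<void**>(&d_valid), sizeof(int));
    cusparseLtSpMMAPruneCheck(handle, matmul, d_A, d_valid, stream);
    cudaMemcpyAsync(&is_valid, d_valid, sizeof(int),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    cudaFree(d_valid);
    // is_valid == 0 means the pruned matrix satisfies the constraint.
}
```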