cuSPARSELt Data Types
Opaque Data Structures
cusparseLtHandle_t
The structure holds the cuSPARSELt library context (device properties, system information, etc.). The handle must be initialized and destroyed with the cusparseLtInit() and cusparseLtDestroy() functions, respectively.
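A minimal sketch of the handle lifecycle (error handling abbreviated; it assumes the CUDA toolkit and cuSPARSELt are installed):

```c
#include <cusparseLt.h>
#include <stdio.h>

int main(void) {
    cusparseLtHandle_t handle;
    // Initialize the library context (queries device properties, etc.)
    cusparseStatus_t status = cusparseLtInit(&handle);
    if (status != CUSPARSE_STATUS_SUCCESS) {
        printf("cusparseLtInit failed with status %d\n", status);
        return 1;
    }
    // ... use the handle with the other cuSPARSELt calls ...
    cusparseLtDestroy(&handle);   // release the library context
    return 0;
}
```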
cusparseLtMatDescriptor_t
The structure captures the shape and characteristics of a matrix. It is initialized with the cusparseLtDenseDescriptorInit() or cusparseLtStructuredDescriptorInit() functions and destroyed with cusparseLtMatDescriptorDestroy().
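A short sketch of descriptor creation, assuming the handle from the previous snippet is already initialized; the sizes, data type, and memory order are arbitrary example choices:

```c
// Describe a 2:4 structured (sparse) operand A and a dense operand B.
cusparseLtMatDescriptor_t matA, matB;
int64_t  num_A_rows = 1024, num_A_cols = 1024;   // example sizes
int64_t  num_B_rows = 1024, num_B_cols = 1024;
uint32_t alignment  = 16;

cusparseLtStructuredDescriptorInit(&handle, &matA,
                                   num_A_rows, num_A_cols, /*ld=*/num_A_cols,
                                   alignment, CUDA_R_16F, CUSPARSE_ORDER_ROW,
                                   CUSPARSELT_SPARSITY_50_PERCENT);
cusparseLtDenseDescriptorInit(&handle, &matB,
                              num_B_rows, num_B_cols, /*ld=*/num_B_cols,
                              alignment, CUDA_R_16F, CUSPARSE_ORDER_ROW);
// ... use the descriptors to build the matmul operation ...
cusparseLtMatDescriptorDestroy(&matA);
cusparseLtMatDescriptorDestroy(&matB);
```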
cusparseLtMatmulDescriptor_t
The structure holds the description of the matrix multiplication operation. It is initialized with the cusparseLtMatmulDescriptorInit() function.
cusparseLtMatmulAlgSelection_t
The structure holds the description of the matrix multiplication algorithm. It is initialized with the cusparseLtMatmulAlgSelectionInit() function.
cusparseLtMatmulPlan_t
The structure holds the matrix multiplication execution plan, namely all the information necessary to execute the cusparseLtMatmul() operation. It is initialized and destroyed with the cusparseLtMatmulPlanInit() and cusparseLtMatmulPlanDestroy() functions, respectively.
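Putting the three structures together, a sketch of the typical setup chain; it assumes the handle and the matA/matB descriptors from the previous snippets, plus dense matC/matD descriptors for the output (not shown), and the compute-type name follows recent cuSPARSELt releases:

```c
cusparseLtMatmulDescriptor_t   matmul;    // operation description
cusparseLtMatmulAlgSelection_t alg_sel;   // algorithm selection
cusparseLtMatmulPlan_t         plan;      // execution plan

cusparseLtMatmulDescriptorInit(&handle, &matmul,
                               CUSPARSE_OPERATION_NON_TRANSPOSE,
                               CUSPARSE_OPERATION_NON_TRANSPOSE,
                               &matA, &matB, &matC, &matD,
                               CUSPARSE_COMPUTE_32F);
cusparseLtMatmulAlgSelectionInit(&handle, &alg_sel, &matmul,
                                 CUSPARSELT_MATMUL_ALG_DEFAULT);
// Note: older cuSPARSELt releases also took a workspace-size argument here.
cusparseLtMatmulPlanInit(&handle, &plan, &matmul, &alg_sel);
// ... cusparseLtMatmul(...) ...
cusparseLtMatmulPlanDestroy(&plan);
```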
Enumerators
cusparseLtSparsity_t
The enumerator specifies the sparsity ratio of the structured matrix.

Value | Description
---|---
CUSPARSELT_SPARSITY_50_PERCENT | 50% sparsity ratio: 2:4 or 1:2 structured sparsity, depending on the data type
The sparsity property is used in the cusparseLtStructuredDescriptorInit() function.
cusparseComputeType
The enumerator specifies the compute precision mode of the matrix multiplication.

Value | Description
---|---
CUSPARSE_COMPUTE_32I | Element-wise multiplication of matrices A and B and accumulation of the intermediate values are performed with 32-bit integer precision. Alpha and beta coefficients and the epilogue are applied with single-precision floating point. Tensor Cores are used whenever possible
CUSPARSE_COMPUTE_32F | Element-wise multiplication of matrices A and B and accumulation of the intermediate values are performed with single-precision floating point. Alpha and beta coefficients and the epilogue are applied with single-precision floating point. Tensor Cores are used whenever possible
CUSPARSE_COMPUTE_16F | Element-wise multiplication of matrices A and B and accumulation of the intermediate values are performed with half-precision floating point. Alpha and beta coefficients and the epilogue are applied with single-precision floating point. Tensor Cores are used whenever possible
The compute precision is used in the cusparseLtMatmulDescriptorInit() function.
cusparseLtMatDescAttribute_t
The enumerator specifies the additional attributes of a matrix descriptor.
Value | Description
---|---
CUSPARSELT_MAT_NUM_BATCHES | Number of matrices in a batch
CUSPARSELT_MAT_BATCH_STRIDE | Stride between consecutive matrices in a batch, expressed in matrix elements
The attribute enumerator is used in the cusparseLtMatDescSetAttribute() and cusparseLtMatDescGetAttribute() functions.
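As an illustration, a sketch of describing a batch, reusing the matA descriptor and sizes from the earlier snippet; the batch count and element stride are example values, and the int/int64_t value types are assumptions:

```c
// Describe a batch of 8 matrices stored back to back in memory.
int     num_batches  = 8;
int64_t batch_stride = num_A_rows * num_A_cols;    // stride in matrix elements

cusparseLtMatDescSetAttribute(&handle, &matA, CUSPARSELT_MAT_NUM_BATCHES,
                              &num_batches, sizeof(num_batches));
cusparseLtMatDescSetAttribute(&handle, &matA, CUSPARSELT_MAT_BATCH_STRIDE,
                              &batch_stride, sizeof(batch_stride));
```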
cusparseLtMatmulDescAttribute_t
The enumerator specifies the additional attributes of a matrix multiplication descriptor.
Value | Type | Default Value | Description
---|---|---|---
CUSPARSELT_MATMUL_ACTIVATION_RELU | | | ReLU activation function
CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND | | | Upper bound of the ReLU activation function
CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD | | | Lower threshold of the ReLU activation function
CUSPARSELT_MATMUL_ACTIVATION_GELU | | | GeLU activation function
CUSPARSELT_MATMUL_ACTIVATION_GELU_SCALING | | | Scaling coefficient for the GeLU activation function. It implies CUSPARSELT_MATMUL_ACTIVATION_GELU
CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING | | | Enable/disable alpha vector (per-channel) scaling
CUSPARSELT_MATMUL_BETA_VECTOR_SCALING | | | Enable/disable beta vector (per-channel) scaling
CUSPARSELT_MATMUL_BIAS_POINTER | | | Bias pointer. The bias vector size must be equal to the number of rows of the output matrix (D). The data type of the bias vector is the same as that of matrix C, with one data-type-dependent exception in which the bias uses a different data type
CUSPARSELT_MATMUL_BIAS_STRIDE | | | Bias stride between consecutive bias vectors
CUSPARSELT_MATMUL_SPARSE_MAT_POINTER | | | Pointer to the pruned sparse matrix
where the ReLU activation function is defined, in terms of the threshold and upper bound attributes above, as:

ReLU(x) = 0 if x <= threshold; x if threshold < x < upperbound; upperbound if x >= upperbound
CUSPARSELT_MATMUL_SPARSE_MAT_POINTER provides more flexibility for cusparseLtMatmulSearch() to select the best algorithm. The referenced memory cannot be modified until cusparseLtMatmulSearch() is called.
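A sketch of setting a few of these attributes on the matmul descriptor from the earlier snippet; the flag and upper-bound values are illustrative, and d_bias is an assumed device buffer with one entry per row of D:

```c
int   relu_enable     = 1;     // non-zero enables the ReLU epilogue
float relu_upperbound = 6.0f;  // clip the activation at 6 (illustrative)
void* d_bias = NULL;           // assumed: replace with a real device allocation

cusparseLtMatmulDescSetAttribute(&handle, &matmul,
                                 CUSPARSELT_MATMUL_ACTIVATION_RELU,
                                 &relu_enable, sizeof(relu_enable));
cusparseLtMatmulDescSetAttribute(&handle, &matmul,
                                 CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND,
                                 &relu_upperbound, sizeof(relu_upperbound));
cusparseLtMatmulDescSetAttribute(&handle, &matmul,
                                 CUSPARSELT_MATMUL_BIAS_POINTER,
                                 &d_bias, sizeof(d_bias));
```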
cusparseLtMatmulAlg_t
The enumerator specifies the algorithm for matrix-matrix multiplication.
Value | Description
---|---
CUSPARSELT_MATMUL_ALG_DEFAULT | Default algorithm
The algorithm enumerator is used in the cusparseLtMatmulAlgSelectionInit() function.
cusparseLtMatmulAlgAttribute_t
The enumerator specifies the matrix multiplication algorithm attributes.
Value | Description | Possible Values
---|---|---
CUSPARSELT_MATMUL_ALG_CONFIG_ID | Algorithm ID | [0, MAX) (see CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID)
CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID | Algorithm ID limit (query only) |
CUSPARSELT_MATMUL_SEARCH_ITERATIONS | Number of iterations (kernel launches per algorithm) for cusparseLtMatmulSearch() | > 0 (default = 5)
CUSPARSELT_MATMUL_SPLIT_K | Split-K factor (number of slices) | On pre-SM 9.0 architectures
CUSPARSELT_MATMUL_SPLIT_K_MODE | Number of kernels for the Split-K algorithm | See cusparseLtSplitKMode_t
CUSPARSELT_MATMUL_SPLIT_K_BUFFERS | Device memory buffers to store partial results for the reduction | On pre-SM 9.0 architectures
The algorithm attribute enumerator is used in the cusparseLtMatmulAlgGetAttribute() and cusparseLtMatmulAlgSetAttribute() functions. The Split-K parameters allow users to split the GEMM computation along the K dimension so that more CTAs are created, improving SM utilization when the N or M dimensions are small. However, this comes at the cost of an extra step that reduces the partial results of the K slices into the final result. The cusparseLtMatmulSearch() function can be used to find the optimal combination of Split-K parameters. Segment-K is a split-K method on SM 9.0 that utilizes warp-specialized persistent CTAs for enhanced efficiency and replaces the traditional split-K method.
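For example, one can query the number of available algorithm configurations and request a Split-K factor on the algorithm-selection object from the earlier snippet (the values are illustrative):

```c
int max_alg_id = 0;
cusparseLtMatmulAlgGetAttribute(&handle, &alg_sel,
                                CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID,
                                &max_alg_id, sizeof(max_alg_id));

int split_k = 4;   // split the K dimension into 4 slices (illustrative)
cusparseLtMatmulAlgSetAttribute(&handle, &alg_sel,
                                CUSPARSELT_MATMUL_SPLIT_K,
                                &split_k, sizeof(split_k));
```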
cusparseLtSplitKMode_t
The enumerator specifies the Split-K mode values corresponding to the CUSPARSELT_MATMUL_SPLIT_K_MODE attribute in cusparseLtMatmulAlgAttribute_t.
Value | Description | Details
---|---|---
CUSPARSELT_SPLIT_K_MODE_ONE_KERNEL | Use a single kernel for Split-K | Use the same GEMM kernel to do the final reduction
CUSPARSELT_SPLIT_K_MODE_TWO_KERNELS | Use two kernels for Split-K | Launch another GPU kernel to do the final reduction
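Continuing the previous sketch, the mode is set through the same attribute mechanism (passing the enumerator value by pointer is an assumption about the expected attribute size):

```c
// Request the two-kernel variant: the GEMM kernel writes partial results and
// a second kernel performs the final reduction.
cusparseLtSplitKMode_t split_k_mode = CUSPARSELT_SPLIT_K_MODE_TWO_KERNELS;
cusparseLtMatmulAlgSetAttribute(&handle, &alg_sel,
                                CUSPARSELT_MATMUL_SPLIT_K_MODE,
                                &split_k_mode, sizeof(split_k_mode));
```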
cusparseLtPruneAlg_t
The enumerator specifies the pruning algorithm to apply to the structured matrix before compression.
Value | Description
---|---
CUSPARSELT_PRUNE_SPMMA_TILE | Tile-based pruning algorithm
CUSPARSELT_PRUNE_SPMMA_STRIP | Strip-based pruning algorithm. The strip direction is chosen according to the operation
The pruning algorithm is used in the cusparseLtSpMMAPrune() function.
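A sketch of the pruning step, reusing the handle and matmul descriptor from the earlier snippets; d_dense and d_pruned are assumed device buffers matching the structured operand described by the matmul descriptor:

```c
cudaStream_t stream = NULL;   // default stream
cusparseLtSpMMAPrune(&handle, &matmul, d_dense, d_pruned,
                     CUSPARSELT_PRUNE_SPMMA_STRIP, stream);
cudaDeviceSynchronize();      // wait for the asynchronous pruning kernel to finish
```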