cuSPARSELt Data Types¶
Opaque Data Structures¶
cusparseLtHandle_t¶
The structure holds the cuSPARSELt library context (device properties, system information, etc.). The handle must be initialized and destroyed with the cusparseLtInit() and cusparseLtDestroy() functions, respectively.
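A minimal sketch of the handle lifecycle (error handling reduced to a single check for brevity):

```cpp
#include <cusparseLt.h>
#include <cstdio>

int main() {
    cusparseLtHandle_t handle;
    // Initialize the cuSPARSELt library context.
    cusparseStatus_t status = cusparseLtInit(&handle);
    if (status != CUSPARSE_STATUS_SUCCESS) {
        std::printf("cusparseLtInit failed\n");
        return 1;
    }
    // ... use the handle with the descriptor/plan APIs described below ...
    // Release the library context.
    cusparseLtDestroy(&handle);
    return 0;
}
```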
cusparseLtMatDescriptor_t¶
The structure captures the shape and characteristics of a matrix. It is initialized with the cusparseLtDenseDescriptorInit() or cusparseLtStructuredDescriptorInit() functions and destroyed with cusparseLtMatDescriptorDestroy().
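As a hedged sketch, a 2:4 structured descriptor for matrix A and a dense descriptor for matrix B might be initialized as follows; the shapes, leading dimensions, and 16-byte alignment are illustrative assumptions, and an initialized handle from the previous sketch is assumed:

```cpp
// Sketch: initialize one structured (sparse) and one dense matrix descriptor.
cusparseLtMatDescriptor_t matA, matB;
int64_t  m = 1024, n = 1024, k = 1024;   // assumed problem sizes
uint32_t alignment = 16;                  // assumed buffer alignment

// Structured (2:4 sparse) descriptor for A, row-major FP16.
cusparseLtStructuredDescriptorInit(&handle, &matA, m, k, /*ld=*/k, alignment,
                                   CUDA_R_16F, CUSPARSE_ORDER_ROW,
                                   CUSPARSELT_SPARSITY_50_PERCENT);

// Dense descriptor for B, row-major FP16.
cusparseLtDenseDescriptorInit(&handle, &matB, k, n, /*ld=*/n, alignment,
                              CUDA_R_16F, CUSPARSE_ORDER_ROW);

// ... later ...
cusparseLtMatDescriptorDestroy(&matA);
cusparseLtMatDescriptorDestroy(&matB);
```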
cusparseLtMatmulDescriptor_t¶
The structure holds the description of the matrix multiplication operation. It is initialized with the cusparseLtMatmulDescriptorInit() function.
cusparseLtMatmulAlgSelection_t¶
The structure holds the description of the matrix multiplication algorithm. It is initialized with the cusparseLtMatmulAlgSelectionInit() function.
cusparseLtMatmulPlan_t¶
The structure holds the matrix multiplication execution plan, namely all the information necessary to execute the cusparseLtMatmul() operation. It is initialized and destroyed with the cusparseLtMatmulPlanInit() and cusparseLtMatmulPlanDestroy() functions, respectively.
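A sketch of how these three objects typically fit together, assuming the handle and the matA/matB descriptors from the examples above, and assuming matC and matD dense descriptors created in the same way as matB:

```cpp
// Sketch: build the matmul descriptor, pick an algorithm, and create a plan.
cusparseLtMatmulDescriptor_t   matmul;
cusparseLtMatmulAlgSelection_t algSel;
cusparseLtMatmulPlan_t         plan;

cusparseLtMatmulDescriptorInit(&handle, &matmul,
                               CUSPARSE_OPERATION_NON_TRANSPOSE,
                               CUSPARSE_OPERATION_NON_TRANSPOSE,
                               &matA, &matB, &matC, &matD,
                               CUSPARSE_COMPUTE_32F);

cusparseLtMatmulAlgSelectionInit(&handle, &algSel, &matmul,
                                 CUSPARSELT_MATMUL_ALG_DEFAULT);

cusparseLtMatmulPlanInit(&handle, &plan, &matmul, &algSel);

// ... cusparseLtMatmulGetWorkspace() / cusparseLtMatmul() would follow ...
cusparseLtMatmulPlanDestroy(&plan);
```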
Enumerators¶
cusparseLtSparsity_t¶
The enumerator specifies the sparsity ratio of the structured matrix.

Value | Description
---|---
CUSPARSELT_SPARSITY_50_PERCENT | 50% sparsity ratio: 2:4 for 16-bit and 8-bit data types, 1:2 for tf32 and float

The sparsity property is used in the cusparseLtStructuredDescriptorInit() function.
cusparseComputeType¶
The enumerator specifies the compute precision mode of the matrix multiplication.

Value | Description
---|---
CUSPARSE_COMPUTE_32I | Element-wise multiplication of matrices A and B and accumulation of the intermediate values are performed with 32-bit integer precision. Alpha and beta coefficients and the epilogue are applied with single-precision floating point. Tensor Cores are used whenever possible.
CUSPARSE_COMPUTE_32F | Element-wise multiplication of matrices A and B and accumulation of the intermediate values are performed with single-precision floating point. Alpha and beta coefficients and the epilogue are applied with single-precision floating point. Tensor Cores are used whenever possible.
CUSPARSE_COMPUTE_16F | Element-wise multiplication of matrices A and B and accumulation of the intermediate values are performed with half-precision floating point. Alpha and beta coefficients and the epilogue are applied with single-precision floating point. Tensor Cores are used whenever possible.

The compute precision is used in the cusparseLtMatmulDescriptorInit() function.
cusparseLtMatDescAttribute_t¶
The enumerator specifies additional attributes of a matrix descriptor.

Value | Description
---|---
CUSPARSELT_MAT_NUM_BATCHES | Number of matrices in a batch
CUSPARSELT_MAT_BATCH_STRIDE | Stride between consecutive matrices in a batch, expressed in terms of matrix elements

The attribute enumerator is used in the cusparseLtMatDescSetAttribute() and cusparseLtMatDescGetAttribute() functions.
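For illustration, batching might be configured on the matA descriptor from the earlier sketch like this (the batch count and stride are assumed values):

```cpp
// Sketch: configure batched matmul on an existing matrix descriptor.
int     num_batches  = 4;             // assumed batch count
int64_t batch_stride = 1024 * 1024;   // assumed stride in matrix elements

cusparseLtMatDescSetAttribute(&handle, &matA, CUSPARSELT_MAT_NUM_BATCHES,
                              &num_batches, sizeof(num_batches));
cusparseLtMatDescSetAttribute(&handle, &matA, CUSPARSELT_MAT_BATCH_STRIDE,
                              &batch_stride, sizeof(batch_stride));
```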
cusparseLtMatmulDescAttribute_t¶
The enumerator specifies additional attributes of a matrix multiplication descriptor.

Value | Type | Default Value | Description
---|---|---|---
CUSPARSELT_MATMUL_ACTIVATION_RELU | int | 0 (disabled) | ReLU activation function
CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND | float | +infinity | Upper bound of the ReLU activation function
CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD | float | 0.0f | Lower threshold of the ReLU activation function
CUSPARSELT_MATMUL_ACTIVATION_GELU | int | 0 (disabled) | Enable/Disable GeLU activation function
CUSPARSELT_MATMUL_ACTIVATION_GELU_SCALING | float | 1.0f | Scaling coefficient for the GeLU activation function. It implies CUSPARSELT_MATMUL_ACTIVATION_GELU
CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING | int | 0 (disabled) | Enable/Disable alpha vector (per-channel) scaling
CUSPARSELT_MATMUL_BETA_VECTOR_SCALING | int | 0 (disabled) | Enable/Disable beta vector (per-channel) scaling
CUSPARSELT_MATMUL_BIAS_POINTER | void* | NULL | Bias pointer. The bias vector size must be equal to the number of rows of the output matrix (D)
CUSPARSELT_MATMUL_BIAS_STRIDE | int64_t | 0 | Bias stride between consecutive bias vectors
CUSPARSELT_MATMUL_SPARSE_MAT_POINTER | void* | NULL | Pointer to the pruned sparse matrix
where the ReLU activation function is defined as:

$$\mathrm{ReLU}(x) = \begin{cases} 0 & \text{if } x \le \text{threshold} \\ x & \text{if } \text{threshold} < x \le \text{upperbound} \\ \text{upperbound} & \text{if } x > \text{upperbound} \end{cases}$$
The data type of the bias vector is the same as that of the matrix C, except in the following cases:

- INT8 input/output, INT32 Tensor Core compute kernels
- INT8 input, INT32 output, INT32 Tensor Core compute kernels
- INT8 input, FP16 output, INT32 Tensor Core compute kernels on pre-SM 9.0
- INT8 input, BF16 output, INT32 Tensor Core compute kernels on pre-SM 9.0

in which the data type of the bias vector is FP32.
The GeLU activation function is available only with:

- INT8 input/output, INT32 Tensor Core compute kernels
- E4M3 input, E4M3 output, FP32 Tensor Core compute kernels
- E4M3 input, BF16 output, FP32 Tensor Core compute kernels
- E5M2 input, E5M2 output, FP32 Tensor Core compute kernels
- E5M2 input, BF16 output, FP32 Tensor Core compute kernels
CUSPARSELT_MATMUL_BETA_VECTOR_SCALING implies CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING.

CUSPARSELT_MATMUL_SPARSE_MAT_POINTER provides more flexibility for cusparseLtMatmulSearch() to select the best algorithm. The referenced memory cannot be modified until cusparseLtMatmulSearch() is called.
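As an illustrative sketch, an epilogue could be configured on the matmul descriptor from the earlier example like this; the ReLU upper bound and the bias buffer are assumptions:

```cpp
// Sketch: enable ReLU clipping and attach a bias vector to a matmul descriptor.
int   relu_on = 1;
float relu_ub = 6.0f;      // assumed upper bound (ReLU6-style clipping)
void* d_bias  = nullptr;   // assume this points to a device buffer with rows(D) elements

cusparseLtMatmulDescSetAttribute(&handle, &matmul,
                                 CUSPARSELT_MATMUL_ACTIVATION_RELU,
                                 &relu_on, sizeof(relu_on));
cusparseLtMatmulDescSetAttribute(&handle, &matmul,
                                 CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND,
                                 &relu_ub, sizeof(relu_ub));
cusparseLtMatmulDescSetAttribute(&handle, &matmul,
                                 CUSPARSELT_MATMUL_BIAS_POINTER,
                                 &d_bias, sizeof(d_bias));
```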
cusparseLtMatmulAlg_t¶
The enumerator specifies the algorithm for matrix-matrix multiplication.

Value | Description
---|---
CUSPARSELT_MATMUL_ALG_DEFAULT | Default algorithm

The algorithm enumerator is used in the cusparseLtMatmulAlgSelectionInit() function.
cusparseLtMatmulAlgAttribute_t¶
The enumerator specifies the matrix multiplication algorithm attributes.

Value | Description | Possible Values
---|---|---
CUSPARSELT_MATMUL_ALG_CONFIG_ID | Algorithm ID | [0, MAX) (see CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID)
CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID | Algorithm ID limit (query only) |
CUSPARSELT_MATMUL_SEARCH_ITERATIONS | Number of iterations (kernel launches per algorithm) for cusparseLtMatmulSearch() | > 0 (default=5)
CUSPARSELT_MATMUL_SPLIT_K | Split-K factor (number of slices) | On pre-SM 9.0 architectures
CUSPARSELT_MATMUL_SPLIT_K_MODE | Number of kernels for the Split-K algorithm | See cusparseLtSplitKMode_t
CUSPARSELT_MATMUL_SPLIT_K_BUFFERS | Device memory buffers to store partial results for the reduction | On pre-SM 9.0 architectures

The algorithm attribute enumerator is used in the cusparseLtMatmulAlgGetAttribute() and cusparseLtMatmulAlgSetAttribute() functions.

Split-K parameters allow users to split the GEMM computation along the K dimension so that more CTAs are created, improving SM utilization when the M or N dimensions are small. However, this comes at the cost of reducing the partial results of the K slices into the final result. The cusparseLtMatmulSearch() function can be used to find the optimal combination of Split-K parameters.

Segment-K is a split-K method on SM 9.0 that utilizes warp-specialized persistent CTAs for enhanced efficiency and replaces the traditional split-K method.
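As an illustrative sketch, the algorithm configuration of the algSel object from the earlier example could be queried and tuned like this (the chosen values are assumptions):

```cpp
// Sketch: query the algorithm ID limit, then pick an ID and search iterations.
int max_id = 0;
cusparseLtMatmulAlgGetAttribute(&handle, &algSel,
                                CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID,
                                &max_id, sizeof(max_id));

int alg_id     = 0;   // assumed: any value in [0, max_id)
int iterations = 10;  // assumed: kernel launches per algorithm during the search
cusparseLtMatmulAlgSetAttribute(&handle, &algSel,
                                CUSPARSELT_MATMUL_ALG_CONFIG_ID,
                                &alg_id, sizeof(alg_id));
cusparseLtMatmulAlgSetAttribute(&handle, &algSel,
                                CUSPARSELT_MATMUL_SEARCH_ITERATIONS,
                                &iterations, sizeof(iterations));
```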
cusparseLtSplitKMode_t
¶
The enumerator specifies the Split-K mode values corresponding toCUSPARSELT_MATMUL_SPLIT_K_MODE
attribute in cusparseLtMatmulAlgAttribute_t
Value |
Description |
Details |
---|---|---|
|
Use a single kernel for Split-K |
Use the same GEMM kernel to do the final reduction |
|
Use two kernels for Split-K |
Launch another GPU kernel to do the final reduction |
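A hedged sketch of configuring Split-K on a pre-SM 9.0 device, continuing the algSel example; the factor, mode, and buffer count are assumed values:

```cpp
// Sketch: split the K dimension into 4 slices and reduce with a second kernel.
int                    split_k         = 4;  // assumed number of K slices
cusparseLtSplitKMode_t split_k_mode    = CUSPARSELT_SPLIT_K_MODE_TWO_KERNELS;
int                    split_k_buffers = 4;  // assumed partial-result buffers

cusparseLtMatmulAlgSetAttribute(&handle, &algSel, CUSPARSELT_MATMUL_SPLIT_K,
                                &split_k, sizeof(split_k));
cusparseLtMatmulAlgSetAttribute(&handle, &algSel, CUSPARSELT_MATMUL_SPLIT_K_MODE,
                                &split_k_mode, sizeof(split_k_mode));
cusparseLtMatmulAlgSetAttribute(&handle, &algSel, CUSPARSELT_MATMUL_SPLIT_K_BUFFERS,
                                &split_k_buffers, sizeof(split_k_buffers));
```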
cusparseLtPruneAlg_t¶
The enumerator specifies the pruning algorithm to apply to the structured matrix before the compression.

Value | Description
---|---
CUSPARSELT_PRUNE_SPMMA_TILE | Zero out values within a 4x4 tile so that the remaining nonzero values have the maximum L1-norm among all possible combinations; the constraint is enforced in both rows and columns
CUSPARSELT_PRUNE_SPMMA_STRIP | Zero out values within a 1x4 strip so that the remaining nonzero values have the maximum L1-norm among all possible combinations. The strip direction is chosen according to the operation and the order of the matrix

The pruning algorithm is used in the cusparseLtSpMMAPrune() function.
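A sketch of passing the pruning algorithm to cusparseLtSpMMAPrune(), assuming d_A_dense and d_A_pruned are device buffers for matrix A and stream is a CUDA stream:

```cpp
// Sketch: prune matrix A to the structured-sparsity pattern described by the
// structured descriptor referenced in the matmul descriptor.
cusparseLtSpMMAPrune(&handle, &matmul, d_A_dense, d_A_pruned,
                     CUSPARSELT_PRUNE_SPMMA_STRIP, stream);

// Optionally verify that the pruned matrix satisfies the sparsity constraint:
// cusparseLtSpMMAPruneCheck(&handle, &matmul, d_A_pruned, d_valid, stream);
```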