.. |nbsp| unicode:: 0xA0
   :trim:

################################################################################
cuSPARSELt Data Types
################################################################################

================================================================================
Opaque Data Structures
================================================================================

--------------------------------------------------------------------------------
:code:`cusparseLtHandle_t`
--------------------------------------------------------------------------------

| The structure holds the cuSPARSELt library context (device properties, system information, etc.).
| The handle must be initialized and destroyed with :ref:`cusparseLtInit() ` and :ref:`cusparseLtDestroy() ` functions respectively.

--------------------------------------------------------------------------------
:code:`cusparseLtMatDescriptor_t`
--------------------------------------------------------------------------------

| The structure captures the shape and characteristics of a matrix.
| It is initialized with :ref:`cusparseLtDenseDescriptorInit() ` or :ref:`cusparseLtStructuredDescriptorInit() ` functions and destroyed with :ref:`cusparseLtMatDescriptorDestroy() `.

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulDescriptor_t`
--------------------------------------------------------------------------------

| The structure holds the description of the matrix multiplication operation.
| It is initialized with :ref:`cusparseLtMatmulDescriptorInit() ` function.

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulAlgSelection_t`
--------------------------------------------------------------------------------

| The structure holds the description of the matrix multiplication algorithm.
| It is initialized with :ref:`cusparseLtMatmulAlgSelectionInit() ` function.


--------------------------------------------------------------------------------
:code:`cusparseLtMatmulPlan_t`
--------------------------------------------------------------------------------

| The structure holds the matrix multiplication execution plan, namely all the information necessary to execute the ``cusparseLtMatmul()`` operation.
| It is initialized and destroyed with :ref:`cusparseLtMatmulPlanInit() ` and :ref:`cusparseLtMatmulPlanDestroy() ` functions respectively.

----

================================================================================
Enumerators
================================================================================

.. _cusparseLtSparsity_t-label:

--------------------------------------------------------------------------------
:code:`cusparseLtSparsity_t`
--------------------------------------------------------------------------------

| The enumerator specifies the sparsity ratio of the structured matrix as

.. math::

    sparsity\ ratio = \frac{nnz}{num\_rows * num\_cols}

+----------------------------------+------------------------------------------------+
| Value                            | Description                                    |
+==================================+================================================+
| `CUSPARSELT_SPARSITY_50_PERCENT` | 50% Sparsity Ratio:                            |
|                                  |                                                |
|                                  | **-** **2:4** for `half`, `bfloat16`, `int`    |
|                                  |                                                |
|                                  | **-** **1:2** for `tf32` and `float`           |
+----------------------------------+------------------------------------------------+

| The sparsity property is used in the :ref:`cusparseLtStructuredDescriptorInit() ` function.

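
The 50% patterns in the table above can be checked on the host before a matrix is described as structured. The following is a minimal sketch in plain C and is not part of the cuSPARSELt API: the helper names ``sparsity_ratio`` and ``is_n_m_sparse`` are hypothetical, the matrix is assumed to be dense, row-major ``float`` data, and the groups are assumed to run along each row.

```c
#include <stdbool.h>
#include <stddef.h>

/* Ratio nnz / (num_rows * num_cols), following the formula above. */
double sparsity_ratio(const float *a, size_t num_rows, size_t num_cols)
{
    size_t total = num_rows * num_cols;
    size_t nnz = 0;
    for (size_t i = 0; i < total; ++i)
        if (a[i] != 0.0f)
            ++nnz;
    return total == 0 ? 0.0 : (double)nnz / (double)total;
}

/* True if every aligned group of m consecutive row elements holds at most
 * n nonzeros (n = 2, m = 4 for 2:4; n = 1, m = 2 for 1:2).
 * Assumption of this sketch: groups run along the rows of a row-major
 * matrix and num_cols is a multiple of m. */
bool is_n_m_sparse(const float *a, size_t num_rows, size_t num_cols,
                   size_t n, size_t m)
{
    if (m == 0 || num_cols % m != 0)
        return false;
    for (size_t r = 0; r < num_rows; ++r) {
        for (size_t g = 0; g < num_cols; g += m) {
            size_t nnz = 0;
            for (size_t k = 0; k < m; ++k)
                if (a[r * num_cols + g + k] != 0.0f)
                    ++nnz;
            if (nnz > n)
                return false;
        }
    }
    return true;
}
```

A matrix can have a ratio of exactly 0.5 and still fail the check, because the constraint applies to each group rather than to the matrix as a whole.
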

----

---------------------------
:code:`cusparseComputeType`
---------------------------

| The enumerator specifies the compute precision modes of the matrix

+------------------------------+------------------------------------------------------------------------------------------+
| Value                        | Description                                                                              |
+==============================+==========================================================================================+
| `CUSPARSE_COMPUTE_16F`       | **-** Default mode for 16-bit floating-point precision                                   |
|                              |                                                                                          |
|                              | **-** All computations and intermediate storage ensure at least 16-bit precision         |
|                              |                                                                                          |
|                              | **-** Tensor Cores will be used whenever possible                                        |
+------------------------------+------------------------------------------------------------------------------------------+
| `CUSPARSE_COMPUTE_32I`       | **-** Default mode for 32-bit integer precision                                          |
|                              |                                                                                          |
|                              | **-** All computations and intermediate storage ensure at least 32-bit integer precision |
|                              |                                                                                          |
|                              | **-** Tensor Cores will be used whenever possible                                        |
+------------------------------+------------------------------------------------------------------------------------------+
| `CUSPARSE_COMPUTE_TF32_FAST` | **-** Default mode for 32-bit floating-point precision                                   |
|                              |                                                                                          |
|                              | **-** The inputs are expected to be directly representable in TensorFloat-32 precision.  |
|                              | The 32-bit floating-point values are truncated to TensorFloat-32 before the computation  |
|                              |                                                                                          |
|                              | **-** All computations and intermediate storage ensure at least TensorFloat-32 precision |
|                              |                                                                                          |
|                              | **-** Tensor Cores will be used whenever possible                                        |
+------------------------------+------------------------------------------------------------------------------------------+
| `CUSPARSE_COMPUTE_TF32`      | **-** All computations and intermediate storage ensure at least TensorFloat-32 precision |
|                              |                                                                                          |
|                              | **-** The inputs are rounded to TensorFloat-32 precision. This mode is slower than       |
|                              | `CUSPARSE_COMPUTE_TF32_FAST`, but could provide more accurate results                    |
|                              |                                                                                          |
|                              | **-** Tensor Cores will be used whenever possible                                        |
+------------------------------+------------------------------------------------------------------------------------------+

| The compute precision is used in the :ref:`cusparseLtMatmulDescriptorInit() ` function.

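
To make the difference between truncating and rounding to TensorFloat-32 concrete, the sketch below emulates both on the host. It relies on TensorFloat-32 keeping 10 explicit mantissa bits, so the 13 low mantissa bits of an IEEE-754 ``float`` are dropped. The helper names are hypothetical, the rounding shown is round-half-away-from-zero applied to the bit pattern, and NaN/Inf inputs are not handled. The exact rounding mode used by the library is not stated here, so this is an illustration rather than a description of the implementation.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back (no aliasing issues). */
uint32_t float_to_bits(float x)
{
    uint32_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

float bits_to_float(uint32_t b)
{
    float x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Truncation: clear the 13 low mantissa bits (rounds toward zero). */
float tf32_truncate(float x)
{
    return bits_to_float(float_to_bits(x) & 0xFFFFE000u);
}

/* Rounding (assumed mode): add half of the last kept bit to the magnitude,
 * then clear the 13 low mantissa bits. */
float tf32_round(float x)
{
    return bits_to_float((float_to_bits(x) + 0x00001000u) & 0xFFFFE000u);
}
```

For an input such as ``1 + 2^-11``, truncation returns ``1.0`` while rounding returns ``1 + 2^-10``, which is the kind of difference that can make the rounded mode more accurate.
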

----

--------------------------------------------------------------------------------
:code:`cusparseLtMatDescAttribute_t`
--------------------------------------------------------------------------------

| The enumerator specifies the additional attributes of a matrix descriptor

+-------------------------------+--------------------------------------------------------------------------------------+
| Value                         | Description                                                                          |
+===============================+======================================================================================+
| `CUSPARSELT_MAT_NUM_BATCHES`  | Number of matrices in a batch                                                        |
+-------------------------------+--------------------------------------------------------------------------------------+
| `CUSPARSELT_MAT_BATCH_STRIDE` | Stride between consecutive matrices in a batch expressed in terms of matrix elements |
+-------------------------------+--------------------------------------------------------------------------------------+

| The attribute enumerator is used in the :ref:`cusparseLtMatDescSetAttribute() ` and :ref:`cusparseLtMatDescGetAttribute() ` functions.

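
Because the batch stride is expressed in matrix elements rather than bytes, locating a matrix inside a batch is plain pointer arithmetic on the element type. The sketch below uses hypothetical helper names and ``float`` data, and assumes that the matrices in the batch do not overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Start of the matrix at position batch_index in a strided batch.
 * batch_stride is counted in elements, not in bytes. */
const float *batch_matrix(const float *base, int64_t batch_stride,
                          int64_t batch_index)
{
    return base + batch_stride * batch_index;
}

/* Number of elements spanned by the whole batch.
 * Assumes num_batches >= 1 and batch_stride >= matrix_elems. */
int64_t batch_total_elems(int64_t num_batches, int64_t batch_stride,
                          int64_t matrix_elems)
{
    return (num_batches - 1) * batch_stride + matrix_elems;
}
```

With a stride of 8 elements, for instance, the third matrix of the batch starts 16 elements after the base pointer.
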

----

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulDescAttribute_t`
--------------------------------------------------------------------------------

| The enumerator specifies the additional attributes of a matrix multiplication descriptor

+------------------------------------------------+----------------------------------------+-------------------+-------------------------------------------------------------+
| Value                                          | Type                                   | Default Value     | Description                                                 |
+================================================+========================================+===================+=============================================================+
| `CUSPARSELT_MATMUL_ACTIVATION_RELU`            | `int` 0: **false**, **true** otherwise | `false`           | Enable/Disable ReLU activation function                     |
+------------------------------------------------+----------------------------------------+-------------------+-------------------------------------------------------------+
| `CUSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND` | `float`                                | `inf`             | Upper bound of the ReLU activation function                 |
+------------------------------------------------+----------------------------------------+-------------------+-------------------------------------------------------------+
| `CUSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD`  | `float`                                | `0.0f`            | Lower threshold of the ReLU activation function             |
+------------------------------------------------+----------------------------------------+-------------------+-------------------------------------------------------------+
| `CUSPARSELT_MATMUL_ACTIVATION_GELU`            | `int` 0: **false**, **true** otherwise | `false`           | Enable/Disable GeLU activation function                     |
+------------------------------------------------+----------------------------------------+-------------------+-------------------------------------------------------------+
| `CUSPARSELT_MATMUL_ACTIVATION_GELU_SCALING`    | `float`                                | `1.0f`            | Scaling coefficient for the GeLU activation function.       |
|                                                |                                        |                   | It implies `CUSPARSELT_MATMUL_ACTIVATION_GELU`              |
+------------------------------------------------+----------------------------------------+-------------------+-------------------------------------------------------------+
| `CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING`       | `int` 0: **false**, **true** otherwise | `false`           | Enable/Disable alpha vector (per-channel) scaling           |
+------------------------------------------------+----------------------------------------+-------------------+-------------------------------------------------------------+
| `CUSPARSELT_MATMUL_BETA_VECTOR_SCALING`        | `int` 0: **false**, **true** otherwise | `false`           | Enable/Disable beta vector (per-channel) scaling            |
+------------------------------------------------+----------------------------------------+-------------------+-------------------------------------------------------------+
| `CUSPARSELT_MATMUL_BIAS_POINTER`               | `void*`                                | `NULL` (disabled) | Bias pointer. The bias vector size must equal the number of |
|                                                |                                        |                   | rows of the output matrix (D)                               |
+------------------------------------------------+----------------------------------------+-------------------+-------------------------------------------------------------+
| `CUSPARSELT_MATMUL_BIAS_STRIDE`                | `int64_t`                              | `0` (disabled)    | Bias stride between consecutive bias vectors.               |
|                                                |                                        |                   | `0` means broadcast the first bias vector                   |
+------------------------------------------------+----------------------------------------+-------------------+-------------------------------------------------------------+

where the *ReLU* activation function is defined as:

.. image:: relu.svg
   :width: 400px
   :align: center
   :alt: ReLU activation function

* The *GeLU* activation function is available only with `INT8` input/output, `INT32` Tensor Core compute kernels
* `CUSPARSELT_MATMUL_BETA_VECTOR_SCALING` implies `CUSPARSELT_MATMUL_ALPHA_VECTOR_SCALING`

| The attribute enumerator is used in the :ref:`cusparseLtMatmulDescSetAttribute() ` and :ref:`cusparseLtMatmulDescGetAttribute() ` functions.

----

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulAlg_t`
--------------------------------------------------------------------------------

| The enumerator specifies the algorithm for matrix-matrix multiplication

+---------------------------------+----------------------------+
| Value                           | Description                |
+=================================+============================+
| `CUSPARSELT_MATMUL_ALG_DEFAULT` | Default algorithm          |
+---------------------------------+----------------------------+

| The algorithm enumerator is used in the :ref:`cusparseLtMatmulAlgSelectionInit() ` function.

----

.. _cusparseLtMatmulAlgAttribute_t-label:

--------------------------------------------------------------------------------
:code:`cusparseLtMatmulAlgAttribute_t`
--------------------------------------------------------------------------------

| The enumerator specifies the matrix multiplication algorithm attributes

+---------------------------------------+------------------------------------------------------------------+------------------------------------------------------+
| Value                                 | Description                                                      | Possible Values                                      |
+=======================================+==================================================================+======================================================+
| `CUSPARSELT_MATMUL_ALG_CONFIG_ID`     | Algorithm ID                                                     | [0, MAX] (see `CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID`) |
+---------------------------------------+------------------------------------------------------------------+------------------------------------------------------+
| `CUSPARSELT_MATMUL_ALG_CONFIG_MAX_ID` | Algorithm ID limit (query only)                                  |                                                      |
+---------------------------------------+------------------------------------------------------------------+------------------------------------------------------+
| `CUSPARSELT_MATMUL_SEARCH_ITERATIONS` | Number of iterations (kernel launches per algorithm) for         | > 0 (default=5)                                      |
|                                       | :ref:`cusparseLtMatmulSearch() `                                 |                                                      |
+---------------------------------------+------------------------------------------------------------------+------------------------------------------------------+
| `CUSPARSELT_MATMUL_SPLIT_K`           | Split-K factor (number of slices)                                | [1, K], **1**: Split-K disabled (default=not set)    |
+---------------------------------------+------------------------------------------------------------------+------------------------------------------------------+
| `CUSPARSELT_MATMUL_SPLIT_K_MODE`      | Number of kernels for the Split-K algorithm                      | `CUSPARSELT_SPLIT_K_MODE_ONE_KERNEL`,                |
|                                       |                                                                  | `CUSPARSELT_SPLIT_K_MODE_TWO_KERNELS`                |
+---------------------------------------+------------------------------------------------------------------+------------------------------------------------------+
| `CUSPARSELT_MATMUL_SPLIT_K_BUFFERS`   | Device memory buffers to store partial results for the reduction | [1, SplitK - 1]                                      |
+---------------------------------------+------------------------------------------------------------------+------------------------------------------------------+

| The algorithm attribute enumerator is used in the :ref:`cusparseLtMatmulAlgGetAttribute() ` and :ref:`cusparseLtMatmulAlgSetAttribute() ` functions.

| Split-K parameters allow users to split the GEMM computation along the K dimension so that more CTAs are created, which improves SM utilization when the M or N dimensions are small. However, this comes at the cost of an additional reduction of the partial results of the K slices into the final result. The :ref:`cusparseLtMatmulSearch() ` function can be used to find the optimal combination of Split-K parameters.

----

.. _cusparseLtSplitKMode_t-label:

--------------------------------------------------------------------------------
:code:`cusparseLtSplitKMode_t`
--------------------------------------------------------------------------------

| The enumerator specifies the Split-K mode values corresponding to the `CUSPARSELT_MATMUL_SPLIT_K_MODE` attribute in :ref:`cusparseLtMatmulAlgAttribute_t <cusparseLtMatmulAlgAttribute_t-label>`

+---------------------------------------+---------------------------------+-----------------------------------------------------+
| Value                                 | Description                     | Details                                             |
+=======================================+=================================+=====================================================+
| `CUSPARSELT_SPLIT_K_MODE_ONE_KERNEL`  | Use a single kernel for Split-K | Use the same GEMM kernel to do the final reduction  |
+---------------------------------------+---------------------------------+-----------------------------------------------------+
| `CUSPARSELT_SPLIT_K_MODE_TWO_KERNELS` | Use two kernels for Split-K     | Launch another GPU kernel to do the final reduction |
+---------------------------------------+---------------------------------+-----------------------------------------------------+

----

--------------------------------------------------------------------------------
:code:`cusparseLtPruneAlg_t`
--------------------------------------------------------------------------------

| The enumerator specifies the pruning algorithm to apply to the structured matrix before the compression

+--------------------------------+-----------------------------------------------------------------------------------------+
| Value                          | Description                                                                             |
+================================+=========================================================================================+
| `CUSPARSELT_PRUNE_SPMMA_TILE`  | **-** `half`, `bfloat16`, `int8`: Zero-out eight values in a 4x4 tile to maximize the   |
|                                | *L1-norm* of the resulting tile, under the constraint of selecting exactly two elements |
|                                | for each row and column                                                                 |
|                                | |nbsp|                                                                                  |
|                                |                                                                                         |
|                                | **-** `float`, `tf32`: Zero-out two values in a 2x2 tile to maximize the *L1-norm*      |
|                                | of the resulting tile, under the constraint of selecting exactly one element for each   |
|                                | row and column                                                                          |
+--------------------------------+-----------------------------------------------------------------------------------------+
| `CUSPARSELT_PRUNE_SPMMA_STRIP` | **-** `half`, `bfloat16`, `int8`: Zero-out two values in a 1x4 strip to maximize the    |
|                                | *L1-norm* of the resulting strip                                                        |
|                                | |nbsp|                                                                                  |
|                                |                                                                                         |
|                                | **-** `float`, `tf32`: Zero-out one value in a 1x2 strip to maximize the *L1-norm*      |
|                                | of the resulting strip                                                                  |
|                                | |nbsp|                                                                                  |
|                                |                                                                                         |
|                                | The strip direction is chosen according to the operation `op` and matrix layout         |
|                                | applied to the structured (sparse) matrix                                               |
+--------------------------------+-----------------------------------------------------------------------------------------+

| The pruning algorithm is used in the :ref:`cusparseLtSpMMAPrune() ` function.

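
The 1x4 strip rule described for `CUSPARSELT_PRUNE_SPMMA_STRIP` amounts to keeping the two largest-magnitude values of each strip, since that maximizes the L1-norm of what remains. The following host-side sketch illustrates the rule only; it is not the library's implementation, the function name ``prune_strip_2_4`` is hypothetical, and the strips are assumed to run along a contiguous row of ``float`` values whose length is a multiple of 4.

```c
#include <stddef.h>

/* Absolute value without pulling in <math.h>. */
static float magnitude(float x)
{
    return x < 0.0f ? -x : x;
}

/* Zero out the two smallest-magnitude values in every aligned 1x4 strip,
 * which maximizes the L1-norm of the two values that remain.
 * Trailing elements that do not fill a whole strip are left untouched. */
void prune_strip_2_4(float *row, size_t n)
{
    for (size_t g = 0; g + 4 <= n; g += 4) {
        /* Indices (within the strip) of the largest and second largest. */
        size_t first = 0, second = 1;
        if (magnitude(row[g + 1]) > magnitude(row[g + 0])) {
            first = 1;
            second = 0;
        }
        for (size_t i = 2; i < 4; ++i) {
            float a = magnitude(row[g + i]);
            if (a > magnitude(row[g + first])) {
                second = first;
                first = i;
            } else if (a > magnitude(row[g + second])) {
                second = i;
            }
        }
        for (size_t i = 0; i < 4; ++i)
            if (i != first && i != second)
                row[g + i] = 0.0f;
    }
}
```

The tile variant is not covered by this sketch: it has to satisfy the per-row and per-column constraints at the same time, so it cannot be reduced to independent strips.
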