******************** cuSPARSELt Functions ******************** .. |handle| replace:: ``handle`` .. _handle: types.html#cusparselthandle-t .. |order| replace:: ``order`` .. _order: https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-generic-enum-layout .. |valueType| replace:: ``valueType`` .. _valueType: https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-generic-enum-data-types .. |sparsity| replace:: ``sparsity`` .. _sparsity: types.html#cusparseltsparsity-t .. |computeType| replace:: ``computeType`` .. _computeType: types.html#cusparsecomputetype .. |matDescr| replace:: ``matDescr`` .. _matDescr: types.html#cusparseltmatdescriptor-t .. |matMulDescr| replace:: ``matMulDescr`` .. _matMulDescr: types.html#cusparseltmatmuldescriptor-t .. |opA| replace:: ``opA`` .. _opA: https://docs.nvidia.com/cuda/cusparse/index.html#cusparseOperation_t .. |opB| replace:: ``opB`` .. _opB: https://docs.nvidia.com/cuda/cusparse/index.html#cusparseOperation_t .. |matA| replace:: ``matA`` .. _matA: types.html#cusparseltmatdescriptor-t .. |matB| replace:: ``matB`` .. _matB: types.html#cusparseltmatdescriptor-t .. |matC| replace:: ``matC`` .. _matC: types.html#cusparseltmatdescriptor-t .. |matD| replace:: ``matD`` .. _matD: types.html#cusparseltmatdescriptor-t .. |algSelection| replace:: ``algSelection`` .. _algSelection: types.html#cusparseltmatmulalgselection-t .. |alg| replace:: ``alg`` .. _alg: types.html#cusparseltmatmulalg-t .. |attribute| replace:: ``attribute`` .. _attribute: types.html#cusparseltmatmulalgattribute-t .. |plan| replace:: ``plan`` .. _plan: types.html#cusparseltmatmulplan-t .. |pruneAlg| replace:: ``pruneAlg`` .. _pruneAlg: types.html#cusparseltprunealg-t .. |CUSPARSE_COMPUTE_16F| replace:: ``CUSPARSE_COMPUTE_16F`` .. _CUSPARSE_COMPUTE_16F: types.html#cusparsecomputetype .. |CUSPARSE_COMPUTE_32I| replace:: ``CUSPARSE_COMPUTE_32I`` .. _CUSPARSE_COMPUTE_32I: types.html#cusparsecomputetype .. 
|CUSPARSELT_MATMUL_SEARCH_ITERATIONS| replace:: ``CUSPARSELT_MATMUL_SEARCH_ITERATIONS`` .. _CUSPARSELT_MATMUL_SEARCH_ITERATIONS: types.html#cusparseltmatmulalgattribute-t .. |CUSPARSELT_MATMUL_ALG_CONFIG_ID| replace:: ``CUSPARSELT_MATMUL_ALG_CONFIG_ID`` .. _CUSPARSELT_MATMUL_ALG_CONFIG_ID: types.html#cusparseltmatmulalgattribute-t ============================ Library Management Functions ============================ .. _cusparseLtInit-label: ---------------------- :code:`cusparseLtInit` ---------------------- .. code-block:: cpp cusparseStatus_t cusparseLtInit(cusparseLtHandle_t* handle) | The function initializes the cuSPARSELt library handle (``cusparseLtHandle_t``) which holds the cuSPARSELt library context. It allocates light hardware resources on the host, and must be called prior to making any other cuSPARSELt library calls. Calling any cusparseLt function which uses ``cusparseLtHandle_t`` without a previous call of ``cusparseLtInit()`` will return an error. | The cuSPARSELt library context is tied to the current CUDA device. To use the library on multiple devices, one cuSPARSELt handle should be created for each device. +-----------+--------+--------+---------------------------+ | Parameter | Memory | In/Out | Description | +===========+========+========+===========================+ | |handle|_ | Host | OUT | cuSPARSELt library handle | +-----------+--------+--------+---------------------------+ See `cusparseStatus_t `_ for the description of the return status. ---- .. _cusparseLtDestroy-label: ------------------------- :code:`cusparseLtDestroy` ------------------------- .. code-block:: cpp cusparseStatus_t cusparseLtDestroy(const cusparseLtHandle_t* handle) | The function releases hardware resources used by the cuSPARSELt library. This function is the last call with a particular handle to the cuSPARSELt library. | Calling any cusparseLt function which uses ``cusparseLtHandle_t`` after ``cusparseLtDestroy()`` will return an error. 
+-----------+--------+--------+---------------------------+ | Parameter | Memory | In/Out | Description | +===========+========+========+===========================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +-----------+--------+--------+---------------------------+ See `cusparseStatus_t `_ for the description of the return status. ---- ================ Matmul Functions ================ .. _cusparseLtDenseDescriptorInit-label: ------------------------------------- :code:`cusparseLtDenseDescriptorInit` ------------------------------------- .. code-block:: cpp cusparseStatus_t cusparseLtDenseDescriptorInit(const cusparseLtHandle_t* handle, cusparseLtMatDescriptor_t* matDescr, int64_t rows, int64_t cols, int64_t ld, uint32_t alignment, cudaDataType valueType, cusparseOrder_t order) The function initializes the descriptor of a *dense* matrix. +--------------+--------+--------+---------------------------+ | Parameter | Memory | In/Out | Description | +==============+========+========+===========================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +--------------+--------+--------+---------------------------+ | |matDescr|_ | Host | OUT | Dense matrix description | +--------------+--------+--------+---------------------------+ | `rows` | Host | IN | Number of rows | +--------------+--------+--------+---------------------------+ | `cols` | Host | IN | Number of columns | +--------------+--------+--------+---------------------------+ | `ld` | Host | IN | Leading dimension | +--------------+--------+--------+---------------------------+ | `alignment` | Host | IN | Memory alignment in bytes | +--------------+--------+--------+---------------------------+ | |valueType|_ | Host | IN | Data type of the matrix | +--------------+--------+--------+---------------------------+ | |order|_ | Host | IN | Memory layout | +--------------+--------+--------+---------------------------+ See `cusparseStatus_t `_ for the description of the return status. 
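The ``rows``, ``cols``, ``ld``, and ``order`` parameters together determine where each matrix element lives in memory. The host-only sketch below illustrates that relationship. It assumes the usual BLAS-style convention, which this section does not state explicitly: ``ld`` is counted in elements and is at least ``rows`` for column-major storage and at least ``cols`` for row-major storage. The ``Order`` enum and the helper names are hypothetical stand-ins, not part of the cuSPARSELt API.

.. code-block:: cpp

    #include <cassert>
    #include <cstdint>

    // Hypothetical stand-in for the memory layout (not the cuSPARSE enum).
    enum class Order { Col, Row };

    // Smallest leading dimension that can hold a rows x cols matrix,
    // assuming BLAS-style layouts (an assumption, see the text above).
    inline int64_t minLeadingDimension(int64_t rows, int64_t cols, Order order) {
        return order == Order::Col ? rows : cols;
    }

    // Linear element offset of entry (r, c) for the given layout.
    inline int64_t elementOffset(int64_t r, int64_t c, int64_t ld, Order order) {
        assert(r >= 0 && c >= 0 && ld >= 0);
        return order == Order::Col ? c * ld + r : r * ld + c;
    }

    // Number of elements to allocate when the matrix is padded to ld.
    inline int64_t paddedElementCount(int64_t rows, int64_t cols, int64_t ld,
                                      Order order) {
        assert(ld >= minLeadingDimension(rows, cols, order));
        return order == Order::Col ? ld * cols : ld * rows;
    }

Under these assumptions, a 32 x 64 column-major matrix padded to ``ld = 40`` needs ``paddedElementCount(32, 64, 40, Order::Col)`` elements of storage, and entry ``(2, 3)`` sits at ``elementOffset(2, 3, 40, Order::Col)``.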
---- .. _cusparseLtStructuredDescriptorInit-label: ------------------------------------------ :code:`cusparseLtStructuredDescriptorInit` ------------------------------------------ .. code-block:: cpp cusparseStatus_t cusparseLtStructuredDescriptorInit(const cusparseLtHandle_t* handle, cusparseLtMatDescriptor_t* matDescr, int64_t rows, int64_t cols, int64_t ld, uint32_t alignment, cudaDataType valueType, cusparseOrder_t order, cusparseLtSparsity_t sparsity) The function initializes the descriptor of a *structured* matrix. +--------------+--------+--------+---------------------------+ | Parameter | Memory | In/Out | Description | +==============+========+========+===========================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +--------------+--------+--------+---------------------------+ | |matDescr|_ | Host | OUT | Structured matrix description | +--------------+--------+--------+---------------------------+ | `rows` | Host | IN | Number of rows | +--------------+--------+--------+---------------------------+ | `cols` | Host | IN | Number of columns | +--------------+--------+--------+---------------------------+ | `ld` | Host | IN | Leading dimension | +--------------+--------+--------+---------------------------+ | `alignment` | Host | IN | Memory alignment in bytes | +--------------+--------+--------+---------------------------+ | |valueType|_ | Host | IN | Data type of the matrix | +--------------+--------+--------+---------------------------+ | |order|_ | Host | IN | Memory layout | +--------------+--------+--------+---------------------------+ | |sparsity|_ | Host | IN | Matrix sparsity ratio | +--------------+--------+--------+---------------------------+ **Sparsity ratio** +----------------------------------+-------------------------------------------+ | Value | Description | +==================================+===========================================+ | `CUSPARSELT_SPARSITY_50_PERCENT` | 50% Sparsity Ratio (2:4 Sparse MMA) | 
+----------------------------------+-------------------------------------------+ See `cusparseStatus_t `_ for the description of the return status. ---- .. _cusparseLtMatmulDescriptorInit-label: -------------------------------------- :code:`cusparseLtMatmulDescriptorInit` -------------------------------------- .. code-block:: cpp cusparseStatus_t CUSPARSELT_API cusparseLtMatmulDescriptorInit(const cusparseLtHandle_t* handle, cusparseLtMatmulDescriptor_t* matMulDescr, cusparseOperation_t opA, cusparseOperation_t opB, const cusparseLtMatDescriptor_t* matA, const cusparseLtMatDescriptor_t* matB, const cusparseLtMatDescriptor_t* matC, const cusparseLtMatDescriptor_t* matD, cusparseComputeType computeType) The function initializes the *matrix multiplication* descriptor. +----------------+--------+--------+-----------------------------------+ | Parameter | Memory | In/Out | Description | +================+========+========+===================================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +----------------+--------+--------+-----------------------------------+ | |matMulDescr|_ | Host | OUT | Matrix multiplication descriptor | +----------------+--------+--------+-----------------------------------+ | |opA|_ | Host | IN | Operation applied to the matrix A | +----------------+--------+--------+-----------------------------------+ | |opB|_ | Host | IN | Operation applied to the matrix B | +----------------+--------+--------+-----------------------------------+ | |matA|_ | Host | IN | Matrix A descriptor | +----------------+--------+--------+-----------------------------------+ | |matB|_ | Host | IN | Matrix B descriptor | +----------------+--------+--------+-----------------------------------+ | |matC|_ | Host | IN | Matrix C descriptor | +----------------+--------+--------+-----------------------------------+ | |matD|_ | Host | IN | Matrix D descriptor | +----------------+--------+--------+-----------------------------------+ | |computeType|_ | Host 
| IN | Compute precision | +----------------+--------+--------+-----------------------------------+ See `cusparseStatus_t `_ for the description of the return status. ---- .. _cusparseLtMatmulAlgSelectionInit-label: ---------------------------------------- :code:`cusparseLtMatmulAlgSelectionInit` ---------------------------------------- .. code-block:: cpp cusparseStatus_t cusparseLtMatmulAlgSelectionInit(const cusparseLtHandle_t* handle, cusparseLtMatmulAlgSelection_t* algSelection, const cusparseLtMatmulDescriptor_t* matmulDescr, cusparseLtMatmulAlg_t alg) The function initializes the *algorithm selection* descriptor. +-----------------+--------+--------+-----------------------------------+ | Parameter | Memory | In/Out | Description | +=================+========+========+===================================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +-----------------+--------+--------+-----------------------------------+ | |algSelection|_ | Host | OUT | Algorithm selection descriptor | +-----------------+--------+--------+-----------------------------------+ | |matMulDescr|_ | Host | IN | Matrix multiplication descriptor | +-----------------+--------+--------+-----------------------------------+ | |alg|_ | Host | IN | Algorithm mode | +-----------------+--------+--------+-----------------------------------+ See `cusparseStatus_t `_ for the description of the return status. ---- .. _cusparseLtMatmulAlgSetAttribute-label: ---------------------------------------- :code:`cusparseLtMatmulAlgSetAttribute` ---------------------------------------- .. code-block:: cpp cusparseStatus_t cusparseLtMatmulAlgSetAttribute(const cusparseLtHandle_t* handle, cusparseLtMatmulAlgSelection_t* algSelection, cusparseLtMatmulAlgAttribute_t attribute, const void* data, size_t dataSize) The function sets the value of the specified attribute belonging to algorithm selection descriptor. 
+-----------------+--------+--------+---------------------------------------------------------------------+ | Parameter | Memory | In/Out | Description | +=================+========+========+=====================================================================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +-----------------+--------+--------+---------------------------------------------------------------------+ | |algSelection|_ | Host | OUT | Algorithm selection descriptor | +-----------------+--------+--------+---------------------------------------------------------------------+ | |attribute|_ | Host | IN | The attribute that will be set by this function | +-----------------+--------+--------+---------------------------------------------------------------------+ | `data` | Host | IN | Pointer to the value to which the specified attribute will be set | +-----------------+--------+--------+---------------------------------------------------------------------+ | `dataSize` | Host | IN | Size in bytes of the attribute value used for verification | +-----------------+--------+--------+---------------------------------------------------------------------+ See `cusparseStatus_t `_ for the description of the return status. ---- .. _cusparseLtMatmulAlgGetAttribute-label: ---------------------------------------- :code:`cusparseLtMatmulAlgGetAttribute` ---------------------------------------- .. code-block:: cpp cusparseStatus_t cusparseLtMatmulAlgGetAttribute(const cusparseLtHandle_t* handle, const cusparseLtMatmulAlgSelection_t* algSelection, cusparseLtMatmulAlgAttribute_t attribute, void* data, size_t dataSize) The function returns the value of the queried attribute belonging to algorithm selection descriptor. 
+-----------------+--------+--------+--------------------------------------------------------------------------+ | Parameter | Memory | In/Out | Description | +=================+========+========+==========================================================================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +-----------------+--------+--------+--------------------------------------------------------------------------+ | |algSelection|_ | Host | IN | Algorithm selection descriptor | +-----------------+--------+--------+--------------------------------------------------------------------------+ | |attribute|_ | Host | IN | The attribute that will be retrieved by this function | +-----------------+--------+--------+--------------------------------------------------------------------------+ | `data` | Host | OUT | Memory address containing the attribute value retrieved by this function | +-----------------+--------+--------+--------------------------------------------------------------------------+ | `dataSize` | Host | IN | Size in bytes of the attribute value used for verification | +-----------------+--------+--------+--------------------------------------------------------------------------+ See `cusparseStatus_t `_ for the description of the return status. ---- .. _cusparseLtMatmulGetWorkspace-label: ------------------------------------ :code:`cusparseLtMatmulGetWorkspace` ------------------------------------ .. code-block:: cpp cusparseStatus_t cusparseLtMatmulGetWorkspace(const cusparseLtHandle_t* handle, const cusparseLtMatmulAlgSelection_t* algSelection, size_t* workspaceSize) The function determines the required workspace size associated to the selected algorithm. 
+-----------------+--------+--------+-----------------------------------+ | Parameter | Memory | In/Out | Description | +=================+========+========+===================================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +-----------------+--------+--------+-----------------------------------+ | |algSelection|_ | Host | IN | Algorithm selection descriptor | +-----------------+--------+--------+-----------------------------------+ | `workspaceSize` | Host | OUT | Workspace size in bytes | +-----------------+--------+--------+-----------------------------------+ See `cusparseStatus_t `_ for the description of the return status. ---- .. _cusparseLtMatmulPlanInit-label: -------------------------------- :code:`cusparseLtMatmulPlanInit` -------------------------------- .. code-block:: cpp cusparseStatus_t cusparseLtMatmulPlanInit(const cusparseLtHandle_t* handle, cusparseLtMatmulPlan_t* plan, const cusparseLtMatmulDescriptor_t* matmulDescr, const cusparseLtMatmulAlgSelection_t* algSelection, size_t workspaceSize) +-----------------+--------+--------+-----------------------------------+ | Parameter | Memory | In/Out | Description | +=================+========+========+===================================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +-----------------+--------+--------+-----------------------------------+ | |plan|_ | Host | OUT | Matrix multiplication plan | +-----------------+--------+--------+-----------------------------------+ | |matmulDescr|_ | Host | IN | Matrix multiplication descriptor | +-----------------+--------+--------+-----------------------------------+ | |algSelection|_ | Host | IN | Algorithm selection descriptor | +-----------------+--------+--------+-----------------------------------+ | `workspaceSize` | Host | IN | Workspace size in bytes | +-----------------+--------+--------+-----------------------------------+ See `cusparseStatus_t `_ for the description of the return status. ---- .. 
_cusparseLtMatmulPlanDestroy-label: ----------------------------------- :code:`cusparseLtMatmulPlanDestroy` ----------------------------------- .. code-block:: cpp cusparseStatus_t cusparseLtMatmulPlanDestroy(const cusparseLtMatmulPlan_t* plan) | The function releases the resources used by an instance of the matrix multiplication plan. This function is the last call with a specific plan instance. | Calling any cusparseLt function which uses ``cusparseLtMatmulPlan_t`` after ``cusparseLtMatmulPlanDestroy()`` will return an error. +-----------------+--------+--------+-----------------------------------+ | Parameter | Memory | In/Out | Description | +=================+========+========+===================================+ | |plan|_ | Host | IN | Matrix multiplication plan | +-----------------+--------+--------+-----------------------------------+ See `cusparseStatus_t `_ for the description of the return status. ---- .. _cusparseLtMatmul-label: ------------------------ :code:`cusparseLtMatmul` ------------------------ .. code-block:: cpp cusparseStatus_t cusparseLtMatmul(const cusparseLtHandle_t* handle, const cusparseLtMatmulPlan_t* plan, const void* alpha, const void* d_A, const void* d_B, const void* beta, const void* d_C, void* d_D, void* workspace, cudaStream_t* streams, int32_t numStreams) The function computes the matrix multiplication of matrices `A` and `B` to produce the output matrix `D`, according to the following operation: .. math:: D = \alpha op(A) * op(B) + \beta op(C) | where `A`, `B`, and `C` are input matrices, and :math:`\alpha` and :math:`\beta` are input scalars. 
| **Note**: The function currently only supports the case where ``C == D`` +-----------------+--------+--------+-------------------------------------------------------------------+ | Parameter | Memory | In/Out | Description | +=================+========+========+===================================================================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +-----------------+--------+--------+-------------------------------------------------------------------+ | |plan|_ | Host | IN | Matrix multiplication plan | +-----------------+--------+--------+-------------------------------------------------------------------+ | `alpha` | Host | IN | :math:`\alpha` scalar used for multiplication (`float` data type) | +-----------------+--------+--------+-------------------------------------------------------------------+ | `d_A` | Device | IN | Pointer to the structured matrix `A` | +-----------------+--------+--------+-------------------------------------------------------------------+ | `d_B` | Device | IN | Pointer to the dense matrix `B` | +-----------------+--------+--------+-------------------------------------------------------------------+ | `beta` | Host | IN | :math:`\beta` scalar used for multiplication (`float` data type) | +-----------------+--------+--------+-------------------------------------------------------------------+ | `d_C` | Device | IN | Pointer to the dense matrix `C` | +-----------------+--------+--------+-------------------------------------------------------------------+ | `d_D` | Device | OUT | Pointer to the dense matrix `D` | +-----------------+--------+--------+-------------------------------------------------------------------+ | `workspace` | Device | IN | Pointer to workspace | +-----------------+--------+--------+-------------------------------------------------------------------+ | `streams` | Host | IN | Pointer to CUDA stream array for the computation | 
+-----------------+--------+--------+-------------------------------------------------------------------+ | `numStreams` | Host | IN | Number of CUDA streams in `streams` | +-----------------+--------+--------+-------------------------------------------------------------------+ **Datatypes Supported:** +---------------+---------------+-------------------------+ | Input | Output | Compute | +===============+===============+=========================+ | `CUDA_R_16F` | `CUDA_R_16F` | |CUSPARSE_COMPUTE_16F|_ | +---------------+---------------+-------------------------+ | `CUDA_R_16BF` | `CUDA_R_16BF` | |CUSPARSE_COMPUTE_16F|_ | +---------------+---------------+-------------------------+ | `CUDA_R_8I` | `CUDA_R_8I` | |CUSPARSE_COMPUTE_32I|_ | +---------------+---------------+-------------------------+ The *structured matrix* `A` (compressed) must respect the following constraints: * For ``opA = CUSPARSE_OPERATION_NON_TRANSPOSE``, each row must have at least two zero values in every four consecutive elements * For ``opA = CUSPARSE_OPERATION_TRANSPOSE``, each column must have at least two zero values in every four consecutive elements The correctness of the pruning result (matrix `A`) can be checked with the function :ref:`cusparseLtSpMMAPruneCheck() `. 
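For intuition, the 2:4 pattern can also be verified on the host with a few lines of code. The sketch below is a CPU reference for the non-transposed case on a row-major matrix. It assumes that the 2:4 pattern named in the sparsity table means every aligned group of four consecutive elements in a row holds at most two non-zero values. The function name is hypothetical, and this is not a description of how ``cusparseLtSpMMAPruneCheck()`` is implemented.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Returns true if every aligned group of four consecutive elements in
    // each row of a row-major rows x cols matrix has at most two non-zeros.
    inline bool satisfiesTwoFourRows(const std::vector<float>& a,
                                     std::size_t rows, std::size_t cols) {
        assert(cols % 4 == 0 && a.size() == rows * cols);
        for (std::size_t r = 0; r < rows; ++r) {
            for (std::size_t c = 0; c < cols; c += 4) {
                int nonZeros = 0;
                for (std::size_t i = 0; i < 4; ++i) {
                    if (a[r * cols + c + i] != 0.0f) ++nonZeros;
                }
                if (nonZeros > 2) return false;  // group is too dense
            }
        }
        return true;
    }

A host-side check like this can be handy in unit tests for pruning code; for data that already lives on the device, the library's own check function avoids the extra copy.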
**Limitations:** * All pointers must be aligned to 16 bytes * For `CUDA_R_16F` and `CUDA_R_16BF` data types, the total size of the matrix cannot exceed :math:`2^{31}-1` elements * For `CUDA_R_8I` data type, the total size of the matrix cannot exceed :math:`2^{32}-1` elements * `CUDA_R_8I` data type only supports: * ``opA/opB = TN`` if the matrix orders are ``orderA/orderB = Col/Col`` * ``opA/opB = NT`` if the matrix orders are ``orderA/orderB = Row/Row`` * ``opA/opB = NN`` if the matrix orders are ``orderA/orderB = Row/Col`` * ``opA/opB = TT`` if the matrix orders are ``orderA/orderB = Col/Row`` * Given `A` of size :math:`m \times k`, `B` of size :math:`k \times n`, and `C` of size :math:`m \times n` (regardless of `opA`, `opB`) * :math:`k` must be a multiple of 32 * For `CUDA_R_16F` and `CUDA_R_16BF` data types, :math:`m, n` must be multiples of 8 * For `CUDA_R_8I`, :math:`m, n` must be multiples of 16 **Properties** * The routine requires no extra storage * The routine supports asynchronous execution with respect to `streams[0]` * Provides deterministic (bit-wise) results for each run `cusparseLtMatmul` supports the following `optimizations `_: * CUDA graph capture * Hardware Memory Compression See `cusparseStatus_t `_ for the description of the return status. ---- .. _cusparseLtMatmulSearch-label: ------------------------------ :code:`cusparseLtMatmulSearch` ------------------------------ .. code-block:: cpp cusparseStatus_t cusparseLtMatmulSearch(const cusparseLtHandle_t* handle, cusparseLtMatmulPlan_t* plan, const void* alpha, const void* d_A, const void* d_B, const void* beta, const void* d_C, void* d_D, void* workspace, cudaStream_t* streams, int32_t numStreams) | The function evaluates all available algorithms for the matrix multiplication and automatically updates the `plan` by selecting the fastest one. The functionality is intended to be used for auto-tuning purposes when the same operation is repeated multiple times over different inputs. 
| The function behavior is the same as :ref:`cusparseLtMatmul() `. * The function is *NOT* asynchronous with respect to `streams[0]` (*blocking call*) * The number of iterations for the evaluation can be set by using :ref:`cusparseLtMatmulAlgSetAttribute() ` with |CUSPARSELT_MATMUL_SEARCH_ITERATIONS|_. * The selected algorithm ID can be retrieved by using :ref:`cusparseLtMatmulAlgGetAttribute() ` with |CUSPARSELT_MATMUL_ALG_CONFIG_ID|_. ---- ================ Helper Functions ================ .. _cusparseLtSpMMAPrune-label: ---------------------------- :code:`cusparseLtSpMMAPrune` ---------------------------- .. code-block:: cpp cusparseStatus_t cusparseLtSpMMAPrune(const cusparseLtHandle_t* handle, const cusparseLtMatmulDescriptor_t* matmulDescr, const void* d_in, void* d_out, cusparseLtPruneAlg_t pruneAlg, cudaStream_t stream) The function prunes a dense matrix `d_in` according to the specified algorithm `pruneAlg`. +-----------------+--------+--------+-----------------------------------+ | Parameter | Memory | In/Out | Description | +=================+========+========+===================================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +-----------------+--------+--------+-----------------------------------+ | |matmulDescr|_ | Host | IN | Matrix multiplication descriptor | +-----------------+--------+--------+-----------------------------------+ | `d_in` | Device | IN | Pointer to the dense matrix | +-----------------+--------+--------+-----------------------------------+ | `d_out` | Device | OUT | Pointer to the pruned matrix | +-----------------+--------+--------+-----------------------------------+ | |pruneAlg|_ | Host | IN | Pruning algorithm | +-----------------+--------+--------+-----------------------------------+ | `stream` | Host | IN | CUDA stream for the computation | +-----------------+--------+--------+-----------------------------------+ **Pruning Algorithms** 
+--------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Value | Description | +================================+================================================================================================================================================================================+ | `CUSPARSELT_PRUNE_SPMMA_TILE` | Zero-out eight values in a 4x4 tile to maximize the *L1-norm* of the resulting tile, under the constraint of selecting exactly two elements for each row and column | +--------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | `CUSPARSELT_PRUNE_SPMMA_STRIP` | Zero-out two values in a 1x4 strip to maximize the *L1-norm* of the resulting strip. The strip direction is chosen according to the operation `opA` specified in `matmulDescr` | +--------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ **Properties** * The routine requires no extra storage * The routine supports asynchronous execution with respect to `stream` * Provides deterministic (bit-wise) results for each run `cusparseLtSpMMAPrune` supports the following `optimizations `_: * CUDA graph capture * Hardware Memory Compression See `cusparseStatus_t `_ for the description of the return status. ---- .. _cusparseLtSpMMAPruneCheck-label: --------------------------------- :code:`cusparseLtSpMMAPruneCheck` --------------------------------- .. 
code-block:: cpp cusparseStatus_t cusparseLtSpMMAPruneCheck(const cusparseLtHandle_t* handle, const cusparseLtMatmulDescriptor_t* matmulDescr, const void* d_in, int* valid, cudaStream_t stream) The function checks the correctness of the pruning structure for a given matrix. +-----------------+--------+--------+----------------------------------------------+ | Parameter | Memory | In/Out | Description | +=================+========+========+==============================================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +-----------------+--------+--------+----------------------------------------------+ | |matmulDescr|_ | Host | IN | Matrix multiplication descriptor | +-----------------+--------+--------+----------------------------------------------+ | `d_in` | Device | IN | Pointer to the matrix to check | +-----------------+--------+--------+----------------------------------------------+ | `valid` | Host | OUT | Validation results (`0` correct, `1` wrong) | +-----------------+--------+--------+----------------------------------------------+ | `stream` | Host | IN | CUDA stream for the computation | +-----------------+--------+--------+----------------------------------------------+ See `cusparseStatus_t `_ for the description of the return status. ---- .. _cusparseLtSpMMACompressedSize-label: ------------------------------------- :code:`cusparseLtSpMMACompressedSize` ------------------------------------- .. code-block:: cpp cusparseStatus_t cusparseLtSpMMACompressedSize(const cusparseLtHandle_t* handle, const cusparseLtMatmulPlan_t* plan, size_t* compressedSize) The function provides the size of the *compressed* matrix to be allocated before calling :ref:`cusparseLtSpMMACompress() `. 
+------------------+--------+--------+----------------------------------------+ | Parameter | Memory | In/Out | Description | +==================+========+========+========================================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +------------------+--------+--------+----------------------------------------+ | |plan|_ | Host | IN | Matrix multiplication plan | +------------------+--------+--------+----------------------------------------+ | `compressedSize` | Host | OUT | Size in bytes of the compressed matrix | +------------------+--------+--------+----------------------------------------+ See `cusparseStatus_t `_ for the description of the return status. ---- .. _cusparseLtSpMMACompress-label: ------------------------------- :code:`cusparseLtSpMMACompress` ------------------------------- .. code-block:: cpp cusparseStatus_t cusparseLtSpMMACompress(const cusparseLtHandle_t* handle, const cusparseLtMatmulPlan_t* plan, const void* d_dense, void* d_compressed, cudaStream_t stream) The function compresses a dense matrix `d_dense`. The *compressed* matrix is intended to be used as the first operand `A` in the :ref:`cusparseLtMatmul() ` function. 
+----------------+--------+--------+---------------------------------------------------+ | Parameter | Memory | In/Out | Description | +================+========+========+===================================================+ | |handle|_ | Host | IN | cuSPARSELt library handle | +----------------+--------+--------+---------------------------------------------------+ | |plan|_ | Host | IN | Matrix multiplication plan | +----------------+--------+--------+---------------------------------------------------+ | `d_dense` | Device | IN | Pointer to the dense matrix | +----------------+--------+--------+---------------------------------------------------+ | `d_compressed` | Device | OUT | Pointer to the *compressed* matrix | +----------------+--------+--------+---------------------------------------------------+ | `stream` | Host | IN | CUDA stream for the computation | +----------------+--------+--------+---------------------------------------------------+ **Properties** * The routine requires no extra storage * The routine supports asynchronous execution with respect to `stream` * Provides deterministic (bit-wise) results for each run `cusparseLtSpMMACompress` supports the following `optimizations `_: * CUDA graph capture * Hardware Memory Compression See `cusparseStatus_t `_ for the description of the return status.
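The helper functions in this chapter appear intended to be used in sequence: prune, check, then compress. As a rough host-side model of the first and last steps, the sketch below keeps the two largest-magnitude values in each 1x4 strip, which maximizes the L1-norm of what remains, as described for ``CUSPARSELT_PRUNE_SPMMA_STRIP``. It then packs the kept values together with their positions inside the strip. All names are hypothetical, and the packed layout is an illustration only: the storage format produced by ``cusparseLtSpMMACompress()`` is not described in this section and should be treated as opaque.

.. code-block:: cpp

    #include <cassert>
    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <utility>
    #include <vector>

    // Zero the two smallest-magnitude values in every 1x4 strip of a row.
    // Keeping the two largest magnitudes maximizes the L1-norm of each strip.
    inline void pruneStrips(std::vector<float>& row) {
        assert(row.size() % 4 == 0);
        for (std::size_t base = 0; base < row.size(); base += 4) {
            std::size_t first = 0;   // index of the largest magnitude so far
            std::size_t second = 1;  // index of the second largest
            if (std::fabs(row[base + second]) > std::fabs(row[base + first])) {
                std::swap(first, second);
            }
            for (std::size_t i = 2; i < 4; ++i) {
                const float v = std::fabs(row[base + i]);
                if (v > std::fabs(row[base + first])) {
                    second = first;
                    first = i;
                } else if (v > std::fabs(row[base + second])) {
                    second = i;
                }
            }
            for (std::size_t i = 0; i < 4; ++i) {
                if (i != first && i != second) row[base + i] = 0.0f;
            }
        }
    }

    // Illustrative packed form (not the library's format): two values per
    // strip plus the position of each value inside its strip.
    struct PackedRow {
        std::vector<float> values;
        std::vector<std::uint8_t> positions;  // each entry is in 0..3
    };

    // Pack an already pruned row. Strips with fewer than two non-zero values
    // are padded with explicit zeros so every strip yields two slots.
    inline PackedRow packStrips(const std::vector<float>& row) {
        assert(row.size() % 4 == 0);
        PackedRow out;
        for (std::size_t base = 0; base < row.size(); base += 4) {
            std::size_t kept = 0;
            for (std::size_t i = 0; i < 4; ++i) {
                if (row[base + i] != 0.0f) {
                    assert(kept < 2);  // input must already follow 2:4
                    out.values.push_back(row[base + i]);
                    out.positions.push_back(static_cast<std::uint8_t>(i));
                    ++kept;
                }
            }
            for (std::size_t i = 0; i < 4 && kept < 2; ++i) {
                if (row[base + i] == 0.0f) {
                    out.values.push_back(0.0f);
                    out.positions.push_back(static_cast<std::uint8_t>(i));
                    ++kept;
                }
            }
        }
        return out;
    }

For the strip ``{1, -4, 3, 0.5}`` the sketch keeps ``-4`` and ``3`` and zeroes the other two entries. The packed form then stores half as many values as the dense row, plus one small position index per kept value, which is the general idea behind 50% structured sparsity.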