Getting Started

In this section, we show how to implement a sparse matrix-matrix multiplication using cuSPARSELt. We first introduce an overview of the workflow by showing the main steps to set up the computation. Then, we describe how to install the library and how to compile it. Lastly, we present a step by step code example with additional comments.

cuSPARSELt Workflow

cuSPARSELt follows an equivalent approach and adopts similar concepts to cuBLASLt and cuTENSOR. The library programming model requires organizing the computation in such a way the same setup can be repeatedly used for different inputs.
In particular, the model relies on in following high-level stages:
A. Problem definition: Specify matrices shapes, data types, operations, etc.
B. User preferences/constraints: User algorithm selection or limit search space of viable implementations (candidates)
C. Plan: Gather descriptors for the execution and “find” the best implementation if needed
D. Execution: Perform the actual computation

More in detail, the common workflow consists in the following steps:

1. Initialize the library handle: cusparseLtInit().
2. Specify the input/output matrices characteristics: cusparseLtDenseDescriptorInit(), cusparseLtStructuredDescriptorInit().
3. Initialize the matrix multiplication descriptor and its properties (e.g. operations, compute type, etc.): cusparseLtMatmulDescriptorInit().
4. Initialize the algorithm selection descriptor: cusparseLtMatmulAlgSelectionInit().
5. Initialize the matrix multiplication plan: cusparseLtMatmulPlanInit().
6. Prune the A matrix: cusparseLtSpMMAPrune(). This step is not needed if the user provides a customized matrix pruning.
7. Compress the pruned matrix: cusparseLtSpMMACompress().
8. Execute the matrix multiplication: cusparseLtMatmul(). This step can be repeated multiple times with different inputs.
9. Destroy the matrix multiplication plan and the library handle: cusparseLtMatmulPlanDestroy() cusparseLtDestroy().
workflow

Installation and Compilation

Assuming cuSPARSELt package has been downloaded and extracted in CUSPARSELT_DIR, we update the library path accordingly:

export LD_LIBRARY_PATH=${CUSPARSELT_DIR}/lib64:${LD_LIBRARY_PATH}

To compile the sample code we will discuss below (spmma_example.cu),

nvcc spmma_example.cu -I${CUSPARSELT_DIR}/include -lcusparseLt -o spmma_example

Note that previous command links cusparseLt as a shared library. Linking the code with the static version of the library requires additional flags:

nvcc spmma_example_static.cu -I${CUSPARSELT_DIR}/include                        \
                             -Xlinker=--whole-archive                           \
                             -Xlinker=${CUSPARSELT_DIR}/libcusparseLt_static.a  \
                             -Xlinker=--no-whole-archive -o spmma_example_static

Code Example

The following code example shows the common steps to use cuSPARSELt and performs the matrix multiplication.
The full code can be found in the CUDALibrarySamples repository.
#include <cusparseLt.h> // cusparseLt header

// Device pointers and coefficient definitions
float alpha = 1.0f;
float beta  = 0.0f;
__half* dA = ...
__half* dB = ...
__half* dC = ...

//--------------------------------------------------------------------------
// cusparseLt data structures and handle initialization
cusparseLtHandle_t             handle;
cusparseLtMatDescriptor_t      matA, matB, matC;
cusparseLtMatmulDescriptor_t   matmul;
cusparseLtMatmulAlgSelection_t alg_sel;
cusparseLtMatmulPlan_t         plan;
cudaStream_t                   stream = nullptr;
cusparseLtInit(&handle);

//--------------------------------------------------------------------------
// matrix descriptor initilization
cusparseLtStructuredDescriptorInit(&handle, &matA, num_A_rows, num_A_cols,
                                   lda, alignment, type, order,
                                   CUSPARSELT_SPARSITY_50_PERCENT);
cusparseLtDenseDescriptorInit(&handle, &matB, num_B_rows, num_B_cols, ldb,
                              alignment, type, order);
cusparseLtDenseDescriptorInit(&handle, &matC, num_C_rows, num_C_cols, ldc,
                              alignment, type, order);

//--------------------------------------------------------------------------
// matmul, algorithm selection, and plan initilization
cusparseLtMatmulDescriptorInit(&handle, &matmul, opA, opB, &matA, &matB,
                               &matC, &matC, compute_type);
cusparseLtMatmulAlgSelectionInit(&handle, &alg_sel, &matmul,
                                 CUSPARSELT_MATMUL_ALG_DEFAULT);
int alg = 0; // set algorithm ID
cusparseLtMatmulAlgSetAttribute(&handle, &alg_sel,
                                CUSPARSELT_MATMUL_ALG_CONFIG_ID,
                                &alg, sizeof(alg));
size_t workspace_size, compressed_size;
cusparseLtMatmulGetWorkspace(&handle, &alg_sel, &workspace_size);

cusparseLtMatmulPlanInit(&handle, &plan, &matmul, &alg_sel, workspace_size);

//--------------------------------------------------------------------------
// Prune the A matrix (in-place) and check the correcteness
cusparseLtSpMMAPrune(&handle, &matmul, dA, dA, CUSPARSELT_PRUNE_SPMMA_TILE,
                     stream);
int is_valid;
cusparseLtSpMMAPruneCheck(&handle, &matmul, dA, &is_valid, stream);
if (is_valid != 0) {
    std::printf("!!!! The matrix has been pruned in a wrong way. "
                "cusparseLtMatmul will not provided correct results\n");
    return EXIT_FAILURE;
}

//--------------------------------------------------------------------------
// Matrix A compression
cusparseLtSpMMACompressedSize(&handle, &plan, &compressed_size);
cudaMalloc((void**) &dA_compressed, compressed_size);

cusparseLtSpMMACompress(&handle, &plan, dA, dA_compressed, stream);

//--------------------------------------------------------------------------
// Perform the matrix multiplication
void*         d_workspace = nullptr;
int           num_streams = 0;
cudaStream_t* streams     = nullptr;

cusparseLtMatmul(&handle, &plan, &alpha, dA_compressed, dB, &beta, dC, dD,
                 d_workspace, streams, num_streams) )

//--------------------------------------------------------------------------
// Destroy plan and handle
cusparseLtMatmulPlanDestroy(&plan);
cusparseLtDestroy(&handle);