Getting Started

In this section, we show how to solve a linear system with a sparse system matrix using cuDSS. We first give an overview of the workflow by showing the main steps to set up the computation. Then, we describe how to install the library and how to compile an example that uses cuDSS. Lastly, we present a step-by-step code example (without the MGMN mode) with additional comments.

cuDSS Workflow
Installation and Compilation
- Linux
- Windows
Code Example

cuDSS Workflow

cuDSS follows a multi-stage approach commonly used in sparse direct solvers. The process of solving a linear system is split into several phases: reordering and symbolic factorization, numerical factorization (and, optionally, refactorization), and solving (with optional iterative refinement).

However, before calling the main cuDSS API cudssExecute(), a setup is necessary.

One of the simplest complete workflows consists of the following steps:

0. Prerequisites: user-allocated device memory buffers holding the sparse matrix of the linear system and the values of the dense right-hand side and solution matrices.

1. Initialize the library handle: cudssCreate().

2. Create the matrix wrappers for the sparse matrix of the system and for the solution and right-hand side matrices: cudssMatrixCreateCsr(), cudssMatrixCreateDn().

3. Create the opaque cuDSS objects for the solver settings (cudssConfig_t) and data (cudssData_t): cudssConfigCreate(), cudssDataCreate().

4. Apply any extra settings (such as the CUDA stream to execute on, solver settings, and data parameters): cudssSetStream(), cudssConfigSet(), cudssDataSet().

5. Execute the reordering/symbolic factorization step: cudssExecute() with CUDSS_PHASE_ANALYSIS.

6. Execute the factorization step: cudssExecute() with CUDSS_PHASE_FACTORIZATION.

7. Execute the solving step: cudssExecute() with CUDSS_PHASE_SOLVE.

8. Destroy the opaque objects (cuDSS matrices, configs and data): cudssMatrixDestroy(), cudssConfigDestroy(), cudssDataDestroy().

9. Destroy the library handle: cudssDestroy().

For further details, see cuDSS Types and cuDSS Functions.
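Steps 0 and 2 rely on the sparse matrix being available as CSR arrays. As a minimal host-side sketch of that layout (not part of cuDSS), the block below builds zero-based row offsets, column indices, and values from a small dense matrix. The names CsrMatrix and denseToCsr are hypothetical helpers introduced only for illustration, and the sketch assumes 32-bit integer indices, zero-based indexing, and a square matrix stored in full. In a real program these arrays would still have to be copied into user-allocated device buffers before being wrapped by cudssMatrixCreateCsr().

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side container (not a cuDSS type). The member names
// mirror the arrays used in the code example of this section.
struct CsrMatrix {
    int n = 0;                    // number of rows (square matrix assumed)
    std::vector<int> rowOffsets;  // size n + 1; row i is [rowOffsets[i], rowOffsets[i + 1])
    std::vector<int> colIndices;  // size nnz; zero-based column of each stored entry
    std::vector<double> values;   // size nnz; value of each stored entry
};

// Build zero-based CSR arrays from a small dense square matrix.
// Entries that are exactly zero are not stored.
CsrMatrix denseToCsr(const std::vector<std::vector<double>>& dense) {
    CsrMatrix csr;
    csr.n = static_cast<int>(dense.size());
    csr.rowOffsets.push_back(0);
    for (const auto& row : dense) {
        assert(static_cast<int>(row.size()) == csr.n);
        for (int j = 0; j < csr.n; ++j) {
            if (row[j] != 0.0) {
                csr.colIndices.push_back(j);
                csr.values.push_back(row[j]);
            }
        }
        // After finishing a row, record where the next row starts.
        csr.rowOffsets.push_back(static_cast<int>(csr.values.size()));
    }
    return csr;
}
```

For the 3x3 matrix with rows (4, 1, 0), (1, 3, 0), (0, 0, 2), this yields five stored entries, with the third row holding a single value in column 2.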

Installation and Compilation

Download the cuDSS package from developer.nvidia.com/cudss-downloads

Prerequisites

CUDA 12.x toolkit (or above) and compatible driver (see CUDA Driver Release Notes).
Dependencies: cudart

Linux

Assuming cuDSS has been extracted in CUDSS_DIR, we update the library path accordingly:

export LD_LIBRARY_PATH=${CUDSS_DIR}/lib:${LD_LIBRARY_PATH}

To compile the sample code discussed below (cudss_simple.cpp), run:

nvcc cudss_simple.cpp -I${CUDSS_DIR}/include -L${CUDSS_DIR}/lib -lcudss -o cudss_simple

Note that the previous command links cuDSS as a shared library. Linking the code with the static version of the library requires additional flags:

nvcc cudss_simple.cpp -I${CUDSS_DIR}/include                               \
                      -Xlinker=${CUDSS_DIR}/lib/libcudss_static.a   \
                      -o cudss_simple_static

Windows

Assuming cuDSS has been extracted in CUDSS_DIR, we update the library path accordingly:

setx PATH "%CUDSS_DIR%\lib;%PATH%"

To compile the sample code discussed below (cudss_simple.cpp), run:

nvcc.exe cudss_simple.cpp -I "%CUDSS_DIR%\include" -lcudss -o cudss_simple.exe

Note that the previous command links cuDSS as a shared library. Linking the code with the static version of the library requires additional flags:

nvcc.exe cudss_simple.cpp -I "%CUDSS_DIR%\include" ^
                          -Xlinker=/WHOLEARCHIVE:"%CUDSS_DIR%\lib\cudss.lib" ^
                          -Xlinker=/FORCE -o cudss_simple_static.exe

Code Example

The following code example shows the common steps to use cuDSS and solve a sparse linear system. The full code can be found in cuDSS Example 1.

#include <cudss.h> // cuDSS header

// Device pointers and scalar shape parameters, matrix properties
int*    rowOffsets = ...
int*    colIndices = ...
double* values     = ...
double* bvalues    = ...
double* xvalues    = ...

//---------------------------------------------------------------------------------
// cuDSS data structures and handle initialization
cudssHandle_t             handle;
cudssConfig_t             config;
cudssData_t               data;
cudssMatrix_t             A;
cudssMatrix_t             b;
cudssMatrix_t             x;

cudssCreate(&handle);

cudssConfigCreate(&config);
cudssDataCreate(handle, &data);
cudssMatrixCreateCsr(&A, ... rowOffsets, colIndices, values, ...);
cudssMatrixCreateDn(&b, ... bvalues, ...);
cudssMatrixCreateDn(&x, ... xvalues, ...);

//---------------------------------------------------------------------------------
// (optional) Modifying solver settings, e.g., reordering algorithm
cudssAlgType_t reorder_alg = CUDSS_ALG_DEFAULT;
cudssConfigSet(config, CUDSS_CONFIG_REORDERING_ALG, &reorder_alg, sizeof(cudssAlgType_t));

//---------------------------------------------------------------------------------
// Reordering & symbolic factorization
cudssExecute(handle, CUDSS_PHASE_ANALYSIS, config, data, A, x, b);

//---------------------------------------------------------------------------------
// Numerical factorization
cudssExecute(handle, CUDSS_PHASE_FACTORIZATION, config, data, A, x, b);

//---------------------------------------------------------------------------------
// Solving the system
cudssExecute(handle, CUDSS_PHASE_SOLVE, config, data, A, x, b);

//---------------------------------------------------------------------------------
// (optional) Extra data can be retrieved from the cudssData_t object
// For example, diagonal of the factorized matrix or the reordering permutation

//---------------------------------------------------------------------------------
// Destroy the opaque objects
cudssConfigDestroy(config);
cudssDataDestroy(handle, data);
cudssMatrixDestroy(A);
cudssMatrixDestroy(x);
cudssMatrixDestroy(b);
cudssDestroy(handle);

//---------------------------------------------------------------------------------
// The solution of the system can now be accessed via the user-allocated device pointer xvalues

// ...
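
Once the solution has been copied back to the host (for instance with cudaMemcpy), a quick sanity check is to compute the residual b - A x. The following is a minimal host-side sketch, not part of cuDSS: residualInfNorm is a hypothetical helper name, and the code assumes zero-based CSR arrays, 32-bit integer indices, a single right-hand side, and a matrix stored in full (it would not be valid as written if only one triangle of a symmetric matrix is stored).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cuDSS function): returns max_i |b_i - (A x)_i|
// for a matrix A given as zero-based CSR arrays held in host memory.
double residualInfNorm(const std::vector<int>& rowOffsets,
                       const std::vector<int>& colIndices,
                       const std::vector<double>& values,
                       const std::vector<double>& x,
                       const std::vector<double>& b) {
    assert(rowOffsets.size() == b.size() + 1);
    double worst = 0.0;
    for (std::size_t i = 0; i < b.size(); ++i) {
        // Accumulate row i of the product A * x.
        double ax = 0.0;
        for (int k = rowOffsets[i]; k < rowOffsets[i + 1]; ++k) {
            ax += values[k] * x[colIndices[k]];
        }
        worst = std::max(worst, std::fabs(b[i] - ax));
    }
    return worst;
}
```

A residual that is small relative to the magnitude of b suggests the solve succeeded; what counts as small depends on the conditioning of the matrix and the precision used.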