Getting Started
In this section, we show how to solve a linear system with a sparse system matrix using cuDSS. We first give an overview of the workflow by listing the main steps needed to set up the computation. Then, we describe how to install the library and how to compile an example that uses cuDSS. Lastly, we present a step-by-step code example (without the MGMN mode) with additional comments.
cuDSS Workflow
Before calling cudssExecute(), some setup is necessary. One of the simplest yet complete workflows consists of the following steps:
0. Prerequisites: user-allocated device memory buffers for the sparse matrix of the linear system and for the values of the dense right-hand side and solution matrices.
1. Initialize the library handle: cudssCreate().
2. Create the matrix wrappers for the sparse system matrix and for the dense solution and right-hand side matrices: cudssMatrixCreateDn(), cudssMatrixCreateCsr().
3. Create the opaque cuDSS objects for the solver settings (cudssConfig_t) and data (cudssData_t): cudssConfigCreate(), cudssDataCreate().
4. Apply any extra settings (for example, the CUDA stream to execute with, solver settings, and data parameters): cudssSetStream(), cudssConfigSet(), cudssDataSet().
5. Execute the reordering/symbolic factorization step: cudssExecute() with CUDSS_PHASE_ANALYSIS.
6. Execute the numerical factorization step: cudssExecute() with CUDSS_PHASE_FACTORIZATION.
7. Execute the solving step: cudssExecute() with CUDSS_PHASE_SOLVE.
8. Destroy the opaque objects (cuDSS matrices, configs and data): cudssMatrixDestroy(), cudssConfigDestroy(), cudssDataDestroy().
9. Destroy the library handle: cudssDestroy().
For further details, see cuDSS Types and cuDSS Functions.
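Step 0 assumes the sparse matrix is already available in CSR (compressed sparse row) form. The following is a minimal host-side sketch of how the three CSR arrays (rowOffsets, colIndices, values) could be built from a small dense matrix before copying them to device memory. It assumes 0-based indexing, 32-bit int indices, and double values, as the int* and double* pointers in the code example suggest. The struct CsrHost and the function denseToCsr are hypothetical helpers, not part of cuDSS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side container for a CSR matrix with 0-based indices.
// rowOffsets has nrows + 1 entries; the nonzeros of row i occupy
// positions [rowOffsets[i], rowOffsets[i + 1]) of colIndices and values.
struct CsrHost {
    int nrows = 0;
    int ncols = 0;
    std::vector<int> rowOffsets;
    std::vector<int> colIndices;
    std::vector<double> values;
};

// Hypothetical helper: converts a dense row-major matrix to CSR,
// storing only entries that are not exactly zero.
CsrHost denseToCsr(const std::vector<double>& dense, int nrows, int ncols) {
    assert(dense.size() == static_cast<std::size_t>(nrows) * ncols);
    CsrHost csr;
    csr.nrows = nrows;
    csr.ncols = ncols;
    csr.rowOffsets.reserve(static_cast<std::size_t>(nrows) + 1);
    csr.rowOffsets.push_back(0);
    for (int i = 0; i < nrows; ++i) {
        for (int j = 0; j < ncols; ++j) {
            const double v = dense[static_cast<std::size_t>(i) * ncols + j];
            if (v != 0.0) {
                csr.colIndices.push_back(j);
                csr.values.push_back(v);
            }
        }
        // The running nonzero count marks where the next row starts.
        csr.rowOffsets.push_back(static_cast<int>(csr.values.size()));
    }
    return csr;
}
```

For the 3x3 matrix with rows (4, 1, 0), (1, 3, 0), (0, 0, 2), this produces rowOffsets = {0, 2, 4, 5}, colIndices = {0, 1, 0, 1, 2} and values = {4, 1, 1, 3, 2}. These vectors could then be copied to device buffers with cudaMemcpy. The corresponding index type, value type and index base (for example CUDA_R_32I, CUDA_R_64F and CUDSS_BASE_ZERO) would be passed to cudssMatrixCreateCsr(); see cuDSS Functions for its exact parameter list.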
Installation and Compilation
Download the cuDSS package from developer.nvidia.com/cudss-downloads.
Prerequisites
CUDA Toolkit 12.x (or later) and a compatible driver (see Tables 2 and 3).
Dependencies: cudart, cublas.
Linux
Assuming cuDSS has been extracted in CUDSS_DIR and the CUDA Toolkit (CTK) is in CTK_DIR, update the library path accordingly:
export LD_LIBRARY_PATH=${CUDSS_DIR}/lib:${CTK_DIR}/lib64:${LD_LIBRARY_PATH}
To compile the sample code discussed below (cudss_simple.cpp), run:
nvcc cudss_simple.cpp -I${CUDSS_DIR}/include -L${CUDSS_DIR}/lib -lcudss -o cudss_simple
Note: if cuDSS is installed from pip wheels, replace -lcudss above with -l:libcudss.so.0, because the wheels do not support symlinks.
The previous command links cuDSS as a shared library. Linking the code with the static version of the library requires additional flags:
nvcc cudss_simple.cpp -I${CUDSS_DIR}/include \
-Xlinker=${CUDSS_DIR}/lib/libcudss_static.a \
-o cudss_simple_static
Windows
Assuming cuDSS has been extracted in CUDSS_DIR and the CUDA Toolkit (CTK) is in CTK_DIR, update the library path accordingly:
setx PATH "%CUDSS_DIR%\lib;%CTK_DIR%\lib64;%PATH%"
To compile the sample code discussed below (cudss_simple.cpp), run:
nvcc.exe cudss_simple.cpp -I "%CUDSS_DIR%\include" -L "%CUDSS_DIR%\lib" -lcudss -o cudss_simple.exe
The previous command links cuDSS as a shared library. Linking the code with the static version of the library requires additional flags:
nvcc.exe cudss_simple.cpp -I "%CUDSS_DIR%\include" -Xlinker=/WHOLEARCHIVE:"%CUDSS_DIR%\lib\cudss.lib" -Xlinker=/FORCE -o cudss_simple_static.exe
Code Example
#include <cudss.h> // cuDSS header
// Device pointers and scalar shape parameters, matrix properties
int* rowOffsets = ...
int* colIndices = ...
double* values = ...
double* bvalues = ...
double* xvalues = ...
//---------------------------------------------------------------------------------
// cuDSS data structures and handle initialization
cudssHandle_t handle;
cudssConfig_t config;
cudssData_t data;
cudssMatrix_t A;
cudssMatrix_t b;
cudssMatrix_t x;
cudssCreate(&handle);
cudssConfigCreate(&config);
cudssDataCreate(handle, &data);
cudssMatrixCreateCsr(&A, ... rowOffsets, colIndices, values, ...);
cudssMatrixCreateDn(&b, ... bvalues, ...);
cudssMatrixCreateDn(&x, ... xvalues, ...);
//---------------------------------------------------------------------------------
// (optional) Modifying solver settings, e.g., reordering algorithm
cudssAlgType_t reorder_alg = CUDSS_ALG_DEFAULT;
cudssConfigSet(config, CUDSS_CONFIG_REORDERING_ALG, &reorder_alg, sizeof(cudssAlgType_t));
//---------------------------------------------------------------------------------
// Reordering & symbolic factorization
cudssExecute(handle, CUDSS_PHASE_ANALYSIS, config, data, A, x, b);
//---------------------------------------------------------------------------------
// Numerical factorization
cudssExecute(handle, CUDSS_PHASE_FACTORIZATION, config, data, A, x, b);
//---------------------------------------------------------------------------------
// Solving the system
cudssExecute(handle, CUDSS_PHASE_SOLVE, config, data, A, x, b);
//---------------------------------------------------------------------------------
// (optional) Extra data can be retrieved from the cudssData_t object
// For example, diagonal of the factorized matrix or the reordering permutation
//---------------------------------------------------------------------------------
// Destroy the opaque objects
cudssConfigDestroy(config);
cudssDataDestroy(handle, data);
cudssMatrixDestroy(A);
cudssMatrixDestroy(x);
cudssMatrixDestroy(b);
cudssDestroy(handle);
//---------------------------------------------------------------------------------
// The solution of the system can now be accessed via the user-allocated device pointer xvalues
// ...
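One way to sanity-check the result is to copy xvalues (and the inputs) back to the host and compute the residual b - A x. cuDSS work may still be running asynchronously on the stream set with cudssSetStream(), so the sketch below assumes that stream has been synchronized (for example with cudaStreamSynchronize) and that the data were copied back with cudaMemcpy(..., cudaMemcpyDeviceToHost). It also assumes the dense b and x are stored column-major with leading dimension ld, and that A uses 0-based CSR indices. The function maxResidualCsr is a hypothetical helper, not part of cuDSS.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side check (not part of cuDSS).
// Returns max |b(i,j) - (A x)(i,j)| over all rows i and right-hand sides j.
// A is n x n in 0-based CSR form; x and b hold nrhs columns stored
// column-major with leading dimension ld >= n, so entry (i, j) is at i + j * ld.
double maxResidualCsr(int n, int nrhs, int ld,
                      const std::vector<int>& rowOffsets,
                      const std::vector<int>& colIndices,
                      const std::vector<double>& values,
                      const std::vector<double>& x,
                      const std::vector<double>& b) {
    assert(ld >= n);
    assert(rowOffsets.size() == static_cast<std::size_t>(n) + 1);
    // The last column only needs n valid entries.
    const std::size_t need =
        nrhs > 0 ? static_cast<std::size_t>(nrhs - 1) * ld + n : 0;
    assert(x.size() >= need && b.size() >= need);

    double maxRes = 0.0;
    for (int j = 0; j < nrhs; ++j) {
        const std::size_t col = static_cast<std::size_t>(j) * ld;
        for (int i = 0; i < n; ++i) {
            double ax = 0.0;  // accumulates (A x)(i, j) over row i's nonzeros
            for (int k = rowOffsets[i]; k < rowOffsets[i + 1]; ++k) {
                ax += values[k] * x[col + colIndices[k]];
            }
            maxRes = std::max(maxRes, std::fabs(b[col + i] - ax));
        }
    }
    return maxRes;
}
```

A residual close to zero, relative to the magnitude of b, indicates that the solve succeeded. An appropriate tolerance depends on the conditioning of A.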