cuDSS 0.8.0 Migration Guide#
This guide describes the breaking API changes introduced in cuDSS 0.8.0 and explains how to update code written against previous versions of cuDSS to work with the new release.
For a summary of all changes in 0.8.0, including new features and bug fixes, see the release notes.
Replacing cudaDataType_t with cudssDataType_t#
All matrix creation and query functions now take cudssDataType_t
in place of cudaDataType_t. The enum values map one-to-one:
Old (cudaDataType) |
New (cudssDataType) |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
Before:
#include <library_types.h> /* or cuda_runtime_api.h */
#include <cudss.h>
cudssMatrix_t A, x, b;
/* Sparse CSR system matrix with double-precision values, int32 indices */
cudssMatrixCreateCsr(&A, nrows, ncols, nnz,
rowOffsets, /*rowEnd=*/NULL, colIndices, values,
CUDA_R_32I, CUDA_R_64F,
CUDSS_MTYPE_GENERAL, CUDSS_MVIEW_FULL,
CUDSS_BASE_ZERO);
/* Dense right-hand side and solution */
cudssMatrixCreateDn(&b, nrows, 1, nrows, bvalues, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR);
cudssMatrixCreateDn(&x, nrows, 1, nrows, xvalues, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR);
After (0.8.0):
#include <cudss.h>
cudssMatrix_t A, x, b;
/* Replace CUDA_R_* with CUDSS_R_* equivalents; also add the new offsetType
argument (here the same as indexType for uniform 32-bit indexing) */
cudssMatrixCreateCsr(&A, nrows, ncols, nnz,
rowOffsets, /*rowEnd=*/NULL, colIndices, values,
CUDSS_R_32I, CUDSS_R_32I, CUDSS_R_64F,
CUDSS_MTYPE_GENERAL, CUDSS_MVIEW_FULL,
CUDSS_BASE_ZERO);
cudssMatrixCreateDn(&b, nrows, 1, nrows, bvalues, CUDSS_R_64F, CUDSS_LAYOUT_COL_MAJOR);
cudssMatrixCreateDn(&x, nrows, 1, nrows, xvalues, CUDSS_R_64F, CUDSS_LAYOUT_COL_MAJOR);
Adding offsetType parameter in CSR matrix functions#
All CSR matrix creation and query functions (cudssMatrixCreateCsr,
cudssMatrixGetCsr, cudssMatrixCreateBatchCsr, cudssMatrixGetBatchCsr)
take a new offsetType parameter immediately before indexType.
offsetType is the type of the row-offset arrays; indexType is the type
of the column-index array. Splitting them allows mixed-width indexing — for
example, 64-bit row offsets with 32-bit column indices — for matrices whose
total nonzero count exceeds the int32 range.
When both arrays use the same type, pass the same value for both parameters:
Before:
/* Single indexType covers both row offsets and column indices */
cudssMatrixCreateCsr(&A, nrows, ncols, nnz,
rowOffsets, /*rowEnd=*/NULL, colIndices, values,
CUDA_R_32I, /* indexType */
CUDA_R_64F, /* valueType */
CUDSS_MTYPE_GENERAL, CUDSS_MVIEW_FULL,
CUDSS_BASE_ZERO);
After (0.8.0):
/* offsetType and indexType are separate; for uniform 32-bit indexing
both are set to CUDSS_R_32I */
cudssMatrixCreateCsr(&A, nrows, ncols, nnz,
rowOffsets, /*rowEnd=*/NULL, colIndices, values,
CUDSS_R_32I, /* offsetType: type of rowOffsets / rowEnd */
CUDSS_R_32I, /* indexType: type of colIndices */
CUDSS_R_64F, /* valueType */
CUDSS_MTYPE_GENERAL, CUDSS_MVIEW_FULL,
CUDSS_BASE_ZERO);
For mixed-width indexing (e.g., 64-bit row offsets with 32-bit column indices) pass different types:
cudssMatrixCreateCsr(&A, nrows, ncols, nnz,
rowOffsets, /*rowEnd=*/NULL, colIndices, values,
CUDSS_R_64I, /* offsetType: 64-bit row offsets */
CUDSS_R_32I, /* indexType: 32-bit column indices */
CUDSS_R_64F, /* valueType */
CUDSS_MTYPE_GENERAL, CUDSS_MVIEW_FULL,
CUDSS_BASE_ZERO);
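To decide between the uniform and mixed-width forms, it can help to check whether the largest row offset still fits in 32 bits. The following is a minimal sketch in plain C and is not part of cuDSS: `offsets_fit_int32` and `build_row_offsets64` are hypothetical helper names, and the handling of the index base is an assumption about how the caller lays out its offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the largest CSR row offset (nnz plus the index base) fits in
   a signed 32-bit integer, 0 otherwise. index_base is 0 or 1. */
static int offsets_fit_int32(int64_t nnz, int index_base)
{
    assert(nnz >= 0 && (index_base == 0 || index_base == 1));
    return nnz <= (int64_t)INT32_MAX - index_base;
}

/* Builds zero-based 64-bit row offsets from per-row nonzero counts
   (exclusive prefix sum). row_offsets must hold nrows + 1 entries.
   Returns the total number of nonzeros. */
static int64_t build_row_offsets64(const int32_t *row_counts, size_t nrows,
                                   int64_t *row_offsets)
{
    int64_t running = 0;
    for (size_t i = 0; i < nrows; ++i) {
        row_offsets[i] = running;
        running += row_counts[i];
    }
    row_offsets[nrows] = running;
    return running;
}
```

If `offsets_fit_int32` returns 0, the row offsets would be stored as 64-bit integers and CUDSS_R_64I passed as offsetType, as in the mixed-width call above; the column indices can stay 32-bit as long as the column count itself fits in int32.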
Restructuring algorithm enums#
The generic cudssAlgType_t (with values CUDSS_ALG_DEFAULT,
CUDSS_ALG_1, …, CUDSS_ALG_5) has been removed. Each configuration parameter
that previously accepted it now uses a dedicated enum with semantically named
values:
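| Config parameter | Old type | New type (0.8.0) |
|---|---|---|
| CUDSS_CONFIG_REORDERING_ALG | cudssAlgType_t | cudssReorderingAlg_t |
| CUDSS_CONFIG_FACTORIZATION_ALG | cudssAlgType_t | cudssFactorizationAlg_t |
| CUDSS_CONFIG_SOLVE_ALG | cudssAlgType_t | cudssSolveAlg_t |
| CUDSS_CONFIG_PIVOT_EPSILON_ALG | cudssAlgType_t | cudssPivotEpsilonAlg_t |
| CUDSS_CONFIG_MATCHING_ALG | cudssAlgType_t | cudssMatchingAlg_t |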
Mappings from the old cudssAlgType_t values to each new enum follow. See
cuDSS Data Types for full descriptions of the new values.
Note
In most cases, replace CUDSS_ALG_DEFAULT with the DEFAULT value of the
corresponding new enum. The algorithms that DEFAULT currently selects also
have semantic names of their own; choose one of those names to pin a specific
algorithm, because future versions of cuDSS may change the meaning of
DEFAULT depending on the values of other config parameters.
Reordering#
cudssReorderingAlg_t (set with CUDSS_CONFIG_REORDERING_ALG):
Old |
New (0.8.0) |
|---|---|
|
|
|
|
|
|
|
|
(not available) |
Before:
cudssAlgType_t reorder_alg = CUDSS_ALG_DEFAULT;
cudssConfigSet(config, CUDSS_CONFIG_REORDERING_ALG, &reorder_alg,
sizeof(cudssAlgType_t));
After (0.8.0):
cudssReorderingAlg_t reorder_alg = CUDSS_REORDERING_ALG_DEFAULT;
cudssConfigSet(config, CUDSS_CONFIG_REORDERING_ALG, &reorder_alg,
sizeof(cudssReorderingAlg_t));
Factorization#
cudssFactorizationAlg_t (set with CUDSS_CONFIG_FACTORIZATION_ALG):
Old |
New (0.8.0) |
|---|---|
|
|
|
Before:
cudssAlgType_t factor_alg = CUDSS_ALG_DEFAULT;
cudssConfigSet(config, CUDSS_CONFIG_FACTORIZATION_ALG, &factor_alg,
sizeof(cudssAlgType_t));
After (0.8.0):
cudssFactorizationAlg_t factor_alg = CUDSS_FACTORIZATION_ALG_DEFAULT;
cudssConfigSet(config, CUDSS_CONFIG_FACTORIZATION_ALG, &factor_alg,
sizeof(cudssFactorizationAlg_t));
Solve#
cudssSolveAlg_t (set with CUDSS_CONFIG_SOLVE_ALG):
Old |
New (0.8.0) |
|---|---|
|
Before:
cudssAlgType_t solve_alg = CUDSS_ALG_DEFAULT;
cudssConfigSet(config, CUDSS_CONFIG_SOLVE_ALG, &solve_alg,
sizeof(cudssAlgType_t));
After (0.8.0):
cudssSolveAlg_t solve_alg = CUDSS_SOLVE_ALG_DEFAULT;
cudssConfigSet(config, CUDSS_CONFIG_SOLVE_ALG, &solve_alg,
sizeof(cudssSolveAlg_t));
Pivot epsilon#
cudssPivotEpsilonAlg_t (set with CUDSS_CONFIG_PIVOT_EPSILON_ALG):
Old |
New (0.8.0) |
|---|---|
|
|
|
Before:
cudssAlgType_t pivot_eps_alg = CUDSS_ALG_DEFAULT;
cudssConfigSet(config, CUDSS_CONFIG_PIVOT_EPSILON_ALG, &pivot_eps_alg,
sizeof(cudssAlgType_t));
After (0.8.0):
cudssPivotEpsilonAlg_t pivot_eps_alg = CUDSS_PIVOT_EPSILON_ALG_DEFAULT;
cudssConfigSet(config, CUDSS_CONFIG_PIVOT_EPSILON_ALG, &pivot_eps_alg,
sizeof(cudssPivotEpsilonAlg_t));
Matching#
cudssMatchingAlg_t (set with CUDSS_CONFIG_MATCHING_ALG)
CUDSS_CONFIG_USE_MATCHING has been removed. Matching is now controlled solely
through CUDSS_CONFIG_MATCHING_ALG, which takes the new
cudssMatchingAlg_t enum. Using CUDSS_MATCHING_ALG_NONE
disables matching (the default); any other value enables matching.
Old |
New (0.8.0) |
|---|---|
(not available) |
|
|
|
|
|
|
|
|
|
|
|
|
Before:
/* Enabling matching required setting two parameters */
int use_matching = 1;
cudssAlgType_t matching_alg = CUDSS_ALG_DEFAULT;
cudssConfigSet(config, CUDSS_CONFIG_USE_MATCHING, &use_matching,
sizeof(int));
cudssConfigSet(config, CUDSS_CONFIG_MATCHING_ALG, &matching_alg,
sizeof(cudssAlgType_t));
/* Disabling matching (reuses the use_matching variable declared above) */
use_matching = 0;
cudssConfigSet(config, CUDSS_CONFIG_USE_MATCHING, &use_matching,
sizeof(int));
After (0.8.0):
/* Enabling matching: choose an algorithm (any value except NONE) */
cudssMatchingAlg_t matching_alg = CUDSS_MATCHING_ALG_AUTO;
cudssConfigSet(config, CUDSS_CONFIG_MATCHING_ALG, &matching_alg,
sizeof(cudssMatchingAlg_t));
/* Disabling matching (this is the default, but can be set explicitly);
reuses the matching_alg variable declared above */
matching_alg = CUDSS_MATCHING_ALG_NONE;
cudssConfigSet(config, CUDSS_CONFIG_MATCHING_ALG, &matching_alg,
sizeof(cudssMatchingAlg_t));
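Code that previously carried an on/off flag next to the algorithm choice can fold the two into the single new setting. The sketch below shows only that folding logic in plain C. `sketch_matching_alg_t` is a stand-in for cudssMatchingAlg_t with made-up numeric values, and `migrate_matching` is a hypothetical helper; real code would use the enum from cudss.h directly.

```c
#include <assert.h>

/* Stand-in for cudssMatchingAlg_t: only the two values named in this guide,
   with made-up numeric values. Not the real enum. */
typedef enum {
    SKETCH_MATCHING_ALG_NONE,
    SKETCH_MATCHING_ALG_AUTO
} sketch_matching_alg_t;

/* Folds the old on/off flag into the single new setting:
   flag == 0 -> NONE, otherwise the algorithm wanted when matching is on. */
static sketch_matching_alg_t migrate_matching(int use_matching,
                                              sketch_matching_alg_t alg_when_enabled)
{
    /* NONE would silently disable matching even with the flag set. */
    assert(alg_when_enabled != SKETCH_MATCHING_ALG_NONE);
    return use_matching ? alg_when_enabled : SKETCH_MATCHING_ALG_NONE;
}
```

The result would then be passed to cudssConfigSet with CUDSS_CONFIG_MATCHING_ALG, as in the 0.8.0 snippet above.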
Replacing CUDSS_DATA_COMM#
CUDSS_DATA_COMM has been replaced by CUDSS_DATA_COMM_HOST (CPU buffers)
and CUDSS_DATA_COMM_DEVICE (GPU buffers). MGMN users must set
CUDSS_DATA_COMM_HOST and CUDSS_DATA_COMM_DEVICE
whenever GPU-side communication takes place.
Before:
/* Single communicator used for both device and host buffers */
cudssDataSet(handle, data, CUDSS_DATA_COMM, &comm, sizeof(comm));
After (0.8.0):
/* Set device communicator (required for GPU-side communication) */
cudssDataSet(handle, data, CUDSS_DATA_COMM_DEVICE,
&device_comm, sizeof(device_comm));
/* Set host communicator */
cudssDataSet(handle, data, CUDSS_DATA_COMM_HOST,
&host_comm, sizeof(host_comm));
These communicators can be the same or they can be independent. See this sample for more detail.
Updating custom communication layers#
Note
Applies only to users who implement a custom communication layer. The prebuilt communication layer libraries provided with cuDSS are unaffected.
Every function pointer in cudssDistributedInterface_t has been renamed and
split into Device (GPU buffer) and Host (CPU buffer) variants, matching the
new independent device/host communicators. cudaDataType_t is replaced
throughout the callback signatures by cudssDataType_t, and a new
cudssDistributedGetProperty function pointer is added.
The renames follow a consistent pattern: each original entry
(cudssCommRank, cudssCommSize, cudssSend, cudssRecv,
cudssBcast, cudssReduce, cudssAllreduce, cudssScatterv,
cudssCommSplit, cudssCommFree) becomes two — the same name with a
Device or Host suffix.
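Because the rename is mechanical, the new field names can be derived from the old ones. The following plain C sketch illustrates the pattern for use in a migration script or a sanity check. `make_variant_name` is a hypothetical helper, not a cuDSS function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes old_name followed by suffix ("Device" or "Host") into out.
   Returns 0 on success, -1 if the suffix is unknown or out is too small. */
static int make_variant_name(const char *old_name, const char *suffix,
                             char *out, size_t out_size)
{
    assert(old_name != NULL && suffix != NULL && out != NULL);
    if (strcmp(suffix, "Device") != 0 && strcmp(suffix, "Host") != 0) {
        return -1;
    }
    int written = snprintf(out, out_size, "%s%s", old_name, suffix);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

For example, calling it with "cudssBcast" and "Device" fills the buffer with "cudssBcastDevice", which matches the field name used in the 0.8.0 interface.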
Custom communication layers are built as shared libraries that export a
cudssDistributedInterface symbol; applications load them via
cudssSetCommLayer(). Changes must be made in the communication layer source code,
while calls to cudssSetCommLayer() from the application remain unchanged.
Before:
/* Flat naming; cudaDataType_t in callback signatures */
extern "C" {
cudssDistributedInterface_t cudssDistributedInterface = {
.cudssCommRank = my_comm_rank,
.cudssCommSize = my_comm_size,
.cudssSend = my_send,
.cudssRecv = my_recv,
.cudssBcast = my_bcast,
.cudssReduce = my_reduce,
.cudssAllreduce = my_allreduce,
.cudssScatterv = my_scatterv,
.cudssCommSplit = my_comm_split,
.cudssCommFree = my_comm_free
};
}
After (0.8.0):
/* Device/Host variants; cudssDataType_t in callback signatures */
extern "C" {
cudssDistributedInterface_t cudssDistributedInterface = {
.cudssCommRankDevice = my_comm_rank_device,
.cudssCommSizeDevice = my_comm_size_device,
.cudssSendDevice = my_send_device,
.cudssRecvDevice = my_recv_device,
.cudssBcastDevice = my_bcast_device,
.cudssReduceDevice = my_reduce_device,
.cudssAllreduceDevice = my_allreduce_device,
.cudssScattervDevice = my_scatterv_device,
.cudssCommSplitDevice = my_comm_split_device,
.cudssCommFreeDevice = my_comm_free_device,
.cudssCommRankHost = my_comm_rank_host,
.cudssCommSizeHost = my_comm_size_host,
.cudssSendHost = my_send_host,
.cudssRecvHost = my_recv_host,
.cudssBcastHost = my_bcast_host,
.cudssReduceHost = my_reduce_host,
.cudssAllreduceHost = my_allreduce_host,
.cudssScattervHost = my_scatterv_host,
.cudssCommSplitHost = my_comm_split_host,
.cudssCommFreeHost = my_comm_free_host,
/* New property query callback */
.cudssDistributedGetProperty = my_get_property
};
}
For a complete example showing how to validate the communication layer, see this sample.
Adding GetProperty functions to custom communication/threading layers#
Note
Applies only to users who implement a custom communication layer and/or a custom threading layer. The prebuilt communication and threading libraries provided with cuDSS are unaffected.
Both cudssDistributedInterface_t and cudssThreadingInterface_t
have gained a new function pointer:
/* For cudssDistributedInterface_t: */
int (*cudssDistributedGetProperty)(libraryPropertyType propertyType, int *value);
/* For cudssThreadingInterface_t: */
int (*cudssThreadingGetProperty)(libraryPropertyType propertyType, int *value);
cuDSS calls these functions at runtime to query the version of the relevant
layer and confirm that it is compatible with the current version of cuDSS.
Each function must accept the requested property type (MAJOR_VERSION,
MINOR_VERSION, or PATCH_LEVEL) and write the corresponding value to the
location that value points to.
See cudssDistributedGetProperty and
cudssThreadingGetProperty for more details.
The versions of the communication layer and the threading layer are required to
be greater than or equal to the current version of cuDSS.
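A property callback can be as small as a switch over the three property types. The sketch below is a minimal illustration under stated assumptions: the local typedef is a stand-in for libraryPropertyType from CUDA's library_types.h so the example compiles without the toolkit, the MY_LAYER_VERSION_* macros are hypothetical, and the return convention (0 for success, nonzero for failure) is assumed rather than taken from the cuDSS documentation. `version_at_least` shows one plausible reading of "greater than or equal", a lexicographic comparison of (major, minor, patch); how cuDSS itself compares versions is not specified here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for libraryPropertyType from <library_types.h>. In a real layer,
   include that header and remove this typedef. */
typedef enum { MAJOR_VERSION, MINOR_VERSION, PATCH_LEVEL } libraryPropertyType;

/* Hypothetical version this layer was built against. */
#define MY_LAYER_VERSION_MAJOR 0
#define MY_LAYER_VERSION_MINOR 8
#define MY_LAYER_VERSION_PATCH 0

/* Property callback. Assumption: 0 means success, nonzero means failure. */
static int my_threading_get_property(libraryPropertyType propertyType, int *value)
{
    if (value == NULL) return 1;
    switch (propertyType) {
    case MAJOR_VERSION: *value = MY_LAYER_VERSION_MAJOR; return 0;
    case MINOR_VERSION: *value = MY_LAYER_VERSION_MINOR; return 0;
    case PATCH_LEVEL:   *value = MY_LAYER_VERSION_PATCH; return 0;
    default:            return 1;
    }
}

/* Returns 1 if version a >= version b, comparing {major, minor, patch}
   lexicographically; 0 otherwise. */
static int version_at_least(const int a[3], const int b[3])
{
    assert(a != NULL && b != NULL);
    for (int i = 0; i < 3; ++i) {
        if (a[i] != b[i]) return a[i] > b[i];
    }
    return 1;
}
```

A communication-layer callback would follow the same shape under the name assigned to cudssDistributedGetProperty.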
As with the communication layer (described above), threading layers are loaded from a
shared library via cudssSetThreadingLayer() and calls to that function by the
application remain unchanged; migration work is in the library source.
Before:
cudssThreadingInterface_t cudssThreadingInterface = {
.cudssGetMaxThreads = my_get_max_threads,
.cudssParallelFor = my_parallel_for,
/* no cudssThreadingGetProperty field */
};
After (0.8.0):
cudssThreadingInterface_t cudssThreadingInterface = {
.cudssGetMaxThreads = my_get_max_threads,
.cudssParallelFor = my_parallel_for,
/* New: provide a property query callback */
.cudssThreadingGetProperty = my_threading_get_property,
};
For a complete example showing how to validate the threading layer, see this sample.
Reacting to const additions in APIs#
Input parameters across the public API have been marked const to reflect
ownership semantics: caller-provided handles and buffers are not modified by
the library. The changes fall into three patterns:
- Handles passed to query/read functions (cudssConfigGet, cudssDataGet, cudssDataCreate, cudssGetDeviceMemHandler, cudssMatrixGet*, cudssMatrixGetFormat, cudssMatrixGetDistributionRow1d): handle/config/data/matrix arguments are now const-qualified.
- Value and data pointers passed in to setters and creators (cudssConfigSet, cudssDataSet, all cudssMatrixCreate* and cudssMatrixSet* variants, cudssCreateMg): input pointers are now const void * (or const void *const * for batched arrays).
- cudssExecute: solverConfig, inputMatrix, and rhs are const.
These changes are source-compatible in C. C++ call sites that pass
void ** to batched matrix create/set functions may need an explicit cast
or a const-qualified local, since void ** does not implicitly convert
to const void *const *.
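The two workarounds mentioned here, an explicit cast and a const-qualified local, can be sketched as follows. The code is written so that it compiles as either C or C++. `count_non_null` is a hypothetical function that merely stands in for a batched create/set call with a const void *const * parameter; it is not part of cuDSS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee with a const-qualified batched-pointer parameter.
   It only counts the non-NULL entries. */
static size_t count_non_null(const void *const *buffers, size_t count)
{
    size_t n = 0;
    for (size_t i = 0; i < count; ++i) {
        if (buffers[i] != NULL) ++n;
    }
    return n;
}

/* Option 1: explicit cast at the call site. */
static size_t call_with_cast(void **buffers, size_t count)
{
    return count_non_null((const void *const *)buffers, count);
}

/* Option 2: const-qualified local array filled from the same pointers. */
static size_t call_with_const_local(void **buffers, size_t count)
{
    const void *local[8]; /* fixed capacity keeps the sketch short */
    assert(count <= 8);
    for (size_t i = 0; i < count; ++i) local[i] = buffers[i];
    return count_non_null(local, count);
}
```

The cast changes only the pointer's type, so it is the lighter option. The const-qualified local copies the pointer array, which avoids a cast at the cost of a small temporary.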
Responding to renamed parameters#
Some parameters in cuDSS have been renamed to help disambiguate their purpose. For these renamed parameters, a simple find and replace is all that is required.
Old |
New (0.8.0) |
|---|---|
|
|
|
|
|
|
|
|
|
Before:
int hybrid_mode = 1;
cudssConfigSet(config, CUDSS_CONFIG_HYBRID_MODE, &hybrid_mode, sizeof(int));
/* ... */
/* Retrieving the partition tree after analysis */
void *tree_data = NULL;
size_t tree_size = 0;
cudssDataGet(handle, data, CUDSS_DATA_ELIMINATION_TREE,
&tree_data, sizeof(void*), &tree_size);
/* Supplying the partition tree to a subsequent run */
cudssDataSet(handle, data, CUDSS_DATA_USER_ELIMINATION_TREE,
tree_data, tree_size);
After (0.8.0):
int hybrid_mode = 1;
cudssConfigSet(config, CUDSS_CONFIG_HYBRID_MEMORY_MODE, &hybrid_mode, sizeof(int));
/* ... */
/* Retrieving the partition tree after analysis */
void *tree_data = NULL;
size_t tree_size = 0;
cudssDataGet(handle, data, CUDSS_DATA_ND_PARTITION_TREE,
&tree_data, sizeof(void*), &tree_size);
/* Supplying the partition tree to a subsequent run */
cudssDataSet(handle, data, CUDSS_DATA_USER_ND_PARTITION_TREE,
tree_data, tree_size);
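For scripted migrations, the renames can be kept in a small lookup table. The sketch below contains only the three old/new pairs that appear in the Before/After snippets above and may not be exhaustive. `rename_entry`, `k_renames`, and `renamed_identifier` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct rename_entry {
    const char *old_name;
    const char *new_name;
};

/* Only the pairs visible in the example above. */
static const struct rename_entry k_renames[] = {
    { "CUDSS_CONFIG_HYBRID_MODE",         "CUDSS_CONFIG_HYBRID_MEMORY_MODE" },
    { "CUDSS_DATA_ELIMINATION_TREE",      "CUDSS_DATA_ND_PARTITION_TREE" },
    { "CUDSS_DATA_USER_ELIMINATION_TREE", "CUDSS_DATA_USER_ND_PARTITION_TREE" },
};

/* Returns the 0.8.0 name for an exact match on old_name, or NULL if the
   identifier is not in the table. */
static const char *renamed_identifier(const char *old_name)
{
    assert(old_name != NULL);
    for (size_t i = 0; i < sizeof k_renames / sizeof k_renames[0]; ++i) {
        if (strcmp(k_renames[i].old_name, old_name) == 0) {
            return k_renames[i].new_name;
        }
    }
    return NULL;
}
```

Matching whole identifiers rather than substrings keeps the replacement from touching longer names that merely start with one of the old identifiers.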