cuDSS Data Types#

Opaque Data Structures#

cudssHandle_t#

struct cudssHandle_t#
The structure holds the cuDSS library context (device properties, system information, execution controls like cudaStream_t, etc.).
The handle must be initialized with cudssCreate() or cudssCreateMg() prior to calling any other cuDSS API, and destroyed with cudssDestroy() to free up resources after using cuDSS.
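
As an illustration, a minimal lifecycle sketch (error checking of the returned cudssStatus_t omitted):

#include <cudss.h>

cudssHandle_t handle;
cudssCreate(&handle);      /* initialize the library context */
/* ... create config/data/matrix objects and call cudssExecute() ... */
cudssDestroy(handle);      /* release all resources owned by the handle */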

cudssMatrix_t#

struct cudssMatrix_t#
The structure is a lightweight wrapper around standard dense/sparse matrix parameters and does not own any data arrays. Matrix objects are used to pass the matrix of the linear system, as well as the solution and right-hand side (even if these are in fact vectors). Currently, cuDSS matrix objects can have either of two underlying matrix formats: dense or 3-array CSR (sparse). Additionally, they can represent non-uniform batches of matrices or distributed matrices (in the MGMN mode).
Matrix objects should be created via cudssMatrixCreateDn() (for dense matrices) or cudssMatrixCreateBatchDn() (for a batch of dense matrices) or cudssMatrixCreateCsr() (for sparse matrices in CSR format) or cudssMatrixCreateBatchCsr() (for a batch of sparse matrices in CSR format). After use, matrix objects should be destroyed via cudssMatrixDestroy(). For distributed matrices, one can additionally call cudssMatrixSetDistributionRow1d() to define how the matrix is distributed.
Matrix objects can be modified after creation via cudssMatrixSetValues() and cudssMatrixSetCsrPointers() (and similar APIs for batches).
Information can be retrieved from a matrix object by calling cudssMatrixGetFormat() followed by either cudssMatrixGetDn() or cudssMatrixGetCsr() depending on the format returned.
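
For illustration, a sketch of creating a sparse system matrix and a dense right-hand side (the variables n, nnz and the device buffers csr_offsets_d, csr_columns_d, csr_values_d, b_values_d are assumed to exist; the argument order follows the cuDSS getting-started example and should be checked against cudss.h):

cudssMatrix_t A, b;
cudssMatrixCreateCsr(&A, n, n, nnz,
                     csr_offsets_d, NULL /* 3-array CSR: no separate row-end array */,
                     csr_columns_d, csr_values_d,
                     CUDA_R_32I, CUDA_R_64F,
                     CUDSS_MTYPE_SPD, CUDSS_MVIEW_UPPER, CUDSS_BASE_ZERO);
cudssMatrixCreateDn(&b, n, 1 /* nrhs */, n /* leading dimension */,
                    b_values_d, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR);
/* ... use the objects in cudssExecute() ... */
cudssMatrixDestroy(A);
cudssMatrixDestroy(b);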

cudssData_t#

struct cudssData_t#
The structure holds internal data (e.g., factors related data structures) as well as pointers to user-provided data. A single object of this type should be associated with solving a specific linear system. If multiple systems with the same datatype(!) are solved consecutively, the object can be re-used (all necessary internal buffers will be re-created as needed).
Note: by default, the library allocates device memory required for performing LU factorization and storing the LU factors internally. All data buffers of this kind are kept inside the data object. To change this default behavior, one can set a cudssDeviceMemHandler_t which will then be used for allocating device memory inside the solver.
The object should be created via cudssDataCreate() and destroyed via cudssDataDestroy().
During execution of any of the stages cudssExecute(), configuration settings of the solver are read from cudssConfig_t and thus affect the execution and internal data stored in the cudssData_t object.
Data parameters can be updated or retrieved by calling cudssDataSet() or cudssDataGet() respectively.

cudssConfig_t#

struct cudssConfig_t#
The structure stores configuration settings for the solver. This object is a lightweight (host-side) wrapper around common solver settings. While it can be re-used for solving different linear systems, it is recommended to have one per linear system.
The object should be created via cudssConfigCreate() and destroyed via cudssConfigDestroy().
During execution of any of the stages cudssExecute(), configuration settings of the solver are read from cudssConfig_t and thus affect the execution.
Configuration settings can be updated or retrieved by calling cudssConfigSet() or cudssConfigGet() respectively. Note: certain settings need to be set before a corresponding solver stage is executed (e.g., reordering algorithm must be set prior to the phase CUDSS_PHASE_ANALYSIS).
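
For illustration, a sketch of the set/get pattern (assuming the param/value/size calling convention used in the cuDSS examples):

cudssConfig_t config;
cudssConfigCreate(&config);

int ir_steps = 2;   /* e.g., request two iterative refinement steps */
cudssConfigSet(config, CUDSS_CONFIG_IR_N_STEPS, &ir_steps, sizeof(ir_steps));

int ir_steps_out = 0;
size_t size_written = 0;
cudssConfigGet(config, CUDSS_CONFIG_IR_N_STEPS, &ir_steps_out,
               sizeof(ir_steps_out), &size_written);

cudssConfigDestroy(config);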

Non-opaque Data Structures#

cudssDeviceMemHandler_t#

struct cudssDeviceMemHandler_t#
This structure holds information about the user-provided, stream-ordered device memory pool (mempool).
The object can be created by setting the struct members described below.
Once created, a device memory handler can be set for the cuDSS library handle via cudssSetDeviceMemhandler().
Once set for the cuDSS library handle, information about the set device memory handler can be retrieved via cudssGetDeviceMemhandler().
void *ctx#
A pointer to the user-owned mempool/context object.
int device_alloc(
void *ctx,
void **ptr,
size_t size,
cudaStream_t stream
)#
A function pointer to the user-provided routine for allocating device memory of size on stream.
The allocated memory should be made accessible to the current device (or more precisely, to the current CUDA context bound to the library handle).
This interface supports any stream-ordered memory allocator. Upon success, the allocated memory can be immediately used on the given stream by any operations enqueued/ordered on the same stream after this call.
It is the caller’s responsibility to ensure a proper stream order is established.
The allocated memory should be at least 256-byte aligned.
Param ctx:

[in] A pointer to the user-owned mempool object.

Param ptr:

[out] On success, a pointer to the allocated buffer.

Param size:

[in] The amount of memory in bytes to be allocated.

Param stream:

[in] The CUDA stream on which the memory is allocated (and the stream order is established)

Returns:

[out] The error status (as int) of the invocation. Must return 0 on success and any nonzero integer otherwise.

int device_free(
void *ctx,
void *ptr,
size_t size,
cudaStream_t stream
)#
A function pointer to the user-provided routine for deallocating device memory of size on stream.
This interface supports any stream-ordered memory allocator. Upon success, any subsequent access (of the memory pointed to by the pointer ptr) ordered after this call results in undefined behavior.
It is the caller’s responsibility to ensure a proper stream order is established.
If the arguments ctx and size are not the same as those passed to device_alloc to allocate the memory pointed to by ptr, the behavior is undefined.
The argument stream need not be identical to the one used for allocating ptr, as long as the stream order is correctly established. The behavior is undefined if this assumption does not hold.
Param ctx:

[in] A pointer to the user-owned mempool object.

Param ptr:

[in] The pointer to the allocated buffer.

Param size:

[in] The size of the allocated memory.

Param stream:

[in] The CUDA stream on which the memory is deallocated (and the stream order is established).

Returns:

[out] The error status (as int) of the invocation. Must return 0 on success and any nonzero integer otherwise.

char name[CUDSS_ALLOCATOR_NAME_LEN]
The name of the provided mempool (must not exceed 64 characters).
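
As an illustration, a minimal sketch of a handler backed by the stream-ordered CUDA runtime allocator (cudaMallocAsync()/cudaFreeAsync()); the my_* helper names are hypothetical and no user mempool context is needed for this variant:

#include <cuda_runtime.h>
#include <string.h>

static int my_device_alloc(void *ctx, void **ptr, size_t size, cudaStream_t stream)
{
    (void)ctx;  /* no user mempool context in this sketch */
    return (cudaMallocAsync(ptr, size, stream) == cudaSuccess) ? 0 : 1;
}

static int my_device_free(void *ctx, void *ptr, size_t size, cudaStream_t stream)
{
    (void)ctx; (void)size;
    return (cudaFreeAsync(ptr, stream) == cudaSuccess) ? 0 : 1;
}

/* fill the struct members and register the handler for the library handle
   via the setter described above */
cudssDeviceMemHandler_t handler;
handler.ctx          = NULL;
handler.device_alloc = my_device_alloc;
handler.device_free  = my_device_free;
strncpy(handler.name, "cudaMallocAsync handler", CUDSS_ALLOCATOR_NAME_LEN);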

Enumerators#

cudssStatus_t#

enum cudssStatus_t#
The enumerator specifies possible status values (on the host) which can be returned from calls to cuDSS routines.
Note: device side failures are returned via CUDSS_DATA_INFO from cudssDataParam_t.
enumerator CUDSS_STATUS_SUCCESS#

The operation completed successfully.

enumerator CUDSS_STATUS_NOT_INITIALIZED#

One of the input operands was not properly initialized prior to the call to a cuDSS routine. This usually refers to one of the opaque objects like cudssHandle_t, cudssData_t, or others.

enumerator CUDSS_STATUS_ALLOC_FAILED#

Resource allocation failed inside the cuDSS library. This is usually caused by a device memory allocation (cudaMalloc()) failure or by a host memory allocation failure.

enumerator CUDSS_STATUS_INVALID_VALUE#

An incorrect value or parameter was passed to the function (for example, a negative vector size or a NULL pointer for a required buffer).

enumerator CUDSS_STATUS_NOT_SUPPORTED#

An unsupported (but otherwise reasonable) parameter was passed to the function.

enumerator CUDSS_STATUS_EXECUTION_FAILED#

The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can have multiple causes.

enumerator CUDSS_STATUS_INTERNAL_ERROR#

An internal cuDSS operation failed.

cudssConfigParam_t#

enum cudssConfigParam_t#
The enumerator specifies possible names of solver configuration settings. For each setting there is a matching type to be used in cudssConfigSet() or cudssConfigGet().
enumerator CUDSS_CONFIG_REORDERING_ALG#

Algorithm for the reordering phase. Supported options are:

  • CUDSS_ALG_DEFAULT - a customized nested dissection algorithm based on METIS.

  • CUDSS_ALG_1 - a custom combination of block triangular reordering and COLAMD algorithms which can be used together with global pivoting to improve the accuracy of the solution. When this option is used for reordering, cuDSS uses an appropriate custom factorization algorithm (without the need to change the factorization setting CUDSS_CONFIG_FACTORIZATION_ALG).

  • CUDSS_ALG_2 - similar to CUDSS_ALG_1 in that it implies using a special factorization algorithm tailored for a block-triangular representation but, unlike CUDSS_ALG_1, this option uses a trivial block structure.

  • CUDSS_ALG_3 - an approximate minimum degree (AMD) reordering.

Note: CUDSS_ALG_1 and CUDSS_ALG_2 are only supported for general (non-symmetric or non-hermitian) matrices.
Note: CUDSS_ALG_1 uses an upper bound on the number of non-zero entries in the factors. If this bound turns out to be insufficient during the factorization phase, a runtime device error is returned (which can be checked by synchronizing the stream and calling cudssDataGet() with CUDSS_DATA_INFO, which will output the device error). In order to set a non-default upper bound, one should call cudssConfigSet() with the CUDSS_CONFIG_MAX_LU_NNZ setting.
Note: CUDSS_ALG_3 does not support the matrix index type int64_t.

Associated parameter type: cudssAlgType_t

Default value: CUDSS_ALG_DEFAULT
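
For example, switching to the AMD reordering before the analysis phase might look like this (continuing the cudssConfigSet() sketch above):

cudssAlgType_t reorder_alg = CUDSS_ALG_3;   /* AMD reordering */
cudssConfigSet(config, CUDSS_CONFIG_REORDERING_ALG,
               &reorder_alg, sizeof(reorder_alg));
/* must happen before cudssExecute(..., CUDSS_PHASE_ANALYSIS, ...) */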

enumerator CUDSS_CONFIG_FACTORIZATION_ALG#

Algorithm for the factorization phase. Supported options are:

  • CUDSS_ALG_DEFAULT - the default factorization algorithm. This option chooses the best fitting factorization algorithm based on the choice of the reordering algorithm and the sparsity structure produced by it. It is not recommended to use other options.

  • CUDSS_ALG_1 - a modification of the default algorithm.

Associated parameter type: cudssAlgType_t

Default value: CUDSS_ALG_DEFAULT

enumerator CUDSS_CONFIG_SOLVE_ALG#

Algorithm for the solving phase. Supported options are:

  • CUDSS_ALG_DEFAULT - the default solve algorithm.

Other options are not supported.

Associated parameter type: cudssAlgType_t

Default value: CUDSS_ALG_DEFAULT

enumerator CUDSS_CONFIG_PIVOT_EPSILON_ALG#

Algorithm for the pivot epsilon calculation. Supported options are:

  • CUDSS_ALG_DEFAULT - pivots with magnitude smaller than epsilon will be replaced by the appropriately signed epsilon.

  • CUDSS_ALG_1 - pivots with magnitude smaller than epsilon will be replaced by the appropriately signed and scaled epsilon. The scale will be computed as the maximum element in the corresponding row and column of the original matrix.

Note: CUDSS_ALG_1 for CUDSS_CONFIG_PIVOT_EPSILON_ALG is not supported when CUDSS_CONFIG_REORDERING_ALG is set to CUDSS_ALG_1 or CUDSS_ALG_2

Associated parameter type: cudssAlgType_t

Default value: CUDSS_ALG_DEFAULT

enumerator CUDSS_CONFIG_USE_MATCHING#

Flag for enabling/disabling matching. Matching is an optional preprocessing step which computes a (non-symmetric column) permutation to put larger values on the diagonal, which often improves the accuracy of the solution. This permutation is then combined with the permutation from the reordering. While matching often reduces the number of perturbed pivots and improves the accuracy of the solution (especially for ill-conditioned and badly scaled matrices), there are no guarantees that the accuracy will improve. Enabling matching brings an overhead during the analysis step and changes the non-zero pattern of the factors and can therefore slow down factorization and solve.

Note: matching routines require their internal workspace size to fit into the limit of 32-bit integers. The workspace limit depends on the choice of the matching algorithm. The biggest requirement among them is to have 10 * nrows + nnz < INT_MAX (nrows and nnz for the input matrix without accounting for symmetry).

Associated parameter type: int

Default value: 0 (matching disabled)

enumerator CUDSS_CONFIG_MATCHING_ALG#

Algorithm for matching. The matching algorithm setting is only used if CUDSS_CONFIG_USE_MATCHING is not equal to 0.

Supported options are:

  • CUDSS_ALG_DEFAULT - same as CUDSS_ALG_5 (the most robust option).

  • CUDSS_ALG_1 - this option is based on job = 1 from MC64 algorithm (HSL). Computes a column permutation of the matrix so that the permuted matrix has as many entries on its diagonal as possible. The values on the diagonal are of arbitrary size. Note: this option does not use matrix values and thus can actually make the accuracy worse. It is not recommended to use this algorithm for matching unless there is a justified need.

  • CUDSS_ALG_2 - this option is based on job = 2 from MC64 algorithm (HSL). Computes a column permutation of the matrix so that the smallest value on the diagonal of the permuted matrix is maximized.

  • CUDSS_ALG_3 - this option is based on job = 3 from MC64 algorithm (HSL). Computes a column permutation of the matrix so that the smallest value on the diagonal of the permuted matrix is maximized. This algorithm is different from the one used for CUDSS_ALG_2.

  • CUDSS_ALG_4 - this option is based on job = 4 from MC64 algorithm (HSL). Computes a column permutation of the matrix so that the sum of the diagonal entries of the permuted matrix is maximized.

  • CUDSS_ALG_5 - this option is based on job = 5 from MC64 algorithm (HSL). Computes a column permutation of the matrix so that the product of the diagonal entries of the permuted matrix is maximized. In addition, this algorithm computes the row/col scaling vectors which can further improve accuracy of the solution.

Note: one of the options, CUDSS_ALG_5, computes scaling vectors in addition to the matching permutation. This option is considered to be the most impactful and it is recommended to use this option if accuracy of the solution needs to be improved. However, this option requires matrix values to be present during the analysis step.

Note: matching is not supported for CUDSS_ALG_1 and CUDSS_ALG_2 reordering algorithms (which use global pivoting to make the solution more accurate) or distributed matrices.

Associated parameter type: cudssAlgType_t

Default value: CUDSS_ALG_DEFAULT (matching with scaling, requires matrix values)

enumerator CUDSS_CONFIG_SOLVE_MODE#

Potential modifier on the system matrix (e.g. transpose or conjugate transpose)

Associated parameter type: int

Default value: 0 (no modifier).

Other values are not supported.

enumerator CUDSS_CONFIG_IR_N_STEPS#

Number of steps during the iterative refinement

Associated parameter type: int

Default value: 0

enumerator CUDSS_CONFIG_IR_TOL#

Iterative refinement tolerance

Associated parameter type: double

Currently it is ignored (exactly CUDSS_CONFIG_IR_N_STEPS steps are made)

enumerator CUDSS_CONFIG_PIVOT_TYPE#

Type of pivoting

The exact meaning of this parameter depends on the choice of the reordering algorithm which can be changed through CUDSS_CONFIG_REORDERING_ALG. For CUDSS_ALG_1 and CUDSS_ALG_2 the parameter refers to the global pivoting, while for CUDSS_ALG_DEFAULT and others it refers to the partial 1x1 pivoting procedure.

Note that in the latter case, if the matrix type is symmetric but not positive-definite, then the pivoting is also symmetric and the pivot is searched for on the diagonal of the block, provided the pivot type is not equal to CUDSS_PIVOT_NONE.

For more details, see the description of the pivoting strategies.

Associated parameter type: cudssPivotType_t

Default value: CUDSS_PIVOT_COL.

enumerator CUDSS_CONFIG_PIVOT_THRESHOLD#

Pivoting threshold \(p_{threshold}\) which is used to determine if a diagonal element is subject to pivoting and will be swapped with the maximum element in the row (or column) depending on the type of pivoting. The diagonal element will be swapped if: \(p_{threshold} \cdot \max_{(sub)row \, or \, col} |a_{ij}| \geq |a_{ii}|\)

For more details, see the description of the pivoting strategies.

Note: this parameter is only supported when reordering algorithm is set to CUDSS_ALG_1 or CUDSS_ALG_2.

Associated parameter type: double

Default value: 1.0f.

enumerator CUDSS_CONFIG_PIVOT_EPSILON#

Pivoting epsilon.

By default, this is the absolute value to test and replace small diagonal elements encountered during numerical factorization.

In case CUDSS_CONFIG_PIVOT_EPSILON_ALG is set to CUDSS_ALG_1, this value will be additionally scaled (see more details in the description of CUDSS_CONFIG_PIVOT_EPSILON_ALG).

Associated parameter type: double

Default value: 1e-5 for single precision and 1e-13 for double precision.

enumerator CUDSS_CONFIG_MAX_LU_NNZ#

Upper limit on the number of nonzero entries in the LU factors. This is only relevant for non-symmetric matrices with the reordering algorithm set to CUDSS_ALG_1 or CUDSS_ALG_2. If the number of non-zero entries in L and U exceeds the set limit, a runtime error occurs. See also the note for CUDSS_ALG_1 in the table entry for CUDSS_CONFIG_REORDERING_ALG.

Associated parameter type: int64_t

Default value: -1 (then the parameter value is ignored and cuDSS uses an estimate of 100 * nnz, where nnz is the number of non-zero entries in the input matrix).

enumerator CUDSS_CONFIG_HYBRID_MODE#

Memory mode: 0 (default = device-only) or 1 (hybrid = host/device).

Note: Hybrid memory mode should be enabled before the analysis phase (cudssExecute() with CUDSS_PHASE_ANALYSIS). If the decision to use the hybrid memory mode is made after the analysis phase, the mode should be enabled and the analysis phase re-done (which is sub-optimal).

Note: Unlike the hybrid execution mode (see CUDSS_CONFIG_HYBRID_EXECUTE_MODE) which controls where compute kernels are executed, the hybrid memory mode only allows cuDSS to keep part of the factor values (internal data) on the host and always uses GPU for factorization and solve.

For more details regarding the hybrid memory mode, see Hybrid mode feature.

Associated parameter type: int

Default value: 0 (disabled).

Currently not supported when CUDSS_ALG_1 or CUDSS_ALG_2 is used for reordering, or, when CUDSS_ALG_1 is used for the factorization.

enumerator CUDSS_CONFIG_HYBRID_DEVICE_MEMORY_LIMIT#

User-defined device memory limit (number of bytes) for the hybrid memory mode.

This setting only affects execution when the hybrid memory mode is enabled. For more details regarding the hybrid memory mode, see Hybrid mode feature.

Note: In case of multiple devices (cudssCreateMg()) this value must be set for each device separately by calling cudaSetDevice() prior to cudssConfigSet()

Associated parameter type: int64_t

Default value: -1 (use the internal default heuristic).

enumerator CUDSS_CONFIG_USE_CUDA_REGISTER_MEMORY#

A flag to enable or disable usage of cudaHostRegister() by the cuDSS hybrid memory mode. Since the hybrid memory mode of cuDSS uses host memory to store the factors, it can use cudaHostRegister() (if the HW supports it) to speed up the associated host-to-device and device-to-host memory transfers. However, registering host memory has limitations and in some cases might lead to slowdowns. If the flag is set to 0, cuDSS will not attempt to use cudaHostRegister() even if the HW supports it. This setting only affects execution when the hybrid memory mode is enabled. For more details regarding the hybrid memory mode, see Hybrid mode feature.

Associated parameter type: int

Default value: 1 (use cudaHostRegister() if the device supports it)

enumerator CUDSS_CONFIG_HOST_NTHREADS#

Number of threads to be used by cuDSS in MT mode. This setting only affects execution when the multi-threaded mode is enabled.

Associated parameter type: int

Default value: -1 (use number of threads returned by cudssGetMaxThreads())

enumerator CUDSS_CONFIG_HYBRID_EXECUTE_MODE#

Execute mode: 0 (default = device-only) or 1 (hybrid = host/device). Hybrid execute mode allows cuDSS to perform calculations on both GPU and CPU. Currently it is used to speed up execution parts with low parallelization capacity.

Note: Reordering part of the analysis step is performed on CPU regardless of execute mode value.

Note: Hybrid execute mode should be enabled before the analysis phase (cudssExecute() with CUDSS_PHASE_ANALYSIS).

Note: Unlike the hybrid execution mode which controls where compute kernels are executed (and allows greater flexibility in placement of the input data), the hybrid memory mode CUDSS_CONFIG_HYBRID_MODE only allows cuDSS to keep part of the factor values (internal data) on the host while only GPU is used for factorization and solve.

Note: If hybrid execute mode is enabled, the input matrix, right-hand side and solution can be host memory pointers.

For more details regarding the hybrid execute mode, see Hybrid execute mode feature.

Associated parameter type: int

Default value: 0 (disabled). Currently not supported when nrhs > 1, or CUDSS_CONFIG_HYBRID_MODE or MGMN mode is used, or, when batchCount is greater than 1.

enumerator CUDSS_CONFIG_ND_NLEVELS#

Minimum number of levels for the nested dissection reordering. The value for this parameter should be a positive integer. Additionally, for the MGMN mode the number of levels will be automatically increased (if needed) to satisfy the requirement: \(2^{n_{levels}-1} \geq n_{proc}\) where \(n_{proc}\) is the number of processes in the communicator. This setting only works when reordering algorithm is CUDSS_ALG_DEFAULT.

Note: This is considered an advanced performance knob and it is recommended to use a non-default value only when optimizing for performance. Typical values to try should not be too far from the default value, e.g. from the range 8 - 11.

Associated parameter type: int

Default value: 10.

enumerator CUDSS_CONFIG_UBATCH_SIZE#

The number of matrices in a uniform batch of systems to be processed by cuDSS. A uniform batch of matrices is defined as a batch of matrices with the same non-zero pattern but potentially different values. To create a uniform batch of matrices (unlike a non-uniform batch), one can simply use the usual cudssMatrixCreateCsr() or cudssMatrixCreateDn() with one change: as the pointer to the matrix values one should provide a buffer which holds the values for all matrices in the batch. Thus, it should have nnz * CUDSS_CONFIG_UBATCH_SIZE elements for sparse matrices (csr_values only) and nrows * ncols * CUDSS_CONFIG_UBATCH_SIZE elements for dense matrices.

Note: a single system can be viewed as a uniform batch with CUDSS_CONFIG_UBATCH_SIZE set to 1.

There are two ways cuDSS can process a uniform batch, based on the value of CUDSS_CONFIG_UBATCH_INDEX: either factorizing (or solving) all matrices at once or just one at a time.

For details, see the description of CUDSS_CONFIG_UBATCH_INDEX.

Note: CUDSS_CONFIG_UBATCH_SIZE must be set before calling CUDSS_PHASE_ANALYSIS.

Associated parameter type: int

Default value: 1.

Currently not supported when either CUDSS_CONFIG_HYBRID_MODE (see here) or CUDSS_CONFIG_HYBRID_EXECUTE_MODE (see here) are enabled, or MGMN mode is used, or, with CUDSS_ALG_1 and CUDSS_ALG_2 reordering algorithms.
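
As an illustration, a sketch of preparing a uniform batch of sparse systems (config, n, nnz and the device buffers are hypothetical names; the values for all batch instances are stored back-to-back in csr_values_batch_d):

int ubatch_size = 8;
cudssConfigSet(config, CUDSS_CONFIG_UBATCH_SIZE,
               &ubatch_size, sizeof(ubatch_size));   /* before CUDSS_PHASE_ANALYSIS */

/* csr_values_batch_d holds nnz * ubatch_size elements (one set of values per matrix);
   the sparsity pattern arrays are shared by all matrices in the batch */
cudssMatrix_t A;
cudssMatrixCreateCsr(&A, n, n, nnz,
                     csr_offsets_d, NULL, csr_columns_d, csr_values_batch_d,
                     CUDA_R_32I, CUDA_R_64F,
                     CUDSS_MTYPE_GENERAL, CUDSS_MVIEW_FULL, CUDSS_BASE_ZERO);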

enumerator CUDSS_CONFIG_UBATCH_INDEX#

-1 or a 0-based index of a matrix in a uniform batch which will be processed during the factorization or solve phase. The special value -1 can be used to tell cuDSS to process all matrices in the uniform batch at once.

Note: CUDSS_CONFIG_UBATCH_INDEX must be less than CUDSS_CONFIG_UBATCH_SIZE.

Note: if CUDSS_CONFIG_UBATCH_INDEX is set then cuDSS will treat the input sparse matrix, right-hand side and solution as regular matrices (with just one set of values). Thus, it allows processing a uniform batch of matrices one by one while re-using the result of the analysis and keeping the corresponding factors for all matrices in a single cudssData_t object.

Note: CUDSS_CONFIG_UBATCH_INDEX can be set after CUDSS_PHASE_FACTORIZATION; in that case one can solve only one specific system.

Associated parameter type: int

Default value: -1

Currently not supported when either CUDSS_CONFIG_HYBRID_MODE (see here) or CUDSS_CONFIG_HYBRID_EXECUTE_MODE (see here) are enabled, or MGMN mode is used, or, with CUDSS_ALG_1 and CUDSS_ALG_2 reordering algorithms.

enumerator CUDSS_CONFIG_USE_SUPERPANELS#

Use superpanel optimization: 1 (default = enabled) or 0 (disabled).

Note: superpanel optimization should be enabled before the analysis phase (cudssExecute() with CUDSS_PHASE_ANALYSIS). If the decision to use the superpanel optimization is made after the analysis phase, the optimization should be enabled and the analysis phase re-done (which is sub-optimal).

Associated parameter type: int

Default value: 1 (enabled).

enumerator CUDSS_CONFIG_DEVICE_COUNT#

Device count in case of multiple devices (see cudssCreateMg())

Note: Maximum supported device count is 16

Associated parameter type: int

Default value: 1

enumerator CUDSS_CONFIG_DEVICE_INDICES#

A list of device indices as an integer array.

Note: The calling device must be equal to the first device index from device_indices or 0 (if device_indices were NULL in the required prior call to cudssCreateMg())

Associated parameter type: int

Default value: NULL (cuDSS will use devices from 0 to device_count - 1).

enumerator CUDSS_CONFIG_SCHUR_MODE#

Schur complement mode: 0 (default = disabled) or 1 (enabled).

For more details regarding the Schur complement feature, see Schur complement feature.

Associated parameter type: int

Default value: 0 (disabled).

Currently not supported when CUDSS_ALG_1 or CUDSS_ALG_2 is used for reordering, when MGMN mode or multi-GPU mode is used, or, when CUDSS_ALG_1 is used for the factorization. It is also not supported when a user permutation is set, for uniform and non-uniform batches, or when matching is enabled.

enumerator CUDSS_CONFIG_DETERMINISTIC_MODE#

Enable deterministic mode. In deterministic mode, cuDSS is guaranteed to provide the bit-wise identical result at every run when executed on GPUs with the same architecture and the same number of SMs (assuming the input data and solver settings are also bit-wise identical).

Note: in deterministic mode cuDSS uses a different set of kernels which often might be slower than the kernels used in the default mode.

Associated parameter type: int

Default value: 0 (disabled).

Currently the feature is supported only for single-gpu, single rhs and with hybrid memory mode (CUDSS_CONFIG_HYBRID_MODE) and hybrid execute mode (CUDSS_CONFIG_HYBRID_EXECUTE_MODE) disabled.

cudssDataParam_t#

enum cudssDataParam_t#
The enumerator specifies possible parameter names which can be set or retrieved in the cudssData_t object. For each parameter name there is an associated type to be used in cudssDataSet() or cudssDataGet(). Each parameter name is marked with “in” or “out” depending on whether the parameter can only be set, only be retrieved, or both.
enumerator CUDSS_DATA_INFO#

By default, this parameter can be used to detect device-side asynchronous errors (primarily during factorization and solve). However, in the special case when, in hybrid execute mode, the factorization is done at least partially on the host, this parameter can be used to retrieve host-side errors. In case execution is successful, the value is 0.

Note: the error returned via CUDSS_DATA_INFO is independent from the host side status of type cudssStatus_t returned by all cuDSS routines, including cudssExecute(). One of the noticeable use cases is when a matrix of the system is passed with an mtype for positive-definite matrices but it appears to have non-positive minors (at least numerically). In this case, calling cudssDataGet() with CUDSS_DATA_INFO after the factorization phase will return the 1-based index of the first encountered non-positive minor. Then one can set the device error back to zero via cudssDataSet(), and either change the matrix type or adjust the matrix values (to make the matrix positive-definite) and call the factorization phase again.

Note that the returned index is 1-based and is for the reordered matrix. To get the corresponding original index, it should be combined with (inverse) permutation which can be extracted via cudssDataGet() for CUDSS_DATA_PERM_REORDER_ROW.

Direction: out

Memory: host

Associated parameter type: int
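
For illustration, a sketch of checking the device-side status after the factorization phase (handle, data and stream are assumed to exist; the cudssDataGet() value/size/size-written convention follows the cuDSS examples):

cudaStreamSynchronize(stream);   /* make sure the asynchronous phase has finished */
int info = 0;
size_t size_written = 0;
cudssDataGet(handle, data, CUDSS_DATA_INFO, &info, sizeof(info), &size_written);
if (info != 0) {
    /* e.g., for CUDSS_MTYPE_SPD input: 1-based index of the first non-positive minor */
}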

enumerator CUDSS_DATA_LU_NNZ#

Number of non-zero entries in LU factors.

Note: in case batchCount > 1 (non-uniform batch) cudssDataGet() returns accumulated number over all matrices in the batch.

Direction: out

Memory: host

Associated parameter type: int64_t

enumerator CUDSS_DATA_NPIVOTS#

Number of pivots encountered during factorization.

Direction: out

Memory: host

Associated parameter type: same as for the indices of the sparse matrix of the system

enumerator CUDSS_DATA_INERTIA#

Positive and negative indices of inertia for the system matrix A (two integer values). Valid only for symmetric/Hermitian non positive-definite matrix types.

Note: in case batchCount > 1 (non-uniform batch) cudssDataGet() returns the accumulated number over all matrices in the batch.

Direction: out

Memory: host

Associated parameter type: same as for the indices of the sparse matrix of the system

enumerator CUDSS_DATA_PERM_REORDER_ROW#

Row permutation P after reordering such that A[P,Q] is factorized.

Note: using this parameter in cudssDataGet() in case batchCount > 1 is not supported.

Direction: out

Memory: host or device

Associated parameter type: same as for the indices of the sparse matrix of the system

enumerator CUDSS_DATA_PERM_REORDER_COL#

Column permutation Q after reordering such that A[P,Q] is factorized.

Note: using this parameter in cudssDataGet() in case batchCount > 1 is not supported.

Direction: out

Memory: host or device

Associated parameter type: same as for the indices of the sparse matrix of the system

enumerator CUDSS_DATA_PERM_ROW#

Final row permutation P (includes effects of both reordering and pivoting) which is applied to the original right-hand side of the system in the form \(b_{new} = b_{old} \circ P\)

Note: using this parameter in cudssDataGet() in case batchCount > 1 is not supported.

Direction: out

Memory: host or device

Associated parameter type: same as for the indices of the sparse matrix of the system

Currently supported only when CUDSS_ALG_1 or CUDSS_ALG_2 is used for reordering.

enumerator CUDSS_DATA_PERM_COL#

Final column permutation Q (includes effects of both reordering and pivoting) which is applied to transform the solution of the permuted system into the original solution \(x_{old} = x_{new} \circ Q^{-1}\)

Note: using this parameter in cudssDataGet() in case batchCount > 1 is not supported.

Direction: out

Memory: host or device

Associated parameter type: same as for the indices of the sparse matrix of the system

Currently supported only when CUDSS_ALG_1 or CUDSS_ALG_2 is used for reordering.

enumerator CUDSS_DATA_PERM_MATCHING#

Matching (column) permutation Q such that A[:,Q] is reordered and then factorized.

Direction: out

Memory: host or device

Associated parameter type: same as for the indices of the sparse matrix of the system

enumerator CUDSS_DATA_DIAG#

Diagonal of the factorized matrix

Note: in case batchCount > 1 (non-uniform batch) cudssDataGet() returns the accumulated values over all matrices in the batch.

Direction: out

Memory: host or device

Associated parameter type: same as for the values of the sparse matrix of the system

Currently supported only when CUDSS_ALG_1 or CUDSS_ALG_2 is used for reordering.

enumerator CUDSS_DATA_SCALE_ROW#

Row scaling of the factorized matrix (corresponding to the rows of the original matrix)

Direction: out

Memory: host or device

Associated parameter type: floating point for the absolute values of the matrix of the system (i.e., always real, either float or double)

Only supported when matching is enabled and matching algorithm computes the scaling.

enumerator CUDSS_DATA_SCALE_COL#

Column scaling of the factorized matrix (corresponding to the columns of the original matrix).

Direction: out

Memory: host or device

Associated parameter type: floating point for the absolute values of the matrix of the system (i.e., always real, either float or double)

Only supported when matching is enabled and the matching algorithm computes the scaling.

enumerator CUDSS_DATA_USER_PERM#

User permutation to be used instead of running the reordering algorithms. The user permutation can be disabled by providing a NULL with zero size with cudssDataSet(). The provided buffer should be an integer array of size n, where n is the number of rows/columns of the system matrix. The integer type must match the one used for the system matrix. The values should represent a valid permutation vector for the set \(\{index\_base, ..., n - 1 + index\_base\}\)

Note: after cudssDataSet() is called with CUDSS_DATA_USER_PERM, the data are copied into an internal buffer so that the user buffer can be deallocated or re-used.

Note: using this parameter in cudssDataGet() in case batchCount > 1 is not supported.

Direction: in

Memory: host or device

Associated parameter type: same as for the indices of the sparse matrix of the system

Currently not supported when CUDSS_ALG_1 or CUDSS_ALG_2 is used for reordering.

enumerator CUDSS_DATA_ELIMINATION_TREE#

Elimination tree information, also known as separator sizes, which are computed during the reordering phase and used for improving parallelization. The number of elements in this array is \(2 \cdot n_{levels} - 1 = 2 \cdot (\log_2(ND\_NLEVELS) - 1) - 1\) with \(ND\_NLEVELS\) from CUDSS_CONFIG_ND_NLEVELS. The retrieved array can later be passed back to cuDSS via CUDSS_DATA_USER_PERM and CUDSS_DATA_USER_ELIMINATION_TREE for a matrix with the same sparsity structure. This avoids regenerating the same reordering information while retaining the same performance for the factorization and solve phases.

See the Elimination Tree section for more details.

Direction: out

Memory: host or device

Associated parameter type: same as for the indices of the sparse matrix of the system

enumerator CUDSS_DATA_USER_ELIMINATION_TREE#

User provided elimination tree information, which is used instead of running the reordering algorithm (therefore, saving runtime). It must be used in combination with CUDSS_DATA_USER_PERM to have an effect.

See CUDSS_DATA_ELIMINATION_TREE for the size and usage information. The user elimination tree can be disabled by providing a NULL with zero size with cudssDataSet().

Note: While it is possible to construct this array outside cuDSS (e.g., extracting the so-called sizes array from METIS / ParMETIS), we recommend using this feature only by extracting the elimination tree from cuDSS and passing it back in subsequent calls.

Direction: in

Memory: host or device

Associated parameter type: same as for the indices of the sparse matrix of the system.

enumerator CUDSS_DATA_MEMORY_ESTIMATES#

Memory estimates (in bytes) for host and device memory required for the chosen memory mode. The chosen memory mode is defined as the memory mode detected during the last executed analysis phase for the current cudssData_t object and the state of the corresponding cudssConfigParam_t object during the call.

See cudssConfigParam_t for more details.

Note: the returned memory estimate depends not just on the settings from the cudssConfig_t object but also on the input matrix and rhs (specifically, nrhs), so if the objects (or the config) change after the analysis phase, the memory estimates might no longer be accurate.

Values returned in the output array at position:

  • 0 - permanent device memory

  • 1 - peak device memory

  • 2 - permanent host memory

  • 3 - peak host memory

  • 4 - (if in hybrid memory mode) minimum device memory for the hybrid memory mode

  • 5 - (if in hybrid memory mode) maximum host memory for the hybrid memory mode

  • 6, … 15 - reserved for future use

This query must be done after the analysis phase and will return status CUDSS_STATUS_NOT_SUPPORTED if it cannot be processed.

Note: In case of multiple devices (cudssCreateMg()), the memory estimate will be done for the calling device only. Thus, cudaSetDevice() must be called prior to cudssDataGet() to get the memory estimate on the corresponding device.

Direction: out

Memory: host

Associated parameter type: int64_t[16]
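
For illustration, a sketch of querying the estimates after the analysis phase (handle and data assumed to exist):

int64_t mem_estimates[16] = {0};
size_t size_written = 0;
cudssDataGet(handle, data, CUDSS_DATA_MEMORY_ESTIMATES,
             mem_estimates, sizeof(mem_estimates), &size_written);
/* mem_estimates[0]/[1]: permanent/peak device memory (bytes),
   mem_estimates[2]/[3]: permanent/peak host memory (bytes) */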

enumerator CUDSS_DATA_HYBRID_DEVICE_MEMORY_MIN#

Minimal amount of device memory (number of bytes) required in the hybrid memory mode. This query must be done after the analysis phase and will return status CUDSS_STATUS_NOT_SUPPORTED if it cannot be processed.

Note: In case of multiple devices (cudssCreateMg()), the minimal memory will be returned for the calling device only. Thus, cudaSetDevice() must be called prior to cudssDataGet() to get the minimal memory on the corresponding device.

Direction: out

Memory: host

Associated parameter type: int64_t

For more details regarding the hybrid memory mode, see Hybrid mode feature.

enumerator CUDSS_DATA_COMM#

Communicator for MGMN mode. The actual type of the communicator must match the communication layer, which must be set for the cuDSS library handle via cudssSetCommLayer().

Direction: in

Memory: host

Associated parameter type: void*

For more details regarding the MGMN mode, see MGMN mode.

enumerator CUDSS_DATA_NSUPERPANELS#

Number of superpanels in the matrix.

Direction: out

Memory: host

Associated parameter type: same as for the indices of the sparse matrix of the system

enumerator CUDSS_DATA_USER_SCHUR_INDICES#

User-provided Schur complement indices. The provided buffer should be an integer array of size n, where n is the number of rows (columns) of the system matrix, and the integer type should match the one used for the system matrix. The values should be equal to 1 for the rows/columns which are part of the Schur complement and 0 for the rest.

Note: after cudssDataSet() is called with CUDSS_DATA_USER_SCHUR_INDICES, the data are copied into an internal buffer so that the user buffer can be deallocated or re-used.

Direction: in

Memory: host or device

Associated parameter type: same as for the indices of the sparse matrix of the system

enumerator CUDSS_DATA_SCHUR_SHAPE#

Shape of the Schur complement matrix as a triplet (nrows, ncols, sparse nnz). The last entry is only relevant when the Schur complement needs to be exported as a sparse matrix. By querying cudssDataGet() with CUDSS_DATA_SCHUR_SHAPE, users can get the necessary allocation sizes for the Schur complement buffers, which can then be used to create a cudssMatrix_t object and passed back to cuDSS to fill in the Schur complement data via cudssDataSet() with CUDSS_DATA_SCHUR_MATRIX.

Direction: out

Memory: host

Associated parameter type: int64_t[3]

enumerator CUDSS_DATA_SCHUR_MATRIX#

Schur complement matrix as a cudssMatrix_t object.

Direction: in

Memory: host (with underlying buffers on host or device)

Associated parameter type: cudssMatrix_t

enumerator CUDSS_DATA_USER_HOST_INTERRUPT#

User-provided host interrupt pointer. Setting this pointer to a non-NULL value enables user interruption of cuDSS routines on the host. Specifically, in the calls to cudssExecute() cuDSS will perform checks on the host for the value of the pointer. If it is non-zero, execution stops and cuDSS returns early (before the phase execution completes) with the status CUDSS_STATUS_EXECUTION_FAILED. In case the application needs to continue cuDSS execution after the interruption, users can simply set the pointer’s value to 0 and repeat the call. To remove the (small) overhead of checking for the host interruption, the interruption can be completely disabled by setting CUDSS_DATA_USER_HOST_INTERRUPT to NULL.

Note: since most of device kernels inside cuDSS are launched asynchronously, execution on the GPU may continue some time after the host execution has been interrupted.

Direction: in

Memory: host

Default value: NULL (user interruption disabled)

Associated parameter type: int

cudssPhase_t#

enum cudssPhase_t#
The enumerator specifies the solver phases to be performed in the main cuDSS routine cudssExecute(). Phases can be combined with the binary OR operator (|) to specify multiple phases at once: CUDSS_PHASE_FACTORIZATION | CUDSS_PHASE_SOLVE.
enumerator CUDSS_PHASE_REORDERING#

Reordering

enumerator CUDSS_PHASE_SYMBOLIC_FACTORIZATION#

Symbolic factorization

Note: it is not allowed to call symbolic factorization twice without calling the reordering phase in-between.

enumerator CUDSS_PHASE_ANALYSIS#

Reordering and symbolic factorization combined

enumerator CUDSS_PHASE_FACTORIZATION#

Numerical factorization

enumerator CUDSS_PHASE_REFACTORIZATION#

Numerical re-factorization.

Note: For now it is only used if the reordering algorithm is set to CUDSS_ALG_1 or CUDSS_ALG_2. Otherwise, it is the same as the CUDSS_PHASE_FACTORIZATION phase.

enumerator CUDSS_PHASE_SOLVE_FWD_PERM#

Applying the reordering permutation to the right-hand side before the forward solve (forward substitution).

Note: solve sub-phases are not supported when CUDSS_ALG_1 or CUDSS_ALG_2 are used for reordering.

enumerator CUDSS_PHASE_SOLVE_FWD#

Forward substitution sub-step of the solving phase.

Note: this phase includes the local permutation due to the partial pivoting. To remove this effect, if undesired, one can use a combination of reordering and final permutation (see CUDSS_DATA_PERM_REORDER_ROW and CUDSS_DATA_PERM_ROW).

Note: solve sub-phases are not supported when CUDSS_ALG_1 or CUDSS_ALG_2 are used for reordering.

enumerator CUDSS_PHASE_SOLVE_DIAG#

Diagonal solve sub-step of the solving phase (only needed for symmetric/Hermitian matrix types).

Note: solve sub-phases are not supported when CUDSS_ALG_1 or CUDSS_ALG_2 are used for reordering.

enumerator CUDSS_PHASE_SOLVE_BWD#

Backward substitution sub-step of the solving phase.

Note: this phase includes the local permutation due to the partial pivoting. To remove this effect, if undesired, one can use a combination of reordering and final permutation (see CUDSS_DATA_PERM_REORDER_ROW and CUDSS_DATA_PERM_ROW).

Note: solve sub-phases are not supported when CUDSS_ALG_1 or CUDSS_ALG_2 are used for reordering.

enumerator CUDSS_PHASE_SOLVE_BWD_PERM#

Applying the inverse reordering permutation to the intermediate solution after the backward solve (backward substitution). If matching (and scaling) is enabled, this phase also includes applying the inverse matching permutation and inverse scaling (as the matching permutation and scalings were used to modify the matrix before the factorization).

Note: solve sub-phases are not supported when CUDSS_ALG_1 or CUDSS_ALG_2 are used for reordering.

enumerator CUDSS_PHASE_SOLVE_REFINEMENT#

Iterative refinement (see configuration settings CUDSS_CONFIG_IR_N_STEPS and CUDSS_CONFIG_IR_TOL).

Note: solve sub-phases are not supported when CUDSS_ALG_1 or CUDSS_ALG_2 are used for reordering.

enumerator CUDSS_PHASE_SOLVE#

Full solving phase, combining all of the above (forward permutation + forward solve + diagonal solve + backward permutation + backward solve) and (optional) iterative refinement.

Note: combining the solve sub-phases should be preferred to calling sub-phases separately.

Note: Changing the sparse matrix for this phase is allowed but with restrictions, see limitations for cudssExecute().
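
As an illustration, a sketch of a typical phase sequence, fusing factorization and solve into one call (handle, config, data and the matrix objects A, x, b are assumed to exist; the cudssExecute() argument order follows the cuDSS getting-started example):

cudssExecute(handle, CUDSS_PHASE_ANALYSIS, config, data, A, x, b);
cudssExecute(handle, CUDSS_PHASE_FACTORIZATION | CUDSS_PHASE_SOLVE,
             config, data, A, x, b);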

cudssMatrixFormat_t#

enum cudssMatrixFormat_t#
The enumerator specifies the underlying matrix formats inside a cuDSS matrix object.
enumerator CUDSS_MFORMAT_DENSE#

Matrix is dense (applies to a single matrix and a batch equally)

enumerator CUDSS_MFORMAT_CSR#

Matrix is in CSR format (applies to a single matrix and a batch equally)

Note: Only 3-array CSR is supported.

enumerator CUDSS_MFORMAT_BATCH#

Matrix object represents a batch of matrices

Note: The format flags can be combined. E.g., creating a cudssMatrix_t via cudssMatrixCreateBatchCsr() would set both CUDSS_MFORMAT_CSR and CUDSS_MFORMAT_BATCH flags. One can check for a mixture of flags via bit-wise operations, e.g. CUDSS_MFORMAT_CSR | CUDSS_MFORMAT_BATCH for a batch of CSR matrices.
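
For illustration, a sketch of such a check (assuming, as implied by the note above, that the combined flags are returned as an integer mask by cudssMatrixGetFormat()):

int format = 0;
cudssMatrixGetFormat(A, &format);
if ((format & CUDSS_MFORMAT_CSR) && (format & CUDSS_MFORMAT_BATCH)) {
    /* A is a batch of CSR matrices */
}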

cudssMatrixType_t#

enum cudssMatrixType_t#
The enumerator specifies available matrix types for sparse matrices. The matrix type should be used to describe the properties of the underlying matrix storage. The matrix type affects the decision about what type of factorization is computed by the solver. E.g., when the matrix type is one of the positive-definite types, checks for singular values on the diagonal are not done.
enumerator CUDSS_MTYPE_GENERAL#

General matrix [default]. LDU factorization will be computed with optional local or global pivoting.

enumerator CUDSS_MTYPE_SYMMETRIC#

Real symmetric matrix. LDL^T factorization will be computed with optional local pivoting

enumerator CUDSS_MTYPE_HERMITIAN#

Complex Hermitian matrix. LDL^H factorization will be computed with optional local pivoting

enumerator CUDSS_MTYPE_SPD#

Symmetric positive-definite matrix. Cholesky factorization will be computed with optional local pivoting.

Note: if the matrix passed with this matrix type appears to have zero minors (at least numerically), one can get the 1-based index of the first encountered non-positive minor by calling cudssDataGet() with CUDSS_DATA_INFO. As this would be a device-side error, the call to cudssExecute() may still return CUDSS_STATUS_SUCCESS.

enumerator CUDSS_MTYPE_HPD#

Hermitian positive-definite matrix. Complex Cholesky factorization will be computed with optional local pivoting.

Note: if the matrix passed with this matrix type appears to have zero minors (at least numerically), one can get the 1-based index of the first non-positive minor by calling cudssDataGet() with CUDSS_DATA_INFO. As this would be a device-side error, the call to cudssExecute() may still return CUDSS_STATUS_SUCCESS.

cudssMatrixViewType_t#

enum cudssMatrixViewType_t#
The enumerator specifies available matrix view types for sparse matrices. The matrix view defines how the matrix is treated by the main cuDSS routine cudssExecute(). E.g., to provide only upper-triangle data for a symmetric matrix one can use CUDSS_MTYPE_SYMMETRIC as the matrix type combined with CUDSS_MVIEW_UPPER as the matrix view. If the accompanying matrix type is CUDSS_MTYPE_GENERAL, the matrix view is ignored.
enumerator CUDSS_MVIEW_FULL#

Full matrix [default]

enumerator CUDSS_MVIEW_LOWER#

Lower-triangular matrix (including the diagonal). All values above the main diagonal will be ignored.

enumerator CUDSS_MVIEW_UPPER#

Upper-triangular matrix (including the diagonal). All values below the main diagonal will be ignored.

cudssIndexBase_t#

enum cudssIndexBase_t#
The enumerator specifies the indexing base (0 or 1) for sparse matrix indices (row start/end offsets and column indices). Once set for a sparse matrix, cudssExecute() will use the indexing base from the input sparse matrix for all index-related data (e.g. output from cudssDataGet() called with CUDSS_DATA_PERM_REORDER_ROW).
enumerator CUDSS_BASE_ZERO#

Zero-based indexing [default]

enumerator CUDSS_BASE_ONE#

One-based indexing

cudssLayout_t#

enum cudssLayout_t#
The enumerator specifies dense matrix layout.
enumerator CUDSS_LAYOUT_COL_MAJOR#

Column-major layout [default]

enumerator CUDSS_LAYOUT_ROW_MAJOR#

Row-major layout. Currently not supported.

cudssAlgType_t#

enum cudssAlgType_t#
The enumerator specifies algorithm choices to be made for different solver settings like reordering, factorization and other algorithms.
enumerator CUDSS_ALG_DEFAULT#

Default value [default]. Uses the default algorithm (decided by cuDSS).

enumerator CUDSS_ALG_1#

First algorithm. See the description of the specific configuration setting (algorithm for reordering, factorization, etc.).

enumerator CUDSS_ALG_2#

Second algorithm. See the description of the specific configuration setting (algorithm for reordering, factorization, etc.).

enumerator CUDSS_ALG_3#

Third algorithm. See the description of the specific configuration setting (algorithm for reordering, factorization, etc.).

enumerator CUDSS_ALG_4#

Fourth algorithm. See the description of the specific configuration setting (algorithm for reordering, factorization, etc.).

enumerator CUDSS_ALG_5#

Fifth algorithm. See the description of the specific configuration setting (algorithm for reordering, factorization, etc.).

Different values represent different algorithms (for reordering, factorization, etc.) and can lead to significant differences in accuracy and performance. It is currently recommended to use CUDSS_ALG_DEFAULT and only in case accuracy or performance are not sufficient, one can experiment with other values.

cudssPivotType_t#

enum cudssPivotType_t#
The enumerator specifies the type of pivoting to be performed.
enumerator CUDSS_PIVOT_COL#

Column-based pivoting [default]

enumerator CUDSS_PIVOT_ROW#

Row-based pivoting

enumerator CUDSS_PIVOT_NONE#

No pivoting

Communication Layer (Distributed Interface) Types#

cudssDistributedInterface_t#

struct cudssDistributedInterface_t#
This struct defines all communication primitives which need to be implemented in (any) implementation of the cuDSS communication layer; see the MGMN mode documentation for more details.
Note: all communication layer API functions below take an argument of type void * for comm.
This parameter should be interpreted in the implementation based on the underlying communication
backend to be used with the particular communication layer. E.g., for OpenMPI, comm should be
treated as the OpenMPI communicator.
Note: most of the APIs below take an argument called stream of type
cudaStream_t and must be stream-ordered. For GPU-aware communication backends like OpenMPI,
this implies the need to do explicit cudaStreamSynchronize() in the communication layer implementation.
int cudssCommRank(void *comm, int *rank)#
A function pointer to a routine which returns the rank of the process in a communicator.
Param comm:

[in] A pointer to the communicator.

Param rank:

[out] Rank of the calling process in the communicator.

Returns:

[out] The error status (as int) of the invocation. Must return 0 on success and any nonzero integer otherwise.

int cudssCommSize(void *comm, int *size)#
A function pointer to a routine which returns number of processes in a communicator.
Param comm:

[in] A pointer to the communicator.

Param size:

[out] Number of processes in the communicator.

Returns:

[out] The error status (as int) of the invocation. Must return 0 on success and any nonzero integer otherwise.
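
For illustration, a minimal sketch of these two query primitives for an MPI-based layer (the my_* helper names are hypothetical; comm is treated as a pointer to an MPI_Comm, matching how the communicator is passed via CUDSS_DATA_COMM):

#include <mpi.h>

static int my_comm_rank(void *comm, int *rank)
{
    return (MPI_Comm_rank(*(MPI_Comm *)comm, rank) == MPI_SUCCESS) ? 0 : 1;
}

static int my_comm_size(void *comm, int *size)
{
    return (MPI_Comm_size(*(MPI_Comm *)comm, size) == MPI_SUCCESS) ? 0 : 1;
}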

int cudssSend(
const void *buffer,
int count,
cudaDataType_t datatype,
int dest,
int tag,
void *comm,
cudaStream_t stream
)#
A function pointer to a routine which performs a blocking send.
Param buffer:

[in] Initial address of the send device buffer.

Param count:

[in] Number of elements (of type datatype) to be sent.

Param datatype:

[in] CUDA datatype of elements to be sent.

Param dest:

[in] Rank of the receiving process (destination).

Param tag:

[in] Message tag.

Param comm:

[in] A pointer to the communicator.

Param stream:

[in] CUDA stream.

Returns:

[out] The error status (as int) of the invocation. Must return 0 on success and any nonzero integer otherwise.

int cudssRecv(
void *buffer,
int count,
cudaDataType_t datatype,
int source,
int tag,
void *comm,
cudaStream_t stream
)#
A function pointer to a routine which performs a blocking receive for a message.
Param buffer:

[out] Initial address of the receive device buffer.

Param count:

[in] Number of elements (of type datatype) to be received.

Param datatype:

[in] CUDA datatype of elements to be received.

Param source:

[in] Rank of the sending process (source).

Param tag:

[in] Message tag.

Param comm:

[in] A pointer to the communicator.

Param stream:

[in] CUDA stream.

Returns:

[out] The error status (as int) of the invocation. Must return 0 on success and any nonzero integer otherwise.

int cudssBcast(
void *buffer,
int count,
cudaDataType_t datatype,
int root,
void *comm,
cudaStream_t stream
)#
A function pointer to a routine which performs a broadcast for a message from the root process
to all other processes of the communicator.
Param buffer:

[in] Initial address of the broadcast device buffer.

Param count:

[in] Number of elements (of type datatype) to be broadcast.

Param datatype:

[in] CUDA datatype of elements to be broadcast.

Param root:

[in] Rank of the sending process (source).

Param comm:

[in] A pointer to the communicator.

Param stream:

[in] CUDA stream.

Returns:

[out] The error status (as int) of the invocation. Must return 0 on success and any nonzero integer otherwise.

int cudssReduce(
const void *sendbuf,
void *recvbuf,
int count,
cudaDataType_t datatype,
cudssOpType_t op,
int root,
void *comm,
cudaStream_t stream
)#
A function pointer to a routine which performs a reduction of values from all processes to a single value on the root process.
Param sendbuf:

[in] Address of the send buffer.

Param recvbuf:

[out] Address of the receive buffer.

Param count:

[in] Number of elements (of type datatype) to be received.

Param datatype:

[in] CUDA datatype of elements to be received.

Param op:

[in] Type of the reduction operation to be performed, see cudssOpType_t for supported values.

Param root:

[in] Rank of the root process (source).

Param comm:

[in] A pointer to the communicator.

Param stream:

[in] CUDA stream.

Returns:

[out] The error status (as int) of the invocation. Must return 0 on success and any nonzero integer otherwise.

int cudssAllreduce(
const void *sendbuf,
void *recvbuf,
int count,
cudaDataType_t datatype,
cudssOpType_t op,
void *comm,
cudaStream_t stream
)#
A function pointer to a routine which performs a reduction of values on all processes to a single value and
distributes the result back to all processes.
Param sendbuf:

[in] Address of the send buffer.

Param recvbuf:

[out] Address of the receive buffer.

Param count:

[in] Number of elements (of type datatype) to be received.

Param datatype:

[in] CUDA datatype of elements to be received.

Param op:

[in] Type of the reduction operation to be performed, see cudssOpType_t for supported values.

Param comm:

[in] A pointer to the communicator.

Param stream:

[in] CUDA stream.

Returns:

[out] The error status (as int) of the invocation. Must return 0 on success and any nonzero integer otherwise.

int cudssScatterv(
const void *sendbuf,
const int *sendcounts,
const int *displs,
cudaDataType_t sendtype,
void *recvbuf,
int recvcount,
cudaDataType_t recvtype,
int root,
void *comm,
cudaStream_t stream
)#
A function pointer to a routine which performs a scatter operation on a buffer in parts to all processes in a communicator.
Param sendbuf:

[in] Address of the send buffer.

Param sendcounts:

[in] Non-negative integer array (of length communicator size) specifying the number of elements to send to each rank.

Param displs:

[in] An array of integers of length communicator size. Entry i specifies the displacement (relative to sendbuf) from which to take the outgoing data to process i.

Param sendtype:

[in] CUDA datatype of elements to be sent.

Param recvbuf:

[out] Address of the receive buffer.

Param recvcount:

[in] Number of elements in receive buffer (non-negative integer).

Param recvtype:

[in] CUDA datatype of elements to be received.

Param root:

[in] Rank of the sending process (source).

Param comm:

[in] A pointer to the communicator.

Param stream:

[in] CUDA stream.

Returns:

[out] The error status (as int) of the invocation. Must return 0 on success and any nonzero integer otherwise.

int cudssCommSplit(
const void *comm,
int color,
int key,
void *new_comm
)#
A function pointer to a routine which creates new communicators based on colors and keys.
Param comm:

[in] A pointer to the communicator to be split.

Param color:

[in] Control of the subset assignment. Processes with the same color are grouped together.

Param key:

[in] Control of the rank assignment. Processes in the new communicator are ordered based on the keys.

Param new_comm:

[out] A pointer to the new communicator defined w.r.t to colors.

Returns:

[out] The error status (as int) of the invocation. Must return 0 on success and any nonzero integer otherwise.

int cudssCommFree(void *comm)#
A function pointer to a routine which deallocates resources of a communicator.
Param comm:

[in] A pointer to the communicator to be freed.

Returns:

[out] The error status (as int) of the invocation. Must return 0 on success and any nonzero integer otherwise.

cudssOpType_t#

enum cudssOpType_t#
The enumerator specifies reduction operation to be used when calling
communication layer APIs cudssReduce() or cudssAllreduce().
enumerator CUDSS_SUM#

Reduced elements are added together.

enumerator CUDSS_MAX#

Maximum element is found among the reduced elements.

enumerator CUDSS_MIN#

Minimum element is found among the reduced elements.

Threading Layer Types#

cudssThreadingInterface_t#

struct cudssThreadingInterface_t#
This struct defines all threading primitives which need to be implemented in (any) implementation of the cuDSS threading layer; see the MT mode documentation for more details.
int cudssGetMaxThreads()#
A function pointer to a routine which returns the maximum number of threads on the CPU that can be used by cuDSS for parallel execution.
Returns:

[out] The maximum number of threads on the CPU that can be used by cuDSS for parallel execution.

void cudssParallelFor(
int nthr_requested,
int ntasks,
void *ctx,
cudss_thr_func_t f
)#

A function pointer to a routine which opens a parallel for section with the requested number of threads and calls the cudss_thr_func_t function f (see cudss_threading_interface.h for details) in the for loop with ntasks iterations.

Param nthr_requested:

[in] Requested number of threads for the parallel section.

Param ntasks:

[in] Number of tasks in the parallel for loop.

Param ctx:

[in] A pointer to the input data for cudss_thr_func_t f, which defines parallel units of work, also called tasks:

void cudss_thr_func_t(int task, void *ctx)#
Parameters:
  • task – [in] The task number (or iteration count of the parallel loop).

  • ctx – [in] A pointer to the input data.
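
For illustration, a minimal OpenMP-based sketch of the two threading primitives (the my_* helper names are hypothetical; the actual layer should follow the prototypes from cudss_threading_interface.h):

#include <omp.h>
#include "cudss_threading_interface.h"   /* defines cudss_thr_func_t */

static int my_get_max_threads(void)
{
    return omp_get_max_threads();
}

static void my_parallel_for(int nthr_requested, int ntasks, void *ctx,
                            cudss_thr_func_t f)
{
    #pragma omp parallel for num_threads(nthr_requested)
    for (int task = 0; task < ntasks; task++)
        f(task, ctx);   /* each iteration executes one task */
}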