cuDSS Functions#

Library Management Functions#

`cudssCreate`#

cudssStatus_t
cudssCreate(cudssHandle_t* handle)

The function initializes the cuDSS library handle (cudssHandle_t), which holds the cuDSS library context. It allocates light hardware resources on the host and must be called before any other cuDSS library call. Calling any cuDSS function that takes a cudssHandle_t without a prior call to cudssCreate() returns an error. The cuDSS library context is tied to the current CUDA device. To use the library on multiple devices, create one cuDSS handle per device.

Parameter	Memory	In/Out	Description
`handle`	Host	OUT	cuDSS library handle

See cudssStatus_t for the description of the return status.

`cudssDestroy`#

cudssStatus_t
cudssDestroy(cudssHandle_t handle)

The function releases the hardware resources used by the cuDSS library. It must be the last call made with a particular cuDSS handle. Calling any cuDSS function that takes a cudssHandle_t after cudssDestroy() returns an error.

Parameter	Memory	In/Out	Description
`handle`	Host	IN	cuDSS library handle

See cudssStatus_t for the description of the return status.

`cudssGetProperty`#

cudssStatus_t
cudssGetProperty(libraryPropertyType propertyType,
                 int*                value)

The function returns the value of the requested property. Refer to libraryPropertyType for supported types.

Parameter	Memory	In/Out	Description
`propertyType`	Host	IN	Requested property
`value`	Host	OUT	Value of the requested property

libraryPropertyType (defined in library_types.h):

Value	Meaning
`MAJOR_VERSION`	Enumerator to query the major version
`MINOR_VERSION`	Enumerator to query the minor version
`PATCH_LEVEL`	Enumerator to query the patch level

See cudssStatus_t for the description of the return status.

`cudssSetStream`#

cudssStatus_t
cudssSetStream(cudssHandle_t handle,
               cudaStream_t  stream)

The function sets the stream to be used by the cuDSS library to execute its routines.

Parameter	Memory	In/Out	Description
`handle`	Host	IN	cuDSS library handle
`stream`	Host	IN	The stream used by the library

See cudssStatus_t for the description of the return status.

`cudssSetDeviceMemHandler`#

cudssStatus_t
cudssSetDeviceMemHandler(cudssHandle_t handle,
                         const cudssDeviceMemHandler_t *handler)

Set the current device memory handler inside the library handle.

If the handler argument is NULL, the library handle detaches its existing memory handler. To change the device memory handler after one has been set, the previously set handler must first be detached.

If a cuDSS API which needs to allocate device memory (cudssExecute()) is called and there is no device memory handler attached to the library handle at the moment of the call, cuDSS will allocate device memory internally using a default memory handler.

When a device memory handler is set, calls to the main cuDSS routine cudssExecute() allocate the necessary device memory through the device_alloc() member of the device memory handler struct. The allocated memory remains part of the cudssData_t object used in the call until it is deallocated through the device_free() member of the struct, which happens when the corresponding cudssDataDestroy() is called. Consequently, erroneous behavior is likely if the device memory handler is changed during the lifetime of any cudssData_t object that has used the library handle. See cudssDeviceMemHandler_t for further details about device_alloc() and device_free().

The internal stream order is established using the user-provided stream set via cudssSetStream().

Note: It is undefined behavior if the library handle is bound to a memory handler and subsequently to another handler (without detaching), or the library handle outlives the attached memory pool, or the memory pool is not stream-ordered.

Parameter	Memory	In/Out	Description
`handle`	Host	IN	cuDSS library handle
`handler`	Host	IN	The device memory handler that encapsulates the user’s mempool. The struct content is copied internally

See cudssStatus_t for the description of the return status.
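The pairing of device_alloc() and device_free() described above can be sketched with a host-only mock. Everything in this block is an assumption made for illustration: `mock_mem_handler_t`, `mock_stream_t`, `mock_pool_t`, the `ctx` field and the callback signatures are hypothetical stand-ins rather than the real cudssDeviceMemHandler_t, and `malloc`/`free` take the place of a stream-ordered device memory pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CUDA stream; not a cuDSS or CUDA type. */
typedef int mock_stream_t;

/* Hypothetical handler shape: a context pointer plus alloc/free callbacks. */
typedef struct {
    void *ctx;
    int (*device_alloc)(void *ctx, void **ptr, size_t size, mock_stream_t stream);
    int (*device_free)(void *ctx, void *ptr, size_t size, mock_stream_t stream);
} mock_mem_handler_t;

/* Pool bookkeeping: number of bytes currently handed out. */
typedef struct {
    size_t live_bytes;
} mock_pool_t;

static int pool_alloc(void *ctx, void **ptr, size_t size, mock_stream_t stream) {
    mock_pool_t *pool = (mock_pool_t *)ctx;
    (void)stream;                     /* a real pool would order work on it */
    *ptr = malloc(size);
    if (*ptr == NULL) return 1;
    pool->live_bytes += size;
    return 0;
}

static int pool_free(void *ctx, void *ptr, size_t size, mock_stream_t stream) {
    mock_pool_t *pool = (mock_pool_t *)ctx;
    (void)stream;
    free(ptr);
    pool->live_bytes -= size;
    return 0;
}

/* Allocates through the handler, frees through the same handler, and
   returns the bytes still live afterwards (0 when the calls are paired). */
size_t mock_roundtrip(size_t size) {
    mock_pool_t pool = {0};
    mock_mem_handler_t h = { &pool, pool_alloc, pool_free };
    void *p = NULL;
    if (h.device_alloc(h.ctx, &p, size, 0) != 0) return (size_t)-1;
    h.device_free(h.ctx, p, size, 0);
    return pool.live_bytes;
}
```

The mock shows why swapping the handler between allocation and deallocation is a problem: the free callback would then update a pool that never issued the pointer.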

`cudssGetDeviceMemHandler`#

cudssStatus_t
cudssGetDeviceMemHandler(cudssHandle_t handle,
                         cudssDeviceMemHandler_t *handler)

Get the current device memory handler.

Parameter	Memory	In/Out	Description
`handle`	Host	IN	cuDSS library handle
`handler`	Host	OUT	A (deep) copy of the device memory handler that encapsulates the user’s mempool, if one was previously set by calling cudssSetDeviceMemHandler()

See cudssStatus_t for the description of the return status.

`cudssSetCommLayer`#

cudssStatus_t
cudssSetCommLayer(cudssHandle_t handle, const char* commLibFileName)

The function sets the communication layer to be used in MGMN mode of cuDSS. The communication layer that is set will be used for all MGMN mode operations that involve the modified library handle. For more details on when and how this function should be used, see MGMN mode.

Parameter	Memory	In/Out	Description
`handle`	Host	IN	cuDSS library handle
`commLibFileName`	Host	IN	Full filename (including path) to the cuDSS communication layer library. If NULL, the communication layer library name is read from the environment variable CUDSS_COMM_LIB

See cudssStatus_t for the description of the return status.

`cudssSetThreadingLayer`#

cudssStatus_t
cudssSetThreadingLayer(cudssHandle_t handle, const char* thrLibFileName)

The function sets the threading layer to be used in MT mode of cuDSS. The threading layer that is set will be used for all MT mode operations that involve the modified library handle. For more details on when and how this function should be used, see MT mode.

Parameter	Memory	In/Out	Description
`handle`	Host	IN	cuDSS library handle
`thrLibFileName`	Host	IN	Full filename (including path) to the cuDSS threading layer library. If NULL, the threading layer library name is read from the environment variable CUDSS_THREADING_LIB

See cudssStatus_t for the description of the return status.

Config and Data Object Functions#

`cudssConfigCreate`#

cudssStatus_t
cudssConfigCreate(cudssConfig_t* config)

The function initializes the cuDSS config object (cudssConfig_t), which holds the settings of the solver related to solving a specific linear system. It allocates light resources on the host. To release the allocated memory, cudssConfigDestroy() must be called.

Parameter	Memory	In/Out	Description
`config`	Host	OUT	cuDSS config object

See cudssStatus_t for the description of the return status.

`cudssConfigDestroy`#

cudssStatus_t
cudssConfigDestroy(cudssConfig_t config)

The function releases the host resources used by the cuDSS config object. Using the config object after this function call can lead to undefined behavior.

Parameter	Memory	In/Out	Description
`config`	Host	IN	cuDSS config object to be destroyed

See cudssStatus_t for the description of the return status.

`cudssConfigSet`#

cudssStatus_t
cudssConfigSet(cudssConfig_t      config,
               cudssConfigParam_t param,
               void*              value,
               size_t             sizeInBytes)

The function sets a parameter (cudssConfigParam_t) to the specified value passed by the pointer.

Parameter	Memory	In/Out	Description
`config`	Host	INOUT	cuDSS config object
`param`	Host	IN	Parameter to be set
`value`	Host	IN	A pointer to the value to be set
`sizeInBytes`	Host	IN	Number of bytes to be read from the pointer

See cudssStatus_t for the description of the return status.

`cudssConfigGet`#

cudssStatus_t
cudssConfigGet(cudssConfig_t      config,
               cudssConfigParam_t param,
               void*              value,
               size_t             sizeInBytes,
               size_t*            sizeWritten)

The function retrieves the value of a parameter (cudssConfigParam_t) and saves it to the specified memory location.

Parameter	Memory	In/Out	Description
`config`	Host	IN	cuDSS config object
`param`	Host	IN	Parameter to be retrieved from the config
`value`	Host	OUT	A pointer to the output memory
`sizeInBytes`	Host	IN	Number of bytes to be written (for verification)
`sizeWritten`	Host	OUT	Valid only when the return value is CUDSS_STATUS_SUCCESS. If `sizeInBytes` is non-zero, then `sizeWritten` is the number of bytes actually written; if `sizeInBytes` is zero: `sizeWritten` is the number of bytes needed to write full contents

See cudssStatus_t for the description of the return status.
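The `sizeInBytes`/`sizeWritten` contract in the table above suggests a two-step read: query the required size with `sizeInBytes` set to zero, then fetch the value. The sketch below imitates that contract with a mock; `mock_param_get` and `mock_read_param` are hypothetical names and are not cuDSS functions, the stored value 42 is made up, and the error returned for a too-small buffer is an assumption about reasonable behavior rather than documented cuDSS behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical getter that mimics the sizeInBytes/sizeWritten contract.
   It is NOT a cuDSS function. Returns 0 on success and 1 on error. */
int mock_param_get(void *value, size_t sizeInBytes, size_t *sizeWritten) {
    static const int stored = 42;            /* pretend parameter value */
    if (sizeInBytes == 0) {                  /* size query: nothing is written */
        *sizeWritten = sizeof stored;
        return 0;
    }
    if (value == NULL || sizeInBytes < sizeof stored) return 1;
    memcpy(value, &stored, sizeof stored);
    *sizeWritten = sizeof stored;            /* bytes actually written */
    return 0;
}

/* Two-step read: ask for the size first, then fetch the value. */
int mock_read_param(int *out) {
    size_t needed = 0, written = 0;
    if (mock_param_get(NULL, 0, &needed) != 0) return 1;
    if (needed != sizeof *out) return 1;     /* caller's type does not match */
    if (mock_param_get(out, needed, &written) != 0) return 1;
    return written == needed ? 0 : 1;
}
```

With the real API, the same pattern would check the returned cudssStatus_t before trusting `sizeWritten`, since that output is valid only on CUDSS_STATUS_SUCCESS.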

`cudssDataCreate`#

cudssStatus_t
cudssDataCreate(cudssHandle_t handle, cudssData_t* data)

The function initializes the cuDSS data object (cudssData_t) which holds the internal data (e.g., LU factors arrays) as well as pointers to user-provided data related to solving a specific linear system. To release the allocated memory, cudssDataDestroy() must be called.

Parameter	Memory	In/Out	Description
`handle`	Host	IN	cuDSS library handle
`data`	Host	OUT	cuDSS data object

See cudssStatus_t for the description of the return status.

`cudssDataDestroy`#

cudssStatus_t
cudssDataDestroy(cudssHandle_t handle, cudssData_t data)

The function releases the hardware resources used by the cuDSS data object. Using the data object after this function call can lead to undefined behavior.

Parameter	Memory	In/Out	Description
`handle`	Host	IN	cuDSS library handle
`data`	Host	IN	cuDSS data object to be destroyed

See cudssStatus_t for the description of the return status.

`cudssDataSet`#

cudssStatus_t
cudssDataSet(cudssHandle_t    handle,
             cudssData_t      data,
             cudssDataParam_t param,
             void*            value,
             size_t           sizeInBytes)

The function sets a parameter (cudssDataParam_t) to the specified value passed by the pointer.

Parameter	Memory	In/Out	Description
`handle`	Host	IN	cuDSS library handle
`data`	Host	INOUT	cuDSS data object
`param`	Host	IN	Parameter to be set
`value`	Host	IN	A pointer to the value to be set
`sizeInBytes`	Host	IN	Number of bytes to be read from the pointer

See cudssStatus_t for the description of the return status.

`cudssDataGet`#

cudssStatus_t
cudssDataGet(cudssHandle_t    handle,
             cudssData_t      data,
             cudssDataParam_t param,
             void*            value,
             size_t           sizeInBytes,
             size_t*          sizeWritten)

The function retrieves the value of a parameter (cudssDataParam_t) and saves it to the specified memory location.

The output memory buffer can be either on the device or on the host; a memory copy is done if necessary.

Parameter	Memory	In/Out	Description
`handle`	Host	IN	cuDSS library handle
`data`	Host	IN	cuDSS data object
`param`	Host	IN	Parameter to be retrieved from the data object
`value`	Host Device	OUT	A pointer to the output memory
`sizeInBytes`	Host	IN	Number of bytes to be written (for verification)
`sizeWritten`	Host	OUT	Valid only when the return value is CUDSS_STATUS_SUCCESS. If `sizeInBytes` is non-zero, then `sizeWritten` is the number of bytes actually written; if `sizeInBytes` is zero: `sizeWritten` is the number of bytes needed to write full contents

See cudssStatus_t for the description of the return status.

Main cuDSS Function#

`cudssExecute`#

cudssStatus_t
cudssExecute(cudssHandle_t  handle,
             int            phase,
             cudssConfig_t  config,
             cudssData_t    data,
             cudssMatrix_t  matrix,
             cudssMatrix_t  solution,
             cudssMatrix_t  rhs)

The function executes a phase of the solution process. Prior to calling cudssExecute(), all objects passed as parameters must already be created and properly initialized.

The simplest possible solution process consists of three main phases, analysis, factorization, and solve, following one another. During the analysis phase, reordering and symbolic factorization (preparing the internal data structures) are done. During the factorization phase, numerical factorization is performed and during the solve phase, the factorization is used to find the solution to the linear system.

The phases must always happen in the following order: CUDSS_PHASE_REORDERING -> CUDSS_PHASE_SYMBOLIC_FACTORIZATION -> CUDSS_PHASE_FACTORIZATION -> (optional) CUDSS_PHASE_REFACTORIZATION -> CUDSS_PHASE_SOLVE. The optional refactorization is usually skipped before the first solve. Re-using the analysis results is supported. Users can change matrix values and only need to run the (re-)factorization and solve phases.

Note: Combining phases is supported as long as the order is followed. As an example, combining CUDSS_PHASE_REORDERING | CUDSS_PHASE_SOLVE will result in an error, while CUDSS_PHASE_ANALYSIS | CUDSS_PHASE_FACTORIZATION is allowed.
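The rule for combining phases can be sketched as a check on a bit mask. The constants below are hypothetical bit values chosen for this illustration; the real CUDSS_PHASE_* values are defined by the cuDSS headers and may differ. The helper `phase_mask_is_contiguous` is also an assumption, and it only checks a single combined mask, not the phases that earlier calls have already completed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration only; the real CUDSS_PHASE_*
   values come from the cuDSS headers and may differ. */
enum {
    PH_REORDERING      = 1 << 0,
    PH_SYMBOLIC        = 1 << 1,
    PH_FACTORIZATION   = 1 << 2,
    PH_REFACTORIZATION = 1 << 3,   /* optional phase */
    PH_SOLVE           = 1 << 4,
    PH_ANALYSIS        = PH_REORDERING | PH_SYMBOLIC
};

/* A mask is accepted when, between the first and the last phase it
   contains, no mandatory phase is missing. */
bool phase_mask_is_contiguous(int mask) {
    static const int order[] = {
        PH_REORDERING, PH_SYMBOLIC, PH_FACTORIZATION,
        PH_REFACTORIZATION, PH_SOLVE
    };
    static const bool optional[] = { false, false, false, true, false };
    int first = -1, last = -1;
    for (int i = 0; i < 5; ++i) {
        if (mask & order[i]) {
            if (first < 0) first = i;
            last = i;
        }
    }
    if (first < 0) return false;             /* empty mask */
    for (int i = first; i <= last; ++i) {
        if (!optional[i] && !(mask & order[i])) return false;
    }
    return true;
}
```

Under these assumptions, reordering combined with solve is rejected because symbolic factorization and factorization are skipped, while analysis combined with factorization is accepted.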

During the execution, the solver configuration properties are read from the config object of type cudssConfig_t. The internal data structures that hold all data required for solving the system (including the factors) are kept as part of the data object of type cudssData_t. Users can change the configuration settings, provide additional data parameters (e.g., a user permutation), or query extra information (such as memory estimates or the number of pivots) before or after the phases of the solution process via cudssConfigSet(), cudssDataSet(), and cudssDataGet(), respectively.

The data buffers in the matrix objects for the input matrix, solution and right-hand side matrices must hold device-visible data, unless the hybrid host/device execution mode is enabled.

Note: The function has the following limitations on the cudssMatrix_t objects which can be used as call arguments, in addition to the limitations of the corresponding matrix creation routines (e.g., cudssMatrixCreateCsr()):

The input sparse matrix (in the batch case, each matrix in the batch) must have data consistent with its description (incl. offsets and indices, indexing base, matrix type).
The input sparse matrix (in the batch case, each matrix in the batch) may have unsorted column indices but must not have repeating entries.
The input sparse matrix in CSR format must have its row offsets array start with the indexing base (0 or 1).
The input sparse matrix may change for the SOLVE phase (except when hybrid execution mode is used) but for distributed matrices, the range of rows (row distribution) must remain the same.
The input sparse matrix, right hand side and solution must have the same datatypes for values and indices (if applicable).
For the batched inputs, non-uniform batches with varying shapes are supported. E.g., nrows, ncols, nnz can be different for each batch instance.
In MGMN mode all processes must have valid global nrows, ncols, nnz data for sparse matrices and valid global nrows, ncols and local ld for the dense matrices. In the batch case, all processes must additionally have a valid batchCount.
If cudssMatrixSetDistributionRow1d() is not used, then in MGMN mode the full matrix data (for the system’s matrix, solution and right-hand side) must be present on the root process, i.e., the process with rank = 0 in the provided communicator. The other processes may have the data pointers set to NULL. Otherwise, the system’s matrix, solution and right-hand side must be distributed according to the corresponding call to cudssMatrixSetDistributionRow1d().
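Several of the CSR limitations listed above (row offsets starting at the indexing base, offsets consistent with nnz, column indices in range, no repeated entries, unsorted columns allowed) can be verified on the host before the data is handed to the solver. The following is a minimal sketch of such a check; `csr_is_consistent` is a hypothetical helper, not part of cuDSS, and it assumes 32-bit indices held in host memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Host-side sanity check for 3-array CSR data. Hypothetical helper, not a
   cuDSS function. row_start must hold nrows + 1 entries; base is 0 or 1. */
bool csr_is_consistent(int nrows, int ncols, int nnz, int base,
                       const int *row_start, const int *col_idx) {
    if (nrows < 0 || ncols < 0 || nnz < 0) return false;
    if (base != 0 && base != 1) return false;
    if (row_start[0] != base) return false;            /* starts at the base */
    if (row_start[nrows] != base + nnz) return false;  /* ends at base + nnz */

    /* seen[c] marks columns already met in the current row. */
    bool *seen = calloc((size_t)ncols + 1, sizeof *seen);
    if (seen == NULL) return false;

    for (int r = 0; r < nrows; ++r) {
        int begin = row_start[r] - base;
        int end = row_start[r + 1] - base;
        if (begin > end || end > nnz) { free(seen); return false; }
        for (int k = begin; k < end; ++k) {
            int c = col_idx[k] - base;
            /* Out-of-range or repeated column index: reject. */
            if (c < 0 || c >= ncols || seen[c]) { free(seen); return false; }
            seen[c] = true;
        }
        /* Clear the marks so the next row starts clean. */
        for (int k = begin; k < end; ++k) seen[col_idx[k] - base] = false;
    }
    free(seen);
    return true;
}
```

Column indices within a row are deliberately not required to be sorted, which matches the limitation stated above.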

Note: using the cudssMatrixViewType_t parameter when creating the sparse input matrix (or a batch of those), one can pass only a triangular portion of the matrix without the need to explicitly change the underlying matrix data. E.g., if the mview is set to CUDSS_MVIEW_UPPER, then during the analysis phase, cudssExecute() will ignore all the indices in the lower part of the matrix, even if the underlying matrix storage represents a full matrix.
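To make the effect of an upper view concrete, the sketch below counts which stored entries of a full matrix fall in the upper triangle. It is only an illustration of the selection rule: `count_upper_entries` is a hypothetical helper, it assumes 0-based 3-array CSR in host memory, and it assumes that the upper view includes the diagonal. How cuDSS skips the remaining entries internally is not described here.

```c
#include <assert.h>

/* Counts the stored entries (0-based 3-array CSR) that lie in the upper
   triangle, diagonal included: the entries with column >= row. */
int count_upper_entries(int nrows, const int *row_start, const int *col_idx) {
    int kept = 0;
    for (int r = 0; r < nrows; ++r) {
        for (int k = row_start[r]; k < row_start[r + 1]; ++k) {
            if (col_idx[k] >= r) ++kept;
        }
    }
    return kept;
}
```

For a 3 x 3 tridiagonal matrix stored in full with 7 entries, 5 of them satisfy column >= row.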

Parameter	Memory	In/Out	Description	Possible Values
`handle`	Host	IN	cuDSS library handle
`phase`	Host	IN	Execution phase(s)	Currently supported are `CUDSS_PHASE_REORDERING`, `CUDSS_PHASE_SYMBOLIC_FACTORIZATION`, `CUDSS_PHASE_ANALYSIS`, `CUDSS_PHASE_FACTORIZATION`, `CUDSS_PHASE_REFACTORIZATION`, and `CUDSS_PHASE_SOLVE`. Phase combinations in a single call are supported as long as no mandatory phase is skipped.
`config`	Host	IN	Solver config object
`data`	Host	INOUT	Solver data object
`matrix`	Host	IN	Input sparse matrix	Must be sparse
`solution`	Host	INOUT	Solution matrix	Must be dense
`rhs`	Host	IN	Right-hand side matrix	Must be dense

See cudssStatus_t for the description of the return status.

Matrix Object Functions#

`cudssMatrixCreateDn`#

cudssStatus_t
cudssMatrixCreateDn(cudssMatrix_t* matrix,
                    int64_t        nrows,
                    int64_t        ncols,
                    int64_t        ld,
                    void*          values,
                    cudaDataType_t valueType,
                    cudssLayout_t  layout)

The function creates a matrix object wrapped around dense matrix data. The provided data buffer for the matrix values must hold device-visible data.

Note: In MGMN mode all processes must have valid nrows, ncols and ld.

See more limitations for using cudssMatrix_t objects in the documentation for the main routine, cudssExecute().

Parameter	Memory	In/Out	Description	Possible Values
`matrix`	Host	OUT	Created matrix object
`nrows`	Host	IN	Number of rows	Must be non-negative
`ncols`	Host	IN	Number of columns	Must be non-negative
`ld`	Host	IN	Leading dimension	≥ nrows if column-major, ≥ ncols if row-major.
`values`	Device or Host	IN	Values of the dense matrix
`valueType`	Host	IN	Data type of the matrix	`CUDA_R_32F`, `CUDA_R_64F`, `CUDA_C_32F`, `CUDA_C_64F`
`layout`	Host	IN	Memory layout	`CUDSS_LAYOUT_COL_MAJOR`, `CUDSS_LAYOUT_ROW_MAJOR`. The only supported value right now is `CUDSS_LAYOUT_COL_MAJOR`

See cudssStatus_t for the description of the return status.
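The role of the leading dimension `ld` in a column-major buffer can be shown with a short host-side sketch. The helpers `dn_offset_col_major` and `dn_sum` are hypothetical and exist only for this illustration; they operate on a host array, whereas the buffer passed to cudssMatrixCreateDn() must hold device-visible data.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in a column-major buffer whose columns are
   ld elements apart (ld >= nrows). */
size_t dn_offset_col_major(size_t i, size_t j, size_t ld) {
    return i + j * ld;
}

/* Sums an nrows x ncols column-major matrix, skipping the ld - nrows
   padding elements at the end of every column. */
double dn_sum(const double *values, size_t nrows, size_t ncols, size_t ld) {
    double sum = 0.0;
    for (size_t j = 0; j < ncols; ++j) {
        for (size_t i = 0; i < nrows; ++i) {
            sum += values[dn_offset_col_major(i, j, ld)];
        }
    }
    return sum;
}
```

With `nrows = 2`, `ncols = 2` and `ld = 3`, the third element of each column is padding and is never read.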

`cudssMatrixCreateBatchDn`#

cudssStatus_t
cudssMatrixCreateBatchDn(cudssMatrix_t* matrix,
                         int64_t        batchCount,
                         void*          nrows,
                         void*          ncols,
                         void*          ld,
                         void**         values,
                         cudaDataType_t indexType,
                         cudaDataType_t valueType,
                         cudssLayout_t  layout)

The function creates a matrix object wrapped around a batch of dense matrices. The provided data buffer for the matrix values must contain device-visible pointers to device-visible data.

Note: cuDSS supports non-uniform batches with varying shapes, e.g., nrows, ncols, ld can be different for each batch instance.

Note: MGMN mode does not support matrix batches.

See more limitations for using cudssMatrix_t objects in the documentation for the main routine, cudssExecute().

Parameter	Memory	In/Out	Description	Possible Values
`matrix`	Host	OUT	Created matrix object
`batchCount`	Host	IN	Size of the batch	Must be non-negative
`nrows`	Host	IN	Number of rows for each matrix in the batch	Must be non-negative
`ncols`	Host	IN	Number of columns for each matrix in the batch	Must be non-negative
`ld`	Host	IN	Leading dimension for each matrix in the batch	≥ nrows if column-major, ≥ ncols if row-major. The only supported value right now is exactly nrows (ncols)
`values`	Device	IN	Pointer to values of each dense matrix in the batch
`indexType`	Host	IN	Index type for scalar arrays (nrows, ncols, ld)	`CUDA_R_32I`
`valueType`	Host	IN	Data type of the matrix	`CUDA_R_32F`, `CUDA_R_64F`, `CUDA_C_32F`, `CUDA_C_64F`
`layout`	Host	IN	Memory layout	`CUDSS_LAYOUT_COL_MAJOR`, `CUDSS_LAYOUT_ROW_MAJOR`. The only supported value right now is `CUDSS_LAYOUT_COL_MAJOR`

See cudssStatus_t for the description of the return status.

`cudssMatrixCreateCsr`#

cudssStatus_t
cudssMatrixCreateCsr(cudssMatrix_t*        matrix,
                     int64_t               nrows,
                     int64_t               ncols,
                     int64_t               nnz,
                     void*                 rowStart,
                     void*                 rowEnd,
                     void*                 colIndices,
                     void*                 values,
                     cudaDataType_t        indexType,
                     cudaDataType_t        valueType,
                     cudssMatrixType_t     mtype,
                     cudssMatrixViewType_t mview,
                     cudssIndexBase_t      indexBase)

The function creates a matrix object wrapped around sparse matrix data. The provided data buffers for rowStart, rowEnd, colIndices and values must hold device-visible data.

Note: creating a cudssMatrix_t with CSR format does not perform any data consistency checks and thus currently it is the caller’s responsibility to have data description parameters matching the data.

Note: In MGMN mode all processes must have valid nrows, ncols, nnz data (in the case of distributed matrices, these should correspond to the global matrix).

See more limitations for using cudssMatrix_t objects in the documentation for the main routine, cudssExecute().

Parameter	Memory	In/Out	Description	Possible Values
`matrix`	Host	OUT	Created matrix object
`nrows`	Host	IN	Number of rows	Must be non-negative
`ncols`	Host	IN	Number of columns	Must be non-negative
`nnz`	Host	IN	Number of non-zeroes	Must be non-negative
`rowStart`	Device or Host	IN	Row start offsets
`rowEnd`	Device or Host	IN	Row end offsets	NULL is the only supported value as 4-array CSR is not supported currently
`colIndices`	Device or Host	IN	Column indices of the matrix
`values`	Device or Host	IN	Values of the sparse matrix
`indexType`	Host	IN	Index type of the matrix	`CUDA_R_32I`
`valueType`	Host	IN	Data type of the matrix	`CUDA_R_32F`, `CUDA_R_64F`, `CUDA_C_32F`, `CUDA_C_64F`
`mtype`	Host	IN	Matrix type of the matrix	See cudssMatrixType_t
`mview`	Host	IN	Matrix view of the matrix	See cudssMatrixViewType_t
`indexBase`	Host	IN	Indexing base	See cudssIndexBase_t

See cudssStatus_t for the description of the return status.
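The 3-array CSR layout expected here (row start offsets, column indices, values, with a chosen indexing base) can be illustrated by converting a small dense matrix on the host. `dense_to_csr` is a hypothetical helper written for this sketch; it is not part of cuDSS, it uses 32-bit indices to match `CUDA_R_32I`, and copying the resulting arrays to device-visible memory before calling cudssMatrixCreateCsr() is not shown.

```c
#include <assert.h>

/* Converts a row-major dense n x m matrix to 3-array CSR with the given
   indexing base (0 or 1). row_start must hold n + 1 entries; col_idx and
   values must hold at least n * m entries. Returns the number of
   non-zeroes written. */
int dense_to_csr(const double *dense, int n, int m, int base,
                 int *row_start, int *col_idx, double *values) {
    int nnz = 0;
    for (int i = 0; i < n; ++i) {
        row_start[i] = nnz + base;           /* first offset equals the base */
        for (int j = 0; j < m; ++j) {
            double v = dense[i * m + j];
            if (v != 0.0) {
                col_idx[nnz] = j + base;
                values[nnz] = v;
                ++nnz;
            }
        }
    }
    row_start[n] = nnz + base;               /* one past the last entry */
    return nnz;
}
```

Because only `rowStart` is filled, `rowEnd` stays NULL, in line with the note that 4-array CSR is not supported currently.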

`cudssMatrixCreateBatchCsr`#

cudssStatus_t
cudssMatrixCreateBatchCsr(cudssMatrix_t*        matrix,
                          int64_t               batchCount,
                          void*                 nrows,
                          void*                 ncols,
                          void*                 nnz,
                          void**                rowStart,
                          void**                rowEnd,
                          void**                colIndices,
                          void**                values,
                          cudaDataType_t        indexType,
                          cudaDataType_t        valueType,
                          cudssMatrixType_t     mtype,
                          cudssMatrixViewType_t mview,
                          cudssIndexBase_t      indexBase)

The function creates a matrix object wrapped around a batch of sparse matrices (CSR format). The provided data buffer for rowStart, rowEnd, colIndices and values must contain device-visible pointers to device-visible data.

Note: cuDSS supports non-uniform batches with varying shapes, e.g., nrows, ncols, nnz can be different for each batch instance.

Note: creating a cudssMatrix_t with a batch of matrices in CSR format does not perform any data consistency checks and thus currently it is the caller’s responsibility to have data description parameters matching the data.

See more limitations for using cudssMatrix_t objects in the documentation for the main routine, cudssExecute().

Parameter	Memory	In/Out	Description	Possible Values
`matrix`	Host	OUT	Created matrix object
`batchCount`	Host	IN	Size of the batch	Must be non-negative
`nrows`	Host	IN	Number of rows for each matrix in the batch	Must be non-negative
`ncols`	Host	IN	Number of columns for each matrix in the batch	Must be non-negative
`nnz`	Host	IN	Number of non-zeroes for each matrix in the batch	Must be non-negative
`rowStart`	Device	IN	Pointer to row start offsets for each matrix in the batch
`rowEnd`	Device	IN	Pointer to row end offsets for each matrix in the batch	NULL is the only supported value as 4-array CSR is not supported currently
`colIndices`	Device	IN	Pointer to column indices for each matrix in the batch
`values`	Device	IN	Pointer to values of each CSR matrix in the batch
`indexType`	Host	IN	Index type of the matrix	`CUDA_R_32I`
`valueType`	Host	IN	Data type of the matrix	`CUDA_R_32F`, `CUDA_R_64F`, `CUDA_C_32F`, `CUDA_C_64F`
`mtype`	Host	IN	Matrix type of the matrix	See cudssMatrixType_t
`mview`	Host	IN	Matrix view of the matrix	See cudssMatrixViewType_t
`indexBase`	Host	IN	Indexing base	See cudssIndexBase_t

See cudssStatus_t for the description of the return status.

`cudssMatrixDestroy`#

cudssStatus_t
cudssMatrixDestroy(cudssMatrix_t matrix)

The function releases memory associated with the matrix wrapper. As cuDSS matrix objects are only lightweight wrappers around the user data, the user data remains untouched.

Parameter	Memory	In/Out	Description
`matrix`	Host	IN	cuDSS matrix object

See cudssStatus_t for the description of the return status.

`cudssMatrixSetValues`#

cudssStatus_t
cudssMatrixSetValues(cudssMatrix_t matrix, void *values)

The function resets the pointer to values inside the cuDSS matrix object to the provided buffer. The provided data buffer must hold device-visible data.

Parameter	Memory	In/Out	Description
`matrix`	Host	IN	cuDSS matrix object
`values`	Device or Host	IN	Buffer with the new matrix values

See cudssStatus_t for the description of the return status.

`cudssMatrixSetBatchValues`#

cudssStatus_t
cudssMatrixSetBatchValues(cudssMatrix_t matrix, void **values)

The function resets the pointer to values inside the cuDSS matrix object to the provided buffer. The provided data buffer for the matrix values must contain device-visible pointers to device-visible data.

Parameter	Memory	In/Out	Description
`matrix`	Host	IN	cuDSS matrix object
`values`	Device or Host	IN	Pointer to the new values for each dense matrix in the batch

See cudssStatus_t for the description of the return status.

`cudssMatrixSetCsrPointers`#

cudssStatus_t
cudssMatrixSetCsrPointers(cudssMatrix_t matrix,
                          void*         rowStart,
                          void*         rowEnd,
                          void*         colIndices,
                          void*         values)

The function resets the CSR pointers inside the cuDSS matrix object to the provided buffers. The provided data buffers must hold device-visible data.

Parameter	Memory	In/Out	Description
`matrix`	Host	IN	cuDSS matrix object
`rowStart`	Device or Host	IN	Buffer with the new row start offsets
`rowEnd`	Device or Host	IN	Buffer with the new row end offsets
`colIndices`	Device or Host	IN	Buffer with the new column indices
`values`	Device or Host	IN	Buffer with the new matrix values

See cudssStatus_t for the description of the return status.

`cudssMatrixSetBatchCsrPointers`#

cudssStatus_t
cudssMatrixSetBatchCsrPointers(cudssMatrix_t matrix,
                               void**        rowStart,
                               void**        rowEnd,
                               void**        colIndices,
                               void**        values)

The function resets the CSR pointers inside the cuDSS matrix object to the provided buffers. The provided data buffer for rowStart, rowEnd, colIndices and values must contain device-visible pointers to device-visible data.

Parameter	Memory	In/Out	Description
`matrix`	Host	IN	cuDSS matrix object
`rowStart`	Device or Host	IN	Pointer to the new row start offsets for each CSR matrix in the batch
`rowEnd`	Device or Host	IN	Pointer to the new row end offsets for each CSR matrix in the batch
`colIndices`	Device or Host	IN	Pointer to the new column indices for each CSR matrix in the batch
`values`	Device or Host	IN	Pointer to the new values for each CSR matrix in the batch

See cudssStatus_t for the description of the return status.

`cudssMatrixSetDistributionRow1d`#

cudssStatus_t
cudssMatrixSetDistributionRow1d(cudssMatrix_t matrix,
                                int64_t*     first_row,
                                int64_t*     last_row)

The function sets the 1D row distribution of the matrix (CSR or dense) for MGMN mode. The provided first_row and last_row must always be 0-based and specify the first and the last (inclusive) row indices of the local matrix on the calling process. Setting first_row > last_row means that the local matrix is empty on the calling process.

Note: the input sparse matrix, right-hand side and solution can have different distributions from each other. For example, the sparse matrix alone can be distributed while the right-hand side and solution are not.

Note: if the sparse matrix or right-hand side is distributed with an overlap (between processes), then the overlapped part is summed up, that is, the overlapped part receives a contribution from every related process. If the solution is distributed with an overlap (between processes), then the overlapped part has the same values on the corresponding processes.

Parameter	Memory	In/Out	Description
`matrix`	Host	IN	cuDSS matrix object
`first_row`	Host	IN	First row index of the local matrix on the calling process
`last_row`	Host	IN	Last row index of the local matrix on the calling process

See cudssStatus_t for the description of the return status.
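One way to obtain 0-based, inclusive row ranges of the kind described above is an even block split across processes. The helper below is a hypothetical sketch, not a cuDSS function, and it produces a distribution without overlap, which is only one of the layouts the notes allow. A process that receives no rows gets first_row > last_row, matching the convention for an empty local matrix.

```c
#include <assert.h>
#include <stdint.h>

/* Splits nrows rows into nearly equal contiguous blocks over nprocs
   processes. Writes the 0-based inclusive range owned by rank; an empty
   block is reported as first_row > last_row. */
void row1d_range(int64_t nrows, int nprocs, int rank,
                 int64_t *first_row, int64_t *last_row) {
    int64_t chunk = nrows / nprocs;          /* rows every process gets */
    int64_t rem = nrows % nprocs;            /* first rem ranks get one more */
    int64_t begin = rank * chunk + (rank < rem ? rank : rem);
    int64_t count = chunk + (rank < rem ? 1 : 0);
    *first_row = begin;
    *last_row = begin + count - 1;
}
```

For 10 rows on 4 processes this gives the ranges 0..2, 3..5, 6..7 and 8..9.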

`cudssMatrixGetDn`#

cudssStatus_t
cudssMatrixGetDn(cudssMatrix_t   matrix,
                 int64_t*        nrows,
                 int64_t*        ncols,
                 int64_t*        ld,
                 void**          values,
                 cudaDataType_t* valueType,
                 cudssLayout_t*  layout)

The function retrieves dense matrix properties and data from a cuDSS matrix object which holds a dense matrix. If on input any of the pointers is NULL, it is ignored and the corresponding value is not returned.

Parameter	Memory	In/Out	Description	Possible Values
`matrix`	Host	IN	cuDSS matrix object
`nrows`	Host	OUT	Buffer for the number of rows	Ignored if NULL
`ncols`	Host	OUT	Buffer for the number of columns	Ignored if NULL
`ld`	Host	OUT	Buffer for the leading dimension	Ignored if NULL
`values`	Device or Host	OUT	Buffer for the values of the matrix	Ignored if NULL
`valueType`	Host	OUT	Buffer for the data type of the matrix	`CUDA_R_32F`, `CUDA_R_64F`, `CUDA_C_32F`, `CUDA_C_64F`. Ignored if NULL
`layout`	Host	OUT	Buffer for the memory layout	Ignored if NULL

See cudssStatus_t for the description of the return status.

`cudssMatrixGetBatchDn`#

cudssStatus_t
cudssMatrixGetBatchDn(cudssMatrix_t   matrix,
                      int64_t*        batchCount,
                      void**          nrows,
                      void**          ncols,
                      void**          ld,
                      void***         values,
                      cudaDataType_t* indexType,
                      cudaDataType_t* valueType,
                      cudssLayout_t*  layout)

The function retrieves dense matrix properties and data from a cuDSS matrix object which holds a batch of dense matrices. If on input any of the pointers is NULL, it is ignored and the corresponding value is not returned.

Parameter	Memory	In/Out	Description	Possible Values
`matrix`	Host	IN	cuDSS matrix object
`batchCount`	Host	OUT	Buffer for the size of the batch
`nrows`	Host	OUT	Pointer to the number of rows for each dense matrix in the batch	Ignored if NULL
`ncols`	Host	OUT	Pointer to the number of columns for each dense matrix in the batch	Ignored if NULL
`ld`	Host	OUT	Pointer to the leading dimension for each dense matrix in the batch	Ignored if NULL
`values`	Device	OUT	Pointer to the values of each dense matrix in the batch	Ignored if NULL
`indexType`	Host	OUT	Buffer for the index type of the matrix	`CUDA_R_32I`. Ignored if NULL
`valueType`	Host	OUT	Buffer for the data type of the matrix	`CUDA_R_32F`, `CUDA_R_64F`, `CUDA_C_32F`, `CUDA_C_64F`. Ignored if NULL
`layout`	Host	OUT	Buffer for the memory layout	Ignored if NULL

See cudssStatus_t for the description of the return status.
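
For a batch, `nrows`, `ncols` and `ld` each refer to an array with one entry per matrix, and the element type of those arrays follows `indexType`. The sketch below shows how such arrays could be interpreted, assuming a 32-bit integer index type (`CUDA_R_32I`), host-accessible arrays, and a column-major layout. `batch_dn_total_values` is a hypothetical helper and not part of cuDSS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a cuDSS function): number of stored values,
 * including leading-dimension padding, across a batch of column-major
 * dense matrices. The per-matrix arrays are assumed to hold 32-bit ints. */
static int64_t batch_dn_total_values(const void *ncols, const void *ld,
                                     int64_t batchCount)
{
    assert(batchCount >= 0);
    const int32_t *nc = (const int32_t *)ncols;
    const int32_t *l  = (const int32_t *)ld;
    int64_t total = 0;
    for (int64_t k = 0; k < batchCount; ++k) {
        /* Matrix k occupies ld[k] * ncols[k] entries. */
        total += (int64_t)l[k] * (int64_t)nc[k];
    }
    return total;
}
```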

`cudssMatrixGetCsr`#

cudssStatus_t
cudssMatrixGetCsr(cudssMatrix_t         matrix,
                  int64_t*              nrows,
                  int64_t*              ncols,
                  int64_t*              nnz,
                  void**                rowStart,
                  void**                rowEnd,
                  void**                colIndices,
                  void**                values,
                  cudaDataType_t*       indexType,
                  cudaDataType_t*       valueType,
                  cudssMatrixType_t*     mtype,
                  cudssMatrixViewType_t* mview,
                  cudssIndexBase_t*     indexBase)

The function retrieves sparse matrix properties and data from a cuDSS matrix object which holds a CSR sparse matrix. If on input any of the pointers is NULL, it is ignored and the corresponding value is not returned.

Parameter	Memory	In/Out	Description	Possible Values
`matrix`	Host	IN	Matrix object
`nrows`	Host	OUT	Buffer for the number of rows	Ignored if NULL
`ncols`	Host	OUT	Buffer for the number of columns	Ignored if NULL
`nnz`	Host	OUT	Buffer for the number of non-zeroes	Ignored if NULL
`rowStart`	Device or Host	OUT	Buffer for the row start offsets	Ignored if NULL
`rowEnd`	Device or Host	OUT	Buffer for the row end offsets	Must be NULL as 4-array CSR is not supported
`colIndices`	Device or Host	OUT	Buffer for the column indices of the matrix	Ignored if NULL
`values`	Device or Host	OUT	Buffer for the values of the CSR matrix	Ignored if NULL
`indexType`	Host	OUT	Buffer for the index type of the matrix	`CUDA_R_32I`. Ignored if NULL
`valueType`	Host	OUT	Buffer for the data type of the matrix	`CUDA_R_32F`, `CUDA_R_64F`, `CUDA_C_32F`, `CUDA_C_64F`. Ignored if NULL
`mtype`	Host	OUT	Buffer for the matrix type	See cudssMatrixType_t
`mview`	Host	OUT	Buffer for the matrix view	See cudssMatrixViewType_t
`indexBase`	Host	OUT	Buffer for the indexing base	See cudssIndexBase_t. Ignored if NULL

See cudssStatus_t for the description of the return status.
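
Because `rowEnd` is not used, the retrieved data follows the 3-array CSR convention: row i occupies the positions from `rowStart[i]` up to, but not including, `rowStart[i + 1]`, shifted by the indexing base. The sketch below illustrates this, assuming host-accessible arrays, 32-bit integer indices, double-precision values, and an index base of 0 or 1. `csr_at` is a hypothetical helper and not part of cuDSS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a cuDSS function): returns the stored value at
 * zero-based position (i, j), or 0.0 if that entry is not stored.
 * rowStart is assumed to hold nrows + 1 offsets expressed in the given base. */
static double csr_at(const int32_t *rowStart, const int32_t *colIndices,
                     const double *values, int32_t base, int64_t i, int64_t j)
{
    assert(base == 0 || base == 1);
    for (int32_t p = rowStart[i] - base; p < rowStart[i + 1] - base; ++p) {
        if ((int64_t)(colIndices[p] - base) == j) {
            return values[p];
        }
    }
    return 0.0;
}
```

The helper only reads what is stored. If `mview` indicates that just one triangle of the matrix is stored, entries from the other triangle are reported as 0.0 by this sketch.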

`cudssMatrixGetBatchCsr`#

cudssStatus_t
cudssMatrixGetBatchCsr(cudssMatrix_t         matrix,
                       int64_t*              batchCount,
                       void**                nrows,
                       void**                ncols,
                       void**                nnz,
                       void***               rowStart,
                       void***               rowEnd,
                       void***               colIndices,
                       void***               values,
                       cudaDataType_t*       indexType,
                       cudaDataType_t*       valueType,
                       cudssMatrixType_t*     mtype,
                       cudssMatrixViewType_t* mview,
                       cudssIndexBase_t*     indexBase)

The function retrieves sparse matrix properties and data from a cuDSS matrix object which holds a batch of CSR sparse matrices. If on input any of the pointers is NULL, it is ignored and the corresponding value is not returned.

Parameter	Memory	In/Out	Description	Possible Values
`matrix`	Host	IN	Matrix object
`batchCount`	Host	OUT	Buffer for the size of the batch	Ignored if NULL
`nrows`	Host	OUT	Pointer to the number of rows for each CSR matrix in the batch	Ignored if NULL
`ncols`	Host	OUT	Pointer to the number of columns for each CSR matrix in the batch	Ignored if NULL
`nnz`	Host	OUT	Pointer to the number of non-zeroes for each CSR matrix in the batch	Ignored if NULL
`rowStart`	Device	OUT	Pointer to the row start offsets for each CSR matrix in the batch	Ignored if NULL
`rowEnd`	Device	OUT	Pointer to the row end offsets for each CSR matrix in the batch	Must be NULL as 4-array CSR is not supported
`colIndices`	Device	OUT	Pointer to the column indices for each CSR matrix in the batch	Ignored if NULL
`values`	Device	OUT	Pointer to the values for each CSR matrix in the batch	Ignored if NULL
`indexType`	Host	OUT	Buffer for the index type of the matrix	`CUDA_R_32I`. Ignored if NULL
`valueType`	Host	OUT	Buffer for the data type of the matrix	`CUDA_R_32F`, `CUDA_R_64F`, `CUDA_C_32F`, `CUDA_C_64F`. Ignored if NULL
`mtype`	Host	OUT	Buffer for the matrix type	See cudssMatrixType_t
`mview`	Host	OUT	Buffer for the matrix view	See cudssMatrixViewType_t
`indexBase`	Host	OUT	Buffer for the indexing base	See cudssIndexBase_t. Ignored if NULL

See cudssStatus_t for the description of the return status.

`cudssMatrixGetDistributionRow1d`#

cudssStatus_t
cudssMatrixGetDistributionRow1d(cudssMatrix_t matrix,
                                int64_t*      first_row,
                                int64_t*      last_row)

The function retrieves the 1D distribution boundaries (first and last row indices) from a cuDSS matrix object. In non-MGMN mode, the function returns first_row equal to 0 and last_row equal to nrows - 1.

Parameter	Memory	In/Out	Description
`matrix`	Host	IN	cuDSS matrix object
`first_row`	Host	OUT	First row index of the local matrix on the calling process
`last_row`	Host	OUT	Last row index of the local matrix on the calling process

See cudssStatus_t for the description of the return status.
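
Since the non-MGMN result is 0 and nrows - 1, both boundaries are inclusive. A minimal sketch of how the pair could be used follows. `local_row_count` and `owns_row` are hypothetical helpers, not part of cuDSS, and the global row index is assumed to use the same numbering as the returned boundaries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a cuDSS function): number of rows held locally,
 * given inclusive boundaries first_row and last_row. */
static int64_t local_row_count(int64_t first_row, int64_t last_row)
{
    assert(last_row >= first_row);
    return last_row - first_row + 1;
}

/* Returns 1 if the global row index falls inside the local range, else 0. */
static int owns_row(int64_t row, int64_t first_row, int64_t last_row)
{
    return row >= first_row && row <= last_row;
}
```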

`cudssMatrixGetFormat`#

cudssStatus_t
cudssMatrixGetFormat(cudssMatrix_t matrix, int *format)

The function writes the matrix format of the cuDSS matrix object into the provided buffer. The format is returned as an int which can be a combination of bit flags defined in cudssMatrixFormat_t.

Parameter	Memory	In/Out	Description
`matrix`	Host	IN	cuDSS matrix object
`format`	Host	OUT	Buffer for the returned matrix format

See cudssStatus_t for the description of the return status.
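
Because the result can combine several bit flags, it is usually tested with a bitwise AND rather than compared for equality. The sketch below shows the pattern with stand-in constants. The `FMT_*` names and their values are assumptions made for illustration only; real code would use the enumerators of cudssMatrixFormat_t from the cuDSS header.

```c
#include <assert.h>

/* Hypothetical stand-ins for the cudssMatrixFormat_t bit flags.
 * Names and values are illustrative assumptions, not the cuDSS definitions. */
enum { FMT_DENSE = 1 << 0, FMT_CSR = 1 << 1, FMT_BATCH = 1 << 2 };

/* Returns 1 if every bit of flag is set in format, else 0. */
static int format_has(int format, int flag)
{
    assert(flag != 0);
    return (format & flag) == flag;
}
```

With this helper, a batch of CSR matrices would report both the CSR flag and the batch flag, so a check such as `format == FMT_CSR` would miss it while `format_has(format, FMT_CSR)` would not.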
