cusolverMp C API¶
Library Management¶
cusolverMpCreate
¶
cusolverStatus_t cusolverMpCreate(
cusolverMpHandle_t *handle,
int deviceId,
cudaStream_t stream)
The function initializes the cusolverMp library handle (cusolverMpHandle_t) which holds the cusolverMp library context. It allocates light hardware resources on the host, and must be called prior to making any other cusolverMp library calls.
Calling any cusolverMp function which uses cusolverMpHandle_t without a previous call of cusolverMpCreate() will return an error.
The cusolverMp library context is tied to the CUDA device provided by deviceId and to the CUDA stream provided by stream.
Only one handle per process and per GPU is supported. Sharing a device with multiple processes will result in undefined behavior.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | Out | cusolverMp library handle initialized by this function. |
deviceId | Host | In | Device that will be assigned to the handle. |
stream | Host | In | Stream that will be assigned to the handle. |
See cusolverStatus_t for the description of the return status.
cusolverMpDestroy
¶
cusolverStatus_t cusolverMpDestroy(
cusolverMpHandle_t handle)
The function destroys the cusolverMp library handle (cusolverMpHandle_t) which holds the cusolverMp library context.
The cusolverMp library context is tied to the CUDA device that was passed as deviceId to cusolverMpCreate(). Only one handle per process and per GPU is supported.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle |
See cusolverStatus_t for the description of the return status.
cusolverMpGetStream
¶
cusolverStatus_t cusolverMpGetStream(
cusolverMpHandle_t handle,
cudaStream_t stream)
The function returns the stream associated with the handle.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle |
stream | Host | Out | Stream associated with the handle. |
See cusolverStatus_t for the description of the return status.
cusolverMpGetVersion
¶
cusolverStatus_t cusolverMpGetVersion(
cusolverMpHandle_t handle,
int *version)
This function returns the version number of the cusolverMp library.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle |
version | Host | Out | cusolverMp library version. |
See cusolverStatus_t for the description of the return status.
Grid Management¶
cusolverMpCreateDeviceGrid
¶
cusolverStatus_t cusolverMpCreateDeviceGrid(
cusolverMpHandle_t handle,
cudaLibMpGrid_t *grid,
cal_comm_t comm,
int32_t numRowDevices,
int32_t numColDevices,
cudaLibMpGridMapping_t mapping)
The function initializes the grid opaque data structure. It maps the given resources (communicator, grid dimensions and grid layout) to a grid object.
All the processes defined to be in this grid must enter this function.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle |
grid | Host | In/Out | Grid object to be initialized. |
comm | Host | In | Communicator that will be associated with the grid. |
numRowDevices | Host | In | Number of process rows in the grid. |
numColDevices | Host | In | Number of process columns in the grid. |
mapping | Host | In | How to map processes to the grid. See description of cudaLibMpGrid_t for further details. Currently, only CUDALIBMP_GRID_MAPPING_COL_MAJOR is supported. |
See cusolverStatus_t for the description of the return status.
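As a rough illustration of what a column-major mapping means, the sketch below converts a linear process rank into grid coordinates. It assumes ranks start at 0 and fill the grid one column at a time, which is the usual reading of column-major ordering; the `grid_coord_t` type and the `rank_to_grid_col_major` helper are hypothetical and not part of cusolverMp.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not part of cusolverMp. */
typedef struct {
    int32_t row; /* process row in the grid    */
    int32_t col; /* process column in the grid */
} grid_coord_t;

/* Assumed column-major mapping: consecutive ranks walk down a process
 * column first, then move on to the next column. Ranks are 0-based. */
static grid_coord_t rank_to_grid_col_major(int32_t rank,
                                           int32_t numRowDevices,
                                           int32_t numColDevices)
{
    assert(numRowDevices > 0 && numColDevices > 0);
    assert(rank >= 0 && rank < numRowDevices * numColDevices);

    grid_coord_t c;
    c.row = rank % numRowDevices;
    c.col = rank / numRowDevices;
    return c;
}
```

With a 2-by-3 grid, ranks 0 and 1 would form the first process column, ranks 2 and 3 the second, and ranks 4 and 5 the third.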
cusolverMpDestroyGrid
¶
cusolverStatus_t cusolverMpDestroyGrid(
cudaLibMpGrid_t grid)
The function destroys the given grid object.
All the processes defined to be in this grid must enter this function.
Parameter | Memory | In/Out | Description |
---|---|---|---|
grid | Host | In/Out | Grid object to be destroyed. |
See cusolverStatus_t for the description of the return status.
Matrix Management¶
cusolverMpCreateMatrixDesc
¶
cusolverStatus_t cusolverMpCreateMatrixDesc(
cudaLibMpMatrixDesc_t *descr,
cudaLibMpGrid_t grid,
cudaDataType dataType,
int64_t M_A,
int64_t N_A,
int64_t MB_A,
int64_t NB_A,
uint32_t RSRC_A,
uint32_t CSRC_A,
int64_t LLD_A)
The function initializes a cudaLibMpMatrixDesc_t object.
Parameter | Memory | In/Out | Description |
---|---|---|---|
descr | Host | In/Out | Matrix descriptor object initialized by this function. |
grid | Host | In | Grid object over which the matrix A is distributed. |
dataType | Host | In | Data type of the matrix A. |
M_A | Host | In | Number of rows in the global matrix A. |
N_A | Host | In | Number of columns in the global matrix A. |
MB_A | Host | In | Blocking factor used to distribute the rows of the global matrix A. |
NB_A | Host | In | Blocking factor used to distribute the columns of the global matrix A. |
RSRC_A | Host | In | Process row over which the first row of the matrix A is distributed. Only the value of 0 is currently supported. |
CSRC_A | Host | In | Process column over which the first column of the matrix A is distributed. Only the value of 0 is currently supported. |
LLD_A | Host | In | Leading dimension of the local matrix. |
Supported values for the dataType argument:
Data Type | Description |
---|---|
CUDA_R_64I | 64-bit integer values. |
CUDA_R_32F | Single precision real values. |
CUDA_R_64F | Double precision real values. |
CUDA_C_32F | Single precision complex values. |
CUDA_C_64F | Double precision complex values. |
See cusolverStatus_t for the description of the return status.
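The blocking factors and source process in the descriptor define a block-cyclic distribution along each dimension. The sketch below shows, for one dimension, which process coordinate owns a global index and where that index lands locally. It follows the standard ScaLAPACK block-cyclic rules with 0-based indices, which is an assumption here; the helper names `block_cyclic_owner` and `block_cyclic_local_index` are hypothetical and not part of cusolverMp.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: process coordinate that owns global index g
 * (0-based) when blocks of size nb are dealt out cyclically over nprocs
 * processes, starting at process src. */
static uint32_t block_cyclic_owner(int64_t g, int64_t nb,
                                   uint32_t src, uint32_t nprocs)
{
    assert(g >= 0 && nb > 0 && nprocs > 0 && src < nprocs);
    return (uint32_t)((g / nb + (int64_t)src) % (int64_t)nprocs);
}

/* Hypothetical helper: 0-based local index of global index g on the
 * process that owns it. Full cycles of nb * nprocs global entries each
 * contribute nb local entries; the remainder is the offset in the block. */
static int64_t block_cyclic_local_index(int64_t g, int64_t nb,
                                        uint32_t nprocs)
{
    assert(g >= 0 && nb > 0 && nprocs > 0);
    return (g / (nb * (int64_t)nprocs)) * nb + g % nb;
}
```

Applying the two helpers to the row index with (MB_A, RSRC_A, number of process rows) and to the column index with (NB_A, CSRC_A, number of process columns) gives the owning process and the local position of a matrix entry.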
cusolverMpDestroyMatrixDesc
¶
cusolverStatus_t cusolverMpDestroyMatrixDesc(
cudaLibMpMatrixDesc_t descr )
The function destroys cudaLibMpMatrixDesc_t object.
Parameter | Memory | In/Out | Description |
---|---|---|---|
descr | Host | In/Out | Matrix descriptor object destroyed by this function. |
See cusolverStatus_t for the description of the return status.
Utility¶
cusolverMpNUMROC
¶
int64_t cusolverMpNUMROC(
int64_t n,
int64_t nb,
uint32_t iproc,
uint32_t isrcproc,
uint32_t nprocs)
Computes the number of rows or columns of a distributed matrix owned by the process indicated by the iproc argument.
Parameter | Memory | In/Out | Description |
---|---|---|---|
n | Host | In | Number of rows or columns in the global distributed matrix. |
nb | Host | In | Row or column blocking size of the global matrix. |
iproc | Host | In | The coordinate of the process whose local array row or column is to be determined. |
isrcproc | Host | In | The coordinate of the process that owns the first row or column of the distributed matrix. |
nprocs | Host | In | The total number of row or column processes over which the matrix is distributed. |
The return value is the number of local rows or columns owned by process iproc; unlike most functions in this API, it is not a cusolverStatus_t.
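The argument list matches ScaLAPACK's NUMROC, so the computation can be sketched as below. This is a reference version written under the assumption that cusolverMpNUMROC follows the ScaLAPACK semantics; the name `numroc_ref` is hypothetical and the code is not taken from the library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference implementation, assuming ScaLAPACK NUMROC
 * semantics: number of rows or columns of an n-long dimension, split
 * into blocks of size nb, that end up on process iproc. */
static int64_t numroc_ref(int64_t n, int64_t nb, uint32_t iproc,
                          uint32_t isrcproc, uint32_t nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    assert(iproc < nprocs && isrcproc < nprocs);

    const int64_t np = (int64_t)nprocs;
    /* Distance of this process from the one holding the first block. */
    const int64_t mydist = (np + (int64_t)iproc - (int64_t)isrcproc) % np;
    /* Number of complete blocks in the whole dimension. */
    const int64_t nblocks = n / nb;
    /* Every process gets at least this many entries from full cycles. */
    int64_t count = (nblocks / np) * nb;
    /* Complete blocks left over after the full cycles. */
    const int64_t extra = nblocks % np;

    if (mydist < extra) {
        count += nb;     /* one more complete block */
    } else if (mydist == extra) {
        count += n % nb; /* the trailing partial block, possibly empty */
    }
    return count;
}
```

For example, with n = 10, nb = 3 and two processes, process 0 holds blocks 0 and 2 (6 entries) and process 1 holds block 1 plus the trailing single entry (4 entries).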
cusolverMpMatrixGatherD2H
¶
cusolverStatus_t cusolverMpMatrixGatherD2H(
cusolverMpHandle_t handle,
int64_t M,
int64_t N,
void *d_A,
int64_t IA,
int64_t JA,
cudaLibMpMatrixDesc_t descrA,
int root,
void *h_dst,
int64_t h_lddst)
Gathers the global distributed matrix A into a buffer provided on process root. The input matrix A is originally distributed using 2D block cyclic format; on output, h_dst contains the matrix in column-major format.
Notice that, for this function, the input data is on the device and the output is stored in host memory.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle. |
M | Host | In | Number of rows of the global distributed matrix A. |
N | Host | In | Number of columns of the global distributed matrix A. |
d_A | Device | In | Pointer into the local memory to an array of dimension (LLD_A, LOCc(JA+N-1)) . On entry, this array contains the local pieces of the M-by-N distributed matrix sub(A). |
IA | Host | In | Row index in the global matrix A indicating the first row of sub(A). This function does not make any assumptions on the alignment of IA . |
JA | Host | In | Column index in the global matrix A indicating the first column of sub(A). This function does not make any assumptions on the alignment of JA . |
descrA | Host | In | Matrix descriptor of the global matrix A. |
root | Host | In | Process ID on which the matrix A will be gathered. |
h_dst | Host | In/Out | Destination host buffer on root process. On output it contains the global matrix A stored in column-major format. Total size must be at least M*N words. |
h_lddst | Host | In | Leading dimension of the h_dst on root process. Must be larger than M . |
See cusolverStatus_t for the description of the return status.
Warning
This function is meant as a utility function to verify the correctness of data layouts; it is not intended to achieve high performance on large inputs.
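Because h_dst is column-major with leading dimension h_lddst, each entry of the gathered matrix sits at a fixed linear offset. A minimal sketch of that indexing, assuming 0-based row and column indices; the helper name `col_major_offset` is hypothetical and not part of cusolverMp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: linear offset, in elements, of entry (i, j) in a
 * column-major buffer with leading dimension ld. Indices are 0-based. */
static size_t col_major_offset(int64_t i, int64_t j, int64_t ld)
{
    assert(i >= 0 && j >= 0);
    assert(ld > i); /* a row index must fit inside one column of the buffer */
    return (size_t)j * (size_t)ld + (size_t)i;
}
```

For a double precision matrix, the entry in row i and column j on the root process would then be read as `((const double *)h_dst)[col_major_offset(i, j, h_lddst)]`.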
cusolverMpMatrixScatterH2D
¶
cusolverStatus_t cusolverMpMatrixScatterH2D(
cusolverMpHandle_t handle,
int64_t M,
int64_t N,
void *d_A,
int64_t IA,
int64_t JA,
cudaLibMpMatrixDesc_t descrA,
int root,
void *h_src,
int64_t h_ldsrc)
Scatters the matrix stored in the local buffer h_src from the root process to a distributed global matrix A.
The input matrix h_src is stored in column-major format. On output, d_A contains the local portions of the global matrix A distributed in 2D block cyclic format.
Notice that, for this function, the input data is on the host and the output is stored in device memory.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle. |
M | Host | In | Number of rows of the global distributed matrix A. |
N | Host | In | Number of columns of the global distributed matrix A. |
d_A | Device | Out | Pointer into the local memory to an array of dimension (LLD_A, LOCc(JA+N-1)) . On output, this array contains the local pieces of the M-by-N distributed matrix sub(A). |
IA | Host | In | Row index in the global matrix A indicating the first row of sub(A). This function does not make any assumptions on the alignment of IA . |
JA | Host | In | Column index in the global matrix A indicating the first column of sub(A). This function does not make any assumptions on the alignment of JA . |
descrA | Host | In | Matrix descriptor of the global matrix A. |
root | Host | In | Process ID from which the matrix A will be scattered. |
h_src | Host | In | Source buffer on root process. On input it contains the global M by N matrix A stored in column-major format. |
h_ldsrc | Host | In | Leading dimension of h_src on root process. Must be larger than M . |
See cusolverStatus_t for the description of the return status.
Warning
This function is meant as a utility function to verify the correctness of data layouts; it is not intended to achieve high performance on large inputs.
Logging¶
cusolverMpLoggerSetCallback
¶
cusolverStatus_t cusolverMpLoggerSetCallback(
cusolverMpLoggerCallback_t callback)
This function sets the logging callback function.
Parameter | Memory | In/Out | Description |
---|---|---|---|
callback | Host | In | Pointer to a callback function. See cusolverMpLoggerCallback_t . |
See cusolverStatus_t for the description of the return status.
Warning
This is an experimental feature.
cusolverMpLoggerSetFile
¶
cusolverStatus_t cusolverMpLoggerSetFile(
FILE *file)
This function sets the logging output file. Note: once registered using this function call, the provided file handle must not be closed unless the function is called again to switch to a different file handle.
Parameter | Memory | In/Out | Description |
---|---|---|---|
file | Host | In | Pointer to an open file. File should have write permission |
See cusolverStatus_t for the description of the return status.
Warning
This is an experimental feature.
cusolverMpLoggerOpenFile
¶
cusolverStatus_t cusolverMpLoggerOpenFile(
const char* logFile)
This function opens a logging output file in the given path.
Parameter | Memory | In/Out | Description |
---|---|---|---|
logFile | Host | In | Path of the logging output file. |
See cusolverStatus_t for the description of the return status.
Warning
This is an experimental feature.
cusolverMpLoggerSetLevel
¶
cusolverStatus_t cusolverMpLoggerSetLevel(
int level)
This function sets the value of the logging level.
Parameter | Memory | In/Out | Description |
---|---|---|---|
level | Host | In | Value of the logging level. See cusolverMp Logging . |
See cusolverStatus_t for the description of the return status.
Warning
This is an experimental feature.
cusolverMpLoggerSetMask
¶
cusolverStatus_t cusolverMpLoggerSetMask(
int mask)
This function sets the value of the logging mask.
Parameter | Memory | In/Out | Description |
---|---|---|---|
mask | Host | In | Value of the logging mask. See cusolverMp Logging . |
See cusolverStatus_t for the description of the return status.
Warning
This is an experimental feature.
cusolverMpLoggerForceDisable
¶
cusolverStatus_t cusolverMpLoggerForceDisable(
int level)
This function disables logging for the entire run.
See cusolverStatus_t for the description of the return status.
Warning
This is an experimental feature.
Factorization¶
cusolverMpGetrf
¶
cusolverStatus_t cusolverMpGetrf(
cusolverMpHandle_t handle,
int64_t M,
int64_t N,
void *d_A,
int64_t IA,
int64_t JA,
cudaLibMpMatrixDesc_t descrA,
int64_t *d_ipiv,
cudaDataType_t computeType,
void *d_work,
size_t workspaceInBytesOnDevice,
void *h_work,
size_t workspaceInBytesOnHost,
int *info)
This routine computes an LU factorization of a general M-by-N distributed matrix sub(A) using partial pivoting.
The user can also disable pivoting by setting d_ipiv=NULL.
The factorization has the form:
$\mathrm{sub}(A) = P \cdot L \cdot U$
where $P$ is a permutation matrix, $L$ is lower triangular with unit diagonal elements (lower trapezoidal if $M > N$), and $U$ is upper triangular (upper trapezoidal if $M < N$). $L$ and $U$ are stored in sub(A).
The user can combine cusolverMpGetrf() and cusolverMpGetrs() to solve a system of linear equations.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle. |
M | Host | In | Number of rows of sub(A). |
N | Host | In | Number of columns of sub(A). |
d_A | Device | In/Out | Pointer to the first entry of the local portion of the global matrix A. On output, the sub(A) is overwritten with the L and U factors. |
IA | Host | In | Row index of the first row of the sub(A). This function does not make any assumptions on the alignment of IA . |
JA | Host | In | Column index of the first column of the sub(A). This function does not make any assumptions on the alignment of JA . |
descrA | Host | In | Matrix descriptor associated to the global matrix A. |
d_ipiv | Device | In/Out | Local array of dimension (LOCr(M_A)+MB_A) . If the user sets d_ipiv != NULL , on output, this array contains the pivoting information: d_ipiv[i] indicates the global row that local row i was swapped with. This array is tied to the distributed matrix A. |
computeType | Host | In | Data type used for computations. See table below for supported combinations. |
d_work | Device | In/Out | Device workspace of size workspaceInBytesOnDevice . |
workspaceInBytesOnDevice | Host | In | The size in bytes of the local device workspace needed by the routine as provided by cusolverMpGetrf_bufferSize(). |
h_work | Host | In/Out | Host workspace of size workspaceInBytesOnHost . |
workspaceInBytesOnHost | Host | In | The size in bytes of the local host workspace needed by the routine as provided by cusolverMpGetrf_bufferSize() |
info | Device | Out | info < 0 indicates an incorrect value of the i-th argument of the function. info > 0 indicates the index of the leading minor in the case of a singular matrix. |
This routine supports the following combinations of data types:
Data Type of A | computeType | Output Data Type |
---|---|---|
CUDA_R_32F | CUDA_R_32F | CUDA_R_32F |
CUDA_R_64F | CUDA_R_64F | CUDA_R_64F |
CUDA_C_32F | CUDA_C_32F | CUDA_C_32F |
CUDA_C_64F | CUDA_C_64F | CUDA_C_64F |
See cusolverStatus_t for the description of the return status.
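To make the shape of the result concrete, here is a small single-process sketch of LU factorization with partial pivoting on a dense column-major matrix. It only illustrates how the unit lower triangular factor and the upper triangular factor share the storage of the input; it says nothing about how cusolverMp distributes the work. The function name `lu_partial_pivot` and its 0-based pivot convention are assumptions of this sketch and need not match the layout of d_ipiv.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-process reference: factors the n-by-n column-major
 * matrix a (leading dimension ld) in place. On return, the strict lower
 * triangle holds L (its unit diagonal is implicit) and the upper triangle
 * holds U. ipiv[k] is the 0-based row that was swapped with row k.
 * Returns 0 on success, or k + 1 if the k-th pivot is exactly zero. */
static int lu_partial_pivot(int64_t n, double *a, int64_t ld, int64_t *ipiv)
{
    assert(n >= 0 && ld >= n && a != NULL && ipiv != NULL);

    for (int64_t k = 0; k < n; ++k) {
        /* Find the entry of largest magnitude in column k, rows k..n-1. */
        int64_t p = k;
        double best = a[k + k * ld] < 0.0 ? -a[k + k * ld] : a[k + k * ld];
        for (int64_t i = k + 1; i < n; ++i) {
            const double v = a[i + k * ld] < 0.0 ? -a[i + k * ld] : a[i + k * ld];
            if (v > best) {
                best = v;
                p = i;
            }
        }
        ipiv[k] = p;
        if (best == 0.0) {
            return (int)(k + 1); /* singular to working precision */
        }

        /* Swap rows k and p across all columns. */
        if (p != k) {
            for (int64_t j = 0; j < n; ++j) {
                const double t = a[k + j * ld];
                a[k + j * ld] = a[p + j * ld];
                a[p + j * ld] = t;
            }
        }

        /* Store the multipliers and update the trailing submatrix. */
        for (int64_t i = k + 1; i < n; ++i) {
            a[i + k * ld] /= a[k + k * ld];
            for (int64_t j = k + 1; j < n; ++j) {
                a[i + j * ld] -= a[i + k * ld] * a[k + j * ld];
            }
        }
    }
    return 0;
}
```

For the 2-by-2 matrix with rows (2, 1) and (4, 3), the routine swaps the two rows, stores the multiplier 0.5 below the diagonal, and leaves U with rows (4, 3) and (0, -0.5).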
cusolverMpGetrf_bufferSize
¶
cusolverStatus_t cusolverMpGetrf_bufferSize(
cusolverMpHandle_t handle,
int64_t M,
int64_t N,
void *d_A,
int64_t IA,
int64_t JA,
cudaLibMpMatrixDesc_t descrA,
int64_t *d_ipiv,
cudaDataType_t computeType,
size_t *workspaceInBytesOnDevice,
size_t *workspaceInBytesOnHost)
Computes the size in bytes of the host and device working buffers required by cusolverMpGetrf().
The user can set d_ipiv=NULL so that cusolverMpGetrf() computes the LU factorization of the input matrix A without pivoting.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle. |
M | Host | In | Number of rows of sub(A). |
N | Host | In | Number of columns of sub(A). |
d_A | Device | In/Out | Pointer to the first entry of the local portion of the global matrix A. On output, the sub(A) is overwritten with the L and U factors. |
IA | Host | In | Row index of the first row of the sub(A). This function does not make any assumptions on the alignment of IA . |
JA | Host | In | Column index of the first column of the sub(A). This function does not make any assumptions on the alignment of JA . |
descrA | Host | In | Matrix descriptor associated to the global matrix A |
d_ipiv | Device | In/Out | On output, it contains the distributed integer array containing the pivot indices if d_ipiv != NULL on input. |
computeType | Host | In | Data type used for computations. See table below for supported combinations. |
workspaceInBytesOnDevice | Host | In/Out | On output, contains the size in bytes of the local device workspace needed by cusolverMpGetrf(). |
workspaceInBytesOnHost | Host | In/Out | On output, contains the size in bytes of the local host workspace needed by cusolverMpGetrf(). |
This routine supports the following combinations of data types:
Data Type of A | computeType | Output Data Type |
---|---|---|
CUDA_R_32F | CUDA_R_32F | CUDA_R_32F |
CUDA_R_64F | CUDA_R_64F | CUDA_R_64F |
CUDA_C_32F | CUDA_C_32F | CUDA_C_32F |
CUDA_C_64F | CUDA_C_64F | CUDA_C_64F |
See cusolverStatus_t for the description of the return status.
cusolverMpGetrs
¶
cusolverStatus_t cusolverMpGetrs(
cusolverMpHandle_t handle,
cublasOperation_t trans,
int64_t N,
int64_t NRHS,
const void *d_A,
int64_t IA,
int64_t JA,
cudaLibMpMatrixDesc_t descrA,
const int64_t *d_ipiv,
void *d_B,
int64_t IB,
int64_t JB,
cudaLibMpMatrixDesc_t descrB,
cudaDataType_t computeType,
void *d_work,
size_t workspaceInBytesOnDevice,
void *h_work,
size_t workspaceInBytesOnHost,
int *d_info)
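This routine solves a system of distributed linear equations
$\mathrm{op}(\mathrm{sub}(A)) \cdot X = \mathrm{sub}(B)$
with a general N-by-N distributed matrix sub(A) using the LU factorization computed by cusolverMpGetrf().
Here $\mathrm{op}$ is defined by the argument trans, which selects one of the following forms of the linear system:
trans | Form of the linear system |
---|---|
CUBLAS_OP_N | $\mathrm{sub}(A) \cdot X = \mathrm{sub}(B)$ |
CUBLAS_OP_T | $\mathrm{sub}(A)^{T} \cdot X = \mathrm{sub}(B)$ |
CUBLAS_OP_C | $\mathrm{sub}(A)^{H} \cdot X = \mathrm{sub}(B)$ |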
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle. |
trans | Host | In | Specifies the form of the linear system. Only CUBLAS_OP_N is currently supported. |
N | Host | In | Number of rows of sub(A). |
NRHS | Host | In | Number of columns of sub(B). Currently, this routine only supports NRHS=1 . |
d_A | Device | In | Pointer into the local memory to an array of dimension (LLD_A, LOCc(JA+N-1)) . On entry, this array contains the local pieces of the N-by-N distributed L and U factors of sub(A) as computed by cusolverMpGetrf(). |
IA | Host | In | Row index of the first row of the sub(A). This function does not make any assumptions on the alignment of IA . |
JA | Host | In | Column index of the first column of the sub(A). This function does not make any assumptions on the alignment of JA . |
descrA | Host | In | Matrix descriptor associated to the global matrix A |
d_ipiv | Device | In | Local array of dimension (LOCr(M_A)+MB_A) containing the pivoting information as computed by cusolverMpGetrf(). |
d_B | Device | In/Out | Pointer into the local memory to an array of dimension (LLD_B,LOCc(JB+NRHS-1)) . On entry, the right hand sides sub(B). On exit, sub(B) is overwritten by the solution distributed matrix X. |
IB | Host | In | Row index of the first row of the sub(B). This function does not make any assumptions on the alignment of IB . |
JB | Host | In | Column index of the first column of the sub(B). This function does not make any assumptions on the alignment of JB . |
descrB | Host | In | Matrix descriptor associated to the global matrix B. |
computeType | Host | In | Data type used for computations. See table below for supported combinations. |
d_work | Device | In/Out | Device workspace of size workspaceInBytesOnDevice . |
workspaceInBytesOnDevice | Host | In | The size in bytes of the local device workspace needed by the routine as provided by cusolverMpGetrs_bufferSize(). |
h_work | Host | In/Out | Host workspace of size workspaceInBytesOnHost . |
workspaceInBytesOnHost | Host | In | The size in bytes of the local host workspace needed by the routine as provided by cusolverMpGetrs_bufferSize() |
d_info | Device | Out | d_info < 0 indicates an incorrect value of the i-th argument of the function. d_info > 0 indicates the index of the leading minor in the case of a singular matrix. |
This routine supports the following combinations of data types:
Data Type of A | computeType | Output Data Type |
---|---|---|
CUDA_R_32F | CUDA_R_32F | CUDA_R_32F |
CUDA_R_64F | CUDA_R_64F | CUDA_R_64F |
CUDA_C_32F | CUDA_C_32F | CUDA_C_32F |
CUDA_C_64F | CUDA_C_64F | CUDA_C_64F |
See cusolverStatus_t for the description of the return status.
cusolverMpGetrs_bufferSize
¶
cusolverStatus_t cusolverMpGetrs_bufferSize(
cusolverMpHandle_t handle,
cublasOperation_t trans,
int64_t N,
int64_t NRHS,
const void *d_A,
int64_t IA,
int64_t JA,
cudaLibMpMatrixDesc_t descrA,
const int64_t *d_ipiv,
void *d_B,
int64_t IB,
int64_t JB,
cudaLibMpMatrixDesc_t descrB,
cudaDataType_t computeType,
size_t *workspaceInBytesOnDevice,
size_t *workspaceInBytesOnHost)
Computes the size in bytes of the host and device working buffers required by cusolverMpGetrs().
If pivoting was disabled during cusolverMpGetrf(), the user must set d_ipiv=NULL.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle. |
trans | Host | In | Specifies the form of the linear system. Only CUBLAS_OP_N is currently supported. |
N | Host | In | Number of rows of sub(A). |
NRHS | Host | In | Number of columns of sub(B). Currently, this routine only supports NRHS=1 . |
d_A | Device | In | Pointer into the local memory to an array of dimension (LLD_A, LOCc(JA+N-1)) . On entry, this array contains the local pieces of the N-by-N distributed L and U factors of sub(A) as computed by cusolverMpGetrf(). |
IA | Host | In | Row index of the first row of the sub(A). This function does not make any assumptions on the alignment of IA . |
JA | Host | In | Column index of the first column of the sub(A). This function does not make any assumptions on the alignment of JA . |
descrA | Host | In | Matrix descriptor associated to the global matrix A |
d_ipiv | Device | In | Local array of dimension (LOCr(M_A)+MB_A) containing the pivoting information as computed by cusolverMpGetrf(). |
d_B | Device | In/Out | Pointer to the first entry of the local portion of the global matrix B. On output, B is overwritten with the solution of the linear system. |
IB | Host | In | Row index of the first row of the sub(B). This function does not make any assumptions on the alignment of IB . |
JB | Host | In | Column index of the first column of the sub(B). This function does not make any assumptions on the alignment of JB . |
descrB | Host | In | Matrix descriptor associated to the global matrix B. |
computeType | Host | In | Data type used for computations. See table below for supported combinations. |
workspaceInBytesOnDevice | Host | In/Out | On output, contains the size in bytes of the local device workspace needed by cusolverMpGetrs(). |
workspaceInBytesOnHost | Host | In/Out | On output, contains the size in bytes of the local host workspace needed by cusolverMpGetrs(). |
This routine supports the following combinations of data types:
Data Type of A | computeType | Output Data Type |
---|---|---|
CUDA_R_32F | CUDA_R_32F | CUDA_R_32F |
CUDA_R_64F | CUDA_R_64F | CUDA_R_64F |
CUDA_C_32F | CUDA_C_32F | CUDA_C_32F |
CUDA_C_64F | CUDA_C_64F | CUDA_C_64F |
See cusolverStatus_t for the description of the return status.
cusolverMpPotrf
¶
cusolverStatus_t cusolverMpPotrf(
cusolverMpHandle_t handle,
cublasFillMode_t uplo,
int64_t N,
void *d_A,
int64_t IA,
int64_t JA,
cudaLibMpMatrixDesc_t descA,
cudaDataType_t computeType,
void *d_work,
size_t workspaceInBytesOnDevice,
void *h_work,
size_t workspaceInBytesOnHost,
int *info)
Computes the Cholesky factorization of an N-by-N real symmetric or complex Hermitian positive definite distributed matrix sub(A) denoting A(IA:IA+N-1, JA:JA+N-1).
If A is upper triangular and uplo=CUBLAS_FILL_MODE_UPPER, the factorization has the form
$\mathrm{sub}(A) = U^{H} \cdot U$
where U is upper triangular.
If the matrix is lower triangular and uplo is set to CUBLAS_FILL_MODE_LOWER, the factorization has the form
$\mathrm{sub}(A) = L \cdot L^{H}$
where L is lower triangular.
The user can combine cusolverMpPotrf() and cusolverMpPotrs() to solve a system of linear equations.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle. |
uplo | Host | In | Specifies if A is upper (CUBLAS_FILL_MODE_UPPER ) or lower triangular matrix (CUBLAS_FILL_MODE_LOWER ). |
N | Host | In | Number of rows and columns of sub(A). |
d_A | Device | In/Out | Pointer into the local memory to an array of dimension (LLD_A, LOCc(JA+N-1)) . On entry, this array contains the local pieces of the N-by-N distributed matrix sub(A). On output, this array contains the L or U factors of A, depending on the value of uplo . |
IA | Host | In | Row index of the first row of the sub(A). IA must be a multiple of the row blocking dimension MB_A . |
JA | Host | In | Column index of the first column of the sub(A). JA must be a multiple of the column blocking dimension NB_A . |
descA | Host | In | Matrix descriptor associated to the global matrix A. |
computeType | Host | In | Data type used for computations. See table below for supported combinations. |
d_work | Device | In/Out | Device workspace of size workspaceInBytesOnDevice . |
workspaceInBytesOnDevice | Host | In | The size in bytes of the local device workspace needed by the routine as provided by cusolverMpPotrf_bufferSize(). |
h_work | Host | In/Out | Host workspace of size workspaceInBytesOnHost . |
workspaceInBytesOnHost | Host | In | The size in bytes of the local host workspace needed by the routine as provided by cusolverMpPotrf_bufferSize() |
info | Device | Out | info < 0 indicates an incorrect value of the i-th argument of the function. info > 0 indicates the index of the leading minor in the case of a singular matrix. |
This function requires square block size (MB_A == NB_A).
This routine supports the following combinations of data types:
Data Type of A | computeType | Output Data Type |
---|---|---|
CUDA_R_32F | CUDA_R_32F | CUDA_R_32F |
CUDA_R_64F | CUDA_R_64F | CUDA_R_64F |
CUDA_C_32F | CUDA_C_32F | CUDA_C_32F |
CUDA_C_64F | CUDA_C_64F | CUDA_C_64F |
See cusolverStatus_t for the description of the return status.
cusolverMpPotrf_bufferSize
¶
cusolverStatus_t cusolverMpPotrf_bufferSize(
cusolverMpHandle_t handle,
cublasFillMode_t uplo,
int64_t N,
const void *d_A,
int64_t IA,
int64_t JA,
cudaLibMpMatrixDesc_t descA,
cudaDataType_t computeType,
size_t* workspaceInBytesOnDevice,
size_t* workspaceInBytesOnHost)
Computes the size in bytes of the host and device working buffers required by cusolverMpPotrf().
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle. |
uplo | Host | In | Specifies if A is upper (CUBLAS_FILL_MODE_UPPER ) or lower triangular matrix (CUBLAS_FILL_MODE_LOWER ). |
N | Host | In | Number of rows and columns of sub(A). |
d_A | Device | In | Pointer into the local memory to an array of dimension (LLD_A, LOCc(JA+N-1)) . On entry, this array contains the local pieces of the N-by-N distributed matrix sub(A). On output, this array contains the L or U factors of A, depending on the value of uplo . |
IA | Host | In | Row index of the first row of the sub(A). This function does not make any assumptions on the alignment of IA . |
JA | Host | In | Column index of the first column of the sub(A). This function does not make any assumptions on the alignment of JA . |
descA | Host | In | Matrix descriptor associated to the global matrix A. |
computeType | Host | In | Data type used for computations. See table below for supported combinations. |
workspaceInBytesOnDevice | Host | In/Out | On output, contains the size in bytes of the local device workspace needed by cusolverMpPotrf(). |
workspaceInBytesOnHost | Host | In/Out | On output, contains the size in bytes of the local host workspace needed by cusolverMpPotrf(). |
This routine supports the following combinations of data types:
Data Type of A | computeType | Output Data Type |
---|---|---|
CUDA_R_32F | CUDA_R_32F | CUDA_R_32F |
CUDA_R_64F | CUDA_R_64F | CUDA_R_64F |
CUDA_C_32F | CUDA_C_32F | CUDA_C_32F |
CUDA_C_64F | CUDA_C_64F | CUDA_C_64F |
See cusolverStatus_t for the description of the return status.
cusolverMpPotrs
¶
cusolverStatus_t cusolverMpPotrs(
cusolverMpHandle_t handle,
cublasFillMode_t uplo,
int64_t N,
int64_t NRHS,
const void *d_A,
int64_t IA,
int64_t JA,
cudaLibMpMatrixDesc_t descA,
void *d_B,
int64_t IB,
int64_t JB,
cudaLibMpMatrixDesc_t descB,
cudaDataType_t computeType,
void *d_work,
size_t workspaceInBytesOnDevice,
void *h_work,
size_t workspaceInBytesOnHost,
int *info)
Solves a system of linear equations
$\mathrm{sub}(A) \cdot X = \mathrm{sub}(B)$
where sub(A) denotes A(IA:IA+N-1,JA:JA+N-1) and is an N-by-N symmetric or Hermitian positive definite distributed matrix, using the Cholesky factorization
$\mathrm{sub}(A) = U^{H} \cdot U$ or $\mathrm{sub}(A) = L \cdot L^{H}$
computed by cusolverMpPotrf(). sub(B) denotes the distributed matrix B(IB:IB+N-1,JB:JB+NRHS-1).
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle. |
uplo | Host | In | Specifies if A is upper (CUBLAS_FILL_MODE_UPPER ) or lower triangular matrix (CUBLAS_FILL_MODE_LOWER ). |
N | Host | In | Number of rows and columns of sub(A). |
NRHS | Host | In | Number of columns of sub(B). Currently, this routine only supports NRHS=1 . |
d_A | Device | In | Pointer into the local memory to an array of dimension (LLD_A, LOCc(JA+N-1)) . Contains the local pieces of the N-by-N distributed L or U factors of sub(A) as computed by cusolverMpPotrf(). |
IA | Host | In | Row index of the first row of the sub(A). IA must be a multiple of the row blocking dimension MB_A . |
JA | Host | In | Column index of the first column of the sub(A). JA must be a multiple of the column blocking dimension NB_A . |
descA | Host | In | Matrix descriptor associated to the global matrix A. |
d_B | Device | In/Out | Pointer into the local memory to an array of dimension (LLD_B,LOCc(JB+NRHS-1)) . On entry, the right hand sides sub(B). On exit, sub(B) is overwritten by the solution distributed matrix X. |
IB | Host | In | Row index of the first row of the sub(B). This function does not make any assumptions on the alignment of IB . |
JB | Host | In | Column index of the first column of the sub(B). This function does not make any assumptions on the alignment of JB . |
descB | Host | In | Matrix descriptor associated to the global matrix B. |
computeType | Host | In | Data type used for computations. See table below for supported combinations. |
d_work | Device | In/Out | Device workspace of size workspaceInBytesOnDevice . |
workspaceInBytesOnDevice | Host | In | The size in bytes of the local device workspace needed by the routine as provided by cusolverMpPotrs_bufferSize(). |
h_work | Host | In/Out | Host workspace of size workspaceInBytesOnHost . |
workspaceInBytesOnHost | Host | In | The size in bytes of the local host workspace needed by the routine as provided by cusolverMpPotrs_bufferSize() |
info | Device | Out | info < 0 indicates an incorrect value of the i-th argument of the function. info > 0 indicates the index of the leading minor in the case of a singular matrix. |
This function requires square block size (MB_A == NB_A) and alignment of sub(A) and sub(B) matrices, meaning (MB_A == MB_B) and (IA == IB).
This routine supports the following combinations of data types:
Data Type of A | computeType | Output Data Type |
---|---|---|
CUDA_R_32F | CUDA_R_32F | CUDA_R_32F |
CUDA_R_64F | CUDA_R_64F | CUDA_R_64F |
CUDA_C_32F | CUDA_C_32F | CUDA_C_32F |
CUDA_C_64F | CUDA_C_64F | CUDA_C_64F |
See cusolverStatus_t for the description of the return status.
cusolverMpPotrs_bufferSize
¶
cusolverStatus_t cusolverMpPotrs_bufferSize(
cusolverMpHandle_t handle,
cublasFillMode_t uplo,
int64_t n,
int64_t nrhs,
const void *a,
int64_t ia,
int64_t ja,
cudaLibMpMatrixDesc_t descA,
const void *b,
int64_t ib,
int64_t jb,
cudaLibMpMatrixDesc_t descB,
cudaDataType_t computeType,
size_t* workspaceInBytesOnDevice,
size_t* workspaceInBytesOnHost)
Computes the size in bytes of the host and device working buffers required by cusolverMpPotrs().
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | In | cusolverMp library handle. |
uplo | Host | In | Specifies if A is upper (CUBLAS_FILL_MODE_UPPER ) or lower triangular matrix (CUBLAS_FILL_MODE_LOWER ). |
n | Host | In | Number of rows and columns of sub(A). |
nrhs | Host | In | Number of columns of sub(B). Currently, this routine only supports nrhs=1 . |
a | Device | In | Pointer into the local memory to an array of dimension (LLD_A, LOCc(ja+n-1)) . Contains the local pieces of the n-by-n distributed L or U factors of sub(A) as computed by cusolverMpPotrf(). |
ia | Host | In | Row index of the first row of the sub(A). ia must be a multiple of the row blocking dimension MB_A . |
ja | Host | In | Column index of the first column of the sub(A). ja must be a multiple of the column blocking dimension NB_A . |
descA | Host | In | Matrix descriptor associated to the global matrix A. |
b | Device | In | Pointer into the local memory to an array of dimension (LLD_B,LOCc(jb+nrhs-1)) holding the right hand sides sub(B). |
ib | Host | In | Row index of the first row of the sub(B). This function does not make any assumptions on the alignment of ib . |
jb | Host | In | Column index of the first column of the sub(B). This function does not make any assumptions on the alignment of jb . |
descB | Host | In | Matrix descriptor associated to the global matrix B. |
computeType | Host | In | Data type used for computations. See table below for supported combinations. |
workspaceInBytesOnDevice | Host | In/Out | On output, contains the size in bytes of the local device workspace needed by cusolverMpPotrs(). |
workspaceInBytesOnHost | Host | In/Out | On output, contains the size in bytes of the local host workspace needed by cusolverMpPotrs(). |
This function requires square block size (MB_A == NB_A) and alignment of sub(A) and sub(B) matrices, meaning (MB_A == MB_B) and (ia == ib).
This routine supports the following combinations of data types:
Data Type of A | computeType | Output Data Type |
---|---|---|
CUDA_R_32F | CUDA_R_32F | CUDA_R_32F |
CUDA_R_64F | CUDA_R_64F | CUDA_R_64F |
CUDA_C_32F | CUDA_C_32F | CUDA_C_32F |
CUDA_C_64F | CUDA_C_64F | CUDA_C_64F |
See cusolverStatus_t for the description of the return status.