Solve Linear Systems After LU Factorization

The GETRS (GEneral TRiangular Solve) function solves a system of linear equations

\[op(A) X = B\]

where

\(A\) is the input batched LU-factorized \(N \times N\) matrix. The strictly lower triangular part of A holds L, and the upper triangular part (including the diagonal elements) holds U.
\(B\) is the input batched \(N \times K\) right-hand side matrix.
\(X\) is the output batched \(N \times K\) solution matrix.
Operation \(op(A)\) specifies how A is applied: non_transposed, transposed (for real data types), or conj_transposed (for complex data types).

cuSolverDx provides separate function operators depending on whether pivoting is to be performed:

cusolverdx::function::getrs_no_pivot: Linear system solve using LU factors with no pivoting.
cusolverdx::function::getrs_partial_pivot: Linear system solve using LU factors with partial pivoting.

cuSolverDx getrs_no_pivot device functions are (see Execution Methods):

__device__ void execute(const data_type* A, data_type* B);
// with runtime leading dimensions
__device__ void execute(const data_type* A, const unsigned int lda,
                        data_type* B);
__device__ void execute(const data_type* A,
                        data_type* B, const unsigned int ldb);
__device__ void execute(const data_type* A, const unsigned int lda,
                        data_type* B, const unsigned int ldb);

cuSolverDx getrs_partial_pivot device functions are:

__device__ void execute(const data_type* A, const int* ipiv, data_type* B);
// with runtime leading dimensions
__device__ void execute(const data_type* A, const unsigned int lda,
                        const int* ipiv,
                        data_type* B);
__device__ void execute(const data_type* A,
                        const int* ipiv,
                        data_type* B, const unsigned int ldb);
__device__ void execute(const data_type* A, const unsigned int lda,
                        const int* ipiv,
                        data_type* B, const unsigned int ldb);

A is a batched \(N \times N\) LU-factorized general matrix. The strictly lower triangular part of A holds L, and the upper triangular part (including the diagonal elements) holds U. The leading dimension of A is \(\mathrm{lda} \geq N\), regardless of whether matrix A is in column- or row-major layout.

For getrs_partial_pivot, the input ipiv is an array of size N for each batch; ipiv[batch_id, i] indicates that row i was interchanged with row ipiv[batch_id, i] in the batch_id-th batch of A.
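
The row-interchange semantics of ipiv can be illustrated with a small host-side sketch that replays the recorded swaps on one batch of B. LAPACK-style implementations perform this step on B before the triangular solves in the non-transposed case. The helper name apply_row_interchanges is hypothetical, and the sketch assumes 0-based pivot indices and column-major B. This section does not state the index base, so shift the indices if the library's convention differs.

```cpp
#include <cassert>
#include <utility>

// Replay the row interchanges stored in ipiv on one batch of B
// (column-major, N x K, leading dimension ldb), in the order i = 0..N-1.
// Assumption: ipiv entries are 0-based in this sketch.
void apply_row_interchanges(int n, int k, const int* ipiv,
                            double* B, int ldb) {
    assert(ldb >= n);
    for (int i = 0; i < n; ++i) {
        const int p = ipiv[i];
        assert(p >= 0 && p < n);
        if (p == i) {
            continue;  // row i stays in place
        }
        for (int col = 0; col < k; ++col) {
            std::swap(B[i + col * ldb], B[p + col * ldb]);
        }
    }
}
```

The swaps are applied sequentially, so ipiv is not a permutation vector: entry i refers to the row positions as they stand after the first i swaps.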

B is a batched \(N \times K\) right-hand side matrix. The operation is in-place, i.e. matrix X overwrites matrix B with the same leading dimension ldb. The leading dimension of B is \(\mathrm{ldb} \geq N\) if B is column-major, or \(\mathrm{ldb} \geq K\) if B is row-major.
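
The leading-dimension rules above follow from how an element is addressed in each layout. The sketch below is illustrative only: the Layout enum and both helper names are hypothetical and are not part of cuSolverDx.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout tag, used only for this illustration.
enum class Layout { col_major, row_major };

// Linear offset of element (i, j) of B for leading dimension ldb.
std::size_t element_offset(Layout layout, int i, int j, int ldb) {
    assert(i >= 0 && j >= 0 && ldb > 0);
    return layout == Layout::col_major
               ? static_cast<std::size_t>(i) + static_cast<std::size_t>(j) * ldb
               : static_cast<std::size_t>(i) * ldb + static_cast<std::size_t>(j);
}

// Smallest valid leading dimension of an N x K matrix B:
// N for column-major storage, K for row-major storage.
int min_ldb(Layout layout, int n, int k) {
    return layout == Layout::col_major ? n : k;
}
```

A leading dimension larger than the minimum leaves padding between consecutive columns (column-major) or rows (row-major).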

The functions support:

A and B in the same or in different layouts (column- or row-major), see Arrangement operator, and
\(op(A)\) being non_transposed, transposed (for real data types), or conj_transposed (for complex data types), see TransposeMode operator.