Generalized Linear System Solver

The GELS (GEneral Least Squares) function solves overdetermined or underdetermined least squares problems:

\[\min \| op(A) * X - B \|_2\]

using the QR or LQ factorization of A, and overwriting B with the solution X.
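For orientation, the standard textbook relations behind this approach are shown here for the case op(A) = A with A of full rank. They are general linear algebra identities, not a description of the exact steps cuSolverDx performs internally. R is N x N upper triangular, L is M x M lower triangular, Q is unitary, and the subscript 1:N selects the first N rows.

\[M \ge N: \quad A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad X = R^{-1} \left( Q^H B \right)_{1:N}\]

\[M < N: \quad A = \begin{bmatrix} L & 0 \end{bmatrix} Q, \qquad X = Q^H \begin{bmatrix} L^{-1} B \\ 0 \end{bmatrix}\]

The first form gives the least squares solution of an overdetermined system, and the second gives the minimum norm solution of an underdetermined system.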

The configurations supported by GELS are:

  1. If op(A) is non_transposed and M >= N, find the least squares solution of an overdetermined system using the QR factorization of A;

  2. If op(A) is non_transposed and M < N, find the minimum norm solution of an underdetermined system using the LQ factorization of A;

  3. If op(A) is transposed or conj_transposed and M >= N, find the minimum norm solution of an underdetermined system using the QR factorization of A;

  4. If op(A) is transposed or conj_transposed and M < N, find the least squares solution of an overdetermined system using the LQ factorization of A.
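The mapping from the shape of A and the transpose mode to the kind of problem being solved can be sketched as a small host-side helper. This is only an illustration of the four cases listed above; the names `Problem` and `classify` are hypothetical and are not part of cuSolverDx.

```cpp
// Hypothetical illustration, not a cuSolverDx API.
enum class Problem { least_squares, minimum_norm };

// A is M x N. `transposed` is true for both transposed and conj_transposed.
constexpr Problem classify(unsigned m, unsigned n, bool transposed) {
    if (!transposed) {
        // Cases 1 and 2: op(A) = A.
        return (m >= n) ? Problem::least_squares : Problem::minimum_norm;
    }
    // Cases 3 and 4: op(A) is the (conjugate) transpose of A.
    return (m >= n) ? Problem::minimum_norm : Problem::least_squares;
}
```

For example, a 6 x 4 matrix A with non_transposed op(A) falls under case 1, while the same matrix with transposed op(A) falls under case 3.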

cuSolverDx gels device functions are (see Execution Methods):

__device__ void execute(data_type* A, data_type* tau, data_type* B);
// with runtime leading dimensions
__device__ void execute(data_type* A, data_type* tau,
                        data_type* B, const unsigned int ldb);
__device__ void execute(data_type* A, const unsigned int lda, data_type* tau,
                        data_type* B);
__device__ void execute(data_type* A, const unsigned int lda, data_type* tau,
                        data_type* B, const unsigned int ldb);

A is a batched M x N matrix, with leading dimension lda >= M if A is in column-major layout, or lda >= N if matrix A is row-major. After the function returns, A is overwritten by the QR or LQ factorization of the input matrix.
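To make the role of the leading dimension concrete, the sketch below computes the offset of element (i, j) within a single M x N matrix, following the usual convention for lda. The helper names are hypothetical and not part of cuSolverDx, and the stride between batches is deliberately left out.

```cpp
#include <cstddef>

// Hypothetical helper: linear offset of element (i, j) in one matrix
// stored with leading dimension lda.
constexpr std::size_t offset(std::size_t i, std::size_t j,
                             std::size_t lda, bool col_major) {
    // Column-major: consecutive rows of one column are contiguous.
    // Row-major: consecutive columns of one row are contiguous.
    return col_major ? i + j * lda : i * lda + j;
}

// Hypothetical helper: number of elements one M x N matrix spans in memory,
// counting the padding introduced by lda.
constexpr std::size_t storage_elems(std::size_t m, std::size_t n,
                                    std::size_t lda, bool col_major) {
    return col_major ? lda * n : lda * m;
}
```

With M = 3, N = 2, a column-major matrix with lda = 4 spans 8 elements, and a row-major matrix with lda = 5 spans 15 elements.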

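The dimensions of B and X depend on op(A). If op(A) is non_transposed, the input B is a batched M x K right-hand side matrix and the result X is a batched N x K matrix. If op(A) is transposed or conj_transposed, B is a batched N x K matrix and X is a batched M x K matrix.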

Note

GELS is an in-place function, i.e., B is overwritten by the solution X after the function returns. While mathematically the dimensions of X and B are different, in practice the storage object of B/X is max(M, N) x K per batch.

tau is an array of size min(M, N) for each batch. It holds the scalar factors of the Householder reflectors produced by the QR or LQ factorization of A.
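The per-batch sizes described above can be computed with a helper like the following. It is a minimal sketch that assumes compact storage with no leading-dimension padding; `GelsSizes` and `gels_sizes` are hypothetical names, not cuSolverDx APIs.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical illustration of the per-batch buffer sizes, in elements.
struct GelsSizes {
    std::size_t bx_rows;   // rows of the shared B/X storage: max(M, N)
    std::size_t bx_elems;  // elements of the shared B/X storage: max(M, N) * K
    std::size_t tau_elems; // elements of tau: min(M, N)
};

constexpr GelsSizes gels_sizes(std::size_t m, std::size_t n, std::size_t k) {
    const std::size_t rows = std::max(m, n);
    return GelsSizes{rows, rows * k, std::min(m, n)};
}
```

For M = 6, N = 4, K = 2, the shared B/X storage has 6 rows and 12 elements, and tau has 4 elements.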

The functions support:

  1. A and B each being in either column-major or row-major memory layout, see Arrangement Operator,

  2. \(op(A)\) either being non_transposed, transposed for real data type, or conj_transposed for complex data type, see TransposeMode Operator, and

  3. M >= N for overdetermined system, or M < N for underdetermined system.