Generalized Linear System Solver

The GELS (GEneral Least Squares) function solves overdetermined or underdetermined least squares problems:

\[\min \| op(A) X - B \|_2\]

using the QR or LQ factorization of \(A\), and overwriting \(B\) with the solution \(X\).
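As a reminder of how these factorizations yield the solution, the two textbook cases can be written out for \(op(A) = A\), assuming \(A\) has full rank. The symbols \(Q_1\), \(R\), and \(L\) are introduced here for illustration only and do not appear in the cuSolverDx API. For \(M \geq N\), with the reduced QR factorization \(A = Q_1 R\) (\(Q_1\) is \(M \times N\) with orthonormal columns, \(R\) is \(N \times N\) upper triangular), the least squares solution is

\[X = R^{-1} Q_1^H B\]

For \(M < N\), with the reduced LQ factorization \(A = L Q_1\) (\(L\) is \(M \times M\) lower triangular, \(Q_1\) is \(M \times N\) with orthonormal rows), the minimum norm solution is

\[X = Q_1^H L^{-1} B\]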

The configurations supported by GELS are:

  1. If \(op(A)\) is non_transposed and \(M \geq N\), find the least squares solution of an overdetermined system using the QR factorization of \(A\);

  2. If \(op(A)\) is non_transposed and \(M < N\), find the minimum norm solution of an underdetermined system using the LQ factorization of \(A\);

  3. If \(op(A)\) is transposed or conj_transposed and \(M \geq N\), find the minimum norm solution of an underdetermined system using the LQ factorization of \(A\);

  4. If \(op(A)\) is transposed or conj_transposed and \(M < N\), find the least squares solution of an overdetermined system using the QR factorization of \(A\).
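The four cases above reduce to one rule: the problem is a least squares problem when \(op(A)\) has at least as many rows as columns, and a minimum norm problem otherwise. The following host-side C++ sketch encodes that rule. The names `Op`, `Problem`, `Shape`, `op_shape`, and `classify` are hypothetical helpers written for this illustration and are not part of cuSolverDx.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not cuSolverDx identifiers.
enum class Op { non_transposed, transposed, conj_transposed };
enum class Problem { least_squares, minimum_norm };

struct Shape {
    unsigned rows;
    unsigned cols;
};

// Shape of op(A) when A is M x N: unchanged for non_transposed,
// swapped for transposed and conj_transposed.
constexpr Shape op_shape(Op op, unsigned m, unsigned n) {
    return op == Op::non_transposed ? Shape{m, n} : Shape{n, m};
}

// Cases 1 and 4 in the list are least squares problems,
// cases 2 and 3 are minimum norm problems.
constexpr Problem classify(Op op, unsigned m, unsigned n) {
    return ((m >= n) == (op == Op::non_transposed)) ? Problem::least_squares
                                                    : Problem::minimum_norm;
}

// Compile-time checks, one per case in the list.
static_assert(classify(Op::non_transposed, 8, 4) == Problem::least_squares, "case 1");
static_assert(classify(Op::non_transposed, 4, 8) == Problem::minimum_norm, "case 2");
static_assert(classify(Op::transposed, 8, 4) == Problem::minimum_norm, "case 3");
static_assert(classify(Op::transposed, 4, 8) == Problem::least_squares, "case 4");
```

Because both helpers are `constexpr`, they can be evaluated with the same compile-time \(M\) and \(N\) values that describe the problem size.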

cuSolverDx gels device functions are (see Execution Methods):

__device__ void execute(data_type* A, data_type* tau, data_type* B);
// with runtime leading dimensions
__device__ void execute(data_type* A, data_type* tau,
                        data_type* B, const unsigned int ldb);
__device__ void execute(data_type* A, const unsigned int lda, data_type* tau,
                        data_type* B);
__device__ void execute(data_type* A, const unsigned int lda, data_type* tau,
                        data_type* B, const unsigned int ldb);

A is a batched \(M \times N\) matrix, with leading dimension \(\mathrm{lda} \geq M\) if A is in column-major layout, or \(\mathrm{lda} \geq N\) if A is in row-major layout. After the function returns, A is overwritten by the QR or LQ factorization of the input matrix.
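To make the leading dimension constraint concrete, the sketch below checks `lda` against the layout and computes the offset of element \((i, j)\) within a single matrix of the batch. It is plain host-side C++. The names `Layout`, `lda_is_valid`, and `element_offset` are hypothetical, and how consecutive batches are placed in memory is deliberately left out because it is not specified here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only; not cuSolverDx identifiers.
enum class Layout { col_major, row_major };

// lda >= M for column-major storage, lda >= N for row-major storage.
constexpr bool lda_is_valid(Layout layout, unsigned m, unsigned n, unsigned lda) {
    return layout == Layout::col_major ? lda >= m : lda >= n;
}

// Offset of A(i, j) inside one M x N matrix, with zero-based i and j.
constexpr std::size_t element_offset(Layout layout, unsigned i, unsigned j, unsigned lda) {
    return layout == Layout::col_major
               ? static_cast<std::size_t>(j) * lda + i
               : static_cast<std::size_t>(i) * lda + j;
}

// A 4 x 3 matrix stored column-major with padding (lda = 5):
// element (2, 1) sits at 1 * 5 + 2 = 7.
static_assert(lda_is_valid(Layout::col_major, 4, 3, 5), "lda >= M");
static_assert(element_offset(Layout::col_major, 2, 1, 5) == 7, "column-major offset");
```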

  • If \(op(A)\) is non_transposed, then the input B is a batched \(M \times K\) right-hand side matrix, and the result X is a batched \(N \times K\) solution matrix.

  • If \(op(A)\) is transposed or conj_transposed, then the input B is a batched \(N \times K\) right-hand side matrix, and the result X is a batched \(M \times K\) solution matrix.

Note

GELS is an in-place function, i.e., B is overwritten by the solution X after the function returns. While mathematically the dimensions of X and B are different, in practice the storage object of B/X is \(\max(M, N) \times K\) per batch.

tau is an array of size \(\min(M, N)\) for each batch, and holds the scalar factors of the Householder reflectors from the QR or LQ factorization of A.
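The per-batch sizes described above for B, X, and tau can be summarized in a small host-side C++ helper. This is a minimal sketch for sizing buffers. `GelsSizes` and `gels_sizes` are hypothetical names, and the `transposed` flag stands for either transposed or conj_transposed.

```cpp
#include <cassert>

// Hypothetical helper for illustration only; not part of cuSolverDx.
struct GelsSizes {
    unsigned b_rows;           // rows of the right-hand side B on input
    unsigned x_rows;           // rows of the solution X on output
    unsigned bx_storage_rows;  // rows of the shared B/X storage: max(M, N)
    unsigned cols;             // K, the number of right-hand sides
    unsigned tau_size;         // min(M, N)
};

constexpr unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }
constexpr unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

// transposed == false: B is M x K and X is N x K.
// transposed == true:  B is N x K and X is M x K.
constexpr GelsSizes gels_sizes(bool transposed, unsigned m, unsigned n, unsigned k) {
    return GelsSizes{transposed ? n : m, transposed ? m : n,
                     max_u(m, n), k, min_u(m, n)};
}

// Overdetermined 8 x 4 system with 2 right-hand sides, op(A) = A.
static_assert(gels_sizes(false, 8, 4, 2).b_rows == 8, "B is M x K");
static_assert(gels_sizes(false, 8, 4, 2).x_rows == 4, "X is N x K");
static_assert(gels_sizes(false, 8, 4, 2).bx_storage_rows == 8, "storage is max(M, N) x K");
static_assert(gels_sizes(false, 8, 4, 2).tau_size == 4, "tau has min(M, N) entries");
```

In this example the solution occupies the first 4 rows of the 8-row B/X storage, which is why the buffer must be sized for \(\max(M, N)\) rows even though X itself is smaller.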

The functions support:

  1. A and B in either column-major or row-major memory layout, see Arrangement Operator,

  2. \(op(A)\) being non_transposed, transposed (for real data types), or conj_transposed (for complex data types), see TransposeMode Operator, and

  3. \(M \geq N\) or \(M < N\), so that both overdetermined and underdetermined systems are covered.