Generalized Linear System Solver
The GELS (GEneral Least Squares) function solves overdetermined or underdetermined least squares problems:
\[\min_X \| op(A) X - B \|_2\]
using the QR or LQ factorization of A, and overwriting B with the solution X.
The configurations supported by GELS are:
If op(A) is non_transposed and M >= N, find the least squares solution of an overdetermined system using the QR factorization of A;
If op(A) is non_transposed and M < N, find the minimum norm solution of an underdetermined system using the LQ factorization of A;
If op(A) is transposed or conj_transposed and M >= N, find the minimum norm solution of an underdetermined system using the LQ factorization of A;
If op(A) is transposed or conj_transposed and M < N, find the least squares solution of an overdetermined system using the QR factorization of A.
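The four cases above reduce to a single comparison, which the following host-side sketch makes explicit. All names here (`Transpose`, `Factorization`, `Problem`, `GelsCase`, `classify_gels`) are hypothetical helpers introduced for illustration; they are not part of the cuSolverDx API.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only (not cuSolverDx types).
enum class Transpose { non_transposed, transposed, conj_transposed };
enum class Factorization { QR, LQ };
enum class Problem { least_squares, minimum_norm };

struct GelsCase {
    Factorization factorization;
    Problem problem;
};

// Maps (op(A), M, N) to the case described in the list above.
inline GelsCase classify_gels(Transpose op, unsigned m, unsigned n) {
    assert(m > 0 && n > 0);
    const bool m_ge_n = (m >= n);
    const bool plain = (op == Transpose::non_transposed);
    // non_transposed with M >= N, or (conj_)transposed with M < N:
    // overdetermined system, least squares solution, QR factorization.
    if (plain == m_ge_n) {
        return GelsCase{Factorization::QR, Problem::least_squares};
    }
    // Remaining two cases: underdetermined system, minimum norm
    // solution, LQ factorization.
    return GelsCase{Factorization::LQ, Problem::minimum_norm};
}
```

For example, `classify_gels(Transpose::transposed, 4, 8)` reports a least squares problem solved through QR, matching the last case in the list.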
cuSolverDx gels device functions are (see Execution Methods):
__device__ void execute(data_type* A, data_type* tau, data_type* B);
// with runtime leading dimensions
__device__ void execute(data_type* A, data_type* tau,
data_type* B, const unsigned int ldb);
__device__ void execute(data_type* A, const unsigned int lda, data_type* tau,
data_type* B);
__device__ void execute(data_type* A, const unsigned int lda, data_type* tau,
data_type* B, const unsigned int ldb);
A is a batched M x N matrix, with leading dimension lda >= M if A is in column-major layout, or lda >= N if matrix A is row-major. After the function returns, A is overwritten by the QR or LQ factorization of the input matrix.
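To make the leading dimension constraint concrete, here is a minimal host-side sketch of how element (row, col) of one M x N batch would be addressed under each layout. The `Arrangement` enum and both functions are hypothetical illustrations, not cuSolverDx declarations.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the memory layout choice (illustration only).
enum class Arrangement { col_major, row_major };

// Smallest valid leading dimension for an M x N matrix:
// M for column-major storage, N for row-major storage.
inline unsigned min_leading_dimension(Arrangement arr, unsigned m, unsigned n) {
    return arr == Arrangement::col_major ? m : n;
}

// Linear offset of element (row, col) within a single batch.
inline std::size_t element_offset(Arrangement arr, unsigned lda,
                                  unsigned row, unsigned col) {
    return arr == Arrangement::col_major
               ? static_cast<std::size_t>(col) * lda + row
               : static_cast<std::size_t>(row) * lda + col;
}
```

A leading dimension larger than the minimum simply leaves padding between consecutive columns (column-major) or rows (row-major).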
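When op(A) is non_transposed, the input B is a batched M x K right-hand side matrix and the result X is a batched N x K matrix; when op(A) is transposed or conj_transposed, B is N x K and X is M x K.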
Note
GELS is an in-place function, i.e., B is overwritten by the solution X after the function returns. While mathematically the dimensions of X and B are different, in practice the storage object of B/X is max(M, N) x K per batch.
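As a quick sizing aid, the sketch below computes how many elements the shared B/X buffer needs. It assumes tightly packed storage with no leading-dimension padding; `bx_elements` is a hypothetical helper name, not a cuSolverDx function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Number of elements in the shared B/X buffer: max(M, N) x K per batch.
// Assumes tightly packed storage (no padding from a larger ldb).
inline std::size_t bx_elements(unsigned m, unsigned n, unsigned k,
                               unsigned batches) {
    assert(batches > 0);
    return static_cast<std::size_t>(std::max(m, n)) * k * batches;
}
```

For M = 8, N = 4, K = 2 and 3 batches this gives 48 elements, regardless of whether B or X is the larger of the two matrices.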
tau is an array of size min(M, N) for each batch, and holds the scalar factors of the Householder reflectors from the QR or LQ factorization of A.
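For readers who want to see what the in-place outputs look like, the following plain C++ reference sketch solves a single overdetermined problem (non_transposed, M >= N, K = 1, column-major, lda = M) with Householder QR. It is not the cuSolverDx implementation; it assumes the LAPACK-style convention in which each reflector is H = I - tau * v * v^T with an implicit leading 1 in v, and it assumes A has full column rank. The function name `least_squares_qr` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Solves min ||A x - b||_2 for a column-major M x N matrix A (M >= N).
// On return: the upper triangle of a holds R, the entries below the
// diagonal hold the Householder vectors, tau holds the N scalar
// factors, and b[0..N) holds the solution x.
inline void least_squares_qr(std::vector<double>& a, std::vector<double>& tau,
                             std::vector<double>& b,
                             std::size_t m, std::size_t n) {
    assert(m >= n && a.size() == m * n && b.size() == m);
    tau.assign(n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        double* col = &a[j * m];
        double norm = 0.0;
        for (std::size_t i = j; i < m; ++i) norm += col[i] * col[i];
        norm = std::sqrt(norm);
        if (norm == 0.0) continue;  // tau[j] == 0 means H is the identity
        const double alpha = col[j];
        const double beta = (alpha >= 0.0) ? -norm : norm;
        tau[j] = (beta - alpha) / beta;
        const double scale = 1.0 / (alpha - beta);
        for (std::size_t i = j + 1; i < m; ++i) col[i] *= scale;  // store v
        col[j] = beta;  // diagonal entry of R
        // Applies H = I - tau * v * v^T to a vector y of length m.
        auto apply = [&](double* y) {
            double dot = y[j];
            for (std::size_t i = j + 1; i < m; ++i) dot += col[i] * y[i];
            dot *= tau[j];
            y[j] -= dot;
            for (std::size_t i = j + 1; i < m; ++i) y[i] -= dot * col[i];
        };
        for (std::size_t c = j + 1; c < n; ++c) apply(&a[c * m]);
        apply(b.data());
    }
    // Back substitution: solve R x = (Q^T b)[0..N), overwriting b.
    for (std::size_t j = n; j-- > 0;) {
        double s = b[j];
        for (std::size_t c = j + 1; c < n; ++c) s -= a[c * m + j] * b[c];
        b[j] = s / a[j * m + j];
    }
}
```

Fitting the line y = x0 + x1 * t through the points (0, 1), (1, 3), (2, 5), (3, 7) means passing A with columns [1, 1, 1, 1] and [0, 1, 2, 3] and b = [1, 3, 5, 7]; the first two entries of b then hold x0 = 1 and x1 = 2 up to rounding.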
The functions support:
A and B being either column- or row-major memory layout, see Arrangement Operator,
\(op(A)\) being either non_transposed, transposed for real data type, or conj_transposed for complex data type, see TransposeMode Operator, and
M >= N for overdetermined system, or M < N for underdetermined system.