Generalized Linear System Solver
The GELS (GEneral Least Squares) function solves overdetermined or underdetermined least squares problems
\[ \min_X \| op(A) \, X - B \|_2 \]
using the QR or LQ factorization of \(A\), and overwrites \(B\) with the solution \(X\).
The configurations supported by GELS are:
If \(op(A)\) is non_transposed and \(M \geq N\), find the least squares solution of an overdetermined system using the QR factorization of \(A\);
If \(op(A)\) is non_transposed and \(M < N\), find the minimum norm solution of an underdetermined system using the LQ factorization of \(A\);
If \(op(A)\) is transposed or conj_transposed and \(M \geq N\), find the minimum norm solution of an underdetermined system using the LQ factorization of \(A\);
If \(op(A)\) is transposed or conj_transposed and \(M < N\), find the least squares solution of an overdetermined system using the QR factorization of \(A\).
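The four cases above reduce to one question: does \(op(A)\) have at least as many rows as columns? The following host-side C++ sketch encodes that decision. The names Op, Problem and classify are hypothetical and chosen for illustration only; they are not cuSolverDx identifiers.

```cpp
#include <cassert>

// Hypothetical names for illustration; not part of the cuSolverDx API.
enum class Op { non_transposed, transposed, conj_transposed };
enum class Problem { least_squares, minimum_norm };

// A is M x N. op(A) is M x N when non_transposed and N x M otherwise.
// Following the list above, the M >= N test is applied to A itself and
// its meaning is flipped when op(A) is (conjugate-)transposed.
inline Problem classify(Op op, unsigned m, unsigned n) {
    const bool a_is_tall = (m >= n);
    const bool overdetermined =
        (op == Op::non_transposed) ? a_is_tall : !a_is_tall;
    return overdetermined ? Problem::least_squares : Problem::minimum_norm;
}
```

For instance, classify(Op::transposed, 8, 4) reports a minimum norm problem, because \(op(A)\) is then a \(4 \times 8\) matrix.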
cuSolverDx gels device functions are (see Execution Methods):
__device__ void execute(data_type* A, data_type* tau, data_type* B);
// with runtime leading dimensions
__device__ void execute(data_type* A, data_type* tau,
                        data_type* B, const unsigned int ldb);
__device__ void execute(data_type* A, const unsigned int lda, data_type* tau,
                        data_type* B);
__device__ void execute(data_type* A, const unsigned int lda, data_type* tau,
                        data_type* B, const unsigned int ldb);
A is a batched \(M \times N\) matrix, with leading dimension \(\mathrm{lda} \geq M\) if A is in column-major layout, or \(\mathrm{lda} \geq N\) if matrix A is row-major. After the function returns, A is overwritten by the QR or LQ factorization of the input matrix.
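As a rough illustration of the leading-dimension rule, the sketch below checks lda against the layout and computes the offset of element \((i, j)\) inside a single batch's matrix. Layout, valid_ld and element_offset are hypothetical helper names introduced here, not library functions, and the sketch assumes zero-based indices.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration; not part of the cuSolverDx API.
enum class Layout { col_major, row_major };

// lda must cover one full column (M entries) in column-major layout,
// or one full row (N entries) in row-major layout.
inline bool valid_ld(Layout layout, unsigned m, unsigned n, unsigned ld) {
    return layout == Layout::col_major ? ld >= m : ld >= n;
}

// Offset of element (i, j), zero-based, within one batch's M x N matrix.
inline std::size_t element_offset(Layout layout, unsigned ld,
                                  unsigned i, unsigned j) {
    return layout == Layout::col_major
               ? static_cast<std::size_t>(j) * ld + i
               : static_cast<std::size_t>(i) * ld + j;
}
```

A leading dimension larger than the minimum simply leaves unused padding between consecutive columns (column-major) or rows (row-major).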
If \(op(A)\) is non_transposed, then the input B is a batched \(M \times K\) right-hand side matrix, and the result X is a batched \(N \times K\) solution matrix.
If \(op(A)\) is transposed or conj_transposed, then the input B is a batched \(N \times K\) right-hand side matrix, and the result X is a batched \(M \times K\) solution matrix.
Note
GELS is an in-place function, i.e., B is overwritten by the solution X after the function returns. While mathematically the dimensions of X and B are different, in practice the storage object of B/X is \(\max(M, N) \times K\) per batch.
tau is an array of size \(\min(M, N)\) for each batch. It holds the scalar factors of the Householder reflectors that define the QR or LQ factorization of A.
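The dimension rules for B, X and tau can be summarized in a small host-side helper. This is a sketch under the assumptions stated in the text; GelsShapes and gels_shapes are hypothetical names, and all row counts are per batch, with \(K\) columns in every case.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical names for illustration; not part of the cuSolverDx API.
enum class Op { non_transposed, transposed, conj_transposed };

struct GelsShapes {
    unsigned b_rows;        // rows of the right-hand side B on input
    unsigned x_rows;        // rows of the solution X on output
    unsigned storage_rows;  // rows of the shared B/X storage: max(M, N)
    unsigned tau_size;      // entries of tau: min(M, N)
};

// A is M x N; B and X each have K columns.
inline GelsShapes gels_shapes(Op op, unsigned m, unsigned n) {
    const bool nt = (op == Op::non_transposed);
    return GelsShapes{nt ? m : n, nt ? n : m, std::max(m, n), std::min(m, n)};
}
```

With \(M = 8\) and \(N = 4\), the non-transposed case reads an \(8 \times K\) matrix B and leaves a \(4 \times K\) solution X in the same \(8 \times K\) storage, while tau has 4 entries.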
The functions support:
A and B being in either column- or row-major memory layout, see Arrangement Operator,
\(op(A)\) being either non_transposed, transposed for real data types, or conj_transposed for complex data types, see TransposeMode Operator, and
\(M \geq N\) for an overdetermined system, or \(M < N\) for an underdetermined system.