Modified LU Factorization for Unitary Matrices
The MODIFIED_LU function computes a batched modified LU factorization. It is based on the algorithm for reconstructing Householder reflectors after a Tall-Skinny QR (TSQR) decomposition has been performed and the explicit tall-skinny Q factor has been constructed [1].
Given an \(M \times N\) orthonormal matrix \(Q\), this function computes:
\[ Q - S = L U \]
where:
\(Q\) is an \(M \times N\) orthonormal matrix.
\(S\) is a diagonal sign matrix (entries \(\pm 1\)) corresponding to the sign choices made in Householder reconstruction (HR). It has \(\min(M, N)\) diagonal entries, which equals \(N\) when \(M \geq N\).
\(L\) is a lower trapezoidal matrix with an implicit unit diagonal if \(M \geq N\), or a lower triangular matrix with an implicit unit diagonal if \(M < N\).
\(U\) is an upper triangular matrix if \(M \geq N\), or an upper trapezoidal matrix if \(M < N\).
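As a small worked illustration (our own example, not taken from the library documentation), consider the \(2 \times 2\) permutation matrix below. Standard LU without pivoting breaks down on it because \(Q_{11} = 0\). Assuming each sign \(S_{kk}\) is chosen opposite to the current pivot, with a zero pivot treated as positive (the exact tie-breaking rule for a zero pivot is an assumption here), the modified factorization goes through:
\[
Q = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
S = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
Q - S = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
= \underbrace{\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} 1 & 1 \\ 0 & -2 \end{pmatrix}}_{U}.
\]
The pivots of \(Q - S\) are \(1\) and \(-2\), both at least 1 in absolute value.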
Modified LU can be applied to any orthonormal matrix, not only one obtained from a TSQR factorization. No pivoting is needed because the LU decomposition is applied to \(Q - S\), where the diagonal sign matrix \(S\) is chosen so that every pivot has absolute value at least 1. For example, taking \(S_{kk}\) with the sign opposite to the current pivot \(p_k\) gives \(|p_k - S_{kk}| = |p_k| + 1 \geq 1\).
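The sign choice and the absence of pivoting can be sketched as a plain host-side reference in C++. This is an illustrative right-looking elimination under stated assumptions, not the cuSolverDx implementation: the name modified_lu_reference is hypothetical, \(Q\) is assumed to be stored column-major with leading dimension lda, and \(S_{kk}\) is taken as the negated sign of the current pivot (a zero pivot is treated as positive).

```cpp
#include <cassert>

// Hypothetical host-side reference for the modified LU factorization Q - S = L U.
// Q is M x N, stored column-major with leading dimension lda (an assumption).
// On return, the strictly lower part of a holds L (unit diagonal implicit),
// the upper part (including the diagonal) holds U, and s[k] holds S_kk.
void modified_lu_reference(int m, int n, double* a, int lda, double* s) {
    assert(lda >= m);
    const int k_max = m < n ? m : n;
    for (int k = 0; k < k_max; ++k) {
        double& pivot = a[k + k * lda];
        // Assumed sign rule: S_kk = -sgn(pivot), with sgn(0) taken as +1,
        // so |pivot - S_kk| = |pivot| + 1 >= 1 and no row exchange is needed.
        s[k] = (pivot >= 0.0) ? -1.0 : 1.0;
        pivot -= s[k];
        // Multipliers: column k of L below the diagonal.
        for (int i = k + 1; i < m; ++i) {
            a[i + k * lda] /= pivot;
        }
        // Rank-1 update of the trailing submatrix.
        for (int j = k + 1; j < n; ++j) {
            for (int i = k + 1; i < m; ++i) {
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
            }
        }
    }
}
```

Because \(S\) only touches the diagonal, subtracting \(S_{kk}\) at step \(k\) yields the same pivot as subtracting \(S\) from \(Q\) up front, so the packed output satisfies \(Q - S = L U\). For the column-major \(2 \times 2\) permutation matrix {0, 1, 1, 0}, this sketch returns s = {-1, 1} and the packed factors {1, 1, 1, -2}.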
The cuSolverDx modified_lu device functions are (see Execution Methods):
__device__ void execute(data_type* Q, data_type* S);
// with runtime ldq
__device__ void execute(data_type* Q, const unsigned int ldq, data_type* S);
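Q - orthonormal matrix of size \(M \times N\) per batch. On output, it is overwritten with \(L\) (strictly lower part; the unit diagonal is not stored) and \(U\) (upper part, including the diagonal).
ldq - leading dimension of \(Q\) (runtime overload only).
S - 1D array of size \(\min(M, N)\) per batch. On output, it contains the diagonal entries of the sign matrix \(S\), so that \(Q - S = L U\).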
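To check the in-place output described above, one can rebuild \(Q\) as \(L U + S\) on the host. The following C++ sketch assumes double precision and column-major storage with leading dimension ldq; reconstruct_q is a hypothetical verification helper, not part of cuSolverDx.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: rebuild Q = L U + S from the packed output.
// lu holds L (strictly lower, unit diagonal implicit) and U (upper part),
// column-major with leading dimension ldq (assumed); s holds min(M, N) signs.
// Returns a dense M x N column-major matrix with leading dimension M.
std::vector<double> reconstruct_q(int m, int n, const double* lu, int ldq,
                                  const double* s) {
    assert(ldq >= m);
    const int k = m < n ? m : n;
    std::vector<double> q(static_cast<std::size_t>(m) * n, 0.0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            // Only p <= min(i, j) with p < k contributes: L is unit lower, U is upper.
            for (int p = 0; p < k && p <= i && p <= j; ++p) {
                const double l_ip = (p == i) ? 1.0 : lu[i + p * ldq];
                const double u_pj = lu[p + j * ldq];
                sum += l_ip * u_pj;
            }
            // Add back the sign matrix on the diagonal (i == j implies i < k).
            q[i + static_cast<std::size_t>(j) * m] = sum + (i == j ? s[i] : 0.0);
        }
    }
    return q;
}
```

Comparing the result against a saved copy of the input, within a floating-point tolerance, is a simple way to validate each batch.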
Note
Unlike the standard LU factorization, modified LU omits the status argument. The factorization is applied to \(Q - S\), where \(S\) is a diagonal sign matrix chosen so that no pivot of \(Q - S\) is zero; the factorization therefore always succeeds.
The function supports \(Q\) stored in either column-major or row-major layout; see Arrangement operator.
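The two arrangements differ only in how element \((i, j)\) of \(Q\) maps to memory. The sketch below assumes the common BLAS/LAPACK convention for the leading dimension (column stride \(\geq M\) for column-major, row stride \(\geq N\) for row-major); the layout enum and offset function are hypothetical names for illustration, not cuSolverDx API.

```cpp
#include <cstddef>

// Hypothetical layout tag mirroring the two supported arrangements.
enum class layout { col_major, row_major };

// Linear offset of element (i, j) given leading dimension ldq (assumed convention):
// column-major: consecutive columns are ldq elements apart;
// row-major:    consecutive rows are ldq elements apart.
constexpr std::size_t offset(layout l, std::size_t i, std::size_t j, std::size_t ldq) {
    return l == layout::col_major ? i + j * ldq : i * ldq + j;
}
```

With ldq = 4, element (1, 2) sits at offset 9 in column-major storage and at offset 6 in row-major storage.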