LQFactorize#

class nvmath.device.LQFactorize(
size: Sequence[int],
precision: type[floating],
execution: str,
*,
sm=None,
arrangement: str | None = None,
batches_per_block: int | Literal['suggested'] | None = None,
data_type: str | None = None,
leading_dimension: int | None = None,
block_dim: Sequence[int] | Literal['suggested'] | None = None,
)[source]#

A class that encapsulates LQ orthogonal factorization device function for general matrices using Householder reflections.

Available operation:

  • factorize: Computes the LQ factorization A = L @ Q, where L is a lower triangular matrix (if M <= N) or lower trapezoidal matrix (if M > N) and Q is a unitary N x N matrix.

The factorization uses Householder reflection transformations and does not explicitly form the unitary matrix Q. Instead, Q is represented as a product of Householder vectors stored in the input matrix A along with the tau array.

Memory Layout Requirements:

Matrices must be stored in shared memory according to their arrangement and leading dimension (ld):

For matrix A (M x N):

  • Column-major arrangement: Matrix shape (batches_per_block, M, N) with strides (lda * N, 1, lda)

  • Row-major arrangement: Matrix shape (batches_per_block, M, N) with strides (lda * M, lda, 1)

Note

The LQ factorization is essentially equivalent to the QR factorization of A^T (or A^H for complex types).

Parameters:
  • size (Sequence[int]) – Problem size specified as a sequence of 1 to 3 elements: (M,) (treated as (M, M, 1)), (M, N) (treated as (M, N, 1)), or (M, N, K). M and N represent the dimensions of the matrix A used in factorization. K is ignored if specified.

  • precision (type[np.floating]) – The computation precision specified as a numpy float dtype. Currently supports: numpy.float32, numpy.float64.

  • execution (str) – A string specifying the execution method. Supported values: 'Block'.

  • sm (ComputeCapability) – Target mathdx compute-capability.

  • arrangement (str, optional) – Storage layout for matrix A. Can be one of: 'col_major', 'row_major'. Defaults to 'col_major'. Note: When provided in the constructor, leading dimensions are set at compile-time. To use runtime leading dimensions (avoiding recompilation for different leading dimensions), provide the leading dimension parameters directly to the device methods instead.

  • batches_per_block (int | Literal["suggested"], optional) – Number of batches to compute in parallel in a single CUDA block. Can be a non-zero integer or the string 'suggested' for automatic selection of an optimal value. We recommend using 1 for matrix A size larger than or equal to 16 x 16, and using 'suggested' for smaller sizes to achieve optimal performance. Defaults to 1.

  • data_type (str, optional) – The data type of the input matrices, can be one of: 'real', 'complex'. Defaults to 'real'.

  • leading_dimension (int, optional) – The leading dimension for input matrix A, or None. If not provided, it will be automatically deduced from size and arrangement. Note: When provided in the constructor, leading dimensions are set at compile-time. To use runtime leading dimensions (avoiding recompilation for different leading dimensions), provide the leading dimension parameters directly to the device methods instead.

  • block_dim (Sequence[int] | Literal["suggested"], optional) – The block dimension for launching the CUDA kernel, specified as a 1 to 3 integer sequence (x, y, z) where missing dimensions are assumed to be 1. Can be a sequence of 1 to 3 positive integers, the string 'suggested' for optimal value selection, or None for the default value.

See also

For further details, please refer to the cuSOLVERDx documentation:

Attributes

a_arrangement#
a_shape#
batches_per_block#
block_dim#
block_size#
data_type#
execution#
lda#
m#
n#
precision#
size#
sm#
tau_shape#
tau_size#
tau_strides#
tau_type#
value_type#

Methods

a_size(*, lda: int | None = None) int[source]#
a_strides(
*,
lda: int | None = None,
) tuple[int, int, int][source]#
factorize(a, tau, lda=None) None[source]#

Computes the LQ factorization of a general matrix A using Householder reflections.

This device function computes A = L @ Q, where L is a lower triangular matrix (if M <= N) or lower trapezoidal matrix (if M > N), and Q is a unitary N x N matrix. Uses cuSOLVERDx 'gelqf'.

If lda is provided, uses runtime version with the specified leading dimension. If lda is not provided (None), uses compile-time version with default or constructor-provided leading dimensions.

The LQ factorization is essentially the same as the QR factorization of A^T (or A^H for complex data types).

Matrix Q is not explicitly formed. Instead, Q is represented as a product of min(M, N) Householder vectors: Q = H(min(M, N) - 1)^H * … * H(1)^H * H(0)^H.

Each Householder vector has the form H(i) = I - tau[i] * v * v^H, where:

  • v is a vector of size N for each batch

  • v[0:i-1] = 0, v[i] = 1

  • conjugate(v[i+1:N]) is stored on exit in A[i, i+1:N]

For more details, see: get_started/functions/gelqf.html

Parameters:
  • a – Pointer to an array in shared memory, storing the batched M x N matrix according to the specified arrangement and leading dimension (see __init__()). The matrix is overwritten in place. On exit, the lower triangular or lower trapezoidal part (including diagonal) contains the matrix L. The elements above the diagonal, with the array tau, represent the unitary matrix Q as a product of Householder vectors.

  • tau – Pointer to a 1D array of size min(M, N) for each batch. Contains the scalar factors of the Householder reflections. The tau array, together with the Householder vectors stored in A, defines the unitary matrix Q.

  • lda – Optional runtime leading dimension of matrix A. If not specified, the compile-time lda is used.