LQMultiply

class nvmath.device.LQMultiply(
size: Sequence[int],
precision: type[floating],
execution: str,
side: str,
*,
sm=None,
transpose_mode: str = 'non_transposed',
arrangement: Sequence[str] | None = None,
batches_per_block: int | Literal['suggested'] | None = None,
data_type: str | None = None,
leading_dimensions: Sequence[int] | None = None,
block_dim: Sequence[int] | Literal['suggested'] | None = None,
)

A class that encapsulates the multiplication of a matrix C by the unitary matrix Q from an LQ factorization (UNMLQ operation).

Memory Layout Requirements:

Matrices must be stored in shared memory according to their arrangement and leading dimension (ld):

For matrix A (containing Householder vectors; cols below denotes M if side='left' and N if side='right'):

  • If side='left': A is K x M

  • If side='right': A is K x N

  • Column-major arrangement: Matrix shape (batches_per_block, K, cols) with strides (lda * cols, 1, lda)

  • Row-major arrangement: Matrix shape (batches_per_block, K, cols) with strides (lda * K, lda, 1)

For matrix C (M x N):

  • Column-major arrangement: Matrix shape (batches_per_block, M, N) with strides (ldc * N, 1, ldc)

  • Row-major arrangement: Matrix shape (batches_per_block, M, N) with strides (ldc * M, ldc, 1)
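The shape and stride rules above can be sketched in plain Python. This is a minimal illustration, not the nvmath implementation: the helper names matrix_layout and element_offset are hypothetical, strides are counted in elements, and the default leading dimension (rows for column-major, columns for row-major, i.e. tightly packed) is an assumption.

```python
def matrix_layout(rows, cols, arrangement, batches_per_block=1, ld=None):
    """Return (shape, strides) in elements for a batched rows x cols matrix.

    Hypothetical helper that mirrors the documented stride formulas.
    """
    if arrangement == "col_major":
        # Assumed default: tightly packed columns, so ld equals the row count.
        ld = rows if ld is None else ld
        if ld < rows:
            raise ValueError("ld must be >= rows for col_major")
        strides = (ld * cols, 1, ld)
    elif arrangement == "row_major":
        # Assumed default: tightly packed rows, so ld equals the column count.
        ld = cols if ld is None else ld
        if ld < cols:
            raise ValueError("ld must be >= cols for row_major")
        strides = (ld * rows, ld, 1)
    else:
        raise ValueError("arrangement must be 'col_major' or 'row_major'")
    return (batches_per_block, rows, cols), strides


def element_offset(strides, batch, i, j):
    """Linear offset (in elements) of entry (i, j) of the given batch."""
    return batch * strides[0] + i * strides[1] + j * strides[2]


# Example: side='left' with M=4, N=3, K=2, so A is K x M = 2 x 4 and C is 4 x 3.
a_shape, a_strides = matrix_layout(2, 4, "col_major", batches_per_block=2, ld=3)
c_shape, c_strides = matrix_layout(4, 3, "row_major", batches_per_block=2)
```

With a padded lda of 3, entry (1, 2) of the second batch of A sits at offset 12 + 1 + 6 = 19 elements from the start of the buffer.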

Parameters:
  • size (Sequence[int]) – Problem size specified as a sequence of 1 to 3 elements: (M,) (treated as (M, M, 1)), (M, N) (treated as (M, N, 1)), or (M, N, K). M and N represent the dimensions of matrix C. K represents the number of Householder reflections from the LQ factorization. If side='left', then K <= M and A is K x M. If side='right', then K <= N and A is K x N.

  • precision (type[np.floating]) – The computation precision specified as a numpy float dtype. Currently supports: numpy.float32, numpy.float64.

  • execution (str) – A string specifying the execution method. Supported values: 'Block'.

  • sm (ComputeCapability) – Target mathdx compute-capability.

  • side (str) – Side of matrix Q in the multiplication operation. Can be one of: 'left', 'right'. If side='left', computes op(Q) * C where Q is M x M. If side='right', computes C * op(Q) where Q is N x N.

  • transpose_mode (str, optional) – Transpose mode for operation op(Q) applied to matrix Q. Can be one of: 'non_transposed', 'transposed', 'conj_transposed'. Defaults to 'non_transposed'.

  • arrangement (Sequence[str], optional) – Storage layout for matrices A and C (C is named B in this parameter and in the b_arrangement attribute), specified as a sequence of 2 elements (arr_A, arr_B). Each element can be one of: 'col_major', 'row_major'. Defaults to ("col_major", "col_major").

  • batches_per_block (int | Literal["suggested"], optional) – Number of batches to compute in parallel in a single CUDA block. Can be a positive integer or the string 'suggested' for automatic selection of an optimal value. For best performance, we recommend 1 when matrix A is 16 x 16 or larger, and 'suggested' for smaller sizes. Defaults to 1.

  • data_type (str, optional) – The data type of the input matrices, can be one of: 'real', 'complex'. Defaults to 'real'.

  • leading_dimensions (Sequence[int], optional) – The leading dimensions for matrices A and C (C is named B in this parameter and in the ldb attribute), specified as a sequence of 2 elements (lda, ldb) or None. If not provided, they are deduced automatically from size and arrangement. Note: Leading dimensions passed to the constructor are fixed at compile time. To use runtime leading dimensions (and avoid recompiling for each new value), pass them directly to the device methods instead.

  • block_dim (Sequence[int] | Literal["suggested"], optional) – The block dimension for launching the CUDA kernel. Can be a sequence of 1 to 3 positive integers (x, y, z), where missing dimensions are assumed to be 1, the string 'suggested' for optimal value selection, or None for the default value.
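The size rules described for the size and side parameters can be sketched as a small validation routine. This is an illustration of the documented rules only; normalize_size is a hypothetical helper, not an nvmath function.

```python
def normalize_size(size, side):
    """Expand size to (M, N, K) and derive the shape of A.

    Follows the documented rules: (M,) -> (M, M, 1), (M, N) -> (M, N, 1),
    and K <= M for side='left' or K <= N for side='right'.
    """
    if not 1 <= len(size) <= 3:
        raise ValueError("size must have 1 to 3 elements")
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    m = size[0]
    n = size[1] if len(size) >= 2 else m
    k = size[2] if len(size) == 3 else 1
    # Q is M x M on the left and N x N on the right, which bounds K.
    bound = m if side == "left" else n
    if k > bound:
        raise ValueError(f"K={k} exceeds {bound} for side={side!r}")
    a_shape = (k, bound)  # A is K x M (left) or K x N (right)
    return (m, n, k), a_shape


full_size, a_shape = normalize_size((8, 4, 4), "right")
```

Here C is 8 x 4 and Q is applied from the right, so A holds 4 Householder vectors in a 4 x 4 matrix.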

See also

For further details, please refer to the cuSOLVERDx documentation.

Attributes

a_arrangement
a_shape
arrangement
b_arrangement
batches_per_block
block_dim
block_size
c_shape
data_type
execution
k
lda
ldb
leading_dimensions
m
n
precision
side
size
sm
tau_shape
tau_size
tau_strides
tau_type
transpose_mode
value_type

Methods

a_size(*, lda: int | None = None) → int
a_strides(*, lda: int | None = None) → tuple[int, int, int]
c_size(*, ldc: int | None = None) → int
c_strides(*, ldc: int | None = None) → tuple[int, int, int]
multiply(a, tau, c, lda=None, ldc=None) → None

Multiplies matrix C by the unitary matrix Q from an LQ factorization.

This device function computes:

  • op(Q) * C (if side='left')

  • C * op(Q) (if side='right')

where Q is the unitary matrix from the LQ factorization, represented by Householder vectors stored in A and the tau array. Uses cuSOLVERDx 'unmlq'. The result overwrites matrix C.
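A CPU-side reference can help when checking results by hand. The sketch below assumes the LAPACK convention for LQ factorizations: Q = H(K) ... H(2) H(1), with H(i) = I - tau[i] * v_i * v_i^T, where v_i is row i of A with zeros before the diagonal and an implicit 1 on the diagonal. It covers only real data with side='left'. The function name apply_q_left is hypothetical and is not part of nvmath.

```python
def apply_q_left(a, tau, c, transposed=False):
    """Return op(Q) * C for real data, assuming Q = H(K) ... H(2) H(1).

    a   : K x M nested list; row i holds Householder vector i above the diagonal
    tau : K scalar factors
    c   : M x N nested list (not modified; a new matrix is returned)
    """
    k = len(tau)
    m = len(c)
    n = len(c[0])
    out = [row[:] for row in c]
    # Q * C applies H(1) first; Q^T * C applies H(K) first (each H is symmetric).
    order = range(k - 1, -1, -1) if transposed else range(k)
    for i in order:
        # v_i: zeros before the diagonal, implicit 1 on it, stored entries after.
        v = [0.0] * i + [1.0] + [float(x) for x in a[i][i + 1:m]]
        # w = v^T * C, then C <- C - tau * v * w (rank-1 update).
        w = [sum(v[r] * out[r][j] for r in range(m)) for j in range(n)]
        for r in range(m):
            for j in range(n):
                out[r][j] -= tau[i] * v[r] * w[j]
    return out


# K = 1, M = 2: v = [1, 1]; tau = 2 / (v . v) = 1 makes H orthogonal.
a = [[5.0, 1.0]]  # the diagonal entry (5.0) belongs to L and is ignored
result = apply_q_left(a, [1.0], [[1.0, 2.0], [3.0, 4.0]])
```

In this small case H = [[0, -1], [-1, 0]], so the rows of C are swapped and negated.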

If lda and ldc are provided, the runtime version is used with the specified leading dimensions. If they are not provided (None), the compile-time version is used with the default or constructor-provided leading dimensions.

For more details, see: get_started/functions/unmlq.html

Parameters:
  • a – Pointer to an array in shared memory, storing the batched matrix containing Householder vectors from the LQ factorization, according to the specified arrangement and leading dimension (see __init__()). The elements above the diagonal of A, with the array tau, represent the unitary matrix Q as a product of Householder reflections. If side='left', A is K x M. If side='right', A is K x N.

  • tau – Pointer to a 1D array of size K for each batch, containing the scalar factors of the Householder reflections from the LQ factorization. The tau array, together with the Householder vectors in A, defines Q.

  • c – Pointer to an array in shared memory, storing the batched M x N matrix according to the specified arrangement and leading dimension (see __init__()). The operation is in-place: result overwrites C.

  • lda – Optional runtime leading dimension for matrix A. The lda and ldc must be specified together. If not specified, the compile-time lda is used.

  • ldc – Optional runtime leading dimension for matrix C. The lda and ldc must be specified together. If not specified, the compile-time ldc is used.
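The pairing rule for lda and ldc can be illustrated with a short check. This is only a sketch of the documented behaviour; resolve_leading_dimensions is a hypothetical helper, and the actual error raised by the library on a mismatched pair is not specified here.

```python
def resolve_leading_dimensions(compile_time, lda=None, ldc=None):
    """Pick the (lda, ldc) pair and report which code path would be used.

    compile_time : (lda, ldc) fixed when the object was constructed
    lda, ldc     : optional runtime overrides; must be given together
    """
    if (lda is None) != (ldc is None):
        raise ValueError("lda and ldc must be specified together")
    if lda is None:
        # Neither override given: fall back to the compile-time values.
        return compile_time, "compile-time"
    return (lda, ldc), "runtime"


defaults = resolve_leading_dimensions((2, 4))
padded = resolve_leading_dimensions((2, 4), lda=3, ldc=5)
```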