QRMultiply
class nvmath.device.QRMultiply(
    size: Sequence[int],
    precision: type[floating],
    execution: str,
    side: str,
    *,
    sm=None,
    transpose_mode: str = 'non_transposed',
    arrangement: Sequence[str] | None = None,
    batches_per_block: int | Literal['suggested'] | None = None,
    data_type: str | None = None,
    leading_dimensions: Sequence[int] | None = None,
    block_dim: Sequence[int] | Literal['suggested'] | None = None,
)
A class that encapsulates the multiplication of a matrix C by the unitary matrix Q from a QR factorization (UNMQR operation).
Memory Layout Requirements:
Matrices must be stored in shared memory according to their arrangement and leading dimension (ld):
For matrix A (containing Householder vectors), where rows is M if side='left' and N if side='right':
- If side='left': A is M x K
- If side='right': A is N x K
- Column-major arrangement: Matrix shape (batches_per_block, rows, K) with strides (lda * K, 1, lda)
- Row-major arrangement: Matrix shape (batches_per_block, rows, K) with strides (lda * rows, lda, 1)
For matrix C (M x N):
- Column-major arrangement: Matrix shape (batches_per_block, M, N) with strides (ldb * N, 1, ldb)
- Row-major arrangement: Matrix shape (batches_per_block, M, N) with strides (ldb * M, ldb, 1)
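The shape and stride rules above can be checked on the host with plain NumPy. The following is a minimal sketch, not part of nvmath: the helper name batched_view is hypothetical, and it assumes the strides listed above are counted in elements (NumPy expects bytes, so the helper converts them).

```python
import numpy as np


def batched_view(flat, batches, rows, cols, ld, arrangement):
    """Hypothetical helper: view a flat buffer as (batches, rows, cols).

    Element strides follow the layout rules listed above:
      col_major -> (ld * cols, 1, ld)
      row_major -> (ld * rows, ld, 1)
    """
    if arrangement == "col_major":
        if ld < rows:
            raise ValueError("col_major requires ld >= rows")
        elem_strides = (ld * cols, 1, ld)
    elif arrangement == "row_major":
        if ld < cols:
            raise ValueError("row_major requires ld >= cols")
        elem_strides = (ld * rows, ld, 1)
    else:
        raise ValueError(f"unknown arrangement: {arrangement!r}")

    # One batch occupies elem_strides[0] elements of the flat buffer.
    if flat.size < batches * elem_strides[0]:
        raise ValueError("flat buffer is too small for this layout")

    # NumPy strides are in bytes, so scale by the element size.
    byte_strides = tuple(s * flat.itemsize for s in elem_strides)
    return np.lib.stride_tricks.as_strided(
        flat, shape=(batches, rows, cols), strides=byte_strides
    )


# Two batches of a 4 x 3 matrix, column-major with a padded ld of 5.
col_view = batched_view(np.arange(30.0), 2, 4, 3, 5, "col_major")
# The same logical shape, row-major with a padded ld of 4.
row_view = batched_view(np.arange(32.0), 2, 4, 3, 4, "row_major")
```

Because the buffers are filled with np.arange, each entry of a view equals its flat offset, which makes it easy to confirm where element (batch, row, column) lives for a given leading dimension.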
- Parameters:
size (Sequence[int]) – Problem size specified as a sequence of 1 to 3 elements: (M,) (treated as (M, M, 1)), (M, N) (treated as (M, N, 1)), or (M, N, K). M and N represent the dimensions of matrix C. K represents the number of Householder reflections from the QR factorization. If side='left', then K <= M and A is M x K. If side='right', then K <= N and A is N x K.
precision (type[np.floating]) – The computation precision specified as a numpy float dtype. Currently supports: numpy.float32, numpy.float64.
execution (str) – A string specifying the execution method. Supported values: 'Block'.
sm (ComputeCapability) – Target mathdx compute-capability.
side (str) – Side of matrix Q in the multiplication operation. Can be one of: 'left', 'right'. If side='left', computes op(Q) * C where Q is M x M. If side='right', computes C * op(Q) where Q is N x N.
transpose_mode (str, optional) – Transpose mode for operation op(Q) applied to matrix Q. Can be one of: 'non_transposed', 'transposed', 'conj_transposed'. Defaults to 'non_transposed'.
arrangement (Sequence[str], optional) – Storage layout for matrices A and B, specified as a sequence of 2 elements (arr_A, arr_B). Each element can be one of: 'col_major', 'row_major'. Defaults to ("col_major", "col_major").
batches_per_block (int | Literal["suggested"], optional) – Number of batches to compute in parallel in a single CUDA block. Can be a non-zero integer or the string 'suggested' for automatic selection of an optimal value. We recommend using 1 for matrix A size larger than or equal to 16 x 16, and using 'suggested' for smaller sizes to achieve optimal performance. Defaults to 1.
data_type (str, optional) – The data type of the input matrices, can be one of: 'real', 'complex'. Defaults to 'real'.
leading_dimensions (Sequence[int], optional) – The leading dimensions for input matrices A and B, specified as a sequence of 2 elements (lda, ldb) or None. If not provided, it will be automatically deduced from size and arrangement. Note: When provided in the constructor, leading dimensions are set at compile-time. To use runtime leading dimensions (avoiding recompilation for different leading dimensions), provide the leading dimension parameters directly to the device methods instead.
block_dim (Sequence[int] | Literal["suggested"], optional) – The block dimension for launching the CUDA kernel, specified as a 1 to 3 integer sequence (x, y, z) where missing dimensions are assumed to be 1. Can be a sequence of 1 to 3 positive integers, the string 'suggested' for optimal value selection, or None for the default value.
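The size rules above (how 1- and 2-element sequences are expanded, and the bound on K for each side) can be written as a small host-side check. This is an illustrative sketch only: normalize_size is a hypothetical helper, not an nvmath function, and the library's own validation and error messages may differ.

```python
def normalize_size(size, side):
    """Hypothetical helper: expand `size` to (M, N, K) and derive A's shape.

    (M,)      -> (M, M, 1)
    (M, N)    -> (M, N, 1)
    (M, N, K) -> unchanged
    """
    dims = tuple(int(d) for d in size)
    if len(dims) == 1:
        m, n, k = dims[0], dims[0], 1
    elif len(dims) == 2:
        m, n, k = dims[0], dims[1], 1
    elif len(dims) == 3:
        m, n, k = dims
    else:
        raise ValueError("size must have 1 to 3 elements")

    # Q is M x M for side='left' and N x N for side='right',
    # so A has that many rows and K must not exceed it.
    if side == "left":
        a_rows = m
    elif side == "right":
        a_rows = n
    else:
        raise ValueError(f"unknown side: {side!r}")
    if k > a_rows:
        raise ValueError(f"K={k} exceeds {a_rows} for side={side!r}")

    return (m, n, k), (a_rows, k)


full_size, a_shape = normalize_size((8, 4, 3), "right")
```

Here full_size is (8, 4, 3) and a_shape is (4, 3), since A is N x K when Q is applied from the right.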
Attributes
- a_arrangement
- a_shape
- arrangement
- b_arrangement
- batches_per_block
- block_dim
- block_size
- c_shape
- data_type
- execution
- k
- lda
- ldb
- leading_dimensions
- m
- n
- precision
- side
- size
- sm
- tau_shape
- tau_size
- tau_strides
- tau_type
- transpose_mode
- value_type
Methods
- multiply(a, tau, c, lda=None, ldc=None) -> None
Multiplies matrix C by the unitary matrix Q from a QR factorization.
This device function computes:
- op(Q) * C (if side='left')
- C * op(Q) (if side='right')
where Q is the unitary matrix from the QR factorization, represented by Householder vectors stored in A and the tau array. Uses cuSOLVERDx 'unmqr'. The result overwrites matrix C.
If lda and ldc are provided, the runtime version is used with the specified leading dimensions. If they are not provided (None), the compile-time version is used with the default or constructor-provided leading dimensions.
For more details, see: get_started/functions/unmqr.html
- Parameters:
a – Pointer to an array in shared memory, storing the batched matrix containing Householder vectors from the QR factorization, according to the specified arrangement and leading dimension (see __init__()). The elements below the diagonal of A, with the array tau, represent the unitary matrix Q as a product of Householder reflections. If side='left', A is M x K. If side='right', A is N x K.
tau – Pointer to a 1D array of size K for each batch, containing the scalar factors of the Householder reflections from the QR factorization. The tau array, together with the Householder vectors in A, defines Q.
c – Pointer to an array in shared memory, storing the batched M x N matrix according to the specified arrangement and leading dimension (see __init__()). The operation is in-place: the result overwrites C.
lda – Optional runtime leading dimension for matrix A. lda and ldc must be specified together. If not specified, the compile-time lda is used.
ldc – Optional runtime leading dimension for matrix C. lda and ldc must be specified together. If not specified, the compile-time ldc is used.
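For checking results on the host, the operation described above can be reproduced for a single batch with NumPy. The sketch below is a reference model, not the device implementation: reference_unmqr is a hypothetical name, and it assumes the LAPACK storage convention (Q = H_1 H_2 ... H_K, with H_i = I - tau_i * v_i * v_i^H, where v_i has an implicit 1 on the diagonal and its remaining entries stored below the diagonal in column i of A). It forms Q explicitly, which is only sensible for small test matrices.

```python
import numpy as np


def reference_unmqr(a, tau, c, side="left", transpose_mode="non_transposed"):
    """Hypothetical host-side reference: return op(Q) @ C or C @ op(Q).

    Assumes LAPACK-style Householder storage in `a` (see the text above).
    """
    a, tau, c = np.asarray(a), np.asarray(tau), np.asarray(c)
    order, k = a.shape  # order is M for side='left', N for side='right'
    dtype = np.result_type(a, tau, c)

    # Accumulate Q = H_1 @ H_2 @ ... @ H_K explicitly.
    q = np.eye(order, dtype=dtype)
    for i in range(k):
        v = np.zeros(order, dtype=dtype)
        v[i] = 1.0                    # implicit unit diagonal entry
        v[i + 1:] = a[i + 1:, i]      # stored part of the reflector
        q = q @ (np.eye(order, dtype=dtype) - tau[i] * np.outer(v, v.conj()))

    if transpose_mode == "non_transposed":
        op_q = q
    elif transpose_mode == "transposed":
        op_q = q.T
    elif transpose_mode == "conj_transposed":
        op_q = q.conj().T
    else:
        raise ValueError(f"unknown transpose_mode: {transpose_mode!r}")

    if side == "left":
        return op_q @ c
    if side == "right":
        return c @ op_q
    raise ValueError(f"unknown side: {side!r}")


# Example: take reflectors from NumPy's raw QR output (h is stored transposed).
rng = np.random.default_rng(0)
m, n, k = 5, 4, 3
x = rng.standard_normal((m, k))
h, tau = np.linalg.qr(x, mode="raw")
a = h.T                                # M x K, Householder vectors below the diagonal
c = rng.standard_normal((m, n))
qc = reference_unmqr(a, tau, c, side="left")
```

Applying the result again with transpose_mode='transposed' should recover C for real data, since Q is orthogonal; that round trip is a convenient sanity check when comparing against device output.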