LeastSquaresSolver#
-
class nvmath.
device. LeastSquaresSolver( - size: Sequence[int],
- precision: type[floating],
- execution: str,
- *,
- sm=None,
- transpose_mode: str = 'non_transposed',
- arrangement: Sequence[str] | None = None,
- batches_per_block: int | Literal['suggested'] | None = None,
- data_type: str | None = None,
- leading_dimensions: Sequence[int] | None = None,
- block_dim: Sequence[int] | Literal['suggested'] | None = None,
A class that encapsulates least squares solver device function (
'gels'). GELS (GEneral Least Square) solves overdetermined or underdetermined least squares problems:\[\min \| op(A) * X - B \|_2\]using the QR or LQ factorization of A, and overwriting B with the solution X.
The configurations supported by GELS are:
Memory layout requirements:
Matrices must be stored in shared memory according to their arrangement and leading dimension (ld):
For matrix A (M x N):
Column-major arrangement: Matrix shape
(batches_per_block, M, N)with strides(lda * N, 1, lda)Row-major arrangement: Matrix shape
(batches_per_block, M, N)with strides(lda * M, lda, 1)
Matrix B and X are stored in the same buffer (B is overwritten by X in place). The buffer has shape
(batches_per_block, max(M, N), K). The second leading dimensionldbrefers to this shared B/X buffer and must satisfyldb >= max(M, N). Logical shapes differ bytranspose_mode. Useb_shapeandx_shapefor B and X.Matrix B (right-hand side):
Logical shape:
b_shape-(batches_per_block, M, K)if non-transposed,(batches_per_block, N, K)if transposed.Column-major arrangement: strides
(ldb * K, 1, ldb).Row-major arrangement: strides
(ldb * max(M, N), ldb, 1).
Matrix X (solution):
Logical shape:
x_shape-(batches_per_block, N, K)if non-transposed,(batches_per_block, M, K)if transposed.Column-major arrangement: strides
(ldb * K, 1, ldb).Row-major arrangement: strides
(ldb * max(M, N), ldb, 1).
Note
GELS is an in-place function. Matrix A is overwritten by the QR or LQ factorization, and matrix B is overwritten by the solution X. Both B and X use the single buffer of shape
(max(M, N), K)per batch.- Parameters:
size (Sequence[int]) – Problem size specified as a sequence of 1 to 3 elements:
(M,)(treated as(M, M, 1)),(M, N)(treated as(M, N, 1)), or(M, N, K).MandNrepresent the dimensions of matrix A (MxN).Krepresents the number of columns in the right-hand side matrix B.precision (type[np.floating]) – The computation precision specified as a numpy float dtype. Currently supports:
numpy.float32,numpy.float64.execution (str) – A string specifying the execution method. Supported values:
'Block'.sm (ComputeCapability) – Target mathdx compute-capability.
transpose_mode (str, optional) – Transpose mode for operation op(A) applied to matrix A. Can be one of:
'non_transposed','transposed','conj_transposed'. Defaults to'non_transposed'.arrangement (Sequence[str], optional) – Storage layout for matrices A and B, specified as a sequence of 2 elements
(arr_A, arr_B). Each element can be one of:'col_major','row_major'. Defaults to("col_major", "col_major").batches_per_block (int | Literal["suggested"], optional) – Number of batches to compute in parallel in a single CUDA block. Can be a non-zero integer or the string
'suggested'for automatic selection of an optimal value. We recommend using 1 for matrix A size larger than or equal to 16 x 16, and using'suggested'for smaller sizes to achieve optimal performance. Defaults to 1.data_type (str, optional) – The data type of the input matrices, can be one of:
'real','complex'. Defaults to'real'.leading_dimensions (Sequence[int], optional) – The leading dimensions for input matrices A and B, specified as a sequence of 2 elements (
lda,ldb) orNone. If not provided, it will be automatically deduced fromsizeandarrangement. Note: When provided in the constructor, leading dimensions are set at compile-time. To use runtime leading dimensions (avoiding recompilation for different leading dimensions), provide the leading dimension parameters directly to the device methods instead.block_dim (Sequence[int] | Literal["suggested"], optional) – The block dimension for launching the CUDA kernel, specified as a 1 to 3 integer sequence (x, y, z) where missing dimensions are assumed to be 1. Can be a sequence of 1 to 3 positive integers, the string
'suggested'for optimal value selection, orNonefor the default value.
Attributes
- a_arrangement#
- a_shape#
- arrangement#
- b_arrangement#
- b_shape#
- batches_per_block#
- block_dim#
- block_size#
- data_type#
- execution#
- k#
- lda#
- ldb#
- leading_dimensions#
- m#
- n#
- precision#
- size#
- sm#
- tau_shape#
- tau_size#
- tau_strides#
- tau_type#
- transpose_mode#
- value_type#
- x_shape#
Methods
- solve(a, tau, b, lda=None, ldb=None) None[source]#
Solves a least squares problem using QR or LQ factorization.
This device function solves:
\[\min \| op(A) * X - B \|_2\]using the QR or LQ factorization of A, and overwrites B with the solution X. Uses cuSOLVERDx
'gels'. The operation is in-place: matrix A is overwritten by the factorization, and matrix B is overwritten by the solution X.If
ldaandldbare provided, uses runtime version with the specified leading dimensions. If not provided (None), uses compile-time version with default or constructor-provided leading dimensions.Note
The choice between QR and LQ factorization depends on the problem dimensions and transpose mode:
If
op(A)is'non_transposed'andM >= N: uses QR factorizationIf
op(A)is'non_transposed'andM < N: uses LQ factorizationIf
op(A)is'transposed'or'conj_transposed'andM >= N: uses LQ factorizationIf
op(A)is'transposed'or'conj_transposed'andM < N: uses QR factorization
For more details, see: get_started/functions/gels.html
- Parameters:
a – Pointer to an array in shared memory, storing the batched matrix according to the specified arrangement and leading dimension (see
__init__()). The matrix is overwritten in place by the QR or LQ factorization.tau – Pointer to a 1D array of size min(M, N) for each batch. Contains the scalar factors of the Householder reflections. The tau array, together with the Householder vectors stored in A, defines the unitary matrix Q.
b – Pointer to an array in shared memory, storing the batched right-hand side matrix according to the specified arrangement and leading dimension (see
__init__()). The storage size ismax(M, N) x Kper batch. The operation is in-place: result X overwrites B.lda – Optional runtime leading dimension for matrix A. The
ldaandldbmust be specified together. If not specified, the compile-timeldais used.ldb – Optional runtime leading dimension for matrix B. The
ldaandldbmust be specified together. If not specified, the compile-timeldbis used.