LeastSquaresSolver#

class nvmath.device.LeastSquaresSolver(
size: Sequence[int],
precision: type[floating],
execution: str,
*,
sm=None,
transpose_mode: str = 'non_transposed',
arrangement: Sequence[str] | None = None,
batches_per_block: int | Literal['suggested'] | None = None,
data_type: str | None = None,
leading_dimensions: Sequence[int] | None = None,
block_dim: Sequence[int] | Literal['suggested'] | None = None,
)[source]#

A class that encapsulates least squares solver device function ('gels'). GELS (GEneral Least Square) solves overdetermined or underdetermined least squares problems:

\[\min \| op(A) * X - B \|_2\]

using the QR or LQ factorization of A, and overwriting B with the solution X.

The configurations supported by GELS are:

Memory layout requirements:

Matrices must be stored in shared memory according to their arrangement and leading dimension (ld):

For matrix A (M x N):

  • Column-major arrangement: Matrix shape (batches_per_block, M, N) with strides (lda * N, 1, lda)

  • Row-major arrangement: Matrix shape (batches_per_block, M, N) with strides (lda * M, lda, 1)

Matrix B and X are stored in the same buffer (B is overwritten by X in place). The buffer has shape (batches_per_block, max(M, N), K). The second leading dimension ldb refers to this shared B/X buffer and must satisfy ldb >= max(M, N). Logical shapes differ by transpose_mode. Use b_shape and x_shape for B and X.

Matrix B (right-hand side):

  • Logical shape: b_shape - (batches_per_block, M, K) if non-transposed, (batches_per_block, N, K) if transposed.

  • Column-major arrangement: strides (ldb * K, 1, ldb).

  • Row-major arrangement: strides (ldb * max(M, N), ldb, 1).

Matrix X (solution):

  • Logical shape: x_shape - (batches_per_block, N, K) if non-transposed, (batches_per_block, M, K) if transposed.

  • Column-major arrangement: strides (ldb * K, 1, ldb).

  • Row-major arrangement: strides (ldb * max(M, N), ldb, 1).

Note

GELS is an in-place function. Matrix A is overwritten by the QR or LQ factorization, and matrix B is overwritten by the solution X. Both B and X use the single buffer of shape (max(M, N), K) per batch.

Parameters:
  • size (Sequence[int]) – Problem size specified as a sequence of 1 to 3 elements: (M,) (treated as (M, M, 1)), (M, N) (treated as (M, N, 1)), or (M, N, K). M and N represent the dimensions of matrix A (M x N). K represents the number of columns in the right-hand side matrix B.

  • precision (type[np.floating]) – The computation precision specified as a numpy float dtype. Currently supports: numpy.float32, numpy.float64.

  • execution (str) – A string specifying the execution method. Supported values: 'Block'.

  • sm (ComputeCapability) – Target mathdx compute-capability.

  • transpose_mode (str, optional) – Transpose mode for operation op(A) applied to matrix A. Can be one of: 'non_transposed', 'transposed', 'conj_transposed'. Defaults to 'non_transposed'.

  • arrangement (Sequence[str], optional) – Storage layout for matrices A and B, specified as a sequence of 2 elements (arr_A, arr_B). Each element can be one of: 'col_major', 'row_major'. Defaults to ("col_major", "col_major").

  • batches_per_block (int | Literal["suggested"], optional) – Number of batches to compute in parallel in a single CUDA block. Can be a non-zero integer or the string 'suggested' for automatic selection of an optimal value. We recommend using 1 for matrix A size larger than or equal to 16 x 16, and using 'suggested' for smaller sizes to achieve optimal performance. Defaults to 1.

  • data_type (str, optional) – The data type of the input matrices, can be one of: 'real', 'complex'. Defaults to 'real'.

  • leading_dimensions (Sequence[int], optional) – The leading dimensions for input matrices A and B, specified as a sequence of 2 elements (lda, ldb) or None. If not provided, it will be automatically deduced from size and arrangement. Note: When provided in the constructor, leading dimensions are set at compile-time. To use runtime leading dimensions (avoiding recompilation for different leading dimensions), provide the leading dimension parameters directly to the device methods instead.

  • block_dim (Sequence[int] | Literal["suggested"], optional) – The block dimension for launching the CUDA kernel, specified as a 1 to 3 integer sequence (x, y, z) where missing dimensions are assumed to be 1. Can be a sequence of 1 to 3 positive integers, the string 'suggested' for optimal value selection, or None for the default value.

See also

For further details, please refer to the cuSOLVERDx documentation:

Attributes

a_arrangement#
a_shape#
arrangement#
b_arrangement#
b_shape#
batches_per_block#
block_dim#
block_size#
data_type#
execution#
k#
lda#
ldb#
leading_dimensions#
m#
n#
precision#
size#
sm#
tau_shape#
tau_size#
tau_strides#
tau_type#
transpose_mode#
value_type#
x_shape#

Methods

a_size(*, lda: int | None = None) int[source]#
a_strides(
*,
lda: int | None = None,
) tuple[int, int, int][source]#
bx_size(*, ldb: int | None = None) int[source]#
bx_strides(
*,
ldb: int | None = None,
) tuple[int, int, int][source]#
solve(a, tau, b, lda=None, ldb=None) None[source]#

Solves a least squares problem using QR or LQ factorization.

This device function solves:

\[\min \| op(A) * X - B \|_2\]

using the QR or LQ factorization of A, and overwrites B with the solution X. Uses cuSOLVERDx 'gels'. The operation is in-place: matrix A is overwritten by the factorization, and matrix B is overwritten by the solution X.

If lda and ldb are provided, uses runtime version with the specified leading dimensions. If not provided (None), uses compile-time version with default or constructor-provided leading dimensions.

Note

The choice between QR and LQ factorization depends on the problem dimensions and transpose mode:

  • If op(A) is 'non_transposed' and M >= N: uses QR factorization

  • If op(A) is 'non_transposed' and M < N: uses LQ factorization

  • If op(A) is 'transposed' or 'conj_transposed' and M >= N: uses LQ factorization

  • If op(A) is 'transposed' or 'conj_transposed' and M < N: uses QR factorization

For more details, see: get_started/functions/gels.html

Parameters:
  • a – Pointer to an array in shared memory, storing the batched matrix according to the specified arrangement and leading dimension (see __init__()). The matrix is overwritten in place by the QR or LQ factorization.

  • tau – Pointer to a 1D array of size min(M, N) for each batch. Contains the scalar factors of the Householder reflections. The tau array, together with the Householder vectors stored in A, defines the unitary matrix Q.

  • b – Pointer to an array in shared memory, storing the batched right-hand side matrix according to the specified arrangement and leading dimension (see __init__()). The storage size is max(M, N) x K per batch. The operation is in-place: result X overwrites B.

  • lda – Optional runtime leading dimension for matrix A. The lda and ldb must be specified together. If not specified, the compile-time lda is used.

  • ldb – Optional runtime leading dimension for matrix B. The lda and ldb must be specified together. If not specified, the compile-time ldb is used.