LUSolver#

class nvmath.device.LUSolver(
size: Sequence[int],
precision: type[floating],
execution: str,
*,
sm=None,
transpose_mode: str = 'non_transposed',
arrangement: Sequence[str] | None = None,
batches_per_block: int | Literal['suggested'] | None = None,
data_type: str | None = None,
leading_dimensions: Sequence[int] | None = None,
block_dim: Sequence[int] | Literal['suggested'] | None = None,
)[source]#

A class that encapsulates cuSOLVERDx LU factorization without pivoting and linear solver for general matrices.

Available operations:

  • factorize: Computes the LU factorization A = L @ U, where L is a unit lower triangular matrix and U is an upper triangular matrix.

  • solve: Solves the system Ax = B using a previously computed LU factorization.

Memory Layout Requirements:

Matrices must be stored in shared memory according to their arrangement and leading dimension (ld):

For matrix A (M x N):

  • Column-major arrangement: Matrix shape (batches_per_block, M, N) with strides (lda * N, 1, lda)

  • Row-major arrangement: Matrix shape (batches_per_block, M, N) with strides (lda * M, lda, 1)

For matrix B (N x K):

  • Column-major arrangement: Matrix shape (batches_per_block, N, K) with strides (ldb * K, 1, ldb)

  • Row-major arrangement: Matrix shape (batches_per_block, N, K) with strides (ldb * N, ldb, 1)

Note

If a nonsingular matrix A is diagonal dominant, then it is safe to factorize without pivoting. If a matrix is not diagonal dominant, then pivoting is usually required to ensure numerical stability (see LUPivotSolver).

Parameters:
  • size (Sequence[int]) – Problem size specified as a sequence of 1 to 3 elements: (M,) (treated as (M, M, 1)), (M, N) (treated as (M, N, 1)), or (M, N, K). M and N represent the dimensions of the matrix A used in factorization. K represents the number of columns in the right-hand side matrix B (dimensions N x K) for the solve operation. To use solve(), N must be equal to M, otherwise an exception will be thrown when solver.solve() is used.

  • precision (type[np.floating]) – The computation precision specified as a numpy float dtype. Currently supports: numpy.float32, numpy.float64.

  • execution (str) – A string specifying the execution method. Supported values: 'Block'.

  • sm (ComputeCapability) – Target mathdx compute-capability.

  • transpose_mode (str, optional) – Transpose mode of matrix A for the solve operation. Can be one of: 'non_transposed', 'transposed', 'conj_transposed'. Defaults to 'non_transposed'.

  • arrangement (Sequence[str], optional) – Storage layout for matrices A and B, specified as a sequence of 2 elements (arr_A, arr_B). Each element can be one of: 'col_major', 'row_major'. Defaults to ("col_major", "col_major").

  • batches_per_block (int | Literal["suggested"], optional) – Number of batches to compute in parallel in a single CUDA block. Can be a non-zero integer or the string 'suggested' for automatic selection of an optimal value. We recommend using 1 for matrix A size larger than or equal to 16 x 16, and using 'suggested' for smaller sizes to achieve optimal performance. Defaults to 1.

  • data_type (str, optional) – The data type of the input matrices, can be one of: 'real', 'complex'. Defaults to 'real'.

  • leading_dimensions (Sequence[int], optional) – The leading dimensions for input matrices A and B, specified as a sequence of 2 elements (lda, ldb) or None. If not provided, it will be automatically deduced from size and arrangement. Note: When provided in the constructor, leading dimensions are set at compile-time. To use runtime leading dimensions (avoiding recompilation for different leading dimensions), provide the leading dimension parameters directly to the device methods instead.

  • block_dim (Sequence[int] | Literal["suggested"], optional) – The block dimension for launching the CUDA kernel, specified as a 1 to 3 integer sequence (x, y, z) where missing dimensions are assumed to be 1. Can be a sequence of 1 to 3 positive integers, the string 'suggested' for optimal value selection, or None for the default value.

See also

For further details, please refer to the cuSOLVERDx documentation:

Attributes

a_arrangement#
a_shape#
arrangement#
b_arrangement#
b_shape#
batches_per_block#
block_dim#
block_size#
data_type#
execution#
info_shape#
info_strides#
info_type#
k#
lda#
ldb#
leading_dimensions#
m#
n#
precision#
size#
sm#
transpose_mode#
value_type#

Methods

a_size(*, lda: int | None = None) int[source]#
a_strides(*, lda: int | None = None) tuple[int, int, int][source]#
b_size(*, ldb: int | None = None) int[source]#
b_strides(*, ldb: int | None = None) tuple[int, int, int][source]#
factorize(a, info, lda=None) None[source]#

Computes the LU factorization of a general matrix A without pivoting.

This device function computes A = L @ U, where L is a unit lower triangular matrix and U is an upper triangular matrix. This variant is suitable for diagonally dominant matrices or when pivoting is not required. Uses cuSOLVERDx 'getrf_no_pivot'.

If lda is provided, uses runtime version with the specified leading dimension. If lda is not provided (None), uses compile-time version with default or constructor-provided leading dimensions.

Note

The transpose_mode parameter does not affect factorization. This operation always treats the input matrix as-is (non-transposed).

For more details, see: get_started/functions/getrf.html

Parameters:
  • a – Pointer to an array in shared memory, storing the batched matrix according to the specified arrangement and leading dimension (see __init__()). The matrix is overwritten in place. On exit, contains the factors L and U from the factorization A = L @ U. The unit diagonal elements of L are not stored.

  • info – Pointer to a 1D array of int32. On exit, info[batch_id] = 0 indicates success for that batch, info[batch_id] = i > 0 indicates U(i,i) is exactly zero, meaning the factorization has been completed but the factor U is singular and division by zero will occur if it is used to solve a system of equations.

  • lda – Optional runtime leading dimension of matrix A. If not specified, the compile-time lda is used.

solve(a, b, lda=None, ldb=None) None[source]#

Solves a system of linear equations Ax = B using the LU factorization without pivoting. The a operand must be a square matrix (M == N), otherwise this function will throw an exception.

This device function uses the previously computed factorization A = L @ U to solve the system. Uses cuSOLVERDx 'getrs_no_pivot'.

If lda and ldb are provided, uses runtime version with the specified leading dimensions. If not provided (None), uses compile-time version with default or constructor-provided leading dimensions.

Note

The transpose_mode parameter (set in constructor) determines which system is solved: A*x=B ('non_transposed'), A^T*x=B ('transposed'), or A^H*x=B ('conj_transposed' for complex matrices).

For more details, see: get_started/functions/getrs.html

Parameters:
  • a – Pointer to an array in shared memory, storing the batched factors L and U from the LU factorization, according to the specified arrangement and leading dimension (see __init__()). The unit diagonal elements of L are not stored. See the factorize() documentation for details.

  • b – Pointer to an array in shared memory, storing the batched matrix according to the specified arrangement and leading dimension (see __init__()). The matrix is overwritten in place with the solution matrix x.

  • lda – Optional runtime leading dimension of matrix A. The lda and ldb must be specified together. If not specified, the compile-time lda is used.

  • ldb – Optional runtime leading dimension of matrix B. The lda and ldb must be specified together. If not specified, the compile-time ldb is used.