LUPivotSolver#

class nvmath.device.LUPivotSolver(
size: Sequence[int],
precision: type[floating],
execution: str,
*,
sm=None,
transpose_mode: str = 'non_transposed',
arrangement: Sequence[str] | None = None,
batches_per_block: int | Literal['suggested'] | None = None,
data_type: str | None = None,
leading_dimensions: Sequence[int] | None = None,
block_dim: Sequence[int] | Literal['suggested'] | None = None,
)[source]#

A class that encapsulates cuSOLVERDx LU factorization with partial pivoting and linear solver for general matrices.

Available operations:

  • factorize: Computes the LU factorization P @ A = L @ U with partial pivoting, where P is a permutation matrix, L is a lower triangular matrix and U is an upper triangular matrix.

  • solve: Solves the system Ax = B using a previously computed LU factorization with partial pivoting

Memory Layout Requirements:

Matrices must be stored in shared memory according to their arrangement and leading dimension (ld):

For matrix A (M x N):

  • Column-major arrangement: Matrix shape (batches_per_block, M, N) with strides (lda * N, 1, lda)

  • Row-major arrangement: Matrix shape (batches_per_block, M, N) with strides (lda * M, lda, 1)

For matrix B (N x K):

  • Column-major arrangement: Matrix shape (batches_per_block, N, K) with strides (ldb * K, 1, ldb)

  • Row-major arrangement: Matrix shape (batches_per_block, N, K) with strides (ldb * N, ldb, 1)

Note

This solver uses partial pivoting for improved numerical stability and is suitable for general matrices. If your matrix is diagonally dominant, you may consider using LUSolver which does not use pivoting and may be faster.

Parameters:
  • size (Sequence[int]) – Problem size specified as a sequence of 1 to 3 elements: (M,) (treated as (M, M, 1)), (M, N) (treated as (M, N, 1)), or (M, N, K). M and N represent the dimensions of the matrix A used in factorization. K represents the number of columns in the right-hand side matrix B (dimensions N x K) for the solve operation. To use solve(), N must be equal to M, otherwise an exception will be thrown when solver.solve() is used.

  • precision (type[np.floating]) – The computation precision specified as a numpy float dtype. Currently supports: numpy.float32, numpy.float64.

  • execution (str) – A string specifying the execution method. Supported values: 'Block'.

  • sm (ComputeCapability) – Target mathdx compute-capability.

  • transpose_mode (str, optional) – Transpose mode of matrix A for the solve operation. Can be one of: 'non_transposed', 'transposed', 'conj_transposed'. Defaults to 'non_transposed'.

  • arrangement (Sequence[str], optional) – Storage layout for matrices A and B, specified as a sequence of 2 elements (arr_A, arr_B). Each element can be one of: 'col_major', 'row_major'. Defaults to ("col_major", "col_major").

  • batches_per_block (int | Literal["suggested"], optional) – Number of batches to compute in parallel in a single CUDA block. Can be a non-zero integer or the string 'suggested' for automatic selection of an optimal value. We recommend using 1 for matrix A size larger than or equal to 16 x 16, and using 'suggested' for smaller sizes to achieve optimal performance. Defaults to 1.

  • data_type (str, optional) – The data type of the input matrices, can be one of: 'real', 'complex'. Defaults to 'real'.

  • leading_dimensions (Sequence[int], optional) – The leading dimensions for input matrices A and B, specified as a sequence of 2 elements (lda, ldb) or None. If not provided, it will be automatically deduced from size and arrangement. Note: When provided in the constructor, leading dimensions are set at compile-time. To use runtime leading dimensions (avoiding recompilation for different leading dimensions), provide the leading dimension parameters directly to the device methods instead.

  • block_dim (Sequence[int] | Literal["suggested"], optional) – The block dimension for launching the CUDA kernel, specified as a 1 to 3 integer sequence (x, y, z) where missing dimensions are assumed to be 1. Can be a sequence of 1 to 3 positive integers, the string 'suggested' for optimal value selection, or None for the default value.

See also

For further details, please refer to the cuSOLVERDx documentation:

Attributes

a_arrangement#
a_shape#
arrangement#
b_arrangement#
b_shape#
batches_per_block#
block_dim#
block_size#
data_type#
execution#
info_shape#
info_strides#
info_type#
ipiv_shape#
ipiv_size#
ipiv_strides#
ipiv_type#
k#
lda#
ldb#
leading_dimensions#
m#
n#
precision#
size#
sm#
transpose_mode#
value_type#

Methods

a_size(*, lda: int | None = None) int[source]#
a_strides(
*,
lda: int | None = None,
) tuple[int, int, int][source]#
b_size(*, ldb: int | None = None) int[source]#
b_strides(
*,
ldb: int | None = None,
) tuple[int, int, int][source]#
factorize(a, ipiv, info, lda=None) None[source]#

Computes the LU factorization of a general matrix A with partial pivoting.

This device function computes P @ A = L @ U, where P is a permutation matrix, L is a unit lower triangular matrix and U is an upper triangular matrix. This variant uses partial pivoting for improved numerical stability and is suitable for general matrices. Uses cuSOLVERDx 'getrf_partial_pivot'.

If lda is provided, uses runtime version with the specified leading dimension. If lda is not provided (None), uses compile-time version with default or constructor-provided leading dimensions.

Note

The transpose_mode parameter does not affect factorization. This operation always treats the input matrix as-is (non-transposed).

For more details, see: get_started/functions/getrf.html

Parameters:
  • a – Pointer to an array in shared memory, storing the batched matrix according to the specified arrangement and leading dimension (see __init__()). The matrix is overwritten in place. On exit, contains the factors L and U from the factorization P @ A = L @ U. The unit diagonal elements of L are not stored.

  • ipiv – Pointer to a 1D array of int32, storing pivot indices. The array has size min(M, N) for each batch. On exit, ipiv[batch_id * min(M, N) + i] indicates that row i was interchanged with row ipiv[batch_id * min(M, N) + i] - 1 in the batch_id-th batch of A.

  • info – Pointer to a 1D array of int32. On exit, info[batch_id] = 0 indicates success for that batch, info[batch_id] = i > 0 indicates U(i,i) is exactly zero, meaning the factorization has been completed but the factor U is singular and division by zero will occur if it is used to solve a system of equations.

  • lda – Optional runtime leading dimension of matrix A. If not specified, the compile-time lda is used.

solve(a, ipiv, b, lda=None, ldb=None) None[source]#

Solves a system of linear equations Ax = B using the LU factorization with partial pivoting. The a operand must be a square matrix (M == N), otherwise this function will throw an exception.

This device function uses the previously computed factorization P @ A = L @ U to solve the system. Uses cuSOLVERDx 'getrs_partial_pivot'.

If lda and ldb are provided, uses runtime version with the specified leading dimensions. If not provided (None), uses compile-time version with default or constructor-provided leading dimensions.

Note

The transpose_mode parameter (set in constructor) determines which system is solved: A*x=B ('non_transposed'), A^T*x=B ('transposed'), or A^H*x=B ('conj_transposed' for complex matrices).

For more details, see: get_started/functions/getrs.html

Parameters:
  • a – Pointer to an array in shared memory, storing the batched factors L and U from the LU factorization with partial pivoting, according to the specified arrangement and leading dimension (see __init__()). The unit diagonal elements of L are not stored. See the factorize() documentation for details.

  • ipiv – Pointer to a 1D array of int32 in shared or global memory storing pivot indices. The array has size min(M, N) for each batch. The ipiv array should contain the pivot information from the factorize() call. ipiv[batch_id * min(M, N) + i] indicates that row i was interchanged with row ipiv[batch_id * min(M, N) + i] in the batch_id-th batch of A.

  • b – Pointer to an array in shared memory, storing the batched matrix according to the specified arrangement and leading dimension (see __init__()). The matrix is overwritten in place with the solution matrix x.

  • lda – Optional runtime leading dimension of matrix A. The lda and ldb must be specified together. If not specified, the compile-time lda is used.

  • ldb – Optional runtime leading dimension of matrix B. The lda and ldb must be specified together. If not specified, the compile-time ldb is used.