The nvmath.device module is experimental and potentially subject to future changes.

LUPivotSolver

class nvmath.device.LUPivotSolver(size: Sequence[int], precision: type[floating], execution: str, *, sm=None, transpose_mode: str = 'non_transposed', arrangement: Sequence[str] | None = None, batches_per_block: int | Literal['suggested'] | None = None, data_type: str | None = None, leading_dimensions: Sequence[int] | None = None, block_dim: Sequence[int] | Literal['suggested'] | None = None)

A class that encapsulates cuSOLVERDx LU factorization with partial pivoting and linear solver for general matrices.

Available operations:

factorize: Computes the LU factorization P @ A = L @ U with partial pivoting, where P is a permutation matrix, L is a unit lower triangular matrix and U is an upper triangular matrix.
solve: Solves the system Ax = B using a previously computed LU factorization with partial pivoting.
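
To make the two operations concrete, here is a minimal host-side sketch in NumPy of what factorize and solve compute mathematically. It is not the device API and not the cuSOLVERDx implementation; the function name lu_partial_pivot is hypothetical, the sketch handles square real matrices only, and it assumes A is non-singular.

```python
import numpy as np


def lu_partial_pivot(a):
    """Host-side reference (hypothetical helper): return (P, L, U) with P @ a == L @ U."""
    a = np.array(a, dtype=np.float64)  # work on a copy of the input
    m = a.shape[0]
    perm = np.arange(m)  # perm[i] = index of the original row now stored in row i
    for k in range(m - 1):
        # Partial pivoting: pick the row with the largest magnitude in column k.
        p = k + int(np.argmax(np.abs(a[k:, k])))
        if p != k:
            a[[k, p]] = a[[p, k]]
            perm[[k, p]] = perm[[p, k]]
        # Store the multipliers below the diagonal, then update the trailing block.
        a[k + 1:, k] /= a[k, k]
        a[k + 1:, k + 1:] -= np.outer(a[k + 1:, k], a[k, k + 1:])
    p_mat = np.eye(m)[perm]             # permutation matrix P
    l_mat = np.tril(a, -1) + np.eye(m)  # unit lower triangular L
    u_mat = np.triu(a)                  # upper triangular U
    return p_mat, l_mat, u_mat


A = np.array([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]])
B = np.array([[1.0], [2.0], [3.0]])

P, L, U = lu_partial_pivot(A)

# Solve A @ X = B by reusing the factors: L @ U @ X = P @ B.
Y = np.linalg.solve(L, P @ B)  # forward step (a general solver is used for brevity)
X = np.linalg.solve(U, Y)      # backward step
```

Reusing P, L and U for several right-hand sides is the reason the factorization and the solve are exposed as separate operations.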

Memory Layout Requirements:

Matrices must be stored in shared memory according to their arrangement and leading dimension (ld):

For matrix A (M x N):

Column-major arrangement: Matrix shape (batches_per_block, M, N) with strides (lda * N, 1, lda)
Row-major arrangement: Matrix shape (batches_per_block, M, N) with strides (lda * M, lda, 1)

For matrix B (N x K):

Column-major arrangement: Matrix shape (batches_per_block, N, K) with strides (ldb * K, 1, ldb)
Row-major arrangement: Matrix shape (batches_per_block, N, K) with strides (ldb * N, ldb, 1)
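
The stride formulas above can be sketched in plain Python as follows. The helpers element_strides and linear_offset are hypothetical names introduced only for illustration (they are not part of nvmath), and strides are counted in elements, not bytes.

```python
def element_strides(shape, ld, arrangement):
    """Strides in elements for a (batches_per_block, rows, cols) matrix.

    Hypothetical helper that mirrors the layout rules listed above.
    """
    _, rows, cols = shape
    if arrangement == "col_major":
        # Consecutive rows are adjacent; columns are ld elements apart.
        return (ld * cols, 1, ld)
    if arrangement == "row_major":
        # Consecutive columns are adjacent; rows are ld elements apart.
        return (ld * rows, ld, 1)
    raise ValueError(f"unknown arrangement: {arrangement!r}")


def linear_offset(index, strides):
    """Offset in elements of entry (batch, i, j) from the start of the buffer."""
    return sum(i * s for i, s in zip(index, strides))


# Matrix A with M = 3, N = 2, two batches per block.
col_strides = element_strides((2, 3, 2), ld=4, arrangement="col_major")  # lda = 4
row_strides = element_strides((2, 3, 2), ld=2, arrangement="row_major")  # lda = 2

col_offset = linear_offset((1, 2, 1), col_strides)
row_offset = linear_offset((1, 2, 1), row_strides)
```

With these inputs the column-major strides are (8, 1, 4) and the row-major strides are (6, 2, 1), so entry (1, 2, 1) sits at element offset 14 and 11 respectively.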

Note

This solver uses partial pivoting for improved numerical stability and is suitable for general matrices. If your matrix is diagonally dominant, you may consider using LUSolver which does not use pivoting and may be faster.

Parameters:

size (Sequence[int]) – Problem size specified as a sequence of 1 to 3 elements: (M,) (treated as (M, M, 1)), (M, N) (treated as (M, N, 1)), or (M, N, K). M and N are the dimensions of the matrix A used in factorization. K is the number of columns in the right-hand side matrix B (dimensions N x K) for the solve operation. The solve operation requires a square matrix A (N equal to M); otherwise, calling solver.solve() raises an exception.
precision (type[np.floating]) – The computation precision specified as a numpy float dtype. Currently supports: numpy.float32, numpy.float64.
execution (str) – A string specifying the execution method. Supported values: 'Block'.
sm (ComputeCapability) – Target mathdx compute-capability.
transpose_mode (str, optional) – Transpose mode of matrix A for the solve operation. Can be one of: 'non_transposed', 'transposed', 'conj_transposed'. Defaults to 'non_transposed'.
arrangement (Sequence[str], optional) – Storage layout for matrices A and B, specified as a sequence of 1 or 2 elements (arr_A, arr_B). Each element can be one of: 'col_major', 'row_major'. Defaults to ("col_major", "col_major").
batches_per_block (int | Literal["suggested"], optional) – Number of batches to compute in parallel in a single CUDA block. Can be a positive integer or the string 'suggested' for automatic selection of an optimal value. For best performance, we recommend 1 when matrix A is 16 x 16 or larger, and 'suggested' for smaller sizes. Defaults to 1.
data_type (str, optional) – The data type of the input matrices, can be one of: 'real', 'complex'. Defaults to 'real'.
leading_dimensions (Sequence[int], optional) – The leading dimensions for input matrices A and B, specified as a sequence of 1 or 2 elements (lda, ldb) or None. If not provided, they are deduced automatically from size and arrangement. Note: Leading dimensions provided in the constructor are fixed at compile time. To use runtime leading dimensions (and avoid recompiling for each new value), pass the leading dimension parameters directly to the device methods instead.
block_dim (Sequence[int] | Literal["suggested"], optional) – The block dimension for launching the CUDA kernel. Can be a sequence of 1 to 3 positive integers (x, y, z), where missing dimensions are assumed to be 1, the string 'suggested' for optimal value selection, or None for the default value.
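
The shorthand rules for size and block_dim described above can be sketched as follows. This is only an illustration of the documented conventions, not the library's own validation code; normalize_size, normalize_block_dim and can_solve are hypothetical helper names.

```python
def normalize_size(size):
    """Expand a 1- to 3-element size to (M, N, K) as the size parameter describes."""
    size = tuple(size)
    if not 1 <= len(size) <= 3:
        raise ValueError("size must have 1 to 3 elements")
    if len(size) == 1:
        return (size[0], size[0], 1)  # (M,) -> (M, M, 1)
    if len(size) == 2:
        return (size[0], size[1], 1)  # (M, N) -> (M, N, 1)
    return size                       # (M, N, K) is already complete


def normalize_block_dim(block_dim):
    """Pad a 1- to 3-element block dimension with 1s to (x, y, z)."""
    dims = tuple(block_dim)
    if not 1 <= len(dims) <= 3:
        raise ValueError("block_dim must have 1 to 3 elements")
    return dims + (1,) * (3 - len(dims))


def can_solve(size):
    """The solve operation needs a square A, that is N equal to M."""
    m, n, _ = normalize_size(size)
    return m == n


square = normalize_size((8,))
rectangular = normalize_size((8, 4))
block = normalize_block_dim((128,))
```

A rectangular problem such as (8, 4) can still be factorized, but can_solve reports False for it, which matches the documented restriction on solve.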