LUSolver
class nvmath.device.LUSolver(
    size: Sequence[int],
    precision: type[floating],
    execution: str,
    *,
    sm=None,
    transpose_mode: str = 'non_transposed',
    arrangement: Sequence[str] | None = None,
    batches_per_block: int | Literal['suggested'] | None = None,
    data_type: str | None = None,
    leading_dimensions: Sequence[int] | None = None,
    block_dim: Sequence[int] | Literal['suggested'] | None = None,
)
A class that encapsulates the cuSOLVERDx LU factorization without pivoting and the corresponding linear solver for general matrices.
Available operations:
factorize: Computes the LU factorization A = L @ U, where L is a unit lower triangular matrix and U is an upper triangular matrix.
solve: Solves the system Ax = B using a previously computed LU factorization.
Memory Layout Requirements:
Matrices must be stored in shared memory according to their arrangement and leading dimension (ld):
For matrix A (M x N):
- Column-major arrangement: Matrix shape (batches_per_block, M, N) with strides (lda * N, 1, lda)
- Row-major arrangement: Matrix shape (batches_per_block, M, N) with strides (lda * M, lda, 1)
For matrix B (N x K):
- Column-major arrangement: Matrix shape (batches_per_block, N, K) with strides (ldb * K, 1, ldb)
- Row-major arrangement: Matrix shape (batches_per_block, N, K) with strides (ldb * N, ldb, 1)
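The stride rules above can be checked on the host with plain NumPy. The following is a minimal sketch, not part of nvmath: `batched_view` is a hypothetical helper that maps a flat buffer to a `(batches, rows, cols)` view using the element strides listed above, and the sizes and padded leading dimensions are arbitrary illustration values.

```python
import numpy as np


def batched_view(buf, batches, rows, cols, ld, arrangement):
    """Hypothetical helper: view a flat 1D buffer as (batches, rows, cols).

    Element strides follow the layout rules above; NumPy expects byte
    strides, so they are scaled by the item size.
    """
    if arrangement == "col_major":
        if ld < rows:
            raise ValueError("column-major ld must be >= number of rows")
        elem_strides = (ld * cols, 1, ld)
    elif arrangement == "row_major":
        if ld < cols:
            raise ValueError("row-major ld must be >= number of columns")
        elem_strides = (ld * rows, ld, 1)
    else:
        raise ValueError(f"unknown arrangement: {arrangement!r}")

    # as_strided performs no bounds checking, so verify the buffer size first.
    if buf.size < batches * elem_strides[0]:
        raise ValueError("buffer too small for the requested layout")

    byte_strides = tuple(s * buf.itemsize for s in elem_strides)
    return np.lib.stride_tricks.as_strided(
        buf, shape=(batches, rows, cols), strides=byte_strides
    )


# Matrix A with M=3, N=2, two batches, column-major, padded lda=4.
buf_col = np.arange(2 * 4 * 2, dtype=np.float32)
a_col = batched_view(buf_col, 2, 3, 2, 4, "col_major")

# Same logical shape, row-major, padded lda=3.
buf_row = np.arange(2 * 3 * 3, dtype=np.float32)
a_row = batched_view(buf_row, 2, 3, 2, 3, "row_major")

# Element (batch=1, i=2, j=1) sits at flat offset 1*8 + 2*1 + 1*4 = 14
# in the column-major buffer and at 1*9 + 2*3 + 1*1 = 16 in the row-major one.
print(a_col[1, 2, 1], a_row[1, 2, 1])
```

Because each buffer is filled with its own flat indices, the printed values are the flat offsets of the selected element, which makes the effect of the leading dimension padding easy to see.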
Note
If a nonsingular matrix A is diagonally dominant, it is safe to factorize without pivoting. If a matrix is not diagonally dominant, pivoting is usually required to ensure numerical stability (see LUPivotSolver).
Parameters:
size (Sequence[int]) – Problem size specified as a sequence of 1 to 3 elements: (M,) (treated as (M, M, 1)), (M, N) (treated as (M, N, 1)), or (M, N, K). M and N represent the dimensions of the matrix A used in factorization. K represents the number of columns in the right-hand side matrix B (dimensions N x K) for the solve operation. To use solve(), N must be equal to M, otherwise an exception will be thrown when solver.solve() is used.
precision (type[np.floating]) – The computation precision specified as a numpy float dtype. Currently supports: numpy.float32, numpy.float64.
execution (str) – A string specifying the execution method. Supported values: 'Block'.
sm (ComputeCapability) – Target mathdx compute-capability.
transpose_mode (str, optional) – Transpose mode of matrix A for the solve operation. Can be one of: 'non_transposed', 'transposed', 'conj_transposed'. Defaults to 'non_transposed'.
arrangement (Sequence[str], optional) – Storage layout for matrices A and B, specified as a sequence of 2 elements (arr_A, arr_B). Each element can be one of: 'col_major', 'row_major'. Defaults to ("col_major", "col_major").
batches_per_block (int | Literal["suggested"], optional) – Number of batches to compute in parallel in a single CUDA block. Can be a non-zero integer or the string 'suggested' for automatic selection of an optimal value. We recommend using 1 when matrix A is 16 x 16 or larger, and 'suggested' for smaller sizes to achieve optimal performance. Defaults to 1.
data_type (str, optional) – The data type of the input matrices. Can be one of: 'real', 'complex'. Defaults to 'real'.
leading_dimensions (Sequence[int], optional) – The leading dimensions for input matrices A and B, specified as a sequence of 2 elements (lda, ldb) or None. If not provided, they are automatically deduced from size and arrangement. Note: When provided in the constructor, leading dimensions are set at compile-time. To use runtime leading dimensions (avoiding recompilation for different leading dimensions), provide the leading dimension parameters directly to the device methods instead.
block_dim (Sequence[int] | Literal["suggested"], optional) – The block dimension for launching the CUDA kernel. Can be a sequence of 1 to 3 positive integers (x, y, z), where missing dimensions are assumed to be 1, the string 'suggested' for optimal value selection, or None for the default value.
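The size rules for the `size` parameter can be summarized in a few lines of plain Python. This is only an illustrative sketch: `normalize_size` and `can_solve` are hypothetical helpers, not nvmath functions, and they mirror the expansion to `(M, N, K)` and the `M == N` requirement for `solve()` described above.

```python
def normalize_size(size):
    """Hypothetical helper: expand a 1- to 3-element size into (M, N, K)."""
    dims = tuple(int(d) for d in size)
    if len(dims) == 1:
        (m,) = dims
        return (m, m, 1)      # (M,) is treated as (M, M, 1)
    if len(dims) == 2:
        m, n = dims
        return (m, n, 1)      # (M, N) is treated as (M, N, 1)
    if len(dims) == 3:
        return dims           # (M, N, K) is used as given
    raise ValueError("size must have 1 to 3 elements")


def can_solve(size):
    """Hypothetical helper: solve() requires a square matrix A (M == N)."""
    m, n, _ = normalize_size(size)
    return m == n


square = normalize_size((8,))
rectangular = normalize_size((8, 4))
with_rhs = normalize_size((8, 8, 3))
```

With these helpers, `can_solve((8, 4))` is false, which corresponds to the case where `solver.solve()` would raise.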
Attributes
- a_arrangement
- a_shape
- arrangement
- b_arrangement
- b_shape
- batches_per_block
- block_dim
- block_size
- data_type
- execution
- info_shape
- info_strides
- info_type
- k
- lda
- ldb
- leading_dimensions
- m
- n
- precision
- size
- sm
- transpose_mode
- value_type
Methods
- factorize(a, info, lda=None) → None
Computes the LU factorization of a general matrix A without pivoting.
This device function computes A = L @ U, where L is a unit lower triangular matrix and U is an upper triangular matrix. This variant is suitable for diagonally dominant matrices or when pivoting is not required. Uses cuSOLVERDx 'getrf_no_pivot'.
If lda is provided, the runtime version is used with the specified leading dimension. If lda is not provided (None), the compile-time version is used with the default or constructor-provided leading dimension.
Note
The transpose_mode parameter does not affect factorization. This operation always treats the input matrix as-is (non-transposed).
For more details, see: get_started/functions/getrf.html
- Parameters:
a – Pointer to an array in shared memory, storing the batched matrix according to the specified arrangement and leading dimension (see __init__()). The matrix is overwritten in place. On exit, it contains the factors L and U from the factorization A = L @ U. The unit diagonal elements of L are not stored.
info – Pointer to a 1D array of int32. On exit, info[batch_id] = 0 indicates success for that batch. info[batch_id] = i > 0 indicates that U(i,i) is exactly zero: the factorization has been completed, but the factor U is singular and division by zero will occur if it is used to solve a system of equations.
lda – Optional runtime leading dimension of matrix A. If not specified, the compile-time lda is used.
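To make the packed output and the `info` convention concrete, here is a host-side NumPy sketch of LU factorization without pivoting for a single matrix. It is a reference illustration only, not the cuSOLVERDx implementation: `lu_no_pivot` and `is_diagonally_dominant` are hypothetical helpers, and unlike the device function described above, this sketch stops at the first zero pivot instead of completing the factorization.

```python
import numpy as np


def is_diagonally_dominant(a):
    """True if |a[i, i]| >= sum of |a[i, j]| over j != i, for every row."""
    a = np.asarray(a)
    diag = np.abs(np.diag(a))
    off_diag = np.abs(a).sum(axis=1) - diag
    return bool(np.all(diag >= off_diag))


def lu_no_pivot(a):
    """Hypothetical reference: return (lu, info) for one matrix.

    lu stores U on and above the diagonal and the multipliers of L strictly
    below it; the unit diagonal of L is implied, not stored.
    info is 0 on success, or the 1-based index i of the first U(i, i) == 0.
    Assumption: this sketch returns early at a zero pivot.
    """
    lu = np.array(a, dtype=np.float64)  # work on a copy
    m, n = lu.shape
    for k in range(min(m, n)):
        pivot = lu[k, k]
        if pivot == 0.0:
            return lu, k + 1
        lu[k + 1:, k] /= pivot                                      # column of L
        lu[k + 1:, k + 1:] -= np.outer(lu[k + 1:, k], lu[k, k + 1:])  # update trailing block
    return lu, 0


a = np.array([[4.0, 1.0, 2.0],
              [1.0, 5.0, 1.0],
              [2.0, 1.0, 6.0]])
lu, info = lu_no_pivot(a)

# Unpack the factors to verify A = L @ U.
lower = np.tril(lu, -1) + np.eye(3)
upper = np.triu(lu)
reconstruction_ok = np.allclose(lower @ upper, a)

# A matrix with a zero leading entry fails immediately without pivoting.
_, info_singular = lu_no_pivot(np.array([[0.0, 1.0], [1.0, 0.0]]))
```

The second matrix is nonsingular but not diagonally dominant, which is the situation where a pivoting solver is the safer choice.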
- solve(a, b, lda=None, ldb=None) → None
Solves a system of linear equations Ax = B using the LU factorization without pivoting. The a operand must be a square matrix (M == N), otherwise this function will throw an exception.
This device function uses the previously computed factorization A = L @ U to solve the system. Uses cuSOLVERDx 'getrs_no_pivot'.
If lda and ldb are provided, the runtime version is used with the specified leading dimensions. If they are not provided (None), the compile-time version is used with the default or constructor-provided leading dimensions.
Note
The transpose_mode parameter (set in the constructor) determines which system is solved: A*x=B ('non_transposed'), A^T*x=B ('transposed'), or A^H*x=B ('conj_transposed' for complex matrices).
For more details, see: get_started/functions/getrs.html
- Parameters:
a – Pointer to an array in shared memory, storing the batched factors L and U from the LU factorization, according to the specified arrangement and leading dimension (see __init__()). The unit diagonal elements of L are not stored. See the factorize() documentation for details.
b – Pointer to an array in shared memory, storing the batched matrix according to the specified arrangement and leading dimension (see __init__()). The matrix is overwritten in place with the solution matrix x.
lda – Optional runtime leading dimension of matrix A. lda and ldb must be specified together. If not specified, the compile-time lda is used.
ldb – Optional runtime leading dimension of matrix B. lda and ldb must be specified together. If not specified, the compile-time ldb is used.
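The way the packed factors are consumed, and how `transpose_mode` changes the system being solved, can be sketched on the host with NumPy. This is an illustration under stated assumptions, not the device code: `solve_from_packed_lu` and `_solve_triangular` are hypothetical helpers that operate on a single square matrix and use plain forward and back substitution.

```python
import numpy as np


def _solve_triangular(t, b, lower):
    """Solve t @ x = b by substitution; t is lower or upper triangular."""
    n = t.shape[0]
    x = np.array(b, dtype=np.result_type(t, b, np.float64))
    rows = range(n) if lower else range(n - 1, -1, -1)
    for i in rows:
        if lower:
            x[i] -= t[i, :i] @ x[:i]          # subtract already-solved rows
        else:
            x[i] -= t[i, i + 1:] @ x[i + 1:]
        x[i] /= t[i, i]
    return x


def solve_from_packed_lu(lu, b, transpose_mode="non_transposed"):
    """Hypothetical reference: solve with packed L/U (unit diagonal of L implied)."""
    lu = np.asarray(lu)
    n = lu.shape[0]
    l = np.tril(lu, -1) + np.eye(n, dtype=lu.dtype)
    u = np.triu(lu)
    if transpose_mode == "non_transposed":    # A x = B: L y = B, then U x = y
        y = _solve_triangular(l, b, lower=True)
        return _solve_triangular(u, y, lower=False)
    if transpose_mode == "transposed":        # A^T x = B: U^T y = B, then L^T x = y
        y = _solve_triangular(u.T, b, lower=True)
        return _solve_triangular(l.T, y, lower=False)
    if transpose_mode == "conj_transposed":   # A^H x = B
        y = _solve_triangular(u.conj().T, b, lower=True)
        return _solve_triangular(l.conj().T, y, lower=False)
    raise ValueError(f"unknown transpose_mode: {transpose_mode!r}")


# Packed factors of A = [[4, 2], [1, 3.5]]: L = [[1, 0], [0.25, 1]], U = [[4, 2], [0, 3]].
packed = np.array([[4.0, 2.0],
                   [0.25, 3.0]])
a = np.array([[4.0, 2.0],
              [1.0, 3.5]])
x_true = np.array([[1.0], [2.0]])

x_plain = solve_from_packed_lu(packed, a @ x_true)
x_trans = solve_from_packed_lu(packed, a.T @ x_true, "transposed")
```

Both calls should recover `x_true` up to rounding, since the right-hand sides are built from it with A and A^T respectively.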