The nvmath.device module is experimental and potentially subject to future changes.

Solver#

A class that encapsulates a partial dense matrix factorization and solve device function.

Memory Layout Requirements:

Matrices must be stored in shared memory according to their arrangement and leading dimension (ld):

Column-major arrangement: Matrix shape (batches_per_block, rows, cols)
with strides (ld * cols, 1, ld)
Row-major arrangement: Matrix shape (batches_per_block, rows, cols)
with strides (ld * rows, ld, 1)

Parameters:

function (str) – Solver function to be executed by the execute() method. List of available options: 'potrf' (Cholesky factorization), 'potrs' (Linear system solve after Cholesky factorization), 'posv' (Fused Cholesky factorization with solve), 'getrf_no_pivot' (LU factorization without pivoting), 'getrs_no_pivot' (LU solve without pivoting), 'gesv_no_pivot' (Fused LU without pivoting factorization with solve), 'getrf_partial_pivot' (LU factorization with partial pivoting), 'getrs_partial_pivot' (LU solve with partial pivoting), 'gesv_partial_pivot' (Fused LU with partial pivoting factorization with solve), 'geqrf' (QR factorization), 'unmqr' (Multiplication of Q from QR factorization), 'gelqf' (LQ factorization), 'unmlq' (Multiplication of Q from LQ factorization), 'trsm' (Triangular matrix-matrix solve), 'gels' (Overdetermined or underdetermined least squares problems), 'ungqr' (Q matrix generation from geqrf factorization), 'unglq' (Q matrix generation from gelqf factorization), 'htev' (Eigenvalue solver for Hermitian tridiagonal matrices), 'heev' (Eigenvalue solver for Hermitian dense matrices), 'gtsv_no_pivot' (General tridiagonal matrix solve without pivoting). Functions 'ungqr', 'unglq', 'htev', 'heev', 'gtsv_no_pivot' require libmathdx 0.3.2 or later.
size (Sequence[int]) – Problem size specified as a sequence of 1 to 3 elements: (M,) (treated as (M, M, 1)), (M, N) (treated as (M, N, 1)), or (M, N, K). Please refer to cuSOLVERDx functionalities for detailed meaning.
precision (type[np.floating]) – The computation precision specified as a numpy float dtype. Currently supports: numpy.float32, numpy.float64.
execution (str) – A string specifying the execution method. Supported values: 'Block'.
sm (ComputeCapability) – Target mathdx compute-capability.
arrangement (Sequence[str], optional) – Storage layout for matrices A and B, specified as a sequence of 1 or 2 elements (arr_A, arr_B). Each element can be one of: 'col_major', 'row_major'. Defaults to ("col_major", "col_major").
transpose_mode (str, optional) – Transpose mode of matrix A. Refer to cuSOLVERDx documentation, as some functions do not support this option. Can be one of: 'non_transposed', 'transposed', 'conj_transposed'. Defaults to 'non_transposed'.
side (str, optional) – Side of matrix A in a multiplication operation. Required and supported only by functions: 'unmqr', 'unmlq', 'trsm'. Can be one of: 'left', 'right'.
diag (str, optional) – Indicates whether the diagonal elements of matrix A are ones or not. Required and supported only by 'trsm' functions. Can be one of: 'unit', 'non_unit'.
fill_mode (str, optional) – Indicates which part of matrix A is filled and should be used by function. Required and supported only by functions: 'potrf', 'potrs', 'posv', 'trsm', 'heev'. Can be one of: 'upper', 'lower'.
batches_per_block (int | Literal["suggested"], optional) – Number of batches to compute in parallel in a single CUDA block. Can be a positive integer or the string 'suggested' for automatic selection of an optimal value. We recommend using 1 for matrix A size larger than or equal to 16 x 16, and using 'suggested' for smaller sizes to achieve optimal performance. Defaults to 1.
data_type (str, optional) – The data type of the input matrices, can be one of: 'real', 'complex'. Defaults to 'real'.
leading_dimensions (Sequence[int], optional) – The leading dimensions for input matrices A and B, specified as a sequence of 1 or 2 elements (lda, ldb) or None. If not provided, it will be automatically deduced from size and arrangement.
block_dim (Sequence[int] | Literal["suggested"], optional) – The block dimension for launching the CUDA kernel, specified as a 1 to 3 integer sequence (x, y, z) where missing dimensions are assumed to be 1. Can be a sequence of 1 to 3 positive integers, the string 'suggested' for optimal value selection, or None for the default value.
job (str, optional) – Job type for eigenvalue computation. Required and supported only by functions: 'htev', 'heev'. For 'htev': 'no_vectors', 'all_vectors', 'multiply_vectors'. For 'heev': 'no_vectors', 'overwrite_vectors'. Requires libmathdx 0.3.2 or later.