TriangularSolver#

class nvmath.device.TriangularSolver( size: Sequence[int], precision: type[floating], execution: str, side: str, fill_mode: str, diag: str, transpose_mode: str = 'non_transposed', *, sm=None, arrangement: Sequence[str] | None = None, batches_per_block: int | Literal['suggested'] | None = None, data_type: str | None = None, leading_dimensions: Sequence[int] | None = None, block_dim: Sequence[int] | Literal['suggested'] | None = None, )[source]#

A class that encapsulates triangular matrix-matrix solve device function ('trsm').

TRSM (TRiangular Solve for Matrix) solves a triangular linear system with multiple right-hand sides:

op(A) * X = B (if side='left')
X * op(A) = B (if side='right')

where:

A is the input batched triangular matrix stored in lower or upper mode
B is the batched right-hand side matrix, overwritten by the result X on exit
Operation op(A) indicates if matrix A is 'non_transposed', 'transposed' (for real data type), or 'conj_transposed' (for complex data type)

Memory Layout Requirements:

Matrices must be stored in shared memory according to their arrangement and leading dimension (ld):

For matrix A (M x M) with ``side=’left’``:

Column-major arrangement: Matrix shape (batches_per_block, M, M) with strides (lda * M, 1, lda)
Row-major arrangement: Matrix shape (batches_per_block, M, M) with strides (lda * M, lda, 1)

For matrix A (N x N) with ``side=’right’``:

Column-major arrangement: Matrix shape (batches_per_block, N, N) with strides (lda * N, 1, lda)
Row-major arrangement: Matrix shape (batches_per_block, N, N) with strides (lda * N, lda, 1)

For matrix B (M x N):

Column-major arrangement: Matrix shape (batches_per_block, M, N) with strides (ldb * N, 1, ldb)
Row-major arrangement: Matrix shape (batches_per_block, M, N) with strides (ldb * M, ldb, 1)

Note

The TRSM function is temporarily exposed in cuSolverDx library and will be moved to cuBLASDx library in a future release.

Parameters:

size (Sequence[int]) – Problem size specified as a sequence of 1 to 3 elements: (M,) (treated as (M, M, 1)), (M, N) (treated as (M, N, 1)), or (M, N, K). M and N represent the dimensions of matrices A and B. When side='left', A is M x M, otherwise when side='right', A is N x N. B is always M x N.
precision (type[np.floating]) – The computation precision specified as a numpy float dtype. Currently supports: numpy.float32, numpy.float64.
execution (str) – A string specifying the execution method. Supported values: 'Block'.
sm (ComputeCapability) – Target mathdx compute-capability.
side (str) – Side of matrix A in the triangular solve operation (required for TRSM). Can be one of: 'left', 'right'. If side='left', solves op(A) * X = B where A is M x M. If side='right', solves X * op(A) = B where A is N x N.
fill_mode (str) – Indicates which part of triangular matrix A is filled and should be used. Can be one of: 'upper', 'lower'. For lower fill mode, only the diagonal and lower triangular part of A is processed, the upper part is untouched. For upper fill mode, only the diagonal and upper triangular part of A is processed, the lower part is untouched.
diag (str) – Indicates whether the diagonal elements of matrix A are unity or not. Can be one of: 'unit', 'non_unit'. For unit diagonal mode, the diagonal elements of A are unity and are not accessed. For non-unit diagonal mode, the diagonal elements of A are used in the computation.
transpose_mode (str, optional) – Transpose mode for operation op(A) applied to matrix A. Can be one of: 'non_transposed', 'transposed', 'conj_transposed'. Defaults to 'non_transposed'.
arrangement (Sequence[str], optional) – Storage layout for matrices A and B, specified as a sequence of 2 elements (arr_A, arr_B). Each element can be one of: 'col_major', 'row_major'. Defaults to ("col_major", "col_major").
batches_per_block (int | Literal["suggested"], optional) – Number of batches to compute in parallel in a single CUDA block. Can be a non-zero integer or the string 'suggested' for automatic selection of an optimal value. We recommend using 1 for matrix A size larger than or equal to 16 x 16, and using 'suggested' for smaller sizes to achieve optimal performance. Defaults to 1.
data_type (str, optional) – The data type of the input matrices, can be one of: 'real', 'complex'. Defaults to 'real'.
leading_dimensions (Sequence[int], optional) – The leading dimensions for input matrices A and B, specified as a sequence of 2 elements (lda, ldb) or None. If not provided, it will be automatically deduced from size and arrangement. Note: When provided in the constructor, leading dimensions are set at compile-time. To use runtime leading dimensions (avoiding recompilation for different leading dimensions), provide the leading dimension parameters directly to the device methods instead.
block_dim (Sequence[int] | Literal["suggested"], optional) – The block dimension for launching the CUDA kernel, specified as a 1 to 3 integer sequence (x, y, z) where missing dimensions are assumed to be 1. Can be a sequence of 1 to 3 positive integers, the string 'suggested' for optimal value selection, or None for the default value.