Description Operators#
Description operators define the problem we want to solve. Combined with execution operators, they form a complete function descriptor that can be executed on a GPU.
Operator | Default Value | Description
---|---|---
Size | Not set | Problem size for matrices A, B, and X
Function | Not set | Choice of the function, e.g., posv
Arrangement | col_major | Arrangement of matrices A and B
TransposeMode | non_transposed | Transpose mode of matrix A
Side | Not set | Side of matrix A in the multiplication
Diag | Not set | Indication of whether the diagonal elements of matrix A are ones or not
FillMode | Not set | Fill mode of the symmetric (Hermitian) matrix A
Precision | float | Precision of matrices (float or double)
BatchesPerBlock | 1 | Number of batches to execute in parallel in a single CUDA thread block
Type | real | Type of input and output data (real or complex)
LeadingDimension | As defined by Size and Arrangement | Leading dimensions for matrices A and B
SM | Not set | Target CUDA architecture for which the cuSolverDx function should be generated
Operators are added (in arbitrary order) to construct the operation descriptor type. For example, to describe a Cholesky factorization and linear system solve (posv) for matrices A (m x n) and B (n x k) with complex double-precision values, m = 16, n = 16, k = 4, row-major layout for both A and B, lower fill for A, and execution on the Ampere architecture, one would write:
#include <cusolverdx.hpp>
using Solver = decltype(cusolverdx::Size<16, 16, 4>()
                        + cusolverdx::Precision<double>()
                        + cusolverdx::Type<cusolverdx::type::complex>()
                        + cusolverdx::Arrangement<cusolverdx::row_major>()
                        + cusolverdx::FillMode<cusolverdx::lower>()
                        + cusolverdx::Function<cusolverdx::function::posv>()
                        + cusolverdx::SM<800>());
For a function descriptor to be complete, the following is required:
- One, and only one, Size Operator.
- One, and only one, Function Operator.
- One, and only one, SM Operator.
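For instance, a minimal complete descriptor could be sketched as follows (alias names are illustrative). getrf_no_pivot is chosen here because it requires none of the conditionally mandatory operators (Side, Diag, FillMode), so all remaining operators fall back to their defaults:

#include <cusolverdx.hpp>

// A minimal sketch: only the mandatory Size, Function, and SM operators
// are specified; Precision, Type, Arrangement, etc. take default values.
using MinimalLU = decltype(cusolverdx::Size<32>()
                           + cusolverdx::Function<cusolverdx::function::getrf_no_pivot>()
                           + cusolverdx::SM<800>());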
Size Operator#
cusolverdx::Size<unsigned int M, unsigned int N = M, unsigned int K = 1>()
Sets the problem size of the function to be executed.
For POSV:
- M - number of rows in matrix A.
- N - number of columns in matrix A.
- K - number of right-hand sides in matrix B.
For example, for POSV, M, N, and K specify that the inverse of the square A (M x N) matrix is multiplied by B (M x K), which results in the X (N x K) matrix. If M and N are not equal, compile-time evaluation will fail, leading to a compilation error with a message detailing the reason.
Some problems don't require all three sizes to be defined. For example, for POTRF we only need M, because A is a square matrix and matrix B is not involved; in this case we can simply use Size<M>().
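As a sketch, the two cases above could be written as (alias names are illustrative):

// POSV: square 16 x 16 matrix A with 4 right-hand sides in B.
using PosvSize  = decltype(cusolverdx::Size<16, 16, 4>());
// POTRF: only M is needed; N defaults to M and K defaults to 1.
using PotrfSize = decltype(cusolverdx::Size<16>());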
Type Operator#
cusolverdx::Type<cusolverdx::type T>;
namespace cusolverdx {
    enum class type
    {
        real,
        complex
    };
}
Sets the type of input and output data used in computation. Use type::real for real data, and type::complex for complex data.
Precision Operator#
cusolverdx::Precision<Prec>;
Sets the floating-point precision of A, B, and X. cuSolverDx does not currently support mixed precision for A, B, and X.
The supported precision types are float and double. The set precision is used for both input and output.
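For example, the combination below describes matrices of complex double-precision values (a sketch; the alias name is illustrative, and operators can be combined in any order):

// Elements of A, B, and X are double-precision complex numbers.
using ValueDesc = decltype(cusolverdx::Precision<double>()
                           + cusolverdx::Type<cusolverdx::type::complex>());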
Arrangement Operator#
cusolverdx::Arrangement<cusolverdx::arrangement A, cusolverdx::arrangement B = A>;
namespace cusolverdx {
    enum class arrangement
    {
        col_major,
        row_major
    };

    inline constexpr auto col_major = arrangement::col_major;
    inline constexpr auto row_major = arrangement::row_major;
}
Sets the storage layout of the A and B matrices used in the function. The layout can be either column-major or row-major.
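Because the operator accepts a separate value for each matrix, mixed layouts can be described. A sketch (the alias name is illustrative):

// A stored row-major, B stored column-major.
using MixedLayout = decltype(cusolverdx::Arrangement<cusolverdx::row_major,
                                                     cusolverdx::col_major>());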
TransposeMode Operator#
cusolverdx::TransposeMode<cusolverdx::transpose A>;
namespace cusolverdx {
    enum class transpose
    {
        non_transposed,
        transposed,      // supported only for real data type
        conj_transposed, // supported only for complex data type
    };

    inline constexpr auto non_trans  = transpose::non_transposed;
    inline constexpr auto trans      = transpose::transposed;
    inline constexpr auto conj_trans = transpose::conj_transposed;
}
Sets the transpose mode of matrix A.
Note
The TransposeMode operator is not supported with the Cholesky functions, i.e., potrf, potrs, and posv. Specifying the operator with any of these functions will result in a compile-time error.
Only the non_transposed mode is supported for the factorization-only functions, including geqrf, gelqf, getrf_no_pivot, and getrf_partial_pivot. Specifying either transposed or conj_transposed with any of these functions will result in a compile-time error.
For other functions, the default value non_transposed is used if the operator is not specified.
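For example, a solve against the conjugate transpose of an LU-factored matrix could be sketched as below (alias name illustrative; per the rules above, this assumes complex data and a function that is neither a Cholesky nor a factorization-only function):

// Solve op(A) * X = B with op(A) = A^H, A given by its LU factors.
using TransSolve = decltype(cusolverdx::Size<16, 16, 2>()
                            + cusolverdx::Precision<float>()
                            + cusolverdx::Type<cusolverdx::type::complex>()
                            + cusolverdx::TransposeMode<cusolverdx::conj_trans>()
                            + cusolverdx::Function<cusolverdx::getrs_no_pivot>()
                            + cusolverdx::SM<800>());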
Side Operator#
cusolverdx::Side<cusolverdx::side S>;
namespace cusolverdx {
    enum class side
    {
        left,
        right,
    };

    inline constexpr auto left  = side::left;
    inline constexpr auto right = side::right;
}
Sets the side of matrix A in a multiplication operation.
Note
The Side operator must be specified when using the functions unmqr, unmlq, or trsm, and it is not supported for other functions.
Omitting the operator with unmqr, unmlq, or trsm, or specifying it with any other function, will result in a compile-time error.
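For example, applying Q from a QR factorization from the left could be sketched as below (alias name illustrative; any further operators a given function requires follow the rules in the corresponding sections):

// Multiply from the left by Q from a QR factorization: C = Q * C.
using ApplyQ = decltype(cusolverdx::Size<16, 16, 4>()
                        + cusolverdx::Precision<double>()
                        + cusolverdx::Type<cusolverdx::type::real>()
                        + cusolverdx::Side<cusolverdx::left>()
                        + cusolverdx::Function<cusolverdx::unmqr>()
                        + cusolverdx::SM<800>());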
Diag Operator#
cusolverdx::Diag<cusolverdx::diag D>;
namespace cusolverdx {
    enum class diag
    {
        non_unit,
        unit,
    };
}
Indicates whether the diagonal elements of matrix A are ones or not.
Note
The Diag operator must be specified when using the function trsm, and it is not supported for other functions.
Omitting the operator with trsm, or specifying it with any other function, will result in a compile-time error. A combined example using Diag together with Side and FillMode is shown at the end of the FillMode section below.
FillMode Operator#
cusolverdx::FillMode<cusolverdx::fill_mode A>;
namespace cusolverdx {
    enum class fill_mode
    {
        upper,
        lower,
    };

    inline constexpr auto upper = fill_mode::upper;
    inline constexpr auto lower = fill_mode::lower;
}
Indicates which part (lower or upper) of the dense matrix A is filled and consequently should be used by the function.
Note
The FillMode operator must be specified when using the functions potrf, potrs, posv, or trsm, and it is not supported for other functions.
Omitting the operator with potrf, potrs, posv, or trsm, or specifying it with any other function, will result in a compile-time error.
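Side, Diag, and FillMode come together in a triangular solve. A sketch of a complete trsm descriptor (alias name illustrative):

// Solve A * X = B from the left, where A is lower triangular
// with a non-unit diagonal.
using Trsm = decltype(cusolverdx::Size<16, 16, 4>()
                      + cusolverdx::Precision<float>()
                      + cusolverdx::Type<cusolverdx::type::real>()
                      + cusolverdx::Side<cusolverdx::left>()
                      + cusolverdx::Diag<cusolverdx::diag::non_unit>()
                      + cusolverdx::FillMode<cusolverdx::lower>()
                      + cusolverdx::Function<cusolverdx::trsm>()
                      + cusolverdx::SM<800>());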
BatchesPerBlock Operator#
cusolverdx::BatchesPerBlock<unsigned int BPB>()
Number of batches to compute in parallel within a single CUDA block. The default is 1 batch per block.
Note
The batches-per-block value directly impacts performance and shared memory usage. We recommend using the default of 1 when matrix A is 16 x 16 or larger; for smaller A, always use the suggested batches per block trait to get optimal performance.
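For example, several small factorizations can share one CUDA block. A sketch (alias name illustrative; the value 4 is purely illustrative, and in practice the suggested batches per block trait should guide the choice):

// Four independent 8 x 8 Cholesky factorizations per CUDA block.
using BatchedPotrf = decltype(cusolverdx::Size<8>()
                              + cusolverdx::Precision<float>()
                              + cusolverdx::Type<cusolverdx::type::real>()
                              + cusolverdx::FillMode<cusolverdx::lower>()
                              + cusolverdx::Function<cusolverdx::potrf>()
                              + cusolverdx::BatchesPerBlock<4>()
                              + cusolverdx::SM<800>());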
LeadingDimension Operator#
cusolverdx::LeadingDimension<unsigned int LDA, unsigned int LDB>()
Defines leading dimensions for matrices A and B. Note that the leading dimensions correspond to the shared memory storage of the matrices that are passed to cuSolverDx.
The leading dimension of a matrix is the stride (in elements) to the beginning of the next column for a column-major matrix, or the next row for a row-major matrix.
Based on the Arrangement operator in the description of a cuSolverDx operation, the dimensions of the A and B matrices can be described in the following way:
- Real dimensions of matrix A are \(LDA \times N\) with \(LDA \geq M\) if A is column-major, or \(LDA \times M\) with \(LDA \geq N\) if A is row-major.
- Real dimensions of matrix B are \(LDB \times K\) with \(LDB \geq N\) if B is column-major, or \(LDB \times N\) with \(LDB \geq K\) if B is row-major.
Note
In addition to the compile-time LeadingDimension Operator, cuSolverDx also supports runtime leading dimensions (see Execution Method). If runtime leading dimensions are passed to the execution methods, they override the compile-time values, and users need to call the get_shared_memory_size(lda, ldb) function to calculate the matrices’ shared memory size.
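For example, a 16 x 16 matrix A can be stored with padded columns. A sketch (alias name illustrative; the padding values 17 and 18 are arbitrary):

// A is 16 x 16 but stored with leading dimension 17; B is 16 x 4
// with leading dimension 18. Padding like this can, for instance,
// reduce shared memory bank conflicts.
using Padded = decltype(cusolverdx::Size<16, 16, 4>()
                        + cusolverdx::Precision<double>()
                        + cusolverdx::Type<cusolverdx::type::real>()
                        + cusolverdx::FillMode<cusolverdx::lower>()
                        + cusolverdx::Function<cusolverdx::posv>()
                        + cusolverdx::LeadingDimension<17, 18>()
                        + cusolverdx::SM<800>());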
Function Operator#
cusolverdx::Function<cusolverdx::function F>()
namespace cusolverdx {
    enum class function
    {
        potrf,               // Cholesky Factorization
        potrs,               // Linear system solve after Cholesky factorization
        posv,                // Cholesky Factorization and Solve
        getrf_no_pivot,      // LU Factorization without pivoting
        getrs_no_pivot,      // LU Solve without pivoting
        gesv_no_pivot,       // LU Factor and Solve without pivoting
        getrf_partial_pivot, // LU Factorization with partial pivoting
        getrs_partial_pivot, // LU Solve with partial pivoting
        gesv_partial_pivot,  // LU Factor and Solve with partial pivoting
        geqrf,               // QR Factorization
        gelqf,               // LQ Factorization
        unmqr,               // Multiplication of Q from QR factorization
        unmlq,               // Multiplication of Q from LQ factorization
        gels,                // Least squares system solve after QR and LQ factorizations
        trsm,                // Triangular matrix-matrix solve
    };

    inline constexpr auto potrf               = function::potrf;
    inline constexpr auto potrs               = function::potrs;
    inline constexpr auto posv                = function::posv;
    inline constexpr auto getrf_no_pivot      = function::getrf_no_pivot;
    inline constexpr auto getrs_no_pivot      = function::getrs_no_pivot;
    inline constexpr auto gesv_no_pivot       = function::gesv_no_pivot;
    inline constexpr auto getrf_partial_pivot = function::getrf_partial_pivot;
    inline constexpr auto getrs_partial_pivot = function::getrs_partial_pivot;
    inline constexpr auto gesv_partial_pivot  = function::gesv_partial_pivot;
    inline constexpr auto geqrf               = function::geqrf;
    inline constexpr auto gelqf               = function::gelqf;
    inline constexpr auto unmqr               = function::unmqr;
    inline constexpr auto unmlq               = function::unmlq;
    inline constexpr auto gels                = function::gels;
    inline constexpr auto trsm                = function::trsm;
}
Sets the Solver function to be executed. See cuSolverDx functionality for details of the supported functions.
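Factorization-only and combined factor-and-solve functions take the same surrounding operators, so switching between them is a one-operator change. A sketch (alias names illustrative; this assumes, as in the example at the top of this section, that partial descriptions compose with operator+):

using Base = decltype(cusolverdx::Size<16, 16, 2>()
                      + cusolverdx::Precision<float>()
                      + cusolverdx::Type<cusolverdx::type::real>()
                      + cusolverdx::SM<800>());

// LU factorization only.
using Getrf = decltype(Base() + cusolverdx::Function<cusolverdx::getrf_partial_pivot>());
// LU factorization and solve of A * X = B.
using Gesv  = decltype(Base() + cusolverdx::Function<cusolverdx::gesv_partial_pivot>());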
SM Operator#
cusolverdx::SM<unsigned int CC>()
Sets the target architecture CC for the underlying Solver function to use. Supported architectures are:
- Volta: 700 and 720 (sm_70, sm_72).
- Turing: 750 (sm_75).
- Ampere: 800, 860 and 870 (sm_80, sm_86, sm_87).
- Ada: 890 (sm_89).
- Hopper: 900 (sm_90).
- Blackwell: 1000, 1010, 1030, 1200, 1210 (sm_100, sm_101, sm_103, sm_120, sm_121).
Note
When compiling cuSolverDx for XYa or XYf compute capability, use XY0 in the SM operator (see also CUDA C++ Programming Guide: Feature Availability).
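For example, when building a kernel for sm_90a, the descriptor still uses 900 (a minimal sketch; alias name illustrative):

// Target compute capability 9.0a (sm_90a): the SM operator takes 900.
using Arch = decltype(cusolverdx::SM<900>());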
Warning
Starting with cuSolverDx 0.2.0, support for the NVIDIA Xavier Tegra SoC (SM<720> or sm_72) is deprecated.
Warning
Support for architectures sm_103 and sm_121 is experimental in this release.
Warning
It is not guaranteed that executions of exactly the same complete function description on GPUs of different CUDA architectures will produce bit-identical results.