Description Operators#
Description operators define the problem we want to solve. Combined with execution operators, they form a complete function descriptor that can be executed on a GPU.
| Operator | Default Value | Description |
|---|---|---|
| Function | Not set | Choice of the function to execute. |
| Size | Not set | Problem size for matrices used in the function. |
| Type | | Type of input and output data (real or complex). |
| Precision | | Precision of matrices A, B, and X. |
| Arrangement | | Arrangement of matrices A and B. |
| LeadingDimension | As defined by Size and Arrangement | Leading dimensions for matrices A and B. |
| TransposeMode | non_transposed | Transpose mode of matrix A. |
| Side | | Side of matrix A in a multiplication operation. |
| Diag | Not set | Indicates whether the diagonal elements of matrix A are ones. |
| FillMode | | Fill mode of the symmetric (Hermitian) matrix A. |
| Job | no_vectors | Option for computing eigenvectors in the function (htev or heev). |
| BatchesPerBlock | 1 | Number of batches to execute in parallel in a single CUDA thread block. |
| SM | Not set | Target CUDA architecture for which the cuSolverDx function should be generated. |
Operators can be added in any order to construct the operation descriptor type. For example, to describe a Cholesky factorization and linear system solve (posv) for matrices A (m x n) and B (n x k) with double-complex values, m = 16, n = 16, k = 4, row-major layout for both A and B, lower fill for A, and execution on the Ampere architecture, one would write:
#include <cusolverdx.hpp>
using Solver = decltype(cusolverdx::Size<16, 16, 4>()
+ cusolverdx::Precision<double>()
+ cusolverdx::Type<cusolverdx::type::complex>()
+ cusolverdx::Arrangement<cusolverdx::row_major>()
+ cusolverdx::FillMode<cusolverdx::lower>()
+ cusolverdx::Function<cusolverdx::function::posv>()
+ cusolverdx::SM<800>());
For a function descriptor to be complete, the following is required:
One, and only one, Size Operator.
One, and only one, Function Operator.
One, and only one, SM Operator.
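The way operators combine with + inside decltype can be illustrated with a small stand-alone model. The sketch below does not use cuSolverDx: every name in the sketch namespace is a hypothetical stand-in, and the library may implement composition quite differently. It only shows how adding empty tag objects in any order can build a type, and how the "one, and only one" rule for Size, Function, and SM could be checked at compile time.

```cpp
#include <type_traits>

namespace sketch {  // hypothetical model, not the cuSolverDx implementation

struct OperatorBase {};  // marks a type as a description operator

enum class function { potrf, posv };

template <unsigned M, unsigned N = M, unsigned K = 1> struct Size : OperatorBase {};
template <function F> struct Function : OperatorBase {};
template <unsigned CC> struct SM : OperatorBase {};
template <class P> struct Precision : OperatorBase {};

// A descriptor is a type list of the operators added so far.
template <class... Ops> struct Descriptor {};

template <class T>
inline constexpr bool is_operator = std::is_base_of_v<OperatorBase, T>;

// operator + operator starts a descriptor.
template <class A, class B, std::enable_if_t<is_operator<A> && is_operator<B>, int> = 0>
constexpr Descriptor<A, B> operator+(A, B) { return {}; }

// descriptor + operator appends to it.
template <class... Ops, class B, std::enable_if_t<is_operator<B>, int> = 0>
constexpr Descriptor<Ops..., B> operator+(Descriptor<Ops...>, B) { return {}; }

template <class T> struct is_size : std::false_type {};
template <unsigned M, unsigned N, unsigned K> struct is_size<Size<M, N, K>> : std::true_type {};
template <class T> struct is_function : std::false_type {};
template <function F> struct is_function<Function<F>> : std::true_type {};
template <class T> struct is_sm : std::false_type {};
template <unsigned CC> struct is_sm<SM<CC>> : std::true_type {};

// Complete means exactly one Size, one Function, and one SM operator.
template <class... Ops>
constexpr bool is_complete(Descriptor<Ops...>) {
    return (0 + ... + int(is_size<Ops>::value)) == 1 &&
           (0 + ... + int(is_function<Ops>::value)) == 1 &&
           (0 + ... + int(is_sm<Ops>::value)) == 1;
}

}  // namespace sketch

// Order does not matter: both spellings yield a complete descriptor.
using SolverA = decltype(sketch::Size<16, 16, 4>() + sketch::Precision<double>() +
                         sketch::Function<sketch::function::posv>() + sketch::SM<800>());
using SolverB = decltype(sketch::SM<800>() + sketch::Function<sketch::function::posv>() +
                         sketch::Size<16, 16, 4>());
static_assert(sketch::is_complete(SolverA{}), "SolverA has Size, Function and SM");
static_assert(sketch::is_complete(SolverB{}), "SolverB has Size, Function and SM");
```

Because only the resulting type is used, the tag objects are never needed at run time; the whole description is resolved by the compiler.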
Function Operator#
cusolverdx::Function<cusolverdx::function F>()
namespace cusolverdx {
enum class function
{
potrf, // Cholesky Factorization
potrs, // Linear system solve after Cholesky factorization
posv, // Cholesky Factorization and Solve
getrf_no_pivot, // LU Factorization without pivoting
getrs_no_pivot, // LU Solve without pivoting
gesv_no_pivot, // LU Factor and Solve without pivoting
getrf_partial_pivot, // LU Factorization with partial pivoting
getrs_partial_pivot, // LU Solve with partial pivoting
gesv_partial_pivot, // LU Factor and Solve with partial pivoting
geqrf, // QR Factorization
gelqf, // LQ Factorization
unmqr, // Multiplication of Q from QR factorization
unmlq, // Multiplication of Q from LQ factorization
ungqr, // Generation of Q from QR factorization
unglq, // Generation of Q from LQ factorization
gels, // Least squares system solve after QR and LQ factorizations
trsm, // Triangular matrix-matrix solve
htev, // Eigensolver for symmetric (Hermitian) tridiagonal matrix
heev, // Eigensolver for symmetric (Hermitian) matrix
bdsvd, // Singular value decomposition for bidiagonal matrix
gesvd, // Singular value decomposition for general matrix
};
inline constexpr auto potrf = function::potrf;
inline constexpr auto potrs = function::potrs;
inline constexpr auto posv = function::posv;
inline constexpr auto getrf_no_pivot = function::getrf_no_pivot;
inline constexpr auto getrs_no_pivot = function::getrs_no_pivot;
inline constexpr auto gesv_no_pivot = function::gesv_no_pivot;
inline constexpr auto getrf_partial_pivot = function::getrf_partial_pivot;
inline constexpr auto getrs_partial_pivot = function::getrs_partial_pivot;
inline constexpr auto gesv_partial_pivot = function::gesv_partial_pivot;
inline constexpr auto geqrf = function::geqrf;
inline constexpr auto gelqf = function::gelqf;
inline constexpr auto unmqr = function::unmqr;
inline constexpr auto unmlq = function::unmlq;
inline constexpr auto ungqr = function::ungqr;
inline constexpr auto unglq = function::unglq;
inline constexpr auto gels = function::gels;
inline constexpr auto trsm = function::trsm;
inline constexpr auto htev = function::htev;
inline constexpr auto heev = function::heev;
inline constexpr auto bdsvd = function::bdsvd;
inline constexpr auto gesvd = function::gesvd;
}
Sets the Solver function to be executed. See cuSolverDx functionality for details of the supported functions.
Size Operator#
cusolverdx::Size<unsigned int M, unsigned int N = M, unsigned int K = 1>()
Sets the problem size of the function to be executed.
Note that for different functions, the meaning of M, N, and K is different. For example, for POSV:
M - number of rows in matrix A.
N - number of columns in matrix A.
K - number of right-hand sides in matrix B.
Here M, N, and K specify that the inverse of the square A (M x N) matrix is multiplied by B (M x K) which results in X (N x K) matrix. If M and N are not equal, compile-time evaluation will fail leading to a compilation error with a message detailing the reason.
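A compile-time size check of this kind can be pictured with a static_assert inside a template. The sketch below is a stand-alone illustration: PosvSize is a hypothetical name, not a cuSolverDx type, and the real error message will differ. The defaults N = M and K = 1 mirror the Size operator signature.

```cpp
namespace sketch {  // hypothetical helper, not part of cuSolverDx

// Models the POSV reading of Size<M, N, K>: A is M x N (square), B is M x K.
template <unsigned M, unsigned N = M, unsigned K = 1>
struct PosvSize {
    static_assert(M == N, "POSV requires a square matrix A: N must be equal to M");
    static constexpr unsigned m = M;  // rows of A
    static constexpr unsigned n = N;  // columns of A
    static constexpr unsigned k = K;  // right-hand sides in B
};

}  // namespace sketch

using Square = sketch::PosvSize<16, 16, 4>;  // M == N, four right-hand sides
using Short  = sketch::PosvSize<16>;         // N defaults to M, K defaults to 1
static_assert(Square::k == 4, "four right-hand sides");
static_assert(Short::n == 16 && Short::k == 1, "defaults applied");
// Instantiating sketch::PosvSize<16, 8, 4> would stop compilation with the message above.
```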
Some problems don’t require all three sizes to be defined. For example, POTRF only needs M because A is a square matrix and matrix B is not involved, so Size<M>() is sufficient. The table below describes the required sizes for each function.
| Function | Size Operator | Detailed Description |
|---|---|---|
| | | M is the size of square matrix A (M x M). N, if specified, must be equal to M. K, if specified, is ignored. |
| | | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K). |
| | | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K). |
| | | M is the number of rows in matrix A (M x N). N is the number of columns in matrix A (M x N). K, if specified, is ignored. |
| | | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K). |
| | | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K). |
| | | M is the number of rows in matrix A (M x N). N is the number of columns in matrix A (M x N). K, if specified, is ignored. |
| | | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K). |
| | | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K). |
| | | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K). |
| | | M is the number of rows in matrix A (M x N). N is the number of columns in matrix A (M x N). K, if specified, is ignored. |
| | | M is the number of rows in matrix A (M x N). N is the number of columns in matrix A (M x N). K, if specified, is ignored. |
| | | M is the number of rows in matrix C (M x N). N is the number of columns in matrix C (M x N). K is the number of Householder reflectors to apply. The size of matrix A is (M x K) if left side multiplication, and (N x K) if right side multiplication. |
| | | M is the number of rows in matrix C (M x N). N is the number of columns in matrix C (M x N). K is the number of rows in matrix Q used in multiplication. The size of matrix A is (K x M) if left side multiplication, and (K x N) if right side multiplication. |
| | | M is the number of rows in matrix A (M x N). N is the number of columns in matrix A (M x N). K is the number of Householder reflectors to apply. M >= N >= K is required. |
| | | M is the number of rows in matrix A (M x N). N is the number of columns in matrix A (M x N). K is the number of Householder reflectors to apply. N >= M >= K is required. |
| | | M is the number of rows in matrix A (M x N). N is the number of columns in matrix A (M x N). K is the number of right-hand sides in matrix B. |
| | | M is the number of rows in matrix B (M x N). N is the number of columns in matrix B (M x N). K, if specified, is ignored. The size of the triangular matrix A is (M x M) if left side multiplication, and (N x N) if right side multiplication. |
| | | M is the size of symmetric or Hermitian tridiagonal matrix A (M x M). N, if specified, is ignored. K, if specified, is ignored. |
| | | M is the size of symmetric or Hermitian matrix A (M x M). N, if specified, has to be equal to M. K, if specified, is ignored. |
| | | M is the size of bidiagonal matrix A (M x M). N, if specified, is ignored. K, if specified, is ignored. |
| | | M is the number of rows in matrix A (M x N). N is the number of columns in matrix A (M x N). K, if specified, is ignored. |
Type Operator#
cusolverdx::Type<cusolverdx::type T>;
namespace cusolverdx {
enum class type
{
real,
complex
};
}
Sets the type of input and output data used in computation. Use type::real for real data type, and type::complex for complex data type.
Precision Operator#
cusolverdx::Precision<Prec>;
Sets the floating-point precision of A, B, and X. cuSolverDx does not currently support mixed precision for A, B, and X.
The supported precision types are float and double. The selected precision is used for both input and output.
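Together, the Type and Precision operators determine the element type of the matrices. A minimal host-only sketch of that mapping follows; data_type_t and the use of std::complex are assumptions made for illustration, and the type that cuSolverDx actually exposes for complex data in device code may be different.

```cpp
#include <complex>
#include <type_traits>

namespace sketch {  // hypothetical model, not the cuSolverDx implementation

enum class type { real, complex };

// Real data uses the precision type directly; complex data wraps it.
// std::complex is an assumption made for this host-only sketch.
template <type T, class Prec>
using data_type_t = std::conditional_t<T == type::real, Prec, std::complex<Prec>>;

}  // namespace sketch

static_assert(std::is_same_v<sketch::data_type_t<sketch::type::real, float>, float>,
              "real + float gives float");
static_assert(std::is_same_v<sketch::data_type_t<sketch::type::complex, double>,
                             std::complex<double>>,
              "complex + double gives a double-precision complex type");
```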
Arrangement Operator#
cusolverdx::Arrangement<cusolverdx::arrangement A, cusolverdx::arrangement B = A>;
namespace cusolverdx {
enum class arrangement
{
col_major,
row_major
};
inline constexpr auto col_major = arrangement::col_major;
inline constexpr auto row_major = arrangement::row_major;
}
Sets the storage layout for A and B matrices used in the function. The order can be either column-major or row-major.
LeadingDimension Operator#
cusolverdx::LeadingDimension<unsigned int LDA, unsigned int LDB>()
Defines leading dimensions for matrices A and B. Note that the leading dimensions correspond to the shared memory storage of the matrices that are passed to cuSolverDx.
The leading dimension of a matrix is a stride (in elements) to the beginning of the next column for a column-major matrix or the next row for a row-major matrix.
Based on the Arrangement operator in the description of a cuSolverDx operation, the dimensions of the A and B matrices can be described as follows:
Real dimensions of matrix A are \(LDA \times N\) with \(LDA \geq M\) if A is column-major, or \(LDA \times M\) with \(LDA \geq N\) if A is row-major.
Real dimensions of matrix B are \(LDB \times K\) with \(LDB \geq N\) if B is column-major, or \(LDB \times N\) with \(LDB \geq K\) if B is row-major.
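The effect of a leading dimension on addressing can be sketched with two small helpers. The names offset and storage_elems are hypothetical and are not cuSolverDx functions; they only restate the stride rule above in code.

```cpp
#include <cstddef>

namespace sketch {  // hypothetical helpers, not part of cuSolverDx

enum class arrangement { col_major, row_major };

// Offset (in elements) of entry (row, col) in a matrix stored with leading dimension ld.
constexpr std::size_t offset(arrangement arr, std::size_t row, std::size_t col, std::size_t ld) {
    return arr == arrangement::col_major ? row + col * ld   // ld is the stride between columns
                                         : row * ld + col;  // ld is the stride between rows
}

// Number of elements occupied by a rows x cols matrix with leading dimension ld.
constexpr std::size_t storage_elems(arrangement arr, std::size_t rows, std::size_t cols,
                                    std::size_t ld) {
    return arr == arrangement::col_major ? ld * cols : ld * rows;
}

}  // namespace sketch

// A is 3 x 2, column-major, padded to LDA = 4: element (1, 1) sits at 1 + 1 * 4.
static_assert(sketch::offset(sketch::arrangement::col_major, 1, 1, 4) == 5, "column-major offset");
// The padded matrix occupies LDA x N = 4 x 2 elements.
static_assert(sketch::storage_elems(sketch::arrangement::col_major, 3, 2, 4) == 8, "LDA x N");
```

Padding the leading dimension beyond the matrix extent changes only the stride, not which entries belong to the matrix.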
Note
In addition to the compile-time LeadingDimension Operator, cuSolverDx also supports runtime leading dimensions (see Execution Method). If runtime leading dimensions are passed to the execution methods, they override the compile-time values, and users need to call the get_shared_memory_size(lda, ldb) function to calculate the problem’s required shared memory size.
TransposeMode Operator#
cusolverdx::TransposeMode<cusolverdx::transpose A>;
namespace cusolverdx {
enum class transpose
{
non_transposed,
transposed, // supported only for real data type
conj_transposed, // supported only for complex data type
};
inline constexpr auto non_trans = transpose::non_transposed;
inline constexpr auto trans = transpose::transposed;
inline constexpr auto conj_trans = transpose::conj_transposed;
}
Sets the transpose mode of matrix A.
Note
TransposeMode operator is not supported with Cholesky functions, i.e., potrf, potrs, posv. Specifying the operator with any of these functions will result in a compile-time error.
Only the non_transposed mode is supported for some functions, including geqrf, gelqf, getrf_no_pivot, getrf_partial_pivot. Specifying the operator to be either transposed or conj_transposed with any of these functions will either result in a compile-time error or be ignored.
For other functions, the default value non_transposed is used if the operator is not specified.
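The pairing between data type and transpose mode noted in the enum comments can be written as a small compile-time predicate. This is only a sketch: is_supported is a hypothetical helper, and it does not model the per-function restrictions listed in the note.

```cpp
namespace sketch {  // hypothetical helper, not part of cuSolverDx

enum class type { real, complex };
enum class transpose { non_transposed, transposed, conj_transposed };

// transposed is for real data only, conj_transposed for complex data only.
constexpr bool is_supported(type t, transpose tr) {
    switch (tr) {
        case transpose::non_transposed:  return true;
        case transpose::transposed:      return t == type::real;
        case transpose::conj_transposed: return t == type::complex;
    }
    return false;  // unreachable for valid enumerators
}

}  // namespace sketch

static_assert(sketch::is_supported(sketch::type::real, sketch::transpose::transposed),
              "real data may be transposed");
static_assert(!sketch::is_supported(sketch::type::real, sketch::transpose::conj_transposed),
              "conjugate transpose needs complex data");
```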
Side Operator#
cusolverdx::Side<cusolverdx::side S>;
namespace cusolverdx {
enum class side
{
left,
right,
};
inline constexpr auto left = side::left;
inline constexpr auto right = side::right;
}
Sets the side of matrix A in a multiplication operation.
Note
The Side operator is required to be specified when using the function unmqr, unmlq, or trsm, and not supported for other functions.
Not defining the operator when using the function unmqr, unmlq, or trsm, or defining the operator when using any other function will either result in a compile-time error or be ignored.
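For trsm, the side also fixes the order of the triangular matrix A relative to the M x N matrix B, as listed in the Size operator table. A minimal sketch of that relation, using the hypothetical helper trsm_a_order:

```cpp
namespace sketch {  // hypothetical helper, not part of cuSolverDx

enum class side { left, right };

// Order of the triangular matrix A in trsm for an M x N matrix B:
// A is M x M when it is applied from the left, N x N when applied from the right.
constexpr unsigned trsm_a_order(side s, unsigned m, unsigned n) {
    return s == side::left ? m : n;
}

}  // namespace sketch

static_assert(sketch::trsm_a_order(sketch::side::left, 16, 4) == 16, "A is 16 x 16");
static_assert(sketch::trsm_a_order(sketch::side::right, 16, 4) == 4, "A is 4 x 4");
```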
Diag Operator#
cusolverdx::Diag<cusolverdx::diag D>;
namespace cusolverdx {
enum class diag
{
non_unit,
unit,
};
}
Indicates whether the diagonal elements of matrix A are ones or not.
Note
The Diag operator is required to be specified when using the function trsm, and not supported for other functions.
Not defining the operator when using the function trsm, or defining the operator with any other function will either result in a compile-time error or be ignored.
FillMode Operator#
cusolverdx::FillMode<cusolverdx::fill_mode A>;
namespace cusolverdx {
enum class fill_mode
{
upper,
lower,
};
inline constexpr auto upper = fill_mode::upper;
inline constexpr auto lower = fill_mode::lower;
}
Indicates which part (lower or upper) of the dense matrix A is filled and consequently should be used by the function.
Note
The FillMode operator is required when using the functions potrf, potrs, posv, or trsm, and is not supported for other functions.
Not defining the operator when using potrf, potrs, posv, or trsm, or defining the operator when using any other function will either result in a compile-time error or be ignored.
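Which entries of A a fill mode selects can be expressed as a predicate on the row and column index. The helper in_filled_part below is hypothetical and only illustrates the lower and upper triangles, with the diagonal belonging to both.

```cpp
namespace sketch {  // hypothetical helper, not part of cuSolverDx

enum class fill_mode { upper, lower };

// True if entry (row, col) lies in the triangle selected by the fill mode.
// The diagonal (row == col) is part of both triangles.
constexpr bool in_filled_part(fill_mode f, unsigned row, unsigned col) {
    return f == fill_mode::lower ? row >= col : row <= col;
}

}  // namespace sketch

static_assert(sketch::in_filled_part(sketch::fill_mode::lower, 2, 0), "below the diagonal");
static_assert(!sketch::in_filled_part(sketch::fill_mode::lower, 0, 2), "above the diagonal");
static_assert(sketch::in_filled_part(sketch::fill_mode::upper, 1, 1), "diagonal is included");
```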
Job Operator#
cusolverdx::Job<cusolverdx::job J>;
namespace cusolverdx {
enum class job
{
no_vectors, // Don't compute any eigenvectors
all_vectors, // Compute all eigenvectors in a separate storage array
multiply_vectors, // Compute all eigenvectors and multiply them to the existing array content
overwrite_vectors, // Compute eigenvectors and overwrite the input A matrix
};
}
Sets the option for computing eigenvectors in the functions htev and heev.
Note
The Job operator is only relevant when using the functions htev and heev. If not specified, the default value no_vectors is used.
Defining the operator with any other value than no_vectors when using any of the functions except htev and heev will either result in a compile-time error or be ignored.
BatchesPerBlock Operator#
cusolverdx::BatchesPerBlock<unsigned int BPB>()
Number of batches to compute in parallel within a single CUDA block. The default is 1 batch per block.
Note
Using more than one batch per block directly affects performance and shared memory usage. We recommend the default of 1 when matrix A is 16 x 16 or larger; for smaller A, always use the Suggested batches per block trait to get optimal performance.
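The trade-off can be made concrete with a little arithmetic. The sketch below assumes that every batch keeps its own M x N copy of A in the block's shared memory and that the grid is sized to cover all batches; both helpers are hypothetical and do not replace the traits provided by cuSolverDx.

```cpp
namespace sketch {  // hypothetical helpers, not part of cuSolverDx

// Number of CUDA blocks needed when each block handles bpb batches.
constexpr unsigned blocks_needed(unsigned total_batches, unsigned bpb) {
    return (total_batches + bpb - 1) / bpb;  // round up so a partial last block is covered
}

// Elements of A held per block, assuming each batch stores its own m x n matrix.
constexpr unsigned a_elems_per_block(unsigned m, unsigned n, unsigned bpb) {
    return m * n * bpb;
}

}  // namespace sketch

// 100 batches of 8 x 8 matrices with 4 batches per block:
static_assert(sketch::blocks_needed(100, 4) == 25, "25 blocks cover 100 batches");
static_assert(sketch::a_elems_per_block(8, 8, 4) == 256, "4 copies of an 8 x 8 matrix");
```

Raising BPB lowers the block count but multiplies the per-block storage, which is why small matrices benefit most.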
SM Operator#
cusolverdx::SM<unsigned int CC>()
Sets the target architecture CC for the underlying Solver function to use. Supported architectures are:
Volta: 700 and 720 (sm_70, sm_72).
Turing: 750 (sm_75).
Ampere: 800, 860 and 870 (sm_80, sm_86, sm_87).
Ada: 890 (sm_89).
Hopper: 900 (sm_90).
Blackwell: 1000, 1010, 1030, 1200, 1210 (sm_100, sm_101, sm_103, sm_120, sm_121).
Note
When compiling cuSolverDx for XYa or XYf compute capability use XY0 in the SM operator (see also CUDA C++ Programming Guide: Feature Availability).
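The listed values follow the pattern major * 100 + minor * 10 for compute capability major.minor, and an a or f suffix does not change the value. The helper sm_value below is a hypothetical illustration of that pattern, not a cuSolverDx function.

```cpp
namespace sketch {  // hypothetical helper, not part of cuSolverDx

// Value to pass to the SM operator for compute capability major.minor.
// Architecture-specific suffixes (a, f) are dropped: sm_90a still maps to 900.
constexpr unsigned sm_value(unsigned major, unsigned minor) {
    return major * 100 + minor * 10;
}

}  // namespace sketch

static_assert(sketch::sm_value(8, 6) == 860, "sm_86");
static_assert(sketch::sm_value(9, 0) == 900, "sm_90 and sm_90a");
static_assert(sketch::sm_value(10, 3) == 1030, "sm_103");
```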
Warning
Starting with cuSolverDx 0.2.0, support for NVIDIA Xavier Tegra SoC (SM<720> or sm_72) is deprecated.
Warning
Support for architectures sm_103 and sm_121 is experimental in cuSolverDx 0.3.0.
Warning
It is not guaranteed that executions of exactly the same complete function description on GPUs of different CUDA architectures will produce bit-identical results.