Description Operators#
Description operators define the problem we want to solve. Combined with execution operators, they form a complete function descriptor that can be executed on a GPU.
| Operator | Default Value | Description |
|---|---|---|
| Size | Not set | Problem size for matrices `A`, `B`, and `X` |
| Function | Not set | Choice of function (`potrf`, `potrs`, `posv`, `getrf_no_pivot`, `getrs_no_pivot`, `gesv_no_pivot`) |
| Arrangement | `col_major` for `A` and `B` | Arrangement of matrices |
| FillMode | `lower` | Fill mode of the symmetric (Hermitian) matrix |
| Precision | `float` | Precision of matrices (`float` or `double`) |
| BatchesPerBlock | 1 | Number of batches to execute in parallel in a single CUDA thread block |
| Type | `real` | Type of input and output data (`real` or `complex`) |
| LeadingDimension | As defined by the Size operator | Leading dimensions for matrices `A` and `B` |
| SM | Not set | Target CUDA architecture for which the cuSolverDx function should be generated |
Operators are added (in arbitrary order) to construct the operation descriptor type. For example, to describe a linear system solver after Cholesky decomposition for matrices A (m x n), B (n x nrhs) with double-complex values, m = 16, n = 16, nrhs = 4, row major layout for both A and B, lower fill for A, and execution on Ampere architecture, one would write:
#include <cusolverdx.hpp>

using Solver = decltype(cusolverdx::Size<16, 16, 4>()
                      + cusolverdx::Precision<double>()
                      + cusolverdx::Type<cusolverdx::type::complex>()
                      + cusolverdx::Arrangement<cusolverdx::row_major>()
                      + cusolverdx::FillMode<cusolverdx::lower>()
                      + cusolverdx::Function<cusolverdx::function::posv>()
                      + cusolverdx::SM<800>());
For a function descriptor to be complete, the following is required:
- One, and only one, Size Operator.
- One, and only one, Function Operator.
- One, and only one, SM Operator.
Size Operator#
cusolverdx::Size<unsigned int M, unsigned int N = M, unsigned int NRHS = 1>()
Sets the problem size of the function to be executed.
For POSV:
- `M` - number of rows in matrix `A`.
- `N` - number of columns in matrix `A`.
- `NRHS` - number of right-hand sides in matrix `B`.
For example, for POSV, `M`, `N`, and `NRHS` specify that the inverse of the square matrix `A` (M x N) is multiplied by `B` (M x NRHS), resulting in the matrix `X` (N x NRHS). If `M` and `N` are not equal, compile-time evaluation will fail, leading to a compilation error with a message detailing the reason.
Some problems don't require all three sizes to be defined. For example, POTRF needs only `M` because `A` is a square matrix and matrix `B` is not involved; in that case we can simply use Size<M>().
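As a sketch of such a shortened descriptor (illustrative only; it mirrors the posv example above and assumes the operators described in this section), a POTRF descriptor can omit `NRHS` and the `B`-related settings:

```cpp
#include <cusolverdx.hpp>

// Cholesky factorization (potrf) of a 16 x 16 real float matrix using its
// lower triangle, targeting Ampere (SM<800>). Size<16>() is shorthand for
// Size<16, 16, 1>(), since N defaults to M and NRHS defaults to 1.
using POTRF = decltype(cusolverdx::Size<16>()
                     + cusolverdx::Precision<float>()
                     + cusolverdx::Type<cusolverdx::type::real>()
                     + cusolverdx::FillMode<cusolverdx::lower>()
                     + cusolverdx::Function<cusolverdx::function::potrf>()
                     + cusolverdx::SM<800>());
```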
Type Operator#
cusolverdx::Type<cusolverdx::type T>;
namespace cusolverdx {
enum class type
{
real,
complex
};
}
Sets the type of input and output data used in computation. Use type::real for real data type, and type::complex for complex data type.
Precision Operator#
cusolverdx::Precision<Prec>;
Sets the floating-point precision of A, B, and X. cuSolverDx does not currently support mixed precision for A, B, and X.
The supported precision type is float or double. The set precision is used for both input and output.
Arrangement Operator#
cusolverdx::Arrangement<cusolverdx::arrangement A, cusolverdx::arrangement B = A>;
namespace cusolverdx {
enum class arrangement
{
col_major,
row_major
};
inline constexpr auto col_major = arrangement::col_major;
inline constexpr auto row_major = arrangement::row_major;
}
Sets the storage layout for A and B matrices used in the function. The order can be either column-major or row-major.
FillMode Operator#
cusolverdx::FillMode<cusolverdx::fill_mode A>;
namespace cusolverdx {
enum class fill_mode
{
upper,
lower,
};
inline constexpr auto upper = fill_mode::upper;
inline constexpr auto lower = fill_mode::lower;
}
Indicates which part (lower or upper) of the dense matrix A is filled and consequently should be used by the function.
BatchesPerBlock Operator#
cusolverdx::BatchesPerBlock<unsigned int BPB>()
Number of batches to compute in parallel within a single CUDA block. The default is 1 batch per block.
Note
The number of batches per block directly impacts performance and shared memory usage. We recommend the default of 1 when matrix A is 16 x 16 or larger; for smaller A, always use the Suggested batches per block trait to obtain optimal performance.
LeadingDimension Operator#
cusolverdx::LeadingDimension<unsigned int LDA, unsigned int LDB>()
Defines leading dimensions for matrices A and B. Note that the leading dimensions correspond to the shared memory storage of the matrices that are passed to cuSolverDx.
The leading dimension of a matrix is a stride (in elements) to the beginning of the next column for a column-major matrix or the next row for a row-major matrix.
Based on the Arrangement operator in the description of a cuSolverDx operation, the dimensions of the A and B matrices can be described in the following way:

- The real dimensions of matrix A are \(LDA \times N\) with \(LDA >= M\) if A is column-major, or \(LDA \times M\) with \(LDA >= N\) if A is row-major.
- The real dimensions of matrix B are \(LDB \times NRHS\) with \(LDB >= N\) if B is column-major, or \(LDB \times N\) with \(LDB >= NRHS\) if B is row-major.
Note
In addition to the compile-time LeadingDimension Operator, cuSolverDx also supports runtime leading dimensions (see Execution Method). If runtime leading dimensions are passed to the execution methods, they override the compile-time values, and users need to call the get_shared_memory_size(lda, ldb) function to calculate the matrices’ shared memory size.
Function Operator#
cusolverdx::Function<cusolverdx::function F>()
namespace cusolverdx {
enum class function
{
potrf, // Cholesky Factorization
potrs, // Linear system solve after Cholesky factorization
posv, // Cholesky Factorization and Solve
getrf_no_pivot, // LU Factorization without pivoting
getrs_no_pivot, // LU Solve without pivoting
gesv_no_pivot, // LU Factor and Solve without pivoting
};
inline constexpr auto potrf = function::potrf;
inline constexpr auto potrs = function::potrs;
inline constexpr auto posv = function::posv;
inline constexpr auto getrf_no_pivot = function::getrf_no_pivot;
inline constexpr auto getrs_no_pivot = function::getrs_no_pivot;
inline constexpr auto gesv_no_pivot = function::gesv_no_pivot;
}
Sets the Solver function to be executed. See cuSolverDx functionality for details of the supported functions.
SM Operator#
cusolverdx::SM<unsigned int CC>()
Sets the target architecture CC for the underlying Solver function to use. Supported architectures are:
- Volta: 700 and 720 (sm_70, sm_72).
- Turing: 750 (sm_75).
- Ampere: 800, 860 and 870 (sm_80, sm_86, sm_87).
- Ada: 890 (sm_89).
- Hopper: 900 (sm_90, sm_90a).
Note
When compiling cuSolverDx for the 9.0a compute capability, use 900 in the SM operator (see also CUDA C++ Programming Guide: Feature Availability).
Warning
It is not guaranteed that executions of exactly the same complete function description on GPUs of different CUDA architectures will produce bit-identical results.