Description Operators

Description operators define the problem we want to solve. Combined with execution operators, they form a complete function descriptor that can be executed on a GPU.

Table 2 List of Description Operators

Operator | Default Value | Description
Function<func> | Not set | Choice of the function, e.g., potrf, getrf_no_pivot, geqrf, etc.; see the full list in Function Operator. Function names follow LAPACK’s convention.
Size<M, N=M, K=1> | Not set | Problem size for matrices used in the function.
Type<type> | type::real | Type of input and output data (type::real or type::complex).
Precision<Prec> | float | Precision of matrices (float or double).
Arrangement<Arr, Brr=Arr, Crr=Arr> | col_major | Storage layout (col_major or row_major) of the matrices used in the function.
LeadingDimension<LDA, LDB=LDA, LDC=LDA> | As defined by Size, Arrangement and Function operators | Leading dimensions for matrices in the function.
TransposeMode<transpose_mode> | non_trans | Transpose mode of matrix A (non_trans; trans, for real data type only; conj_trans, for complex data type only).
Side<side> | side::left | Side of matrix A in a multiplication operation (left or right).
Diag<diag> | Not set | Indicates whether the diagonal elements of matrix A are all ones (unit) or not (non_unit).
FillMode<fill_mode> | lower | Fill mode of the symmetric (Hermitian) matrix A (lower or upper).
Job<jobu, jobvt=jobu> | job::no_vectors | Option for computing eigenvectors or singular vectors in the function (no_vectors, all_vectors, some_vectors, multiply_vectors, overwrite_vectors).
SM<CC> | Not set | Target CUDA architecture for which the cuSolverDx function should be generated.

Operators are added (in arbitrary order) to construct the operation descriptor type. For example, to describe a Cholesky factorization followed by a linear system solve (posv) for matrices \(A\) (\(M \times N\)) and \(B\) (\(M \times K\)) with double-precision complex values, \(M = 16\), \(N = 16\), \(K = 4\), row-major layout for both \(A\) and \(B\), lower fill for \(A\), and execution on the Hopper architecture, one would write:

#include <cusolverdx.hpp>

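using Solver = decltype(cusolverdx::Size<16, 16, 4>()                // M = N = 16, K = 4 right-hand sides
              + cusolverdx::Precision<double>()                     // double precision
              + cusolverdx::Type<cusolverdx::type::complex>()       // complex values
              + cusolverdx::Arrangement<cusolverdx::row_major>()    // row-major A and B (Brr defaults to Arr)
              + cusolverdx::FillMode<cusolverdx::lower>()           // lower triangle of A is used
              + cusolverdx::Function<cusolverdx::function::posv>()  // Cholesky factorization and solve
              + cusolverdx::SM<900>());                             // target Hopper (sm_90)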

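For a function descriptor to be complete, the following is required:

  • One, and only one, Function Operator.

  • One, and only one, Size Operator.

  • One, and only one, SM Operator.

  • Any operator that the chosen function requires, such as FillMode for potrf, potrs, posv, and trsm; Side for unmqr, unmlq, and trsm; and Diag for trsm.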

Function Operator

cusolverdx::Function<cusolverdx::function F>()

namespace cusolverdx {
  enum class function
  {
    potrf,                // Cholesky Factorization
    potrs,                // Linear system solve after Cholesky factorization
    posv,                 // Cholesky Factorization and Solve
    getrf_no_pivot,       // LU Factorization without pivoting
    getrs_no_pivot,       // LU Solve without pivoting
    gesv_no_pivot,        // LU Factor and Solve without pivoting
    getrf_partial_pivot,  // LU Factorization with partial pivoting
    getrs_partial_pivot,  // LU Solve with partial pivoting
    gesv_partial_pivot,   // LU Factor and Solve with partial pivoting
    gtsv_no_pivot,        // Tridiagonal Solve without pivoting
    modified_lu,          // Modified LU Factorization for unitary matrix
    geqrf,                // QR Factorization
    gelqf,                // LQ Factorization
    unmqr,                // Multiplication of Q from QR factorization
    unmlq,                // Multiplication of Q from LQ factorization
    ungqr,                // Generation of Q from QR factorization
    unglq,                // Generation of Q from LQ factorization
    gels,                 // Least squares system solve after QR and LQ factorizations
    trsm,                 // Triangular matrix-matrix solve
    htev,                 // Eigensolver for symmetric tridiagonal matrix
    heev,                 // Eigensolver for symmetric matrix
    bdsvd,                // Singular value decomposition for bidiagonal matrix
    gesvd,                // Singular value decomposition for general matrix
  };

  inline constexpr auto potrf               = function::potrf;
  inline constexpr auto potrs               = function::potrs;
  inline constexpr auto posv                = function::posv;
  inline constexpr auto getrf_no_pivot      = function::getrf_no_pivot;
  inline constexpr auto getrs_no_pivot      = function::getrs_no_pivot;
  inline constexpr auto gesv_no_pivot       = function::gesv_no_pivot;
  inline constexpr auto getrf_partial_pivot = function::getrf_partial_pivot;
  inline constexpr auto getrs_partial_pivot = function::getrs_partial_pivot;
  inline constexpr auto gesv_partial_pivot  = function::gesv_partial_pivot;
  inline constexpr auto gtsv_no_pivot       = function::gtsv_no_pivot;
  inline constexpr auto modified_lu         = function::modified_lu;
  inline constexpr auto geqrf               = function::geqrf;
  inline constexpr auto gelqf               = function::gelqf;
  inline constexpr auto unmqr               = function::unmqr;
  inline constexpr auto unmlq               = function::unmlq;
  inline constexpr auto ungqr               = function::ungqr;
  inline constexpr auto unglq               = function::unglq;
  inline constexpr auto gels                = function::gels;
  inline constexpr auto trsm                = function::trsm;
  inline constexpr auto htev                = function::htev;
  inline constexpr auto heev                = function::heev;
  inline constexpr auto bdsvd               = function::bdsvd;
  inline constexpr auto gesvd               = function::gesvd;
}

Sets the Solver function to be executed. See cuSolverDx functionality for details of the supported functions.

Size Operator

cusolverdx::Size<unsigned int M, unsigned int N = M, unsigned int K = 1>()

Sets the problem size of the function to be executed.

The meaning of M, N, and K depends on the function. For example, for posv:

  • M - number of rows in matrix A.

  • N - number of columns in matrix A.

  • K - number of right-hand sides in matrix B.

In other words, the inverse of the square matrix A (M x N) is multiplied by B (M x K), producing the matrix X (N x K). If M and N are not equal, compile-time evaluation fails with a compilation error whose message details the reason.

Some functions don’t require all three sizes. For example, potrf only needs M, because A is a square matrix and no matrix B is involved; in that case, Size<M>() is sufficient. The table below lists the sizes required by each function.

Table 3 Size Operator Usage by Function

Function | Size Operator | Detailed Description
potrf | Size<M> | M is the size of square matrix A (M x M). N, if specified, must be equal to M. K, if specified, is ignored.
potrs | Size<M, N, K> | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K).
posv | Size<M, N, K> | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K).
getrf_no_pivot | Size<M, N> | M is the number of rows and N the number of columns of matrix A (M x N). K, if specified, is ignored.
getrs_no_pivot | Size<M, N, K> | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K).
gesv_no_pivot | Size<M, N, K> | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K).
getrf_partial_pivot | Size<M, N> | M is the number of rows and N the number of columns of matrix A (M x N). K, if specified, is ignored.
getrs_partial_pivot | Size<M, N, K> | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K).
gesv_partial_pivot | Size<M, N, K> | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K).
gtsv_no_pivot | Size<M, N, K> | M is the size of square matrix A (M x N). N must be equal to M. K is the number of right-hand sides in matrix B (M x K).
modified_lu | Size<M, N> | M is the number of rows and N the number of columns of matrix A (M x N). K, if specified, is ignored.
geqrf | Size<M, N> | M is the number of rows and N the number of columns of matrix A (M x N). K, if specified, is ignored.
gelqf | Size<M, N> | M is the number of rows and N the number of columns of matrix A (M x N). K, if specified, is ignored.
unmqr | Size<M, N, K> | M is the number of rows and N the number of columns of matrix C (M x N). K is the number of Householder reflectors to apply. Matrix A is (M x K) for left-side multiplication and (N x K) for right-side multiplication.
unmlq | Size<M, N, K> | M is the number of rows and N the number of columns of matrix C (M x N). K is the number of Householder reflectors to apply (the number of rows of A). Matrix A is (K x M) for left-side multiplication and (K x N) for right-side multiplication.
ungqr | Size<M, N, K> | M is the number of rows and N the number of columns of matrix A (M x N). K is the number of Householder reflectors to apply. M >= N >= K is required.
unglq | Size<M, N, K> | M is the number of rows and N the number of columns of matrix A (M x N). K is the number of Householder reflectors to apply. N >= M >= K is required.
gels | Size<M, N, K> | M is the number of rows and N the number of columns of matrix A (M x N). K is the number of right-hand sides in matrix B.
trsm | Size<M, N> | M is the number of rows and N the number of columns of matrix B (M x N). K, if specified, is ignored. The triangular matrix A is (M x M) when A is on the left side and (N x N) when A is on the right side.
htev | Size<M> | M is the size of the symmetric or Hermitian tridiagonal matrix A (M x M). N, if specified, is ignored. K, if specified, is ignored.
heev | Size<M> | M is the size of the symmetric or Hermitian matrix A (M x M). N, if specified, must be equal to M. K, if specified, is ignored.
bdsvd | Size<M> | M is the size of the bidiagonal matrix A (M x M). N, if specified, is ignored. K, if specified, is ignored.
gesvd | Size<M, N> | M is the number of rows and N the number of columns of matrix A (M x N). K, if specified, is ignored.
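A few of the constraints in Table 3 can be written as a small compile-time predicate, which makes the rules easy to check at a glance. This is a plain C++ sketch, not cuSolverDx code: fn and size_is_valid are hypothetical names mirroring a subset of the function enum. In cuSolverDx itself, the descriptor performs its own checks and reports violations as compilation errors.

```cpp
// Hypothetical mirror of a few cusolverdx::function values (illustration only).
enum class fn { potrf, posv, getrf_partial_pivot, ungqr, unglq, heev };

// True if Size<M, N, K> satisfies the Table 3 constraints for function f.
// Note that Size<M>() already implies N == M through the default template argument.
constexpr bool size_is_valid(fn f, unsigned M, unsigned N, unsigned K) {
    switch (f) {
        case fn::potrf:               // square A (M x M); K is ignored
        case fn::heev:                // square A (M x M); K is ignored
        case fn::posv:                // square A (M x M); B is M x K
            return N == M;
        case fn::getrf_partial_pivot: // general A (M x N); K is ignored
            return true;
        case fn::ungqr:               // M >= N >= K is required
            return M >= N && N >= K;
        case fn::unglq:               // N >= M >= K is required
            return N >= M && M >= K;
    }
    return false;
}
```

For instance, size_is_valid(fn::posv, 16, 16, 4) holds for the posv example at the top of this page, while Size<16, 8, 4> would be rejected.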

Type Operator

cusolverdx::Type<cusolverdx::type T>;

namespace cusolverdx {
  enum class type
  {
    real,
    complex
  };
}

Sets the type of input and output data used in computation. Use type::real for real data type, and type::complex for complex data type.

Precision Operator

cusolverdx::Precision<Prec>;

Sets the floating-point precision of A, B, and X. cuSolverDx does not currently support mixed precision for A, B, and X.

Supported precisions are float and double. The chosen precision is used for both input and output.
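Because Type and Precision together determine a single element type shared by A, B, and X, the combination can be pictured as a type mapping. The sketch below is a hypothetical host-side illustration: data_type, element, and element_t are made-up names, and std::complex is only a stand-in, since this section does not show which complex type cuSolverDx uses internally.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical mirror of cusolverdx::type (illustration only).
enum class data_type { real, complex };

// Maps a (Type, Precision) pair to one element type used for all matrices,
// reflecting that mixed precision is not supported.
template <data_type T, class Prec>
struct element {
    static_assert(std::is_same<Prec, float>::value || std::is_same<Prec, double>::value,
                  "Precision must be float or double");
    using type = typename std::conditional<T == data_type::real, Prec, std::complex<Prec>>::type;
};

template <data_type T, class Prec>
using element_t = typename element<T, Prec>::type;
```

Under this mapping, the posv example at the top of the page (Type<complex>, Precision<double>) uses 16-byte elements, which matters when estimating shared memory footprints.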

Arrangement Operator

cusolverdx::Arrangement<cusolverdx::arrangement Arr, cusolverdx::arrangement Brr = Arr, cusolverdx::arrangement Crr = Arr>;

namespace cusolverdx {
  enum class arrangement
  {
    col_major,
    row_major
  };

  inline constexpr auto col_major  = arrangement::col_major;
  inline constexpr auto row_major  = arrangement::row_major;
}

Sets the storage layout for matrices used in the function. The order can be either column-major or row-major.

For most functions, only the first one or two arrangement parameters, Arr and Brr, are relevant. For the gesvd function, Arr sets the storage layout of the input matrix A, and Brr and Crr set the storage layouts of the left and right singular vector matrices U and VT, respectively.

LeadingDimension Operator

cusolverdx::LeadingDimension<unsigned int LDA, unsigned int LDB = LDA, unsigned int LDC = LDA>()

Defines leading dimensions for matrices used in the function. For most functions, only the first one or two leading dimension parameters are relevant. For the gesvd function, LDA sets the leading dimension of the input matrix A, and LDB and LDC set the leading dimensions of the left and right singular vector matrices U and VT, respectively.

The leading dimension of a matrix is the stride (in elements) from the beginning of one column to the beginning of the next for a column-major matrix, or from one row to the next for a row-major matrix. For example, if A is an M x N matrix, its storage dimensions are as follows (refer to Size Operator for the matrix sizes used by each function):

  • If A is column-major, the actual dimensions of matrix A are \(\mathrm{LDA} \times N\) with \(\mathrm{LDA} \geq M\).

  • If A is row-major, the actual dimensions of matrix A are \(M \times \mathrm{LDA}\) with \(\mathrm{LDA} \geq N\).
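The two rules above translate directly into element offsets and buffer sizes. The sketch below uses hypothetical helper names (layout, elem_offset, storage_elems, ld_is_valid) to show how an element A(i, j) is located for each arrangement, and how many elements the \(\mathrm{LDA} \times N\) or \(M \times \mathrm{LDA}\) storage occupies.

```cpp
#include <cstddef>

// Hypothetical mirror of the two arrangements (illustration only).
enum class layout { col_major, row_major };

// Offset (in elements) of A(i, j) for an M x N matrix with leading dimension ld.
// Column-major: consecutive columns start ld elements apart.
// Row-major: consecutive rows start ld elements apart.
constexpr std::size_t elem_offset(layout L, std::size_t i, std::size_t j, std::size_t ld) {
    return L == layout::col_major ? i + j * ld : i * ld + j;
}

// Elements covered by the storage: LDA x N (column-major) or M x LDA (row-major).
constexpr std::size_t storage_elems(layout L, std::size_t M, std::size_t N, std::size_t ld) {
    return L == layout::col_major ? ld * N : M * ld;
}

// LDA >= M is required for column-major storage, LDA >= N for row-major storage.
constexpr bool ld_is_valid(layout L, std::size_t M, std::size_t N, std::size_t ld) {
    return L == layout::col_major ? ld >= M : ld >= N;
}
```

For a 3 x 2 column-major matrix with LDA = 4, element A(1, 1) sits at offset 1 + 1 * 4 = 5, and the storage spans 8 elements, one of which per column is padding.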

Important

The leading dimensions correspond to the memory layout of matrices passed to the cuSolverDx execution method. If matrices reside in global memory (e.g., in a thread execution function), the leading dimensions specify global memory strides. If matrices reside in shared memory, they specify shared memory strides.

In addition to the compile-time LeadingDimension operator, cuSolverDx supports runtime leading dimensions (see Execution Method). If runtime leading dimensions are passed to the execution methods, they override the compile-time values. When using shared memory, call the get_shared_memory_size(lda, ldu, ldvt) function to calculate the required shared memory size.

Tip

When matrices are stored in shared memory, the LeadingDimension operator can be used to add padding to avoid bank conflicts and improve performance.
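A common padding heuristic can be reasoned about with simple modular arithmetic. This sketch assumes 32 shared memory banks of 4 bytes each, 4-byte (float) elements, and a warp in which thread j reads element A(i, j) of one row of a column-major matrix; the access pattern cuSolverDx actually uses is not described in this section, so treat the helpers (gcd_sz, conflict_degree, padded_ld) as hypothetical and confirm any benefit by profiling.

```cpp
#include <cstddef>

// Greatest common divisor (Euclid's algorithm).
constexpr std::size_t gcd_sz(std::size_t a, std::size_t b) { return b == 0 ? a : gcd_sz(b, a % b); }

// Thread j touches word i + j * ld, so its bank is (i + j * ld) % 32.
// Over 32 threads, each bank that is hit is hit gcd(ld, 32) times:
// this is the degree of the bank conflict.
constexpr std::size_t conflict_degree(std::size_t ld_words) { return gcd_sz(ld_words, 32); }

// Heuristic: make the leading dimension odd so gcd(ld, 32) == 1,
// i.e., the 32 accesses above fall into 32 distinct banks.
constexpr std::size_t padded_ld(std::size_t rows) { return rows % 2 == 0 ? rows + 1 : rows; }
```

With 32 rows, an unpadded LDA of 32 gives a 32-way conflict under these assumptions, whereas LeadingDimension<33> removes it at the cost of one extra element per column. For 8-byte elements (double or complex float) the bank arithmetic differs, so the same padding does not automatically carry over.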

TransposeMode Operator

cusolverdx::TransposeMode<cusolverdx::transpose A>;

namespace cusolverdx {
  enum class transpose
  {
      non_transposed,
      transposed,       // supported only for real data type
      conj_transposed,  // supported only for complex data type
  };
  inline constexpr auto non_trans  = transpose::non_transposed;
  inline constexpr auto trans      = transpose::transposed;
  inline constexpr auto conj_trans = transpose::conj_transposed;
}

Sets the transpose mode of matrix A.

Note

The TransposeMode operator is not supported with the Cholesky functions potrf, potrs, and posv. Specifying the operator with any of these functions will result in a compile-time error.

Only the non_transposed mode is supported for some functions, including geqrf, gelqf, getrf_no_pivot, and getrf_partial_pivot. Setting the mode to transposed or conj_transposed with any of these functions will either result in a compile-time error or be ignored.

For other functions, the default value non_transposed is used if the operator is not specified.
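The rules in this note, combined with the real/complex restrictions in the enum comments, can be summarized as one predicate. This is a hypothetical sketch (fn, data_type, transpose, and transpose_ok are illustrative mirrors, covering only a subset of functions). It is deliberately conservative: cases the text says may simply be ignored are treated as rejected.

```cpp
// Hypothetical mirrors of cuSolverDx enums (illustration only).
enum class data_type { real, complex };
enum class transpose { non_transposed, transposed, conj_transposed };
enum class fn { potrf, potrs, posv, geqrf, gelqf, getrf_no_pivot, getrf_partial_pivot, getrs_partial_pivot };

// True if TransposeMode<t> may be specified for function f with data type d.
constexpr bool transpose_ok(fn f, data_type d, transpose t) {
    if (f == fn::potrf || f == fn::potrs || f == fn::posv)
        return false;  // operator not supported at all with Cholesky functions
    if (f == fn::geqrf || f == fn::gelqf || f == fn::getrf_no_pivot || f == fn::getrf_partial_pivot)
        return t == transpose::non_transposed;  // only non_transposed is supported
    switch (t) {
        case transpose::non_transposed:  return true;
        case transpose::transposed:      return d == data_type::real;     // real data only
        case transpose::conj_transposed: return d == data_type::complex;  // complex data only
    }
    return false;
}
```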

Side Operator

cusolverdx::Side<cusolverdx::side S>;

namespace cusolverdx {
  enum class side
  {
      left,
      right,
  };

  inline constexpr auto left  = side::left;
  inline constexpr auto right = side::right;
}

Sets the side of matrix A in a multiplication operation.

Note

The Side operator must be specified when using unmqr, unmlq, or trsm, and is not supported for other functions.

Omitting the operator with unmqr, unmlq, or trsm, or specifying it with any other function, will either result in a compile-time error or be ignored.
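Side changes which of M and N determines the shape of matrix A, as listed in the Size Operator table. The sketch below restates those shapes as a function; side, fn, dims, and a_dims are hypothetical names used only for illustration.

```cpp
// Hypothetical mirrors of cuSolverDx enums (illustration only).
enum class side { left, right };
enum class fn { unmqr, unmlq, trsm };

struct dims { unsigned rows; unsigned cols; };

// Shape of matrix A implied by Size<M, N, K> and Side<s>, per the Size Operator table.
constexpr dims a_dims(fn f, side s, unsigned M, unsigned N, unsigned K) {
    const unsigned q = (s == side::left) ? M : N;  // M for left side, N for right side
    switch (f) {
        case fn::unmqr: return {q, K};  // (M x K) or (N x K)
        case fn::unmlq: return {K, q};  // (K x M) or (K x N)
        case fn::trsm:  return {q, q};  // (M x M) or (N x N); K is ignored
    }
    return {0, 0};
}
```

For example, a right-side trsm with Size<8, 4> uses a 4 x 4 triangular A, while a left-side unmqr with Size<16, 8, 4> expects A to be 16 x 4.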

Diag Operator

cusolverdx::Diag<cusolverdx::diag D>;

namespace cusolverdx {
  enum class diag
  {
      non_unit,
      unit,
  };
}

Indicates whether the diagonal elements of matrix A are all ones (unit) or not (non_unit).

Note

The Diag operator must be specified when using trsm, and is not supported for other functions.

Omitting the operator with trsm, or specifying it with any other function, will either result in a compile-time error or be ignored.
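To make the effect of Diag concrete, the sketch below performs a plain forward substitution for a lower-triangular, column-major system. It assumes the usual BLAS trsm convention that with a unit diagonal the diagonal entries are taken to be one and are never read; this section only states that Diag indicates whether they are ones, so treat that detail as an assumption. lower_solve and diag here are hypothetical host-side names, not cuSolverDx code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of cusolverdx::diag (illustration only).
enum class diag { non_unit, unit };

// Solves L x = b in place for a lower-triangular n x n column-major L with
// leading dimension ld. With diag::unit the diagonal is assumed to be 1 and
// is not read (BLAS convention, assumed here).
void lower_solve(diag d, std::size_t n, const std::vector<double>& L, std::size_t ld,
                 std::vector<double>& b) {
    assert(ld >= n && L.size() >= ld * n && b.size() >= n);
    for (std::size_t j = 0; j < n; ++j) {
        if (d == diag::non_unit) b[j] /= L[j + j * ld];  // x_j = b_j / L_jj
        for (std::size_t i = j + 1; i < n; ++i)
            b[i] -= L[i + j * ld] * b[j];                 // eliminate x_j from later rows
    }
}
```

With diag::unit, whatever is stored on the diagonal (even garbage) has no effect on the result, which is why the operator is required for trsm: the same storage can describe two different matrices.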

FillMode Operator

cusolverdx::FillMode<cusolverdx::fill_mode A>;

namespace cusolverdx {
  enum class fill_mode
  {
      upper,
      lower,
  };

  inline constexpr auto upper = fill_mode::upper;
  inline constexpr auto lower = fill_mode::lower;
}

Indicates which part (lower or upper) of the dense matrix A is filled and consequently should be used by the function.

Note

The FillMode operator must be specified when using potrf, potrs, posv, or trsm, and is not supported for other functions.

Omitting the operator with potrf, potrs, posv, or trsm, or specifying it with any other function, will either result in a compile-time error or be ignored.
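The practical meaning of fill_mode::lower is that only the lower triangle of A is referenced. The sketch below is a textbook unblocked Cholesky factorization \(A = L L^T\) on a column-major host matrix, not the algorithm cuSolverDx uses; cholesky_lower is a hypothetical name. It reads and writes only entries with row index at least the column index, so anything stored above the diagonal is left untouched.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unblocked lower Cholesky (A = L * L^T) of an n x n column-major matrix with
// leading dimension ld. Only entries A(i, j) with i >= j are read or written.
void cholesky_lower(std::size_t n, std::vector<double>& A, std::size_t ld) {
    for (std::size_t j = 0; j < n; ++j) {
        double d = A[j + j * ld];
        for (std::size_t k = 0; k < j; ++k) d -= A[j + k * ld] * A[j + k * ld];
        assert(d > 0.0 && "matrix is not positive definite");
        d = std::sqrt(d);
        A[j + j * ld] = d;                                   // L(j, j)
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i + j * ld];
            for (std::size_t k = 0; k < j; ++k) s -= A[i + k * ld] * A[j + k * ld];
            A[i + j * ld] = s / d;                           // L(i, j)
        }
    }
}
```

For example, factoring [[4, 2], [2, 5]] stored as {4, 2, -7, 5} (with an arbitrary -7 above the diagonal) yields L = [[2, 0], [1, 2]] in the lower triangle and leaves the -7 in place.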

Job Operator

cusolverdx::Job<cusolverdx::job jobu, cusolverdx::job jobvt = jobu>;

namespace cusolverdx {
  enum class job
  {
      no_vectors,        // Don't compute any vectors
      all_vectors,       // Compute all vectors in a separate array
      some_vectors,      // Compute min(m, n) vectors in a separate array
      multiply_vectors,  // Compute all vectors and multiply them to the existing array content
      overwrite_vectors, // Compute min(m, n) vectors and overwrite the A matrix
  };
}

This operator controls whether eigenvectors or singular vectors are computed in the functions htev, heev, bdsvd, and gesvd.

Note

The Job operator applies only to htev, heev, bdsvd, and gesvd. If not specified, it defaults to no_vectors.

Specifying a value other than no_vectors with any other function will either result in a compile-time error or be ignored.

Separate left and right singular vector options are only relevant for the SVD functions bdsvd and gesvd. The table below describes the Job operator configurations for each function.

Table 4 Job Operator Allowed Configurations by Function

Function | jobu | jobvt
htev | Allowed: no_vectors, all_vectors, multiply_vectors. Not allowed: some_vectors, overwrite_vectors. | jobvt == jobu required for symmetric/Hermitian eigensolver.
heev | Allowed: no_vectors, overwrite_vectors. Not allowed: all_vectors, some_vectors, multiply_vectors. | jobvt == jobu required for symmetric/Hermitian eigensolver.
bdsvd | Allowed: no_vectors, all_vectors, some_vectors, multiply_vectors. Not allowed: overwrite_vectors. | Allowed: no_vectors, all_vectors, some_vectors, multiply_vectors. Not allowed: overwrite_vectors.
gesvd | Allowed: no_vectors, all_vectors, some_vectors, overwrite_vectors. Not allowed: multiply_vectors. jobu and jobvt cannot both be overwrite_vectors. | Allowed: no_vectors, all_vectors, some_vectors, overwrite_vectors. Not allowed: multiply_vectors.
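Table 4 can be restated as a single predicate over (function, jobu, jobvt), which is convenient when choosing a configuration. This is a hypothetical sketch (job, fn, job_allowed, and job_config_ok are illustrative mirrors), not part of the cuSolverDx API; it encodes only the rules listed in the table.

```cpp
// Hypothetical mirrors of cuSolverDx enums (illustration only).
enum class job { no_vectors, all_vectors, some_vectors, multiply_vectors, overwrite_vectors };
enum class fn { htev, heev, bdsvd, gesvd };

// Per-parameter rules from Table 4 (identical for jobu and jobvt).
constexpr bool job_allowed(fn f, job j) {
    switch (f) {
        case fn::htev:  return j == job::no_vectors || j == job::all_vectors || j == job::multiply_vectors;
        case fn::heev:  return j == job::no_vectors || j == job::overwrite_vectors;
        case fn::bdsvd: return j != job::overwrite_vectors;
        case fn::gesvd: return j != job::multiply_vectors;
    }
    return false;
}

constexpr bool job_config_ok(fn f, job jobu, job jobvt) {
    return job_allowed(f, jobu) && job_allowed(f, jobvt)
        // eigensolvers: jobvt must equal jobu (the Job<jobu> default ensures this)
        && ((f != fn::htev && f != fn::heev) || jobu == jobvt)
        // gesvd: U and VT cannot both overwrite A
        && !(f == fn::gesvd && jobu == job::overwrite_vectors && jobvt == job::overwrite_vectors);
}
```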

SM Operator

cusolverdx::SM<unsigned int CC>()

Sets the target architecture CC for the underlying Solver function to use. Supported architectures are:

  • Turing: 750 (sm_75).

  • Ampere: 800, 860 and 870 (sm_80, sm_86, sm_87).

  • Ada: 890 (sm_89).

  • Hopper: 900 (sm_90).

  • Blackwell: 1000, 1030, 1100, 1200, 1210 (sm_100, sm_103, sm_110, sm_120, sm_121).

Note

When compiling cuSolverDx for an XYa or XYf compute capability (for example, sm_90a), use XY0 in the SM operator (for example, SM<900>); see also CUDA C++ Programming Guide: Feature Availability.
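The SM value follows directly from a compute capability major.minor pair, for example as reported in the major and minor fields of cudaDeviceProp. The sketch below is a hypothetical helper (sm_value and sm_supported are not cuSolverDx functions) that builds the value and checks it against the architectures listed above. Architecture-specific suffixes such as a or f are simply dropped, matching the note.

```cpp
// Builds the SM<CC> value from a compute capability major.minor pair:
// 9.0 -> 900, 8.6 -> 860, 12.1 -> 1210. An "a"/"f" suffix maps to the same value.
constexpr unsigned sm_value(unsigned major, unsigned minor) { return major * 100 + minor * 10; }

// Architectures listed as supported in this section.
constexpr bool sm_supported(unsigned cc) {
    return cc == 750 || cc == 800 || cc == 860 || cc == 870 || cc == 890 || cc == 900 ||
           cc == 1000 || cc == 1030 || cc == 1100 || cc == 1200 || cc == 1210;
}
```

Under this helper, Volta (7.0) maps to 700, which sm_supported rejects, consistent with the removal noted below.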

Warning

Starting with cuSolverDx 0.4.0, support for NVIDIA Volta (SM<700> or sm_70) and NVIDIA Xavier Tegra SoC (SM<720> or sm_72) has been removed.

Warning

Support for architectures sm_103 and sm_121 is experimental in cuSolverDx 0.4.0.

Warning

It is not guaranteed that executions of exactly the same complete function description on GPUs of different CUDA architectures will produce bit-identical results.