Other Traits

The traits below apply to block execution only. They are derived from the function description and are provided for verifying shared memory requirements or improving performance.

| Trait | Description |
| --- | --- |
| is_supported<Description, Arch> | Verifies whether a given Solver operation is supported on the provided CUDA architecture Arch. |
| Description::suggested_block_dim | Recommended number of threads for computing the Solver function with the best performance. |
| Description::suggested_batches_per_block | Recommended number of batches per block for computing the Solver function with the best performance. |

is_supported Trait

// true if Solver is supported on the provided CUDA architecture
cusolverdx::is_supported<Solver, Arch>::value;
cusolverdx::is_supported_v<Solver, Arch>;

cusolverdx::is_supported checks whether a Solver operation is supported on CUDA architecture Arch. The check is based on the amount of shared memory the operation requires when it is executed in shared memory.

Requirements for using the trait:

  • Solver must define a size, data type, arrangement, function, and batches per block. See the Description Operators section.

  • Solver must include the Block operator.

  • Solver must not define a target CUDA architecture via the SM operator.

Example

using namespace cusolverdx;

using Solver = decltype(Size<160>() + Function<potrf>() + Type<type::real>() +
                        Block() + Precision<double>());
cusolverdx::is_supported<Solver, 1000>::value; // true
cusolverdx::is_supported<Solver,  900>::value; // true
cusolverdx::is_supported<Solver,  800>::value; // false

The shared memory required to run Cholesky decomposition on a 160x160 double-precision matrix is 200 KB (160 x 160 elements of 8 bytes each). The maximum amount of shared memory per thread block on Ampere, Hopper, and Blackwell is 163, 227, and 227 KB, respectively (see CUDA compute capability), so the operation is supported on SM 900 and SM 1000 but not on SM 800.
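The comparison above can be reproduced with a little compile-time arithmetic. The sketch below is not part of cuSOLVERDx: `matrix_bytes`, `fits`, and the `*_limit_kib` constants are hypothetical names introduced for illustration. It assumes the operation needs exactly one copy of the matrix in shared memory (which matches the 200 KB figure quoted above) and that KB here means 1024 bytes.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; these are not cuSOLVERDx APIs.

// Bytes needed to hold one rows x cols matrix whose elements are elem_size bytes.
constexpr std::size_t matrix_bytes(std::size_t rows, std::size_t cols, std::size_t elem_size) {
    return rows * cols * elem_size;
}

// Maximum shared memory per thread block in KiB, as quoted in the text above.
constexpr std::size_t ampere_limit_kib    = 163; // SM 800
constexpr std::size_t hopper_limit_kib    = 227; // SM 900
constexpr std::size_t blackwell_limit_kib = 227; // SM 1000

// True if `bytes` fits within a limit given in KiB.
constexpr bool fits(std::size_t bytes, std::size_t limit_kib) {
    return bytes <= limit_kib * 1024;
}

// One 160x160 matrix of double: 160 * 160 * 8 = 204800 bytes = 200 KiB.
constexpr std::size_t potrf_160_bytes = matrix_bytes(160, 160, sizeof(double));

static_assert(potrf_160_bytes == 200 * 1024, "160x160 doubles occupy 200 KiB");
static_assert(!fits(potrf_160_bytes, ampere_limit_kib), "does not fit on SM 800");
static_assert(fits(potrf_160_bytes, hopper_limit_kib), "fits on SM 900");
static_assert(fits(potrf_160_bytes, blackwell_limit_kib), "fits on SM 1000");
```

All checks are `static_assert`s, so compiling the file is enough to confirm the arithmetic. The actual trait may account for more than the matrix itself, so this is only a rough model of the check.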

Important

The trait always returns true for thread execution functions, i.e., functions described with the Thread operator. Thread execution functions are intended for small problem sizes that fit in registers, and they typically operate on data in global memory. When using shared memory with thread execution functions, users are responsible for allocating enough shared memory and for staying within the thread block's shared memory limit.

Suggested Block Dim Trait

// dim3(X, Y==1, Z==1)
Solver::suggested_block_dim

The recommended number of threads for a Solver description. It depends on the size, data type, GPU architecture, and the number of batches per thread block set in the description. The suggested block dim is effectively a 1D value: its Y and Z components are always 1.

The suggested block dim is used if the BlockDim operator is not explicitly defined by the user.
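The fallback rule can be pictured with a small compile-time model. This is a toy sketch, not how cuSOLVERDx is implemented: `Dim3` and `ToyDescription` are hypothetical names, the thread counts 128 and 64 are arbitrary, and the model uses static member functions purely to stay portable across C++ standards.

```cpp
// Hypothetical stand-in for CUDA's dim3, so the sketch compiles without CUDA.
struct Dim3 {
    unsigned x, y, z;
};

// Toy description, not a cuSOLVERDx type.
// SuggestedX models the recommended thread count; ExplicitX == 0 models a
// description that has no explicit block dimension.
template <unsigned SuggestedX, unsigned ExplicitX = 0>
struct ToyDescription {
    // Effectively 1D: only x may differ from 1.
    static constexpr Dim3 suggested_block_dim() { return Dim3{SuggestedX, 1, 1}; }

    // Use the explicit value when one was given, otherwise fall back to the suggestion.
    static constexpr Dim3 block_dim() {
        return ExplicitX != 0 ? Dim3{ExplicitX, 1, 1} : suggested_block_dim();
    }
};

using WithoutBlockDim = ToyDescription<128>;     // no explicit block dim
using WithBlockDim    = ToyDescription<128, 64>; // explicit 64 threads

static_assert(WithoutBlockDim::block_dim().x == 128, "falls back to the suggestion");
static_assert(WithBlockDim::block_dim().x == 64, "explicit value wins");
static_assert(WithBlockDim::suggested_block_dim().x == 128, "suggestion is unchanged");
```

In this model the suggestion stays available even when an explicit value overrides it. Whether the library behaves the same way is an assumption of the sketch.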

Suggested Batches Per Block Trait

// unsigned int
Solver::suggested_batches_per_block

The recommended number of batches per block for a Solver description.

The following example shows how to use the trait:

  • First define a complete execution descriptor with required operators, including size, data type, function, SM, and the block operator;

  • Then form a second descriptor by adding the BatchesPerBlock operator using the trait.

Example

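// Assumes `using namespace cusolverdx;` and that Arch holds the target
// CUDA architecture value passed to SM (for example, 800).
using Base = decltype(Size<14, 14>() + Precision<double>() + Type<type::complex>() +
                      Block() + Function<posv>() + SM<Arch>());

// Extend the base description with the suggested number of batches per block.
using POSV = decltype(Base() +
                      BatchesPerBlock<Base::suggested_batches_per_block>() +
                      FillMode<fill_mode::lower>() +
                      Arrangement<arrangement::row_major>());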