Execution Operators

Execution operators configure how the function will run on the GPU. Combined with description operators, they form a complete function descriptor that can be executed on a GPU.

Operator	Description
`Block`	Creates block execution object. See Block Configuration Operators.

Block Operator

cusolverdx::Block()

Generates a collective operation to run in a single CUDA block. Threads cooperate to compute the collective operation. The layout and the number of threads participating in the execution can be configured using Block Configuration Operators.

For example, the following code example creates a function descriptor for a Solver function that will run in a single CUDA block:

#include <cusolverdx.hpp>

using namespace cusolverdx;
using Solver = decltype(Size<32>()
                    + Precision<double>()
                    + Type<type::real>()
                    + Function<function::potrf>()
                    + FillMode<lower>()
                    + Arrangement<row_major>()
                    + SM<700>()
                    + Block());

Block Configuration Operators

Block configuration operators allow the user to configure the block size of the single CUDA block in which the function runs.

Operators	Default value	Description
`BlockDim<X, Y, Z>`	Based on heuristics	Number of threads used to perform the Solver function.

Note

Block configuration operators can only be used with the Block operator.

Warning

Executions of exactly the same Solver function with exactly the same inputs are not guaranteed to produce bit-identical results if they differ in any of the following:

CUDA architecture (SM),
number of threads (BlockDim), or
number of batches per block (BatchesPerBlock).
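The document does not state why results can differ. One general reason in parallel numerical code, offered here as an assumption rather than a description of cuSolverDx internals, is that floating-point addition is not associative, so regrouping partial sums across a different number of threads can change the rounded result. A minimal plain C++ sketch (the function names are hypothetical and no cuSolverDx call is involved):

```cpp
// Illustration only: plain C++, no cuSolverDx or CUDA involved.

// Sum three values left to right, as a single thread might.
constexpr double sum_left_to_right(double a, double b, double c) {
    return (a + b) + c;
}

// Sum the same values with a different grouping, as could happen when
// partial sums are distributed differently across threads.
constexpr double sum_regrouped(double a, double b, double c) {
    return a + (b + c);
}

// Mathematically equal, but not bit-identical in IEEE 754 double precision.
static_assert(sum_left_to_right(0.1, 0.2, 0.3) != sum_regrouped(0.1, 0.2, 0.3),
              "regrouping changes the rounded result");
```

Code that compares results across configurations should therefore use a tolerance rather than exact equality.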

BlockDim Operator

struct cusolverdx::BlockDim<unsigned int X, unsigned int Y, unsigned int Z>()

Sets the CUDA block size to (X, Y, Z), that is, the number and layout of threads participating in the execution. The block dimensions can be accessed via the Solver::block_dim trait.

If the BlockDim operator is not set, the default block dimensions are used (the default value is Solver::suggested_block_dim). The suggested block dimensions lead to optimal performance for the cuSolverDx function in most cases. If the function is fused with other operations, we recommend first measuring the performance of the kernel and then experimenting with different values (see Performance).

If the block dimensions set via BlockDim<X, Y, Z> differ from the suggested block dimensions, the cuSolverDx operations will still run correctly.
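As a sketch of how an explicit block size might be added to a descriptor, the fragment below extends the descriptor from the Block Operator example with BlockDim. The alias name SolverWithBlockDim and the value (64, 1, 1) are illustrative assumptions, not recommendations, and whether that size performs well for this function is not something this page establishes.

```cpp
#include <cusolverdx.hpp>

using namespace cusolverdx;

// Same operators as the Block Operator example, plus an explicit block size.
// BlockDim<64, 1, 1> is an arbitrary illustrative value.
using SolverWithBlockDim = decltype(Size<32>()
                                  + Precision<double>()
                                  + Type<type::real>()
                                  + Function<function::potrf>()
                                  + FillMode<lower>()
                                  + Arrangement<row_major>()
                                  + SM<700>()
                                  + Block()
                                  + BlockDim<64, 1, 1>());

// A kernel using this descriptor must be launched with
// SolverWithBlockDim::block_dim threads per block.
```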

Warning

cuSolverDx enforces the following requirement for kernel launch configurations:

The kernel must be launched with exactly the block dimensions Solver::block_dim, which equals (X, Y, Z) if the BlockDim<X, Y, Z> operator is specified, and Solver::suggested_block_dim otherwise.

Note

cuSolverDx cannot validate all kernel launch configurations at runtime to check that this requirement is met, so it is the user's responsibility to adhere to it. Violating the requirement is undefined behavior and can lead to incorrect results and/or failures.
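Because the library cannot check the launch configuration itself, a caller may want a small guard before launching. The sketch below shows one possible shape for such a check in plain C++. Dim3 is a hypothetical stand-in for CUDA's dim3, MockSolver is a hypothetical stand-in for a real descriptor type, and launch_matches is not part of cuSolverDx.

```cpp
// Hypothetical stand-in for CUDA's dim3, so the sketch compiles as plain C++.
struct Dim3 {
    unsigned int x, y, z;
};

// Hypothetical stand-in for a cuSolverDx descriptor; a real Solver type
// would provide the block_dim trait itself.
struct MockSolver {
    static constexpr Dim3 block_dim{64, 1, 1};
};

// Returns true only if the launch dimensions equal Solver::block_dim exactly.
template <class Solver>
constexpr bool launch_matches(Dim3 launch) {
    return launch.x == Solver::block_dim.x
        && launch.y == Solver::block_dim.y
        && launch.z == Solver::block_dim.z;
}

static_assert(launch_matches<MockSolver>(Dim3{64, 1, 1}),
              "an exact match is accepted");
static_assert(!launch_matches<MockSolver>(Dim3{32, 2, 1}),
              "same thread count with a different layout is rejected");
```

In a real application the same comparison could be made against the dim3 passed to the kernel launch; passing Solver::block_dim directly as the launch block size avoids the mismatch altogether.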