Execution Operators
Execution operators configure how the function will run on the GPU. Combined with description operators, they form a complete function descriptor that can be executed on a GPU.
Operator | Description |
---|---|
Block | Creates block execution object. See Block Configuration Operators. |
Block Operator
cusolverdx::Block()
Generates a collective operation to run in a single CUDA block. Threads cooperate to compute the collective operation. The layout and the number of threads participating in the execution can be configured using Block Configuration Operators.
For example, the following code example creates a function descriptor for a Solver function that will run in a single CUDA block:
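#include <cusolverdx.hpp>
using namespace cusolverdx; // the operators below are written unqualified
using Solver = decltype(Size<32>()
                      + Precision<double>()
                      + Type<type::real>()
                      + Function<function::potrf>()
                      + FillMode<lower>()   // operators are added as objects, so () is required
                      + Arrangement<row_major>()
                      + cusolverdx::SM<700>()
                      + cusolverdx::Block());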
Block Configuration Operators
Block configuration operators allow the user to configure the block size of a single CUDA block.
Operator | Default value | Description |
---|---|---|
BlockDim<X, Y, Z> | Based on heuristics | Number of threads used to perform Solver function. |
Note
Block configuration operators can only be used with the Block operator.
Warning
It is not guaranteed that executions of exactly the same Solver function with exactly the same inputs but with a different
CUDA architecture (SM),
number of threads (BlockDim), or
number of batches per block (BatchesPerBlock)
will produce bit-identical results.
BlockDim Operator
struct cusolverdx::BlockDim<unsigned int X, unsigned int Y, unsigned int Z>()
Sets the CUDA block size to (X, Y, Z), that is, the number of threads participating in the execution. The block dimensions can be accessed via the Solver::block_dim trait.
If the BlockDim operator is not set, the default block dimensions are used (the default value is Solver::suggested_block_dim). The suggested block dimensions lead to optimal performance for the cuSolverDx function in most cases, but if the function is fused with other operations, we recommend measuring the performance of the kernel first and then experimenting with different values (see Performance).
If the block dimensions set with BlockDim<X, Y, Z> differ from the suggested block dimensions, the cuSolverDx operations will still run correctly.
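The selection rule described above (use BlockDim<X, Y, Z> when it is given, otherwise fall back to Solver::suggested_block_dim) can be illustrated with a small plain C++ mock. This is only a sketch of the rule, not the cuSolverDx implementation: `Dim3` stands in for CUDA's `dim3`, and `MockBlockDim`, `NoBlockDim`, `select_block_dim`, `MockSolver`, and the suggested value of 128 threads are all hypothetical names and numbers chosen for illustration.

```cpp
// Stand-in for CUDA's dim3 so the sketch compiles without the CUDA toolkit.
struct Dim3 {
    unsigned int x, y, z;
};

// Hypothetical counterpart of a BlockDim<X, Y, Z> operator.
template <unsigned int X, unsigned int Y = 1, unsigned int Z = 1>
struct MockBlockDim {
    static constexpr Dim3 value{X, Y, Z};
};

// Hypothetical marker meaning "no BlockDim operator was specified".
struct NoBlockDim {};

// Picks the user-provided dimensions when an operator is present...
template <class Op>
struct select_block_dim {
    static constexpr Dim3 get(Dim3 /*suggested*/) { return Op::value; }
};

// ...and falls back to the suggested dimensions otherwise.
template <>
struct select_block_dim<NoBlockDim> {
    static constexpr Dim3 get(Dim3 suggested) { return suggested; }
};

// Mock descriptor exposing the two traits named in the text.
template <class BlockDimOp = NoBlockDim>
struct MockSolver {
    // Made-up value; the real library derives it from heuristics.
    static constexpr Dim3 suggested_block_dim{128, 1, 1};
    static constexpr Dim3 block_dim =
        select_block_dim<BlockDimOp>::get(suggested_block_dim);
};

using DefaultSolver = MockSolver<>;                     // no BlockDim given
using CustomSolver  = MockSolver<MockBlockDim<32, 4>>;  // BlockDim<32, 4, 1>

static_assert(DefaultSolver::block_dim.x == 128, "falls back to suggested");
static_assert(CustomSolver::block_dim.y == 4, "uses the explicit BlockDim");
```

In this mock, `CustomSolver::suggested_block_dim` is still available even though it is not used for `block_dim`, which mirrors the idea that an explicit block size overrides the suggestion without removing it.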
Warning
cuSolverDx enforces the following requirement for kernel launch configurations:
The kernel must be launched with exactly the block dimensions Solver::block_dim, which equals BlockDim<X, Y, Z> if the BlockDim operator is specified, or Solver::suggested_block_dim otherwise.
Note
cuSolverDx cannot validate every kernel launch configuration at runtime to check that this requirement is met, so it is the user's responsibility to adhere to it. Violating the requirement is undefined behavior and can lead to incorrect results and/or failures.
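Since the library does not check the launch configuration itself, one option is a host-side guard placed right before the kernel launch. The following is a minimal sketch in plain C++ and is not part of cuSolverDx: `Dim3` stands in for CUDA's `dim3`, `MockSolver` is a hypothetical placeholder for a descriptor type that exposes a `block_dim` trait, and `launch_dims_match` and `require_matching_launch` are illustrative helper names.

```cpp
#include <cassert>

// Stand-in for CUDA's dim3 so the sketch compiles without the CUDA toolkit.
struct Dim3 {
    unsigned int x, y, z;
};

// Hypothetical placeholder for a function descriptor; the text above states
// that a real descriptor exposes its block dimensions as Solver::block_dim.
struct MockSolver {
    static constexpr Dim3 block_dim{64, 1, 1};
};

// Returns true only if the launch dimensions equal Solver::block_dim exactly.
template <class Solver>
constexpr bool launch_dims_match(Dim3 launch) {
    return launch.x == Solver::block_dim.x &&
           launch.y == Solver::block_dim.y &&
           launch.z == Solver::block_dim.z;
}

// Illustrative guard: aborts in debug builds when the dimensions differ.
template <class Solver>
void require_matching_launch(Dim3 launch) {
    assert(launch_dims_match<Solver>(launch) &&
           "kernel must be launched with exactly Solver::block_dim");
}
```

Passing Solver::block_dim directly as the block size in the launch configuration avoids the mismatch altogether; a guard like this is mainly useful when the launch dimensions are computed elsewhere in the program.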