Other Traits#
The traits below are derived from the function description. They are provided either for cross-checking a description or for improving performance.
Trait | Description |
---|---|
is_supported | Verifies if a given Solver operation is supported on the provided CUDA architecture |
Suggested Block Dim | Recommended number of threads for computing Solver function that leads to the optimal performance. |
Suggested Batches Per Block | Recommended number of batches per block for computing Solver function that leads to the optimal performance. |
suggested_leading_dimension_of | Recommended leading dimension of matrix |
is_block | Verifies if a given Solver operator is configured for block execution. |
is_supported Trait#
// true if Solver is supported on the provided CUDA architecture
cusolverdx::is_supported<Solver, Arch>::value;
cusolverdx::is_supported checks whether a Solver operation is supported on CUDA architecture Arch, based on the shared memory size requirement.
Requirements for using the trait:
- Solver must have defined size, data type, arrangement, function, and batches per block. See Description Operators section.
- Solver must include the Block operator.
- Solver must not define a target CUDA architecture via the SM operator.
Example
using namespace cusolverdx;
using Solver = decltype(Size<128>() + Function<potrf>() + Type<type::real>() +
Block() + Precision<double>());
cusolverdx::is_supported<Solver, 900>::value; // true
cusolverdx::is_supported<Solver, 800>::value; // true
cusolverdx::is_supported<Solver, 700>::value; // false
Cholesky decomposition of a 128x128 double-precision matrix requires 128 KB of shared memory (128 x 128 elements x 8 bytes), while the maximum amount of shared memory per thread block on Volta (700), Ampere (800), and Hopper (900) is 96, 163, and 227 KB, respectively (see CUDA compute capability). The operation therefore fits on Ampere and Hopper but not on Volta.
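The arithmetic behind this result can be reproduced with a small standalone sketch. It does not use cuSolverDx and is not how the library implements is_supported; the helper names (matrix_bytes, max_smem_kb, fits_in_shared_memory) are hypothetical, and the per-architecture limits are simply the three values quoted above.

```cpp
#include <cstddef>

// Bytes needed to hold one n x n matrix of double-precision values.
constexpr std::size_t matrix_bytes(std::size_t n) {
    return n * n * sizeof(double);
}

// Maximum shared memory per thread block in KB, using only the values
// quoted in the text (Volta 96, Ampere 163, Hopper 227).
// Assumption: any other architecture is left unmodelled and returns 0.
constexpr std::size_t max_smem_kb(unsigned arch) {
    return arch == 700 ? 96
         : arch == 800 ? 163
         : arch == 900 ? 227
         : 0;
}

// Hypothetical helper: does a single n x n double matrix fit in the
// shared memory of one thread block on the given architecture?
constexpr bool fits_in_shared_memory(std::size_t n, unsigned arch) {
    return matrix_bytes(n) <= max_smem_kb(arch) * 1024;
}

// The 128x128 double matrix takes exactly 128 KB.
static_assert(matrix_bytes(128) == 128 * 1024, "expected 128 KB");
// These mirror the three is_supported results in the example above.
static_assert(fits_in_shared_memory(128, 900), "fits on Hopper");
static_assert(fits_in_shared_memory(128, 800), "fits on Ampere");
static_assert(!fits_in_shared_memory(128, 700), "does not fit on Volta");
```

The sketch only accounts for the matrix itself; the real trait may consider additional storage, so treat it as an illustration of the size comparison rather than a replacement for is_supported.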
Suggested Block Dim Trait#
// dim3(X, Y==1, Z==1)
Solver::suggested_block_dim
Recommended number of threads for a Solver description, depending on the size, data type, GPU architecture, and the configured number of batches per thread block. Note that the suggested block dim is effectively a 1D value.
The suggested block dim is used if the BlockDim operator is not explicitly defined by the user.
Suggested Batches Per Block Trait#
// unsigned int
Solver::suggested_batches_per_block
Recommended number of batches per block for a Solver description.
The trait is used in two steps:
1. Define a complete execution descriptor with the required operators, including size, data type, function, SM, and the Block operator.
2. Form a second descriptor by adding the BatchesPerBlock operator, using the trait of the first descriptor as its value.
Example
using Base = decltype(Size<14, 14>() + Precision<double>() + Type<type::complex>() +
Block() + Function<posv>() + SM<Arch>());
using POSV = decltype(Base() +
BatchesPerBlock<Base::suggested_batches_per_block>() +
FillMode<fill_mode::lower>() +
Arrangement<arrangement::row_major>());
suggested_leading_dimension_of Trait#
// cusolverdx::LeadingDimension<lda, ldb> type
cusolverdx::suggested_leading_dimension_of_t<Solver, Architecture>
// std::tuple<unsigned int, unsigned int>
cusolverdx::suggested_leading_dimension_of<Solver>::value
cusolverdx::suggested_leading_dimension_of_v<Solver>
// unsigned int
cusolverdx::suggested_leading_dimension_of<Solver>::a
cusolverdx::suggested_leading_dimension_of_v_a<Solver>
cusolverdx::suggested_leading_dimension_of<Solver>::b
cusolverdx::suggested_leading_dimension_of_v_b<Solver>
Recommended leading dimensions for matrices A and B as stored in shared memory and passed to cuSolverDx.
Returned values are strides (in elements) to the beginning of the next column for column-major matrices or the next row for row-major matrices. Trying them is recommended, as in many cases they improve the performance of the cuSolverDx operation.
See the LeadingDimension operator for more details about the leading dimension feature in cuSolverDx.
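To make the meaning of a leading dimension as a stride concrete, here is a minimal standalone sketch of how an element offset is computed for both arrangements. It does not call cuSolverDx; the function names are hypothetical, and the padded value 15 for a 14x14 matrix is an arbitrary illustration, not a value returned by the trait.

```cpp
#include <cstddef>

// Offset (in elements) of A(row, col) in a column-major matrix:
// the leading dimension is the stride to the start of the next column.
constexpr std::size_t col_major_offset(std::size_t row, std::size_t col,
                                       std::size_t ld) {
    return row + col * ld;
}

// Offset (in elements) of A(row, col) in a row-major matrix:
// the leading dimension is the stride to the start of the next row.
constexpr std::size_t row_major_offset(std::size_t row, std::size_t col,
                                       std::size_t ld) {
    return row * ld + col;
}

// Number of elements to reserve for a column-major matrix with n_cols
// columns once each column is padded to ld elements.
constexpr std::size_t col_major_storage(std::size_t n_cols, std::size_t ld) {
    return ld * n_cols;
}

// A 14x14 matrix stored with a hypothetical padded leading dimension of 15.
static_assert(col_major_offset(0, 1, 15) == 15, "next column starts ld elements later");
static_assert(col_major_offset(3, 2, 15) == 33, "3 + 2 * 15");
static_assert(row_major_offset(3, 2, 15) == 47, "3 * 15 + 2");
static_assert(col_major_storage(14, 15) == 210, "15 * 14 elements reserved");
```

A leading dimension larger than the matrix dimension means more shared memory is reserved than the unpadded matrix needs, so the storage size should be computed from the leading dimension rather than from the matrix size alone.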
is_block Trait#
// bool
cusolverdx::is_block<Solver>::value
cusolverdx::is_block_v<Solver>
cusolverdx::is_block checks whether a Solver operation’s definition includes the Block operator.