Traits¶
Traits provide the user with information about the function descriptor constructed using Operators. They are divided into Description Traits and Execution Traits.
Description Traits¶
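Trait | Description |
---|---|
size_of<BLAS> | Problem size we intend to solve. |
type_of<BLAS> | Data type used, either type::real or type::complex. |
precision_of<BLAS> | Precision of the underlying floating-point values used. |
function_of<BLAS> | Function to be executed. |
transpose_mode_of<BLAS> | Transpose mode of the matrices A and B. |
sm_of<BLAS> | Target architecture for the underlying computation. |
is_blas<BLAS> | true if the descriptor is a function description, formed with Description Operators. |
is_blas_execution<BLAS> | true if the descriptor is a function description configured for execution. |
is_complete_blas<BLAS> | true if the descriptor is a complete function description. |
is_complete_blas_execution<BLAS> | true if both is_blas_execution and is_complete_blas are true. |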
Description traits can be retrieved from a function descriptor using the helper functions provided. For example:
#include <iostream>
#include <cublasdx.hpp>

using GEMM = decltype(cublasdx::Size<32, 32, 64>()
              + cublasdx::Precision<double>()
              + cublasdx::Type<cublasdx::type::real>()
              + cublasdx::TransposeMode<cublasdx::T, cublasdx::N>()
              + cublasdx::Function<cublasdx::function::MM>()
              + cublasdx::SM<700>()
              + cublasdx::Block());

int main() {
    // Print the problem size only when the description is complete.
    if (cublasdx::is_complete_blas<GEMM>::value) {
        std::cout << "GEMM (M x N x K): "
                  << cublasdx::size_of<GEMM>::m << " x "
                  << cublasdx::size_of<GEMM>::n << " x "
                  << cublasdx::size_of<GEMM>::k << std::endl;
    }
}
Size Trait¶
// std::tuple<unsigned int, unsigned int, unsigned int>
cublasdx::size_of<BLAS>::value
cublasdx::size_of_v<BLAS>
// unsigned int
cublasdx::size_of<BLAS>::m
cublasdx::size_of<BLAS>::n
cublasdx::size_of<BLAS>::k
The size_of trait gives the size of the problem we want to solve, as set by the Size Operator.
If the descriptor was not created using a Size Operator, compilation will fail with an error message.
Type Trait¶
// cublasdx::type
cublasdx::type_of<BLAS>::value
cublasdx::type_of_v<BLAS>
Data type (cublasdx::type::real or cublasdx::type::complex) used in the function, as set by Type Operator.
Precision Trait¶
// Precision type
cublasdx::precision_of<BLAS>::type
cublasdx::precision_of_t<BLAS>
Floating-point precision of the data and operation, as set by Precision Operator.
Function Trait¶
// cublasdx::function
cublasdx::function_of<BLAS>::value
cublasdx::function_of_v<BLAS>
Function to be executed, as set by Function Operator. If the descriptor was not created using a Function Operator, compilation will fail with an error message.
Transpose Mode Trait¶
// cublasdx::transpose_mode
cublasdx::transpose_mode_of<BLAS>::a_transpose_mode
cublasdx::transpose_mode_of<BLAS>::b_transpose_mode
Transpose modes of the A and B matrices, as set by TransposeMode Operator.
SM Trait¶
// unsigned int
cublasdx::sm_of<BLAS>::value
cublasdx::sm_of_v<BLAS>
GPU architecture used to run the function. For example, the trait gives 700 for Volta (sm_70).
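The value 700 for sm_70 suggests that the number encodes the compute capability as major * 100 + minor * 10. The sketch below decodes a value under that assumption. The `compute_capability` struct and the `decode_sm` helper are hypothetical names introduced for illustration and are not part of cuBLASDx.

```cpp
// Hypothetical helper, not part of cuBLASDx. It splits an SM value such as 700
// into compute capability parts, ASSUMING the encoding major * 100 + minor * 10.
struct compute_capability {
    unsigned int cc_major;
    unsigned int cc_minor;
};

constexpr compute_capability decode_sm(unsigned int sm) {
    return compute_capability{sm / 100, (sm % 100) / 10};
}

// 700 corresponds to sm_70 (Volta), as stated in the text.
static_assert(decode_sm(700).cc_major == 7 && decode_sm(700).cc_minor == 0, "sm_70");
```

With a real descriptor, the argument would be `cublasdx::sm_of<BLAS>::value`.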
is_blas Trait¶
// bool
cublasdx::is_blas<BLAS>::value
Trait is true if the descriptor is a function description, formed with Description Operators.
is_blas_execution Trait¶
// bool
cublasdx::is_blas_execution<BLAS>::value
Trait is true if the descriptor is a function description configured for execution, formed with Description Operators and Execution Operators.
is_complete_blas Trait¶
// bool
cublasdx::is_complete_blas<BLAS>::value
Trait is true if the descriptor is a complete function description, formed with Description Operators.
Note
Complete in this context means that the descriptor has been formed with all the necessary Description Operators and is only missing an Execution Operator to be able to run.
For a function descriptor to be complete, the following is required:
One, and only one, Size Operator.
One, and only one, TransposeMode Operator.
One, and only one, Function Operator.
One, and only one, SM Operator.
is_complete_blas_execution Trait¶
// bool
cublasdx::is_complete_blas_execution<BLAS>::value
Trait is true if both is_blas_execution Trait and is_complete_blas Trait are true.
Note
If is_complete_blas_execution Trait is true for a descriptor, then we can use the Execution Methods to execute the function.
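The way these four traits relate to each other can be sketched with a toy model in plain C++17. This is not how cuBLASDx implements them. The tag types, `descriptor`, `count_of`, and the trait names below are hypothetical stand-ins that only mirror the rules stated above: exactly one of each required description operator for completeness, plus an execution operator for execution.

```cpp
#include <type_traits>

// Hypothetical stand-ins for operator types; these are NOT the cuBLASDx types.
struct size_op {};
struct transpose_mode_op {};
struct function_op {};
struct sm_op {};
struct block_op {};

// A toy descriptor is just the list of operators it was built from.
template<class... Ops>
struct descriptor {};

// Number of times operator Op appears in the pack Ops.
template<class Op, class... Ops>
inline constexpr unsigned int count_of = (0u + ... + (std::is_same_v<Op, Ops> ? 1u : 0u));

template<class D>
struct is_complete : std::false_type {};

// Complete: one, and only one, of each required description operator.
template<class... Ops>
struct is_complete<descriptor<Ops...>>
    : std::bool_constant<(count_of<size_op, Ops...> == 1u) &&
                         (count_of<transpose_mode_op, Ops...> == 1u) &&
                         (count_of<function_op, Ops...> == 1u) &&
                         (count_of<sm_op, Ops...> == 1u)> {};

template<class D>
struct is_execution : std::false_type {};

// Configured for execution: contains the (only) execution operator, Block.
template<class... Ops>
struct is_execution<descriptor<Ops...>>
    : std::bool_constant<(count_of<block_op, Ops...> == 1u)> {};

// Both traits must hold before the function can be executed.
template<class D>
struct is_complete_execution
    : std::bool_constant<is_complete<D>::value && is_execution<D>::value> {};

using description = descriptor<size_op, transpose_mode_op, function_op, sm_op>;
using execution   = descriptor<size_op, transpose_mode_op, function_op, sm_op, block_op>;

static_assert(is_complete<description>::value);
static_assert(!is_complete_execution<description>::value);
static_assert(is_complete_execution<execution>::value);
```

In the toy model a description without Block is complete but not executable, which matches the note that a complete description is only missing an execution operator.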
Execution Traits¶
Execution traits can be retrieved directly from a BLAS descriptor that has been configured with Execution Operators. The available execution traits may depend on the operator used to build the descriptor. Right now, Block Operator is the only available execution operator.
Block Traits¶
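Trait | Default value | Description |
---|---|---|
BLAS::value_type | float | Type of input/output data, as well as computation type. |
BLAS::input_type | | Type of the input data. |
BLAS::output_type | | Type of the output data (results). |
BLAS::a_dim, BLAS::b_dim, BLAS::c_dim | Determined by the problem size and transpose mode. | Logical dimensions of matrices A, B, C as (rows, columns) tuples. |
BLAS::lda, BLAS::ldb, BLAS::ldc | Determined by the problem size or set via LeadingDimension Operator. | Leading dimensions of matrices A, B, C. |
BLAS::a_size, BLAS::b_size, BLAS::c_size | Determined by the problem size and set leading dimensions. | Number of elements in A, B, C matrices, including padding. |
BLAS::shared_memory_size | Determined by the problem size and set leading dimensions. | The size of the shared memory in bytes required to allocate input and output matrices. |
BLAS::block_dim | Based on heuristic or set via BlockDim Operator. | Value of type dim3 representing the number of threads used to perform the requested BLAS function. |
BLAS::suggested_block_dim | Based on heuristic. | Recommended number of threads for computing BLAS function represented as CUDA block dimensions (dim3). |
BLAS::max_threads_per_block | X * Y * Z where X, Y, Z = BLAS::block_dim | Number of threads in BLAS::block_dim. |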
Block traits can be retrieved from descriptors built with Block Operator.
For example:
#include <cublasdx.hpp>
Value Type Trait¶
BLAS::value_type
Type of input/output data, as well as type used for computations.
The default type is float. It can be a real or complex type depending on the Type Operator that was set.
Matrix Dim Trait¶
// tuple<unsigned int, unsigned int>
BLAS::a_dim
BLAS::b_dim
BLAS::c_dim
Logical dimensions of matrices A, B, C in the form of (rows, columns) tuples.
The dimensions are determined by the problem size and transpose mode.
See GEMM, Size Operator and TransposeMode Operator.
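As an illustration of how the logical dimensions follow from the problem size and transpose mode, here is a minimal sketch in plain C++. It assumes the usual GEMM convention in which A enters the product as an M x K operand, B as a K x N operand, and C is M x N. The `transpose_mode` enum, `matrix_dim` struct, and helper functions are hypothetical and only mirror that convention; they are not the cuBLASDx definitions, and conjugate transposition is left out.

```cpp
// Hypothetical mirror of the transpose modes; not the cuBLASDx enum itself.
enum class transpose_mode { non_transposed, transposed };

struct matrix_dim {
    unsigned int rows;
    unsigned int columns;
};

// A is an M x K operand, so it is stored as K x M when transposed.
constexpr matrix_dim a_dim(unsigned int m, unsigned int k, transpose_mode mode) {
    return mode == transpose_mode::non_transposed ? matrix_dim{m, k} : matrix_dim{k, m};
}

// B is a K x N operand, so it is stored as N x K when transposed.
constexpr matrix_dim b_dim(unsigned int k, unsigned int n, transpose_mode mode) {
    return mode == transpose_mode::non_transposed ? matrix_dim{k, n} : matrix_dim{n, k};
}

// C is always M x N.
constexpr matrix_dim c_dim(unsigned int m, unsigned int n) {
    return matrix_dim{m, n};
}

// Size<32, 32, 64> with TransposeMode<T, N>, as in the GEMM example earlier on this page.
static_assert(a_dim(32, 64, transpose_mode::transposed).rows == 64, "A is K x M");
static_assert(a_dim(32, 64, transpose_mode::transposed).columns == 32, "A is K x M");
static_assert(b_dim(64, 32, transpose_mode::non_transposed).rows == 64, "B is K x N");
static_assert(c_dim(32, 32).rows == 32, "C is M x N");
```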
Leading Dimension Trait¶
// unsigned int
BLAS::lda
BLAS::ldb
BLAS::ldc
Leading dimensions of matrices A, B, C.
See GEMM, Size Operator and LeadingDimension Operator.
Matrix Size Trait¶
// unsigned int
BLAS::a_size
BLAS::b_size
BLAS::c_size
Number of elements in the A, B, C matrices. It includes padding determined by the set leading dimensions.
See GEMM, Size Operator and LeadingDimension Operator.
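A rough sketch of how the element count relates to the leading dimension, assuming column-major storage where the leading dimension is the stride between consecutive columns and defaults to the number of rows. The `default_ld` and `matrix_size` helpers are hypothetical names used for illustration, not cuBLASDx functions.

```cpp
// Hypothetical helpers, assuming column-major storage.

// Without a LeadingDimension operator, the leading dimension is assumed
// to equal the number of rows (no padding).
constexpr unsigned int default_ld(unsigned int rows) {
    return rows;
}

// Elements occupied by the matrix, padding included: one leading
// dimension worth of elements per column.
constexpr unsigned int matrix_size(unsigned int ld, unsigned int columns) {
    return ld * columns;
}

// A 32 x 64 matrix without padding, then padded to a leading dimension of 40.
static_assert(matrix_size(default_ld(32), 64) == 2048, "unpadded");
static_assert(matrix_size(40, 64) == 2560, "padded");
```

The padded case occupies 512 more elements than the logical 32 x 64 matrix, which is the padding the trait accounts for.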
Block Dim Trait¶
// dim3
BLAS::block_dim
Value of type dim3 representing the number of threads that will be used to perform the requested BLAS function. It is equal to the CUDA block dimensions specified with BlockDim Operator.
If BlockDim is not used in the BLAS description, BLAS::block_dim defaults to BLAS::suggested_block_dim.
Suggested Block Dim Trait¶
// dim3
BLAS::suggested_block_dim
Recommended number of threads for the BLAS description, expressed as CUDA block dimensions (dim3).
Max Threads Per Block Trait¶
BLAS::max_threads_per_block
Number of threads in the recommended or set block dimensions, X * Y * Z, where X, Y, Z = BLAS::block_dim.
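The product can be written out as a short sketch. `block_dims` is a hypothetical stand-in for CUDA's dim3 so that the snippet compiles without the CUDA toolkit, and the function name simply echoes the trait; neither is taken from cuBLASDx.

```cpp
// Hypothetical stand-in for CUDA's dim3.
struct block_dims {
    unsigned int x;
    unsigned int y;
    unsigned int z;
};

// Total number of threads in a block: X * Y * Z.
constexpr unsigned int max_threads_per_block(block_dims d) {
    return d.x * d.y * d.z;
}

static_assert(max_threads_per_block(block_dims{128, 2, 1}) == 256, "128 * 2 * 1");
```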
Other Traits¶
is_supported¶
namespace cublasdx {
// BLAS - BLAS description without CUDA architecture defined using SM operator
// Architecture - unsigned integer representing CUDA architecture (SM)
template<class BLAS, unsigned int Architecture>
struct is_supported : std::bool_constant<...> { };
// Helper variable template
template<class BLAS, unsigned int Architecture>
inline constexpr bool is_supported_v = is_supported<BLAS, Architecture>::value;
}
// true if BLAS is supported on the provided CUDA architecture
cublasdx::is_supported<BLAS, Architecture>::value;
cublasdx::is_supported checks whether a BLAS operation is supported on the CUDA architecture given by Architecture.
Requirements:
BLAS must have defined size, function, and transpose mode. See Description Operators section.
BLAS must include Block operator.
BLAS must not have a target CUDA architecture defined via SM operator.
Example
using namespace cublasdx;
// 128 x 128 x 128 problem
using BLAS128 = decltype(Size<128, 128, 128>() + Type<type::real>() + Block() + Precision<float>());
cublasdx::is_supported<BLAS128, 900>::value; // true
cublasdx::is_supported<BLAS128, 800>::value; // false
cublasdx::is_supported<BLAS128, 700>::value; // false
// 96 x 96 x 96 problem (a distinct alias, since BLAS128 cannot be redefined)
using BLAS96 = decltype(Size<96, 96, 96>() + Type<type::real>() + Block() + Precision<float>());
cublasdx::is_supported<BLAS96, 900>::value; // true
cublasdx::is_supported<BLAS96, 800>::value; // true
cublasdx::is_supported<BLAS96, 700>::value; // false
suggested_leading_dimension_of¶
namespace cublasdx {
// BLAS - BLAS description without CUDA architecture defined using SM operator
// Architecture - unsigned integer representing CUDA architecture (SM)
template<class BLAS, unsigned int Architecture>
struct suggested_leading_dimension_of {
static constexpr unsigned int lda;
static constexpr unsigned int ldb;
static constexpr unsigned int ldc;
using type = LeadingDimension<lda, ldb, ldc>;
};
// LeadingDimension operator with suggested leading dimensions
template<class BLAS, unsigned int Architecture>
using suggested_leading_dimension_of_t = typename suggested_leading_dimension_of<BLAS, Architecture>::type;
}
Type cublasdx::suggested_leading_dimension_of provides suggested leading dimensions for matrices A, B, and C.
It is recommended to try them, as in many cases they will improve the performance of the BLAS operation.
They might be especially helpful when M, N, K are powers of two or multiples of 16.
You can review cuBLASDx performance examples.
Requirements:
BLAS must have defined size, function, and transpose mode. See Description Operators section.
BLAS must include Block operator.
Suggested leading dimensions obtained for architecture A1 must not be used with architecture A2.
Example
using namespace cublasdx;
using BLAS1 = decltype(Size<128, 128, 128>() + Type<type::real>() + Block() + Precision<float>());
using SuggestedLD = cublasdx::suggested_leading_dimension_of_t<BLAS1, 900>;
using BLAS2 = decltype(BLAS1() + SuggestedLD() + SM<900>());