Traits

Traits provide the user with information about the function descriptor constructed using Operators. They are divided into Description Traits and Execution Traits.


Description Traits

Trait                                     Description

size_of<Description>                      Problem size we intend to solve.
type_of<Description>                      Data type used, either type::real or type::complex.
precision_of<Description>                 Precision of the underlying floating-point values used, half, double, or float.
function_of<Description>                  Function to be executed.
transpose_mode_of<Description>            Transpose mode of the matrix - transpose_mode::non_transposed, transpose_mode::transposed, and transpose_mode::conj_transposed.
sm_of<Description>                        Target architecture for the underlying computation.
is_blas<Description>                      true if Description is a function description, formed with Description Operators.
is_blas_execution<Description>            true if Description is a function description, configured for execution with Execution Operators.
is_complete_blas<Description>             true if Description is a valid and complete function description, formed with Description Operators.
is_complete_blas_execution<Description>   true if is_complete_blas<Description> is true and is_blas_execution<Description> is true.

Description traits can be retrieved from a function descriptor using the helper functions provided. For example:

#include <iostream>
#include <cublasdx.hpp>

using GEMM = decltype(cublasdx::Size<32, 32, 64>()
              + cublasdx::Precision<double>()
              + cublasdx::Type<cublasdx::type::real>()
              + cublasdx::TransposeMode<cublasdx::T, cublasdx::N>()
              + cublasdx::Function<cublasdx::function::MM>()
              + cublasdx::SM<700>()
              + cublasdx::Block());

if(cublasdx::is_complete_blas<GEMM>::value)
  std::cout << "GEMM (M x N x K): "
    << cublasdx::size_of<GEMM>::m << " x "
    << cublasdx::size_of<GEMM>::n << " x "
    << cublasdx::size_of<GEMM>::k << std::endl;

Size Trait

// std::tuple<unsigned int, unsigned int, unsigned int>
cublasdx::size_of<BLAS>::value
cublasdx::size_of_v<BLAS>

// unsigned int
cublasdx::size_of<BLAS>::m
cublasdx::size_of<BLAS>::n
cublasdx::size_of<BLAS>::k

The size_of trait gives the size of the problem to solve, as set by the Size Operator. If the descriptor was not created using a Size Operator, compilation fails with an error message.

Type Trait

// cublasdx::type
cublasdx::type_of<BLAS>::value
cublasdx::type_of_v<BLAS>

Data type (cublasdx::type::real or cublasdx::type::complex) used in the function, as set by Type Operator.

Precision Trait

// Precision type
cublasdx::precision_of<BLAS>::type
cublasdx::precision_of_t<BLAS>

Floating-point precision of the data and operation, as set by Precision Operator.

Function Trait

// cublasdx::function
cublasdx::function_of<BLAS>::value
cublasdx::function_of_v<BLAS>

Function to be executed, as set by Function Operator. If the descriptor was not created using a Function Operator, compilation will fail with an error message.

Transpose Mode Trait

// cublasdx::transpose_mode
cublasdx::transpose_mode_of<BLAS>::a_transpose_mode
cublasdx::transpose_mode_of<BLAS>::b_transpose_mode

Transpose modes of matrices A and B.

SM Trait

// unsigned int
cublasdx::sm_of<BLAS>::value
cublasdx::sm_of_v<BLAS>

GPU architecture used to run the function. For example, gives 700 for Volta (sm_70).
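As a small illustration of the numbering, the sketch below assumes the value is the compute capability written as major * 100 + minor * 10, which is consistent with "700 for sm_70". The `sm_value` helper is hypothetical and not part of cuBLASDx.

```cpp
// Hypothetical helper, not part of cuBLASDx: converts a compute capability
// (major.minor) into the numeric form, assuming the
// major * 100 + minor * 10 convention suggested by "700 for sm_70".
constexpr unsigned int sm_value(unsigned int major, unsigned int minor) {
    return major * 100u + minor * 10u;
}

static_assert(sm_value(7, 0) == 700, "Volta (sm_70)");
static_assert(sm_value(8, 0) == 800, "sm_80");
static_assert(sm_value(9, 0) == 900, "sm_90");
```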

is_blas Trait

// bool
cublasdx::is_blas<BLAS>::value

Trait is true if the descriptor is a function description, formed with Description Operators.

is_blas_execution Trait

// bool
cublasdx::is_blas_execution<BLAS>::value

Trait is true if the descriptor is a function description configured for execution, formed with Description Operators and Execution Operators.

is_complete_blas Trait

// bool
cublasdx::is_complete_blas<BLAS>::value

Trait is true if the descriptor is a complete function description, formed with Description Operators.

Note

Complete in this context means that the descriptor has been formed with all the necessary Description Operators and is only missing an Execution Operator to be able to run.

For a function descriptor to be complete, the following is required:

is_complete_blas_execution Trait

// bool
cublasdx::is_complete_blas_execution<BLAS>::value

Trait is true if both is_blas_execution Trait and is_complete_blas Trait are true.

Note

If the is_complete_blas_execution Trait is true for a descriptor, then we can use the Execution Methods to execute the function.
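The relationship between the three boolean traits can be sketched with plain C++ type traits. Everything below is a mock written for illustration: `mock_descriptor` and the `mock_is_*` traits are hypothetical stand-ins and do not reflect how cuBLASDx implements its traits. The sketch shows the "both must be true" rule and how a `static_assert` can guard code that needs an executable descriptor.

```cpp
#include <type_traits>

// Hypothetical mock of a descriptor; real descriptors are built with operators.
template<bool Complete, bool HasExecution>
struct mock_descriptor {
    static constexpr bool complete      = Complete;
    static constexpr bool has_execution = HasExecution;
};

// Stand-ins for is_complete_blas and is_blas_execution.
template<class D> struct mock_is_complete  : std::integral_constant<bool, D::complete> {};
template<class D> struct mock_is_execution : std::integral_constant<bool, D::has_execution> {};

// True only when both conditions hold, mirroring the rule stated above.
template<class D>
struct mock_is_complete_execution
    : std::integral_constant<bool, mock_is_complete<D>::value && mock_is_execution<D>::value> {};

// Guard: fails to compile for descriptors that are not ready to execute.
template<class D>
constexpr bool ready_to_execute() {
    static_assert(mock_is_complete_execution<D>::value,
                  "descriptor must be complete and include an execution operator");
    return true;
}

static_assert(ready_to_execute<mock_descriptor<true, true>>(), "complete and executable");
static_assert(!mock_is_complete_execution<mock_descriptor<true, false>>::value,
              "complete, but no execution operator");
```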

Execution Traits

Execution traits can be retrieved directly from a BLAS descriptor that has been configured with Execution Operators. The available execution traits may depend on the operator used to build the descriptor. Right now, Block Operator is the only available execution operator.

Block Traits

Description::value_type
    Default value: float
    Description: Type of input/output data, as well as computation type.

Description::input_type
    Default value: Description::value_type
    Description: Type of the input data.

Description::output_type
    Default value: Description::value_type
    Description: Type of the output data (results).

Description::<a/b/c>_dim
    Default value: Determined by the problem size and transpose mode.
    Description: Logical dimensions of A, B, C matrices determined by the problem size and transpose mode.

Description::ld<a/b/c>
    Default value: Determined by the problem size or set via LeadingDimension Operator.
    Description: Leading dimensions of matrices A, B, C.

Description::<a/b/c>_size
    Default value: Determined by the problem size and set leading dimensions.
    Description: Number of elements in A, B, C matrices. Includes padding determined by set leading dimensions.

Description::shared_memory_size
    Default value: Determined by the problem size and set leading dimensions.
    Description: The size of the shared memory in bytes required to allocate input and output matrices.

Description::block_dim
    Default value: Based on heuristic or set via BlockDim Operator.
    Description: Value of type dim3 representing number of threads computing BLAS function. Set or recommended CUDA block dimensions.

Description::suggested_block_dim
    Default value: Based on heuristic.
    Description: Recommended number of threads for computing BLAS function represented as CUDA block dimensions (dim3 value).

Description::max_threads_per_block
    Default value: X * Y * Z where X, Y, Z = Description::block_dim
    Description: Number of threads in Description::block_dim.

Block traits can be retrieved from descriptors built with Block Operator.

For example:

#include <cublasdx.hpp>

Value Type Trait

BLAS::value_type

Type of input/output data, as well as type used for computations.

Determined by Precision Type

The default type is float. It can be a real or complex type, depending on the Type Operator that was set.
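As a rough illustration of how a precision and a data type could combine into one value type, the sketch below uses a hypothetical `value_type_for` alias, with `std::complex` standing in for whatever complex type the library actually uses. Both names and the `data_type` enum are assumptions made for this example.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical stand-in for cublasdx::type; not the library's definition.
enum class data_type { real, complex };

// Hypothetical alias: Precision itself for real data, a complex wrapper
// around Precision for complex data (std::complex is only a stand-in).
template<class Precision, data_type Type>
using value_type_for =
    std::conditional_t<Type == data_type::real, Precision, std::complex<Precision>>;

static_assert(std::is_same<value_type_for<float, data_type::real>, float>::value,
              "real + float gives float");
static_assert(std::is_same<value_type_for<double, data_type::complex>,
                           std::complex<double>>::value,
              "complex + double gives a complex of double");
```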

Input Type Trait

BLAS::input_type

Type of input data.

Same as Value Type Trait.

Output Type Trait

BLAS::output_type

Type of output data.

Same as Value Type Trait.

Matrix Dim Trait

// tuple<unsigned int, unsigned int>
BLAS::a_dim
BLAS::b_dim
BLAS::c_dim

Logical dimensions of matrices A, B, C in form of (rows, columns) tuple. The dimensions are determined by the problem size and transpose mode.

See GEMM, Size Operator and TransposeMode Operator.
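A minimal sketch of how the (rows, columns) pairs could follow from the problem size and transpose modes, assuming the usual GEMM convention C (M x N) = op(A) (M x K) * op(B) (K x N), where a transposed operand is stored with its extents swapped. The `transpose_mode` enum and `stored_dim` helper are hypothetical stand-ins, and the numbers reuse Size<32, 32, 64> with TransposeMode<T, N> from the example near the top of this section.

```cpp
#include <utility>

// Hypothetical stand-in for cublasdx::transpose_mode.
enum class transpose_mode { non_transposed, transposed, conj_transposed };

using dim = std::pair<unsigned int, unsigned int>; // (rows, columns)

// Stored shape of an operand whose shape after applying the transpose mode
// is rows x cols: transposing (plain or conjugate) swaps the two extents.
constexpr dim stored_dim(unsigned int rows, unsigned int cols, transpose_mode mode) {
    return mode == transpose_mode::non_transposed ? dim(rows, cols) : dim(cols, rows);
}

constexpr unsigned int M = 32, N = 32, K = 64;
constexpr dim a_dim = stored_dim(M, K, transpose_mode::transposed);     // stored as K x M
constexpr dim b_dim = stored_dim(K, N, transpose_mode::non_transposed); // stored as K x N
constexpr dim c_dim = dim(M, N);                                        // always M x N

static_assert(a_dim.first == 64 && a_dim.second == 32, "A is 64 x 32");
static_assert(b_dim.first == 64 && b_dim.second == 32, "B is 64 x 32");
static_assert(c_dim.first == 32 && c_dim.second == 32, "C is 32 x 32");
```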

Leading Dimension Trait

// unsigned int
BLAS::lda
BLAS::ldb
BLAS::ldc

Leading dimensions of matrices A, B, C.

See GEMM, Size Operator and LeadingDimension Operator.

Matrix Size Trait

// unsigned int
BLAS::a_size
BLAS::b_size
BLAS::c_size

Number of elements in A, B, C matrices. It includes padding determined by set leading dimensions.

See GEMM, Size Operator and LeadingDimension Operator.
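The effect of padding on the element count can be shown with a short calculation. The sketch assumes column-major storage, where the leading dimension is the distance in elements between consecutive columns, so a matrix occupies ld * columns elements. The `matrix_size` helper is hypothetical and only illustrates the arithmetic.

```cpp
// Hypothetical helper: number of elements occupied by a column-major matrix
// with `cols` columns when consecutive columns are `ld` elements apart.
constexpr unsigned int matrix_size(unsigned int ld, unsigned int cols) {
    return ld * cols;
}

// A 64 x 32 matrix (rows x columns).
constexpr unsigned int rows = 64, cols = 32;

static_assert(matrix_size(rows, cols) == 2048, "tight packing: ld equals the row count");
static_assert(matrix_size(rows + 1, cols) == 2080, "ld = 65 adds one padding element per column");
```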

Shared Memory Size Trait

// unsigned int
BLAS::shared_memory_size

unsigned int BLAS::get_shared_memory_size()
// Calculates shared memory size for dynamic leading dimensions
unsigned int BLAS::get_shared_memory_size(unsigned int lda, unsigned int ldb, unsigned int ldc)

The size of the shared memory in bytes required to allocate input and output matrices, and perform computations.

The default value is determined by the problem size and leading dimensions.
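To make the dependence on leading dimensions concrete, here is a rough estimate of the byte count. It assumes column-major storage, one element type for all three matrices, and no extra alignment padding between them. The library may add alignment or other requirements, so treat `shared_bytes` as a hypothetical lower-bound calculation, not as a replacement for BLAS::shared_memory_size or BLAS::get_shared_memory_size().

```cpp
#include <cstddef>

// Hypothetical estimate: bytes needed to hold A, B and C, each stored
// column-major with its own leading dimension and column count.
template<class T>
constexpr std::size_t shared_bytes(unsigned int lda, unsigned int a_cols,
                                   unsigned int ldb, unsigned int b_cols,
                                   unsigned int ldc, unsigned int c_cols) {
    return (std::size_t(lda) * a_cols + std::size_t(ldb) * b_cols + std::size_t(ldc) * c_cols)
           * sizeof(T);
}

// M = N = 32, K = 64 in double precision: A and B stored as 64 x 32, C as 32 x 32.
static_assert(shared_bytes<double>(64, 32, 64, 32, 32, 32) == 40960,
              "leading dimensions equal to the row counts");
static_assert(shared_bytes<double>(65, 32, 65, 32, 33, 32) == 41728,
              "each leading dimension padded by one element");
```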

Block Dim Trait

// dim3
BLAS::block_dim

Value of type dim3 representing the number of threads that will be used to perform the requested BLAS function. It equals the CUDA block dimensions set with the BlockDim Operator, or the default if none were set. See BlockDim Operator.

If BlockDim is not used in the BLAS description, the default value of BLAS::block_dim is equal to BLAS::suggested_block_dim.

Suggested Block Dim Trait

// dim3
BLAS::suggested_block_dim

Recommended number of threads for BLAS description.

Max Threads Per Block Trait

BLAS::max_threads_per_block

Number of threads in the recommended or set block dimensions, X * Y * Z where X, Y, Z = Description::block_dim.
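The product can be written out as a one-line function. `dim3_like` below is a hypothetical stand-in for CUDA's dim3 so that the sketch compiles without the CUDA headers.

```cpp
// Hypothetical stand-in for CUDA's dim3 (three unsigned extents).
struct dim3_like {
    unsigned int x, y, z;
};

// Total number of threads in a block with the given dimensions.
constexpr unsigned int threads_in_block(dim3_like d) {
    return d.x * d.y * d.z;
}

static_assert(threads_in_block(dim3_like{128, 1, 1}) == 128, "1D block");
static_assert(threads_in_block(dim3_like{16, 8, 2}) == 256, "3D block");
```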

Other Traits

is_supported

namespace cublasdx {
  // BLAS - BLAS description without CUDA architecture defined using SM operator
  // Architecture - unsigned integer representing CUDA architecture (SM)
  template<class BLAS, unsigned int Architecture>
  struct is_supported : std::bool_constant<...> { };

  // Helper variable template
  template<class BLAS, unsigned int Architecture>
  inline constexpr bool is_supported_v = is_supported<BLAS, Architecture>::value;
}

// true if BLAS is supported on the provided CUDA architecture
cublasdx::is_supported<BLAS, Architecture>::value;

cublasdx::is_supported checks whether a BLAS operation is supported on the CUDA architecture given by Architecture.

Requirements:

  • BLAS must have defined size, function, and transpose mode. See Description Operators section.

  • BLAS must include Block operator.

  • BLAS must not define a target CUDA architecture via the SM operator.

Example

using namespace cublasdx;

using BLAS1 = decltype(Size<128, 128, 128>() + Type<type::real>() + Block() + Precision<float>());
cublasdx::is_supported<BLAS1, 900>::value; // true
cublasdx::is_supported<BLAS1, 800>::value; // false
cublasdx::is_supported<BLAS1, 700>::value; // false

using BLAS2 = decltype(Size<96, 96, 96>() + Type<type::real>() + Block() + Precision<float>());
cublasdx::is_supported<BLAS2, 900>::value; // true
cublasdx::is_supported<BLAS2, 800>::value; // true
cublasdx::is_supported<BLAS2, 700>::value; // false
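This section does not state what decides the result. One plausible explanation, offered here as an assumption, is that the three matrices must fit in the shared memory available to a thread block. The sketch below uses assumed per-block shared memory limits of 227 KB (sm_90), 163 KB (sm_80) and 96 KB (sm_70) and tightly packed float matrices. The helpers `max_shared_bytes`, `gemm_bytes` and `fits` are hypothetical. Under those assumptions, the 128 and 96 cases give the same true/false pattern as the example above.

```cpp
#include <cstddef>

// Assumed maximum shared memory per thread block, in bytes.
constexpr std::size_t max_shared_bytes(unsigned int arch) {
    return arch == 900 ? 227u * 1024u
         : arch == 800 ? 163u * 1024u
         : arch == 700 ? 96u * 1024u
         : 0u; // architectures not covered by this sketch
}

// Bytes for three tightly packed n x n float matrices (A, B and C).
constexpr std::size_t gemm_bytes(unsigned int n) {
    return 3u * std::size_t(n) * n * sizeof(float);
}

// Hypothetical check: do the matrices fit in shared memory on `arch`?
constexpr bool fits(unsigned int n, unsigned int arch) {
    return gemm_bytes(n) <= max_shared_bytes(arch);
}

static_assert(fits(128, 900) && !fits(128, 800) && !fits(128, 700), "128 x 128 x 128");
static_assert(fits(96, 900) && fits(96, 800) && !fits(96, 700), "96 x 96 x 96");
```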

suggested_leading_dimension_of

namespace cublasdx {
  // BLAS - BLAS description without CUDA architecture defined using SM operator
  // Architecture - unsigned integer representing CUDA architecture (SM)
  template<class BLAS, unsigned int Architecture>
  struct suggested_leading_dimension_of {
      static constexpr unsigned int lda;
      static constexpr unsigned int ldb;
      static constexpr unsigned int ldc;

      using type = LeadingDimension<lda, ldb, ldc>;
  };

  // LeadingDimension operator with suggested leading dimensions
  template<class BLAS, unsigned int Architecture>
  using suggested_leading_dimension_of_t = typename suggested_leading_dimension_of<BLAS, Architecture>::type;
}

The cublasdx::suggested_leading_dimension_of type provides suggested leading dimensions for matrices A, B, and C. Trying them is recommended, because in many cases they improve the performance of the BLAS operation. They can be especially helpful when M, N, K are powers of two or multiples of 16. See the cuBLASDx performance examples.

Requirements:

  • BLAS must have defined size, function, and transpose mode. See Description Operators section.

  • BLAS must include Block operator.

  • Suggested leading dimensions obtained for architecture A1 must not be used with a different architecture A2.

Example

using namespace cublasdx;

using BLAS1 = decltype(Size<128, 128, 128>() + Type<type::real>() + Block() + Precision<float>());
using SuggestedLD = cublasdx::suggested_leading_dimension_of_t<BLAS1, 900>;
using BLAS2 = decltype(BLAS1() + SuggestedLD() + SM<900>());
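The section does not explain why a padded leading dimension can help when sizes are powers of two. One commonly cited effect on NVIDIA GPUs is shared memory bank conflicts. The sketch below is an assumption-based illustration, not a description of how the library picks its suggestions. It assumes 32 banks of 4-byte words and a column-major float matrix, and counts how many distinct banks are touched when 32 threads read 32 consecutive elements of one row. The helpers `bank_of` and `distinct_banks_in_row` are hypothetical.

```cpp
#include <cstdint>

constexpr unsigned int kBanks = 32; // assumed: 32 banks, each 4 bytes wide

// Bank holding element (row, col) of a column-major float matrix.
constexpr unsigned int bank_of(unsigned int row, unsigned int col, unsigned int ld) {
    return (col * ld + row) % kBanks;
}

// Number of distinct banks touched by elements (0, 0) .. (0, 31).
constexpr unsigned int distinct_banks_in_row(unsigned int ld) {
    std::uint32_t seen = 0;
    for (unsigned int col = 0; col < kBanks; ++col) {
        seen |= std::uint32_t(1) << bank_of(0, col, ld);
    }
    unsigned int count = 0;
    for (; seen != 0; seen >>= 1) {
        count += seen & 1u; // count the set bits
    }
    return count;
}

static_assert(distinct_banks_in_row(32) == 1, "ld = 32: every element maps to the same bank");
static_assert(distinct_banks_in_row(33) == 32, "ld = 33: elements spread over all banks");
```

With a leading dimension of 32, all 32 accesses land in one bank and would be serialized. One element of padding per column spreads them across all banks.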