Traits

Traits provide the user with information about the FFT description constructed using Operators. They are divided into three categories:

  • Description Traits

  • Execution Traits

  • Other Traits


Description Traits

| Trait | Default value | Description |
|---|---|---|
| size_of<Description>::value | None. | Size of the FFT to compute. |
| type_of<Description>::value | fft_type::c2c | Type of the FFT operation, either fft_type::c2c, fft_type::r2c or fft_type::c2r. |
| direction_of<Description>::value | See Direction Trait. | Direction of the FFT operation, either fft_direction::inverse or fft_direction::forward. |
| precision_of<Description>::type | float | Type of the underlying floating-point values used to compute the FFT: double, float or __half. |
| is_fft<Description>::value | None. | true if Description is an FFT description, formed with Description Operators. |
| is_fft_execution<Description>::value | None. | true if Description is an FFT description, configured with Execution Operators. |
| is_complete_fft<Description>::value | None. | true if Description is a valid FFT description, formed with Description Operators. |
| is_complete_fft_execution<Description>::value | None. | true if is_complete_fft<Description>::value is true and is_fft_execution<Description>::value is true. |

Description traits can be retrieved from an FFT descriptor using the helper functions provided. For example:

#include <iostream>
#include <cufftdx.hpp>

using FFT = decltype( cufftdx::Size<8>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                      + cufftdx::Direction<cufftdx::fft_direction::forward>()
                      + cufftdx::Precision<double>() + cufftdx::Thread() );

int main() {
  // Print the size only if FFT is a complete FFT description
  if (cufftdx::is_complete_fft<FFT>::value)
    std::cout << "Size of the FFT operation: " << cufftdx::size_of<FFT>::value << std::endl;
}

Size Trait

cufftdx::size_of<FFT>::value

Size of the FFT to compute, as set by Size Operator.

There is no default size. If the descriptor was not created using a Size Operator, compilation will fail with an error message.

Type Trait

cufftdx::type_of<FFT>::value

Type of the FFT operation, as set by Type Operator.

The default type is complex-to-complex, fft_type::c2c.

Direction Trait

cufftdx::direction_of<FFT>::value

Direction of the FFT operation, as set by Direction Operator.

Default direction:

  • If the FFT type is fft_type::r2c, the default direction is fft_direction::forward.

  • If the FFT type is fft_type::c2r, the default direction is fft_direction::inverse.

  • For any other type, there is no default direction. If the descriptor was not created using a Direction Operator, compilation will fail with an error message.
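The default-direction rules above can be sketched as a small compile-time lookup. This is a minimal illustration, not the library's implementation: the enums below are stand-ins declared only so the sketch compiles on its own, and default_direction is a hypothetical name.

```cpp
// Stand-in enums so the sketch is self-contained; not the cuFFTDx definitions.
enum class fft_type { c2c, r2c, c2r };
enum class fft_direction { forward, inverse };

// Hypothetical trait. The primary template covers the remaining case (c2c),
// which has no default, so instantiating it is a compile-time error.
template <fft_type Type>
struct default_direction {
    static_assert(Type != fft_type::c2c,
                  "c2c has no default direction; a Direction operator is required");
};

// r2c defaults to a forward transform.
template <>
struct default_direction<fft_type::r2c> {
    static constexpr fft_direction value = fft_direction::forward;
};

// c2r defaults to an inverse transform.
template <>
struct default_direction<fft_type::c2r> {
    static constexpr fft_direction value = fft_direction::inverse;
};

static_assert(default_direction<fft_type::r2c>::value == fft_direction::forward,
              "r2c defaults to forward");
static_assert(default_direction<fft_type::c2r>::value == fft_direction::inverse,
              "c2r defaults to inverse");
```

Asking for default_direction<fft_type::c2c>::value in this sketch fails to compile, which mirrors the documented behavior for descriptors that omit the Direction Operator.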

Precision Trait

cufftdx::precision_of<FFT>::type

Floating-point precision of the FFT operation, as set by Precision Operator.

The default precision is float.

Is FFT? Trait

cufftdx::is_fft<FFT>::value

Trait is true if the descriptor is an FFT description, formed with Description Operators.

There is no default value. The descriptor either is or is not an FFT description.

Is FFT Execution? Trait

cufftdx::is_fft_execution<FFT>::value

Trait is true if the descriptor is an FFT description formed with Description Operators and an Execution Operator.

There is no default value. The descriptor either is or is not an FFT description that includes an Execution Operator.

Is FFT-complete? Trait

cufftdx::is_complete_fft<FFT>::value

Trait is true if the descriptor is a complete FFT description, formed with Description Operators.

Note

Complete in this context means that the descriptor has been formed with all the necessary Description Operators and is only missing an Execution Operator to be able to run.

For an FFT descriptor to be complete, the following is required:

There is no default value. The descriptor either is or is not an FFT-complete description.

Is FFT-complete Execution? Trait

cufftdx::is_complete_fft_execution<FFT>::value

Trait is true if both cufftdx::is_fft_execution and cufftdx::is_complete_fft are true.

Note

If cufftdx::is_complete_fft_execution trait is true for a descriptor FFT, then we can use the Execution Methods to compute the FFT.

There is no default value.
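The relationship between the three traits can be illustrated with a toy model. This is only a sketch under stated assumptions: toy_descriptor and can_execute are hypothetical, and the two boolean flags stand in for what cuFFTDx deduces from the operators of a real description. The traits here are local re-declarations for illustration, not the library's own.

```cpp
#include <type_traits>

// Hypothetical toy descriptor: two compile-time flags replace the operator
// analysis that a real FFT description would go through.
template <bool DescriptionComplete, bool HasExecutionOperator>
struct toy_descriptor {
    static constexpr bool description_complete   = DescriptionComplete;
    static constexpr bool has_execution_operator = HasExecutionOperator;
};

template <class D>
struct is_complete_fft : std::integral_constant<bool, D::description_complete> {};

template <class D>
struct is_fft_execution : std::integral_constant<bool, D::has_execution_operator> {};

// True only when both traits are true, as stated in the text.
template <class D>
struct is_complete_fft_execution
    : std::integral_constant<bool,
                             is_complete_fft<D>::value && is_fft_execution<D>::value> {};

// Hypothetical guard: rejects descriptors that cannot be executed at compile time.
template <class D>
constexpr bool can_execute() {
    static_assert(is_complete_fft_execution<D>::value,
                  "descriptor is not a complete FFT execution");
    return true;
}
```

A guard of this kind placed at the top of a kernel wrapper turns an incomplete descriptor into a readable compile-time error instead of a failure deeper in the template machinery.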


Execution Traits

Execution traits can be retrieved directly from an FFT descriptor that has been configured with Execution Operators. The available execution traits depend on the operator used to build the descriptor: either a Thread Operator or a Block Operator.

Thread Traits

| Trait | Default value | Description |
|---|---|---|
| Description::value_type | detail::complex<float> | Complex type of the underlying data used to compute the FFT. |
| Description::input_type | Description::value_type | Type of the underlying data used as input for the FFT. |
| Description::output_type | Description::value_type | Type of the underlying data used as output for the FFT. |
| Description::input_ept | Same as Elements Per Thread Trait. | Count of elements of type Input Type Trait to be loaded for this FFT. |
| Description::output_ept | Same as Elements Per Thread Trait. | Count of elements of type Output Type Trait to be stored after this FFT. |
| Description::input_type | None. | Type of elements to be loaded. Provides the best possible vectorization and correctness. |
| Description::output_type | None. | Type of elements to be stored. Provides the best possible vectorization and correctness. |
| Description::input_length | None. | Length of the input of type Input Type Trait to be loaded as one batch. |
| Description::output_length | None. | Length of the output of type Output Type Trait to be stored as one batch. |
| Description::implicit_type_batching | 2 if cufftdx::precision_of<FFT>::type is __half, otherwise 1. | Number of values from different FFTs batched into one element of type Description::value_type. |
| Description::elements_per_thread | size_of<Description>::value | Number of FFT elements to be computed per thread. |
| Description::storage_size | Description::elements_per_thread | Number of Description::value_type elements that each thread must allocate to compute the FFT. |
| Description::stride | Always 1. | Stride between elements of the thread FFT held by each thread in its input. |

Thread traits can be retrieved from descriptors built with Thread Operator.

For example:

#include <cufftdx.hpp>

using FFT          = decltype(cufftdx::Size<8>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                            + cufftdx::Direction<cufftdx::fft_direction::forward>()
                            + cufftdx::Precision<double>() + cufftdx::Thread());


// Retrieve the FFT data type
using complex_type = typename FFT::value_type;

// Retrieve the number of elements per thread
auto elements_per_thread = FFT::elements_per_thread;

Value Type Trait

FFT::value_type

Complex type of the underlying data used for FFT computation.

The default type is cufftdx::detail::complex<float>, as defined in the types.hpp header file.

Input EPT Trait

FFT::input_ept

Maximal count of elements of type Input Type Trait to be provided by a single thread for FFT execution.

The default value is the same as Elements Per Thread Trait.

Output EPT Trait

FFT::output_ept

Maximal count of elements of type Output Type Trait that will be returned from a single thread after FFT execution.

The default value is the same as Elements Per Thread Trait.

Input Type Trait

FFT::input_type

The type of elements to be provided for FFT execution. This is the same as Value Type Trait for C2C and C2R configurations, but in R2C it becomes directly dependent on the used RealFFTOptions Operator values.

The default type is the same as Value Type Trait.

Output Type Trait

FFT::output_type

The type of elements that will be returned from FFT execution. This is the same as Value Type Trait for C2C and R2C configurations, but in C2R it becomes directly dependent on the used RealFFTOptions Operator values.

The default type is the same as Value Type Trait.

Input Length Trait

FFT::input_length

Complete length of a single batch input for this FFT. This value is equivalent to Size Trait for C2C, but dependent on RealFFTOptions Operator real_mode value for R2C and RealFFTOptions Operator complex_layout value for C2R.

The default value is the same as Size Trait.

Output Length Trait

FFT::output_length

Complete length of a single batch output for this FFT. This value is equivalent to Size Trait for C2C, but dependent on RealFFTOptions Operator real_mode value for C2R and RealFFTOptions Operator complex_layout value for R2C.

The default value is the same as Size Trait.

Implicit Type Batching Trait

FFT::implicit_type_batching

Number of values from different FFTs batched into one element of type Description::value_type used in FFT computation. A value greater than one means that the Thread FFT object calculates multiple FFTs at once.

The value is 2 if cufftdx::precision_of<FFT>::type is __half, and 1 otherwise.

Note

In future releases of cuFFTDx, FFT::implicit_type_batching may be replaced and/or extended.
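The effect of implicit batching on buffer sizes can be sketched as follows. This is an illustration only: half_placeholder stands in for CUDA's __half so that the code compiles without CUDA headers, implicit_type_batching is re-declared locally as a hypothetical trait, value_type_elements is a hypothetical helper, and the number of FFTs is assumed to be a multiple of the batching factor.

```cpp
#include <cstddef>

// Hypothetical placeholder for CUDA's __half type.
struct half_placeholder {};

// Local model of the rule in the text: 2 for half precision, 1 otherwise.
template <class Precision>
struct implicit_type_batching {
    static constexpr unsigned value = 1;
};

template <>
struct implicit_type_batching<half_placeholder> {
    static constexpr unsigned value = 2;
};

// Number of value_type elements needed to hold `ffts` FFTs of `size` points,
// given that each value_type element carries one value per batched FFT.
template <class Precision>
constexpr std::size_t value_type_elements(std::size_t size, std::size_t ffts) {
    return size * ffts / implicit_type_batching<Precision>::value;
}

static_assert(value_type_elements<float>(8, 2) == 16,
              "two 8-point float FFTs occupy 16 elements");
static_assert(value_type_elements<half_placeholder>(8, 2) == 8,
              "two 8-point half FFTs share 8 elements");
```

Under these assumptions, two half-precision FFTs take the same number of value_type elements as one single-precision FFT of the same size.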

Elements Per Thread Trait

FFT::elements_per_thread

Logical number of FFT elements that each thread will compute. This may differ from the actual number of elements, since the RealFFTOptions Operator may change the number of input and output elements as well as their logical layout. Refer to Input EPT Trait and Output EPT Trait for input and output memory operations and to Storage Size Trait for allocating register space.

The default value is the same as Size Trait.

Storage Size Trait

FFT::storage_size

Number of Description::value_type elements that each thread must allocate to compute the FFT.

The default value is the same as Input EPT Trait.

Stride Size Trait

FFT::stride

Stride between elements of the FFT held by each thread in input.

For a thread FFT, FFT::stride is always 1.

Block Traits

| Trait | Default value | Description |
|---|---|---|
| Description::value_type | detail::complex<float> | Complex type of the underlying data used to compute the FFT. |
| Description::input_type | Description::value_type | Type of the underlying data used as input for the FFT. |
| Description::output_type | Description::value_type | Type of the underlying data used as output for the FFT. |
| Description::input_ept | None. | Count of elements of type Input Type Trait for each thread to load. |
| Description::output_ept | None. | Count of elements of type Output Type Trait for each thread to store. |
| Description::input_type | None. | Type of elements to be loaded by each thread. Provides the best possible vectorization and correctness. |
| Description::output_type | None. | Type of elements to be stored by each thread. Provides the best possible vectorization and correctness. |
| Description::input_length | None. | Length of the input of type Input Type Trait to be loaded as one batch. |
| Description::output_length | None. | Length of the output of type Output Type Trait to be stored as one batch. |
| Description::workspace_type | Description::workspace_type | Device-side type of workspace required for FFT computation. |
| Description::implicit_type_batching | 2 if cufftdx::precision_of<FFT>::type is __half, otherwise 1. | Number of values from different FFTs batched into one element of type Description::value_type. |
| Description::elements_per_thread | Heuristic. | Number of FFT elements to be computed per thread. |
| Description::storage_size | Determined by Description::elements_per_thread. | Number of Description::value_type elements that each thread must allocate to compute the FFT. |
| Description::stride | Determined by Description::elements_per_thread and the size of the FFT. | Stride between elements of the block FFT held by each thread in its input. |
| Description::ffts_per_block | 1 | Number of FFTs to compute by a CUDA block in this FFT operation. |
| Description::suggested_ffts_per_block | Heuristic. | Suggested number of FFTs to compute by a CUDA block to target maximum performance. |
| Description::shared_memory_size | Determined from Description::ffts_per_block and Description::elements_per_thread. | Size of the shared memory in bytes. |
| Description::block_dim | See Block Dim Trait. | dim3 of the CUDA block to compute the FFT operation. |
| Description::max_threads_per_block | Determined from Description::block_dim. | Total number of threads in the CUDA block. |
| Description::requires_workspace | true if the FFT implementation requires extra workspace; otherwise false. | Determines whether extra workspace must be allocated in global memory using cufftdx::make_workspace(cudaError_t&, cudaStream_t). |
| Description::workspace_size | 0 if Description::requires_workspace is false, otherwise > 0. | Size of global memory required for workspace (in bytes). |

Block traits can be retrieved from descriptors built with Block Operator.

For example:

#include <cufftdx.hpp>

using FFT = decltype( cufftdx::Size<128>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                    + cufftdx::Direction<cufftdx::fft_direction::forward>()
                    + cufftdx::Precision<float>() + cufftdx::Block()
                    + cufftdx::ElementsPerThread<8>() + cufftdx::FFTsPerBlock<2>() );

// Retrieve the FFT data type
using complex_type = typename FFT::value_type;

// Allocate managed memory for input/output
complex_type* data;
auto          size       = FFT::ffts_per_block * cufftdx::size_of<FFT>::value;
auto          size_bytes = size * sizeof(complex_type);

cudaMallocManaged(&data, size_bytes);
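As a worked instance of the size computation in the example above, the following sketch reproduces the arithmetic without CUDA or cuFFTDx. The struct complex_float is a hypothetical stand-in for FFT::value_type at float precision, assumed to hold two 4-byte floats; the constants copy the values passed to Size and FFTsPerBlock.

```cpp
#include <cstddef>

// Hypothetical stand-in for FFT::value_type with float precision.
struct complex_float {
    float x;  // real part
    float y;  // imaginary part
};

constexpr std::size_t ffts_per_block = 2;    // from FFTsPerBlock<2>
constexpr std::size_t fft_size       = 128;  // from Size<128>

// Elements processed by one CUDA block, and the bytes they occupy.
constexpr std::size_t element_count = ffts_per_block * fft_size;
constexpr std::size_t size_bytes    = element_count * sizeof(complex_float);

static_assert(element_count == 256, "2 FFTs of 128 points each");
static_assert(size_bytes == 2048, "256 elements of 8 bytes each");
```

This covers the data of a single block only; a launch with several blocks would need the allocation scaled by the number of blocks.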

Value Type Trait

FFT::value_type

Complex type of the underlying data used for FFT computation.

The default type is cufftdx::detail::complex<float>, as defined in the types.hpp header file.

Input EPT Trait

FFT::input_ept

Maximal count of elements of type Input Type Trait to be provided by a single thread for register API FFT execution.

Output EPT Trait

FFT::output_ept

Maximal count of elements of type Output Type Trait that will be returned by a single thread from register API FFT execution.

Input Type Trait

FFT::input_type

The type of elements to be provided for FFT execution. This is the same as Value Type Trait for C2C and C2R configurations, but in R2C it becomes directly dependent on the used RealFFTOptions Operator values.

Output Type Trait

FFT::output_type

The type of elements that will be returned from FFT execution. This is the same as Value Type Trait for C2C and R2C configurations, but in C2R it becomes directly dependent on the used RealFFTOptions Operator values.

Input Length Trait

FFT::input_length

Complete length of a single batch input for this FFT. This value is equivalent to Size Trait for C2C, but dependent on RealFFTOptions Operator real_mode value for R2C and RealFFTOptions Operator complex_layout value for C2R.

Output Length Trait

FFT::output_length

Complete length of a single batch output for this FFT. This value is equivalent to Size Trait for C2C, but dependent on RealFFTOptions Operator real_mode value for C2R and RealFFTOptions Operator complex_layout value for R2C.

Workspace Type Trait

FFT::workspace_type

Type of the workspace required by the execute(...) function of FFT. Check whether FFT requires a workspace using the Description::requires_workspace trait, and create one with cufftdx::make_workspace<FFT>(cudaError_t&, cudaStream_t).

See Make Workspace Function for more details about workspace.

Warning

An FFT::workspace_type object doesn’t track the lifetime of the underlying memory, and is only valid within the lifetime of the workspace object it was cast from.

Warning

The type returned by cufftdx::make_workspace<FFT>(cudaError_t&, cudaStream_t) can differ between FFT descriptions, and is not the same as FFT::workspace_type. Use auto when creating a workspace object.

Implicit Type Batching Trait

FFT::implicit_type_batching

Number of values from different FFTs batched into one element of type Description::value_type used in FFT computation. A value greater than one means that the Block FFT object calculates multiple FFTs at once.

The value is 2 if cufftdx::precision_of<FFT>::type is __half, and 1 otherwise.

Note

In future releases of cuFFTDx, FFT::implicit_type_batching may be replaced and/or extended.

Elements Per Thread Trait

FFT::elements_per_thread

Logical number of FFT elements that each thread will compute. This may differ from the number of physical elements, since the RealFFTOptions Operator may change the number of input and output elements as well as their logical layout among threads. Refer to Input EPT Trait and Output EPT Trait for input and output memory operations and to Storage Size Trait for allocating register space.

The default value is determined heuristically.

Storage Size Trait

FFT::storage_size

Number of Description::value_type elements that each thread must allocate to compute the FFT.

The default value is the same as Input EPT Trait.

Stride Size Trait

FFT::stride

Stride between elements of the block FFT held by each thread in its input.

See also expected input data format.

Example

The 0-th thread of an 8-point FFT with FFT::stride equal to 2 should hold the input elements with indices 0, 2, 4, and 6.
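The indexing in this example can be sketched as a small helper. The function input_index is hypothetical, and the sketch assumes that the stride equals the FFT size divided by the number of elements per thread, which is consistent with the example (8 points, 4 elements per thread, stride 2) but is not stated as a formula in this section.

```cpp
#include <cstddef>

constexpr std::size_t fft_size            = 8;
constexpr std::size_t elements_per_thread = 4;

// Assumption: stride = size / elements per thread (2 in this example).
constexpr std::size_t stride = fft_size / elements_per_thread;

// Index into the FFT input of the i-th element held by thread `thread_id`,
// for a layout in which consecutive elements of a thread are `s` apart.
constexpr std::size_t input_index(std::size_t thread_id, std::size_t i, std::size_t s) {
    return thread_id + i * s;
}

// Thread 0 holds elements 0, 2, 4 and 6, as in the example above.
static_assert(input_index(0, 0, stride) == 0, "first element of thread 0");
static_assert(input_index(0, 1, stride) == 2, "second element of thread 0");
static_assert(input_index(0, 2, stride) == 4, "third element of thread 0");
static_assert(input_index(0, 3, stride) == 6, "fourth element of thread 0");
```

Under the same assumption, thread 1 would hold elements 1, 3, 5 and 7, so the two threads together cover the whole 8-point input.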

FFTs Per Block Trait

FFT::ffts_per_block

Number of FFTs to compute in parallel within a CUDA block, as part of the collective FFT operation.

The default value is 1.

Suggested FFTs Per Block Trait

FFT::suggested_ffts_per_block

Suggested number of FFTs to compute in parallel within a CUDA block, as part of the collective FFT operation, to maximize performance.

The default value is heuristic, and depends on the size of the FFT, the number of elements per thread, and other parameters.

Shared Memory Size Trait

FFT::shared_memory_size

Size of the required shared memory for the FFT operation to execute, in bytes.

The default value is determined from FFTs Per Block Trait and Elements Per Thread Trait.

Block Dim Trait

FFT::block_dim

Required CUDA block dimensions with

  • x = (size_of<FFT>::value / FFT::elements_per_thread),

  • y = (FFT::ffts_per_block / FFT::implicit_type_batching), and

  • z = 1.
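The block dimension formula above can be checked with a short calculation. In this sketch, dim3_like is a hypothetical stand-in for CUDA's dim3, and block_dim and thread_count are illustrative helpers rather than library functions; the sample values match the block example earlier in this section (Size<128>, ElementsPerThread<8>, FFTsPerBlock<2>, float precision).

```cpp
// Hypothetical stand-in for CUDA's dim3 so the sketch needs no CUDA headers.
struct dim3_like {
    unsigned x;
    unsigned y;
    unsigned z;
};

// Applies the x, y, z formulas listed above.
constexpr dim3_like block_dim(unsigned fft_size, unsigned elements_per_thread,
                              unsigned ffts_per_block, unsigned implicit_type_batching) {
    return dim3_like{fft_size / elements_per_thread,
                     ffts_per_block / implicit_type_batching,
                     1u};
}

// Total number of threads in a block with the given dimensions.
constexpr unsigned thread_count(dim3_like d) {
    return d.x * d.y * d.z;
}

// 128-point FFT, 8 elements per thread, 2 FFTs per block, float precision.
static_assert(block_dim(128, 8, 2, 1).x == 16, "16 threads cooperate on each FFT");
static_assert(block_dim(128, 8, 2, 1).y == 2, "one row of threads per FFT");
static_assert(thread_count(block_dim(128, 8, 2, 1)) == 32, "32 threads in total");
```

With half precision the batching factor is 2, so the same two FFTs would share a single row of threads under this formula.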

Max Threads Per Block Trait

FFT::max_threads_per_block

Maximum number of threads for the FFT in the CUDA block.

The default value is determined from FFTs Per Block Trait and Elements Per Thread Trait.

Requires Workspace Trait

FFT::requires_workspace

Boolean value. If true, a workspace must be created and passed to the FFT::execute(...) method (see block execute methods). Otherwise, it’s not necessary to create and pass a workspace. A workspace can be created using the cufftdx::make_workspace<FFT>(cudaError_t&, cudaStream_t) function. A workspace created for an FFT that does not require one will be empty and won’t allocate any global memory.

Workspace Size Trait

FFT::workspace_size

Amount of global memory, in bytes, that the required workspace will allocate. If Description::requires_workspace is false, it’s 0; otherwise it’s greater than zero.


Other Traits

| Trait | Default value | Description |
|---|---|---|
| is_supported<FFT, Architecture>::value | false | true if FFT is supported on the provided CUDA architecture (Architecture). |

cufftdx::is_supported

// FFT - FFT description without CUDA architecture defined using SM operator
// Architecture - unsigned integer representing CUDA architecture (SM)
template<class FFT, unsigned int Architecture>
struct is_supported : std::bool_constant<...> { };

// Helper variable template
template<class FFT, unsigned int Architecture>
inline constexpr bool is_supported_v<FFT, Architecture> = is_supported<FFT, Architecture>::value;

// true if FFT is supported on the provided CUDA architecture
cufftdx::is_supported<FFT, Architecture>::value;

cufftdx::is_supported checks whether an FFT is supported on the CUDA architecture given by Architecture.

Requirements:

  • FFT must have a defined size and a defined direction (unless the direction can be deduced from the type). See the Description Operators section.

  • FFT must include a Thread or Block operator.

  • FFT must not define a target CUDA architecture via the SM operator.

  • If FFT description includes ElementsPerThread operator, cufftdx::is_supported takes that into account when verifying support.

Example

// Assumes `using namespace cufftdx;`. Each description gets its own alias,
// since one alias cannot name different types in the same scope.
using FFT1 = decltype(Size<32768>() + Type<fft_type::c2c>() + Direction<fft_direction::inverse>() + Block() + Precision<float>());
cufftdx::is_supported<FFT1, 800>::value; // true
cufftdx::is_supported<FFT1, 700>::value; // false

using FFT2 = decltype(Size<8192>() + Type<fft_type::c2c>() + Direction<fft_direction::forward>() + Block() + Precision<double>());
cufftdx::is_supported<FFT2, 800>::value; // true
cufftdx::is_supported<FFT2, 750>::value; // false
cufftdx::is_supported<FFT2, 700>::value; // true

using FFT3 = decltype(Size<4095>() + Type<fft_type::c2c>() + Direction<fft_direction::inverse>() + Block() + Precision<float>());
cufftdx::is_supported_v<FFT3, 800>; // true
cufftdx::is_supported_v<FFT3, 750>; // false
cufftdx::is_supported_v<FFT3, 700>; // true