Traits¶

Traits provide the user with information about the FFT description constructed using Operators. They are divided into three categories:

Description Traits,
Execution Traits, and
Other Traits.

Description Traits¶

Trait	Default value	Description
`size_of<Description>::value`	None.	Size of the FFT to compute.
`type_of<Description>::value`	`fft_type::c2c`	Type of the FFT operation, either `fft_type::c2c`, `fft_type::r2c` or `fft_type::c2r`.
`direction_of<Description>::value`	See Direction Trait.	Direction of the FFT operation, either `fft_direction::inverse` or `fft_direction::forward`.
`precision_of<Description>::type`	`float`	Type of the underlying floating-point values used to compute the FFT: `double`, `float` or `__half`.
`is_fft<Description>::value`	None.	`true` if `Description` is an FFT description, formed with Description Operators.
`is_fft_execution<Description>::value`	None.	`true` if `Description` is an FFT description, configured with Execution Operators.
`is_complete_fft<Description>::value`	None.	`true` if `Description` is a valid FFT description, formed with Description Operators.
`is_complete_fft_execution<Description>::value`	None.	`true` if `is_complete_fft<Description>::value` is `true` and `is_fft_execution<Description>::value` is `true`.

Description traits can be retrieved from an FFT descriptor using the trait templates listed above. For example:

#include <iostream>
#include <cufftdx.hpp>

using FFT = decltype( cufftdx::Size<8>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                      + cufftdx::Direction<cufftdx::fft_direction::forward>()
                      + cufftdx::Precision<double>() + cufftdx::Thread() );

int main() {
  // Query the size only if the description is complete
  if(cufftdx::is_complete_fft<FFT>::value)
    std::cout << "Size of the FFT operation: " << cufftdx::size_of<FFT>::value << std::endl;
}

Size Trait¶

cufftdx::size_of<FFT>::value

Size of the FFT to compute, as set by Size Operator.

There is no default size. If the descriptor was not created using a Size Operator, compilation will fail with an error message.

Type Trait¶

cufftdx::type_of<FFT>::value

Type of the FFT operation, as set by Type Operator.

The default type is complex-to-complex, fft_type::c2c.

Direction Trait¶

cufftdx::direction_of<FFT>::value

Direction of the FFT operation, as set by Direction Operator.

Default direction:

If the FFT type is fft_type::r2c, the default direction is fft_direction::forward.

If the FFT type is fft_type::c2r, the default direction is fft_direction::inverse.

For any other type, there is no default direction. If the descriptor was not created using a Direction Operator, compilation will fail with an error message.
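The deduction rule above can be modelled in a few lines of standard C++. This is a minimal sketch that does not use cuFFTDx itself: the `sketch` namespace, both enums, and the two helper functions are hypothetical stand-ins introduced here for illustration only.

```cpp
namespace sketch {

// Hypothetical stand-ins for cufftdx::fft_type and cufftdx::fft_direction.
enum class fft_type { c2c, r2c, c2r };
enum class fft_direction { forward, inverse };

// True when the direction can be deduced from the FFT type alone.
constexpr bool has_default_direction(fft_type type) {
    return type == fft_type::r2c || type == fft_type::c2r;
}

// Default direction for r2c and c2r. For c2c there is no default, so the
// result is only meaningful when has_default_direction(type) is true.
constexpr fft_direction default_direction(fft_type type) {
    return type == fft_type::c2r ? fft_direction::inverse : fft_direction::forward;
}

} // namespace sketch

static_assert(sketch::default_direction(sketch::fft_type::r2c) == sketch::fft_direction::forward,
              "r2c defaults to forward");
static_assert(!sketch::has_default_direction(sketch::fft_type::c2c),
              "c2c has no default direction");
```

In the real library the missing-direction case is a compile-time error rather than a boolean; the predicate here only mirrors the condition under which that error would occur.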

Precision Trait¶

cufftdx::precision_of<FFT>::type

Floating-point precision of the FFT operation, as set by Precision Operator.

The default precision is float.

Is FFT? Trait¶

cufftdx::is_fft<FFT>::value

Trait is true if the descriptor is an FFT description, formed with Description Operators.

There is no default value. The descriptor either is or is not an FFT description.

Is FFT Execution? Trait¶

cufftdx::is_fft_execution<FFT>::value

Trait is true if the descriptor is an FFT description, formed with Description Operators and an Execution Operator.

There is no default value. The descriptor either is or is not an FFT description that includes an Execution Operator.

Is FFT-complete? Trait¶

cufftdx::is_complete_fft<FFT>::value

Trait is true if the descriptor is a complete FFT description, formed with Description Operators.

Note

Complete in this context means that the descriptor has been formed with all the necessary Description Operators and is only missing an Execution Operator to be able to run.

For an FFT descriptor to be complete, the following is required:

One, and only one, Size Operator.
One, and only one, Direction Operator unless either cufftdx::Type<fft_type::r2c>() or cufftdx::Type<fft_type::c2r>() is added.
One, and only one, SM Operator unless a Thread Operator is added.

There is no default value. The descriptor either is or is not an FFT-complete description.
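The three requirements above can be read as a single boolean condition. The sketch below is one possible reading, written in plain C++ without cuFFTDx: the `operator_counts` struct and the `is_complete` function are hypothetical, and the treatment of redundant operators (for example, a Direction Operator added twice) is an assumption rather than documented behavior.

```cpp
namespace sketch {

// Hypothetical stand-in for cufftdx::fft_type.
enum class fft_type { c2c, r2c, c2r };

// Hypothetical summary of the operators that a description contains.
struct operator_counts {
    int      size;        // number of Size Operators
    int      direction;   // number of Direction Operators
    int      sm;          // number of SM Operators
    bool     has_thread;  // true if a Thread Operator was added
    fft_type type;        // FFT type (c2c when no Type Operator was added)
};

// Mirrors the list above: exactly one Size Operator, exactly one Direction
// Operator unless the type is r2c or c2r, exactly one SM Operator unless a
// Thread Operator is present.
constexpr bool is_complete(operator_counts d) {
    return d.size == 1
        && (d.direction == 1
            || (d.direction == 0 && (d.type == fft_type::r2c || d.type == fft_type::c2r)))
        && (d.sm == 1 || (d.sm == 0 && d.has_thread));
}

} // namespace sketch

// A thread c2c FFT with size and direction is complete without an SM Operator.
static_assert(sketch::is_complete(sketch::operator_counts{1, 1, 0, true, sketch::fft_type::c2c}),
              "thread FFT needs no SM Operator");
```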

Is FFT-complete Execution? Trait¶

cufftdx::is_complete_fft_execution<FFT>::value

Trait is true if both cufftdx::is_fft_execution and cufftdx::is_complete_fft are true.

Note

If cufftdx::is_complete_fft_execution trait is true for a descriptor FFT, then we can use the Execution Methods to compute the FFT.

There is no default value.
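As a rough illustration of how the combined trait relates to its two parts, the sketch below builds the conjunction from hypothetical stand-in traits. None of these definitions come from cuFFTDx: the flag members `has_execution_operator` and `has_required_operators` and the two description types are invented so that the example compiles on its own.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical stand-ins for cufftdx::is_fft_execution and cufftdx::is_complete_fft.
// Each reads a flag from the description type instead of inspecting real operators.
template<class Description>
struct is_fft_execution : std::integral_constant<bool, Description::has_execution_operator> {};

template<class Description>
struct is_complete_fft : std::integral_constant<bool, Description::has_required_operators> {};

// The combined trait is the logical AND of the two traits above.
template<class Description>
struct is_complete_fft_execution
    : std::integral_constant<bool,
                             is_fft_execution<Description>::value
                                 && is_complete_fft<Description>::value> {};

// Two hypothetical descriptions: one ready to execute, one without an execution operator.
struct ready_fft {
    static constexpr bool has_execution_operator = true;
    static constexpr bool has_required_operators = true;
};
struct description_only_fft {
    static constexpr bool has_execution_operator = false;
    static constexpr bool has_required_operators = true;
};

} // namespace sketch

static_assert(sketch::is_complete_fft_execution<sketch::ready_fft>::value,
              "both conditions hold");
static_assert(!sketch::is_complete_fft_execution<sketch::description_only_fft>::value,
              "execution operator is missing");
```

A `static_assert` on the real trait can be used in the same way to reject descriptors that cannot be executed before any kernel code is instantiated.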

Execution Traits¶

Execution traits can be retrieved directly from an FFT descriptor that has been configured with Execution Operators. The available execution traits depend on the operator used to build the descriptor: either a Thread Operator or a Block Operator.

Thread Traits¶

Trait	Default value	Description
`Description::value_type`	`detail::complex<float>`	Complex type of the underlying data used to compute the FFT.
`Description::input_type`	`Description::value_type`	Type of the elements to be loaded as input for the FFT. Provides the best possible vectorization and correctness.
`Description::output_type`	`Description::value_type`	Type of the elements to be stored as output of the FFT. Provides the best possible vectorization and correctness.
`Description::input_ept`	Same as Elements Per Thread Trait.	Count of elements of type Input Type Trait to be loaded for this FFT.
`Description::output_ept`	Same as Elements Per Thread Trait.	Count of elements of type Output Type Trait to be stored after this FFT.
`Description::input_length`	None.	Length of the input of type Input Type Trait to be loaded as one batch.
`Description::output_length`	None.	Length of the output of type Output Type Trait to be stored as one batch.
`Description::implicit_type_batching`	`2` if `cufftdx::precision_of<FFT>::type` is `__half`, otherwise - `1`	Number of values from different FFTs batched into one element of type `Description::value_type`.
`Description::elements_per_thread`	`size_of<Description>::value`	Number of FFT elements to be computed per thread.
`Description::storage_size`	`Description::elements_per_thread`	Number of `Description::value_type` elements that each thread must allocate to compute the FFT.
`Description::stride`	Always `1`	Stride between elements of the thread FFT held by each thread in its `input`.

Thread traits can be retrieved from descriptors built with Thread Operator.

For example:

#include <cufftdx.hpp>

using FFT          = decltype(cufftdx::Size<8>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                            + cufftdx::Direction<cufftdx::fft_direction::forward>()
                            + cufftdx::Precision<double>() + cufftdx::Thread());

// Retrieve the FFT data type
using complex_type = typename FFT::value_type;

// Retrieve the number of elements per thread
constexpr auto elements_per_thread = FFT::elements_per_thread;

Value Type Trait¶

FFT::value_type

Complex type of the underlying data used for FFT computation.

The default type is cufftdx::detail::complex<float>, as defined in the types.hpp header file.

Input EPT Trait¶

FFT::input_ept

Maximal count of elements of type Input Type Trait to be provided by a single thread for FFT execution.

The default value is the same as Elements Per Thread Trait.

Output EPT Trait¶

FFT::output_ept

Maximal count of elements of type Output Type Trait that will be returned from a single thread after FFT execution.

The default value is the same as Elements Per Thread Trait.

Input Type Trait¶

FFT::input_type

The type of elements to be provided for FFT execution. This is the same as Value Type Trait for C2C and C2R configurations, but for R2C it depends on the RealFFTOptions Operator values used.

The default type is the same as Value Type Trait.

Output Type Trait¶

FFT::output_type

The type of elements that will be returned from FFT execution. This is the same as Value Type Trait for C2C and R2C configurations, but for C2R it depends on the RealFFTOptions Operator values used.

The default type is the same as Value Type Trait.

Input Length Trait¶

FFT::input_length

Complete length of a single batch input for this FFT. This value is equivalent to Size Trait for C2C, but dependent on RealFFTOptions Operator real_mode value for R2C and RealFFTOptions Operator complex_layout value for C2R.

The default value is the same as Size Trait.

Output Length Trait¶

FFT::output_length

Complete length of a single batch output for this FFT. This value is equivalent to Size Trait for C2C, but dependent on RealFFTOptions Operator real_mode value for C2R and RealFFTOptions Operator complex_layout value for R2C.

The default value is the same as Size Trait.

Implicit Type Batching Trait¶

FFT::implicit_type_batching

Number of values from different FFTs batched into one element of type Description::value_type used in FFT computation. If the value is greater than one, the thread FFT object computes multiple FFTs at once.

The value is 2 if cufftdx::precision_of<FFT>::type is __half, and 1 otherwise.
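The rule can be expressed as a small type trait. The sketch below is standalone C++ and only models the stated rule: `half_t` is a hypothetical placeholder for `__half` (so that no CUDA headers are needed), and `sketch::implicit_type_batching` is not the cuFFTDx member of the same name.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical placeholder for the CUDA __half type.
struct half_t {};

// 1 for every precision type by default...
template<class Precision>
struct implicit_type_batching : std::integral_constant<unsigned int, 1> {};

// ...and 2 for half precision.
template<>
struct implicit_type_batching<half_t> : std::integral_constant<unsigned int, 2> {};

} // namespace sketch

static_assert(sketch::implicit_type_batching<float>::value == 1, "float: one FFT per element");
static_assert(sketch::implicit_type_batching<sketch::half_t>::value == 2, "half: two FFTs per element");
```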

Note

Please note that in future releases of cuFFTDx FFT::implicit_type_batching may be replaced, and/or extended.

Elements Per Thread Trait¶

FFT::elements_per_thread

Logical number of FFT elements that each thread will compute. This may differ from the actual number of elements, since the RealFFTOptions Operator may change the number of input or output elements as well as their logical layout. Refer to Input EPT Trait and Output EPT Trait for input and output memory operations, and to Storage Size Trait for allocating register space.

The default value is the same as Size Trait.

Storage Size Trait¶

FFT::storage_size

Number of Description::value_type elements that each thread must allocate to compute the FFT.

The default value is the same as Input EPT Trait.

Stride Size Trait¶

FFT::stride

Stride between elements of the FFT held by each thread in input.

For a thread FFT, FFT::stride is always 1.

Block Traits¶

Trait	Default value	Description
`Description::value_type`	`detail::complex<float>`	Complex type of the underlying data used to compute the FFT.
`Description::input_type`	`Description::value_type`	Type of the elements to be loaded by each thread as input for the FFT. Provides the best possible vectorization and correctness.
`Description::output_type`	`Description::value_type`	Type of the elements to be stored by each thread as output of the FFT. Provides the best possible vectorization and correctness.
`Description::input_ept`	None.	Count of elements of type Input Type Trait for each thread to load.
`Description::output_ept`	None.	Count of elements of type Output Type Trait for each thread to store.
`Description::input_length`	None.	Length of the input of type Input Type Trait to be loaded as one batch.
`Description::output_length`	None.	Length of the output of type Output Type Trait to be stored as one batch.
`Description::workspace_type`	`Description::workspace_type`	Device-side type of workspace required for FFT computation.
`Description::implicit_type_batching`	`2` if `cufftdx::precision_of<FFT>::type` is `__half`, otherwise - `1`	Number of values from different FFTs batched into one element of type `Description::value_type`.
`Description::elements_per_thread`	Heuristic.	Number of FFT elements to be computed per thread.
`Description::storage_size`	Determined by `Description::elements_per_thread`	Number of `Description::value_type` elements that each thread must allocate to compute the FFT.
`Description::stride`	Determined by `Description::elements_per_thread` and size of the FFT	Stride between elements of the block FFT held by each thread in its `input`.
`Description::ffts_per_block`	`1`	Number of FFTs to compute by a CUDA block in this FFT operation.
`Description::suggested_ffts_per_block`	Heuristic.	Suggested number of FFTs to compute by a CUDA block to target maximum performance.
`Description::shared_memory_size`	Determined from `Description::ffts_per_block` and `Description::elements_per_thread`	Size of the shared memory in bytes.
`Description::block_dim`	See Block Dim Trait.	`dim3` of the CUDA block to compute the FFT operation.
`Description::max_threads_per_block`	Determined from `Description::block_dim`	Total number of threads in the CUDA block.
`Description::requires_workspace`	`true` if FFT implementation requires extra workspace; otherwise `false`.	Determines whether extra workspace must be allocated in global memory using `cufftdx::make_workspace<FFT>(cudaError_t&, cudaStream_t)`.
`Description::workspace_size`	`0` if `Description::requires_workspace` is false, otherwise `> 0`.	Size of global memory required for workspace (in bytes).

Block traits can be retrieved from descriptors built with Block Operator.

For example:

#include <cufftdx.hpp>

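using FFT = decltype( cufftdx::Size<128>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                    + cufftdx::Direction<cufftdx::fft_direction::forward>()
                    + cufftdx::Precision<float>() + cufftdx::Block()
                    + cufftdx::ElementsPerThread<8>() + cufftdx::FFTsPerBlock<2>() );
// Note: an SM Operator must also be added before this block FFT can be executed
// (see Is FFT-complete? Trait).

// Retrieve the FFT data type
using complex_type = typename FFT::value_type;

int main() {
  // Allocate managed memory for input/output: one CUDA block processes
  // FFT::ffts_per_block transforms of size_of<FFT>::value elements each
  complex_type*  data       = nullptr;
  constexpr auto size       = FFT::ffts_per_block * cufftdx::size_of<FFT>::value;
  constexpr auto size_bytes = size * sizeof(complex_type);

  if (cudaMallocManaged(&data, size_bytes) != cudaSuccess)
    return 1;

  cudaFree(data);
  return 0;
}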

Value Type Trait¶

FFT::value_type

Complex type of the underlying data used for FFT computation.

The default type is cufftdx::detail::complex<float>, as defined in the types.hpp header file.

Input EPT Trait¶

FFT::input_ept

Maximal count of elements of type Input Type Trait to be provided by a single thread for register API FFT execution.

Output EPT Trait¶

FFT::output_ept

Maximal count of elements of type Output Type Trait that will be returned by a single thread from register API FFT execution.

Input Type Trait¶

FFT::input_type

The type of elements to be provided for FFT execution. This is the same as Value Type Trait for C2C and C2R configurations, but for R2C it depends on the RealFFTOptions Operator values used.

Output Type Trait¶

FFT::output_type

The type of elements that will be returned from FFT execution. This is the same as Value Type Trait for C2C and R2C configurations, but for C2R it depends on the RealFFTOptions Operator values used.

Input Length Trait¶

FFT::input_length

Complete length of a single batch input for this FFT. This value is equivalent to Size Trait for C2C, but dependent on RealFFTOptions Operator real_mode value for R2C and RealFFTOptions Operator complex_layout value for C2R.

Output Length Trait¶

FFT::output_length

Complete length of a single batch output for this FFT. This value is equivalent to Size Trait for C2C, but dependent on RealFFTOptions Operator real_mode value for C2R and RealFFTOptions Operator complex_layout value for R2C.

Workspace Type Trait¶

FFT::workspace_type

Type of the workspace required by the execute(...) function of FFT. Users should check whether the FFT requires a workspace using the Description::requires_workspace trait, and create one with cufftdx::make_workspace<FFT>(cudaError_t&, cudaStream_t).

See Make Workspace Function for more details about workspace.

Warning

An FFT::workspace_type object doesn’t track the lifetime of the underlying memory, and is only valid within the lifetime of the workspace object it was cast from.

Warning

The type returned by cufftdx::make_workspace<FFT>(cudaError_t&, cudaStream_t) can be different for different FFT descriptions, and is not the same as FFT::workspace_type. Users should use auto when creating a workspace object.

Implicit Type Batching Trait¶

FFT::implicit_type_batching

Number of values from different FFTs batched into one element of type Description::value_type used in FFT computation. If the value is greater than one, the block FFT object computes multiple FFTs at once.

The value is 2 if cufftdx::precision_of<FFT>::type is __half, and 1 otherwise.

Note

Please note that in future releases of cuFFTDx FFT::implicit_type_batching may be replaced, and/or extended.

Elements Per Thread Trait¶

FFT::elements_per_thread

Logical number of FFT elements that each thread will compute. This may differ from the number of physical elements, since the RealFFTOptions Operator may change the number of input or output elements as well as their logical layout among threads. Refer to Input EPT Trait and Output EPT Trait for input and output memory operations, and to Storage Size Trait for allocating register space.

The default value is determined heuristically, as noted in the Block Traits table.

Storage Size Trait¶

FFT::storage_size

Number of Description::value_type elements that each thread must allocate to compute the FFT.

The default value is the same as Input EPT Trait.

Stride Size Trait¶

FFT::stride

Stride between elements of the block FFT held by each thread in its input.

FFTs Per Block Trait¶

FFT::ffts_per_block

Number of FFTs to compute in parallel within a CUDA block, as part of the collective FFT operation.

The default value is 1.

Suggested FFTs Per Block Trait¶

FFT::suggested_ffts_per_block

Suggested number of FFTs to compute in parallel within a CUDA block, as part of the collective FFT operation, to maximize performance.

The default value is heuristic, and depends on the size of the FFT, the number of elements per thread, and other parameters.

Shared Memory Size Trait¶

FFT::shared_memory_size

Size of the required shared memory for the FFT operation to execute, in bytes.

The default value is determined from FFTs Per Block Trait and Elements Per Thread Trait.

Block Dim Trait¶

FFT::block_dim

Required CUDA block dimensions with

x = (size_of<FFT>::value / FFT::elements_per_thread),
y = (FFT::ffts_per_block / FFT::implicit_type_batching), and
z = 1.
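The formulas above are plain integer arithmetic, which the following standalone sketch reproduces. The `dim3_like` struct is a hypothetical stand-in for CUDA's `dim3`, and `block_dim` and `thread_count` are illustrative helpers, not cuFFTDx functions. Treating the total thread count as `x * y * z` is an assumption based on the table entry for `Description::max_threads_per_block`.

```cpp
namespace sketch {

// Hypothetical stand-in for CUDA's dim3.
struct dim3_like {
    unsigned int x, y, z;
};

// Block dimensions following the formulas above:
//   x = size / elements_per_thread
//   y = ffts_per_block / implicit_type_batching
//   z = 1
constexpr dim3_like block_dim(unsigned int fft_size,
                              unsigned int elements_per_thread,
                              unsigned int ffts_per_block,
                              unsigned int implicit_type_batching) {
    return dim3_like{fft_size / elements_per_thread,
                     ffts_per_block / implicit_type_batching,
                     1u};
}

// Total number of threads in a block with the given dimensions.
constexpr unsigned int thread_count(dim3_like dim) {
    return dim.x * dim.y * dim.z;
}

} // namespace sketch

// Size<128>, ElementsPerThread<8>, FFTsPerBlock<2>, float precision (batching of 1):
static_assert(sketch::block_dim(128, 8, 2, 1).x == 16, "128 / 8 threads along x");
static_assert(sketch::block_dim(128, 8, 2, 1).y == 2, "2 / 1 FFTs along y");
static_assert(sketch::thread_count(sketch::block_dim(128, 8, 2, 1)) == 32, "16 * 2 * 1 threads");
```

With half precision the implicit type batching is 2, so the same description would give `y = 1`, because each thread row then processes two FFTs.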

Max Threads Per Block Trait¶

FFT::max_threads_per_block

Maximum number of threads for the FFT in the CUDA block.

The default value is determined from FFTs Per Block Trait and Elements Per Thread Trait.

Requires Workspace Trait¶

FFT::requires_workspace

Boolean value. If true, a workspace must be created and passed to the FFT::execute(...) method (see block execute methods); otherwise, no workspace is needed. A workspace can be created with the cufftdx::make_workspace<FFT>(cudaError_t&, cudaStream_t) function. A workspace created for an FFT that does not require one is empty and does not allocate any global memory.

Workspace Size Trait¶

FFT::workspace_size

Size, in bytes, of the global memory allocated by the required workspace. If Description::requires_workspace is false, the value is 0; otherwise, it is greater than zero.

Other Traits¶

Trait	Default value	Description
`is_supported<FFT, Architecture>::value`	`false`	`true` if `FFT` is supported on the provided CUDA architecture (`Architecture`).

cufftdx::is_supported¶

// FFT - FFT description without CUDA architecture defined using SM operator
// Architecture - unsigned integer representing CUDA architecture (SM)
template<class FFT, unsigned int Architecture>
struct is_supported : std::bool_constant<...> { };

// Helper variable template
template<class FFT, unsigned int Architecture>
inline constexpr bool is_supported_v = is_supported<FFT, Architecture>::value;

// true if FFT is supported on the provided CUDA architecture
cufftdx::is_supported<FFT, Architecture>::value;

cufftdx::is_supported checks whether an FFT is supported on the CUDA architecture given by Architecture.

Requirements:

FFT must have a defined size and a defined direction (unless the direction can be deduced from the type). See Description Operators section.

FFT must include a Thread or Block operator.

FFT must not define a target CUDA architecture via the SM operator.

If FFT description includes ElementsPerThread operator, cufftdx::is_supported takes that into account when verifying support.

Example

using namespace cufftdx;

using FFT1 = decltype(Size<32768>() + Type<fft_type::c2c>() + Direction<fft_direction::inverse>() + Block() + Precision<float>());
cufftdx::is_supported<FFT1, 800>::value; // true
cufftdx::is_supported<FFT1, 700>::value; // false

using FFT2 = decltype(Size<8192>() + Type<fft_type::c2c>() + Direction<fft_direction::forward>() + Block() + Precision<double>());
cufftdx::is_supported<FFT2, 800>::value; // true
cufftdx::is_supported<FFT2, 750>::value; // false
cufftdx::is_supported<FFT2, 700>::value; // true

using FFT3 = decltype(Size<4095>() + Type<fft_type::c2c>() + Direction<fft_direction::inverse>() + Block() + Precision<float>());
cufftdx::is_supported_v<FFT3, 800>; // true
cufftdx::is_supported_v<FFT3, 750>; // false
cufftdx::is_supported_v<FFT3, 700>; // true
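Because the trait is a compile-time constant, it can drive compile-time selection. The sketch below shows one possible pattern: picking the first architecture from a candidate list on which a description is supported. It is standalone C++ and does not use cuFFTDx. `sketch::is_supported` is a hypothetical stand-in with the same shape as `cufftdx::is_supported`, `example_fft` is an invented description type, and the single architecture marked as supported is made up for the example.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical stand-in with the same shape as cufftdx::is_supported.
// It defaults to false and is switched on for one architecture below.
template<class FFT, unsigned int Architecture>
struct is_supported : std::false_type {};

// Hypothetical description type, marked as supported on architecture 800 only.
struct example_fft {};

template<>
struct is_supported<example_fft, 800> : std::true_type {};

// Base case: no architectures left to try, so report 0.
template<class FFT>
constexpr unsigned int first_supported() {
    return 0;
}

// Returns the first architecture in the list for which is_supported is true,
// or 0 if none of them is supported.
template<class FFT, unsigned int Architecture, unsigned int... Rest>
constexpr unsigned int first_supported() {
    return is_supported<FFT, Architecture>::value ? Architecture
                                                   : first_supported<FFT, Rest...>();
}

} // namespace sketch

static_assert(sketch::first_supported<sketch::example_fft, 700, 750, 800>() == 800,
              "800 is the first supported candidate");
static_assert(sketch::first_supported<sketch::example_fft, 700, 750>() == 0,
              "no candidate is supported");
```

With the real trait, the same pattern could be used to reject unsupported configurations through `static_assert` before an SM operator is added to the description.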