Traits¶

Traits provide the user with information about the FFT description constructed using Operators. They are divided into Description Traits and Execution Traits.

Description Traits¶

Trait	Default value	Description
`size_of<Description>::value`	None.	Size of the FFT to compute.
`type_of<Description>::value`	`fft_type::c2c`	Type of the FFT operation, either `fft_type::c2c`, `fft_type::r2c` or `fft_type::c2r`.
`direction_of<Description>::value`	See Direction Trait.	Direction of the FFT operation, either `fft_direction::inverse` or `fft_direction::forward`.
`precision_of<Description>::type`	`float`	Type of the underlying floating-point values used to compute the FFT: `double`, `float` or `__half`.
`is_fft<Description>::value`	None.	`true` if `Description` is an FFT description, formed with Description Operators.
`is_fft_execution<Description>::value`	None.	`true` if `Description` is an FFT description, configured with Execution Operators.
`is_complete_fft<Description>::value`	None.	`true` if `Description` is a complete FFT description, formed with Description Operators.
`is_complete_fft_execution<Description>::value`	None.	`true` if both `is_complete_fft<Description>::value` and `is_fft_execution<Description>::value` are `true`.

Description traits can be retrieved from an FFT descriptor using the trait templates listed above. For example:

#include <iostream>
#include <cufftdx.hpp>

using FFT = decltype( cufftdx::Size<8>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                      + cufftdx::Direction<cufftdx::fft_direction::forward>()
                      + cufftdx::Precision<double>() + cufftdx::Thread() );

int main() {
  // Query the size only if the description is complete
  if(cufftdx::is_complete_fft<FFT>::value)
    std::cout << "Size of the FFT operation: " << cufftdx::size_of<FFT>::value << std::endl;
}
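
To make the difference between a trait with no default (such as the size) and a trait with a default (such as the precision) more concrete, here is a minimal, self-contained sketch of how such traits could be written. It is not the cuFFTDx implementation: the `sketch` namespace, `operator_tag`, and the `Description` list type are hypothetical names introduced only for illustration, and only three operators are modelled.

```cpp
#include <type_traits>

namespace sketch {  // hypothetical model, not cuFFTDx itself

struct operator_tag {};                                 // marks operator types
template <unsigned N> struct Size      : operator_tag {};
template <class T>    struct Precision : operator_tag {};
struct Thread : operator_tag {};

template <class... Ops> struct Description {};          // ordered list of operators

// operator + operator starts a description
template <class A, class B,
          class = typename std::enable_if<std::is_base_of<operator_tag, A>::value &&
                                          std::is_base_of<operator_tag, B>::value>::type>
constexpr Description<A, B> operator+(A, B) { return {}; }

// description + operator appends to the list
template <class... Ops, class B,
          class = typename std::enable_if<std::is_base_of<operator_tag, B>::value>::type>
constexpr Description<Ops..., B> operator+(Description<Ops...>, B) { return {}; }

// size_of: the primary template is left undefined, so a description without
// a Size operator fails to compile (this models "no default").
template <class D> struct size_of;
template <unsigned N, class... Rest>
struct size_of<Description<Size<N>, Rest...>> : std::integral_constant<unsigned, N> {};
template <class First, class... Rest>
struct size_of<Description<First, Rest...>> : size_of<Description<Rest...>> {};

// precision_of: the primary template supplies the default, float.
template <class D> struct precision_of { using type = float; };
template <class T, class... Rest>
struct precision_of<Description<Precision<T>, Rest...>> { using type = T; };
template <class First, class... Rest>
struct precision_of<Description<First, Rest...>> : precision_of<Description<Rest...>> {};

}  // namespace sketch

// Usage: build a description type with decltype, then query it at compile time.
using SketchFFT = decltype(sketch::Size<8>() + sketch::Precision<double>() + sketch::Thread());
static_assert(sketch::size_of<SketchFFT>::value == 8, "size comes from Size<8>");
static_assert(std::is_same<sketch::precision_of<SketchFFT>::type, double>::value,
              "precision comes from Precision<double>");
```

In this model, querying `sketch::size_of` on a description without a `Size` operator is a compile-time error, while `sketch::precision_of` falls back to `float` when no `Precision` operator is present.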

Size Trait¶

cufftdx::size_of<FFT>::value

Size of the FFT to compute, as set by Size Operator.

There is no default size. If the descriptor was not created using a Size Operator, compilation will fail with an error message.

Type Trait¶

cufftdx::type_of<FFT>::value

Type of the FFT operation, as set by Type Operator.

The default type is complex-to-complex, fft_type::c2c.

Direction Trait¶

cufftdx::direction_of<FFT>::value

Direction of the FFT operation, as set by Direction Operator.

Default direction:

If the FFT type is fft_type::r2c, the default direction is fft_direction::forward.

If the FFT type is fft_type::c2r, the default direction is fft_direction::inverse.

For any other type, there is no default direction. If the descriptor was not created using a Direction Operator, compilation will fail with an error message.
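
The default rule above can be summarised as a small compile-time function. This is only a sketch of the rule as stated here, not library code: the enums are local stand-ins for `cufftdx::fft_type` and `cufftdx::fft_direction`, and `direction_request` and `resolve_direction` are hypothetical names. Whether an explicit direction is a valid combination with a given type is not checked.

```cpp
// Local stand-ins for the cuFFTDx enums (hypothetical, for illustration only).
enum class fft_type { c2c, r2c, c2r };
enum class fft_direction { forward, inverse };

// What the description asked for: nothing, or an explicit direction.
enum class direction_request { unset, forward, inverse };

// An explicit request is returned as is. Otherwise r2c defaults to forward and
// c2r defaults to inverse. For c2c with no request there is no default: the
// throw makes any compile-time evaluation of that case ill-formed.
constexpr fft_direction resolve_direction(fft_type type, direction_request requested) {
    return requested == direction_request::forward ? fft_direction::forward
         : requested == direction_request::inverse ? fft_direction::inverse
         : type == fft_type::r2c                   ? fft_direction::forward
         : type == fft_type::c2r                   ? fft_direction::inverse
         : throw "no default direction for fft_type::c2c";
}

static_assert(resolve_direction(fft_type::r2c, direction_request::unset) == fft_direction::forward,
              "r2c defaults to forward");
static_assert(resolve_direction(fft_type::c2r, direction_request::unset) == fft_direction::inverse,
              "c2r defaults to inverse");
```

Using `resolve_direction(fft_type::c2c, direction_request::unset)` in a `static_assert` would not compile, which mirrors the compile-time error described above.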

Precision Trait¶

cufftdx::precision_of<FFT>::type

Floating-point precision of the FFT operation, as set by Precision Operator.

The default precision is float.

Is FFT? Trait¶

cufftdx::is_fft<FFT>::value

Trait is true if the descriptor is an FFT description, formed with Description Operators.

There is no default value. The descriptor either is or is not an FFT description.

Is FFT Execution? Trait¶

cufftdx::is_fft_execution<FFT>::value

Trait is true if the descriptor is an FFT description, formed with Description Operators and an Execution Operator.

There is no default value. The descriptor either is or is not an FFT description that includes an Execution Operator.

Is FFT-complete? Trait¶

cufftdx::is_complete_fft<FFT>::value

Trait is true if the descriptor is a complete FFT description, formed with Description Operators.

Note

Complete in this context means that the descriptor has been formed with all the necessary Description Operators and is only missing an Execution Operator to be able to run.

For an FFT descriptor to be complete, the following is required:

One, and only one, Size Operator.
One, and only one, Direction Operator unless either cufftdx::Type<fft_type::r2c>() or cufftdx::Type<fft_type::c2r>() is added.
One, and only one, SM Operator unless a Thread Operator is added.

There is no default value. The descriptor either is or is not an FFT-complete description.
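
As a way to read the three requirements together, the sketch below encodes them as a compile-time predicate over a plain summary of which operators were added. It is a model of the rules as listed here, not the library's implementation: `operator_counts` and both functions are hypothetical, and "unless" is interpreted as making the operator optional (zero or one occurrence).

```cpp
// Hypothetical summary of the operators present in a description.
struct operator_counts {
    int  size_ops;       // number of Size Operators
    int  direction_ops;  // number of Direction Operators
    int  sm_ops;         // number of SM Operators
    bool real_type;      // Type<fft_type::r2c> or Type<fft_type::c2r> was added
    bool thread;         // a Thread Operator was added
    bool block;          // a Block Operator was added
};

// Models is_complete_fft: exactly one Size; exactly one Direction unless the
// type is r2c/c2r; exactly one SM unless a Thread Operator is present.
constexpr bool is_complete(operator_counts c) {
    return c.size_ops == 1
        && (c.real_type ? c.direction_ops <= 1 : c.direction_ops == 1)
        && (c.thread    ? c.sm_ops <= 1        : c.sm_ops == 1);
}

// Models is_complete_fft_execution: complete, plus an Execution Operator.
constexpr bool is_complete_execution(operator_counts c) {
    return is_complete(c) && (c.thread || c.block);
}

// Size + Direction + Thread (c2c): complete and executable without an SM Operator.
static_assert(is_complete_execution(operator_counts{1, 1, 0, false, true, false}),
              "thread FFT needs no SM Operator");
// Size + Direction + Block (c2c) without SM: not complete.
static_assert(!is_complete(operator_counts{1, 1, 0, false, false, true}),
              "block FFT needs an SM Operator");
```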

Is FFT-complete Execution? Trait¶

cufftdx::is_complete_fft_execution<FFT>::value

Trait is true if both cufftdx::is_fft_execution and cufftdx::is_complete_fft are true.

Note

If the cufftdx::is_complete_fft_execution trait is true for a descriptor FFT, the Execution Methods can be used to compute the FFT.

There is no default value.

Execution Traits¶

Execution traits can be retrieved directly from an FFT descriptor that has been configured with Execution Operators. The available execution traits depend on the operator used to build the descriptor; either a Thread Operator or a Block Operator.

Thread Traits¶

Trait	Default value	Description
`Description::value_type`	`detail::complex<float>`	Complex type of the underlying data used to compute the FFT.
`Description::input_type`	`Description::value_type`	Type of the underlying data used as input for the FFT.
`Description::output_type`	`Description::value_type`	Type of the underlying data used as output for the FFT.
`Description::implicit_type_batching`	`2` if `cufftdx::precision_of<FFT>::type` is `__half`, otherwise `1`	Number of values from different FFTs batched into one element of type `Description::value_type`.
`Description::elements_per_thread`	`size_of<Description>::value`	Number of FFT elements to be computed per thread.
`Description::storage_size`	`Description::elements_per_thread`	Number of `Description::value_type` elements that each thread must allocate to compute the FFT.
`Description::stride`	Always `1`	Stride between elements of the thread FFT held by each thread in its `input`

Thread traits can be retrieved from descriptors built with Thread Operator.

For example:

#include <cufftdx.hpp>

using FFT          = decltype(cufftdx::Size<8>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                            + cufftdx::Direction<cufftdx::fft_direction::forward>()
                            + cufftdx::Precision<double>() + cufftdx::Thread());


// Retrieve the FFT data type
using complex_type = typename FFT::value_type;

// Retrieve the number of elements per thread
auto elements_per_thread = FFT::elements_per_thread;

Value Type Trait¶

FFT::value_type

Complex type of the underlying data used for FFT computation.

The default type is cufftdx::detail::complex<float>, as defined in the types.hpp header file.

Input Type Trait¶

FFT::input_type

Complex type of the underlying data used as input of the FFT computation.

The default type is the same as Value Type Trait.

Output Type Trait¶

FFT::output_type

Complex type of the underlying data used as output of the FFT computation.

The default type is the same as Value Type Trait.

Implicit Type Batching Trait¶

FFT::implicit_type_batching

Number of values from different FFTs batched into one element of type Description::value_type used in FFT computation. If it is greater than one, the Thread FFT object computes multiple FFTs at once.

The value is 2 if cufftdx::precision_of<FFT>::type is __half, and 1 otherwise.

Note

In future releases of cuFFTDx, FFT::implicit_type_batching may be replaced or extended.
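
The rule "2 for `__half`, 1 otherwise" can be pictured as a trait with one specialization. The sketch below is a stand-alone model, not cuFFTDx code: `half_tag` is a hypothetical placeholder for `__half` (which requires the CUDA headers), and `complex_values_held` is an illustrative helper that is not part of the library.

```cpp
// Hypothetical placeholder for __half so the sketch compiles without CUDA.
struct half_tag {};

// Models the rule: 1 by default, 2 for half precision.
template <class Precision>
struct implicit_type_batching { static constexpr unsigned value = 1; };

template <>
struct implicit_type_batching<half_tag> { static constexpr unsigned value = 2; };

// Number of complex values carried by `storage_size` elements of value_type,
// given that each element packs `implicit_type_batching` values.
template <class Precision>
constexpr unsigned complex_values_held(unsigned storage_size) {
    return storage_size * implicit_type_batching<Precision>::value;
}

static_assert(implicit_type_batching<float>::value == 1, "one FFT per element");
static_assert(implicit_type_batching<half_tag>::value == 2, "two FFTs per element");
// An 8-element thread buffer in half precision carries 16 complex values,
// that is, the data of two size-8 FFTs.
static_assert(complex_values_held<half_tag>(8) == 16, "two FFTs of size 8");
```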

Elements Per Thread Trait¶

FFT::elements_per_thread

Number of FFT elements of the type returned by Value Type Trait that each thread will compute.

The default value is the same as Size Trait.

Storage Size Trait¶

FFT::storage_size

Number of Description::value_type elements that each thread must allocate to compute the FFT.

The default value is the same as Elements Per Thread Trait.

Stride Size Trait¶

FFT::stride

Stride between elements of the FFT held by each thread in input.

For a thread FFT, FFT::stride is always 1.

Block Traits¶

Trait	Default value	Description
`Description::value_type`	`detail::complex<float>`	Complex type of the underlying data used to compute the FFT.
`Description::input_type`	`Description::value_type`	Type of the underlying data used as input for the FFT.
`Description::output_type`	`Description::value_type`	Type of the underlying data used as output for the FFT.
`Description::workspace_type`	`Description::workspace_type`	Device-side type of workspace required for FFT computation.
`Description::implicit_type_batching`	`2` if `cufftdx::precision_of<FFT>::type` is `__half`, otherwise `1`	Number of values from different FFTs batched into one element of type `Description::value_type`.
`Description::elements_per_thread`	Heuristic.	Number of FFT elements to be computed per thread.
`Description::storage_size`	Determined by `Description::elements_per_thread`	Number of `Description::value_type` elements that each thread must allocate to compute the FFT.
`Description::stride`	Determined by `Description::elements_per_thread` and size of the FFT	Stride between elements of the block FFT held by each thread in its `input`
`Description::ffts_per_block`	`1`	Number of FFTs to compute by a CUDA block in this FFT operation.
`Description::suggested_ffts_per_block`	Heuristic.	Suggested number of FFTs to compute by a CUDA block to target maximum performance.
`Description::shared_memory_size`	Determined from `Description::ffts_per_block` and `Description::elements_per_thread`	Size of the shared memory in bytes.
`Description::block_dim`	See Block Dim Trait.	`dim3` of the CUDA block to compute the FFT operation.
`Description::max_threads_per_block`	Determined from `Description::block_dim`	Total number of threads in the CUDA block.
`Description::requires_workspace`	`true` if the FFT implementation requires extra workspace, otherwise `false`.	Determines whether extra workspace must be allocated in global memory using `cufftdx::make_workspace<FFT>(cudaError_t&)`.
`Description::workspace_size`	`0` if `Description::requires_workspace` is `false`, otherwise greater than `0`.	Size of global memory required for workspace (in bytes).

Block traits can be retrieved from descriptors built with Block Operator.

For example:

#include <cufftdx.hpp>

using FFT = decltype( cufftdx::Size<128>() + cufftdx::Type<fft_type::c2c>()
                    + cufftdx::Direction<fft_direction::forward>()
                    + cufftdx::Precision<float>() + cufftdx::Block()
                    + cufftdx::ElementsPerThread<8>() + cufftdx::FFTsPerBlock<2>() );

// Retrieve the FFT data type
using complex_type = typename FFT::value_type;

// Allocate managed memory for input/output
complex_type* data;
auto          size       = FFT::ffts_per_block * cufftdx::size_of<FFT>::value;
auto          size_bytes = size * sizeof(complex_type);

cudaMallocManaged(&data, size_bytes);

Value Type Trait¶

FFT::value_type

Complex type of the underlying data used for FFT computation.

The default type is cufftdx::detail::complex<float>, as defined in the types.hpp header file.

Input Type Trait¶

FFT::input_type

Complex type of the underlying data used as input of the FFT computation.

The default type is the same as Value Type Trait.

Output Type Trait¶

FFT::output_type

Complex type of the underlying data used as output of the FFT computation.

The default type is the same as Value Type Trait.

Workspace Type Trait¶

FFT::workspace_type

Type of the workspace required by the execute(...) function of the FFT. The user should check whether the FFT requires a workspace using the Description::requires_workspace trait, and create one with cufftdx::make_workspace<FFT>(cudaError_t&).

See Make Workspace Function for more details about workspace.

Warning

An FFT::workspace_type object doesn’t track the lifetime of the underlying memory, and is only valid within the lifetime of the workspace object it was cast from.

Warning

The type returned by cufftdx::make_workspace<FFT>(cudaError_t&) can be different for different FFT descriptions, and is not the same as FFT::workspace_type. The user should use auto when creating a workspace object.

Implicit Type Batching Trait¶

FFT::implicit_type_batching

Number of values from different FFTs batched into one element of type Description::value_type used in FFT computation. If it is greater than one, the Block FFT object computes multiple FFTs at once.

The value is 2 if cufftdx::precision_of<FFT>::type is __half, and 1 otherwise.

Note

In future releases of cuFFTDx, FFT::implicit_type_batching may be replaced or extended.

Elements Per Thread Trait¶

FFT::elements_per_thread

Number of FFT elements of the type returned by Value Type Trait that each thread will compute.

The default value is chosen heuristically.

Storage Size Trait¶

FFT::storage_size

Number of Description::value_type elements that each thread must allocate to compute the FFT.

The default value is determined by Elements Per Thread Trait.

Stride Size Trait¶

FFT::stride

Stride between elements of the block FFT held by each thread in its input.
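
One way to picture the stride is as the distance, within a single FFT, between consecutive elements held by the same thread. The sketch below rests on an assumption that is not stated in this section: that the stride equals the FFT size divided by the elements per thread. Both function names are hypothetical, and the real value should always be read from `FFT::stride`.

```cpp
// ASSUMPTION: stride == fft_size / elements_per_thread. The text above only
// says that the stride is determined by these two values.
constexpr unsigned assumed_stride(unsigned fft_size, unsigned elements_per_thread) {
    return fft_size / elements_per_thread;
}

// Index, within one FFT, of the i-th element held by a given thread.
constexpr unsigned element_index(unsigned thread, unsigned i, unsigned stride) {
    return thread + i * stride;
}

// Size 128 with 8 elements per thread: 16 threads per FFT, stride 16.
static_assert(assumed_stride(128, 8) == 16, "stride under the stated assumption");
// Thread 0 holds elements 0, 16, 32, ...; thread 15 ends at element 127.
static_assert(element_index(0, 1, 16) == 16, "second element of thread 0");
static_assert(element_index(15, 7, 16) == 127, "last element of the FFT");
```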

FFTs Per Block Trait¶

FFT::ffts_per_block

Number of FFTs to compute in parallel within a CUDA block, as part of the collective FFT operation.

The default value is 1.

Suggested FFTs Per Block Trait¶

FFT::suggested_ffts_per_block

Suggested number of FFTs to compute in parallel within a CUDA block, as part of the collective FFT operation, to maximize performance.

The default value is heuristic, and depends on the size of the FFT, the number of elements per thread, and other parameters.

Shared Memory Size Trait¶

FFT::shared_memory_size

Size of the required shared memory for the FFT operation to execute, in bytes.

The default value is determined from FFTs Per Block Trait and Elements Per Thread Trait.

Block Dim Trait¶

FFT::block_dim

BlockDim<unsigned int X, Y, Z> with x = size_of<Description>::value/Description::elements_per_thread, y = Description::ffts_per_block, z = 1
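
The formula above can be checked by hand with a small compile-time sketch. `dim3_like` is a hypothetical stand-in for CUDA's `dim3` so that the example compiles without the CUDA headers, and `thread_count` assumes that the total number of threads is simply the product of the three dimensions.

```cpp
// Hypothetical stand-in for CUDA's dim3.
struct dim3_like { unsigned x, y, z; };

// x = size / elements_per_thread, y = ffts_per_block, z = 1, as stated above.
constexpr dim3_like block_dim(unsigned fft_size,
                              unsigned elements_per_thread,
                              unsigned ffts_per_block) {
    return dim3_like{fft_size / elements_per_thread, ffts_per_block, 1u};
}

// ASSUMPTION: total threads in the block is the product of the dimensions.
constexpr unsigned thread_count(dim3_like d) { return d.x * d.y * d.z; }

// Size 128, 8 elements per thread, 2 FFTs per block gives a 16 x 2 x 1 block.
static_assert(block_dim(128, 8, 2).x == 16, "threads per FFT");
static_assert(block_dim(128, 8, 2).y == 2, "FFTs per block");
static_assert(thread_count(block_dim(128, 8, 2)) == 32, "threads in the block");
```

In real code the kernel launch would use `FFT::block_dim` directly rather than recomputing it.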

Max Threads Per Block Trait¶

FFT::max_threads_per_block

Maximum number of threads for the FFT in the CUDA block.

The default value is determined from FFTs Per Block Trait and Elements Per Thread Trait.

Requires Workspace Trait¶

FFT::requires_workspace

Boolean value. If true, a workspace must be created and passed to the FFT::execute(...) method (see block execute methods). Otherwise, it’s not necessary to create and pass a workspace. A workspace can be created using the cufftdx::make_workspace<FFT>(cudaError_t&) function. A workspace created for an FFT that does not require one will be empty and won’t allocate any global memory.

Workspace Size Trait¶

FFT::workspace_size

Amount of global memory, in bytes, that will be allocated by the required workspace. If Description::requires_workspace is false it’s 0; otherwise it’s greater than zero.