Traits¶
Traits provide the user with information about the FFT description constructed using Operators. They are divided into Description Traits and Execution Traits.
Description Traits¶
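Trait | Default value | Description |
---|---|---|
cufftdx::size_of<FFT>::value | None. | Size of the FFT to compute. |
cufftdx::type_of<FFT>::value | fft_type::c2c | Type of the FFT operation, either fft_type::c2c, fft_type::r2c, or fft_type::c2r. |
cufftdx::direction_of<FFT>::value | See Direction Trait. | Direction of the FFT operation, either fft_direction::forward or fft_direction::inverse. |
cufftdx::precision_of<FFT>::type | float | Type of the underlying floating-point values used to compute the FFT: float, double, or __half. |
cufftdx::is_fft<FFT>::value | None. | true if the descriptor is an FFT description, formed with Description Operators. |
cufftdx::is_fft_execution<FFT>::value | None. | true if the descriptor is an FFT description that includes an Execution Operator. |
cufftdx::is_complete_fft<FFT>::value | None. | true if the descriptor is a complete FFT description, formed with Description Operators. |
cufftdx::is_complete_fft_execution<FFT>::value | None. | true if both cufftdx::is_fft_execution and cufftdx::is_complete_fft are true. |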
Description traits can be retrieved from an FFT descriptor using the helper functions provided. For example:
#include <iostream>
#include <cufftdx.hpp>
using namespace cufftdx; // makes fft_type and fft_direction visible unqualified
using FFT = decltype( cufftdx::Size<8>() + cufftdx::Type<fft_type::c2c>()
                    + cufftdx::Direction<fft_direction::forward>()
                    + cufftdx::Precision<double>() + cufftdx::Thread() );
int main() {
    // Query the size only if the description is complete
    if(cufftdx::is_complete_fft<FFT>::value)
        std::cout << "Size of the FFT operation: " << cufftdx::size_of<FFT>::value << std::endl;
}
Size Trait¶
cufftdx::size_of<FFT>::value
Size of the FFT to compute, as set by Size Operator.
There is no default size. If the descriptor was not created using a Size Operator, compilation will fail with an error message.
Type Trait¶
cufftdx::type_of<FFT>::value
Type of the FFT operation, as set by Type Operator.
The default type is complex-to-complex, fft_type::c2c.
Direction Trait¶
cufftdx::direction_of<FFT>::value
Direction of the FFT operation, as set by Direction Operator.
Default direction:
If the FFT type is fft_type::r2c, the default direction is fft_direction::forward.
If the FFT type is fft_type::c2r, the default direction is fft_direction::inverse.
For any other type, there is no default direction. If the descriptor was not created using a Direction Operator, compilation will fail with an error message.
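The default-direction rule above can be sketched as a small compile-time function. This is a standalone illustration, not cuFFTDx code: the enumerations, the `default_direction_result` struct, and the `default_direction` function are hypothetical stand-ins that only mirror the rule as stated in this section.

```cpp
// Hypothetical stand-ins for the cuFFTDx enumerations (not the library's own types).
enum class fft_type { c2c, r2c, c2r };
enum class fft_direction { forward, inverse };

// Result of the lookup: has_default is false when a Direction Operator is mandatory.
struct default_direction_result {
    bool          has_default;
    fft_direction direction; // meaningful only when has_default is true
};

// Mirrors the rule: r2c implies forward, c2r implies inverse, anything else has no default.
constexpr default_direction_result default_direction(fft_type type) {
    return type == fft_type::r2c ? default_direction_result{true, fft_direction::forward}
         : type == fft_type::c2r ? default_direction_result{true, fft_direction::inverse}
                                 : default_direction_result{false, fft_direction::forward};
}
```

For a complex-to-complex FFT, `default_direction(fft_type::c2c).has_default` is false, which corresponds to the case where the library reports a compilation error if no Direction Operator is present.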
Precision Trait¶
cufftdx::precision_of<FFT>::type
Floating-point precision of the FFT operation, as set by Precision Operator.
The default precision is float.
Is FFT? Trait¶
cufftdx::is_fft<FFT>::value
Trait is true if the descriptor is an FFT description, formed with Description Operators.
There is no default value. The descriptor either is or is not an FFT description.
Is FFT Execution? Trait¶
cufftdx::is_fft_execution<FFT>::value
Trait is true if the descriptor is an FFT description, formed with Description Operators and an Execution Operator.
There is no default value. The descriptor either is or is not an FFT description that includes an Execution Operator.
Is FFT-complete? Trait¶
cufftdx::is_complete_fft<FFT>::value
Trait is true if the descriptor is a complete FFT description, formed with Description Operators.
Note
Complete in this context means that the descriptor has been formed with all the necessary Description Operators and is only missing an Execution Operator to be able to run.
For an FFT descriptor to be complete, the following is required:
One, and only one, Size Operator.
One, and only one, Direction Operator, unless either cufftdx::Type<fft_type::r2c>() or cufftdx::Type<fft_type::c2r>() is added.
One, and only one, SM Operator, unless a Thread Operator is added.
There is no default value. The descriptor either is or is not an FFT-complete description.
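The completeness requirements listed above can be read as a simple predicate over the operators that were added to a descriptor. The sketch below is a standalone model of that reading, not the library's implementation: `operators_used`, `required_once`, and `is_complete` are hypothetical names, and the descriptor is reduced to plain operator counts.

```cpp
// Hypothetical summary of which operators were added to a descriptor.
struct operators_used {
    int  size_ops;       // number of Size Operators
    int  direction_ops;  // number of Direction Operators
    int  sm_ops;         // number of SM Operators
    bool is_r2c_or_c2r;  // Type<fft_type::r2c> or Type<fft_type::c2r> was added
    bool has_thread_op;  // a Thread Operator was added
};

// "One, and only one" operator, or none at all when the requirement is waived.
constexpr bool required_once(int count, bool waived) {
    return count == 1 || (waived && count == 0);
}

// Mirrors the three requirements for an FFT-complete description.
constexpr bool is_complete(operators_used d) {
    return d.size_ops == 1
        && required_once(d.direction_ops, d.is_r2c_or_c2r)
        && required_once(d.sm_ops, d.has_thread_op);
}
```

Under this model, a thread-level complex-to-complex description with a size and a direction is complete without an SM Operator, while the same description built for block execution is not.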
Is FFT-complete Execution? Trait¶
cufftdx::is_complete_fft_execution<FFT>::value
Trait is true if both cufftdx::is_fft_execution and cufftdx::is_complete_fft are true.
Note
If the cufftdx::is_complete_fft_execution trait is true for a descriptor FFT, then the Execution Methods can be used to compute the FFT.
There is no default value.
Execution Traits¶
Execution traits can be retrieved directly from an FFT descriptor that has been configured with Execution Operators. The available execution traits depend on the operator used to build the descriptor: either a Thread Operator or a Block Operator.
Thread Traits¶
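Trait | Default value | Description |
---|---|---|
FFT::value_type | cufftdx::detail::complex<float> | Complex type of the underlying data used to compute the FFT. |
FFT::input_type | FFT::value_type | Type of the underlying data used as input for the FFT. |
FFT::output_type | FFT::value_type | Type of the underlying data used as output for the FFT. |
FFT::implicit_type_batching | 1 (2 if the precision is __half) | Number of values from different FFTs batched into one element of type FFT::value_type. |
FFT::elements_per_thread | cufftdx::size_of<FFT>::value | Number of FFT elements to be computed per thread. |
FFT::storage_size | FFT::elements_per_thread | Number of FFT::value_type elements that each thread must allocate to compute the FFT. |
FFT::stride | Always 1 | Stride between elements of the thread FFT held by each thread in its input. |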
Thread traits can be retrieved from descriptors built with Thread Operator.
For example:
#include <cufftdx.hpp>
using namespace cufftdx; // makes fft_type, fft_direction, and Thread visible unqualified
using FFT = decltype(cufftdx::Size<8>() + cufftdx::Type<fft_type::c2c>()
                     + cufftdx::Direction<fft_direction::forward>()
                     + cufftdx::Precision<double>() + Thread());
// Retrieve the FFT data type
using complex_type = typename FFT::value_type;
// Retrieve the number of elements per thread (a compile-time constant)
constexpr auto elements_per_thread = FFT::elements_per_thread;
Value Type Trait¶
FFT::value_type
Complex type of the underlying data used for FFT computation.
The default type is cufftdx::detail::complex<float>, as defined in the types.hpp header file.
Input Type Trait¶
FFT::input_type
Complex type of the underlying data used as input of the FFT computation.
The default type is the same as Value Type Trait.
Output Type Trait¶
FFT::output_type
Complex type of the underlying data used as output of the FFT computation.
The default type is the same as Value Type Trait.
Implicit Type Batching Trait¶
FFT::implicit_type_batching
Number of values from different FFTs batched into one element of type Description::value_type used in the FFT computation. If it is greater than one, the Thread FFT object calculates multiple FFTs at once.
The value is 2 if cufftdx::precision_of<FFT>::type is __half, and 1 otherwise.
Note
Please note that in future releases of cuFFTDx, FFT::implicit_type_batching may be replaced and/or extended.
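As a rough illustration of the rule stated for this trait, the sketch below maps a precision to its batching factor. It is a standalone model: the `precision` enumeration and the `implicit_type_batching` function are hypothetical stand-ins for cufftdx::precision_of<FFT>::type and FFT::implicit_type_batching, not library code.

```cpp
// Hypothetical tag for the precision selected with the Precision Operator.
enum class precision { half_precision, single_precision, double_precision };

// Mirrors the documented rule: 2 for __half, 1 for every other precision.
constexpr unsigned implicit_type_batching(precision p) {
    return p == precision::half_precision ? 2u : 1u;
}

// Number of independent FFTs produced by one FFT object under this model.
constexpr unsigned ffts_per_object(precision p) {
    return implicit_type_batching(p);
}
```

In other words, under this model a half-precision FFT object yields two transforms per execution, and single or double precision yields one.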
Elements Per Thread Trait¶
FFT::elements_per_thread
Number of FFT elements of the type returned by Value Type Trait that each thread will compute.
The default value is the same as Size Trait.
Storage Size Trait¶
FFT::storage_size
Number of Description::value_type elements that each thread must allocate to compute the FFT.
The default value is the same as Elements Per Thread Trait.
Stride Size Trait¶
FFT::stride
Stride between elements of the FFT held by each thread in its input.
For a thread FFT, FFT::stride is always 1.
Block Traits¶
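Trait | Default value | Description |
---|---|---|
FFT::value_type | cufftdx::detail::complex<float> | Complex type of the underlying data used to compute the FFT. |
FFT::input_type | FFT::value_type | Type of the underlying data used as input for the FFT. |
FFT::output_type | FFT::value_type | Type of the underlying data used as output for the FFT. |
FFT::workspace_type | | Device-side type of workspace required for FFT computation. |
FFT::implicit_type_batching | 1 (2 if the precision is __half) | Number of values from different FFTs batched into one element of type FFT::value_type. |
FFT::elements_per_thread | Heuristic. | Number of FFT elements to be computed per thread. |
FFT::storage_size | Determined by FFT::elements_per_thread. | Number of FFT::value_type elements that each thread must allocate to compute the FFT. |
FFT::stride | Determined by FFT::elements_per_thread. | Stride between elements of the block FFT held by each thread in its input. |
FFT::ffts_per_block | 1 | Number of FFTs to compute by a CUDA block in this FFT operation. |
FFT::suggested_ffts_per_block | Heuristic. | Suggested number of FFTs to compute by a CUDA block to target maximum performance. |
FFT::shared_memory_size | Determined from the FFT description. | Size of the shared memory in bytes. |
FFT::block_dim | See Block Dim Trait. | |
FFT::max_threads_per_block | Determined from FFT::ffts_per_block and FFT::elements_per_thread. | Total number of threads in the CUDA block. |
FFT::requires_workspace | | Determines whether extra workspace must be allocated in global memory using cufftdx::make_workspace<FFT>(cudaError_t&). |
FFT::workspace_size | | Size of global memory required for workspace (in bytes). |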
Block traits can be retrieved from descriptors built with Block Operator.
For example:
#include <cufftdx.hpp>
using namespace cufftdx; // makes fft_type and fft_direction visible unqualified
using FFT = decltype( cufftdx::Size<128>() + cufftdx::Type<fft_type::c2c>()
                    + cufftdx::Direction<fft_direction::forward>()
                    + cufftdx::Precision<float>() + cufftdx::Block()
                    + cufftdx::ElementsPerThread<8>() + cufftdx::FFTsPerBlock<2>() );
// Retrieve the FFT data type
using complex_type = typename FFT::value_type;
int main() {
    // Allocate managed memory for input/output: one buffer holding every FFT of a block
    complex_type* data;
    auto size       = FFT::ffts_per_block * cufftdx::size_of<FFT>::value;
    auto size_bytes = size * sizeof(complex_type);
    if (cudaMallocManaged(&data, size_bytes) != cudaSuccess) return 1;
    cudaFree(data); // release the buffer once it is no longer needed
    return 0;
}
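To make the buffer-size arithmetic in the example concrete, the following standalone sketch repeats the same computation without cuFFTDx. The helper names are hypothetical, and std::complex<float> is used only as a stand-in for FFT::value_type, under the assumption that a single-precision complex value occupies 8 bytes.

```cpp
#include <complex>
#include <cstddef>

// Stand-in for FFT::value_type (assumption: two 4-byte floats, 8 bytes in total).
using complex_type = std::complex<float>;

// Number of complex elements one CUDA block processes: ffts_per_block * FFT size.
constexpr std::size_t buffer_elements(std::size_t ffts_per_block, std::size_t fft_size) {
    return ffts_per_block * fft_size;
}

// Size of that buffer in bytes for a given element size.
constexpr std::size_t buffer_bytes(std::size_t ffts_per_block, std::size_t fft_size,
                                   std::size_t element_bytes) {
    return buffer_elements(ffts_per_block, fft_size) * element_bytes;
}

// Values from the example: FFTsPerBlock<2>, Size<128>, single precision.
constexpr std::size_t example_elements = buffer_elements(2, 128);
constexpr std::size_t example_bytes    = buffer_bytes(2, 128, sizeof(complex_type));
```

With two 128-point FFTs per block, the buffer holds 256 complex elements, which is 2048 bytes when each element takes 8 bytes.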
Value Type Trait¶
FFT::value_type
Complex type of the underlying data used for FFT computation.
The default type is cufftdx::detail::complex<float>, as defined in the types.hpp header file.
Input Type Trait¶
FFT::input_type
Complex type of the underlying data used as input of the FFT computation.
The default type is the same as Value Type Trait.
Output Type Trait¶
FFT::output_type
Complex type of the underlying data used as output of the FFT computation.
The default type is the same as Value Type Trait.
Workspace Type Trait¶
FFT::workspace_type
Type of the workspace required by the execute(...) function of FFT. The user should check whether FFT requires a workspace using the Description::requires_workspace trait, and create one with cufftdx::make_workspace<FFT>(cudaError_t&).
See Make Workspace Function for more details about workspace.
Warning
An FFT::workspace_type object doesn’t track the lifetime of the underlying memory, and is only valid within the lifetime of the workspace object it was cast from.
Warning
The type returned by cufftdx::make_workspace<FFT>(cudaError_t&) can differ between FFT descriptions, and is not the same as FFT::workspace_type. The user should use auto when creating a workspace object.
Implicit Type Batching Trait¶
FFT::implicit_type_batching
Number of values from different FFTs batched into one element of type Description::value_type used in the FFT computation. If it is greater than one, the Block FFT object calculates multiple FFTs at once.
The value is 2 if cufftdx::precision_of<FFT>::type is __half, and 1 otherwise.
Note
Please note that in future releases of cuFFTDx, FFT::implicit_type_batching may be replaced and/or extended.
Elements Per Thread Trait¶
FFT::elements_per_thread
Number of FFT elements of the type returned by Value Type Trait that each thread will compute.
The default value is the same as Size Trait.
Storage Size Trait¶
FFT::storage_size
Number of Description::value_type elements that each thread must allocate to compute the FFT.
The default value is the same as Elements Per Thread Trait.
Stride Size Trait¶
FFT::stride
Stride between elements of the block FFT held by each thread in its input.
See also expected input data format.
Example
The 0-th thread of an 8-point FFT with FFT::stride equal to 2 should hold elements 0, 2, 4, and 6 of the FFT in its input.
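The example above can be generalized with a short index calculation. The sketch below is standalone and rests on an assumption: that thread t holds the FFT elements t, t + stride, t + 2 * stride, and so on, which matches the 0-th thread case in the text. The function names are hypothetical and not part of cuFFTDx.

```cpp
#include <cstddef>
#include <vector>

// Index of the i-th element held by a thread, assuming a strided layout
// in which thread t starts at element t and advances by `stride`.
constexpr std::size_t element_index(std::size_t thread_id, std::size_t i, std::size_t stride) {
    return thread_id + i * stride;
}

// Collects every FFT element index held by one thread under the same assumption.
inline std::vector<std::size_t> elements_of_thread(std::size_t thread_id,
                                                   std::size_t elements_per_thread,
                                                   std::size_t stride) {
    std::vector<std::size_t> indices;
    indices.reserve(elements_per_thread);
    for (std::size_t i = 0; i < elements_per_thread; ++i) {
        indices.push_back(element_index(thread_id, i, stride));
    }
    return indices;
}
```

For an 8-point FFT with 4 elements per thread and a stride of 2, `elements_of_thread(0, 4, 2)` yields the indices 0, 2, 4, and 6 from the example; under the stated assumption, thread 1 would hold 1, 3, 5, and 7.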
FFTs Per Block Trait¶
FFT::ffts_per_block
Number of FFTs to compute in parallel within a CUDA block, as part of the collective FFT operation.
The default value is 1.
Suggested FFTs Per Block Trait¶
FFT::suggested_ffts_per_block
Suggested number of FFTs to compute in parallel within a CUDA block, as part of the collective FFT operation, to maximize performance.
The default value is heuristic, and depends on the size of the FFT, the number of elements per thread, and other parameters.
Block Dim Trait¶
FFT::block_dim
BlockDim<unsigned int X, Y, Z> with x = size_of<Description>::value / Description::elements_per_thread, y = Description::ffts_per_block, z = 1.
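The block dimension formula can be checked with a small standalone calculation. The `block_dim` struct and the helper functions below are hypothetical and only reproduce the arithmetic given in this section; they are not the library's BlockDim type.

```cpp
// Hypothetical plain struct standing in for the block dimensions.
struct block_dim {
    unsigned x;
    unsigned y;
    unsigned z;
};

// x = FFT size / elements per thread, y = FFTs per block, z = 1.
constexpr block_dim make_block_dim(unsigned fft_size, unsigned elements_per_thread,
                                   unsigned ffts_per_block) {
    return block_dim{fft_size / elements_per_thread, ffts_per_block, 1u};
}

// Total number of threads implied by the dimensions.
constexpr unsigned threads_in_block(block_dim d) {
    return d.x * d.y * d.z;
}
```

For the earlier block example (128-point FFT, 8 elements per thread, 2 FFTs per block), this gives x = 16, y = 2, z = 1, which is 32 threads in total.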
Max Threads Per Block Trait¶
FFT::max_threads_per_block
Maximum number of threads for the FFT in the CUDA block.
The default value is determined from FFTs Per Block Trait and Elements Per Thread Trait.
Requires Workspace Trait¶
FFT::requires_workspace
Boolean value. If true, a workspace must be created and passed to the FFT::execute(...) method (see block execute methods).
Otherwise, it is not necessary to create and pass a workspace. A workspace can be created using the cufftdx::make_workspace<FFT>(cudaError_t&) function. A workspace created for an FFT that does not require one will be empty and won’t allocate any global memory.
Workspace Size Trait¶
FFT::workspace_size
Reports how much global memory, in bytes, will be allocated by the required workspace. If Description::requires_workspace is false, it is 0; otherwise, it is greater than zero.