Traits¶
Traits provide information about an FFT description constructed using Operators. They are divided into three categories: Description Traits, Execution Traits, and Other Traits.
Description Traits¶
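Trait | Default value | Description |
---|---|---|
cufftdx::size_of<FFT>::value | None. | Size of the FFT to compute. |
cufftdx::type_of<FFT>::value | fft_type::c2c | Type of the FFT operation, either fft_type::c2c, fft_type::c2r, or fft_type::r2c. |
cufftdx::direction_of<FFT>::value | See Direction Trait. | Direction of the FFT operation, either fft_direction::forward or fft_direction::inverse. |
cufftdx::precision_of<FFT>::type | float | Type of the underlying floating-point values used to compute the FFT: __half, float, or double. |
cufftdx::is_fft<FFT>::value | None. | true if the descriptor is an FFT description, formed with Description Operators. |
cufftdx::is_fft_execution<FFT>::value | None. | true if the descriptor is an FFT description, formed with Description Operators and an Execution Operator. |
cufftdx::is_complete_fft<FFT>::value | None. | true if the descriptor is a complete FFT description, formed with Description Operators. |
cufftdx::is_complete_fft_execution<FFT>::value | None. | true if both cufftdx::is_fft_execution and cufftdx::is_complete_fft are true. |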
Description traits can be retrieved from an FFT descriptor using the helper functions provided. For example:
#include <iostream>
#include <cufftdx.hpp>

using FFT = decltype( cufftdx::Size<8>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                      + cufftdx::Direction<cufftdx::fft_direction::forward>()
                      + cufftdx::Precision<double>() + cufftdx::Thread() );

int main() {
    // A Thread description with Size and Direction is complete (no SM Operator needed)
    if (cufftdx::is_complete_fft<FFT>::value)
        std::cout << "Size of the FFT operation: " << cufftdx::size_of<FFT>::value << std::endl;
}
Size Trait¶
cufftdx::size_of<FFT>::value
Size of the FFT to compute, as set by Size Operator.
There is no default size. If the descriptor was not created using a Size Operator, compilation will fail with an error message.
Type Trait¶
cufftdx::type_of<FFT>::value
Type of the FFT operation, as set by Type Operator.
The default type is complex-to-complex, fft_type::c2c.
Direction Trait¶
cufftdx::direction_of<FFT>::value
Direction of the FFT operation, as set by Direction Operator.
Default direction:
If the FFT type is fft_type::r2c, the default direction is fft_direction::forward.
If the FFT type is fft_type::c2r, the default direction is fft_direction::inverse.
For any other type, there is no default direction. If the descriptor was not created using a Direction Operator, compilation will fail with an error message.
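The default-direction rule can be restated as a small constexpr function. This is a hypothetical model for illustration only: illustration::fft_type, illustration::fft_direction and default_direction are not part of cuFFTDx, whose result for a real descriptor is read through cufftdx::direction_of<FFT>::value.

```cpp
#include <optional>

// Hypothetical stand-ins for cufftdx::fft_type and cufftdx::fft_direction;
// they exist only to model the default-direction rule described above.
namespace illustration {

enum class fft_type { c2c, r2c, c2r };
enum class fft_direction { forward, inverse };

// Returns the direction used when no Direction Operator is given,
// or std::nullopt when the direction must be set explicitly.
constexpr std::optional<fft_direction> default_direction(fft_type type) {
    switch (type) {
        case fft_type::r2c: return fft_direction::forward;
        case fft_type::c2r: return fft_direction::inverse;
        default:            return std::nullopt; // C2C: a Direction Operator is required
    }
}

} // namespace illustration
```

In this model, an empty optional corresponds to the compile-time error cuFFTDx reports when a C2C descriptor has no Direction Operator.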
Precision Trait¶
cufftdx::precision_of<FFT>::type
Floating-point precision of the FFT operation, as set by Precision Operator.
The default precision is float.
Is FFT? Trait¶
cufftdx::is_fft<FFT>::value
Trait is true if the descriptor is an FFT description, formed with Description Operators.
There is no default value. The descriptor either is or is not an FFT description.
Is FFT Execution? Trait¶
cufftdx::is_fft_execution<FFT>::value
Trait is true if the descriptor is an FFT description, formed with Description Operators and an Execution Operator.
There is no default value. The descriptor either is or is not an FFT description including an Execution Operator.
Is FFT-complete? Trait¶
cufftdx::is_complete_fft<FFT>::value
Trait is true if the descriptor is a complete FFT description, formed with Description Operators.
Note
Complete in this context means that the descriptor has been formed with all the necessary Description Operators and is only missing an Execution Operator to be able to run.
For an FFT descriptor to be complete, the following is required:
One, and only one, Size Operator.
One, and only one, Direction Operator, unless either cufftdx::Type<fft_type::r2c>() or cufftdx::Type<fft_type::c2r>() is added.
One, and only one, SM Operator, unless a Thread Operator is added.
There is no default value. The descriptor either is or is not an FFT-complete description.
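As a sketch, the three completeness requirements can be encoded in a constexpr predicate over a hypothetical summary of the operators a descriptor was built from. operator_counts, execution and is_complete_model are illustrative names, not cuFFTDx API; in real code the answer comes from cufftdx::is_complete_fft<FFT>::value.

```cpp
// Hypothetical model of the completeness rules listed above (not cuFFTDx API).
namespace illustration {

enum class fft_type { c2c, r2c, c2r };
enum class execution { none, thread, block };

// Summary of the operators a descriptor was built from.
struct operator_counts {
    unsigned size = 0;               // number of Size Operators
    unsigned direction = 0;          // number of Direction Operators
    unsigned sm = 0;                 // number of SM Operators
    fft_type type = fft_type::c2c;   // Type Operator value (C2C by default)
    execution exec = execution::none;
};

constexpr bool is_complete_model(const operator_counts& d) {
    const bool real_type = d.type == fft_type::r2c || d.type == fft_type::c2r;
    // Exactly one Size Operator.
    const bool size_ok = d.size == 1;
    // Exactly one Direction Operator, unless R2C/C2R lets it be deduced.
    const bool direction_ok = d.direction == 1 || (d.direction == 0 && real_type);
    // Exactly one SM Operator, unless a Thread Operator is used.
    const bool sm_ok = d.sm == 1 || (d.sm == 0 && d.exec == execution::thread);
    return size_ok && direction_ok && sm_ok;
}

} // namespace illustration
```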
Is FFT-complete Execution? Trait¶
cufftdx::is_complete_fft_execution<FFT>::value
Trait is true if both cufftdx::is_fft_execution and cufftdx::is_complete_fft are true.
Note
If the cufftdx::is_complete_fft_execution trait is true for a descriptor FFT, the Execution Methods can be used to compute the FFT.
There is no default value.
Execution Traits¶
Execution traits can be retrieved directly from an FFT descriptor that has been configured with an Execution Operator. The available execution traits depend on which operator was used to build the descriptor: a Thread Operator or a Block Operator.
Thread Traits¶
Trait | Default value | Description |
---|---|---|
FFT::value_type | cufftdx::detail::complex<float> | Complex type of the underlying data used to compute the FFT. |
FFT::input_type | Same as Value Type Trait. | Type of the underlying data used as input for the FFT. |
FFT::output_type | Same as Value Type Trait. | Type of the underlying data used as output for the FFT. |
FFT::input_ept | Same as Elements Per Thread Trait. | Count of elements of type Input Type Trait to be loaded for this FFT. |
FFT::output_ept | Same as Elements Per Thread Trait. | Count of elements of type Output Type Trait to be stored after this FFT. |
 | None. | Type of elements to be loaded. Provides the best possible vectorization and correctness. |
 | None. | Type of elements to be stored. Provides the best possible vectorization and correctness. |
FFT::input_length | None. | Length of the input of type Input Type Trait to be loaded as one batch. |
FFT::output_length | None. | Length of the output of type Output Type Trait to be stored as one batch. |
FFT::implicit_type_batching | 2 for __half precision, 1 otherwise. | Number of values from different FFTs batched into one element of type FFT::value_type. |
FFT::elements_per_thread | Same as Size Trait. | Number of FFT elements to be computed per thread. |
FFT::storage_size | Same as Input EPT Trait. | Number of FFT::value_type elements that each thread must allocate to compute the FFT. |
FFT::stride | Always 1. | Stride between elements of the thread FFT held by each thread in its input. |
Thread traits can be retrieved from descriptors built with Thread Operator.
For example:
#include <cufftdx.hpp>

using FFT = decltype( cufftdx::Size<8>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                      + cufftdx::Direction<cufftdx::fft_direction::forward>()
                      + cufftdx::Precision<double>() + cufftdx::Thread() );
// Retrieve the FFT data type
using complex_type = typename FFT::value_type;
// Retrieve the number of elements per thread
constexpr auto elements_per_thread = FFT::elements_per_thread;
Value Type Trait¶
FFT::value_type
Complex type of the underlying data used for FFT computation.
The default type is cufftdx::detail::complex<float>, as defined in the types.hpp header file.
Input EPT Trait¶
FFT::input_ept
Maximum number of elements of type Input Type Trait to be provided by a single thread for FFT execution.
The default value is the same as Elements Per Thread Trait.
Output EPT Trait¶
FFT::output_ept
Maximum number of elements of type Output Type Trait that will be returned from a single thread after FFT execution.
The default value is the same as Elements Per Thread Trait.
Input Type Trait¶
FFT::input_type
The type of elements to be provided for FFT execution. This is the same as Value Type Trait for C2C and C2R configurations, but for R2C it depends directly on the values passed to the RealFFTOptions Operator.
The default type is the same as Value Type Trait.
Output Type Trait¶
FFT::output_type
The type of elements that will be returned from FFT execution. This is the same as Value Type Trait for C2C and R2C configurations, but for C2R it depends directly on the values passed to the RealFFTOptions Operator.
The default type is the same as Value Type Trait.
Input Length Trait¶
FFT::input_length
Complete length of a single batch input for this FFT. This value is equivalent to Size Trait for C2C, but depends on the RealFFTOptions Operator real_mode value for R2C and on the RealFFTOptions Operator complex_layout value for C2R.
The default value is the same as Size Trait.
Output Length Trait¶
FFT::output_length
Complete length of a single batch output for this FFT. This value is equivalent to Size Trait for C2C, but depends on the RealFFTOptions Operator real_mode value for C2R and on the RealFFTOptions Operator complex_layout value for R2C.
The default value is the same as Size Trait.
Implicit Type Batching Trait¶
FFT::implicit_type_batching
Number of values from different FFTs batched into one element of type FFT::value_type used in FFT computation. If it is greater than one, the Thread FFT object computes multiple FFTs at once.
The value is 2 if cufftdx::precision_of<FFT>::type is __half, and 1 otherwise.
Note
FFT::implicit_type_batching may be replaced and/or extended in future releases of cuFFTDx.
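Implicit type batching can be illustrated with a short model, assuming (as the text above states) that one execution with a batching factor of B covers B independent FFTs. half_stand_in replaces __half so the sketch compiles without CUDA headers; implicit_type_batching_model and executions_needed are hypothetical names, not cuFFTDx API.

```cpp
#include <type_traits>

namespace illustration {

// Hypothetical stand-in for __half so the sketch needs no CUDA headers.
struct half_stand_in {};

// Mirrors the rule above: 2 values per element for half precision, 1 otherwise.
template <class Precision>
constexpr unsigned implicit_type_batching_model() {
    return std::is_same_v<Precision, half_stand_in> ? 2u : 1u;
}

// Number of executions needed to cover fft_count independent FFTs when each
// execution processes `batching` of them at once (rounded up).
constexpr unsigned executions_needed(unsigned fft_count, unsigned batching) {
    return (fft_count + batching - 1) / batching;
}

} // namespace illustration
```

Under this assumption, five half-precision FFTs would need three executions, with the last one carrying a single meaningful FFT.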
Elements Per Thread Trait¶
FFT::elements_per_thread
Logical number of FFT elements that each thread will compute. This may differ from the actual number of elements, since the RealFFTOptions Operator may change both the number of input and output elements and their logical layout. Refer to Input EPT Trait and Output EPT Trait for input and output memory operations, and to Storage Size Trait for allocating register space.
The default value is the same as Size Trait.
Storage Size Trait¶
FFT::storage_size
Number of FFT::value_type elements that each thread must allocate to compute the FFT.
The default value is the same as Input EPT Trait.
Stride Size Trait¶
FFT::stride
Stride between elements of the FFT held by each thread in its input.
For a thread FFT, FFT::stride is always 1.
Block Traits¶
Trait | Default value | Description |
---|---|---|
FFT::value_type | cufftdx::detail::complex<float> | Complex type of the underlying data used to compute the FFT. |
FFT::input_type | | Type of the underlying data used as input for the FFT. |
FFT::output_type | | Type of the underlying data used as output for the FFT. |
FFT::input_ept | None. | Count of elements of type Input Type Trait for each thread to load. |
FFT::output_ept | None. | Count of elements of type Output Type Trait for each thread to store. |
 | None. | Type of elements to be loaded by each thread. Provides the best possible vectorization and correctness. |
 | None. | Type of elements to be stored by each thread. Provides the best possible vectorization and correctness. |
FFT::input_length | None. | Length of the input of type Input Type Trait to be loaded as one batch. |
FFT::output_length | None. | Length of the output of type Output Type Trait to be stored as one batch. |
FFT::workspace_type | | Device-side type of workspace required for FFT computation. |
FFT::implicit_type_batching | 2 for __half precision, 1 otherwise. | Number of values from different FFTs batched into one element of type FFT::value_type. |
FFT::elements_per_thread | Heuristic. | Number of FFT elements to be computed per thread. |
FFT::storage_size | Same as Input EPT Trait. | Number of FFT::value_type elements that each thread must allocate to compute the FFT. |
FFT::stride | Determined by Size Trait and Elements Per Thread Trait. | Stride between elements of the block FFT held by each thread in its input. |
FFT::ffts_per_block | 1 | Number of FFTs to compute by a CUDA block in this FFT operation. |
FFT::suggested_ffts_per_block | Heuristic. | Suggested number of FFTs to compute by a CUDA block to target maximum performance. |
FFT::shared_memory_size | Determined from the FFT description. | Size of the shared memory in bytes. |
FFT::block_dim | See Block Dim Trait. | Required CUDA block dimensions. |
FFT::max_threads_per_block | Determined from FFTs Per Block Trait and Elements Per Thread Trait. | Total number of threads in the CUDA block. |
FFT::requires_workspace | | Determines whether extra workspace must be allocated in global memory using cufftdx::make_workspace<FFT>(cudaError_t&, cudaStream_t). |
FFT::workspace_size | | Size of global memory required for workspace (in bytes). |
Block traits can be retrieved from descriptors built with Block Operator.
For example:
#include <cufftdx.hpp>

using FFT = decltype( cufftdx::Size<128>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                      + cufftdx::Direction<cufftdx::fft_direction::forward>()
                      + cufftdx::Precision<float>() + cufftdx::Block()
                      + cufftdx::ElementsPerThread<8>() + cufftdx::FFTsPerBlock<2>()
                      + cufftdx::SM<700>() ); // SM Operator completes the Block description

int main() {
    // Retrieve the FFT data type
    using complex_type = typename FFT::value_type;
    // Allocate managed memory for the input/output of all FFTs computed by one block
    complex_type* data;
    auto size = FFT::ffts_per_block * cufftdx::size_of<FFT>::value;
    auto size_bytes = size * sizeof(complex_type);
    cudaMallocManaged(&data, size_bytes);
    cudaFree(data);
}
Value Type Trait¶
FFT::value_type
Complex type of the underlying data used for FFT computation.
The default type is cufftdx::detail::complex<float>, as defined in the types.hpp header file.
Input EPT Trait¶
FFT::input_ept
Maximum number of elements of type Input Type Trait to be provided by a single thread for register API FFT execution.
Output EPT Trait¶
FFT::output_ept
Maximum number of elements of type Output Type Trait that will be returned by a single thread from register API FFT execution.
Input Type Trait¶
FFT::input_type
The type of elements to be provided for FFT execution. This is the same as Value Type Trait for C2C and C2R configurations, but for R2C it depends directly on the values passed to the RealFFTOptions Operator.
Output Type Trait¶
FFT::output_type
The type of elements that will be returned from FFT execution. This is the same as Value Type Trait for C2C and R2C configurations, but for C2R it depends directly on the values passed to the RealFFTOptions Operator.
Input Length Trait¶
FFT::input_length
Complete length of a single batch input for this FFT. This value is equivalent to Size Trait for C2C, but depends on the RealFFTOptions Operator real_mode value for R2C and on the RealFFTOptions Operator complex_layout value for C2R.
Output Length Trait¶
FFT::output_length
Complete length of a single batch output for this FFT. This value is equivalent to Size Trait for C2C, but depends on the RealFFTOptions Operator real_mode value for C2R and on the RealFFTOptions Operator complex_layout value for R2C.
Workspace Type Trait¶
FFT::workspace_type
Type of the workspace required by the execute(...) function of FFT. Check whether FFT requires a workspace using the FFT::requires_workspace trait, and create one with cufftdx::make_workspace<FFT>(cudaError_t&, cudaStream_t).
See Make Workspace Function for more details about workspace.
Warning
An FFT::workspace_type object does not track the lifetime of the underlying memory, and is only valid within the lifetime of the workspace object it was cast from.
Warning
The type returned by cufftdx::make_workspace<FFT>(cudaError_t&, cudaStream_t) can be different for different FFT descriptions, and is not the same as FFT::workspace_type. Use auto when creating a workspace object.
Implicit Type Batching Trait¶
FFT::implicit_type_batching
Number of values from different FFTs batched into one element of type FFT::value_type used in FFT computation. If it is greater than one, the Block FFT object computes multiple FFTs at once.
The value is 2 if cufftdx::precision_of<FFT>::type is __half, and 1 otherwise.
Note
FFT::implicit_type_batching may be replaced and/or extended in future releases of cuFFTDx.
Elements Per Thread Trait¶
FFT::elements_per_thread
Logical number of FFT elements that each thread will compute. This may differ from the number of physical elements, since the RealFFTOptions Operator may change both the number of input and output elements and their logical layout among threads. Refer to Input EPT Trait and Output EPT Trait for input and output memory operations, and to Storage Size Trait for allocating register space.
The default value is determined by a heuristic.
Storage Size Trait¶
FFT::storage_size
Number of FFT::value_type elements that each thread must allocate to compute the FFT.
The default value is the same as Input EPT Trait.
Stride Size Trait¶
FFT::stride
Stride between elements of the block FFT held by each thread in its input.
See also expected input data format.
Example
The 0-th thread of an 8-point FFT with FFT::stride equal to 2 should have values 0, 2, 4, and 6 in its input.
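The stride layout in this example can be sketched as a constexpr helper that lists the indices held by one thread, assuming Size is a multiple of Stride so that every thread holds Size / Stride elements. element_index and thread_indices are hypothetical helpers, not cuFFTDx functions.

```cpp
#include <array>

namespace illustration {

// Index, within one FFT, of the i-th element held by `thread`
// when elements are distributed with the given stride (FFT::stride).
constexpr unsigned element_index(unsigned thread, unsigned i, unsigned stride) {
    return thread + i * stride;
}

// All indices held by one thread, assuming Size % Stride == 0.
template <unsigned Size, unsigned Stride>
constexpr std::array<unsigned, Size / Stride> thread_indices(unsigned thread) {
    std::array<unsigned, Size / Stride> out{};
    for (unsigned i = 0; i < Size / Stride; ++i) {
        out[i] = element_index(thread, i, Stride);
    }
    return out;
}

} // namespace illustration
```

With Stride = 1, thread_indices<8, 1>(0) yields 0 through 7, which matches the contiguous layout of a thread FFT, where FFT::stride is always 1.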
FFTs Per Block Trait¶
FFT::ffts_per_block
Number of FFTs to compute in parallel within a CUDA block, as part of the collective FFT operation.
The default value is 1.
Suggested FFTs Per Block Trait¶
FFT::suggested_ffts_per_block
Suggested number of FFTs to compute in parallel within a CUDA block, as part of the collective FFT operation, to maximize performance.
The default value is determined by a heuristic and depends on the size of the FFT, the number of elements per thread, and other parameters.
Block Dim Trait¶
FFT::block_dim
Required CUDA block dimensions, with x = (size_of<FFT>::value / FFT::elements_per_thread), y = (FFT::ffts_per_block / FFT::implicit_type_batching), and z = 1.
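The Block Dim formula can be checked with plain arithmetic. The sketch below uses a hypothetical dim3_model struct instead of CUDA's dim3 so it compiles without the CUDA toolkit; block_dim_model and threads_in_block are illustrative names, and the inputs in the checks mirror the block example earlier on this page (128-point FFT, 8 elements per thread, 2 FFTs per block).

```cpp
namespace illustration {

// Hypothetical stand-in for CUDA's dim3.
struct dim3_model { unsigned x, y, z; };

// Restates the Block Dim Trait formula:
// x = size / elements_per_thread, y = ffts_per_block / implicit_type_batching, z = 1.
constexpr dim3_model block_dim_model(unsigned size, unsigned elements_per_thread,
                                     unsigned ffts_per_block, unsigned implicit_type_batching) {
    return {size / elements_per_thread, ffts_per_block / implicit_type_batching, 1u};
}

// Total number of threads in a block launched with dimensions d.
constexpr unsigned threads_in_block(dim3_model d) { return d.x * d.y * d.z; }

} // namespace illustration
```

For float precision (batching 1) this gives 16 x 2 x 1 = 32 threads; for __half (batching 2) the two FFTs share one y-row, giving 16 x 1 x 1.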
Max Threads Per Block Trait¶
FFT::max_threads_per_block
Maximum number of threads for the FFT in the CUDA block.
The default value is determined from FFTs Per Block Trait and Elements Per Thread Trait.
Requires Workspace Trait¶
FFT::requires_workspace
Boolean value. If true, a workspace must be created and passed to the FFT::execute(...) method (see block execute methods).
Otherwise, it is not necessary to create and pass a workspace. A workspace can be created using the cufftdx::make_workspace<FFT>(cudaError_t&, cudaStream_t) function. A workspace created for an FFT that does not require one will be empty and will not allocate any global memory.
Workspace Size Trait¶
FFT::workspace_size
Amount of global memory, in bytes, that will be allocated by the required workspace. If FFT::requires_workspace is false, it is 0; otherwise it is greater than zero.
Other Traits¶
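Trait | Default value | Description |
---|---|---|
cufftdx::is_supported<FFT, Architecture>::value | | true if FFT is supported on the provided CUDA architecture. |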
cufftdx::is_supported¶
// FFT - FFT description without CUDA architecture defined using SM operator
// Architecture - unsigned integer representing CUDA architecture (SM)
template<class FFT, unsigned int Architecture>
struct is_supported : std::bool_constant<...> { };
// Helper variable template
template<class FFT, unsigned int Architecture>
inline constexpr bool is_supported_v = is_supported<FFT, Architecture>::value;
// true if FFT is supported on the provided CUDA architecture
cufftdx::is_supported<FFT, Architecture>::value;
cufftdx::is_supported checks whether an FFT is supported on the Architecture CUDA architecture.
Requirements:
FFT must have a defined size and direction (unless the direction can be deduced from the type). See Description Operators section.
FFT can't have a target CUDA architecture defined via the SM operator.
If the FFT description includes the ElementsPerThread operator, cufftdx::is_supported takes that into account when verifying support.
Example
#include <cufftdx.hpp>

using namespace cufftdx;

using FFT1 = decltype(Size<32768>() + Type<fft_type::c2c>() + Direction<fft_direction::inverse>() + Block() + Precision<float>());
constexpr bool fft1_sm80 = is_supported<FFT1, 800>::value; // true
constexpr bool fft1_sm70 = is_supported<FFT1, 700>::value; // false

using FFT2 = decltype(Size<8192>() + Type<fft_type::c2c>() + Direction<fft_direction::forward>() + Block() + Precision<double>());
constexpr bool fft2_sm80 = is_supported<FFT2, 800>::value; // true
constexpr bool fft2_sm75 = is_supported<FFT2, 750>::value; // false
constexpr bool fft2_sm70 = is_supported<FFT2, 700>::value; // true

using FFT3 = decltype(Size<4095>() + Type<fft_type::c2c>() + Direction<fft_direction::inverse>() + Block() + Precision<float>());
constexpr bool fft3_sm80 = is_supported_v<FFT3, 800>; // true
constexpr bool fft3_sm75 = is_supported_v<FFT3, 750>; // false
constexpr bool fft3_sm70 = is_supported_v<FFT3, 700>; // true