Operators
Operators are used to describe the FFT operation to solve, and to configure the execution. They are divided into Description Operators and Execution Operators.
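Description Operators
Operator | Default value | Description
Size | Not set. | Size of the FFT to compute.
Direction | Not set. | Direction of the FFT, either fft_direction::forward or fft_direction::inverse.
Type | fft_type::c2c | Types of input and output data (C2C, R2C, C2R).
Precision | float | Precision of the floating-point values used to compute the FFT.
SM | Not set. | Target CUDA architecture for which the FFT function should be generated.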
Description operators define the FFT operation to be solved. Combined with Execution Operators, they form a complete FFT descriptor that can be executed on a GPU.
Operators are added together to construct the FFT descriptor type. For example, a forward complex-to-complex FFT of size 8 in double precision, computed entirely by one thread, is described as:
#include <cufftdx.hpp>
using FFT = decltype( cufftdx::Size<8>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                    + cufftdx::Direction<cufftdx::fft_direction::forward>()
                    + cufftdx::Precision<double>() + cufftdx::Thread() );
For an FFT descriptor to be complete, the following is required:
One, and only one, Size Operator.
One, and only one, Direction Operator unless either cufftdx::Type<fft_type::r2c>() or cufftdx::Type<fft_type::c2r>() is added.
One, and only one, SM Operator unless a Thread Operator is added.
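The three completeness rules above can be sketched as a small compile-time check. This is an illustration only: `OperatorCounts`, `FftType` and `is_complete` are hypothetical names that are not part of cuFFTDx, and the sketch assumes that an SM operator is optional (rather than forbidden) when a Thread operator is present.

```cpp
#include <cstddef>

// Hypothetical summary of the operators added to a descriptor.
// None of these names exist in cuFFTDx; they only model the rules above.
enum class FftType { c2c, r2c, c2r };

struct OperatorCounts {
    std::size_t size_ops      = 0;             // number of Size operators
    std::size_t direction_ops = 0;             // number of Direction operators
    std::size_t sm_ops        = 0;             // number of SM operators
    bool        has_thread    = false;         // Thread operator present
    FftType     type          = FftType::c2c;  // c2c is the documented default
};

// Returns true when the completeness rules listed above hold.
constexpr bool is_complete(const OperatorCounts& d) {
    // Rule 1: exactly one Size operator.
    const bool size_ok = (d.size_ops == 1);
    // Rule 2: exactly one Direction operator, unless r2c or c2r implies it.
    const bool direction_implied = (d.type != FftType::c2c);
    const bool direction_ok =
        direction_implied ? (d.direction_ops <= 1) : (d.direction_ops == 1);
    // Rule 3: exactly one SM operator, unless Thread is used
    // (assumption: SM is then optional).
    const bool sm_ok = d.has_thread ? (d.sm_ops <= 1) : (d.sm_ops == 1);
    return size_ok && direction_ok && sm_ok;
}
```

For instance, `is_complete(OperatorCounts{1, 1, 0, true, FftType::c2c})` models the thread descriptor shown earlier, which has a size, a direction and a Thread operator but no SM operator.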
Size Operator
cufftdx::Size<unsigned int S>()
Sets the size S of the FFT operation to compute.
There is no default size.
Restrictions:
S must be greater than 1.
Direction Operator
cufftdx::Direction<cufftdx::fft_direction>()
Sets the direction of the FFT, either fft_direction::inverse or fft_direction::forward.
There is no default direction.
If the FFT is constructed with the Type<R2C> operator, the direction is assumed to be forward and a direction operator is not necessary.
If the FFT is constructed with the Type<C2R> operator, the direction is assumed to be inverse and a direction operator is not necessary.
Restrictions:
fft_direction::forward requires Type<C2C> or Type<R2C>.
fft_direction::inverse requires Type<C2C> or Type<C2R>.
cuFFTDx performs unnormalized FFTs; that is, performing a forward FFT on an input data set followed by an inverse FFT on the resulting set yields data that is equal to the input, scaled by the size of the FFT. Scaling either transform by the reciprocal of the size of the data set is left for the user to perform as seen fit.
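The scaling behaviour described above can be checked with a plain CPU reference. The following is a minimal sketch that does not use cuFFTDx at all: it implements a naive unnormalized DFT, assuming the common convention of a negative exponent for the forward transform, and measures how far a forward-then-inverse round trip deviates from the input multiplied by the size N. All function names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(N^2) unnormalized DFT; sign = -1 for forward, +1 for inverse.
// Reference code for illustration only; it does not call cuFFTDx.
std::vector<cd> naive_dft(const std::vector<cd>& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi * static_cast<double>(j * k)
                               / static_cast<double>(n);
            acc += in[j] * std::polar(1.0, angle);
        }
        out[k] = acc;  // no 1/N factor in either direction
    }
    return out;
}

// Forward transform followed by inverse transform, with no scaling applied.
std::vector<cd> round_trip(const std::vector<cd>& in) {
    return naive_dft(naive_dft(in, -1), +1);
}

// Largest deviation between round_trip(in)[i] and N * in[i].
double max_round_trip_error(const std::vector<cd>& in) {
    const std::vector<cd> rt = round_trip(in);
    const double n = static_cast<double>(in.size());
    double err = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        err = std::max(err, std::abs(rt[i] - n * in[i]));
    }
    return err;
}
```

To recover the original data, a caller would multiply each element of the round-trip result by 1/N, or fold that factor into either transform.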
Type Operator
cufftdx::Type<cufftdx::fft_type>()
Sets the type of the FFT to compute, one of fft_type::c2c for complex-to-complex, fft_type::r2c for real-to-complex, or fft_type::c2r for complex-to-real.
The default type is fft_type::c2c.
Restrictions:
fft_type::r2c requires fft_direction::forward. If no direction is specified, it is assumed to be fft_direction::forward.
fft_type::c2r requires fft_direction::inverse. If no direction is specified, it is assumed to be fft_direction::inverse.
cuFFTDx performs unnormalized Fast Fourier Transform calculations.
Precision Operator
cufftdx::Precision<__half>()
cufftdx::Precision<float>()
cufftdx::Precision<double>()
Sets the floating-point precision used to compute the FFT. This is the type of the values used for input and output, as well as the underlying type of the values used to compute the FFT.
The default precision is float.
SM Operator
cufftdx::SM<unsigned int CC>()
Sets the target architecture CC for the underlying FFT function to use. Supported architectures are:
Volta: 700 and 720 (sm_70, sm_72).
Turing: 750 (sm_75).
Ampere: 800, 860 and 870 (sm_80, sm_86, sm_87).
Ada: 890 (sm_89).
Hopper: 900 (sm_90, sm_90a).
Note
When compiling cuFFTDx for 9.0a compute capability, use 900 in the SM operator (see also CUDA C++ Programming Guide: Feature Availability).
Warning
It is not guaranteed that executions of exactly the same FFTs on GPUs of different CUDA architectures will produce bit-identical results.
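The SM values listed in this section follow the pattern major * 100 + minor * 10 of the compute capability (for example, sm_86 corresponds to 860). A minimal sketch of that mapping, using hypothetical helper names that are not part of cuFFTDx:

```cpp
// Hypothetical helper (not a cuFFTDx API): converts a compute capability
// given as major.minor into the integer form used by the SM operator.
constexpr unsigned sm_value(unsigned major, unsigned minor) {
    return major * 100u + minor * 10u;
}

// True if the value appears in the list of supported architectures
// given in this section.
constexpr bool is_listed_sm(unsigned cc) {
    return cc == 700 || cc == 720 || cc == 750 || cc == 800 ||
           cc == 860 || cc == 870 || cc == 890 || cc == 900;
}

static_assert(sm_value(8, 6) == 860, "sm_86 maps to 860");
static_assert(is_listed_sm(sm_value(9, 0)), "sm_90 and sm_90a both use 900");
static_assert(!is_listed_sm(sm_value(6, 1)), "610 is not in the list");
```

A build system could use such a check to reject unsupported targets before the value is passed to the SM operator.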
Execution Operators
Operator | Default value | Description
Thread | Not set. | Creates FFT thread execution object.
Block | Not set. | Creates FFT block execution object. See Block Configuration Operators.
Execution operators configure how the FFT operation will run on the GPU. Combined with Description Operators, they form a complete FFT descriptor that can be executed on a GPU.
Operators are added together to construct the FFT descriptor type. For example, for a forward FFT operation consisting of two FFTs with 128 float elements each, running simultaneously in one CUDA block:
#include <cufftdx.hpp>
using FFT = decltype( cufftdx::Size<128>() + cufftdx::Type<cufftdx::fft_type::c2c>()
                    + cufftdx::Direction<cufftdx::fft_direction::forward>()
                    + cufftdx::Precision<float>() + cufftdx::Block()
                    + cufftdx::ElementsPerThread<8>() + cufftdx::FFTsPerBlock<2>() );
Thread Operator
cufftdx::Thread()
Sets the FFT operation to run in a thread context. The FFT operation will simultaneously run a single, independent FFT (described using Description Operators) per thread.
Each thread will compute one FFT of the size defined by the Size Operator.
Restrictions:
Is mutually exclusive with the Block operator.
Compilation will fail when used with block-only operators: FFTsPerBlock, ElementsPerThread, BlockDim.
With Precision<__half>, restricts Size to the range \([2, 32]\).
With Precision<float>, restricts Size to the range \([2, 32]\).
With Precision<double>, restricts Size to the range \([2, 16]\).
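The per-precision size limits for thread execution listed above can be captured in a small trait so that a size is validated before a descriptor is built. This is a sketch only: `max_thread_fft_size`, `thread_size_ok` and `half_tag` are hypothetical names, and `half_tag` merely stands in for `__half` so that the example builds without CUDA headers.

```cpp
// Stand-in for __half; used only so this sketch compiles without CUDA.
struct half_tag {};

// Hypothetical trait (not a cuFFTDx API) holding the upper size limit
// for thread execution, per precision, as listed above.
template <class Precision> struct max_thread_fft_size;
template <> struct max_thread_fft_size<half_tag> { static constexpr unsigned value = 32; };
template <> struct max_thread_fft_size<float>    { static constexpr unsigned value = 32; };
template <> struct max_thread_fft_size<double>   { static constexpr unsigned value = 16; };

// True if `size` lies in the documented range [2, max] for the precision.
template <class Precision>
constexpr bool thread_size_ok(unsigned size) {
    return size >= 2 && size <= max_thread_fft_size<Precision>::value;
}

static_assert(thread_size_ok<double>(8), "size 8 in double fits in one thread");
static_assert(!thread_size_ok<double>(32), "size 32 in double exceeds the limit");
```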
Block Operator
cufftdx::Block()
Generates a collective FFT operation to run in a single CUDA block. One or more threads will cooperate to compute the collective FFT operation.
The number of FFTs to compute, as well as the number of threads used to calculate each FFT, can be configured using Block Configuration Operators.
Restrictions:
Is mutually exclusive with the Thread operator.
Unless a BlockDim Operator is used, the collective FFT operation can only be executed inside a 2D block of sizes:
blockDim.x = size_of<Description>::value/Description::elements_per_thread.
blockDim.y = Description::ffts_per_block.
blockDim.z = 1.
BlockDim Operator is not implemented yet.
Operator cufftdx::Precision<__half>() restricts cufftdx::Size<U>() to the range \([2, 32768]\).
Operator cufftdx::Precision<float>() restricts cufftdx::Size<U>() to the range \([2, 32768]\).
Operator cufftdx::Precision<double>() restricts cufftdx::Size<U>() to the range \([2, 16384]\).
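The required block shape described above is simple arithmetic on the descriptor parameters. The sketch below reproduces it on the host with a hypothetical `Dim3` struct and `default_block_dim` function (neither is a cuFFTDx or CUDA type), assuming that the number of elements per thread divides the FFT size evenly.

```cpp
// Hypothetical plain struct mirroring the three block dimensions.
struct Dim3 { unsigned x, y, z; };

// Block shape from the restriction above:
//   x = FFT size / elements per thread, y = FFTs per block, z = 1.
// Assumes elements_per_thread divides fft_size evenly.
constexpr Dim3 default_block_dim(unsigned fft_size,
                                 unsigned elements_per_thread,
                                 unsigned ffts_per_block) {
    return Dim3{fft_size / elements_per_thread, ffts_per_block, 1u};
}

// Total number of threads in such a block.
constexpr unsigned threads_per_block(Dim3 d) { return d.x * d.y * d.z; }

// The block example earlier in this document: Size<128>,
// ElementsPerThread<8>, FFTsPerBlock<2>.
static_assert(default_block_dim(128, 8, 2).x == 16, "16 threads per FFT");
static_assert(threads_per_block(default_block_dim(128, 8, 2)) == 32, "32 threads in total");
```

With those parameters the kernel would have to be launched with a (16, 2, 1) block.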
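Block Configuration Operators
Operators | Default value | Description
FFTsPerBlock | 1 | Number of FFTs calculated per CUDA block.
ElementsPerThread | Heuristic. | Number of FFT elements computed per thread.
BlockDim | Not set. | Required for executing block FFT within block with custom dimensions.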
Block-configuration operators allow the user to tune how the collective FFT operation will run on a single CUDA block.
Note
Block configuration operators can only be used with Block Operator.
Warning
It is not guaranteed that executions of the same FFTs (size, direction, type, precision) but with a different
number of elements per thread (ElementsPerThread),
number of FFTs calculated per CUDA block (FFTsPerBlock), or
block dimension (BlockDim),
will produce bit-identical results.
FFTs Per Block Operator
cufftdx::FFTsPerBlock<unsigned int>()
Sets the number of FFTs to compute in parallel within a single CUDA block. Each FFT is computed concurrently by a separate group of threads.
The default is one FFT per block.
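Because each CUDA block computes a fixed number of FFTs, the number of blocks needed for a batch follows from a ceiling division. A minimal sketch with a hypothetical helper name; it assumes that a kernel handling a partially filled last block guards against out-of-range FFT indices.

```cpp
// Hypothetical helper (not a cuFFTDx API): number of CUDA blocks needed
// to process `batch` FFTs when each block computes `ffts_per_block` of them.
// Ceiling division, so a partially filled last block is still counted.
constexpr unsigned blocks_for_batch(unsigned batch, unsigned ffts_per_block) {
    return (batch + ffts_per_block - 1u) / ffts_per_block;
}

static_assert(blocks_for_batch(1000, 1) == 1000, "default: one FFT per block");
static_assert(blocks_for_batch(1000, 2) == 500, "two FFTs per block halves the grid");
static_assert(blocks_for_batch(1001, 2) == 501, "last block is only half full");
```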
Elements Per Thread Operator
cufftdx::ElementsPerThread<unsigned int>()
Sets the number of FFT elements to be computed by each thread.
The default is determined heuristically to target performance.
Restrictions:
If FFT::requires_workspace is false, it must be a divisor of the requested FFT size.
If FFT::requires_workspace is true, it must be a power of two smaller than the size of the FFT.
Must be in the range \([2, 32]\) for cufftdx::Precision<float>() and cufftdx::Precision<__half>().
Must be in the range \([2, 16]\) for cufftdx::Precision<double>().
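The restrictions on the number of elements per thread can be combined into a single predicate. The sketch below is illustrative: `elements_per_thread_ok` is a hypothetical function, and the `requires_workspace` and `is_double` flags are passed in by hand rather than read from a real FFT descriptor.

```cpp
// True if v is a power of two (1, 2, 4, ...).
constexpr bool is_power_of_two(unsigned v) {
    return v != 0 && (v & (v - 1u)) == 0;
}

// Hypothetical check (not a cuFFTDx API) of the restrictions listed above.
//   ept                - requested number of elements per thread
//   fft_size           - size of the FFT
//   requires_workspace - value FFT::requires_workspace would have
//   is_double          - true for double precision, false for float or __half
constexpr bool elements_per_thread_ok(unsigned ept, unsigned fft_size,
                                      bool requires_workspace, bool is_double) {
    const unsigned max_ept = is_double ? 16u : 32u;
    if (ept < 2u || ept > max_ept) return false;        // precision range
    if (requires_workspace) {
        return is_power_of_two(ept) && ept < fft_size;  // power of two below size
    }
    return fft_size % ept == 0;                         // divisor of the size
}

static_assert(elements_per_thread_ok(8, 128, false, false), "8 divides 128");
static_assert(!elements_per_thread_ok(3, 128, false, false), "3 does not divide 128");
```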
BlockDim Operator
struct cufftdx::BlockDim<unsigned int X, unsigned int Y, unsigned int Z>()
Sets the CUDA block size to (X, Y, Z), to configure the execution.
Using this operator, the user can run the collective FFT operation with 2D or 3D CUDA blocks.
Default BlockDim size:
blockDim.x = size_of<Description>::value/Description::elements_per_thread.
blockDim.y = Description::ffts_per_block.
blockDim.z = 1.
See FFT::block_dim.
Note
The BlockDim operator is not implemented yet.