FFT implements the Fourier Transform applied to 2D images, supporting real- and complex-valued inputs. It returns the image representation in the frequency domain. It is useful for content analysis based on signal frequencies, and enables image operations that are more efficiently performed on the frequency domain, among other uses.
Input in space domain
Magnitude spectrum
Implementation
The FFT - Fast Fourier Transform is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets in \(O(n \log n)\). It is one of the most important and widely used numerical algorithms in computational physics and general signal processing.
The Discrete Fourier Transform (DFT) generally maps a complex-valued vector \(x_k\) on the space domain into its frequency domain representation given by:
\[ I'[u,v] = \sum^{M-1}_{m=0} \sum^{N-1}_{n=0} I[m,n] e^{-2\pi i (\frac{um}{M}+\frac{vn}{N})} \]
Where:
\(I\) is the input image in spatial domain
\(I'\) is its frequency domain representation
\(M\times N\) is input's dimensions
Note that the \(I'\) is left denormalized as the spectrum is eventually normalized in subsequent stages of the pipeline (if at all).
Depending on image dimension, different techniques are employed for best performance:
CPU backend
Fast paths when \(M\) or \(N\) can be factored into \(2^a \times 3^b \times 5^c\)
CUDA backend
Fast paths when \(M\) or \(N\) can be factored into \(2^a \times 3^b \times 5^c \times 7^d\)
In general the smaller the prime factor is, the better the performance, i.e., powers of two are fastest.
FFT supports the following transform types:
complex-to-complex, C2C
real-to-complex, R2C.
Data Layout
Data layout depends strictly on the transform type. In case of general C2C transform, both input and output data shall be of type VPI_IMAGE_FORMAT_2F32 and have the same size. In R2C mode, each input image row \((x_1,x_2,\dots,x_N)\) with type VPI_IMAGE_FORMAT_F32 results in an output row \((X_1,X_2,\dots,X_{\lfloor\frac{N}{2}\rfloor+1})\) of non-redundant complex elements with type VPI_IMAGE_FORMAT_2F32. In both cases, input and output image heights are the same.
For real inputs, the output contains only the non-redundant elements, thus only the left half of spectrum returned. In order to retrieve the whole Hermitian (symmetric-conjugate) output, the right half must be completed so that the following conditions are met:
The following diagram shows how this is accomplished:
Completing the output spectrum
The zero-th element in spectrum output represents the DC (0 Hz) component. Usually for visualization it's preferred that DC is at the center. One way to accomplish this is to shift the 4 quadrants around as shown in the diagram below:
\(Q_0\)
\(Q_1\)
\(Q_2\)
\(Q_3\)
\(\rightarrow\)
\(Q_3\)
\(Q_2\)
\(Q_1\)
\(Q_0\)
C API functions
For list of limitations, constraints and backends that implements the algorithm, consult reference documentation of the following functions:
Runs the direct Fast Fourier Transform on single image.
Attention
This algorithm requires that the library libcufft.so.10 be installed in the system.
Usage
Language:
Import VPI module
import vpi
Optionally convert the input to 32-bit floating point and apply the FFT operation. The resulting image will have complex components, represented by vpi.Format._2F32 format. All operations will be executed by the CUDA backend.
with vpi.Backend.CUDA:
output = input.convert(vpi.Format.F32) \
.fft()
(optional) Convert the complex frequency data into magnitude ready to get displayed.
Copy the output to a numpy array and convert it to np.complex64
Fills the right half of the complex data with the missing values. The left half is copied directly from VPI's output. The result is a full Hermitian matrix.
Create the output image that will hold the spectrum. Since input is real, only non-redundant values will be calculated and output width is \(\lfloor \frac{w}{2} \rfloor+1\).
Create the temporary image that stores the floating-point representation of the input image. Also allocates host memory to hold the full Hermitian representation, calculated from VPI's output.
Converts the image contents to the desired format, with optional scaling and offset.
Submit the FFT algorithm to the stream, passing the floating-point input and the output buffer. Since the payload was created on the CUDA backend, it'll use CUDA for processing.
Convert the complex frequency data into magnitude ready to get displayed.
Fills the right half of the complex data with the missing values. The left half is copied directly from VPI's output. The result is a full Hermitian matrix.
/* make width/height even*/
width = width & -2;
height = height & -2;
/* center pixel coordinate */
int cx = width / 2;
int cy = height / 2;
/* Image data is in host-accessible pitch-linear layout. */