Image FFT implements the Fourier Transform applied to 2D images, supporting real- and complex-valued inputs. It returns the image representation in the frequency domain. It is useful for content analysis based on signal frequencies, and enables image operations that are more efficiently performed on the frequency domain, among other uses.
Input in space domain | Magnitude spectrum |
---|---|
![]() | ![]() |
The FFT - Fast Fourier Transform is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets in \(O(n \log n)\). It is one of the most important and widely used numerical algorithms in computational physics and general signal processing.
The Discrete Fourier Transform (DFT) generally maps a complex-valued vector \(x_k\) on the space domain into its frequency domain representation given by:
\[ I'[u,v] = \sum^{M-1}_{m=0} \sum^{N-1}_{n=0} I[m,n] e^{-2\pi i (\frac{um}{M}+\frac{vn}{N})} \]
Where:
Note that the \(I'\) is left denormalized as the spectrum is eventually normalized in subsequent stages of the pipeline (if at all).
Depending on image dimension, different techniques are employed for best performance:
In general the smaller the prime factor is, the better the performance, i.e., powers of two are fastest.
Image FFT supports the following transform types:
Data layout depends strictly on the transform type. In case of general C2C transform, both input and output data shall be of type VPI_IMAGE_TYPE_2F32 and have the same size. In R2C mode, each input image row \((x_1,x_2,\dots,x_N)\) with type VPI_IMAGE_TYPE_F32 results in an output row \((X_1,X_2,\dots,X_{\lfloor\frac{N}{2}\rfloor+1})\) of non-redundant complex elements with type VPI_IMAGE_TYPE_2F32. In both cases, input and output image heights are the same.
FFT type | input | output | ||
---|---|---|---|---|
type | size | type | size | |
C2C | VPI_IMAGE_TYPE_2F32 | \(W \times H\) | VPI_IMAGE_TYPE_2F32 | \(W \times H\) |
R2C | VPI_IMAGE_TYPE_F32 | \(W \times H\) | VPI_IMAGE_TYPE_2F32 | \(\big( \lfloor\frac{W}{2}\rfloor+1 \big) \times H\) |
For real inputs, the output contains only the non-redundant elements, thus only the left half of spectrum returned. In order to retrieve the whole Hermitian (symmetric-conjugate) output, the right half must be completed so that the following conditions are met:
\begin{align*} F_{\textbf{re}}(u,v) &= F_{\textbf{re}}(-u,-v) \\ F_{\textbf{re}}(-u,v) &= F_{\textbf{re}}(u,-v) \\ F_{\textbf{im}}(u,v) &= -F_{\textbf{im}}(-u,-v) \\ F_{\textbf{im}}(-u,v) &= -F_{\textbf{im}}(u,-v) \end{align*}
The following diagram shows how this is accomplished:
The zero-th element in spectrum output represents the DC (0 Hz) component. Usually for visualization it's preferred that DC is at the center. One way to accomplish this is to shift the 4 quadrants around as shown in the diagram below:
| \(\rightarrow\) |
|
For more details, consult the API reference.
Constraints for specific backends supersede the ones specified for all backends.
For further information on how performance was benchmarked, see Performance Measurement.
size | type | CPU | CUDA | PVA |
---|---|---|---|---|
1920x1080 | R2C | 7.1 ms | 0.7956 ms | n/a |
1920x1080 | C2C | 14.43 ms | 1.2522 ms | n/a |
1024x1024 | R2C | 5.1 ms | 0.1925 ms | n/a |
1024x1024 | C2C | 6.5 ms | 0.4684 ms | n/a |
626x626 | R2C | 33.8 ms | 0.4271 ms | n/a |
626x626 | C2C | 66.8 ms | 0.7725 ms | n/a |
size | type | CPU | CUDA | PVA |
---|---|---|---|---|
1920x1080 | R2C | 18.32 ms | 2.478 ms | n/a |
1920x1080 | C2C | 44.4 ms | 4.21 ms | n/a |
1024x1024 | R2C | 11.84 ms | 0.927 ms | n/a |
1024x1024 | C2C | 39.0 ms | 1.50 ms | n/a |
626x626 | R2C | 72.6 ms | 1.388 ms | n/a |
626x626 | C2C | 148 ms | 2.90 ms | n/a |
size | type | CPU | CUDA | PVA |
---|---|---|---|---|
1920x1080 | R2C | 35.3 ms | 5.693 ms | n/a |
1920x1080 | C2C | 75.76 ms | 10.444 ms | n/a |
1024x1024 | R2C | 21.09 ms | 1.874 ms | n/a |
1024x1024 | C2C | 64.5 ms | 3.036 ms | n/a |
626x626 | R2C | 171.2 ms | 3.279 ms | n/a |
626x626 | C2C | 340.8 ms | 6.239 ms | n/a |