VPI - Vision Programming Interface

0.3.7 Release

Image FFT

Overview

Image FFT implements the Fourier Transform applied to 2D images, supporting real- and complex-valued inputs. It returns the image representation in the frequency domain. It is useful for content analysis based on signal frequencies, and enables image operations that are more efficiently performed on the frequency domain, among other uses.

Input in space domain Magnitude spectrum

Implementation

The FFT - Fast Fourier Transform is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets in \(O(n \log n)\). It is one of the most important and widely used numerical algorithms in computational physics and general signal processing.

The Discrete Fourier Transform (DFT) generally maps a complex-valued vector \(x_k\) on the space domain into its frequency domain representation given by:

\[ I'[u,v] = \sum^{M-1}_{m=0} \sum^{N-1}_{n=0} I[m,n] e^{-2\pi i (\frac{um}{M}+\frac{vn}{N})} \]

Where:

  • \(I\) is the input image in spatial domain
  • \(I'\) is its frequency domain representation
  • \(M\times N\) is input's dimensions

Note that the \(I'\) is left denormalized as the spectrum is eventually normalized in subsequent stages of the pipeline (if at all).

Depending on image dimension, different techniques are employed for best performance:

  • CPU backend
    • Fast paths when \(M\) or \(N\) can be factored into \(2^a \times 3^b \times 5^c\)
  • CUDA backend
    • Fast paths when \(M\) or \(N\) can be factored into \(2^a \times 3^b \times 5^c \times 7^d\)

In general the smaller the prime factor is, the better the performance, i.e., powers of two are fastest.

Image FFT supports the following transform types:

  • complex-to-complex, C2C
  • real-to-complex, R2C.

Data Layout

Data layout depends strictly on the transform type. In case of general C2C transform, both input and output data shall be of type VPI_IMAGE_TYPE_2F32 and have the same size. In R2C mode, each input image row \((x_1,x_2,\dots,x_N)\) with type VPI_IMAGE_TYPE_F32 results in an output row \((X_1,X_2,\dots,X_{\lfloor\frac{N}{2}\rfloor+1})\) of non-redundant complex elements with type VPI_IMAGE_TYPE_2F32. In both cases, input and output image heights are the same.

FFT typeinputoutput
typesizetypesize
C2CVPI_IMAGE_TYPE_2F32\(W \times H\)VPI_IMAGE_TYPE_2F32\(W \times H\)
R2CVPI_IMAGE_TYPE_F32\(W \times H\)VPI_IMAGE_TYPE_2F32\(\big( \lfloor\frac{W}{2}\rfloor+1 \big) \times H\)

For real inputs, the output contains only the non-redundant elements, thus only the left half of spectrum returned. In order to retrieve the whole Hermitian (symmetric-conjugate) output, the right half must be completed so that the following conditions are met:

\begin{align*} F_{\textbf{re}}(u,v) &= F_{\textbf{re}}(-u,-v) \\ F_{\textbf{re}}(-u,v) &= F_{\textbf{re}}(u,-v) \\ F_{\textbf{im}}(u,v) &= -F_{\textbf{im}}(-u,-v) \\ F_{\textbf{im}}(-u,v) &= -F_{\textbf{im}}(u,-v) \end{align*}

The following diagram shows how this is accomplished:

Completing the output spectrum

The zero-th element in spectrum output represents the DC (0 Hz) component. Usually for visualization it's preferred that DC is at the center. One way to accomplish this is to shift the 4 quadrants around as shown in the diagram below:

\(Q_0\)\(Q_1\)
\(Q_2\)\(Q_3\)

\(\rightarrow\)

\(Q_3\)\(Q_2\)
\(Q_1\)\(Q_0\)

Usage

  1. Initialization phase
    1. Include the header that defines the image FFT function and the needed image format conversion functionality.
    2. Define the stream on which the algorithm will be executed and the image input.
      VPIStream stream = /*...*/;
      VPIImage input = /*...*/;
    3. Create the output image that will hold the spectrum. Since input is real, only non-redundant values will be calculated and output width is \(\lfloor \frac{w}{2} \rfloor+1\).
      uint32_t w, h;
      vpiImageGetSize(input, &w, &h);
      VPIImage spectrum;
      vpiImageCreate(w / 2 + 1, h, VPI_IMAGE_TYPE_2F32, 0, &spectrum);
    4. Create the temporary image that stores the floating-point representation of the input image.
      VPIImage inputF32;
      vpiImageCreate(w, h, VPI_IMAGE_TYPE_F32, 0, &inputF32);
      float *tmpBuffer = (float *)malloc(w * 2 * h * sizeof(float));
    5. Create the ImageFFT payload. Input is real and output is complex. Problem size refers to input size.
  2. Processing phase
    1. Convert the input to floating-point, keeping the input range the same.
      vpiSubmitImageFormatConverter(stream, input, inputF32, VPI_CONVERSION_CAST, 1, 0);
    2. Submit the FFT algorithm to the stream, passing the floating-point input and the output buffer.
      vpiSubmitImageFFT(fft, inputF32, spectrum, 0);
    3. Wait until the processing is done.
      vpiStreamSync(stream);
    4. Now on to the result. First lock the spectrum data.
      vpiImageLock(spectrum, VPI_LOCK_READ, &data);
    5. Convert the complex frequency data into magnitude ready to get displayed.
      1. First allocate a new buffer that will store the complete Hermitian output and fill the missing values
        // make width/height even
        w = w & -2;
        h = h & -2;
        // center pixel coordinate
        int cx = w / 2;
        int cy = h / 2;
        for (int i = 0; i < (int)h; ++i)
        {
        for (int j = 0; j < (int)w; ++j)
        {
        float re, im;
        if (j < cx)
        {
        const float *pix =
        (const float *)((const char *)data.planes[0].data + i * data.planes[0].rowStride) + j * 2;
        re = pix[0];
        im = pix[1];
        }
        else
        {
        const float *pix =
        (const float *)((const char *)data.planes[0].data + ((h - i) % h) * data.planes[0].rowStride) +
        ((w - j) % w) * 2;
        // complex conjugate
        re = pix[0];
        im = -pix[1];
        }
        tmpBuffer[i * (w * 2) + j * 2] = re;
        tmpBuffer[i * (w * 2) + j * 2 + 1] = im;
        }
        }
      2. Convert the complex frequencies into normalized log-magnitude
        float min = FLT_MAX, max = -FLT_MAX;
        for (int i = 0; i < (int)h; ++i)
        {
        for (int j = 0; j < (int)w; ++j)
        {
        float re, im;
        re = tmpBuffer[i * (w * 2) + j * 2];
        im = tmpBuffer[i * (w * 2) + j * 2 + 1];
        float mag = logf(sqrtf(re * re + im * im) + 1);
        tmpBuffer[i * w + j] = mag;
        min = mag < min ? mag : min;
        max = mag > max ? mag : max;
        }
        }
        for (int i = 0; i < (int)h; ++i)
        {
        for (int j = 0; j < (int)w; ++j)
        {
        tmpBuffer[i * w + j] = (tmpBuffer[i * w + j] - min) / (max - min);
        }
        }
      3. Shift the spectrum so that DC is at center
        for (int i = 0; i < (int)h; ++i)
        {
        for (int j = 0; j < (int)cx; ++j)
        {
        float a = tmpBuffer[i * w + j];
        // quadrant 0?
        if (i < cy)
        {
        // swap it with quadrant 3
        tmpBuffer[i * w + j] = tmpBuffer[(i + cy) * w + (j + cx)];
        tmpBuffer[(i + cy) * w + (j + cx)] = a;
        }
        // quadrant 2?
        else
        {
        // swap it with quadrant 1
        tmpBuffer[i * w + j] = tmpBuffer[(i - cy) * w + (j + cx)];
        tmpBuffer[(i - cy) * w + (j + cx)] = a;
        }
        }
        }
    6. As VPI output image isn't needed anymore, just unlock it
      vpiImageUnlock(spectrum);
    7. Now display tmpBuffer using your preferred method.

For more details, consult the API reference.

Limitations and Constraints

Constraints for specific backends supersede the ones specified for all backends.

All Backends

CPU

  • If input has type VPI_IMAGE_TYPE_F32, its width must be even.
  • Input and output rows must be aligned to 4 bytes.

CUDA

  • There can be memory allocation in first call to vpiSubmitImageFFT and every time the input or output row stride changes with respect to previous call.

PVA

  • Not implemented.

Performance

For further information on how performance was benchmarked, see Performance Measurement.

Jetson AGX Xavier
sizetypeCPUCUDAPVA
1920x1080R2C 7.1 ms0.7956 msn/a
1920x1080C2C 14.43 ms1.2522 msn/a
1024x1024R2C 5.1 ms0.1925 msn/a
1024x1024C2C 6.5 ms0.4684 msn/a
626x626R2C 33.8 ms0.4271 msn/a
626x626C2C 66.8 ms0.7725 msn/a
Jetson TX2
sizetypeCPUCUDAPVA
1920x1080R2C 18.32 ms2.478 msn/a
1920x1080C2C 44.4 ms4.21 msn/a
1024x1024R2C 11.84 ms0.927 msn/a
1024x1024C2C 39.0 ms1.50 msn/a
626x626R2C 72.6 ms1.388 msn/a
626x626C2C 148 ms2.90 msn/a
Jetson Nano
sizetypeCPUCUDAPVA
1920x1080R2C 35.3 ms5.693 msn/a
1920x1080C2C 75.76 ms10.444 msn/a
1024x1024R2C 21.09 ms1.874 msn/a
1024x1024C2C 64.5 ms3.036 msn/a
626x626R2C 171.2 ms3.279 msn/a
626x626C2C 340.8 ms6.239 msn/a
VPI_IMAGE_TYPE_2F32
@ VPI_IMAGE_TYPE_2F32
2 interleaved channels of 32-bit floats.
Definition: Types.h:216
vpiSubmitImageFFT
VPIStatus vpiSubmitImageFFT(VPIPayload payload, VPIImage input, VPIImage output, uint32_t flags)
Runs FFT on single image.
VPI_CONVERSION_CAST
@ VPI_CONVERSION_CAST
Casts input to the output type.
Definition: Types.h:388
vpiCreateImageFFT
VPIStatus vpiCreateImageFFT(VPIStream stream, uint32_t inputWidth, uint32_t inputHeight, const VPIImageType inType, const VPIImageType outType, VPIPayload *payload)
Creates payload for vpiSubmitImageFFT.
VPI_LOCK_READ
@ VPI_LOCK_READ
Lock memory only for reading.
Definition: Types.h:499
VPIImagePlane::rowStride
uint32_t rowStride
Difference in bytes of beginning of one row and the beginning of the previous.
Definition: Image.h:138
vpiImageUnlock
VPIStatus vpiImageUnlock(VPIImage img)
Releases the lock on an image object.
vpiStreamSync
VPIStatus vpiStreamSync(VPIStream stream)
Blocks the calling thread until all submitted commands in this stream queue are done (queue is empty)...
VPIImageData
Stores information about image characteristics and content.
Definition: Image.h:155
VPIStream
struct VPIStreamImpl * VPIStream
A handle to a stream.
Definition: Types.h:177
VPIImageData::planes
VPIImagePlane planes[VPI_MAX_PLANE_COUNT]
Data of all image planes.
Definition: Image.h:162
ImageFormatConverter.h
VPIImage
struct VPIImageImpl * VPIImage
A handle to an image.
Definition: Types.h:183
vpiImageGetSize
VPIStatus vpiImageGetSize(VPIImage img, uint32_t *width, uint32_t *height)
Get the image size in pixels.
ImageFFT.h
vpiSubmitImageFormatConverter
VPIStatus vpiSubmitImageFormatConverter(VPIStream stream, VPIImage input, VPIImage output, VPIConversionPolicy convPolicy, float scale, float offset)
Converts the image contents to the desired format, with optional scaling and offset.
VPIPayload
struct VPIPayloadImpl * VPIPayload
A handle to an algorithm payload.
Definition: Types.h:195
vpiImageLock
VPIStatus vpiImageLock(VPIImage img, VPILockMode mode, VPIImageData *hostData)
Acquires the lock on an image object and returns a pointer to the image planes Depending on the inter...
vpiImageCreate
VPIStatus vpiImageCreate(uint32_t width, uint32_t height, VPIImageType type, uint32_t flags, VPIImage *img)
Create an empty image instance with the specified flags.
VPI_IMAGE_TYPE_F32
@ VPI_IMAGE_TYPE_F32
1 channel of 32-bit float.
Definition: Types.h:215
VPIImagePlane::data
void * data
Pointer to the first row of this plane.
Definition: Image.h:146