VPI - Vision Programming Interface

3.0 Release

Dense Optical Flow


The Dense Optical Flow algorithm estimates the motion vectors in every 4x4 pixel block between the previous and current frames. Its uses include motion detection and object tracking.

The output below represents each vector in the HSV color space, where the hue is related to the motion direction, and the value is proportional to the speed.

Input / Output motion vectors
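The color mapping described above can be sketched for a single vector as follows. This is only an illustration: the function name `motion_to_rgb`, the `max_speed` normalization, and the fixed saturation of 1 are assumptions, not the exact mapping used to render the sample output.

```python
import colorsys
import math


def motion_to_rgb(dx, dy, max_speed):
    """Map one motion vector (in pixels) to an RGB triple with components in [0, 1]."""
    # Hue encodes direction: the vector angle is wrapped into one full turn
    # and scaled to the [0, 1] hue range used by colorsys.
    angle = math.atan2(dy, dx)
    hue = (angle % (2 * math.pi)) / (2 * math.pi)

    # Value encodes speed: the vector length relative to max_speed, clamped to 1.
    speed = math.hypot(dx, dy)
    value = min(speed / max_speed, 1.0) if max_speed > 0 else 0.0

    # Saturation is fixed at 1 (an assumption made for this sketch).
    return colorsys.hsv_to_rgb(hue, 1.0, value)
```

With this mapping a static block comes out black, and faster motion comes out brighter until it reaches `max_speed`.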


The algorithm analyzes the content of two images, previous and current, and writes an estimate of the motion to an output image.

As shown below, the algorithm splits input images into 4x4 pixel blocks. Then, for each block, it estimates the content translation from the previous to the current frame, and writes the estimate as a motion vector to the corresponding pixel in the output image.

Dense Optical Flow Estimation
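The relationship between input pixels and output vectors can be sketched with integer arithmetic. The helper names `mv_image_size` and `block_of_pixel` are hypothetical; the rounding-up of partial blocks follows the output size calculation used in the C example on this page.

```python
BLOCK = 4  # block size in pixels, as described above


def mv_image_size(width, height, block=BLOCK):
    """Return (width, height) of the motion vector image for a given input size."""
    # Round up so that partial blocks at the right and bottom edges
    # still get one output vector each.
    return ((width + block - 1) // block, (height + block - 1) // block)


def block_of_pixel(x, y, block=BLOCK):
    """Return the output coordinate whose vector covers input pixel (x, y)."""
    return (x // block, y // block)
```

For example, a 1920x1080 input would yield a 480x270 motion vector image under these assumptions.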

The 2D motion vector is represented as a X,Y coordinate pair, with each coordinate in S10.5 signed fixed-point format, as shown below:

S10.5 signed fixed-point format

Conversion between S10.5 format and floating point format is done as follows:

\begin{align*} S_{10.5} &= \lfloor F \times 32 \rfloor \\ F &= S_{10.5} / 32 \end{align*}
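A minimal sketch of the conversion, assuming the reverse direction is a plain division by 32 that keeps the fractional bits. The names `float_to_s10_5` and `s10_5_to_float` are hypothetical helpers, not part of the VPI API.

```python
import math

FRACTION_BITS = 5
SCALE = 1 << FRACTION_BITS  # 2**5 = 32 steps per pixel


def float_to_s10_5(value):
    """Convert a displacement in pixels to its S10.5 integer representation."""
    # Floor (not truncate), so negative values round toward minus infinity.
    return math.floor(value * SCALE)


def s10_5_to_float(raw):
    """Convert an S10.5 integer back to a displacement in pixels."""
    return raw / SCALE
```

The resolution is 1/32 of a pixel, so a displacement smaller than that, such as 0.01, is stored as 0.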

C API functions

For a list of limitations, constraints, and backends that implement the algorithm, consult the reference documentation of the following functions:

Function Description
vpiCreateOpticalFlowDense Creates payload for vpiSubmitOpticalFlowDense.
vpiSubmitOpticalFlowDense Runs dense Optical Flow on two frames, outputting motion vectors.
vpiOpticalFlowDenseSetSGMParams Sets the semi-global matching parameters to be used by the Dense Optical Flow operations with the given payload.
vpiOpticalFlowDenseGetSGMParams Retrieves the semi-global matching parameters set up in the Dense Optical Flow payload.


  1. Import VPI module
    import vpi
  2. Fetch the first frame.
    prevImage = inVideo.read()[1]
  3. Fetch the next frame.
    while inVideo.read(curImage)[0]:
  4. Execute the algorithm using OFA backend, passing to it the previous and the current frame.
    with vpi.Backend.OFA:
        motion = vpi.optflow_dense(prevImage, curImage)
  5. Prepare for next iteration by assigning current frame to previous frame.
    prevImage = curImage
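The previous/current bookkeeping in steps 2 to 5 can be isolated in a small generator. This is a sketch independent of VPI: `flow_over_frames` is a hypothetical helper, and `estimate` stands in for whatever call computes the motion between two frames.

```python
def flow_over_frames(frames, estimate):
    """Yield estimate(prev, cur) for every pair of consecutive frames."""
    frames = iter(frames)
    try:
        prev = next(frames)  # fetch the first frame
    except StopIteration:
        return  # empty input: nothing to estimate

    for cur in frames:  # fetch the next frame
        yield estimate(prev, cur)  # run the estimator on the pair
        prev = cur  # the current frame becomes the previous one
```

A sequence of N frames produces N - 1 results, since the first frame has no predecessor.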
  1. Initialization phase:
    1. Include the header that defines the needed functions and types:
      #include <vpi/algo/OpticalFlowDense.h>
    2. Create the stream where the algorithm will be submitted for execution:
      VPIStream stream;
      vpiStreamCreate(0, &stream);
    3. Define the motion vector image with block-linear memory layout. The motion vector is in the form [x, y], representing the estimated translation, with both coordinates in S10.5 format. The output dimensions are calculated taking into account that one 4x4 input pixel block corresponds to one output vector:
      VPIImage mvImage;
      int32_t mvWidth = (width + 3) / 4;
      int32_t mvHeight = (height + 3) / 4;
      vpiImageCreate(mvWidth, mvHeight, VPI_IMAGE_FORMAT_2S16_BL, 0, &mvImage);
    4. Create the payload to contain temporary buffers. The payload is configured for processing by the OFA backend:
      VPIPayload optflow;
      int32_t gridSize = 4;
      vpiCreateOpticalFlowDense(VPI_BACKEND_OFA, width, height, imgFmtBL, &gridSize, 1, quality, &optflow);
    5. Fetch first frame:
      VPIImage prevImage = /* previous frame */;
  2. Processing phase:
    1. Start the processing loop at the second frame:
      for (int idframe = 1; idframe < frame_count; ++idframe)
    2. Fetch the current frame:
      VPIImage curImage = /* current frame */;
    3. Submit the algorithm. It feeds both the previous and current images to the OFA (Optical Flow Accelerator) engine and generates a motion vector for each 4x4 pixel block:
      vpiSubmitOpticalFlowDense(stream, VPI_BACKEND_OFA, optflow, prevImage, curImage, mvImage);
    4. (optional) If there are no more tasks to be submitted to the stream, wait until the stream finishes processing. Once the sync is done, you can use the output motion vectors calculated in this iteration:
      vpiStreamSync(stream);
    5. Swap the previous and current images so that the current image becomes the next iteration's previous image:
      VPIImage tmpImg = prevImage;
      prevImage = curImage;
      curImage = tmpImg;
  3. Cleanup phase:
    1. Free resources held by the stream, the payload, and the input and output images:
      vpiStreamDestroy(stream);
      vpiPayloadDestroy(optflow);
      vpiImageDestroy(prevImage);
      vpiImageDestroy(curImage);
      vpiImageDestroy(mvImage);

Consult the Dense Optical Flow sample for a complete example.

For more information, see Dense Optical Flow in the "C API Reference" section of VPI - Vision Programming Interface.


For information on how to use the performance table below, see Algorithm Performance Tables.
Before comparing measurements, consult Comparing Algorithm Elapsed Times.
For further information on how performance was benchmarked, see Performance Benchmark.