VPI - Vision Programming Interface

0.4.4 Release

Basic concepts

VPI is used to implement asynchronous compute pipelines suited for real-time image processing applications. Pipelines are composed of one or more asynchronous compute streams that run algorithms on buffers in the available compute backends. Synchronization between streams is done using events. All these objects are owned by a particular context.

These components are described below.

Streams

A VPI stream is an asynchronous queue that executes algorithms in sequence on a given backend device. To achieve a high degree of parallelism across backends, a processing pipeline can be configured with processing stages that run concurrently, each one in its own VPI stream. Streams can collaborate with each other by exchanging data buffers, with proper ordering guaranteed by the synchronization primitives provided by VPI.
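The minimal usage pattern is sketched below in C: create a stream, submit work to it, and wait for completion. The function names follow the VPI C API, but the exact signatures may differ in the 0.4.x release, so treat this as an illustration rather than a verbatim recipe.

    #include <vpi/Stream.h>

    int main(void)
    {
        VPIStream stream = NULL;
        vpiStreamCreate(0, &stream);   /* 0: no backend restriction in this sketch */

        /* ... algorithm submissions go here; each call returns immediately and the
           work is queued to run asynchronously, in submission order ... */

        vpiStreamSync(stream);         /* block the calling thread until the queue drains */
        vpiStreamDestroy(stream);
        return 0;
    }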

Backends

A backend represents the compute hardware that ultimately runs an algorithm. VPI supports the CPU, GPU (via CUDA), PVA (Programmable Vision Accelerator), and VIC (Video and Image Compositor) backends. Not all devices support all backends; the table below summarizes availability, and a short backend-selection sketch follows it.

Backend   Device/platform
CPU       All devices on x86 (Linux) and Jetson aarch64 platforms.
CUDA      All devices on x86 (Linux) with a Maxwell or later NVIDIA GPU, and Jetson aarch64 platforms.
PVA       All Jetson Xavier devices.
VIC       All Jetson devices.
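As an illustration, a stream can be restricted to a particular backend at creation time. The VPI_BACKEND_* flag names below follow the VPI C API and are an assumption for this release; whether a given backend is actually available still depends on the platform, per the table above.

    #include <vpi/Stream.h>

    /* Create one stream per backend; availability depends on the platform. */
    void createBackendStreams(VPIStream *cudaStream, VPIStream *cpuStream)
    {
        vpiStreamCreate(VPI_BACKEND_CUDA, cudaStream);  /* work runs on the GPU via CUDA */
        vpiStreamCreate(VPI_BACKEND_CPU,  cpuStream);   /* work runs on the CPU */
    }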

Algorithms

VPI supports several computer vision algorithms, such as stereo disparity estimation, Harris keypoint detection, and image blurring. Some algorithms need temporary buffers, collectively called a VPI payload, to perform their processing. A payload can be created once and then reused every time the algorithm is submitted to a stream. A payload is often created for a given input image size; in that case it cannot be reused for inputs of a different size.
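The create-once/submit-many pattern looks roughly like the sketch below, using the Harris detector as an example. The specific function, header, and parameter names are assumptions modeled on the VPI C API and may differ in this release.

    #include <vpi/Array.h>
    #include <vpi/Image.h>
    #include <vpi/Stream.h>
    #include <vpi/algo/HarrisCorners.h>   /* assumed header name */

    /* Detect corners on a sequence of same-sized frames, reusing one payload. */
    void processSequence(VPIStream stream, VPIImage *frames, int numFrames,
                         VPIArray corners, VPIArray scores, int width, int height)
    {
        VPIHarrisCornerDetectorParams params;
        vpiInitHarrisCornerDetectorParams(&params);   /* assumed default-init helper */

        VPIPayload harris = NULL;
        /* The payload is tied to the input size given at creation time. */
        vpiCreateHarrisCornerDetector(VPI_BACKEND_CUDA, width, height, &harris);

        for (int i = 0; i < numFrames; ++i)
        {
            /* Same payload reused on every submission; inputs must be width x height. */
            vpiSubmitHarrisCornerDetector(stream, VPI_BACKEND_CUDA, harris,
                                          frames[i], corners, scores, &params);
        }

        vpiStreamSync(stream);        /* ensure the work is done before destroying */
        vpiPayloadDestroy(harris);    /* temporary buffers are released here, once */
    }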

Data Buffers

VPI abstracts the data it operates on into buffers. It provides abstractions for 2D images, 1D arrays, and 2D image pyramids, all of which can be allocated and managed by VPI. Additionally, for images and arrays, VPI can wrap externally allocated memory so that algorithms use it directly. In both cases, to achieve high throughput, VPI attempts a shared, or zero-copy, mapping of the memory onto the target backend. If that is not possible, usually due to alignment issues or other memory characteristics, VPI seamlessly performs deep memory copies as needed.

2D Images

A 2D image is represented by a block of memory with a given width, height, and image format. The size and format are defined during construction and cannot be changed afterwards.
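For illustration, the sketch below allocates an 8-bit grayscale VGA image managed by VPI. The format enumerator name follows the VPI C API and may be spelled differently in 0.4.x.

    #include <vpi/Image.h>

    int main(void)
    {
        VPIImage image = NULL;
        vpiImageCreate(640, 480, VPI_IMAGE_FORMAT_U8, 0, &image);  /* 640x480, 8-bit grayscale */
        /* width, height and format are now fixed for the lifetime of 'image' */

        vpiImageDestroy(image);
        return 0;
    }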

1D Arrays

In VPI, a 1D array is a linear block of memory with a given capacity, in elements, and element format. Unlike images, an array's size can vary over time, as long as it does not exceed the allocated capacity.
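A minimal allocation sketch follows; the array type enumerator is an assumption based on the VPI C API and may differ in this release.

    #include <vpi/Array.h>

    int main(void)
    {
        VPIArray keypoints = NULL;
        /* Capacity of 8192 keypoint elements; the actual size may grow and shrink
           over time, but never beyond this capacity. */
        vpiArrayCreate(8192, VPI_ARRAY_TYPE_KEYPOINT, 0, &keypoints);

        vpiArrayDestroy(keypoints);
        return 0;
    }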

2D Image Pyramids

Conceptually, a pyramid is a collection of 2D images of the same format. Image pyramids are defined by the following attributes (a creation sketch follows the list):

  • The number of levels, from fine to coarse.
  • The width and height of the finest level.
  • The scale from one level to the next.
  • The image format.
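The sketch below creates a pyramid from those four attributes. The function name and parameter order follow the VPI C API and are assumptions for this release.

    #include <vpi/Pyramid.h>

    int main(void)
    {
        VPIPyramid pyramid = NULL;
        /* 4 levels, finest level 640x480, each level half the size of the previous one,
           all levels sharing the same 8-bit grayscale format. */
        vpiPyramidCreate(640, 480, VPI_IMAGE_FORMAT_U8, 4, 0.5f, 0, &pyramid);

        vpiPyramidDestroy(pyramid);
        return 0;
    }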

User-allocated Memory Wrapping

If the user application already owns a memory buffer that must serve as input and/or output to VPI algorithms, it can wrap that buffer into a VPIImage or VPIArray. This is the case, for example, when the main loop grabs a frame from a camera device and feeds it to a VPI processing pipeline. Depending on the frame characteristics, this memory block is used by VPI without any memory copies being made; this is the so-called zero-copy, shared memory mapping.

In other situations, such as for temporary buffers used between a sequence of algorithm invocations, the user can allocate the buffers using VPI. Buffers created this way are allocated in such a way that zero-copy shared mapping is more likely to happen, thereby increasing pipeline performance.
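A hedged sketch of the wrapping path is shown below. The VPIImageData layout and the wrapper function name are assumptions based on the VPI C API and may differ in this release.

    #include <string.h>
    #include <vpi/Image.h>

    /* Wrap an existing grayscale host buffer (e.g., a camera frame) as a VPIImage. */
    void wrapFrame(unsigned char *pixels, int width, int height, int pitchBytes,
                   VPIImage *outImage)
    {
        VPIImageData data;
        memset(&data, 0, sizeof(data));

        data.format               = VPI_IMAGE_FORMAT_U8;
        data.numPlanes            = 1;
        data.planes[0].width      = width;
        data.planes[0].height     = height;
        data.planes[0].pitchBytes = pitchBytes;
        data.planes[0].data       = pixels;

        /* If alignment and layout allow, VPI maps this memory directly (zero-copy);
           otherwise it transparently performs deep copies when algorithms need it. */
        vpiImageCreateHostMemWrapper(&data, 0, outImage);
    }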

Synchronization Primitives

There are several facilities to coordinate work among different streams and guarantee proper task execution ordering (a sketch follows the list):

  • Synchronize a given stream with the calling thread, making the latter wait until all work submitted to the stream so far has finished. The application can then inspect the results and/or forward them to another stage, such as visualization.
  • For more fine-grained coordination between streams, use VPIEvent to make one stream, or the calling thread, wait for a particular task to finish on one or more streams, effectively implementing a barrier synchronization mechanism.
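The event mechanism looks roughly like the sketch below, where one stream waits for work previously submitted to another without blocking the CPU. Event and stream function names follow the VPI C API and may differ slightly in 0.4.x.

    #include <vpi/Event.h>
    #include <vpi/Stream.h>

    /* Make streamB wait for work previously submitted to streamA. */
    void chainStreams(VPIStream streamA, VPIStream streamB)
    {
        VPIEvent ev = NULL;
        vpiEventCreate(0, &ev);

        /* ... submit producer algorithms to streamA ... */
        vpiEventRecord(ev, streamA);      /* marks this point in streamA's queue */

        vpiStreamWaitEvent(streamB, ev);  /* streamB pauses here; the calling thread does not */
        /* ... submit consumer algorithms to streamB; they run after the event fires ... */

        vpiStreamSync(streamB);           /* finally, make the calling thread wait for the results */
        vpiEventDestroy(ev);
    }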

VPI Applications

A typical VPI application goes through the following stages (a skeleton is sketched after the list):

  1. Initialization: memory is allocated; VPI objects such as streams, images, arrays, and contexts are created and set up; and any other one-time initialization tasks take place.
  2. Processing loop: where the application spends most of its time. External data is wrapped for use by VPI, algorithms (along with payloads created during initialization) are submitted to streams, and results are read back and passed to other stages for further processing or visualization.
  3. Cleanup: all objects created during initialization are destroyed.
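The skeleton below illustrates the three stages. The algorithm submissions and frame wrapping are left as placeholders; the object-management calls follow the VPI C API and their exact signatures may differ in the 0.4.x release.

    #include <vpi/Image.h>
    #include <vpi/Stream.h>

    int main(void)
    {
        /* 1. Initialization: create long-lived objects once. */
        VPIStream stream = NULL;
        vpiStreamCreate(0, &stream);

        VPIImage output = NULL;
        vpiImageCreate(640, 480, VPI_IMAGE_FORMAT_U8, 0, &output);

        /* 2. Processing loop: wrap external data, submit work, consume results. */
        for (int frame = 0; frame < 100; ++frame)
        {
            /* ... wrap the captured frame as a VPIImage (zero-copy when possible) ... */
            /* ... submit algorithms to 'stream', reusing payloads created above ... */

            vpiStreamSync(stream);   /* wait until this frame's work is done */
            /* ... read 'output' and hand it to visualization or the next stage ... */
        }

        /* 3. Cleanup: destroy everything created during initialization. */
        vpiImageDestroy(output);
        vpiStreamDestroy(stream);
        return 0;
    }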

Consult the tutorial to build your first VPI application.