VPI is used to implement asynchronous computing pipelines suited for real-time image processing applications. Pipelines are composed of one or more asynchronous compute streams that run algorithms on buffers in the available compute backends. Synchronization between streams is done using events.
All of these elements are described below.
A VPIStream is an asynchronous queue that executes algorithms in sequence on a given backend device. To achieve a high degree of parallelism across backends, a processing pipeline can be configured with several stages running concurrently, each in its own VPIStream. Streams can collaborate with one another by exchanging data structures, using the synchronization primitives that VPI provides.
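The basic stream lifecycle can be sketched as follows. This is a minimal fragment, not a full program; it assumes VPI's C headers are available and omits error handling of the `VPIStatus` return codes:

```c
#include <vpi/Stream.h>

void example(void)
{
    /* Create a stream that can dispatch work to the CUDA backend. */
    VPIStream stream = NULL;
    vpiStreamCreate(VPI_BACKEND_CUDA, &stream);

    /* ... vpiSubmit* calls enqueue algorithms on the stream here ... */

    vpiStreamSync(stream);    /* block until all queued work finishes */
    vpiStreamDestroy(stream); /* release the stream */
}
```

Submissions return as soon as the work is queued; the stream executes them asynchronously, which is what lets multiple streams run concurrently on different backends.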
A backend comprises the compute hardware that ultimately runs an algorithm. VPI supports the backends CPU, GPU (using CUDA), PVA (Programmable Vision Accelerator), VIC (Video and Image Compositor) and OFA (Optical Flow Accelerator).
See the following table for information about which backends are supported on which devices.
Backend | Device/platform
---|---
CPU | All devices on x86 (Linux) and Jetson aarch64 platforms
CUDA | All devices on x86 (Linux) with a Maxwell or later NVIDIA GPU, and Jetson aarch64 platforms
PVA | All Jetson AGX Orin and Jetson Orin NX devices
VIC | All Jetson devices
OFA | All Jetson Orin devices
VPI supports computer vision algorithms for several purposes, such as stereo disparity estimation, Harris keypoint detection, and image blurring. Some algorithms use temporary buffers, called payloads (VPIPayload), to perform their processing. A payload can be created once and then reused each time the algorithm is submitted to a stream. Some payloads are created for a given input image size; such a payload cannot be reused for inputs of a different size.
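Payload reuse might look like the sketch below, using the stereo disparity estimator as an example. The dimensions, format, and parameter values are illustrative assumptions; consult the algorithm's reference for the exact creation parameters:

```c
#include <vpi/algo/StereoDisparity.h>

void example(void)
{
    /* Create the payload once, sized for a fixed 480x270 input.
     * NULL requests default creation parameters (an assumption;
     * check VPIStereoDisparityEstimatorCreationParams). */
    VPIPayload payload = NULL;
    vpiCreateStereoDisparityEstimator(VPI_BACKEND_CUDA, 480, 270,
                                      VPI_IMAGE_FORMAT_U16, NULL, &payload);

    /* Per frame, reuse the same payload:
     * vpiSubmitStereoDisparityEstimator(stream, VPI_BACKEND_CUDA, payload,
     *                                   left, right, disparity, NULL, NULL);
     * Inputs of a different size would require a new payload. */

    vpiPayloadDestroy(payload);
}
```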
VPI encapsulates data into buffers for each algorithm that it works with. VPI provides abstractions for 2D images, 1D data arrays, and 2D image pyramids. VPI can allocate and manage these abstractions. Additionally, for images and arrays, VPI can wrap externally allocated memory to be used directly by algorithms. In both cases, VPI attempts to achieve high throughput by means of zero-copy (shared) memory mapping to the target backend. If VPI cannot use zero-copy memory mapping, usually due to alignment issues or other memory characteristics, it seamlessly performs deep memory copies as needed.
VPI represents a 2D image as a single block of memory with a specified width, height, and image format. Once an image's size and format are defined during construction, they cannot be changed.
VPI 1D arrays are basically linear blocks of memory of a given type and capacity. Capacity is measured in units determined by the array's type. Unlike an image, an array's size can vary over time, although it must not exceed the array's capacity.
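Creating VPI-managed images and arrays can be sketched as below. The dimensions, capacity, and array type are illustrative assumptions:

```c
#include <vpi/Image.h>
#include <vpi/Array.h>

void example(void)
{
    /* A 640x480 single-channel 8-bit image; size and format are fixed
     * at creation. Flags 0 lets VPI choose suitable backends. */
    VPIImage image = NULL;
    vpiImageCreate(640, 480, VPI_IMAGE_FORMAT_U8, 0, &image);

    /* An array with capacity for 8192 keypoints; its size may vary
     * over time but must never exceed that capacity. */
    VPIArray keypoints = NULL;
    vpiArrayCreate(8192, VPI_ARRAY_TYPE_KEYPOINT_F32, 0, &keypoints);

    /* ... pass the buffers to algorithm submissions ... */

    vpiArrayDestroy(keypoints);
    vpiImageDestroy(image);
}
```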
A pyramid is a collection of 2D images with the same format. An image pyramid is defined by its number of levels, the scale factor between consecutive levels, and the dimensions and format of its base level.
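A pyramid might be created as in this sketch, assuming a 4-level configuration where each level is half the size of the previous one:

```c
#include <vpi/Pyramid.h>

void example(void)
{
    /* 640x480 base level, 4 levels, scale factor 0.5 between levels. */
    VPIPyramid pyr = NULL;
    vpiPyramidCreate(640, 480, VPI_IMAGE_FORMAT_U8, 4, 0.5f, 0, &pyr);

    /* ... use in pyramid-based algorithms ... */

    vpiPyramidDestroy(pyr);
}
```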
If your application has existing memory buffers that must serve as inputs and/or outputs to VPI algorithms, you can wrap them into a VPIImage or a VPIArray. This is the case, for example, when the main loop grabs a frame from a camera device and feeds it to a VPI processing pipeline. Depending on the frame's characteristics, VPI uses the memory block directly, without making any copies; this is another instance of zero-copy memory mapping.
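Wrapping an existing buffer can be sketched as follows. The details of filling VPIImageData depend on the VPI version (VPI 2.x uses `vpiImageCreateWrapper`; VPI 1.x used `vpiImageCreateHostMemWrapper`), so treat this as an outline and check the Image.h reference for the exact fields:

```c
#include <vpi/Image.h>

void example(void)
{
    /* Describe the externally allocated buffer: its image format,
     * plane dimensions, row pitch, and data pointers. */
    VPIImageData data;
    /* ... fill 'data' with the description of the existing frame ... */

    /* Create the wrapper; when the buffer's alignment and layout allow
     * it, VPI maps the memory zero-copy instead of deep-copying. */
    VPIImage wrapped = NULL;
    vpiImageCreateWrapper(&data, NULL, 0, &wrapped);

    /* ... submit algorithms using 'wrapped' ... */

    vpiImageDestroy(wrapped); /* releases the wrapper, not the buffer */
}
```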
Avoid user-allocated, wrapped memory when possible, especially for temporary buffers used in a sequence of algorithm invocations. Instead, use VPI-allocated buffers: they are allocated in such a way that zero-copy mapping is more likely to happen, which improves pipeline performance.
VPI offers several ways to coordinate work among different streams and to ensure that tasks execute in the proper order.
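Events are the main cross-stream primitive: one stream records an event, and another waits on it. A minimal sketch, assuming two already-created streams and omitting error handling:

```c
#include <vpi/Event.h>
#include <vpi/Stream.h>

void example(VPIStream producer, VPIStream consumer)
{
    VPIEvent ev = NULL;
    vpiEventCreate(0, &ev);

    /* ... submit producer work to 'producer' ... */
    vpiEventRecord(ev, producer);      /* ev signals when the producer
                                        * reaches this point in its queue */
    vpiStreamWaitEvent(consumer, ev);  /* work submitted to 'consumer'
                                        * after this waits for ev */
    /* ... submit consumer work that depends on the producer's output ... */

    vpiEventDestroy(ev);
}
```

Because the wait is enqueued on the stream rather than executed by the calling thread, the host does not block; only the consumer stream's work is held back.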
VPI applications consist of three major stages: initialization, where streams, buffers, and payloads are created; a main processing loop, where algorithms are submitted and buffers are exchanged; and cleanup, where all resources are destroyed.
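A minimal skeleton covering initialization, the main processing loop, and cleanup might look like this. The image size, kernel parameters, and use of Gaussian blur are illustrative assumptions, and error handling is omitted:

```c
#include <vpi/Stream.h>
#include <vpi/Image.h>
#include <vpi/algo/GaussianBlur.h>

int main(void)
{
    /* 1. Initialization: create streams and buffers. */
    VPIStream stream = NULL;
    VPIImage in = NULL, out = NULL;
    vpiStreamCreate(VPI_BACKEND_CUDA, &stream);
    vpiImageCreate(640, 480, VPI_IMAGE_FORMAT_U8, 0, &in);
    vpiImageCreate(640, 480, VPI_IMAGE_FORMAT_U8, 0, &out);

    /* 2. Main loop: submit work, then synchronize before using results. */
    vpiSubmitGaussianBlur(stream, VPI_BACKEND_CUDA, in, out,
                          5, 5, 1.0f, 1.0f, VPI_BORDER_ZERO);
    vpiStreamSync(stream);

    /* 3. Cleanup: destroy all resources. */
    vpiImageDestroy(out);
    vpiImageDestroy(in);
    vpiStreamDestroy(stream);
    return 0;
}
```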
Go to the tutorial to learn how to build your first VPI application.