VPI is used to implement asynchronous compute pipelines suited for real-time image processing applications. Pipelines are composed of one or more asynchronous compute streams that run algorithms on buffers in the available compute backends. Synchronization between streams is done using events. All these objects are owned by a particular context.
These components are described below.
A VPI stream is an asynchronous queue that executes algorithms in sequence on a given backend device. To achieve a high degree of parallelism across backends, a processing pipeline can be split into stages that run concurrently, each in its own VPI stream. Streams can collaborate with each other by exchanging data buffers, coordinated by the synchronization primitives that VPI provides.
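As a sketch of the stream workflow (assuming the VPI C API headers and the Gaussian filter algorithm; error checking omitted for brevity, and the code requires the NVIDIA VPI SDK to build): a stream is created, work is submitted to it asynchronously, and the host synchronizes before consuming results.

```c
#include <vpi/Image.h>
#include <vpi/Stream.h>
#include <vpi/algo/GaussianFilter.h>

int main(void)
{
    VPIStream stream;
    vpiStreamCreate(0, &stream); /* flags=0: stream may dispatch to any backend */

    VPIImage input, output;
    vpiImageCreate(640, 480, VPI_IMAGE_FORMAT_U8, 0, &input);
    vpiImageCreate(640, 480, VPI_IMAGE_FORMAT_U8, 0, &output);

    /* Submission is asynchronous: the call queues the algorithm
       on the stream and returns immediately. */
    vpiSubmitGaussianFilter(stream, VPI_BACKEND_CUDA, input, output,
                            5, 5, 1.0f, 1.0f, VPI_BORDER_ZERO);

    /* Block the host until all work queued on the stream finishes. */
    vpiStreamSync(stream);

    vpiImageDestroy(output);
    vpiImageDestroy(input);
    vpiStreamDestroy(stream);
    return 0;
}
```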
A backend represents the compute hardware that ultimately runs an algorithm. VPI supports the CPU, GPU (via CUDA), PVA (Programmable Vision Accelerator), and VIC (Video and Image Compositor) backends. Not all devices support all backends.
Backend | Device/platform
---|---
CPU | All devices on x86 (Linux) and Jetson aarch64 platforms.
CUDA | All devices on x86 (Linux) with a Maxwell or later NVIDIA GPU, and Jetson aarch64 platforms.
PVA | All Jetson Xavier devices.
VIC | All Jetson devices.
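In the API, backends appear as flags (a sketch, assuming VPI's backend flag constants): they can restrict which backends a stream may use at creation time, and each algorithm submission names the backend it runs on.

```c
#include <vpi/Stream.h>

int main(void)
{
    /* Restrict this stream to the CUDA and VIC backends only;
       submissions to other backends will be rejected. */
    VPIStream stream;
    vpiStreamCreate(VPI_BACKEND_CUDA | VPI_BACKEND_VIC, &stream);

    /* Each submission then selects one of the enabled backends,
       e.g. vpiSubmitRescale(stream, VPI_BACKEND_VIC, ...). */

    vpiStreamDestroy(stream);
    return 0;
}
```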
VPI supports several computer vision algorithms, such as stereo disparity estimation, Harris corner detection, and image blurring. Some algorithms need temporary buffers, called a VPI payload, to perform the processing. A payload can be created once and then reused each time the algorithm is submitted to a stream. Some payloads are created for a given input image size; such a payload cannot be reused with inputs of a different size.
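The create-once, submit-many pattern can be sketched with the Harris corner detector (a sketch against the VPI 1.x C API; error checking and most cleanup omitted):

```c
#include <vpi/Array.h>
#include <vpi/Image.h>
#include <vpi/Stream.h>
#include <vpi/algo/HarrisCorners.h>

int main(void)
{
    VPIStream stream;
    vpiStreamCreate(0, &stream);

    VPIImage frame;
    vpiImageCreate(640, 480, VPI_IMAGE_FORMAT_U8, 0, &frame);

    VPIArray corners, scores;
    vpiArrayCreate(8192, VPI_ARRAY_TYPE_KEYPOINT, 0, &corners);
    vpiArrayCreate(8192, VPI_ARRAY_TYPE_U32, 0, &scores);

    /* The payload holds the detector's temporary buffers. It is tied
       to the 640x480 input size given here and cannot be reused for
       inputs of a different size. */
    VPIPayload harris;
    vpiCreateHarrisCornerDetector(VPI_BACKEND_CUDA, 640, 480, &harris);

    VPIHarrisCornerDetectorParams params;
    vpiInitHarrisCornerDetectorParams(&params);

    /* Reuse the same payload for every frame of the same size. */
    for (int i = 0; i < 10; ++i)
    {
        vpiSubmitHarrisCornerDetector(stream, VPI_BACKEND_CUDA, harris,
                                      frame, corners, scores, &params);
    }
    vpiStreamSync(stream);

    vpiPayloadDestroy(harris);
    /* ... destroy arrays, image, and stream ... */
    return 0;
}
```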
VPI abstracts the data that algorithms work with into buffers. It provides abstractions for 2D images, 1D arrays, and 2D image pyramids, all of which can be allocated and managed by VPI. Additionally, for images and arrays, VPI can wrap externally allocated memory so that algorithms use it directly. In both cases, to achieve high throughput, VPI attempts a shared, or zero-copy, mapping of the memory to the target backend. If that fails, usually due to alignment issues or other memory characteristics, VPI seamlessly performs deep memory copies as needed.
A 2D image is represented by a block of memory with a given width, height, and image format. The size and format are fixed at construction and cannot be changed afterwards.
In VPI, a 1D array is a linear block of memory with a given capacity (in elements) and element format. Unlike an image, an array's size can vary over time, as long as it never exceeds the allocated capacity.
Conceptually, a pyramid is a collection of 2D images of the same format. An image pyramid is defined by its number of levels, the scale factor between consecutive levels, and the dimensions and format of its base level.
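Creating each buffer type can be sketched as follows (a sketch against the VPI 1.x C API; in later releases some array type names differ, e.g. `VPI_ARRAY_TYPE_KEYPOINT` became `VPI_ARRAY_TYPE_KEYPOINT_F32`):

```c
#include <vpi/Array.h>
#include <vpi/Image.h>
#include <vpi/Pyramid.h>

int main(void)
{
    /* 2D image: width, height, and format are fixed at construction. */
    VPIImage image;
    vpiImageCreate(640, 480, VPI_IMAGE_FORMAT_U8, 0, &image);

    /* 1D array: capacity of 8192 keypoint elements; its size may grow
       over time up to that capacity but never beyond it. */
    VPIArray array;
    vpiArrayCreate(8192, VPI_ARRAY_TYPE_KEYPOINT, 0, &array);

    /* Pyramid: 4 levels, each half the size of the previous one. */
    VPIPyramid pyramid;
    vpiPyramidCreate(640, 480, VPI_IMAGE_FORMAT_U8, 4, 0.5f, 0, &pyramid);

    vpiPyramidDestroy(pyramid);
    vpiArrayDestroy(array);
    vpiImageDestroy(image);
    return 0;
}
```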
If the user application has an existing memory buffer that must serve as input and/or output to VPI algorithms, it can wrap the buffer into a VPIImage or VPIArray. This is the case, for example, when the main loop grabs a frame from a camera device and feeds it to a VPI processing pipeline. Depending on the frame characteristics, VPI uses this memory block without making any copies; the so-called zero-copy shared memory mapping.
In other situations, such as for temporary buffers used across a sequence of algorithm invocations, the user can allocate the buffers through VPI. Buffers created this way are allocated in such a way that a zero-copy shared mapping is more likely, thereby increasing pipeline performance.
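Wrapping a host buffer can be sketched as follows (assuming the VPI 1.x `vpiImageCreateHostMemWrapper` call and the `VPIImageData` layout of that release; later releases use `vpiImageCreateWrapper` instead):

```c
#include <string.h>
#include <vpi/Image.h>

int main(void)
{
    /* Stand-in for a grayscale frame grabbed from a camera. */
    static unsigned char frame[480][640];

    /* Describe the externally allocated memory to VPI. */
    VPIImageData data;
    memset(&data, 0, sizeof(data));
    data.format               = VPI_IMAGE_FORMAT_U8;
    data.numPlanes            = 1;
    data.planes[0].width      = 640;
    data.planes[0].height     = 480;
    data.planes[0].pitchBytes = 640;
    data.planes[0].data       = frame;

    /* Wrap it; when possible VPI maps it zero-copy into the backends. */
    VPIImage image;
    vpiImageCreateHostMemWrapper(&data, 0, &image);

    /* ... submit algorithms using 'image' as input and/or output ... */

    vpiImageDestroy(image); /* destroys the wrapper, not the frame */
    return 0;
}
```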
There are several facilities to coordinate work among different streams and guarantee proper task execution ordering: an event can be recorded on one stream and waited upon by another, and a stream can be synchronized with the host, blocking the calling thread until all work queued on it completes.
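Cross-stream ordering with events can be sketched as follows (assuming VPI's `VPIEvent` API; error checking omitted):

```c
#include <vpi/Event.h>
#include <vpi/Stream.h>

int main(void)
{
    VPIStream producer, consumer;
    vpiStreamCreate(0, &producer);
    vpiStreamCreate(0, &consumer);

    VPIEvent done;
    vpiEventCreate(0, &done);

    /* ... submit work to 'producer' that fills some buffer ... */

    /* Mark the point in 'producer' that the consumer must wait for. */
    vpiEventRecord(done, producer);

    /* Queue a wait on 'consumer': work submitted to it afterwards only
       starts once 'producer' reaches the recorded point. */
    vpiStreamWaitEvent(consumer, done);

    /* ... submit work to 'consumer' that reads the buffer ... */

    /* Optionally block the host until the event is signaled. */
    vpiEventSync(done);

    vpiEventDestroy(done);
    vpiStreamDestroy(consumer);
    vpiStreamDestroy(producer);
    return 0;
}
```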
The stages of a typical VPI application include initialization, the main processing loop, and cleanup.
Consult the tutorial to build your first VPI application.