VPI consists of asynchronous compute streams that run algorithms on buffers residing in the available compute backends. Streams are synchronized using events, and all of these objects are owned by a particular context.
The components are described in the following sections.
A backend represents the compute hardware that ultimately runs an algorithm. VPI supports the CPU, the GPU (using CUDA), and the PVA (Programmable Vision Accelerator) on devices that have it, such as Jetson AGX Xavier and Jetson Xavier NX.
A VPI stream is an asynchronous queue that executes algorithms in sequence on a given backend. To achieve a high level of parallelism across backends, a processing pipeline can be configured so that its stages run concurrently, each in its own VPI stream. Streams can collaborate with one another by exchanging data structures, with proper ordering guaranteed by the synchronization primitives VPI provides.
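As a sketch of this flow (assuming the VPI 1.x C API, where `vpiStreamCreate`, `vpiSubmitGaussianBlur`, and `vpiStreamSync` are available; error checking omitted), a stream dispatching an algorithm to the CUDA backend might look like:

```c
#include <vpi/Stream.h>
#include <vpi/Image.h>
#include <vpi/algo/GaussianBlur.h>

int main(void)
{
    VPIStream stream = NULL;
    VPIImage input = NULL, output = NULL;

    /* Create a stream that can dispatch work to the CUDA backend. */
    vpiStreamCreate(VPI_BACKEND_CUDA, &stream);

    /* Allocate the input/output images (640x480, 8-bit grayscale). */
    vpiImageCreate(640, 480, VPI_IMAGE_FORMAT_U8, 0, &input);
    vpiImageCreate(640, 480, VPI_IMAGE_FORMAT_U8, 0, &output);

    /* Submission is asynchronous: it queues the algorithm on the
     * stream and returns immediately to the caller. */
    vpiSubmitGaussianBlur(stream, VPI_BACKEND_CUDA, input, output,
                          5, 5, 1.0f, 1.0f, VPI_BORDER_ZERO);

    /* Block until all work queued on the stream has finished. */
    vpiStreamSync(stream);

    vpiImageDestroy(output);
    vpiImageDestroy(input);
    vpiStreamDestroy(stream);
    return 0;
}
```

This requires linking against the VPI SDK; the kernel size and sigma values are arbitrary illustration choices.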
VPI supports several computer vision algorithms, such as stereo disparity estimation, Harris keypoint detection, and image blurring. Some algorithms need temporary buffers, called a VPI payload, to perform the processing. A payload can be created once and then reused each time the algorithm is submitted to the same stream. Some payloads are created for a given input image size and cannot be reused with inputs of a different size.
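A hedged sketch of the payload lifecycle, assuming VPI 1.x's stereo disparity API (`vpiCreateStereoDisparityEstimator` / `vpiSubmitStereoDisparityEstimator`): the payload is created once for a fixed input size and then reused across submissions.

```c
#include <stddef.h>
#include <vpi/Stream.h>
#include <vpi/Image.h>
#include <vpi/algo/StereoDisparity.h>

int main(void)
{
    /* The stream and left/right/disparity images are assumed to have
     * been created beforehand (omitted for brevity). */
    VPIStream stream = NULL;
    VPIImage left = NULL, right = NULL, disparity = NULL;

    /* Create the payload once, tied to 480x270 16-bit inputs. */
    VPIStereoDisparityEstimatorCreationParams createParams;
    vpiInitStereoDisparityEstimatorCreationParams(&createParams);

    VPIPayload payload = NULL;
    vpiCreateStereoDisparityEstimator(VPI_BACKEND_CUDA, 480, 270,
                                      VPI_IMAGE_FORMAT_U16, &createParams,
                                      &payload);

    /* Reuse the payload on every frame; inputs must match the size
     * the payload was created for. */
    for (int frame = 0; frame < 100; ++frame)
    {
        vpiSubmitStereoDisparityEstimator(stream, VPI_BACKEND_CUDA, payload,
                                          left, right, disparity,
                                          NULL /*confidence*/,
                                          NULL /*default params*/);
        vpiStreamSync(stream);
    }

    vpiPayloadDestroy(payload);
    return 0;
}
```

Creating the payload outside the per-frame loop is the point: the temporary buffers it holds are allocated once instead of on every submission.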
VPI abstracts the data each algorithm works with into buffers. It provides abstractions for 2D images, 1D data arrays, and 2D image pyramids, all of which can be allocated and managed by VPI. Additionally, for images and arrays, VPI can wrap externally allocated memory for direct use by algorithms. In both cases, to achieve high throughput, VPI attempts a shared, or zero-copy, mapping of the memory to the target backend. If that fails, usually because of alignment issues or other memory characteristics, VPI seamlessly performs deep memory copies as needed.
A 2D image is represented by a block of memory with a given width, height, and image type. Once the image size and type are defined during construction, they cannot be changed. The image type is a value from the VPIImageType enum.
Not all algorithms support every image type, but each usually supports several.
The 2D image is laid out in memory row by row, one after the other. Each row can be larger than necessary, with padding added to the end so that every row starts at a properly aligned address.
In VPI, a 1D array is a linear block of memory with a given capacity, in elements, and type. Unlike an image, an array's size can vary over time, as long as it does not exceed the allocated capacity.
Array types are drawn from the VPIArrayType enum. Algorithms that require array inputs or outputs, such as the KLT template tracker, usually accept one specific type of array.
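A minimal sketch of array allocation, assuming the VPI 1.x `vpiArrayCreate` call and the KLT bounding-box array type:

```c
#include <vpi/Array.h>

int main(void)
{
    VPIArray boxes = NULL;

    /* Capacity of 128 elements: the array's size may grow and shrink
     * over time, but never beyond this capacity. */
    vpiArrayCreate(128, VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX, 0, &boxes);

    vpiArrayDestroy(boxes);
    return 0;
}
```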
Conceptually, a pyramid is a collection of 2D images of the same type, where each level is typically a downscaled version of the previous one. Image pyramids are defined by the number of levels, the scale factor between consecutive levels, and the dimensions and type of the base level.
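These attributes map directly onto pyramid creation. A sketch, assuming the VPI 1.x `vpiPyramidCreate` call:

```c
#include <vpi/Pyramid.h>

int main(void)
{
    VPIPyramid pyr = NULL;

    /* 4 levels, each half the size of the previous one (scale 0.5),
     * with a 640x480 8-bit grayscale base level. */
    vpiPyramidCreate(640, 480, VPI_IMAGE_FORMAT_U8, 4, 0.5f, 0, &pyr);

    vpiPyramidDestroy(pyr);
    return 0;
}
```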
If the user application has existing memory buffers that must serve as inputs and/or outputs to VPI algorithms, it can wrap them into a VPI image. This is the case, for example, when the main loop grabs a frame from a camera device and feeds it to a VPI processing pipeline. Depending on the frame's characteristics, VPI can use the memory block directly, without any copies being made; this is the so-called zero-copy shared memory mapping.
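A hedged sketch of wrapping, assuming the VPI 1.x API (`VPIImageData` describing the external buffer, then `vpiImageCreateHostMemWrapper`); VPI 2.x restructured this interface, so the field names below apply to 1.x only:

```c
#include <stdint.h>
#include <string.h>
#include <vpi/Image.h>

int main(void)
{
    /* Externally allocated frame, e.g. filled by a camera driver. */
    static uint8_t frame[480 * 640];

    /* Describe the existing memory to VPI: one tightly packed
     * 640x480 8-bit plane. */
    VPIImageData data;
    memset(&data, 0, sizeof(data));
    data.format               = VPI_IMAGE_FORMAT_U8;
    data.numPlanes            = 1;
    data.planes[0].width      = 640;
    data.planes[0].height     = 480;
    data.planes[0].pitchBytes = 640;
    data.planes[0].data       = frame;

    /* Wrap it; no copy is made when zero-copy mapping succeeds. */
    VPIImage image = NULL;
    vpiImageCreateHostMemWrapper(&data, 0, &image);

    /* Destroying the wrapper does not free the external buffer. */
    vpiImageDestroy(image);
    return 0;
}
```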
In other situations, such as for temporary buffers used in a sequence of algorithm invocations, the user can allocate the buffers through VPI. Buffers created this way are allocated in such a way that zero-copy shared mapping is more likely to succeed, thereby increasing pipeline performance.
VPI provides several facilities to coordinate work among different streams and guarantee proper task execution ordering, chiefly events, which one stream can record and other streams (or the host) can wait on.
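The typical cross-stream dependency can be sketched as follows (assuming the VPI 1.x event API: `vpiEventCreate`, `vpiEventRecord`, `vpiStreamWaitEvent`):

```c
#include <vpi/Event.h>
#include <vpi/Stream.h>

int main(void)
{
    VPIStream producer = NULL, consumer = NULL;
    VPIEvent done = NULL;

    vpiStreamCreate(0, &producer);
    vpiStreamCreate(0, &consumer);
    vpiEventCreate(0, &done);

    /* ... submit work to 'producer' here ... */

    /* Record the event after the producer's currently queued work. */
    vpiEventRecord(done, producer);

    /* Make the consumer wait for it before running anything submitted
     * afterwards. This queues the dependency; the host is not blocked. */
    vpiStreamWaitEvent(consumer, done);

    /* ... submit dependent work to 'consumer' here ... */

    vpiEventDestroy(done);
    vpiStreamDestroy(consumer);
    vpiStreamDestroy(producer);
    return 0;
}
```

This mirrors the CUDA stream/event model: ordering is expressed on the device queues, so the host thread stays free to submit more work.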
The application stages include initialization, the main processing loop, and cleanup.
Consult the tutorial to build your first VPI application.