This is the second public release of VPI. As this is a developer preview, it's intended to let users experiment with the library, access PVA hardware where available, and perform integration testing in existing systems.
Until VPI-1.0 is released, API and ABI backward compatibility across new versions cannot be fully guaranteed, although it is not expected to be broken.
As with any developer preview release, use of VPI-0.2.0 in critical systems isn't recommended.
Changes Since v0.1.0
New Features
API Updates
- Added a _PRECISE suffix to the existing interpolation types that don't carry the _FAST suffix. The unsuffixed types are now aliases of the corresponding _FAST variants. See VPIInterpolationType.
- Deprecated the VPI_(IMAGE|ARRAY|PYRAMID|EVENT|CONTEXT)_(NO|ONLY)_* identifiers. They were replaced by VPI_BACKEND_NO_* and VPI_BACKEND_ONLY_*, which are now accepted as flags during object creation. This brings uniformity to specifying the same set of flags (with the same behavior) across the different object types.
- Deprecated several VPIImageType and VPIPixelType enumerations, replacing them with shorter, more expressive names. For instance, VPI_IMAGE_TYPE_1C_F32 was renamed to VPI_IMAGE_TYPE_F32. See the sketch below this list for an example using the new identifiers.
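As a hedged illustration of the renamed identifiers (the vpiImageCreate parameter order and header path are assumptions based on the public VPI C API, not taken from this release's documentation), creating an image restricted to the CUDA backend might look like this:

```c
/* Hedged sketch: header path and vpiImageCreate() parameter order are
 * assumed and may differ slightly in this release. */
#include <vpi/Image.h>

/* Create a 640x480 single-channel F32 image usable only by the CUDA backend.
 * VPI_IMAGE_TYPE_F32 replaces the deprecated VPI_IMAGE_TYPE_1C_F32, and
 * VPI_BACKEND_ONLY_CUDA replaces the deprecated per-object
 * VPI_IMAGE_ONLY_* identifier. */
static VPIStatus CreateCudaOnlyImage(VPIImage *image)
{
    return vpiImageCreate(640, 480, VPI_IMAGE_TYPE_F32,
                          VPI_BACKEND_ONLY_CUDA, image);
}
```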
Non-Breaking Changes
- Each of the CPU backend's worker threads now has affinity to one logical CPU core, minimizing performance hits due to thread migration between cores.
- The same VPI binary works on both Ubuntu 16.04 and 18.04.
- The Harris Keypoint Detector's output score range was updated in both the CUDA and CPU backends.
- Several algorithms were heavily optimized across all backends. Please refer to Performance Improvements for more details.
- vpiStreamWaitFor was optimized for the case where a CUDA stream waits on a CUDA event.
- The CUDA backend's internal VPIStream queue handling was optimized. For all backends to behave the same in the presence of vpiSubmitUserFunction calls, the callback passed on the CUDA backend must return VPI_SUCCESS; if it returns an error, subsequent tasks queued onto the stream are not guaranteed to be cancelled. See the sketch after this list.
- vpiArraySetSize and vpiArrayGetSize can be called even if the array is unlocked.
- Images, arrays and pyramids can all be locked recursively by the same thread. Each lock call must be matched by an unlock call.
- Maximum number of created VPIStream handles increased from 8 to 16.
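Below is a minimal, hedged sketch of the vpiSubmitUserFunction requirement noted above; the VPIUserFunction prototype and the vpiSubmitUserFunction parameter list are assumptions based on the public VPI C API, not statements of this release's exact signatures.

```c
/* Hedged sketch: callback prototype and vpiSubmitUserFunction() parameter
 * order are assumed; consult this release's headers for the exact forms. */
#include <vpi/Status.h>
#include <vpi/Stream.h>

/* Host-side callback. On the CUDA backend it must return VPI_SUCCESS;
 * returning an error does not guarantee that later tasks queued onto the
 * stream are cancelled. */
static VPIStatus MyHostTask(void *data)
{
    (void)data; /* host-side work would go here */
    return VPI_SUCCESS;
}

/* Hypothetical helper that submits the callback to a stream. */
static VPIStatus SubmitHostTask(VPIStream stream, void *data)
{
    return vpiSubmitUserFunction(stream, MyHostTask, data);
}
```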
Breaking Changes
Miscellaneous Changes
- The location where VPI is installed was updated; see Installation. Existing projects that link against VPI shouldn't be affected, nor do cmake-based projects need to be updated.
- The shared library file size was drastically reduced, from 197 MB to 81 MB.
Bug Fixes
- Error and warning messages are no longer written to stderr when a VPI function call succeeds.
- If creating memory for a particular backend fails, other backends are now tried.
- The PVA backend of vpiSubmitHarrisKeypointDetector now ensures that outFeatures and outScores are returned with the same size.
- Fixed the Harris Keypoint Detector returning spurious keypoints, especially noticeable when the input is an all-white image.
- Fixed incorrect output from the PVA Image Convolver for certain kernels that aren't normalized. Notably, the 2D Image Convolution sample application now works as expected on PVA.
- Enabled CUDA/CPU to PVA shared memory mapping.
- vpiCreateExtractHOGFeatures now explicitly rejects inputs that are too tall. On some platforms this was resulting in silent failures, especially on Jetson Nano.
- VPI now works correctly on devices equipped with a Maxwell-class dGPU. Previously, some memory operations could lead to bus errors.
- CUDA memory returned by cudaMallocPitch and wrapped by VPI can now be shared-mapped for use by the PVA backend. Previously, shared mapping was disabled and deep copies were performed instead.
- Enabled the PVA backend for vpiSubmitImageConvolver, vpiSubmitBoxImageFilter and vpiSubmitGaussianImageFilter when the number of kernel weights is greater than or equal to 49, e.g. \(7 \times 7\) and \(5 \times 11\).
- Enabled shared mapping of wrapped NV12 EGLImage buffers for use by PVA backend algorithms.
- The Image Convolver on the PVA backend was failing with input images of certain dimensions, such as 640x161, 640x340 and 1280x304. It now works as expected.
Documentation Fixes
Known Issues
- Shared (i.e., zero-copy) memory mapping of NV12 images from the CUDA to the PVA backend is disabled.
- If there's a backend mismatch between a memory buffer and the stream that operates on it, the stream will issue more memory mapping operations than strictly needed. To mitigate the resulting performance hit, make sure that the memories used on the stream already reside in the stream's backend. For instance, in a stream using the CUDA backend, memories created with the VPI_BACKEND_ONLY_CUDA flag will perform better because no memory mapping is needed; the memory is allocated on the GPU itself.
- Some stream operations might block the default CUDA stream, affecting CUDA processing outside VPI.
- The PVA backend implementation of the KLT Bounding Box Tracker doesn't match the output of the CUDA and CPU backends.
- The PVA backend implementation of vpiSubmitImageConvolver currently doesn't work with 3264x2448 inputs; it returns an error instead.
- Some algorithms, notably the Image Convolver, might segfault on Jetson Nano if the input image is too big, such as 4064x2704 on the CPU backend.
- The Harris Keypoint Detector on PVA may return spurious keypoints when the input image is larger than 1088x1088.
- Results from the Box Image Filter on PVA might have increased absolute error with respect to other backends for 16-bit inputs.
For earlier releases, see the List of Past Versions of VPI.
Notices
Disclaimer
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.
Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.
Copyright
© 2019-2020 NVIDIA Corporation. All rights reserved.