This is the second public release of VPI. As this is a developer preview, it's intended to let users experiment with the library, access PVA hardware where available, and perform integration testing in existing systems.
Until VPI-1.0 is released, API and ABI backward compatibility across new versions cannot be fully guaranteed, although it is not expected to be broken.
As with any developer preview release, use of VPI-0.2.0 in critical systems isn't recommended.
Changes Since v0.1.0
New Features
API Updates
- Added a _PRECISE suffix to the existing interpolation types that don't carry the _FAST suffix. The unsuffixed types are now aliases of the corresponding _FAST variants. See VPIInterpolationType.
- Deprecated the VPI_(IMAGE|ARRAY|PYRAMID|EVENT|CONTEXT)_(NO|ONLY)_* identifiers. They were replaced by VPI_BACKEND_NO_* and VPI_BACKEND_ONLY_*, which are now accepted as flags during object creation. This brings uniformity to specifying the same set of flags (with the same behavior) across the different object types.
- Deprecated several VPIImageType and VPIPixelType enumerations, replacing them with shorter, more expressive names. For instance, VPI_IMAGE_TYPE_1C_F32 was renamed to VPI_IMAGE_TYPE_F32. See the sketch below this list for an example using the new identifiers.
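As a hedged illustration of the renamed identifiers (the vpiImageCreate parameter order and header path are assumptions based on the public VPI C API, not taken from this release's documentation), creating an image restricted to the CUDA backend might look like this:

```c
/* Hedged sketch: header path and vpiImageCreate() parameter order are
 * assumed and may differ slightly in this release. */
#include <vpi/Image.h>

/* Create a 640x480 single-channel F32 image usable only by the CUDA backend.
 * VPI_IMAGE_TYPE_F32 replaces the deprecated VPI_IMAGE_TYPE_1C_F32, and
 * VPI_BACKEND_ONLY_CUDA replaces the deprecated per-object
 * VPI_IMAGE_ONLY_* identifier. */
static VPIStatus CreateCudaOnlyImage(VPIImage *image)
{
    return vpiImageCreate(640, 480, VPI_IMAGE_TYPE_F32,
                          VPI_BACKEND_ONLY_CUDA, image);
}
```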
Non-Breaking Changes
- Each of the CPU backend's worker threads now has affinity to one logical CPU core, minimizing performance hits due to thread migration between cores.
- The same VPI binary works on both Ubuntu 16.04 and 18.04.
- The Harris Keypoint Detector's output score range was updated in both the CUDA and CPU backends.
- Several algorithms were heavily optimized across all backends. Please refer to Performance Improvements for more details.
- vpiStreamWaitFor was optimized for the case where a CUDA stream waits on a CUDA event.
- The CUDA backend's internal VPIStream queue handling was optimized. For all backends to behave the same in the presence of vpiSubmitUserFunction calls, the callback passed on the CUDA backend must return VPI_SUCCESS; if it returns an error, subsequent tasks queued onto the stream are not guaranteed to be cancelled. See the sketch after this list.
- vpiArraySetSize and vpiArrayGetSize can be called even if the array is unlocked.
- Images, arrays and pyramids can all be locked recursively by the same thread. Each lock call must be matched by an unlock call.
- Maximum number of created VPIStream handles increased from 8 to 16.
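Below is a minimal, hedged sketch of the vpiSubmitUserFunction requirement noted above; the VPIUserFunction prototype and the vpiSubmitUserFunction parameter list are assumptions based on the public VPI C API, not statements of this release's exact signatures.

```c
/* Hedged sketch: callback prototype and vpiSubmitUserFunction() parameter
 * order are assumed; consult this release's headers for the exact forms. */
#include <vpi/Status.h>
#include <vpi/Stream.h>

/* Host-side callback. On the CUDA backend it must return VPI_SUCCESS;
 * returning an error does not guarantee that later tasks queued onto the
 * stream are cancelled. */
static VPIStatus MyHostTask(void *data)
{
    (void)data; /* host-side work would go here */
    return VPI_SUCCESS;
}

/* Hypothetical helper that submits the callback to a stream. */
static VPIStatus SubmitHostTask(VPIStream stream, void *data)
{
    return vpiSubmitUserFunction(stream, MyHostTask, data);
}
```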
Breaking Changes
Miscellaneous Changes
- The location where VPI is installed was updated; see Installation. Existing projects that link against VPI shouldn't be affected, nor do cmake-based projects need to be updated.
- The shared library file size was drastically reduced, from 197 MB to 81 MB.
Bug Fixes
- Error and warning messages are no longer written to stderr when a VPI function call succeeds.
- If creating memory for a particular backend fails, other backends are now tried.
- The PVA backend of vpiSubmitHarrisKeypointDetector now ensures that outFeatures and outScores are returned with the same size.
- Fixed the Harris Keypoint Detector returning spurious keypoints, especially noticeable when the input is an all-white image.
- Fixed incorrect output from the PVA Image Convolver for certain kernels that aren't normalized. Notably, the 2D Image Convolution sample application now works as expected on PVA.
- Enabled CUDA/CPU to PVA shared memory mapping.
- vpiCreateExtractHOGFeatures now explicitly rejects inputs that are too tall. On some platforms this was resulting in silent failures, especially on Jetson Nano.
- VPI now works correctly on devices equipped with a Maxwell-class dGPU. Previously, some memory operations could lead to bus errors.
- CUDA memory returned by cudaMallocPitch and wrapped by VPI can now be shared-mapped for use by the PVA backend. Previously, shared mapping was disabled and deep copies were performed instead.
- Enabled the PVA backend for vpiSubmitImageConvolver, vpiSubmitBoxImageFilter and vpiSubmitGaussianImageFilter when the number of kernel weights is greater than or equal to 49, e.g. \(7 \times 7\) and \(5 \times 11\).
- Enabled shared mapping of wrapped NV12 EGLImage buffers for use by PVA backend algorithms.
- The Image Convolver on the PVA backend was failing with input images of certain dimensions, such as 640x161, 640x340 and 1280x304. It now works as expected.
Documentation Fixes
Known Issues
- Shared (i.e., zero-copy) memory mapping of NV12 images from the CUDA to the PVA backend is disabled.
- If there's a backend mismatch between a memory buffer and the stream that operates on it, the stream will issue more memory mapping operations than strictly needed. To mitigate the resulting performance hit, make sure that the memories used on the stream already reside in the stream's backend. For instance, in a stream using the CUDA backend, memories created with the VPI_BACKEND_ONLY_CUDA flag will perform better because no memory mapping is needed; the memory is allocated on the GPU itself.
- Some stream operations might block the default CUDA stream, affecting CUDA processing outside VPI.
- The PVA backend implementation of the KLT Bounding Box Tracker doesn't match the output of the CUDA and CPU backends.
- The PVA backend implementation of vpiSubmitImageConvolver currently doesn't work with 3264x2448 inputs; it returns an error instead.
- Some algorithms, notably the Image Convolver, might segfault on Jetson Nano if the input image is too big, such as 4064x2704 on the CPU backend.
- The Harris Keypoint Detector on PVA may return spurious keypoints when the input image is larger than 1088x1088.
- Results from the Box Image Filter on PVA might have increased absolute error with respect to other backends for 16-bit inputs.
For earlier releases, see the List of Past Versions of VPI.
Notices
Disclaimer
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.
Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.
Copyright
© 2019-2020 NVIDIA Corporation. All rights reserved.