VPI - Vision Programming Interface

0.3.7 Release

Release Notes v0.3

This is the public release of VPI v0.3. As a developer preview, it is intended to let users experiment with the library, access PVA hardware where available, and do integration testing in existing systems.

Until VPI-1.0 is released, API and ABI backward compatibility cannot be fully guaranteed for new versions, although breakage is not expected.

As with any developer preview release, use of VPI-0.3 in critical systems isn't recommended.

Changes Since v0.2.0

New Features

Optimization

  • Histogram of Oriented Gradients (experimental)
    • CUDA backend
      • Jetson AGX Xavier: 17.55x faster
      • Jetson TX2: 59% faster
      • Jetson Nano: 21% faster
    • CPU backend
      • Jetson AGX Xavier: 10% faster
      • Jetson TX2: 33% faster
      • Jetson Nano: 29x faster
  • Bilateral Image Filter
    • CUDA backend
      • Jetson AGX Xavier: 55% faster
      • Jetson TX2: 58% faster
      • Jetson Nano: 52% faster
  • Gaussian Image Filter
    • CUDA backend
      • Jetson TX2: 41% faster

API Updates

  • Removed the NV_VPI_LOCAL_CHANGES macro from Version.h.

Non-Breaking Changes

  • vpiImageWrapEglImage is exported in the x86 variant of VPI but returns VPI_ERROR_NOT_IMPLEMENTED, as it is currently only available on Tegra platforms (see the sketch after this list for one way to handle this).
  • The default CUDA stream is no longer used for memory copies, zero-filling memory, or memory mapping. It is also avoided in some contexts when setting and getting a VPI array size.
  • Kernels launched by the user on a VPI CUDA stream that wraps a user-provided cudaStream_t are no longer synchronized when the VPI CUDA stream is destroyed. Synchronization only covers work up to the last VPI algorithm payload submitted to the device; all remaining kernels on the stream continue executing after the VPI CUDA stream is destroyed.
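
Given the first item above, portable code should be prepared for VPI_ERROR_NOT_IMPLEMENTED when wrapping EGLImages on x86. The following is a minimal sketch of that check; the header paths, the exact parameter list of vpiImageWrapEglImage (EGLImage handle, flags, output image), and the CopyEglImageIntoVpiImage fallback helper are assumptions for illustration, not part of VPI.

    /* Minimal sketch: tolerate vpiImageWrapEglImage returning
     * VPI_ERROR_NOT_IMPLEMENTED on x86, where the symbol is exported but the
     * feature is Tegra-only. Header paths and the exact signature are
     * assumed; consult the VPI headers for the real ones. */
    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <vpi/Image.h>
    #include <vpi/Status.h>

    /* Hypothetical application helper, not a VPI function. */
    VPIImage CopyEglImageIntoVpiImage(EGLImageKHR eglImage);

    VPIImage WrapOrFallback(EGLImageKHR eglImage)
    {
        VPIImage img = NULL;
        VPIStatus status = vpiImageWrapEglImage(eglImage, 0, &img);

        if (status == VPI_ERROR_NOT_IMPLEMENTED)
        {
            /* Not available outside Tegra: fall back to copying the EGLImage
             * contents into a VPI-allocated image. */
            img = CopyEglImageIntoVpiImage(eglImage);
        }
        else if (status != VPI_SUCCESS)
        {
            img = NULL;
        }
        return img;
    }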

Breaking Changes

none

Bug Fixes

  • Fixed an issue with CUDA-to-PVA shared mapping for NV12 images; it is now enabled.
  • The maximum score in the Harris Corners Detector sample is now rendered as white instead of black.
  • Memory mapping now happens serially with respect to the algorithm. Previously, memory might have been unmapped while the algorithm was still executing, leading to VPI_ERROR_BUFFER_LOCKED errors or invalid output.
  • Fixed bugs in vpiSubmitUserFunction on PVA streams where callbacks might not execute in the proper order or the internal stream state might become inconsistent, leading to failures.
  • Calling vpiStreamWaitFor on a VPI CUDA stream no longer blocks indefinitely.
  • Creating an image that is too big on Tegra devices now returns VPI_ERROR_OUT_OF_MEMORY instead of segfaulting (see the status-checking sketch after this list).
  • Calling the Gaussian Pyramid Generator when the input is a wrapped level of the output pyramid now works.
  • The Harris Keypoint Detector on CUDA no longer segfaults when minNMSdistance is less than 1; it returns an error instead.
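
Several of these fixes replace crashes with error codes (for example, VPI_ERROR_OUT_OF_MEMORY for oversized images and an error for minNMSdistance below 1), so checking every VPIStatus return pays off. The sketch below shows one such pattern; the vpiImageCreate call and the VPI_IMAGE_TYPE_U8 format name are assumptions about the 0.3 API, so check the headers for the exact names.

    /* Minimal sketch of defensive status checking around VPI calls.
     * vpiImageCreate, vpiImageDestroy and VPI_IMAGE_TYPE_U8 are assumed
     * names; the point is the pattern of checking every VPIStatus instead
     * of relying on a crash. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <vpi/Image.h>
    #include <vpi/Status.h>

    #define CHECK_STATUS(stmt)                                    \
        do                                                        \
        {                                                         \
            VPIStatus status__ = (stmt);                          \
            if (status__ != VPI_SUCCESS)                          \
            {                                                     \
                fprintf(stderr, "%s failed with status %d\n",     \
                        #stmt, (int)status__);                    \
                exit(EXIT_FAILURE);                               \
            }                                                     \
        } while (0)

    int main(void)
    {
        VPIImage img = NULL;

        /* With this release, if the requested size were too large for a
         * Tegra device, this call would report VPI_ERROR_OUT_OF_MEMORY
         * instead of segfaulting. */
        CHECK_STATUS(vpiImageCreate(1920, 1080, VPI_IMAGE_TYPE_U8, 0, &img));

        vpiImageDestroy(img);
        return 0;
    }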

Documentation Fixes

Known Issues

  • If there's a backend mismatch between a memory buffer and the streams that operate on it, the stream will issue more memory mapping operations than strictly needed. To mitigate the performance hit that might arise, make sure that the memories used on the stream already reside in the stream's backend. For instance, in a stream for the CUDA backend, memories created with the VPI_BACKEND_ONLY_CUDA flag will perform better because no memory mapping is needed; the memory is allocated on the GPU itself (see the sketch after this list).
  • Some stream operations might block the default CUDA stream, affecting CUDA processing outside VPI.
  • The PVA backend implementation of the KLT Bounding Box Tracker doesn't match the output of the CUDA and CPU backends.
  • The PVA backend implementation of vpiSubmitImageConvolver currently doesn't work with 3264x2448 inputs; it returns an error instead.
  • Some algorithms, notably the Image Convolver, might segfault on Jetson Nano if the input image is too big, such as 4064x2704 on the CPU backend.
  • The Harris Keypoint Detector on PVA may return spurious keypoints when the input image is larger than 1088x1088.
  • A small memory leak could occur if the same image wrapping a user-provided EGLImage or CUDA memory is used simultaneously as the input image in multiple PVA streams.
  • In some rare instances, a moderately complex processing pipeline might erroneously return VPI_ERROR_BUFFER_LOCKED when performing memory mapping.
  • CPU to CUDA image shared mapping of wrapped non-CUDA-managed CPU memory had to be disabled due to some rare segfaults. In this case, memory mapping is now done via memory copies.
  • vpiStreamWaitFor on a CUDA stream that is wrapping a user-provided cudaStream_t might block the calling thread until the event is signaled.
  • vpiEventRecord invalidates previously recorded stream state. Existing streams waiting for this event via vpiStreamWaitFor will either unblock or wait for the newly recorded stream state to be emptied. This behavior doesn't follow the CUDA SDK event semantics, but it should.
  • The Image Resampling sample might segfault in error situations, for instance when trying to run it using the PVA backend (not implemented yet).
  • The Stereo Disparity Estimator output on the CPU backend might differ slightly from the PVA and CUDA backends.
  • Harris Keypoint Detector result scores/positions might differ among backends.
  • Sample applications that use OpenCV won't compile on Ubuntu 16.04. For a workaround, consult the samples' build instructions.
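
As mentioned in the first known issue above, keeping a buffer in the backend of the stream that uses it avoids extra mapping work. The sketch below allocates such a CUDA-only image; the VPI_BACKEND_ONLY_CUDA flag comes from that note, while the vpiImageCreate signature and the VPI_IMAGE_TYPE_U8 format name are assumptions about the 0.3 API.

    /* Minimal sketch: allocate an image restricted to the CUDA backend so a
     * CUDA stream can operate on it without additional memory mapping.
     * vpiImageCreate and VPI_IMAGE_TYPE_U8 are assumed names;
     * VPI_BACKEND_ONLY_CUDA is the flag referred to in the known issue. */
    #include <vpi/Image.h>
    #include <vpi/Status.h>

    VPIStatus CreateCudaOnlyImage(int width, int height, VPIImage *img)
    {
        /* Restricting the image to the CUDA backend keeps it resident on
         * the GPU, so streams running on the CUDA backend don't need to
         * remap it. */
        return vpiImageCreate(width, height, VPI_IMAGE_TYPE_U8,
                              VPI_BACKEND_ONLY_CUDA, img);
    }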


Notices

Disclaimer

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

Trademarks

NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright

© 2019-2020 NVIDIA Corporation. All rights reserved.