This is the public release of VPI v0.4. As a developer preview, it is intended to let users experiment with the library, access PVA and VIC hardware where available, and perform integration testing in existing systems.

Until VPI-1.0 is released, API and ABI backward compatibility between versions cannot be fully guaranteed, although no breakage is expected.

As with any developer preview release, use of VPI-0.4 in critical systems isn't recommended.

Changes Since v0.3.7

New Features

Simplified pipeline creation.
- A VPIStream object supports one or more backends.
- Submit functions of algorithms that don't have a payload take an additional VPIBackend parameter specifying the backend on which the algorithm will be executed.
- Functions that create a payload for an algorithm now take an additional VPIBackend parameter that specifies the backend on which the algorithm will be executed. When the algorithm is submitted to a stream, the backend is implied by the payload, as fixed at payload creation time.
- Algorithms that are executed in different backends can be submitted to the same VPIStream. They will be synchronized automatically to ensure sequential execution.
Added the VIC backend.
- The old PVA backend is now split into PVA and VIC backends, better reflecting which processor executes the algorithm.
- The VIC backend is only available on Jetson platforms.
Remap / Lens Distortion Correction
- Added CUDA and CPU backend implementations.
- Algorithm made available in x86 builds.
- It now supports input and output that have different dimensions.
- Added support for the following image formats:
  - VPI_IMAGE_FORMAT_NV12 (VIC, CPU and CUDA backends).
  - VPI_IMAGE_FORMAT_U8 (CPU and CUDA backends).
  - VPI_IMAGE_FORMAT_U16 (CPU and CUDA backends).
  - VPI_IMAGE_FORMAT_RGBA8 (CPU and CUDA backends).
  - VPI_IMAGE_FORMAT_BGRA8 (CPU and CUDA backends).
  - VPI_IMAGE_FORMAT_RGB8 (CPU and CUDA backends).
  - VPI_IMAGE_FORMAT_BGR8 (CPU and CUDA backends).
Perspective Warp
- Added CUDA and CPU backend implementations.
- Algorithm made available in x86 builds.
- Added support for the following image formats:
  - VPI_IMAGE_FORMAT_NV12 (VIC, CPU and CUDA backends).
  - VPI_IMAGE_FORMAT_NV12_BL (VIC backend).
  - VPI_IMAGE_FORMAT_RGBA8 (VIC, CPU and CUDA backends).
  - VPI_IMAGE_FORMAT_BGRA8 (VIC, CPU and CUDA backends).
  - VPI_IMAGE_FORMAT_RGB8 (CPU backend).
  - VPI_IMAGE_FORMAT_BGR8 (CPU backend).
Rescale
- Added VIC backend implementation.
- Added support for the following image formats:
  - VPI_IMAGE_FORMAT_NV12 (VIC, CPU and CUDA backends).
  - VPI_IMAGE_FORMAT_NV12_BL (VIC backend).
  - VPI_IMAGE_FORMAT_RGBA8 (VIC, CPU and CUDA backends).
  - VPI_IMAGE_FORMAT_BGRA8 (VIC, CPU and CUDA backends).
  - VPI_IMAGE_FORMAT_RGB8 (CPU backend).
  - VPI_IMAGE_FORMAT_BGR8 (CPU backend).
Convert Image Format
- Added VIC backend implementation.
- Added support for the following conversions:
  - VPI_IMAGE_FORMAT_NV12 <-> VPI_IMAGE_FORMAT_NV12_BL (VIC backend).
  - VPI_IMAGE_FORMAT_RGBA8 <-> VPI_IMAGE_FORMAT_NV12_BL (VIC backend).
  - VPI_IMAGE_FORMAT_BGRA8 <-> VPI_IMAGE_FORMAT_NV12_BL (VIC backend).
Lens Distortion Correction
- vpiWarpMapGenerateFromFisheyeLensDistortionModel and vpiWarpMapGenerateFromPolynomialLensDistortionModel now support the skew parameter of VPICameraIntrinsic.
Added block-linear image format support.
- Currently only VPI_IMAGE_FORMAT_NV12_BL is officially supported.
- Only supported on Jetson platforms using VIC backend.
Block-linear images can be locked for host access. They are implicitly converted to pitch-linear format on the host, and the pitch-linear data is what is returned to the user. If host access isn't needed, the user can pass the VPI_DISABLE_BL_HOST_LOCK flag to the image creation function to save memory. vpiImageLock will fail on images created this way.
Added support for wrapping block-linear EGLImage and NvBuffer.
VPIImageFormat and VPIPixelFormat are now defined by their characteristics such as color space, number of channels, bits per pixel, etc. New formats can be created based on these parameters, and the parameters can be queried from an existing format. Although such formats can be created, VPI algorithms are only guaranteed to work with the predefined formats. User-defined formats might work, but results must be checked for correctness prior to deploying the solution to a production environment. See vpi/Format.h, vpi/ImageFormat.h and vpi/PixelFormat.h.
Added fast user-provided memory wrapping.
- A VPIImage or VPIArray that wraps user-provided memory can now be reused to wrap another user-provided memory block without incurring extra memory allocations. This allows for efficient memory wrapping in the main application loop.
Added new functions to return the last VPIStatus returned by a VPI function in the calling thread.
- vpiGetLastStatus
- vpiGetLastStatusMessage
- vpiPeekAtLastStatus
- vpiPeekAtLastStatusMessage
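The difference between the Get and Peek variants isn't spelled out above. The following is a minimal sketch of one plausible reading, assuming they mirror cudaGetLastError and cudaPeekAtLastError: Get returns the stored status and resets it, while Peek leaves it in place. Every Hyp/hyp name below is a hypothetical stand-in, not a VPI symbol, and the reset behavior is an assumption.

```c
#include <assert.h>

/* Hypothetical status codes; these are NOT the real VPIStatus values. */
typedef enum { HYP_SUCCESS = 0, HYP_ERROR_INVALID_ARGUMENT = 1 } HypStatus;

/* One slot per thread, matching "in the calling thread" (C11 thread-local). */
static _Thread_local HypStatus g_lastStatus = HYP_SUCCESS;

/* Stand-in for any library call: records its status before returning it. */
HypStatus hypDoWork(int arg)
{
    HypStatus st = (arg < 0) ? HYP_ERROR_INVALID_ARGUMENT : HYP_SUCCESS;
    g_lastStatus = st;
    return st;
}

/* Assumed Peek semantics: report the stored status without touching it. */
HypStatus hypPeekAtLastStatus(void)
{
    return g_lastStatus;
}

/* Assumed Get semantics: report the stored status, then reset it. */
HypStatus hypGetLastStatus(void)
{
    HypStatus st = g_lastStatus;
    g_lastStatus = HYP_SUCCESS;
    return st;
}

/* Walks through the sequence; returns 0 when the model behaves as described. */
int hypStatusDemo(void)
{
    (void)hypDoWork(-1); /* fails and records the error */
    assert(hypPeekAtLastStatus() == HYP_ERROR_INVALID_ARGUMENT);
    assert(hypPeekAtLastStatus() == HYP_ERROR_INVALID_ARGUMENT); /* still there */
    assert(hypGetLastStatus() == HYP_ERROR_INVALID_ARGUMENT);    /* returned and cleared */
    return hypGetLastStatus() == HYP_SUCCESS ? 0 : 1;
}
```

Under this reading, Peek suits logging code that must not disturb later error handling, while Get suits the code that finally handles the error.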
Added vpiSubmitHostFunctionEx, which is guaranteed to execute the user function exactly once. This makes it easy to allocate a structure with more complex parameters to be passed to the user function, and have the function deallocate it even if the VPIStream is in an error state.
The host function submitted by vpiSubmitHostFunction and vpiSubmitHostFunctionEx can launch CUDA kernels and do other processing with the guarantee that it will be executed sequentially with respect to the other tasks submitted to the stream. This allows adding processing stages from outside VPI into the VPI pipeline.
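The exactly-once guarantee matters mostly for ownership of the parameter block. The sketch below models that pattern in plain C with hypothetical names (HypStream, HypHostFunctionEx, hypSubmitHostFunctionEx); it is not the VPI API, the callback signature is an assumption, and the "stream" runs the callback synchronously just to keep the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical status codes and stream; only the error state is modeled. */
typedef enum { HYP_SUCCESS = 0, HYP_ERROR_INTERNAL = 1 } HypStatus;
typedef struct { HypStatus state; } HypStream;

/* Hypothetical callback type: receives the stream status and the user data. */
typedef void (*HypHostFunctionEx)(HypStatus streamStatus, void *hostData);

/* Modeled guarantee: the callback runs exactly once, even on a failed stream. */
void hypSubmitHostFunctionEx(HypStream *stream, HypHostFunctionEx fn, void *hostData)
{
    fn(stream->state, hostData);
}

/* Heap-allocated parameters owned by the callback once submitted. */
typedef struct {
    int frame;      /* which frame this job refers to */
    int *processed; /* counter of jobs that did real work */
} JobParams;

/* Does the work only on a healthy stream, but always frees its parameters. */
void processJob(HypStatus streamStatus, void *hostData)
{
    JobParams *p = hostData;
    if (streamStatus == HYP_SUCCESS)
        *p->processed += 1;
    free(p);
}

/* Submits `count` jobs to a stream in the given state; returns jobs processed. */
int hypRunJobs(HypStatus streamState, int count)
{
    HypStream stream = { streamState };
    int processed = 0;
    for (int i = 0; i < count; ++i) {
        JobParams *p = malloc(sizeof *p);
        assert(p != NULL);
        p->frame = i;
        p->processed = &processed;
        hypSubmitHostFunctionEx(&stream, processJob, p);
    }
    return processed;
}
```

Because the callback is always invoked, every allocation is released whether or not the stream failed. Without that guarantee the caller would have to track and free pending parameter blocks itself.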
There are no more limits on the number of VPI objects that can be created.
vpiPyramidLock, vpiImageLock and vpiArrayLock accept NULL for their returned data pointer. This locks the object without returning the host-side pointer to its data, which is useful to make sure the wrapped memory is updated with changes made by VPI. The corresponding unlock function still must be called.
vpiPyramidLock, vpiImageLock and vpiArrayLock calls can be nested. For each call, the corresponding unlock function must be called. Only the last unlock actually unlocks the object.
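The two locking rules above (NULL accepted for the data pointer, and nested locks released by the last unlock) amount to a lock counter. A minimal model in plain C, with hypothetical HypImage, hypImageLock and hypImageUnlock names that are not VPI symbols; returning an error for an unmatched unlock is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object: host storage plus a count of outstanding locks. */
typedef struct {
    unsigned char *data;
    int lockCount;
} HypImage;

/* Locks the object; outData may be NULL when only the lock itself is wanted. */
int hypImageLock(HypImage *img, unsigned char **outData)
{
    img->lockCount++;
    if (outData != NULL)
        *outData = img->data;
    return 0;
}

/* Releases one lock level; returns -1 when there is no matching lock. */
int hypImageUnlock(HypImage *img)
{
    if (img->lockCount == 0)
        return -1;
    img->lockCount--;
    return 0;
}

/* The object stays locked until every lock call has been matched. */
int hypImageIsLocked(const HypImage *img)
{
    return img->lockCount > 0;
}

/* Nested usage; returns the result of a deliberately unmatched unlock. */
int hypLockDemo(void)
{
    unsigned char pixels[4] = {0};
    HypImage img = { pixels, 0 };
    unsigned char *ptr = NULL;

    hypImageLock(&img, &ptr); /* outer lock, pointer requested */
    hypImageLock(&img, NULL); /* nested lock, no pointer requested */
    assert(ptr == pixels);

    hypImageUnlock(&img);
    assert(hypImageIsLocked(&img));  /* one level still held */
    hypImageUnlock(&img);
    assert(!hypImageIsLocked(&img)); /* last unlock released the object */

    return hypImageUnlock(&img);     /* no lock left to release */
}
```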
Added vpiStreamGetFlags function to return the flags used during VPIStream creation.
Added vpiEventGetFlags function to return the flags used during VPIEvent creation.

Optimization

Temporal Noise Reduction version 2:
- Use of block-linear image support allows for faster execution on VIC backend, from 2x to 4x, depending on the Jetson platform used.

Breaking Changes / API Updates

Several API updates were made to make the library easier to use and more uniform, taking advantage of its developer-preview status. Once version 1.0 is released, any further API updates will necessarily have the old API go through a deprecation phase, to minimize code disruptions.

Despite these updates, the library is still ABI-compatible with the previous vpi-0.3. Application binaries linked to the old version are expected to work with the new library without relinking.

VPIDeviceType was renamed to VPIBackend.
VPI_DEVICE_TYPE_CPU, VPI_DEVICE_TYPE_CUDA, etc. were renamed to VPI_BACKEND_CPU, VPI_BACKEND_CUDA, etc.
The flags that enable/disable VPIImage backends (VPI_IMAGE_DISABLE_CPU, VPI_IMAGE_ONLY_PVA and others) were removed. The enabled backends are now defined by OR-ing together the corresponding VPIBackend flags. As before, not specifying any backend, e.g. by passing 0 as flags, enables all available backends.
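The flag rule described above can be modeled as a small bitmask helper. This is only a sketch: the HYP_BACKEND_* constants and their bit values are hypothetical stand-ins for the VPIBackend flags, and silently dropping a requested backend that isn't available is an assumption of the model, not documented VPI behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; the real VPIBackend values may differ. */
enum {
    HYP_BACKEND_CPU  = 1 << 0,
    HYP_BACKEND_CUDA = 1 << 1,
    HYP_BACKEND_PVA  = 1 << 2,
    HYP_BACKEND_VIC  = 1 << 3
};

/* Backends enabled for an object created with `flags` on a machine where
   only the `available` backends exist. */
uint32_t hypEnabledBackends(uint32_t flags, uint32_t available)
{
    if (flags == 0)
        return available;     /* no backend specified: enable all available ones */
    return flags & available; /* otherwise keep the requested ones that exist */
}

/* Tells whether an object with `enabled` backends can be used on `backend`. */
int hypCanRunOn(uint32_t enabled, uint32_t backend)
{
    assert(backend != 0 && (backend & (backend - 1)) == 0); /* exactly one bit */
    return (enabled & backend) != 0;
}
```

For example, an image meant to be used only by CPU and VIC stages would be created with HYP_BACKEND_CPU | HYP_BACKEND_VIC in this model, while passing 0 keeps it usable everywhere.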
VPIImageType was renamed to VPIImageFormat.
VPI_IMAGE_TYPE_RGB8 was renamed to VPI_IMAGE_FORMAT_RGB8. Other image formats were renamed similarly.
VPI_PIXEL_TYPE_U8 was renamed to VPI_PIXEL_FORMAT_U8. Other pixel formats were renamed similarly.
Removed deprecated image format enums.
Removed deprecated pixel format enums.
Similarly to VPIImage, the backend enable/disable flags for VPIContext, VPIArray, VPIPyramid and VPIStream (which now supports several backends) were removed and replaced by the new VPIBackend flags.
vpiSubmitSeparableConvolution now returns VPI_ERROR_INVALID_IMAGE_FORMAT instead of VPI_ERROR_INVALID_ARGUMENT when an unsupported image format is used. This might affect applications linked to vpi-0.3 after upgrading to vpi-0.4.
The following algorithms had their PVA backend implementation moved to the new VPI_BACKEND_VIC, as they actually are being executed on VIC hardware:
- Rescale
- Temporal Noise Reduction
- Lens Distortion Correction
- Remap
- Perspective Warp
vpiSubmitUserFunction was renamed to vpiSubmitHostFunction.
VPIUserFunction was renamed to VPIHostFunction and its signature was changed to void(void *). To have the host function return a status code, use the new vpiSubmitHostFunctionEx.
Calling vpiStreamDestroy on a VPIStream that is wrapping a user-provided cudaStream_t will always synchronize the stream prior to destroying it. Previously, the stream wasn't synchronized in some situations. This is a backward-incompatible change with respect to vpi-0.3.
vpiStreamGetThreadHandle will return a valid CPU thread handle even if the VPIStream is wrapping a cudaStream_t. This is a backward-incompatible change with respect to vpi-0.3.
VPIImagePlane::stride was renamed to VPIImagePlane::pitchBytes.
VPIArrayData::rowStride was renamed to VPIArrayData::strideBytes.
vpiPyramidGetWidth and vpiPyramidGetHeight were replaced by a single vpiPyramidGetSize call that returns both values.
Temporarily disabled support for NvBuffer with VPI_IMAGE_FORMAT_U8 content.

Functions that create wrappers for external objects were renamed:

Old	New
vpiImageWrapCudaDeviceMem	vpiImageCreateCudaMemWrapper
vpiImageWrapHostMem	vpiImageCreateHostMemWrapper
vpiImageWrapNvBuffer	vpiImageCreateNvBufferWrapper
vpiImageWrapEglImage	vpiImageCreateEglImageWrapper
vpiArrayWrapCudaDeviceMem	vpiArrayCreateCudaMemWrapper
vpiArrayWrapHostMem	vpiArrayCreateHostMemWrapper
vpiEventWrapEglSync	vpiEventCreateEglSyncWrapper
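When porting code from vpi-0.3, the wrapper renames in the table above can be encoded as a lookup, for instance inside a small migration checker. A minimal sketch: the Rename type and the hypNewWrapperName helper are hypothetical, and only the function names themselves come from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the rename table: vpi-0.3 name and its vpi-0.4 replacement. */
typedef struct {
    const char *oldName;
    const char *newName;
} Rename;

static const Rename kWrapperRenames[] = {
    { "vpiImageWrapCudaDeviceMem", "vpiImageCreateCudaMemWrapper"  },
    { "vpiImageWrapHostMem",       "vpiImageCreateHostMemWrapper"  },
    { "vpiImageWrapNvBuffer",      "vpiImageCreateNvBufferWrapper" },
    { "vpiImageWrapEglImage",      "vpiImageCreateEglImageWrapper" },
    { "vpiArrayWrapCudaDeviceMem", "vpiArrayCreateCudaMemWrapper"  },
    { "vpiArrayWrapHostMem",       "vpiArrayCreateHostMemWrapper"  },
    { "vpiEventWrapEglSync",       "vpiEventCreateEglSyncWrapper"  }
};

/* Returns the vpi-0.4 name for a vpi-0.3 wrapper function,
   or NULL when the name is not in the table. */
const char *hypNewWrapperName(const char *oldName)
{
    assert(oldName != NULL);
    size_t count = sizeof kWrapperRenames / sizeof kWrapperRenames[0];
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(kWrapperRenames[i].oldName, oldName) == 0)
            return kWrapperRenames[i].newName;
    }
    return NULL;
}
```

The lookup only maps names. It says nothing about whether the parameters of the renamed functions changed, so each call site still needs to be checked against the new headers.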

Several algorithms and their respective headers, functions and types were renamed to their well-known names:

Old	New
Box Image Filter	Box Filter
vpi/algo/BoxImageFilter.h	vpi/algo/BoxFilter.h
vpiSubmitBoxImageFilter	vpiSubmitBoxFilter
Bilateral Image Filter	Bilateral Filter
vpi/algo/BilateralImageFilter.h	vpi/algo/BilateralFilter.h
vpiSubmitBilateralImageFilter	vpiSubmitBilateralFilter
Gaussian Image Filter	Gaussian Filter
vpi/algo/GaussianImageFilter.h	vpi/algo/GaussianFilter.h
vpiSubmitGaussianImageFilter	vpiSubmitGaussianFilter
Image Convolver	Convolution
vpi/algo/ImageConvolver.h	vpi/algo/Convolution.h
vpiSubmitImageConvolver	vpiSubmitConvolution
Separable Image Convolver	Separable Convolution
vpi/algo/SeparableImageConvolver.h	vpi/algo/Convolution.h
vpiSubmitSeparableImageConvolver	vpiSubmitSeparableConvolution
KLT Bounding Box Tracker	KLT Feature Tracker
vpi/algo/KLTBoundingBoxTracker.h	vpi/algo/KLTFeatureTracker.h
vpiCreateKLTBoundingBoxTracker	vpiCreateKLTFeatureTracker
vpiSubmitKLTBoundingBoxTracker	vpiSubmitKLTFeatureTracker
VPIKLTBoundingBoxTrackerParams	VPIKLTFeatureTrackerParams
Gaussian Pyramid Generator
vpi/algo/GaussianPyramidGenerator.h	vpi/algo/GaussianPyramid.h
Harris Keypoint Detector	Harris Corner Detector
vpi/algo/HarrisKeypointDetector.h	vpi/algo/HarrisCornerDetector.h
vpiCreateHarrisKeypointDetector	vpiCreateHarrisCornerDetector
vpiSubmitHarrisKeypointDetector	vpiSubmitHarrisCornerDetector
VPIHarrisKeypointDetectorParams	VPIHarrisCornerDetectorParams
Image FFT	FFT
vpi/algo/ImageFFT.h	vpi/algo/FFT.h
vpiCreateImageFFT	vpiCreateFFT
vpiSubmitImageFFT	vpiSubmitFFT
Image Inverse FFT	Inverse FFT
vpi/algo/ImageIFFT.h	vpi/algo/FFT.h
vpiCreateImageIFFT	vpiCreateIFFT
vpiSubmitImageIFFT	vpiSubmitIFFT
Image Format Converter	Convert Image Format
vpi/algo/ImageFormatConverter.h	vpi/algo/ConvertImageFormat.h
vpiSubmitImageFormatConverter	vpiSubmitConvertImageFormat
Image Remap	Remap
vpi/algo/ImageRemap.h	vpi/algo/Remap.h
vpiCreateImageRemap	vpiCreateRemap
vpiSubmitImageRemap	vpiSubmitRemap
Image Resampler	Rescale
vpi/algo/ImageResampler.h	vpi/algo/Rescale.h
vpiSubmitImageResampler	vpiSubmitRescale
Perspective Image Warp	Perspective Warp
vpi/algo/PerspectiveImageWarp.h	vpi/algo/PerspectiveWarp.h
vpiCreatePerspectiveImageWarp	vpiCreatePerspectiveWarp
vpiSubmitPerspectiveImageWarp	vpiSubmitPerspectiveWarp
Stereo Disparity Estimator
vpi/algo/StereoDisparityEstimator.h	vpi/algo/StereoDisparity.h

Non-Breaking Changes

Added VIC support to the Image Resampling sample. Because the VIC backend currently only supports color images, the whole sample was reworked to output color results.
Sample applications updates:
- Display error message using new vpiGetLastStatusMessage function.
- Use the new simplified pipeline description API.
- Use the optimized memory wrapping functions.

Bug Fixes

Sample applications that use OpenCV now compile on Ubuntu 16.04. The exception is Fisheye Distortion Correction, which requires OpenCV>=2.4.10, not available by default on Ubuntu 16.04.
Dropped the pva option for FFT sample, as FFT isn't implemented on this backend.
Added the proper input image format check on Convolution and Separable Convolution.
Avoid segfault when invalid parameters are passed to vpiWarpMapGenerateIdentity. The function now returns an error.
Avoid warning printed to stderr on Quill: [WARN ] SetCurrentThreadAffinity() failed. Return value: EINVAL
VPI error messages are no longer printed to stderr. If the application wants to retrieve the error message, it must use the new functions vpiGetLastStatusMessage or vpiPeekAtLastStatusMessage.
The first few frames output by Temporal Noise Reduction sample application on VIC (previously PVA) had a greenish tint. This was fixed.
vpiEventRecord can be called multiple times on the same or a different stream. New calls won't affect the existing calls to vpiEventSync or vpiStreamWaitFor waiting on the event. They will still unblock when the originally recorded stream tasks are finished. This gives VPIEvent the same semantics as cudaEvent_t.
vpiStreamSync waits for synchronization to complete before returning the stream error code (if any). In case of stream errors, the post-condition is that no more tasks are left to be executed by the stream.
Fixed some memory cache coherency issues when using a wrapped EGLImage in a CUDA algorithm on Jetson TX2.

Documentation Updates

Updated apt installation instructions to cope with missing add-apt-repository on Ubuntu 16.04.
Updated Clock Frequency and Power Settings section with a script to set device clock frequencies to maximum for benchmarking purposes.
Updated Basic concepts and Architecture with new information regarding how the new VPIStream works.
Complex Pipeline rewritten with a new pipeline that can actually be implemented on the current VPI version.
Improved algorithm performance tables.

Known Issues

A VPIEvent that recorded a stream whose last task was submitted to the CUDA backend can only record another stream after the event is signaled. An error is returned if this condition isn't met.
Convert Image Format on CUDA might introduce a small error of at most 2 when compared with other backends.
If there's a backend mismatch between a memory buffer and the stream that operates on it, the stream will issue more memory mapping operations than strictly needed. To mitigate the performance hit that might arise, make sure that the memories used on the stream already reside in the stream's backend. For instance, in a stream for the CUDA backend, memories created with the VPI_BACKEND_CUDA flag will perform better because memory mapping isn't needed, since the memory is allocated in the GPU itself.
PVA backend implementation of KLT Feature Tracker doesn't match CUDA and CPU's output.
PVA backend implementation of vpiSubmitConvolution currently doesn't work with 3264x2448 inputs; it returns an error instead.
Some algorithms, notably Convolution, might segfault on Jetson Nano if input image is too big, such as 4064x2704 on CPU backend.
Harris Corner Detector on PVA may return spurious keypoints when input image is larger than 1088x1088.
A small memory leak could occur if the same image wrapping a user-provided EGLImage or CUDA memory is used simultaneously as the input image in multiple PVA streams.
In some rare instances, a moderately complex processing pipeline might erroneously return VPI_ERROR_BUFFER_LOCKED when performing memory mapping.
CPU to CUDA image shared mapping of wrapped non-CUDA-managed CPU memory had to be disabled due to some rare segfaults. In this case, memory mapping is now done via memory copies.
vpiStreamWaitFor on a CUDA stream that is wrapping a user-provided cudaStream_t might block the calling thread until the event is signaled.
Stereo Disparity Estimator output might slightly differ on CPU backend with respect to PVA and CUDA backends.
Harris Corner Detector result scores/positions might differ among backends.
Output of Remap from CPU and CUDA backends has a subpixel translation with respect to VIC's output.

Notices

Disclaimer

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

Trademarks

NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

VPI - Vision Programming Interface

0.4.4 Release