This is the public release of VPI v0.4. As a developer preview, it is intended to let users experiment with the library, access PVA and VIC hardware where available, and run integration tests in existing systems.
Until VPI-1.0 is released, API and ABI backward compatibility for new versions cannot be 100% guaranteed, although they are not expected to be broken.
As with any developer preview release, use of VPI-0.4 in critical systems isn't recommended.
Changes Since v0.3.7
New Features
- Simplified pipeline creation.
- A VPIStream object supports one or more backends.
- Submit functions of algorithms that don't have a payload take an additional VPIBackend parameter specifying the backend on which the algorithm will be executed.
- Functions that create a payload for an algorithm now take an additional VPIBackend parameter specifying the backend on which the algorithm will be executed. When the algorithm is later submitted to a stream, the backend is implied by the payload, as fixed at payload creation time.
- Algorithms that are executed in different backends can be submitted to the same VPIStream. They will be synchronized automatically to ensure sequential execution.
- Added the VIC backend.
- The old PVA backend is now split into PVA and VIC backends, better reflecting the processor on which each algorithm is executed.
- The VIC backend is only available on Jetson platforms.
- Remap / Lens Distortion Correction
- Added CUDA and CPU backend implementations.
- Algorithm made available in x86 builds.
- It now supports input and output that have different dimensions.
- Added support for the following image formats:
- Perspective Warp
- Added CUDA and CPU backend implementations.
- Algorithm made available in x86 builds.
- Added support for the following image formats:
- Rescale
- Added VIC backend implementation.
- Added support for the following image formats:
- Convert Image Format
- Added VIC backend implementation.
- Added support for following conversions:
- Lens Distortion Correction
- Added block-linear image format support.
- Block-linear images can be locked for host access. They are implicitly converted to pitch-linear format on the host, and the pitch-linear data is what is returned to the user. If host access isn't needed, the user can pass the flag VPI_DISABLE_BL_HOST_LOCK to the image creation function to save memory. vpiImageLock will fail in this case.
- Added support for wrapping block-linear EGLImage and NvBuffer.
- VPIImageFormat and VPIPixelFormat are now defined by their characteristics, such as color space, number of channels, bits per pixel, etc. New formats can be created based on these parameters, and the parameters can be queried from an existing format. Although such formats can be created, VPI algorithms are only guaranteed to work with the predefined formats. User-defined formats might work, but results must be checked for correctness before deploying the solution to a production environment. See vpi/Format.h, vpi/ImageFormat.h and vpi/PixelFormat.h.
- Added fast user-provided memory wrapping.
- A VPIImage or VPIArray that wraps user-provided memory can now be reused to wrap another user-provided memory block without incurring extra memory allocations. This allows for efficient memory wrapping in the main application loop.
- Added new functions to return the last VPIStatus returned by a VPI function in the calling thread.
- Added vpiSubmitHostFunctionEx, which is guaranteed to execute the user function exactly once. This makes it easy to allocate a structure with more complex parameters to be passed to the user function, and have the function deallocate it even if the VPIStream is in an error state.
- The host function submitted by vpiSubmitHostFunction and vpiSubmitHostFunctionEx can launch CUDA kernels and do other processing, with the guarantee that it will be executed sequentially with respect to the other tasks submitted to the stream. This allows adding processing stages from outside VPI into the VPI pipeline.
- There are no more limits on the number of VPI objects that can be created.
- vpiPyramidLock, vpiImageLock and vpiArrayLock accept NULL for their returned data pointer. This effectively locks the object but doesn't return the host-side pointer to its data. This is useful to make sure the wrapped memory is updated with changes made by VPI. The corresponding unlock function must still be called.
- vpiPyramidLock, vpiImageLock and vpiArrayLock calls can be nested. For each call, the corresponding unlock function must be called. The last unlock will effectively unlock the object.
- Added vpiStreamGetFlags function to return the flags used during VPIStream creation.
- Added vpiEventGetFlags function to return the flags used during VPIEvent creation.
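
The nested locking rules listed above (a NULL data pointer is accepted, every lock call needs a matching unlock, and only the last unlock actually releases the object) can be pictured as a simple lock counter. The following is a minimal sketch of that idea only. MockImage, mock_image_lock and mock_image_unlock are hypothetical names invented for this illustration; they are not VPI types or functions, and the return codes are arbitrary choices of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lockable object. Not a VPI type. */
typedef struct {
    unsigned char *data; /* host-visible storage */
    int lock_count;      /* number of lock calls not yet matched by an unlock */
} MockImage;

/* Lock the object. out_data may be NULL: the object is still locked,
 * but no host pointer is handed back. Returns 0 on success. */
int mock_image_lock(MockImage *img, unsigned char **out_data)
{
    assert(img != NULL);
    img->lock_count++;
    if (out_data != NULL) {
        *out_data = img->data;
    }
    return 0;
}

/* Undo one lock call.
 * Returns  1 if this was the last unlock (the object is now released),
 *          0 if an outer lock is still active,
 *         -1 if the object was not locked at all. */
int mock_image_unlock(MockImage *img)
{
    assert(img != NULL);
    if (img->lock_count == 0) {
        return -1;
    }
    img->lock_count--;
    return img->lock_count == 0 ? 1 : 0;
}
```

In this model, a caller that only wants the wrapped memory brought up to date would call mock_image_lock(&img, NULL) followed by mock_image_unlock(&img), mirroring the NULL-pointer use case described above.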
Optimization
- Temporal Noise Reduction version 2:
- Use of block-linear image support allows for faster execution on VIC backend, from 2x to 4x, depending on the Jetson platform used.
Breaking Changes / API Updates
Several API updates were made to make the library easier to use and more uniform, taking advantage of its developer-preview status. Once version 1.0 is released, any further API updates will necessarily have the old API go through a deprecation phase, to minimize code disruptions.
Despite these updates, the library is still ABI-compatible with the previous vpi-0.3. Application binaries linked to the old version are supposed to work with the new library without relinking.
- VPIDeviceType renamed to VPIBackend.
- VPI_DEVICE_TYPE_CPU, VPI_DEVICE_TYPE_CUDA, etc., renamed to VPI_BACKEND_CPU, VPI_BACKEND_CUDA, etc.
- The flags that enable/disable VPIImage backends, VPI_IMAGE_DISABLE_CPU, VPI_IMAGE_ONLY_PVA (and others), were removed. Now the enabled backends can be defined by or-ing together the corresponding VPIBackend flags. As before, not specifying any backend, e.g. by passing 0 as flags, enables all available backends.
- Rename VPIImageType to VPIImageFormat.
- Rename VPI_IMAGE_TYPE_RGB8 to VPI_IMAGE_FORMAT_RGB8. Other image formats renamed similarly.
- Rename VPI_PIXEL_TYPE_U8 to VPI_PIXEL_FORMAT_U8. Other pixel formats renamed similarly.
- Removed deprecated image format enums.
- Removed deprecated pixel format enums.
- Similarly to VPIImage, the backend enable/disable flags for VPIContext, VPIArray, VPIPyramid and VPIStream (which now supports several backends) now use the new VPIBackend flags instead of the old flags, which were removed.
- vpiSubmitSeparableConvolution now returns VPI_ERROR_INVALID_IMAGE_FORMAT instead of VPI_ERROR_INVALID_ARGUMENT when an unsupported image format is used. This might affect applications linked to vpi-0.3 after upgrading to vpi-0.4.
- The following algorithms had their PVA backend implementation moved to the new VPI_BACKEND_VIC, as they actually are being executed on VIC hardware:
- vpiSubmitUserFunction was renamed to vpiSubmitHostFunction.
- VPIUserFunction was renamed to VPIHostFunction and its signature was changed to void(void *). To have the host function return a status code, use the new vpiSubmitHostFunctionEx.
- Calling vpiStreamDestroy on a VPIStream that is wrapping a user-provided cudaStream_t will always synchronize the stream prior to destroying it. Previously, the stream wasn't synchronized in some situations. This is a backward-incompatible change with respect to vpi-0.3.
- vpiStreamGetThreadHandle will return a valid CPU thread handle even if the VPIStream is wrapping a cudaStream_t. This is a backward-incompatible change with respect to vpi-0.3.
- VPIImagePlane.stride was renamed to VPIImagePlane::pitchBytes.
- VPIArrayData.rowStride was renamed to VPIArrayData::strideBytes.
- vpiPyramidGetWidth and vpiPyramidGetHeight were replaced by a single vpiPyramidGetSize call that returns both values.
- Temporarily disabled support for NvBuffer with VPI_IMAGE_FORMAT_U8 content.
- Functions that create wrappers for external objects were renamed:
- Several algorithms and their respective headers, functions and types were renamed to their well-known names:
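
As an aside on the backend flags introduced in this section: the convention of or-ing VPIBackend values together, with "no backend specified" meaning "all available backends", can be sketched as a plain bitmask. This is only an illustration under stated assumptions. The MOCK_BACKEND_* constants, their bit values and both helper functions are hypothetical and do not come from the VPI headers. Silently masking out a requested backend that isn't available is a simplification of this sketch and may not match what VPI does in that situation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values. The real VPIBackend constants are defined by VPI. */
enum {
    MOCK_BACKEND_CPU  = 1 << 0,
    MOCK_BACKEND_CUDA = 1 << 1,
    MOCK_BACKEND_PVA  = 1 << 2,
    MOCK_BACKEND_VIC  = 1 << 3
};

#define MOCK_BACKEND_ALL \
    (MOCK_BACKEND_CPU | MOCK_BACKEND_CUDA | MOCK_BACKEND_PVA | MOCK_BACKEND_VIC)

/* Resolve creation flags into the set of enabled backends.
 * If no backend bit is set in flags, every available backend is enabled.
 * Otherwise only the requested backends that are also available are kept. */
uint32_t mock_enabled_backends(uint32_t flags, uint32_t available)
{
    uint32_t requested = flags & MOCK_BACKEND_ALL;
    if (requested == 0) {
        return available;
    }
    return requested & available;
}

/* Returns nonzero if the given backend bit is part of the enabled set. */
int mock_backend_enabled(uint32_t enabled, uint32_t backend)
{
    assert(backend != 0);
    return (enabled & backend) != 0;
}
```

For example, on a hypothetical platform where only CPU and CUDA are available, passing 0 yields CPU | CUDA, while passing MOCK_BACKEND_CUDA yields CUDA alone.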
Non-Breaking Changes
- Added VIC support to Image Resampling. Because it currently only supports color images, the whole sample was reworked to output color results.
- Sample applications updates:
- Display error message using new vpiGetLastStatusMessage function.
- Use the new simplified pipeline description API.
- Use the optimized memory wrapping functions.
Bug Fixes
- Sample applications that use OpenCV on Ubuntu 16.04 will now compile. The exception is Fisheye Distortion Correction that requires OpenCV>=2.4.10, not available by default on Ubuntu 16.04.
- Dropped the pva option for FFT sample, as FFT isn't implemented on this backend.
- Added the proper input image format check on Convolution and Separable Convolution.
- Avoid segfault when invalid parameters are passed to vpiWarpMapGenerateIdentity. The function now returns an error.
- Avoid warning printed to stderr on Quill:
[WARN ] SetCurrentThreadAffinity() failed. Return value: EINVAL
- VPI error messages are no longer printed to stderr. If an application wants to retrieve the error message, it must use the new functions vpiGetLastStatusMessage or vpiPeekAtLastStatusMessage.
- The first few frames output by Temporal Noise Reduction sample application on VIC (previously PVA) had a greenish tint. This was fixed.
- vpiEventRecord can be called multiple times on the same or a different stream. New calls won't affect the existing calls to vpiEventSync or vpiStreamWaitFor waiting on the event. They will still unblock when the originally recorded stream tasks are finished. This gives VPIEvent the same semantics as cudaEvent_t.
- vpiStreamSync waits for synchronization to complete before returning the stream error code (if any). In case of stream errors, the post-condition is that no more tasks are left to be executed by the stream.
- Fixed some memory cache coherency issues when using a wrapped EGLImage in a CUDA algorithm on Jetson TX2.
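
The change above from printing errors to stderr to retrieving them on request relies on a "last status" record kept per calling thread. The sketch below models that pattern with C11 thread-local storage. It is not VPI code: every mock_* name, the MockStatus enum and the return conventions are hypothetical. It also assumes, by analogy with cudaGetLastError and cudaPeekAtLastError, that the "get" variant resets the stored status while the "peek" variant leaves it in place. These release notes do not state that, so check the VPI headers before relying on it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes. Not the VPIStatus enum. */
typedef enum {
    MOCK_SUCCESS = 0,
    MOCK_ERROR_INVALID_ARGUMENT = 1
} MockStatus;

/* One status record per thread, so threads never see each other's errors. */
static _Thread_local MockStatus g_last_status = MOCK_SUCCESS;
static _Thread_local char g_last_message[128] = "";

/* Store the outcome of a call in the calling thread's record. */
static MockStatus mock_set_status(MockStatus st, const char *msg)
{
    g_last_status = st;
    snprintf(g_last_message, sizeof g_last_message, "%s", msg);
    return st;
}

/* A library-style function that fails quietly: nothing goes to stderr. */
MockStatus mock_do_work(int value)
{
    if (value < 0) {
        return mock_set_status(MOCK_ERROR_INVALID_ARGUMENT,
                               "value must be non-negative");
    }
    return mock_set_status(MOCK_SUCCESS, "");
}

/* Copy the last message into buf without clearing it. Returns the last status. */
MockStatus mock_peek_last_message(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s", g_last_message);
    return g_last_status;
}

/* Copy the last message into buf, then reset the record (assumed behaviour). */
MockStatus mock_get_last_message(char *buf, size_t len)
{
    MockStatus st = mock_peek_last_message(buf, len);
    g_last_status = MOCK_SUCCESS;
    g_last_message[0] = '\0';
    return st;
}
```

With this design the application decides whether and where an error message is shown, which is the behaviour the sample applications adopt through vpiGetLastStatusMessage.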
Documentation Updates
Known Issues
- A VPIEvent that recorded a stream whose last task was submitted to the CUDA backend can only record another stream after the event is signaled. An error is returned if this condition isn't met.
- Convert Image Format on CUDA might introduce a small error of at most 2 when compared with other backends.
- If there's a backend mismatch between a memory buffer and the stream that operates on it, the stream will issue more memory mapping operations than strictly needed. To mitigate the performance hit that might arise, make sure that the memory buffers used on the stream already reside in the stream's backend. For instance, in a stream for the CUDA backend, buffers created with the VPI_BACKEND_CUDA flag will perform better because memory mapping isn't needed, since the memory is allocated in the GPU itself.
- PVA backend implementation of KLT Feature Tracker doesn't match CUDA and CPU's output.
- PVA backend implementation of vpiSubmitConvolution currently doesn't work with 3264x2448 inputs; it returns an error instead.
- Some algorithms, notably Convolution, might segfault on Jetson Nano if the input image is too big, such as 4064x2704 on the CPU backend.
- Harris Corner Detector on PVA may return spurious keypoints when the input image is larger than 1088x1088.
- A small memory leak could occur if the same image wrapping a user-provided EGLImage or CUDA memory is used simultaneously as the input image in multiple PVA streams.
- In some rare instances, a moderately complex processing pipeline might erroneously return VPI_ERROR_BUFFER_LOCKED when performing memory mapping.
- CPU to CUDA image shared mapping of wrapped non-CUDA-managed CPU memory had to be disabled due to some rare segfaults. In this case, memory mapping is now done via memory copies.
- vpiStreamWaitFor on a CUDA stream that is wrapping a user-provided cudaStream_t might block the calling thread until the event is signaled.
- Stereo Disparity Estimator output might slightly differ on CPU backend with respect to PVA and CUDA backends.
- Harris Corner Detector result scores/positions might differ among backends.
- Output of Remap from CPU and CUDA backends has a subpixel translation with respect to VIC's output.
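
Several of the issues above mean that outputs from different backends cannot be compared bit for bit. For the Convert Image Format case, where the CUDA result may differ by at most 2, a tolerance-based comparison is one way to validate results. The helpers below are a minimal sketch and not part of VPI. The function names are made up for this example, and the code assumes 8-bit samples stored in tightly packed buffers of equal length, with the "error of at most 2" interpreted as a per-sample absolute difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest absolute per-sample difference between two 8-bit buffers. */
int max_abs_diff_u8(const uint8_t *a, const uint8_t *b, size_t count)
{
    assert(a != NULL && b != NULL);
    int max_diff = 0;
    for (size_t i = 0; i < count; ++i) {
        int d = (int)a[i] - (int)b[i];
        if (d < 0) {
            d = -d;
        }
        if (d > max_diff) {
            max_diff = d;
        }
    }
    return max_diff;
}

/* Nonzero if every sample differs by no more than the given tolerance. */
int outputs_match_u8(const uint8_t *a, const uint8_t *b, size_t count,
                     int tolerance)
{
    return max_abs_diff_u8(a, b, count) <= tolerance;
}
```

For images whose rows are padded, the comparison would have to be applied row by row over the valid width only, so that padding bytes are not compared.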
Notices
Disclaimer
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication of otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.
Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.
Copyright
© 2019-2020 NVIDIA Corporation. All rights reserved.