Comparison with VPI-0.1.0

VPI-0.2.0 improves the performance of most of its algorithms across all backends compared to VPI-0.1.0.

The charts below show the performance increase per backend and per device. The benchmarking procedure used is described here.

Algorithm Parameters

Here is a list of the parameters used to benchmark the algorithms above. It helps put the values above into context.

Gaussian Pyramid Generator
- CPU/CUDA: 8-bit 1920x1080 input, 4 levels, 0.5x scale (each level has half the width and height of the previous one, i.e. 4x fewer pixels).
- PVA: 16-bit 3264x2448 input, 5 levels, 0.5x scale.
Gaussian Image Filter
- CPU/CUDA: 8-bit 1920x1080 input, 3x3 kernel support size.
- PVA: 8-bit 3264x2448 input, 3x3 kernel support size.
Image Convolver
- CPU/CUDA: 8-bit 1920x1080 input, 5x5 kernel support size.
- PVA: 8-bit 3264x2448 input, 5x5 kernel support size.
Box Image Filter
- CPU/CUDA: 8-bit 1920x1080 input, 3x3 kernel support, clamp boundary condition.
- PVA: 8-bit 3264x2448 input, 3x3 kernel support, zero boundary condition.
Image Resampler
- CPU/CUDA: 8-bit 480x270 input, upscaled to 1920x1080 with linear interpolation.
Stereo Disparity Estimator
- CPU/CUDA/PVA: 16-bit 480x270 input.
Bilateral Image Filter
- CPU/CUDA: 8-bit 3264x2448 input, 5x5 spatial kernel support size.
Separable Image Convolver
- CPU/CUDA: 16-bit 1920x1080 input, 5x5 kernel support size.
- PVA: 16-bit 3264x2448 input, 5x5 kernel support size.
Harris Keypoint Detector
- CPU/CUDA: 16-bit 1920x1080 input, 3x3 gradient kernel size, 3x3 block size.
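To make the Gaussian Pyramid Generator parameters above concrete, here is a minimal sketch of the per-level dimensions implied by a 0.5x scale. It is plain Python and does not call VPI. The function name `pyramid_level_sizes` is hypothetical, and two things are assumptions rather than statements from this page: that the level count includes the full-resolution input as level 0, and that fractional sizes are truncated (truncation never occurs for the two configurations listed here, since every division is exact).

```python
def pyramid_level_sizes(width, height, levels, scale=0.5):
    """Return a list of (width, height) tuples, one per pyramid level.

    Assumption: level 0 is the input itself, so `levels` counts it.
    Assumption: fractional sizes are truncated toward zero.
    """
    sizes = []
    w, h = width, height
    for _ in range(levels):
        sizes.append((w, h))
        # Each dimension shrinks by `scale`, so the pixel count
        # shrinks by scale**2 (4x fewer pixels for scale=0.5).
        w = max(1, int(w * scale))
        h = max(1, int(h * scale))
    return sizes


if __name__ == "__main__":
    for name, (w, h, n) in {
        "CPU/CUDA": (1920, 1080, 4),
        "PVA": (3264, 2448, 5),
    }.items():
        print(name, pyramid_level_sizes(w, h, n))
```

Under these assumptions the CPU/CUDA configuration ends at a 240x135 level and the PVA configuration at 204x153, so the coarsest levels contribute very little to the total pixel count being processed.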