The algorithms supported by VPI are described in the following sections. Each section provides an overview of the algorithm, instructions on how to use it, a list of any backend limitations, and the API reference.

Algorithm                     Backend Support
                              CPU    CUDA   PVA
Stereo Disparity Estimator    yes    yes    yes
KLT Bounding Box Tracker      yes    yes    yes
Gaussian Pyramid Generator    yes    yes    yes
Image Convolver               yes    yes    yes
Separable Image Convolver     yes    yes    yes
Box Image Filter              yes    yes    yes
Gaussian Image Filter         yes    yes    yes
Bilateral Image Filter        yes    yes    no
Image Resampler               yes    yes    no
Harris Keypoint Detector      yes    yes    yes
Image FFT                     yes    yes    no
Image Inverse FFT             yes    yes    no
Image Format Converter        yes    yes    no

Performance Measurement

Most algorithm description pages include a section showing the running time of a single call (iteration), along with the parameters used. This information can help performance-critical applications evaluate how particular parameter and backend combinations affect overall performance, for example when deciding:

Which Jetson device to use,
Which backend to use,
What speed/quality trade-off to accept.

Benchmarking Method

This section describes in detail the benchmark procedure used to obtain the performance numbers, so that readers understand the context in which those numbers were measured.

1. All payloads and input and output memory buffers are created beforehand.
2. The algorithm runs in a loop for 1 second as a warm-up.
3. The algorithm runs in batches, and its average running time within each batch is measured. The number of calls in a batch depends on the approximate running time: the faster the algorithm, the larger the batch, up to a maximum of 100 calls. This keeps the time spent performing the measurement itself out of the algorithm's run time.
4. Item 3 is repeated for at least 5 seconds, and for at least 10 batches.
5. From the average running times of all batches, the lowest 5% and the highest 5% of values are discarded.
6. The median of the remaining values is taken as the algorithm's final run time.
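The procedure above can be sketched in Python to make the batching and trimming rules concrete. This is a minimal sketch, not the harness used for the published numbers: `run_once` is a hypothetical stand-in for one algorithm call on preallocated buffers, and for asynchronous backends it must block until the work completes (for example by synchronizing the VPI stream). The 10 ms per-batch target used to size batches is an assumption; the source only states that faster algorithms get larger batches, capped at 100 calls.

```python
import statistics
import time
from typing import Callable, List


def choose_batch_size(approx_call_s: float, target_batch_s: float = 0.01,
                      max_calls: int = 100) -> int:
    """Pick more calls per batch for faster algorithms, capped at max_calls.

    target_batch_s (hypothetical, 10 ms) is the rough wall time one batch
    should take; only the 100-call cap comes from the procedure.
    """
    if approx_call_s <= 0:
        return max_calls
    return max(1, min(max_calls, int(target_batch_s / approx_call_s)))


def trimmed_median(samples: List[float], trim_fraction: float = 0.05) -> float:
    """Discard the lowest and highest trim_fraction of samples, return the median."""
    ordered = sorted(samples)
    k = int(len(ordered) * trim_fraction)  # number of values dropped at each end
    return statistics.median(ordered[k:len(ordered) - k])


def benchmark(run_once: Callable[[], None], warmup_s: float = 1.0,
              min_total_s: float = 5.0, min_batches: int = 10) -> float:
    """Return the per-call run time in seconds, following items 2 to 6.

    Item 1 is the caller's job: run_once must only execute the algorithm on
    buffers allocated beforehand, and must return only once the work is done.
    """
    # Item 2: warm-up loop, which also yields a rough per-call estimate.
    calls = 0
    start = time.perf_counter()
    while time.perf_counter() - start < warmup_s:
        run_once()
        calls += 1
    approx_call_s = (time.perf_counter() - start) / max(calls, 1)
    batch_size = choose_batch_size(approx_call_s)

    # Items 3 and 4: time whole batches for at least min_total_s seconds and
    # at least min_batches batches; the clock is read once per batch, not per call.
    batch_means: List[float] = []
    bench_start = time.perf_counter()
    while (time.perf_counter() - bench_start < min_total_s
           or len(batch_means) < min_batches):
        t0 = time.perf_counter()
        for _ in range(batch_size):
            run_once()
        batch_means.append((time.perf_counter() - t0) / batch_size)

    # Items 5 and 6: drop the 5% extremes at each end, keep the median.
    return trimmed_median(batch_means)
```

A call such as `benchmark(lambda: sum(range(10_000))) * 1e3` returns the per-call time in milliseconds for a pure-Python stand-in workload. Using the trimmed median rather than the mean keeps a few outlier batches (for example, ones interrupted by the OS scheduler) from skewing the result.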

Device Configuration

To make the measurements reasonably stable across runs, the device's frequency and power settings were set to their maximum values before benchmarking. This also mimics a system under full load, which brings the measured execution times close to their lower bound. In real applications, execution times might be longer, depending on system load, due to frequency throttling and other effects.
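The section does not state which tools were used to pin the clocks. As a hedged sketch, assuming a Linux device that exposes the standard cpufreq sysfs interface (`scaling_cur_freq` and `scaling_max_freq`, both reported in kHz), the following Python checks whether each CPU core is currently running at its maximum frequency. The function name `cpu_clock_status` is hypothetical, and the check covers CPU clocks only; GPU, memory controller and PVA clocks are not checked here.

```python
from pathlib import Path
from typing import Dict, Tuple


def cpu_clock_status(sysfs_root: str = "/sys/devices/system/cpu") -> Dict[str, Tuple[float, bool]]:
    """Map each CPU (e.g. 'cpu0') to (current frequency in GHz, running at max?).

    Assumes the standard Linux cpufreq sysfs files, whose values are in kHz.
    Only CPU clocks are read; GPU, memory and PVA clocks are not covered.
    """
    status: Dict[str, Tuple[float, bool]] = {}
    # Match cpu0, cpu1, ... but not sibling entries such as 'cpufreq' or 'cpuidle'.
    for freq_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*/cpufreq")):
        cur_khz = int((freq_dir / "scaling_cur_freq").read_text().strip())
        max_khz = int((freq_dir / "scaling_max_freq").read_text().strip())
        status[freq_dir.parent.name] = (cur_khz / 1e6, cur_khz == max_khz)
    return status
```

Calling `cpu_clock_status()` right before a benchmark run and confirming that every entry reports `True` is one way to catch a core left under a power-saving governor. The reading is a single snapshot, so it cannot rule out throttling that happens later during the run.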

What follows is a list of all devices used for measurement, along with their frequency and power configurations.

Jetson AGX Xavier

CPU: 8x ARMv8 Processor rev 0 (v8l) running at 2.1606 GHz
Memory controller freq.: 1.9865 GHz
GPU freq.: 1.2824 GHz
PVA/VPS freq.: 1.0133 GHz
PVA/CORE freq.: 805.6641 MHz
Power mode: MAXN
Fan speed: MAX

Note: Even with fixed frequencies and power mode, CPU benchmarks on Jetson AGX Xavier can fluctuate significantly, by up to 2x depending on the algorithm. Keep this jitter in mind when evaluating the CPU performance values reported for this device.

Jetson TX2

CPU: 6x ARMv8 Processor rev 3 (v8l) running at 1.9409 GHz
Memory controller freq.: 1.7378 GHz
GPU freq.: 1.2112 GHz
Power mode: MAXN
Fan speed: MAX

Jetson Nano

CPU: 4x ARMv8 Processor rev 1 (v8l) running at 1.4105 GHz
Memory controller freq.: 1.4901 GHz
GPU freq.: 878.9062 MHz
Power mode: MAXN
Fan speed: MAX