VPI - Vision Programming Interface

0.2.0 Release

Algorithms

The algorithms supported by VPI are described in the following sections. Each section provides an overview of the algorithm, instructions on how to use it, a list of any backend limitations, and the API reference.

Algorithm                       Backend Support
                                CPU   CUDA  PVA
Stereo Disparity Estimator      yes   yes   yes
KLT Bounding Box Tracker        yes   yes   yes
Gaussian Pyramid Generator      yes   yes   yes
Image Convolver                 yes   yes   yes
Separable Image Convolver       yes   yes   yes
Box Image Filter                yes   yes   yes
Gaussian Image Filter           yes   yes   yes
Bilateral Image Filter          yes   yes   no
Image Resampler                 yes   yes   no
Harris Keypoint Detector        yes   yes   yes
Image FFT                       yes   yes   no
Image Inverse FFT               yes   yes   no
Image Format Converter          yes   yes   no

Performance Measurement

Most algorithm description pages include a section showing the running time of a single call/iteration, along with the parameters used. This information is useful for performance-critical applications: it lets the user evaluate how a given parameter/backend combination affects application performance, and informs decisions such as:

  • What Jetson device to use,
  • What backend to use,
  • Speed/quality trade-offs.

Benchmarking Method

This section describes in detail the benchmark procedure used to obtain the performance numbers. Knowing the procedure helps in understanding the context these numbers refer to.

  1. All payloads and all input and output memory buffers are created beforehand.
  2. The algorithm is run in a loop for a 1-second warm-up period.
  3. The algorithm is run in batches, and the average running time within each batch is measured. The number of calls in a batch varies with the approximate running time (the faster the algorithm, the larger the batch, up to a maximum of 100 calls). This is done to exclude the time spent performing the measurement from the algorithm's run time.
  4. Item 3 is repeated for at least 5 seconds and at least 10 times.
  5. From the average running times of all batches, the lowest 5% and highest 5% of values are excluded.
  6. The median of the resulting set is taken. This is the value reported as the final run time of the algorithm.
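
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the benchmarking tool actually used: `run_once` is a hypothetical callable that submits one algorithm call and waits for it to finish, `target_batch_s` is an assumed rule for sizing batches (the procedure only states that faster algorithms get larger batches, capped at 100 calls), and step 1 (creating payloads and buffers beforehand) is left to the caller.

```python
import statistics
import time


def trimmed_median(values, trim_fraction=0.05):
    """Drop the lowest and highest trim_fraction of values, return the median."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number of values cut from each end
    return statistics.median(ordered[k:len(ordered) - k])


def benchmark(run_once, warmup_s=1.0, min_total_s=5.0, min_batches=10,
              max_batch_calls=100, target_batch_s=0.05):
    """Return the estimated time of one run_once() call, in seconds.

    run_once is a hypothetical callable wrapping one blocking algorithm call.
    target_batch_s is an assumed batch duration used only to size batches.
    """
    # Step 2: warm up by calling the algorithm in a loop; the call count
    # also gives a rough per-call time for sizing the batches.
    calls = 0
    start = time.perf_counter()
    while True:
        run_once()
        calls += 1
        elapsed = time.perf_counter() - start
        if elapsed >= warmup_s:
            break
    approx_call_s = max(elapsed / calls, 1e-9)

    # Step 3: faster algorithms get larger batches, capped at max_batch_calls.
    batch_calls = max(1, min(max_batch_calls,
                             int(target_batch_s / approx_call_s)))

    # Step 4: measure batches for at least min_total_s seconds and
    # at least min_batches times.
    batch_means = []
    start = time.perf_counter()
    while (time.perf_counter() - start < min_total_s
           or len(batch_means) < min_batches):
        t0 = time.perf_counter()
        for _ in range(batch_calls):
            run_once()
        t1 = time.perf_counter()
        # Timing the whole batch keeps timer overhead out of each call.
        batch_means.append((t1 - t0) / batch_calls)

    # Steps 5 and 6: discard the extreme 5% on each side, take the median.
    return trimmed_median(batch_means)
```

Timing a whole batch with one pair of clock reads, rather than timing each call, is what keeps the measurement overhead small relative to the measured run time. Note that with only 10 batches the 5% trim removes nothing; it takes effect once at least 20 batches have been collected.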

Device Configuration

To make the measurements somewhat stable across runs, the device's frequency and power parameters were maxed out prior to benchmarking. This also mimics the situation where the system is under full load, thus making execution time closer to the lower bound. In real applications, depending on the system load, the execution time might be longer due to frequency throttling and other effects.

What follows is a list of all devices used for measurement, along with their frequency and power configurations.

Jetson AGX Xavier

  • CPU: 8x ARMv8 Processor rev 0 (v8l) running at 2.1606 GHz
  • Memory controller freq.: 1.9865 GHz
  • GPU freq.: 1.2824 GHz
  • PVA/VPS freq.: 1.0133 GHz
  • PVA/CORE freq.: 805.6641 MHz
  • Power mode: MAXN
  • Fan speed: MAX

Note: Despite fixing the frequencies and power mode on Jetson AGX Xavier, CPU benchmarks on this device can fluctuate significantly, by up to 2x depending on the algorithm. Its CPU performance values must be evaluated with this jitter in mind.

Jetson TX2

  • CPU: 6x ARMv8 Processor rev 3 (v8l) running at 1.9409 GHz
  • Memory controller freq.: 1.7378 GHz
  • GPU freq.: 1.2112 GHz
  • Power mode: MAXN
  • Fan speed: MAX

Jetson Nano

  • CPU: 4x ARMv8 Processor rev 1 (v8l) running at 1.4105 GHz
  • Memory controller freq.: 1.4901 GHz
  • GPU freq.: 878.9062 MHz
  • Power mode: MAXN
  • Fan speed: MAX