In this section we compare VPI's performance with other well-known Computer Vision libraries. Performance numbers were collected following the method described in Benchmarking Method.

Benchmarking was done on NVIDIA® Jetson AGX Orin™ devices, with clock frequencies set to their maximum.

The numbers show that VPI provides a significant speed-up in many use cases, although OpenCV remains faster for a few algorithms (speed-up below 1x).

OpenCV

The comparison uses OpenCV 4.5.4 built with NVIDIA® CUDA® support enabled. This is the same OpenCV version that ships with NVIDIA® JetPack™.

All plots use a logarithmic scale because performance numbers differ widely from one algorithm to another.

CPU Performance

Both OpenCV and VPI measurements use a single dispatch thread. Once dispatched, many OpenCV algorithms use multiple CPU cores during execution, but others do not. VPI, in contrast, always uses all available CPU cores.

The main implication is that OpenCV algorithms that use only one core can run several instances in parallel, up to the number of CPU cores, without affecting each instance's performance. VPI CPU algorithms, on the other hand, already occupy every core, so their execution time grows roughly linearly with the number of instances running in parallel. The advantage for VPI is that its performance improves roughly linearly as more cores become available, whereas the performance of OpenCV's single-threaded algorithms stays unchanged.
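The trade-off above can be made concrete with a small idealized model. The Python sketch below is an illustration only, not VPI or OpenCV code: it assumes perfect time-sharing of cores, no memory-bandwidth or cache contention, and perfectly linear multi-core scaling. The function names and the latencies `T1` and `T_ALL` are hypothetical; only the twelve-core count comes from this section.

```python
# Idealized model of running N parallel instances of a CPU algorithm.
# Assumptions (not from the VPI docs): perfect time-sharing, no memory-bandwidth
# or cache contention, and perfectly linear multi-core scaling.


def single_thread_latency(t1_ms: float, instances: int, cores: int) -> float:
    """Per-instance latency of a single-threaded algorithm.

    Up to `cores` instances each get a core of their own, so latency stays
    at t1_ms; beyond that, instances time-share the cores.
    """
    return t1_ms * max(1.0, instances / cores)


def all_core_latency(t_all_ms: float, instances: int) -> float:
    """Per-instance latency of an algorithm that already uses every core.

    Parallel instances compete for the same cores, so latency grows
    linearly with the number of instances.
    """
    return t_all_ms * instances


def rescale_all_core(t_all_ms: float, old_cores: int, new_cores: int) -> float:
    """Latency of an all-core algorithm moved to a CPU with a different core count."""
    return t_all_ms * old_cores / new_cores


if __name__ == "__main__":
    CORES = 12     # core count cited for Jetson AGX Orin in this section
    T1 = 1.2       # hypothetical single-threaded latency, ms
    T_ALL = 0.15   # hypothetical all-core latency, ms
    for n in (1, 4, 12):
        print(f"{n:2d} instances: single-thread {single_thread_latency(T1, n, CORES):.2f} ms, "
              f"all-core {all_core_latency(T_ALL, n):.2f} ms")
    print(f"all-core on 16 cores: {rescale_all_core(T_ALL, CORES, 16):.4f} ms")
```

With these made-up numbers the all-core version is faster for a single instance, but at twelve parallel instances the single-threaded version has the lower per-instance latency, because `T_ALL` was chosen to scale less than perfectly (1.2 / 12 would be 0.1 ms). `rescale_all_core` captures the opposite effect: only the all-core algorithm benefits from extra cores.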

Jetson AGX Orin CPU with twelve cores.

OpenCV vs. VPI - CPU performance
Algorithm	Parameters	OpenCV 4.5.4 CPU	VPI 2.0 CPU	Speed-up (OpenCV / VPI)
Gaussian Pyramid	1920x1080 U8 scale=0.5, 5 levels	0.299 ms	0.399 ms	0.75x
Gaussian Filter	1920x1080 U8 3x3	0.251 ms	0.204 ms	1.23x
Convolution	1920x1080 U8 3x3	6.871 ms	0.315 ms	21.81x
Box Filter	1920x1080 U8 3x3 clamp	1.520 ms	0.272 ms	5.59x
Rescale	1280x720 -> 1920x1080 RGBA8, linear interp.	4.260 ms	6.542 ms	0.65x
Bilateral Filter	1920x1080 U8 3x3	2.012 ms	1.579 ms	1.27x
Separable Convolution	1920x1080 U8 11x11	18.250 ms	0.545 ms	33.49x
FFT	626x626 Real->Complex	81.060 ms	19.560 ms	4.14x
Harris Corner Detector	1920x1080 U8 grad=3x3, win=3x3	39.400 ms	7.410 ms	5.32x
Convert Image Format	1920x1080 NV12_ER to RGBA8	1.866 ms	0.850 ms	2.20x
Remap	1920x1080 RGBA8 dense, linear interp.	6.280 ms	5.320 ms	1.18x
Pyramidal LK Optical Flow	1920x1080 U8 3x3, 3 levels, win=11x11	1.610 ms	3.030 ms	0.53x
Laplacian Pyramid	1920x1080 U8 -> S16, scale=0.5, 5 levels	5.100 ms	2.880 ms	1.77x
Histogram	1920x1080 U8, [0,256) range, 256 bins	1.35 ms	3.3 ms	0.41x
Equalize Histogram	1920x1080 U8	0.452 ms	0.339 ms	1.33x
Erode	1920x1080 U8 3x3	0.530 ms	0.117 ms	4.54x
Min/Max Location	1920x1080 U8 2 locations, min+max	0.241 ms	0.280 ms	0.86x
Image Flip	1920x1080 RGBA8 both	1.630 ms	0.245 ms	6.65x
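The Speed-up column is the OpenCV time divided by the VPI time, so values above 1x mean VPI is faster and values below 1x mean OpenCV is faster. The following Python sketch, which is not part of VPI or its benchmark tooling, recomputes that ratio from the CPU table's timings and lists the algorithms where OpenCV wins. The names `CPU_ROWS`, `speed_up` and `opencv_faster` are illustrative only.

```python
# (algorithm, OpenCV 4.5.4 CPU time in ms, VPI 2.0 CPU time in ms), from the CPU table
CPU_ROWS = [
    ("Gaussian Pyramid", 0.299, 0.399),
    ("Gaussian Filter", 0.251, 0.204),
    ("Convolution", 6.871, 0.315),
    ("Box Filter", 1.520, 0.272),
    ("Rescale", 4.260, 6.542),
    ("Bilateral Filter", 2.012, 1.579),
    ("Separable Convolution", 18.250, 0.545),
    ("FFT", 81.060, 19.560),
    ("Harris Corner Detector", 39.400, 7.410),
    ("Convert Image Format", 1.866, 0.850),
    ("Remap", 6.280, 5.320),
    ("Pyramidal LK Optical Flow", 1.610, 3.030),
    ("Laplacian Pyramid", 5.100, 2.880),
    ("Histogram", 1.35, 3.3),
    ("Equalize Histogram", 0.452, 0.339),
    ("Erode", 0.530, 0.117),
    ("Min/Max Location", 0.241, 0.280),
    ("Image Flip", 1.630, 0.245),
]


def speed_up(opencv_ms: float, vpi_ms: float) -> float:
    """Speed-up of VPI over OpenCV: above 1.0 means VPI is faster."""
    return opencv_ms / vpi_ms


def opencv_faster(rows):
    """Names of the algorithms whose speed-up is below 1x, in table order."""
    return [name for name, cv_ms, vpi_ms in rows if speed_up(cv_ms, vpi_ms) < 1.0]


if __name__ == "__main__":
    for name, cv_ms, vpi_ms in CPU_ROWS:
        print(f"{name:28s} {speed_up(cv_ms, vpi_ms):6.2f}x")
    print("OpenCV faster on:", ", ".join(opencv_faster(CPU_ROWS)))
```

Because the table lists rounded times, a recomputed ratio can differ from the published column in the last digit (for example, Erode).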

CUDA Performance

Both OpenCV and VPI benchmarks use a single CUDA stream for algorithm execution.

OpenCV vs. VPI - CUDA performance
Algorithm	Parameters	OpenCV 4.5.4 CUDA	VPI 2.0 CUDA	Speed-up (OpenCV / VPI)
Gaussian Pyramid	1920x1080 U8 scale=0.5, 5 levels	0.143 ms	0.040 ms	3.54x
Gaussian Filter	1920x1080 U8 3x3	0.141 ms	0.035 ms	4.36x
Convolution	1920x1080 U8 3x3	0.228 ms	0.034 ms	6.68x
Box Filter	1920x1080 U8 3x3 clamp	0.205 ms	0.045 ms	6.06x
Rescale	1280x720 -> 1920x1080 RGBA8, linear interp.	0.114 ms	0.078 ms	1.84x
Bilateral Filter	1920x1080 U8 3x3	0.295 ms	0.064 ms	4.61x
Separable Convolution	1920x1080 U8 11x11	0.220 ms	0.056 ms	3.93x
FFT	626x626 Real->Complex	2.162 ms	0.190 ms	11.37x
Harris Corner Detection	1920x1080 U8 grad=3x3, win=3x3	19.480 ms	0.425 ms	45.87x
Remap	1920x1080 RGBA8, dense, linear interp.	0.246 ms	0.199 ms	1.24x
Pyramidal LK Optical Flow	1920x1080 RGBA8 dense, linear interp.	0.949 ms	0.212 ms	4.48x
Laplacian Pyramid	1920x1080 U8 -> S16, scale=0.5, 5 levels	0.355 ms	0.608 ms	0.58x
Histogram	1920x1080 U8, [0,256) range, 256 bins	0.041 ms	0.033 ms	1.23x
Equalize Histogram	1920x1080 U8	0.350 ms	0.090 ms	3.89x
Erode	1920x1080 U8 3x3	0.246 ms	0.031 ms	7.96x
Min/Max Location	1920x1080 U8 2 locations, min+max	0.700 ms	0.042 ms	16.51x
Image Flip	1920x1080 RGBA8 both	0.155 ms	0.093 ms	1.66x
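When a single summary figure is wanted for a column of speed-ups, a geometric mean is the usual choice: it treats a 2x gain and a 0.5x loss symmetrically, matching the logarithmic scale of the plots, and a single outlier such as Harris Corner Detection does not dominate it the way it would an arithmetic mean. The sketch below is an illustration, not part of the original benchmark; it applies Python's standard `statistics.geometric_mean` to the Speed-up column of the CUDA table. `CUDA_SPEEDUPS` is an illustrative name.

```python
from statistics import geometric_mean

# Speed-up column (OpenCV time / VPI time) of the CUDA table
CUDA_SPEEDUPS = {
    "Gaussian Pyramid": 3.54,
    "Gaussian Filter": 4.36,
    "Convolution": 6.68,
    "Box Filter": 6.06,
    "Rescale": 1.84,
    "Bilateral Filter": 4.61,
    "Separable Convolution": 3.93,
    "FFT": 11.37,
    "Harris Corner Detection": 45.87,
    "Remap": 1.24,
    "Pyramidal LK Optical Flow": 4.48,
    "Laplacian Pyramid": 0.58,
    "Histogram": 1.23,
    "Equalize Histogram": 3.89,
    "Erode": 7.96,
    "Min/Max Location": 16.51,
    "Image Flip": 1.66,
}

if __name__ == "__main__":
    # Geometric mean = exp(mean(log(speed-up))), so ratios above and below 1x balance out
    summary = geometric_mean(CUDA_SPEEDUPS.values())
    slower = [name for name, s in CUDA_SPEEDUPS.items() if s < 1.0]
    print(f"geometric-mean speed-up: {summary:.2f}x")
    print("OpenCV faster on:", ", ".join(slower))
```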

VPI - Vision Programming Interface

3.2 Release
