VPI - Vision Programming Interface

3.0 Release

Benchmarking

Overview

This application shows how to properly measure the time taken to process VPI tasks. It follows a simplified version of the procedure described in the Benchmarking Method section.

In a nutshell, the sample returns the median of a series of 20 measurements of the execution time of the 5x5 Gaussian Filter algorithm. Each measurement runs the algorithm 50 times and takes the average running time. Measuring in batches and taking the median of the batch averages leads to a more stable elapsed time value, as it reduces the influence of external perturbations.

Instructions

The usage is:

./vpi_sample_05_benchmark <backend>

where

  • backend: either cpu, cuda or pva; it defines the backend that will perform the processing.

Here's one example:

./vpi_sample_05_benchmark cuda

This runs the benchmark on the CUDA backend. Try other backends to see how the processing time differs between them.

Results

Note
The benchmark results shown below are for demonstration purposes only, so the hardware used is not specified. You should run the sample on the platform for which you want to gather benchmarking information.

CPU Backend

Input size: 1920 x 1080
Image format: VPI_IMAGE_FORMAT_U16
Algorithm: 5x5 Gaussian Filter
Approximated elapsed time per call: 2.368492 ms

CUDA Backend

Input size: 1920 x 1080
Image format: VPI_IMAGE_FORMAT_U16
Algorithm: 5x5 Gaussian Filter
Approximated elapsed time per call: 0.043012 ms

PVA Backend

Input size: 1920 x 1080
Image format: VPI_IMAGE_FORMAT_U16
Algorithm: 5x5 Gaussian Filter
NVMEDIA_ARRAY: 53, Version 2.1
NVMEDIA_VPI : 172, Version 2.4
Approximated elapsed time per call: 1.692010 ms

Source Code

For convenience, here's the code that is also installed in the samples directory.

#include <vpi/Event.h>
#include <vpi/Image.h>
#include <vpi/Status.h>
#include <vpi/Stream.h>
#include <vpi/algo/GaussianFilter.h>

#include <algorithm>
#include <cstdio>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Throws std::runtime_error with the VPI status name and message
// whenever a VPI call doesn't return VPI_SUCCESS.
#define CHECK_STATUS(STMT) \
    do \
    { \
        VPIStatus status = (STMT); \
        if (status != VPI_SUCCESS) \
        { \
            char buffer[VPI_MAX_STATUS_MESSAGE_LENGTH]; \
            vpiGetLastStatusMessage(buffer, sizeof(buffer)); \
            std::ostringstream ss; \
            ss << vpiStatusGetName(status) << ": " << buffer; \
            throw std::runtime_error(ss.str()); \
        } \
    } while (0)

int main(int argc, char *argv[])
{
    VPIImage image    = NULL;
    VPIImage blurred  = NULL;
    VPIStream stream  = NULL;

    VPIEvent evStart = NULL;
    VPIEvent evStop  = NULL;

    int retval = 0;

    try
    {
        // 1. Processing of command line parameters -----------

        if (argc != 2)
        {
            throw std::runtime_error(std::string("Usage: ") + argv[0] + " <cpu|pva|cuda>");
        }

        std::string strBackend = argv[1];

        // Parse the backend
        VPIBackend backend;

        if (strBackend == "cpu")
        {
            backend = VPI_BACKEND_CPU;
        }
        else if (strBackend == "cuda")
        {
            backend = VPI_BACKEND_CUDA;
        }
        else if (strBackend == "pva")
        {
            backend = VPI_BACKEND_PVA;
        }
        else
        {
            throw std::runtime_error("Backend '" + strBackend +
                                     "' not recognized, it must be either cpu, cuda or pva.");
        }

        // 2. Initialization stage ----------------------

        // Create the stream where the processing will be submitted.
        CHECK_STATUS(vpiStreamCreate(0, &stream));

        int width = 1920, height = 1080;
        VPIImageFormat imgFormat = VPI_IMAGE_FORMAT_U16;

        std::cout << "Input size: " << width << " x " << height << '\n'
                  << "Image format: " << vpiImageFormatGetName(imgFormat) << '\n'
                  << "Algorithm: 5x5 Gaussian Filter" << std::endl;

        // Memory flags set to guarantee top performance.
        // Only the benchmarked backend is enabled, and memories
        // are guaranteed to be used by only one stream.
        uint64_t memFlags = backend | VPI_EXCLUSIVE_STREAM_ACCESS;

        // Create the input image with zero content
        CHECK_STATUS(vpiImageCreate(width, height, imgFormat, memFlags, &image));

        // Create the output image that will receive the input
        // convolved with a low-pass filter.
        CHECK_STATUS(vpiImageCreate(width, height, imgFormat, memFlags, &blurred));

        // Create the events we'll need to get timing info
        CHECK_STATUS(vpiEventCreate(0, &evStart));
        CHECK_STATUS(vpiEventCreate(0, &evStop));

        // 3. Gather timings --------------------

        const int BATCH_COUNT     = 20;
        const int AVERAGING_COUNT = 50;

        // Collect measurements for each execution batch
        std::vector<float> timingsMS;
        for (int batch = 0; batch < BATCH_COUNT; ++batch)
        {
            // Record stream queue when we start processing
            CHECK_STATUS(vpiEventRecord(evStart, stream));

            // Get the average running time within this batch.
            for (int i = 0; i < AVERAGING_COUNT; ++i)
            {
                // Call the algorithm to be measured.
                CHECK_STATUS(vpiSubmitGaussianFilter(stream, backend, image, blurred, 5, 5, 1, 1, VPI_BORDER_ZERO));
            }

            // Record stream queue just after blurring
            CHECK_STATUS(vpiEventRecord(evStop, stream));

            // Wait until the batch processing is done
            CHECK_STATUS(vpiEventSync(evStop));

            float elapsedMS;
            CHECK_STATUS(vpiEventElapsedTimeMillis(evStart, evStop, &elapsedMS));
            timingsMS.push_back(elapsedMS / AVERAGING_COUNT);
        }

        // 4. Performance analysis ----------------------

        // Get the median of the measurements so that outliers aren't considered.
        std::nth_element(timingsMS.begin(), timingsMS.begin() + timingsMS.size() / 2, timingsMS.end());
        float medianMS = timingsMS[timingsMS.size() / 2];

        printf("Approximated elapsed time per call: %f ms\n", medianMS);
    }
    catch (const std::exception &e)
    {
        std::cerr << e.what() << std::endl;
        retval = 1;
    }

    // 5. Clean up -----------------------------------

    // Destroy stream first, it'll make sure all processing
    // submitted to it is finished.
    vpiStreamDestroy(stream);

    // Now we can destroy other VPI objects, since they aren't being
    // used anymore.
    vpiImageDestroy(image);
    vpiImageDestroy(blurred);
    vpiEventDestroy(evStart);
    vpiEventDestroy(evStop);

    return retval;
}