VPI - Vision Programming Interface

3.2 Release

Benchmarking

Overview

This application shows how to properly measure the time taken to process VPI tasks. It follows a simplified version of the procedure described in the Benchmarking Method section.

In a nutshell, the sample reports the median of a series of measurements of the execution time of the 5x5 Gaussian Filter algorithm. Each measurement runs the algorithm 50 times and takes the average running time per call. Measuring in batches and taking the median of the batch averages yields a more stable elapsed time, because isolated external perturbations show up as outliers that the median ignores.
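
The batching scheme can be sketched independently of VPI. The version below is a minimal illustration, not the sample's implementation: it assumes host-side wall-clock timing with std::chrono, whereas the sample times the stream with VPI events, and the names upperMedian, averageBatchMs and medianOfBatchesMs are hypothetical helpers introduced only for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the element at index size()/2 of the sorted samples.
// For an even count this is the upper of the two middle values.
inline double upperMedian(std::vector<double> samples)
{
    assert(!samples.empty());
    auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Runs `work` callsPerBatch times and returns the average wall-clock
// time per call, in milliseconds (assumption: host timer, not VPI events).
template <typename Work>
double averageBatchMs(Work &&work, int callsPerBatch)
{
    assert(callsPerBatch > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < callsPerBatch; ++i)
    {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / callsPerBatch;
}

// Collects one average per batch and returns the median of those averages,
// so that a batch disturbed by an external event does not skew the result.
template <typename Work>
double medianOfBatchesMs(Work &&work, int batchCount, int callsPerBatch)
{
    assert(batchCount > 0);
    std::vector<double> perBatch;
    perBatch.reserve(static_cast<std::size_t>(batchCount));
    for (int b = 0; b < batchCount; ++b)
    {
        perBatch.push_back(averageBatchMs(work, callsPerBatch));
    }
    return upperMedian(std::move(perBatch));
}
```

Calling medianOfBatchesMs(work, 20, 50) mirrors the batch and averaging counts used in the sample's source code. A single disturbed batch, such as one average of 100 ms among values near 2 ms, changes the mean substantially but leaves the median almost untouched, which is the reason for the final median step.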

Instructions

The usage is:

./vpi_sample_05_benchmark <backend>

where

  • backend: either cpu, cuda or pva; it defines the backend that will perform the processing.

Here's one example:

./vpi_sample_05_benchmark cuda

This runs the benchmark on the CUDA backend. Try the other backends to see how the processing time differs between them.

Results

Note
The benchmark results shown below are for demonstration purposes only, so the hardware used is not specified. Run the sample on the platform for which you want to gather benchmarking information.

CPU Backend

Input size: 1920 x 1080
Image format: VPI_IMAGE_FORMAT_U16
Algorithm: 5x5 Gaussian Filter
Approximated elapsed time per call: 2.368492 ms

CUDA Backend

Input size: 1920 x 1080
Image format: VPI_IMAGE_FORMAT_U16
Algorithm: 5x5 Gaussian Filter
Approximated elapsed time per call: 0.043012 ms

PVA Backend

Input size: 1920 x 1080
Image format: VPI_IMAGE_FORMAT_U16
Algorithm: 5x5 Gaussian Filter
NVMEDIA_ARRAY: 53, Version 2.1
NVMEDIA_VPI : 172, Version 2.4
Approximated elapsed time per call: 1.692010 ms

Source Code

For convenience, here's the code that is also installed in the samples directory.

#include <vpi/Event.h>
#include <vpi/Image.h>
#include <vpi/Status.h>
#include <vpi/Stream.h>
#include <vpi/algo/GaussianFilter.h>

#include <algorithm>
#include <cstdio>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Evaluates a VPI call and throws std::runtime_error with the
// status name and last status message if it did not succeed.
#define CHECK_STATUS(STMT)                                    \
    do                                                        \
    {                                                         \
        VPIStatus status = (STMT);                            \
        if (status != VPI_SUCCESS)                            \
        {                                                     \
            char buffer[VPI_MAX_STATUS_MESSAGE_LENGTH];       \
            vpiGetLastStatusMessage(buffer, sizeof(buffer));  \
            std::ostringstream ss;                            \
            ss << "" #STMT "\n";                              \
            ss << vpiStatusGetName(status) << ": " << buffer; \
            throw std::runtime_error(ss.str());               \
        }                                                     \
    } while (0)

int main(int argc, char *argv[])
{
    VPIImage image    = NULL;
    VPIImage blurred  = NULL;
    VPIStream stream  = NULL;

    VPIEvent evStart = NULL;
    VPIEvent evStop  = NULL;

    int retval = 0;

    try
    {
        // 1. Processing of command line parameters -----------

        if (argc != 2)
        {
            throw std::runtime_error(std::string("Usage: ") + argv[0] + " <cpu|pva|cuda>");
        }

        std::string strBackend = argv[1];

        // Parse the backend
        VPIBackend backend;

        if (strBackend == "cpu")
        {
            backend = VPI_BACKEND_CPU;
        }
        else if (strBackend == "cuda")
        {
            backend = VPI_BACKEND_CUDA;
        }
        else if (strBackend == "pva")
        {
            backend = VPI_BACKEND_PVA;
        }
        else
        {
            throw std::runtime_error("Backend '" + strBackend +
                                     "' not recognized, it must be either cpu, cuda or pva.");
        }

        // 2. Initialization stage ----------------------

        // Create the stream for only the target backend.
        uint64_t streamFlags = (uint64_t)backend | VPI_REQUIRE_BACKENDS;
        CHECK_STATUS(vpiStreamCreate(streamFlags, &stream));

        int width = 1920, height = 1080;
        VPIImageFormat imgFormat = VPI_IMAGE_FORMAT_U16;

        std::cout << "Input size: " << width << " x " << height << '\n'
                  << "Image format: " << vpiImageFormatGetName(imgFormat) << '\n'
                  << "Algorithm: 5x5 Gaussian Filter on " << strBackend << std::endl;

        // Memory flags set to guarantee top performance.
        // Only the benchmarked backend is enabled, and memories
        // are guaranteed to be used by only one stream.
        uint64_t memFlags = (uint64_t)backend | VPI_EXCLUSIVE_STREAM_ACCESS;

        // Create the input image with zero content
        CHECK_STATUS(vpiImageCreate(width, height, imgFormat, memFlags, &image));

        // Create the output image that receives the low-pass filtered result.
        CHECK_STATUS(vpiImageCreate(width, height, imgFormat, memFlags, &blurred));

        // Create the events we'll need to get timing info
        CHECK_STATUS(vpiEventCreate(backend, &evStart));
        CHECK_STATUS(vpiEventCreate(backend, &evStop));

        // 3. Gather timings --------------------

        const int BATCH_COUNT     = 20;
        const int AVERAGING_COUNT = 50;

        // Collect measurements for each execution batch
        std::vector<float> timingsMS;
        for (int batch = 0; batch < BATCH_COUNT; ++batch)
        {
            // Record stream queue when we start processing
            CHECK_STATUS(vpiEventRecord(evStart, stream));

            // Get the average running time within this batch.
            for (int i = 0; i < AVERAGING_COUNT; ++i)
            {
                // Call the algorithm to be measured.
                CHECK_STATUS(vpiSubmitGaussianFilter(stream, backend, image, blurred, 5, 5, 1, 1, VPI_BORDER_ZERO));
            }

            // Record stream queue just after blurring
            CHECK_STATUS(vpiEventRecord(evStop, stream));

            // Wait until the batch processing is done
            CHECK_STATUS(vpiEventSync(evStop));

            float elapsedMS;
            CHECK_STATUS(vpiEventElapsedTimeMillis(evStart, evStop, &elapsedMS));
            timingsMS.push_back(elapsedMS / AVERAGING_COUNT);
        }

        // 4. Performance analysis ----------------------

        // Get the median of the measurements so that outliers aren't considered.
        std::nth_element(timingsMS.begin(), timingsMS.begin() + timingsMS.size() / 2, timingsMS.end());
        float medianMS = timingsMS[timingsMS.size() / 2];

        printf("Approximated elapsed time per call on %s: %f ms\n", strBackend.c_str(), medianMS);
    }
    catch (const std::exception &e)
    {
        std::cerr << e.what() << std::endl;
        retval = 1;
    }

    // 5. Clean up -----------------------------------

    // Destroy stream first, it'll make sure all processing
    // submitted to it is finished.
    vpiStreamDestroy(stream);

    // Now we can destroy other VPI objects, since they aren't being
    // used anymore.
    vpiImageDestroy(image);
    vpiImageDestroy(blurred);
    vpiEventDestroy(evStart);
    vpiEventDestroy(evStop);

    return retval;
}