VPI - Vision Programming Interface

4.1 Release

Submit CUDA Host Function

Overview

This sample demonstrates how to use vpiSubmitCUDAHostFunction to schedule custom CUDA work (NPP calls and user-written kernels) onto a VPI stream alongside VPI algorithm submissions. By using a CUDA host function callback, all GPU work is ordered on a single stream without manual synchronization between VPI and non-VPI stages.

The sample implements a multi-stage image processing pipeline:

  1. VIC: Convert BGR input to grayscale
  2. NPP + custom CUDA kernel: Gaussian blur followed by a top-border overlay
  3. VIC: Vertical flip of the result

Two execution modes are compared:

  • CUDA host function mode: A single VPI stream handles the entire pipeline. The NPP blur and custom kernel are submitted through vpiSubmitCUDAHostFunction, so VPI guarantees correct ordering automatically.
  • Sync mode: A VPI stream wrapping a CUDA stream. Manual vpiStreamSync calls separate VPI and non-VPI stages to ensure correct execution order.

When an iteration count greater than one is supplied, the sample benchmarks both modes and prints timing statistics so you can compare the overhead of manual synchronization against the host-function approach.

Instructions

The command line parameters are:

<input image> [iteration_count]

where

  • input image: input image file name; accepts png, jpeg, and other common formats.
  • iteration_count (optional): number of benchmark iterations to run; defaults to 1. When greater than 1, timing statistics are printed for both modes.

Here are some examples:

  • Single run (correctness check only):
    ./vpi_sample_22_submit_cuda_host ../assets/kodim08.png
  • Benchmark with 100 iterations:
    ./vpi_sample_22_submit_cuda_host ../assets/kodim08.png 100
Note
This sample requires a platform with VIC (Video Image Compositor) support and a CUDA-capable GPU with the NPP library.
The output images are in grayscale as the pipeline converts the input to single-channel Y8 format.

Features

  • vpiSubmitCUDAHostFunction: Schedule arbitrary CUDA work (NPP, custom kernels) on a VPI stream
  • Multi-Backend Pipeline: Combines VIC and CUDA processing in a single ordered stream
  • Two Execution Modes: Compares host-function-based ordering with manual synchronization
  • Benchmarking: Optional iteration count for performance comparison between modes
  • Output Verification: Validates that both modes produce identical results

Workflow

  1. Load input image using OpenCV and convert to BGRA
  2. Create VPI stream (native or wrapping a CUDA stream depending on mode)
  3. Submit VIC color conversion (BGR to grayscale) to the stream
  4. Submit NPP Gaussian blur and custom border kernel:
    • Host function mode: via vpiSubmitCUDAHostFunction callback
    • Sync mode: via manual stream synchronization and direct CUDA calls
  5. Submit VIC vertical flip to the stream
  6. Synchronize and optionally copy result to host
  7. Verify both modes produce identical output
  8. Save output images to disk

Results

The sample produces two output images that should be identical:

  • multi_backend_cuda_hostfn_pipelined.png — result from the CUDA host function mode
  • multi_backend_sync_pipelined.png — result from the sync mode

When run with an iteration count, timing statistics (min, max, mean, median, standard deviation) are printed for each mode.

Source Code

For convenience, here's the code that is also installed in the samples directory.

6 #include <opencv2/core/version.hpp>
7 #include <opencv2/opencv.hpp>
8 
9 #include <vpi/CUDAInterop.h>
10 #include <vpi/Context.h>
11 #include <vpi/Stream.h>
13 #include <vpi/algo/ImageFlip.h>
14 #if CV_MAJOR_VERSION >= 3
15 # include <opencv2/imgcodecs.hpp>
16 #else
17 # include <opencv2/highgui/highgui.hpp>
18 #endif
19 
20 #include "custom.cuh"
21 
22 #include <vpi/OpenCVInterop.hpp>
23 
24 #include <cuda_runtime.h>
25 #include <npp.h>
26 #include <nppcore.h>
27 #include <nppi.h>
28 
29 #include <algorithm>
30 #include <cassert>
31 #include <chrono>
32 #include <cmath>
33 #include <cstdlib>
34 #include <cstring>
35 #include <iomanip>
36 #include <iostream>
37 #include <numeric>
38 #include <sstream>
39 #include <vector>
40 
41 #define CHECK_VPI_STATUS(STMT) \
42  do \
43  { \
44  VPIStatus status = (STMT); \
45  if (status != VPI_SUCCESS) \
46  { \
47  char buffer[VPI_MAX_STATUS_MESSAGE_LENGTH]; \
48  vpiGetLastStatusMessage(buffer, sizeof(buffer)); \
49  std::ostringstream ss; \
50  ss << vpiStatusGetName(status) << ": " << buffer; \
51  throw std::runtime_error(ss.str()); \
52  } \
53  } while (0);
54 
55 #define CHECK_CUDA_STATUS(STMT) \
56  do \
57  { \
58  cudaError_t status = (STMT); \
59  if (status != cudaSuccess) \
60  { \
61  std::ostringstream ss; \
62  ss << cudaGetErrorString(status); \
63  throw std::runtime_error(ss.str()); \
64  } \
65  } while (0);
66 
67 #define CHECK_NPP_STATUS(STMT) \
68  do \
69  { \
70  NppStatus status = (STMT); \
71  if (status != NPP_SUCCESS) \
72  { \
73  std::ostringstream ss; \
74  ss << status; \
75  throw std::runtime_error(ss.str()); \
76  } \
77  } while (0);
78 
79 enum class PipelineMode
80 {
81  CUDA_HOST_FUNCTION_MODE, // One stream, vpiSubmitCUDAHostFunction for NPP + custom kernel
82  SYNC_MODE // Wrapped stream, manual sync between stages
83 };
84 
85 static void printBenchmarkStats(std::vector<int64_t> data)
86 {
87  size_t n = data.size();
88  std::cout << std::fixed << std::setprecision(2);
89  std::cout << "Input Data Size: " << n << std::endl;
90  std::cout << "\nStatistics (in microseconds):" << std::endl;
91 
92  // =============================
93  // Calculate Min and Max
94  int64_t min_val = *std::min_element(data.begin(), data.end());
95  int64_t max_val = *std::max_element(data.begin(), data.end());
96  std::cout << " Min Value: " << min_val << std::endl;
97  std::cout << " Max Value: " << max_val << std::endl;
98 
99  // =============================
100  // Calculate sum and mean
101  int64_t sum = std::accumulate(data.begin(), data.end(), int64_t{0});
102  double mean = ((double)sum / n);
103  std::cout << " Sum: " << (double)sum << std::endl;
104  std::cout << " Mean: " << mean << std::endl;
105 
106  // =============================
107  // Calculate median
108  std::vector<int64_t> sorted_data = data;
109  std::sort(sorted_data.begin(), sorted_data.end());
110  double median;
111  if (n % 2 != 0)
112  {
113  // Odd number of elements
114  median = (double)sorted_data[n / 2];
115  }
116  else
117  {
118  // Even number of elements
119  int64_t mid1 = sorted_data[n / 2 - 1];
120  int64_t mid2 = sorted_data[n / 2];
121  median = (double)(mid1 + mid2) / 2.0;
122  }
123  std::cout << " Median: " << median << std::endl;
124 
125  // =============================
126  // Calculate standard deviation
127  long double variance_sum = 0.0;
128  for (int64_t val : data)
129  {
130  long double diff = (long double)val - mean;
131  variance_sum += diff * diff;
132  }
133  double std_dev = (double)std::sqrt(variance_sum / n);
134  std::cout << " Standard Deviation: " << std_dev << std::endl;
135 }
136 
137 static VPIImageData getImgData(int width, int height, VPIByte *pBase)
138 {
139  VPIImageBufferPitchLinear imgData = {};
140  imgData.format = VPI_IMAGE_FORMAT_Y8;
141  imgData.numPlanes = 1;
142  imgData.planes[0].width = width;
143  imgData.planes[0].height = height;
144  imgData.planes[0].pitchBytes = sizeof(uint8_t) * width;
146  imgData.planes[0].offsetBytes = 0;
147  imgData.planes[0].pBase = pBase;
148  VPIImageData vpiImgData = {};
150  vpiImgData.buffer.pitch = imgData;
151  return vpiImgData;
152 }
153 
154 struct CudaHostFnData
155 {
156  uint8_t *cudaGrayImg;
157  uint8_t *cudaBlurredImg;
158  uint8_t *cudaBorderedImg;
159  int width;
160  int height;
161  int pitchBytes;
162  NppStreamContext nppCtx;
163  int borderPixels;
164 };
165 
166 static void nppBlurAndBorderCallback(cudaStream_t cudaStream, void *userData)
167 {
168  CudaHostFnData *data = static_cast<CudaHostFnData *>(userData);
169  data->nppCtx.hStream = cudaStream;
170  cudaStreamGetFlags(cudaStream, &data->nppCtx.nStreamFlags);
171 
172  CHECK_NPP_STATUS(nppiFilterGaussBorder_8u_C1R_Ctx(
173  data->cudaGrayImg, data->pitchBytes, {data->width, data->height}, {0, 0}, data->cudaBlurredImg,
174  data->pitchBytes, {data->width, data->height}, NPP_MASK_SIZE_3_X_3, NPP_BORDER_REPLICATE, data->nppCtx));
175 
176  submitBorderChange(data->cudaBlurredImg, data->cudaBorderedImg, static_cast<uint32_t>(data->width),
177  static_cast<uint32_t>(data->height), data->borderPixels, cudaStream);
178 }
179 
180 static std::vector<int64_t> runPipeline(const std::string &filename, cv::Mat &output, PipelineMode mode, int numIters,
181  bool writeOutputOnFinalIteration)
182 {
183  /*
184  * This sample runs Tegra (VIC) and custom CUDA work in order on one stream.
185  *
186  * CUDA_HOST_FUNCTION_MODE: Single VPI stream. VIC convert -> vpiSubmitCUDAHostFunction
187  * (NPP blur + custom border kernel) -> VIC flip. Ordering is guaranteed by VPI.
188  *
189  * SYNC_MODE: Single wrapped CUDA stream. Same pipeline with manual vpiStreamSync
190  * between stages so NPP and custom kernel run after VIC, and VIC flip after them.
191  *
192  * Pipeline: BGR input -> VIC convert to grayscale -> NPP Gaussian blur ->
193  * custom top border -> VIC vertical flip -> output.
194  *
195  * Reuses stream and images for numIters iterations. Returns per-iteration times (us).
196  * Copies output to cv::Mat only on the final iteration when writeOutputOnFinalIteration.
197  */
198 
199  VPIStream stream;
200  cudaStream_t cudaStream = nullptr;
201 
202  if (mode == PipelineMode::CUDA_HOST_FUNCTION_MODE)
203  {
204  CHECK_VPI_STATUS(vpiStreamCreate(0, &stream));
205  }
206  else
207  {
208  CHECK_CUDA_STATUS(cudaStreamCreate(&cudaStream));
209  CHECK_VPI_STATUS(vpiStreamCreateWrapperCUDA(cudaStream, 0, &stream));
210  }
211 
212  NppStreamContext nppCtx;
213  nppCtx.hStream = cudaStream;
214  CHECK_CUDA_STATUS(cudaGetDevice(&nppCtx.nCudaDeviceId));
215  cudaDeviceProp gpuProps;
216  CHECK_CUDA_STATUS(cudaGetDeviceProperties(&gpuProps, nppCtx.nCudaDeviceId));
217  nppCtx.nMultiProcessorCount = gpuProps.multiProcessorCount;
218  nppCtx.nMaxThreadsPerMultiProcessor = gpuProps.maxThreadsPerMultiProcessor;
219  nppCtx.nMaxThreadsPerBlock = gpuProps.maxThreadsPerBlock;
220  nppCtx.nSharedMemPerBlock = gpuProps.sharedMemPerBlock;
221  nppCtx.nCudaDevAttrComputeCapabilityMajor = gpuProps.major;
222  nppCtx.nCudaDevAttrComputeCapabilityMinor = gpuProps.minor;
223  if (cudaStream != nullptr)
224  {
225  CHECK_CUDA_STATUS(cudaStreamGetFlags(cudaStream, &nppCtx.nStreamFlags));
226  }
227  else
228  {
229  nppCtx.nStreamFlags = 0;
230  }
231 
232  cv::Mat bgrImg, cvImage;
233  bgrImg = cv::imread(filename);
234  if (bgrImg.channels() == 3)
235  {
236  cv::cvtColor(bgrImg, cvImage, cv::COLOR_BGR2BGRA);
237  }
238  else
239  {
240  cvImage = bgrImg;
241  }
242  int width = cvImage.cols;
243  int height = cvImage.rows;
244 
245  VPIImage input, gray, bordered, vpiOutput;
246  CHECK_VPI_STATUS(vpiImageCreateWrapperOpenCVMat(cvImage, 0, &input));
247  CHECK_VPI_STATUS(vpiImageCreate(width, height, VPI_IMAGE_FORMAT_Y8, 0, &vpiOutput));
248 
249  int pitchBytes = sizeof(uint8_t) * width;
250  uint8_t *cudaGrayImg, *cudaBlurredImg, *cudaBorderedImg;
251  CHECK_CUDA_STATUS(cudaMalloc(&cudaGrayImg, pitchBytes * height));
252  CHECK_CUDA_STATUS(cudaMalloc(&cudaBlurredImg, pitchBytes * height));
253  CHECK_CUDA_STATUS(cudaMalloc(&cudaBorderedImg, pitchBytes * height));
254 
255  VPIImageData grayData = getImgData(width, height, static_cast<VPIByte *>(cudaGrayImg));
256  VPIImageData borderedData = getImgData(width, height, static_cast<VPIByte *>(cudaBorderedImg));
257  CHECK_VPI_STATUS(vpiImageCreateWrapper(&grayData, NULL, 0, &gray));
258  CHECK_VPI_STATUS(vpiImageCreateWrapper(&borderedData, NULL, 0, &bordered));
259 
260  std::vector<int64_t> timings;
261  timings.reserve(static_cast<size_t>(numIters));
262 
263  for (int iter = 0; iter < numIters; ++iter)
264  {
265  auto start = std::chrono::high_resolution_clock::now();
266 
267  if (mode == PipelineMode::CUDA_HOST_FUNCTION_MODE)
268  {
271  CHECK_VPI_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_VIC, input, gray, &convertFormatParams));
272 
273  CudaHostFnData hostFnData = {cudaGrayImg, cudaBlurredImg, cudaBorderedImg, width,
274  height, pitchBytes, nppCtx, 50};
275  CHECK_VPI_STATUS(vpiSubmitCUDAHostFunction(stream, nppBlurAndBorderCallback, &hostFnData));
276 
277  CHECK_VPI_STATUS(vpiSubmitImageFlip(stream, VPI_BACKEND_VIC, bordered, vpiOutput, VPI_FLIP_VERT));
278 
279  CHECK_VPI_STATUS(vpiStreamSync(stream));
280  }
281  else
282  {
285  CHECK_VPI_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_VIC, input, gray, &convertFormatParams));
286  CHECK_VPI_STATUS(vpiStreamSync(stream));
287 
288  CHECK_NPP_STATUS(nppiFilterGaussBorder_8u_C1R_Ctx(cudaGrayImg, pitchBytes, {width, height}, {0, 0},
289  cudaBlurredImg, pitchBytes, {width, height},
290  NPP_MASK_SIZE_3_X_3, NPP_BORDER_REPLICATE, nppCtx));
291 
292  submitBorderChange(cudaBlurredImg, cudaBorderedImg, width, height, 50, cudaStream);
293  CHECK_VPI_STATUS(vpiStreamSync(stream));
294 
295  CHECK_VPI_STATUS(vpiSubmitImageFlip(stream, VPI_BACKEND_VIC, bordered, vpiOutput, VPI_FLIP_VERT));
296 
297  CHECK_VPI_STATUS(vpiStreamSync(stream));
298  }
299 
300  auto end = std::chrono::high_resolution_clock::now();
301  timings.push_back(std::chrono::duration_cast<std::chrono::microseconds>(end - start).count());
302 
303  const bool isFinalIteration = (iter == numIters - 1);
304  if (isFinalIteration && writeOutputOnFinalIteration)
305  {
306  VPIImageData outData;
307  CHECK_VPI_STATUS(vpiImageLockData(vpiOutput, VPI_LOCK_READ, VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR, &outData));
308  VPIImageBufferPitchLinear &outDataPitch = outData.buffer.pitch;
309  cv::Mat cvOut(height, width, CV_8U, outDataPitch.planes[0].pBase);
310  output = cvOut.clone();
311  CHECK_VPI_STATUS(vpiImageUnlock(vpiOutput));
312  }
313  }
314 
315  vpiImageDestroy(input);
316  vpiImageDestroy(gray);
317  vpiImageDestroy(bordered);
318  vpiImageDestroy(vpiOutput);
319  cudaFree(cudaGrayImg);
320  cudaFree(cudaBlurredImg);
321  cudaFree(cudaBorderedImg);
322  vpiStreamDestroy(stream);
323  if (cudaStream != nullptr)
324  {
325  cudaStreamDestroy(cudaStream);
326  }
327  return timings;
328 }
329 
330 int main(int argc, char *argv[])
331 {
332  int retval = 0;
333  VPIContext context = nullptr;
334 
335  try
336  {
337  if (argc < 2 || argc > 3)
338  {
339  throw std::runtime_error(std::string("Usage: ") + argv[0] + " <input image> [iteration_count]");
340  }
341 
342  const std::string filename = argv[1];
343  int iterationCount = 1;
344  if (argc == 3)
345  {
346  iterationCount = std::atoi(argv[2]);
347  if (iterationCount < 1)
348  {
349  throw std::runtime_error("iteration_count must be >= 1.");
350  }
351  }
352 
353  CHECK_VPI_STATUS(vpiContextCreate(VPI_BACKEND_CPU | VPI_BACKEND_CUDA | VPI_BACKEND_VIC, &context));
354  CHECK_VPI_STATUS(vpiContextSetCurrent(context));
355 
356  cv::Mat hostFnOutput;
357  cv::Mat syncOutput;
358 
359  if (iterationCount > 1)
360  {
361  std::cout << "Benchmark: CUDA host function mode (" << iterationCount << " iterations)" << std::endl;
362  std::vector<int64_t> hostFnTimings =
363  runPipeline(filename, hostFnOutput, PipelineMode::CUDA_HOST_FUNCTION_MODE, iterationCount, false);
364  std::cout << "--- CUDA host function mode ---" << std::endl;
365  printBenchmarkStats(hostFnTimings);
366  std::cout << std::endl;
367 
368  std::cout << "Benchmark: Sync mode (" << iterationCount << " iterations)" << std::endl;
369  std::vector<int64_t> syncTimings =
370  runPipeline(filename, syncOutput, PipelineMode::SYNC_MODE, iterationCount, false);
371  std::cout << "--- Sync mode ---" << std::endl;
372  printBenchmarkStats(syncTimings);
373  std::cout << std::endl;
374  }
375 
376  runPipeline(filename, hostFnOutput, PipelineMode::CUDA_HOST_FUNCTION_MODE, 1, true);
377  runPipeline(filename, syncOutput, PipelineMode::SYNC_MODE, 1, true);
378 
379  if (hostFnOutput.size() != syncOutput.size() || hostFnOutput.type() != syncOutput.type())
380  {
381  throw std::runtime_error("FAIL: CUDA host function and sync outputs differ in size or type.");
382  }
383 
384  cv::Mat diff;
385  cv::absdiff(hostFnOutput, syncOutput, diff);
386  int numDiffPixels = cv::countNonZero(diff);
387  if (numDiffPixels != 0)
388  {
389  std::ostringstream ss;
390  ss << "FAIL: Outputs differ (" << numDiffPixels << " pixels).";
391  throw std::runtime_error(ss.str());
392  }
393 
394  std::cout << "PASS: CUDA host function and sync mode outputs are identical." << std::endl;
395 
396  cv::imwrite("multi_backend_cuda_hostfn_pipelined.png", hostFnOutput);
397  cv::imwrite("multi_backend_sync_pipelined.png", syncOutput);
398  }
399  catch (std::exception &e)
400  {
401  std::cerr << e.what() << std::endl;
402  retval = 1;
403  }
404 
405  vpiContextDestroy(context);
406 
407  if (retval == 0)
408  {
409  // The Jetson CUDA/EGL driver stack can abort in process-global finalizers
410  // after this CUDA-runtime sample has already released its resources.
411  std::cout.flush();
412  std::cerr.flush();
413  std::_Exit(EXIT_SUCCESS);
414  }
415 
416  return retval;
417 }
Functions and structures for handling CUDA interoperability with VPI.
Functions and structures for dealing with VPI contexts.
Declares functions that handle image format conversion.
Declares functions that implement Image flip algorithms.
#define VPI_IMAGE_FORMAT_Y8
Single plane with one pitch-linear 8-bit unsigned integer channel with limited-range luma (grayscale)...
Definition: ImageFormat.h:147
Functions for handling OpenCV interoperability with VPI.
#define VPI_PIXEL_TYPE_INVALID
Signal format conversion errors.
Definition: PixelType.h:85
Declares functions dealing with VPI streams.
unsigned char VPIByte
Definition of a byte type.
Definition: Types.h:288
@ VPI_FLIP_VERT
Flip vertically.
Definition: Types.h:718
VPIStatus vpiSubmitCUDAHostFunction(VPIStream stream, VPICUDAHostFunction hostFunc, void *userData)
Submits a CUDA host function to run on the stream in submission order.
VPIStatus vpiContextCreate(uint64_t flags, VPIContext *ctx)
Create a context instance.
VPIStatus vpiContextSetCurrent(VPIContext ctx)
Sets the context for the calling thread.
void vpiContextDestroy(VPIContext ctx)
Destroy a context instance as well as all resources it owns.
struct VPIContextImpl * VPIContext
A handle to a context.
Definition: Types.h:236
VPIStatus vpiSubmitConvertImageFormat(VPIStream stream, uint64_t backend, VPIImage input, VPIImage output, const VPIConvertImageFormatParams *params)
Converts the image contents to the desired format, with optional scaling and offset.
@ VPI_CONVERSION_CLAMP
Clamps input to output's type range.
Definition: Types.h:301
Parameters for customizing image format conversion.
VPIStatus vpiSubmitImageFlip(VPIStream stream, uint64_t backend, VPIImage input, VPIImage output, VPIFlipMode flipMode)
Flips a 2D image either horizontally, vertically or both.
VPIImageBuffer buffer
Stores the image contents.
Definition: Image.h:276
VPIImagePlanePitchLinear planes[VPI_MAX_PLANE_COUNT]
Data of all image planes in pitch-linear layout.
Definition: Image.h:164
VPIImageBufferPitchLinear pitch
Image stored in pitch-linear layout.
Definition: Image.h:239
int32_t numPlanes
Number of planes.
Definition: Image.h:160
VPIImageFormat format
Image format.
Definition: Image.h:156
VPIPixelType pixelType
Type of each pixel within this plane.
Definition: Image.h:115
VPIImageBufferType bufferType
Type of image buffer.
Definition: Image.h:273
int64_t offsetBytes
Offset in bytes from pBase to the first column of the first plane row.
Definition: Image.h:137
int32_t height
Height of this plane in pixels.
Definition: Image.h:123
VPIByte * pBase
Pointer to the memory buffer which contains the plane data.
Definition: Image.h:145
int32_t width
Width of this plane in pixels.
Definition: Image.h:119
int32_t pitchBytes
Difference in bytes of beginning of one row and the beginning of the previous.
Definition: Image.h:134
void vpiImageDestroy(VPIImage img)
Destroy an image instance.
struct VPIImageImpl * VPIImage
A handle to an image.
Definition: Types.h:254
VPIStatus vpiImageCreateWrapper(const VPIImageData *data, const VPIImageWrapperParams *params, uint64_t flags, VPIImage *img)
Create an image object by wrapping an existing memory block.
VPIStatus vpiImageLockData(VPIImage img, VPILockMode mode, VPIImageBufferType bufType, VPIImageData *data)
Acquires the lock on an image object and returns the image contents.
VPIStatus vpiImageCreate(int32_t width, int32_t height, VPIImageFormat fmt, uint64_t flags, VPIImage *img)
Create an empty image instance with the specified flags.
VPIStatus vpiImageUnlock(VPIImage img)
Releases the lock on an image object.
@ VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR
CUDA-accessible with planes in pitch-linear memory layout.
Definition: Image.h:179
@ VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR
Host-accessible with planes in pitch-linear memory layout.
Definition: Image.h:176
Stores the image plane contents.
Definition: Image.h:154
Stores information about image characteristics and content.
Definition: Image.h:269
VPIStatus vpiImageCreateWrapperOpenCVMat(const cv::Mat &mat, VPIImageFormat fmt, uint64_t flags, VPIImage *img)
Wraps a cv::Mat in an VPIImage with the given image format.
struct VPIStreamImpl * VPIStream
A handle to a stream.
Definition: Types.h:248
VPIStatus vpiStreamSync(VPIStream stream)
Blocks the calling thread until all submitted commands in this stream queue are done (queue is empty)...
void vpiStreamDestroy(VPIStream stream)
Destroy a stream instance and deallocate all HW resources.
VPIStatus vpiStreamCreateWrapperCUDA(CUstream cudaStream, uint64_t flags, VPIStream *stream)
Wraps an existing cudaStream_t into a VPI stream.
VPIStatus vpiStreamCreate(uint64_t flags, VPIStream *stream)
Create a stream instance.
@ VPI_BACKEND_CUDA
CUDA backend.
Definition: Types.h:93
@ VPI_BACKEND_VIC
VIC backend.
Definition: Types.h:95
@ VPI_BACKEND_CPU
CPU backend.
Definition: Types.h:92
@ VPI_INTERP_NEAREST
Nearest neighbor interpolation.
Definition: Interpolation.h:78
@ VPI_LOCK_READ
Lock memory only for reading.
Definition: Types.h:621