VPI - Vision Programming Interface

4.1 Release

Wrap Raw Pointers

Overview

The Wrap Raw Pointers sample demonstrates how to wrap externally allocated memory buffers into VPI images for integration with existing pipelines. This is particularly useful when integrating VPI with other frameworks or when working with pre-allocated memory from hardware accelerators.

The sample shows three different memory types that can be wrapped:

  • CUDA pointers: Device memory allocated via cudaMalloc
  • CPU pointers: Host memory allocated via malloc
  • NvBufSurface: NVIDIA buffer surfaces used in multimedia pipelines (Jetson platform)

Wrapping external memory can avoid unnecessary memory copies when the selected backend can access the wrapped allocation directly. The sample applies a vertical image flip operation to demonstrate processing on the wrapped buffers.

Instructions

The command line parameters are:

<input_image> <pointer_type>

where

  • input_image: input image file name; accepts png, jpeg, and other common formats.
  • pointer_type: type of pointer to wrap; one of cuda_ptr, cpu_ptr, or nvbufsurface.
Note
The output will be in grayscale as this sample uses single-channel Y8 format.
The nvbufsurface option is only available on Jetson platforms with NvBufSurface support.

Here are some examples:

  • C++ (CUDA pointer)
    ./vpi_sample_21_wrap_raw_ptr ../assets/kodim08.png cuda_ptr
  • C++ (CPU pointer)
    ./vpi_sample_21_wrap_raw_ptr ../assets/kodim08.png cpu_ptr
  • C++ (NvBufSurface - Jetson only)
    ./vpi_sample_21_wrap_raw_ptr ../assets/kodim08.png nvbufsurface

The output is saved as output.png in the current directory.

Features

  • Interop Wrapping: Wrap existing memory buffers and process them with a backend that can access that memory type
  • Multiple Memory Types: Support for CUDA device memory, CPU host memory, and NvBufSurface
  • Backend Selection: Automatically selects the appropriate VPI backend based on pointer type:
    • CUDA backend for CUDA pointers
    • CPU backend for CPU pointers
    • VIC backend for NvBufSurface (Jetson)
  • OpenCV Integration: Demonstrates loading images with OpenCV and transferring to wrapped VPI buffers
  • Image Processing: Applies vertical flip operation on wrapped buffers

Internal Copies

The wrapper itself does not copy the external allocation. Copies can still happen later if an algorithm backend needs a memory representation that cannot be shared with the wrapped allocation.

In particular, a CUDA pitch-linear pointer wrapped as a VPIImage is a good match for CUDA backend work. If that same image is then consumed by PVA, VIC, OFA, or another Tegra engine, VPI may create an internal backend-compatible allocation and copy the data before or after the operation. Similarly, wrapping an NvBufSurface may require a copy when VPI needs an image backed by NvSciBuf for another backend.

To avoid these copies in CUDA/VPI interop pipelines, prefer the reverse direction: allocate the image with vpiImageCreate, run VPI operations on that VPI-owned image, and lock out the CUDA view with vpiImageLockData using VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR when CUDA code needs to access it. Keep the returned VPIImageBufferPitchLinear data only while the image is locked. For multimedia or NvSci pipelines, apply the same rule: create in VPI first, then lock or extract the interop handle that the external code needs.

Workflow

  1. Load input image using OpenCV (grayscale)
  2. Allocate memory buffer based on specified pointer type
  3. Copy image data to the allocated buffer
  4. Create VPI image wrapper around the external buffer
  5. Submit image flip operation to VPI stream
  6. Synchronize and copy result back to OpenCV image
  7. Save output image

Results

The sample produces a vertically flipped grayscale version of the input image saved as output.png.

Source Code

For convenience, here's the code that is also installed in the samples directory.

29 #include <opencv2/core/version.hpp>
30 #if CV_MAJOR_VERSION >= 3
31 # include <opencv2/imgcodecs.hpp>
32 #else
33 # include <opencv2/contrib/contrib.hpp> // for colormap
34 # include <opencv2/highgui/highgui.hpp>
35 #endif
36 
37 #include <cuda_runtime.h>
38 #include <nvbufsurface.h>
39 #include <vpi/Context.h>
40 #include <vpi/Image.h>
41 #include <vpi/Stream.h>
42 #include <vpi/algo/ImageFlip.h>
43 
44 #include <algorithm>
45 #include <cctype>
46 #include <cstdlib>
47 #include <cstring>
48 #include <iostream>
49 #include <sstream>
50 #include <stdexcept>
51 #include <string>
52 #include <thread>
53 
54 #define CHECK_STATUS(STMT) \
55  do \
56  { \
57  VPIStatus status = (STMT); \
58  if (status != VPI_SUCCESS) \
59  { \
60  char buffer[VPI_MAX_STATUS_MESSAGE_LENGTH]; \
61  vpiGetLastStatusMessage(buffer, sizeof(buffer)); \
62  std::ostringstream ss; \
63  ss << "line " << __LINE__ << " " << vpiStatusGetName(status) << ": " << buffer; \
64  throw std::runtime_error(ss.str()); \
65  } \
66  } while (0);
67 
68 namespace {
69 
70 VPIImageData GetGenericPLData(int imgWidth, int imgHeight, VPIImageBufferType bufferType, VPIByte *pBase)
71 {
72  VPIImageBufferPitchLinear imgData = {};
73  imgData.format = VPI_IMAGE_FORMAT_Y8;
74  imgData.numPlanes = 1;
75  imgData.planes[0].width = imgWidth;
76  imgData.planes[0].height = imgHeight;
77  imgData.planes[0].pitchBytes = sizeof(uint8_t) * imgWidth;
79  imgData.planes[0].offsetBytes = 0;
80  imgData.planes[0].pBase = pBase;
81  VPIImageData vpiImgData = {};
82  vpiImgData.bufferType = bufferType;
83  vpiImgData.buffer.pitch = imgData;
84  return vpiImgData;
85 }
86 
87 } // namespace
88 
89 int main(int argc, char *argv[])
90 {
91  // =============================
92  // Parse command line parameters
93  if (argc != 3)
94  {
95  throw std::runtime_error(std::string("Usage: ") + argv[0] + " " +
96  "<input_image> <cuda_ptr|cpu_ptr|nvbufsurface>");
97  }
98  std::string imgName = argv[1];
99  std::string ptrToWrap = argv[2];
100  VPIBackend backend;
101 
102  if (ptrToWrap == "cuda_ptr")
103  {
104  backend = VPI_BACKEND_CUDA;
105  }
106  else if (ptrToWrap == "cpu_ptr")
107  {
108  backend = VPI_BACKEND_CPU;
109  }
110  else if (ptrToWrap == "nvbufsurface")
111  {
112  backend = VPI_BACKEND_VIC;
113  }
114  else
115  {
116  throw std::runtime_error("Pointer to wrap must be one of cuda_ptr, cpu_ptr or nvbufsurface");
117  }
118 
119  VPIContext context = nullptr;
120  CHECK_STATUS(vpiContextCreate(backend | VPI_BACKEND_CPU, &context));
121  CHECK_STATUS(vpiContextSetCurrent(context));
122 
123  // =============================
124  // Read input image
125  cv::Mat image = cv::imread(imgName, cv::IMREAD_GRAYSCALE);
126  if (image.empty())
127  {
128  throw std::runtime_error("Failed to read input image: " + imgName);
129  }
130  uint32_t imgWidth = image.cols;
131  uint32_t imgHeight = image.rows;
132 
133  // =============================
134  // Allocate pointers to be wrapped and create wrapped images
135  VPIImage bufIn = nullptr, bufOut = nullptr;
136  VPIImageData dataIn = {}, dataOut = {};
137  uint8_t *cudaIn = nullptr, *cudaOut = nullptr;
138  NvBufSurface *surfIn = nullptr, *surfOut = nullptr;
139  uint8_t *cpuIn = nullptr, *cpuOut = nullptr;
140  VPIImageWrapperParams wrapParams = {};
141  wrapParams.colorSpec = VPI_COLOR_SPEC_DEFAULT;
142  if (ptrToWrap == "nvbufsurface")
143  {
144  // =============================
145  // Allocate NvBufSurface
146  NvBufSurfaceCreateParams nv_buf_params{};
147  nv_buf_params.width = imgWidth;
148  nv_buf_params.height = imgHeight;
149  nv_buf_params.memType = NVBUF_MEM_SURFACE_ARRAY;
150  nv_buf_params.layout = NVBUF_LAYOUT_PITCH;
151  nv_buf_params.colorFormat = NVBUF_COLOR_FORMAT_GRAY8;
152  if (0 != NvBufSurfaceCreate(&surfIn, 1, &nv_buf_params))
153  {
154  throw std::runtime_error("NvBufSurfaceCreate failed");
155  }
156  if (0 != NvBufSurfaceCreate(&surfOut, 1, &nv_buf_params))
157  {
158  throw std::runtime_error("NvBufSurfaceCreate failed");
159  }
160 
161  // =============================
162  // Copy over data from OpenCV image to NvBufSurface
163  if (0 != Raw2NvBufSurface(image.data, 0, 0, imgWidth, imgHeight, surfIn))
164  {
165  throw std::runtime_error("Copying to NvBufSurface failed");
166  }
167 
168  // =============================
169  // Wrap NvBufSurfaces directly in VPIImage
171  dataIn.buffer.fd = surfIn->surfaceList[0].bufferDesc;
172  CHECK_STATUS(vpiImageCreateWrapper(&dataIn, &wrapParams, backend | VPI_BACKEND_CPU, &bufIn));
173 
174  dataOut.bufferType = VPI_IMAGE_BUFFER_NVBUFFER;
175  dataOut.buffer.fd = surfOut->surfaceList[0].bufferDesc;
176  CHECK_STATUS(vpiImageCreateWrapper(&dataOut, &wrapParams, backend | VPI_BACKEND_CPU, &bufOut));
177  }
178  else if (ptrToWrap == "cuda_ptr")
179  {
180  // =============================
181  // Allocate CUDA pitch
182  if (0 != cudaMalloc(&cudaIn, imgWidth * imgHeight * sizeof(uint8_t)))
183  {
184  throw std::runtime_error("Allocating CUDA pitch failed");
185  }
186  if (0 != cudaMalloc(&cudaOut, imgWidth * imgHeight * sizeof(uint8_t)))
187  {
188  throw std::runtime_error("Allocating CUDA pitch failed");
189  }
190 
191  // =============================
192  // Copy over data from OpenCV image to input CUDA pitch
193  if (0 != cudaMemcpy2D(cudaIn, imgWidth, image.data, imgWidth, imgWidth, imgHeight, cudaMemcpyHostToDevice))
194  {
195  throw std::runtime_error("Copying to CUDA pitch failed");
196  }
197 
198  // =============================
199  // Wrap CUDA pitches directly in VPIImage
200  dataIn = GetGenericPLData(imgWidth, imgHeight, VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR, cudaIn);
201  dataOut = GetGenericPLData(imgWidth, imgHeight, VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR, cudaOut);
202  CHECK_STATUS(vpiImageCreateWrapper(&dataIn, &wrapParams, backend | VPI_BACKEND_CPU, &bufIn));
203  CHECK_STATUS(vpiImageCreateWrapper(&dataOut, &wrapParams, backend | VPI_BACKEND_CPU, &bufOut));
204  }
205  else
206  {
207  // =============================
208  // Allocate CPU pitch
209  cpuIn = static_cast<uint8_t *>(malloc(imgWidth * imgHeight * sizeof(uint8_t)));
210  cpuOut = static_cast<uint8_t *>(malloc(imgWidth * imgHeight * sizeof(uint8_t)));
211 
212  // =============================
213  // Copy over data from OpenCV image to input CPU pitch
214  memcpy(cpuIn, image.data, imgWidth * imgHeight);
215 
216  // =============================
217  // Wrap CUDA pitches directly in VPIImage
218  dataIn = GetGenericPLData(imgWidth, imgHeight, VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR, cpuIn);
219  dataOut = GetGenericPLData(imgWidth, imgHeight, VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR, cpuOut);
220  CHECK_STATUS(vpiImageCreateWrapper(&dataIn, &wrapParams, backend | VPI_BACKEND_CPU, &bufIn));
221  CHECK_STATUS(vpiImageCreateWrapper(&dataOut, &wrapParams, backend | VPI_BACKEND_CPU, &bufOut));
222  }
223 
224  // =============================
225  // Create VPI Stream
226  VPIStream stream;
227  CHECK_STATUS(vpiStreamCreate(0, &stream));
228 
229  // =============================
230  // Submit image flip
231  CHECK_STATUS(vpiSubmitImageFlip(stream, backend, bufIn, bufOut, VPI_FLIP_VERT));
232 
233  // =============================
234  // Sync stream
235  CHECK_STATUS(vpiStreamSync(stream));
236 
237  // =============================
238  // Copy back to OpenCV Image and save output. Zero copying between VPI and wrapped type!
239  if (ptrToWrap == "nvbufsurface")
240  {
241  if (0 != NvBufSurface2Raw(surfOut, 0, 0, imgWidth, imgHeight, image.data))
242  {
243  throw std::runtime_error("Copying from NvBufSurface failed");
244  }
245  }
246  else if (ptrToWrap == "cuda_ptr")
247  {
248  if (0 != cudaMemcpy2D(image.data, imgWidth, cudaOut, imgWidth, imgWidth, imgHeight, cudaMemcpyDeviceToHost))
249  {
250  throw std::runtime_error("Copying to CUDA pitch failed");
251  }
252  }
253  else
254  {
255  memcpy(image.data, cpuOut, imgWidth * imgHeight);
256  }
257  cv::imwrite("output.png", image);
258 
259  // =============================
260  // Cleanup
261  vpiStreamDestroy(stream);
262  vpiImageDestroy(bufIn); // note that destroying a wrapper image will not free the underlying memory
263  vpiImageDestroy(bufOut);
264  if (ptrToWrap == "nvbufsurface")
265  {
266  NvBufSurfaceDestroy(surfIn);
267  NvBufSurfaceDestroy(surfOut);
268  }
269  else if (ptrToWrap == "cuda_ptr")
270  {
271  cudaFree(cudaIn);
272  cudaFree(cudaOut);
273  }
274  else
275  {
276  free(cpuIn);
277  free(cpuOut);
278  }
279  vpiContextDestroy(context);
280 
281  if (ptrToWrap == "cuda_ptr")
282  {
283  // The Jetson CUDA/EGL driver stack can abort in process-global finalizers
284  // after this CUDA-runtime sample has already released its resources.
285  image.release();
286  std::cout.flush();
287  std::cerr.flush();
288  std::_Exit(EXIT_SUCCESS);
289  }
290 }
Functions and structures for dealing with VPI contexts.
Declares functions that implement Image flip algorithms.
#define VPI_IMAGE_FORMAT_Y8
Single plane with one pitch-linear 8-bit unsigned integer channel with limited-range luma (grayscale)...
Definition: ImageFormat.h:147
Functions and structures for dealing with VPI images.
#define VPI_PIXEL_TYPE_INVALID
Signal format conversion errors.
Definition: PixelType.h:85
Declares functions dealing with VPI streams.
unsigned char VPIByte
Definition of a byte type.
Definition: Types.h:288
@ VPI_FLIP_VERT
Flip vertically.
Definition: Types.h:718
@ VPI_COLOR_SPEC_DEFAULT
Default color spec.
Definition: ColorSpec.h:167
VPIStatus vpiContextCreate(uint64_t flags, VPIContext *ctx)
Create a context instance.
VPIStatus vpiContextSetCurrent(VPIContext ctx)
Sets the context for the calling thread.
void vpiContextDestroy(VPIContext ctx)
Destroy a context instance as well as all resources it owns.
struct VPIContextImpl * VPIContext
A handle to a context.
Definition: Types.h:236
VPIStatus vpiSubmitImageFlip(VPIStream stream, uint64_t backend, VPIImage input, VPIImage output, VPIFlipMode flipMode)
Flips a 2D image either horizontally, vertically or both.
VPIImageBuffer buffer
Stores the image contents.
Definition: Image.h:276
VPIImagePlanePitchLinear planes[VPI_MAX_PLANE_COUNT]
Data of all image planes in pitch-linear layout.
Definition: Image.h:164
VPIImageBufferPitchLinear pitch
Image stored in pitch-linear layout.
Definition: Image.h:239
int32_t numPlanes
Number of planes.
Definition: Image.h:160
VPIImageFormat format
Image format.
Definition: Image.h:156
VPIPixelType pixelType
Type of each pixel within this plane.
Definition: Image.h:115
VPIColorSpec colorSpec
Color spec to override the one defined by the VPIImageData wrapper.
Definition: Image.h:333
VPIImageBufferType bufferType
Type of image buffer.
Definition: Image.h:273
int fd
Image stored as an NvBuffer file descriptor.
Definition: Image.h:263
int64_t offsetBytes
Offset in bytes from pBase to the first column of the first plane row.
Definition: Image.h:137
int32_t height
Height of this plane in pixels.
Definition: Image.h:123
VPIByte * pBase
Pointer to the memory buffer which contains the plane data.
Definition: Image.h:145
int32_t width
Width of this plane in pixels.
Definition: Image.h:119
int32_t pitchBytes
Difference in bytes of beginning of one row and the beginning of the previous.
Definition: Image.h:134
void vpiImageDestroy(VPIImage img)
Destroy an image instance.
struct VPIImageImpl * VPIImage
A handle to an image.
Definition: Types.h:254
VPIStatus vpiImageCreateWrapper(const VPIImageData *data, const VPIImageWrapperParams *params, uint64_t flags, VPIImage *img)
Create an image object by wrapping an existing memory block.
VPIImageBufferType
Represents how the image data is stored.
Definition: Image.h:170
@ VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR
CUDA-accessible with planes in pitch-linear memory layout.
Definition: Image.h:179
@ VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR
Host-accessible with planes in pitch-linear memory layout.
Definition: Image.h:176
@ VPI_IMAGE_BUFFER_NVBUFFER
NvBuffer.
Definition: Image.h:200
Stores the image plane contents.
Definition: Image.h:154
Stores information about image characteristics and content.
Definition: Image.h:269
Parameters for customizing image wrapping.
Definition: Image.h:329
struct VPIStreamImpl * VPIStream
A handle to a stream.
Definition: Types.h:248
VPIStatus vpiStreamSync(VPIStream stream)
Blocks the calling thread until all submitted commands in this stream queue are done (queue is empty)...
VPIBackend
VPI Backend types.
Definition: Types.h:91
void vpiStreamDestroy(VPIStream stream)
Destroy a stream instance and deallocate all HW resources.
VPIStatus vpiStreamCreate(uint64_t flags, VPIStream *stream)
Create a stream instance.
@ VPI_BACKEND_CUDA
CUDA backend.
Definition: Types.h:93
@ VPI_BACKEND_VIC
VIC backend.
Definition: Types.h:95
@ VPI_BACKEND_CPU
CPU backend.
Definition: Types.h:92