VPI - Vision Programming Interface

1.2 Release

Stereo Disparity

Overview

The Stereo Disparity application receives the left and right images of a stereo pair and returns the disparity between them. Disparity is inversely related to scene depth, so nearby objects produce larger disparities. The result is saved to disk as an image file. When the selected backend supports it, the corresponding confidence map is saved as well.
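To make the link between disparity and depth concrete, here is a minimal sketch based on the standard pinhole model for a rectified stereo rig, where depth equals focal length times baseline divided by disparity. The function name and the calibration values (focal length in pixels, baseline in metres) are hypothetical and are not part of the VPI sample; real values come from your camera calibration.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Return depth in metres for a disparity in pixels (pinhole stereo model)."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity or an invalid match.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical calibration values, for illustration only.
FOCAL_PX = 700.0
BASELINE_M = 0.12

for d in (64.0, 16.0, 4.0):
    depth = disparity_to_depth(d, FOCAL_PX, BASELINE_M)
    print(f"disparity {d:5.1f} px -> depth {depth:.2f} m")
```

Larger disparities map to smaller depths, which is why the sample's colormap shows close objects and far objects at opposite ends of the color range.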

Instructions

The command line parameters are:

<backend> <left image> <right image>

where

  • backend: either cpu, cuda, pva or pva-nvenc-vic; it defines the backend that performs the processing. The pva-nvenc-vic and cuda backends output a confidence map in addition to the disparity.
  • left image: left input image of a rectified stereo pair. PNG, JPEG and possibly other formats are accepted.
  • right image: right input image of the same rectified stereo pair.
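The backend choice affects more than where the work runs. The sketch below collects, in one lookup table, the per-backend settings that the sample source code on this page uses: maximum disparity, input pixel scale, fixed processing resolution, and confidence map availability. The names BackendSettings, SETTINGS and describe are hypothetical helpers, not VPI API. The 480x270 size for pva comes from the C++ sample.

```python
from typing import NamedTuple, Optional, Tuple


class BackendSettings(NamedTuple):
    max_disparity: int                    # upper bound of the disparity search
    input_scale: int                      # pixel scale applied when converting to 16bpp
    fixed_size: Optional[Tuple[int, int]] # required (width, height), None = input size
    has_confidence: bool                  # whether a confidence map is produced


# Values as used by the samples on this page.
SETTINGS = {
    "cpu": BackendSettings(64, 1, None, False),
    "cuda": BackendSettings(64, 1, None, True),
    "pva": BackendSettings(64, 1, (480, 270), False),
    "pva-nvenc-vic": BackendSettings(256, 256, (1920, 1080), True),
}


def describe(backend):
    """Return a one-line summary of the settings for a backend name."""
    s = SETTINGS[backend]
    size = "input size" if s.fixed_size is None else f"{s.fixed_size[0]}x{s.fixed_size[1]}"
    conf = "with" if s.has_confidence else "without"
    return f"{backend}: maxDisparity={s.max_disparity}, size={size}, {conf} confidence map"


for name in SETTINGS:
    print(describe(name))
```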

Here's one example:

  • C++
    ./vpi_sample_02_stereo_disparity cuda ../assets/chair_stereo_left.png ../assets/chair_stereo_right.png
  • Python
    python main.py cuda ../assets/chair_stereo_left.png ../assets/chair_stereo_right.png

This example uses the CUDA backend and the provided sample images. You can try other stereo pair images, provided they respect the constraints imposed by the algorithm.

Results

[Images: left input image, right input image, stereo disparity, confidence map]

Source Code

For convenience, here's the code that is also installed in the samples directory.

Python

import cv2
import sys
import vpi
import numpy as np
from PIL import Image
from argparse import ArgumentParser

# ----------------------------
# Parse command line arguments

parser = ArgumentParser()
parser.add_argument('backend', choices=['cpu','cuda','pva','pva-nvenc-vic'],
                    help='Backend to be used for processing')

parser.add_argument('left',
                    help='Rectified left input image from a stereo pair')

parser.add_argument('right',
                    help='Rectified right input image from a stereo pair')

args = parser.parse_args()

# Pixel value scaling factor when loading input
scale = 1

if args.backend == 'cpu':
    backend = vpi.Backend.CPU
elif args.backend == 'cuda':
    backend = vpi.Backend.CUDA
elif args.backend == 'pva':
    backend = vpi.Backend.PVA
else:
    assert args.backend == 'pva-nvenc-vic'
    backend = vpi.Backend.PVA|vpi.Backend.NVENC|vpi.Backend.VIC

    # For PVA+NVENC+VIC mode, 16bpp input must be MSB-aligned, which
    # is equivalent to saying that it is Q8.8 (fixed-point, 8 fractional bits).
    scale = 256

# Streams for left and right independent pre-processing
streamLeft = vpi.Stream()
streamRight = vpi.Stream()

# --------------------------------------------------------------
# Load input into a vpi.Image and convert it to grayscale, 16bpp
with vpi.Backend.CUDA:
    with streamLeft:
        left = vpi.asimage(np.asarray(Image.open(args.left))).convert(vpi.Format.Y16_ER, scale=scale)
    with streamRight:
        right = vpi.asimage(np.asarray(Image.open(args.right))).convert(vpi.Format.Y16_ER, scale=scale)

# --------------------------------------------------------------
# Preprocess input

# Block-linear format is needed for the pva-nvenc-vic pipeline.
# Currently we can only convert to block-linear using the VIC backend.
# The input must also be 1080p.
if args.backend == 'pva-nvenc-vic':
    with vpi.Backend.VIC:
        with streamLeft:
            left = left.convert(vpi.Format.Y16_ER_BL).rescale((1920,1080))
        with streamRight:
            right = right.convert(vpi.Format.Y16_ER_BL).rescale((1920,1080))
    maxDisparity = 256
else:
    maxDisparity = 64

if args.backend == 'pva-nvenc-vic' or args.backend == 'cuda':
    # Only PVA-NVENC-VIC and CUDA have a confidence map
    confidenceMap = vpi.Image(left.size, vpi.Format.U16)
else:
    confidenceMap = None

# Use the left stream to consolidate the actual stereo processing
streamStereo = streamLeft

# ---------------------------------------------
# Estimate stereo disparity
with streamStereo, backend:
    disparity = vpi.stereodisp(left, right, out_confmap=confidenceMap, window=5, maxdisp=maxDisparity)

# ---------------------------------------------
# Postprocess results and save them to disk
with streamStereo, vpi.Backend.CUDA:
    # Scale disparity and confidence map so that values lie between 0 and 255.

    # Disparities are in Q10.5 format, so to map them to float, they get
    # divided by 32. Then the resulting disparity range, from 0 to
    # maxDisparity, gets mapped to 0-255 for proper output.
    disparity = disparity.convert(vpi.Format.U8, scale=255.0/(32*maxDisparity))

    # Apply JET colormap to turn the disparities into color. Reddish hues
    # represent objects closer to the camera, bluish ones are farther away.
    disparityColor = cv2.applyColorMap(disparity.cpu(), cv2.COLORMAP_JET)

    # Convert to RGB for output with PIL
    disparityColor = cv2.cvtColor(disparityColor, cv2.COLOR_BGR2RGB)

    if confidenceMap:
        confidenceMap = confidenceMap.convert(vpi.Format.U8, scale=255.0/65535)

        # When pixel confidence is 0, its color in the disparity
        # output is black.
        mask = cv2.threshold(confidenceMap.cpu(), 1, 255, cv2.THRESH_BINARY)[1]
        mask = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
        disparityColor = cv2.bitwise_and(disparityColor, mask)

# -------------------
# Save result to disk

Image.fromarray(disparityColor).save('disparity_python'+str(sys.version_info[0])+'_'+args.backend+'.png')

if confidenceMap:
    Image.fromarray(confidenceMap.cpu()).save('confidence_python'+str(sys.version_info[0])+'_'+args.backend+'.png')

# vim: ts=8:sw=4:sts=4:et:ai
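The Python sample relies on two fixed-point conventions: Q8.8 for the 16bpp input of the pva-nvenc-vic pipeline (scale=256) and Q10.5 for the output disparity (divide by 32). The following plain-Python sketch shows the arithmetic on single values. The function names are hypothetical, and the rounding and clamping in disparity_to_u8 are assumptions made for illustration; VPI's own conversion may round differently.

```python
def to_q8_8(pixel_u8):
    """MSB-align an 8-bit pixel in a 16-bit word (Q8.8), i.e. multiply by 256."""
    if not 0 <= pixel_u8 <= 255:
        raise ValueError("expected an 8-bit value")
    return pixel_u8 << 8


def q10_5_to_pixels(raw_u16):
    """Interpret a raw disparity as Q10.5: 5 fractional bits, so divide by 32."""
    return raw_u16 / 32.0


def disparity_to_u8(raw_u16, max_disparity):
    """Map a raw Q10.5 disparity in [0, 32*max_disparity] onto [0, 255].

    Rounding to nearest and clamping are assumptions of this sketch.
    """
    scaled = raw_u16 * 255.0 / (32 * max_disparity)
    return max(0, min(255, int(round(scaled))))


print(to_q8_8(255))               # 65280
print(q10_5_to_pixels(48))        # 1.5
print(disparity_to_u8(2048, 64))  # 255
```

The factor 255.0/(32*maxDisparity) used in the sample combines both steps: dividing by 32 yields pixels, and multiplying by 255/maxDisparity stretches the valid range over the 8-bit output.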
C++

#include <opencv2/core/version.hpp>
#if CV_MAJOR_VERSION >= 3
#    include <opencv2/imgcodecs.hpp>
#else
#    include <opencv2/contrib/contrib.hpp> // for colormap
#    include <opencv2/highgui/highgui.hpp>
#endif

#include <opencv2/imgproc/imgproc.hpp>
#include <vpi/OpenCVInterop.hpp>

#include <vpi/Image.h>
#include <vpi/Status.h>
#include <vpi/Stream.h>
#include <vpi/algo/ConvertImageFormat.h>
#include <vpi/algo/Rescale.h>
#include <vpi/algo/StereoDisparity.h>

#include <cstring> // for memset
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

#define CHECK_STATUS(STMT)                                    \
    do                                                        \
    {                                                         \
        VPIStatus status = (STMT);                            \
        if (status != VPI_SUCCESS)                            \
        {                                                     \
            char buffer[VPI_MAX_STATUS_MESSAGE_LENGTH];       \
            vpiGetLastStatusMessage(buffer, sizeof(buffer));  \
            std::ostringstream ss;                            \
            ss << vpiStatusGetName(status) << ": " << buffer; \
            throw std::runtime_error(ss.str());               \
        }                                                     \
    } while (0)

int main(int argc, char *argv[])
{
    // OpenCV images that will be wrapped by a VPIImage.
    // Define them here so that they're destroyed *after* the wrappers are destroyed
    cv::Mat cvImageLeft, cvImageRight;

    // VPI objects that will be used
    VPIImage inLeft        = NULL;
    VPIImage inRight       = NULL;
    VPIImage tmpLeft       = NULL;
    VPIImage tmpRight      = NULL;
    VPIImage stereoLeft    = NULL;
    VPIImage stereoRight   = NULL;
    VPIImage disparity     = NULL;
    VPIImage confidenceMap = NULL;
    VPIStream stream       = NULL;
    VPIPayload stereo      = NULL;

    int retval = 0;

    try
    {
        // =============================
        // Parse command line parameters

        if (argc != 4)
        {
            throw std::runtime_error(std::string("Usage: ") + argv[0] +
                                     " <cpu|pva|cuda|pva-nvenc-vic> <left image> <right image>");
        }

        std::string strBackend       = argv[1];
        std::string strLeftFileName  = argv[2];
        std::string strRightFileName = argv[3];

        uint32_t backends;

        if (strBackend == "cpu")
        {
            backends = VPI_BACKEND_CPU;
        }
        else if (strBackend == "cuda")
        {
            backends = VPI_BACKEND_CUDA;
        }
        else if (strBackend == "pva")
        {
            backends = VPI_BACKEND_PVA;
        }
        else if (strBackend == "pva-nvenc-vic")
        {
            backends = VPI_BACKEND_PVA | VPI_BACKEND_NVENC | VPI_BACKEND_VIC;
        }
        else
        {
            throw std::runtime_error("Backend '" + strBackend +
                                     "' not recognized, it must be either cpu, cuda, pva or pva-nvenc-vic.");
        }

        // =====================
        // Load the input images
        cvImageLeft = cv::imread(strLeftFileName);
        if (cvImageLeft.empty())
        {
            throw std::runtime_error("Can't open '" + strLeftFileName + "'");
        }

        cvImageRight = cv::imread(strRightFileName);
        if (cvImageRight.empty())
        {
            throw std::runtime_error("Can't open '" + strRightFileName + "'");
        }

        // =================================
        // Allocate all VPI resources needed

        int32_t inputWidth  = cvImageLeft.cols;
        int32_t inputHeight = cvImageLeft.rows;

        // Create the stream that will be used for processing.
        CHECK_STATUS(vpiStreamCreate(0, &stream));

        // We now wrap the loaded images into VPIImage objects to be used by VPI.
        // VPI won't make a copy of them, so the original images must be in scope at all times.
        CHECK_STATUS(vpiImageCreateOpenCVMatWrapper(cvImageLeft, 0, &inLeft));
        CHECK_STATUS(vpiImageCreateOpenCVMatWrapper(cvImageRight, 0, &inRight));

        // Format conversion parameters needed for input pre-processing
        VPIConvertImageFormatParams convParams;
        CHECK_STATUS(vpiInitConvertImageFormatParams(&convParams));

        // Set algorithm parameters to be used. Only values that differ from the defaults will be overwritten.
        VPIStereoDisparityEstimatorCreationParams stereoParams;
        CHECK_STATUS(vpiInitStereoDisparityEstimatorCreationParams(&stereoParams));

        // Define some backend-dependent parameters

        VPIImageFormat stereoFormat;
        int stereoWidth, stereoHeight;
        if (strBackend == "pva-nvenc-vic")
        {
            stereoFormat = VPI_IMAGE_FORMAT_Y16_ER_BL;

            // Input width and height have to be 1920x1080 in block-linear format for the pva-nvenc-vic pipeline
            stereoWidth  = 1920;
            stereoHeight = 1080;

            // For PVA+NVENC+VIC mode, 16bpp input must be MSB-aligned, which
            // is equivalent to saying that it is Q8.8 (fixed-point, 8 fractional bits).
            convParams.scale = 256;

            // Maximum disparity is fixed to 256.
            stereoParams.maxDisparity = 256;
        }
        else
        {
            stereoFormat = VPI_IMAGE_FORMAT_Y16_ER;

            if (strBackend == "pva")
            {
                stereoWidth  = 480;
                stereoHeight = 270;
            }
            else
            {
                stereoWidth  = inputWidth;
                stereoHeight = inputHeight;
            }

            stereoParams.maxDisparity = 64;
        }

        // Create the payload for the Stereo Disparity algorithm.
        // The payload is created before the image objects so that unsupported backends can be trapped with an error.
        CHECK_STATUS(vpiCreateStereoDisparityEstimator(backends, stereoWidth, stereoHeight, stereoFormat, &stereoParams,
                                                       &stereo));

        // Create the image where the disparity map will be stored.
        CHECK_STATUS(vpiImageCreate(stereoWidth, stereoHeight, VPI_IMAGE_FORMAT_U16, 0, &disparity));

        if (strBackend == "pva-nvenc-vic")
        {
            // Need a temporary image to convert BGR8 input from OpenCV into pitch-linear 16bpp grayscale.
            // We can't convert it directly to block-linear since the CUDA backend doesn't support it, and
            // the VIC backend doesn't support BGR8 inputs.
            CHECK_STATUS(vpiImageCreate(inputWidth, inputHeight, VPI_IMAGE_FORMAT_Y16_ER, 0, &tmpLeft));
            CHECK_STATUS(vpiImageCreate(inputWidth, inputHeight, VPI_IMAGE_FORMAT_Y16_ER, 0, &tmpRight));

            // Input to pva-nvenc-vic stereo disparity must be block-linear
            CHECK_STATUS(vpiImageCreate(stereoWidth, stereoHeight, stereoFormat, 0, &stereoLeft));
            CHECK_STATUS(vpiImageCreate(stereoWidth, stereoHeight, stereoFormat, 0, &stereoRight));

            // A confidence map is needed for the pva-nvenc-vic pipeline
            CHECK_STATUS(vpiImageCreate(stereoWidth, stereoHeight, VPI_IMAGE_FORMAT_U16, 0, &confidenceMap));
        }
        else
        {
            // PVA requires that the input resolution is 480x270
            if (strBackend == "pva")
            {
                CHECK_STATUS(vpiImageCreate(inputWidth, inputHeight, stereoFormat, 0, &tmpLeft));
                CHECK_STATUS(vpiImageCreate(inputWidth, inputHeight, stereoFormat, 0, &tmpRight));
            }
            else if (strBackend == "cuda")
            {
                CHECK_STATUS(vpiImageCreate(inputWidth, inputHeight, VPI_IMAGE_FORMAT_U16, 0, &confidenceMap));
            }

            // Allocate input to stereo disparity algorithm, pitch-linear 16bpp grayscale
            CHECK_STATUS(vpiImageCreate(stereoWidth, stereoHeight, stereoFormat, 0, &stereoLeft));
            CHECK_STATUS(vpiImageCreate(stereoWidth, stereoHeight, stereoFormat, 0, &stereoRight));
        }

        // ================
        // Processing stage

        // -----------------
        // Pre-process input
        if (strBackend == "pva-nvenc-vic" || strBackend == "pva")
        {
            // Convert OpenCV input to temporary grayscale format using CUDA
            CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, inLeft, tmpLeft, &convParams));
            CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, inRight, tmpRight, &convParams));

            // Do both rescaling and final image format conversion on VIC.
            CHECK_STATUS(
                vpiSubmitRescale(stream, VPI_BACKEND_VIC, tmpLeft, stereoLeft, VPI_INTERP_LINEAR, VPI_BORDER_CLAMP, 0));
            CHECK_STATUS(vpiSubmitRescale(stream, VPI_BACKEND_VIC, tmpRight, stereoRight, VPI_INTERP_LINEAR,
                                          VPI_BORDER_CLAMP, 0));
        }
        else
        {
            // Convert OpenCV input to grayscale format using CUDA
            CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, inLeft, stereoLeft, &convParams));
            CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, inRight, stereoRight, &convParams));
        }

        // ------------------------------
        // Do stereo disparity estimation

        // Submit it with the input and output images
        CHECK_STATUS(vpiSubmitStereoDisparityEstimator(stream, backends, stereo, stereoLeft, stereoRight, disparity,
                                                       confidenceMap, NULL));

        // Wait until the algorithm finishes processing
        CHECK_STATUS(vpiStreamSync(stream));

        // ========================================
        // Output post-processing and saving to disk
        // Lock output to retrieve its data in CPU memory
        VPIImageData data;
        CHECK_STATUS(vpiImageLock(disparity, VPI_LOCK_READ, &data));

        // Make an OpenCV matrix out of this image
        cv::Mat cvDisparity;
        CHECK_STATUS(vpiImageDataExportOpenCVMat(data, &cvDisparity));

        // Scale result and write it to disk. Disparities are in Q10.5 format,
        // so to map them to float, they get divided by 32. Then the resulting disparity range,
        // from 0 to stereoParams.maxDisparity, gets mapped to 0-255 for proper output.
        cvDisparity.convertTo(cvDisparity, CV_8UC1, 255.0 / (32 * stereoParams.maxDisparity), 0);

        // Apply JET colormap to turn the disparities into color. Reddish hues
        // represent objects closer to the camera, bluish ones are farther away.
        cv::Mat cvDisparityColor;
        applyColorMap(cvDisparity, cvDisparityColor, cv::COLORMAP_JET);

        // Done handling output, don't forget to unlock it.
        CHECK_STATUS(vpiImageUnlock(disparity));

        // If we have a confidence map, write it to disk too.
        if (confidenceMap)
        {
            VPIImageData data;
            CHECK_STATUS(vpiImageLock(confidenceMap, VPI_LOCK_READ, &data));

            cv::Mat cvConfidence;
            CHECK_STATUS(vpiImageDataExportOpenCVMat(data, &cvConfidence));

            // Confidence map varies from 0 to 65535, we scale it to
            // [0-255].
            cvConfidence.convertTo(cvConfidence, CV_8UC1, 255.0 / 65535, 0);
            imwrite("confidence_" + strBackend + ".png", cvConfidence);

            CHECK_STATUS(vpiImageUnlock(confidenceMap));

            // When pixel confidence is 0, its color in the disparity
            // output is black.
            cv::Mat cvMask;
            threshold(cvConfidence, cvMask, 1, 255, cv::THRESH_BINARY);
            cvtColor(cvMask, cvMask, cv::COLOR_GRAY2BGR);
            bitwise_and(cvDisparityColor, cvMask, cvDisparityColor);
        }

        imwrite("disparity_" + strBackend + ".png", cvDisparityColor);
    }
    catch (std::exception &e)
    {
        std::cerr << e.what() << std::endl;
        retval = 1;
    }

    // ========
    // Clean up

    // Destroying the stream first makes sure that all work submitted to
    // it is finished.
    vpiStreamDestroy(stream);

    // Only then can we destroy the other objects, as we're sure they
    // aren't being used anymore.

    vpiImageDestroy(inLeft);
    vpiImageDestroy(inRight);
    vpiImageDestroy(tmpLeft);
    vpiImageDestroy(tmpRight);
    vpiImageDestroy(stereoLeft);
    vpiImageDestroy(stereoRight);
    vpiImageDestroy(confidenceMap);
    vpiImageDestroy(disparity);
    vpiPayloadDestroy(stereo);

    return retval;
}

// vim: ts=8:sw=4:sts=4:et:ai
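Both samples finish by blacking out low-confidence pixels in the colored disparity image. The plain-Python sketch below mirrors that step on nested lists so the logic is easy to follow without OpenCV. The function names are hypothetical, and the rounding in confidence_to_u8 is an assumption of this sketch. With a binary threshold of 1, a pixel is kept only when its 8-bit confidence is strictly greater than 1, so values 0 and 1 are both masked.

```python
def confidence_to_u8(conf_u16):
    """Scale a 16-bit confidence value (0..65535) to 8 bits (0..255)."""
    return min(255, int(round(conf_u16 * 255.0 / 65535)))


def mask_disparity(color_rows, conf_rows, thresh=1):
    """Black out pixels whose 8-bit confidence is not above thresh.

    color_rows: rows of (b, g, r) tuples; conf_rows: rows of 8-bit confidences.
    Equivalent to a binary threshold followed by a bitwise AND with the mask.
    """
    return [
        [pixel if conf > thresh else (0, 0, 0) for pixel, conf in zip(prow, crow)]
        for prow, crow in zip(color_rows, conf_rows)
    ]


# Tiny 1x3 example: the first two pixels have confidence 0 and 1, so both go black.
colors = [[(10, 20, 30), (40, 50, 60), (70, 80, 90)]]
confidences = [[0, 1, 200]]
print(mask_disparity(colors, confidences))
```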