Overview

The Stereo Disparity application receives left and right stereo pair images and returns the disparity between them, which is a function of image depth. The result is saved as an image file to disk. If available, it'll also output the corresponding confidence map.
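
To make the link between disparity and depth concrete, the sketch below applies the usual pinhole relation for a rectified pair, Z = f * B / d. It is not part of the sample: the function name, the focal length and the baseline are hypothetical values chosen only for illustration.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        return float('inf')
    return focal_px * baseline_m / disparity_px


# Hypothetical camera (assumption): 700 px focal length, 12 cm baseline.
for d in (64, 16, 4):
    depth = depth_from_disparity(d, 700.0, 0.12)
    print(f'disparity {d:3d} px -> depth {depth:.2f} m')
```

Larger disparities map to smaller depths, which is why nearby objects stand out most strongly in the disparity image.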

Instructions

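The command line parameters are:

C++
./vpi_sample_02_stereo_disparity <backend> <left image> <right image>
Python
python3 main.py <backend> <left image> <right image>

where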

backend: one of cpu, cuda, pva, ofa, ofa-pva-vic or pva-nvenc-vic; it selects the backend that performs the processing. pva-nvenc-vic, ofa-pva-vic and cuda can output the confidence map in addition to the disparity.
left image: left input image of a rectified stereo pair. It accepts png, jpeg and possibly other formats.
right image: right input image of the same rectified stereo pair.
Note: for the pva-nvenc-vic backend, the left image must be 1920x1080.
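
The constraints in the parameter list can be summarized as a small lookup. This is only a sketch of the rules stated above, not something VPI exports; the names BACKENDS, CONFIDENCE_BACKENDS and check_request are hypothetical.

```python
# Rules taken from the parameter description above; the table itself is an
# illustration and is not part of the VPI API.
BACKENDS = ('cpu', 'cuda', 'pva', 'ofa', 'ofa-pva-vic', 'pva-nvenc-vic')
CONFIDENCE_BACKENDS = {'cuda', 'ofa-pva-vic', 'pva-nvenc-vic'}


def check_request(backend, width, height):
    """Return (ok, message) for a backend and left-image size combination."""
    if backend not in BACKENDS:
        return False, f'unknown backend: {backend}'
    if backend == 'pva-nvenc-vic' and (width, height) != (1920, 1080):
        return False, 'pva-nvenc-vic requires a 1920x1080 left image'
    if backend in CONFIDENCE_BACKENDS:
        return True, 'disparity and confidence map'
    return True, 'disparity only'


print(check_request('cuda', 640, 480))
print(check_request('pva-nvenc-vic', 640, 480))
```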

Here's one example:

C++
./vpi_sample_02_stereo_disparity cuda ../assets/chair_stereo_left.png ../assets/chair_stereo_right.png
Python
python3 main.py cuda ../assets/chair_stereo_left.png ../assets/chair_stereo_right.png

This uses the CUDA backend and the provided sample images. You can try other stereo pair images, as long as they respect the constraints imposed by the algorithm.

The Python version of this sample also allows setting various additional parameters, accepts additional input image extensions, and can turn on verbose mode. The following command-line arguments can be passed to the Python sample:

Python
python3 main.py <backend> <left image> <right image> --width W --height H --downscale D --window_size WIN \
    --skip_confidence --conf_threshold T --conf_type absolute/relative -p1 P1 -p2 P2 --p2_alpha P2alpha \
    --uniqueness U --skip_diagonal --num_passes N --min_disparity MIN --max_disparity MAX --output_mode 0/1/2 \
    -v/--verbose

where the additional optional arguments are:
width: set the width W when passing ".raw" input images
height: set the height H when passing ".raw" input images
downscale: set the output downscale factor to D
window_size: set the median filter window size to WIN
skip_confidence: skip calculating the confidence map and applying it as a mask
conf_threshold: set the confidence threshold to T
conf_type: set the confidence type to either absolute or relative
p1: set the P1 penalty to P1
p2: set the P2 penalty to P2
p2_alpha: set the adaptive P2 alpha to P2alpha (ignored unless an OFA backend is used)
uniqueness: set the uniqueness ratio to U (ignored unless the CUDA backend is used)
skip_diagonal: do not use diagonal paths in CUDA or OFA backends
num_passes: set the number of passes to N in OFA backends
min_disparity: set the minimum disparity to MIN in the CUDA backend
max_disparity: set the maximum disparity to MAX; the sample forces 64 for cpu and pva, and falls back to 256 for ofa-pva-vic when MAX is neither 128 nor 256
output_mode: 0 for colored output, 1 for grayscale and 2 for raw binary
verbose: turn on verbose mode

To understand each of these additional arguments related to the stereo disparity algorithm in detail, please read the corresponding documentation.
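
Several of these arguments are overridden by the Python sample depending on the backend. The sketch below mirrors those overrides in plain Python so they can be inspected without VPI installed. The function name effective_settings and the returned dictionary are hypothetical; the rules themselves are read from the sample source listed under Source Code.

```python
def effective_settings(backend, width, height, downscale=1, max_disparity=256,
                       skip_confidence=False):
    """Mirror the overrides the Python sample applies before calling VPI."""
    calc_conf = not skip_confidence
    if backend == 'pva-nvenc-vic':
        calc_conf = True   # this pipeline always computes confidence
        downscale = 4      # and only supports 1/4 of the input size
    elif backend not in {'cuda', 'ofa-pva-vic'}:
        calc_conf = False  # cpu, pva and ofa have no confidence map

    if backend in {'pva', 'cpu'}:
        max_disparity = 64
    elif backend == 'ofa-pva-vic' and max_disparity not in {128, 256}:
        max_disparity = 256

    # Output size is the input size divided by the downscale factor, rounded up.
    out_w = (width + downscale - 1) // downscale
    out_h = (height + downscale - 1) // downscale
    return {'confidence': calc_conf, 'downscale': downscale,
            'max_disparity': max_disparity, 'output_size': (out_w, out_h)}


print(effective_settings('pva-nvenc-vic', 1920, 1080))
print(effective_settings('cpu', 641, 480, downscale=2, max_disparity=128))
```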

Results

Left input image	Right input image

Stereo disparity	Confidence map

Source Code

For convenience, here's the code that is also installed in the samples directory.

Python

 import cv2
 import sys
 import vpi
 import numpy as np
 from PIL import Image
 from argparse import ArgumentParser
  
  
 def read_raw_file(fpath, resize_to=None, verbose=False):
     # Read a headerless 16-bit raw file and wrap it in a PIL image.
     try:
         if verbose:
             print(f'I Reading: {fpath}', end=' ', flush=True)
         # np.fromfile accepts a path directly, so no file handle is left open on error
         np_arr = np.fromfile(fpath, dtype=np.uint16, count=-1)
         if verbose:
             print(f'done!\nI Raw array: shape: {np_arr.shape} dtype: {np_arr.dtype}')
         if resize_to is not None:
             np_arr = np_arr.reshape(resize_to, order='C')
         if verbose:
             print(f'I Reshaped array: shape: {np_arr.shape} dtype: {np_arr.dtype}')
         pil_img = Image.fromarray(np_arr, mode="I;16L")
         return pil_img
     except Exception as e:
         # Keep the original error attached for easier debugging
         raise ValueError(f'E Cannot process raw input: {fpath}') from e
  
  
 def process_arguments():
     parser = ArgumentParser()
  
     parser.add_argument('backend', choices=['cpu','cuda','pva','ofa','ofa-pva-vic','pva-nvenc-vic'],
                         help='Backend to be used for processing')
     parser.add_argument('left', help='Rectified left input image from a stereo pair')
     parser.add_argument('right', help='Rectified right input image from a stereo pair')
     parser.add_argument('--width', default=-1, type=int, help='Input width for raw input files')
     parser.add_argument('--height', default=-1, type=int, help='Input height for raw input files')
     parser.add_argument('--downscale', default=1, type=int, help='Output downscale factor')
     parser.add_argument('--window_size', default=5, type=int, help='Median filter window size')
     parser.add_argument('--skip_confidence', default=False, action='store_true', help='Do not calculate confidence')
     parser.add_argument('--conf_threshold', default=32767, type=int, help='Confidence threshold')
     parser.add_argument('--conf_type', default='absolute', choices=['absolute', 'relative'],
                         help='Computation type to produce the confidence output')
     parser.add_argument('-p1', default=3, type=int, help='Penalty P1 on small disparities')
     parser.add_argument('-p2', default=48, type=int, help='Penalty P2 on large disparities')
     parser.add_argument('--p2_alpha', default=0, type=int, help='Alpha for adaptive P2 Penalty')
     parser.add_argument('--uniqueness', default=-1, type=float, help='Uniqueness ratio')
     parser.add_argument('--skip_diagonal', default=False, action='store_true', help='Do not use diagonal paths')
     parser.add_argument('--num_passes', default=3, type=int, help='Number of passes')
     parser.add_argument('--min_disparity', default=0, type=int, help='Minimum disparity')
     parser.add_argument('--max_disparity', default=256, type=int, help='Maximum disparity')
     parser.add_argument('--output_mode', default=0, type=int, help='0: color; 1: grayscale; 2: raw binary')
     parser.add_argument('-v', '--verbose', default=False, action='store_true', help='Verbose mode')
  
     return parser.parse_args()
  
  
 def main():
     args = process_arguments()
  
     scale = 1 # pixel value scaling factor when loading input
  
     if args.backend == 'cpu':
         backend = vpi.Backend.CPU
     elif args.backend == 'cuda':
         backend = vpi.Backend.CUDA
     elif args.backend == 'pva':
         backend = vpi.Backend.PVA
     elif args.backend == 'ofa':
         backend = vpi.Backend.OFA
     elif args.backend == 'ofa-pva-vic':
         backend = vpi.Backend.OFA|vpi.Backend.PVA|vpi.Backend.VIC
     elif args.backend == 'pva-nvenc-vic':
         backend = vpi.Backend.PVA|vpi.Backend.NVENC|vpi.Backend.VIC
         # For PVA+NVENC+VIC mode, 16bpp input must be MSB-aligned, which
         # is equivalent to saying that it is Q8.8 (fixed-point, 8 fractional bits).
         scale = 256
     else:
         raise ValueError(f'E Invalid backend: {args.backend}')
  
     conftype = None
     if args.conf_type == 'absolute':
         conftype = vpi.ConfidenceType.ABSOLUTE
     elif args.conf_type == 'relative':
         conftype = vpi.ConfidenceType.RELATIVE
     else:
         raise ValueError(f'E Invalid confidence type: {args.conf_type}')
  
     minDisparity = args.min_disparity
     maxDisparity = args.max_disparity
     includeDiagonals = not args.skip_diagonal
     numPasses = args.num_passes
     calcConf = not args.skip_confidence
     downscale = args.downscale
     windowSize = args.window_size
     quality = 6
  
     if args.verbose:
         print(f'I Backend: {backend}\nI Left image: {args.left}\nI Right image: {args.right}\n'
               f'I Disparities (min, max): {(minDisparity, maxDisparity)}\n'
               f'I Input scale factor: {scale}\nI Output downscale factor: {downscale}\n'
               f'I Window size: {windowSize}\nI Quality: {quality}\n'
               f'I Calculate confidence: {calcConf}\nI Confidence threshold: {args.conf_threshold}\n'
               f'I Confidence type: {conftype}\nI Uniqueness ratio: {args.uniqueness}\n'
               f'I Penalty P1: {args.p1}\nI Penalty P2: {args.p2}\nI Adaptive P2 alpha: {args.p2_alpha}\n'
               f'I Include diagonals: {includeDiagonals}\nI Number of passes: {numPasses}\n'
               f'I Output mode: {args.output_mode}\nI Verbose: {args.verbose}\n'
               , end='', flush=True)
  
     # Treat only files with a ".raw" extension as raw input
     if args.left.lower().endswith('.raw'):
         pil_left = read_raw_file(args.left, resize_to=[args.height, args.width], verbose=args.verbose)
     else:
         try:
             pil_left = Image.open(args.left)
         except Exception as e:
             raise ValueError(f'E Cannot open left input image: {args.left}') from e
  
     if args.right.lower().endswith('.raw'):
         pil_right = read_raw_file(args.right, resize_to=[args.height, args.width], verbose=args.verbose)
     else:
         try:
             pil_right = Image.open(args.right)
         except Exception as e:
             raise ValueError(f'E Cannot open right input image: {args.right}') from e
  
     # Streams for left and right independent pre-processing
     streamLeft = vpi.Stream()
     streamRight = vpi.Stream()
  
     # Load input into a vpi.Image and convert it to grayscale, 16bpp
     with vpi.Backend.CUDA:
         with streamLeft:
             left = vpi.asimage(np.asarray(pil_left)).convert(vpi.Format.Y16_ER, scale=scale)
         with streamRight:
             right = vpi.asimage(np.asarray(pil_right)).convert(vpi.Format.Y16_ER, scale=scale)
  
     # Preprocess input
     # Block linear format is needed for pva-nvenc-vic pipeline and ofa backends
     # Currently we can only convert to block-linear using VIC backend.
     # The input also must be 1080p for pva-nvenc-vic backend.
     if args.backend in {'pva-nvenc-vic', 'ofa-pva-vic', 'ofa'}:
         if args.verbose:
             print(f'W {args.backend} forces to convert input images to block linear', flush=True)
         with vpi.Backend.VIC:
             with streamLeft:
                 left = left.convert(vpi.Format.Y16_ER_BL)
             with streamRight:
                 right = right.convert(vpi.Format.Y16_ER_BL)
         if args.backend == 'pva-nvenc-vic':
             if left.size[0] != 1920 or left.size[1] != 1080:
                 raise ValueError(f'E {args.backend} requires input to be 1920x1080')
  
     if args.verbose:
         print(f'I Input left image: {left.size} {left.format}\n'
               f'I Input right image: {right.size} {right.format}', flush=True)
  
     confidenceU16 = None
  
     if args.backend == 'pva-nvenc-vic':
         if args.verbose:
             print(f'W {args.backend} forces to calculate confidence', flush=True)
         calcConf = True
  
     if calcConf:
         if args.backend not in {'cuda', 'ofa-pva-vic', 'pva-nvenc-vic'}:
             # Only CUDA, OFA-PVA-VIC and PVA-NVENC-VIC have confidence map
             calcConf = False
             if args.verbose:
                 print(f'W {args.backend} does not allow to calculate confidence', flush=True)
  
     if calcConf:
         if args.backend == 'pva-nvenc-vic':
             # PVA-NVENC-VIC only supports 1/4 of the input size
             downscale = 4
             if args.verbose:
                 print(f'W {args.backend} forces downscale to {downscale}', flush=True)
  
     outWidth = (left.size[0] + downscale - 1) // downscale
     outHeight = (left.size[1] + downscale - 1) // downscale
  
     if calcConf:
         confidenceU16 = vpi.Image((outWidth, outHeight), vpi.Format.U16)
  
     # Use stream left to consolidate actual stereo processing
     streamStereo = streamLeft
  
     if args.backend in {'pva', 'cpu'}:
         maxDisparity = 64
         if args.verbose:
             print(f'W {args.backend} forces maxDisparity to {maxDisparity}', flush=True)
     elif args.backend == 'ofa-pva-vic' and maxDisparity not in {128, 256}:
         maxDisparity = 256
         if args.verbose:
             print(f'W {args.backend} forces maxDisparity to {maxDisparity}', flush=True)
  
     if args.verbose:
         if 'ofa' not in args.backend:
             print('W Ignoring P2 alpha and number of passes since not an OFA backend', flush=True)
         if args.backend != 'cuda':
             print('W Ignoring uniqueness since not a CUDA backend', flush=True)
         print('I Estimating stereo disparity ... ', end='', flush=True)
  
     # Estimate stereo disparity.
     with streamStereo, backend:
         disparityS16 = vpi.stereodisp(left, right, downscale=downscale, out_confmap=confidenceU16,
                                       window=windowSize, maxdisp=maxDisparity, confthreshold=args.conf_threshold,
                                       quality=quality, conftype=conftype, mindisp=minDisparity,
                                       p1=args.p1, p2=args.p2, p2alpha=args.p2_alpha, uniqueness=args.uniqueness,
                                       includediagonals=includeDiagonals, numpasses=numPasses)
  
     if args.verbose:
         print('done!\nI Post-processing ... ', end='', flush=True)
  
     # Postprocess results and save them to disk
     with streamStereo, vpi.Backend.CUDA:
         # Some backends output disparities in block-linear format; we must convert them to
         # pitch-linear for consistency with other backends.
         if disparityS16.format == vpi.Format.S16_BL:
             disparityS16 = disparityS16.convert(vpi.Format.S16, backend=vpi.Backend.VIC)
  
         # Scale disparity and confidence map so that values lie between 0 and 255.
  
         # Disparities are in Q10.5 format, so to map it to float, it gets
         # divided by 32. Then the resulting disparity range, from 0 to
         # stereo.maxDisparity gets mapped to 0-255 for proper output.
         # Copy disparity values back to the CPU.
         disparityU8 = disparityS16.convert(vpi.Format.U8, scale=255.0/(32*maxDisparity)).cpu()
  
         # Apply JET colormap to turn the disparities into color, reddish hues
         # represent objects closer to the camera, blueish are farther away.
         disparityColor = cv2.applyColorMap(disparityU8, cv2.COLORMAP_JET)
  
         # Converts to RGB for output with PIL.
         disparityColor = cv2.cvtColor(disparityColor, cv2.COLOR_BGR2RGB)
  
         if calcConf:
             confidenceU8 = confidenceU16.convert(vpi.Format.U8, scale=255.0/65535).cpu()
  
             # When pixel confidence is 0, its color in the disparity is black.
             mask = cv2.threshold(confidenceU8, 1, 255, cv2.THRESH_BINARY)[1]
             mask = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
             disparityColor = cv2.bitwise_and(disparityColor, mask)
  
     fext = '.raw' if args.output_mode == 2 else '.png'
  
     disparity_fname = f'disparity_python{sys.version_info[0]}_{args.backend}' + fext
     confidence_fname = f'confidence_python{sys.version_info[0]}_{args.backend}' + fext
  
     if args.verbose:
         print(f'done!\nI Disparity output: {disparity_fname}', flush=True)
         if calcConf:
             print(f'I Confidence output: {confidence_fname}', flush=True)
  
     # Save results to disk.
     try:
         if args.output_mode == 0:
             Image.fromarray(disparityColor).save(disparity_fname)
             if args.verbose:
                 print(f'I Output disparity image: {disparityColor.shape} '
                       f'{disparityColor.dtype}', flush=True)
         elif args.output_mode == 1:
             Image.fromarray(disparityU8).save(disparity_fname)
             if args.verbose:
                 print(f'I Output disparity image: {disparityU8.shape} '
                       f'{disparityU8.dtype}', flush=True)
         elif args.output_mode == 2:
             disparityS16.cpu().tofile(disparity_fname)
             if args.verbose:
                 print(f'I Output disparity image: {disparityS16.size} '
                       f'{disparityS16.format}', flush=True)
  
         if calcConf:
             if args.output_mode == 0 or args.output_mode == 1:
                 Image.fromarray(confidenceU8).save(confidence_fname)
                 if args.verbose:
                     print(f'I Output confidence image: {confidenceU8.shape} '
                           f'{confidenceU8.dtype}', flush=True)
             else:
                 confidenceU16.cpu().tofile(confidence_fname)
                 if args.verbose:
                     print(f'I Output confidence image: {confidenceU16.size} '
                           f'{confidenceU16.format}', flush=True)
  
     except Exception as e:
         raise ValueError(f'E Cannot write outputs: {disparity_fname}, {confidence_fname}\n'
                          f'E Using output mode: {args.output_mode}') from e
  
  
 if __name__ == '__main__':
     main()
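
The fixed-point conventions mentioned in the comments of the Python listing can be checked with plain arithmetic. The sketch below is independent of VPI: the helper names are hypothetical, and the rounding and clamping in disparity_to_u8 are assumptions of this example rather than a statement of how VPI converts formats.

```python
FRACTIONAL_BITS = 5  # Q10.5: 10 integer bits, 5 fractional bits


def q10_5_to_pixels(raw):
    """Convert a raw S16 disparity in Q10.5 to a disparity in pixels."""
    return raw / (1 << FRACTIONAL_BITS)


def disparity_to_u8(raw, max_disparity):
    """Map a raw Q10.5 disparity in [0, 32 * max_disparity] to [0, 255]."""
    scaled = raw * 255.0 / (32 * max_disparity)
    # Rounding and clamping are assumptions made for this sketch.
    return max(0, min(255, round(scaled)))


def msb_align_8bit(value):
    """Scale an 8-bit value by 256 so it occupies the upper byte of 16 bits (Q8.8)."""
    return value * 256


raw = 1000
print(q10_5_to_pixels(raw))      # disparity in pixels
print(disparity_to_u8(raw, 64))  # 8-bit value used for the output image
print(msb_align_8bit(255))       # largest 8-bit input after MSB alignment
```

The factor 255.0/(32*maxDisparity) used in the listing is the same mapping as disparity_to_u8, and scale = 256 for pva-nvenc-vic is the same as msb_align_8bit.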

C++
 #include <opencv2/core/version.hpp>
 #if CV_MAJOR_VERSION >= 3
 #    include <opencv2/imgcodecs.hpp>
 #else
 #    include <opencv2/contrib/contrib.hpp> // for colormap
 #    include <opencv2/highgui/highgui.hpp>
 #endif
  
 #include <opencv2/imgproc/imgproc.hpp>
 #include <vpi/OpenCVInterop.hpp>
  
 #include <vpi/Image.h>
 #include <vpi/Status.h>
 #include <vpi/Stream.h>
 #include <vpi/algo/ConvertImageFormat.h>
 #include <vpi/algo/Rescale.h>
 #include <vpi/algo/StereoDisparity.h>
  
 #include <algorithm> // for std::max
 #include <iostream>
 #include <sstream>
 #include <stdexcept> // for std::runtime_error
 #include <string>
  
 #define CHECK_STATUS(STMT)                                    \
     do                                                        \
     {                                                         \
         VPIStatus status = (STMT);                            \
         if (status != VPI_SUCCESS)                            \
         {                                                     \
             char buffer[VPI_MAX_STATUS_MESSAGE_LENGTH];       \
             vpiGetLastStatusMessage(buffer, sizeof(buffer));  \
             std::ostringstream ss;                            \
             ss << vpiStatusGetName(status) << ": " << buffer; \
             throw std::runtime_error(ss.str());               \
         }                                                     \
     } while (0)
  
 int main(int argc, char *argv[])
 {
     // OpenCV image that will be wrapped by a VPIImage.
     // Define it here so that it's destroyed *after* wrapper is destroyed
     cv::Mat cvImageLeft, cvImageRight;
  
     // VPI objects that will be used
     VPIImage inLeft        = NULL;
     VPIImage inRight       = NULL;
     VPIImage tmpLeft       = NULL;
     VPIImage tmpRight      = NULL;
     VPIImage stereoLeft    = NULL;
     VPIImage stereoRight   = NULL;
     VPIImage disparity     = NULL;
     VPIImage confidenceMap = NULL;
     VPIStream stream       = NULL;
     VPIPayload stereo      = NULL;
  
     int retval = 0;
  
     try
     {
         // =============================
         // Parse command line parameters
  
         if (argc != 4)
         {
             throw std::runtime_error(std::string("Usage: ") + argv[0] +
                                      " <cpu|pva|cuda|pva-nvenc-vic|ofa|ofa-pva-vic> <left image> <right image>\nNote: "
                                      "For pva-nvenc-vic backend the left_image's size must be 1920x1080");
         }
  
         std::string strBackend       = argv[1];
         std::string strLeftFileName  = argv[2];
         std::string strRightFileName = argv[3];
  
         uint64_t backends;
  
         if (strBackend == "cpu")
         {
             backends = VPI_BACKEND_CPU;
         }
         else if (strBackend == "cuda")
         {
             backends = VPI_BACKEND_CUDA;
         }
         else if (strBackend == "pva")
         {
             backends = VPI_BACKEND_PVA;
         }
         else if (strBackend == "pva-nvenc-vic")
         {
             backends = VPI_BACKEND_PVA | VPI_BACKEND_NVENC | VPI_BACKEND_VIC;
         }
         else if (strBackend == "ofa")
         {
             backends = VPI_BACKEND_OFA;
         }
         else if (strBackend == "ofa-pva-vic")
         {
             backends = VPI_BACKEND_OFA | VPI_BACKEND_PVA | VPI_BACKEND_VIC;
         }
         else
         {
             throw std::runtime_error(
                 "Backend '" + strBackend +
                 "' not recognized, it must be either cpu, cuda, pva, ofa, ofa-pva-vic or pva-nvenc-vic.");
         }
  
         // =====================
         // Load the input images
         cvImageLeft = cv::imread(strLeftFileName);
         if (cvImageLeft.empty())
         {
             throw std::runtime_error("Can't open '" + strLeftFileName + "'");
         }
  
         cvImageRight = cv::imread(strRightFileName);
         if (cvImageRight.empty())
         {
             throw std::runtime_error("Can't open '" + strRightFileName + "'");
         }
  
         // =================================
         // Allocate all VPI resources needed
  
         int32_t inputWidth  = cvImageLeft.cols;
         int32_t inputHeight = cvImageLeft.rows;
  
         // Create the stream that will be used for processing.
         CHECK_STATUS(vpiStreamCreate(0, &stream));
  
         // We now wrap the loaded images into a VPIImage object to be used by VPI.
         // VPI won't make a copy of it, so the original image must be in scope at all times.
         CHECK_STATUS(vpiImageCreateWrapperOpenCVMat(cvImageLeft, 0, &inLeft));
         CHECK_STATUS(vpiImageCreateWrapperOpenCVMat(cvImageRight, 0, &inRight));
  
         // Format conversion parameters needed for input pre-processing
         VPIConvertImageFormatParams convParams;
         CHECK_STATUS(vpiInitConvertImageFormatParams(&convParams));
  
         // Set algorithm parameters to be used. Only values that differ from the defaults will be overwritten.
         VPIStereoDisparityEstimatorCreationParams stereoParams;
         CHECK_STATUS(vpiInitStereoDisparityEstimatorCreationParams(&stereoParams));
  
         // Default format and size for inputs and outputs
         VPIImageFormat stereoFormat    = VPI_IMAGE_FORMAT_Y16_ER;
         VPIImageFormat disparityFormat = VPI_IMAGE_FORMAT_S16;
  
         int stereoWidth  = inputWidth;
         int stereoHeight = inputHeight;
         int outputWidth  = inputWidth;
         int outputHeight = inputHeight;
  
         // Override some backend-dependent parameters
         if (strBackend == "pva-nvenc-vic")
         {
             // Input has to be 1920x1080 in block-linear format for the pva-nvenc-vic pipeline
             stereoFormat = VPI_IMAGE_FORMAT_Y16_ER_BL;
             stereoWidth  = 1920;
             stereoHeight = 1080;
  
             // For PVA+NVENC+VIC mode, 16bpp input must be MSB-aligned, which
             // is equivalent to saying that it is Q8.8 (fixed-point, 8 fractional bits).
             convParams.scale = 256;
  
             // Maximum disparity is fixed to 256.
             stereoParams.maxDisparity = 256;
  
             // pva-nvenc-vic pipeline only supports downscaleFactor = 4
             stereoParams.downscaleFactor = 4;
             outputWidth                  = stereoWidth / stereoParams.downscaleFactor;
             outputHeight                 = stereoHeight / stereoParams.downscaleFactor;
         }
         else if (strBackend.find("ofa") != std::string::npos)
         {
             // Implementations using OFA require BL input
             stereoFormat = VPI_IMAGE_FORMAT_Y16_ER_BL;
  
             if (strBackend == "ofa")
             {
                 disparityFormat = VPI_IMAGE_FORMAT_S16_BL;
             }
  
             // Output width including downscaleFactor must be at least max(64, maxDisparity/downscaleFactor) when OFA+PVA+VIC are used
             if (strBackend.find("pva") != std::string::npos)
             {
                 int downscaledWidth = (inputWidth + stereoParams.downscaleFactor - 1) / stereoParams.downscaleFactor;
                 int minWidth = std::max(stereoParams.maxDisparity / stereoParams.downscaleFactor, downscaledWidth);
                 outputWidth  = std::max(64, minWidth);
                 outputHeight = (inputHeight * stereoWidth) / inputWidth;
                 stereoWidth  = outputWidth * stereoParams.downscaleFactor;
                 stereoHeight = outputHeight * stereoParams.downscaleFactor;
             }
  
             // Maximum disparity can be either 128 or 256
             stereoParams.maxDisparity = 128;
         }
         else if (strBackend == "pva")
         {
             // PVA requires that input and output resolution is 480x270
             stereoWidth = outputWidth = 480;
             stereoHeight = outputHeight = 270;
  
             // maxDisparity must be 64
             stereoParams.maxDisparity = 64;
         }
  
         // Create the payload for Stereo Disparity algorithm.
         // Payload is created before the image objects so that non-supported backends can be trapped with an error.
         CHECK_STATUS(vpiCreateStereoDisparityEstimator(backends, stereoWidth, stereoHeight, stereoFormat, &stereoParams,
                                                        &stereo));
  
         // Create the image where the disparity map will be stored.
         CHECK_STATUS(vpiImageCreate(outputWidth, outputHeight, disparityFormat, 0, &disparity));
  
         // Create the input stereo images
         CHECK_STATUS(vpiImageCreate(stereoWidth, stereoHeight, stereoFormat, 0, &stereoLeft));
         CHECK_STATUS(vpiImageCreate(stereoWidth, stereoHeight, stereoFormat, 0, &stereoRight));
  
         // Create some temporary images, and the confidence image if the backend can support it
         if (strBackend == "pva-nvenc-vic")
         {
             // Need a temporary image to convert BGR8 input from OpenCV into pitch-linear 16bpp grayscale.
             // We can't convert it directly to block-linear since the CUDA backend doesn't support it, and
             // the VIC backend doesn't support BGR8 inputs.
             CHECK_STATUS(vpiImageCreate(inputWidth, inputHeight, VPI_IMAGE_FORMAT_Y16_ER, 0, &tmpLeft));
             CHECK_STATUS(vpiImageCreate(inputWidth, inputHeight, VPI_IMAGE_FORMAT_Y16_ER, 0, &tmpRight));
  
             // confidence map is needed for pva-nvenc-vic pipeline
             CHECK_STATUS(vpiImageCreate(outputWidth, outputHeight, VPI_IMAGE_FORMAT_U16, 0, &confidenceMap));
         }
         else if (strBackend.find("ofa") != std::string::npos)
         {
             // OFA also needs a temporary buffer for format conversion
             CHECK_STATUS(vpiImageCreate(inputWidth, inputHeight, VPI_IMAGE_FORMAT_Y16_ER, 0, &tmpLeft));
             CHECK_STATUS(vpiImageCreate(inputWidth, inputHeight, VPI_IMAGE_FORMAT_Y16_ER, 0, &tmpRight));
  
             if (strBackend.find("pva") != std::string::npos)
             {
                 // confidence map is supported by OFA+PVA
                 CHECK_STATUS(vpiImageCreate(outputWidth, outputHeight, VPI_IMAGE_FORMAT_U16, 0, &confidenceMap));
             }
         }
         else if (strBackend == "pva")
         {
             // PVA also needs a temporary buffer for format conversion and rescaling
             CHECK_STATUS(vpiImageCreate(inputWidth, inputHeight, stereoFormat, 0, &tmpLeft));
             CHECK_STATUS(vpiImageCreate(inputWidth, inputHeight, stereoFormat, 0, &tmpRight));
         }
         else if (strBackend == "cuda")
         {
             CHECK_STATUS(vpiImageCreate(inputWidth, inputHeight, VPI_IMAGE_FORMAT_U16, 0, &confidenceMap));
         }
  
         // ================
         // Processing stage
  
         // -----------------
         // Pre-process input
         if (strBackend == "pva-nvenc-vic" || strBackend == "pva" || strBackend == "ofa" || strBackend == "ofa-pva-vic")
         {
             // Convert opencv input to temporary grayscale format using CUDA
             CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, inLeft, tmpLeft, &convParams));
             CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, inRight, tmpRight, &convParams));
  
             // Do both scale and final image format conversion on VIC.
             CHECK_STATUS(
                 vpiSubmitRescale(stream, VPI_BACKEND_VIC, tmpLeft, stereoLeft, VPI_INTERP_LINEAR, VPI_BORDER_CLAMP, 0));
             CHECK_STATUS(vpiSubmitRescale(stream, VPI_BACKEND_VIC, tmpRight, stereoRight, VPI_INTERP_LINEAR,
                                           VPI_BORDER_CLAMP, 0));
         }
         else
         {
             // Convert opencv input to grayscale format using CUDA
             CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, inLeft, stereoLeft, &convParams));
             CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, inRight, stereoRight, &convParams));
         }
  
         // ------------------------------
         // Do stereo disparity estimation
  
         // Submit it with the input and output images
         CHECK_STATUS(vpiSubmitStereoDisparityEstimator(stream, backends, stereo, stereoLeft, stereoRight, disparity,
                                                        confidenceMap, NULL));
  
         // Wait until the algorithm finishes processing
         CHECK_STATUS(vpiStreamSync(stream));
  
         // ========================================
         // Output post-processing and saving to disk
         // Lock output to retrieve its data on cpu memory
         VPIImageData data;
         CHECK_STATUS(vpiImageLockData(disparity, VPI_LOCK_READ, VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR, &data));
  
         // Make an OpenCV matrix out of this image
         cv::Mat cvDisparity;
         CHECK_STATUS(vpiImageDataExportOpenCVMat(data, &cvDisparity));
  
         // Scale result and write it to disk. Disparities are in Q10.5 format,
         // so to map it to float, it gets divided by 32. Then the resulting disparity range,
         // from 0 to stereo.maxDisparity gets mapped to 0-255 for proper output.
         cvDisparity.convertTo(cvDisparity, CV_8UC1, 255.0 / (32 * stereoParams.maxDisparity), 0);
  
         // Apply JET colormap to turn the disparities into color, reddish hues
         // represent objects closer to the camera, blueish are farther away.
         cv::Mat cvDisparityColor;
         applyColorMap(cvDisparity, cvDisparityColor, cv::COLORMAP_JET);
  
         // Done handling output, don't forget to unlock it.
         CHECK_STATUS(vpiImageUnlock(disparity));
  
         // If we have a confidence map,
         if (confidenceMap)
         {
             // Write it to disk too.
             //
             VPIImageData data;
             CHECK_STATUS(vpiImageLockData(confidenceMap, VPI_LOCK_READ, VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR, &data));
  
             cv::Mat cvConfidence;
             CHECK_STATUS(vpiImageDataExportOpenCVMat(data, &cvConfidence));
  
             // Confidence map varies from 0 to 65535, we scale it to
             // [0-255].
             cvConfidence.convertTo(cvConfidence, CV_8UC1, 255.0 / 65535, 0);
             imwrite("confidence_" + strBackend + ".png", cvConfidence);
  
             CHECK_STATUS(vpiImageUnlock(confidenceMap));
  
             // When pixel confidence is 0, its color in the disparity
             // output is black.
             cv::Mat cvMask;
             threshold(cvConfidence, cvMask, 1, 255, cv::THRESH_BINARY);
             cvtColor(cvMask, cvMask, cv::COLOR_GRAY2BGR);
             bitwise_and(cvDisparityColor, cvMask, cvDisparityColor);
         }
  
         imwrite("disparity_" + strBackend + ".png", cvDisparityColor);
     }
     catch (const std::exception &e)
     {
         std::cerr << e.what() << std::endl;
         retval = 1;
     }
  
     // ========
     // Clean up
  
     // Destroying stream first makes sure that all work submitted to
     // it is finished.
     vpiStreamDestroy(stream);
  
     // Only then we can destroy the other objects, as we're sure they
     // aren't being used anymore.
  
     vpiImageDestroy(inLeft);
     vpiImageDestroy(inRight);
     vpiImageDestroy(tmpLeft);
     vpiImageDestroy(tmpRight);
     vpiImageDestroy(stereoLeft);
     vpiImageDestroy(stereoRight);
     vpiImageDestroy(confidenceMap);
     vpiImageDestroy(disparity);
     vpiPayloadDestroy(stereo);
  
     return retval;
 }

VPI - Vision Programming Interface

2.4 Release
