Overview

Given a pair of rectified images from a stereo camera, the Stereo Disparity algorithm uses high-quality dense stereo matching to produce an output image of the same resolution as the input with left-right disparity information. This allows for inferring the depth of the scene captured by the left and right images.
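
To illustrate how depth follows from disparity, the sketch below applies the standard pinhole relation for a rectified pair, depth = focal length × baseline / disparity. It is not part of the VPI API; the function name, the focal length and the baseline are hypothetical values chosen only for this example.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Return depth in meters for a disparity given in pixels.

    Assumes an ideal rectified stereo pair (pinhole model). The focal
    length is expressed in pixels and the baseline in meters.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 700 px focal length, 12 cm baseline.
depth_m = depth_from_disparity(35.0, 700.0, 0.12)
print(f"depth: {depth_m:.2f} m")
```

Larger disparities map to closer objects, so the depth resolution degrades as the disparity approaches zero.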

[Figures: left image, right image, disparity map, confidence map]

Implementation

The stereo disparity estimator uses the semi-global matching algorithm to compute the disparity. We deviate from the original algorithm by using the Hamming distance between the census transforms of the stereo pair as the cost function.
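
The matching cost described above can be sketched in plain Python. This is an illustration of the census transform and Hamming distance only, not VPI's implementation: it omits the semi-global aggregation step, and the bit convention (a bit is set when a neighbor is darker than the center), the border clamping and all function names are assumptions made for this example.

```python
def census_transform(image, window=3):
    """Return a census signature (as an int) for every pixel of a 2-D list.

    Each bit is 1 where a neighbor inside the window is darker than the
    center pixel. Neighbors outside the image are clamped to the border.
    """
    height, width = len(image), len(image[0])
    radius = window // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            bits = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # the center pixel is not compared with itself
                    ny = min(max(y + dy, 0), height - 1)
                    nx = min(max(x + dx, 0), width - 1)
                    bits = (bits << 1) | (1 if image[ny][nx] < center else 0)
            result[y][x] = bits
    return result


def hamming_cost(census_a, census_b):
    """Number of differing bits between two census signatures."""
    return bin(census_a ^ census_b).count("1")


def best_disparity(census_left, census_right, y, x, max_disparity):
    """Winner-take-all disparity for one pixel (no SGM aggregation)."""
    candidates = range(min(max_disparity, x + 1))
    return min(
        candidates,
        key=lambda d: hamming_cost(census_left[y][x], census_right[y][x - d]),
    )


# Synthetic pair: the right image is the left image shifted by 2 pixels.
left = [[(x * 37 + y * 17 + x * y * 5) % 23 for x in range(10)] for y in range(5)]
right = [[row[min(x + 2, 9)] for x in range(10)] for row in left]

census_left = census_transform(left)
census_right = census_transform(right)
print(hamming_cost(census_left[2][5], census_right[2][3]))  # cost at the true disparity
```

Because the census signature only encodes intensity ordering, the cost is insensitive to brightness differences between the two cameras that preserve that ordering.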

C API functions

For a list of limitations, constraints and backends that implement the algorithm, consult the reference documentation of the following functions:

Function	Description
vpiInitStereoDisparityEstimatorCreationParams	Initializes VPIStereoDisparityEstimatorCreationParams with default values.
vpiCreateStereoDisparityEstimator	Creates payload for vpiSubmitStereoDisparityEstimator.
vpiInitStereoDisparityEstimatorParams	Initializes VPIStereoDisparityEstimatorParams with default values.
vpiSubmitStereoDisparityEstimator	Runs stereo processing on a pair of images and outputs a disparity map.

Usage

Python

1. Import the VPI module.
  import vpi
2. Estimate the disparity between the left and right input VPI images, using a 5x5 window and a maximum disparity of 64. Optionally, convert the resulting disparity image to U8 with range [0,255], suited for display.
  with vpi.Backend.CUDA:
      output = vpi.stereodisp(left, right, window=5, maxdisp=64) \
                  .convert(vpi.Format.U8, scale=1.0/(32*64)*255)
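
The scale factor used in the conversion can be unpacked with a small sketch. The factor 32 suggests that the raw disparity values carry 5 fractional bits (32 sub-pixel levels per pixel); this is an assumption inferred from the expression above, so check the reference documentation for the exact output format. The helper names are hypothetical and not part of VPI.

```python
# Assumption: 5 fractional bits, inferred from the factor 32 in the scale.
FRACTIONAL_LEVELS = 32


def display_scale(max_disparity):
    """Scale that maps the largest raw disparity value to 255."""
    return 255.0 / (FRACTIONAL_LEVELS * max_disparity)


def raw_to_pixels(raw_disparity):
    """Convert a raw fixed-point disparity value to pixels."""
    return raw_disparity / FRACTIONAL_LEVELS


def raw_to_display(raw_disparity, max_disparity):
    """Convert a raw disparity value to an 8-bit display intensity."""
    value = raw_disparity * display_scale(max_disparity)
    return max(0, min(255, round(value)))


print(raw_to_pixels(48))                 # raw value 48 expressed in pixels
print(raw_to_display(32 * 64, 64))       # largest disparity for maxdisp=64
```

With this scale, a disparity equal to the configured maximum maps to white and zero disparity maps to black.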

C/C++

Initialization phase
1. Include the headers that define the needed functions and structures. The Convert Image Format algorithm is needed to process the disparity output for display.
  #include <vpi/algo/ConvertImageFormat.h>
  #include <vpi/algo/StereoDisparity.h>
  
2. Define the input rectified stereo pair.
  VPIImage left = /*...*/;
  VPIImage right = /*...*/;
  
3. Create the output disparity and confidence images, and the image used for disparity display (optional).
  int32_t w, h;
  vpiImageGetSize(left, &w, &h);

  VPIImage disparity;
  vpiImageCreate(w, h, VPI_IMAGE_FORMAT_S16, 0, &disparity);

  VPIImage confidence;
  vpiImageCreate(w, h, VPI_IMAGE_FORMAT_U16, 0, &confidence);

  VPIImage display;
  vpiImageCreate(w, h, VPI_IMAGE_FORMAT_U8, 0, &display);
  
4. Create the payload that will contain all temporary buffers needed for processing. It is created on the CUDA backend, using the dimensions of the input images.
  VPIPayload stereo;
  vpiCreateStereoDisparityEstimator(VPI_BACKEND_CUDA, w, h, VPI_IMAGE_FORMAT_U16, NULL, &stereo);
  
5. Create the stream where the algorithm will be submitted for execution.
  VPIStream stream;
  vpiStreamCreate(0, &stream);
  
Processing phase
1. Define the configuration parameters needed for algorithm execution.
  VPIStereoDisparityEstimatorParams params;
  vpiInitStereoDisparityEstimatorParams(&params);
  params.windowSize = 5;
  params.maxDisparity = 64;
  
2. Submit the payload for execution on the backend associated with it.
  vpiSubmitStereoDisparityEstimator(stream, VPI_BACKEND_CUDA, stereo, left, right, disparity, confidence, &params);
  
3. Optionally, convert the resulting disparity image to U8 format, rescaling the disparity values to fit the [0,255] range suited for display.
  VPIConvertImageFormatParams cvtParams;
  vpiInitConvertImageFormatParams(&cvtParams);
  cvtParams.scale = 1.0f / (32 * params.maxDisparity) * 255;
  vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, disparity, display, &cvtParams);
  
4. Wait until the processing is done.
  vpiStreamSync(stream);
  
Cleanup phase
1. Free the resources held by the stream, the payload, and the input and output images.
  vpiStreamDestroy(stream);
  vpiPayloadDestroy(stereo);
  vpiImageDestroy(left);
  vpiImageDestroy(right);
  vpiImageDestroy(display);
  vpiImageDestroy(disparity);
  vpiImageDestroy(confidence);
  

Consult the Stereo Disparity Sample for a complete example.

For more information, see Stereo Disparity Estimator in the "C API Reference" section of VPI - Vision Programming Interface.

Performance

For information on how to use the performance table below, see Algorithm Performance Tables.
Before comparing measurements, consult Comparing Algorithm Elapsed Times.
For further information on how performance was benchmarked, see Performance Benchmark.

References

Hirschmüller, Heiko (2005). "Accurate and efficient stereo processing by semi-global matching and mutual information". IEEE Conference on Computer Vision and Pattern Recognition. pp. 807–814.
Zabih, Ramin; Woodfill, John (1994). "Non-parametric local transforms for computing visual correspondence". European Conference on Computer Vision. pp. 151–158.