Stereo Depth DNN
Isaac provides StereoDNN, a depth estimation algorithm that uses a deep neural network (DNN) to calculate per-pixel depth from stereo camera footage. StereoDNN estimates disparities end-to-end from pairs of left and right stereo images. Disparity is inversely proportional to depth, so a disparity map can be converted directly into a depth map.
StereoDNN is trained with the TensorFlow framework. The network can be trained in supervised mode (with lidar ground truth), unsupervised mode (with a photometric loss), or semi-supervised mode (a photometric loss combined with lidar or other depth ground truth). The trained model is then converted to run on the TensorRT inference runtime, using custom plug-ins for improved efficiency.
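To make the difference between these training modes concrete, here is a minimal sketch of a semi-supervised objective. It is not the loss used in the paper, which is more elaborate. It assumes rectified grayscale images stored as lists of rows, with the usual rectified-pair convention that left pixel (y, x) corresponds to right pixel (y, x - d). Under that assumption, it warps the right image into the left view and adds a plain L1 photometric term to an L1 lidar term over pixels that have sparse ground truth. All function names and the `lidar_weight` parameter are hypothetical.

```python
from typing import List, Optional

Image = List[List[float]]                  # rows of grayscale intensities
Disparity = List[List[float]]              # per-pixel disparity, in pixels
SparseGT = List[List[Optional[float]]]     # lidar disparity where available, else None


def sample_row(row: List[float], x: float) -> float:
    """Linearly interpolate a row at fractional column x, clamping at the borders."""
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(x)
    x1 = min(x0 + 1, len(row) - 1)
    t = x - x0
    return (1.0 - t) * row[x0] + t * row[x1]


def warp_right_to_left(right: Image, disp: Disparity) -> Image:
    """Reconstruct the left view by sampling the right image at (y, x - d)."""
    return [[sample_row(right[y], x - disp[y][x]) for x in range(len(right[y]))]
            for y in range(len(right))]


def photometric_l1(left: Image, right: Image, disp: Disparity) -> float:
    """Unsupervised term: mean absolute difference between left and warped right."""
    warped = warp_right_to_left(right, disp)
    diffs = [abs(a - b) for lrow, wrow in zip(left, warped) for a, b in zip(lrow, wrow)]
    return sum(diffs) / len(diffs)


def lidar_l1(disp: Disparity, gt: SparseGT) -> float:
    """Supervised term: mean absolute disparity error on pixels with ground truth."""
    errs = [abs(d - g) for drow, grow in zip(disp, gt)
            for d, g in zip(drow, grow) if g is not None]
    return sum(errs) / len(errs) if errs else 0.0


def semi_supervised_loss(left: Image, right: Image, disp: Disparity,
                         gt: SparseGT, lidar_weight: float = 1.0) -> float:
    """Hypothetical weighted sum of the photometric and lidar terms."""
    return photometric_l1(left, right, disp) + lidar_weight * lidar_l1(disp, gt)
```

Dropping the lidar term gives the unsupervised case, and dropping the photometric term gives the supervised case. In a toy example where `right` is `left` shifted by one pixel and the predicted disparity is 1 everywhere, the warped right image reproduces `left` (except at the clamped border), so only the lidar term contributes.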
The best (and largest) model achieves a 3.12% D1 error on the KITTI benchmark, ranking #12 among all DNNs on KITTI and #4 among published results as of December 2017. The Isaac SDK also includes a custom inference runtime based on TensorRT and several lightweight models that run at 10-30 FPS on GPUs, at the cost of reduced accuracy. The fastest model runs at 10 FPS on Jetson TX2 and at more than 30 FPS on Titan X.
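The D1 figure is the KITTI 2015 stereo outlier rate. It is the percentage of pixels with ground truth whose disparity error exceeds both 3 px and 5% of the true disparity. The sketch below implements that metric on flattened per-pixel lists. Following the KITTI convention, a ground-truth disparity of 0 marks a missing value. The function name `d1_error` is illustrative, and the official KITTI development kit remains the reference implementation.

```python
from typing import Sequence


def d1_error(pred: Sequence[float], gt: Sequence[float],
             abs_thresh: float = 3.0, rel_thresh: float = 0.05) -> float:
    """Percentage of valid pixels whose disparity is an outlier (KITTI 2015 D1 style).

    A pixel is an outlier when its absolute error exceeds abs_thresh pixels
    AND exceeds rel_thresh times the ground-truth disparity. Pixels with
    gt <= 0 have no ground truth and are skipped.
    """
    valid = 0
    outliers = 0
    for p, g in zip(pred, gt):
        if g <= 0:
            continue  # no ground truth for this pixel
        valid += 1
        err = abs(p - g)
        if err > abs_thresh and err > rel_thresh * g:
            outliers += 1
    return 100.0 * outliers / valid if valid else 0.0
```

For example, `d1_error([10, 20, 100, 5, 7], [10, 24, 104, 0, 1])` skips the fourth pixel and flags the second and fifth pixels, which gives 50.0. The third pixel is off by 4 px but is not an outlier, because 4 px is less than 5% of 104.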
The DNN is described in the following paper: https://arxiv.org/abs/1803.09719.
The images below show StereoDNN depth/disparity estimates on the KITTI 2015 stereo benchmark. The DNN was trained in a semi-supervised way, combining lidar ground truth with a photometric loss.
The following images compare a monocular depth DNN approach (Godard et al.) with StereoDNN results on the same scene. Point clouds estimated by the DNNs are shown in white, and lidar ground truth is shown in green. This StereoDNN model is the best one trained on KITTI. Its error is about 10 cm at a distance of 10 m and about 30-40 cm at 30-50 m. StereoDNN captures building and street geometry well, whereas the monocular DNN misses that information completely.
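A point-cloud comparison like the one described above requires lifting each depth pixel into 3D. The sketch below uses the generic pinhole-camera back-projection. It is not the code used to produce the figures. The intrinsics `fx`, `fy`, `cx`, `cy` are assumed to come from the stereo camera calibration, and `depth_to_points` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project a depth map (meters) into camera-frame 3D points.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

Lidar points must be transformed into the same camera frame, using the lidar-to-camera extrinsics, before the two clouds can be compared.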
The following images are examples of StereoDNN disparity maps computed on stereo RGB frames from the KITTI dataset:
The standalone code for the DNN is available on GitHub at the following link:
The goal of this project is to enable inference for NVIDIA Stereo DNN TensorFlow models on Jetson as well as other platforms supported by NVIDIA TensorRT library.
A demo of inference on KITTI dataset can be viewed on YouTube at the following link:
The stereo DNN is wrapped as an Isaac codelet, and is available in the Isaac repository.
The Isaac codelet wrapping StereoDNN uses the nvstereonet library from GitHub. It takes a left rectified image, a right rectified image, and the intrinsic and extrinsic calibration of the stereo camera, and generates a depth frame of size 513x257.
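Calibration is what turns disparity into metric depth. For a rectified pair, the standard relation is Z = f * B / d, where f is the focal length in pixels (from the intrinsics), B is the baseline in meters (from the extrinsics), and d is the disparity in pixels. The sketch below applies that relation per pixel. It illustrates the general conversion under that assumption and is not the codelet's actual code. The function name and the `min_disparity` cutoff are hypothetical.

```python
from typing import List


def disparity_to_depth(disparity: List[List[float]], focal_px: float,
                       baseline_m: float,
                       min_disparity: float = 1e-3) -> List[List[float]]:
    """Convert a disparity map (pixels) to a depth map (meters) via Z = f * B / d.

    focal_px must be expressed at the same resolution as the disparity map.
    Disparities at or below min_disparity are mapped to 0.0 (invalid or too far).
    """
    return [[focal_px * baseline_m / d if d > min_disparity else 0.0 for d in row]
            for row in disparity]
```

If the disparity map has a different width from the calibrated images, scale `focal_px` by the ratio of the widths. Pixel disparities shrink by the same factor, so the resulting depth is unchanged.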
Running the Sample Application
The stereo_depth_dnn sample application uses a ZED stereo camera. First connect the ZED camera to the host system or to the Jetson Xavier platform you are using, then follow one of the procedures below to run the application.
Note that this application only runs on the host system or the Jetson Xavier platform.
To Run the Sample Application on the Host System
Run the sample application with the following command:
bazel run //apps/samples/stereo_depth_dnn -- --config apps/samples/stereo_depth_dnn/stereo_depth_dnn.config.x64.json
To Run the Application on Jetson
To run the sample application on Jetson, first build a package on the host and then deploy it to the Jetson system. Run the following command on the host computer, replacing <JETSON_IP> with the IP address of your Jetson system.
bob@desktop:~/isaac$ ./engine/build/deploy.sh -p //apps/samples/stereo_depth_dnn:stereo_depth_dnn-pkg -d jetpack42 -h <JETSON_IP>
Log on to the Jetson system and execute the following commands:
bob@jetson:~/$ cd deploy/bob/stereo_depth_dnn-pkg
bob@jetson:~/deploy/bob/stereo_depth_dnn-pkg$ ./apps/samples/stereo_depth_dnn/stereo_depth_dnn --config apps/samples/stereo_depth_dnn/stereo_depth_dnn.config.xavier.json
Here, “bob” is your user name on the host system.
To View Output from the Application in Websight
While the application is running, open Isaac Sight in a browser by navigating to http://localhost:3000. If you are running the application on a Jetson platform, use the IP address of the Jetson system instead of localhost.
In Websight, a window called “color_left” shows the left input image and a window called “depth” shows the depth estimated by the algorithm.