Reconstruct Scenes from Stereo Camera Data#
Overview#
The stereo camera reference workflow provides an end-to-end pipeline for converting real-world sensor data from stereo RGB cameras into physically realistic digital assets usable in NVIDIA Isaac Sim. This professional reference workflow is designed for production robotics applications and automates pose estimation, depth estimation, scene reconstruction, and physically based rendering asset generation.
This workflow was used to generate the assets available in the NVIDIA PhysicalAI Robotics NuRec Dataset.
The pipeline integrates several NVIDIA technologies:
Isaac ROS for data processing and conversion
cuSFM (PyCuSFM) for pose estimation
FoundationStereo for depth estimation
nvblox for mesh generation and occupancy mapping
3DGURT for neural reconstruction
USD/USDZ export for Isaac Sim integration
Prerequisites#
Hardware Requirements#
Stereo RGB camera system
Linux workstation with:
CUDA 11.8
GCC ≤ 11
NVIDIA GPU (RTX recommended)
Docker support
Software Requirements#
Isaac ROS development environment
ROS 2 Humble
Docker
Isaac Sim 5.0 or later
Raw Data Requirements#
The stereo camera pipeline supports two types of input data:
ROS Bag Format#
ROS bags recorded by the user can be converted to cuSFM format.
Requirements:
Stereo RGB camera required
camera_info topic needed for camera intrinsics
/tf and /tf_static needed for camera extrinsics
Example ROS bags:
https://catalog.ngc.nvidia.com/orgs/nvidia/teams/isaac/resources/r2bdataset2024
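Before converting, you can confirm that a recording contains the required topics using the standard ros2 bag tooling (a quick check; the bag path is a placeholder):
$ ros2 bag info <path_to_ROS_BAG_FILE>
$ ros2 bag info <path_to_ROS_BAG_FILE> | grep -E 'camera_info|/tf'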
cuSFM Format#
Pre-processed data in cuSFM format. See the PyCuSFM Tutorial on Raw Data Requirements for detailed format specifications.
Components:
Image files: Camera images in supported formats (JPEG, PNG, etc.)
frames_meta.json: Metadata file following the KeyframesMetadataCollection protobuf format
Example cuSFM data:
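If you start from pre-processed data, a quick sanity check that the folder follows the expected layout might look like this (a minimal sketch; <cusfm_data_dir> and <camera_name> are placeholders for your dataset):
$ ls <cusfm_data_dir>/frames_meta.json
$ ls <cusfm_data_dir>/<camera_name>/ | head    # image files, e.g., *.jpeg or *.png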
Expected Output Structure#
After running all modules in the stereo camera pipeline, the expected output directory structure is:
workspace/
├── rosbag_mapping_data/ # Raw data extracted from ROS bag (cuSFM format)
│ ├── <camera_name>/ # e.g., front_stereo_camera_left
│ │ └── xxxx.jpeg
│ ├── frames_meta.json # Frame metadata (timestamps, poses, etc.)
│ └── stereo.edex # Stereo calibration / extrinsics data
│
├── cusfm/ # cuSFM reconstruction outputs
│ ├── cuvgl_map/
│ ├── cuvslam_output/
│ ├── keyframes/
│ ├── kpmap/
│ ├── matches/
│ ├── output_poses/
│ ├── pose_graph/
│ └── sparse/ # Sparse reconstruction
│ ├── cameras.txt
│ ├── images.txt
│ └── points3D.txt
│
├── nvblox_mesh/ # Mesh and occupancy map outputs
│ ├── nvblox_mesh.ply # Reconstructed mesh
│ ├── occupancy_map.png # 2D occupancy map visualization
│ └── occupancy_map.yaml # Occupancy map metadata
│
├── depth/ # Depth maps from FoundationStereo
│ ├── scaled_0_4/ # Scaled depth maps (0.4x downsampled)
│ │ └── <camera_name>/
│ │ └── xxxx.png
│ └── original_size/ # Original resolution depth maps
│ └── <camera_name>/
│ └── xxxx.png
│
└── 3dgurt/ # 3DGURT training outputs and final assets
├── ckpt_last.pt # Final checkpoint
├── export_last.ingp # Intermediate representation
├── parsed.yaml # Training configuration
├── ours_xxxxx/ # e.g., ours_30000 (iteration count)
└── export_last.usdz # Final USDZ asset for Isaac Sim
Note
Folder names such as rosbag_mapping_data, cusfm, nvblox_mesh, depth, or 3dgurt may vary depending on your configuration.
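After running the full pipeline, you can verify that this structure was produced, for example (tree may need to be installed; find is a fallback):
$ tree -L 2 workspace/
$ find workspace/ -maxdepth 2 -type d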
Workflow Steps#
Step 1: Setup Isaac ROS#
Isaac ROS provides NVIDIA CUDA-accelerated computing packages and AI models designed to streamline robotics development. It offers ready-to-use packages for navigation, perception, and other common tasks.
Development Environment Setup#
Follow the instructions in the Isaac ROS Getting Started Guide.
Clone Isaac ROS Common#
$ cd ${ISAAC_ROS_WS}/src
$ git clone -b release-3.2 https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_common.git
Add Data Volume to Docker Container#
Add your data volume to the Docker container by modifying the .isaac_ros_dev-dockerargs file. For example, to mount the host directory /mnt/nova_ssd/recordings:
$ echo -e '-v /mnt/nova_ssd/recordings:/mnt/nova_ssd/recordings' \
>> ${ISAAC_ROS_WS}/src/isaac_ros_common/scripts/.isaac_ros_dev-dockerargs
Configure for x86_64 Platform#
For an x86_64 platform with RTX GPU running Ubuntu and Docker:
$ cd ${ISAAC_ROS_WS}/src/isaac_ros_common/scripts/
$ echo -e "CONFIG_IMAGE_KEY=ros2_humble\nCONFIG_DOCKER_SEARCH_DIRS=(../docker)" > .isaac_ros_common-config
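You can confirm that the configuration file was written as expected:
$ cat ${ISAAC_ROS_WS}/src/isaac_ros_common/scripts/.isaac_ros_common-config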
Key Isaac ROS Tools#
In the stereo camera pipeline, we use two key tools from Isaac ROS:
rosbag_to_mapping_data: Converts ROS bags into the cuSFM format, extracting either raw images or keyframes
nvblox: Reconstructs 3D scenes by generating meshes and occupancy maps from depth and RGB data
Step 2: ROS Bag Converter (Optional)#
Note
This step is optional if your input is already in cuSFM format.
This step converts ROS bags to cuSFM format using the Isaac Mapping application rosbag_to_mapping_data.
Launch Docker Container#
$ cd ${ISAAC_ROS_WS}/src/isaac_ros_common
$ ./scripts/run_dev.sh
Install Isaac Mapping ROS Package#
Inside the Docker container:
$ sudo apt-get update && sudo apt-get install -y ros-humble-isaac-mapping-ros
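To verify the installation, list the executables provided by the package (the exact list may vary by release):
$ ros2 pkg executables isaac_mapping_ros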
Extract Images from ROS Bag#
$ ros2 run isaac_mapping_ros rosbag_to_mapping_data \
--sensor_data_bag_file <path_to_ROS_BAG_FILE> \
--output_folder_path workspace/rosbag_mapping_data \
--pose_topic_name <pose_topic_name> \
--image_extension=.jpeg \
--base_link_name=base_link \
--generate_edex=True
Command Parameters#
sensor_data_bag_file (required): Path to the ROS bag file
output_folder_path (required): Directory where the converted data will be saved
pose_topic_name (optional): Name of the pose topic to use. Default: /visual_slam/vis/slam_odometry
base_link_name (optional): Name of the base link or vehicle coordinate frame. Default: base_link
generate_edex (optional): Controls whether EDEX (Enhanced Data Exchange) files are generated. These files speed up downstream processing such as feature extraction, matching, and reconstruction in cuSFM
Expected Output#
rosbag_mapping_data/
├── <camera_name>/ # e.g., front_stereo_camera_left
│ └── xxxx.jpeg
├── frames_meta.json # Frame metadata (timestamps, poses, etc.)
└── stereo.edex # Stereo calibration / extrinsics data
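A quick check that the conversion produced images and metadata (a sketch; <camera_name> matches the folder created by the converter):
$ ls workspace/rosbag_mapping_data/
$ ls workspace/rosbag_mapping_data/<camera_name>/ | wc -l    # number of extracted frames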
Step 3: Pose Estimation with cuSFM#
PyCuSFM is a Python implementation of cuSFM, a CUDA-accelerated structure-from-motion framework for reconstructing accurate 3D environmental structure and estimating precise camera pose.
Quick Start#
$ git clone https://github.com/nvidia-isaac/pyCuSFM
$ cd pyCuSFM
$ ./run_in_docker.sh --build_docker --install
$ cusfm_cli --input_dir $CONVERTED_DATA_DIR --cusfm_base_dir $CUSFM_OUTPUT_DIR
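The quick-start command above assumes that the CONVERTED_DATA_DIR and CUSFM_OUTPUT_DIR environment variables are set. With the workspace layout used in this tutorial, they could be defined as follows (paths are illustrative):
$ export CONVERTED_DATA_DIR=workspace/rosbag_mapping_data
$ export CUSFM_OUTPUT_DIR=workspace/cusfm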
Command Parameters#
input_dir (required): Path to the converted dataset (e.g., the output of ROS bag conversion)
cusfm_base_dir (required): Output directory where cuSFM results will be saved (e.g., keyframes/, sparse/, matches/, pose_graph/)
Expected Output#
cusfm/
├── cuvgl_map/
├── cuvslam_output/
├── keyframes/
├── kpmap/
├── matches/
├── output_poses/
├── pose_graph/
└── sparse/ # Contains cameras.txt, images.txt, points3D.txt
For complete installation and usage documentation, see the PyCuSFM Repository and PyCuSFM Tutorial.
Step 4: Depth Estimation with FoundationStereo#
In the stereo camera pipeline, a depth map with metric scale is essential. Without additional sensors like LiDAR, depth maps serve as the primary source for generating point clouds used in Gaussian initialization and geometry regularization.
FoundationStereo is a large-scale foundation model for stereo depth estimation that demonstrates strong zero-shot generalization. It processes sequences of stereo RGB image pairs and generates corresponding depth maps at matching resolution.
Run Depth Estimation#
$ python scripts/run_demo_batch.py \
--ckpt_dir pretrained_models/23-51-11/model_best_bp2.pth \
--imgdir workspace/rosbag_mapping_data/ \
--metadata_file workspace/cusfm/kpmap/keyframes/frames_meta.json \
--out_dir <path_to_output_dir> \
--camera <camera_name>
Command Parameters#
ckpt_dir (required): Path to the directory containing the pretrained FoundationStereo checkpoint (e.g., model_best_bp2.pth)
imgdir (required): Path to the folder containing stereo image pairs extracted from the ROS bag (e.g., rosbag_mapping_data/)
metadata_file (required): Path to the cuSFM-formatted frames_meta.json file
out_dir (required): Directory where the generated depth maps will be saved
camera (optional): Camera name to process (e.g., front_stereo_camera_left). If omitted, all cameras defined in frames_meta.json are processed
Expected Output#
depth/
├── scaled_0_4/ # Scaled depth maps (normalized, 0.4x downsampled)
│ └── <camera_name>/
│ └── xxxx.png
└── original_size/ # Depth maps at original resolution
└── <camera_name>/
└── xxxx.png
The scaled_0_4 folder contains depth maps downsampled to 0.4x of the original resolution. Use the depth maps from the original_size folder for subsequent steps, such as mesh generation.
Note
For more information, see FoundationStereo.
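To spot-check the results, you can count the depth maps written at each resolution (a sketch; adjust the camera name and output directory to your run):
$ ls workspace/depth/original_size/<camera_name>/ | wc -l
$ ls workspace/depth/scaled_0_4/<camera_name>/ | wc -l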
Step 5: Mesh Generation with nvblox#
A mesh is crucial for enabling collision detection, navigation, and assigning physical properties—all essential for realistic simulation. nvblox reconstructs a voxel-based map from dense depth and RGB images, generating both a triangle mesh and an occupancy map. It supports 2D and 3D mapping and is optimized for NVIDIA GPUs and Jetson devices.
Generate Mesh with nvblox#
Launch the container and run the following command to generate an output mesh file named nvblox_mesh.ply:
$ ros2 run isaac_mapping_ros run_nvblox \
--color_image_dir workspace/rosbag_mapping_data \
--frames_meta_file workspace/cusfm/kpmap/keyframes/frames_meta.json \
--depth_image_dir workspace/depth/original_size \
--save_2d_occupancy_map_path workspace/nvblox_mesh/occupancy_map \
--mesh_output_path workspace/nvblox_mesh/nvblox_mesh.ply \
--voxel_size 0.02 \
--projective_integrator_truncation_distance_vox 8 \
--projective_integrator_max_weight 20 \
--esdf_slice_height 0.3
Command Parameters#
color_image_dir (required): Path to the folder containing color (RGB) images extracted from the ROS bag (e.g., rosbag_mapping_data/)
frames_meta_file (required): Path to the frames_meta.json file in cuSFM format that provides timestamps, poses, and camera parameters
depth_image_dir (required): Path to the folder containing the corresponding depth maps generated by FoundationStereo
save_2d_occupancy_map_path (required): Output path where the generated 2D occupancy map will be saved
mesh_output_path (required): File path for saving the reconstructed 3D mesh (e.g., workspace/nvblox_mesh/nvblox_mesh.ply)
voxel_size (optional): Size of each voxel in meters. Smaller values yield finer meshes at higher computational cost
projective_integrator_truncation_distance_vox (optional): Truncation distance (in voxels) for the TSDF integrator, controlling surface precision
projective_integrator_max_weight (optional): Maximum weight for depth fusion; larger values make the surface smoother and more stable
esdf_slice_height (optional): Height (in meters) at which to extract the 2D ESDF/occupancy map from the 3D reconstruction
Expected Output#
nvblox_mesh/
├── nvblox_mesh.ply # Reconstructed mesh
├── occupancy_map.png # 2D occupancy / cost map visualization
└── occupancy_map.yaml # Metadata/config for occupancy map
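A quick sanity check of the nvblox outputs (a sketch; paths follow the workspace layout used in this tutorial):
$ ls -lh workspace/nvblox_mesh/nvblox_mesh.ply    # mesh file should be non-empty
$ cat workspace/nvblox_mesh/occupancy_map.yaml    # occupancy map metadata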
Step 6: Neural Reconstruction with 3DGURT#
Once the mesh, point cloud, depth maps, poses, and RGB images are ready, we use 3DGURT to train a dense 3D reconstruction and export it to USD for Isaac Sim.
3DGURT implements 3D Gaussian Ray Tracing (3DGRT) and 3D Gaussian Unscented Transform (3DGUT), enabling native support for distorted camera models with time-dependent effects (e.g., fisheye and rolling shutter). It also supports secondary ray tracing, allowing the representation of phenomena such as reflections and refractions within a unified 3D Gaussian framework—resulting in highly realistic scene rendering and reconstruction.
Setup 3DGURT Environment#
Requirements:
Linux system with CUDA 11.8
GCC ≤ 11
NVIDIA GPU
Installation#
The official 3DGURT code is available on GitHub. Follow the installation instructions to set up all required dependencies.
$ git clone --recursive https://github.com/nv-tlabs/3dgrut.git
$ cd 3dgrut
$ chmod +x install_env.sh
$ ./install_env.sh 3dgrut
$ conda activate 3dgrut
If your system uses GCC 12 or newer (e.g., on Ubuntu 24.04), install GCC 11 inside the Conda environment:
$ conda install -c conda-forge gcc=11 gxx=11
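You can then confirm which compiler version is active before building (the build requires GCC ≤ 11):
$ gcc --version | head -n 1
$ g++ --version | head -n 1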
Tip
Verify successful installation by running a test reconstruction on one of the sample datasets included in the repository before proceeding with your own data.
Prepare cuSFM Outputs#
Note the path to your cuSFM output directory generated in Step 2. We use the apps/cusfm_3dgut.yaml
configuration in this tutorial, but you can also use apps/cusfm_3dgut_mcmc.yaml, which pairs 3DGURT with an MCMC
(Markov Chain Monte Carlo) densification strategy. In practice, this approach samples and densifies Gaussians
in regions where the reconstruction is uncertain, sharpening thin structures and edges while improving overall fidelity,
with only a modest increase in training time compared to the baseline configuration.
Run 3DGURT Training and Export USDZ#
With the environment active, start training by running the provided train.py script with the cuSFM config:
$ conda activate 3dgrut
$ python train.py \
--config-name apps/cusfm_3dgut.yaml \
path=workspace/ \
out_dir=workspace/3dgurt \
initialization.fused_point_cloud_path=workspace/nvblox_mesh/nvblox_mesh.ply \
experiment_name=3dgut_mcmc \
export_usdz.enabled=true \
export_usdz.apply_normalizing_transform=true
Command Parameters#
config-name (required): Specifies which configuration YAML file to use. apps/cusfm_3dgut.yaml is used here; apps/cusfm_3dgut_mcmc.yaml enables the MCMC densification strategy
path (required): Path to the cuSFM output directory containing sparse reconstruction data in the sparse/ folder (e.g., cameras.txt, images.txt, and points3D.txt)
out_dir (required): Directory where the 3DGURT training outputs, checkpoints, and exported assets will be saved
experiment_name (required): Custom name for the training run. Used to organize logs, checkpoints, and output folders
initialization.fused_point_cloud_path (required): Path to the point cloud or mesh used for Gaussian initialization
export_usdz.enabled (optional): If set to true, exports the reconstructed 3D scene as a USDZ file, which can be directly imported into NVIDIA Isaac Sim
export_usdz.apply_normalizing_transform (optional): If true, applies a simple normalization (centers and scales the scene near the origin). Note: This does not ensure the floor aligns exactly with z = 0; minor translation or rotation in Isaac Sim may still be required for alignment
Training Process#
Once you run the command, 3DGURT will begin training:
Reads your images and cuSFM data
Optimizes a 3D representation of the scene
Training duration depends on scene complexity and GPU performance
Typically ranges from a few minutes for small scenes to several hours for highly detailed ones
Training Output#
When the process completes, you will obtain:
3dgurt/
├── ckpt_last.pt # Final checkpoint
├── export_last.ingp # Exported intermediate representation
├── parsed.yaml # Training configuration
├── ours_xxxxx/ # e.g., ours_30000 (iteration count)
└── export_last.usdz # Final USDZ asset for Isaac Sim
The flag export_usdz.enabled=true writes out a USDZ file of your reconstructed scene, allowing it to be directly loaded into Isaac Sim.
The flag export_usdz.apply_normalizing_transform=true applies a primitive normalization (centering and scaling the scene near the origin). However, it does not guarantee that the floor aligns exactly at z = 0. In Isaac Sim, you can add a Ground Plane and/or slightly adjust the scene root (translate or rotate) to achieve proper alignment.
Step 7: Using Generated Assets and Sim-Ready Assets in Isaac Sim#
You can import the generated .usdz scene assets into NVIDIA Isaac Sim for robotics simulation.
To set up physical interactions, add assets from the OpenUSD Sim-Ready asset libraries by searching for and inserting them directly into your generated scenes in Isaac Sim.
There are several Sim-Ready asset libraries available for you to choose from:
NVIDIA’s physical AI dataset: Available on Hugging Face
Lightwheel’s extensive library: Available at simready.com
Extwin’s Synthesis Sim-Ready Assets: Available on Hugging Face (non-commercial use only)
Choose any of these Sim-Ready libraries, download the assets you need, and then load them into Isaac Sim for physical interaction with your reconstructed scenes.
Download Sim-Ready Assets:
For example: Open Lightwheel’s library in your browser, search for the asset you want, then download the zip file and extract the USD file.
Load the Generated Scene and Sim-Ready Assets:
To load the generated scene asset along with additional Sim-Ready assets into Isaac Sim, follow the steps below:
Drag and drop the .usdz file into Isaac Sim.
Select File → Open → (Select File) → Open File from the menu.
To make the assets ready for simulation, you might want to also follow these steps:
Search for an asset in the search bar (right-hand side asset tree) and drag it into the scene.
To set mesh collision properties so that your robot can interact with the scene, navigate to and right-click on World → Volume → Mesh → Mesh, then select Add → Physics → Collider / Collider Preset.
For a complete walkthrough, watch the video tutorial: Adding Sim Objects and Setting Physics Properties.
Tips and Best Practices#
Hardware Recommendations#
Stereo RGB camera: A stereo RGB camera is required for this workflow. Recommended camera options include:
ZED 2i - AI-powered stereo camera with neural depth sensing and IP66 protection
Leopard Imaging Hawk Stereo Cameras - High-performance stereo camera systems
Data capture platform: We recommend an NVIDIA Jetson Orin Developer Kit for data capture along with an Isaac ROS Nova recorder for streamlined data collection and processing.
Camera stabilization: We recommend adding a gimbal with a handheld camera setup to reduce motion blur and improve image quality during data capture.
Data Capture#
Viewpoint diversity: Capture data from a variety of viewpoints and pitch angles. This multi-angle coverage significantly improves reconstruction quality and completeness.
Data coverage: More data results in better reconstruction quality. Ensure thorough coverage of the area of interest—the more comprehensive your data collection, the higher the fidelity of the final reconstructed scene.
Stereo camera calibration: Ensure proper calibration of stereo cameras for accurate depth estimation
ROS bag recording: Record all necessary topics (images, camera_info, tf, tf_static); see the example command after this list
Lighting conditions: Maintain consistent lighting during capture
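For example, a capture command might look like the following (a sketch; topic names are illustrative and depend on your camera driver):
$ ros2 bag record -o stereo_capture \
    /<camera_name>_left/image_raw /<camera_name>_left/camera_info \
    /<camera_name>_right/image_raw /<camera_name>_right/camera_info \
    /tf /tf_static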
Pipeline Optimization#
cuSFM settings: Adjust cuSFM parameters based on scene complexity and size
Depth quality: Monitor FoundationStereo output quality; regenerate if needed
Mesh resolution: Balance voxel size between detail and computational cost
Training iterations: More iterations generally improve quality but increase training time
Isaac Sim Integration#
Collision setup: Always set up collision properties for robot interaction
Ground alignment: Verify and adjust ground plane alignment
Physics properties: Configure material properties for realistic interactions
Lighting: Add supplementary lighting if the reconstructed scene is too dark
Troubleshooting#
ROS Bag Conversion Issues#
Problem: Missing topics in ROS bag
Solutions:
Verify required topics exist: camera_info, /tf, /tf_static
Use ros2 bag info to inspect bag contents
Re-record the ROS bag with all necessary topics
cuSFM Reconstruction Issues#
Problem: Poor sparse reconstruction quality
Solutions:
Check input image quality and camera calibration
Ensure stereo baseline is appropriate for scene scale
Verify EDEX files are generated correctly
Adjust cuSFM parameters for better feature matching
FoundationStereo Issues#
Problem: Depth maps have artifacts or missing regions
Solutions:
Verify stereo image pair alignment
Check that frames_meta.json contains correct camera parameters
Ensure sufficient texture in the scene for stereo matching
Try adjusting FoundationStereo inference parameters
nvblox Mesh Generation Issues#
Problem: Mesh has holes or artifacts
Solutions:
Verify depth map quality from FoundationStereo
Adjust the voxel_size parameter (smaller for finer detail)
Increase projective_integrator_max_weight for smoother surfaces
Check that RGB and depth images are properly aligned
3DGURT Training Issues#
Problem: Out of GPU memory
Solutions:
Reduce training batch size in configuration
Use a GPU with more memory
Reduce image resolution in earlier pipeline stages
Problem: Training convergence issues
Solutions:
Verify input data quality (images, poses, depth, mesh)
Check that cuSFM sparse reconstruction is accurate
Adjust learning rate or other training parameters
Use MCMC configuration for better quality
Isaac Sim Integration Issues#
Problem: Scene appears at wrong scale or position
Solutions:
Verify normalizing transform was applied during export
Manually adjust scene transform in Isaac Sim
Check coordinate system conventions between pipeline and Isaac Sim
Problem: Robot falls through surfaces
Solutions:
Ensure collision properties are enabled on geometry
Verify ground plane is positioned correctly
Check physics settings in Isaac Sim
Additional Resources#
Documentation
Datasets
Community
Next Steps#
Explore the Mono Camera Reference Workflow for simpler capture scenarios
Learn about 4 Steps to Generate Synthetic Data for Robot Development