Reconstruct Scenes from Stereo Camera Data#

Overview#

The stereo camera reference workflow provides an end-to-end pipeline for converting real-world sensor data from stereo RGB cameras into physically realistic digital assets usable in NVIDIA Isaac Sim. This professional reference workflow is designed for production robotics applications and automates pose estimation, depth estimation, scene reconstruction, and physically based rendering asset generation.

This workflow was used to generate the assets available in the NVIDIA PhysicalAI Robotics NuRec Dataset.

The pipeline integrates several NVIDIA technologies:

  • Isaac ROS for data processing and conversion

  • cuSFM (PyCuSFM) for pose estimation

  • FoundationStereo for depth estimation

  • nvblox for mesh generation and occupancy mapping

  • 3DGURT for neural reconstruction

  • USD/USDZ export for Isaac Sim integration

Prerequisites#

Hardware Requirements#

  • Stereo RGB camera system

  • Linux workstation with:

    • CUDA 11.8

    • GCC ≤ 11

    • NVIDIA GPU (RTX recommended)

    • Docker support

Software Requirements#

  • Isaac ROS development environment

  • ROS 2 Humble

  • Docker

  • Isaac Sim 5.0 or later

Raw Data Requirements#

The stereo camera pipeline supports two types of input data:

ROS Bag Format#

ROS bags recorded by the user can be converted to cuSFM format.

Requirements:

  • Stereo RGB camera required

  • camera_info topic needed for camera intrinsics

  • /tf and /tf_static needed for camera extrinsics

Example ROS bags:

https://catalog.ngc.nvidia.com/orgs/nvidia/teams/isaac/resources/r2bdataset2024
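
Before converting a bag, you can confirm that it contains the required topics. A minimal check with ros2 bag info is sketched below; the exact image and camera_info topic names depend on your camera driver:

$ ros2 bag info <path_to_ROS_BAG_FILE>
$ ros2 bag info <path_to_ROS_BAG_FILE> | grep -E 'camera_info|/tf'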

cuSFM Format#

Pre-processed data in cuSFM format. See the PyCuSFM Tutorial on Raw Data Requirements for detailed format specifications.

Components:

  • Image files: Camera images in supported formats (JPEG, PNG, etc.)

  • frames_meta.json: Metadata file following the KeyframesMetadataCollection protobuf format

Example cuSFM data:

nvidia-isaac/pyCuSFM
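
If you are starting from pre-processed cuSFM data, a quick sanity check is to pretty-print the first few entries of the metadata file. Any JSON-aware tool works; python3 is used here only as an example, and the path is a placeholder for your dataset location:

$ python3 -m json.tool <path_to_cusfm_data>/frames_meta.json | head -n 40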

Expected Output Structure#

After running all modules in the stereo camera pipeline, the expected output directory structure is:

workspace/
├── rosbag_mapping_data/        # Raw data extracted from ROS bag (cuSFM format)
│   ├── <camera_name>/          # e.g., front_stereo_camera_left
│   │   └── xxxx.jpeg
│   ├── frames_meta.json        # Frame metadata (timestamps, poses, etc.)
│   └── stereo.edex             # Stereo calibration / extrinsics data
│
├── cusfm/                      # cuSFM reconstruction outputs
│   ├── cuvgl_map/
│   ├── cuvslam_output/
│   ├── keyframes/
│   ├── kpmap/
│   ├── matches/
│   ├── output_poses/
│   ├── pose_graph/
│   └── sparse/                 # Sparse reconstruction
│       ├── cameras.txt
│       ├── images.txt
│       └── points3D.txt
│
├── nvblox_mesh/                # Mesh and occupancy map outputs
│   ├── nvblox_mesh.ply         # Reconstructed mesh
│   ├── occupancy_map.png       # 2D occupancy map visualization
│   └── occupancy_map.yaml      # Occupancy map metadata
│
├── depth/                      # Depth maps from FoundationStereo
│   ├── scaled_0_4/             # Scaled depth maps (0.4x downsampled)
│   │   └── <camera_name>/
│   │       └── xxxx.png
│   └── original_size/          # Original resolution depth maps
│       └── <camera_name>/
│           └── xxxx.png
│
└── 3dgurt/                     # 3DGURT training outputs and final assets
    ├── ckpt_last.pt            # Final checkpoint
    ├── export_last.ingp        # Intermediate representation
    ├── parsed.yaml             # Training configuration
    ├── ours_xxxxx/             # e.g., ours_30000 (iteration count)
    └── export_last.usdz        # Final USDZ asset for Isaac Sim

Note

Folder names such as rosbag_mapping_data, cusfm, nvblox_mesh, depth, or 3dgurt may vary depending on your configuration.
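
A quick way to confirm that each stage produced its outputs is to check for the expected directories. This sketch assumes the default folder names listed above; adjust it if your configuration uses different names:

$ for d in rosbag_mapping_data cusfm nvblox_mesh depth 3dgurt; do
    [ -d "workspace/$d" ] && echo "found    workspace/$d" || echo "missing  workspace/$d"
  done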

Workflow Steps#

Step 1: Setup Isaac ROS#

Isaac ROS provides NVIDIA CUDA-accelerated computing packages and AI models designed to streamline robotics development. It offers ready-to-use packages for navigation, perception, and other common tasks.

Development Environment Setup#

Follow the instructions in the Isaac ROS Getting Started Guide.

Clone Isaac ROS Common#

$ cd ${ISAAC_ROS_WS}/src
$ git clone -b release-3.2 https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_common.git

Add Data Volume to Docker Container#

Add your data volume to the Docker container by modifying the .isaac_ros_dev-dockerargs file. For example, to mount the host directory /mnt/nova_ssd/recordings:

$ echo -e '-v /mnt/nova_ssd/recordings:/mnt/nova_ssd/recordings' \
  >> ${ISAAC_ROS_WS}/src/isaac_ros_common/scripts/.isaac_ros_dev-dockerargs
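
You can verify that the mount flag was appended correctly before launching the container:

$ cat ${ISAAC_ROS_WS}/src/isaac_ros_common/scripts/.isaac_ros_dev-dockerargs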

Configure for x86_64 Platform#

For an x86_64 platform with RTX GPU running Ubuntu and Docker:

$ cd ${ISAAC_ROS_WS}/src/isaac_ros_common/scripts/
$ echo -e "CONFIG_IMAGE_KEY=ros2_humble\nCONFIG_DOCKER_SEARCH_DIRS=(../docker)" > .isaac_ros_common-config

Key Isaac ROS Tools#

In the stereo camera pipeline, we use two key tools from Isaac ROS:

  • rosbag_to_mapping_data: Converts ROS bags into the cuSFM format, extracting either raw images or keyframes

  • nvblox: Reconstructs 3D scenes by generating meshes and occupancy maps from depth and RGB data

Step 2: ROS Bag Converter (Optional)#

Note

This step is optional if your input is already in cuSFM format.

This step converts ROS bags to cuSFM format using the Isaac Mapping application rosbag_to_mapping_data.

Launch Docker Container#

$ cd ${ISAAC_ROS_WS}/src/isaac_ros_common
$ ./scripts/run_dev.sh

Install Isaac Mapping ROS Package#

Inside the Docker container:

$ sudo apt-get update && sudo apt-get install -y ros-humble-isaac-mapping-ros

Extract Images from ROS Bag#

$ ros2 run isaac_mapping_ros rosbag_to_mapping_data \
    --sensor_data_bag_file <path_to_ROS_BAG_FILE> \
    --output_folder_path workspace/rosbag_mapping_data \
    --pose_topic_name <pose_topic_name> \
    --image_extension=.jpeg \
    --base_link_name=base_link \
    --generate_edex=True

Command Parameters#

  • sensor_data_bag_file (required): Path to the ROS bag file

  • output_folder_path (required): Directory where the converted data will be saved

  • pose_topic_name (optional): Name of the pose topic to use. Default: /visual_slam/vis/slam_odometry

  • base_link_name (optional): Name of the base link or vehicle coordinate frame. Default: base_link

  • generate_edex (optional): Controls whether EDEX files (Enhanced Data Exchange files) are generated. These files speed up downstream processing such as feature extraction, matching, and reconstruction in cuSFM

Expected Output#

rosbag_mapping_data/
├── <camera_name>/          # e.g., front_stereo_camera_left
│   └── xxxx.jpeg
├── frames_meta.json        # Frame metadata (timestamps, poses, etc.)
└── stereo.edex             # Stereo calibration / extrinsics data

Step 3: Pose Estimation with cuSFM#

PyCuSFM is a Python implementation of cuSFM, a CUDA-accelerated structure-from-motion framework for reconstructing accurate 3D environmental structure and estimating precise camera pose.

Quick Start#

$ git clone https://github.com/nvidia-isaac/pyCuSFM
$ cd pyCuSFM
$ ./run_in_docker.sh --build_docker --install
$ cusfm_cli --input_dir $CONVERTED_DATA_DIR --cusfm_base_dir $CUSFM_OUTPUT_DIR
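
For example, with the workspace layout used in this tutorial, the two environment variables could point at the converted data and a new cuSFM output directory; the /path/to prefix is a placeholder you should adjust to your own layout:

$ export CONVERTED_DATA_DIR=/path/to/workspace/rosbag_mapping_data
$ export CUSFM_OUTPUT_DIR=/path/to/workspace/cusfm
$ cusfm_cli --input_dir $CONVERTED_DATA_DIR --cusfm_base_dir $CUSFM_OUTPUT_DIR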

Command Parameters#

  • input_dir (required): Path to the converted dataset (e.g., output of ROS bag conversion)

  • cusfm_base_dir (required): Output directory where cuSFM results will be saved (e.g., keyframes/, sparse/, matches/, pose_graph/)

Expected Output#

cusfm/
├── cuvgl_map/
├── cuvslam_output/
├── keyframes/
├── kpmap/
├── matches/
├── output_poses/
├── pose_graph/
└── sparse/                 # Contains cameras.txt, images.txt, points3D.txt

For complete installation and usage documentation, see the PyCuSFM Repository and PyCuSFM Tutorial.

Step 4: Depth Estimation with FoundationStereo#

In the stereo camera pipeline, a depth map with metric scale is essential. Without additional sensors like LiDAR, depth maps serve as the primary source for generating point clouds used in Gaussian initialization and geometry regularization.

FoundationStereo is a large-scale foundation model for stereo depth estimation that demonstrates strong zero-shot generalization. It processes sequences of stereo RGB image pairs and generates corresponding depth maps at matching resolution.

Run Depth Estimation#

$ python scripts/run_demo_batch.py \
    --ckpt_dir pretrained_models/23-51-11/model_best_bp2.pth \
    --imgdir workspace/rosbag_mapping_data/ \
    --metadata_file workspace/cusfm/kpmap/keyframes/frames_meta.json \
    --out_dir <path_to_output_dir> \
    --camera <camera_name>

Command Parameters#

  • ckpt_dir (required): Path to the directory containing the pretrained FoundationStereo checkpoint (e.g., model_best_bp2.pth)

  • imgdir (required): Path to the folder containing stereo image pairs extracted from the ROS bag (e.g., rosbag_mapping_data/)

  • metadata_file (required): Path to the cuSFM-formatted frames_meta.json file

  • out_dir (required): Directory where the generated depth maps will be saved

  • camera (optional): Camera name to process (e.g., front_stereo_camera_left). If omitted, all cameras defined in frames_meta.json are processed by default

Expected Output#

depth/
├── scaled_0_4/              # Scaled depth maps (normalized, 0.4x downsampled)
│   └── <camera_name>/
│       └── xxxx.png
└── original_size/           # Depth maps at original resolution
    └── <camera_name>/
        └── xxxx.png

The scaled_0_4 folder includes depth maps downsampled by a factor of 0.4. Use the depth maps from the original_size folder for subsequent steps, such as mesh generation.
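
Before moving on to mesh generation, a quick spot check confirms that depth maps were written for your camera; substitute your camera folder name for the placeholder:

$ ls workspace/depth/original_size/<camera_name>/ | head
$ ls workspace/depth/original_size/<camera_name>/ | wc -l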

Note

For more information, see FoundationStereo.

Step 5: Mesh Generation with nvblox#

A mesh is crucial for enabling collision detection, navigation, and assigning physical properties—all essential for realistic simulation. nvblox reconstructs a voxel-based map from dense depth and RGB images, generating both a triangle mesh and an occupancy map. It supports 2D and 3D mapping and is optimized for NVIDIA GPUs and Jetson devices.

Generate Mesh with nvblox#

Launch the container and run the following command to generate an output mesh file named nvblox_mesh.ply:

$ ros2 run isaac_mapping_ros run_nvblox \
    --color_image_dir workspace/rosbag_mapping_data \
    --frames_meta_file workspace/cusfm/kpmap/keyframes/frames_meta.json \
    --depth_image_dir workspace/depth/original_size \
    --save_2d_occupancy_map_path workspace/nvblox_mesh/occupancy_map \
    --mesh_output_path workspace/nvblox_mesh/nvblox_mesh.ply \
    --voxel_size 0.02 \
    --projective_integrator_truncation_distance_vox 8 \
    --projective_integrator_max_weight 20 \
    --esdf_slice_height 0.3

Command Parameters#

  • color_image_dir (required): Path to the folder containing color (RGB) images extracted from the ROS bag (e.g., rosbag_mapping_data/)

  • frames_meta_file (required): Path to the frames_meta.json file in cuSFM format that provides timestamps, poses, and camera parameters

  • depth_image_dir (required): Path to the folder containing corresponding depth maps generated by FoundationStereo

  • save_2d_occupancy_map_path (required): Output directory where the generated 2D occupancy map will be saved

  • mesh_output_path (required): File path for saving the reconstructed 3D mesh (e.g., workspace/nvblox_mesh/nvblox_mesh.ply)

  • voxel_size (optional): Size of each voxel in meters. Smaller values yield finer meshes at higher computational cost

  • projective_integrator_truncation_distance_vox (optional): Truncation distance (in voxels) for the TSDF integrator, controlling surface precision

  • projective_integrator_max_weight (optional): Maximum weight for depth fusion; larger values make the surface smoother and more stable

  • esdf_slice_height (optional): Height (in meters) at which to extract the 2D ESDF/occupancy map from the 3D reconstruction

Expected Output#

nvblox_mesh/
├── nvblox_mesh.ply         # Reconstructed mesh
├── occupancy_map.png       # 2D occupancy / cost map visualization
└── occupancy_map.yaml      # Metadata/config for occupancy map
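
To confirm the outputs were written and to inspect the occupancy-map metadata, you can list the output directory and print the YAML file:

$ ls -lh workspace/nvblox_mesh/
$ cat workspace/nvblox_mesh/occupancy_map.yaml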

Step 6: Neural Reconstruction with 3DGURT#

Once the mesh, point cloud, depth maps, poses, and RGB images are ready, we use 3DGURT to train a dense 3D reconstruction and export it to USD for Isaac Sim.

3DGURT implements 3D Gaussian Ray Tracing (3DGRT) and 3D Gaussian Unscented Transform (3DGUT), enabling native support for distorted camera models with time-dependent effects (e.g., fisheye and rolling shutter). It also supports secondary ray tracing, allowing the representation of phenomena such as reflections and refractions within a unified 3D Gaussian framework—resulting in highly realistic scene rendering and reconstruction.

Setup 3DGURT Environment#

Requirements:

  • Linux system with CUDA 11.8

  • GCC ≤ 11

  • NVIDIA GPU

Installation#

The official 3DGURT code is available on GitHub. Follow the installation instructions to set up all required dependencies.

$ git clone --recursive https://github.com/nv-tlabs/3dgrut.git
$ cd 3dgrut
$ chmod +x install_env.sh
$ ./install_env.sh 3dgrut
$ conda activate 3dgrut

For GCC 11 inside Conda environment:

If your system uses GCC 12 or newer (e.g., on Ubuntu 24.04):

$ conda install -c conda-forge gcc=11 gxx=11

Tip

Verify successful installation by running a test reconstruction on one of the sample datasets included in the repository before proceeding with your own data.
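
As a lighter-weight check before running a full test reconstruction, you can confirm that the environment sees your GPU. This assumes the install script provisioned PyTorch in the 3dgrut environment, as its instructions describe:

$ conda activate 3dgrut
$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"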

Prepare cuSFM Outputs#

Note the path to your cuSFM output directory generated in Step 3. We use the apps/cusfm_3dgut.yaml configuration in this tutorial, but you can also use apps/cusfm_3dgut_mcmc.yaml, which pairs 3DGURT with an MCMC (Markov Chain Monte Carlo) densification strategy. In practice, this approach samples and densifies Gaussians in regions where the reconstruction is uncertain, sharpening thin structures and edges while improving overall fidelity, with only a modest increase in training time compared to the baseline configuration.

Run 3DGURT Training and Export USDZ#

With the environment active, start training by running the provided train.py script with the cuSFM config:

$ conda activate 3dgrut
$ python train.py \
   --config-name apps/cusfm_3dgut.yaml \
   path=workspace/ \
   out_dir=workspace/3dgurt \
   initialization.fused_point_cloud_path=workspace/nvblox_mesh/nvblox_mesh.ply \
   experiment_name=3dgut_mcmc \
   export_usdz.enabled=true \
   export_usdz.apply_normalizing_transform=true

Command Parameters#

  • config-name (required): Specifies which configuration YAML file to use. apps/cusfm_3dgut.yaml is the baseline used in this tutorial; apps/cusfm_3dgut_mcmc.yaml enables the MCMC densification strategy

  • path (required): Path to the cuSFM output directory containing sparse reconstruction data in the sparse/ folder (e.g., cameras.txt, images.txt, and points3D.txt)

  • out_dir (required): Directory where the 3DGURT training outputs, checkpoints, and exported assets will be saved

  • experiment_name (required): Custom name for the training run. Used to organize logs, checkpoints, and output folders

  • initialization.fused_point_cloud_path (required): Path to the point cloud or mesh used for Gaussian initialization

  • export_usdz.enabled (optional): If set to true, exports the reconstructed 3D scene as a USDZ file, which can be directly imported into NVIDIA Isaac Sim

  • export_usdz.apply_normalizing_transform (optional): If true, applies a simple normalization (centers and scales the scene near the origin). Note: This does not ensure the floor aligns exactly with z = 0; minor translation or rotation in Isaac Sim may still be required for alignment

Training Process#

Once you run the command, 3DGURT will begin training:

  • Reads your images and cuSFM data

  • Optimizes a 3D representation of the scene

  • Training duration depends on scene complexity and GPU performance

  • Typically ranges from a few minutes for small scenes to several hours for highly detailed ones

Training Output#

When the process completes, you will obtain:

3dgurt/
├── ckpt_last.pt            # Final checkpoint
├── export_last.ingp        # Exported intermediate representation
├── parsed.yaml             # Training configuration
├── ours_xxxxx/             # e.g., ours_30000 (iteration count)
└── export_last.usdz        # Final USDZ asset for Isaac Sim

The flag export_usdz.enabled=true writes out a USDZ file of your reconstructed scene, allowing it to be directly loaded into Isaac Sim.

The flag export_usdz.apply_normalizing_transform=true applies a primitive normalization (centering and scaling the scene near the origin). However, it does not guarantee that the floor aligns exactly at z = 0. In Isaac Sim, you can add a Ground Plane and/or slightly adjust the scene root (translate or rotate) to achieve proper alignment.
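
Before moving on to Isaac Sim, you can confirm that the USDZ asset was exported; the path below assumes the out_dir used in the training command above:

$ ls -lh workspace/3dgurt/export_last.usdz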

Step 7: Using Generated Assets and Sim-Ready Assets in Isaac Sim#

You can import the generated .usdz scene assets into NVIDIA Isaac Sim for robotics simulation. To set up physical interactions, add assets from the OpenUSD Sim-Ready asset libraries by searching for and inserting them directly into your generated scenes in Isaac Sim.

There are several Sim-Ready asset libraries available for you to choose from:

  1. NVIDIA’s physical AI dataset: Available on Hugging Face

  2. Lightwheel’s extensive library: Available at simready.com

  3. Extwin’s Synthesis Sim-Ready Assets: Available on Hugging Face (non-commercial use only)

Choose any of these Sim-Ready libraries, download the assets you need, and then load them into Isaac Sim for physical interaction with your reconstructed scenes.

  1. Download Sim-Ready Assets:

    For example: Open Lightwheel’s library in your browser, search for the asset you want, then download the zip file and extract the USD file.

  2. Load the Generated Scene and Sim-Ready Assets:

    To load the generated scene asset into Isaac Sim, use either of the following methods:

    • Drag and drop the .usdz file into Isaac Sim.

    • Select File → Open → (Select File) → Open File from the menu.

    To make the assets ready for simulation, you might also want to follow these steps:

    • Search for an asset in the search bar (right-hand side asset tree) and drag it into the scene.

    • To set mesh collision properties so that your robot can interact with the scene, navigate to and right-click on World → Volume → Mesh → Mesh, then select Add → Physics → Collider / Collider Preset.

For a complete walkthrough, see the video tutorial Adding Sim Objects and Setting Physics Properties.

Tips and Best Practices#

Hardware Recommendations#

  • Stereo RGB camera: A stereo RGB camera is required for this workflow.

  • Data capture platform: We recommend an NVIDIA Jetson Orin Developer Kit for data capture along with an Isaac ROS Nova recorder for streamlined data collection and processing.

  • Camera stabilization: We recommend adding a gimbal with a handheld camera setup to reduce motion blur and improve image quality during data capture.

Data Capture#

  • Viewpoint diversity: Capture data from a variety of viewpoints with different pitch angles. This multi-angle coverage significantly improves reconstruction quality and completeness.

  • Data coverage: More data results in better reconstruction quality. Ensure thorough coverage of the area of interest—the more comprehensive your data collection, the higher the fidelity of the final reconstructed scene.

  • Stereo camera calibration: Ensure proper calibration of stereo cameras for accurate depth estimation

  • ROS bag recording: Record all necessary topics (images, camera_info, tf, tf_static); see the example command after this list

  • Lighting conditions: Maintain consistent lighting during capture
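
As referenced above, a recording command might look like the following. The image and camera_info topic names are illustrative only and depend on your camera driver:

$ ros2 bag record \
    /front_stereo_camera/left/image_raw \
    /front_stereo_camera/left/camera_info \
    /front_stereo_camera/right/image_raw \
    /front_stereo_camera/right/camera_info \
    /tf /tf_static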

Pipeline Optimization#

  • cuSFM settings: Adjust cuSFM parameters based on scene complexity and size

  • Depth quality: Monitor FoundationStereo output quality; regenerate if needed

  • Mesh resolution: Balance voxel size between detail and computational cost

  • Training iterations: More iterations generally improve quality but increase training time

Isaac Sim Integration#

  • Collision setup: Always set up collision properties for robot interaction

  • Ground alignment: Verify and adjust ground plane alignment

  • Physics properties: Configure material properties for realistic interactions

  • Lighting: Add supplementary lighting if the reconstructed scene is too dark

Troubleshooting#

ROS Bag Conversion Issues#

Problem: Missing topics in ROS bag

Solutions:

  • Verify required topics exist: camera_info, /tf, /tf_static

  • Use ros2 bag info to inspect bag contents

  • Re-record ROS bag with all necessary topics

cuSFM Reconstruction Issues#

Problem: Poor sparse reconstruction quality

Solutions:

  • Check input image quality and camera calibration

  • Ensure stereo baseline is appropriate for scene scale

  • Verify EDEX files are generated correctly

  • Adjust cuSFM parameters for better feature matching

FoundationStereo Issues#

Problem: Depth maps have artifacts or missing regions

Solutions:

  • Verify stereo image pair alignment

  • Check that frames_meta.json contains correct camera parameters

  • Ensure sufficient texture in the scene for stereo matching

  • Try adjusting FoundationStereo inference parameters

nvblox Mesh Generation Issues#

Problem: Mesh has holes or artifacts

Solutions:

  • Verify depth map quality from FoundationStereo

  • Adjust voxel_size parameter (smaller for finer detail)

  • Increase projective_integrator_max_weight for smoother surfaces

  • Check that RGB and depth images are properly aligned

3DGURT Training Issues#

Problem: Out of GPU memory

Solutions:

  • Reduce training batch size in configuration

  • Use a GPU with more memory

  • Reduce image resolution in earlier pipeline stages
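
To watch GPU memory headroom while training runs, you can poll nvidia-smi (the command below refreshes every 5 seconds):

$ nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 5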

Problem: Training convergence issues

Solutions:

  • Verify input data quality (images, poses, depth, mesh)

  • Check that cuSFM sparse reconstruction is accurate

  • Adjust learning rate or other training parameters

  • Use MCMC configuration for better quality

Isaac Sim Integration Issues#

Problem: Scene appears at wrong scale or position

Solutions:

  • Verify normalizing transform was applied during export

  • Manually adjust scene transform in Isaac Sim

  • Check coordinate system conventions between pipeline and Isaac Sim

Problem: Robot falls through surfaces

Solutions:

  • Ensure collision properties are enabled on geometry

  • Verify ground plane is positioned correctly

  • Check physics settings in Isaac Sim
