Reconstruct Scenes from Stereo Camera Data#
Overview#
The stereo camera reference workflow provides an end-to-end pipeline for converting real-world sensor data from stereo RGB cameras into physically realistic digital assets usable in NVIDIA Isaac Sim. This professional reference workflow is designed for production robotics applications and automates pose estimation, depth estimation, scene reconstruction, and physically based rendering asset generation.
This workflow was used to generate the assets available in the NVIDIA PhysicalAI Robotics NuRec Dataset.
The pipeline integrates several NVIDIA technologies:
Isaac ROS for data processing and conversion
cuSFM (PyCuSFM) for pose estimation
FoundationStereo for depth estimation
nvblox for mesh generation and occupancy mapping
3DGURT for neural reconstruction
USD/USDZ export for Isaac Sim integration
Prerequisites#
Hardware Requirements#
Stereo RGB camera system
Linux workstation with:
CUDA 11.8
GCC ≤ 11
NVIDIA GPU (RTX recommended)
Docker support
Software Requirements#
Isaac ROS development environment
ROS 2 Humble
Docker
Isaac Sim 5.0 or later
Raw Data Requirements#
The stereo camera pipeline supports two types of input data:
ROS Bag Format#
ROS bags recorded by the user can be converted to cuSFM format.
Requirements:
Stereo RGB camera required
camera_info topic needed for camera intrinsics
/tf and /tf_static needed for camera extrinsics
Example ROS bags:
https://catalog.ngc.nvidia.com/orgs/nvidia/teams/isaac/resources/r2bdataset2024
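Before converting, you can confirm that a recording contains the required topics using the standard ros2 bag tooling (a quick check; the bag path is a placeholder):
$ ros2 bag info <path_to_ROS_BAG_FILE>
$ ros2 bag info <path_to_ROS_BAG_FILE> | grep -E 'camera_info|/tf'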
cuSFM Format#
Pre-processed data in cuSFM format. See the PyCuSFM Tutorial on Raw Data Requirements for detailed format specifications.
Components:
Image files: Camera images in supported formats (JPEG, PNG, etc.)
frames_meta.json: Metadata file following the KeyframesMetadataCollection protobuf format
Example cuSFM data:
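If you start from pre-processed data, a quick sanity check that the folder follows the expected layout might look like this (a minimal sketch; <cusfm_data_dir> and <camera_name> are placeholders for your dataset):
$ ls <cusfm_data_dir>/frames_meta.json
$ ls <cusfm_data_dir>/<camera_name>/ | head    # image files, e.g., *.jpeg or *.png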
Expected Output Structure#
After running all modules in the stereo camera pipeline, the expected output directory structure is:
workspace/
├── rosbag_mapping_data/ # Raw data extracted from ROS bag (cuSFM format)
│ ├── <camera_name>/ # e.g., front_stereo_camera_left
│ │ └── xxxx.jpeg
│ ├── frames_meta.json # Frame metadata (timestamps, poses, etc.)
│ └── stereo.edex # Stereo calibration / extrinsics data
│
├── cusfm/ # cuSFM reconstruction outputs
│ ├── cuvgl_map/
│ ├── cuvslam_output/
│ ├── keyframes/
│ ├── kpmap/
│ ├── matches/
│ ├── output_poses/
│ ├── pose_graph/
│ └── sparse/ # Sparse reconstruction
│ ├── cameras.txt
│ ├── images.txt
│ └── points3D.txt
│
├── nvblox_mesh/ # Mesh and occupancy map outputs
│ ├── nvblox_mesh.ply # Reconstructed mesh
│ ├── occupancy_map.png # 2D occupancy map visualization
│ └── occupancy_map.yaml # Occupancy map metadata
│
├── depth/ # Depth maps from FoundationStereo
│ ├── scaled_0_4/ # Scaled depth maps (0.4x downsampled)
│ │ └── <camera_name>/
│ │ └── xxxx.png
│ └── original_size/ # Original resolution depth maps
│ └── <camera_name>/
│ └── xxxx.png
│
└── 3dgurt/ # 3DGURT training outputs and final assets
├── ckpt_last.pt # Final checkpoint
├── export_last.ingp # Intermediate representation
├── parsed.yaml # Training configuration
├── ours_xxxxx/ # e.g., ours_30000 (iteration count)
└── export_last.usdz # Final USDZ asset for Isaac Sim
Note
Folder names such as rosbag_mapping_data, cusfm, nvblox_mesh, depth, or 3dgurt may vary depending on your configuration.
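After running the full pipeline, you can verify that this structure was produced, for example (tree may need to be installed; find is a fallback):
$ tree -L 2 workspace/
$ find workspace/ -maxdepth 2 -type d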
Workflow Steps#
Step 1: Setup Isaac ROS#
Isaac ROS provides NVIDIA CUDA-accelerated computing packages and AI models designed to streamline robotics development. It offers ready-to-use packages for navigation, perception, and other common tasks.
Development Environment Setup#
Follow the instructions in the Isaac ROS Getting Started Guide.
Clone Isaac ROS Common#
$ cd ${ISAAC_ROS_WS}/src
$ git clone -b release-3.2 https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_common.git
Add Data Volume to Docker Container#
Add your data volume to the Docker container by modifying the .isaac_ros_dev-dockerargs file. For example, to mount the host directory /mnt/nova_ssd/recordings:
$ echo -e '-v /mnt/nova_ssd/recordings:/mnt/nova_ssd/recordings' \
>> ${ISAAC_ROS_WS}/src/isaac_ros_common/scripts/.isaac_ros_dev-dockerargs
Configure for x86_64 Platform#
For an x86_64 platform with RTX GPU running Ubuntu and Docker:
$ cd ${ISAAC_ROS_WS}/src/isaac_ros_common/scripts/
$ echo -e "CONFIG_IMAGE_KEY=ros2_humble\nCONFIG_DOCKER_SEARCH_DIRS=(../docker)" > .isaac_ros_common-config
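You can confirm that the configuration file was written as expected:
$ cat ${ISAAC_ROS_WS}/src/isaac_ros_common/scripts/.isaac_ros_common-config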
Key Isaac ROS Tools#
In the stereo camera pipeline, we use two key tools from Isaac ROS:
rosbag_to_mapping_data: Converts ROS bags into the cuSFM format, extracting either raw images or keyframes
nvblox: Reconstructs 3D scenes by generating meshes and occupancy maps from depth and RGB data
Step 2: ROS Bag Converter (Optional)#
Note
This step is optional if your input is already in cuSFM format.
This step converts ROS bags to cuSFM format using the Isaac Mapping application rosbag_to_mapping_data.
Launch Docker Container#
$ cd ${ISAAC_ROS_WS}/src/isaac_ros_common
$ ./scripts/run_dev.sh
Install Isaac Mapping ROS Package#
Inside the Docker container:
$ sudo apt-get update && sudo apt-get install -y ros-humble-isaac-mapping-ros
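To verify the installation, list the executables provided by the package (the exact list may vary by release):
$ ros2 pkg executables isaac_mapping_ros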
Extract Images from ROS Bag#
$ ros2 run isaac_mapping_ros rosbag_to_mapping_data \
--sensor_data_bag_file <path_to_ROS_BAG_FILE> \
--output_folder_path workspace/rosbag_mapping_data \
--pose_topic_name <pose_topic_name> \
--image_extension=.jpeg \
--base_link_name=base_link \
--generate_edex=True
Command Parameters#
sensor_data_bag_file (required): Path to the ROS bag file
output_folder_path (required): Directory where the converted data will be saved
pose_topic_name (optional): Name of the pose topic to use. Default: /visual_slam/vis/slam_odometry
base_link_name (optional): Name of the base link or vehicle coordinate frame. Default: base_link
generate_edex (optional): Controls whether EDEX (Enhanced Data Exchange) files are generated. These files speed up downstream processing such as feature extraction, matching, and reconstruction in cuSFM
Expected Output#
rosbag_mapping_data/
├── <camera_name>/ # e.g., front_stereo_camera_left
│ └── xxxx.jpeg
├── frames_meta.json # Frame metadata (timestamps, poses, etc.)
└── stereo.edex # Stereo calibration / extrinsics data
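A quick check that the conversion produced images and metadata (a sketch; <camera_name> matches the folder created by the converter):
$ ls workspace/rosbag_mapping_data/
$ ls workspace/rosbag_mapping_data/<camera_name>/ | wc -l    # number of extracted frames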
Step 3: Pose Estimation with cuSFM#
PyCuSFM is a Python implementation of cuSFM, a CUDA-accelerated structure-from-motion framework for reconstructing accurate 3D environmental structure and estimating precise camera pose.
Quick Start#
$ git clone https://github.com/nvidia-isaac/pyCuSFM
$ cd pyCuSFM
$ ./run_in_docker.sh --build_docker --install
$ cusfm_cli --input_dir $CONVERTED_DATA_DIR --cusfm_base_dir $CUSFM_OUTPUT_DIR
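The quick-start command above assumes that the CONVERTED_DATA_DIR and CUSFM_OUTPUT_DIR environment variables are set. With the workspace layout used in this tutorial, they could be defined as follows (paths are illustrative):
$ export CONVERTED_DATA_DIR=workspace/rosbag_mapping_data
$ export CUSFM_OUTPUT_DIR=workspace/cusfm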
Command Parameters#
input_dir (required): Path to the converted dataset (e.g., the output of ROS bag conversion)
cusfm_base_dir (required): Output directory where cuSFM results will be saved (e.g., keyframes/, sparse/, matches/, pose_graph/)
Expected Output#
cusfm/
├── cuvgl_map/
├── cuvslam_output/
├── keyframes/
├── kpmap/
├── matches/
├── output_poses/
├── pose_graph/
└── sparse/ # Contains cameras.txt, images.txt, points3D.txt
For complete installation and usage documentation, see the PyCuSFM Repository and PyCuSFM Tutorial.
Step 4: Depth Estimation with FoundationStereo#
In the stereo camera pipeline, a depth map with metric scale is essential. Without additional sensors like LiDAR, depth maps serve as the primary source for generating point clouds used in Gaussian initialization and geometry regularization.
FoundationStereo is a large-scale foundation model for stereo depth estimation that demonstrates strong zero-shot generalization. It processes sequences of stereo RGB image pairs and generates corresponding depth maps at matching resolution.
Run Depth Estimation#
$ python scripts/run_demo_batch.py \
--ckpt_dir pretrained_models/23-51-11/model_best_bp2.pth \
--imgdir workspace/rosbag_mapping_data/ \
--metadata_file workspace/cusfm/kpmap/keyframes/frames_meta.json \
--out_dir <path_to_output_dir> \
--camera <camera_name>
Command Parameters#
ckpt_dir (required): Path to the directory containing the pretrained FoundationStereo checkpoint (e.g., model_best_bp2.pth)
imgdir (required): Path to the folder containing stereo image pairs extracted from the ROS bag (e.g., rosbag_mapping_data/)
metadata_file (required): Path to the cuSFM-formatted frames_meta.json file
out_dir (required): Directory where the generated depth maps will be saved
camera (optional): Camera name to process (e.g., front_stereo_camera_left). If omitted, all cameras defined in frames_meta.json are processed
Expected Output#
depth/
├── scaled_0_4/ # Scaled depth maps (normalized, 0.4x downsampled)
│ └── <camera_name>/
│ └── xxxx.png
└── original_size/ # Depth maps at original resolution
└── <camera_name>/
└── xxxx.png
The scaled_0_4 folder contains depth maps downsampled to 0.4x of the original resolution. Use the depth maps from the original_size folder for subsequent steps, such as mesh generation.
Note
For more information, see FoundationStereo.
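To spot-check the results, you can count the depth maps written at each resolution (a sketch; adjust the camera name and output directory to your run):
$ ls workspace/depth/original_size/<camera_name>/ | wc -l
$ ls workspace/depth/scaled_0_4/<camera_name>/ | wc -l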
Step 5: Mesh Generation with nvblox#
A mesh is crucial for enabling collision detection, navigation, and assigning physical properties—all essential for realistic simulation. nvblox reconstructs a voxel-based map from dense depth and RGB images, generating both a triangle mesh and an occupancy map. It supports 2D and 3D mapping and is optimized for NVIDIA GPUs and Jetson devices.
Generate Mesh with nvblox#
Launch the container and run the following command to generate an output mesh file named nvblox_mesh.ply:
$ ros2 run isaac_mapping_ros run_nvblox \
--color_image_dir workspace/rosbag_mapping_data \
--frames_meta_file workspace/cusfm/kpmap/keyframes/frames_meta.json \
--depth_image_dir workspace/depth/original_size \
--save_2d_occupancy_map_path workspace/nvblox_mesh/occupancy_map \
--mesh_output_path workspace/nvblox_mesh/nvblox_mesh.ply \
--voxel_size 0.02 \
--projective_integrator_truncation_distance_vox 8 \
--projective_integrator_max_weight 20 \
--esdf_slice_height 0.3
Command Parameters#
color_image_dir (required): Path to the folder containing color (RGB) images extracted from the ROS bag (e.g., rosbag_mapping_data/)
frames_meta_file (required): Path to the frames_meta.json file in cuSFM format that provides timestamps, poses, and camera parameters
depth_image_dir (required): Path to the folder containing the corresponding depth maps generated by FoundationStereo
save_2d_occupancy_map_path (required): Output path where the generated 2D occupancy map will be saved
mesh_output_path (required): File path for saving the reconstructed 3D mesh (e.g., workspace/nvblox_mesh/nvblox_mesh.ply)
voxel_size (optional): Size of each voxel in meters. Smaller values yield finer meshes at higher computational cost
projective_integrator_truncation_distance_vox (optional): Truncation distance (in voxels) for the TSDF integrator, controlling surface precision
projective_integrator_max_weight (optional): Maximum weight for depth fusion; larger values make the surface smoother and more stable
esdf_slice_height (optional): Height (in meters) at which to extract the 2D ESDF/occupancy map from the 3D reconstruction
Expected Output#
nvblox_mesh/
├── nvblox_mesh.ply # Reconstructed mesh
├── occupancy_map.png # 2D occupancy / cost map visualization
└── occupancy_map.yaml # Metadata/config for occupancy map
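A quick sanity check of the nvblox outputs (a sketch; paths follow the workspace layout used in this tutorial):
$ ls -lh workspace/nvblox_mesh/nvblox_mesh.ply    # mesh file should be non-empty
$ cat workspace/nvblox_mesh/occupancy_map.yaml    # occupancy map metadata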
Step 6: Neural Reconstruction with 3DGURT#
Once the mesh, point cloud, depth maps, poses, and RGB images are ready, we use 3DGURT to train a dense 3D reconstruction and export it to USD for Isaac Sim.
3DGURT implements 3D Gaussian Ray Tracing (3DGRT) and 3D Gaussian Unscented Transform (3DGUT), enabling native support for distorted camera models with time-dependent effects (e.g., fisheye and rolling shutter). It also supports secondary ray tracing, allowing the representation of phenomena such as reflections and refractions within a unified 3D Gaussian framework—resulting in highly realistic scene rendering and reconstruction.
Setup 3DGURT Environment#
Requirements:
Linux system with CUDA 11.8
GCC ≤ 11
NVIDIA GPU
Installation#
The official 3DGURT code is available on GitHub. Follow the installation instructions to set up all required dependencies.
$ git clone --recursive https://github.com/nv-tlabs/3dgrut.git
$ cd 3dgrut
$ chmod +x install_env.sh
$ ./install_env.sh 3dgrut
$ conda activate 3dgrut
If your system uses GCC 12 or newer (e.g., on Ubuntu 24.04), install GCC 11 inside the Conda environment:
$ conda install -c conda-forge gcc=11 gxx=11
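You can then confirm which compiler version is active before building (the build requires GCC ≤ 11):
$ gcc --version | head -n 1
$ g++ --version | head -n 1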
Tip
Verify successful installation by running a test reconstruction on one of the sample datasets included in the repository before proceeding with your own data.
Prepare cuSFM Outputs#
Note the path to your cuSFM output directory generated in Step 2. We use the apps/cusfm_3dgut.yaml
configuration in this tutorial, but you can also use apps/cusfm_3dgut_mcmc.yaml, which pairs 3DGURT with an MCMC
(Markov Chain Monte Carlo) densification strategy. In practice, this approach samples and densifies Gaussians
in regions where the reconstruction is uncertain, sharpening thin structures and edges while improving overall fidelity,
with only a modest increase in training time compared to the baseline configuration.
Run 3DGURT Training and Export USDZ#
With the environment active, start training by running the provided train.py script with the cuSFM config:
$ conda activate 3dgrut
$ python train.py \
--config-name apps/cusfm_3dgut.yaml \
path=workspace/ \
out_dir=workspace/3dgurt \
initialization.fused_point_cloud_path=workspace/nvblox_mesh/nvblox_mesh.ply \
experiment_name=3dgut_mcmc \
export_usdz.enabled=true \
export_usdz.apply_normalizing_transform=true
Command Parameters#
config-name (required): Specifies which configuration YAML file to use. apps/cusfm_3dgut.yaml is used here; apps/cusfm_3dgut_mcmc.yaml enables the MCMC densification strategy
path (required): Path to the cuSFM output directory containing sparse reconstruction data in the sparse/ folder (e.g., cameras.txt, images.txt, and points3D.txt)
out_dir (required): Directory where the 3DGURT training outputs, checkpoints, and exported assets will be saved
experiment_name (required): Custom name for the training run. Used to organize logs, checkpoints, and output folders
initialization.fused_point_cloud_path (required): Path to the point cloud or mesh used for Gaussian initialization
export_usdz.enabled (optional): If set to true, exports the reconstructed 3D scene as a USDZ file, which can be directly imported into NVIDIA Isaac Sim
export_usdz.apply_normalizing_transform (optional): If true, applies a simple normalization (centers and scales the scene near the origin). Note: This does not ensure the floor aligns exactly with z = 0; minor translation or rotation in Isaac Sim may still be required for alignment
Training Process#
Once you run the command, 3DGURT will begin training:
Reads your images and cuSFM data
Optimizes a 3D representation of the scene
Training duration depends on scene complexity and GPU performance
Typically ranges from a few minutes for small scenes to several hours for highly detailed ones
Training Output#
When the process completes, you will obtain:
3dgurt/
├── ckpt_last.pt # Final checkpoint
├── export_last.ingp # Exported intermediate representation
├── parsed.yaml # Training configuration
├── ours_xxxxx/ # e.g., ours_30000 (iteration count)
└── export_last.usdz # Final USDZ asset for Isaac Sim
The flag export_usdz.enabled=true writes out a USDZ file of your reconstructed scene, allowing it to be directly loaded into Isaac Sim.
The flag export_usdz.apply_normalizing_transform=true applies a primitive normalization (centering and scaling the scene near the origin). However, it does not guarantee that the floor aligns exactly at z = 0. In Isaac Sim, you can add a Ground Plane and/or slightly adjust the scene root (translate or rotate) to achieve proper alignment.
Step 7: Using Generated Assets and Sim-Ready Assets in Isaac Sim#
You can import the generated .usdz scene assets into NVIDIA Isaac Sim for robotics simulation.
To set up physical interactions, add assets from the OpenUSD Sim-Ready asset libraries by searching for and inserting them directly into your generated scenes in Isaac Sim.
There are several Sim-Ready asset libraries available for you to choose from:
NVIDIA’s physical AI dataset: Available on Hugging Face
Lightwheel’s extensive library: Available at simready.com
Extwin’s Synthesis Sim-Ready Assets: Available on Hugging Face (non-commercial use only)
Choose any of these Sim-Ready libraries, download the assets you need, and then load them into Isaac Sim for physical interaction with your reconstructed scenes.
Download Sim-Ready Assets:
For example: Open Lightwheel’s library in your browser, search for the asset you want, then download the zip file and extract the USD file.
Load the Generated Scene and Sim-Ready Assets:
To load the generated scene asset along with additional Sim-Ready assets into Isaac Sim, follow the steps below:
Drag and drop the .usdz file into Isaac Sim.
Select File → Open → (Select File) → Open File from the menu.
To make the assets ready for simulation, you might want to also follow these steps:
Search for an asset in the search bar (right-hand side asset tree) and drag it into the scene.
To set mesh collision properties so that your robot can interact with the scene, navigate to and right-click on World → Volume → Mesh → Mesh, then select Add → Physics → Collider / Collider Preset.
For a complete walkthrough, watch the video tutorial: Adding Sim Objects and Setting Physics Properties.
Tips and Best Practices#
Hardware Recommendations#
Stereo RGB camera: A stereo RGB camera is required for this workflow. Recommended camera options include:
ZED 2i - AI-powered stereo camera with neural depth sensing and IP66 protection
Leopard Imaging Hawk Stereo Cameras - High-performance stereo camera systems
Data capture platform: We recommend an NVIDIA Jetson Orin Developer Kit for data capture along with an Isaac ROS Nova recorder for streamlined data collection and processing.
Camera stabilization: We recommend adding a gimbal with a handheld camera setup to reduce motion blur and improve image quality during data capture.
Data Capture#
Viewpoint diversity: Capture data from a variety of viewpoints and pitch angles. This multi-angle coverage significantly improves reconstruction quality and completeness.
Data coverage: More data results in better reconstruction quality. Ensure thorough coverage of the area of interest—the more comprehensive your data collection, the higher the fidelity of the final reconstructed scene.
Stereo camera calibration: Ensure proper calibration of stereo cameras for accurate depth estimation
ROS bag recording: Record all necessary topics (images, camera_info, tf, tf_static); see the example command after this list
Lighting conditions: Maintain consistent lighting during capture
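For example, a capture command might look like the following (a sketch; topic names are illustrative and depend on your camera driver):
$ ros2 bag record -o stereo_capture \
    /<camera_name>_left/image_raw /<camera_name>_left/camera_info \
    /<camera_name>_right/image_raw /<camera_name>_right/camera_info \
    /tf /tf_static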
Pipeline Optimization#
cuSFM settings: Adjust cuSFM parameters based on scene complexity and size
Depth quality: Monitor FoundationStereo output quality; regenerate if needed
Mesh resolution: Balance voxel size between detail and computational cost
Training iterations: More iterations generally improve quality but increase training time
Isaac Sim Integration#
Collision setup: Always set up collision properties for robot interaction
Ground alignment: Verify and adjust ground plane alignment
Physics properties: Configure material properties for realistic interactions
Lighting: Add supplementary lighting if the reconstructed scene is too dark
Troubleshooting#
ROS Bag Conversion Issues#
Problem: Missing topics in ROS bag
Solutions:
Verify required topics exist: camera_info, /tf, /tf_static
Use ros2 bag info to inspect bag contents
Re-record the ROS bag with all necessary topics
cuSFM Reconstruction Issues#
Problem: Poor sparse reconstruction quality
Solutions:
Check input image quality and camera calibration
Ensure stereo baseline is appropriate for scene scale
Verify EDEX files are generated correctly
Adjust cuSFM parameters for better feature matching
FoundationStereo Issues#
Problem: Depth maps have artifacts or missing regions
Solutions:
Verify stereo image pair alignment
Check that frames_meta.json contains correct camera parameters
Ensure sufficient texture in the scene for stereo matching
Try adjusting FoundationStereo inference parameters
nvblox Mesh Generation Issues#
Problem: Mesh has holes or artifacts
Solutions:
Verify depth map quality from FoundationStereo
Adjust the voxel_size parameter (smaller for finer detail)
Increase projective_integrator_max_weight for smoother surfaces
Check that RGB and depth images are properly aligned
3DGURT Training Issues#
Problem: Out of GPU memory
Solutions:
Reduce training batch size in configuration
Use a GPU with more memory
Reduce image resolution in earlier pipeline stages
Problem: Training convergence issues
Solutions:
Verify input data quality (images, poses, depth, mesh)
Check that cuSFM sparse reconstruction is accurate
Adjust learning rate or other training parameters
Use MCMC configuration for better quality
Isaac Sim Integration Issues#
Problem: Scene appears at wrong scale or position
Solutions:
Verify normalizing transform was applied during export
Manually adjust scene transform in Isaac Sim
Check coordinate system conventions between pipeline and Isaac Sim
Problem: Robot falls through surfaces
Solutions:
Ensure collision properties are enabled on geometry
Verify ground plane is positioned correctly
Check physics settings in Isaac Sim
Additional Resources#
Documentation
Datasets
Community
Next Steps#
Explore the Mono Camera Reference Workflow for simpler capture scenarios
Learn about 4 Steps to Generate Synthetic Data for Robot Development