Generate NuRec Auxiliary Data#
NuRec requires an additional dataset to reconstruct scenes when you convert and input your own real-world data. This dataset, called NuRec Auxiliary Data, includes the following required and optional data types:
- Semantic Segmentation Data (conditionally required)
- Depth Estimation Data (optional)
- DINOv2 Feature Extraction (optional)
- LiDAR Segmentation and Visibility (optional, recommended)
- Metadata and Configuration (required)
To learn more about each data type, see the Learn About NuRec Auxiliary Data Types section.
Generate the Data#
You generate NuRec Auxiliary Data in its own container, available on NGC. To generate auxiliary data, follow the steps in this section.
- Download the NuRec Auxiliary Data container by running the following command:

  docker pull nvcr.io/nvidia/nre/nre-tools-ga:latest
- To start generating the data in the Docker container, edit the following command to reflect the correct paths and then run it:
docker run --shm-size=2g -it --rm --gpus all \
-e NGC_API_KEY=${NGC_API_KEY} \
--volume /path/to/dataset:/workdir/dataset \
--volume /path/to/output:/workdir/output \
nvcr.io/nvidia/nre/nre-tools-ga:latest \
--dataset-path=/workdir/dataset/<DATASET_NAME>.zarr.itar \
--output-dir=/workdir/output \
--camera-id=<ID1> --camera-id=<ID2> --camera-id=<ID3> \
--store-meta \
--no-seg-logits \
--lidar-seg-camvis
Notes:
- Point the dataset volume mount to the directory where the `.zarr.itar` and `.json` files were saved in the previous section.
- Replace `<DATASET_NAME>.zarr.itar` with the name of your file with the `.zarr.itar` extension.
- Use the `--camera-id=<ID1> --camera-id=<ID2>` flags to pass multiple camera IDs. If you don't pass any camera IDs, the model generates auxiliary data for all cameras. For example, if `camera_main` is your camera of choice, pass the `--camera-id=camera_main` flag.
- Camera IDs can be found in the JSON file generated by NCore.
- You can use the `--numba-num-threads <N>` flag to increase the number of CPU threads and generate the auxiliary data faster.
- Files generated by the above command have `.aux` added to the filename: `<DATASET_NAME>.aux.<>.zarr.itar`.
- Copy the generated `.aux.<>.zarr` files to the same directory as your NCore data, then follow the steps in Use NuRec for Autonomous Vehicles for training.
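For reference, a filled-in version of the command might look like the following. The host paths, dataset name (my_drive), and camera ID are illustrative placeholders; all flags shown are documented on this page.

# Illustrative example: substitute your own paths, dataset name, and camera IDs.
docker run --shm-size=2g -it --rm --gpus all \
-e NGC_API_KEY=${NGC_API_KEY} \
--volume /data/ncore/my_drive:/workdir/dataset \
--volume /data/nurec_aux:/workdir/output \
nvcr.io/nvidia/nre/nre-tools-ga:latest \
--dataset-path=/workdir/dataset/my_drive.zarr.itar \
--output-dir=/workdir/output \
--camera-id=camera_main \
--numba-num-threads 8 \
--store-meta \
--no-seg-logits \
--lidar-seg-camvis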
Options for Generating the Data#
The following table outlines all the options you can pass when you run the command to generate the supplemental auxiliary data.
Note: Either `--shard-file-pattern` or `--dataset-path` is required, in addition to `--output-dir`. Use `--dataset-path` to point to the NCore `.json` manifest file. Use `--shard-file-pattern` for monolithic shard files. If you're using the NVIDIA NCore-converted Physical AI dataset or the NVIDIA Physical AI raw dataset, use `--dataset-path`.
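A minimal sketch of the two input modes follows. Volume mounts are omitted for brevity, and <SHARD_PATTERN> is a placeholder; see --help for the exact range-expansion syntax.

# Mode 1: NCore dataset input, as used in the example above.
docker run --rm --gpus all nvcr.io/nvidia/nre/nre-tools-ga:latest \
--dataset-path=/workdir/dataset/<DATASET_NAME>.zarr.itar \
--output-dir=/workdir/output

# Mode 2: monolithic shard files matched by a pattern (supports range expansion).
docker run --rm --gpus all nvcr.io/nvidia/nre/nre-tools-ga:latest \
--shard-file-pattern=/workdir/dataset/<SHARD_PATTERN> \
--output-dir=/workdir/output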
| Option | Description |
|---|---|
| `--shard-file-pattern` | Data shard pattern to load (supports range expansion) |
| `--dataset-path` | Path to the NCore `.json` manifest file |
| `--output-dir` | Path to the output folder |
| `--camera-id` | Cameras to be used (multiple value option; all if not specified) |
| | Lidars to be used (multiple value option; all if not specified) |
| `--segmentation-backend` | Perform segmentation; choose a backend |
| `--seg-logits` / `--no-seg-logits` | Perform semantic segmentation and save logits |
| | Enable running TRT-optimized models |
| `--dinov2-backend` | DINOv2 backbone to be used for feature extraction |
| | PCA dimension for the features to be extracted |
| | DINOv2 feature width (default 256) |
| `--lidar-seg-camvis` / `--no-lidar-seg-camvis` | Perform lidar segmentation and point-in-camera visibility determination |
| | Whether to use a CUDA-based ensemble function for lidar segmentation |
| `--depth-backend` | Perform depth estimation; choose a backend |
| | Estimate relative depth (as opposed to metric) |
| | Maximum metric depth predicted by the metric depth estimation network |
| | Resolution of the inputs to the depth estimation network |
| | Store depth in a quantized form (as PNG) |
| | Perform automatic ego-mask estimation |
| | Number of frames per second to sample for ego-mask estimation (default: 0.2) |
| | Aggregation method for ego-mask estimation when using multiple samples |
| | Zarr store type to store the aux data in |
| | Open shards' consolidated metadata |
| `--numba-num-threads` | Number of numba threads to use |
| | Enable debug logging outputs |
| | Enable outputting visualization results |
| `--store-meta` | Store a meta-file per shard with CLI arguments and maglev runtime logging (if available) |
| `--help` | Show all the command-line options and exit |
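To see the full list of options with their exact names and default values, run the container with the --help flag:

docker run --rm nvcr.io/nvidia/nre/nre-tools-ga:latest --help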
Learn About NuRec Auxiliary Data Types#
- Semantic Segmentation Data (conditionally required):
  - Method: Uses Mask2Former with a DINOv2 backbone
  - Outputs:
    - Semantic segmentation masks (stored as PNG images)
    - Semantic segmentation logits (optional, for training/fine-tuning):
      - Default: `--no-seg-logits` (disabled by default)
      - Usage: Only stored when the `--seg-logits` flag is used
      - Purpose: Used for advanced training techniques but not core functionality
    - Per-pixel class labels for scene understanding
  - Default: `--segmentation-backend="mask2former"` (enabled by default)
  - Can be disabled: `--segmentation-backend="none"`
  - Dependency: Required if LiDAR segmentation is enabled
  - Usage: Core for multi-modal understanding but can be skipped for pure image-based training (see the sketches after this list)
- Depth Estimation Data (optional):
  - Method: Uses DepthAnythingV2 models
  - Types:
    - Relative depth: Normalized depth values in [0, 1]
    - Metric depth: Absolute depth values in meters (default maximum of 80 m for outdoor scenes)
  - Storage: Can be stored as quantized PNG or raw float16 values
  - Resolution: Configurable input resolution (default 1036 px)
  - Default: `--depth-backend="none"` (disabled by default)
  - Usage: Only generated when explicitly enabled with `--depth-backend=depthanythingv2` (see the sketches after this list)
  - Purpose: Provides geometric constraints but is not essential for basic NeRF training
- DINOv2 Feature Extraction (optional):
  - Purpose: Dense visual features for neural rendering
  - Models: Various DINOv2 variants (ViT-S/B/L/G with 14x14 patches)
  - Processing:
    - Optional PCA dimensionality reduction
    - Color transformation for visualization
    - Feature-to-color mapping for neural field training
  - Output: High-dimensional feature vectors per pixel patch
  - Default: `--dinov2-backend="none"` (disabled by default)
  - Usage: Only used for advanced feature-based rendering when explicitly enabled (see the sketches after this list)
  - Purpose: Enhances semantic consistency and novel view synthesis quality, but not required
- LiDAR Segmentation and Visibility (optional, recommended):
  - Method: Projects camera semantic segmentation onto LiDAR point clouds
  - Outputs:
    - Per-point semantic labels for LiDAR data
    - Point-in-camera visibility information
    - Ensemble-based label fusion from multiple camera views
  - Uses: CUDA-accelerated ensemble methods for performance
  - Default: `--lidar-seg-camvis` (enabled by default)
  - Can be disabled: `--no-lidar-seg-camvis` (see the sketches after this list)
  - Dependency: Requires semantic segmentation to be available (either generated or pre-existing)
  - Purpose: Essential for multi-modal NeRF training with LiDAR data
- Metadata and Configuration (required):
  - Camera calibration and sensor metadata
  - Processing parameters and CLI arguments
  - Model metadata (versions, configurations, etc.)
  - Runtime information and workflow logging
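The sketches below show, in isolation, the documented flags for toggling each optional data type; append them to the docker run command shown earlier. For semantic segmentation, you can keep the default backend and additionally save logits, or disable segmentation for pure image-based training:

# Keep the default Mask2Former backend and also store segmentation logits.
--segmentation-backend="mask2former" --seg-logits

# Or disable segmentation for image-only training; LiDAR segmentation must
# then be disabled too, because it projects camera segmentation onto points.
--segmentation-backend="none" --no-lidar-seg-camvis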
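Depth estimation is off by default and is enabled by selecting the DepthAnythingV2 backend:

# Enable depth estimation (the default is --depth-backend="none").
--depth-backend=depthanythingv2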
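DINOv2 feature extraction is likewise opt-in. The exact backbone identifiers the flag accepts are not listed on this page, so <BACKBONE> is a placeholder; check --help for the valid values:

# Enable DINOv2 feature extraction; <BACKBONE> stands in for one of the
# supported variants (ViT-S/B/L/G). See --help for the accepted names.
--dinov2-backend=<BACKBONE>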
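Finally, LiDAR segmentation and visibility runs by default; disable it only when training from images alone:

# Disable LiDAR segmentation and point-in-camera visibility (on by default).
--no-lidar-seg-camvis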