Generate NuRec Auxiliary Data#

NuRec requires an additional dataset to reconstruct scenes when you convert and input your own real-world data. This dataset, called NuRec Auxiliary Data, includes the following required and optional data types:

  • Semantic Segmentation Data (conditionally required)

  • Depth Estimation Data (optional)

  • DINOv2 Feature Extraction (optional)

  • LiDAR Segmentation and Visibility (optional, recommended)

  • Metadata and Configuration (required)

To learn more about each data type, see the Learn About NuRec Auxiliary Data Types section.

Generate the Data#

You generate NuRec Auxiliary Data in its own container, available on NGC. To generate auxiliary data, follow the steps in this section.
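
The container is hosted on the NGC registry, so the pull in the first step may ask you to authenticate. The login below is the standard NGC pattern, not a step from this guide; it assumes your API key is already exported as the NGC_API_KEY environment variable, and you can skip it if the pull succeeds without it:

   # Log in to nvcr.io before pulling (only needed if the pull is rejected).
   # '$oauthtoken' is a literal username; the password is your NGC API key.
   echo "${NGC_API_KEY}" | docker login nvcr.io --username '$oauthtoken' --password-stdin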

  1. Download the NuRec Auxiliary Data container by running the following command:

   docker pull nvcr.io/nvidia/nre/nre-tools-ga:latest

  2. To start generating the data in the Docker container, edit the following command to reflect the correct paths and then run it:

   docker run --shm-size=2g -it --rm --gpus all \
   -e NGC_API_KEY=${NGC_API_KEY} \
   --volume /path/to/dataset:/workdir/dataset \
   --volume /path/to/output:/workdir/output \
   nvcr.io/nvidia/nre/nre-tools-ga:latest \
   --dataset-path=/workdir/dataset/<DATASET_NAME>.zarr.itar \
   --output-dir=/workdir/output \
   --camera-id=<ID1> --camera-id=<ID2> --camera-id=<ID3> \
   --store-meta \
   --no-seg-logits \
   --lidar-seg-camvis

Notes:

  • Point the dataset volume mount to the directory where the .zarr.itar and .json files from the previous section are saved.

  • Replace <DATASET_NAME>.zarr.itar with the name of your .zarr.itar file.

  • Pass the --camera-id flag once per camera, for example --camera-id=<ID1> --camera-id=<ID2>. If you don’t pass any camera IDs, the tool generates auxiliary data for all cameras. For example, if camera_main is your camera of choice, pass the --camera-id=camera_main flag.

  • You can find the camera IDs in the JSON file generated by NCore; one way to list them is shown in the sketch after these notes.

  • You can use the --numba-num-threads <N> flag to increase the number of CPU threads and generate the auxiliary data faster.

  • Files generated by the above command have .aux added to the filename: <DATASET_NAME>.aux.<>.zarr.itar.
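
If you have jq installed, the following sketch is one way to list candidate camera IDs from the NCore JSON file. The exact manifest layout isn’t documented here, so the filter simply dumps every string value containing "camera"; the manifest filename is a placeholder, and you should adjust the filter to your manifest’s actual structure.

   # Sketch: list candidate camera IDs from the NCore JSON manifest.
   # Assumes camera IDs appear as string values containing "camera";
   # adjust the jq filter to your manifest's actual layout.
   jq -r '.. | strings | select(test("camera"))' /path/to/dataset/<NCORE_MANIFEST>.json | sort -u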

  3. Copy the generated .aux.<>.zarr files to the same directory as your NCore data, then follow the steps in Use NuRec for Autonomous Vehicles for training.
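
As a concrete sketch of this copy step, assuming the example host paths from the docker run above (adjust them to your setup):

   # Copy the generated auxiliary files next to the NCore data.
   # Paths are the example mounts used earlier; <DATASET_NAME> is your dataset.
   cp /path/to/output/<DATASET_NAME>.aux.*.zarr* /path/to/dataset/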

Options for Generating the Data#

The following list describes all the options you can pass when you run the command to generate the auxiliary data.

Note: Either --shard-file-pattern or --dataset-path is required, in addition to --output-dir. Use --dataset-path to point to the NCore .json manifest file. Use --shard-file-pattern for monolithic shard files. If you’re using the NVIDIA NCore-converted Physical AI dataset or the NVIDIA Physical AI raw dataset, use --dataset-path.

  • --shard-file-pattern: Data shard pattern to load (supports range expansion)

  • --dataset-path: Path to the NCore .json manifest file for the dataset you’re using. If you’re using an NVIDIA-published dataset, use this flag

  • --output-dir: Path to the output folder

  • --camera-id: Cameras to use (multiple-value option; all cameras if not specified)

  • --lidar-id: Lidars to use (multiple-value option; all lidars if not specified)

  • --segmentation-backend: Segmentation backend to use. (0) None: do not perform segmentation. (1) Mask2Former: supports semantic segmentation and saving logits via --seg-logits (optional)

  • --seg-logits / --no-seg-logits: Perform semantic segmentation and save logits

  • --enable-trt / --disable-trt: Enable running TRT-optimized models

  • --dinov2-backend: DINOv2 backbone to use for feature extraction

  • --dinov2-pca-dim: PCA dimension for the extracted features (-1 means do not apply PCA)

  • --dinov2-width: DINOv2 feature width (default 256)

  • --lidar-seg-camvis / --no-lidar-seg-camvis: Perform lidar segmentation and point-in-camera visibility determination

  • --lidar-seg-ensemble-cuda / --no-lidar-seg-ensemble-cuda: Whether to use the CUDA-based ensemble function for lidar segmentation

  • --depth-backend: Depth estimation backend to use. (0) None: do not perform depth estimation. (1) DepthAnythingV2: use the small model

  • --relative-depth: Estimate relative depth (as opposed to metric depth)

  • --max-depth-m: The maximum metric depth predicted by the metric depth estimation network. For relative depth this parameter has no effect and the values are normalized to [0, 1]. More info: DepthAnythingV2 Issue #147. Default for outdoor scenes is 80.0

  • --depth-input-resolution: The resolution of the inputs to the depth estimation network

  • --store-depth-as-png: Store depth in a quantized form (as PNG)

  • --ego-mask / --no-ego-mask: Perform automatic ego-mask estimation

  • --ego-mask-samples-per-second: Number of frames per second to sample for ego-mask estimation (default: 0.2; valid range 0.0002 to 30.0)

  • --ego-mask-aggregation-method: Aggregation method for ego-mask estimation when using multiple samples

  • --zarr-store-type: Zarr store type in which to store the aux data

  • --open-consolidated / --no-open-consolidated: Open the shards’ consolidated metadata

  • --numba-num-threads: Number of numba threads to use (use 'auto' to determine the number of threads from the current CPU count)

  • --debug: Enable debug logging output

  • --visualize: Enable outputting visualization results

  • --store-meta: Store a meta file per shard with the CLI arguments and maglev runtime logging (if available)

  • --help: Show all the command-line options and exit
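
For example, to also generate depth maps and DINOv2 features in the same run, you could add the corresponding backend flags to the earlier command. This is only a sketch that reuses the placeholder paths from above; --depth-backend=depthanythingv2 follows the usage described in the next section, while <DINOV2_BACKBONE> stands in for whichever backbone name the tool accepts, which isn’t listed here:

   docker run --shm-size=2g -it --rm --gpus all \
   -e NGC_API_KEY=${NGC_API_KEY} \
   --volume /path/to/dataset:/workdir/dataset \
   --volume /path/to/output:/workdir/output \
   nvcr.io/nvidia/nre/nre-tools-ga:latest \
   --dataset-path=/workdir/dataset/<DATASET_NAME>.zarr.itar \
   --output-dir=/workdir/output \
   --depth-backend=depthanythingv2 \
   --dinov2-backend=<DINOV2_BACKBONE> \
   --store-meta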

Learn About NuRec Auxiliary Data Types#

  • Semantic Segmentation Data (conditionally required):

    • Method: Uses Mask2Former with DINOv2 backbone

    • Outputs:

      • Semantic segmentation masks (stored as PNG images)

      • Semantic segmentation logits (optional, for training/fine-tuning)

        • Default: --no-seg-logits (disabled by default)

        • Usage: Only stored when the --seg-logits flag is used

        • Purpose: Used for advanced training techniques but not core functionality

      • Per-pixel class labels for scene understanding

    • Default: --segmentation-backend="mask2former" (enabled by default)

    • Can be disabled: --segmentation-backend="none"

    • Dependency: Required if LiDAR segmentation is enabled

    • Usage: Core for multi-modal understanding but can be skipped for pure image-based training

  • Depth Estimation Data (optional):

    • Method: Uses DepthAnythingV2 models

    • Types:

      • Relative depth: Normalized depth values [0,1]

      • Metric depth: Absolute depth values in meters (default max 80m for outdoor scenes)

    • Storage: Can be stored as quantized PNG or raw float16 values

    • Resolution: Configurable input resolution (default 1036px)

    • Default: --depth-backend="none" (disabled by default)

    • Usage: Only generated when explicitly enabled with --depth-backend=depthanythingv2

    • Purpose: Provides geometric constraints but not essential for basic NeRF training

  • DINOv2 Feature Extraction (optional):

    • Purpose: Dense visual features for neural rendering

    • Models: Various DINOv2 variants (ViT-S/B/L/G with 14x14 patches)

    • Processing:

      • Optional PCA dimensionality reduction

      • Color transformation for visualization

      • Feature-to-color mapping for neural field training

    • Output: High-dimensional feature vectors per pixel patch

    • Default: --dinov2-backend="none" (disabled by default)

    • Usage: Only used for advanced feature-based rendering when explicitly enabled

    • Purpose: Enhances semantic consistency and novel view synthesis quality, but not required

  • LiDAR Segmentation and Visibility (optional, recommended):

    • Method: Projects camera semantic segmentation onto LiDAR point clouds

    • Outputs:

      • Per-point semantic labels for LiDAR data

      • Point-in-camera visibility information

      • Ensemble-based label fusion from multiple camera views

    • Uses: CUDA-accelerated ensemble methods for performance

    • Default: --lidar-seg-camvis (enabled by default)

    • Can be disabled: --no-lidar-seg-camvis

    • Dependency: Requires semantic segmentation to be available (either generated or pre-existing)

    • Purpose: Essential for multi-modal NeRF training with LiDAR data

  • Metadata and Configuration (required):

    • Camera calibration and sensor metadata

    • Processing parameters and CLI arguments

    • Model metadata (versions, configurations, etc.)

    • Runtime information and workflow logging
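
Conversely, if you only need pure image-based training, the segmentation and LiDAR outputs described above can be switched off explicitly. The following is a minimal sketch that reuses the placeholder paths from the earlier command; LiDAR segmentation is disabled here as well because it depends on semantic segmentation being available:

   docker run --shm-size=2g -it --rm --gpus all \
   -e NGC_API_KEY=${NGC_API_KEY} \
   --volume /path/to/dataset:/workdir/dataset \
   --volume /path/to/output:/workdir/output \
   nvcr.io/nvidia/nre/nre-tools-ga:latest \
   --dataset-path=/workdir/dataset/<DATASET_NAME>.zarr.itar \
   --output-dir=/workdir/output \
   --segmentation-backend="none" \
   --no-lidar-seg-camvis \
   --store-meta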