NvPanoptix3D#

NvPanoptix3D is a 3D panoptic scene reconstruction network that takes a single RGB image as input and produces a complete 3D reconstruction of the scene, including depth estimation, 2D panoptic segmentation, 3D geometry, and 3D panoptic segmentation. The network is built on a VGGT (Visual Geometry Grounded Transformer) backbone combined with a Mask2Former-style decoder for the 2D stage and a sparse 3D convolutional frustum decoder for the 3D stage. The total model size is approximately 1.4 billion parameters.

NvPanoptix3D supports the following tasks:

  • train

  • evaluate

  • inference

  • export

The tasks are explained in detail in the following sections.

Note

  • Throughout this documentation are references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.

    • For instructions on creating a dataset using the remote client, refer to the Creating a dataset section in the Remote Client documentation.

    • For instructions on creating an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.

  • The spec format is YAML for TAO Launcher, and JSON for FTMS Client.

  • File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher, not for FTMS Client.

Pipeline Overview#

NvPanoptix3D uses a two-stage training pipeline. You must train Stage 1 before Stage 2.

Stage 1 — 2D Stage#

This stage trains joint 2D panoptic segmentation and depth estimation. It takes a single RGB image as input and produces:

  • A depth map

  • 2D panoptic segmentation masks

  • Object queries for the 3D stage

  • Camera intrinsic matrix

Stage 2 — 3D Stage#

This stage freezes the Stage 1 model weights and trains the 3D U-Net frustum completion module. It takes the Stage 1 outputs as input and produces:

  • 3D scene geometry as a truncated signed distance field (TSDF) at 3 cm voxel resolution

  • 3D panoptic segmentation at a 256 × 256 × 256 voxel grid
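
As a sanity check on these numbers, a 256-voxel grid at 3 cm per voxel spans 7.68 m along each axis:

```python
# Metric extent of the reconstruction frustum implied by the values above:
# model.frustum3d.grid_dimensions voxels at model.projection.voxel_size meters each.
grid_dimensions = 256  # voxels per axis
voxel_size = 0.03      # meters per voxel (3 cm)

extent_m = grid_dimensions * voxel_size
print(extent_m)  # 7.68 m per axis
```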

The dataset.enable_3d parameter in the configuration file controls which stage is active. Set it to False for Stage 1 and True for Stage 2.

Dataset Format#

NvPanoptix3D supports two datasets: 3D-Front and Matterport3D.

3D-Front Dataset#

3D-Front is a synthetic indoor scene dataset. The annotation JSON file for each split specifies the scene and image IDs to use. Organize the data in the following directory structure:

<base_dir>/
    data/
        <scene_id>/
            rgb_<img_id>.png
            depth_<img_id>.exr
            segmap_<img_id>.mapped.npz
            geometry_<img_id>.npz
            segmentation_<img_id>.mapped.npz
            weighting_<img_id>.npz

The files in each scene directory contain the following:

| File | Description |
|------|-------------|
| rgb_<img_id>.png | RGB input image |
| depth_<img_id>.exr | Depth map in OpenEXR format |
| segmap_<img_id>.mapped.npz | 2D panoptic segmentation labels with mapped category IDs |
| geometry_<img_id>.npz | 3D geometry encoded as a truncated signed distance field |
| segmentation_<img_id>.mapped.npz | 3D panoptic segmentation volumes |
| weighting_<img_id>.npz | Spatial weighting volumes used during 3D training |
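
The `.npz` annotation files can be inspected with NumPy before training. The sketch below lists the arrays inside an archive; note that the key names inside the real dataset archives are dataset-specific, and `labels` here is only a placeholder:

```python
import numpy as np

def summarize_npz(path):
    """Return {key: (shape, dtype)} for every array stored in an .npz archive."""
    with np.load(path) as archive:
        return {key: (archive[key].shape, str(archive[key].dtype))
                for key in archive.files}

# Synthetic stand-in for a segmap_<img_id>.mapped.npz file; the real key
# names may differ ("labels" is a placeholder, not the actual key).
np.savez("example.npz", labels=np.zeros((240, 320), dtype=np.uint8))
print(summarize_npz("example.npz"))
```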

Matterport3D Dataset#

Matterport3D is a real indoor scene dataset with per-image camera intrinsics. The image ID format is <name>_<angle>_<rot>. Organize the data in the following directory structure:

<base_dir>/
    data/
        <scene_id>/
            <name>_i<angle>_<rot>.jpg
            <name>_segmap<angle>_<rot>.mapped.npz
            <name>_intrinsics_<angle>.npy
    depth_gen/
        <scene_id>/
            <name>_d<angle>_<rot>.png
    room_mask/
        <scene_id>/
            <name>_rm<angle>_<rot>.png

The files in each scene directory contain the following:

| File | Description |
|------|-------------|
| <name>_i<angle>_<rot>.jpg | RGB input image |
| <name>_segmap<angle>_<rot>.mapped.npz | 2D panoptic segmentation labels with mapped category IDs |
| <name>_intrinsics_<angle>.npy | Per-image camera intrinsic matrix |
| <name>_d<angle>_<rot>.png | Depth map |
| <name>_rm<angle>_<rot>.png | Room mask used for multiplane occupancy |
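
Per-image intrinsics are stored as `.npy` matrices. Assuming a standard 3 × 3 pinhole intrinsic matrix (an assumption to verify against your data; the numeric values below are illustrative only), the focal lengths and principal point can be read back like this:

```python
import numpy as np

# Hypothetical pinhole intrinsics standing in for a <name>_intrinsics_<angle>.npy
# file; fx, fy, cx, cy below are illustrative values, not dataset values.
K = np.array([[575.0,   0.0, 320.0],
              [  0.0, 575.0, 240.0],
              [  0.0,   0.0,   1.0]])
np.save("intrinsics_example.npy", K)

K_loaded = np.load("intrinsics_example.npy")
fx, fy = K_loaded[0, 0], K_loaded[1, 1]  # focal lengths in pixels
cx, cy = K_loaded[0, 2], K_loaded[1, 2]  # principal point
print(fx, fy, cx, cy)
```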

Note

Unlike 3D-Front, Matterport3D uses per-image intrinsic matrices. Set dataset.downsample_factor to 2 and dataset.iso_value to 2.0 in the configuration file when training on Matterport3D.
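
In YAML, the Matterport3D-specific overrides look like this (`name: matterport` is taken from the dataset names this model supports):

```yaml
dataset:
  name: matterport
  downsample_factor: 2
  iso_value: 2.0
```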

Creating a Configuration File#

NvPanoptix3D uses a YAML configuration file with the following top-level sections: dataset, train, evaluate, inference, model, export, and wandb.

Because training is a two-stage process, prepare a separate configuration file for each stage. Sample configuration files for both datasets and both stages are provided in the experiment_specs directory of the NvPanoptix3D source:

  • spec_front3d_2d.yaml: Stage 1 (2D) training on 3D-Front

  • spec_front3d_3d.yaml: Stage 2 (3D) training on 3D-Front

  • spec_matterport_2d.yaml: Stage 1 (2D) training on Matterport3D

  • spec_matterport_3d.yaml: Stage 2 (3D) training on Matterport3D

The following example shows a Stage 2 (3D) configuration file for the 3D-Front dataset. The Stage 1 configuration is identical except that you set dataset.enable_3d to False, omit train.checkpoint_2d, and set the batch size for training to 16 instead of 1.

results_dir: /workspace/nvpanoptix3d/train3d_front3d

dataset:
  name: front3d
  contiguous_id: True
  label_map: ""
  downsample_factor: 1
  frustum_mask_path: ""
  iso_value: 1.0
  ignore_label: 255
  enable_3d: True        # Set to False for Stage 1 (2D) training
  enable_mp_occ: True
  train:
    json_path: /path/to/train.json
    base_dir: /path/to/front3d/data
    batch_size: 1
    num_workers: 2
  val:
    json_path: /path/to/val.json
    base_dir: /path/to/front3d/data
    batch_size: 1
    num_workers: 2
  test:
    json_path: /path/to/test.json
    base_dir: /path/to/front3d/data
    batch_size: 1
    num_workers: 2
  augmentation:
    train_min_size: [240]
    train_max_size: 960
    test_min_size: 240
    test_max_size: 960
    size_divisibility: 32

train:
  checkpoint_2d: /path/to/stage1/checkpoint.pth
  checkpoint_3d: ""
  freeze: []
  precision: fp32
  num_gpus: 1
  num_nodes: 1
  checkpoint_interval_unit: step
  checkpoint_interval: 1000
  num_epochs: 20
  activation_checkpoint: False
  optim:
    type: AdamW
    lr: 0.0001
    weight_decay: 0.05
    lr_scheduler: WarmupPoly
    max_steps: 110000

evaluate:
  checkpoint: ""

inference:
  images_dir: ""
  checkpoint: ""

model:
  object_mask_threshold: 0.8
  overlap_threshold: 0.8
  test_topk_per_image: 100
  mode: panoptic
  backbone:
    backbone_type: vggt
    pretrained_model_path: /path/to/vggt_pretrained.pth
  sem_seg_head:
    num_classes: 13
  mask_former:
    dropout: 0.0
    num_object_queries: 100
    deep_supervision: True
    no_object_weight: 0.1
    class_weight: 2.0
    mask_weight: 5.0
    dice_weight: 5.0
    depth_weight: 5.0
    mp_occ_weight: 5.0
    size_divisibility: 32
  frustum3d:
    truncation: 3.0
    iso_recon_value: 1.0
    panoptic_weight: 25.0
    completion_weights: [50.0, 25.0, 10.0]
    surface_weight: 5.0
    unet_output_channels: 16
    unet_features: 16
    use_multi_scale: True
    grid_dimensions: 256
    signed_channel: 3
    frustum_dims: 256
  projection:
    voxel_size: 0.03
    sign_channel: True

export:
  checkpoint: ""
  onnx_file_2d: /workspace/nvpanoptix3d/model_2d.onnx
  onnx_file_3d: ""
  on_cpu: False
  input_channel: 3
  input_width: 320
  input_height: 240
  opset_version: 17
  batch_size: 1
  verbose: False

wandb:
  enable: True
  name: nvpanoptix3d_vggt_3d_front3d
  tags: ["training", "nvpanoptix3d", "vggt", "3d_front3d"]
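
For reference, the Stage 1 (2D) variant of this file amounts to the following overrides, with all other values unchanged:

```yaml
dataset:
  enable_3d: False   # Stage 1 trains the 2D heads only
  train:
    batch_size: 16   # Stage 1 uses a larger batch size than Stage 2

# train.checkpoint_2d is simply omitted in the Stage 1 spec
```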

Configuration Parameters#

The following tables describe all available configuration parameters.

Experiment Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| model_name | string | Name of the model when invoking a task via model_agnostic | | | | | |
| encryption_key | string | Key for encrypting model checkpoints | | | | | |
| results_dir | string | Path to where all the assets generated from a task are stored | | | | | |
| wandb | collection | Configurable parameters for Weights & Biases logging | | | | | False |
| model | collection | Configurable parameters to construct the model for the NvPanoptix3D experiment | | | | | False |
| dataset | collection | Configurable parameters to construct the dataset for the NvPanoptix3D experiment | | | | | False |
| train | collection | Configurable parameters to construct the trainer for the NvPanoptix3D experiment | | | | | False |
| inference | collection | Configurable parameters to construct the inferencer for the NvPanoptix3D experiment | | | | | False |
| evaluate | collection | Configurable parameters to construct the evaluator for the NvPanoptix3D experiment | | | | | False |
| export | collection | Configurable parameters to construct the exporter for the NvPanoptix3D experiment | | | | | False |
| gen_trt_engine | collection | Configurable parameters to construct the TensorRT engine builder for an NvPanoptix3D experiment | | | | | False |

WandB Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| enable | bool | Flag to enable WandB logging | True | | | | |
| project | string | WandB project name | TAO Toolkit | | | | |
| entity | string | WandB entity name | | | | | |
| group | string | WandB group name | | | | | |
| tags | list | Tags for the WandB run | ['tao-toolkit'] | | | | False |
| reinit | bool | Flag to allow reinitializing a WandB run | False | | | | |
| sync_tensorboard | bool | Flag to sync TensorBoard logs to WandB | False | | | | |
| save_code | bool | Flag to save the code with the WandB run | False | | | | |
| name | string | Name of the WandB run | TAO Toolkit Training | | | | |
| run_id | string | ID of an existing WandB run to resume | | | | | |

Model Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| backbone | collection | Configuration hyperparameters for the NvPanoptix3D backbone | | | | | False |
| sem_seg_head | collection | Configuration hyperparameters for the Mask2Former semantic segmentation head | | | | | False |
| mask_former | collection | Configuration hyperparameters for the Mask2Former model | | | | | False |
| frustum3d | collection | Configuration hyperparameters for the Frustum3D model | | | | | False |
| projection | collection | Configuration hyperparameters for the Projection model | | | | | False |
| mode | categorical | Segmentation mode | panoptic | | | panoptic,instance,semantic | |
| object_mask_threshold | float | The threshold used when filtering out the object mask | 0.4 | | | | |
| overlap_threshold | float | The threshold used when evaluating overlap | 0.5 | | | | |
| test_topk_per_image | int | Keep the top-k instances per image for instance segmentation | 100 | | | | |

Backbone Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| backbone_type | categorical | Type of backbone to use. Available backbone: vggt | vggt | | | vggt | |
| pretrained_model_path | string | Path to a pretrained backbone file | | | | | |

Semantic Segmentation Head Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| common_stride | int | Common stride | 4 | 2 | | | |
| transformer_enc_layers | int | Number of transformer encoder layers | 6 | 1 | | | |
| convs_dim | int | Convolutional layer dimension | 256 | 1 | | | |
| mask_dim | int | Mask head dimension | 256 | 1 | | | |
| depth_dim | int | Depth head dimension | 256 | 1 | | | |
| ignore_value | int | Ignore value | 255 | 0 | 255 | | |
| deformable_transformer_encoder_in_features | list | List of feature names for deformable transformer encoder input | ['res3', 'res4', 'res5'] | | | | False |
| num_classes | int | Number of classes | 13 | 1 | | | |
| norm | string | Norm layer type | GN | | | | |
| in_features | list | List of input feature names | ['res2', 'res3', 'res4', 'res5'] | | | | False |

Mask2Former Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| dropout | float | The dropout probability | 0.0 | 0.0 | 1.0 | | |
| nheads | int | Number of attention heads | 8 | | | | |
| num_object_queries | int | The number of object queries | 100 | 1 | inf | | |
| hidden_dim | int | Dimension of the hidden units | 256 | | | | |
| transformer_dim_feedforward | int | Dimension of the feedforward network in the transformer | 1024 | 1 | | | |
| dim_feedforward | int | Dimension of the feedforward network | 2048 | 1 | | | |
| dec_layers | int | Number of decoder layers in the transformer | 10 | 1 | | | |
| pre_norm | bool | Whether to add layer norm in the encoder; 1=add layer norm, 0=do not add | 0 | | | | |
| class_weight | float | The relative weight of the classification error in the matching cost | 2.0 | 0.0 | inf | | |
| dice_weight | float | The relative weight of the dice loss of the binary mask in the matching cost | 5.0 | 0.0 | inf | | |
| mask_weight | float | The relative weight of the focal loss of the binary mask in the matching cost | 5.0 | 0.0 | inf | | |
| depth_weight | float | The relative weight of the depth loss in the matching cost | 5.0 | 0.0 | inf | | |
| mp_occ_weight | float | The relative weight of the multiplane occupancy loss in the matching cost | 5.0 | 0.0 | inf | | |
| train_num_points | int | The number of points to sample | 12544 | | | | |
| oversample_ratio | float | Oversampling parameter | 3.0 | | | | |
| importance_sample_ratio | float | Ratio of points that are sampled via importance sampling | 0.75 | | | | |
| deep_supervision | bool | Flag to enable deep supervision | 1 | | | | |
| no_object_weight | float | The relative classification weight applied to the no-object category | 0.1 | | | | |
| size_divisibility | int | Size divisibility | 32 | | | | |

Frustum3D Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| truncation | float | The truncation value | 3.0 | | | | |
| iso_recon_value | float | The iso reconstruction value | 2.0 | | | | |
| panoptic_weight | float | The weight of the panoptic loss | 25.0 | | | | |
| completion_weights | list | The weights of the completion loss | [50.0, 25.0, 10.0] | | | | False |
| surface_weight | float | The weight of the surface loss | 5.0 | | | | |
| unet_output_channels | int | The number of output channels of the U-Net | 16 | | | | |
| unet_features | int | The number of features of the U-Net | 16 | | | | |
| use_multi_scale | bool | Whether to use multi-scale features | False | | | | |
| grid_dimensions | int | The number of grid dimensions | 256 | | | | |
| frustum_dims | int | The number of frustum dimensions | 256 | | | | |
| signed_channel | int | The number of signed channels | 3 | | | | |

Projection Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| voxel_size | float | The size of the voxel | 0.03 | | | | |
| sign_channel | bool | Whether to use a signed channel | 1 | | | | |
| depth_feature_dim | int | The dimension of the depth feature | 256 | | | | |

Dataset Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| train | collection | Configurable parameters to construct the train dataset | | | | | False |
| val | collection | Configurable parameters to construct the validation dataset | | | | | False |
| test | collection | Configurable parameters to construct the test dataset | | | | | False |
| workers | int | The number of parallel workers processing data | 8 | 1 | | | |
| pin_memory | bool | Flag to allocate page-locked memory for faster transfer of data between the CPU and GPU | True | | | | |
| augmentation | collection | Configuration parameters for data augmentation | | | | | False |
| contiguous_id | bool | Flag to enable contiguous IDs for labels | False | | | | |
| label_map | string | Path to a label map file | | | | | |
| name | categorical | Dataset name | front3d | | | front3d,matterport,synthetic_hospital,synthetic_warehouse | |
| downsample_factor | int | Downsample factor (1: Synthetic & Front3D, 2: Matterport3D) | 1 | | | | |
| iso_value | float | ISO value to reconstruct a mesh from the TUDF volume | 1.0 | | | | |
| ignore_label | int | Ignore label value | 255 | | | | |
| min_instance_pixels | int | Minimum number of pixels required for an instance to be considered valid | 200 | | | | |
| img_format | string | Image format | RGB | | | | |
| target_size | list | Input image size to resize to | [320, 240] | | | | False |
| reduced_target_size | list | Image size to process at the 3D stage | [160, 120] | | | | False |
| depth_size | list | Input depth size to resize to | [120, 160] | | | | False |
| depth_bound | bool | Enable depth truncation in bounds | False | | | | |
| depth_min | float | Minimum depth value | 0.4 | | | | |
| depth_max | float | Maximum depth value | 6.0 | | | | |
| frustum_mask_path | string | Relative frustum mask path | meta/frustum_mask.npz | | | | |
| occ_truncation_lvl | list | Values used to create an occupancy volume from the TUDF volume | [8.0, 6.0] | | | | False |
| truncation_range | list | Truncation range for the TUDF volume | [0.0, 12.0] | | | | False |
| enable_3d | bool | Enable 3D training | False | | | | |
| enable_mp_occ | bool | Enable multiplane occupancy | True | | | | |
| depth_scale | float | Depth scale | 25.0 | | | | |
| num_thing_classes | int | Number of thing classes | 9 | | | | |

Dataset Split Configuration#

The train, val, and test collections share the same parameters:

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| base_dir | string | Root directory of the dataset | | | | | |
| json_path | string | JSON file for image/mask pairs | | | | | |
| batch_size | int | Batch size | 1 | 1 | | | |
| num_workers | int | Number of workers in the dataloader | 1 | 0 | | | |

Augmentation Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| train_min_size | list | A list of sizes to perform random resize | [448] | | | | False |
| train_max_size | int | The maximum random crop size for training data | 768 | 32 | 960 | | |
| train_crop_size | list | The random crop size for training data in [H, W] | [240, 240] | | | | False |
| test_min_size | int | The minimum resize size for test data | 240 | 32 | 960 | | |
| test_max_size | int | The maximum resize size for test data | 960 | 32 | 960 | | |
| color_aug_ssd | bool | Color augmentation | False | | | | |
| enable_crop | bool | Enable cropping of the input image | False | | | | |
| crop_size | list | Size to crop the input image to | [240, 240] | | | | False |
| single_category_max_area | float | Maximum ratio of crop area that can be occupied by a single semantic category | 1.0 | 0.0 | 1.0 | | |
| random_flip | string | Flip horizontal/vertical | | | | | |
| random_flip_prob | float | Flip probability | 0.5 | 0.0 | 1.0 | | |
| size_divisibility | float | Size divisibility to pad to | -1 | | | | |
| gen_aug_weight | float | Weight for generated augmentation; 0.0 disables generated augmentation | 0.0 | 0.0 | 1.0 | | |

Training Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| num_gpus | int | The number of GPUs to run the train job | 1 | 1 | | | |
| gpu_ids | list | List of GPU IDs to run the training on; the length of the list must equal train.num_gpus | [0] | | | | False |
| num_nodes | int | Number of nodes to run the training on; if > 1, multi-node is enabled | 1 | 1 | | | |
| seed | int | Seed for the initializer in PyTorch; if < 0, the fixed seed is disabled | 1234 | -1 | inf | | |
| cudnn | collection | Configurable parameters for cuDNN | | | | | False |
| num_epochs | int | Number of epochs to run the training | 10 | 1 | inf | | |
| checkpoint_interval | int | The interval at which a checkpoint is saved, in the unit given by checkpoint_interval_unit | 1 | 1 | | | |
| checkpoint_interval_unit | categorical | The unit of the checkpoint interval | epoch | | | epoch,step | |
| validation_interval | int | The interval (in epochs) at which an evaluation is triggered on the validation set | 1 | 1 | | | |
| resume_training_checkpoint_path | string | Path to the checkpoint to resume training from | | | | | |
| results_dir | string | The folder in which to save the experiment | | | | | |
| checkpoint_2d | string | Path to the 2D stage checkpoint used to initialize 3D stage training | | | | | |
| checkpoint_3d | string | Path to a 3D stage checkpoint used to resume 3D stage training | | | | | |
| val_check_interval | int | The number of iterations between validation checks | 5 | | | | |
| freeze | list | List of layer names to freeze. Example: ["backbone", "transformer.encoder", "input_proj"] | [] | | | | False |
| clip_grad_norm | float | Amount to clip the gradient by the L2 norm | 0.1 | | | | |
| clip_grad_norm_type | string | Gradient clip type | full | | | | |
| is_dry_run | bool | Whether to run the trainer in dry-run mode | False | | | | |
| optim | collection | Hyperparameters to configure the optimizer | | | | | False |
| precision | categorical | Precision to run the training with | fp32 | | | fp16,fp32 | |
| distributed_strategy | categorical | The multi-GPU training strategy. DDP (Distributed Data Parallel) and Fully Sharded DDP are supported | ddp | | | ddp,fsdp | |
| activation_checkpoint | bool | When True, activations are recomputed in the backward pass rather than stored, saving GPU memory | True | | | | |
| verbose | bool | Flag to enable printing of detailed learning rate scaling from the optimizer | False | | | | |
| iters_per_epoch | int | Number of iterations per epoch | | | | | |

Optimizer Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| type | categorical | Type of optimizer used to train the network | AdamW | | | AdamW | |
| monitor_name | categorical | The metric value to be monitored for the AutoReduce scheduler | val_loss | | | val_loss,train_loss | |
| lr | float | The initial learning rate for training the model | 0.0002 | 0.0 | 1.0 | | True |
| backbone_multiplier | float | A multiplier for the backbone learning rate | 0.1 | 0.0 | 1.0 | | True |
| momentum | float | The momentum for the AdamW optimizer | 0.9 | 0.0 | 1.0 | | True |
| weight_decay | float | The weight decay coefficient | 0.05 | 0.0 | 1.0 | | True |
| lr_scheduler | categorical | The learning rate scheduler. MultiStep decreases the lr by gamma at each milestone; Warmuppoly applies a polynomial (poly) learning rate schedule | MultiStep | | | MultiStep,Warmuppoly | |
| milestones | list | Learning rate decay epochs | [88, 96] | | | | False |
| gamma | float | Multiplicative factor of learning rate decay | 0.1 | | | | |
| max_steps | int | The maximum number of steps to train the model | 160000 | | | | |
| warmup_factor | float | The warmup factor for the learning rate scheduler | 1.0 | | | | |
| warmup_iters | int | The number of warmup iterations | 0 | | | | |
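
The WarmupPoly scheduler combines a warmup phase with polynomial decay over max_steps. A minimal sketch of one common formulation follows; the exact decay exponent (power) and warmup shape used internally are assumptions, not the toolkit's verified implementation:

```python
def warmup_poly_lr(step, base_lr=0.0001, max_steps=110000,
                   warmup_iters=0, warmup_factor=1.0, power=0.9):
    """Sketch of a warmup + polynomial decay schedule (power=0.9 is assumed)."""
    if step < warmup_iters:
        # Linear ramp from warmup_factor * base_lr up to base_lr.
        alpha = step / max(1, warmup_iters)
        return base_lr * (warmup_factor * (1 - alpha) + alpha)
    return base_lr * (1 - step / max_steps) ** power

print(warmup_poly_lr(0))                           # full base lr at step 0
print(warmup_poly_lr(55000) < warmup_poly_lr(0))   # lr decays over training
```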

CUDNN Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| benchmark | bool | Whether to enable cuDNN benchmark mode | False | | | | |
| deterministic | bool | Whether to enable cuDNN deterministic mode | True | | | | |

Inference Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| num_gpus | int | The number of GPUs to run the inference job | 1 | 1 | | | |
| gpu_ids | list | List of GPU IDs to run the inference on; the length of the list must equal inference.num_gpus | [0] | | | | False |
| num_nodes | int | Number of nodes to run the inference on; if > 1, multi-node is enabled | 1 | 1 | | | |
| checkpoint | string | Path to the checkpoint file used for inference | | | | | |
| trt_engine | string | Path to the TensorRT engine folder to be used for inference | | | | | |
| results_dir | string | Path to where all the assets generated from a task are stored | | | | | |
| batch_size | int | The batch size of the input tensor; important if batch_size > 1 for large datasets | -1 | -1 | | | |
| mode | categorical | Mode to run inference in | panoptic | | | semantic,instance,panoptic | |
| images_dir | string | Path to the images directory | | | | | |

Evaluation Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| num_gpus | int | The number of GPUs to run the evaluation job | 1 | 1 | | | |
| gpu_ids | list | List of GPU IDs to run the evaluation on; the length of the list must equal evaluate.num_gpus | [0] | | | | False |
| num_nodes | int | Number of nodes to run the evaluation on; if > 1, multi-node is enabled | 1 | 1 | | | |
| checkpoint | string | Path to the checkpoint file used for evaluation | | | | | |
| trt_engine | string | Path to the TensorRT engine to be used for evaluation; only works with tao-deploy | | | | | |
| results_dir | string | Path to where all the assets generated from a task are stored | | | | | |
| batch_size | int | The batch size of the input tensor; important if batch_size > 1 for large datasets | -1 | -1 | | | |

Export Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| results_dir | string | Path to where all the assets generated from a task are stored | | | | | |
| gpu_id | int | The index of the GPU used to build the TensorRT engine | 0 | | | | |
| checkpoint | string | Path to the checkpoint file to run export on | ??? | | | | |
| onnx_file | string | Path to the ONNX model file | ??? | | | | |
| on_cpu | bool | Flag to export a CPU-compatible model | False | | | | |
| input_channel | ordered_int | Number of channels in the input tensor | 3 | 1 | | 1,3 | |
| input_width | int | Width of the input image tensor | 960 | 32 | | | |
| input_height | int | Height of the input image tensor | 544 | 32 | | | |
| opset_version | int | Operator set version of the ONNX model used to generate the TensorRT engine | 17 | 1 | | | |
| batch_size | int | The batch size of the input tensor for the engine; a value of -1 implies dynamic tensor shapes | -1 | -1 | | | |
| verbose | bool | Flag to enable verbose TensorRT logging | False | | | | |
| format | categorical | File format to export to | onnx | | | onnx,xdl | |
| onnx_file_2d | string | Path to the 2D ONNX model file | | | | | |
| onnx_file_3d | string | Path to the 3D ONNX model file | | | | | |
| max_voxels | int | The maximum number of voxels in the input tensor for the engine | 700000 | 1 | | | |

TensorRT Engine Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| results_dir | string | Path to where all the assets generated from a task are stored | | | | | |
| gpu_id | int | The index of the GPU used to build the TensorRT engine | 0 | 0 | | | |
| onnx_file | string | Path to the ONNX model file | ??? | | | | |
| trt_engine | string | Path to the generated TensorRT engine; only works with tao-deploy | ??? | | | | |
| timing_cache | string | Path to a TensorRT timing cache that speeds up engine generation; it is created, read, and updated | | | | | |
| batch_size | int | The batch size of the input tensor for the engine; a value of -1 implies dynamic tensor shapes | -1 | -1 | | | |
| verbose | bool | Flag to enable verbose TensorRT logging | False | | | | |
| tensorrt | collection | Hyperparameters to configure the NvPanoptix3D TensorRT engine builder | | | | | False |

TensorRT Configuration#

| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|-------|------------|-------------|---------------|-----------|-----------|---------------|----------------|
| workspace_size | int | The size (in megabytes) of the workspace TensorRT has to run its optimization tactics and generate the engine | 1024 | 0 | | | |
| min_batch_size | int | The minimum batch size in the optimization profile for the input tensor of the TensorRT engine | 1 | 1 | | | |
| opt_batch_size | int | The optimum batch size in the optimization profile for the input tensor of the TensorRT engine | 1 | 1 | | | |
| max_batch_size | int | The maximum batch size in the optimization profile for the input tensor of the TensorRT engine | 1 | 1 | | | |
| layers_precision | list | The list to specify layer precision | [] | | | | False |
| data_type | categorical | The precision to be set for building the TensorRT engine | FP32 | | | FP32,FP16 | |

Training#

NvPanoptix3D training requires two sequential stages. Complete Stage 1 before beginning Stage 2.

Stage 1: 2D Panoptic Segmentation#

Stage 1 trains the 2D panoptic segmentation and depth estimation head. Set dataset.enable_3d to False in your configuration file.

To run Stage 1 training:

BASE_EXPERIMENT_ID=$(tao nvpanoptix3d list-base-experiments | jq -r '.[0].id')
STAGE1_SPECS=$(tao nvpanoptix3d get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

STAGE1_JOB_ID=$(tao nvpanoptix3d create-job \
  --kind experiment \
  --name "nvpanoptix3d_stage1_train" \
  --action train \
  --workspace-id $WORKSPACE_ID \
  --specs @stage1_spec.yaml \
  --train-dataset-uri "$DATASET_URI" \
  --eval-dataset-uri "$DATASET_URI" \
  --base-experiment-id "$BASE_EXPERIMENT_ID" \
  --encryption-key "nvidia_tlt" | jq -r '.id')

Multi-Node Training with FTMS

Distributed training is supported through FTMS. For large models, multi-node training can significantly reduce training time.

Verify that your cluster has multiple GPU-enabled nodes available for training by running this command:

kubectl get nodes -o wide

The command lists the nodes in your cluster. If it does not list multiple nodes, contact your cluster administrator to get more nodes added to your cluster.

To run a multi-node training job through FTMS, modify these fields in the training job specification:

{
    "train": {
        "num_gpus": 8, // Number of GPUs per node
        "num_nodes": 2 // Number of nodes to use for training
    }
}

If these fields are not specified, FTMS uses the default values of one GPU per node and one node.

Note

The number of GPUs specified in the num_gpus field must not exceed the number of GPUs per node in the cluster. The number of nodes specified in the num_nodes field must not exceed the number of nodes in the cluster.

tao nvpanoptix3d train \
    -e /path/to/spec_2d.yaml \
    dataset.train.json_path=/path/to/train.json \
    dataset.train.base_dir=/path/to/data \
    dataset.val.json_path=/path/to/val.json \
    dataset.val.base_dir=/path/to/data \
    model.backbone.pretrained_model_path=/path/to/vggt_pretrained.pth \
    results_dir=/path/to/results/stage1

Required arguments:

  • -e: Path to the Stage 1 experiment specification file.

Optional arguments:

  • results_dir: Override the results directory.

  • train.num_gpus: Number of GPUs to use.

  • model.backbone.pretrained_model_path: Path to pretrained VGGT backbone weights.

Note

For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed, but are inconsistent, for example num_gpus = 1, gpu_ids = [0, 1], then they are modified to follow the setting that implies more GPUs; in the same example num_gpus is modified from 1 to 2.
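
The reconciliation rule described in this note can be sketched as follows (a hypothetical helper, not the toolkit's actual code; the expansion of gpu_ids to a contiguous range is an assumption):

```python
def resolve_gpu_settings(num_gpus=1, gpu_ids=None):
    """Follow whichever of num_gpus / gpu_ids implies more GPUs."""
    gpu_ids = gpu_ids or [0]
    if len(gpu_ids) > num_gpus:
        num_gpus = len(gpu_ids)          # e.g. num_gpus=1, gpu_ids=[0, 1] -> 2
    elif num_gpus > len(gpu_ids):
        gpu_ids = list(range(num_gpus))  # expand the ID list to match (assumed)
    return num_gpus, gpu_ids

print(resolve_gpu_settings(1, [0, 1]))  # (2, [0, 1])
```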

In some cases, multi-GPU training may result in a segmentation fault. You can work around this by setting the environment variable OMP_NUM_THREADS to 1. Depending on your mode of execution, you may use the following methods to set this variable:

  • CLI Launcher:

    You may set the environment variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file, as described in bullet 3 of the section Running the launcher.

    {
        "Envs": [
            {
                "variable": "OMP_NUM_THREADS",
                "value": "1"
            }
        ]
    }
    
  • Docker:

    You may set environment variables in Docker by setting the -e flag in the Docker command line.

    docker run -it --rm --gpus all \
        -e OMP_NUM_THREADS=1 \
        -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e
    

Stage 2: 3D Volumetric Reconstruction#

Stage 2 freezes the Stage 1 model weights and trains the 3D U-Net frustum completion module. Set dataset.enable_3d to True and provide the Stage 1 checkpoint via train.checkpoint_2d.

To run Stage 2 training:

STAGE2_SPECS=$(tao nvpanoptix3d get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

STAGE2_JOB_ID=$(tao nvpanoptix3d create-job \
  --kind experiment \
  --name "nvpanoptix3d_stage2_train" \
  --action train \
  --workspace-id $WORKSPACE_ID \
  --specs @stage2_spec.yaml \
  --train-dataset-uri "$DATASET_URI" \
  --eval-dataset-uri "$DATASET_URI" \
  --parent-job-id $STAGE1_JOB_ID \
  --base-experiment-id "$BASE_EXPERIMENT_ID" \
  --encryption-key "nvidia_tlt" | jq -r '.id')

Multi-Node Training with FTMS

Distributed training is supported through FTMS. For large models, multi-node training can significantly reduce training time.

Verify that your cluster has multiple GPU-enabled nodes available for training by running this command:

kubectl get nodes -o wide

The command lists the nodes in your cluster. If it does not list multiple nodes, contact your cluster administrator to get more nodes added to your cluster.

To run a multi-node training job through FTMS, modify these fields in the training job specification:

{
    "train": {
        "num_gpus": 8, // Number of GPUs per node
        "num_nodes": 2 // Number of nodes to use for training
    }
}

If these fields are not specified, FTMS uses the default values of one GPU per node and one node.

Note

The number of GPUs specified in the num_gpus field must not exceed the number of GPUs per node in the cluster. The number of nodes specified in the num_nodes field must not exceed the number of nodes in the cluster.

tao nvpanoptix3d train \
    -e /path/to/spec_3d.yaml \
    dataset.train.json_path=/path/to/train.json \
    dataset.train.base_dir=/path/to/data \
    dataset.val.json_path=/path/to/val.json \
    dataset.val.base_dir=/path/to/data \
    train.checkpoint_2d=/path/to/results/stage1/checkpoint.pth \
    results_dir=/path/to/results/stage2

To resume Stage 2 training from an existing Stage 2 checkpoint, also set train.checkpoint_3d:

tao nvpanoptix3d train \
    -e /path/to/spec_3d.yaml \
    train.checkpoint_2d=/path/to/results/stage1/checkpoint.pth \
    train.checkpoint_3d=/path/to/results/stage2/checkpoint.pth \
    results_dir=/path/to/results/stage2_resumed

Required arguments:

  • -e: Path to the Stage 2 experiment specification file.

  • train.checkpoint_2d: Path to the Stage 1 checkpoint.

Optional arguments:

  • results_dir: Override the results directory.

  • train.checkpoint_3d: Resume Stage 2 from an existing checkpoint.

  • train.num_gpus: Number of GPUs. For 3D training, also set train.activation_checkpoint=True to reduce GPU memory usage.

Note

For training, evaluation, and inference, each task exposes two variables: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 with gpu_ids = [0, 1], they are reconciled to follow the setting that implies more GPUs; in this example, num_gpus is changed from 1 to 2.
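The reconciliation rule described above can be sketched as follows. This is a hypothetical helper, not the actual TAO implementation; when gpu_ids must be expanded, device IDs are assumed to be filled in from 0 upward:

```python
def reconcile_gpu_settings(num_gpus, gpu_ids):
    """Follow whichever of num_gpus / gpu_ids implies more GPUs."""
    target = max(num_gpus, len(gpu_ids))
    if len(gpu_ids) < target:
        # Assumption: missing device IDs are filled in from 0 upward.
        gpu_ids = list(range(target))
    return target, gpu_ids

# The documented example: num_gpus=1 with gpu_ids=[0, 1] becomes num_gpus=2.
print(reconcile_gpu_settings(1, [0, 1]))  # (2, [0, 1])
```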

In some cases, multi-GPU training may result in a segmentation fault. You can work around this by setting the environment variable OMP_NUM_THREADS to 1. Depending on how you run TAO, use one of the following methods to set this variable:

  • CLI Launcher:

    Set the environment variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file, as described in bullet 3 of the section Running the launcher.

    {
        "Envs": [
            {
                "variable": "OMP_NUM_THREADS",
                "value": "1"
            }
        ]
    }
    
  • Docker:

    You can set environment variables in Docker by passing the -e flag on the Docker command line.

    docker run -it --rm --gpus all \
        -e OMP_NUM_THREADS=1 \
        -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e
    

Note

Only fp32 precision is supported. Mixed precision training is not available for NvPanoptix3D.

Checkpointing and Resuming Training#

At every train.checkpoint_interval, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved. Checkpoints are written to train.results_dir, like this:

$ ls /results/train

'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
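To resume from the most recent checkpoint, you can locate it by its epoch suffix. A minimal sketch, assuming the zero-padded naming scheme shown above (lexicographic order then matches epoch order):

```python
from pathlib import Path

def latest_checkpoint(results_dir):
    """Return the highest-epoch model_epoch_<n>.pth file, or None if absent."""
    # Zero-padded epoch numbers sort correctly as strings.
    ckpts = sorted(Path(results_dir).glob("model_epoch_*.pth"))
    return ckpts[-1] if ckpts else None
```

The returned path could then be passed as train.checkpoint_3d (or train.resume_training_checkpoint_path, depending on the stage) when relaunching training.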

Evaluation#

NvPanoptix3D reports the following metrics, which assess the quality of 3D panoptic reconstruction:

| Metric | Description |
|--------|-------------|
| PRQ | Panoptic Reconstruction Quality. Overall 3D panoptic performance, combining geometry accuracy and semantic recognition. |
| RSQ | Reconstructed Segmentation Quality. Measures the quality of semantic segmentation in 3D. |
| RRQ | Reconstruction Recognition Quality. Measures instance recognition quality in 3D. |
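In the panoptic reconstruction literature, PRQ typically decomposes per category as PRQ = RSQ × RRQ, mirroring the 2D Panoptic Quality decomposition PQ = SQ × RQ. A PQ-style sketch from matched-segment counts (the exact matching criteria NvPanoptix3D uses are not shown here):

```python
def panoptic_reconstruction_quality(iou_sum, tp, fp, fn):
    """PQ-style decomposition: RSQ is mean IoU over true-positive matches,
    RRQ is an F1-style recognition score, and PRQ = RSQ * RRQ."""
    rsq = iou_sum / tp if tp else 0.0
    rrq = tp / (tp + 0.5 * fp + 0.5 * fn) if (tp + fp + fn) else 0.0
    return rsq * rrq, rsq, rrq

# Illustrative counts only.
prq, rsq, rrq = panoptic_reconstruction_quality(iou_sum=7.2, tp=9, fp=2, fn=1)
```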

To evaluate a trained model, provide the path to the checkpoint and the test dataset:

EVALUATE_SPECS=$(tao nvpanoptix3d get-job-schema --action evaluate --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

EVAL_JOB_ID=$(tao nvpanoptix3d create-job \
  --kind experiment \
  --name "nvpanoptix3d_evaluate" \
  --action evaluate \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $STAGE2_JOB_ID \
  --eval-dataset-uri "$DATASET_URI" \
  --specs @eval_spec.yaml \
  --base-experiment-id "$BASE_EXPERIMENT_ID" \
  --encryption-key "nvidia_tlt" | jq -r '.id')

With the TAO Launcher:

tao nvpanoptix3d evaluate \
    -e /path/to/spec.yaml \
    dataset.test.json_path=/path/to/test.json \
    dataset.test.base_dir=/path/to/data \
    evaluate.checkpoint=/path/to/checkpoint.pth

Required arguments:

  • -e: Path to the experiment specification file.

  • evaluate.checkpoint: Path to the trained checkpoint.

Optional arguments:

  • evaluate.num_gpus: Number of GPUs to use.

Set dataset.enable_3d to True in the specification file to evaluate the full 3D model, or False to evaluate the Stage 1 (2D) model only. When evaluating the 2D model, the reported metric is Panoptic Quality (PQ).

Performance#

The following tables show NvPanoptix3D performance on the 3D-Front test set and Matterport3D validation set, compared against published baselines. Metrics are reported for all categories combined and broken down into Things (countable objects) and Stuff (background regions). Bold values indicate the best result in each column.

3D-Front Test Set#

| Model | PRQ (All / Things / Stuff) | RSQ (All / Things / Stuff) | RRQ (All / Things / Stuff) |
|-------|----------------------------|----------------------------|----------------------------|
| BUOL | 54.01 / 49.73 / 73.30 | **63.81** / **60.57** / 78.37 | 82.99 / 80.67 / 93.42 |
| Uni3D | 52.76 / 47.29 / **77.41** | 60.98 / 56.56 / **80.87** | **84.26** / 81.81 / **95.31** |
| NvPanoptix3D | **54.32** / **49.74** / 74.90 | 62.95 / 58.98 / 80.80 | 83.94 / **82.15** / 92.00 |

Matterport3D Validation Set#

| Model | PRQ (All / Things / Stuff) | RSQ (All / Things / Stuff) | RRQ (All / Things / Stuff) |
|-------|----------------------------|----------------------------|----------------------------|
| BUOL | 14.47 / 10.97 / 24.94 | **45.71** / 45.30 / **46.93** | 30.91 / 23.81 / 52.22 |
| Uni3D | 16.32 / 13.21 / **29.33** | 44.36 / 44.58 / 44.09 | 36.48 / 29.33 / **65.19** |
| NvPanoptix3D | **17.63** / **14.79** / 28.04 | 45.27 / **45.68** / 43.31 | **38.98** / **32.26** / 64.02 |

NvPanoptix3D achieves the highest overall PRQ on both datasets and the highest overall RRQ on Matterport3D, demonstrating strong 3D panoptic reconstruction quality across both synthetic and real indoor environments.

Inference#

NvPanoptix3D inference runs on a directory of RGB images; ground-truth annotations are not required. The network accepts .jpg and .png images as input.

To run inference:

INFERENCE_SPECS=$(tao nvpanoptix3d get-job-schema --action inference --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

INFER_JOB_ID=$(tao nvpanoptix3d create-job \
  --kind experiment \
  --name "nvpanoptix3d_inference" \
  --action inference \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $STAGE2_JOB_ID \
  --inference-dataset-uri "$DATASET_URI" \
  --specs @inference_spec.yaml \
  --base-experiment-id "$BASE_EXPERIMENT_ID" \
  --encryption-key "nvidia_tlt" | jq -r '.id')

With the TAO Launcher:

tao nvpanoptix3d inference \
    -e /path/to/spec.yaml \
    inference.images_dir=/path/to/images \
    inference.checkpoint=/path/to/checkpoint.pth \
    results_dir=/path/to/inference_results

Required arguments:

  • -e: Path to the experiment specification file.

  • inference.checkpoint: Path to the trained checkpoint.

Optional arguments:

  • inference.images_dir: Override the images directory. Defaults to the value in the specification file.

  • inference.num_gpus: Number of GPUs to use. Defaults to 1.

Set dataset.enable_3d to True to produce 3D reconstruction outputs, or False to produce 2D panoptic segmentation and depth outputs only.

The inference outputs saved to results_dir include the following:

| Output | Shape | Description |
|--------|-------|-------------|
| 2D panoptic segmentation | (120, 160) | Per-pixel panoptic label map combining semantic and instance information. |
| 2D depth map | (120, 160) | Per-pixel depth estimate in meters. |
| 3D geometry | (256, 256, 256) | Truncated signed distance field representing the 3D scene geometry. |
| 3D semantic segmentation | (256, 256, 256) | Per-voxel semantic class labels. |
| 3D panoptic segmentation | (256, 256, 256) | Per-voxel panoptic labels combining semantic and instance information. |
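The 3D panoptic output packs semantic and instance information into a single label per voxel. As an illustration only (the actual encoding NvPanoptix3D uses is not documented here), a common scheme stores label = semantic_id * 1000 + instance_id, which can be decoded with integer arithmetic:

```python
import numpy as np

# Hypothetical encoding: label = semantic_id * 1000 + instance_id.
panoptic = np.array([[2001, 2002], [5000, 0]], dtype=np.int32)

semantic = panoptic // 1000   # per-voxel class IDs
instance = panoptic % 1000    # per-voxel instance IDs within each class

print(semantic)
print(instance)
```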

Export#

NvPanoptix3D exports the Stage 1 (2D) model to ONNX format for deployment with NVIDIA® TensorRT™.

Note

Only the 2D model supports ONNX export in this release. 3D model export is not yet available.

To export the 2D model:

EXPORT_SPECS=$(tao nvpanoptix3d get-job-schema --action export --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

EXPORT_JOB_ID=$(tao nvpanoptix3d create-job \
  --kind experiment \
  --name "nvpanoptix3d_export" \
  --action export \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $STAGE2_JOB_ID \
  --specs @export_spec.yaml \
  --base-experiment-id "$BASE_EXPERIMENT_ID" \
  --encryption-key "nvidia_tlt" | jq -r '.id')

With the TAO Launcher:

tao nvpanoptix3d export \
    -e /path/to/spec.yaml \
    export.checkpoint=/path/to/stage1_checkpoint.pth \
    export.onnx_file_2d=/path/to/output/model_2d.onnx \
    export.input_height=256 \
    export.input_width=320 \
    export.opset_version=17

Required arguments:

  • -e: Path to the experiment specification file.

  • export.checkpoint: Path to the Stage 1 checkpoint to export.

  • export.onnx_file_2d: Output path for the exported 2D ONNX file.

Optional arguments:

  • export.input_height: Input image height. Default: 256.

  • export.input_width: Input image width. Default: 320.

  • export.opset_version: ONNX opset version. Default: 17.
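A quick way to sanity-check the export resolution is to build a dummy batch with the shape the exported model expects. This is a sketch only: the uint8-to-float conversion below is a common convention, and the model's actual normalization may differ:

```python
import numpy as np

# Dummy HWC uint8 image at the default export resolution (256 x 320).
image = np.zeros((256, 320, 3), dtype=np.uint8)

# Convert to a float32 NCHW batch, the layout ONNX image models typically expect.
batch = image.astype(np.float32).transpose(2, 0, 1)[None] / 255.0

print(batch.shape)  # (1, 3, 256, 320)
```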

After export, generate a TensorRT engine from the ONNX file using the provided gen_trt_engine.py script:

python3 gen_trt_engine.py \
    --onnx_file_2d /path/to/model_2d.onnx \
    --trt_engine_2d /path/to/trt_2d_engine.engine \
    --batch_size 1 \
    --input_height 256 \
    --input_width 320 \
    --workspace_gb 8

Inference with NVIDIA Triton Inference Server#

NvPanoptix3D supports deployment as a hybrid TensorRT and PyTorch ensemble model on NVIDIA Triton Inference Server. The 2D stage runs as a TensorRT engine, and the 3D stage runs as a PyTorch model. The Triton model repository and client scripts are provided in the tlt-triton-apps repository.

The Triton model accepts the following inputs and produces the following outputs:

| Name | Direction | Data Type | Description |
|------|-----------|-----------|-------------|
| images | Input | UINT8 | RGB image with shape [3, H, W]. |
| frustum_mask | Input | UINT8 | Frustum mask with shape [D, H, W]. |
| intrinsic | Input | FP32 | Camera intrinsic matrix with shape [4, 4]. |
| panoptic_seg_2d | Output | INT32 | 2D panoptic segmentation map. |
| depth_2d | Output | FP32 | 2D depth map. |
| panoptic_seg_3d | Output | INT32 | 3D panoptic segmentation volume. |
| geometry_3d | Output | FP32 | 3D scene geometry (truncated signed distance field). |
| semantic_seg_3d | Output | INT32 | 3D semantic segmentation volume. |
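The intrinsic input is a 4 × 4 FP32 matrix. As a sketch, a standard 3 × 3 pinhole intrinsic can be embedded into that shape as follows; the fx, fy, cx, cy values and the exact matrix convention NvPanoptix3D expects are assumptions here:

```python
import numpy as np

def make_intrinsic_4x4(fx, fy, cx, cy):
    """Embed a pinhole intrinsic into the 4x4 FP32 layout Triton expects.
    The homogeneous padding convention is an assumption, not documented."""
    K = np.eye(4, dtype=np.float32)
    K[0, 0], K[1, 1] = fx, fy    # focal lengths in pixels
    K[0, 2], K[1, 2] = cx, cy    # principal point
    return K

# Placeholder values for a 320 x 240 camera.
K = make_intrinsic_4x4(fx=277.1, fy=277.1, cx=160.0, cy=120.0)
print(K.shape)  # (4, 4)
```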

To start the Triton server:

bash scripts/nvpanoptix3d_e2e_inference/start_server.sh

Install Python client requirements:

pip install -r scripts/nvpanoptix3d_e2e_inference/client-requirements.txt

To run the Triton client against the server:

bash scripts/nvpanoptix3d_e2e_inference/start_client.sh

Refer to the tlt-triton-apps repository for complete setup instructions, including NGC authentication, Docker configuration, and client usage.