Visual ChangeNet-Classification#

Visual ChangeNet-Classification is an NVIDIA-developed change detection model for classification and is included in TAO. Visual ChangeNet supports the following tasks:

train
evaluate
inference
export

Each task is explained in detail in the following sections.

Note

Throughout this documentation are references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.
- For instructions on creating a dataset using the remote client, refer to the Creating a dataset section in the Remote Client documentation.
- For instructions on creating an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.
The spec format is YAML for TAO Launcher, and JSON for FTMS Client.
File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher, not for FTMS Client.

Data Input for VisualChangeNet#

Single Golden Data Format#

VisualChangeNet-Classification requires the data to be provided as image and CSV files. Refer to the Data Annotation Format page for more information about the input data format for VisualChangeNet-Classification, which follows the same input data format as Optical Inspection.

Multiple Golden Data Format#

To enable Multiple Golden mode, set num_golden > 1 in the Dataset Configuration. This mode requires a different data format to support multiple golden reference images per sample. Refer to the Data Annotation Format page for more information about the input data format for Multiple-Golden-VisualChangeNet-Classification.

Creating a Training Experiment Spec File#

Configuring a Custom Dataset#

This section provides example configuration and commands to retrieve configuration for training VisualChangeNet-Classification using the dataset format described above.

Note

Make sure to set task=classify in SPECS for all task specs.

TAO Client (v2 API)

BASE_EXPERIMENT_ID=$(tao visual_changenet list-base-experiments | jq -r '.[0].id')
SPECS=$(tao visual_changenet get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

TAO Launcher

Here is an example spec file for training a VisualChangeNet-Classification model with NVIDIA’s FAN Hybrid backbone using the Data Annotation Format.

encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "ce"
    cls_weight: [1.0, 10.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    use_summary_token: True
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    image_width: 128
    image_height: 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
    num_golden: 1
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 128
  input_height: 512

Parameter	Data Type	Default	Description	Supported Values
model	dict config	–	The configuration of the model architecture.
dataset	dict config	–	The configuration of the dataset.
train	dict config	–	The configuration of the training task.
evaluate	dict config	–	The configuration of the evaluation task.
inference	dict config	–	The configuration of the inference task.
encryption_key	string	None	The encryption key to encrypt and decrypt model files.
results_dir	string	/results	The directory where experiment results are saved.
export	dict config	–	The configuration of the ONNX export task.
task	str	classify	A flag to indicate the change detection task. Supports two tasks: ‘segment’ and ‘classify’ for segmentation and classification.	classify, segment

train#

Parameter	Datatype	Default	Description	Supported Values
num_gpus	unsigned int	1	The number of GPUs to use for distributed training.	>0
gpu_ids	List[int]	[0]	The indices of the GPUs to use for distributed training.
seed	unsigned int	1234	The random seed for random, NumPy, and torch.	>0
num_epochs	unsigned int	10	The total number of epochs to run the experiment.	>0
checkpoint_interval	unsigned int	1	The epoch interval at which the checkpoints are saved.	>0
validation_interval	unsigned int	1	The epoch interval at which the validation is run.	>0
resume_training_checkpoint_path	string		The intermediate PyTorch Lightning checkpoint from which to resume training.
results_dir	string	/results/train	The directory in which to save training results.
classify	Dict str list	None ce	The classify dict contains configurable parameters for the VisualChangeNet Classification pipeline with the following parameters: * loss: The loss function used for classification training. * cls_weight: Weights for Cross-Entropy Loss for unbalanced dataset distributions.
segment	Dict str list	None ce [0.5, 0.5, 0.5, 0.8, 1.0]	The segment dict contains configurable parameters for the VisualChangeNet Segmentation pipeline with the following parameters: * loss: The loss function used for segmentation training.
num_nodes	unsigned int	1	The number of nodes. If larger than 1, multi-node is enabled.
pretrained_model_path	string	–	The path to the pretrained model checkpoint to initialize the end-end model weights.
optim	`dict` `config`	None	Contains the configurable parameters for the VisualChangeNet optimizer detailed in the optim section.
tensorboard	`dict` config bool	None True	Enable TensorBoard visualisation using a dict with configurable parameters: * enabled: If set to `True`, enables TensorBoard.
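
The effect of the class weights in the example spec (`cls_weight: [1.0, 10.0]`) can be illustrated with a small pure-Python sketch of a weighted cross-entropy for a single sample. This is the textbook formulation, not code taken from TAO; the function name is hypothetical, and any batch-level averaging that the training pipeline applies is not modeled here.

```python
import math
from typing import Sequence


def weighted_cross_entropy(logits: Sequence[float], target: int,
                           cls_weight: Sequence[float]) -> float:
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    # Numerically stable log-sum-exp over the class logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    log_prob_target = logits[target] - log_sum_exp
    return -cls_weight[target] * log_prob_target


# With cls_weight = [1.0, 10.0], the same uncertain prediction costs ten times
# more when the true class is 1 than when it is 0.
weights = [1.0, 10.0]
print(weighted_cross_entropy([0.0, 0.0], target=0, cls_weight=weights))
print(weighted_cross_entropy([0.0, 0.0], target=1, cls_weight=weights))
```

Giving the rarer class a larger weight is a common way to keep an unbalanced dataset from being dominated by the majority class.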

optim#

optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01

Parameter	Datatype	Default	Description	Supported Values
lr	float	0.0005	The learning rate.	>=0.0
optim	str	adamw	The optimizer.
policy	str	linear	The learning rate scheduler: * linear : LambdaLR decreases the lr by a multiplicative factor. * step : StepLR decreases the lr by 0.1 at every `num_epochs // 3` steps.	linear/step
momentum	float	0.9	The momentum for the AdamW optimizer.
weight_decay	float	0.1	The weight decay coefficient.
monitor_name	str	val_loss	The name of the monitor used for saving the top-k checkpoints.
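
The two `policy` values can be pictured with a small pure-Python sketch. The `step` rule follows the table above (multiply the rate by 0.1 every `num_epochs // 3` epochs). The exact multiplicative factor used by the `linear` policy is not stated in this section, so the linear decay below is an assumption, and both function names are hypothetical.

```python
def step_lr(base_lr: float, epoch: int, num_epochs: int, gamma: float = 0.1) -> float:
    """StepLR-style schedule: multiply the rate by `gamma` every `num_epochs // 3` epochs."""
    step_size = max(num_epochs // 3, 1)  # guard against a zero step size for very short runs
    return base_lr * gamma ** (epoch // step_size)


def linear_lr(base_lr: float, epoch: int, num_epochs: int) -> float:
    """LambdaLR-style schedule. ASSUMPTION: the factor decays linearly towards zero."""
    factor = 1.0 - epoch / float(num_epochs + 1)
    return base_lr * factor


# Print both schedules for a 10-epoch run starting at lr = 0.0001.
for epoch in range(10):
    print(epoch, step_lr(1e-4, epoch, 10), linear_lr(1e-4, epoch, 10))
```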

Model#

The following example model config provides options to change the VisualChangeNet-Classification architecture for training. VisualChangeNet-Classification supports two model architectures. Architecture 1 (difference_module = euclidean) leverages only the last feature maps from the FAN backbone using Euclidean difference to perform contrastive learning. Architecture 2 (difference_module = learnable) leverages the VisualChangeNet-Classification learnable difference modules for 4 different features at 3 feature resolutions to minimize Cross-Entropy loss.

model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corners: False
    use_summary_token: True
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4

Parameter	Datatype	Default	Description	Supported Values
backbone	Dict string bool bool	None None False False	A dictionary containing the following configurable parameters for VisualChangeNet-Classification backbone: * type: The name of the backbone to be used. * pretrained_backbone_path: The path to pre-trained backbone weights file. * freeze_backbone: If set to `True`, freezes the backbone weights during training. * feat_downsample: If set to `True`, downsamples the last feature map in FAN backbone configurations. This parameter is not propagated to other backbones.	fan_tiny_8_p4_hybrid fan_large_16_p4_hybrid fan_small_12_p4_hybrid fan_base_16_p4_hybrid vit_large_nvdinov2 c_radio_p1_vit_huge_patch16_224_mlpnorm c_radio_p2_vit_huge_patch16_224_mlpnorm c_radio_p3_vit_huge_patch16_224_mlpnorm c_radio_v2_vit_huge_patch16_224 c_radio_v2_vit_large_patch16_224 c_radio_v2_vit_base_patch16_224
decode_head	Dict bool bool list Dict int	None False True [4, 8, 16, 16] 256	A dictionary containing the following configurable parameters for the decoder: * align_corners: If set to `True`, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. * use_summary_token: If set to `True`, uses the summary token of the backbone. * feature_strides: The downsampling feature strides for different backbones. * decoder_params: Contains the following network parameters: – embed_dims: The embedding dimensions.	True, False True, False >0
classify	Dict string	None 2.0 5 30 learnable 4	A dictionary containing the following configurable parameters for VisualChangeNet-Classification model: * train_margin_euclid: The training margin threshold for contrastive learning (applicable for Architecture 1). * eval_margin: The evaluation margin threshold. * embedding_vectors: The output embedding dimension for each input image before computing Euclidean distance (applicable to Architecture 1). * embed_dec: The transformer decoder MLP embedding dimension (applicable to Architecture 2). * difference_module: The type of difference module used (applicable to both architectures). * learnable_difference_modules: The number of learnable difference modules (applicable to Architecture 2).	>0 >0 >0 >0 euclidean, learnable <4
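
For Architecture 1, the roles of `train_margin_euclid` and `eval_margin` can be sketched with the standard contrastive-loss formulation on a Euclidean distance between two embeddings. This is a minimal illustration under the assumption that TAO follows the textbook form; the function names are hypothetical, and the embeddings would in practice come from the backbone rather than being passed in by hand.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between the golden and the test embedding."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(distance: float, changed: bool, margin: float = 2.0) -> float:
    """Textbook contrastive loss for one pair.

    Unchanged pairs are pulled together (loss grows with distance);
    changed pairs are pushed apart until they are at least `margin` away.
    """
    if changed:
        return max(margin - distance, 0.0) ** 2
    return distance ** 2


def predict_changed(distance: float, eval_margin: float = 0.005) -> bool:
    """ASSUMPTION: a pair is reported as changed when its distance exceeds eval_margin."""
    return distance > eval_margin


d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(d, contrastive_loss(d, changed=True), predict_changed(d))
```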

Dataset#

The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset is provided below.

dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    image_width: 128
    image_height: 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2

* Refer to the Dataset Annotation Format definition for more information about specifying lighting conditions.

Parameter	Datatype	Default	Description	Supported Values
segment	Dict	–	The `segment` contains dataset config for the segmentation dataloader.
classify	Dict	–	The `classify` contains dataset config for the classification dataloader detailed in the classify section.

classify#

Parameter	Datatype	Default	Description	Supported Values
train_dataset	Dict	–	The paths to the image directory and CSV files for the training dataset.
validation_dataset	Dict	–	The paths to the image directory and CSV files for the validation dataset.
test_dataset	Dict	–	The paths to the image directory and CSV files for the test dataset.
infer_dataset	Dict	–	The paths to the image directory and CSV files for the inference dataset.
image_ext	str	.jpg	The file extension of the images in the dataset.	string
batch_size	int	32	The number of samples per batch.	>0
workers	int	8	The number of worker processes for data loading.
fpratio_sampling	float	0.1	The ratio of false-positive examples to sample.	>0
num_input	int	4	The number of lighting conditions for each input image*.	>0
input_map	Dict	–	The mapping of lighting conditions to indices specifying concatenation ordering*.
concat_type	string	linear	Type of concatenation to use for different image lighting conditions.	linear, grid
grid_map	Dict Dict Dict	None None None	The parameters to define the grid dimensions to concatenate images as a grid: * x: The number of images along the x-axis. * y: The number of images along the y-axis.	Dict
input_width	int	100	The width of the input image.	>0
input_height	int	100	The height of the input image.	>0
num_classes	int	2	The number of classes in the dataset.	>1
augmentation_config	Dict	None	Dictionary containing various data augmentation settings, which is detailed in the augmentation section.
num_golden	int	1	Number of golden images to use per input image. Setting this value greater than 1 enables Multiple Golden mode. Multiple Golden mode is only supported with ViT backbones, using `input_width = input_height = 224` and `input_map = None`. In Multiple Golden mode, the dataset must follow the multiple golden data format.	>0
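
The example specs on this page suggest how the per-image size, `num_input`, `concat_type`, and `grid_map` relate to the export dimensions: four 128 x 128 images with `concat_type: linear` are paired with an export size of 128 wide by 512 high, and a 2 x 2 grid of 112 x 112 images is paired with 224 x 224. The helper below reproduces that arithmetic. It is inferred from those two examples rather than taken from the TAO source, so the function name, the stacking direction for linear mode, and the grid-capacity check are all assumptions.

```python
from typing import Dict, Optional, Tuple


def concatenated_size(image_width: int, image_height: int, num_input: int,
                      concat_type: str = "linear",
                      grid_map: Optional[Dict[str, int]] = None) -> Tuple[int, int]:
    """Return (width, height) of the image formed by concatenating the lighting conditions."""
    if concat_type == "linear":
        # ASSUMPTION: linear mode stacks the inputs along the height axis.
        return image_width, image_height * num_input
    if concat_type == "grid":
        if grid_map is None:
            raise ValueError("grid_map is required when concat_type is 'grid'")
        x, y = grid_map["x"], grid_map["y"]
        # ASSUMPTION: the grid needs at least one cell per lighting condition.
        if x * y < num_input:
            raise ValueError("grid_map is too small for num_input images")
        return image_width * x, image_height * y
    raise ValueError(f"unsupported concat_type: {concat_type!r}")


print(concatenated_size(128, 128, 4, "linear"))
print(concatenated_size(112, 112, 4, "grid", {"x": 2, "y": 2}))
```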

augmentation_config#

Parameter	Datatype	Default	Description	Supported Values
random_flip	Dict float float bool	None 0.5 0.5 True	Random vertical and horizontal flipping augmentation settings. * vflip_probability: Probability of vertical flipping. * hflip_probability: Probability of horizontal flipping. * enable: If set to `True`, enables random flipping augmentation.	>=0.0 >=0.0
random_rotate	Dict float list bool	None 0.5 [90, 180, 270] True	Random rotation augmentation settings. * rotate_probability: Probability of applying random rotation. * angle_list: List of rotation angles to choose from. * enable: If set to `True`, enables random rotation augmentation.	>=0.0 >=0.0
random_color	Dict float float float float bool float	None 0.3 0.3 0.3 0.3 True 0.5	Random color augmentation settings. * brightness: Maximum brightness change factor. * contrast: Maximum contrast change factor. * saturation: Maximum saturation change factor. * hue: Maximum hue change factor. * enabled: If set to `True`, enables random color augmentation. * color_probability: Probability of applying color augmentation.	>=0.0 >=0.0 >=0.0 >=0.0 >=0.0
with_random_crop	bool	True	If set to `True`, applies random crop augmentation.	True, False
with_random_blur	bool	True	If set to `True`, applies random blurring augmentation.	True, False
rgb_input_mean	List[float]	[0.485, 0.456, 0.406]	The mean to be subtracted for pre-processing.
rgb_input_std	List[float]	[0.229, 0.224, 0.225]	The standard deviation to divide the image by.
augment	bool	False	If set to `True`, applies data augmentations.	True, False
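
As a rough illustration of what `rgb_input_mean` and `rgb_input_std` do, the sketch below normalizes a single RGB pixel. It assumes that 8-bit pixel values are first scaled to the range [0, 1], which this section does not state explicitly; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def normalize_pixel(rgb: Sequence[int],
                    mean: Sequence[float] = (0.485, 0.456, 0.406),
                    std: Sequence[float] = (0.229, 0.224, 0.225)) -> Tuple[float, ...]:
    """Subtract the per-channel mean and divide by the per-channel standard deviation."""
    # ASSUMPTION: input values are 8-bit (0-255) and are scaled to [0, 1] first.
    return tuple((channel / 255.0 - m) / s for channel, m, s in zip(rgb, mean, std))


print(normalize_pixel((255, 255, 255)))
print(normalize_pixel((0, 0, 0)))
```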

Example Spec File for ViT Backbones#

Note

The following spec file is only relevant for TAO versions 5.3 and later.

TAO Client (v2 API)

BASE_EXPERIMENT_ID=$(tao visual_changenet list-base-experiments | jq -r '.[0].id')
SPECS=$(tao visual_changenet get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

TAO Launcher

encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "contrastive"
    cls_weight: [1.0, 10.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
    use_summary_token: True
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'euclidean'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: grid
    grid_map:
      x: 2
      y: 2
    image_width: 112
    image_height: 112
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
    num_golden: 1
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 224
  input_height: 224

Training the Model#

Use the following command to run VisualChangeNet-Classification training:

TAO Client (v2 API)

TRAIN_JOB_ID=$(tao visual_changenet create-job \
  --kind experiment \
  --name "visual_changenet_train" \
  --action train \
  --workspace-id $WORKSPACE_ID \
  --specs "$TRAIN_SPECS" \
  --train-datasets '["'$DATASET_ID'"]' \
  --eval-dataset "$DATASET_ID" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

tao model visual_changenet train [-h] -e <experiment_spec>
                         task=classify
                         [results_dir=<global_results_dir>]
                         [model.<model_option>=<model_option_value>]
                         [dataset.<dataset_option>=<dataset_option_value>]
                         [train.<train_option>=<train_option_value>]
                         [train.gpu_ids=<gpu indices>]
                         [train.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec_file: The path to the experiment spec file.
task: The task (‘segment’ or ‘classify’) for the visual_changenet training. Default: segment.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options
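
To make the relationship between command-line overrides and the spec file concrete, the sketch below applies dotted `key=value` overrides to a nested dictionary that mirrors the YAML structure. It only illustrates the naming scheme and is not the parser that the launcher uses; the function name is hypothetical, and the value parsing (JSON literals, falling back to plain strings) is an assumption made for this example.

```python
import json
from typing import Any, Dict, List


def apply_overrides(spec: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply overrides such as 'train.optim.lr=0.001' to a nested spec dictionary in place."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        keys = dotted_key.split(".")
        node = spec
        # Walk (or create) the intermediate sections, e.g. train -> optim.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        try:
            value = json.loads(raw_value)  # numbers, booleans, lists
        except ValueError:
            value = raw_value              # anything else stays a string
        node[keys[-1]] = value
    return spec


spec = {"train": {"num_epochs": 10, "optim": {"lr": 0.0001}}}
apply_overrides(spec, ["task=classify", "train.num_epochs=20", "train.optim.lr=0.001"])
print(spec)
```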

Note

For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed, but are inconsistent, for example num_gpus = 1, gpu_ids = [0, 1], then they are modified to follow the setting that implies more GPUs; in the same example num_gpus is modified from 1 to 2.
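
The reconciliation rule described in this note can be sketched as follows. The case where `gpu_ids` lists more devices than `num_gpus` follows the example above; the opposite case (where `num_gpus` is larger) is not spelled out here, so filling `gpu_ids` with the first `num_gpus` indices is an assumption. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def reconcile_gpus(num_gpus: int = 1,
                   gpu_ids: Optional[List[int]] = None) -> Tuple[int, List[int]]:
    """Make num_gpus and gpu_ids agree by following the setting that implies more GPUs."""
    ids = [0] if gpu_ids is None else list(gpu_ids)
    if len(ids) > num_gpus:
        # Documented case: num_gpus = 1, gpu_ids = [0, 1] -> num_gpus becomes 2.
        num_gpus = len(ids)
    elif num_gpus > len(ids):
        # ASSUMPTION: use the first num_gpus device indices.
        ids = list(range(num_gpus))
    return num_gpus, ids


print(reconcile_gpus(1, [0, 1]))
print(reconcile_gpus(2, [0]))
```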

In some cases, multi-GPU training may result in a segmentation fault. You can work around this by setting the environment variable OMP_NUM_THREADS to 1. Depending on your mode of execution, you can use one of the following methods to set this variable:

CLI Launcher:

You can set the environment variable by adding the following entry to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 in the section Running the launcher.
```
{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADS",
            "value": "1"
        }
    ]
}
```

Docker:

You may set environment variables in Docker by setting the -e flag in the Docker command line.

docker run -it --rm --gpus all \
    -e OMP_NUM_THREADS=1 \
    -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e

Checkpointing and Resuming Training

At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. Checkpoints are saved in train.results_dir, like this:

$ ls /results/train

'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'

The latest checkpoint is also saved as changenet_model_classify_latest.pth. Training automatically resumes from changenet_model_classify_latest.pth, if it exists in train.results_dir. This is superseded by train.resume_training_checkpoint_path, if it is provided.

As a result, to start a fresh training run from scratch, do one of the following:

Specify a new, empty results directory (Recommended)
Remove the latest checkpoint from the results directory
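
The resume behavior described in this section can be summarized with a short sketch: an explicit `train.resume_training_checkpoint_path` wins, otherwise the latest checkpoint in `train.results_dir` is used if present, otherwise training starts fresh. The function name is hypothetical, and this is an illustration of the described precedence rather than TAO's implementation.

```python
from pathlib import Path
from typing import Optional

LATEST_NAME = "changenet_model_classify_latest.pth"


def select_resume_checkpoint(results_dir: str,
                             resume_training_checkpoint_path: Optional[str] = None
                             ) -> Optional[Path]:
    """Return the checkpoint to resume from, or None to train from scratch."""
    # An explicitly configured checkpoint supersedes the automatic lookup.
    if resume_training_checkpoint_path:
        return Path(resume_training_checkpoint_path)
    # Otherwise resume from the latest checkpoint if the results directory has one.
    latest = Path(results_dir) / LATEST_NAME
    return latest if latest.exists() else None


print(select_resume_checkpoint("/results/train"))
```

Pointing `results_dir` at a new, empty directory makes the function return `None`, which corresponds to the recommended way of starting from scratch.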

Creating a Testing Experiment Spec File#

Here is an example spec file for evaluating and running inference with a trained VisualChangeNet-Classification model.

TAO Client (v2 API)

BASE_EXPERIMENT_ID=$(tao visual_changenet list-base-experiments | jq -r '.[0].id')
SPECS=$(tao visual_changenet get-job-schema --action evaluate --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

TAO Launcher

results_dir: /path/to/experiment_results
task: classify
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
  classify:
    eval_margin: 0.005
dataset:
  classify:
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
    num_golden: 1
evaluate:
  checkpoint: /path/to/checkpoint
  results_dir: /results/evaluate
inference:
  checkpoint: /path/to/checkpoint
  results_dir: /results/inference

Inference/Evaluate#

Parameter	Datatype	Default	Description	Supported Values
checkpoint	string		Path to PyTorch model to evaluate/inference.
trt_engine	string		Path to TensorRT model to inference/evaluate. Should be only used with TAO Deploy.
num_gpus	unsigned int	1	The number of GPUs to use.	>0
gpu_ids	List[int]	[0]	The GPU IDs to use.
results_dir	string		The path to a folder where the experiment outputs should be written.
vis_after_n_batches	unsigned int	1	Number of batches after which to save inference/evaluate visualization results.	>0
batch_size	unsigned int		The batch size of inference/evaluate.

Evaluating the Model#

Use the following command to run VisualChangeNet-Classification evaluation:

TAO Client (v2 API)

EVALUATE_JOB_ID=$(tao visual_changenet create-job \
  --kind experiment \
  --name "visual_changenet_evaluate" \
  --action evaluate \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $TRAIN_JOB_ID \
  --eval-dataset "$DATASET_ID" \
  --specs "$EVALUATE_SPECS" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

tao model visual_changenet evaluate [-h] -e <experiment_spec_file>
                      task=classify
                      evaluate.checkpoint=<model to be evaluated>
                      [evaluate.<evaluate_option>=<evaluate_option_value>]
                      [evaluate.gpu_ids=<gpu indices>]
                      [evaluate.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.

Optional Arguments

The following arguments are optional to run the command.

evaluate.<evaluate_option>: The evaluate options.

Multi-GPU evaluation is currently not supported for Visual ChangeNet Classify.

Running Inference on the Model#

Use the following command to run inference on VisualChangeNet-Classification with the .pth model:

TAO Client (v2 API)

INFERENCE_JOB_ID=$(tao visual_changenet create-job \
  --kind experiment \
  --name "visual_changenet_inference" \
  --action inference \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $TRAIN_JOB_ID \
  --inference-dataset "$DATASET_ID" \
  --specs "$INFERENCE_SPECS" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

tao model visual_changenet inference [-h] -e <experiment_spec_file>
                       task=classify
                       inference.checkpoint=<inference model>
                       [inference.<inference_option>=<inference_option_value>]
                       [inference.gpu_ids=<gpu indices>]
                       [inference.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec_file: The experiment spec file to set up the inference experiment.
inference.checkpoint: The .pth model to run inference on.

Optional Arguments

The following arguments are optional to run the command.

inference.<inference_option>: The inference options.

Exporting the Model#

Here is an example spec file for exporting the trained VisualChangeNet model:

TAO Client (v2 API)

BASE_EXPERIMENT_ID=$(tao visual_changenet list-base-experiments | jq -r '.[0].id')
SPECS=$(tao visual_changenet get-job-schema --action export --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

TAO Launcher

export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 128
  input_height: 512
  batch_size: -1

Parameter	Datatype	Default	Description	Supported Values
checkpoint	string		The path to the PyTorch model to export.
onnx_file	string		The path to the `.onnx` file.
opset_version	unsigned int	12	The opset version of the exported ONNX.	>0
input_channel	unsigned int	3	The input channel size. Only the value 3 is supported.	3
input_width	unsigned int	128	The input width.	>0
input_height	unsigned int	512	The input height.	>0
batch_size	int	-1	The batch size of the ONNX model. If this value is set to -1, the export uses dynamic batch size.	>=-1
gpu_id	unsigned int	0	The GPU ID to use.
on_cpu	bool	False	If set to `True`, exports the model on CPU.
verbose	bool	False	If set to `True`, prints a human-readable representation of the network.

Use the following command to export the model:

TAO Client (v2 API)

EXPORT_JOB_ID=$(tao visual_changenet create-job \
  --kind experiment \
  --name "visual_changenet_export" \
  --action export \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $TRAIN_JOB_ID \
  --specs "$EXPORT_SPECS" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

tao model visual_changenet export [-h] -e <experiment spec file>
                          task=classify
                          export.checkpoint=<model to export>
                          export.onnx_file=<onnx path>
                          [export.<export_option>=<export_option_value>]

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The path to an experiment spec file
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.

Optional Arguments

The following arguments are optional to run the command.

export.<export_option>: The export options.

TensorRT Engine Generation, Validation, and int8 Calibration#

For deployment, refer to the TAO Deploy Documentation for VisualChangeNet-Classification.