TAO v5.5.0

Visual ChangeNet-Segmentation

Visual ChangeNet-Segmentation is an NVIDIA-developed semantic change segmentation model included in TAO. Visual ChangeNet supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Launcher using the following convention on the command-line:


tao model visual_changenet <sub_task> <args_per_subtask>

Here, args_per_subtask are the command-line arguments required for a given sub_task. Each subtask is explained in the following sections.

VisualChangeNet-Segmentation requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Segmentation.
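Before training, it can help to confirm that every sample listed in a split file has a matching pre-change image, post-change image, and mask. The sketch below is a hypothetical pre-flight check, not part of TAO. It assumes the default folder names used by the dataset config on this page (A, B, label, list), a split file named `<split>.csv` whose first column holds an image filename, and masks that share the image's stem with `label_suffix` as extension. Adjust these assumptions to match the Data Annotation Format page.

```python
import csv
from pathlib import Path
from typing import List


def find_missing_files(
    root_dir: str,
    split: str = "train",
    image_folder: str = "A",
    change_image_folder: str = "B",
    annotation_folder: str = "label",
    list_folder: str = "list",
    label_suffix: str = ".png",
) -> List[str]:
    """Hypothetical helper: return files referenced by a split list that do not exist.

    Assumes <root_dir>/<list_folder>/<split>.csv holds one image filename per row
    (first column) and that each mask is the image filename with `label_suffix`.
    """
    root = Path(root_dir)
    missing = []
    with open(root / list_folder / f"{split}.csv", newline="") as f:
        for row in csv.reader(f):
            if not row or not row[0].strip():
                continue  # skip blank lines
            name = row[0].strip()
            expected = [
                root / image_folder / name,         # pre-change image
                root / change_image_folder / name,  # post-change image
                root / annotation_folder / Path(name).with_suffix(label_suffix),  # mask
            ]
            missing.extend(str(p) for p in expected if not p.exists())
    return missing
```

For example, `find_missing_files("/path/to/root/dataset/dir/", split="val")` returns an empty list when every listed sample is complete.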

Configuring a Custom Dataset

This section provides an example configuration and commands for training VisualChangeNet-Segmentation on a dataset that follows the LEVIR-CD format described on the Data Annotation Format page. LEVIR-CD is a large-scale remote sensing dataset for building change detection.

Here is an example spec file for training a VisualChangeNet-Segmentation model with NVIDIA’s FAN Hybrid backbone on the LEVIR-CD dataset using the Data Annotation Format.


encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256

Parameter Data Type Default Description Supported Values
model dict config – The configuration of the model architecture
dataset dict config – The configuration of the dataset
train dict config – The configuration of the training task
evaluate dict config – The configuration of the evaluation task
inference dict config – The configuration of the inference task
encryption_key string None The encryption key to encrypt and decrypt model files
results_dir string /results The directory where experiment results are saved
export dict config – The configuration of the ONNX export task
task str segment A flag indicating the change detection task: ‘segment’ for segmentation or ‘classify’ for classification. segment, classify

The train parameter configures the training task:

Parameter Datatype Default Description Supported Values
num_gpus unsigned int 1 The number of GPUs to use for distributed training >0
gpu_ids List[int] [0] The indices of the GPUs to use for distributed training
seed unsigned int 1234 The random seed for random, numpy, and torch >0
num_epochs unsigned int 10 The total number of epochs to run the experiment >0
checkpoint_interval unsigned int 1 The epoch interval at which the checkpoints are saved >0
validation_interval unsigned int 1 The epoch interval at which the validation is run >0
resume_training_checkpoint_path string The intermediate PyTorch Lightning checkpoint to resume training from
results_dir string /results/train The directory to save training results
segment dict config – The configurable parameters for the VisualChangeNet-Segmentation training pipeline:
* loss: The loss function used for segmentation training.
* weights: The weights for multi-scale training.
num_nodes unsigned int 1 The number of nodes. If the value is larger than 1, multi-node training is enabled.
pretrained_model_path string – The path to the pretrained model checkpoint used to initialize the end-to-end model weights.
optim dict config – The configurable parameters for the VisualChangeNet optimizer, detailed in the optim section below.



The following is an example optim config:

optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01

Parameter Datatype Default Description Supported Values
lr float 0.0005 The learning rate >=0.0
optim str adamw The optimizer to use
policy str linear The learning rate scheduler: linear, step
* linear: LambdaLR decreases the lr by a multiplicative factor.
* step: StepLR decreases the lr by a factor of 0.1 every num_epochs//3 epochs.
momentum float 0.9 The momentum for the AdamW optimizer
weight_decay float 0.1 The weight decay coefficient
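To make the two policies concrete, the sketch below computes the learning rate at a given epoch. The step rule follows the table (multiply by 0.1 every num_epochs//3 epochs). The exact linear multiplier is not stated here, so `1 - epoch / (num_epochs + 1)` is an assumption modelled on ChangeFormer-style schedulers; `lr_at_epoch` is a hypothetical helper, not TAO's implementation.

```python
def lr_at_epoch(base_lr: float, epoch: int, num_epochs: int, policy: str = "linear") -> float:
    """Sketch of the per-epoch learning rate under the two documented policies."""
    if policy == "linear":
        # Assumed LambdaLR multiplier: shrinks linearly from 1.0 as epochs advance.
        factor = 1.0 - epoch / float(num_epochs + 1)
    elif policy == "step":
        # StepLR-style decay by 0.1 every num_epochs // 3 epochs
        # (clamped to at least 1 here so short runs do not divide by zero).
        step_size = max(num_epochs // 3, 1)
        factor = 0.1 ** (epoch // step_size)
    else:
        raise ValueError(f"unsupported policy: {policy!r}")
    return base_lr * factor
```

With `lr: 0.0001` and `num_epochs: 10`, the step policy keeps 0.0001 for epochs 0 to 2 and drops to roughly 0.00001 at epoch 3, whereas the assumed linear rule decays a little every epoch.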

The following example model config provides options to change the VisualChangeNet-Segmentation architecture for training.


model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corners: False

Parameter Datatype Default Description Supported Values
backbone dict config – A dictionary containing the following configurable parameters:
* type: The name of the backbone to use
* pretrained_backbone_path: The path to the pretrained backbone weights file
* freeze_backbone: Whether to freeze the backbone weights during training (True, False)
decode_head dict config – A dictionary containing the following configurable parameters:
* align_corners: Whether to align corners when resizing feature maps in the decode head (True, False)
* feature_strides: The strides of the backbone feature maps consumed by the decode head (default: [4, 8, 16, 16])
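As a worked example of feature_strides, each stride is the downsampling factor of one backbone feature level, so, assuming each stride divides the input exactly, a level's side length is img_size // stride. The sketch compares the FAN example ([4, 8, 16, 16]) with the ViT example later on this page ([4, 8, 16, 32]) for img_size 256. `feature_map_sizes` is a hypothetical helper, and its divisibility check is an assumption of the sketch.

```python
from typing import List


def feature_map_sizes(img_size: int, feature_strides: List[int]) -> List[int]:
    """Side length of each feature level for a square input (illustrative only)."""
    sizes = []
    for stride in feature_strides:
        if img_size % stride:
            raise ValueError(f"img_size {img_size} is not divisible by stride {stride}")
        sizes.append(img_size // stride)
    return sizes


print(feature_map_sizes(256, [4, 8, 16, 16]))  # FAN hybrid example: [64, 32, 16, 16]
print(feature_map_sizes(256, [4, 8, 16, 32]))  # ViT example: [64, 32, 16, 8]
```

The difference in the last entry reflects that the two backbones expose their deepest features at different resolutions, which is why the two example specs use different feature_strides.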

The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset is provided below.


dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]

Parameter Datatype Default Description Supported Values
segment Dict – The dataset config for the segmentation dataloader, detailed in the segment section.
classify Dict – The dataset config for the classification dataloader


Parameter Datatype Default Description Supported Values
dataset str CNDataset The dataloader used for segmentation CNDataset
root_dir str – The root directory path where the dataset is located.
data_name str LEVIR-CD The dataset identifier LEVIR-CD, LandSCD, custom
batch_size int 32 The number of samples per batch >0
workers int 2 The number of worker processes for data loading >=0
multi_scale_train bool True Whether multi-scale training is enabled True, False
multi_scale_infer bool False Whether multi-scale inference is enabled True, False
num_classes int 2 Number of classes in the dataset. >=2
img_size int 256 Size of the input images after resizing.
image_folder_name str A Name of the folder containing input images.
change_image_folder_name str B Name of the folder containing the changed images
list_folder_name str list Name of the folder containing the dataset split list CSV files.
annotation_folder_name str label Name of the folder containing annotation masks.
train_split str train Dataset split used for training. This must be the name of a CSV file in list_folder_name.
validation_split str val Dataset split used for validation. This must be the name of a CSV file in list_folder_name.
test_split str test Dataset split used for evaluation. This must be the name of a CSV file in list_folder_name.
predict_split str predict Dataset split used for inference. This must be the name of a CSV file in list_folder_name.
label_suffix str .png Suffix of the label image files.
augmentation Dict None Dictionary containing various data augmentation settings, which is detailed in the augmentation section.
color_map Optional[Dict[str, List[int]]] None Mapping of string class labels (‘0’ to ‘n’) to RGB color codes.
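The color_map entry pairs each class label with an RGB color, presumably for rendering masks. A minimal NumPy sketch of that lookup follows, assuming masks are integer arrays of class indices; `colorize_mask` is a hypothetical helper, not the TAO implementation.

```python
from typing import Dict, List

import numpy as np


def colorize_mask(mask: np.ndarray, color_map: Dict[str, List[int]]) -> np.ndarray:
    """Turn an (H, W) array of class indices into an (H, W, 3) uint8 RGB image."""
    rgb = np.zeros((*mask.shape, 3), dtype=np.uint8)
    for label, color in color_map.items():
        rgb[mask == int(label)] = color  # keys are string class labels '0'..'n'
    return rgb


color_map = {"0": [255, 255, 255], "1": [0, 0, 0]}  # same mapping as the example spec
mask = np.array([[0, 1], [1, 0]])
print(colorize_mask(mask, color_map)[0, 1])  # [0 0 0]: class 1 is drawn black
```

Pixels whose class has no entry in color_map stay black in this sketch.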


The augmentation dict supports the following parameters:

Parameter Datatype Default Description Supported Values
random_flip Dict – Random vertical and horizontal flipping augmentation settings:
* vflip_probability: Probability of vertical flipping.
* hflip_probability: Probability of horizontal flipping.
* enable: Enable or disable random flipping augmentation.
random_rotate Dict – Randomly rotate images with the specified probability and angles:
* rotate_probability: Probability of applying random rotation.
* angle_list: List of rotation angles to choose from (default: [90, 180, 270]).
* enable: Enable or disable random rotation augmentation.
random_color Dict – Apply random color augmentation to images:
* brightness: Maximum brightness change factor.
* contrast: Maximum contrast change factor.
* saturation: Maximum saturation change factor.
* hue: Maximum hue change factor.
* enable: Enable or disable random color augmentation.
* color_probability: Probability of applying color augmentation.
with_scale_random_crop Dict – Apply random scaling and cropping augmentation:
* enable: Enable or disable random scaling and cropping augmentation (True, False).
with_random_crop bool True Apply random crop augmentation. True, False
with_random_blur bool True Apply random blurring augmentation. True, False
mean List[float] [0.5, 0.5, 0.5] The mean to be subtracted for pre-processing.
std List[float] [0.5, 0.5, 0.5] The standard deviation to divide the image by.
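For change detection, geometric augmentations only make sense if the pre-change image, the post-change image, and the mask receive the same transform; otherwise their pixels no longer line up. The NumPy sketch below applies random_flip and random_rotate that way and then normalizes with mean and std. It is not TAO's implementation: the independent draw per probability, the assumption that angles are multiples of 90 degrees, and the scaling of pixels to [0, 1] before normalization are all assumptions, and the function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

import numpy as np


def paired_flip_rotate(
    img_a: np.ndarray,
    img_b: np.ndarray,
    mask: np.ndarray,
    vflip_probability: float = 0.5,
    hflip_probability: float = 0.5,
    rotate_probability: float = 0.5,
    angle_list: Sequence[int] = (90, 180, 270),
    rng: Optional[random.Random] = None,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Apply one random flip/rotation draw identically to all three (H, W[, C]) arrays."""
    rng = rng or random.Random()
    arrays = [img_a, img_b, mask]
    if rng.random() < vflip_probability:
        arrays = [np.flip(x, axis=0) for x in arrays]  # vertical flip: reverse rows
    if rng.random() < hflip_probability:
        arrays = [np.flip(x, axis=1) for x in arrays]  # horizontal flip: reverse columns
    if rng.random() < rotate_probability:
        k = rng.choice(list(angle_list)) // 90  # assumes multiples of 90 degrees
        arrays = [np.rot90(x, k=k, axes=(0, 1)) for x in arrays]
    return tuple(arrays)


def normalize(img: np.ndarray, mean: List[float], std: List[float]) -> np.ndarray:
    """Scale uint8 pixels to [0, 1], subtract the per-channel mean, divide by std."""
    mean_arr = np.asarray(mean, dtype=np.float32)
    std_arr = np.asarray(std, dtype=np.float32)
    return (img.astype(np.float32) / 255.0 - mean_arr) / std_arr
```

With the default mean and std of [0.5, 0.5, 0.5], normalization maps pixel values from [0, 255] to [-1, 1] under this sketch's assumptions.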

Example spec file for ViT backbones


The following spec file is only relevant for TAO versions 5.3 and later.


encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 350
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.00002
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256

Use the following command to run VisualChangeNet-Segmentation training:


tao model visual_changenet train -e <experiment_spec_file> task=segment [results_dir=<global_results_dir>] [model.<model_option>=<model_option_value>] [dataset.<dataset_option>=<dataset_option_value>] [train.<train_option>=<train_option_value>] [train.gpu_ids=<gpu indices>] [train.num_gpus=<number of gpus>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • task: The task (‘segment’ or ‘classify’) for the visual_changenet training. Default: segment.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.
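The dotted overrides in the command above mirror the nesting of the YAML spec: each key path is joined with dots, so dataset.segment.batch_size addresses batch_size under segment under dataset. The hypothetical helper below flattens a nested dict of overrides into that key=value form. It is illustrative only; it writes lists without spaces so that each override stays a single shell argument.

```python
from typing import Any, Dict, List


def to_overrides(spec: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten nested spec values into dotted key=value override arguments."""
    args = []
    for key, value in spec.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            args.extend(to_overrides(value, path))  # descend into nested sections
        elif isinstance(value, (list, tuple)):
            args.append(f"{path}=[{','.join(str(v) for v in value)}]")
        else:
            args.append(f"{path}={value}")
    return args


overrides = to_overrides({
    "results_dir": "/path/to/experiment_results",
    "train": {"num_epochs": 20, "gpu_ids": [0, 1], "num_gpus": 2},
    "dataset": {"segment": {"batch_size": 8}},
})
print(" ".join(overrides))
# results_dir=/path/to/experiment_results train.num_epochs=20 train.gpu_ids=[0,1] train.num_gpus=2 dataset.segment.batch_size=8
```

These strings would follow `task=segment` on the train command line. Quote any argument containing brackets if your shell expands them.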


For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 and gpu_ids = [0, 1]), they are adjusted to follow the setting with more GPUs (in this example, num_gpus = 1 becomes num_gpus = 2).
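The reconciliation rule above can be sketched as follows. When gpu_ids lists more devices, the sketch simply adopts its length as num_gpus, matching the documented example. When num_gpus is larger, the document does not say which indices are used, so this hypothetical helper assumes the first num_gpus device indices.

```python
from typing import List, Tuple


def reconcile_gpus(num_gpus: int, gpu_ids: List[int]) -> Tuple[int, List[int]]:
    """Make num_gpus and gpu_ids agree by following whichever requests more GPUs."""
    if num_gpus > len(gpu_ids):
        # Assumption: when num_gpus wins, use device indices 0 .. num_gpus - 1.
        return num_gpus, list(range(num_gpus))
    # gpu_ids requests at least as many GPUs: num_gpus follows its length.
    return len(gpu_ids), list(gpu_ids)


print(reconcile_gpus(1, [0, 1]))  # (2, [0, 1]), as in the example above
```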

Checkpointing and Resuming Training

Every train.checkpoint_interval epochs, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved in train.results_dir, like so:


$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
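Which epochs produce a checkpoint depends on train.checkpoint_interval. The sketch below lists the filenames you might expect after a full run. It assumes the PyTorch Lightning convention of saving after zero-indexed epoch e when (e + 1) is a multiple of the interval, with the epoch index zero-padded to three digits as in the listing; treat it as an approximation rather than a specification of TAO's behavior. `expected_checkpoints` is a hypothetical helper.

```python
from typing import List


def expected_checkpoints(num_epochs: int, checkpoint_interval: int) -> List[str]:
    """Filenames assumed to appear in train.results_dir after a full run."""
    return [
        f"model_epoch_{epoch:03d}.pth"
        for epoch in range(num_epochs)  # epochs are zero-indexed
        if (epoch + 1) % checkpoint_interval == 0
    ]


print(expected_checkpoints(5, 1))   # model_epoch_000.pth through model_epoch_004.pth
print(expected_checkpoints(10, 5))  # ['model_epoch_004.pth', 'model_epoch_009.pth']
```

Under this assumption, the first example spec (num_epochs: 10, checkpoint_interval: 5) would leave two epoch checkpoints, while the ViT spec (checkpoint_interval: 1) saves one per epoch.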

The latest checkpoint is also saved as changenet_model_segment_latest.pth. Training automatically resumes from changenet_model_segment_latest.pth if it exists in train.results_dir, unless train.resume_training_checkpoint_path is provided, in which case that checkpoint takes precedence.

The major implication of this logic is that, to start fresh training from scratch, you must either:

  • Specify a new, empty results directory (Recommended), or

  • Remove the latest checkpoint from the results directory
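The resume precedence described above can be summarized in a few lines. This is a hypothetical helper written for illustration, not TAO's code: an explicit resume path wins, otherwise the latest checkpoint in the results directory is used, and otherwise training starts from scratch.

```python
import os
from typing import Optional


def resolve_resume_checkpoint(
    results_dir: str, resume_training_checkpoint_path: Optional[str] = None
) -> Optional[str]:
    """Return the checkpoint training would resume from, or None for a fresh start."""
    if resume_training_checkpoint_path:
        return resume_training_checkpoint_path  # an explicit path always wins
    latest = os.path.join(results_dir, "changenet_model_segment_latest.pth")
    if os.path.exists(latest):
        return latest  # auto-resume from the latest checkpoint
    return None  # empty or new results directory: train from scratch
```

This is also why pointing train.results_dir at a new, empty directory is the simplest way to guarantee a fresh run.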

Here is an example spec file for evaluating and running inference with a trained VisualChangeNet-Segmentation model:


results_dir: /path/to/experiment_results
task: segment
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
evaluate:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
  results_dir: /results/evaluate
inference:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
  results_dir: /results/inference

Parameter Datatype Default Description Supported Values
checkpoint string Path to PyTorch model to evaluate/infer
vis_after_n_batches int The interval, in batches, between saved visualization outputs
trt_engine string The path to the TensorRT engine to run inference with. Use only with TAO Deploy.
num_gpus unsigned int 1 The number of GPUs to use >0
gpu_ids List[int] [0] The GPU indices to use
results_dir string The path to a folder where the experiment outputs should be written

Use the following command to run a VisualChangeNet-Segmentation evaluation:


tao model visual_changenet evaluate -e <experiment_spec> task=segment evaluate.checkpoint=<model to be evaluated> [evaluate.<evaluate_option>=<evaluate_option_value>] [evaluate.gpu_ids=<gpu indices>] [evaluate.num_gpus=<number of gpus>]

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.

  • evaluate.checkpoint: The .pth model to be evaluated.

Optional Arguments

Use the following command to run inference on VisualChangeNet-Segmentation with a .pth model:


tao model visual_changenet inference -e <experiment_spec> task=segment inference.checkpoint=<inference model> [inference.<inference_option>=<inference_option_value>] [inference.gpu_ids=<gpu indices>] [inference.num_gpus=<number of gpus>]

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the inference experiment.

  • inference.checkpoint: The .pth model to run inference on.

Optional Arguments

Here is an example spec file for exporting the trained VisualChangeNet model:


export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 256
  input_height: 256
  batch_size: -1

Parameter Datatype Default Description Supported Values
checkpoint string The path to the PyTorch model to export
onnx_file string The path to the .onnx file
opset_version unsigned int 12 The opset version of the exported ONNX >0
input_channel unsigned int 3 The input channel size. Only the value 3 is supported. 3
input_width unsigned int 256 The input width >0
input_height unsigned int 256 The input height >0
batch_size int -1 The batch size of the ONNX model. If this value is set to -1, the export uses a dynamic batch size. >=-1
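The constraints in the export table can be checked before launching an export. The hypothetical helper below validates an export config against those constraints and reports the resulting input shape. The NCHW layout and the symbolic name "batch" for a dynamic batch axis are assumptions of this sketch, not a description of the exported model's actual input names.

```python
from typing import Dict, Tuple, Union


def onnx_input_shape(export_cfg: Dict[str, int]) -> Tuple[Union[int, str], int, int, int]:
    """Check export settings against the table's constraints; return an assumed NCHW shape."""
    channels = export_cfg.get("input_channel", 3)
    width = export_cfg.get("input_width", 256)
    height = export_cfg.get("input_height", 256)
    batch = export_cfg.get("batch_size", -1)
    opset = export_cfg.get("opset_version", 12)
    if channels != 3:
        raise ValueError("only input_channel=3 is supported")
    if width <= 0 or height <= 0:
        raise ValueError("input_width and input_height must be > 0")
    if opset <= 0:
        raise ValueError("opset_version must be > 0")
    if batch < -1:
        raise ValueError("batch_size must be >= -1")
    # -1 means a dynamic batch dimension, reported here by a symbolic name.
    return ("batch" if batch == -1 else batch, channels, height, width)


print(onnx_input_shape({}))  # ('batch', 3, 256, 256)
```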

Use the following command to export the model:


tao model visual_changenet export [-h] -e <experiment spec file> task=segment export.checkpoint=<model to export> export.onnx_file=<onnx path> [export.<export_option>=<export_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to an experiment spec file.

  • export.checkpoint: The .pth model to export.

  • export.onnx_file: The path where the .onnx model is saved.

Optional Arguments

© Copyright 2024, NVIDIA. Last updated on Oct 15, 2024.