Visual ChangeNet-Segmentation#

Visual ChangeNet-Segmentation is an NVIDIA-developed semantic change segmentation model and is included in the TAO. Visual ChangeNet supports the following tasks:

train
evaluate
inference
export

Each task is explained in detail in the following sections.

Note

Throughout this documentation, you will see references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.
- For instructions on creating a dataset using the remote client, see the Creating a dataset section in the Remote Client documentation.
- For instructions on creating an experiment using the remote client, see the Creating an experiment section in the Remote Client documentation.
The spec format is YAML for TAO Launcher and JSON for FTMS Client.
File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher and not for FTMS Client.

Data Input for VisualChangeNet#

VisualChangeNet-Segmentation requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Segmentation.

Creating a Training Experiment Spec File#

Configuring a Custom Dataset#

This section provides an example configuration and commands to retrieve configuration for training VisualChangeNet-Segmentation using the dataset format described for the LEVIR-CD dataset, above. LEVIR-CD dataset is a large-scale remote sensing building Change Detection dataset.

Note

Make sure to set task=segment in SPECS for all task specs.

SPECS=$(tao-client visual_changenet get-spec --action train --job_type experiment --id $EXPERIMENT_ID)

Here is an example spec file for training a VisualChangeNet-Segmentation model with NVIDIA’s FAN Hybrid backbone on the LEVIR-CD dataset using the Data Annotation Format.

encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    use_summary_token: True
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256

Parameter	Data Type	Default	Description	Supported Values
model	dict config	–	The configuration of the model architecture.
dataset	dict config	–	The configuration of the dataset.
train	dict config	–	The configuration of the training task.
evaluate	dict config	–	The configuration of the evaluation task.
inference	dict config	–	The configuration of the inference task.
encryption_key	string	None	The encryption key to encrypt and decrypt model files.
results_dir	string	/results	The directory where experiment results are saved.
export	dict config	–	The configuration of the ONNX export task.
task	str	segment	A flag to indicate the change detection task. Supports two tasks: ‘segment’ and ‘classify’ for segmentation and classification.	classify, segmen

train#

Parameter	Datatype	Default	Description	Supported Values
num_gpus	unsigned int	1	The number of GPUs to use for distributed training.	>0
gpu_ids	List[int]	[0]	The indices of the GPU’s to use for distributed training.
seed	unsigned int	1234	The random seed for random, numpy, and torch.	>0
num_epochs	unsigned int	10	The total number of epochs to run the experiment.	>0
checkpoint_interval	unsigned int	1	The epoch interval at which the checkpoints are saved.	>0
validation_interval	unsigned int	1	The epoch interval at which the validation is run.	>0
resume_training_checkpoint_path	string		The intermediate PyTorch Lightning checkpoint to resume training from.
results_dir	string	/results/train	The directory to save training results.
segment	Dict str list	None ce	The segment dict contains configurable parameters for the VisualChangeNet Segmentation pipeline with the following parameters: * loss: The loss function used for segmentation training. * weights: Weights for multi-scale training.
num_nodes	unsigned int	1	The number of nodes. If the value is larger than 1, multi-node is enabled.
pretrained_model_path	string	–	The path to the pretrained model checkpoint to initialize the end-end model weights.
optim	dict config	None	Contains the configurable parameters for the VisualChangeNet optimizer detailed in the optim section.

optim#

optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01

Parameter	Datatype	Default	Description	Supported Values
lr	float	0.0005	The learning rate.	>=0.0
optim	str	adamw	The optimizer.
policy	str	linear	The learning scheduler: * linear : LambdaLR decreases the lr by a multiplicative factor. * step : StepLR decrease the lr by 0.1 at every `num_epochs // 3` steps.	linear/step
momentum	float	0.9	The momentum for the AdamW optimizer.
weight_decay	float	0.1	The weight decay coefficient.

Model#

The following example model config provides options to change the VisualChangeNet-Segmentation architecture for training.

model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
    use_summary_token: True

Parameter	Datatype	Default	Description	Supported Values
backbone	Dict string bool bool	None None False False	A dictionary containing the following configurable parameters: * type: The name of the backbone to be used * pretrained_backbone_path: The path to pre-trained backbone weights file. * freeze_backbone: If set to `True`, freezes the backbone weights during training. * feat_downsample: If set to `True`, downsamples the last feature map in FAN backbone configurations. This parameter is not propagated to other backbones.	fan_tiny_8_p4_hybrid fan_large_16_p4_hybrid fan_small_12_p4_hybrid fan_base_16_p4_hybrid vit_large_nvdinov2 c_radio_p1_vit_huge_patch16_224_mlpnorm c_radio_p2_vit_huge_patch16_224_mlpnorm c_radio_p3_vit_huge_patch16_224_mlpnorm c_radio_v2_vit_huge_patch16_224 c_radio_v2_vit_large_patch16_224 c_radio_v2_vit_base_patch16_224
decode_head	Dict bool bool list	None False True [4, 8, 16, 16]	A dictionary containing the following configurable parameters for the decoder: * align_corners: If set to `True`, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. * use_summary_token: If set to `True`, uses the summary token of the backbone. * feature_strides: The downsampling feature strides for different backbones.	True, False True, False

Dataset#

The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset is provided below.

dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]

Parameter	Datatype	Default	Description	Supported Values
segment	Dict	–	The segment contains dataset config for the segmentation dataloader detailed in the segment section.
classify	Dict	–	The classify contains dataset config for the classification dataloader.

segment#

Parameter	Datatype	Default	Description	Supported Values
dataset	Dict	CNDataset	The dataloader supported for segmentation.	CNDataset
root_dir	str	–	The root directory path where the dataset is located.
data_name	str	LEVIR-CD	The dataset identifier.	LEVIR-CD, LandSCD, custom
batch_size	int	32	The number of samples per batch.	>0
workers	int	2	The number of worker processes for data loading.	>=0
multi_scale_train	bool	True	If set to `True`, enables multi-scale training.	True, False
multi_scale_infer	bool	False	If set to `True`, enables multi-scale inference.	True, False
num_classes	int	2	Number of classes in the dataset.	>=2
img_size	int	256	Size of the input images after resizing.
image_folder_name	str	A	Name of the folder containing input images.
change_image_folder_name	str	B	Name of the folder containing the changed images.
list_folder_name	str	list	Name of the folder containing dataset split lists’ csv files.
annotation_folder_name	str	label	Name of the folder containing annotation masks.
train_split	str	train	Dataset split used for training, should indicate the name of csv file in list_folder_name.
validation_split	str	val	Dataset split used for validation, should indicate the name of csv file in list_folder_name.
test_split	str	test	Dataset split used for evaluation, should indicate the name of csv file in list_folder_name.
predict_split	str	predict	Dataset split used for inference, should indicate the name of csv file in list_folder_name.
label_suffix	str	.png	Suffix of the label image files.
augmentation	Dict	None	Dictionary containing various data augmentation settings, which is detailed in the augmentation section.
color_map	Optional[Dict[str, List[int]]]	None	Mapping of string class labels (‘0’ to ‘n’) to rgb color codes.

augmentation#

Parameter	Datatype	Default	Description	Supported Values
random_flip	Dict float float bool	None 0.5 0.5 True	Random vertical and horizontal flipping augmentation settings. * vflip_probability: Probability of vertical flipping. * hflip_probability: Probability of horizontal flipping. * enable: If set to `True`, enables random flipping augmentation.	>=0.0 >=0.0
random_rotate	Dict float list bool	None 0.5 [90, 180, 270] True	Random rotation augmentation settings. * rotate_probability: Probability of applying random rotation. * angle_list: List of rotation angles to choose from. * enable: If set to `True`, enables random rotation augmentation.	>=0.0 >=0.0
random_color	Dict float float float float bool float	None 0.3 0.3 0.3 0.3 True 0.5	Random color augmentation settings. * brightness: Maximum brightness change factor. * contrast: Maximum contrast change factor. * saturation: Maximum saturation change factor. * hue: Maximum hue change factor. * enabled: If set to `True`, enables random color augmentation. * color_probability: Probability of applying color augmentation.	>=0.0 >=0.0 >=0.0 >=0.0 >=0.0
with_scale_random_crop	Dict bool	None True	Random scaling and cropping augmentation settings. * enabled If set to `True`, enables random color augmentation.	True, False
with_random_crop	bool	True	If set to `True`, applies random crop augmentation.	True, False
with_random_blur	bool	True	If set to `True`, applies random blurring augmentation.	True, False
mean	List[float]	[0.5, 0.5, 0.5]	The mean to be subtracted for pre-processing.
std	List[float]	[0.5, 0.5, 0.5]	The standard deviation to divide the image by.

Example spec file for ViT backbones#

Note

The following spec file is only relevant for TAO versions 5.3 and later.

SPECS=$(tao-client visual_changenet get-spec --action train --job_type experiment --id $EXPERIMENT_ID)

encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 350
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.00002
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
    use_summary_token: True
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256

Training the Model#

Use the following command to run VisualChangeNet-Segmentation training:

TRAIN_JOB_ID=$(tao-client visual_changenet experiment-run-action --action train --id $EXPERIMENT_ID --specs "$SPECS")

tao model visual_changenet train -e <experiment_spec_file>
                      task=segment
                      [results_dir=<global_results_dir>]
                      [model.<model_option>=<model_option_value>]
                      [dataset.<dataset_option>=<dataset_option_value>]
                      [train.<train_option>=<train_option_value>]
                      [train.gpu_ids=<gpu indices>]
                      [train.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec_file: The path to the experiment spec file.
task: The task (‘segment’ or ‘classify’) for the visual_changenet training. Default: segment.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options

Note

For training, evaluation, and inference, we expose 2 variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed, but inconsistent, for example num_gpus = 1, gpu_ids = [0, 1]`, then they are modified to follow the setting with more GPUs, for example num_gpus = 1 -> num_gpus = 2.

In some cases, you may encounter an issue with multi-GPU training resulting in a segmentation fault. You may circumvent this by setting the OMP_NUM_THREADS enviroment variable to 1. Depending upon your model of execution, you may use the following methods to set this variable

CLI Launcher

You may set this env variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file as mentioned in bullet 3 in this section

{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADSR",
            "value": "1"
        }
    ]
}

Docker

You may set environment variables in the docker by setting the -e flag in the docker command line.

docker run -it --rm --gpus all \
    -e OMP_NUM_THREADS=1 \
    -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e

Checkpointing and Resuming Training

At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. These are saved in train.results_dir, like so:

$ ls /results/train

'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'

The latest checkpoint will also be saved as changenet_model_segment_latest.pth. Training will automatically resume from changenet_model_segment_latest.pth if it exists in train.results_dir. This will be superseded by train.resume_training_checkpoint_path if it is provided.

The major implication of this logic is that, if you wish to trigger fresh training from scratch, either

Specify a new, empty results directory (Recommended), or
Remove the latest checkpoint from the results directory

Creating a Testing Experiment Spec File#

Here is an example spec file for testing evaluation and inference of a trained VisualChangeNet-Segmentation model:

SPECS=$(tao-client visual_changenet get-spec --action evaluate --job_type experiment --id $EXPERIMENT_ID)

results_dir: /path/to/experiment_results
task: segment
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
evaluate:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
  results_dir: /results/evaluate
inference:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
  results_dir: /results/inference

Parameter	Datatype	Default	Description	Supported Values
checkpoint	string		Path to PyTorch model to evaluate/inference.
trt_engine	string		Path to TensorRT model to inference/evaluate. Should be only used with TAO Deploy.
num_gpus	unsigned int	1	The number of GPUs to use.	>0
gpu_ids	unsigned int	[0]	The GPU IDs to use.
results_dir	string		The path to a folder where the experiment outputs should be written.
vis_after_n_batches	unsigned int	1	Number of batches after which to save inference/evaluate visualization results.	>0

Evaluating the Model#

Use the following command to run a VisualChangeNet-Segmentation evaluation:

EVALUATE_JOB_ID=$(tao-client visual_changenet experiment-run-action --action evaluate --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $TRAIN_JOB_ID)

tao model visual_changenet evaluate -e <experiment_spec>
                       task=segment
                       evaluate.checkpoint=<model to be evaluated>
                       [evaluate.<evaluate_option>=<evaluate_option_value>]
                       [evaluate.gpu_ids=<gpu indices>]
                       [evaluate.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.

Optional Arguments

The following arguments are optional to run the command.

evaluate.<evaluate_option>: The evaluate options.

Running Inference on the Model#

Use the following command to run inference on VisualChangeNet-Segmentation with the .pth model:

INFERENCE_JOB_ID=$(tao-client visual_changenet experiment-run-action --action inference --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $TRAIN_JOB_ID)

tao model visual_changenet inference -e <experiment_spec>
                       task=segment
                       inference.checkpoint=<inference model>
                       [inference.<evaluate_option>=<evaluate_option_value>]
                       [inference.gpu_ids=<gpu indices>]
                       [inference.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
inference.checkpoint: The .pth model to run inference on.

Optional Arguments

The following arguments are optional to run the command.

inference.<inference_option>: The inference options.

Exporting the Model#

Here is an example to get spec from the FTMS client and an example spec file from TAO Launcher for exporting the trained VisualChangeNet model:

SPECS=$(tao-client visual_changenet get-spec --action export --job_type experiment --id $EXPERIMENT_ID)

export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 256
  input_height: 256
  batch_size: -1

Parameter	Datatype	Default	Description	Supported Values
checkpoint	string		The path to the PyTorch model to export.
onnx_file	string		The path to the `.onnx` file.
opset_version	unsigned int	12	The opset version of the exported ONNX.	>0
input_channel	unsigned int	3	The input channel size. Only the value 3 is supported.	3
input_width	unsigned int	128	The input width.	>0
input_height	unsigned int	512	The input height.	>0
batch_size	unsigned int	-1	The batch size of the ONNX model. If this value is set to -1, the export uses dynamic batch size.	>=-1
gpu_id	unsigned int	0	The GPU ID to use.
on_cpu	bool	False	If set to `True`, exports the model on CPU.
verbose	bool	False	If set to `True`, prints a human-readable representation of the network.

Use the following command to export the model:

EXPORT_JOB_ID=$(tao-client visual_changenet experiment-run-action --action export --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $TRAIN_JOB_ID)

tao model visual_changenet export [-h] -e <experiment spec file>
                          task=segment
                          export.checkpoint=<model to export>
                          export.onnx_file=<onnx path>
                          [export.<export_option>=<export_option_value>]

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The path to an experiment spec file
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.

Optional Arguments

The following arguments are optional to run the command.

export.<export_option>: The export options.

TensorRT Engine Generation, Validation, and int8 Calibration#

For deployment, refer to the TAO Deploy Documentation for VisualChangeNet-Segmentation.