
Visual ChangeNet-Segmentation

Visual ChangeNet-Segmentation is an NVIDIA-developed semantic change segmentation model that is included in TAO. Visual ChangeNet supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Launcher using the following convention on the command-line:


tao model visual_changenet <sub_task> <args_per_subtask>

Here, <args_per_subtask> are the command-line arguments required for a given subtask. Each subtask is explained in the following sections.
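For example, a minimal training invocation for the segmentation task looks like the following (the spec file path is hypothetical):

tao model visual_changenet train -e /path/to/experiment_spec.yaml task=segment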

VisualChangeNet-Segmentation requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Segmentation.
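For reference, here is a sketch of the expected directory layout, assuming the default folder and split names used throughout this page (all of them are configurable through the dataset config described below):

/path/to/root/dataset/dir/
├── A/            # input images at time point 1
├── B/            # changed images at time point 2
├── label/        # ground-truth change masks, one per image pair
└── list/         # split CSV files, e.g. train.csv, val.csv, test.csv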

Configuring a Custom Dataset

This section provides an example configuration and commands for training VisualChangeNet-Segmentation on the LEVIR-CD dataset, using the dataset format described above. LEVIR-CD is a large-scale remote-sensing dataset for building change detection.

Here is an example spec file for training a VisualChangeNet-Segmentation model with NVIDIA’s FAN Hybrid backbone on the LEVIR-CD dataset using the Data Annotation Format.


encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| export | dict config | – | The configuration of the ONNX export task | |
| task | str | segment | A flag indicating the change detection task: 'segment' for segmentation or 'classify' for classification | segment, classify |
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, numpy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which validation is run | >0 |
| resume_training_checkpoint_path | string | – | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| segment | Dict | None | Configurable parameters for the VisualChangeNet-Segmentation pipeline: loss (str, default "ce"), the loss function used for segmentation training; weights (list, default None), the weights for multi-scale training | |
| num_nodes | unsigned int | 1 | The number of nodes; a value larger than 1 enables multi-node training | |
| pretrained_model_path | string | – | The path to a pretrained model checkpoint used to initialize the end-to-end model weights | |
| optim | dict config | None | Configurable parameters for the VisualChangeNet optimizer, detailed in the optim section | |

optim


optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| lr | float | 0.0005 | The learning rate | >=0.0 |
| optim | str | adamw | The optimizer | |
| policy | str | linear | The learning-rate scheduler: linear (LambdaLR decreases the lr by a multiplicative factor) or step (StepLR decreases the lr by a factor of 0.1 every num_epochs//3 epochs) | linear, step |
| momentum | float | 0.9 | The momentum for the AdamW optimizer | |
| weight_decay | float | 0.1 | The weight decay coefficient | |
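As a sketch, switching from the linear schedule to the step schedule only requires changing policy; the other fields keep the semantics described in the table above:

optim:
  lr: 0.0001
  optim: "adamw"
  policy: "step"        # StepLR: decreases the lr by a factor of 0.1 every num_epochs//3 epochs
  momentum: 0.9
  weight_decay: 0.01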

The following example model config provides options to change the VisualChangeNet-Segmentation architecture for training.


model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | Dict | None | A dictionary containing the following configurable parameters: type (string, default None), the name of the backbone to use; pretrained_backbone_path (string, default None), the path to a pre-trained backbone weights file; freeze_backbone (bool, default False), whether to freeze the backbone weights during training | type: fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid |
| decode_head | Dict | None | A dictionary containing the following configurable parameters: align_corners (bool, default False); feature_strides (list, default [4, 8, 16, 16]) | align_corners: True, False |

The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is provided below.


dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| segment | Dict | – | The dataset config for the segmentation dataloader, detailed in the segment section | |
| classify | Dict | – | The dataset config for the classification dataloader | |

segment

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| dataset | str | CNDataset | The dataloader supported for segmentation | CNDataset |
| root_dir | str | – | The root directory where the dataset is located | |
| data_name | str | LEVIR-CD | The dataset identifier | LEVIR-CD, LandSCD, custom |
| batch_size | int | 32 | The number of samples per batch | >0 |
| workers | int | 2 | The number of worker processes for data loading | >=0 |
| multi_scale_train | bool | True | Whether multi-scale training is enabled | True, False |
| multi_scale_infer | bool | False | Whether multi-scale inference is enabled | True, False |
| num_classes | int | 2 | The number of classes in the dataset | >=2 |
| img_size | int | 256 | The size of the input images after resizing | |
| image_folder_name | str | A | The name of the folder containing the input images | |
| change_image_folder_name | str | B | The name of the folder containing the changed images | |
| list_folder_name | str | list | The name of the folder containing the dataset split CSV files | |
| annotation_folder_name | str | label | The name of the folder containing the annotation masks | |
| train_split | str | train | The dataset split used for training; must match the name of a CSV file in list_folder_name | |
| validation_split | str | val | The dataset split used for validation; must match the name of a CSV file in list_folder_name | |
| test_split | str | test | The dataset split used for evaluation; must match the name of a CSV file in list_folder_name | |
| predict_split | str | predict | The dataset split used for inference; must match the name of a CSV file in list_folder_name | |
| label_suffix | str | .png | The suffix of the label image files | |
| augmentation | Dict | None | Data augmentation settings, detailed in the augmentation section | |
| color_map | Optional[Dict[str, List[int]]] | None | A mapping of string class labels ('0' to 'n') to RGB color codes | |
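The split names above must correspond to CSV files inside list_folder_name. As an illustrative assumption (verify against your own dataset lists), each split file names one image file per line, and the same file name is looked up in the image, changed-image, and annotation folders:

# list/train.csv (assumed format: one image file name per line)
0001.png
0002.png
0003.png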

augmentation

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| random_flip | Dict | None | Random vertical and horizontal flipping settings: vflip_probability (float, default 0.5, >=0.0), the probability of a vertical flip; hflip_probability (float, default 0.5, >=0.0), the probability of a horizontal flip; enable (bool, default True), enables or disables random flipping | |
| random_rotate | Dict | None | Random rotation settings: rotate_probability (float, default 0.5, >=0.0), the probability of applying a random rotation; angle_list (list, default [90, 180, 270]), the rotation angles to choose from; enable (bool, default True), enables or disables random rotation | |
| random_color | Dict | None | Random color augmentation settings: brightness (float, default 0.3, >=0.0), the maximum brightness change factor; contrast (float, default 0.3, >=0.0), the maximum contrast change factor; saturation (float, default 0.3, >=0.0), the maximum saturation change factor; hue (float, default 0.3, >=0.0), the maximum hue change factor; enable (bool, default True), enables or disables random color augmentation; color_probability (float, default 0.5, >=0.0), the probability of applying color augmentation | |
| with_scale_random_crop | Dict | None | Random scaling and cropping settings: enable (bool, default True), enables or disables random scale-and-crop augmentation | True, False |
| with_random_crop | bool | True | Apply random crop augmentation | True, False |
| with_random_blur | bool | True | Apply random blurring augmentation | True, False |
| mean | List[float] | [0.5, 0.5, 0.5] | The per-channel mean subtracted from the image during pre-processing | |
| std | List[float] | [0.5, 0.5, 0.5] | The per-channel standard deviation the image is divided by | |
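The example specs on this page omit mean and std, so they fall back to the defaults listed above. A minimal sketch of an augmentation block that sets them explicitly:

augmentation:
  random_flip:
    vflip_probability: 0.5
    hflip_probability: 0.5
    enable: True
  with_random_blur: True
  mean: [0.5, 0.5, 0.5]   # per-channel mean subtracted during pre-processing
  std: [0.5, 0.5, 0.5]    # per-channel standard deviation the image is divided by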

Example spec file for ViT backbones

Note

The following spec file is only relevant for TAO versions 5.3 and later.


encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 350
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.00002
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256

Use the following command to run VisualChangeNet-Segmentation training:


tao model visual_changenet train -e <experiment_spec_file> \
    task=segment \
    [results_dir=<global_results_dir>] \
    [model.<model_option>=<model_option_value>] \
    [dataset.<dataset_option>=<dataset_option_value>] \
    [train.<train_option>=<train_option_value>] \
    [train.gpu_ids=<gpu indices>] \
    [train.num_gpus=<number of gpus>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • task: The task (‘segment’ or ‘classify’) for the visual_changenet training. Default: segment.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Note

For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 with gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (in this example, num_gpus is changed to 2).
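For example, the following override requests a two-GPU training run (the spec file path is hypothetical):

tao model visual_changenet train -e /path/to/experiment_spec.yaml task=segment \
    train.num_gpus=2 \
    train.gpu_ids=[0,1]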

Checkpointing and Resuming Training

At every train.checkpoint_interval, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved in train.results_dir, like so:


$ ls /results/train
'model_epoch_000.pth'  'model_epoch_001.pth'  'model_epoch_002.pth'  'model_epoch_003.pth'  'model_epoch_004.pth'

The latest checkpoint is also saved as changenet_model_segment_latest.pth. Training automatically resumes from changenet_model_segment_latest.pth if it exists in train.results_dir; this is superseded by train.resume_training_checkpoint_path if it is provided (see the example after the list below).

The major implication of this logic is that, if you wish to trigger fresh training from scratch, either

  • Specify a new, empty results directory (Recommended), or

  • Remove the latest checkpoint from the results directory
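For example, to resume explicitly from a specific checkpoint rather than the automatically detected latest one (paths hypothetical):

tao model visual_changenet train -e /path/to/experiment_spec.yaml task=segment \
    train.resume_training_checkpoint_path=/results/train/model_epoch_004.pth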

Here is an example spec file for evaluation and inference of a trained VisualChangeNet-Segmentation model:


results_dir: /path/to/experiment_results
task: segment
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
evaluate:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
  results_dir: /results/evaluate
inference:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
  results_dir: /results/inference

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | – | The path to the PyTorch model to evaluate or run inference on | |
| vis_after_n_batches | int | – | The number of batches between each saved visualization output | >0 |
| trt_engine | string | – | The path to the TensorRT model to run inference on; should only be used with TAO Deploy | |
| num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
| gpu_ids | List[int] | [0] | The GPU IDs to use | |
| results_dir | string | – | The path to the folder where the experiment outputs are written | |

Use the following command to run a VisualChangeNet-Segmentation evaluation:


tao model visual_changenet evaluate -e <experiment_spec> \
    task=segment \
    evaluate.checkpoint=<model to be evaluated> \
    [evaluate.<evaluate_option>=<evaluate_option_value>] \
    [evaluate.gpu_ids=<gpu indices>] \
    [evaluate.num_gpus=<number of gpus>]

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.

  • evaluate.checkpoint: The .pth model to be evaluated.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Use the following command to run inference on VisualChangeNet-Segmentation with a trained .pth model:


tao model visual_changenet inference -e <experiment_spec> \
    task=segment \
    inference.checkpoint=<inference model> \
    [inference.<inference_option>=<inference_option_value>] \
    [inference.gpu_ids=<gpu indices>] \
    [inference.num_gpus=<number of gpus>]

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the inference experiment.

  • inference.checkpoint: The .pth model to run inference on.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here is an example spec file for exporting the trained VisualChangeNet model:


export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 256
  input_height: 256
  batch_size: -1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | – | The path to the PyTorch model to export | |
| onnx_file | string | – | The path to the exported .onnx file | |
| opset_version | unsigned int | 12 | The opset version of the exported ONNX model | >0 |
| input_channel | unsigned int | 3 | The input channel size; only the value 3 is supported | 3 |
| input_width | unsigned int | 256 | The input width | >0 |
| input_height | unsigned int | 256 | The input height | >0 |
| batch_size | int | -1 | The batch size of the ONNX model; if set to -1, the export uses a dynamic batch size | >=-1 |
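For instance, to export an ONNX model with a fixed rather than dynamic batch size, set batch_size to a positive value (the values here are illustrative):

export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 256
  input_height: 256
  batch_size: 4    # fixed batch size; -1 selects a dynamic batch size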

Use the following command to export the model:


tao model visual_changenet export [-h] -e <experiment spec file> \
    task=segment \
    export.checkpoint=<model to export> \
    export.onnx_file=<onnx path> \
    [export.<export_option>=<export_option_value>]

Required Arguments

  • -e, --experiment_spec: The path to an experiment spec file

  • export.checkpoint: The .pth model to export.

  • export.onnx_file: The path where the .etlt or .onnx model is saved.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.
