Visual ChangeNet-Segmentation
Visual ChangeNet-Segmentation is an NVIDIA-developed semantic change segmentation model and is included in TAO Toolkit. Visual ChangeNet supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked from the TAO Launcher using the following convention on the command-line:
tao model visual_changenet <sub_task> <args_per_subtask>
Where <args_per_subtask> are the command-line arguments required for a given subtask. Each subtask is explained in the following sections.
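For example, to launch training with a spec file (the path is a placeholder):
tao model visual_changenet train -e /path/to/experiment_spec.yaml task=segment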
VisualChangeNet-Segmentation requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Segmentation.
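As a quick orientation, a dataset root that follows the default folder names used in the examples below might look like this (a sketch; all folder names are configurable through the dataset parameters described later):
/path/to/root/dataset/dir/
├── A          # pre-change input images (image_folder_name)
├── B          # post-change input images (change_image_folder_name)
├── label      # ground-truth change masks (annotation_folder_name)
└── list       # split list files, e.g. train.csv, val.csv (list_folder_name)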
Configuring a Custom Dataset
This section provides an example configuration and commands for training VisualChangeNet-Segmentation using the dataset format described above for the LEVIR-CD dataset. LEVIR-CD is a large-scale remote sensing building change detection dataset.
Here is an example spec file for training a VisualChangeNet-Segmentation model with NVIDIA’s FAN Hybrid backbone on the LEVIR-CD dataset using the Data Annotation Format.
encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256
Parameter | Data Type | Default | Description | Supported Values
model | dict config | – | The configuration of the model architecture |
dataset | dict config | – | The configuration of the dataset |
train | dict config | – | The configuration of the training task |
evaluate | dict config | – | The configuration of the evaluation task |
inference | dict config | – | The configuration of the inference task |
encryption_key | string | None | The encryption key to encrypt and decrypt model files |
results_dir | string | /results | The directory where experiment results are saved |
export | dict config | – | The configuration of the ONNX export task |
task | str | segment | A flag to indicate the change detection task. Two tasks are supported: ‘segment’ (segmentation) and ‘classify’ (classification) | segment, classify
Parameter | Datatype | Default | Description | Supported Values
num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0
gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training |
seed | unsigned int | 1234 | The random seed for random, numpy, and torch | >0
num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0
checkpoint_interval | unsigned int | 1 | The epoch interval at which checkpoints are saved | >0
validation_interval | unsigned int | 1 | The epoch interval at which validation is run | >0
resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from |
results_dir | string | /results/train | The directory to save training results |
segment | Dict | None | The segmentation-specific training parameters, including the loss function (loss, default "ce") and the per-scale loss weights (weights) | ce
num_nodes | unsigned int | 1 | The number of nodes; if the value is larger than 1, multi-node training is enabled |
pretrained_model_path | string | – | The path to the pretrained model checkpoint used to initialize the end-to-end model weights |
optim | dict config | None | The configurable parameters for the VisualChangeNet optimizer, detailed in the optim section |
optim
optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01
Parameter | Datatype | Default | Description | Supported Values
lr | float | 0.0005 | The learning rate | >=0.0
optim | str | adamw | The optimizer |
policy | str | linear | The learning rate scheduler | linear, step
momentum | float | 0.9 | The momentum for the AdamW optimizer |
weight_decay | float | 0.1 | The weight decay coefficient |
The following example model config provides options to change the VisualChangeNet-Segmentation architecture for training.
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
Parameter | Datatype | Default | Description | Supported Values
backbone | Dict | None | A dictionary containing the backbone configuration: type (the backbone architecture), pretrained_backbone_path (string), and freeze_backbone (bool) | fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid, vit_large_nvdinov2
decode_head | Dict | None | A dictionary containing the decoder configuration: feature_strides and align_corner | align_corner: True, False
The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is provided below.
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]
Parameter | Datatype | Default | Description | Supported Values
segment | Dict | – | The dataset configuration for the segmentation dataloader, detailed in the segment section |
classify | Dict | – | The dataset configuration for the classification dataloader |
segment
Parameter | Datatype | Default | Description | Supported Values
dataset | Dict | CNDataset | The dataloader supported for segmentation | CNDataset
root_dir | str | – | The root directory path where the dataset is located |
data_name | str | LEVIR-CD | The dataset identifier | LEVIR-CD, LandSCD, custom
batch_size | int | 32 | The number of samples per batch | >0
workers | int | 2 | The number of worker processes for data loading | >=0
multi_scale_train | bool | True | Whether multi-scale training is enabled | True, False
multi_scale_infer | bool | False | Whether multi-scale inference is enabled | True, False
num_classes | int | 2 | The number of classes in the dataset | >=2
img_size | int | 256 | The size of the input images after resizing |
image_folder_name | str | A | The name of the folder containing the input images |
change_image_folder_name | str | B | The name of the folder containing the changed images |
list_folder_name | str | list | The name of the folder containing the dataset split lists (CSV files) |
annotation_folder_name | str | label | The name of the folder containing the annotation masks |
train_split | str | train | The dataset split used for training; must match the name of a CSV file in list_folder_name |
validation_split | str | val | The dataset split used for validation; must match the name of a CSV file in list_folder_name |
test_split | str | test | The dataset split used for evaluation; must match the name of a CSV file in list_folder_name |
predict_split | str | predict | The dataset split used for inference; must match the name of a CSV file in list_folder_name |
label_suffix | str | .png | The suffix of the label image files |
augmentation | Dict | None | A dictionary containing the data augmentation settings, detailed in the augmentation section |
color_map | Optional[Dict[str, List[int]]] | None | The mapping of string class labels (‘0’ to ‘n’) to RGB color codes |
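The split files named by train_split, validation_split, test_split, and predict_split live in list_folder_name. As an illustrative sketch (the exact contents depend on your dataset; these filenames are hypothetical), each list file enumerates the image filenames in that split, and the same filename is expected in the A, B, and label folders:
$ cat /path/to/root/dataset/dir/list/train.csv
train_1.png
train_2.png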
augmentation
Parameter | Datatype | Default | Description | Supported Values
random_flip | Dict | None | The random vertical and horizontal flip augmentation settings (vflip_probability, hflip_probability, enable) | probabilities >=0.0
random_rotate | Dict | None | Randomly rotates images with the specified probability and angles (rotate_probability, angle_list, enable) | probability >=0.0
random_color | Dict | None | Applies random color augmentation to images (brightness, contrast, saturation, hue, enable) | >=0.0
with_scale_random_crop | Dict | None | Applies random scaling and cropping augmentation (enable) | True, False
with_random_crop | bool | True | Applies random crop augmentation | True, False
with_random_blur | bool | True | Applies random blurring augmentation | True, False
mean | List[float] | [0.5, 0.5, 0.5] | The mean to be subtracted for pre-processing |
std | List[float] | [0.5, 0.5, 0.5] | The standard deviation to divide the image by |
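Each augmentation block is toggled independently through its enable flag. For example, a minimal override that keeps the flips but disables color jitter during training:
augmentation:
  random_flip:
    vflip_probability: 0.5
    hflip_probability: 0.5
    enable: True
  random_color:
    enable: False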
Example spec file for ViT backbones
The following spec file is only relevant for TAO versions 5.3 and later.
encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 350
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.00002
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256
Use the following command to run VisualChangeNet-Segmentation training:
tao model visual_changenet train -e <experiment_spec_file>
task=segment
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.gpu_ids=<gpu indices>]
[train.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
task: The task (‘segment’ or ‘classify’) for visual_changenet training. Default: segment.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 with gpu_ids = [0, 1], then they are modified to follow the setting with more GPUs, for example num_gpus = 1 becomes num_gpus = 2.
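For example, a two-GPU training run (the spec path and GPU indices are illustrative):
tao model visual_changenet train -e /path/to/experiment_spec.yaml task=segment train.num_gpus=2 train.gpu_ids=[0,1]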
Checkpointing and Resuming Training
At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved as model_epoch_<epoch_num>.pth. These checkpoints are saved in train.results_dir, like so:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint is also saved as changenet_model_segment_latest.pth. Training automatically resumes from changenet_model_segment_latest.pth if it exists in train.results_dir. This is superseded by train.resume_training_checkpoint_path if it is provided. The major implication of this logic is that, if you wish to trigger fresh training from scratch, you should either:
Specify a new, empty results directory (recommended), or
Remove the latest checkpoint from the results directory.
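Conversely, to resume explicitly from a specific checkpoint rather than the latest one, pass train.resume_training_checkpoint_path on the command line (the paths are placeholders):
tao model visual_changenet train -e /path/to/experiment_spec.yaml task=segment train.resume_training_checkpoint_path=/results/train/model_epoch_004.pth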
Here is an example spec file for running evaluation and inference with a trained VisualChangeNet-Segmentation model:
results_dir: /path/to/experiment_results
task: segment
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
evaluate:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
  results_dir: /results/evaluate
inference:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
  results_dir: /results/inference
Parameter | Datatype | Default | Description | Supported Values
checkpoint | string | | The path to the PyTorch model to evaluate or run inference on |
vis_after_n_batches | int | | The interval, in batches, between saved visualization outputs |
trt_engine | string | | The path to the TensorRT engine for inference; should only be used with TAO Deploy |
num_gpus | unsigned int | 1 | The number of GPUs to use | >0
gpu_ids | List[int] | [0] | The GPU IDs to use |
results_dir | string | | The path to a folder where the experiment outputs should be written |
Use the following command to run a VisualChangeNet-Segmentation evaluation:
tao model visual_changenet evaluate -e <experiment_spec>
task=segment
evaluate.checkpoint=<model to be evaluated>
[evaluate.<evaluate_option>=<evaluate_option_value>]
[evaluate.gpu_ids=<gpu indices>]
[evaluate.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments
evaluate.<evaluate_option>: The evaluate options.
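For example, an evaluation run against the latest training checkpoint (the paths are placeholders):
tao model visual_changenet evaluate -e /path/to/experiment_spec.yaml task=segment evaluate.checkpoint=/results/train/changenet_model_segment_latest.pth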
Use the following command to run inference on VisualChangeNet-Segmentation with the .pth model:
tao model visual_changenet inference -e <experiment_spec>
task=segment
inference.checkpoint=<inference model>
[inference.<inference_option>=<inference_option_value>]
[inference.gpu_ids=<gpu indices>]
[inference.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the inference experiment.
inference.checkpoint: The .pth model to run inference on.
Optional Arguments
inference.<inference_option>: The inference options.
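For example, an inference run that saves a visualization after every batch (the paths are placeholders):
tao model visual_changenet inference -e /path/to/experiment_spec.yaml task=segment inference.checkpoint=/results/train/changenet_model_segment_latest.pth inference.vis_after_n_batches=1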
Here is an example spec file for exporting the trained VisualChangeNet model:
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 256
  input_height: 256
  batch_size: -1
Parameter | Datatype | Default | Description | Supported Values
checkpoint | string | | The path to the PyTorch model to export |
onnx_file | string | | The path to the exported .onnx file |
opset_version | unsigned int | 12 | The opset version of the exported ONNX model | >0
input_channel | unsigned int | 3 | The input channel size; only the value 3 is supported | 3
input_width | unsigned int | 256 | The input width | >0
input_height | unsigned int | 256 | The input height | >0
batch_size | int | -1 | The batch size of the ONNX model; if set to -1, the export uses a dynamic batch size | >=-1
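If you need a static-batch ONNX model instead of a dynamic one, set batch_size to a positive value. For example, a sketch of an export section pinned to batch size 1 (the paths are placeholders):
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  batch_size: 1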
Use the following command to export the model:
tao model visual_changenet export [-h] -e <experiment spec file>
task=segment
export.checkpoint=<model to export>
export.onnx_file=<onnx path>
[export.<export_option>=<export_option_value>]
Required Arguments
-e, --experiment_spec: The path to an experiment spec file.
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.
Optional Arguments
export.<export_option>: The export options.
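For example, an export of the latest training checkpoint to ONNX (the paths are placeholders):
tao model visual_changenet export -e /path/to/experiment_spec.yaml task=segment export.checkpoint=/results/train/changenet_model_segment_latest.pth export.onnx_file=/results/export/changenet.onnx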
For deployment, refer to the TAO Deploy Documentation for VisualChangeNet-Segmentation.