Visual ChangeNet-Segmentation#
Visual ChangeNet-Segmentation is an NVIDIA-developed semantic change segmentation model and is included in the TAO Toolkit. Visual ChangeNet supports the following tasks:
train
evaluate
inference
export
Each task is explained in detail in the following sections.
Note
Throughout this documentation, you will see references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.
For instructions on creating a dataset using the remote client, see the Creating a dataset section in the Remote Client documentation.
For instructions on creating an experiment using the remote client, see the Creating an experiment section in the Remote Client documentation.
The spec format is YAML for TAO Launcher and JSON for FTMS Client.
File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher and not for FTMS Client.
Data Input for VisualChangeNet#
VisualChangeNet-Segmentation requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Segmentation.
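As a rough sketch (the authoritative layout is described on the Data Annotation Format page), a LEVIR-CD-style dataset root that matches the folder names used in the example specs below could look like the following. The folder names are configurable through image_folder_name, change_image_folder_name, annotation_folder_name, and list_folder_name.
/path/to/root/dataset/dir/
├── A/        # pre-change input images
├── B/        # post-change (changed) images
├── label/    # ground-truth change masks, e.g. *.png
└── list/     # split files referenced by train_split, validation_split, test_split, predict_split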
Creating a Training Experiment Spec File#
Configuring a Custom Dataset#
This section provides an example configuration, and the commands to retrieve that configuration, for training VisualChangeNet-Segmentation using the dataset format described above for the LEVIR-CD dataset. LEVIR-CD is a large-scale remote sensing building change detection dataset.
Note
Make sure to set task=segment in SPECS for all task specs.
SPECS=$(tao-client visual_changenet get-spec --action train --job_type experiment --id $EXPERIMENT_ID)
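For example, with the FTMS Client the retrieved JSON spec could be updated with a tool such as jq before running the action. This is a minimal sketch that assumes the retrieved spec exposes a top-level task field; any JSON editor works equally well.
# Hypothetical: set the task field in the retrieved JSON spec using jq
SPECS=$(echo "$SPECS" | jq '.task = "segment"')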
Here is an example spec file for training a VisualChangeNet-Segmentation model with NVIDIA’s FAN Hybrid backbone on the LEVIR-CD dataset using the Data Annotation Format.
encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    use_summary_token: True
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | The configuration of the model architecture. | |
| dataset | dict config | – | The configuration of the dataset. | |
| train | dict config | – | The configuration of the training task. | |
| evaluate | dict config | – | The configuration of the evaluation task. | |
| inference | dict config | – | The configuration of the inference task. | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files. | |
| results_dir | string | /results | The directory where experiment results are saved. | |
| export | dict config | – | The configuration of the ONNX export task. | |
| task | str | segment | A flag to indicate the change detection task. Supports two tasks: 'segment' and 'classify' for segmentation and classification, respectively. | classify, segment |
train#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training. | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training. | |
| seed | unsigned int | 1234 | The random seed for random, numpy, and torch. | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment. | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved. | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run. | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from. | |
| results_dir | string | /results/train | The directory to save training results. | |
| segment | Dict (str, list) | None (loss: ce) | Configurable parameters for the VisualChangeNet-Segmentation pipeline: loss is the loss function used for segmentation training; weights are the weights for multi-scale training. | |
| num_nodes | unsigned int | 1 | The number of nodes. If the value is larger than 1, multi-node training is enabled. | |
| pretrained_model_path | string | – | The path to the pretrained model checkpoint used to initialize the end-to-end model weights. | |
| optim | dict config | None | Contains the configurable parameters for the VisualChangeNet optimizer, detailed in the optim section. | |
optim#
optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| lr | float | 0.0005 | The learning rate. | >=0.0 |
| optim | str | adamw | The optimizer. | |
| policy | str | linear | The learning-rate scheduler: linear (LambdaLR) decreases the lr by a multiplicative factor; step (StepLR) decreases the lr by 0.1 every num_epochs // 3 epochs. See the example after this table. | linear, step |
| momentum | float | 0.9 | The momentum for the AdamW optimizer. | |
| weight_decay | float | 0.1 | The weight decay coefficient. | |
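For example, switching from the linear policy to the step policy only requires changing the policy field in the optim block of the training spec. This is a minimal sketch; the remaining fields keep the values shown above.
optim:
  lr: 0.0005
  optim: "adamw"
  policy: "step"
  momentum: 0.9
  weight_decay: 0.01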
Model#
The following example model config provides options to change the VisualChangeNet-Segmentation architecture for training.
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
    use_summary_token: True
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | Dict (string, bool) | None, None, False | A dictionary containing the following configurable parameters: type is the name of the backbone to be used; pretrained_backbone_path is the path to the pretrained backbone weights file; freeze_backbone, if set to True, freezes the backbone weights during training. | fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, vit_large_nvdinov2, c_radio_v2_vit_huge_patch16_224, c_radio_v2_vit_large_patch16_224, c_radio_v2_vit_base_patch16_224 |
| decode_head | Dict (bool, bool, list) | None, False, True, [4, 8, 16, 16] | A dictionary containing the following configurable parameters for the decoder: align_corners, if set to True, aligns the input and output tensors by the center points of their corner pixels, preserving the values at the corner pixels; use_summary_token, if set to True, uses the summary token of the backbone; feature_strides is the set of downsampling feature strides for different backbones. | True, False |
Dataset#
The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is provided below.
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| segment | Dict | – | The dataset configuration for the segmentation dataloader, detailed in the segment section. | |
| classify | Dict | – | The dataset configuration for the classification dataloader. | |
segment#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| dataset | Dict | CNDataset | The dataloader supported for segmentation. | CNDataset |
| root_dir | str | – | The root directory path where the dataset is located. | |
| data_name | str | LEVIR-CD | The dataset identifier. | LEVIR-CD, LandSCD, custom |
| batch_size | int | 32 | The number of samples per batch. | >0 |
| workers | int | 2 | The number of worker processes for data loading. | >=0 |
| multi_scale_train | bool | True | If set to True, enables multi-scale training. | True, False |
| multi_scale_infer | bool | False | If set to True, enables multi-scale inference. | True, False |
| num_classes | int | 2 | The number of classes in the dataset. | >=2 |
| img_size | int | 256 | The size of the input images after resizing. | |
| image_folder_name | str | A | The name of the folder containing input images. | |
| change_image_folder_name | str | B | The name of the folder containing the changed images. | |
| list_folder_name | str | list | The name of the folder containing the CSV files that define the dataset splits. | |
| annotation_folder_name | str | label | The name of the folder containing annotation masks. | |
| train_split | str | train | The dataset split used for training; should indicate the name of a CSV file in list_folder_name. | |
| validation_split | str | val | The dataset split used for validation; should indicate the name of a CSV file in list_folder_name. | |
| test_split | str | test | The dataset split used for evaluation; should indicate the name of a CSV file in list_folder_name. | |
| predict_split | str | predict | The dataset split used for inference; should indicate the name of a CSV file in list_folder_name. | |
| label_suffix | str | .png | The suffix of the label image files. | |
| augmentation | Dict | None | A dictionary containing the data augmentation settings, detailed in the augmentation section. | |
| color_map | Optional[Dict[str, List[int]]] | None | The mapping of string class labels ('0' to 'n') to RGB color codes. | |
augmentation#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| random_flip | Dict (float, float, bool) | 0.5, 0.5, True | Random vertical and horizontal flipping augmentation settings: vflip_probability is the probability of vertical flipping; hflip_probability is the probability of horizontal flipping; enable, if set to True, enables random flipping augmentation. | >=0.0 |
| random_rotate | Dict (float, list, bool) | 0.5, [90, 180, 270], True | Random rotation augmentation settings: rotate_probability is the probability of applying random rotation; angle_list is the list of rotation angles to choose from; enable, if set to True, enables random rotation augmentation. | >=0.0 |
| random_color | Dict (float, float, float, float, bool, float) | 0.3, 0.3, 0.3, 0.3, True, 0.5 | Random color augmentation settings: brightness, contrast, saturation, and hue are the maximum change factors for the respective properties; enable, if set to True, enables random color augmentation; color_probability is the probability of applying color augmentation. | >=0.0 |
| with_scale_random_crop | Dict (bool) | True | Random scaling and cropping augmentation settings: enable, if set to True, enables random scaling and cropping augmentation. | True, False |
| with_random_crop | bool | True | If set to True, enables random cropping augmentation. | True, False |
| with_random_blur | bool | True | If set to True, enables random blurring augmentation. | True, False |
| mean | List[float] | [0.5, 0.5, 0.5] | The mean to be subtracted for pre-processing. | |
| std | List[float] | [0.5, 0.5, 0.5] | The standard deviation to divide the image by. | |
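The mean and std parameters listed above do not appear in the example specs. A minimal sketch of setting them, assuming they sit alongside the other augmentation fields under dataset.segment.augmentation:
augmentation:
  mean: [0.5, 0.5, 0.5]
  std: [0.5, 0.5, 0.5]
  random_flip:
    vflip_probability: 0.5
    hflip_probability: 0.5
    enable: True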
Example spec file for ViT backbones#
Note
The following spec file is only relevant for TAO versions 5.3 and later.
SPECS=$(tao-client visual_changenet get-spec --action train --job_type experiment --id $EXPERIMENT_ID)
encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 350
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.00002
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
    use_summary_token: True
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256
Training the Model#
Use the following command to run VisualChangeNet-Segmentation training:
TRAIN_JOB_ID=$(tao-client visual_changenet experiment-run-action --action train --id $EXPERIMENT_ID --specs "$SPECS")
tao model visual_changenet train -e <experiment_spec_file>
task=segment
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.gpu_ids=<gpu indices>]
[train.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required.
-e, --experiment_spec_file: The path to the experiment spec file.
task: The task ('segment' or 'classify') for the visual_changenet training. Default: segment.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
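For example, a complete launcher invocation that overrides a few of these options from the command line might look like the following; the paths and values are placeholders.
tao model visual_changenet train -e /path/to/train_spec.yaml \
    task=segment \
    results_dir=/results \
    train.num_gpus=2 \
    train.num_epochs=30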
Note
For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], then they are modified to follow the setting with more GPUs, for example num_gpus = 1 becomes num_gpus = 2.
In some cases, you may encounter an issue with multi-GPU training resulting in a segmentation fault. You can circumvent this by setting the OMP_NUM_THREADS environment variable to 1. Depending on your mode of execution, you may use one of the following methods to set this variable.
CLI Launcher
You may set this environment variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 in this section:
{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADS",
            "value": "1"
        }
    ]
}
Docker
You may set environment variables in the Docker container by passing the -e flag on the docker command line.
docker run -it --rm --gpus all \
-e OMP_NUM_THREADS=1 \
-v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e
Checkpointing and Resuming Training
At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth.
These are saved in train.results_dir, like so:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint will also be saved as changenet_model_segment_latest.pth.
Training will automatically resume from changenet_model_segment_latest.pth if it exists in train.results_dir.
This will be superseded by train.resume_training_checkpoint_path if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, you should either:
Specify a new, empty results directory (recommended), or
Remove the latest checkpoint from the results directory.
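For example, either of the following starts a fresh run rather than resuming; the paths are hypothetical.
# Option 1 (recommended): point training at a new, empty results directory
tao model visual_changenet train -e /path/to/train_spec.yaml results_dir=/results/train_run2
# Option 2: remove the latest checkpoint so training no longer auto-resumes
rm /results/train/changenet_model_segment_latest.pth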
Creating a Testing Experiment Spec File#
Here is an example spec file for running evaluation and inference with a trained VisualChangeNet-Segmentation model:
SPECS=$(tao-client visual_changenet get-spec --action evaluate --job_type experiment --id $EXPERIMENT_ID)
results_dir: /path/to/experiment_results
task: segment
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
evaluate:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
  results_dir: /results/evaluate
inference:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
  results_dir: /results/inference
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to evaluate or run inference with. | |
| trt_engine | string | | The path to the TensorRT engine to evaluate or run inference with. Should only be used with TAO Deploy. | |
| num_gpus | unsigned int | 1 | The number of GPUs to use. | >0 |
| gpu_ids | List[int] | [0] | The GPU IDs to use. | |
| results_dir | string | | The path to a folder where the experiment outputs should be written. | |
| vis_after_n_batches | unsigned int | 1 | The number of batches after which evaluation/inference visualization results are saved. | >0 |
Evaluating the Model#
Use the following command to run a VisualChangeNet-Segmentation evaluation:
EVALUATE_JOB_ID=$(tao-client visual_changenet experiment-run-action --action evaluate --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $TRAIN_JOB_ID)
tao model visual_changenet evaluate -e <experiment_spec>
task=segment
evaluate.checkpoint=<model to be evaluated>
[evaluate.<evaluate_option>=<evaluate_option_value>]
[evaluate.gpu_ids=<gpu indices>]
[evaluate.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required.
-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments
The following arguments are optional to run the command.
evaluate.<evaluate_option>: The evaluate options.
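For example, a full evaluation invocation might look like the following; the paths are placeholders.
tao model visual_changenet evaluate -e /path/to/test_spec.yaml \
    task=segment \
    evaluate.checkpoint=/results/train/changenet_model_segment_latest.pth \
    evaluate.num_gpus=1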
Running Inference on the Model#
Use the following command to run inference on VisualChangeNet-Segmentation with the .pth model:
INFERENCE_JOB_ID=$(tao-client visual_changenet experiment-run-action --action inference --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $TRAIN_JOB_ID)
tao model visual_changenet inference -e <experiment_spec>
task=segment
inference.checkpoint=<inference model>
[inference.<inference_option>=<inference_option_value>]
[inference.gpu_ids=<gpu indices>]
[inference.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required.
-e, --experiment_spec_file: The experiment spec file to set up the inference experiment.
inference.checkpoint: The .pth model to run inference on.
Optional Arguments
The following arguments are optional to run the command.
inference.<inference_option>: The inference options.
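For example, a full inference invocation might look like the following; the paths are placeholders.
tao model visual_changenet inference -e /path/to/test_spec.yaml \
    task=segment \
    inference.checkpoint=/results/train/changenet_model_segment_latest.pth \
    inference.results_dir=/results/inference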
Exporting the Model#
Here is an example of retrieving the spec with the FTMS Client, along with an example TAO Launcher spec file for exporting the trained VisualChangeNet model:
SPECS=$(tao-client visual_changenet get-spec --action export --job_type experiment --id $EXPERIMENT_ID)
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 256
  input_height: 256
  batch_size: -1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to export. | |
| onnx_file | string | | The path to the exported .onnx file. | |
| opset_version | unsigned int | 12 | The opset version of the exported ONNX model. | >0 |
| input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
| input_width | unsigned int | 128 | The input width. | >0 |
| input_height | unsigned int | 512 | The input height. | >0 |
| batch_size | unsigned int | -1 | The batch size of the ONNX model. If this value is set to -1, the export uses a dynamic batch size. | >=-1 |
| gpu_id | unsigned int | 0 | The GPU ID to use. | |
| on_cpu | bool | False | If set to True, the export is performed on CPU. | True, False |
| verbose | bool | False | If set to True, prints verbose logs during export. | True, False |
Use the following command to export the model:
EXPORT_JOB_ID=$(tao-client visual_changenet experiment-run-action --action export --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $TRAIN_JOB_ID)
tao model visual_changenet export [-h] -e <experiment spec file>
task=segment
export.checkpoint=<model to export>
export.onnx_file=<onnx path>
[export.<export_option>=<export_option_value>]
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec: The path to an experiment spec file.
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.
Optional Arguments
The following arguments are optional to run the command.
export.<export_option>
: The export options.
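For example, a full export invocation might look like the following; the paths are placeholders.
tao model visual_changenet export -e /path/to/export_spec.yaml \
    task=segment \
    export.checkpoint=/results/train/changenet_model_segment_latest.pth \
    export.onnx_file=/results/export/changenet_segment.onnx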
TensorRT Engine Generation, Validation, and int8 Calibration#
For deployment, refer to the TAO Deploy Documentation for VisualChangeNet-Segmentation.