Visual ChangeNet-Segmentation
Visual ChangeNet-Segmentation is an NVIDIA-developed semantic change segmentation model included in TAO. Visual ChangeNet supports the following tasks:
* train
* evaluate
* inference
* export
* quantize
Each task is explained in detail in the following sections.
Data Input for VisualChangeNet
VisualChangeNet-Segmentation requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Segmentation.
Creating a Training Experiment Specification File
Configuring a Custom Dataset
This section provides an example configuration, and the commands to retrieve it, for training VisualChangeNet-Segmentation on data in the LEVIR-CD dataset format referenced above. LEVIR-CD is a large-scale remote sensing dataset for building change detection.
Note
Make sure to set task=segment in the experiment specification for every task.
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | The configuration of the model architecture. | |
| dataset | dict config | – | The configuration of the dataset. | |
| train | dict config | – | The configuration of the training task. | |
| evaluate | dict config | – | The configuration of the evaluation task. | |
| inference | dict config | – | The configuration of the inference task. | |
| encryption_key | string | None | The encryption key used to encrypt and decrypt model files. | |
| results_dir | string | /results | The directory where experiment results are saved. | |
| export | dict config | – | The configuration of the ONNX export task. | |
| task | string | segment | The change detection task: segment for segmentation or classify for classification. | segment, classify |
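The following minimal YAML skeleton sketches the top-level structure of a specification file. It uses only the keys listed in the table above. Each dict config is left empty ({}) here for illustration and is expanded in the sections that follow.

```yaml
# Minimal skeleton of a VisualChangeNet-Segmentation experiment spec (illustrative).
results_dir: /results
encryption_key: null   # key used to encrypt/decrypt model files; default None
task: segment          # set to 'segment' in every task spec
model: {}              # model architecture (see Model)
dataset: {}            # dataset source, augmentation, pre-processing (see Dataset)
train: {}              # training task (see train)
evaluate: {}           # evaluation task
inference: {}          # inference task
export: {}             # ONNX export task
```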
train
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training. | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training. | |
| seed | unsigned int | 1234 | The random seed for random, numpy, and torch. | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment. | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which checkpoints are saved. | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which validation is run. | >0 |
| resume_training_checkpoint_path | string | – | The intermediate PyTorch Lightning checkpoint to resume training from. | |
| results_dir | string | /results/train | The directory in which to save training results. | |
| segment | Dict | None | The configurable parameters for the VisualChangeNet-Segmentation training pipeline, listed in the two rows below. | |
| segment.loss | str | ce | The loss function used for segmentation training. | |
| segment.weights | list | – | The weights for multi-scale training. | |
| num_nodes | unsigned int | 1 | The number of nodes. Multi-node training is enabled when the value is greater than 1. | |
| pretrained_model_path | string | – | The path to the pretrained model checkpoint used to initialize the end-to-end model weights. | |
| optim | dict config | None | The configurable parameters for the VisualChangeNet optimizer, detailed in the optim section. | |
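The following sketch collects the train parameters above into a single YAML fragment. It uses the documented defaults wherever the table lists them. Two details are assumptions for illustration, not documented defaults: the segment.weights values, and the use of null to leave the checkpoint paths unset.

```yaml
train:
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1
  seed: 1234
  num_epochs: 10
  checkpoint_interval: 1                  # save a checkpoint every epoch
  validation_interval: 1                  # run validation every epoch
  results_dir: /results/train
  resume_training_checkpoint_path: null   # set to a Lightning checkpoint to resume
  pretrained_model_path: null             # optional end-to-end initialization
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]    # illustrative multi-scale weights (assumption)
  optim:
    lr: 0.0005
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.1
```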
optim
```yaml
optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01
```
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| lr | float | 0.0005 | The learning rate. | >=0.0 |
| optim | str | adamw | The optimizer. | |
| policy | str | linear | The learning rate scheduler. linear: LambdaLR decreases the learning rate by a multiplicative factor. step: StepLR decreases the learning rate by a factor of 0.1 every num_epochs // 3 steps. | linear, step |
| momentum | float | 0.9 | The momentum for the AdamW optimizer. | |
| weight_decay | float | 0.1 | The weight decay coefficient. | |
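The following fragment is a worked example of the step policy. It sets num_epochs: 30 and assumes that the scheduler steps once per epoch, which is an assumption for this illustration. Because num_epochs // 3 = 10, the learning rate is multiplied by 0.1 every 10 epochs.

```yaml
train:
  num_epochs: 30
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "step"    # StepLR: lr *= 0.1 every num_epochs // 3 = 10 epochs
    momentum: 0.9
    weight_decay: 0.01
# Resulting learning rate, assuming one scheduler step per epoch:
#   epochs  0-9  : 1e-4
#   epochs 10-19 : 1e-5
#   epochs 20-29 : 1e-6
```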
Model
The following example model config provides options to change the VisualChangeNet-Segmentation architecture for training.
```yaml
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
    use_summary_token: True
```
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | Dict | None | A dictionary containing the backbone parameters listed in the rows below. | |
| backbone.type | string | – | The name of the backbone to use. | fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, vit_large_nvdinov2, c_radio_p1_vit_huge_patch16_224_mlpnorm, c_radio_p2_vit_huge_patch16_224_mlpnorm, c_radio_p3_vit_huge_patch16_224_mlpnorm, c_radio_v2_vit_huge_patch16_224, c_radio_v2_vit_large_patch16_224, c_radio_v2_vit_base_patch16_224 |
| backbone.pretrained_backbone_path | string | None | The path to the pretrained backbone weights file. | |
| backbone.freeze_backbone | bool | False | If set to True, freezes the backbone weights during training. | |
| backbone.feat_downsample | bool | False | If set to True, downsamples the last feature map in FAN backbone configurations. This parameter is not propagated to other backbones. | |
| decode_head | Dict | None | A dictionary containing the decoder parameters listed in the rows below. | |
| decode_head.align_corner | bool | False | If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. | True, False |
| decode_head.use_summary_token | bool | True | If set to True, uses the summary token of the backbone. | True, False |
| decode_head.feature_strides | list | [4, 8, 16, 16] | The downsampling feature strides for different backbones. | |
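The following hypothetical variant of the model configuration switches to a larger FAN backbone. It loads pretrained backbone weights and freezes them during training. The weights path is a placeholder.

```yaml
model:
  backbone:
    type: "fan_base_16_p4_hybrid"
    pretrained_backbone_path: /path/to/pretrained_backbone.pth  # placeholder path
    freeze_backbone: True     # keep backbone weights fixed during training
    feat_downsample: False    # FAN backbones only; not propagated to other backbones
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
    use_summary_token: True
```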
Dataset
The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset configuration is provided below.
```yaml
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]
```
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| segment | Dict | – | The dataset configuration for the segmentation dataloader, detailed in the segment section. | |
| classify | Dict | – | The dataset configuration for the classification dataloader. | |
segment
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| dataset | str | CNDataset | The dataloader used for segmentation. | CNDataset |
| root_dir | str | – | The root directory path where the dataset is located. | |
| data_name | str | LEVIR-CD | The dataset identifier. | LEVIR-CD, LandSCD, custom |
| batch_size | int | 32 | The number of samples per batch. | >0 |
| workers | int | 2 | The number of worker processes for data loading. | >=0 |
| multi_scale_train | bool | True | If set to True, enables multi-scale training. | True, False |
| multi_scale_infer | bool | False | If set to True, enables multi-scale inference. | True, False |
| num_classes | int | 2 | The number of classes in the dataset. | >=2 |
| img_size | int | 256 | The size of the input images after resizing. | |
| image_folder_name | str | A | The name of the folder containing the input images. | |
| change_image_folder_name | str | B | The name of the folder containing the changed images. | |
| list_folder_name | str | list | The name of the folder containing the CSV files that list each dataset split. | |
| annotation_folder_name | str | label | The name of the folder containing the annotation masks. | |
| train_split | str | train | The dataset split used for training. Must match the name of a CSV file in list_folder_name. | |
| validation_split | str | val | The dataset split used for validation. Must match the name of a CSV file in list_folder_name. | |
| test_split | str | test | The dataset split used for evaluation. Must match the name of a CSV file in list_folder_name. | |
| predict_split | str | predict | The dataset split used for inference. Must match the name of a CSV file in list_folder_name. | |
| label_suffix | str | .png | The file suffix of the label images. | |
| augmentation | Dict | None | The data augmentation settings, detailed in the augmentation section. | |
| color_map | Optional[Dict[str, List[int]]] | None | A mapping from string class labels ('0' to 'n') to RGB color codes. | |
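For a dataset other than LEVIR-CD, the folder and split keys above determine where the dataloader looks for files. The following sketch assumes a hypothetical three-class custom dataset and assumes that each split CSV file is named <split>.csv. The colors in color_map are placeholders.

```yaml
# Assumed layout under root_dir (folder names come from the *_folder_name keys,
# CSV names from the *_split keys):
#   /path/to/custom_dataset/
#     A/      input images
#     B/      changed images
#     label/  annotation masks (*.png)
#     list/   train.csv, val.csv, test.csv, predict.csv
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/custom_dataset/
    data_name: "custom"
    num_classes: 3                 # hypothetical class count
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    annotation_folder_name: "label"
    list_folder_name: "list"
    train_split: "train"
    validation_split: "val"
    test_split: "test"
    predict_split: "predict"
    label_suffix: .png
    color_map:                     # one RGB entry per class label, '0' to 'n'
      '0': [0, 0, 0]
      '1': [255, 0, 0]
      '2': [0, 255, 0]
```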
augmentation
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| random_flip | Dict | None | The random vertical and horizontal flipping settings, listed in the rows below. | |
| random_flip.vflip_probability | float | 0.5 | The probability of vertical flipping. | >=0.0 |
| random_flip.hflip_probability | float | 0.5 | The probability of horizontal flipping. | >=0.0 |
| random_flip.enable | bool | True | If set to True, enables random flipping augmentation. | |
| random_rotate | Dict | None | The random rotation settings, listed in the rows below. | |
| random_rotate.rotate_probability | float | 0.5 | The probability of applying random rotation. | >=0.0 |
| random_rotate.angle_list | list | [90, 180, 270] | The list of rotation angles to choose from. | >=0.0 |
| random_rotate.enable | bool | True | If set to True, enables random rotation augmentation. | |
| random_color | Dict | None | The random color settings, listed in the rows below. | |
| random_color.brightness | float | 0.3 | The maximum brightness change factor. | >=0.0 |
| random_color.contrast | float | 0.3 | The maximum contrast change factor. | >=0.0 |
| random_color.saturation | float | 0.3 | The maximum saturation change factor. | >=0.0 |
| random_color.hue | float | 0.3 | The maximum hue change factor. | >=0.0 |
| random_color.enable | bool | True | If set to True, enables random color augmentation. | |
| random_color.color_probability | float | 0.5 | The probability of applying color augmentation. | >=0.0 |
| with_scale_random_crop | Dict | None | The random scaling and cropping settings. | |
| with_scale_random_crop.enable | bool | True | If set to True, enables random scaling and cropping augmentation. | True, False |
| with_random_crop | bool | True | If set to True, enables random cropping. | True, False |
| with_random_blur | bool | True | If set to True, enables random blurring. | True, False |
| mean | List[float] | [0.5, 0.5, 0.5] | The mean to subtract during pre-processing. | |
| std | List[float] | [0.5, 0.5, 0.5] | The standard deviation to divide the image by during pre-processing. | |
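The following fragment illustrates how the augmentation keys above can be combined. In this example it turns off rotation and blurring and applies milder color jitter. The specific values are illustrative, not recommendations. If pixel values are first scaled to [0, 1] (an assumption), the default mean and std of 0.5 map each channel to [-1, 1], because (0 - 0.5) / 0.5 = -1 and (1 - 0.5) / 0.5 = 1.

```yaml
dataset:
  segment:
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        enable: False          # disable random rotation
      random_color:
        brightness: 0.1        # milder jitter than the 0.3 defaults
        contrast: 0.1
        saturation: 0.1
        hue: 0.1
        color_probability: 0.5
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: False  # disable random blurring
      mean: [0.5, 0.5, 0.5]    # (x - mean) / std
      std: [0.5, 0.5, 0.5]
```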
Example specification file for ViT backbones
Note
The following specification file is only relevant for TAO versions 5.3 and later.
Creating a Testing Experiment Specification File
The specification file for evaluating and running inference with a trained VisualChangeNet-Segmentation model uses the following parameters:
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | – | The path to the PyTorch model to use for evaluation or inference. | |
| trt_engine | string | – | The path to the TensorRT engine to use for evaluation or inference. | |
| num_gpus | unsigned int | 1 | The number of GPUs to use. | >0 |
| gpu_ids | List[int] | [0] | The GPU IDs to use. | |
| results_dir | string | – | The path to the folder where experiment outputs are written. | |
| vis_after_n_batches | unsigned int | 1 | The number of batches after which evaluation or inference visualization results are saved. | >0 |
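A minimal evaluation and inference specification might look like the following sketch. It assumes the parameters above are placed under the evaluate and inference keys from the top-level table. All paths are placeholders.

```yaml
task: segment
evaluate:
  checkpoint: /results/train/model.pth         # placeholder checkpoint path
  num_gpus: 1
  gpu_ids: [0]
  results_dir: /results/evaluate
  vis_after_n_batches: 1
inference:
  checkpoint: /results/train/model.pth         # placeholder checkpoint path
  # trt_engine: /results/export/model.engine   # alternatively, a TensorRT engine
  num_gpus: 1
  gpu_ids: [0]
  results_dir: /results/inference
  vis_after_n_batches: 1
```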
Exporting the Model
The specification file for exporting a trained VisualChangeNet model uses the following parameters:
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | – | The path to the PyTorch model to export. | |
| onnx_file | string | – | The path to the exported ONNX file. | |
| opset_version | unsigned int | 12 | The opset version of the exported ONNX model. | >0 |
| input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
| input_width | unsigned int | 128 | The input width. | >0 |
| input_height | unsigned int | 512 | The input height. | >0 |
| batch_size | int | -1 | The batch size of the ONNX model. If set to -1, the export uses a dynamic batch size. | >=-1 |
| gpu_id | unsigned int | 0 | The GPU ID to use. | |
| on_cpu | bool | False | If set to True, runs the export on the CPU. | |
| verbose | bool | False | If set to True, prints verbose logs during export. | |
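A minimal export specification might look like the following sketch, which assumes the parameters above are placed under the export key. The paths are placeholders. Setting the input size to match dataset.segment.img_size (256 in the dataset example) is an assumption, not a documented requirement.

```yaml
task: segment
export:
  checkpoint: /results/train/model.pth          # placeholder
  onnx_file: /results/export/changenet.onnx     # placeholder
  opset_version: 12
  input_channel: 3        # only 3 is supported
  input_width: 256        # assumed to match dataset.segment.img_size
  input_height: 256
  batch_size: -1          # -1 exports with a dynamic batch dimension
  gpu_id: 0
  on_cpu: False
  verbose: False
```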
Quantization
Visual ChangeNet-Segmentation supports post-training quantization (PTQ) through TAO Quant, using either the torchao (weight-only) or the modelopt (static PTQ) backend.
* Add a quantize section to your experiment specification. See the TAO Quant documentation for the schema and backend options.
* Use the quantized checkpoint by setting evaluate.is_quantized: true or inference.is_quantized: true and pointing to the artifact saved under results_dir (for example, quantized_model_torchao.pth or quantized_model_modelopt.pth). For ModelOpt artifacts, the model weights are stored under model_state_dict.
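The following sketch shows one way to evaluate a ModelOpt-quantized checkpoint. It assumes that the artifact is referenced through evaluate.checkpoint and that the quantization results_dir is /results/quantize. Both are assumptions for illustration.

```yaml
task: segment
evaluate:
  is_quantized: true
  # Assumed location; use the results_dir of your quantize run.
  checkpoint: /results/quantize/quantized_model_modelopt.pth
```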
Notes
* For modelopt static PTQ, ensure that your dataset configuration provides a representative calibration loader.
* For torchao, activation settings in the configuration are ignored.
Calibration Dataset (ModelOpt)
When you use the modelopt backend (static PTQ), provide a calibration dataset via dataset.segment.quant_calibration_dataset.
Minimal example:
```yaml
quantize:
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
dataset:
  segment:
    quant_calibration_dataset:
      images_dir: "/path/to/calib/images"
```
See also: TAO Quant overview and its Configuration and backend pages.