Visual ChangeNet-Segmentation#

Visual ChangeNet-Segmentation is an NVIDIA-developed semantic change segmentation model that is included in TAO. It supports the following tasks:

train
evaluate
inference
export
quantize

Each task is explained in detail in the following sections.

Data Input for VisualChangeNet#

VisualChangeNet-Segmentation requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Segmentation.

Creating a Training Experiment Specification File#

Configuring a Custom Dataset#

This section provides an example configuration, and the commands to retrieve it, for training VisualChangeNet-Segmentation on a dataset that follows the format described above for the LEVIR-CD dataset. LEVIR-CD is a large-scale remote sensing dataset for building change detection.

Note

Make sure to set task=segment in SPECS for all task specs.

Parameter	Data Type	Default	Description	Supported Values
model	dict config	–	The configuration of the model architecture.
dataset	dict config	–	The configuration of the dataset.
train	dict config	–	The configuration of the training task.
evaluate	dict config	–	The configuration of the evaluation task.
inference	dict config	–	The configuration of the inference task.
encryption_key	string	None	The encryption key to encrypt and decrypt model files.
results_dir	string	/results	The directory where experiment results are saved.
export	dict config	–	The configuration of the ONNX export task.
task	str	segment	A flag that selects the change detection task. Two tasks are supported: ‘segment’ for segmentation and ‘classify’ for classification.	classify, segment
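
As a rough orientation, the top-level parameters in the table above might be laid out in a specification file as follows. This is a minimal sketch assembled from the table, not a complete or validated spec: the nested sections are left empty as placeholders, and the encryption key is commented out because its value is deployment-specific.

```yaml
# Skeleton of a top-level experiment specification (sketch only).
task: segment            # set for all segmentation task specs
results_dir: /results    # default listed in the table above
# encryption_key: <your key>   # placeholder; the default is None
model: {}                # model architecture settings
dataset: {}              # dataset settings
train: {}                # training task settings
evaluate: {}             # evaluation task settings
inference: {}            # inference task settings
export: {}               # ONNX export task settings
```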

train#

Parameter	Datatype	Default	Description	Supported Values
num_gpus	unsigned int	1	The number of GPUs to use for distributed training.	>0
gpu_ids	List[int]	[0]	The indices of the GPUs to use for distributed training.
seed	unsigned int	1234	The random seed for random, numpy, and torch.	>0
num_epochs	unsigned int	10	The total number of epochs to run the experiment.	>0
checkpoint_interval	unsigned int	1	The epoch interval at which the checkpoints are saved.	>0
validation_interval	unsigned int	1	The epoch interval at which the validation is run.	>0
resume_training_checkpoint_path	string		The intermediate PyTorch Lightning checkpoint to resume training from.
results_dir	string	/results/train	The directory to save training results.
segment	Dict str list	None ce	A dictionary of configurable parameters for the VisualChangeNet segmentation pipeline: * loss: The loss function used for segmentation training. * weights: The weights for multi-scale training.
num_nodes	unsigned int	1	The number of nodes. If the value is larger than 1, multi-node is enabled.
pretrained_model_path	string	–	The path to the pretrained model checkpoint used to initialize the end-to-end model weights.
optim	dict config	None	Contains the configurable parameters for the VisualChangeNet optimizer detailed in the optim section.
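
A minimal sketch of a train section built from the defaults in the table above. The optim values are the defaults listed in the optim section. The checkpoint path parameters and segment.weights are omitted because this page does not give values for them.

```yaml
# Sketch of a train section using the documented defaults.
train:
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1                 # values larger than 1 enable multi-node training
  seed: 1234
  num_epochs: 10
  checkpoint_interval: 1       # save a checkpoint every epoch
  validation_interval: 1       # run validation every epoch
  results_dir: /results/train
  segment:
    loss: "ce"                 # documented default loss
  optim:
    lr: 0.0005
    optim: "adamw"
    policy: "linear"
```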

optim#

optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01

Parameter	Datatype	Default	Description	Supported Values
lr	float	0.0005	The learning rate.	>=0.0
optim	str	adamw	The optimizer.
policy	str	linear	The learning rate scheduler: * linear: LambdaLR decreases the lr by a multiplicative factor. * step: StepLR decreases the lr by 0.1 at every `num_epochs // 3` steps.	linear, step
momentum	float	0.9	The momentum for the AdamW optimizer.
weight_decay	float	0.1	The weight decay coefficient.
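
To illustrate the step policy, here is a small sketch that assumes optim is nested under train, as the train table indicates. The epoch count of 30 is chosen only to make the arithmetic easy to follow and is not a recommended value.

```yaml
# Sketch: step learning rate policy (illustrative values).
train:
  num_epochs: 30
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "step"       # per the table, the lr is reduced every num_epochs // 3 = 10 steps
    weight_decay: 0.01
```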

Model#

The following example model config provides options to change the VisualChangeNet-Segmentation architecture for training.

model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corners: False
    use_summary_token: True

Parameter	Datatype	Default	Description	Supported Values
backbone	Dict string bool bool	None None False False	A dictionary containing the following configurable parameters: * type: The name of the backbone to use. * pretrained_backbone_path: The path to the pretrained backbone weights file. * freeze_backbone: If set to `True`, freezes the backbone weights during training. * feat_downsample: If set to `True`, downsamples the last feature map in FAN backbone configurations. This parameter is not propagated to other backbones.	fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, vit_large_nvdinov2, c_radio_p1_vit_huge_patch16_224_mlpnorm, c_radio_p2_vit_huge_patch16_224_mlpnorm, c_radio_p3_vit_huge_patch16_224_mlpnorm, c_radio_v2_vit_huge_patch16_224, c_radio_v2_vit_large_patch16_224, c_radio_v2_vit_base_patch16_224
decode_head	Dict bool bool list	None False True [4, 8, 16, 16]	A dictionary containing the following configurable parameters for the decoder: * align_corners: If set to `True`, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. * use_summary_token: If set to `True`, uses the summary token of the backbone. * feature_strides: The downsampling feature strides for different backbones.	True, False True, False
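
As an illustration of the backbone parameters in the table above, the following sketch initializes a FAN backbone from pretrained weights and freezes it during training. The weights path is hypothetical, and the decode_head section is left out so that its defaults apply.

```yaml
# Sketch: frozen FAN backbone initialized from pretrained weights.
model:
  backbone:
    type: "fan_base_16_p4_hybrid"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth   # hypothetical path
    freeze_backbone: True      # backbone weights are not updated during training
    feat_downsample: False     # only applies to FAN backbone configurations
```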

Dataset#

The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset configuration is provided below.

dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]

Parameter	Datatype	Default	Description	Supported Values
segment	Dict	–	The dataset configuration for the segmentation dataloader, detailed in the segment section.
classify	Dict	–	The dataset configuration for the classification dataloader.

segment#

Parameter	Datatype	Default	Description	Supported Values
dataset	str	CNDataset	The dataloader supported for segmentation.	CNDataset
root_dir	str	–	The root directory path where the dataset is located.
data_name	str	LEVIR-CD	The dataset identifier.	LEVIR-CD, LandSCD, custom
batch_size	int	32	The number of samples per batch.	>0
workers	int	2	The number of worker processes for data loading.	>=0
multi_scale_train	bool	True	If set to `True`, enables multi-scale training.	True, False
multi_scale_infer	bool	False	If set to `True`, enables multi-scale inference.	True, False
num_classes	int	2	Number of classes in the dataset.	>=2
img_size	int	256	Size of the input images after resizing.
image_folder_name	str	A	Name of the folder containing input images.
change_image_folder_name	str	B	Name of the folder containing the changed images.
list_folder_name	str	list	Name of the folder containing the CSV files that list the dataset splits.
annotation_folder_name	str	label	Name of the folder containing annotation masks.
train_split	str	train	The dataset split used for training. The value should be the name of a CSV file in list_folder_name.
validation_split	str	val	The dataset split used for validation. The value should be the name of a CSV file in list_folder_name.
test_split	str	test	The dataset split used for evaluation. The value should be the name of a CSV file in list_folder_name.
predict_split	str	predict	The dataset split used for inference. The value should be the name of a CSV file in list_folder_name.
label_suffix	str	.png	Suffix of the label image files.
augmentation	Dict	None	Dictionary containing various data augmentation settings, which is detailed in the augmentation section.
color_map	Optional[Dict[str, List[int]]]	None	Mapping of string class labels (‘0’ to ‘n’) to RGB color codes.

augmentation#

Parameter	Datatype	Default	Description	Supported Values
random_flip	Dict float float bool	None 0.5 0.5 True	Random vertical and horizontal flipping augmentation settings. * vflip_probability: Probability of vertical flipping. * hflip_probability: Probability of horizontal flipping. * enable: If set to `True`, enables random flipping augmentation.	>=0.0 >=0.0
random_rotate	Dict float list bool	None 0.5 [90, 180, 270] True	Random rotation augmentation settings. * rotate_probability: Probability of applying random rotation. * angle_list: List of rotation angles to choose from. * enable: If set to `True`, enables random rotation augmentation.	>=0.0 >=0.0
random_color	Dict float float float float bool float	None 0.3 0.3 0.3 0.3 True 0.5	Random color augmentation settings. * brightness: Maximum brightness change factor. * contrast: Maximum contrast change factor. * saturation: Maximum saturation change factor. * hue: Maximum hue change factor. * enable: If set to `True`, enables random color augmentation. * color_probability: Probability of applying color augmentation.	>=0.0 >=0.0 >=0.0 >=0.0 >=0.0
with_scale_random_crop	Dict bool	None True	Random scaling and cropping augmentation settings. * enable: If set to `True`, enables random scaling and cropping augmentation.	True, False
with_random_crop	bool	True	If set to `True`, applies random crop augmentation.	True, False
with_random_blur	bool	True	If set to `True`, applies random blurring augmentation.	True, False
mean	List[float]	[0.5, 0.5, 0.5]	The mean to be subtracted for pre-processing.
std	List[float]	[0.5, 0.5, 0.5]	The standard deviation to divide the image by.
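
The dataset example earlier on this page does not set color_probability, mean, or std. The following sketch shows where they might go, assuming that mean and std sit directly under augmentation because they are listed in the augmentation table. All values are the documented defaults.

```yaml
# Sketch: augmentation settings not shown in the dataset example (default values).
dataset:
  segment:
    augmentation:
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
        color_probability: 0.5   # probability of applying color augmentation
      mean: [0.5, 0.5, 0.5]      # subtracted during pre-processing
      std: [0.5, 0.5, 0.5]       # divisor applied during pre-processing
```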

Example specification file for ViT backbones#

Note

The following specification file is only relevant for TAO versions 5.3 and later.
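
The specification file itself is not reproduced in this section. As a minimal sketch, assuming the model section keeps the same layout as the FAN example in the Model section and only the backbone type changes, a ViT backbone could be selected as shown here. Other settings that a ViT backbone may require, such as decode_head values or image size constraints, are not shown because this page does not state them.

```yaml
# Sketch: selecting a ViT backbone from the supported values list.
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: null
    freeze_backbone: False
```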

Creating a Testing Experiment Specification File#

Here is an example specification file for evaluation and inference of a trained VisualChangeNet-Segmentation model:

Parameter	Datatype	Default	Description	Supported Values
checkpoint	string		The path to the PyTorch model to evaluate or run inference on.
trt_engine	string		The path to the TensorRT engine to evaluate or run inference on.
num_gpus	unsigned int	1	The number of GPUs to use.	>0
gpu_ids	List[int]	[0]	The GPU IDs to use.
results_dir	string		The path to a folder where the experiment outputs should be written.
vis_after_n_batches	unsigned int	1	Number of batches after which to save inference/evaluate visualization results.	>0
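
A minimal sketch of the evaluate and inference sections, assembled from the parameters in the table above. It assumes that the same parameters apply under both sections. The checkpoint and results paths are hypothetical.

```yaml
# Sketch: evaluation and inference settings (paths are hypothetical).
task: segment
evaluate:
  checkpoint: /path/to/checkpoint.pth
  num_gpus: 1
  gpu_ids: [0]
  results_dir: /results/evaluate
  vis_after_n_batches: 1     # save visualizations after every batch
inference:
  checkpoint: /path/to/checkpoint.pth
  num_gpus: 1
  gpu_ids: [0]
  results_dir: /results/inference
  vis_after_n_batches: 1
```

To run against a TensorRT engine instead, the table suggests setting trt_engine in place of checkpoint.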

Exporting the Model#

An example specification file for exporting the trained VisualChangeNet model:

Parameter	Datatype	Default	Description	Supported Values
checkpoint	string		The path to the PyTorch model to export.
onnx_file	string		The path to the `.onnx` file.
opset_version	unsigned int	12	The opset version of the exported ONNX.	>0
input_channel	unsigned int	3	The input channel size. Only the value 3 is supported.	3
input_width	unsigned int	128	The input width.	>0
input_height	unsigned int	512	The input height.	>0
batch_size	int	-1	The batch size of the ONNX model. If this value is set to -1, the export uses a dynamic batch size.	>=-1
gpu_id	unsigned int	0	The GPU ID to use.
on_cpu	bool	False	If set to `True`, exports the model on CPU.
verbose	bool	False	If set to `True`, prints a human-readable representation of the network.
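
A minimal sketch of an export section, assembled from the table above. The checkpoint and ONNX paths are hypothetical, and all other values are the documented defaults.

```yaml
# Sketch: ONNX export settings (paths are hypothetical).
task: segment
export:
  checkpoint: /path/to/checkpoint.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3       # only 3 is supported
  input_width: 128
  input_height: 512
  batch_size: -1         # -1 exports with a dynamic batch size
  gpu_id: 0
  on_cpu: False
  verbose: False
```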

Quantization#

Visual ChangeNet-Segmentation supports post-training quantization (PTQ) via TAO Quant, using either the torchao (weight-only) or modelopt (static PTQ) backend.

Add a quantize section to your experiment specification (see TAO Quant documentation for schema and backend options).
Use the quantized checkpoint by setting evaluate.is_quantized: true or inference.is_quantized: true and pointing to the artifact saved under results_dir (for example, quantized_model_torchao.pth or quantized_model_modelopt.pth). For ModelOpt artifacts, the model weights are stored under model_state_dict.
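
As a sketch of the second step, evaluation with a quantized torchao artifact might be configured as follows. It assumes that the artifact is referenced through evaluate.checkpoint, and the directory shown is a hypothetical location under results_dir.

```yaml
# Sketch: evaluating a quantized checkpoint (path is hypothetical).
evaluate:
  is_quantized: true
  checkpoint: /results/quantize/quantized_model_torchao.pth
```

The same pattern would apply to inference by setting inference.is_quantized: true.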

Notes#

For modelopt static PTQ, ensure that your dataset configuration provides a representative calibration loader.
For torchao, activation settings in the configuration are ignored.

Calibration Dataset (ModelOpt)#

When you use the modelopt backend (static PTQ), provide a calibration dataset via dataset.segment.quant_calibration_dataset.

Minimal example:

quantize:
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
dataset:
  segment:
    quant_calibration_dataset:
      images_dir: "/path/to/calib/images"

See also: TAO Quant overview and its Configuration and backend pages.