Visual ChangeNet-Segmentation

Visual ChangeNet-Segmentation is an NVIDIA-developed semantic change segmentation model and is included in the TAO Toolkit. Visual ChangeNet supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command line:

tao model visual_changenet <sub_task> <args_per_subtask>

where <args_per_subtask> are the command-line arguments required for a given subtask. Each subtask is explained in the following sections.
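For example, to list the options a subtask accepts, you can pass the launcher's help flag (shown as [-h] in the export usage later on this page):

tao model visual_changenet train -h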

VisualChangeNet-Segmentation requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Segmentation.

Configuring a Custom Dataset

This section provides an example configuration and commands for training VisualChangeNet-Segmentation using the LEVIR-CD dataset format described above. LEVIR-CD is a large-scale remote-sensing building change detection dataset.
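For orientation, the dataset config in the spec below implies an on-disk layout along the following lines. The folder names map to image_folder_name, change_image_folder_name, list_folder_name, and annotation_folder_name; the individual file names are illustrative only:

/path/to/root/dataset/dir/
├── A/          # pre-change images (image_folder_name)
│   └── train_1.png
├── B/          # post-change images (change_image_folder_name)
│   └── train_1.png
├── label/      # change masks (annotation_folder_name)
│   └── train_1.png
└── list/       # split files (list_folder_name)
    ├── train.csv
    └── val.csv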

Here is an example spec file for training a VisualChangeNet-Segmentation model with NVIDIA’s FAN Hybrid backbone on the LEVIR-CD dataset using the Data Annotation Format.

encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 350
  num_nodes: 1
  val_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256

| Parameter | Data Type | Default | Description |
|---|---|---|---|
| model | dict config | – | The configuration of the model architecture |
| dataset | dict config | – | The configuration for the dataset, detailed in the dataset section |
| train | dict config | – | The configuration for the training parameters, detailed in the train section |
| results_dir | string | – | The path to save the model experiment log outputs and model checkpoints |
| task | str | segment | The change detection task. Two tasks are currently supported: "segment" (segmentation) and "classify" (classification) |

train

The train dict contains the hyperparameters for the training pipeline:

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| results_dir | string | – | The path to save the model training experiment log outputs and model checkpoints |
| checkpoint_interval | int | 1 | The epoch interval at which checkpoints are saved |
| resume_training_checkpoint_path | str | None | The path to the checkpoint from which to resume training |
| segment | dict | None | The configurable parameters for the VisualChangeNet-Segmentation pipeline: loss (str, default "ce") is the loss function used for segmentation training, and weights (list, default None) are the weights for multi-scale training |
| num_nodes | unsigned int | 1 | The number of nodes. If the value is larger than 1, multi-node training is enabled |
| val_interval | unsigned int | 1 | The epoch interval at which validation is run |
| num_epochs | int | 300 | The total number of epochs to run the experiment |
| pretrained_model_path | string | – | The path to the pretrained model checkpoint used to initialize the end-to-end model weights |
| optim | dict config | None | The configurable parameters for the VisualChangeNet optimizer, detailed in the optim section |

optim

optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| lr | float | 0.0005 | The learning rate | >=0.0 |
| optim | str | adamw | The optimizer | adamw |
| policy | str | linear | The learning rate scheduler: "linear" uses LambdaLR, which decreases the learning rate by a multiplicative factor each epoch; "step" uses StepLR, which decreases the learning rate by a factor of 0.1 every num_epochs//3 epochs | linear, step |
| momentum | float | 0.9 | The momentum for the AdamW optimizer | – |
| weight_decay | float | 0.1 | The weight decay coefficient | – |
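To clarify how the two policy values behave, the following PyTorch sketch builds roughly equivalent schedulers. This is an illustrative assumption about the decay behavior described above, not TAO's internal implementation:

import torch
from torch.optim.lr_scheduler import LambdaLR, StepLR

num_epochs = 350  # matches the example spec above
params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model parameters
optimizer = torch.optim.AdamW(params, lr=0.0001, weight_decay=0.01, betas=(0.9, 0.999))

# Choose one of the two policies:
# policy "linear": LambdaLR decreases the LR by a multiplicative factor each epoch
scheduler = LambdaLR(optimizer, lr_lambda=lambda epoch: 1.0 - epoch / num_epochs)
# policy "step": StepLR multiplies the LR by 0.1 every num_epochs // 3 epochs
# scheduler = StepLR(optimizer, step_size=num_epochs // 3, gamma=0.1)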

The following example model config provides options to change the VisualChangeNet-Segmentation architecture for training.

model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | dict | None | A dictionary containing the following configurable parameters: type (string, default None) is the name of the backbone to be used; pretrained_backbone_path (string, default None) is the path to the pre-trained backbone weights file; freeze_backbone (bool, default False) specifies whether to freeze the backbone weights during training | type: fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid, vit_large_nvdinov2 (TAO 5.3 and later; see the ViT example below) |
| decode_head | dict | None | A dictionary containing the following configurable parameters: align_corner (bool, default False) and feature_strides (list, default [4, 8, 16, 16]) | align_corner: True, False |

The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is provided below.

dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| segment | dict | – | The dataset config for the segmentation dataloader, detailed in the segment section |
| classify | dict | – | The dataset config for the classification dataloader |

segment

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| dataset | str | CNDataset | The dataloader used for segmentation | CNDataset |
| root_dir | str | – | The root directory path where the dataset is located | – |
| data_name | str | LEVIR-CD | The dataset identifier | LEVIR-CD, LandSCD, custom |
| batch_size | int | 32 | The number of samples per batch | >0 |
| workers | int | 2 | The number of worker processes for data loading | >=0 |
| multi_scale_train | bool | True | Whether multi-scale training is enabled | True, False |
| multi_scale_infer | bool | False | Whether multi-scale inference is enabled | True, False |
| num_classes | int | 2 | The number of classes in the dataset | >=2 |
| img_size | int | 256 | The size of the input images after resizing | – |
| image_folder_name | str | A | The name of the folder containing the input images | – |
| change_image_folder_name | str | B | The name of the folder containing the changed images | – |
| list_folder_name | str | list | The name of the folder containing the dataset split list CSV files | – |
| annotation_folder_name | str | label | The name of the folder containing the annotation masks | – |
| train_split | str | train | The dataset split used for training; must match the name of a CSV file in list_folder_name | – |
| validation_split | str | val | The dataset split used for validation; must match the name of a CSV file in list_folder_name | – |
| test_split | str | test | The dataset split used for evaluation; must match the name of a CSV file in list_folder_name | – |
| predict_split | str | predict | The dataset split used for inference; must match the name of a CSV file in list_folder_name | – |
| label_suffix | str | .png | The suffix of the label image files | – |
| augmentation | dict | None | A dictionary containing various data augmentation settings, detailed in the augmentation section | – |
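The split files referenced by train_split, validation_split, test_split, and predict_split live in the list folder. As an illustration only (the exact format is defined on the Data Annotation Format page, and these file names are hypothetical), a list/train.csv would typically name one sample per line, with the same file name present in the A, B, and label folders:

train_1.png
train_2.png
train_3.png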

augmentation

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| random_flip | dict | None | Random vertical and horizontal flipping settings: vflip_probability (float, default 0.5) is the probability of vertical flipping; hflip_probability (float, default 0.5) is the probability of horizontal flipping; enable (bool, default True) enables or disables random flipping | probabilities: >=0.0 |
| random_rotate | dict | None | Random rotation settings: rotate_probability (float, default 0.5) is the probability of applying random rotation; angle_list (list, default [90, 180, 270]) is the list of rotation angles to choose from; enable (bool, default True) enables or disables random rotation | rotate_probability: >=0.0 |
| random_color | dict | None | Random color augmentation settings: brightness, contrast, saturation, and hue (each float, default 0.3) are the maximum change factors; enable (bool, default True) enables or disables random color augmentation | factors: >=0.0 |
| with_scale_random_crop | dict | None | Random scaling and cropping augmentation: enable (bool, default True) enables or disables it | True, False |
| with_random_crop | bool | True | Apply random crop augmentation | True, False |
| with_random_blur | bool | True | Apply random blur augmentation | True, False |
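To make the flip and color settings above concrete, here is a rough single-image torchvision equivalent. This is an illustrative sketch only, not TAO's actual implementation; in a change detection pipeline the same geometric transforms must be applied jointly to the pre-change image, post-change image, and mask, so a single-image composition like this is only an approximation:

import torchvision.transforms as T

# Rough equivalent of random_flip and random_color from the table above
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),   # hflip_probability: 0.5
    T.RandomVerticalFlip(p=0.5),     # vflip_probability: 0.5
    T.ColorJitter(brightness=0.3, contrast=0.3,
                  saturation=0.3, hue=0.3),  # random_color factors
])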

Example spec file for ViT backbones

Note

The following spec file is only relevant for TAO Toolkit versions 5.3 and later.

encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 350
  num_nodes: 1
  val_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.00002
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256

Use the following command to run VisualChangeNet-Segmentation training:

tao model visual_changenet train -e <experiment_spec_file> -r <results_dir> --gpus <num_gpus> task=segment

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

  • task: The task (‘segment’ or ‘classify’) for the visual_changenet training. Default: segment.

Optional Arguments

  • --gpus: The number of GPUs to use for training. The default value is 1.

Here’s an example of using the VisualChangeNet training command:

tao model visual_changenet train -e $DEFAULT_SPEC -r $RESULTS_DIR --gpus $NUM_GPUs

Here is an example spec file for evaluation and inference with a trained VisualChangeNet-Segmentation model:

results_dir: /path/to/experiment_results
task: segment
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
evaluate:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
inference:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | – | The path to the PyTorch model to evaluate or run inference with | – |
| vis_after_n_batches | int | – | The number of batches between each saved visualization output | – |
| trt_engine | string | – | The path to the TensorRT engine to run inference with; should only be used with TAO Deploy | – |
| num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |

Use the following command to run a VisualChangeNet-Segmentation evaluation:

tao model visual_changenet evaluate -e <experiment_spec> -r <results_dir> task=segment

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

Here’s an example of using the VisualChangeNet evaluation command:

tao model visual_changenet evaluate -e $DEFAULT_SPEC -r $RESULTS_DIR

Use the following command to run inference on VisualChangeNet-Segmentation with the .tlt model:

tao model visual_changenet inference -e <experiment_spec> -r <results_dir> task=segment

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the inference experiment.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

Here’s an example of using the VisualChangeNet inference command:

tao model visual_changenet inference -e $DEFAULT_SPEC -r $RESULTS_DIR

Here is an example spec file for exporting the trained VisualChangeNet model:

export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 256
  input_height: 256
  batch_size: -1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | – | The path to the PyTorch model to export | – |
| onnx_file | string | – | The path to the exported .onnx file | – |
| opset_version | unsigned int | 12 | The opset version of the exported ONNX model | >0 |
| input_channel | unsigned int | 3 | The input channel size; only the value 3 is supported | 3 |
| input_width | unsigned int | 256 | The input width | >0 |
| input_height | unsigned int | 256 | The input height | >0 |
| batch_size | int | -1 | The batch size of the ONNX model. If this value is set to -1, the export uses a dynamic batch size | >=-1 |
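After export, you can sanity-check the generated file with the onnx Python package. This is a minimal sketch with an illustrative path; with batch_size: -1, the batch dimension of the input should show up as dynamic:

import onnx

# Load the exported model and run the ONNX structural checker
model = onnx.load("/path/to/model.onnx")
onnx.checker.check_model(model)

# Print input tensor shapes; a dynamic batch appears as a named (string) dimension
for inp in model.graph.input:
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)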

Use the following command to export the model:

tao model visual_changenet export [-h] -e <experiment spec file> -r <results_dir> task=segment

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the export experiment.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

Sample Usage

The following is an example export command:

tao model visual_changenet export -e /path/to/spec.yaml -r $RESULTS_DIR
