Visual ChangeNet-Segmentation#

Visual ChangeNet-Segmentation is an NVIDIA-developed semantic change segmentation model included in TAO. Visual ChangeNet supports the following tasks:

  • train

  • evaluate

  • inference

  • export

Each task is explained in detail in the following sections.

Note

  • Throughout this documentation are references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.

    • For instructions on creating a dataset using the remote client, refer to the Creating a dataset section in the Remote Client documentation.

    • For instructions on creating an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.

  • The spec format is YAML for TAO Launcher, and JSON for FTMS Client.

  • File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher, not for FTMS Client.

Data Input for VisualChangeNet#

VisualChangeNet-Segmentation requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Segmentation.

Creating a Training Experiment Spec File#

Configuring a Custom Dataset#

This section provides an example configuration, along with the commands to retrieve it, for training VisualChangeNet-Segmentation on the LEVIR-CD dataset using the dataset format described above. LEVIR-CD is a large-scale remote sensing dataset for building change detection.

Note

Make sure to set task=segment in SPECS for all task specs.

BASE_EXPERIMENT_ID=$(tao visual_changenet list-base-experiments | jq -r '.[0].id')
SPECS=$(tao visual_changenet get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')
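As a minimal sketch of the note above, the task field could be set on the retrieved schema before a job is submitted. This assumes that SPECS holds a JSON object with a top-level task key, mirroring the YAML spec in this section. The helper name set_task is hypothetical and is not part of the TAO client.

```python
import json


def set_task(specs_json: str, task: str = "segment") -> str:
    """Return a copy of the JSON spec string with the top-level task set."""
    if task not in ("segment", "classify"):
        raise ValueError(f"unsupported task: {task!r}")
    specs = json.loads(specs_json)
    specs["task"] = task
    return json.dumps(specs)


if __name__ == "__main__":
    # Illustrative stand-in for the value captured in $SPECS.
    example = '{"task": "classify", "train": {"num_epochs": 10}}'
    print(set_task(example))
```

The same edit can be made with any JSON tool; the only point is that task must be set to segment in every spec that is submitted.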

Here is an example spec file for training a VisualChangeNet-Segmentation model with NVIDIA’s FAN Hybrid backbone on the LEVIR-CD dataset using the Data Annotation Format.

encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    use_summary_token: True
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256

| Parameter | Data Type | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| model | dict config | | The configuration of the model architecture. | |
| dataset | dict config | | The configuration of the dataset. | |
| train | dict config | | The configuration of the training task. | |
| evaluate | dict config | | The configuration of the evaluation task. | |
| inference | dict config | | The configuration of the inference task. | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files. | |
| results_dir | string | /results | The directory where experiment results are saved. | |
| export | dict config | | The configuration of the ONNX export task. | |
| task | str | segment | The change detection task to run: ‘segment’ for segmentation or ‘classify’ for classification. | classify, segment |

train#

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training. | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training. | |
| seed | unsigned int | 1234 | The random seed for random, numpy, and torch. | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment. | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved. | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run. | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from. | |
| results_dir | string | /results/train | The directory to save training results. | |
| segment | Dict | None | Contains the configurable parameters for the VisualChangeNet-Segmentation pipeline, listed in the following rows. | |
| segment.loss | str | ce | The loss function used for segmentation training. | |
| segment.weights | list | | Weights for multi-scale training. | |
| num_nodes | unsigned int | 1 | The number of nodes. If the value is larger than 1, multi-node training is enabled. | |
| pretrained_model_path | string | | The path to the pretrained model checkpoint used to initialize the end-to-end model weights. | |
| optim | dict config | None | Contains the configurable parameters for the VisualChangeNet optimizer, detailed in the optim section. | |

optim#

optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| lr | float | 0.0005 | The learning rate. | >=0.0 |
| optim | str | adamw | The optimizer. | |
| policy | str | linear | The learning rate scheduler. linear: LambdaLR decreases the lr by a multiplicative factor. step: StepLR decreases the lr by a factor of 0.1 every num_epochs // 3 steps. | linear/step |
| momentum | float | 0.9 | The momentum for the AdamW optimizer. | |
| weight_decay | float | 0.1 | The weight decay coefficient. | |
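The two policy values can be pictured with a small pure-Python sketch. The step function follows the description above (multiply by 0.1 every num_epochs // 3 epochs). The exact multiplicative factor used by the linear policy is not stated in this document, so the linear_lr formula below is an assumption chosen only to illustrate a LambdaLR-style decay; both function names are hypothetical.

```python
def step_lr(base_lr: float, epoch: int, num_epochs: int, gamma: float = 0.1) -> float:
    """StepLR-style schedule: multiply lr by gamma every num_epochs // 3 epochs."""
    step_size = max(1, num_epochs // 3)  # guard against a zero step for tiny runs
    return base_lr * gamma ** (epoch // step_size)


def linear_lr(base_lr: float, epoch: int, num_epochs: int) -> float:
    """Assumed linear decay: the factor falls from 1.0 toward 0 over the run."""
    factor = 1.0 - epoch / float(num_epochs + 1)
    return base_lr * factor


if __name__ == "__main__":
    for epoch in range(0, 10, 3):
        print(epoch, step_lr(1e-4, epoch, 9), linear_lr(1e-4, epoch, 9))
```

Printing both schedules side by side for a short run is a quick way to see how early the step policy drops the learning rate compared with a gradual decay.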

Model#

The following example model config provides options to change the VisualChangeNet-Segmentation architecture for training.

model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corners: False
    use_summary_token: True

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| backbone | Dict | None | A dictionary containing the configurable backbone parameters listed in the following rows. | |
| backbone.type | string | | The name of the backbone to be used. | fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, vit_large_nvdinov2, c_radio_p1_vit_huge_patch16_224_mlpnorm, c_radio_p2_vit_huge_patch16_224_mlpnorm, c_radio_p3_vit_huge_patch16_224_mlpnorm, c_radio_v2_vit_huge_patch16_224, c_radio_v2_vit_large_patch16_224, c_radio_v2_vit_base_patch16_224 |
| backbone.pretrained_backbone_path | string | None | The path to the pretrained backbone weights file. | |
| backbone.freeze_backbone | bool | False | If set to True, freezes the backbone weights during training. | |
| backbone.feat_downsample | bool | False | If set to True, downsamples the last feature map in FAN backbone configurations. This parameter is not propagated to other backbones. | |
| decode_head | Dict | None | A dictionary containing the configurable decoder parameters listed in the following rows. | |
| decode_head.align_corners | bool | False | If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. | True, False |
| decode_head.use_summary_token | bool | True | If set to True, uses the summary token of the backbone. | True, False |
| decode_head.feature_strides | list | [4, 8, 16, 16] | The downsampling feature strides for different backbones. | |

Dataset#

The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset is provided below.

dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]
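Before launching a job, it can help to confirm that the folders named in the dataset config actually exist under root_dir. The following is a minimal sketch, not part of TAO: the function name is hypothetical, and the .csv extension for split files is an assumption based on the parameter descriptions in this section. The Data Annotation Format page remains the authoritative description of the layout.

```python
import tempfile
from pathlib import Path


def missing_dataset_paths(root_dir, image_folder_name="A", change_image_folder_name="B",
                          list_folder_name="list", annotation_folder_name="label",
                          splits=("train", "val"), list_ext=".csv"):
    """Return the expected folders and split files that do not exist under root_dir."""
    root = Path(root_dir)
    expected = [root / image_folder_name, root / change_image_folder_name,
                root / list_folder_name, root / annotation_folder_name]
    # One list file per split, named after the split (extension is an assumption).
    expected += [root / list_folder_name / f"{split}{list_ext}" for split in splits]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "A").mkdir()
        # Everything except the "A" folder is reported as missing.
        for path in missing_dataset_paths(tmp):
            print("missing:", path)
```

An empty return value means every folder and split file named in the config was found.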

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| segment | Dict | | The segment contains dataset config for the segmentation dataloader detailed in the segment section. | |
| classify | Dict | | The classify contains dataset config for the classification dataloader. | |

segment#

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| dataset | Dict | CNDataset | The dataloader supported for segmentation. | CNDataset |
| root_dir | str | | The root directory path where the dataset is located. | |
| data_name | str | LEVIR-CD | The dataset identifier. | LEVIR-CD, LandSCD, custom |
| batch_size | int | 32 | The number of samples per batch. | >0 |
| workers | int | 2 | The number of worker processes for data loading. | >=0 |
| multi_scale_train | bool | True | If set to True, enables multi-scale training. | True, False |
| multi_scale_infer | bool | False | If set to True, enables multi-scale inference. | True, False |
| num_classes | int | 2 | The number of classes in the dataset. | >=2 |
| img_size | int | 256 | The size of the input images after resizing. | |
| image_folder_name | str | A | The name of the folder containing input images. | |
| change_image_folder_name | str | B | The name of the folder containing the changed images. | |
| list_folder_name | str | list | The name of the folder containing the CSV files that list each dataset split. | |
| annotation_folder_name | str | label | The name of the folder containing annotation masks. | |
| train_split | str | train | The dataset split used for training. Should match the name of the CSV file in list_folder_name. | |
| validation_split | str | val | The dataset split used for validation. Should match the name of the CSV file in list_folder_name. | |
| test_split | str | test | The dataset split used for evaluation. Should match the name of the CSV file in list_folder_name. | |
| predict_split | str | predict | The dataset split used for inference. Should match the name of the CSV file in list_folder_name. | |
| label_suffix | str | .png | The suffix of the label image files. | |
| augmentation | Dict | None | A dictionary containing the data augmentation settings, detailed in the augmentation section. | |
| color_map | Optional[Dict[str, List[int]]] | None | A mapping of string class labels (‘0’ to ‘n’) to RGB color codes. | |
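To make the color_map convention concrete, the sketch below turns a small mask of integer class indices into RGB triples using string keys, as in the example dataset config. How TAO applies color_map internally is not described here, so this only illustrates the key and value format; colorize_mask is a hypothetical helper.

```python
def colorize_mask(mask, color_map):
    """Map a 2D list of integer class indices to a 2D list of RGB triples."""
    # Keys in color_map are strings ('0', '1', ...), so indices are converted first.
    return [[tuple(color_map[str(cls)]) for cls in row] for row in mask]


if __name__ == "__main__":
    color_map = {"0": [255, 255, 255], "1": [0, 0, 0]}
    mask = [[0, 1], [1, 0]]
    for row in colorize_mask(mask, color_map):
        print(row)
```

A class index without a matching key raises KeyError, which is a convenient way to catch a color_map that does not cover all num_classes labels.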

augmentation#

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| random_flip | Dict | None | Random vertical and horizontal flipping augmentation settings. | |
| random_flip.vflip_probability | float | 0.5 | Probability of vertical flipping. | >=0.0 |
| random_flip.hflip_probability | float | 0.5 | Probability of horizontal flipping. | >=0.0 |
| random_flip.enable | bool | True | If set to True, enables random flipping augmentation. | |
| random_rotate | Dict | None | Random rotation augmentation settings. | |
| random_rotate.rotate_probability | float | 0.5 | Probability of applying random rotation. | >=0.0 |
| random_rotate.angle_list | list | [90, 180, 270] | List of rotation angles to choose from. | >=0.0 |
| random_rotate.enable | bool | True | If set to True, enables random rotation augmentation. | |
| random_color | Dict | None | Random color augmentation settings. | |
| random_color.brightness | float | 0.3 | Maximum brightness change factor. | >=0.0 |
| random_color.contrast | float | 0.3 | Maximum contrast change factor. | >=0.0 |
| random_color.saturation | float | 0.3 | Maximum saturation change factor. | >=0.0 |
| random_color.hue | float | 0.3 | Maximum hue change factor. | >=0.0 |
| random_color.enable | bool | True | If set to True, enables random color augmentation. | |
| random_color.color_probability | float | 0.5 | Probability of applying color augmentation. | >=0.0 |
| with_scale_random_crop | Dict | None | Random scaling and cropping augmentation settings. | |
| with_scale_random_crop.enable | bool | True | If set to True, enables random scaling and cropping augmentation. | True, False |
| with_random_crop | bool | True | If set to True, applies random crop augmentation. | True, False |
| with_random_blur | bool | True | If set to True, applies random blurring augmentation. | True, False |
| mean | List[float] | [0.5, 0.5, 0.5] | The mean to be subtracted for pre-processing. | |
| std | List[float] | [0.5, 0.5, 0.5] | The standard deviation to divide the image by. | |
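The effect of mean and std can be shown on a single pixel. The sketch below assumes that 8-bit pixel values are first scaled to [0, 1] before the mean is subtracted and the result divided by std; that scaling step is an assumption rather than something stated in this document, and normalize_pixel is a hypothetical helper.

```python
def normalize_pixel(rgb, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """Scale 8-bit RGB to [0, 1], then subtract mean and divide by std per channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))


if __name__ == "__main__":
    # With the default mean and std, black and white map to the ends of [-1, 1].
    print(normalize_pixel((0, 0, 0)))
    print(normalize_pixel((255, 255, 255)))
```

Under these assumptions, the default values of 0.5 map the input range onto [-1, 1].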

Example spec file for ViT backbones#

Note

The following spec file is only relevant for TAO versions 5.3 and later.

BASE_EXPERIMENT_ID=$(tao visual_changenet list-base-experiments | jq -r '.[0].id')
SPECS=$(tao visual_changenet get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

encryption_key: tlt_encode
task: segment
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 350
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.00002
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
    betas: [0.9, 0.999]
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
    use_summary_token: True
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  checkpoint: "???"
  vis_after_n_batches: 10
inference:
  checkpoint: "???"
  vis_after_n_batches: 1
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 256
  input_height: 256

Training the Model#

Use the following command to run VisualChangeNet-Segmentation training:

TRAIN_JOB_ID=$(tao visual_changenet create-job \
  --kind experiment \
  --name "visual_changenet_train" \
  --action train \
  --workspace-id $WORKSPACE_ID \
  --specs "$TRAIN_SPECS" \
  --train-datasets '["'$DATASET_ID'"]' \
  --eval-dataset "$DATASET_ID" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

tao model visual_changenet train -e <experiment_spec_file>
                      task=segment
                      [results_dir=<global_results_dir>]
                      [model.<model_option>=<model_option_value>]
                      [dataset.<dataset_option>=<dataset_option_value>]
                      [train.<train_option>=<train_option_value>]
                      [train.gpu_ids=<gpu indices>]
                      [train.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • task: The task (‘segment’ or ‘classify’) for the visual_changenet training. Default: segment.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.
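Conceptually, each override of the form section.option=value replaces one entry in the nested spec. The following sketch illustrates that idea on a plain dictionary. It is not the launcher's actual parser: apply_override is a hypothetical helper, and values are kept as strings rather than converted to their real types.

```python
def apply_override(spec: dict, override: str) -> dict:
    """Apply one 'dotted.key=value' override to a nested dict in place."""
    dotted_key, _, raw_value = override.partition("=")
    keys = dotted_key.split(".")
    node = spec
    for key in keys[:-1]:
        # Create intermediate sections on demand.
        node = node.setdefault(key, {})
    node[keys[-1]] = raw_value  # kept as a string; real parsers also convert types
    return spec


if __name__ == "__main__":
    spec = {"train": {"num_epochs": 10}, "results_dir": "/results"}
    apply_override(spec, "train.num_epochs=20")
    apply_override(spec, "model.backbone.type=fan_small_12_p4_hybrid")
    print(spec)
```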

Note

For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed, but are inconsistent, for example num_gpus = 1, gpu_ids = [0, 1], then they are modified to follow the setting that implies more GPUs; in the same example num_gpus is modified from 1 to 2.
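The reconciliation rule described above can be sketched as follows. The case from the text (num_gpus = 1, gpu_ids = [0, 1]) is covered directly. What happens to gpu_ids when num_gpus is the larger setting is not spelled out here, so expanding it to range(num_gpus) is an assumption, and reconcile_gpus is a hypothetical name.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=(0,)):
    """Return (num_gpus, gpu_ids) adjusted to the setting that implies more GPUs."""
    gpu_ids = list(gpu_ids)
    if num_gpus > len(gpu_ids):
        # Assumption: fill gpu_ids with the first num_gpus device indices.
        gpu_ids = list(range(num_gpus))
    else:
        num_gpus = len(gpu_ids)
    return num_gpus, gpu_ids


if __name__ == "__main__":
    print(reconcile_gpus(1, [0, 1]))
    print(reconcile_gpus(4, [0]))
```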

In some cases, multi-GPU training may result in a segmentation fault. You can work around this by setting the environment variable OMP_NUM_THREADS to 1. Depending on your mode of execution, you can use one of the following methods to set this variable:

  • CLI Launcher:

    You can set the environment variable by adding the following entry to the Envs field of your ~/.tao_mounts.json file, as described in bullet 3 of the section Running the launcher.

    {
        "Envs": [
            {
                "variable": "OMP_NUM_THREADS",
                "value": "1"
            }
        ]
    }

  • Docker:

    You may set environment variables in Docker by setting the -e flag in the Docker command line.

    docker run -it --rm --gpus all \
    -e OMP_NUM_THREADS=1 \
    -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e

Checkpointing and Resuming Training

Every train.checkpoint_interval epochs, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved. Checkpoints are saved in train.results_dir, like this:

$ ls /results/train

'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'

The latest checkpoint is also saved as changenet_model_segment_latest.pth. Training automatically resumes from changenet_model_segment_latest.pth if it exists in train.results_dir, unless train.resume_training_checkpoint_path is provided, in which case that checkpoint takes precedence.

As a result, to start a fresh training run from scratch, do one of the following:

  • Specify a new, empty results directory (Recommended), or

  • Remove the latest checkpoint from the results directory
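The resume precedence described in this section can be summarized in a few lines. This is a sketch of the documented behavior only, not TAO's implementation; resolve_resume_checkpoint is a hypothetical helper, and returning None stands for training from scratch.

```python
import tempfile
from pathlib import Path
from typing import Optional

LATEST_NAME = "changenet_model_segment_latest.pth"


def resolve_resume_checkpoint(results_dir, resume_training_checkpoint_path=None) -> Optional[str]:
    """Return the checkpoint to resume from, or None to start fresh."""
    # An explicit resume path takes precedence over the latest checkpoint.
    if resume_training_checkpoint_path:
        return str(resume_training_checkpoint_path)
    latest = Path(results_dir) / LATEST_NAME
    return str(latest) if latest.exists() else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(resolve_resume_checkpoint(tmp))  # empty directory: fresh training
        (Path(tmp) / LATEST_NAME).write_text("")
        print(resolve_resume_checkpoint(tmp))  # latest checkpoint is picked up
```

This is why pointing results_dir at a new, empty directory is the simplest way to guarantee a fresh run.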

Creating a Testing Experiment Spec File#

Here is an example spec file for evaluation and inference with a trained VisualChangeNet-Segmentation model:

BASE_EXPERIMENT_ID=$(tao visual_changenet list-base-experiments | jq -r '.[0].id')
SPECS=$(tao visual_changenet get-job-schema --action evaluate --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

results_dir: /path/to/experiment_results
task: segment
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /path/to/root/dataset/dir/
    data_name: "LEVIR-CD"
    label_transform: "norm"
    batch_size: 16
    workers: 2
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: "B"
    list_folder_name: 'list'
    annotation_folder_name: "label"
    test_split: "test"
    predict_split: 'predict'
    label_suffix: .png
evaluate:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
  results_dir: /results/evaluate
inference:
  checkpoint: /path/to/checkpoint
  vis_after_n_batches: 1
  results_dir: /results/inference

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| checkpoint | string | | The path to the PyTorch model to use for evaluation or inference. | |
| trt_engine | string | | The path to the TensorRT engine to use for inference or evaluation. Use only with TAO Deploy. | |
| num_gpus | unsigned int | 1 | The number of GPUs to use. | >0 |
| gpu_ids | List[int] | [0] | The GPU IDs to use. | |
| results_dir | string | | The path to a folder where the experiment outputs should be written. | |
| vis_after_n_batches | unsigned int | 1 | The number of batches after which to save inference or evaluation visualization results. | >0 |

Evaluating the Model#

Use the following command to run a VisualChangeNet-Segmentation evaluation:

EVALUATE_JOB_ID=$(tao visual_changenet create-job \
  --kind experiment \
  --name "visual_changenet_evaluate" \
  --action evaluate \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $TRAIN_JOB_ID \
  --eval-dataset "$DATASET_ID" \
  --specs "$EVALUATE_SPECS" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

tao model visual_changenet evaluate -e <experiment_spec>
                       task=segment
                       evaluate.checkpoint=<model to be evaluated>
                       [evaluate.<evaluate_option>=<evaluate_option_value>]
                       [evaluate.gpu_ids=<gpu indices>]
                       [evaluate.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.

  • evaluate.checkpoint: The .pth model to be evaluated.

Optional Arguments

The following arguments are optional to run the command.

Running Inference on the Model#

Use the following command to run inference on VisualChangeNet-Segmentation with the .pth model:

INFERENCE_JOB_ID=$(tao visual_changenet create-job \
  --kind experiment \
  --name "visual_changenet_inference" \
  --action inference \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $TRAIN_JOB_ID \
  --inference-dataset "$DATASET_ID" \
  --specs "$INFERENCE_SPECS" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

tao model visual_changenet inference -e <experiment_spec>
                       task=segment
                       inference.checkpoint=<inference model>
                       [inference.<inference_option>=<inference_option_value>]
                       [inference.gpu_ids=<gpu indices>]
                       [inference.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

  • -e, --experiment_spec_file: The experiment spec file to set up the inference experiment.

  • inference.checkpoint: The .pth model to run inference on.

Optional Arguments

The following arguments are optional to run the command.

Exporting the Model#

The following shows how to retrieve the export spec with the FTMS client, followed by an example TAO Launcher spec file for exporting a trained VisualChangeNet model:

BASE_EXPERIMENT_ID=$(tao visual_changenet list-base-experiments | jq -r '.[0].id')
SPECS=$(tao visual_changenet get-job-schema --action export --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 256
  input_height: 256
  batch_size: -1

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| checkpoint | string | | The path to the PyTorch model to export. | |
| onnx_file | string | | The path to the .onnx file. | |
| opset_version | unsigned int | 12 | The opset version of the exported ONNX. | >0 |
| input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
| input_width | unsigned int | 128 | The input width. | >0 |
| input_height | unsigned int | 512 | The input height. | >0 |
| batch_size | unsigned int | -1 | The batch size of the ONNX model. If this value is set to -1, the export uses dynamic batch size. | >=-1 |
| gpu_id | unsigned int | 0 | The GPU ID to use. | |
| on_cpu | bool | False | If set to True, exports the model on CPU. | |
| verbose | bool | False | If set to True, prints a human-readable representation of the network. | |

Use the following command to export the model:

EXPORT_JOB_ID=$(tao visual_changenet create-job \
  --kind experiment \
  --name "visual_changenet_export" \
  --action export \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $TRAIN_JOB_ID \
  --specs "$EXPORT_SPECS" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

tao model visual_changenet export [-h] -e <experiment spec file>
                          task=segment
                          export.checkpoint=<model to export>
                          export.onnx_file=<onnx path>
                          [export.<export_option>=<export_option_value>]

Required Arguments

The following arguments are required to run the command.

  • -e, --experiment_spec: The path to an experiment spec file

  • export.checkpoint: The .pth model to export.

  • export.onnx_file: The path where the .etlt or .onnx model is saved.

Optional Arguments

The following arguments are optional to run the command.

TensorRT Engine Generation, Validation, and int8 Calibration#

For deployment, refer to the TAO Deploy Documentation for VisualChangeNet-Segmentation.