Visual ChangeNet-Classification
Date: 2025-07-18
Visual ChangeNet-Classification is an NVIDIA-developed change detection model for classification and is included in TAO. Visual ChangeNet supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked from the TAO Launcher using the following convention on the command-line:
tao model visual_changenet <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
VisualChangeNet-Classification requires the data to be provided as image and CSV files. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Classification, which follows the same input data format as Optical Inspection.
Configuring a Custom Dataset
This section provides example configuration and commands for training VisualChangeNet-Classification using the dataset format described above.
Here is an example spec file for training a VisualChangeNet-Classification model with NVIDIA’s FAN Hybrid backbone using the Data Annotation Format.
encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "ce"
    cls_weight: [1.0, 10.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    image_width: 128
    image_height: 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 128
  input_height: 512
| Parameter | Datatype | Default | Description | Supported Values |
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| export | dict config | – | The configuration of the ONNX export task | |
| task | str | classify | A flag to indicate the change detection task. Supports two tasks: ‘segment’ and ‘classify’, for segmentation and classification respectively | segment, classify |
| Parameter | Datatype | Default | Description | Supported Values |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, NumPy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
|
Dict
str
|
None
ce
|
The
*
|
|
|
Dict
str
|
None
ce
|
The
*
|
|
| num_nodes | unsigned int | 1 | The number of nodes. If the value is larger than 1, multi-node training is enabled | |
| pretrained_model_path | string | – | The path to the pretrained model checkpoint used to initialize the end-to-end model weights | |
| optim | dict config | None | Contains the configurable parameters for the VisualChangeNet optimizer, detailed in the optim section | |
| tensorboard | dict config | None | Enables TensorBoard visualization using a dict with configurable parameters | |
optim
optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01
| Parameter | Datatype | Default | Description | Supported Values |
| lr | float | 0.0005 | The learning rate | >=0.0 |
| optim | str | adamw | | |
| policy | str | linear | The learning rate scheduler | linear/step |
| momentum | float | 0.9 | The momentum for the AdamW optimizer | |
| weight_decay | float | 0.1 | The weight decay coefficient | |
| monitor_name | str | val_loss | The name of the monitor used for saving the top-k checkpoints | |
The following example model config provides options to change the VisualChangeNet-Classification architecture for training.
VisualChangeNet-Classification supports two model architectures. Architecture 1 leverages only the last feature maps from the FAN backbone, using Euclidean difference to perform contrastive learning. Architecture 2 leverages the VisualChangeNet-Classification learnable difference modules for 4 different features at 3 feature resolutions to minimize cross-entropy loss.
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
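To make the role of train_margin_euclid in Architecture 1 more concrete, here is a minimal sketch of a textbook margin-based contrastive loss over a Euclidean distance. It is not taken from the TAO source and may differ from what TAO implements: the function names, the label convention (0 = no change, 1 = change), and the eval_margin decision rule are all assumptions made for illustration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(distance, label, margin=2.0):
    """Textbook contrastive loss for a single pair of embeddings.

    Assumed label convention: 0 = no change (pull the pair together),
    1 = change (push the pair apart until it clears the margin).
    """
    if label == 0:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def predict_change(distance, eval_margin=0.005):
    """Assumed decision rule: report a change when distance exceeds eval_margin."""
    return distance > eval_margin
```

With this formulation, a "change" pair that is already farther apart than the margin contributes no loss, which is why the margin used for training can be much larger than the threshold used at evaluation time.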
|Parameter
|Datatype
|Default
|Description
|Supported Values
|
|
Dict
bool
|
None
None
|
A dictionary containing the following configurable parameters for VisualChangeNet-Classification backbone:
*
|
fan_tiny_8_p4_hybrid
|
|
Dict
bool
|
None
False
256
|
A dictionary containing the following configurable parameters for the decoder:
*
*
|
True, False
|
|
Dict
|
None
5
|
A dictionary containing the following configurable parameters for VisualChangeNet-Classification model:
|
>0
The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is provided below.
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    image_width: 128
    image_height: 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
* See the Dataset Annotation Format definition for more information about specifying lighting conditions.
| Parameter | Datatype | Default | Description | Supported Values |
| segment | Dict | – | Contains the dataset config for the segmentation dataloader | |
| classify | Dict | – | Contains the dataset config for the classification dataloader, detailed in the classify section | |
classify
| Parameter | Datatype | Default | Description | Supported Values |
| train_dataset | Dict | – | The paths to the image directory and CSV files for the training dataset | |
| validation_dataset | Dict | – | The paths to the image directory and CSV files for the validation dataset | |
| test_dataset | Dict | – | The paths to the image directory and CSV files for the test dataset | |
| infer_dataset | Dict | – | The paths to the image directory and CSV files for the inference dataset | |
| image_ext | str | .jpg | The file extension of the images in the dataset | string |
| batch_size | int | 32 | The number of samples per batch | >0 |
| workers | int | 8 | The number of worker processes for data loading | |
| fpratio_sampling | float | 0.1 | The ratio of false-positive examples to sample | >0 |
| num_input | int | 4 | The number of lighting conditions for each input image* | >0 |
| input_map | Dict | – | The mapping of lighting conditions to indices specifying concatenation ordering* | |
| concat_type | string | linear | The type of concatenation to use for different image lighting conditions | linear, grid |
| grid_map | Dict | None | The parameters that define the grid dimensions used to concatenate images as a grid. x: the number of images along the x-axis. y: the number of images along the y-axis | |
| input_width | int | 100 | The width of the input image | >0 |
| input_height | int | 100 | The height of the input image | >0 |
| num_classes | int | 2 | The number of classes in the dataset | >1 |
| augmentation_config | Dict | None | A dictionary containing various data augmentation settings, detailed in the augmentation section | |
augmentation_config
| Parameter | Datatype | Default | Description | Supported Values |
| | Dict | None | Random vertical and horizontal flipping augmentation settings | >=0.0 |
| | Dict | None | Randomly rotate images with specified probability and angles | >=0.0 |
| | Dict | None | Apply random color augmentation to images | >=0.0 |
| with_random_crop | bool | True | Apply random crop augmentation | True, False |
| with_random_blur | bool | True | Apply random blurring augmentation | True, False |
| rgb_input_mean | List[float] | [0.485, 0.456, 0.406] | The mean to be subtracted for pre-processing | |
| rgb_input_std | List[float] | [0.229, 0.224, 0.225] | The standard deviation to divide the image by | |
| augment | bool | False | Flag to indicate whether to apply data augmentations | True, False |
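As an illustration of how rgb_input_mean and rgb_input_std are typically applied, the sketch below normalizes a single RGB pixel. It assumes 8-bit pixel values are first scaled to [0, 1], which is the usual convention for these particular statistics but is not stated on this page; the function name is hypothetical.

```python
# Default values from the augmentation_config table.
RGB_INPUT_MEAN = [0.485, 0.456, 0.406]
RGB_INPUT_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=RGB_INPUT_MEAN, std=RGB_INPUT_STD):
    """Normalize one RGB pixel channel by channel.

    Assumption: inputs are 8-bit values (0-255) that are scaled to [0, 1]
    before the mean is subtracted and the result is divided by the std.
    """
    return [((value / 255.0) - m) / s for value, m, s in zip(rgb, mean, std)]


# A black pixel maps to a negative value in each channel.
print(normalize_pixel([0, 0, 0]))
```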
Example Spec File for ViT Backbones
The following spec file is only relevant for TAO versions 5.3 and later.
encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "contrastive"
    cls_weight: [1.0, 10.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'euclidean'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: grid
    grid_map:
      x: 2
      y: 2
    image_width: 112
    image_height: 112
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 224
  input_height: 224
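The export sizes in the two example specs on this page are consistent with the per-image size multiplied out by the concatenation layout: four 128x128 inputs with concat_type linear give 128x512, and four 112x112 inputs in a 2x2 grid give 224x224. The sketch below reproduces that arithmetic. The stacking axis for linear is inferred from those numbers and is not a documented rule, and the function name is hypothetical.

```python
def concatenated_input_size(image_width, image_height, num_input,
                            concat_type="linear", grid_map=None):
    """Return (width, height) of the combined model input.

    Assumptions: "linear" stacks the num_input images along the height axis,
    and "grid" tiles them into grid_map["x"] columns by grid_map["y"] rows.
    """
    if concat_type == "linear":
        return image_width, image_height * num_input
    if concat_type == "grid":
        if grid_map is None:
            raise ValueError("grid_map is required when concat_type is 'grid'")
        cols, rows = grid_map["x"], grid_map["y"]
        if cols * rows < num_input:
            raise ValueError("grid is too small for num_input images")
        return image_width * cols, image_height * rows
    raise ValueError(f"unsupported concat_type: {concat_type}")


# FAN example spec: linear concatenation of four 128x128 images.
print(concatenated_input_size(128, 128, 4, "linear"))
# ViT example spec: 2x2 grid of 112x112 images.
print(concatenated_input_size(112, 112, 4, "grid", {"x": 2, "y": 2}))
```

A helper like this can serve as a quick consistency check between dataset.classify and export.input_width / export.input_height before running an export.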
Use the following command to run VisualChangeNet-Classification training:
tao model visual_changenet train [-h] -e <experiment_spec>
    task=classify
    [results_dir=<global_results_dir>]
    [model.<model_option>=<model_option_value>]
    [dataset.<dataset_option>=<dataset_option_value>]
    [train.<train_option>=<train_option_value>]
    [train.gpu_ids=<gpu indices>]
    [train.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
task: The task (‘segment’ or ‘classify’) for the visual_changenet training. Default: segment.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options
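When many overrides are involved, it can help to assemble the command programmatically. The following is a small sketch that builds the argument list following the convention shown above. The helper name, the spec path, and the chosen override values are hypothetical; only the `tao model visual_changenet <sub_task> -e <spec> task=classify key=value` shape comes from this page.

```python
import shlex


def build_tao_command(sub_task, spec_file, overrides=None, task="classify"):
    """Build the argument list for a `tao model visual_changenet` call."""
    command = ["tao", "model", "visual_changenet", sub_task,
               "-e", spec_file, f"task={task}"]
    # Each override becomes a key=value argument, e.g. train.num_epochs=20.
    for key, value in (overrides or {}).items():
        command.append(f"{key}={value}")
    return command


command = build_tao_command(
    "train",
    "/path/to/spec.yaml",  # hypothetical spec path
    {"train.num_epochs": 20, "train.optim.lr": 0.0001},
)
# Print a shell-ready string; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```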
For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], they are modified to follow the setting with more GPUs; in this example, num_gpus = 1 becomes num_gpus = 2.
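The reconciliation rule described above can be sketched as follows. This is an illustration of the stated behavior, not TAO's implementation. The function name is hypothetical, and the case where num_gpus is larger than the gpu_ids list is not specified on this page, so the fallback used here (the first num_gpus device indices) is an assumption.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Return a consistent (num_gpus, gpu_ids) pair, preferring the larger setting."""
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if num_gpus == len(gpu_ids):
        return num_gpus, gpu_ids
    if len(gpu_ids) > num_gpus:
        # Documented example: num_gpus = 1, gpu_ids = [0, 1] -> num_gpus = 2.
        return len(gpu_ids), gpu_ids
    # Assumption: when num_gpus is larger, use the first num_gpus device indices.
    return num_gpus, list(range(num_gpus))
```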
Checkpointing and Resuming Training
At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. These checkpoints are saved in train.results_dir, like so:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint is also saved as changenet_model_classify_latest.pth. Training automatically resumes from changenet_model_classify_latest.pth if it exists in train.results_dir. This is superseded by train.resume_training_checkpoint_path, if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, you must do one of the following:
Specify a new, empty results directory (Recommended)
Remove the latest checkpoint from the results directory
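The resume precedence described in this section can be summarized in a few lines. The sketch below is illustrative only: the function name is hypothetical and it mirrors the rules stated above rather than TAO's actual code.

```python
import os

LATEST_CHECKPOINT = "changenet_model_classify_latest.pth"


def resolve_resume_checkpoint(results_dir, resume_training_checkpoint_path=None):
    """Return the checkpoint training would resume from, or None for a fresh run."""
    # An explicit train.resume_training_checkpoint_path takes precedence.
    if resume_training_checkpoint_path:
        return resume_training_checkpoint_path
    # Otherwise, fall back to the latest checkpoint in train.results_dir.
    latest = os.path.join(results_dir, LATEST_CHECKPOINT)
    if os.path.isfile(latest):
        return latest
    return None
```

Running a check like this against an existing results directory shows whether a new run would silently resume instead of starting from scratch.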
Here is an example spec file for testing evaluation and inference of a trained VisualChangeNet-Classification model.
results_dir: /path/to/experiment_results
task: classify
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
  classify:
    eval_margin: 0.005
dataset:
  classify:
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: /path/to/checkpoint
  results_dir: /results/evaluate
inference:
  checkpoint: /path/to/checkpoint
  results_dir: /results/inference
Inference/Evaluate
| Parameter | Datatype | Default | Description | Supported Values |
| checkpoint | string | | The path to the PyTorch model to evaluate or run inference on | |
| trt_engine | string | | The path to the TensorRT model to evaluate or run inference on. Should only be used with TAO Deploy | |
| num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
| gpu_ids | List[int] | [0] | The GPU IDs to use | |
| results_dir | string | | The path to a folder where the experiment outputs should be written | |
| vis_after_n_batches | unsigned int | 1 | The number of batches after which to save inference/evaluation visualization results | >0 |
| batch_size | unsigned int | | The batch size for inference/evaluation | |
Use the following command to run VisualChangeNet-Classification evaluation:
tao model visual_changenet evaluate [-h] -e <experiment_spec_file>
    task=classify
    evaluate.checkpoint=<model to be evaluated>
    [evaluate.<evaluate_option>=<evaluate_option_value>]
    [evaluate.gpu_ids=<gpu indices>]
    [evaluate.num_gpus=<number of gpus>]
Multi-GPU evaluation is currently not supported for Visual ChangeNet Classify.
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments
evaluate.<evaluate_option>: The evaluate options.
Use the following command to run inference on VisualChangeNet-Classification with the .pth model:
tao model visual_changenet inference [-h] -e <experiment_spec_file>
    task=classify
    inference.checkpoint=<inference model>
    [inference.<inference_option>=<inference_option_value>]
    [inference.gpu_ids=<gpu indices>]
    [inference.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the inference experiment.
inference.checkpoint: The .pth model to run inference on.
Optional Arguments
inference.<inference_option>: The inference options.
Here is an example spec file for exporting the trained VisualChangeNet model:
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 128
  input_height: 512
  batch_size: -1
| Parameter | Datatype | Default | Description | Supported Values |
| checkpoint | string | | The path to the PyTorch model to export | |
| onnx_file | string | | The path to the .onnx file | |
| opset_version | unsigned int | 12 | The opset version of the exported ONNX | >0 |
| input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported | 3 |
| input_width | unsigned int | 128 | The input width | >0 |
| input_height | unsigned int | 512 | The input height | >0 |
| batch_size | int | -1 | The batch size of the ONNX model. If this value is set to -1, the export uses dynamic batch size | >=-1 |
| gpu_id | unsigned int | 0 | The GPU ID to use | |
| on_cpu | bool | False | Whether to export on CPU | |
| verbose | bool | False | Print out a human-readable representation of the network | |
Use the following command to export the model:
tao model visual_changenet export [-h] -e <experiment spec file>
    task=classify
    export.checkpoint=<model to export>
    export.onnx_file=<onnx path>
    [export.<export_option>=<export_option_value>]
Required Arguments
-e, --experiment_spec: The path to an experiment spec file
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.
Optional Arguments
export.<export_option>: The export options.
For deployment, refer to the TAO Deploy Documentation for VisualChangeNet-Classification.