Visual ChangeNet-Classification#
Visual ChangeNet-Classification is an NVIDIA-developed change detection model for classification and is included in TAO. Visual ChangeNet supports the following tasks:
* train
* evaluate
* inference
* export
* quantize
Each task is explained in detail in the following sections.
Data Input for VisualChangeNet#
Single Golden Data Format#
VisualChangeNet-Classification requires the data to be provided as image and CSV files. Refer to the Data Annotation Format page for more information about the input data format for VisualChangeNet-Classification, which follows the same input data format as Optical Inspection.
Multiple Golden Data Format#
To enable Multiple Golden mode, set num_golden > 1 in the Dataset Configuration.
This mode requires a different data format to support multiple golden reference images per sample.
Refer to the Data Annotation Format page for more information about the input data format for Multiple-Golden-VisualChangeNet-Classification.
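As a rough illustration, a Multiple Golden setup might combine the settings below. This is a minimal sketch assembled only from parameters documented on this page, not a reference configuration: the backbone choice, the value of num_golden, and all paths are illustrative assumptions.

```yaml
# Sketch only: values and paths are placeholders, not NVIDIA reference settings.
task: classify
model:
  backbone:
    type: "vit_large_nvdinov2"   # Multiple Golden mode is documented as ViT-only
dataset:
  classify:
    num_golden: 3                # any value > 1 enables the mode; 3 is arbitrary
    input_width: 224             # documented requirement for Multiple Golden mode
    input_height: 224
    input_map: null              # documented requirement: input_map = None
    train_dataset:
      csv_path: /path/to/multi_golden_train.csv   # hypothetical path
      images_dir: /path/to/img_dir                # hypothetical path
```

The dataset example on this page spells the size keys image_width and image_height, while the parameter table uses input_width and input_height, so check which spelling your TAO version accepts.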
Creating a Training Experiment Specification File#
Configuring a Custom Dataset#
This section provides an example configuration, and commands to retrieve the configuration, for training VisualChangeNet-Classification using the dataset format described above.
Note
Make sure to set task=classify in SPECS for all task specs.
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | The configuration of the model architecture. | |
| dataset | dict config | – | The configuration of the dataset. | |
| train | dict config | – | The configuration of the training task. | |
| evaluate | dict config | – | The configuration of the evaluation task. | |
| inference | dict config | – | The configuration of the inference task. | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files. | |
| results_dir | string | /results | The directory where experiment results are saved. | |
| export | dict config | – | The configuration of the ONNX export task. | |
| task | str | classify | A flag to indicate the change detection task. Supports two tasks: ‘segment’ and ‘classify’ for segmentation and classification. | classify, segment |
train#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training. | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training. | |
| seed | unsigned int | 1234 | The random seed for random, NumPy, and torch. | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment. | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved. | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run. | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint from which to resume training. | |
| results_dir | string | /results/train | The directory in which to save training results. | |
| classify | Dict<br>str<br>list | None<br>ce | The classify dict contains configurable parameters for the VisualChangeNet Classification pipeline with the following parameters:<br>* loss: The loss function used for classification training.<br>* cls_weights: Weights for Cross-Entropy Loss for unbalanced dataset distributions. | |
| segment | Dict<br>str<br>list | None<br>ce<br>[0.5, 0.5, 0.5, 0.8, 1.0] | The segment dict contains configurable parameters for the VisualChangeNet Segmentation pipeline with the following parameters:<br>* loss: The loss function used for segmentation training. | |
| num_nodes | unsigned int | 1 | The number of nodes. If larger than 1, multi-node is enabled. | |
| pretrained_model_path | string | – | The path to the pretrained model checkpoint used to initialize the end-to-end model weights. | |
| optim | dict config | None | Contains the configurable parameters for the VisualChangeNet optimizer, detailed in the optim section. | |
| tensorboard | dict config<br>bool | None<br>True | Enables TensorBoard visualization using a dict with configurable parameters:<br>* enabled: If set to True, enables TensorBoard. | |
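The training parameters above could be combined roughly as follows. This is a minimal sketch, not a recommended recipe: the epoch count, intervals, class weights, and checkpoint path are illustrative assumptions, and the one-weight-per-class layout of cls_weights is assumed rather than stated in the table.

```yaml
# Sketch only: values are placeholders chosen for illustration.
task: classify
train:
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1
  seed: 1234
  num_epochs: 30               # illustrative; the documented default is 10
  checkpoint_interval: 5       # save a checkpoint every 5 epochs
  validation_interval: 5       # run validation every 5 epochs
  results_dir: /results/train
  pretrained_model_path: /path/to/pretrained_model.pth   # hypothetical path
  classify:
    loss: "ce"
    cls_weights: [1.0, 10.0]   # assumed: one weight per class, for an unbalanced dataset
  tensorboard:
    enabled: True
```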
optim#
optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| lr | float | 0.0005 | The learning rate. | >=0.0 |
| optim | str | adamw | The optimizer. | |
| policy | str | linear | The learning rate scheduler:<br>* linear: LambdaLR decreases the lr by a multiplicative factor.<br>* step: StepLR decreases the lr by 0.1 at every num_epochs // 3 steps. | linear/step |
| momentum | float | 0.9 | The momentum for the AdamW optimizer. | |
| weight_decay | float | 0.1 | The weight decay coefficient. | |
| monitor_name | str | val_loss | The name of the monitor used for saving the top-k checkpoints. | |
Model#
The following example model config provides options to change the VisualChangeNet-Classification architecture for
training. VisualChangeNet-Classification supports two model architectures. Architecture 1
(difference_module = euclidean) leverages only the last feature maps from the FAN backbone using Euclidean
difference to perform contrastive learning. Architecture 2 (difference_module = learnable) leverages the
VisualChangeNet-Classification learnable difference modules for 4 different features at 3 feature resolutions to
minimize Cross-Entropy loss.
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
    use_summary_token: True
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | Dict<br>string<br>bool<br>bool | None<br>None<br>False<br>False | A dictionary containing the following configurable parameters for the VisualChangeNet-Classification backbone:<br>* type: The name of the backbone to be used.<br>* pretrained_backbone_path: The path to the pre-trained backbone weights file.<br>* freeze_backbone: If set to True, freezes the backbone weights during training.<br>* feat_downsample: If set to True, downsamples the last feature map in FAN backbone configurations. This parameter is not propagated to other backbones. | fan_tiny_8_p4_hybrid<br>fan_large_16_p4_hybrid<br>fan_small_12_p4_hybrid<br>fan_base_16_p4_hybrid<br>vit_large_nvdinov2<br>c_radio_p1_vit_huge_patch16_224_mlpnorm<br>c_radio_p2_vit_huge_patch16_224_mlpnorm<br>c_radio_p3_vit_huge_patch16_224_mlpnorm<br>c_radio_v2_vit_huge_patch16_224<br>c_radio_v2_vit_large_patch16_224<br>c_radio_v2_vit_base_patch16_224 |
| decode_head | Dict<br>bool<br>bool<br>list<br>Dict<br>int | None<br>False<br>True<br>[4, 8, 16, 16]<br>256 | A dictionary containing the following configurable parameters for the decoder:<br>* align_corners: If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels.<br>* use_summary_token: If set to True, uses the summary token of the backbone.<br>* feature_strides: The downsampling feature strides for different backbones.<br>* decoder_params: Contains the following network parameters:<br>– embed_dims: The embedding dimensions. | True, False<br>True, False<br>>0 |
| classify | Dict<br>string | None<br>2.0<br>5<br>30<br>learnable<br>4 | A dictionary containing the following configurable parameters for the VisualChangeNet-Classification model:<br>* train_margin_euclid: The training margin threshold for contrastive learning (applicable to Architecture 1).<br>* eval_margin: The evaluation margin threshold.<br>* embedding_vectors: The output embedding dimension for each input image before computing Euclidean distance (applicable to Architecture 1).<br>* embed_dec: The transformer decoder MLP embedding dimension (applicable to Architecture 2).<br>* difference_module: The type of difference module used (applicable to both architectures).<br>* learnable_difference_modules: The number of learnable difference modules (applicable to Architecture 2). | >0<br>>0<br>>0<br>>0<br>euclidean, learnable<br><4 |
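The example model config above selects Architecture 2. For comparison, a model section that selects Architecture 1 might look like the sketch below. It reuses only parameters from the table, and it assumes that the Architecture 2 parameters (embed_dec, learnable_difference_modules) can simply be left at their defaults when the Euclidean difference module is used.

```yaml
# Sketch only: Architecture 1 (Euclidean difference, contrastive learning).
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  classify:
    difference_module: 'euclidean'   # selects Architecture 1
    train_margin_euclid: 2.0         # training margin for contrastive learning
    eval_margin: 0.005               # evaluation margin threshold
    embedding_vectors: 5             # output embedding dimension per input image
```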
Dataset#
The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset is provided below.
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    image_width: 128
    image_height: 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
* Refer to the Dataset Annotation Format definition for more information about specifying lighting conditions.
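| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| segment | Dict | – | The dataset configuration for the segmentation task. | |
| classify | Dict | – | The dataset configuration for the classification task, detailed in the classify section. | |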
classify#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset | Dict | – | The paths to the image directory and CSV files for the training dataset. | |
| validation_dataset | Dict | – | The paths to the image directory and CSV files for the validation dataset. | |
| test_dataset | Dict | – | The paths to the image directory and CSV files for the test dataset. | |
| infer_dataset | Dict | – | The paths to the image directory and CSV files for the inference dataset. | |
| image_ext | str | .jpg | The file extension of the images in the dataset. | string |
| batch_size | int | 32 | The number of samples per batch. | >0 |
| workers | int | 8 | The number of worker processes for data loading. | |
| fpratio_sampling | float | 0.1 | The ratio of false-positive examples to sample. | >0 |
| num_input | int | 4 | The number of lighting conditions for each input image*. | >0 |
| input_map | Dict | – | The mapping of lighting conditions to indices specifying concatenation ordering*. | |
| concat_type | string | linear | The type of concatenation to use for different image lighting conditions. | linear, grid |
| grid_map | Dict<br>Dict<br>Dict | None<br>None<br>None | The parameters that define the grid dimensions used to concatenate images as a grid:<br>* x: The number of images along the x-axis.<br>* y: The number of images along the y-axis. | Dict |
| input_width | int | 100 | The width of the input image. | >0 |
| input_height | int | 100 | The height of the input image. | >0 |
| num_classes | int | 2 | The number of classes in the dataset. | >1 |
| augmentation_config | Dict | None | A dictionary containing various data augmentation settings, detailed in the augmentation_config section. | |
| num_golden | int | 1 | The number of golden images to use per input image. Setting this value greater than 1 enables Multiple Golden mode. Multiple Golden mode is only supported with ViT backbones, using input_width = input_height = 224 and input_map = None. In Multiple Golden mode, the dataset must follow the multiple golden data format. | >0 |
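The concat_type and grid_map parameters above can be illustrated with a short fragment that arranges the four lighting conditions from the earlier example as a 2 x 2 grid instead of a linear strip. This is a sketch under one assumption that the table does not state: that x multiplied by y should equal num_input.

```yaml
# Sketch only: grid concatenation of four lighting conditions.
dataset:
  classify:
    num_input: 4
    input_map:            # index = position in the concatenation order
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: grid
    grid_map:
      x: 2                # images along the x-axis
      y: 2                # images along the y-axis (assumed: x * y == num_input)
```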
augmentation_config#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| random_flip | Dict<br>float<br>float<br>bool | None<br>0.5<br>0.5<br>True | Random vertical and horizontal flipping augmentation settings.<br>* vflip_probability: The probability of vertical flipping.<br>* hflip_probability: The probability of horizontal flipping.<br>* enable: If set to True, enables random flipping augmentation. | >=0.0<br>>=0.0 |
| random_rotate | Dict<br>float<br>list<br>bool | None<br>0.5<br>[90, 180, 270]<br>True | Random rotation augmentation settings.<br>* rotate_probability: The probability of applying random rotation.<br>* angle_list: The list of rotation angles to choose from.<br>* enable: If set to True, enables random rotation augmentation. | >=0.0<br>>=0.0 |
| random_color | Dict<br>float<br>float<br>float<br>float<br>bool<br>float | None<br>0.3<br>0.3<br>0.3<br>0.3<br>True<br>0.5 | Random color augmentation settings.<br>* brightness: The maximum brightness change factor.<br>* contrast: The maximum contrast change factor.<br>* saturation: The maximum saturation change factor.<br>* hue: The maximum hue change factor.<br>* enabled: If set to True, enables random color augmentation.<br>* color_probability: The probability of applying color augmentation. | >=0.0<br>>=0.0<br>>=0.0<br>>=0.0<br>>=0.0 |
| with_random_crop | bool | True | If set to True, enables random crop augmentation. | True, False |
| with_random_blur | bool | True | If set to True, enables random blur augmentation. | True, False |
| rgb_input_mean | List[float] | [0.485, 0.456, 0.406] | The mean to be subtracted for pre-processing. | |
| rgb_input_std | List[float] | [0.229, 0.224, 0.225] | The standard deviation to divide the image by. | |
| augment | bool | False | If set to True, enables data augmentation. | True, False |
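Put together, the augmentation parameters above could be written as in the sketch below. The values simply mirror the documented defaults. Treating augment as the overall switch for augmentation is an assumption, so verify its behavior against your TAO version.

```yaml
# Sketch only: values mirror the documented defaults.
dataset:
  classify:
    augmentation_config:
      augment: True                 # assumed to toggle augmentation overall (default: False)
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enabled: True               # note: this key is "enabled", unlike "enable" above
        color_probability: 0.5
      with_random_crop: True
      with_random_blur: True
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
```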
Example Specification File for ViT Backbones#
Note
The following specification file is only relevant for TAO versions 5.3 and later.
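A minimal sketch of what a ViT-backbone specification could look like is given below. It is assembled only from parameters documented on this page and is not NVIDIA's reference file: the backbone choice, the 224 x 224 input size (inferred from the backbone name), the training values, and every path are assumptions.

```yaml
# Sketch only: not a reference specification; paths and values are placeholders.
task: classify
model:
  backbone:
    type: "c_radio_v2_vit_base_patch16_224"
    pretrained_backbone_path: /path/to/vit_backbone.pth   # hypothetical path
    freeze_backbone: False
  classify:
    difference_module: 'learnable'
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    input_width: 224      # assumed from the "patch16_224" backbone name
    input_height: 224
    num_classes: 2
train:
  num_epochs: 10
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
```

The dataset example on this page uses image_width and image_height for the size keys, so confirm which spelling your TAO version expects.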
Creating a Testing Experiment Specification File#
The following parameters configure evaluation and inference of a trained VisualChangeNet-Classification model.
Inference/Evaluate#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to use for evaluation or inference. | |
| trt_engine | string | | The path to the TensorRT engine to use for evaluation or inference. | |
| num_gpus | unsigned int | 1 | The number of GPUs to use. | >0 |
| gpu_ids | List[int] | [0] | The GPU IDs to use. | |
| results_dir | string | | The path to a folder where the experiment outputs should be written. | |
| vis_after_n_batches | unsigned int | 1 | The number of batches after which to save inference or evaluation visualization results. | >0 |
| batch_size | unsigned int | | The batch size for inference or evaluation. | |
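The parameters above might be used as in the following sketch, which configures both tasks in one file. The paths and batch sizes are hypothetical placeholders, and the fragment assumes that the evaluate and inference sections accept the same keys, as the shared table suggests.

```yaml
# Sketch only: paths and batch sizes are placeholders.
task: classify
evaluate:
  checkpoint: /path/to/model.pth        # hypothetical path; use trt_engine instead for a TensorRT engine
  num_gpus: 1                           # multi-GPU evaluation is not supported for classification
  gpu_ids: [0]
  results_dir: /results/evaluate
  vis_after_n_batches: 1
  batch_size: 16
inference:
  checkpoint: /path/to/model.pth        # hypothetical path
  num_gpus: 1
  gpu_ids: [0]
  results_dir: /results/inference
  vis_after_n_batches: 1
  batch_size: 16
dataset:
  classify:
    test_dataset:                       # assumed to be the split read by evaluate
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:                      # assumed to be the split read by inference
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
```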
Evaluating the Model#
Multi-GPU evaluation is currently not supported for Visual ChangeNet Classify.
Exporting the Model#
The following parameters configure the export of a trained VisualChangeNet model:
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to export. | |
| onnx_file | string | | The path to the ONNX file to be saved. | |
| opset_version | unsigned int | 12 | The opset version of the exported ONNX. | >0 |
| input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
| input_width | unsigned int | 128 | The input width. | >0 |
| input_height | unsigned int | 512 | The input height. | >0 |
| batch_size | int | -1 | The batch size of the ONNX model. If this value is set to -1, the export uses dynamic batch size. | >=-1 |
| gpu_id | unsigned int | 0 | The GPU ID to use. | |
| on_cpu | bool | False | If set to True, the export runs on the CPU. | |
| verbose | bool | False | If set to True, prints verbose output during export. | |
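An export section built from the parameters above might look like this sketch. The numeric values mirror the documented defaults, and both paths are hypothetical placeholders.

```yaml
# Sketch only: numeric values mirror the documented defaults.
task: classify
export:
  checkpoint: /path/to/model.pth    # hypothetical path to the trained PyTorch model
  onnx_file: /path/to/model.onnx    # hypothetical output path
  opset_version: 12
  input_channel: 3                  # only 3 is supported
  input_width: 128
  input_height: 512
  batch_size: -1                    # -1 exports with a dynamic batch size
  gpu_id: 0
  on_cpu: False
  verbose: False
```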
Quantization#
Visual ChangeNet-Classification supports PTQ via TAO Quant using either the torchao (weight-only) or modelopt (static PTQ) backends.
* Add a quantize section to your experiment specification (see the TAO Quant documentation for schema and backend options).
* Use the quantized checkpoint by setting evaluate.is_quantized: true or inference.is_quantized: true and pointing to the artifact saved under results_dir (for example, quantized_model_torchao.pth or quantized_model_modelopt.pth). For ModelOpt artifacts, the model weights are stored under model_state_dict.
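As a sketch of the second step, evaluating a torchao-quantized checkpoint might be configured as follows. The directory is a hypothetical placeholder, and the fragment assumes that the checkpoint key is what points at the quantized artifact.

```yaml
# Sketch only: the results directory is a placeholder.
task: classify
evaluate:
  is_quantized: true
  checkpoint: /path/to/results_dir/quantized_model_torchao.pth   # assumed: artifact is passed via checkpoint
```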
Notes#
* For modelopt static PTQ, ensure that your dataset configuration provides a representative calibration loader.
* For torchao, activation settings in the configuration are ignored.
Calibration Dataset (ModelOpt)#
When you use the modelopt backend (static PTQ), provide a calibration dataset via dataset.classify.quant_calibration_dataset.
Minimal example:
quantize:
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
dataset:
  classify:
    quant_calibration_dataset:
      images_dir: "/path/to/calib/images"
See also: TAO Quant overview and its Configuration and backend pages.