Visual ChangeNet-Classification#
Visual ChangeNet-Classification is an NVIDIA-developed change-detection classification model included in TAO. Visual ChangeNet supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked from the TAO Launcher using the following convention on the command-line:
tao model visual_changenet <sub_task> <args_per_subtask>
where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
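For example, following this convention, a training run with a spec file at a placeholder path is launched as:

tao model visual_changenet train -e /path/to/experiment.yaml task=classify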
Data Input for VisualChangeNet#
VisualChangeNet-Classification requires the data to be provided as images and CSV files. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Classification, which follows the same input data format as Optical Inspection.
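For illustration only (TAO does not require any particular directory names), the CSV files and the image directory referenced by the dataset section of the spec might be organized as follows:

/workspace/data/
    images/       # dataset.classify.<split>_dataset.images_dir
    train.csv     # dataset.classify.train_dataset.csv_path
    val.csv       # dataset.classify.validation_dataset.csv_path
    test.csv      # dataset.classify.test_dataset.csv_path
    infer.csv     # dataset.classify.infer_dataset.csv_path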
Creating a Training Experiment Spec File#
Configuring a Custom Dataset#
This section provides example configuration and commands for training VisualChangeNet-Classification using the dataset format described above.
Here is an example spec file for training a VisualChangeNet-Classification model with NVIDIA’s FAN Hybrid backbone using the Data Annotation Format.
encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "ce"
    cls_weight: [1.0, 10.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    image_width: 128
    image_height: 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 128
  input_height: 512
| Parameter | Data Type | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| export | dict config | – | The configuration of the ONNX export task | |
| task | str | classify | A flag to indicate the change detection task. Supports two tasks: 'segment' and 'classify' for segmentation and classification, respectively | segment, classify |
train#
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, NumPy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| classify | Dict | None | Configurable parameters for the VisualChangeNet-Classification pipeline: loss (str, default "ce"): the loss function used for classification training; cls_weights (list): the class weights for the cross-entropy loss, useful for unbalanced dataset distributions | |
| segment | Dict | None | Configurable parameters for the VisualChangeNet-Segmentation pipeline: loss (str, default "ce"): the loss function used for segmentation training; weights (list, default [0.5, 0.5, 0.5, 0.8, 1.0]): the weights used for calculating the multi-scale segmentation loss during training when multi_scale_train is "True" | |
| num_nodes | unsigned int | 1 | The number of nodes. If the value is larger than 1, multi-node training is enabled. | |
| pretrained_model_path | string | – | The path to the pretrained model checkpoint used to initialize the end-to-end model weights | |
| optim | dict config | None | The configurable parameters for the VisualChangeNet optimizer, detailed in the optim section | |
| tensorboard | dict config | None | Enable TensorBoard visualization using a dict with configurable parameters: enabled (bool, default True): flag to enable TensorBoard | |
optim#
optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| lr | float | 0.0005 | The learning rate | >=0.0 |
| optim | str | adamw | The optimizer to use | |
| policy | str | linear | The learning-rate scheduler: linear (LambdaLR) decreases the lr by a multiplicative factor; step (StepLR) decreases the lr by a factor of 0.1 every num_epochs // 3 epochs | linear, step |
| momentum | float | 0.9 | The momentum for the AdamW optimizer | |
| weight_decay | float | 0.1 | The weight decay coefficient | |
| monitor_name | str | val_loss | The name of the monitor used for saving the top-k checkpoints | |
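These optimizer options can also be overridden from the command line instead of editing the spec file. The values below are illustrative only:

tao model visual_changenet train -e /path/to/experiment.yaml \
    train.optim.lr=0.0005 \
    train.optim.policy=step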
Model#
The following example model config provides options to change the VisualChangeNet-Classification architecture for training.
VisualChangeNet-Classification supports two model architectures. Architecture 1 leverages only the last feature map from the FAN backbone and uses a Euclidean difference to perform contrastive learning. Architecture 2 leverages the VisualChangeNet-Classification learnable difference modules on 4 different features at 3 feature resolutions to minimize a cross-entropy loss.
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| backbone | Dict | None | A dictionary of configurable parameters for the VisualChangeNet-Classification backbone: type (string, default None): the name of the backbone to use; pretrained_backbone_path (string, default None): the path to the pre-trained backbone weights file; freeze_backbone (bool, default False): whether to freeze the backbone weights during training; feat_downsample (bool, default False): whether to downsample the last feature map in the FAN backbone configurations | type: fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, vit_large_nvdinov2 |
| decode_head | Dict | None | A dictionary of configurable parameters for the decoder: align_corners (bool, default False): if set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels; feature_strides (list, default [4, 8, 16, 16]): the downsampling feature strides for the different backbones; decoder_params (Dict, embed_dims default 256): contains embed_dims, the embedding dimensions | align_corners: True, False |
| classify | Dict | None | A dictionary of configurable parameters for the VisualChangeNet-Classification model: train_margin_euclid (float, default 2.0, >0): the training margin threshold for contrastive learning (Architecture 1); eval_margin (float, default 0.005, >0): the evaluation margin threshold; embedding_vectors (int, default 5, >0): the output embedding dimension for each input image before computing the Euclidean distance (Architecture 1); embed_dec (int, default 30, >0): the transformer decoder MLP embedding dimension (Architecture 2); difference_module (string, default learnable): the type of difference module used (both architectures); learnable_difference_modules (int, default 4, <4): the number of learnable difference modules (Architecture 2) | difference_module: euclidean, learnable |
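For example, switching between the two architectures only requires changing the model.classify block (together with the matching training loss). The following sketches reuse values from the example specs on this page:

# Architecture 1: Euclidean difference module with contrastive learning
model:
  classify:
    difference_module: 'euclidean'
    embedding_vectors: 5
    train_margin_euclid: 2.0
    eval_margin: 0.005

# Architecture 2: learnable difference modules trained with cross-entropy loss
model:
  classify:
    difference_module: 'learnable'
    learnable_difference_modules: 4
    embed_dec: 30
    eval_margin: 0.005

In the example specs, train.classify.loss is set to "contrastive" when the euclidean difference module is used and to "ce" when the learnable difference modules are used.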
Dataset#
The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is provided below.
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    image_width: 128
    image_height: 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
* See the Dataset Annotation Format definition for more information about specifying lighting conditions.
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| classify | Dict | – | The dataset configuration for the classification task, detailed in the classify section below | |
| segment | Dict | – | The dataset configuration for the segmentation task | |
classify#
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| train_dataset | Dict | – | The paths to the image directory and CSV file for the training dataset | |
| validation_dataset | Dict | – | The paths to the image directory and CSV file for the validation dataset | |
| test_dataset | Dict | – | The paths to the image directory and CSV file for the test dataset | |
| infer_dataset | Dict | – | The paths to the image directory and CSV file for the inference dataset | |
| image_ext | str | .jpg | The file extension of the images in the dataset | string |
| batch_size | int | 32 | The number of samples per batch | >0 |
| workers | int | 8 | The number of worker processes for data loading | >=0 |
| fpratio_sampling | float | 0.1 | The ratio of false-positive examples to sample | >0 |
| num_input | int | 4 | The number of lighting conditions for each input image* | >0 |
| input_map | Dict | – | The mapping of lighting conditions to indices specifying the concatenation ordering* | |
| concat_type | string | linear | The type of concatenation to use for the different image lighting conditions | linear, grid |
| grid_map | Dict | None | The parameters defining the grid dimensions used to concatenate the images as a grid: x (int): the number of images along the x-axis; y (int): the number of images along the y-axis | |
| image_width | int | 100 | The width of each input image | >0 |
| image_height | int | 100 | The height of each input image | >0 |
| num_classes | int | 2 | The number of classes in the dataset | >1 |
| augmentation_config | Dict | None | The data augmentation settings, detailed in the augmentation_config section | |
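The two example specs on this page suggest how concat_type interacts with the image dimensions (this is an inference from the example values, not a definitive description of the implementation): the FAN spec concatenates four 128 x 128 lighting conditions linearly and exports a 128 x 512 input, while the ViT spec tiles them in a 2 x 2 grid of 112 x 112 images and exports a 224 x 224 input:

# Linear concatenation (FAN example): four 128 x 128 lighting conditions
# stacked into a single 128 x 512 network input
concat_type: linear
image_width: 128
image_height: 128

# Grid concatenation (ViT example): a 2 x 2 grid of 112 x 112 images
# forming a 224 x 224 network input
concat_type: grid
grid_map:
  x: 2
  y: 2
image_width: 112
image_height: 112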
augmentation_config#
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| random_flip | Dict | None | Random vertical and horizontal flipping augmentation settings: vflip_probability (float, default 0.5, >=0.0): probability of vertical flipping; hflip_probability (float, default 0.5, >=0.0): probability of horizontal flipping; enable (bool, default True): enable or disable random flipping augmentation | |
| random_rotate | Dict | None | Random rotation augmentation settings: rotate_probability (float, default 0.5, >=0.0): probability of applying random rotation; angle_list (list, default [90, 180, 270]): list of rotation angles to choose from; enable (bool, default True): enable or disable random rotation augmentation | |
| random_color | Dict | None | Random color augmentation settings: brightness (float, default 0.3, >=0.0): maximum brightness change factor; contrast (float, default 0.3, >=0.0): maximum contrast change factor; saturation (float, default 0.3, >=0.0): maximum saturation change factor; hue (float, default 0.3, >=0.0): maximum hue change factor; enabled (bool, default True): enable or disable random color augmentation; color_probability (float, default 0.5, >=0.0): probability of applying color augmentation | |
| with_random_crop | bool | True | Apply random crop augmentation | True, False |
| with_random_blur | bool | True | Apply random blurring augmentation | True, False |
| rgb_input_mean | List[float] | [0.485, 0.456, 0.406] | The per-channel mean to be subtracted from the images for pre-processing | |
| rgb_input_std | List[float] | [0.229, 0.224, 0.225] | The per-channel standard deviation to divide the images by | |
| augment | bool | False | Flag to indicate whether to apply data augmentations | True, False |
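Putting these parameters together, an augmentation_config block that explicitly enables flipping and rotation might look like the following sketch (the values are illustrative; the parameter names follow the table above):

augmentation_config:
  augment: True
  random_flip:
    vflip_probability: 0.5
    hflip_probability: 0.5
    enable: True
  random_rotate:
    rotate_probability: 0.5
    angle_list: [90, 180, 270]
    enable: True
  rgb_input_mean: [0.485, 0.456, 0.406]
  rgb_input_std: [0.229, 0.224, 0.225]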
Example Spec File for ViT Backbones#
Note
The following spec file is only relevant for TAO versions 5.3 and later.
encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "contrastive"
    cls_weight: [1.0, 10.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'euclidean'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: grid
    grid_map:
      x: 2
      y: 2
    image_width: 112
    image_height: 112
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 224
  input_height: 224
Training the Model#
Use the following command to run VisualChangeNet-Classification training:
tao model visual_changenet train [-h] -e <experiment_spec>
                                 task=classify
                                 [results_dir=<global_results_dir>]
                                 [model.<model_option>=<model_option_value>]
                                 [dataset.<dataset_option>=<dataset_option_value>]
                                 [train.<train_option>=<train_option_value>]
                                 [train.gpu_ids=<gpu indices>]
                                 [train.num_gpus=<number of gpus>]
Required Arguments#
-e, --experiment_spec_file: The path to the experiment spec file.
task: The task ('segment' or 'classify') for the visual_changenet training. Default: segment.
Optional Arguments#
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
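For example, a concrete training run with a few overrides might look like the following (the paths and values are placeholders):

tao model visual_changenet train -e /path/to/experiment.yaml \
    task=classify \
    results_dir=/results \
    train.num_epochs=20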
Note
For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 with gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (for example, num_gpus = 1 becomes num_gpus = 2).
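For example, either of the following overrides requests training on two GPUs:

tao model visual_changenet train -e /path/to/experiment.yaml train.num_gpus=2
tao model visual_changenet train -e /path/to/experiment.yaml train.gpu_ids=[0,1]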
Checkpointing and Resuming Training#
At every train.checkpoint_interval, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved. These checkpoints are saved in train.results_dir, like so:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint is also saved as changenet_model_classify_latest.pth.
Training automatically resumes from changenet_model_classify_latest.pth if it exists in train.results_dir.
This is superseded by train.resume_training_checkpoint_path if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, either:
Specify a new, empty results directory (Recommended)
Remove the latest checkpoint from the results directory
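For example, to resume explicitly from a specific checkpoint rather than relying on the automatic behavior (the path is a placeholder):

tao model visual_changenet train -e /path/to/experiment.yaml \
    train.resume_training_checkpoint_path=/results/train/changenet_model_classify_latest.pth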
Creating a Testing Experiment Spec File#
Here is an example spec file for evaluation and inference of a trained VisualChangeNet-Classification model.
results_dir: /path/to/experiment_results
task: classify
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
  classify:
    eval_margin: 0.005
dataset:
  classify:
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: /path/to/checkpoint
  results_dir: /results/evaluate
inference:
  checkpoint: /path/to/checkpoint
  results_dir: /results/inference
Inference/Evaluate#
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| checkpoint | string | | The path to the PyTorch model to evaluate or run inference on | |
| trt_engine | string | | The path to the TensorRT engine to evaluate or run inference on. Should only be used with TAO Deploy | |
| num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
| gpu_ids | List[int] | [0] | The GPU IDs to use | |
| results_dir | string | | The path to a folder where the experiment outputs should be written | |
| vis_after_n_batches | unsigned int | 1 | The number of batches after which the inference/evaluation visualization results are saved | >0 |
| batch_size | unsigned int | | The batch size for inference/evaluation | |
Evaluating the Model#
Use the following command to run VisualChangeNet-Classification evaluation:
tao model visual_changenet evaluate [-h] -e <experiment_spec_file>
                                    task=classify
                                    evaluate.checkpoint=<model to be evaluated>
                                    [evaluate.<evaluate_option>=<evaluate_option_value>]
                                    [evaluate.gpu_ids=<gpu indices>]
                                    [evaluate.num_gpus=<number of gpus>]
Multi-GPU evaluation is currently not supported for Visual ChangeNet-Classification.
Required Arguments#
-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments#
evaluate.<evaluate_option>: The evaluate options.
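For example (the paths are placeholders):

tao model visual_changenet evaluate -e /path/to/experiment.yaml \
    task=classify \
    evaluate.checkpoint=/results/train/changenet_model_classify_latest.pth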
Running Inference on the Model#
Use the following command to run inference on VisualChangeNet-Classification with the .pth model:
tao model visual_changenet inference [-h] -e <experiment_spec_file>
                                     task=classify
                                     inference.checkpoint=<inference model>
                                     [inference.<inference_option>=<inference_option_value>]
                                     [inference.gpu_ids=<gpu indices>]
                                     [inference.num_gpus=<number of gpus>]
Required Arguments#
-e, --experiment_spec_file: The experiment spec file to set up the inference experiment.
inference.checkpoint: The .pth model to run inference on.
Optional Arguments#
inference.<inference_option>: The inference options.
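For example (the paths are placeholders):

tao model visual_changenet inference -e /path/to/experiment.yaml \
    task=classify \
    inference.checkpoint=/results/train/changenet_model_classify_latest.pth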
Exporting the Model#
Here is an example spec file for exporting the trained VisualChangeNet model:
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 128
  input_height: 512
  batch_size: -1
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| checkpoint | string | | The path to the PyTorch model to export | |
| onnx_file | string | | The path to the exported .onnx model | |
| opset_version | unsigned int | 12 | The opset version of the exported ONNX model | >0 |
| input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
| input_width | unsigned int | 128 | The input width | >0 |
| input_height | unsigned int | 512 | The input height | >0 |
| batch_size | unsigned int | -1 | The batch size of the ONNX model. If this value is set to -1, the export uses a dynamic batch size. | >=-1 |
| gpu_id | unsigned int | 0 | The GPU ID to use | |
| on_cpu | bool | False | Whether to export on CPU | True, False |
| verbose | bool | False | Whether to print a human-readable representation of the network | True, False |
Use the following command to export the model:
tao model visual_changenet export [-h] -e <experiment spec file>
                                  task=classify
                                  export.checkpoint=<model to export>
                                  export.onnx_file=<onnx path>
                                  [export.<export_option>=<export_option_value>]
Required Arguments#
-e, --experiment_spec: The path to an experiment spec file.
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.
Optional Arguments#
export.<export_option>: The export options.
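For example (the checkpoint and ONNX output paths are placeholders):

tao model visual_changenet export -e /path/to/experiment.yaml \
    task=classify \
    export.checkpoint=/results/train/changenet_model_classify_latest.pth \
    export.onnx_file=/results/export/changenet_classify.onnx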
TensorRT Engine Generation, Validation, and int8 Calibration#
For deployment, refer to the TAO Deploy Documentation for VisualChangeNet-Classification.