Visual ChangeNet-Classification
Visual ChangeNet-Classification is an NVIDIA-developed classification change detection model and is included in the TAO Toolkit. Visual ChangeNet supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao model visual_changenet <sub_task> <args_per_subtask>
where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
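For example, the following command (with placeholder paths) launches classification training; the full train command and its arguments are described later on this page:

tao model visual_changenet train -e /path/to/spec.yaml -r /path/to/results task=classify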
VisualChangeNet-Classification requires the data to be provided as image and CSV files. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Classification, which follows the same input data format as Optical Inspection.
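As a rough illustration only (the column names below are hypothetical; the authoritative schema is on the Data Annotation Format page), each CSV row pairs a test image with its golden reference and carries a change label:

image_A_path,image_B_path,label
unit_001/WhiteLight.jpg,golden_ref/WhiteLight.jpg,PASS
unit_002/WhiteLight.jpg,golden_ref/WhiteLight.jpg,FAIL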
Configuring a Custom Dataset
This section provides example configuration and commands for training VisualChangeNet-Classification using the dataset format described above.
Here is an example spec file for training a VisualChangeNet-Classification model with NVIDIA’s FAN Hybrid backbone using the Data Annotation Format.
encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "ce"
    cls_weight: [1.0, 10.0]
  num_epochs: 350
  num_nodes: 1
  val_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 128
  input_height: 512
Parameter | Data Type | Default | Description
--- | --- | --- | ---
model | dict config | – | The configuration of the model architecture, detailed in the model section
dataset | dict config | – | The configuration of the dataset, detailed in the dataset section
train | dict config | – | The configuration of the training parameters, detailed in the train section
results_dir | string | – | The path to save the model experiment log outputs and model checkpoints
task | str | classify | The change detection task: 'segment' or 'classify' for segmentation and classification, respectively
train

Parameter | Data Type | Default | Description
--- | --- | --- | ---
results_dir | string | – | The path to save the model training experiment logs and checkpoints
checkpoint_interval | int | 1 | The epoch interval at which checkpoints are saved
resume_training_checkpoint_path | str | None | The path to a checkpoint from which to resume training
classify | Dict | None | The classification loss config: * loss (str, default "ce"): the training loss, "ce" or "contrastive" * cls_weight (List[float]): per-class weights for the cross-entropy loss
segment | Dict | None | The segmentation loss config: * loss (str, default "ce"): the training loss
num_nodes | unsigned int | 1 | The number of nodes; if the value is larger than 1, multi-node training is enabled
val_interval | unsigned int | 1 | The epoch interval at which validation is run
num_epochs | int | 50 | The total number of epochs to run the experiment
pretrained_model_path | string | – | The path to the pretrained model checkpoint used to initialize the end-to-end model weights
optim | dict config | None | The configurable parameters for the VisualChangeNet optimizer, detailed in the optim section
tensorboard | dict config | None | The TensorBoard visualization config; set enabled (bool) to turn TensorBoard logging on or off
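With tensorboard.enabled set to True, training curves can be monitored using the standard TensorBoard CLI, pointed at the experiment results directory (the exact log location under results_dir may vary by TAO version):

tensorboard --logdir /path/to/experiment_results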
optim
optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01
Parameter | Datatype | Default | Description | Supported Values
--- | --- | --- | --- | ---
lr | float | 0.0005 | The learning rate | >=0.0
optim | str | adamw | The optimizer to use |
policy | str | linear | The learning-rate scheduler | linear, step
momentum | float | 0.9 | The momentum for the AdamW optimizer |
weight_decay | float | 0.1 | The weight decay coefficient |
monitor_name | str | val_loss | The name of the metric monitored for saving the top-k checkpoints |
The following example model config provides options to change the VisualChangeNet-Classification architecture for training. VisualChangeNet-Classification supports two model architectures: Architecture 1 leverages only the last feature map from the FAN backbone and a Euclidean difference module to perform contrastive learning, while Architecture 2 leverages the learnable difference modules over 4 different features at 3 feature resolutions and minimizes a cross-entropy loss.
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
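For reference, the following fragments (inferred from the two example specs on this page, not an exhaustive list of options) show the settings that select between the two architectures:

# Architecture 1: contrastive learning with a Euclidean difference module
train:
  classify:
    loss: "contrastive"
model:
  classify:
    difference_module: 'euclidean'
    train_margin_euclid: 2.0

# Architecture 2: learnable difference modules trained with cross-entropy loss
train:
  classify:
    loss: "ce"
    cls_weight: [1.0, 10.0]
model:
  classify:
    difference_module: 'learnable'
    learnable_difference_modules: 4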
Parameter | Datatype | Default | Description | Supported Values
--- | --- | --- | --- | ---
backbone | Dict | None | A dictionary of configurable parameters for the VisualChangeNet-Classification backbone: * type (str): the backbone architecture * pretrained_backbone_path (str, default None): the path to pretrained backbone weights * freeze_backbone (bool, default False): whether to freeze the backbone weights during training | fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, vit_large_nvdinov2
decode_head | Dict | None | A dictionary of configurable parameters for the decoder: * feature_strides (List[int], default None): the strides of the backbone feature maps * align_corner (bool, default False): whether to use align_corners during upsampling | True, False
classify | Dict | None | A dictionary of configurable parameters for the VisualChangeNet-Classification head: * train_margin_euclid (float, default 2.0): the margin for the contrastive loss (Architecture 1) * eval_margin (float, default 0.005): the Euclidean distance threshold used at evaluation * embedding_vectors (int, default 5): the dimension of the output embeddings for contrastive learning * embed_dec (int, default 30): the embedding dimension of the learnable difference decoder * difference_module (str, default 'learnable'): the difference module, 'euclidean' (Architecture 1) or 'learnable' (Architecture 2) * learnable_difference_modules (int, default 4): the number of learnable difference modules (Architecture 2) | >0
The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is provided below.
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]
* See the Data Annotation Format page for more information about specifying lighting conditions.
Parameter | Datatype | Default | Description | Supported Values
--- | --- | --- | --- | ---
segment | Dict | – | The dataset config for the segmentation dataloader, detailed in the segment section |
classify | Dict | – | The dataset config for the classification dataloader, detailed in the classify section |
classify
Parameter | Datatype | Default | Description | Supported Values
--- | --- | --- | --- | ---
train_dataset | Dict | – | The paths to the image directory and CSV file for the training dataset |
validation_dataset | Dict | – | The paths to the image directory and CSV file for the validation dataset |
test_dataset | Dict | – | The paths to the image directory and CSV file for the test dataset |
infer_dataset | Dict | – | The paths to the image directory and CSV file for the inference dataset |
image_ext | str | .jpg | The file extension of the images in the dataset | string
batch_size | int | 32 | The number of samples per batch | >0
workers | int | 8 | The number of worker processes for data loading | >=0
fpratio_sampling | float | 0.1 | The ratio of false-positive examples to sample during training | >0
num_input | int | 4 | The number of lighting conditions for each input image* | >0
input_map | Dict | – | The mapping of lighting conditions to indices specifying the concatenation order* |
concat_type | string | linear | The type of concatenation used to combine the different lighting conditions of each image | linear, grid
grid_map | Dict | None | The grid dimensions used to concatenate images as a grid: * x (int): the number of images along the x-axis * y (int): the number of images along the y-axis | Dict
output_shape | List[int] | [100, 100] | The resolution to which each lighting condition is resized before concatenation | >=0
augmentation_config | Dict | None | The image normalization config: * rgb_input_mean (List[float], default [0.485, 0.456, 0.406]): the per-channel mean * rgb_input_std (List[float], default [0.229, 0.224, 0.225]): the per-channel standard deviation | >=0.0
num_classes | int | 2 | The number of classes in the dataset | >1
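Note that num_input, output_shape, and concat_type together determine the effective network input resolution, which should match the input_width and input_height used at export. The example specs on this page are consistent with the following (inferred) arithmetic:

# linear concatenation (first example spec): 4 lighting conditions stacked along the height axis
#   output_shape: [128, 128]  ->  network input 128 (width) x 512 (height)
#   export: input_width: 128, input_height: 512
# grid concatenation (ViT example spec): 2 x 2 grid of lighting conditions
#   output_shape: [112, 112]  ->  network input 224 x 224
#   export: input_width: 224, input_height: 224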
segment
Parameter | Datatype | Default | Description | Supported Values
--- | --- | --- | --- | ---
root_dir | str | – | The root directory containing the dataset |
label_transform | str | 'norm' | The label transformation applied to normalize the label images to class IDs | 'norm', None
data_name | str | 'LEVIR' | The name of the dataset |
dataset | str | 'CNDataset' | The name of the dataset class | 'CNDataset'
multi_scale_train | bool | True | Whether to use multi-scale training | True, False
multi_scale_infer | bool | False | Whether to use multi-scale inference | True, False
num_classes | int | 2 | The number of classes in the dataset | >0
img_size | int | 256 | The size of the input images | >0
batch_size | int | 8 | The number of samples per batch |
workers | int | 2 | The number of worker processes for data loading |
shuffle | bool | True | Whether to shuffle the dataset during loading |
image_folder_name | str | 'A' | The name of the folder containing the pre-change images |
change_image_folder_name | str | 'B' | The name of the folder containing the changed images |
list_folder_name | str | 'list' | The name of the folder containing the dataset split lists |
annotation_folder_name | str | 'label' | The name of the folder containing the annotations |
augmentation | Dict | None | The image augmentation config: * mean (List[float], default [0.5, 0.5, 0.5]): the per-channel normalization mean | >=0.0
train_split | str | 'train' | The name of the training split |
validation_split | str | 'val' | The name of the validation split |
test_split | str | 'test' | The name of the test split |
predict_split | str | 'test' | The name of the prediction split |
label_suffix | str | '.png' | The suffix of the label files |
color_map | Optional[Dict[str, List[int]]] | None | The mapping of string class labels ('0' to 'n') to RGB color codes |
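For reference, a minimal segment dataset config assembled from the defaults in the table above might look as follows (a sketch; adjust paths and folder names to your dataset):

dataset:
  segment:
    root_dir: /path/to/dataset
    data_name: 'LEVIR'
    dataset: 'CNDataset'
    image_folder_name: 'A'
    change_image_folder_name: 'B'
    list_folder_name: 'list'
    annotation_folder_name: 'label'
    label_suffix: '.png'
    train_split: 'train'
    validation_split: 'val'
    test_split: 'test'
    predict_split: 'test'
    num_classes: 2
    img_size: 256
    batch_size: 8
    workers: 2
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]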
Example spec file for ViT backbones
The following spec file is only relevant for TAO Toolkit versions 5.3 and later.
encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "contrastive"
    cls_weight: [1.0, 10.0]
  num_epochs: 350
  num_nodes: 1
  val_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'euclidean'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: grid
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 112
      - 112
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 224
  input_height: 224
Use the following command to run VisualChangeNet-Classification training:
tao model visual_changenet train -e <experiment_spec_file>
-r <results_dir>
--gpus <num_gpus>
task=classify
Required Arguments
* -e, --experiment_spec_file: The path to the experiment spec file.
* -r, --results_dir: The path to a folder where the experiment outputs should be written.
* task: The task ('segment' or 'classify') for visual_changenet training. Default: segment.
Optional Arguments
* --gpus: The number of GPUs to use for training. The default value is 1.
Here’s an example of using the VisualChangeNet training command:
tao model visual_changenet train -e $DEFAULT_SPEC -r $RESULTS_DIR --gpus $NUM_GPUs
Here is an example spec file for evaluation and inference with a trained VisualChangeNet-Classification model.
results_dir: /path/to/experiment_results
task: classify
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
  classify:
    eval_margin: 0.005
dataset:
  classify:
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: /path/to/checkpoint
inference:
  checkpoint: /path/to/checkpoint
Inference/Evaluate
Parameter | Datatype | Default | Description | Supported Values
--- | --- | --- | --- | ---
checkpoint | string | – | The path to the PyTorch model to evaluate or run inference with |
trt_engine | string | – | The path to the TensorRT engine to evaluate or run inference with; to be used only with TAO Deploy |
num_gpus | unsigned int | 1 | The number of GPUs to use | >0
results_dir | string | – | The path to a folder where the experiment outputs should be written |
vis_after_n_batches | unsigned int | 1 | The number of batches after which visualization results are saved during evaluation/inference | >0
batch_size | unsigned int | – | The batch size for evaluation/inference |
gpu_id | unsigned int | 0 | The GPU ID to use |
Use the following command to run VisualChangeNet-Classification evaluation:
tao model visual_changenet evaluate -e <experiment_spec>
-r <results_dir>
task=classify
model.classify.eval_margin=0.5
Required Arguments
* -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
* -r, --results_dir: The path to a folder where the experiment outputs should be written.
Here’s an example of using the VisualChangeNet evaluation command:
tao model visual_changenet evaluate -e $DEFAULT_SPEC -r $RESULTS_DIR
Use the following command to run inference on VisualChangeNet-Classification with the .tlt model:
tao model visual_changenet inference -e <experiment_spec>
-r <results_dir>
task=classify
model.classify.eval_margin=0.5
Required Arguments
* -e, --experiment_spec_file: The experiment spec file to set up the inference experiment.
* -r, --results_dir: The path to a folder where the experiment outputs should be written.
Here’s an example of using the VisualChangeNet inference command:
tao model visual_changenet inference -e $DEFAULT_SPEC -r $RESULTS_DIR
Here is an example spec file for exporting the trained VisualChangeNet model:
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 128
  input_height: 512
  batch_size: -1
Parameter | Datatype | Default | Description | Supported Values
--- | --- | --- | --- | ---
checkpoint | string | – | The path to the PyTorch model to export |
onnx_file | string | – | The path to the exported .onnx file |
opset_version | unsigned int | 12 | The opset version of the exported ONNX model | >0
input_channel | unsigned int | 3 | The input channel size; only a value of 3 is supported | 3
input_width | unsigned int | 128 | The input width | >0
input_height | unsigned int | 512 | The input height | >0
batch_size | unsigned int | -1 | The batch size of the ONNX model; if set to -1, the export uses a dynamic batch size | >=-1
gpu_id | unsigned int | 0 | The GPU ID to use |
on_cpu | bool | False | Whether to export on CPU |
verbose | bool | False | Whether to print a human-readable representation of the network |
Use the following command to export the model:
tao model visual_changenet export [-h] -e <experiment spec file>
-r <results_dir>
task=classify
Required Arguments
* -e, --experiment_spec_file: The experiment spec file to set up the export experiment.
* -r, --results_dir: The path to a folder where the experiment outputs should be written.
Sample Usage
The following is an example export command:
tao model visual_changenet export -e /path/to/spec.yaml -r $RESULTS_DIR
For deployment, refer to the TAO Deploy Documentation for VisualChangeNet-Classification.
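As a quick sanity check of the exported ONNX file outside of TAO Deploy, TensorRT's trtexec tool can attempt an engine build directly (a generic sketch, not the documented TAO Deploy workflow):

trtexec --onnx=/path/to/model.onnx --saveEngine=/path/to/model.engine

For a model exported with batch_size: -1 (dynamic batch), trtexec additionally requires --minShapes, --optShapes, and --maxShapes arguments specified with the model's actual input tensor names.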