NVIDIA TAO v5.5.0

Visual ChangeNet-Classification

Visual ChangeNet-Classification is an NVIDIA-developed change detection model for classification and is included in TAO. Visual ChangeNet supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Launcher using the following convention on the command-line:


tao model visual_changenet <sub_task> <args_per_subtask>

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
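For example, a classification training run could be invoked as follows (the spec file path is a placeholder):

tao model visual_changenet train -e /path/to/experiment_spec.yaml task=classify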

VisualChangeNet-Classification requires the data to be provided as image and CSV files. The input data format is the same as for Optical Inspection; see the Data Annotation Format page for more information.

Configuring a Custom Dataset

This section provides example configuration and commands for training VisualChangeNet-Classification using the dataset format described above.

Here is an example spec file for training a VisualChangeNet-Classification model with NVIDIA’s FAN Hybrid backbone using the Data Annotation Format.


encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "ce"
    cls_weight: [1.0, 10.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    image_width: 128
    image_height: 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 128
  input_height: 512

Parameter Data Type Default Description Supported Values
model dict config The configuration of the model architecture
dataset dict config The configuration of the dataset
train dict config The configuration of the training task
evaluate dict config The configuration of the evaluation task
inference dict config The configuration of the inference task
encryption_key string None The encryption key to encrypt and decrypt model files
results_dir string /results The directory where experiment results are saved
export dict config The configuration of the ONNX export task
task str classify A flag to indicate the change detection task. Supports two tasks: ‘segment’ and ‘classify’ for segmentation and classification
Parameter Datatype Default Description Supported Values
num_gpus unsigned int 1 The number of GPUs to use for distributed training >0
gpu_ids List[int] [0] The indices of the GPUs to use for distributed training
seed unsigned int 1234 The random seed for random, NumPy, and torch >0
num_epochs unsigned int 10 The total number of epochs to run the experiment >0
checkpoint_interval unsigned int 1 The epoch interval at which the checkpoints are saved >0
validation_interval unsigned int 1 The epoch interval at which the validation is run >0
resume_training_checkpoint_path string The intermediate PyTorch Lightning checkpoint to resume training from
results_dir string /results/train The directory to save training results

classify Dict The configurable parameters for the VisualChangeNet-Classification training pipeline:
* loss (str, default "ce"): The loss function used for classification training.
* cls_weight (list, default None): The weights for the cross-entropy loss for unbalanced dataset distributions.

segment Dict The configurable parameters for the VisualChangeNet-Segmentation training pipeline:
* loss (str, default "ce"): The loss function used for segmentation training.
* weights (list, default [0.5, 0.5, 0.5, 0.8, 1.0]): The weights used for calculating the multi-scale segmentation loss during training when multi_scale_train is "True".

num_nodes unsigned int 1 The number of nodes. If the value is larger than 1, multi-node training is enabled.
pretrained_model_path string The path to the pretrained model checkpoint used to initialize the end-to-end model weights.
optim dict config None The configurable parameters for the VisualChangeNet optimizer, detailed in the optim section.
tensorboard dict config None The TensorBoard visualization settings:
* enabled (bool, default True): Flag to enable TensorBoard.

optim


optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01

Parameter Datatype Default Description Supported Values
lr float 0.0005 The learning rate >=0.0
optim str adamw The optimizer

policy str linear The learning rate scheduler:
* linear: LambdaLR decreases the lr by a multiplicative factor.
* step: StepLR decreases the lr by 0.1 at every num_epochs // 3 epochs.
Supported values: linear, step

momentum float 0.9 The momentum for the AdamW optimizer
weight_decay float 0.1 The weight decay coefficient
monitor_name str val_loss The name of the monitor used for saving the top-k checkpoints.
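If you want to try the step scheduler instead of the default linear policy, the optim block could be sketched as follows (the values are illustrative, taken from the defaults above, not tuned recommendations):

optim:
  lr: 0.0005
  optim: "adamw"
  policy: "step"     # StepLR: drops the lr by 0.1 every num_epochs // 3 epochs
  momentum: 0.9
  weight_decay: 0.01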

The following example model config provides options for changing the VisualChangeNet-Classification architecture during training. VisualChangeNet-Classification supports two model architectures: Architecture 1 uses only the last feature map from the FAN backbone together with a Euclidean difference module for contrastive learning, while Architecture 2 uses learnable difference modules on 4 different features at 3 feature resolutions to minimize a cross-entropy loss.


model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
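The example above configures Architecture 2 (learnable difference modules trained with cross-entropy loss). A sketch of an Architecture 1 style configuration, pairing the euclidean difference module with the contrastive loss as in the ViT example later on this page, could look like the following (values are illustrative):

# Architecture 1: Euclidean difference module + contrastive loss (illustrative sketch)
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
  classify:
    train_margin_euclid: 2.0   # contrastive-learning margin used by Architecture 1
    eval_margin: 0.005
    embedding_vectors: 5       # embedding dimension compared with Euclidean distance
    difference_module: 'euclidean'
train:
  classify:
    loss: "contrastive"        # pairs with the euclidean difference module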

Parameter Datatype Default Description Supported Values

backbone Dict None A dictionary containing the following configurable parameters for the VisualChangeNet-Classification backbone:
* type (string, default None): The name of the backbone to be used. Supported values: fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, vit_large_nvdinov2
* pretrained_backbone_path (string, default None): The path to the pre-trained backbone weights file.
* freeze_backbone (bool, default False): Whether to freeze the backbone weights during training.
* feat_downsample (bool, default False): Whether to downsample the last feature map in FAN backbone configurations.

decode_head Dict None A dictionary containing the following configurable parameters for the decoder:
* align_corners (bool, default False): If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. Supported values: True, False
* feature_strides (list, default [4, 8, 16, 16]): The downsampling feature strides for different backbones.
* decoder_params (Dict): Contains the following network parameters:
  * embed_dims (int, default 256): The embedding dimensions.

classify Dict None A dictionary containing the following configurable parameters for the VisualChangeNet-Classification model:
* train_margin_euclid (float, default 2.0): The training margin threshold for contrastive learning (applicable to Architecture 1). Supported values: >0
* eval_margin (float): The evaluation margin threshold. Supported values: >0
* embedding_vectors (int, default 5): The output embedding dimension for each input image before computing the Euclidean distance (applicable to Architecture 1). Supported values: >0
* embed_dec (int, default 30): The transformer decoder MLP embedding dimension (applicable to Architecture 2). Supported values: >0
* difference_module (string, default 'learnable'): The type of difference module used (applicable to both architectures). Supported values: Euclidean, learnable
* learnable_difference_modules (int, default 4): The number of learnable difference modules (applicable to Architecture 2). Supported values: <4

The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset configuration is provided below.


dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    image_width: 128
    image_height: 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2

* See the Dataset Annotation Format definition for more information about specifying lighting conditions.

Parameter Datatype Default Description Supported Values
segment Dict The dataset configuration for the segmentation dataloader
classify Dict The dataset configuration for the classification dataloader, detailed in the classify section.

classify

Parameter Datatype Default Description Supported Values
train_dataset Dict The paths to the image directory and CSV files for the training dataset
validation_dataset Dict The paths to the image directory and CSV files for the validation dataset
test_dataset Dict The paths to the image directory and CSV files for the test dataset
infer_dataset Dict The paths to the image directory and CSV files for the inference dataset
image_ext str .jpg The file extension of the images in the dataset string
batch_size int 32 The number of samples per batch
workers int 8 The number of worker processes for data loading
fpratio_sampling float 0.1 The ratio of false-positive examples to sample >0
num_input int 4 The number of lighting conditions for each input image* >0
input_map Dict The mapping of lighting conditions to indices specifying concatenation ordering*
concat_type string linear Type of concatenation to use for different image lighting conditions linear, grid

grid_map Dict None The parameters defining the grid dimensions used to concatenate the images as a grid:
* x (int): The number of images along the x-axis.
* y (int): The number of images along the y-axis.
input_width int 100 The width of the input image >0
input_height int 100 The height of the input image >0
num_classes int 2 The number of classes in the dataset >1
augmentation_config Dict None A dictionary containing the data augmentation settings, detailed in the augmentation_config section.
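To illustrate the two concat_type options described above, the sketch below arranges the same four lighting conditions either side by side (linear) or as a 2x2 grid; the grid layout mirrors the ViT example later on this page:

dataset:
  classify:
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    # Use 'linear' to concatenate the four lighting conditions in a row,
    # or 'grid' together with grid_map to arrange them as a 2x2 grid.
    concat_type: grid
    grid_map:
      x: 2
      y: 2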

augmentation_config

Parameter Datatype Default Description Supported Values

random_flip Dict None Random vertical and horizontal flipping augmentation settings:
* vflip_probability (float, default 0.5): Probability of vertical flipping. Supported values: >=0.0
* hflip_probability (float, default 0.5): Probability of horizontal flipping. Supported values: >=0.0
* enable (bool, default True): Enable or disable random flipping augmentation.

random_rotate Dict None Randomly rotate images with the specified probability and angles:
* rotate_probability (float, default 0.5): Probability of applying random rotation. Supported values: >=0.0
* angle_list (list, default [90, 180, 270]): List of rotation angles to choose from.
* enable (bool, default True): Enable or disable random rotation augmentation.

random_color Dict None Apply random color augmentation to images:
* brightness (float, default 0.3): Maximum brightness change factor. Supported values: >=0.0
* contrast (float, default 0.3): Maximum contrast change factor. Supported values: >=0.0
* saturation (float, default 0.3): Maximum saturation change factor. Supported values: >=0.0
* hue (float, default 0.3): Maximum hue change factor. Supported values: >=0.0
* enabled (bool, default True): Enable or disable random color augmentation.
* color_probability (float, default 0.5): Probability of applying color augmentation. Supported values: >=0.0

with_random_crop bool True Apply random crop augmentation. True, False
with_random_blur bool True Apply random blurring augmentation. True, False
rgb_input_mean List[float] [0.485, 0.456, 0.406] The mean to be subtracted for pre-processing.
rgb_input_std List[float] [0.229, 0.224, 0.225] The standard deviation to divide the image by.
augment bool False Flag to indicate whether to apply data augmentations True, False
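Putting the parameters above together, an augmentation_config that enables the geometric and color augmentations could be sketched as follows; the probabilities and factors are the defaults from the table, not tuned values, and the flag names follow the table above:

augmentation_config:
  augment: True                # turn on data augmentation
  random_flip:
    vflip_probability: 0.5
    hflip_probability: 0.5
    enable: True
  random_rotate:
    rotate_probability: 0.5
    angle_list: [90, 180, 270]
    enable: True
  random_color:
    brightness: 0.3
    contrast: 0.3
    saturation: 0.3
    hue: 0.3
    enabled: True
    color_probability: 0.5
  with_random_crop: True
  with_random_blur: True
  rgb_input_mean: [0.485, 0.456, 0.406]
  rgb_input_std: [0.229, 0.224, 0.225]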

Example Spec File for ViT Backbones

Note

The following spec file is only relevant for TAO versions 5.3 and later.


encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "contrastive"
    cls_weight: [1.0, 10.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'euclidean'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: grid
    grid_map:
      x: 2
      y: 2
    image_width: 112
    image_height: 112
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 224
  input_height: 224

Use the following command to run VisualChangeNet-Classification training:


tao model visual_changenet train [-h] -e <experiment_spec>
                                 task=classify
                                 [results_dir=<global_results_dir>]
                                 [model.<model_option>=<model_option_value>]
                                 [dataset.<dataset_option>=<dataset_option_value>]
                                 [train.<train_option>=<train_option_value>]
                                 [train.gpu_ids=<gpu indices>]
                                 [train.num_gpus=<number of gpus>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • task: The task (‘segment’ or ‘classify’) for the visual_changenet training. Default: segment.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Note

For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 with gpu_ids = [0, 1], then they are modified to follow the setting with more GPUs (num_gpus is updated to 2 in this example).
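For example, a two-GPU training run that also overrides the number of epochs could look like this (paths and values are placeholders):

tao model visual_changenet train -e /path/to/experiment_spec.yaml \
                                 task=classify \
                                 results_dir=/path/to/experiment_results \
                                 train.num_gpus=2 \
                                 train.num_epochs=30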

Checkpointing and Resuming Training

At every train.checkpoint_interval, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved. The checkpoints are saved to train.results_dir, like so:


$ ls /results/train
'model_epoch_000.pth'  'model_epoch_001.pth'  'model_epoch_002.pth'  'model_epoch_003.pth'  'model_epoch_004.pth'

The latest checkpoint is also saved as changenet_model_classify_latest.pth. Training automatically resumes from changenet_model_classify_latest.pth if it exists in train.results_dir; this behavior is superseded by train.resume_training_checkpoint_path if it is provided.
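For example, to resume from a specific checkpoint rather than the latest one, point train.resume_training_checkpoint_path at it, either in the spec file or as a command-line override (the path below is a placeholder):

tao model visual_changenet train -e /path/to/experiment_spec.yaml \
                                 task=classify \
                                 train.resume_training_checkpoint_path=/results/train/model_epoch_004.pth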

The major implication of this logic is that, if you wish to trigger fresh training from scratch, either:

  • Specify a new, empty results directory (Recommended)

  • Remove the latest checkpoint from the results directory

Here is an example spec file for evaluation and inference with a trained VisualChangeNet-Classification model.


results_dir: /path/to/experiment_results
task: classify
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
  classify:
    eval_margin: 0.005
dataset:
  classify:
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: /path/to/checkpoint
  results_dir: /results/evaluate
inference:
  checkpoint: /path/to/checkpoint
  results_dir: /results/inference

Inference/Evaluate

Parameter Datatype Default Description Supported Values
checkpoint string The path to the PyTorch model to evaluate or run inference with
trt_engine string The path to the TensorRT engine to evaluate or run inference with. Should only be used with TAO Deploy
num_gpus unsigned int 1 The number of GPUs to use >0
gpu_ids List[int] [0] The GPU IDs to use
results_dir string The path to a folder where the experiment outputs should be written
vis_after_n_batches unsigned int 1 The number of batches after which evaluation/inference visualization results are saved >0
batch_size unsigned int The batch size for evaluation/inference

Use the following command to run VisualChangeNet-Classification evaluation:


tao model visual_changenet evaluate [-h] -e <experiment_spec_file>
                                    task=classify
                                    evaluate.checkpoint=<model to be evaluated>
                                    [evaluate.<evaluate_option>=<evaluate_option_value>]
                                    [evaluate.gpu_ids=<gpu indices>]
                                    [evaluate.num_gpus=<number of gpus>]

Multi-GPU evaluation is currently not supported for Visual ChangeNet-Classification.
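A concrete single-GPU evaluation run could therefore look like the following (paths are placeholders):

tao model visual_changenet evaluate -e /path/to/experiment_spec.yaml \
                                    task=classify \
                                    evaluate.checkpoint=/results/train/changenet_model_classify_latest.pth \
                                    evaluate.results_dir=/results/evaluate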

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.

  • evaluate.checkpoint: The .pth model to be evaluated.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Use the following command to run inference on VisualChangeNet-Classification with the .pth model:


tao model visual_changenet inference [-h] -e <experiment_spec_file>
                                     task=classify
                                     inference.checkpoint=<inference model>
                                     [inference.<inference_option>=<inference_option_value>]
                                     [inference.gpu_ids=<gpu indices>]
                                     [inference.num_gpus=<number of gpus>]
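For example, an inference run over the infer_dataset defined in the spec file could be invoked as follows (paths are placeholders):

tao model visual_changenet inference -e /path/to/experiment_spec.yaml \
                                     task=classify \
                                     inference.checkpoint=/results/train/changenet_model_classify_latest.pth \
                                     inference.results_dir=/results/inference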

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the inference experiment.

  • inference.checkpoint: The .pth model to run inference on.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here is an example spec file for exporting the trained VisualChangeNet model:


export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 128
  input_height: 512
  batch_size: -1

Parameter Datatype Default Description Supported Values
checkpoint string The path to the PyTorch model to export
onnx_file string The path to the .onnx file
opset_version unsigned int 12 The opset version of the exported ONNX >0
input_channel unsigned int 3 The input channel size. Only the value 3 is supported. 3
input_width unsigned int 128 The input width >0
input_height unsigned int 512 The input height >0
batch_size unsigned int -1 The batch size of the ONNX model. If this value is set to -1, the export uses dynamic batch size. >=-1
gpu_id unsigned int 0 The GPU id to use
on_cpu bool False Whether to export on cpu
verbose bool False Print out a human-readable representation of the network

Use the following command to export the model:


tao model visual_changenet export [-h] -e <experiment spec file>
                                  task=classify
                                  export.checkpoint=<model to export>
                                  export.onnx_file=<onnx path>
                                  [export.<export_option>=<export_option_value>]
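For example, exporting the latest training checkpoint to a dynamic-batch ONNX file (batch_size: -1 in the spec above) could look like this; the paths and the ONNX file name are placeholders:

tao model visual_changenet export -e /path/to/experiment_spec.yaml \
                                  task=classify \
                                  export.checkpoint=/results/train/changenet_model_classify_latest.pth \
                                  export.onnx_file=/results/export/changenet_classify.onnx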

Required Arguments

  • -e, --experiment_spec: The path to an experiment spec file

  • export.checkpoint: The .pth model to export.

  • export.onnx_file: The path where the .etlt or .onnx model is saved.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.
