
Visual ChangeNet-Classification

Visual ChangeNet-Classification is an NVIDIA-developed classification change detection model and is included in the TAO Toolkit. Visual ChangeNet supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:


tao model visual_changenet <sub_task> <args_per_subtask>

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
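For example, instantiating this convention for the train subtask of the classification task looks like the following (paths are placeholders):

tao model visual_changenet train -e /path/to/experiment_spec.yaml -r /path/to/results task=classify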

VisualChangeNet-Classification requires the data to be provided as image and CSV files. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Classification, which follows the same input data format as Optical Inspection.
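The authoritative CSV schema is defined on the Data Annotation Format page; purely as a hypothetical illustration (the column names below are assumptions, not the official schema), a pairwise classification CSV might look like:

input_path,golden_path,label,object_name
/dataset/images/unit_0001/,/dataset/golden/ref_0001/,PASS,C1_1
/dataset/images/unit_0002/,/dataset/golden/ref_0002/,FAIL,C3_2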

Configuring a Custom Dataset

This section provides example configuration and commands for training VisualChangeNet-Classification using the dataset format described above.

Here is an example spec file for training a VisualChangeNet-Classification model with NVIDIA’s FAN Hybrid backbone using the Data Annotation Format.


encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "ce"
    cls_weight: [1.0, 10.0]
  num_epochs: 350
  num_nodes: 1
  val_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 128
  input_height: 512

Parameter Data Type Default Description
model dict config The configuration of the model architecture
dataset dict config The configuration for the dataset, which is detailed in the dataset section
train dict config The configuration for training parameters, which is detailed in the Train section
results_dir string The path to save the model experiment log outputs and model checkpoints
task str classify A flag to indicate the change detection task. Supports two tasks: ‘segment’ and ‘classify’ for segmentation and classification

train

Parameter Datatype Default Description
results_dir string The path to save the model training experiment log outputs and model checkpoints
checkpoint_interval int 1 The epoch interval at which the checkpoint is saved
resume_training_checkpoint_path str None The path to the checkpoint for resuming training

classify Dict None The classify dict contains the configurable parameters for the VisualChangeNet-Classification pipeline:

* loss (str, default "ce"): The loss function used for classification training
* cls_weight (list): The class weights for the cross-entropy loss, used for unbalanced dataset distributions

segment Dict None The segment dict contains the configurable parameters for the VisualChangeNet-Segmentation pipeline:

* loss (str, default "ce"): The loss function used for segmentation training
* weights (list, default [0.5, 0.5, 0.5, 0.8, 1.0]): The weights used for calculating the multi-scale segmentation loss during training when multi_scale_train is True (a sketch of this weighted sum follows the segment dataset table below)

num_nodes unsigned int 1 The number of nodes. If the value is larger than 1, multi-node is enabled.
val_interval unsigned int 1 The epoch interval at which the validation is run
checkpoint_interval int 1 The epoch interval at which the checkpoint needs to be saved
num_epochs int 50 The total number of epochs to run the experiment
pretrained_model_path string The path to the pretrained model checkpoint used to initialize the end-to-end model weights

optim dict config None The configurable parameters for the VisualChangeNet optimizer, detailed in the optim section

tensorboard Dict None The TensorBoard visualization config, which contains the following parameter:

* enabled (bool, default True): Flag to enable TensorBoard logging (an example of launching TensorBoard follows this table)
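When TensorBoard logging is enabled, the event files are written under the training results directory. Assuming the standard tensorboard CLI is installed, they can be inspected with a command along these lines (the exact subdirectory containing the event files may differ):

tensorboard --logdir /path/to/experiment_results/train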

optim


optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01

Parameter Datatype Default Description Supported Values
lr float 0.0005 The learning rate >=0.0
optim str adamw The optimizer to use adamw

policy str linear The learning rate scheduler (a sketch using standard PyTorch schedulers follows this table): linear/step

* linear: LambdaLR decreases the lr by a multiplicative factor
* step: StepLR decreases the lr by a factor of 0.1 every num_epochs//3 epochs

momentum float 0.9 The momentum for the AdamW optimizer
weight_decay float 0.1 The weight decay coefficient
monitor_name str val_loss The name of the monitor used for saving the top-k checkpoints.
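The two policy values map roughly onto standard PyTorch schedulers. The following is a minimal sketch of that behavior (not the toolkit's internal implementation, and the linear decay factor shown is only an assumption):

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR, StepLR

model = torch.nn.Linear(8, 2)                       # stand-in model
optimizer = AdamW(model.parameters(), lr=0.0001, weight_decay=0.01)
num_epochs = 350

# policy "linear": LambdaLR rescales the base lr by a multiplicative factor per epoch
linear_sched = LambdaLR(optimizer, lr_lambda=lambda epoch: 1.0 - epoch / num_epochs)

# policy "step": StepLR multiplies the lr by 0.1 every num_epochs // 3 epochs
step_sched = StepLR(optimizer, step_size=num_epochs // 3, gamma=0.1)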

The following example model config provides options to change the VisualChangeNet-Classification architecture for training. VisualChangeNet-Classification supports two model architectures. Architecture 1 leverages only the last feature maps from the FAN backbone using Euclidean difference to perform contrastive learning. Architecture 2 leverages the VisualChangeNet-Classification learnable difference modules for 4 different features at 3 feature resolutions to minimize Cross-Entropy loss.
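For intuition only, Architecture 1's decision rule can be sketched as a Euclidean distance between the two image embeddings thresholded against eval_margin. This is an assumed simplification based on the parameter descriptions below, not the toolkit's actual code:

import torch
import torch.nn.functional as F

def classify_change(golden_embedding: torch.Tensor,
                    compare_embedding: torch.Tensor,
                    eval_margin: float = 0.005) -> torch.Tensor:
    """Label a pair as changed (1) when the Euclidean distance between the
    golden and compare embeddings exceeds the evaluation margin."""
    distance = F.pairwise_distance(golden_embedding, compare_embedding)
    return (distance > eval_margin).long()

# toy usage with embedding_vectors: 5
golden = torch.randn(4, 5)
compare = torch.randn(4, 5)
print(classify_change(golden, compare))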


model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4

Parameter Datatype Default Description Supported Values

backbone Dict None A dictionary containing the following configurable parameters for the VisualChangeNet-Classification backbone:

* type (string): The name of the backbone to be used
* pretrained_backbone_path (string, default None): The path to the pre-trained backbone weights file
* freeze_backbone (bool, default False): Whether to freeze the backbone weights during training
* feat_downsample (bool, default False): Whether to downsample the last feature map in FAN backbone configurations

Supported backbone types: fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid, vit_large_nvdinov2

decode_head Dict None A dictionary containing the following configurable parameters for the decoder:

* align_corners (bool, default False, True/False): If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels (an interpolation sketch follows this table)
* feature_strides (list, default [4, 8, 16, 16]): The downsampling feature strides for different backbones
* decoder_params (Dict): Contains the following network parameter:
  * embed_dims (int, default 256): The embedding dimensions

classify Dict None A dictionary containing the following configurable parameters for the VisualChangeNet-Classification model:

* train_margin_euclid (float, default 2.0, >0): The training margin threshold for contrastive learning (applicable to Architecture 1)
* eval_margin (float, default 0.005, >0): The evaluation margin threshold
* embedding_vectors (int, default 5, >0): The output embedding dimension for each input image before computing the Euclidean distance (applicable to Architecture 1)
* embed_dec (int, default 30, >0): The transformer decoder MLP embedding dimension (applicable to Architecture 2)
* difference_module (string, default 'learnable', Euclidean/learnable): The type of difference module used (applicable to both architectures)
* learnable_difference_modules (int, default 4, <4): The number of learnable difference modules (applicable to Architecture 2)
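The align_corners flag described in the decode_head entry above behaves like the flag of the same name in standard bilinear upsampling. A small, toolkit-independent illustration:

import torch
import torch.nn.functional as F

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# align_corners=True keeps the corner pixel values fixed during interpolation
up_aligned = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=True)
up_default = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
print(up_aligned[0, 0, 0], up_default[0, 0, 0])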

The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is provided below.


dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]

* See the Dataset Annotation Format definition for more information about specifying lighting conditions.

Parameter Datatype Default Description Supported Values
segment Dict The segment dict contains the dataset config for the segmentation dataloader, detailed in the segment section
classify Dict The classify dict contains the dataset config for the classification dataloader, detailed in the classify section

classify

Parameter Datatype Default Description Supported Values
train_dataset Dict The paths to the image directory and CSV files for the training dataset
validation_dataset Dict The paths to the image directory and CSV files for the validation dataset
test_dataset Dict The paths to the image directory and CSV files for the test dataset
infer_dataset Dict The paths to the image directory and CSV files for the inference dataset
image_ext str .jpg The file extension of the images in the dataset string
batch_size int 32 The number of samples per batch >0
workers int 8 The number of worker processes for data loading
fpratio_sampling float 0.1 The ratio of false-positive examples to sample >0
num_input int 4 The number of lighting conditions for each input image* >0
input_map Dict The mapping of lighting conditions to indices specifying concatenation ordering*
concat_type string linear The type of concatenation to use for the different image lighting conditions (see the concatenation sketch after this table) linear, grid

grid_map Dict None The parameters to define the grid dimensions to concatenate images as a grid:

* x: The number of images along the x-axis
* y: The number of images along the y-axis

output_shape List[int] [100, 100] Image resolution of each lighting condition to be reshaped before concatenation >=0

augmentation_config Dict None The image normalization config, which contains the following parameters:

* rgb_input_mean (List[float], default [0.485, 0.456, 0.406], >=0.0): The mean to be subtracted for pre-processing
* rgb_input_std (List[float], default [0.229, 0.224, 0.225], >=0.0): The standard deviation to divide the image by

num_classes int 2 The number of classes in the dataset >1
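For intuition, the following sketch shows what the two concat_type options imply for four lighting-condition images resized to output_shape 128 x 128. The concatenation axes are inferred from the example export dimensions (input_width: 128, input_height: 512); the dataloader performs this step internally, so the snippet is illustrative only:

import numpy as np

# four lighting conditions, each resized to output_shape (128, 128), RGB
imgs = [np.zeros((128, 128, 3), dtype=np.uint8) for _ in range(4)]

# concat_type: linear -> images stacked along one axis, giving a 512 x 128 input
linear_concat = np.concatenate(imgs, axis=0)          # shape (512, 128, 3)

# concat_type: grid with grid_map x: 2, y: 2 -> a 2 x 2 mosaic, giving 256 x 256
top = np.concatenate(imgs[0:2], axis=1)
bottom = np.concatenate(imgs[2:4], axis=1)
grid_concat = np.concatenate([top, bottom], axis=0)   # shape (256, 256, 3)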

segment

Parameter Datatype Default Description Supported Values
root_dir str The root directory containing the dataset.
label_transform str ‘norm’ The label transformation applied to normalize the label images to represent class IDs. ‘norm’, None
data_name str ‘LEVIR’ The name of the dataset.
dataset str ‘CNDataset’ The name of the dataset class. ‘CNDataset’
multi_scale_train bool True Whether to use multi-scale training. True, False
multi_scale_infer bool False Whether to use multi-scale inference. True, False
num_classes int 2 The number of classes in the dataset. >0
img_size int 256 The size of input images. >0
batch_size int 8 The number of samples per batch.
workers int 2 The number of worker processes for data loading.
shuffle bool True Whether to shuffle the dataset during loading.
image_folder_name str ‘A’ The name of the folder containing images.
change_image_folder_name str ‘B’ The name of the folder containing changed images.
list_folder_name str ‘list’ The name of the folder containing dataset split lists.
annotation_folder_name str ‘label’ The name of the folder containing annotations.

augmentation Dict None The image augmentation config, which contains the following parameters:

* mean (List[float], default [0.5, 0.5, 0.5], >=0.0): The mean to be subtracted for pre-processing
* std (List[float], default [0.5, 0.5, 0.5], >=0.0): The standard deviation to divide the image by
* with_random_blur (bool, default True): Whether to apply random blur
* with_random_crop (bool, default True): Whether to apply random crop
* with_scale_random_crop (Dict): Whether to apply random crop with scale
* random_color (Dict): Whether to apply random color augmentation
* random_rotate (Dict): Whether to apply random rotation augmentation
* random_flip (Dict): Whether to apply random flip augmentation

train_split str ‘train’ The name of the training split.
validation_split str ‘val’ The name of the validation split.
test_split str ‘test’ The name of the test split.
predict_split str ‘test’ The name of the prediction split.
label_suffix str ‘.png’ The suffix for label files.
color_map Optional[Dict[str, List[int]]] None Mapping of string class labels (‘0’ to ‘n’) to rgb color codes.
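As referenced in the train section above, when multi_scale_train is True the segmentation loss is a weighted sum of cross-entropy terms computed at several output scales. A hedged sketch of such a weighted sum (not the toolkit's exact implementation) is:

import torch
import torch.nn.functional as F

def multi_scale_ce_loss(preds, target, weights=(0.5, 0.5, 0.5, 0.8, 1.0)):
    """preds: list of logits, each of shape (N, C, Hs, Ws) at a different scale;
    target: (N, H, W) integer class mask at full resolution."""
    total = torch.zeros(())
    for weight, pred in zip(weights, preds):
        # resize the mask to the prediction's resolution before the CE term
        scaled = F.interpolate(target[:, None].float(), size=pred.shape[-2:],
                               mode="nearest").squeeze(1).long()
        total = total + weight * F.cross_entropy(pred, scaled)
    return total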

Example spec file for ViT backbones

Note

The following spec file is only relevant for TAO Toolkit versions 5.3 and later.


encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "contrastive"
    cls_weight: [1.0, 10.0]
  num_epochs: 350
  num_nodes: 1
  val_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'euclidean'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: grid
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 112
      - 112
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 224
  input_height: 224


Training the Model

Use the following command to run VisualChangeNet-Classification training:


tao model visual_changenet train -e <experiment_spec_file> -r <results_dir> --gpus <num_gpus> task=classify

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

  • task: The task (‘segment’ or ‘classify’) for the visual_changenet training. Default: segment.

Optional Arguments

  • --gpus: The number of GPUs to use for training. The default value is 1.

Here’s an example of using the VisualChangeNet training command:


tao model visual_changenet train -e $DEFAULT_SPEC -r $RESULTS_DIR --gpus $NUM_GPUs
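Individual spec values can also be overridden on the command line using the same dotted notation shown in the evaluation and inference examples below; for instance (the train.num_epochs override name is an assumption based on the spec structure above):

tao model visual_changenet train -e $DEFAULT_SPEC -r $RESULTS_DIR --gpus 2 task=classify train.num_epochs=100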


Evaluating the Model

Here is an example spec file for evaluating and running inference with a trained VisualChangeNet-Classification model.


results_dir: /path/to/experiment_results
task: classify
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
  classify:
    eval_margin: 0.005
dataset:
  classify:
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: /path/to/checkpoint
inference:
  checkpoint: /path/to/checkpoint

Inference/Evaluate

Parameter Datatype Default Description Supported Values
checkpoint string The path to the PyTorch model to evaluate or run inference with
trt_engine string The path to the TensorRT engine to evaluate or run inference with. Should only be used with TAO Deploy
num_gpus unsigned int 1 The number of GPUs to use >0
results_dir string The path to a folder where the experiment outputs should be written
vis_after_n_batches unsigned int 1 Number of batches after which to save inference/evaluate visualization results >0
batch_size unsigned int The batch size used for evaluation/inference
gpu_id unsigned int 0 The GPU id to use

Use the following command to run VisualChangeNet-Classification evaluation:


tao model visual_changenet evaluate -e <experiment_spec> -r <results_dir> task=classify model.classify.eval_margin=0.5

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

Here’s an example of using the VisualChangeNet evaluation command:


tao model visual_changenet evaluate -e $DEFAULT_SPEC -r $RESULTS_DIR


Running Inference

Use the following command to run inference on VisualChangeNet-Classification with a trained model:


tao model visual_changenet inference -e <experiment_spec> -r <results_dir> task=classify model.classify.eval_margin=0.5

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the inference experiment.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

Here’s an example of using the VisualChangeNet inference command:


tao model visual_changenet inference -e $DEFAULT_SPEC -r $RESULTS_DIR


Exporting the Model

Here is an example spec file for exporting the trained VisualChangeNet model:


export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 128
  input_height: 512
  batch_size: -1

Parameter Datatype Default Description Supported Values
checkpoint string The path to the PyTorch model to export
onnx_file string The path to the .onnx file
opset_version unsigned int 12 The opset version of the exported ONNX >0
input_channel unsigned int 3 The input channel size. Only the value 3 is supported. 3
input_width unsigned int 128 The input width >0
input_height unsigned int 512 The input height >0
batch_size unsigned int -1 The batch size of the ONNX model. If this value is set to -1, the export uses dynamic batch size. >=-1
gpu_id unsigned int 0 The GPU id to use
on_cpu bool False Whether to export on CPU
verbose bool False Whether to print a human-readable representation of the network

Use the following command to export the model:


tao model visual_changenet export [-h] -e <experiment_spec_file> -r <results_dir> task=classify

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the export experiment.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

Sample Usage

The following is an example export command:


tao model visual_changenet export -e /path/to/spec.yaml -r $RESULTS_DIR
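Assuming ONNX Runtime is available, the exported model can be given a quick sanity check outside of TAO. The snippet below reads the input names and shapes from the model itself rather than assuming them, and fills any dynamic dimensions with 1:

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("/path/to/model.onnx", providers=["CPUExecutionProvider"])

# build dummy inputs matching the reported shapes (dynamic axes filled with 1)
feeds = {}
for inp in session.get_inputs():
    shape = [dim if isinstance(dim, int) else 1 for dim in inp.shape]
    feeds[inp.name] = np.random.rand(*shape).astype(np.float32)

outputs = session.run(None, feeds)
print([out.shape for out in outputs])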

