Visual ChangeNet-Classification#

Visual ChangeNet-Classification is an NVIDIA-developed classification change detection model and is included in TAO. Visual ChangeNet supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Launcher using the following convention on the command-line:

tao model visual_changenet <sub_task> <args_per_subtask>

Where <args_per_subtask> are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
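
For example, a minimal sketch of a training launch (the spec file path is a placeholder):

# Hypothetical invocation; substitute your own experiment spec file
tao model visual_changenet train -e /path/to/experiment.yaml task=classify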

Data Input for VisualChangeNet#

VisualChangeNet-Classification requires the data to be provided as image and CSV files. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Classification, which follows the same input data format as Optical Inspection.

Creating a Training Experiment Spec File#

Configuring a Custom Dataset#

This section provides example configuration and commands for training VisualChangeNet-Classification using the dataset format described above.

Here is an example spec file for training a VisualChangeNet-Classification model with NVIDIA’s FAN Hybrid backbone using the Data Annotation Format.

encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "ce"
    cls_weight: [1.0, 10.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    image_width: 128
    image_height: 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 128
  input_height: 512

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | The configuration of the model architecture | – |
| dataset | dict config | – | The configuration of the dataset | – |
| train | dict config | – | The configuration of the training task | – |
| evaluate | dict config | – | The configuration of the evaluation task | – |
| inference | dict config | – | The configuration of the inference task | – |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | – |
| results_dir | string | /results | The directory where experiment results are saved | – |
| export | dict config | – | The configuration of the ONNX export task | – |
| task | str | classify | A flag to indicate the change detection task: 'segment' for segmentation or 'classify' for classification | segment, classify |

train#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | – |
| seed | unsigned int | 1234 | The random seed for random, NumPy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which validation is run | >0 |
| resume_training_checkpoint_path | string | – | The intermediate PyTorch Lightning checkpoint to resume training from | – |
| results_dir | string | /results/train | The directory to save training results | – |
| classify | Dict | – | Configurable parameters for the VisualChangeNet-Classification pipeline: loss (str, default ce) is the loss function used for classification training; cls_weight (list) holds the cross-entropy loss weights for unbalanced dataset distributions | – |
| segment | Dict | – | Configurable parameters for the VisualChangeNet-Segmentation pipeline: loss (str, default ce) is the loss function used for segmentation training; weights (list, default [0.5, 0.5, 0.5, 0.8, 1.0]) holds the weights for calculating the multi-scale segmentation loss during training when multi_scale_train is True | – |
| num_nodes | unsigned int | 1 | The number of nodes; a value larger than 1 enables multi-node training | >0 |
| pretrained_model_path | string | – | The path to the pretrained model checkpoint used to initialize the end-to-end model weights | – |
| optim | dict config | None | Configurable parameters for the VisualChangeNet optimizer, detailed in the optim section | – |
| tensorboard | dict config | None | TensorBoard visualization settings: enabled (bool, default True) is the flag to enable TensorBoard | – |

optim#

optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| lr | float | 0.0005 | The learning rate | >=0.0 |
| optim | str | adamw | The optimizer | – |
| policy | str | linear | The learning rate scheduler: linear (LambdaLR) decreases the lr by a multiplicative factor; step (StepLR) decreases the lr by a factor of 0.1 every num_epochs//3 epochs | linear, step |
| momentum | float | 0.9 | The momentum for the AdamW optimizer | – |
| weight_decay | float | 0.1 | The weight decay coefficient | – |
| monitor_name | str | val_loss | The name of the monitored metric used for saving the top-k checkpoints | – |
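
As a sketch, switching to the step schedule only requires changing policy; the values below are illustrative, not tuned recommendations:

optim:
  lr: 0.0005
  optim: "adamw"
  policy: "step"        # StepLR: lr is decreased by a factor of 0.1 every num_epochs//3 epochs
  momentum: 0.9
  weight_decay: 0.01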

Model#

The following example model config provides options to change the VisualChangeNet-Classification architecture for training. VisualChangeNet-Classification supports two model architectures. Architecture 1 leverages only the last feature maps from the FAN backbone using Euclidean difference to perform contrastive learning. Architecture 2 leverages the VisualChangeNet-Classification learnable difference modules for 4 different features at 3 feature resolutions to minimize Cross-Entropy loss.

model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | Dict | None | Configurable parameters for the VisualChangeNet-Classification backbone: type (string) is the name of the backbone to use; pretrained_backbone_path (string, default None) is the path to the pretrained backbone weights file; freeze_backbone (bool, default False) controls whether to freeze the backbone weights during training; feat_downsample (bool, default False) controls whether to downsample the last feature map in FAN backbone configurations | type: fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid, vit_large_nvdinov2 |
| decode_head | Dict | None | Configurable parameters for the decoder: align_corner (bool, default False), if set to True, aligns the input and output tensors by the center points of their corner pixels, preserving the values at the corner pixels; feature_strides (list, default [4, 8, 16, 16]) sets the downsampling feature strides for different backbones; decoder_params (Dict) contains the network parameter embed_dims (default 256), the embedding dimensions | align_corner: True, False |
| classify | Dict | None | Configurable parameters for the VisualChangeNet-Classification model: train_margin_euclid (float, default 2.0, >0) is the training margin threshold for contrastive learning (Architecture 1); eval_margin (float, default 0.005, >0) is the evaluation margin threshold; embedding_vectors (int, default 5, >0) is the output embedding dimension for each input image before the Euclidean distance is computed (Architecture 1); embed_dec (int, default 30, >0) is the transformer decoder MLP embedding dimension (Architecture 2); difference_module (string, default learnable) is the type of difference module used (both architectures); learnable_difference_modules (int, default 4, <=4) is the number of learnable difference modules (Architecture 2) | difference_module: euclidean, learnable |
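
A minimal sketch of overriding the backbone config, assuming a locally available FAN-Tiny checkpoint (the path is a placeholder):

model:
  backbone:
    type: "fan_tiny_8_p4_hybrid"                     # any supported backbone from the table above
    pretrained_backbone_path: /path/to/fan_tiny.pth  # placeholder path to pretrained weights
    freeze_backbone: True                            # backbone weights are not updated during training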

Dataset#

The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is provided below.

dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    image_width: 128
    image_height: 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2

* See the Dataset Annotation Format definition for more information about specifying lighting conditions.

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| segment | Dict | – | The dataset configuration for the segmentation dataloader | – |
| classify | Dict | – | The dataset configuration for the classification dataloader, detailed in the classify section | – |

classify#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset | Dict | – | The paths to the image directory and CSV file for the training dataset | – |
| validation_dataset | Dict | – | The paths to the image directory and CSV file for the validation dataset | – |
| test_dataset | Dict | – | The paths to the image directory and CSV file for the test dataset | – |
| infer_dataset | Dict | – | The paths to the image directory and CSV file for the inference dataset | – |
| image_ext | str | .jpg | The file extension of the images in the dataset | string |
| batch_size | int | 32 | The number of samples per batch | >0 |
| workers | int | 8 | The number of worker processes for data loading | >=0 |
| fpratio_sampling | float | 0.1 | The ratio of false-positive examples to sample | >0 |
| num_input | int | 4 | The number of lighting conditions for each input image* | >0 |
| input_map | Dict | – | The mapping of lighting conditions to indices specifying the concatenation ordering* | – |
| concat_type | string | linear | The type of concatenation to use for the different image lighting conditions | linear, grid |
| grid_map | Dict | None | The grid dimensions for concatenating images as a grid: x (int) is the number of images along the x-axis; y (int) is the number of images along the y-axis | – |
| input_width | int | 100 | The width of the input image | >0 |
| input_height | int | 100 | The height of the input image | >0 |
| num_classes | int | 2 | The number of classes in the dataset | >1 |
| augmentation_config | Dict | None | The data augmentation settings, detailed in the augmentation_config section | – |
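
For example, to tile the four lighting conditions into a 2x2 mosaic rather than a linear strip, a sketch of the relevant fields:

dataset:
  classify:
    num_input: 4        # four lighting conditions per sample
    concat_type: grid   # concatenate as a grid instead of linearly
    grid_map:
      x: 2              # two images along the x-axis
      y: 2              # two images along the y-axis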

augmentation_config#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| random_flip | Dict | None | Random vertical and horizontal flipping augmentation settings: vflip_probability (float, default 0.5, >=0.0) is the probability of vertical flipping; hflip_probability (float, default 0.5, >=0.0) is the probability of horizontal flipping; enable (bool, default True) enables or disables random flipping | – |
| random_rotate | Dict | None | Randomly rotate images with specified probability and angles: rotate_probability (float, default 0.5, >=0.0) is the probability of applying random rotation; angle_list (list, default [90, 180, 270]) lists the rotation angles to choose from; enable (bool, default True) enables or disables random rotation | – |
| random_color | Dict | None | Apply random color augmentation to images: brightness (float, default 0.3, >=0.0) is the maximum brightness change factor; contrast (float, default 0.3, >=0.0) is the maximum contrast change factor; saturation (float, default 0.3, >=0.0) is the maximum saturation change factor; hue (float, default 0.3, >=0.0) is the maximum hue change factor; enabled (bool, default True) enables or disables random color augmentation; color_probability (float, default 0.5, >=0.0) is the probability of applying color augmentation | – |
| with_random_crop | bool | True | Apply random crop augmentation | True, False |
| with_random_blur | bool | True | Apply random blurring augmentation | True, False |
| rgb_input_mean | List[float] | [0.485, 0.456, 0.406] | The per-channel mean to be subtracted for pre-processing | – |
| rgb_input_std | List[float] | [0.229, 0.224, 0.225] | The per-channel standard deviation to divide the image by | – |
| augment | bool | False | Flag to indicate whether to apply data augmentation | True, False |
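
A sketch of an augmentation_config using the parameters above, assuming these fields nest directly under dataset.classify as in the earlier examples; the probabilities shown are the listed defaults, not tuned values:

augmentation_config:
  augment: True                 # master flag for applying data augmentation
  random_flip:
    vflip_probability: 0.5
    hflip_probability: 0.5
    enable: True
  random_rotate:
    rotate_probability: 0.5
    angle_list: [90, 180, 270]
    enable: True
  random_color:
    brightness: 0.3
    contrast: 0.3
    saturation: 0.3
    hue: 0.3
    enabled: True
    color_probability: 0.5
  with_random_crop: True
  with_random_blur: True
  rgb_input_mean: [0.485, 0.456, 0.406]
  rgb_input_std: [0.229, 0.224, 0.225]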

Example Spec File for ViT Backbones#

Note

The following spec file is only relevant for TAO versions 5.3 and later.

encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "contrastive"
    cls_weight: [1.0, 10.0]
  num_epochs: 10
  num_nodes: 1
  validation_interval: 5
  checkpoint_interval: 5
  seed: 1234
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'euclidean'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: grid
    grid_map:
      x: 2
      y: 2
    image_width: 112
    image_height: 112
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 224
  input_height: 224

Training the Model#

Use the following command to run VisualChangeNet-Classification training:

tao model visual_changenet train [-h] -e <experiment_spec>
                           task=classify
                           [results_dir=<global_results_dir>]
                           [model.<model_option>=<model_option_value>]
                           [dataset.<dataset_option>=<dataset_option_value>]
                           [train.<train_option>=<train_option_value>]
                           [train.gpu_ids=<gpu indices>]
                           [train.num_gpus=<number of gpus>]

Required Arguments#

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • task: The task (‘segment’ or ‘classify’) for the visual_changenet training. Default: segment.

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

Note

For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 with gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (in this example, num_gpus is updated to 2).
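
For example, a sketch of a consistent two-GPU launch (the spec file path is a placeholder):

# Hypothetical invocation; both overrides agree on two GPUs
tao model visual_changenet train -e /path/to/experiment.yaml task=classify \
                           train.num_gpus=2 train.gpu_ids=[0,1]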

Checkpointing and Resuming Training#

At every train.checkpoint_interval, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved in train.results_dir, like so:

$ ls /results/train

'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'

The latest checkpoint is also saved as changenet_model_classify_latest.pth. Training automatically resumes from changenet_model_classify_latest.pth, if it exists in train.results_dir. This is superseded by train.resume_training_checkpoint_path, if it is provided.

The major implication of this logic is that, if you want to trigger fresh training from scratch, do either of the following (as sketched after this list):

  • Specify a new, empty results directory (Recommended)

  • Remove the latest checkpoint from the results directory
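
A sketch of both options (paths are placeholders):

# Option 1 (recommended): point results_dir at a new, empty directory
tao model visual_changenet train -e /path/to/experiment.yaml task=classify \
                           results_dir=/path/to/new_results_dir

# Option 2: remove the auto-resume checkpoint before relaunching
rm /results/train/changenet_model_classify_latest.pth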

Creating a Testing Experiment Spec File#

Here is an example spec file for running evaluation and inference with a trained VisualChangeNet-Classification model.

results_dir: /path/to/experiment_results
task: classify
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
  classify:
    eval_margin: 0.005
dataset:
  classify:
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: /path/to/checkpoint
  results_dir: /results/evaluate
inference:
  checkpoint: /path/to/checkpoint
  results_dir: /results/inference

Inference/Evaluate#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | – | The path to the PyTorch model to evaluate or run inference on | – |
| trt_engine | string | – | The path to the TensorRT engine to evaluate or run inference on; only to be used with TAO Deploy | – |
| num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
| gpu_ids | List[int] | [0] | The GPU IDs to use | – |
| results_dir | string | – | The path to a folder where the experiment outputs should be written | – |
| vis_after_n_batches | unsigned int | 1 | The number of batches after which evaluation/inference visualization results are saved | >0 |
| batch_size | unsigned int | – | The batch size for evaluation/inference | >0 |

Evaluating the Model#

Use the following command to run VisualChangeNet-Classification evaluation:

tao model visual_changenet evaluate [-h] -e <experiment_spec_file>
                           task=classify
                           evaluate.checkpoint=<model to be evaluated>
                           [evaluate.<evaluate_option>=<evaluate_option_value>]
                           [evaluate.gpu_ids=<gpu indices>]
                           [evaluate.num_gpus=<number of gpus>]

Multi-GPU evaluation is currently not supported for VisualChangeNet-Classification.

Required Arguments#

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.

  • evaluate.checkpoint: The .pth model to be evaluated.
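
For example, a hypothetical evaluation run reusing the latest training checkpoint (paths are placeholders):

# Hypothetical invocation; substitute your own spec and checkpoint paths
tao model visual_changenet evaluate -e /path/to/experiment.yaml task=classify \
                           evaluate.checkpoint=/results/train/changenet_model_classify_latest.pth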

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

Running Inference on the Model#

Use the following command to run inference on VisualChangeNet-Classification with the .pth model:

tao model visual_changenet inference [-h] -e <experiment_spec_file>
                           task=classify
                           inference.checkpoint=<inference model>
[inference.<inference_option>=<inference_option_value>]
                           [inference.gpu_ids=<gpu indices>]
                           [inference.num_gpus=<number of gpus>]

Required Arguments#

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.

  • inference.checkpoint: The .pth model to run inference on.

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

Exporting the Model#

Here is an example spec file for exporting the trained VisualChangeNet model:

export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 128
  input_height: 512
  batch_size: -1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | – | The path to the PyTorch model to export | – |
| onnx_file | string | – | The path to the exported .onnx file | – |
| opset_version | unsigned int | 12 | The opset version of the exported ONNX model | >0 |
| input_channel | unsigned int | 3 | The input channel size; only the value 3 is supported | 3 |
| input_width | unsigned int | 128 | The input width | >0 |
| input_height | unsigned int | 512 | The input height | >0 |
| batch_size | unsigned int | -1 | The batch size of the ONNX model; if set to -1, the export uses a dynamic batch size | >=-1 |
| gpu_id | unsigned int | 0 | The GPU ID to use | – |
| on_cpu | bool | False | Whether to export on CPU | True, False |
| verbose | bool | False | Whether to print a human-readable representation of the network | True, False |

Use the following command to export the model:

tao model visual_changenet export [-h] -e <experiment spec file>
                           task=classify
                           export.checkpoint=<model to export>
                           export.onnx_file=<onnx path>
                           [export.<export_option>=<export_option_value>]

Required Arguments#

  • -e, --experiment_spec: The path to an experiment spec file

  • export.checkpoint: The .pth model to export.

  • export.onnx_file: The path where the .etlt or .onnx model is saved.
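
For example, a hypothetical export of the latest training checkpoint to ONNX (paths are placeholders):

# Hypothetical invocation; substitute your own spec, checkpoint, and output paths
tao model visual_changenet export -e /path/to/experiment.yaml task=classify \
                           export.checkpoint=/results/train/changenet_model_classify_latest.pth \
                           export.onnx_file=/results/export/changenet_classify.onnx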

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

TensorRT Engine Generation, Validation, and int8 Calibration#

For deployment, refer to the TAO Deploy Documentation for VisualChangeNet-Classification.