Visual ChangeNet-Classification

Visual ChangeNet-Classification is an NVIDIA-developed classification change detection model and is included in the TAO Toolkit. Visual ChangeNet supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:


tao model visual_changenet <sub_task> <args_per_subtask>

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

VisualChangeNet-Classification requires the data to be provided as image and CSV files. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Classification, which follows the same input data format as Optical Inspection.
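
Before training, it can help to confirm that the CSV and image directory referenced in the spec are readable. The short Python sketch below is not part of the TAO Toolkit; it only inspects whatever columns the CSV contains, since the exact schema is defined by the Data Annotation Format page.

from pathlib import Path
import pandas as pd

csv_path = Path("/path/to/train.csv")    # matches dataset.classify.train_dataset.csv_path
images_dir = Path("/path/to/img_dir")    # matches dataset.classify.train_dataset.images_dir

df = pd.read_csv(csv_path)
print(f"{len(df)} rows, columns: {list(df.columns)}")
print(f"images_dir exists: {images_dir.is_dir()}")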

Configuring a Custom Dataset

This section provides example configuration and commands for training VisualChangeNet-Classification using the dataset format described above.

Here is an example spec file for training a VisualChangeNet-Classification model with NVIDIA’s FAN Hybrid backbone using the Data Annotation Format.


encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "ce"
    cls_weight: [1.0, 10.0]
  num_epochs: 350
  num_nodes: 1
  val_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 128
  input_height: 512

Parameter Data Type Default Description
model dict config The configuration of the model architecture
dataset dict config The configuration for the dataset, detailed in the dataset section
train dict config The configuration for the training parameters, detailed in the train section
results_dir string The path to save the model experiment log outputs and model checkpoints
task str classify A flag to indicate the change detection task. Supports two tasks: ‘segment’ and ‘classify’ for segmentation and classification

train

Parameter Datatype Default Description
results_dir string The path to save the model training experiment log outputs and model checkpoints
checkpoint_interval int 1 The epoch interval at which checkpoints are saved
resume_training_checkpoint_path str None The path to the checkpoint for resuming training

classify (Dict, default None)

The classify dict contains configurable parameters for the VisualChangeNet Classification pipeline:

* loss (str, default "ce"): The loss function used for classification training
* cls_weight (list, default None): The class weights for the cross-entropy loss, used for unbalanced dataset distributions (see the weighted cross-entropy sketch after this table)

segment (Dict, default None)

The segment dict contains configurable parameters for the VisualChangeNet Segmentation pipeline:

* loss (str, default "ce"): The loss function used for segmentation training
* weights (list, default [0.5, 0.5, 0.5, 0.8, 1.0]): The weights used for calculating the multi-scale segmentation loss during training when multi_scale_train is "True"

num_nodes unsigned int 1 The number of nodes. If the value is larger than 1, multi-node training is enabled.
val_interval unsigned int 1 The epoch interval at which the validation is run
num_epochs int 50 The total number of epochs to run the experiment
pretrained_model_path string The path to the pretrained model checkpoint used to initialize the end-to-end model weights

optim (dict config, default None)

Contains the configurable parameters for the VisualChangeNet optimizer, detailed in the optim section.

tensorboard (dict config, default None)

Enables TensorBoard visualization using a dict with the following configurable parameter:

* enabled (bool, default True): Flag to enable TensorBoard
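
The effect of cls_weight with cross-entropy loss can be illustrated with a small, self-contained PyTorch sketch. This is not TAO source code; the weight values simply mirror the example spec above, where errors on the under-represented class are penalized ten times more heavily.

import torch
import torch.nn as nn

# Mirrors train.classify.cls_weight: [1.0, 10.0] from the example spec:
# mistakes on class 1 (e.g. the rare "defect" class) contribute 10x more loss.
cls_weight = torch.tensor([1.0, 10.0])
criterion = nn.CrossEntropyLoss(weight=cls_weight)

logits = torch.randn(4, 2)              # a batch of 4 samples, 2 classes
targets = torch.tensor([0, 1, 1, 0])    # ground-truth class indices
print(criterion(logits, targets).item())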

optim


optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01

Parameter Datatype Default Description Supported Values
lr float 0.0005 The learning rate >=0.0
optim str adamw The optimizer to use adamw

policy (str, default linear, supported values: linear, step)

The learning rate scheduler policy (see the scheduler sketch after this table):

* linear: LambdaLR decreases the lr by a multiplicative factor
* step: StepLR decreases the lr by 0.1 every num_epochs // 3 epochs

momentum float 0.9 The momentum for the AdamW optimizer
weight_decay float 0.1 The weight decay coefficient
monitor_name str val_loss The name of the monitor used for saving the top-k checkpoints.
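
The two policy options can be illustrated with a standalone PyTorch sketch. This is not TAO source code, and the exact decay formula TAO uses for the linear policy may differ; the optimizer settings mirror the example optim config above.

import torch

model = torch.nn.Linear(8, 2)            # placeholder model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=0.0001, weight_decay=0.01)
num_epochs = 350

# policy: "linear" -- LambdaLR scales the base lr by a multiplicative factor
# that shrinks each epoch (an assumed linear ramp-down here).
linear_sched = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 1.0 - epoch / num_epochs)

# policy: "step" -- StepLR multiplies the lr by 0.1 every num_epochs // 3 epochs.
step_sched = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=num_epochs // 3, gamma=0.1)

# In practice only one scheduler is attached to the optimizer and stepped once per epoch.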

The following example model config provides options to change the VisualChangeNet-Classification architecture for training. VisualChangeNet-Classification supports two model architectures. Architecture 1 leverages only the last feature maps from the FAN backbone using Euclidean difference to perform contrastive learning. Architecture 2 leverages the VisualChangeNet-Classification learnable difference modules for 4 different features at 3 feature resolutions to minimize Cross-Entropy loss.


model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4

backbone (Dict, default None)

A dictionary containing the following configurable parameters for the VisualChangeNet-Classification backbone:

* type (string): The name of the backbone to be used
* pretrained_backbone_path (string, default None): The path to the pre-trained backbone weights file
* freeze_backbone (bool, default False): Whether to freeze the backbone weights during training
* feat_downsample (bool, default False): Whether to downsample the last feature map in FAN backbone configurations

Supported backbone types:

* fan_tiny_8_p4_hybrid
* fan_large_16_p4_hybrid
* fan_small_12_p4_hybrid
* fan_base_16_p4_hybrid
* vit_large_nvdinov2

decode_head (Dict, default None)

A dictionary containing the following configurable parameters for the decoder:

* align_corners (bool, default False, supported values: True, False): If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels
* feature_strides (list, default [4, 8, 16, 16]): The downsampling feature strides for the different backbones
* decoder_params (Dict): Contains the following network parameter:
  * embed_dims (default 256): The embedding dimensions

classify (Dict, default None)

A dictionary containing the following configurable parameters for the VisualChangeNet-Classification model:

* train_margin_euclid (default 2.0, >0): The training margin threshold for contrastive learning (applicable to Architecture 1)
* eval_margin (>0): The evaluation margin threshold
* embedding_vectors (default 5, >0): The output embedding dimension for each input image before computing the Euclidean distance (applicable to Architecture 1)
* embed_dec (default 30, >0): The transformer decoder MLP embedding dimension (applicable to Architecture 2)
* difference_module (default 'learnable', supported values: Euclidean, learnable): The type of difference module used (applicable to both architectures)
* learnable_difference_modules (default 4, supported values: <4): The number of learnable difference modules (applicable to Architecture 2)

A conceptual sketch contrasting the two architectures follows this table.
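
The difference between the two architectures can be sketched with plain PyTorch. This is a conceptual illustration only, not TAO source code: the embeddings, shapes, and the single-scale difference head are placeholders standing in for the actual multi-scale features.

import torch
import torch.nn as nn

golden_emb = torch.randn(1, 5)     # embedding_vectors = 5 per input image
compare_emb = torch.randn(1, 5)

# Architecture 1 (difference_module: 'euclidean'): contrastive learning pulls
# matched pairs together and pushes defective pairs apart using
# train_margin_euclid; at test time the Euclidean distance is thresholded
# against eval_margin.
eval_margin = 0.005
distance = torch.pairwise_distance(golden_emb, compare_emb)
is_changed = (distance > eval_margin).long()

# Architecture 2 (difference_module: 'learnable'): the feature difference is
# passed through trainable layers (a simplified stand-in for the learnable
# difference modules) and classified with (weighted) cross-entropy.
difference_head = nn.Sequential(nn.Linear(5, 30), nn.ReLU(), nn.Linear(30, 2))
logits = difference_head(golden_emb - compare_emb)
is_changed_arch2 = logits.argmax(dim=1)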

The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is shown below.


dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]

* See the Dataset Annotation Format definition for more information about specifying lighting conditions.

Parameter Datatype Default Description Supported Values
segment Dict The segment dict contains the dataset configuration for the segmentation dataloader, detailed in the segment section
classify Dict The classify dict contains the dataset configuration for the classification dataloader, detailed in the classify section

classify

Parameter Datatype Default Description Supported Values
train_dataset Dict The paths to the image directory and CSV files for the training dataset
validation_dataset Dict The paths to the image directory and CSV files for the validation dataset
test_dataset Dict The paths to the image directory and CSV files for the test dataset
infer_dataset Dict The paths to the image directory and CSV files for the inference dataset
image_ext str .jpg The file extension of the images in the dataset string
batch_size int 32 The number of samples per batch >0
workers int 8 The number of worker processes for data loading
fpratio_sampling float 0.1 The ratio of false-positive examples to sample >0
num_input int 4 The number of lighting conditions for each input image* >0
input_map Dict The mapping of lighting conditions to indices specifying concatenation ordering*
concat_type string linear Type of concatenation to use for different image lighting conditions linear, grid

grid_map (Dict, default None)

The parameters defining the grid dimensions used to concatenate the images as a grid:

* x: The number of images along the x-axis
* y: The number of images along the y-axis

output_shape List[int] [100, 100] The image resolution to which each lighting condition is reshaped before concatenation >=0

augmentation_config (Dict, default None)

The image normalization config, which contains the following parameters:

* rgb_input_mean (List[float], default [0.485, 0.456, 0.406], >=0.0): The mean to be subtracted from the image for pre-processing
* rgb_input_std (List[float], default [0.229, 0.224, 0.225], >=0.0): The standard deviation by which the image is divided

(The concatenation and normalization steps are illustrated in the sketch after this table.)

num_classes int 2 The number of classes in the dataset >1
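
The concatenation of the num_input lighting conditions and the normalization step can be illustrated with a small NumPy sketch. This is not the TAO dataloader; the stacking orientation is an assumption chosen to match the export input_width/input_height of the example specs, and the real implementation may order or orient the images differently.

import numpy as np

h, w = 128, 128                                  # output_shape from the example spec
mean = np.array([0.485, 0.456, 0.406])           # rgb_input_mean
std = np.array([0.229, 0.224, 0.225])            # rgb_input_std

# One image per lighting condition, ordered by input_map
# (LowAngleLight, SolderLight, UniformLight, WhiteLight).
images = [np.random.rand(h, w, 3) for _ in range(4)]
images = [(img - mean) / std for img in images]

# concat_type: linear -- the four conditions are stacked into a single strip
# (a 512 x 128 input, matching the first example's export dimensions).
linear_concat = np.concatenate(images, axis=0)

# concat_type: grid with grid_map x=2, y=2 -- the four conditions form a mosaic
# (a 256 x 256 input; the ViT example uses 112 x 112 tiles for a 224 x 224 input).
top = np.concatenate(images[0:2], axis=1)
bottom = np.concatenate(images[2:4], axis=1)
grid_concat = np.concatenate([top, bottom], axis=0)

print(linear_concat.shape, grid_concat.shape)    # (512, 128, 3) (256, 256, 3)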

segment

Parameter Datatype Default Description Supported Values
root_dir str The root directory containing the dataset.
label_transform str ‘norm’ The label transformation applied to normalize the label images to represent class IDs. ‘norm’, None
data_name str ‘LEVIR’ The name of the dataset.
dataset str ‘CNDataset’ The name of the dataset class. ‘CNDataset’
multi_scale_train bool True Whether to use multi-scale training. True, False
multi_scale_infer bool False Whether to use multi-scale inference. True, False
num_classes int 2 The number of classes in the dataset. >0
img_size int 256 The size of input images. >0
batch_size int 8 The number of samples per batch.
workers int 2 The number of worker processes for data loading.
shuffle bool True Whether to shuffle the dataset during loading.
image_folder_name str ‘A’ The name of the folder containing images.
change_image_folder_name str ‘B’ The name of the folder containing changed images.
list_folder_name str ‘list’ The name of the folder containing dataset split lists.
annotation_folder_name str ‘label’ The name of the folder containing annotations.

augmentation (Dict, default None)

The image augmentation config, which contains the following parameters:

* mean (List[float], default [0.5, 0.5, 0.5], >=0.0): The mean to be subtracted from the image for pre-processing
* std (List[float], default [0.5, 0.5, 0.5], >=0.0): The standard deviation by which the image is divided
* with_random_blur (bool, default True): Whether to apply random blur
* with_random_crop (bool, default True): Whether to apply random crop
* with_scale_random_crop (Dict): Whether to apply random crop with scale
* random_color (Dict): Whether to apply random color augmentation
* random_rotate (Dict): Whether to apply random rotation augmentation
* random_flip (Dict): Whether to apply random flip augmentation

train_split str ‘train’ The name of the training split.
validation_split str ‘val’ The name of the validation split.
test_split str ‘test’ The name of the test split.
predict_split str ‘test’ The name of the prediction split.
label_suffix str ‘.png’ The suffix for label files.
color_map Optional[Dict[str, List[int]]] None Mapping of string class labels (‘0’ to ‘n’) to rgb color codes.

Example spec file for ViT backbones

Note

The following spec file is only relevant for TAO Toolkit versions 5.3 and later.


encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "contrastive"
    cls_weight: [1.0, 10.0]
  num_epochs: 350
  num_nodes: 1
  val_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'euclidean'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: grid
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 112
      - 112
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 224
  input_height: 224

Use the following command to run VisualChangeNet-Classification training:


tao model visual_changenet train -e <experiment_spec_file> -r <results_dir> --gpus <num_gpus> task=classify

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

  • task: The task (‘segment’ or ‘classify’) for the visual_changenet training. Default: segment.

Optional Arguments

  • --gpus: The number of GPUs to use for training. The default value is 1.

Here’s an example of using the VisualChangeNet training command:


tao model visual_changenet train -e $DEFAULT_SPEC -r $RESULTS_DIR --gpus $NUM_GPUs

Here is an example spec file for testing evaluation and inference of a trained VisualChangeNet-Classification model.


results_dir: /path/to/experiment_results
task: classify
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
  classify:
    eval_margin: 0.005
dataset:
  classify:
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: /path/to/checkpoint
inference:
  checkpoint: /path/to/checkpoint

Inference/Evaluate

Parameter Datatype Default Description Supported Values
checkpoint string The path to the PyTorch model to evaluate or run inference with
trt_engine string The path to the TensorRT engine to evaluate or run inference with. Should only be used with TAO Deploy
num_gpus unsigned int 1 The number of GPUs to use >0
results_dir string The path to a folder where the experiment outputs should be written
vis_after_n_batches unsigned int 1 Number of batches after which to save inference/evaluate visualization results >0
batch_size unsigned int The batch size used for evaluation/inference
gpu_id unsigned int 0 The GPU id to use

Use the following command to run VisualChangeNet-Classification evaluation:


tao model visual_changenet evaluate -e <experiment_spec> -r <results_dir> task=classify model.classify.eval_margin=0.5

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

Here’s an example of using the VisualChangeNet evaluation command:


tao model visual_changenet evaluate -e $DEFAULT_SPEC -r $RESULTS_DIR

Use the following command to run inference on VisualChangeNet-Classification with a trained model checkpoint:


tao model visual_changenet inference -e <experiment_spec> -r <results_dir> task=classify model.classify.eval_margin=0.5

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the inference experiment.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

Here’s an example of using the VisualChangeNet inference command:


tao model visual_changenet inference -e $DEFAULT_SPEC -r $RESULTS_DIR

Here is an example spec file for exporting the trained VisualChangeNet model:


export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 128
  input_height: 512
  batch_size: -1

Parameter Datatype Default Description Supported Values
checkpoint string The path to the PyTorch model to export
onnx_file string The path to the .onnx file
opset_version unsigned int 12 The opset version of the exported ONNX >0
input_channel unsigned int 3 The input channel size. Only the value 3 is supported. 3
input_width unsigned int 128 The input width >0
input_height unsigned int 512 The input height >0
batch_size unsigned int -1 The batch size of the ONNX model. If this value is set to -1, the export uses dynamic batch size. >=-1
gpu_id unsigned int 0 The GPU id to use
on_cpu bool False Whether to export on CPU
verbose bool False Whether to print a human-readable representation of the network

Use the following command to export the model:


tao model visual_changenet export [-h] -e <experiment spec file> -r <results_dir> task=classify

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the export experiment.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

Sample Usage

The following is an example export command:


tao model visual_changenet export -e /path/to/spec.yaml -r $RESULTS_DIR
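
After export, the ONNX file can be sanity-checked with onnxruntime (installed separately; this step is not part of the TAO CLI). The sketch below reads the input names and shapes from the exported model rather than assuming them, and substitutes 1 for any dynamic dimension such as the -1 batch size.

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("/path/to/model.onnx", providers=["CPUExecutionProvider"])

feeds = {}
for inp in sess.get_inputs():
    # Replace dynamic dimensions (strings or -1) with 1 for a smoke test.
    shape = [d if isinstance(d, int) and d > 0 else 1 for d in inp.shape]
    feeds[inp.name] = np.random.rand(*shape).astype(np.float32)
    print(inp.name, inp.shape)

outputs = sess.run(None, feeds)
print([o.shape for o in outputs])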
