Visual ChangeNet-Classification
Date: 2024-08-26
Visual ChangeNet-Classification is an NVIDIA-developed classification change detection model and is included in the TAO Toolkit. Visual ChangeNet supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao model visual_changenet <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
VisualChangeNet-Classification requires the data to be provided as image and CSV files. See the Data Annotation Format page for more information about the input data format for VisualChangeNet-Classification, which follows the same input data format as Optical Inspection.
Configuring a Custom Dataset
This section provides example configuration and commands for training VisualChangeNet-Classification using the dataset format described above.
Here is an example spec file for training a VisualChangeNet-Classification model with NVIDIA’s FAN Hybrid backbone using the Data Annotation Format.
encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "ce"
    cls_weight: [1.0, 10.0]
  num_epochs: 350
  num_nodes: 1
  val_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 128
  input_height: 512
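The spec above sets loss: "ce" together with cls_weight: [1.0, 10.0]. As a rough illustration only, the sketch below assumes that cls_weight acts as a per-class weight in a weighted cross-entropy, using the same weighted-mean reduction as the weight argument of PyTorch's CrossEntropyLoss. That interpretation is an assumption, not something this page confirms, and the function name weighted_cross_entropy is hypothetical.

```python
import math

def weighted_cross_entropy(logits, labels, cls_weight):
    """Weighted-mean cross-entropy over a batch of per-class logits.

    Assumption: cls_weight[k] scales the loss of samples whose true class is k.
    """
    total = 0.0
    weight_sum = 0.0
    for row, label in zip(logits, labels):
        # Numerically stable log-softmax value for the true class.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        log_prob = row[label] - log_norm
        total += -cls_weight[label] * log_prob
        weight_sum += cls_weight[label]
    # Divide by the sum of the applied weights (weighted mean).
    return total / weight_sum

cls_weight = [1.0, 10.0]           # values taken from the example spec
logits = [[2.0, 0.0], [0.0, 0.0]]  # made-up logits for two samples
labels = [0, 1]                    # made-up labels
loss = weighted_cross_entropy(logits, labels, cls_weight)
print(loss)
```

Under this reading, a mistake on class 1 costs ten times as much as a mistake on class 0, which is one common way to compensate for a rare class.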
| Parameter | Data Type | Default | Description |
| model | dict config | – | The configuration of the model architecture |
| dataset | dict config | – | The configuration for the dataset detailed in the Config section |
| train | dict config | – | The configuration for training parameters, which is detailed in the Train section |
| results_dir | string | – | The path to save the model experiment log outputs and model checkpoints |
| task | str | classify | A flag to indicate the change detection task. Supports two tasks: ‘segment’ and ‘classify’ for segmentation and classification |
train
| Parameter | Data Type | Default | Description |
| results_dir | string | – | The path to save the model training experiment log outputs and model checkpoints |
| checkpoint_interval | int | 1 | The interval at which the checkpoint needs to be saved |
| resume_training_checkpoint_path | str | None | The path to the checkpoint for resuming training |
| num_nodes | unsigned int | 1 | The number of nodes. If the value is larger than 1, multi-node is enabled. |
| val_interval | unsigned int | 1 | The epoch interval at which the validation is run |
| checkpoint_interval | int | 1 | The number of steps at which the checkpoint needs to be saved |
| num_epochs | int | 50 | The total number of epochs to run the experiment |
| pretrained_model_path | string | – | The path to the pretrained model checkpoint to initialize the end-end model weights. |
optim
optim:
  lr: 0.0001
  optim: "adamw"
  policy: "linear"
  momentum: 0.9
  weight_decay: 0.01
The model config provides options to change the VisualChangeNet-Classification architecture for training.
VisualChangeNet-Classification supports two model architectures. Architecture 1 uses only the last feature maps from the FAN backbone and a Euclidean difference to perform contrastive learning. Architecture 2 uses the VisualChangeNet-Classification learnable difference modules on 4 different features at 3 feature resolutions to minimize cross-entropy loss.
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    pretrained_backbone_path: null
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 16]
    align_corner: False
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'learnable'
    learnable_difference_modules: 4
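To make the role of train_margin_euclid and eval_margin in the Euclidean (contrastive) architecture more concrete, here is a minimal sketch of one common form of the contrastive loss and of a margin-based decision rule. It is not taken from the TAO source: the function names, the label convention (0 = no change, 1 = change), and the rule "distance above eval_margin means change" are all assumptions made for illustration.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(distance, label, margin=2.0):
    """One common contrastive loss form (assumed, not confirmed for TAO).

    label 0 (no change): penalize any distance between the pair.
    label 1 (change): penalize pairs that are closer than the margin.
    """
    similar_term = (1 - label) * distance ** 2
    dissimilar_term = label * max(0.0, margin - distance) ** 2
    return similar_term + dissimilar_term

def predict_change(distance, eval_margin=0.005):
    """Assumed decision rule: flag a change when the distance exceeds eval_margin."""
    return int(distance > eval_margin)

d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(d, contrastive_loss(d, label=1), predict_change(d))
```

With this form, a changed pair that is already farther apart than the training margin contributes no loss, while an unchanged pair is always pulled toward zero distance.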
The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset is provided below.
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
    color_map:
      '0': [255, 255, 255]
      '1': [0, 0, 0]
* See the Dataset Annotation Format definition for more information about specifying lighting conditions.
| Parameter | Datatype | Default | Description | Supported Values |
| segment | Dict | – | The segment parameter contains the dataset config for the segmentation dataloader | |
| classify | Dict | – | The classify parameter contains the dataset config for the classification dataloader, detailed in the classify section | |
classify
| Parameter | Datatype | Default | Description | Supported Values |
| train_dataset | Dict | – | The paths to the image directory and CSV files for the training dataset | |
| validation_dataset | Dict | – | The paths to the image directory and CSV files for the validation dataset | |
| test_dataset | Dict | – | The paths to the image directory and CSV files for the test dataset | |
| infer_dataset | Dict | – | The paths to the image directory and CSV files for the inference dataset | |
| image_ext | str | .jpg | The file extension of the images in the dataset | string |
| batch_size | int | 32 | The number of samples per batch | >0 |
| workers | int | 8 | The number of worker processes for data loading | |
| fpratio_sampling | float | 0.1 | The ratio of false-positive examples to sample | >0 |
| num_input | int | 4 | The number of lighting conditions for each input image* | >0 |
| input_map | Dict | – | The mapping of lighting conditions to indices specifying concatenation ordering* | |
| concat_type | string | linear | The type of concatenation to use for different image lighting conditions | linear, grid |
| grid_map | Dict | None | The parameters to define the grid dimensions to concatenate images as a grid: x is the number of images along the x-axis; y is the number of images along the y-axis | Dict |
| output_shape | List[int] | [100, 100] | The image resolution that each lighting condition is reshaped to before concatenation | >=0 |
| augmentation_config | Dict | None | The image normalization config, which contains the following parameters: rgb_input_mean (List[float], default [0.485, 0.456, 0.406]) and rgb_input_std (List[float], default [0.229, 0.224, 0.225]) | >=0.0 |
| num_classes | int | 2 | The number of classes in the dataset | >1 |
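The interaction between num_input, input_map, concat_type, grid_map, and output_shape can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes that output_shape is (height, width) and that linear concatenation stacks the lighting conditions along the height axis. Those assumptions are inferred from the example specs, where four 128 x 128 images with concat_type: linear pair with an export input_height of 512 and input_width of 128, and four 112 x 112 images in a 2 x 2 grid pair with a 224 x 224 export size.

```python
def ordered_lights(input_map):
    """Return lighting-condition names sorted by their concatenation index."""
    return [name for name, _ in sorted(input_map.items(), key=lambda item: item[1])]

def concatenated_shape(num_input, output_shape, concat_type, grid_map=None):
    """Return the assumed (height, width) of the combined input image."""
    height, width = output_shape  # assumption: (height, width) ordering
    if concat_type == "linear":
        # Assumption: images are stacked one after another along the height.
        return (height * num_input, width)
    if concat_type == "grid":
        # grid_map["x"] images per row, grid_map["y"] images per column.
        return (height * grid_map["y"], width * grid_map["x"])
    raise ValueError(f"unsupported concat_type: {concat_type}")

input_map = {"LowAngleLight": 0, "SolderLight": 1, "UniformLight": 2, "WhiteLight": 3}
print(ordered_lights(input_map))
print(concatenated_shape(4, [128, 128], "linear"))
print(concatenated_shape(4, [112, 112], "grid", {"x": 2, "y": 2}))
```

A check like this can help keep the export input_width and input_height consistent with the dataset settings before running a long training job.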
segment
| Parameter | Datatype | Default | Description | Supported Values |
| root_dir | str | – | The root directory containing the dataset. | |
| label_transform | str | ‘norm’ | The label transformation applied to normalize the label images to represent class IDs. | ‘norm’, None |
| data_name | str | ‘LEVIR’ | The name of the dataset. | |
| dataset | str | ‘CNDataset’ | The name of the dataset class. | ‘CNDataset’ |
| multi_scale_train | bool | True | Whether to use multi-scale training. | True, False |
| multi_scale_infer | bool | False | Whether to use multi-scale inference. | True, False |
| num_classes | int | 2 | The number of classes in the dataset. | >0 |
| img_size | int | 256 | The size of input images. | >0 |
| batch_size | int | 8 | The number of samples per batch. | |
| workers | int | 2 | The number of worker processes for data loading. | |
| shuffle | bool | True | Whether to shuffle the dataset during loading. | |
| image_folder_name | str | ‘A’ | The name of the folder containing images. | |
| change_image_folder_name | str | ‘B’ | The name of the folder containing changed images. | |
| list_folder_name | str | ‘list’ | The name of the folder containing dataset split lists. | |
| annotation_folder_name | str | ‘label’ | The name of the folder containing annotations. | |
| | Dict | None | The image augmentation config, which contains a List[float] parameter with default [0.5, 0.5, 0.5] | >=0.0 |
| train_split | str | ‘train’ | The name of the training split. | |
| validation_split | str | ‘val’ | The name of the validation split. | |
| test_split | str | ‘test’ | The name of the test split. | |
| predict_split | str | ‘test’ | The name of the prediction split. | |
| label_suffix | str | ‘.png’ | The suffix for label files. | |
| color_map | Optional[Dict[str, List[int]]] | None | Mapping of string class labels (‘0’ to ‘n’) to RGB color codes. | |
Example spec file for ViT backbones
The following spec file is only relevant for TAO Toolkit versions 5.3 and later.
encryption_key: tlt_encode
task: classify
train:
  pretrained_model_path: /path/to/pretrained/model.pth
  resume_training_checkpoint_path: null
  classify:
    loss: "contrastive"
    cls_weight: [1.0, 10.0]
  num_epochs: 350
  num_nodes: 1
  val_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
  results_dir: "${results_dir}/train"
  tensorboard:
    enabled: True
results_dir: /path/to/experiment_results
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: /path/to/pretrained/backbone.pth
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
  classify:
    train_margin_euclid: 2.0
    eval_margin: 0.005
    embedding_vectors: 5
    embed_dec: 30
    difference_module: 'euclidean'
    learnable_difference_modules: 4
dataset:
  classify:
    train_dataset:
      csv_path: /path/to/train.csv
      images_dir: /path/to/img_dir
    validation_dataset:
      csv_path: /path/to/val.csv
      images_dir: /path/to/img_dir
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    fpratio_sampling: 0.2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: grid
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 112
      - 112
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: "???"
inference:
  checkpoint: "???"
export:
  gpu_id: 0
  checkpoint: "???"
  onnx_file: "???"
  input_width: 224
  input_height: 224
Use the following command to run VisualChangeNet-Classification training:
tao model visual_changenet train -e <experiment_spec_file> \
    -r <results_dir> \
    --gpus <num_gpus> \
    task=classify
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
-r, --results_dir: The path to a folder where the experiment outputs should be written.
task: The task (‘segment’ or ‘classify’) for the visual_changenet training. Default: segment.
Optional Arguments
--gpus: The number of GPUs to use for training. The default value is 1.
Here’s an example of using the VisualChangeNet training command:
tao model visual_changenet train -e $DEFAULT_SPEC -r $RESULTS_DIR --gpus $NUM_GPUs
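When these commands are launched from a script, it can help to assemble the argument list programmatically. The sketch below is only an illustration: build_tao_command is a hypothetical helper, the paths are placeholders, and it uses only the flags and key=value overrides that appear on this page (-e, -r, --gpus, task=classify). It builds the command without running it.

```python
import shlex

def build_tao_command(sub_task, spec_file, results_dir, gpus=None, overrides=None):
    """Assemble a `tao model visual_changenet` command as an argument list."""
    cmd = ["tao", "model", "visual_changenet", sub_task,
           "-e", spec_file, "-r", results_dir]
    if gpus is not None:
        cmd += ["--gpus", str(gpus)]
    # Trailing key=value pairs are passed through as spec overrides.
    for key, value in (overrides or {}).items():
        cmd.append(f"{key}={value}")
    return cmd

train_cmd = build_tao_command(
    "train",
    "/path/to/spec.yaml",   # placeholder spec path
    "/path/to/results",     # placeholder results directory
    gpus=1,
    overrides={"task": "classify"},
)
print(shlex.join(train_cmd))
```

The resulting list can be handed to subprocess.run(train_cmd, check=True) on a machine where the TAO Toolkit Launcher is installed.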
Here is an example spec file for testing evaluation and inference of a trained VisualChangeNet-Classification model.
results_dir: /path/to/experiment_results
task: classify
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
  classify:
    eval_margin: 0.005
dataset:
  classify:
    test_dataset:
      csv_path: /path/to/test.csv
      images_dir: /path/to/img_dir
    infer_dataset:
      csv_path: /path/to/infer.csv
      images_dir: /path/to/img_dir
    image_ext: .jpg
    batch_size: 16
    workers: 2
    num_input: 4
    input_map:
      LowAngleLight: 0
      SolderLight: 1
      UniformLight: 2
      WhiteLight: 3
    concat_type: linear
    grid_map:
      x: 2
      y: 2
    output_shape:
      - 128
      - 128
    augmentation_config:
      rgb_input_mean: [0.485, 0.456, 0.406]
      rgb_input_std: [0.229, 0.224, 0.225]
    num_classes: 2
evaluate:
  checkpoint: /path/to/checkpoint
inference:
  checkpoint: /path/to/checkpoint
Inference/Evaluate
| Parameter | Datatype | Default | Description | Supported Values |
| checkpoint | string | | The path to the PyTorch model to evaluate or run inference on | |
| trt_engine | string | | The path to the TensorRT model to evaluate or run inference on. Should only be used with TAO Deploy | |
| num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
| results_dir | string | | The path to a folder where the experiment outputs should be written | |
| vis_after_n_batches | unsigned int | 1 | The number of batches after which to save inference/evaluate visualization results | >0 |
| batch_size | unsigned int | | The batch size for inference/evaluation | |
| gpu_id | unsigned int | 0 | The GPU id to use | |
Use the following command to run VisualChangeNet-Classification evaluation:
tao model visual_changenet evaluate -e <experiment_spec> \
    -r <results_dir> \
    task=classify \
    model.classify.eval_margin=0.5
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
-r, --results_dir: The path to a folder where the experiment outputs should be written.
Here’s an example of using the VisualChangeNet evaluation command:
tao model visual_changenet evaluate -e $DEFAULT_SPEC -r $RESULTS_DIR
Use the following command to run inference on VisualChangeNet-Classification with the .tlt model:
tao model visual_changenet inference -e <experiment_spec> \
    -r <results_dir> \
    task=classify \
    model.classify.eval_margin=0.5
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the inference experiment.
-r, --results_dir: The path to a folder where the experiment outputs should be written.
Here’s an example of using the VisualChangeNet inference command:
tao model visual_changenet inference -e $DEFAULT_SPEC -r $RESULTS_DIR
Here is an example spec file for exporting the trained VisualChangeNet model:
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  input_channel: 3
  input_width: 128
  input_height: 512
  batch_size: -1
| Parameter | Datatype | Default | Description | Supported Values |
| checkpoint | string | | The path to the PyTorch model to export | |
| onnx_file | string | | The path to the .onnx file | |
| opset_version | unsigned int | 12 | The opset version of the exported ONNX | >0 |
| input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
| input_width | unsigned int | 128 | The input width | >0 |
| input_height | unsigned int | 512 | The input height | >0 |
| batch_size | int | -1 | The batch size of the ONNX model. If this value is set to -1, the export uses dynamic batch size. | >=-1 |
| gpu_id | unsigned int | 0 | The GPU id to use | |
| on_cpu | bool | False | Whether to export on CPU | |
| verbose | bool | False | Print out a human-readable representation of the network | |
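The supported values in the export table can be checked before launching an export. The following is a minimal sketch under stated assumptions: validate_export_config, uses_dynamic_batch, and EXPORT_DEFAULTS are hypothetical names, the config is assumed to be available as a plain Python dict, and the defaults and ranges are copied from the table above.

```python
# Defaults copied from the export parameter table.
EXPORT_DEFAULTS = {
    "opset_version": 12,
    "input_channel": 3,
    "input_width": 128,
    "input_height": 512,
    "batch_size": -1,
}

def validate_export_config(export):
    """Return a list of messages for values outside the documented ranges."""
    cfg = {**EXPORT_DEFAULTS, **export}
    errors = []
    for key in ("opset_version", "input_width", "input_height"):
        if cfg[key] <= 0:
            errors.append(f"{key} must be > 0")
    if cfg["input_channel"] != 3:
        errors.append("input_channel must be 3")
    if cfg["batch_size"] < -1:
        errors.append("batch_size must be >= -1")
    return errors

def uses_dynamic_batch(export):
    """A batch_size of -1 requests a dynamic batch dimension in the ONNX model."""
    return {**EXPORT_DEFAULTS, **export}["batch_size"] == -1

example = {"input_width": 128, "input_height": 512, "batch_size": -1}
print(validate_export_config(example), uses_dynamic_batch(example))
```

An empty list means every checked field falls within the documented range; the check does not verify that the checkpoint or ONNX paths exist.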
Use the following command to export the model:
tao model visual_changenet export [-h] -e <experiment spec file> \
    -r <results_dir> \
    task=classify
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the export experiment.
-r, --results_dir: The path to a folder where the experiment outputs should be written.
Sample Usage
The following is an example export command:
tao model visual_changenet export -e /path/to/spec.yaml -r $RESULTS_DIR
For deployment, refer to the TAO Deploy Documentation for VisualChangeNet-Classification.