SiameseOI
SiameseOI is an NVIDIA-developed optical inspection model for PCB data and is included in TAO. SiameseOI supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked from the TAO Launcher using the following convention on the command-line:
tao model optical_inspection <sub_task> <args_per_subtask>
Here, args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
SiameseOI requires the data to be provided as image folders and CSV files. See the Data Annotation Format page for more information about the input data format for SiameseOI.
Configuring a Custom Dataset
This section provides an example configuration and commands for training SiameseOI using the dataset format described above.
You will need to configure the augmentation_config mean and standard deviation based on your input dataset.
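As a rough illustration of how these statistics can be derived, the sketch below computes a per-channel mean and standard deviation from 8-bit RGB pixels. It assumes the values are expected on a 0 to 1 scale, as the example values in the spec suggest; the function name channel_stats and the two-pixel sample are hypothetical and not part of TAO.

```python
import math


def channel_stats(pixels):
    """Return per-channel (means, stds) on a 0-1 scale for 8-bit RGB pixels.

    `pixels` is any iterable of (r, g, b) tuples with values in 0..255.
    Assumption: the spec expects statistics on a 0-1 scale.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("need at least one pixel")
    n = len(pixels)
    means, stds = [], []
    for c in range(3):
        values = [p[c] / 255.0 for p in pixels]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n  # population variance
        means.append(mean)
        stds.append(math.sqrt(var))
    return means, stds


# Hypothetical sample: one black pixel and one white pixel.
mean, std = channel_stats([(0, 0, 0), (255, 255, 255)])
print("rgb_input_mean:", mean)
print("rgb_input_std:", std)
```

In practice the pixels would come from all training images; the resulting lists can then be copied into rgb_input_mean and rgb_input_std.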
Here is an example spec file for training a SiameseOI model with a custom backbone on a custom dataset using the Data Annotation Format.
results_dir: /path/to/experiment_results
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
dataset:
  train_dataset:
    csv_path: /path/to/split/train.csv
    images_dir: /path/to/images_dir/
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  fpratio_sampling: 0.1
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
train:
  optim:
    type: Adam
    lr: 0.0005
  loss: contrastive
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  results_dir: "${results_dir}/train"
  seed: 1234
Parameter | Data Type | Default | Description | Supported Values |
model | dict config | – | The configuration of the model architecture | |
dataset | dict config | – | The configuration of the dataset | |
train | dict config | – | The configuration of the training task | |
evaluate | dict config | – | The configuration of the evaluation task | |
inference | dict config | – | The configuration of the inference task | |
encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
results_dir | string | /results | The directory where experiment results are saved | |
export | dict config | – | The configuration of the ONNX export task | |
gen_trt_engine | dict config | – | The configuration of the TensorRT generation task. Only used in TAO Deploy | |
Parameter | Data Type | Default | Description | Supported Values |
num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
seed | unsigned int | 1234 | The random seed for random, NumPy, and torch | >0 |
num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
results_dir | string | /results/train | The directory to save training results | |
optim | dict config | None | Contains the configurable parameters for the SiameseOI optimizer, detailed in the optim section | |
loss | str | contrastive | The loss function used during training | |
optim
optim:
  lr: 0.0005
Parameter | Data Type | Default | Description | Supported Values |
lr | float | 0.0005 | The learning rate | >=0.0 |
The following example model config provides options to change the SiameseOI architecture for training.
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
The following model parameters are also used during SiameseOI evaluation and inference.
Parameter | Data Type | Default | Description | Supported Values |
model_type | string | Siamese_3 | The default model architecture from the supported custom model architectures | Siamese_3, Siamese_1 |
model_backbone | string | custom | The name of the backbone to use | custom |
embedding_vectors | int | 5 | The embedding dimensions of the final output from the model before computing Euclidean distance | |
margin | float | 2.0 | The threshold parameter that determines the minimum distance between embeddings of positive and negative pairs | |
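To make the roles of margin and the Euclidean distance concrete, here is a minimal sketch of the commonly used contrastive loss formulation. It is not taken from the TAO source: the label convention (0 for a similar pair, 1 for a dissimilar pair) and the function names are assumptions made for illustration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, label, margin=2.0):
    """Standard contrastive loss for one pair of embeddings.

    Assumed convention: label 0 = similar pair, label 1 = dissimilar pair.
    Similar pairs are penalized by their squared distance; dissimilar pairs
    are penalized only while they are closer than `margin`.
    """
    d = euclidean_distance(emb_a, emb_b)
    similar_term = (1 - label) * d ** 2
    dissimilar_term = label * max(0.0, margin - d) ** 2
    return similar_term + dissimilar_term


# Two 5-dimensional embeddings (matching embedding_vectors: 5) at distance 1.
golden = [0.0, 0.0, 0.0, 0.0, 0.0]
sample = [1.0, 0.0, 0.0, 0.0, 0.0]
print(contrastive_loss(golden, sample, label=0))
print(contrastive_loss(golden, sample, label=1))
```

Under this formulation, a dissimilar pair whose distance already exceeds the margin contributes no loss, which is why margin acts as a separation threshold.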
The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is provided below.
dataset:
  train_dataset:
    csv_path: /path/to/split/train.csv
    images_dir: /path/to/images_dir/
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  fpratio_sampling: 0.1
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
Parameter | Data Type | Default | Description | Supported Values |
train_dataset | Dict | – | The paths to the image directory and CSV file for the training dataset | |
validation_dataset | Dict | – | The paths to the image directory and CSV file for the validation dataset | |
image_ext | str | .jpg | The file extension of the images in the dataset | string |
batch_size | int | 32 | The number of samples per batch | >0 |
workers | int | 8 | The number of worker processes for data loading | |
fpratio_sampling | float | 0.1 | The ratio of false-positive examples to sample | >0 |
num_input | int | 4 | The number of lighting conditions for each input image* | >0 |
input_map | Dict | – | The mapping of lighting conditions to indices specifying concatenation ordering* | |
concat_type | string | linear | The type of concatenation to use for different image lighting conditions | linear, grid |
grid_map | Dict | None | The parameters that define the grid dimensions used to concatenate images as a grid: x is the number of images along the x-axis, and y is the number of images along the y-axis | |
image_width | int | 100 | The width of the input image | >0 |
image_height | int | 100 | The height of the input image | >0 |
augmentation_config | Dict | None | The image normalization config, which contains the rgb_input_mean and rgb_input_std parameters | |
rgb_input_mean | List[float] | [0.485, 0.456, 0.406] | The per-channel mean used for image normalization | >=0.0 |
rgb_input_std | List[float] | [0.229, 0.224, 0.225] | The per-channel standard deviation used for image normalization | >=0.0 |
* See the Dataset Annotation Format definition for more information about specifying lighting conditions.
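The relationship between num_input, input_map, concat_type, and grid_map can be sanity-checked before launching a run. The sketch below is a hypothetical helper, not TAO's own validation: it assumes that input_map should have one entry per lighting condition with indices 0 to N-1, and that a grid needs at least as many cells as there are inputs.

```python
def ordered_lightings(input_map):
    """Return lighting-condition names sorted by their concatenation index."""
    return [name for name, _ in sorted(input_map.items(), key=lambda kv: kv[1])]


def check_dataset_config(num_input, input_map, concat_type, grid_map=None):
    """Return a list of problems found; an empty list means all checks passed.

    These checks are illustrative assumptions, not rules enforced by TAO.
    """
    problems = []
    if len(input_map) != num_input:
        problems.append("num_input does not match the number of input_map entries")
    if sorted(input_map.values()) != list(range(len(input_map))):
        problems.append("input_map indices should be 0..N-1 without gaps")
    if concat_type not in ("linear", "grid"):
        problems.append("concat_type must be 'linear' or 'grid'")
    if concat_type == "grid":
        grid = grid_map or {}
        cells = grid.get("x", 0) * grid.get("y", 0)
        if cells < num_input:
            problems.append("grid_map has fewer cells than num_input")
    return problems


input_map = {"LowAngleLight": 0, "SolderLight": 1, "UniformLight": 2, "WhiteLight": 3}
print(ordered_lightings(input_map))
print(check_dataset_config(4, input_map, "grid", {"x": 2, "y": 2}))
```

With the example values from the spec, a 2 x 2 grid holds exactly the four lighting conditions, so no problems are reported.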
Use the following command to run SiameseOI training:
tao model optical_inspection train [-h] -e <experiment_spec>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.gpu_ids=<gpu indices>]
[train.num_gpus=<number of gpus>]
Required Arguments
The only required argument is the path to the experiment spec:
-e, --experiment_spec: The experiment specification file to set up the training experiment.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
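The command convention and the key=value overrides described above can be assembled programmatically. The following is a minimal sketch; the helper name build_tao_command and the spec path are hypothetical, and the helper only builds the argument list without running anything.

```python
def build_tao_command(subtask, spec_path, overrides=None):
    """Build the argument list for `tao model optical_inspection <subtask>`.

    `overrides` maps dotted option names (for example "train.num_epochs")
    to values; each pair is appended as a `key=value` argument.
    """
    args = ["tao", "model", "optical_inspection", subtask, "-e", spec_path]
    for key, value in (overrides or {}).items():
        args.append(f"{key}={value}")
    return args


cmd = build_tao_command(
    "train",
    "/path/to/spec.yaml",  # hypothetical spec location
    {"train.num_epochs": 20, "train.optim.lr": 0.001},
)
print(" ".join(cmd))
# prints: tao model optical_inspection train -e /path/to/spec.yaml train.num_epochs=20 train.optim.lr=0.001
```

The resulting list can be passed to subprocess.run if you want to launch the task from a script.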
For training, evaluation, and inference, two variables are exposed for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 and gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (in this example, num_gpus = 1 becomes num_gpus = 2).
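The reconciliation rule can be sketched as follows. Only the case where gpu_ids lists more devices than num_gpus is described in the text; the opposite case, where the sketch fills gpu_ids with the first num_gpus indices, is an assumption, as is the function name.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Return a consistent (num_gpus, gpu_ids) pair, following the larger setting."""
    gpu_ids = list(gpu_ids) if gpu_ids is not None else [0]
    if num_gpus == len(gpu_ids):
        return num_gpus, gpu_ids
    if len(gpu_ids) > num_gpus:
        # Documented case: num_gpus is raised to match gpu_ids.
        return len(gpu_ids), gpu_ids
    # Assumed case: num_gpus is larger, so use the first num_gpus device indices.
    return num_gpus, list(range(num_gpus))


print(reconcile_gpus(1, [0, 1]))
print(reconcile_gpus(4, [0]))
```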
Checkpointing and Resuming Training
At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. These checkpoints are saved in train.results_dir, like so:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint is also saved as oi_model_latest.pth. Training automatically resumes from oi_model_latest.pth if it exists in train.results_dir. This is superseded by train.resume_training_checkpoint_path, if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, either:
Specify a new, empty results directory (Recommended)
Remove the latest checkpoint from the results directory
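The resume precedence described in this section can be expressed as a small helper. This is a sketch of the described behavior under stated assumptions, not TAO's implementation; the function name is hypothetical.

```python
from pathlib import Path


def pick_resume_checkpoint(results_dir, resume_training_checkpoint_path=None,
                           latest_name="oi_model_latest.pth"):
    """Return the checkpoint to resume from, or None for a fresh run.

    An explicit resume path takes precedence; otherwise the latest
    checkpoint in `results_dir` is used if it exists.
    """
    if resume_training_checkpoint_path:
        return Path(resume_training_checkpoint_path)
    latest = Path(results_dir) / latest_name
    return latest if latest.exists() else None


# A new, empty results directory yields None, which means training from scratch.
print(pick_resume_checkpoint("/path/to/new_empty_results_dir"))
```

This makes the two options for a fresh run visible: either point to a directory without oi_model_latest.pth, or delete that file.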
Here is an example spec file for testing evaluation and inference of a trained SiameseOI model.
results_dir: /path/to/experiment_results
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
dataset:
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
evaluate:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint: "${results_dir}/train/oi_model_latest.pth"
  results_dir: "${results_dir}/evaluate"
inference:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint: "${results_dir}/train/oi_model_latest.pth"
  results_dir: "${results_dir}/inference"
Use the following command to run SiameseOI evaluation:
tao model optical_inspection evaluate [-h] -e <experiment_spec>
evaluate.checkpoint=<model to be evaluated>
[evaluate.<evaluate_option>=<evaluate_option_value>]
[evaluate.gpu_ids=<gpu indices>]
[evaluate.num_gpus=<number of gpus>]
Multi-GPU evaluation is currently not supported for Optical Inspection.
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments
evaluate.<evaluate_option>: The evaluate options.
Use the following command to run inference on SiameseOI with the .pth model:
tao model optical_inspection inference [-h] -e <experiment spec file>
inference.checkpoint=<model to be inferenced>
[inference.<inference_option>=<inference_option_value>]
[inference.gpu_ids=<gpu indices>]
[inference.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the inference experiment.
inference.checkpoint: The .pth model to run inference on.
Optional Arguments
inference.<inference_option>: The inference options.
Here is an example spec file for exporting the trained SiameseOI model:
export:
  checkpoint: "${results_dir}/train/oi_model_epoch=004.pth"
  results_dir: "${results_dir}/export"
  onnx_file: "${export.results_dir}/oi_model.onnx"
  batch_size: 32
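The ${...} references in this spec point at other keys in the same file, including nested ones such as export.results_dir. The sketch below is a hand-written approximation of how such references resolve, assuming OmegaConf-style interpolation with dotted keys looked up from the root; it is not TAO's resolver, and the helper names are hypothetical.

```python
import re

_PATTERN = re.compile(r"\$\{([^}]+)\}")


def lookup(config, dotted_key):
    """Fetch a value from nested dicts using a dotted key such as 'export.results_dir'."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(config, value, depth=0):
    """Expand ${key} references in a string, following nested references."""
    if depth > 10:
        raise ValueError("interpolation nested too deeply (possible cycle)")
    if not isinstance(value, str):
        return value

    def _sub(match):
        return str(resolve(config, lookup(config, match.group(1)), depth + 1))

    return _PATTERN.sub(_sub, value)


spec = {
    "results_dir": "/path/to/experiment_results",
    "export": {
        "checkpoint": "${results_dir}/train/oi_model_epoch=004.pth",
        "results_dir": "${results_dir}/export",
        "onnx_file": "${export.results_dir}/oi_model.onnx",
    },
}
print(resolve(spec, spec["export"]["onnx_file"]))
# prints: /path/to/experiment_results/export/oi_model.onnx
```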
Use the following command to export the model:
tao model optical_inspection export [-h] -e <experiment spec file>
export.checkpoint=<model to export>
export.onnx_file=<onnx path>
[export.<export_option>=<export_option_value>]
Required Arguments
-e, --experiment_spec: The path to an experiment spec file.
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.
Optional Arguments
export.<export_option>: The export options.
For deployment, refer to the TAO Deploy Documentation for SiameseOI.