SiameseOI#
SiameseOI is an NVIDIA-developed optical inspection model for PCB data and is included in TAO. SiameseOI supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked from the TAO Launcher using the following convention on the command-line:
tao model optical_inspection <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
Data Input for SiameseOI#
SiameseOI requires the data to be provided as image folders and CSV files. See the Data Annotation Format page for more information about the input data format for SiameseOI.
Creating a Training Experiment Spec File#
Configuring a Custom Dataset#
This section provides an example configuration and commands for training SiameseOI using the dataset format described above.
You will need to configure the augmentation_config mean and standard deviation based on your input dataset.
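As a rough illustration of how these statistics could be derived, the sketch below computes per-channel mean and standard deviation from RGB pixel values and applies them. It assumes pixel values are scaled to [0, 1] before the statistics are taken (suggested by the example defaults such as 0.485 and 0.229, but not stated on this page). The helper names channel_stats and normalize are hypothetical and not part of TAO.

```python
import math


def channel_stats(pixels):
    """Return per-channel (means, stds) for RGB pixels given as 0-255 tuples.

    Assumption: values are scaled to [0, 1] before computing statistics.
    """
    n = 0
    sums = [0.0, 0.0, 0.0]
    sq_sums = [0.0, 0.0, 0.0]
    for pixel in pixels:
        n += 1
        for c in range(3):
            v = pixel[c] / 255.0
            sums[c] += v
            sq_sums[c] += v * v
    if n == 0:
        raise ValueError("no pixels given")
    means = [s / n for s in sums]
    # Population variance E[v^2] - E[v]^2, clamped at 0 against rounding error.
    stds = [math.sqrt(max(sq / n - m * m, 0.0)) for sq, m in zip(sq_sums, means)]
    return means, stds


def normalize(pixel, means, stds):
    """Subtract the mean and divide by the std, channel by channel.

    Raises ZeroDivisionError if a channel's std is 0.
    """
    return [(pixel[c] / 255.0 - means[c]) / stds[c] for c in range(3)]


# Toy data standing in for every pixel of a training set.
pixels = [(0, 0, 0), (255, 255, 255), (0, 255, 0), (255, 0, 255)]
means, stds = channel_stats(pixels)
print("rgb_input_mean:", [round(m, 3) for m in means])
print("rgb_input_std:", [round(s, 3) for s in stds])
```

For a real dataset you would stream the pixels of every training image through the same accumulation instead of using a toy list.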
Here is an example spec file for training a SiameseOI model with a custom backbone on a custom dataset using the Data Annotation Format.
results_dir: /path/to/experiment_results
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
dataset:
  train_dataset:
    csv_path: /path/to/split/train.csv
    images_dir: /path/to/images_dir/
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  fpratio_sampling: 0.1
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
train:
  optim:
    type: Adam
    lr: 0.0005
  loss: contrastive
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  results_dir: "${results_dir}/train"
  seed: 1234
| Parameter | Data Type | Default | Description | Supported Values |
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| export | dict config | – | The configuration of the ONNX export task | |
| gen_trt_engine | dict config | – | The configuration of the TensorRT generation task. Only used in TAO deploy | |
train#
| Parameter | Datatype | Default | Description | Supported Values |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, NumPy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| optim | dict config | None | Contains the configurable parameters for the SiameseOI optimizer detailed in the optim section. | |
| loss | str | contrastive | The loss function used during training | |
optim#
optim:
  lr: 0.0005
| Parameter | Datatype | Default | Description | Supported Values |
| lr | float | 0.0005 | The learning rate | >=0.0 |
Model#
The following example model config provides options to change the SiameseOI architecture for training.
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
The same model config is used during SiameseOI evaluation and inference. Its parameters are described in the following table.
| Parameter | Datatype | Default | Description | Supported Values |
| model_type | string | Siamese_3 | The default model architecture from the supported custom model architectures | Siamese_3, Siamese_1 |
| model_backbone | string | custom | The name of the backbone to use | custom |
| embedding_vectors | int | 5 | The embedding dimensions of the final output from the model before computing Euclidean distance | |
| margin | float | 2.0 | The threshold parameter that determines the minimum distance between embeddings of positive and negative pairs | |
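To make the role of margin and the Euclidean distance more concrete, here is a minimal sketch of the textbook contrastive loss for a single pair of embeddings. It is an assumption that the contrastive loss used by SiameseOI follows this exact form. The function names and the is_dissimilar flag are hypothetical and chosen only for illustration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, is_dissimilar, margin=2.0):
    """Textbook contrastive loss for one pair (assumed form, not TAO's code).

    Similar pairs are penalized by their squared distance; dissimilar pairs
    are penalized only while they are closer than the margin.
    """
    d = euclidean_distance(emb_a, emb_b)
    if is_dissimilar:
        return max(margin - d, 0.0) ** 2
    return d ** 2


# Embeddings of length 5, matching embedding_vectors: 5 in the example spec.
golden = [0.0, 0.0, 0.0, 0.0, 0.0]
sample = [1.0, 0.0, 0.0, 0.0, 0.0]
print(contrastive_loss(golden, sample, is_dissimilar=False))
print(contrastive_loss(golden, sample, is_dissimilar=True))
```

Under this formulation, a larger margin demands that dissimilar pairs be pushed further apart before they stop contributing to the loss.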
Dataset#
The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is provided below.
dataset:
  train_dataset:
    csv_path: /path/to/split/train.csv
    images_dir: /path/to/images_dir/
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  fpratio_sampling: 0.1
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
| Parameter | Datatype | Default | Description | Supported Values |
| train_dataset | Dict | – | The paths to the image directory and CSV files for the training dataset | |
| validation_dataset | Dict | – | The paths to the image directory and CSV files for the validation dataset | |
| image_ext | str | .jpg | The file extension of the images in the dataset | |
| batch_size | int | 32 | The number of samples per batch | >0 |
| workers | int | 8 | The number of worker processes for data loading | |
| fpratio_sampling | float | 0.1 | The ratio of false-positive examples to sample | >0 |
| num_input | int | 4 | The number of lighting conditions for each input image* | >0 |
| input_map | Dict | – | The mapping of lighting conditions to indices specifying concatenation ordering* | |
| concat_type | string | linear | The type of concatenation to use for different image lighting conditions | linear, grid |
| grid_map | Dict | None | The grid dimensions used to concatenate images as a grid: x is the number of images along the x-axis, and y is the number of images along the y-axis | |
| image_width | int | 100 | The width of the input image | >0 |
| image_height | int | 100 | The height of the input image | >0 |
| augmentation_config | Dict | None | The image normalization config, which contains the rgb_input_mean and rgb_input_std parameters | |
| rgb_input_mean | List[float] | [0.485, 0.456, 0.406] | The mean to be subtracted for pre-processing | >=0.0 |
| rgb_input_std | List[float] | [0.229, 0.224, 0.225] | The standard deviation to divide the image by | >=0.0 |
* See the Dataset Annotation Format definition for more information about specifying lighting conditions.
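The relationship between num_input, input_map, concat_type, and grid_map can be sketched as a small sanity check run before training. The checks below are assumptions about what a consistent configuration looks like (for instance, that a grid needs at least one cell per lighting condition). They are not TAO's own validation logic, and both function names are hypothetical.

```python
def ordered_lights(input_map):
    """Return lighting-condition names sorted by their concatenation index."""
    return [name for name, _ in sorted(input_map.items(), key=lambda kv: kv[1])]


def check_dataset_inputs(num_input, input_map, concat_type, grid_map=None):
    """Raise ValueError if the settings look inconsistent (assumed rules)."""
    if len(input_map) != num_input:
        raise ValueError("num_input should match the number of input_map entries")
    if sorted(input_map.values()) != list(range(num_input)):
        raise ValueError("input_map indices should be 0..num_input-1, each used once")
    if concat_type not in ("linear", "grid"):
        raise ValueError("concat_type must be 'linear' or 'grid'")
    if concat_type == "grid":
        # Assumption: the x-by-y grid needs a cell for every lighting condition.
        if not grid_map or grid_map["x"] * grid_map["y"] < num_input:
            raise ValueError("grid_map x*y should provide a cell for every input")


# Same lighting conditions as the example spec, listed out of order on purpose.
input_map = {"WhiteLight": 3, "LowAngleLight": 0, "UniformLight": 2, "SolderLight": 1}
check_dataset_inputs(4, input_map, "grid", {"x": 2, "y": 2})
print(ordered_lights(input_map))
```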
Training the Model#
Use the following command to run SiameseOI training:
tao model optical_inspection train [-h] -e <experiment_spec>
    [results_dir=<global_results_dir>]
    [model.<model_option>=<model_option_value>]
    [dataset.<dataset_option>=<dataset_option_value>]
    [train.<train_option>=<train_option_value>]
    [train.gpu_ids=<gpu indices>]
    [train.num_gpus=<number of gpus>]
Required Arguments#
The only required argument is the path to the experiment spec:
-e, --experiment_spec: The experiment specification file to set up the training experiment.
Optional Arguments#
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
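The override convention above can be sketched as a small helper that assembles the command line from a spec path and a dictionary of key=value overrides. The helper name build_command and the spec path /path/to/spec.yaml are hypothetical. The override keys shown are ones documented on this page.

```python
import shlex


def build_command(subtask, spec_path, overrides=None):
    """Assemble a `tao model optical_inspection` command as an argument list."""
    cmd = ["tao", "model", "optical_inspection", subtask, "-e", spec_path]
    for key, value in (overrides or {}).items():
        # Each override is passed as a single key=value token.
        cmd.append(f"{key}={value}")
    return cmd


cmd = build_command(
    "train",
    "/path/to/spec.yaml",
    {"train.num_epochs": 20, "train.optim.lr": 0.001},
)
print(shlex.join(cmd))
```

Building the command as a list, rather than a single string, lets you pass it directly to subprocess.run without shell quoting concerns.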
Note
For training, evaluation, and inference, each task exposes two variables, num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 and gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (in this example, num_gpus = 1 -> num_gpus = 2).
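The reconciliation rule in this note could look roughly like the sketch below. Only the case where gpu_ids lists more devices than num_gpus is documented here. The opposite case (extending gpu_ids to the first num_gpus indices) is an assumption, and reconcile_gpus is a hypothetical name.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Make num_gpus and gpu_ids agree by following the setting with more GPUs."""
    gpu_ids = list(gpu_ids) if gpu_ids is not None else [0]
    if len(gpu_ids) > num_gpus:
        # Documented case: gpu_ids names more devices, so num_gpus grows.
        num_gpus = len(gpu_ids)
    elif num_gpus > len(gpu_ids):
        # Assumption: fall back to the first num_gpus device indices.
        gpu_ids = list(range(num_gpus))
    return num_gpus, gpu_ids


print(reconcile_gpus(num_gpus=1, gpu_ids=[0, 1]))
```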
Checkpointing and Resuming Training#
Every train.checkpoint_interval epochs, a PyTorch Lightning checkpoint is saved as model_epoch_<epoch_num>.pth. Checkpoints are saved in train.results_dir, like so:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint is also saved as oi_model_latest.pth. Training automatically resumes from oi_model_latest.pth if it exists in train.results_dir. This is superseded by train.resume_training_checkpoint_path, if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, either:
Specify a new, empty results directory (Recommended)
Remove the latest checkpoint from the results directory
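The resume precedence described in this section can be summarized in a few lines. This is a sketch of the stated logic, not the actual TAO implementation, and pick_resume_checkpoint is a hypothetical helper name.

```python
import os


def pick_resume_checkpoint(results_dir, resume_training_checkpoint_path=None):
    """Return the checkpoint training would resume from, or None for a fresh run."""
    # An explicitly provided checkpoint path takes precedence.
    if resume_training_checkpoint_path:
        return resume_training_checkpoint_path
    # Otherwise, resume from the latest checkpoint if one exists.
    latest = os.path.join(results_dir, "oi_model_latest.pth")
    if os.path.isfile(latest):
        return latest
    # Empty or new results directory: train from scratch.
    return None


print(pick_resume_checkpoint("/results/train"))
```

This is why pointing train.results_dir at a new, empty directory is the simplest way to force a fresh run.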
Creating Testing Experiment Spec File#
Here is an example spec file for evaluation and inference of a trained SiameseOI model.
results_dir: /path/to/experiment_results
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
dataset:
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
evaluate:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint: "${results_dir}/train/oi_model_latest.pth"
  results_dir: "${results_dir}/evaluate"
inference:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint: "${results_dir}/train/oi_model_latest.pth"
  results_dir: "${results_dir}/inference"
Evaluating the Model#
Use the following command to run SiameseOI evaluation:
tao model optical_inspection evaluate [-h] -e <experiment_spec>
    evaluate.checkpoint=<model to be evaluated>
    [evaluate.<evaluate_option>=<evaluate_option_value>]
    [evaluate.gpu_ids=<gpu indices>]
    [evaluate.num_gpus=<number of gpus>]
Multi-GPU evaluation is currently not supported for Optical Inspection.
Required Arguments#
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments#
evaluate.<evaluate_option>: The evaluate options.
Running Inference on the Model#
Use the following command to run inference on SiameseOI with the .pth model:
tao model optical_inspection inference [-h] -e <experiment spec file>
    inference.checkpoint=<model to be inferenced>
    [inference.<inference_option>=<inference_option_value>]
    [inference.gpu_ids=<gpu indices>]
    [inference.num_gpus=<number of gpus>]
Required Arguments#
-e, --experiment_spec: The experiment spec file to set up the inference experiment.
inference.checkpoint: The .pth model to run inference on.
Optional Arguments#
inference.<inference_option>: The inference options.
Exporting the Model#
Here is an example spec file for exporting the trained SiameseOI model:
export:
  checkpoint: "${results_dir}/train/oi_model_epoch=004.pth"
  results_dir: "${results_dir}/export"
  onnx_file: "${export.results_dir}/oi_model.onnx"
  batch_size: 32
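The ${...} references in the spec above, such as ${results_dir} and ${export.results_dir}, point at other keys in the same configuration. The sketch below imitates that substitution with plain string handling to show how the final ONNX path would be derived. It assumes OmegaConf-style interpolation semantics and is a simplified stand-in, not the resolver TAO actually uses. The lookup and resolve helpers are hypothetical.

```python
import re

# Matches ${some.dotted.key} references.
_REF = re.compile(r"\$\{([^}]+)\}")


def lookup(config, dotted_key):
    """Walk a nested dict using a dotted key such as 'export.results_dir'."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(config, value):
    """Expand ${a.b} references in a string until none remain.

    Note: this simple loop does not guard against cyclic references.
    """
    while isinstance(value, str) and _REF.search(value):
        value = _REF.sub(lambda m: str(lookup(config, m.group(1))), value)
    return value


config = {
    "results_dir": "/path/to/experiment_results",
    "export": {
        "results_dir": "${results_dir}/export",
        "onnx_file": "${export.results_dir}/oi_model.onnx",
    },
}
print(resolve(config, config["export"]["onnx_file"]))
```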
Use the following command to export the model:
tao model optical_inspection export [-h] -e <experiment spec file>
    export.checkpoint=<model to export>
    export.onnx_file=<onnx path>
    [export.<export_option>=<export_option_value>]
Required Arguments#
-e, --experiment_spec: The path to an experiment spec file.
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.
Optional Arguments#
export.<export_option>: The export options.
TensorRT Engine Generation, Validation, and int8 Calibration#
For deployment, refer to the TAO Deploy Documentation for SiameseOI.