SiameseOI#
SiameseOI is an NVIDIA-developed optical inspection model for PCB data and is included in TAO. SiameseOI supports the following tasks:
train
evaluate
inference
export
Each task is explained in detail in the following sections.
Note
Throughout this documentation, you will see references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.
For instructions on creating a dataset using the remote client, see the Creating a dataset section in the Remote Client documentation.
For instructions on creating an experiment using the remote client, see the Creating an experiment section in the Remote Client documentation.
The spec format is YAML for TAO Launcher and JSON for FTMS Client.
File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher and not for FTMS Client.
Data Input for SiameseOI#
SiameseOI requires the data to be provided as image folders and CSV files. See the Data Annotation Format page for more information about the input data format for SiameseOI.
Creating a Training Experiment Spec File#
Configuring a Custom Dataset#
This section provides an example configuration and commands for training SiameseOI using the dataset format described above.
You will need to configure the augmentation_config mean (rgb_input_mean) and standard deviation (rgb_input_std) based on your input dataset.
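One way to obtain these values is to compute per-channel statistics over the training images. The sketch below is illustrative only: it assumes 8-bit RGB pixels that are scaled to [0, 1] before normalization (consistent with the magnitude of the example values in this page), and the helper name channel_stats is hypothetical, not part of TAO.

```python
import math


def channel_stats(images):
    """Return per-channel (means, stds) over all pixels, scaled to [0, 1].

    `images` is an iterable of images; each image is an iterable of
    (r, g, b) pixels with 8-bit values in 0..255.
    """
    count = 0
    sums = [0.0, 0.0, 0.0]
    sq_sums = [0.0, 0.0, 0.0]
    for image in images:
        for pixel in image:
            count += 1
            for c in range(3):
                value = pixel[c] / 255.0  # assumed scaling to [0, 1]
                sums[c] += value
                sq_sums[c] += value * value
    if count == 0:
        raise ValueError("no pixels provided")
    means = [s / count for s in sums]
    # Population variance E[x^2] - E[x]^2, clamped at 0 against rounding error.
    stds = [
        math.sqrt(max(sq / count - m * m, 0.0))
        for sq, m in zip(sq_sums, means)
    ]
    return means, stds


# Two tiny hypothetical "images" of two pixels each.
images = [
    [(0, 0, 0), (255, 255, 255)],
    [(255, 0, 0), (255, 0, 0)],
]
mean, std = channel_stats(images)
print("rgb_input_mean:", [round(m, 3) for m in mean])
print("rgb_input_std:", [round(s, 3) for s in std])
```

In practice the pixel data would come from an image library; the resulting lists can be copied into rgb_input_mean and rgb_input_std.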
Here is an example spec file for training a SiameseOI model with a custom backbone on a custom dataset using the Data Annotation Format.
FTMS Client
SPECS=$(tao-client optical_inspection get-spec --action train --job_type experiment --id $EXPERIMENT_ID)
TAO Launcher
results_dir: /path/to/experiment_results
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
dataset:
  train_dataset:
    csv_path: /path/to/split/train.csv
    images_dir: /path/to/images_dir/
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  fpratio_sampling: 0.1
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
train:
  optim:
    type: Adam
    lr: 0.0005
  loss: contrastive
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  results_dir: "${results_dir}/train"
  seed: 1234
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| export | dict config | – | The configuration of the ONNX export task | |
| gen_trt_engine | dict config | – | The configuration of the TensorRT generation task. Only used in TAO Deploy | |
train#
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, NumPy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| optim | dict config | None | Contains the configurable parameters for the SiameseOI optimizer detailed in the optim section | |
| loss | str | contrastive | The loss function used during training | |
optim#
optim:
  lr: 0.0005
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| lr | float | 0.0005 | The learning rate | >=0.0 |
Model#
The following example model config provides options to change the SiameseOI architecture for training.
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
The same model config is also used during SiameseOI evaluation and inference.
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model_type | string | Siamese_3 | The default model architecture from the supported custom model architectures | Siamese_3, Siamese_1 |
| model_backbone | string | custom | The name of the backbone to use | custom |
| embedding_vectors | int | 5 | The embedding dimensions of the final output from the model before computing Euclidean distance | |
| margin | float | 2.0 | The threshold parameter that determines the minimum distance between embeddings of positive and negative pairs | |
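To illustrate how margin interacts with the Euclidean distance between two embeddings, here is a minimal sketch of the commonly used contrastive loss formulation. It is an assumption that TAO's contrastive loss follows this exact form and label convention (0 for a matching pair, 1 for a non-matching pair); the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, label, margin=2.0):
    """Common contrastive loss for one pair of embeddings.

    Assumed convention: label 0 = matching pair, label 1 = non-matching pair.
    Matching pairs are pulled together (loss grows with distance);
    non-matching pairs are pushed apart until they are `margin` away.
    """
    d = euclidean_distance(emb_a, emb_b)
    if label == 0:
        return d ** 2
    return max(margin - d, 0.0) ** 2


golden = [0.0, 0.0, 0.0, 0.0, 0.0]   # embedding_vectors: 5
near = [1.0, 0.0, 0.0, 0.0, 0.0]     # distance 1 from golden
far = [3.0, 4.0, 0.0, 0.0, 0.0]      # distance 5 from golden

print(contrastive_loss(golden, near, label=1))  # inside the margin: penalized
print(contrastive_loss(golden, far, label=1))   # beyond the margin: no penalty
print(contrastive_loss(golden, far, label=0))   # matching pair far apart: penalized
```

Under this formulation, a larger margin asks the model to separate non-matching pairs further before their loss contribution drops to zero.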
Dataset#
The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset config is provided below.
dataset:
  train_dataset:
    csv_path: /path/to/split/train.csv
    images_dir: /path/to/images_dir/
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  fpratio_sampling: 0.1
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset | Dict | – | The paths to the image directory and CSV files for the training dataset | |
| validation_dataset | Dict | – | The paths to the image directory and CSV files for the validation dataset | |
| image_ext | str | .jpg | The file extension of the images in the dataset | string |
| batch_size | int | 32 | The number of samples per batch | >0 |
| workers | int | 8 | The number of worker processes for data loading | |
| fpratio_sampling | float | 0.1 | The ratio of false-positive examples to sample | >0 |
| num_input | int | 4 | The number of lighting conditions for each input image* | >0 |
| input_map | Dict | – | The mapping of lighting conditions to indices specifying concatenation ordering* | |
| concat_type | string | linear | The type of concatenation to use for different image lighting conditions | linear, grid |
| grid_map | Dict | None | The parameters that define the grid dimensions used to concatenate images as a grid. x: the number of images along the x-axis. y: the number of images along the y-axis | |
| image_width | int | 100 | The width of the input image | >0 |
| image_height | int | 100 | The height of the input image | >0 |
| augmentation_config | Dict | None | The image normalization config, which contains the rgb_input_mean and rgb_input_std parameters | |
| augmentation_config.rgb_input_mean | List[float] | [0.485, 0.456, 0.406] | The mean to be subtracted for pre-processing | >=0.0 |
| augmentation_config.rgb_input_std | List[float] | [0.229, 0.224, 0.225] | The standard deviation to divide the image by | >=0.0 |
* See the Dataset Annotation Format definition for more information about specifying lighting conditions.
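The relationships between num_input, input_map, concat_type, and grid_map can be sanity-checked before launching a job. The sketch below is a hypothetical pre-flight check, not something TAO is known to perform: the rules it applies (one input_map entry per lighting condition, contiguous indices starting at 0, and a grid with at least num_input cells) are assumptions inferred from the parameter descriptions above.

```python
def validate_dataset_config(cfg):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    num_input = cfg.get("num_input", 0)
    input_map = cfg.get("input_map") or {}

    if num_input <= 0:
        problems.append("num_input must be > 0")
    # Assumed rule: one input_map entry per lighting condition.
    if len(input_map) != num_input:
        problems.append(
            f"input_map has {len(input_map)} entries but num_input is {num_input}"
        )
    # Assumed rule: indices define a concatenation order, so they run 0..N-1.
    if sorted(input_map.values()) != list(range(len(input_map))):
        problems.append("input_map indices should be 0..N-1 with no repeats")

    concat_type = cfg.get("concat_type")
    if concat_type == "grid":
        grid = cfg.get("grid_map") or {}
        cells = grid.get("x", 0) * grid.get("y", 0)
        # Assumed rule: the grid needs room for every lighting condition.
        if cells < num_input:
            problems.append(f"grid_map has {cells} cells for {num_input} images")
    elif concat_type != "linear":
        problems.append(f"unsupported concat_type: {concat_type!r}")

    for key in ("image_width", "image_height"):
        if cfg.get(key, 0) <= 0:
            problems.append(f"{key} must be > 0")
    return problems


dataset = {
    "num_input": 4,
    "input_map": {
        "LowAngleLight": 0,
        "SolderLight": 1,
        "UniformLight": 2,
        "WhiteLight": 3,
    },
    "concat_type": "linear",
    "grid_map": {"x": 2, "y": 2},
    "image_width": 100,
    "image_height": 100,
}
print(validate_dataset_config(dataset))

# A 1x2 grid cannot hold four lighting conditions under the assumed rule.
too_small = dict(dataset, concat_type="grid", grid_map={"x": 1, "y": 2})
print(validate_dataset_config(too_small))
```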
Training the Model#
Use the following command to run SiameseOI training:
FTMS Client
TRAIN_JOB_ID=$(tao-client optical_inspection experiment-run-action --action train --id $EXPERIMENT_ID --specs "$SPECS")
TAO Launcher
tao model optical_inspection train [-h] -e <experiment_spec> \
    [results_dir=<global_results_dir>] \
    [model.<model_option>=<model_option_value>] \
    [dataset.<dataset_option>=<dataset_option_value>] \
    [train.<train_option>=<train_option_value>] \
    [train.gpu_ids=<gpu indices>] \
    [train.num_gpus=<number of gpus>]
Required Arguments
The only required argument is the path to the experiment spec:
* -e, --experiment_spec: The experiment specification file to set up the training experiment.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
* -h, --help: Show this help message and exit.
* model.<model_option>: The model options.
* dataset.<dataset_option>: The dataset options.
* train.<train_option>: The train options.
* train.optim.<optim_option>: The optimizer options.
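The dotted key=value overrides map onto the nested structure of the spec file. The following simplified sketch illustrates that mapping on a plain dictionary; it is not how TAO parses its command line, and the helper name apply_overrides is hypothetical. Values are parsed as JSON when possible and kept as strings otherwise.

```python
import copy
import json


def apply_overrides(spec, overrides):
    """Return a copy of `spec` with dotted key=value overrides applied."""
    result = copy.deepcopy(spec)
    for item in overrides:
        dotted_key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        try:
            value = json.loads(raw)      # numbers, lists, booleans
        except json.JSONDecodeError:
            value = raw                  # anything else stays a string
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections
        node[leaf] = value
    return result


spec = {
    "train": {"num_epochs": 10, "optim": {"lr": 0.0005}},
    "dataset": {"image_ext": ".jpg"},
}
updated = apply_overrides(
    spec,
    [
        "train.num_epochs=20",
        "train.optim.lr=0.001",
        "train.gpu_ids=[0, 1]",
        "dataset.image_ext=.png",
    ],
)
print(json.dumps(updated, indent=2))
```

The original spec dictionary is left unchanged, which mirrors the idea that command-line overrides apply to a single run rather than to the spec file itself.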
Note
For training, evaluation, and inference, each task exposes two variables, num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], they are modified to follow the setting with more GPUs; in this example, num_gpus = 1 becomes num_gpus = 2.
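The reconciliation rule in the note can be sketched as follows. Only the case where gpu_ids lists more GPUs than num_gpus is described above; the opposite case (expanding gpu_ids to 0..num_gpus-1) is an assumption made here for illustration, and reconcile_gpus is a hypothetical helper rather than a TAO function.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Make num_gpus and gpu_ids agree by following the setting with more GPUs."""
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if num_gpus < len(gpu_ids):
        # Documented case: gpu_ids names more GPUs, so num_gpus is raised.
        num_gpus = len(gpu_ids)
    elif num_gpus > len(gpu_ids):
        # Assumed case: num_gpus is larger, so use the first num_gpus indices.
        gpu_ids = list(range(num_gpus))
    return num_gpus, gpu_ids


print(reconcile_gpus())                # defaults
print(reconcile_gpus(1, [0, 1]))       # example from the note
print(reconcile_gpus(2, [0]))          # assumed behavior
```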
In some cases, multi-GPU training may fail with a segmentation fault. You can work around this by setting the OMP_NUM_THREADS environment variable to 1. Depending on your mode of execution, use one of the following methods to set this variable:
CLI Launcher
You can set this environment variable by adding the following entry to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 in this section:
{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADS",
            "value": "1"
        }
    ]
}
Docker
You can set environment variables in the Docker container by passing the -e flag on the docker command line.
docker run -it --rm --gpus all \
-e OMP_NUM_THREADS=1 \
-v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e
Checkpointing and Resuming Training
A PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved at every train.checkpoint_interval. These checkpoints are saved in train.results_dir, like so:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint is also saved as oi_model_latest.pth.
Training automatically resumes from oi_model_latest.pth if it exists in train.results_dir.
This behavior is superseded by train.resume_training_checkpoint_path, if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, you must do one of the following:
* Specify a new, empty results directory (recommended).
* Remove the latest checkpoint from the results directory.
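The resume logic described above can be summarized in a short sketch. This is an illustration of the documented precedence only, not TAO's implementation, and pick_resume_checkpoint is a hypothetical helper name.

```python
import os
import tempfile


def pick_resume_checkpoint(results_dir, resume_training_checkpoint_path=None):
    """Return the checkpoint to resume from, or None to train from scratch."""
    # An explicit train.resume_training_checkpoint_path takes precedence.
    if resume_training_checkpoint_path:
        return resume_training_checkpoint_path
    # Otherwise fall back to the latest checkpoint in train.results_dir.
    latest = os.path.join(results_dir, "oi_model_latest.pth")
    return latest if os.path.isfile(latest) else None


with tempfile.TemporaryDirectory() as results_dir:
    # A new, empty results directory leads to fresh training.
    fresh = pick_resume_checkpoint(results_dir)

    # Once oi_model_latest.pth exists, training resumes from it.
    latest_path = os.path.join(results_dir, "oi_model_latest.pth")
    open(latest_path, "wb").close()
    resumed = pick_resume_checkpoint(results_dir)

    # An explicit path supersedes the latest checkpoint.
    explicit = pick_resume_checkpoint(results_dir, "/path/to/model_epoch_002.pth")

print(fresh, os.path.basename(resumed), explicit)
```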
Creating Testing Experiment Spec File#
Here is an example spec file for evaluating and running inference with a trained SiameseOI model.
FTMS Client
SPECS=$(tao-client optical_inspection get-spec --action evaluate --job_type experiment --id $EXPERIMENT_ID)
TAO Launcher
results_dir: /path/to/experiment_results
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
dataset:
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
evaluate:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint: "${results_dir}/train/oi_model_latest.pth"
  results_dir: "${results_dir}/evaluate"
inference:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint: "${results_dir}/train/oi_model_latest.pth"
  results_dir: "${results_dir}/inference"
Evaluating the Model#
Use the following command to run SiameseOI evaluation:
FTMS Client
EVAL_JOB_ID=$(tao-client optical_inspection experiment-run-action --action evaluate --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $TRAIN_JOB_ID)
TAO Launcher
tao model optical_inspection evaluate [-h] -e <experiment_spec> \
    evaluate.checkpoint=<model to be evaluated> \
    [evaluate.<evaluate_option>=<evaluate_option_value>] \
    [evaluate.gpu_ids=<gpu indices>] \
    [evaluate.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required to run the command.
* -e, --experiment_spec: The experiment spec file to set up the evaluation experiment.
* evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments
The following arguments are optional to run the command.
* evaluate.<evaluate_option>: The evaluate options.
Multi-GPU evaluation is currently not supported for Optical Inspection.
Running Inference on the Model#
Use the following command to run inference on SiameseOI with the .pth model:
FTMS Client
INFER_JOB_ID=$(tao-client optical_inspection experiment-run-action --action inference --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $TRAIN_JOB_ID)
TAO Launcher
tao model optical_inspection inference [-h] -e <experiment spec file> \
    inference.checkpoint=<model to be inferenced> \
    [inference.<inference_option>=<inference_option_value>] \
    [inference.gpu_ids=<gpu indices>] \
    [inference.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required to run the command.
* -e, --experiment_spec: The experiment spec file to set up the inference experiment.
* inference.checkpoint: The .pth model to run inference with.
Optional Arguments
The following arguments are optional to run the command.
* inference.<inference_option>: The inference options.
Exporting the Model#
Here is an example spec file for exporting the trained SiameseOI model:
FTMS Client
SPECS=$(tao-client optical_inspection get-spec --action export --job_type experiment --id $EXPERIMENT_ID)
TAO Launcher
export:
  checkpoint: "${results_dir}/train/oi_model_epoch=004.pth"
  results_dir: "${results_dir}/export"
  onnx_file: "${export.results_dir}/oi_model.onnx"
  batch_size: 32
batch_size: 32
Use the following command to export the model:
FTMS Client
EXPORT_JOB_ID=$(tao-client optical_inspection experiment-run-action --action export --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $TRAIN_JOB_ID)
TAO Launcher
tao model optical_inspection export [-h] -e <experiment spec file> \
    export.checkpoint=<model to export> \
    export.onnx_file=<onnx path> \
    [export.<export_option>=<export_option_value>]
Required Arguments
The following arguments are required to run the command.
* -e, --experiment_spec: The path to an experiment spec file.
* export.checkpoint: The .pth model to export.
* export.onnx_file: The path where the .etlt or .onnx model is saved.
Optional Arguments
The following arguments are optional to run the command.
* export.<export_option>: The export options.
TensorRT Engine Generation, Validation, and int8 Calibration#
For deployment, refer to the TAO Deploy Documentation for SiameseOI.