SiameseOI is an NVIDIA-developed optical inspection model for PCB data and is included in TAO. SiameseOI supports the following tasks:
train
evaluate
inference
export
Each task is explained in detail in the following sections.
Throughout this documentation are references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.
For instructions on creating a dataset using the remote client, refer to the Creating a dataset section in the Remote Client documentation.
For instructions on creating an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.
The spec format is YAML for TAO Launcher, and JSON for FTMS Client.
File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher, not for FTMS Client.
Data Input for SiameseOI#
SiameseOI requires the data to be provided as image folders and CSV files. See the Data Annotation Format page for more information about the input data format for SiameseOI.
Creating a Training Experiment Spec File#
Configuring a Custom Dataset#
This section provides an example configuration and commands for training SiameseOI using the dataset format described above.
You will need to configure the augmentation_config mean and standard deviation based on your input dataset.
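As a rough illustration of where these values can come from, the sketch below computes a per-channel mean and standard deviation from RGB pixels. It assumes pixel values are already scaled to [0, 1], which matches the scale of the example values in this page but is not confirmed here; the function name channel_mean_std is hypothetical, and loading pixels from your image folders is left to you.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def channel_mean_std(pixels: Iterable[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-channel mean and population standard deviation of RGB pixels.

    Assumption: each pixel is an (r, g, b) triple scaled to [0, 1].
    """
    count = 0
    sums = [0.0, 0.0, 0.0]
    sq_sums = [0.0, 0.0, 0.0]
    for r, g, b in pixels:
        count += 1
        for i, v in enumerate((r, g, b)):
            sums[i] += v
            sq_sums[i] += v * v
    if count == 0:
        raise ValueError("no pixels supplied")
    means = [s / count for s in sums]
    # Var = E[x^2] - E[x]^2, clamped at 0 to absorb rounding error.
    stds = [math.sqrt(max(sq / count - m * m, 0.0)) for sq, m in zip(sq_sums, means)]
    return means, stds


# Toy data standing in for pixels gathered from the training images.
pixels = [(0.0, 0.5, 1.0), (1.0, 0.5, 0.0)]
means, stds = channel_mean_std(pixels)
print("rgb_input_mean:", [round(m, 3) for m in means])
print("rgb_input_std:", [round(s, 3) for s in stds])
```

The two printed lists can then be copied into rgb_input_mean and rgb_input_std under augmentation_config.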
Here is an example spec file for training a SiameseOI model with a custom backbone on a custom dataset using the Data Annotation Format.
FTMS Client:
BASE_EXPERIMENT_ID=$(tao siamese_oi list-base-experiments | jq -r '.[0].id')
TRAIN_SPECS=$(tao siamese_oi get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')
TAO Launcher:
results_dir: /path/to/experiment_results
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
dataset:
  train_dataset:
    csv_path: /path/to/split/train.csv
    images_dir: /path/to/images_dir/
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  fpratio_sampling: 0.1
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
train:
  optim:
    type: Adam
    lr: 0.0005
  loss: contrastive
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  results_dir: "${results_dir}/train"
  seed: 1234
| Parameter | Data Type | Default | Description | Supported Values |
| model | dict config | – | The configuration of the model architecture |  |
| dataset | dict config | – | The configuration of the dataset |  |
| train | dict config | – | The configuration of the training task |  |
| evaluate | dict config | – | The configuration of the evaluation task |  |
| inference | dict config | – | The configuration of the inference task |  |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files |  |
| results_dir | string | /results | The directory where experiment results are saved |  |
| export | dict config | – | The configuration of the ONNX export task |  |
| gen_trt_engine | dict config | – | The configuration of the TensorRT generation task. Only used in TAO Deploy |  |
train#
| Parameter | Datatype | Default | Description | Supported Values |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training |  |
| seed | unsigned int | 1234 | The random seed for random, NumPy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| resume_training_checkpoint_path | string |  | The intermediate PyTorch Lightning checkpoint to resume training from |  |
| results_dir | string | /results/train | The directory to save training results |  |
| optim | dict config | None | Contains the configurable parameters for the SiameseOI optimizer detailed in the optim section |  |
| loss | str | contrastive | The loss function used during training |  |
optim#
optim:
  lr: 0.0005
| Parameter | Datatype | Default | Description | Supported Values |
| lr | float | 0.0005 | The learning rate | >=0.0 |
Model#
The following example model config provides options to change the SiameseOI architecture for training.
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
The same model configuration is used during SiameseOI evaluation and inference. The following table describes its parameters.
| Parameter | Datatype | Default | Description | Supported Values |
| model_type | string | Siamese_3 | The default model architecture from the supported custom model architectures | Siamese_3, Siamese_1 |
| model_backbone | string | custom | The name of the backbone to use | custom |
| embedding_vectors | int | 5 | The embedding dimensions of the final output from the model before computing Euclidean distance |  |
| margin | float | 2.0 | The threshold parameter that determines the minimum distance between embeddings of positive and negative pairs |  |
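To make the role of margin concrete, here is a minimal sketch of the standard contrastive loss on the Euclidean distance between two embeddings. It is the textbook formulation, not necessarily the exact TAO implementation; the function names and the dissimilar flag (True for a pair that should be pushed apart) are assumptions made for illustration.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a: Sequence[float], emb_b: Sequence[float],
                     dissimilar: bool, margin: float = 2.0) -> float:
    """Textbook contrastive loss for a single pair (assumed formulation)."""
    d = euclidean_distance(emb_a, emb_b)
    if dissimilar:
        # Dissimilar pairs are penalized only while closer than the margin.
        return max(margin - d, 0.0) ** 2
    # Similar pairs are always pulled together.
    return d ** 2


# Embeddings with 5 dimensions, matching embedding_vectors: 5.
reference = [0.0, 0.0, 0.0, 0.0, 0.0]
near = [1.0, 0.0, 0.0, 0.0, 0.0]
far = [3.0, 0.0, 0.0, 0.0, 0.0]

print(contrastive_loss(reference, near, dissimilar=False))
print(contrastive_loss(reference, near, dissimilar=True))
print(contrastive_loss(reference, far, dissimilar=True))
```

Under this formulation, a dissimilar pair that is already farther apart than margin contributes no loss, which is why margin acts as a minimum target distance.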
Dataset#
The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset is provided below.
dataset:
  train_dataset:
    csv_path: /path/to/split/train.csv
    images_dir: /path/to/images_dir/
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  fpratio_sampling: 0.1
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
| Parameter | Datatype | Default | Description | Supported Values |
| train_dataset | Dict | – | The paths to the image directory and CSV files for the training dataset |  |
| validation_dataset | Dict | – | The paths to the image directory and CSV files for the validation dataset |  |
| image_ext | str | .jpg | The file extension of the images in the dataset | string |
| batch_size | int | 32 | The number of samples per batch | >0 |
| workers | int | 8 | The number of worker processes for data loading |  |
| fpratio_sampling | float | 0.1 | The ratio of false-positive examples to sample | >0 |
| num_input | int | 4 | The number of lighting conditions for each input image* | >0 |
| input_map | Dict | – | The mapping of lighting conditions to indices specifying concatenation ordering* |  |
| concat_type | string | linear | Type of concatenation to use for different image lighting conditions | linear, grid |
| grid_map | Dict | None | The parameters to define the grid dimensions to concatenate images as a grid |  |
| grid_map.x | int | None | The number of images along the x-axis |  |
| grid_map.y | int | None | The number of images along the y-axis |  |
| image_width | int | 100 | The width of the input image | >0 |
| image_height | int | 100 | The height of the input image | >0 |
| augmentation_config | Dict | None | The image normalization config, which contains the following parameters |  |
| augmentation_config.rgb_input_mean | List[float] | [0.485, 0.456, 0.406] | The mean to be subtracted for pre-processing | >=0.0 |
| augmentation_config.rgb_input_std | List[float] | [0.229, 0.224, 0.225] | The standard deviation to divide the image by | >=0.0 |
* See the Dataset Annotation Format definition for more information about specifying lighting conditions.
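The interaction between input_map, concat_type, and grid_map can be pictured with the sketch below, which computes where each lighting-condition image would land in the combined input. It rests on assumptions not confirmed by this page: that linear means a single horizontal strip, that grid fills row by row, and that image_width and image_height describe one lighting-condition tile. The function name tile_layout is hypothetical.

```python
from typing import Dict, List, Tuple


def tile_layout(input_map: Dict[str, int], concat_type: str, grid_map: Dict[str, int],
                image_width: int, image_height: int) -> List[Tuple[str, int, int]]:
    """Return (lighting_name, left, top) pixel offsets in concatenation order."""
    # input_map indices decide the ordering of the lighting conditions.
    ordered = sorted(input_map, key=input_map.get)
    if concat_type == "linear":
        # Assumption: one horizontal strip.
        return [(name, i * image_width, 0) for i, name in enumerate(ordered)]
    if concat_type == "grid":
        cols, rows = grid_map["x"], grid_map["y"]
        if cols * rows < len(ordered):
            raise ValueError("grid_map is too small for the number of inputs")
        # Assumption: row-major fill, x images per row.
        return [(name, (i % cols) * image_width, (i // cols) * image_height)
                for i, name in enumerate(ordered)]
    raise ValueError(f"unsupported concat_type: {concat_type}")


input_map = {"LowAngleLight": 0, "SolderLight": 1, "UniformLight": 2, "WhiteLight": 3}
grid_map = {"x": 2, "y": 2}

linear = tile_layout(input_map, "linear", grid_map, 100, 100)
grid = tile_layout(input_map, "grid", grid_map, 100, 100)
print(linear)
print(grid)
```

With the example values, the linear layout spans four tiles in a row and the grid layout is a 2 x 2 arrangement, so x times y must be at least num_input.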
Training the Model#
Use the following command to run SiameseOI training:
FTMS Client:
TRAIN_JOB_ID=$(tao siamese_oi create-job \
    --kind experiment \
    --name "siamese_oi_train" \
    --action train \
    --workspace-id $WORKSPACE_ID \
    --specs "$TRAIN_SPECS" \
    --train-datasets '["'$DATASET_ID'"]' \
    --eval-dataset "$DATASET_ID" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')
TAO Launcher:
tao model optical_inspection train [-h] -e <experiment_spec> \
    [results_dir=<global_results_dir>] \
    [model.<model_option>=<model_option_value>] \
    [dataset.<dataset_option>=<dataset_option_value>] \
    [train.<train_option>=<train_option_value>] \
    [train.gpu_ids=<gpu indices>] \
    [train.num_gpus=<number of gpus>]
Required Arguments
The only required argument is the path to the experiment spec:
-e, --experiment_spec: The experiment specification file to set up the training experiment
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
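The dotted override keys listed above address nested fields of the spec. The following sketch shows one way such a key could map onto a nested dictionary; it illustrates the naming convention only and is not the launcher's actual parsing code. The function name apply_overrides is hypothetical.

```python
import copy
from typing import Any, Dict


def apply_overrides(spec: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of spec with dotted keys such as 'train.optim.lr' applied."""
    result = copy.deepcopy(spec)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            # Create intermediate sections when they are missing.
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


spec = {"train": {"num_epochs": 10, "optim": {"lr": 0.0005}}}
updated = apply_overrides(spec, {"train.num_epochs": 20, "train.optim.lr": 0.001})
print(updated)
```

Here train.num_epochs=20 on the command line corresponds to the num_epochs field under train in the spec file, and the original spec is left untouched.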
Note
For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], they are modified to follow the setting that implies more GPUs; in the same example, num_gpus is modified from 1 to 2.
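The reconciliation rule can be sketched as follows. The case described above (gpu_ids lists more devices than num_gpus) comes from this page; how gpu_ids is expanded when num_gpus is the larger setting is not stated, so the sketch assumes the first num_gpus indices are used. The function name reconcile_gpus is hypothetical.

```python
from typing import List, Optional, Tuple


def reconcile_gpus(num_gpus: int = 1, gpu_ids: Optional[List[int]] = None) -> Tuple[int, List[int]]:
    """Follow whichever setting implies more GPUs."""
    ids = list(gpu_ids) if gpu_ids is not None else [0]
    if num_gpus < len(ids):
        # gpu_ids implies more GPUs, so num_gpus is raised to match.
        num_gpus = len(ids)
    elif num_gpus > len(ids):
        # Assumption: expand gpu_ids to the first num_gpus indices.
        ids = list(range(num_gpus))
    return num_gpus, ids


print(reconcile_gpus(1, [0, 1]))
print(reconcile_gpus())
```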
In some cases multi-GPU training may result in a segmentation fault. You can circumvent this by setting the environment variable OMP_NUM_THREADS to 1. Depending on your mode of execution, you may use the following methods to set this variable:
CLI Launcher:
You may set the environment variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 in the section Running the launcher.
{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADS",
            "value": "1"
        }
    ]
}
Docker:
You may set environment variables in Docker by setting the -e flag in the Docker command line.
docker run -it --rm --gpus all \
    -e OMP_NUM_THREADS=1 \
    -v /path/to/local/mount:/path/to/docker/mount \
    nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e
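For the CLI Launcher method, the Envs entry can also be added programmatically rather than by hand. The sketch below works on an in-memory dictionary shaped like the JSON shown above; the helper name with_env is hypothetical, and reading from or writing to ~/.tao_mounts.json is deliberately left out.

```python
import json
from typing import Any, Dict


def with_env(mounts_config: Dict[str, Any], variable: str, value: str) -> Dict[str, Any]:
    """Return a copy of the config whose Envs list sets variable to value."""
    updated = dict(mounts_config)
    # Drop any previous entry for the same variable so it is not duplicated.
    envs = [e for e in updated.get("Envs", []) if e.get("variable") != variable]
    envs.append({"variable": variable, "value": value})
    updated["Envs"] = envs
    return updated


config = {"Envs": [{"variable": "OMP_NUM_THREADS", "value": "4"}]}
patched = with_env(config, "OMP_NUM_THREADS", "1")
print(json.dumps(patched, indent=4))
```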
Checkpointing and Resuming Training
A PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved every train.checkpoint_interval epochs.
Checkpoints are saved in train.results_dir, like this:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint is also saved as oi_model_latest.pth.
Training automatically resumes from oi_model_latest.pth if it exists in train.results_dir.
This is superseded by train.resume_training_checkpoint_path, if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, you must do one of the following:
Specify a new, empty results directory (recommended)
Remove the latest checkpoint from the results directory
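The resume precedence described above can be summarized in a short sketch: an explicit train.resume_training_checkpoint_path wins, otherwise oi_model_latest.pth in train.results_dir is used if present, otherwise training starts fresh. The function name select_resume_checkpoint is hypothetical, and this mirrors only the documented behavior, not the trainer's internals.

```python
import tempfile
from pathlib import Path
from typing import Optional


def select_resume_checkpoint(results_dir: str,
                             resume_training_checkpoint_path: Optional[str] = None) -> Optional[Path]:
    """Return the checkpoint to resume from, or None for a fresh run."""
    if resume_training_checkpoint_path:
        # An explicit path supersedes the automatically saved latest checkpoint.
        return Path(resume_training_checkpoint_path)
    latest = Path(results_dir) / "oi_model_latest.pth"
    return latest if latest.exists() else None


with tempfile.TemporaryDirectory() as tmp:
    # A new, empty results directory means training starts from scratch.
    print(select_resume_checkpoint(tmp))
    (Path(tmp) / "oi_model_latest.pth").touch()
    # Once the latest checkpoint exists, it is picked up automatically.
    print(select_resume_checkpoint(tmp).name)
```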
Creating a Testing Experiment Spec File#
Here is an example spec file for testing evaluation and inference of a trained SiameseOI model.
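FTMS Client:
BASE_EXPERIMENT_ID=$(tao siamese_oi list-base-experiments | jq -r '.[0].id')
EVALUATE_SPECS=$(tao siamese_oi get-job-schema --action evaluate --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')
INFERENCE_SPECS=$(tao siamese_oi get-job-schema --action inference --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')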
TAO Launcher:
results_dir: /path/to/experiment_results
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
dataset:
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
evaluate:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint: "${results_dir}/train/oi_model_latest.pth"
  results_dir: "${results_dir}/evaluate"
inference:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint: "${results_dir}/train/oi_model_latest.pth"
  results_dir: "${results_dir}/inference"
Evaluating the Model#
Use the following command to run SiameseOI evaluation:
FTMS Client:
EVAL_JOB_ID=$(tao siamese_oi create-job \
    --kind experiment \
    --name "siamese_oi_evaluate" \
    --action evaluate \
    --workspace-id $WORKSPACE_ID \
    --parent-job-id $TRAIN_JOB_ID \
    --eval-dataset "$DATASET_ID" \
    --specs "$EVALUATE_SPECS" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')
TAO Launcher:
tao model optical_inspection evaluate [-h] -e <experiment_spec> \
    evaluate.checkpoint=<model to be evaluated> \
    [evaluate.<evaluate_option>=<evaluate_option_value>] \
    [evaluate.gpu_ids=<gpu indices>] \
    [evaluate.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required.
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments
The following arguments are optional to run the command.
evaluate.<evaluate_option>: The evaluate options.
Multi-GPU evaluation is currently not supported for Optical Inspection.
Running Inference on the Model#
Use the following command to run inference on SiameseOI with the .pth model:
FTMS Client:
INFER_JOB_ID=$(tao siamese_oi create-job \
    --kind experiment \
    --name "siamese_oi_inference" \
    --action inference \
    --workspace-id $WORKSPACE_ID \
    --parent-job-id $TRAIN_JOB_ID \
    --inference-dataset "$DATASET_ID" \
    --specs "$INFERENCE_SPECS" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')
TAO Launcher:
tao model optical_inspection inference [-h] -e <experiment spec file> \
    inference.checkpoint=<model to be inferenced> \
    [inference.<inference_option>=<inference_option_value>] \
    [inference.gpu_ids=<gpu indices>] \
    [inference.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec: The experiment spec file to set up the inference experiment.
inference.checkpoint: The .pth model to run inference on.
Optional Arguments
The following arguments are optional to run the command.
inference.<inference_option>: The inference options.
Exporting the Model#
Here is an example spec file for exporting the trained SiameseOI model:
FTMS Client:
BASE_EXPERIMENT_ID=$(tao siamese_oi list-base-experiments | jq -r '.[0].id')
EXPORT_SPECS=$(tao siamese_oi get-job-schema --action export --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')
TAO Launcher:
export:
  checkpoint: "${results_dir}/train/oi_model_epoch=004.pth"
  results_dir: "${results_dir}/export"
  onnx_file: "${export.results_dir}/oi_model.onnx"
  batch_size: 32
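The ${...} references in the spec point at other fields, including nested ones such as export.results_dir. The sketch below shows how such references could expand to a concrete path. It assumes OmegaConf-style interpolation semantics, which this page does not spell out, and the helper names lookup and resolve are hypothetical.

```python
import re
from typing import Any, Dict

_REF = re.compile(r"\$\{([^}]+)\}")


def lookup(spec: Dict[str, Any], dotted_key: str) -> Any:
    """Fetch a nested value such as 'export.results_dir' from the spec."""
    node: Any = spec
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(spec: Dict[str, Any], value: str, max_depth: int = 10) -> str:
    """Expand ${...} references repeatedly until none remain."""
    for _ in range(max_depth):
        replaced = _REF.sub(lambda m: str(lookup(spec, m.group(1))), value)
        if replaced == value:
            return replaced
        value = replaced
    raise ValueError("interpolation did not settle; possible reference cycle")


spec = {
    "results_dir": "/path/to/experiment_results",
    "export": {
        "results_dir": "${results_dir}/export",
        "onnx_file": "${export.results_dir}/oi_model.onnx",
    },
}
onnx_path = resolve(spec, spec["export"]["onnx_file"])
print(onnx_path)
```

In this reading, changing the top-level results_dir moves every task's output directory at once, because each task-level results_dir is defined relative to it.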
Use the following command to export the model:
FTMS Client:
EXPORT_JOB_ID=$(tao siamese_oi create-job \
    --kind experiment \
    --name "siamese_oi_export" \
    --action export \
    --workspace-id $WORKSPACE_ID \
    --parent-job-id $TRAIN_JOB_ID \
    --specs "$EXPORT_SPECS" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')
TAO Launcher:
tao model optical_inspection export [-h] -e <experiment spec file> \
    export.checkpoint=<model to export> \
    export.onnx_file=<onnx path> \
    [export.<export_option>=<export_option_value>]
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec: The path to an experiment spec file.
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.
Optional Arguments
The following arguments are optional to run the command.
export.<export_option>: The export options.
TensorRT Engine Generation, Validation, and int8 Calibration#
For deployment, refer to the TAO Deploy Documentation for SiameseOI.