SiameseOI#

SiameseOI is an NVIDIA-developed optical inspection model for PCB data and is included in TAO. SiameseOI supports the following tasks:

train
evaluate
inference
export

Each task is explained in detail in the following sections.

Note

Throughout this documentation, you will see references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.
- For instructions on creating a dataset using the remote client, see the Creating a dataset section in the Remote Client documentation.
- For instructions on creating an experiment using the remote client, see the Creating an experiment section in the Remote Client documentation.
The spec format is YAML for TAO Launcher and JSON for FTMS Client.
File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher and not for FTMS Client.

Data Input for SiameseOI#

SiameseOI requires the data to be provided as image folders and CSV files. See the Data Annotation Format page for more information about the input data format for SiameseOI.

Creating a Training Experiment Spec File#

Configuring a Custom Dataset#

This section provides an example configuration and commands for training SiameseOI using the dataset format described above. You will need to configure the augmentation_config mean and standard deviation based on your input dataset.
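One way to obtain these values is to compute per-channel statistics over your training images. The following is a minimal sketch and not part of TAO: the helper name `channel_stats` is hypothetical, pixels are assumed to be 8-bit RGB tuples, and the statistics are assumed to be expressed on a 0 to 1 scale, as the default values in the example spec suggest.

```python
import math

def channel_stats(pixels):
    """Return per-channel (means, stds) on a 0-1 scale for 8-bit RGB pixels.

    `pixels` is any iterable of (r, g, b) tuples with values in 0..255.
    Hypothetical helper for illustration; not a TAO API.
    """
    n = 0
    sums = [0.0, 0.0, 0.0]
    sq_sums = [0.0, 0.0, 0.0]
    for px in pixels:
        n += 1
        for c in range(3):
            v = px[c] / 255.0  # assumption: statistics use a 0-1 scale
            sums[c] += v
            sq_sums[c] += v * v
    if n == 0:
        raise ValueError("no pixels given")
    means = [s / n for s in sums]
    # Population standard deviation: sqrt(E[x^2] - E[x]^2), clamped at 0.
    stds = [math.sqrt(max(sq / n - m * m, 0.0)) for sq, m in zip(sq_sums, means)]
    return means, stds

# Tiny synthetic "dataset": two black and two white pixels.
mean, std = channel_stats([(0, 0, 0), (255, 255, 255), (0, 0, 0), (255, 255, 255)])
print("rgb_input_mean:", [round(m, 3) for m in mean])
print("rgb_input_std:", [round(s, 3) for s in std])
```

In practice, the pixels would come from an image loader iterating over the files listed in your training CSV; the rounded results can then be copied into augmentation_config.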

Here is an example spec file for training a SiameseOI model with a custom backbone on a custom dataset using the Data Annotation Format.

SPECS=$(tao-client optical_inspection get-spec --action train --job_type experiment --id $EXPERIMENT_ID)

results_dir: /path/to/experiment_results
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
dataset:
  train_dataset:
    csv_path: /path/to/split/train.csv
    images_dir: /path/to/images_dir/
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  fpratio_sampling: 0.1
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
train:
  optim:
    type: Adam
    lr: 0.0005
  loss: contrastive
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  results_dir: "${results_dir}/train"
  seed: 1234

Parameter	Data Type	Default	Description	Supported Values
`model`	dict config	–	The configuration of the model architecture
`dataset`	dict config	–	The configuration of the dataset
`train`	dict config	–	The configuration of the training task
`evaluate`	dict config	–	The configuration of the evaluation task
`inference`	dict config	–	The configuration of the inference task
`encryption_key`	string	None	The encryption key to encrypt and decrypt model files
`results_dir`	string	/results	The directory where experiment results are saved
`export`	dict config	–	The configuration of the ONNX export task
`gen_trt_engine`	dict config	–	The configuration of the TensorRT generation task. Only used in TAO deploy

train#

Parameter	Datatype	Default	Description	Supported Values
`num_gpus`	unsigned int	1	The number of GPUs to use for distributed training	>0
`gpu_ids`	List[int]	[0]	The indices of the GPUs to use for distributed training
`seed`	unsigned int	1234	The random seed for random, NumPy, and torch	>0
`num_epochs`	unsigned int	10	The total number of epochs to run the experiment	>0
`checkpoint_interval`	unsigned int	1	The epoch interval at which the checkpoints are saved	>0
`validation_interval`	unsigned int	1	The epoch interval at which the validation is run	>0
`resume_training_checkpoint_path`	string		The intermediate PyTorch Lightning checkpoint to resume training from
`results_dir`	string	/results/train	The directory to save training results
`optim`	dict config	None	Contains the configurable parameters for the SiameseOI optimizer detailed in the optim section.
`loss`	str	contrastive	The loss function used during training

optim#

optim:
  lr: 0.0005

Parameter	Datatype	Default	Description	Supported Values
`lr`	float	0.0005	The learning rate	>=0.0

Model#

The following example model config provides options to change the SiameseOI architecture for training.

model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0

The following table describes the model parameters. The same model config is used during SiameseOI evaluation and inference.

Parameter	Datatype	Default	Description	Supported Values
`model_type`	string	Siamese_3	The default model architecture from the supported custom model architectures	Siamese_3, Siamese_1
`model_backbone`	string	custom	The name of the backbone to use	custom
`embedding_vectors`	int	5	The embedding dimensions of the final output from the model before computing Euclidean distance
`margin`	float	2.0	The threshold parameter that determines the minimum distance between embeddings of positive and negative pairs
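To illustrate how `embedding_vectors` and `margin` interact, the sketch below implements a common textbook formulation of contrastive loss over the Euclidean distance between two embeddings. It is not taken from the TAO source: the function names, the label convention (0 for a similar pair, 1 for a dissimilar pair), and the absence of any scaling factor are assumptions, and the actual implementation may differ.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, label, margin=2.0):
    """Textbook contrastive loss for one pair of embeddings.

    label: 0 = similar pair, 1 = dissimilar pair (assumed convention).
    Similar pairs are penalized by their squared distance; dissimilar
    pairs are penalized only while they are closer than `margin`.
    """
    d = euclidean_distance(emb_a, emb_b)
    similar_term = (1 - label) * d ** 2
    dissimilar_term = label * max(margin - d, 0.0) ** 2
    return similar_term + dissimilar_term

# Two 5-dimensional embeddings (embedding_vectors: 5) at distance 5.0.
golden = [0.0, 0.0, 0.0, 0.0, 0.0]
sample = [3.0, 4.0, 0.0, 0.0, 0.0]
print(contrastive_loss(golden, sample, label=1))  # far apart, dissimilar pair
print(contrastive_loss(golden, golden, label=1))  # identical, dissimilar pair
```

Under this formulation, a larger margin forces dissimilar pairs further apart before they stop contributing to the loss.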

Dataset#

The dataset parameter defines the dataset source, training batch size, augmentation, and pre-processing. An example dataset is provided below.

dataset:
  train_dataset:
    csv_path: /path/to/split/train.csv
    images_dir: /path/to/images_dir/
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  fpratio_sampling: 0.1
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]

Parameter	Datatype	Default	Description	Supported Values
`train_dataset`	Dict	–	The paths to the image directory and CSV files for the training dataset
`validation_dataset`	Dict	–	The paths to the image directory and CSV files for the validation dataset
`image_ext`	str	.jpg	The file extension of the images in the dataset	string
`batch_size`	int	32	The number of samples per batch	>0
`workers`	int	8	The number of worker processes for data loading
`fpratio_sampling`	float	0.1	The ratio of false-positive examples to sample	>0
`num_input`	int	4	The number of lighting conditions for each input image*	>0
`input_map`	Dict	–	The mapping of lighting conditions to indices specifying concatenation ordering*
`concat_type`	string	linear	Type of concatenation to use for different image lighting conditions	linear, grid
`grid_map`	Dict	None	The parameters that define the grid dimensions used to concatenate images as a grid	Dict
`grid_map.x`	int	None	The number of images along the x-axis
`grid_map.y`	int	None	The number of images along the y-axis
`image_width`	int	100	The width of the input image	>0
`image_height`	int	100	The height of the input image	>0
`augmentation_config`	Dict	None	The image normalization config
`augmentation_config.rgb_input_mean`	List[float]	[0.485, 0.456, 0.406]	The mean to be subtracted for pre-processing	>=0.0
`augmentation_config.rgb_input_std`	List[float]	[0.229, 0.224, 0.225]	The standard deviation to divide the image by	>=0.0

* See the Dataset Annotation Format definition for more information about specifying lighting conditions.
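Because `num_input`, `input_map`, `concat_type`, and `grid_map` must agree with one another, it can help to check them before launching a job. The sketch below is a hypothetical pre-flight check, not TAO's own validation: the function name and the specific rules (one index per lighting condition, and a grid with at least `num_input` cells) are assumptions made for illustration.

```python
def check_lighting_config(num_input, input_map, concat_type="linear", grid_map=None):
    """Return lighting names in concatenation order, or raise ValueError.

    Hypothetical sanity checks for illustration; not TAO's own validation.
    """
    if len(input_map) != num_input:
        raise ValueError("input_map must have exactly num_input entries")
    if sorted(input_map.values()) != list(range(num_input)):
        raise ValueError("input_map indices must be 0..num_input-1, each used once")
    if concat_type == "grid":
        if not grid_map:
            raise ValueError("grid_map is required when concat_type is 'grid'")
        # Assumption: the x-by-y grid needs one cell per lighting condition.
        if grid_map["x"] * grid_map["y"] < num_input:
            raise ValueError("grid_map is too small for num_input images")
    elif concat_type != "linear":
        raise ValueError(f"unsupported concat_type: {concat_type!r}")
    # Lower index first: the index specifies the concatenation order.
    return sorted(input_map, key=input_map.get)

order = check_lighting_config(
    num_input=4,
    input_map={"WhiteLight": 3, "LowAngleLight": 0, "UniformLight": 2, "SolderLight": 1},
    concat_type="grid",
    grid_map={"x": 2, "y": 2},
)
print(order)
```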

Training the Model#

Use the following command to run SiameseOI training:

TRAIN_JOB_ID=$(tao-client optical_inspection experiment-run-action --action train --id $EXPERIMENT_ID --specs "$SPECS")

tao-client optical_inspection train [-h] -e <experiment_spec> \
                           [results_dir=<global_results_dir>] \
                           [model.<model_option>=<model_option_value>] \
                           [dataset.<dataset_option>=<dataset_option_value>] \
                           [train.<train_option>=<train_option_value>] \
                           [train.gpu_ids=<gpu indices>] \
                           [train.num_gpus=<number of gpus>]

Required Arguments

The only required argument is the path to the experiment spec:

-e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
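Each override uses a dotted key that names a nested field of the experiment spec. The toolkit performs the actual parsing; the sketch below only illustrates, under that assumption, how a dotted `key=value` pair maps onto the nested spec. The helper names `apply_override` and `parse_value` are hypothetical.

```python
def parse_value(text):
    """Convert text to int or float when possible, else keep it as a string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text

def apply_override(spec, override):
    """Apply one "dotted.key=value" override to a nested dict in place.

    Hypothetical helper for illustration; not the toolkit's parser.
    """
    dotted_key, sep, raw_value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got: {override}")
    keys = dotted_key.split(".")
    node = spec
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # create missing levels on the way down
    node[keys[-1]] = parse_value(raw_value)
    return spec

spec = {"train": {"optim": {"lr": 0.0005}, "num_epochs": 10}}
apply_override(spec, "train.optim.lr=0.001")
apply_override(spec, "train.num_epochs=20")
print(spec)
```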

Note

For training, evaluation, and inference, each task exposes two variables: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 and gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (in this example, num_gpus = 1 becomes num_gpus = 2).
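The reconciliation rule can be sketched as follows. This is an illustration of the behavior described in this note, not the toolkit's code: the function name is hypothetical, and the handling of the opposite case (num_gpus larger than the gpu_ids list) is an assumption that simply selects the first num_gpus device indices.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Return a consistent (num_gpus, gpu_ids) pair.

    Follows whichever setting requests more GPUs. Hypothetical helper.
    """
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if len(gpu_ids) >= num_gpus:
        # gpu_ids names at least as many devices: num_gpus follows it.
        return len(gpu_ids), gpu_ids
    # Assumption: when num_gpus is larger, use device indices 0..num_gpus-1.
    return num_gpus, list(range(num_gpus))

print(reconcile_gpus(num_gpus=1, gpu_ids=[0, 1]))
```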

In some cases, multi-GPU training may fail with a segmentation fault. You can work around this by setting the OMP_NUM_THREADS environment variable to 1. Depending on your mode of execution, use one of the following methods to set this variable:

CLI Launcher

You can set this environment variable by adding the following entry to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 in this section:

{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADS",
            "value": "1"
        }
    ]
}

Docker

You can set environment variables in Docker by passing the -e flag on the docker command line:

docker run -it --rm --gpus all \
    -e OMP_NUM_THREADS=1 \
    -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e

Checkpointing and Resuming Training

A PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved every train.checkpoint_interval epochs. Checkpoints are saved in train.results_dir, like so:

$ ls /results/train

'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'

The latest checkpoint is also saved as oi_model_latest.pth. Training automatically resumes from oi_model_latest.pth if it exists in train.results_dir. This behavior is superseded by train.resume_training_checkpoint_path, if it is provided.

The major implication of this logic is that, to trigger fresh training from scratch, you must do one of the following:

Specify a new, empty results directory (Recommended)
Remove the latest checkpoint from the results directory
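The resume precedence described in this section can be sketched as a small lookup. This is an illustration of the documented order, not the toolkit's implementation, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path

def resolve_resume_checkpoint(results_dir, resume_training_checkpoint_path=None):
    """Return the checkpoint training would resume from, or None for a fresh run.

    Order: an explicit resume path wins, then oi_model_latest.pth in the
    results directory, otherwise training starts from scratch.
    """
    if resume_training_checkpoint_path:
        return Path(resume_training_checkpoint_path)
    latest = Path(results_dir) / "oi_model_latest.pth"
    if latest.is_file():
        return latest
    return None

with tempfile.TemporaryDirectory() as tmp:
    print(resolve_resume_checkpoint(tmp))            # empty directory: fresh run
    (Path(tmp) / "oi_model_latest.pth").touch()
    print(resolve_resume_checkpoint(tmp).name)       # latest checkpoint is found
```

This is why pointing results_dir at a new, empty directory is the simplest way to guarantee a fresh run.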

Creating a Testing Experiment Spec File#

Here is an example spec file for evaluation and inference with a trained SiameseOI model.

SPECS=$(tao-client optical_inspection get-spec --action evaluate --job_type experiment --id $EXPERIMENT_ID)

results_dir: /path/to/experiment_results
model:
  model_type: Siamese_3
  model_backbone: custom
  embedding_vectors: 5
  margin: 2.0
dataset:
  validation_dataset:
    csv_path: /path/to/split/val.csv
    images_dir: /path/to/images_dir/
  image_ext: .jpg
  batch_size: 32
  workers: 8
  num_input: 4
  input_map:
    LowAngleLight: 0
    SolderLight: 1
    UniformLight: 2
    WhiteLight: 3
  concat_type: linear
  grid_map:
    x: 2
    y: 2
  image_width: 100
  image_height: 100
  augmentation_config:
    rgb_input_mean: [0.485, 0.456, 0.406]
    rgb_input_std: [0.229, 0.224, 0.225]
evaluate:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint: "${results_dir}/train/oi_model_latest.pth"
  results_dir: "${results_dir}/evaluate"
inference:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint: "${results_dir}/train/oi_model_latest.pth"
  results_dir: "${results_dir}/inference"

Evaluating the Model#

Use the following command to run SiameseOI evaluation:

EVAL_JOB_ID=$(tao-client optical_inspection experiment-run-action --action evaluate --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $TRAIN_JOB_ID)

tao-client optical_inspection evaluate [-h] -e <experiment_spec>
                           evaluate.checkpoint=<model to be evaluated>
                           [evaluate.<evaluate_option>=<evaluate_option_value>]
                           [evaluate.gpu_ids=<gpu indices>]
                           [evaluate.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.

Optional Arguments

The following arguments are optional to run the command.

evaluate.<evaluate_option>: The evaluate options.

Multi-GPU evaluation is currently not supported for Optical Inspection.

Running Inference on the Model#

Use the following command to run inference on SiameseOI with the .pth model:

INFER_JOB_ID=$(tao-client optical_inspection experiment-run-action --action inference --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $TRAIN_JOB_ID)

tao model optical_inspection inference [-h] -e <experiment spec file>
                           inference.checkpoint=<model to be inferenced>
                           [inference.<inference_option>=<inference_option_value>]
                           [inference.gpu_ids=<gpu indices>]
                           [inference.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The experiment spec file to set up the inference experiment.
inference.checkpoint: The .pth model to run inference on.

Optional Arguments

The following arguments are optional to run the command.

inference.<inference_option>: The inference options.

Exporting the Model#

Here is an example spec file for exporting the trained SiameseOI model:

SPECS=$(tao-client optical_inspection get-spec --action export --job_type experiment --id $EXPERIMENT_ID)

export:
  checkpoint: "${results_dir}/train/oi_model_epoch=004.pth"
  results_dir: "${results_dir}/export"
  onnx_file: "${export.results_dir}/oi_model.onnx"
  batch_size: 32
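The `${...}` references in this spec point at other fields of the same spec. The toolkit resolves them when it loads the file; the sketch below only illustrates how such references expand, assuming simple dotted-key interpolation. The helper names are hypothetical, and the top-level results_dir value used here is the documented default, /results.

```python
import re

_REFERENCE = re.compile(r"\$\{([^}]+)\}")

def lookup(spec, dotted_key):
    """Fetch a nested value such as "export.results_dir" from a dict."""
    node = spec
    for key in dotted_key.split("."):
        node = node[key]
    return node

def resolve(spec, value, max_depth=10):
    """Expand ${a.b} references in `value` against `spec`.

    Hypothetical helper; references may themselves contain references,
    so expansion repeats until none remain (bounded by max_depth).
    """
    for _ in range(max_depth):
        match = _REFERENCE.search(value)
        if match is None:
            return value
        replacement = str(lookup(spec, match.group(1)))
        value = value[:match.start()] + replacement + value[match.end():]
    raise ValueError("too many nested references (possible cycle)")

spec = {
    "results_dir": "/results",
    "export": {
        "results_dir": "${results_dir}/export",
        "onnx_file": "${export.results_dir}/oi_model.onnx",
    },
}
print(resolve(spec, spec["export"]["onnx_file"]))
```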

Use the following command to export the model:

EXPORT_JOB_ID=$(tao-client optical_inspection experiment-run-action --action export --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $TRAIN_JOB_ID)

tao-client optical_inspection export [-h] -e <experiment spec file>
                           export.checkpoint=<model to export>
                           export.onnx_file=<onnx path>
                           [export.<export_option>=<export_option_value>]

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The path to an experiment spec file.
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.

Optional Arguments

The following arguments are optional to run the command.

export.<export_option>: The export options.

TensorRT Engine Generation, Validation, and int8 Calibration#

For deployment, refer to the TAO Deploy Documentation for SiameseOI.