Metric Learning Recognition

NVIDIA TAO Release 4.0.1

Metric Learning Recognition (MLRecogNet) is a classifier that encodes the input image to embedding vectors and predicts their labels based on the embedding vectors in the reference space. MLRecogNet consists of two parts:

  • Trunk: A backbone network that encodes the input image to a feature vector.

  • Embedder: A fully connected layer that maps the feature vector to the embedding space.

The embedding space is a high-dimensional space where the distance between the embedding vectors of the same class is small and the distance between the embedding vectors of different classes is large. The embedder is trained to minimize the distance between the embedding vectors of the same class and maximize the distance between the embedding vectors of different classes. The embedding vectors of the query images are compared with the embedding vectors of the reference images to predict the labels of the query images.

The currently supported trunk is ResNet, which is the most commonly used baseline for vision classification. The currently supported embedder is a one-layer MLP.

During training, evaluation, and inference, MLRecogNet requires a reference set and a query set for validation or testing. The reference set is a collection of labeled images, while the query set is a group of unlabeled images. The goal is to predict the labels of the query images by comparing their embedding vectors with the embedding vectors of the reference set, both generated by the trained MLRecogNet.
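The comparison between query and reference embeddings described above can be sketched as a nearest-neighbor lookup. This is a minimal illustration, not the TAO implementation: the helper names `euclidean` and `predict_label`, the two-dimensional toy embeddings, and the use of Euclidean distance with majority voting are all assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_label(query, reference, k=1):
    """Return the majority label among the k reference embeddings nearest to query.

    reference is a list of (embedding, label) pairs.
    """
    nearest = sorted(reference, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy reference set: two classes that are well separated in the embedding space.
reference = [
    ([0.9, 0.1], "class1"),
    ([1.0, 0.0], "class1"),
    ([0.1, 0.9], "class2"),
    ([0.0, 1.0], "class2"),
]

query = [0.8, 0.2]
predicted = predict_label(query, reference, k=3)
print(predicted)
```

With a well-trained embedder, query embeddings land close to reference embeddings of the same class, so even `k=1` tends to give the right label.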

MLRecogNet requires cropped images from the detection set or classification set as input. These images are resized to 224x224 by default for model input. Augmentation is applied to each image during training.

The data should be organized in the following structure:

    /Dataset
        /reference
            /class1
                0001.jpg
                0002.jpg
                ...
                0100.jpg
            /class2
                0001.jpg
                0002.jpg
                ...
                0100.jpg
            ...
        /train
            /class1
                0101.jpg
                0102.jpg
                ...
                0200.jpg
            /class2
                0101.jpg
                0102.jpg
                ...
                0200.jpg
        /val
            /class1
                0201.jpg
                0202.jpg
                ...
                0220.jpg
            /class2
                0201.jpg
                0202.jpg
                ...
                0220.jpg
        /test
            /class1
                0301.jpg
                0302.jpg
                ...
                0400.jpg
            /class2
                0301.jpg
                0302.jpg
                ...
                0400.jpg

The root directory of the dataset contains sub-directories for reference, training, validation, and test. Each of these sub-directories must follow the ImageNet structure, as demonstrated above: one folder per class, each holding only images of that class. If a class in the test set is not in the reference set, query images of that class cannot be recognized correctly.
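Because classes missing from the reference set cannot be recognized, it can help to compare the class folders of two splits before running a task. The following is a small sketch of such a check; the function names `class_names` and `missing_from_reference` are hypothetical, and the temporary directory only stands in for a real dataset root.

```python
import tempfile
from pathlib import Path


def class_names(split_dir):
    """Return the set of class folder names directly under an ImageNet-style split."""
    return {p.name for p in Path(split_dir).iterdir() if p.is_dir()}


def missing_from_reference(reference_dir, query_dir):
    """List classes that appear in the query split but not in the reference split."""
    return sorted(class_names(query_dir) - class_names(reference_dir))


# Build a throwaway layout to demonstrate the check.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = {"reference": ["class1", "class2"], "test": ["class1", "class3"]}
    for split, classes in layout.items():
        for name in classes:
            (root / split / name).mkdir(parents=True)
    missing = missing_from_reference(root / "reference", root / "test")

print(missing)
```

An empty result means every query class has reference images to be matched against.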

The spec file for MLRecogNet includes model, train, and dataset parameters. Here is an example spec $TRAIN_SPEC for training an MLRecogNet model on a target dataset.

    results_dir: "???"
    model:
      backbone: resnet_101
      pretrain_choice: ""
      pretrained_model_path: /path/to/resnet101_pretrained_mlrecog.pth.tar
      input_width: 224
      input_height: 224
      feat_dim: 2048
    train:
      optim:
        name: Adam
        steps: [40, 70]
        gamma: 0.1
        embedder:
          bias_lr_factor: 1
          weight_decay: 0.0001
          weight_decay_bias: 0.0005
          base_lr: 0.000001
          momentum: 0.9
        trunk:
          bias_lr_factor: 1
          weight_decay: 0.0001
          weight_decay_bias: 0.0005
          base_lr: 0.00001
          momentum: 0.9
        warmup_factor: 0.01
        warmup_iters: 10
        warmup_method: linear
        triplet_loss_margin: 0.3
        miner_function_margin: 0.1
      num_epochs: 5
      resume_training_checkpoint_path: null
      checkpoint_interval: 1
      smooth_loss: False
      batch_size: 16
      val_batch_size: 16
    dataset:
      train_dataset: /path/to/dataset/train
      val_dataset:
        reference: /path/to/dataset/reference
        query: /path/to/dataset/val
      workers: 12
      pixel_mean: [0.485, 0.456, 0.406]
      pixel_std: [0.226, 0.226, 0.226]
      prob: 0.5
      re_prob: 0.5
      num_instance: 4
      color_augmentation:
        enabled: True
        brightness: 0.5
        contrast: 0.3
        saturation: 0.1
        hue: 0.1
      gaussian_blur:
        enabled: True
        kernel: [15, 15]
        sigma: [0.3, 0.7]
      random_rotation: True
      class_map: /path/to/class_map.yaml

| Parameter | Data Type | Default | Description |
|---|---|---|---|
| model | dict config | | The configuration for the model architecture |
| train | dict config | | The configuration for the training process |
| dataset | dict config | | The configuration for the dataset |
| results_dir | string | None | The path to the root results directory. It’s not required when the target task has its own results_dir specified. |

model

The model parameter provides options to change the MetricLearningRecognition architecture.

    model:
      backbone: resnet_50
      pretrain_choice: imagenet
      pretrained_model_path: "/path/to/pretrained_model.pth"
      input_channels: 3
      input_width: 224
      input_height: 224
      feat_dim: 256

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | string | resnet_50 | The type of model. Values are currently limited to resnet_50 and resnet_101. | resnet_50/resnet_101 |
| pretrain_choice | string | imagenet | The pretrained network | imagenet/"" |
| pretrained_model_path | string | | The path to the pretrained model. The weights are only loaded into the trunk. | |
| input_channels | unsigned int | 3 | The number of input channels | >0 |
| input_width | int | 224 | The input width of the images | int |
| input_height | int | 224 | The input height of the images | int |
| feat_dim | unsigned int | 256 | The output size of the feature embeddings | >0 |

train

The train parameter defines the hyperparameters of the training process.

    train:
      optim:
        name: Adam
        steps: [40, 70]
        gamma: 0.1
        warmup_factor: 0.01
        warmup_iters: 10
        warmup_method: 'linear'
        triplet_loss_margin: 0.3
        miner_function_margin: 0.1
        embedder:
          bias_lr_factor: 1
          base_lr: 0.000001
          momentum: 0.9
          weight_decay: 0.0001
          weight_decay_bias: 0.0005
        trunk:
          bias_lr_factor: 1
          base_lr: 0.00001
          momentum: 0.9
          weight_decay: 0.0001
          weight_decay_bias: 0.0005
      num_epochs: 120
      checkpoint_interval: 10
      clip_grad_norm: 0.0
      resume_training_checkpoint_path: null
      report_accuracy_per_class: True
      smooth_loss: True
      batch_size: 64
      val_batch_size: 64
      results_dir: null

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| optim | dict config | | The configuration for the torch optimizer (Optim Config), including the learning rate, learning rate scheduler, weight decay, etc. | |
| gpu_ids | unsigned int list | [0] | The list of GPU indices to use for the training task | |
| num_epochs | unsigned int | 120 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 10 | The interval at which the checkpoints are saved | >0 |
| clip_grad_norm | float | 0.0 | The amount to clip the gradient by the L2 norm. A value of 0.0 specifies no clipping. | >=0 |
| resume_training_checkpoint_path | string | | The path to a checkpoint to continue training | |
| report_accuracy_per_class | bool | True | If True, the top-1 precision of each class will be reported. | True/False |
| smooth_loss | bool | True | If True, the log-exp version of the triplet loss will be used. | True/False |
| batch_size | unsigned int | 64 | The batch size for training | >0 |
| val_batch_size | unsigned int | 64 | The batch size for validation | >0 |
| results_dir | string | | The results directory of the train task | |

optim

The optim parameter defines the configuration for the Torch optimizer in training, including the learning rate, learning rate scheduler, and weight decay.

    optim:
      name: Adam
      steps: [40, 70]
      gamma: 0.1
      warmup_factor: 0.01
      warmup_iters: 10
      warmup_method: 'linear'
      triplet_loss_margin: 0.3
      miner_function_margin: 0.1
      embedder:
        bias_lr_factor: 1
        base_lr: 0.00035
        momentum: 0.9
        weight_decay: 0.0005
        weight_decay_bias: 0.0005
      trunk:
        bias_lr_factor: 1
        base_lr: 0.00035
        momentum: 0.9
        weight_decay: 0.0005
        weight_decay_bias: 0.0005

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| name | string | Adam | The name of the optimizer. The algorithms in torch.optim are supported. | Adam/SGD/Adamax/… |
| steps | int list | [40, 70] | The steps to decrease the learning rate for the MultiStep scheduler | |
| gamma | float | 0.1 | The decay rate for the WarmupMultiStepLR scheduler | >0.0 |
| warmup_factor | float | 0.01 | The warmup factor for the WarmupMultiStepLR scheduler | >0.0 |
| warmup_iters | unsigned int | 10 | The number of warmup iterations for the WarmupMultiStepLR scheduler | >0 |
| warmup_method | string | linear | The warmup method for the optimizer | constant/linear |
| triplet_loss_margin | float | 0.3 | The desired difference between the anchor-positive distance and the anchor-negative distance | >0.0 |
| miner_function_margin | float | 0.1 | Negative pairs are chosen if they have similarity greater than the hardest positive pair, minus this margin; positive pairs are chosen if they have similarity less than the hardest negative pair, plus the margin | >0.0 |
| embedder | dict config | | The learning rate configurations (LR Config) for the MLRecogNet embedder | |
| trunk | dict config | | The learning rate configurations (LR Config) for the MLRecogNet trunk | |
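To make triplet_loss_margin concrete, here is a hedged sketch of a triplet margin loss for a single (anchor, positive, negative) triplet, given precomputed distances. The function name `triplet_loss` is hypothetical. Interpreting the "log-exp version" selected by smooth_loss as a softplus, log(1 + exp(x)), is an assumption based on common metric-learning libraries and is not confirmed by this document.

```python
import math


def triplet_loss(d_ap, d_an, margin=0.3, smooth=False):
    """Loss for one triplet.

    d_ap: anchor-positive distance, d_an: anchor-negative distance.
    The hinge form is zero once the negative is at least `margin` farther
    away than the positive.
    """
    violation = d_ap - d_an + margin
    if smooth:
        # Assumed smooth variant: softplus instead of a hard hinge.
        return math.log1p(math.exp(violation))
    return max(violation, 0.0)


easy = triplet_loss(d_ap=0.2, d_an=0.9)          # negative already far enough
hard = triplet_loss(d_ap=0.6, d_an=0.5)          # negative closer than positive
smooth_hard = triplet_loss(d_ap=0.6, d_an=0.5, smooth=True)
print(easy, hard, smooth_hard)
```

The hinge form ignores triplets that already satisfy the margin, while the smooth form still yields a small positive loss for them.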

LR config

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| base_lr | float | 0.00035 | The initial learning rate for the training | >0.0 |
| bias_lr_factor | float | 1 | The bias learning rate factor for the WarmupMultiStepLR scheduler | >=1 |
| momentum | float | 0.9 | The momentum for the optimizer | >0.0 |
| weight_decay | float | 0.0005 | The weight decay coefficient for the optimizer | >0.0 |
| weight_decay_bias | float | 0.0005 | The weight decay coefficient applied to bias parameters | >0.0 |
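The interaction of base_lr, steps, gamma, and the warmup parameters can be illustrated with the commonly used warmup multi-step formula. This is a sketch under assumptions: the function name `warmup_multistep_lr` is hypothetical, the schedule is stepped once per epoch, and the formula follows the widely used open-source WarmupMultiStepLR pattern rather than code confirmed to be in TAO.

```python
from bisect import bisect_right


def warmup_multistep_lr(epoch, base_lr, steps, gamma,
                        warmup_factor, warmup_iters, warmup_method="linear"):
    """Learning rate at a given epoch for a warmup + multi-step decay schedule."""
    factor = 1.0
    if epoch < warmup_iters:
        if warmup_method == "constant":
            factor = warmup_factor
        elif warmup_method == "linear":
            # Ramp linearly from warmup_factor up to 1.0 over warmup_iters.
            alpha = epoch / warmup_iters
            factor = warmup_factor * (1 - alpha) + alpha
        else:
            raise ValueError(f"unsupported warmup_method: {warmup_method}")
    # Multiply by gamma once for every milestone in `steps` already reached.
    return base_lr * factor * gamma ** bisect_right(steps, epoch)


schedule = {
    epoch: warmup_multistep_lr(epoch, base_lr=0.00035, steps=[40, 70], gamma=0.1,
                               warmup_factor=0.01, warmup_iters=10)
    for epoch in (0, 5, 10, 40, 70)
}
for epoch, lr in schedule.items():
    print(epoch, lr)
```

With the default values, the rate starts at 1% of base_lr, reaches base_lr after 10 steps, and drops by a factor of 10 at steps 40 and 70.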

dataset

The dataset parameter defines the dataset source, training batch size, and augmentation.

    dataset:
      train_dataset: /path/to/dataset/train
      val_dataset:
        reference: /path/to/dataset/reference
        query: /path/to/dataset/val
      workers: 8
      pixel_mean: [0.485, 0.456, 0.406]
      pixel_std: [0.226, 0.226, 0.226]
      padding: 10
      prob: 0.5
      re_prob: 0.5
      sampler: softmax_triplet
      num_instance: 4
      gaussian_blur:
        enabled: True
        kernel: [15, 15]
        sigma: [0.3, 0.7]
      color_augmentation:
        enabled: True
        brightness: 0.5
        contrast: 0.3
        saturation: 0.1
        hue: 0.1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset | string | | The path to the train dataset. This field is only required for the train task. | |
| val_dataset | dict | | The map of reference set and query set addresses. For training and evaluation, both fields are required. For inference, only the reference set address is needed. | {"reference": /path/to/reference/set, "query": ""} |
| workers | unsigned int | 8 | The number of parallel workers processing data | >0 |
| class_map | string | | In tao model, the class_map is a YAML file mapping dataset class names to desired class names. If not specified, the reported class names are the folder names in the dataset folder. In tao deploy, the class_map is a TXT file listing the class names line by line, where the line index is the class index. By default, the class names are the folder names and the classes are in alphanumeric order. | |
| pixel_mean | float list | [0.485, 0.456, 0.406] | The pixel mean for image normalization | float list |
| pixel_std | float list | [0.226, 0.226, 0.226] | The pixel standard deviation for image normalization | float list |
| num_instance | unsigned int | 4 | The number of image instances of the same class in a batch | >0 |
| prob | float | 0.5 | The random horizontal flipping probability for image augmentation | >0 |
| re_prob | float | 0.5 | The random erasing probability for image augmentation | >0 |
| random_rotation | bool | True | If True, random rotations between 0 and 180 degrees are applied to the input data | True/False |
| gaussian_blur | dict config | | The configuration of the Gaussian blur augmentation on input samples | |
| color_augmentation | dict config | | The configuration of the color augmentation on input samples | |

Gaussian Blur Config

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| enabled | bool | True | If True, applies Gaussian blur augmentation to input samples | True/False |
| kernel | unsigned int list | [15, 15] | The kernel size for the Gaussian blur | |
| sigma | float list | [0.3, 0.7] | The sigma value range for the Gaussian blur | |

Color Augmentation Config

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| enabled | bool | True | If True, applies color augmentation to input samples | True/False |
| brightness | float | 0.5 | The value of jittering brightness | >=0 |
| contrast | float | 0.3 | The value of jittering contrast | >=0 |
| saturation | float | 0.1 | The value of jittering saturation | >=0 |
| hue | float | 0.1 | The value of jittering hue | >=0, <=0.5 |
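The pixel_mean and pixel_std values in the dataset config describe per-channel normalization. The sketch below shows what that usually means for a single RGB pixel. It assumes pixel values are first scaled from [0, 255] to [0, 1] before the mean is subtracted, which is the usual convention for these ImageNet-style statistics but is not stated in this document; the function name `normalize_pixel` is hypothetical.

```python
def normalize_pixel(rgb, mean=(0.485, 0.456, 0.406), std=(0.226, 0.226, 0.226)):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return [(channel / 255.0 - m) / s for channel, m, s in zip(rgb, mean, std)]


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print(white)
print(black)
```

After normalization, channel values are roughly centered around zero, which is what pretrained ResNet weights typically expect.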

Use the following command to run MLRecogNet training:

    tao model ml_recog train -e <experiment_spec_file>
                             -r <results_dir>
                             [train.gpu_ids=<gpu id list>]
                             [train.resume_training_checkpoint_path=<absolute path to *.pth checkpoint>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file

  • -r, --results_dir: The path to a folder where the experiment outputs should be written

Optional Arguments

  • train.gpu_ids: The GPU indices list for training. If you set more than one GPU ID, multi-GPU training will be triggered automatically.

  • train.resume_training_checkpoint_path: The path to a checkpoint to continue training
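The optional arguments above are written as key=value pairs whose dotted keys mirror the nesting of the spec file. As a rough illustration of that mapping only, the sketch below applies such a pair to a nested dictionary. It is not TAO's actual argument parser; the function name `apply_override` is hypothetical, and values are kept as raw strings without type conversion.

```python
def apply_override(spec, override):
    """Set a nested key in spec from a 'dotted.key=value' string."""
    dotted_key, _, raw_value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = spec
    for key in parents:
        # Create intermediate sections when they are missing.
        node = node.setdefault(key, {})
    node[leaf] = raw_value
    return spec


spec = {"train": {"num_epochs": 120}}
apply_override(spec, "train.resume_training_checkpoint_path=/path/to/checkpoint.pth")
print(spec)
```

Reading an override this way shows which entry of the spec file it replaces: `train.gpu_ids` corresponds to the gpu_ids field of the train section.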

Here’s an example of using the MLRecogNet training command:

    tao model ml_recog train -e $TRAIN_SPEC -r $RESULTS_DIR

Here’s an example of output $RESULTS_DIR/train/status.json:

    {"date": "6/20/2023", "time": "23:11:2", "status": "STARTED", "verbosity": "INFO", "message": "Starting Training Loop."}
    ...
    {"date": "6/20/2023", "time": "23:11:22", "status": "SUCCESS", "verbosity": "INFO", "message": "Train finished successfully."}


Here is an example spec $EVAL_SPEC for evaluating an MLRecogNet model on a test dataset.

    results_dir: /path/to/root/results/dir
    model:
      backbone: resnet_50
      input_width: 224
      input_height: 224
      feat_dim: 256
    dataset:
      workers: 8
      val_dataset:
        reference: /path/to/dataset/reference
        query: /path/to/dataset/val
    evaluate:
      checkpoint: /path/to/checkpoint
      batch_size: 128
      results_dir: /path/to/results

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | None | The path to the .pth Torch model to be evaluated | |
| trt_engine | string | None | The path to the TensorRT (TRT) engine to be evaluated. Currently, trt_engine is only supported in TAO Deploy. | |
| gpu_id | unsigned int | 0 | The GPU ID for evaluation. Currently, evaluation is only supported on a single GPU. | >=0 |
| topk | int | 1 | If greater than 1, the accuracy will be top-k precision. Currently, evaluate.topk is only supported in TAO Deploy. | >0 |
| batch_size | int | 64 | The batch size for the evaluation task | >0 |
| report_accuracy_per_class | bool | True | If True, the top-1 precision of each class will be reported | True/False |
| results_dir | string | None | The path to the results directory of the evaluation task | |

The following are evaluation metrics for MLRecogNet:

  • Adjusted Mutual Information (AMI): A measure used in statistics and information theory to quantify the agreement between two assignments, such as cluster assignments, which is adjusted for chance and therefore provides a more accurate depiction of the similarity between the two compared to raw mutual information.

  • Normalized Mutual Information (NMI): A normalization of the Mutual Information (MI) score to scale the results between 0 (no mutual information) and 1 (perfect correlation).

  • Mean Average Precision: The average precision achieved by a model across different recall levels, providing a comprehensive evaluation of its performance on information retrieval.

  • Mean Average Precision at R: A model’s average precision for the top-R ranked results, offering insight into the effectiveness of the retrieval or object detection performance of the model when considering a limited number of results.

  • Mean Reciprocal Rank: The average of the inverse ranks of the first relevant result for a set of queries, emphasizing the importance of retrieving relevant information as early as possible.

  • Precision at 1: The accuracy of the nearest neighbor retrievals.

  • R Precision: An evaluation metric for information retrieval systems that measures the proportion of relevant documents among the top-R ranked results, where “R” corresponds to the total number of relevant documents for a given query.

When evaluate.report_accuracy_per_class is set to True, the accuracy of each class will be added.
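The retrieval metrics listed above can be made concrete for a single query. The sketch below computes them from a ranked list of reference labels (nearest first), following the standard definitions. The function names are hypothetical, and the exact conventions used by TAO's evaluator may differ in detail. The "Mean" variants are the averages of these per-query values over all queries.

```python
def precision_at_1(ranked, label):
    """1.0 if the nearest reference has the query's label, else 0.0."""
    return 1.0 if ranked[0] == label else 0.0


def reciprocal_rank(ranked, label):
    """Inverse rank of the first reference sharing the query's label."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate == label:
            return 1.0 / rank
    return 0.0


def r_precision(ranked, label, num_relevant):
    """Fraction of the top-R results that are relevant, with R = num_relevant."""
    return sum(c == label for c in ranked[:num_relevant]) / num_relevant


def average_precision_at_r(ranked, label, num_relevant):
    """Average of precision@i over the relevant positions i within the top R."""
    hits, total = 0, 0.0
    for rank, candidate in enumerate(ranked[:num_relevant], start=1):
        if candidate == label:
            hits += 1
            total += hits / rank
    return total / num_relevant


# One query whose true label is "cat"; the reference set holds 3 "cat" images.
ranked = ["cat", "dog", "cat", "cat", "dog"]
p1 = precision_at_1(ranked, "cat")
rr = reciprocal_rank(ranked, "cat")
rp = r_precision(ranked, "cat", num_relevant=3)
ap_r = average_precision_at_r(ranked, "cat", num_relevant=3)
print(p1, rr, rp, ap_r)
```

Unlike R Precision, the average precision at R rewards placing relevant results earlier within the top R, which is why the two values differ here.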

Use the following command to run MLRecogNet evaluation:

    tao model ml_recog evaluate -e <experiment_spec_file>
                                evaluate.checkpoint=<model to be evaluated>
                                dataset.val_dataset.reference=<path to test reference set>
                                dataset.val_dataset.query=<path to test query set>
                                evaluate.results_dir=<directory of the saved JSON file>
                                [evaluate.gpu_id=<gpu index>]

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment

  • evaluate.checkpoint: The path to the .pth model to be evaluated

  • dataset.val_dataset.reference: The path to the test reference set

  • dataset.val_dataset.query: The path to the test query set

  • evaluate.results_dir: The path to the directory where the output JSON file is saved. If this value is not specified, the output JSON file will be saved to results_dir.

Optional Argument

  • evaluate.gpu_id: The GPU index used to run the evaluation. You can specify the GPU index used to run evaluation when the machine has multiple GPUs installed. Note that evaluation can only run on a single GPU.

Here’s an example of using the MLRecogNet evaluation command:

    tao model ml_recog evaluate -e $EVAL_SPEC \
        evaluate.checkpoint=$TRAINED_PTH_MODEL \
        dataset.val_dataset.reference=$DATA/test/reference \
        dataset.val_dataset.query=$DATA/test/query

Here’s an example of output $RESULTS_DIR/evaluate/status.json:

    {"date": "6/2/2023", "time": "6:12:16", "status": "STARTED", "verbosity": "INFO", "message": "Starting Metric Learning Recognition evaluate."}
    {"date": "6/2/2023", "time": "6:12:17", "status": "STARTED", "verbosity": "INFO", "message": "Loading checkpoint:$RESULTS_DIR/train/ml_model_epoch=000.pth"}
    {"date": "6/2/2023", "time": "6:12:17", "status": "RUNNING", "verbosity": "INFO", "message": "Constructing model graph..."}
    {"date": "6/2/2023", "time": "6:12:17", "status": "SKIPPED", "verbosity": "INFO", "message": "Skipped loading pretrained model as checkpoint is to load."}
    {"date": "6/2/2023", "time": "6:12:23", "status": "SUCCESS", "verbosity": "INFO", "message": "Evaluate finished successfully.", "kpi": {"AMI": 0.8074901483322209, "NMI": 0.8118350536509751, "Mean Average Precision": 0.6876838920302153, "Mean Reciprocal Rank": 0.992727267742157, "r-Precision": 0.666027864375903, "Precision at Rank 1": 0.989090909090909}}
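A status file like the one above can be read programmatically to pick up the final status and the reported KPIs. The sketch below assumes the file holds one JSON object per line, which the sample suggests but this document does not state explicitly. The function name `final_record` is hypothetical, and the two input lines are abbreviated copies of the sample records.

```python
import json

# Abbreviated copies of the first and last records from the sample output.
status_lines = [
    '{"date": "6/2/2023", "time": "6:12:16", "status": "STARTED", '
    '"verbosity": "INFO", "message": "Starting Metric Learning Recognition evaluate."}',
    '{"date": "6/2/2023", "time": "6:12:23", "status": "SUCCESS", '
    '"verbosity": "INFO", "message": "Evaluate finished successfully.", '
    '"kpi": {"AMI": 0.8074901483322209, "Precision at Rank 1": 0.989090909090909}}',
]


def final_record(lines):
    """Parse JSON-lines status output and return the last record."""
    records = [json.loads(line) for line in lines if line.strip()]
    return records[-1]


last = final_record(status_lines)
kpi = last.get("kpi", {})
print(last["status"], kpi)
```

For a real run, the lines would come from reading $RESULTS_DIR/evaluate/status.json instead of the inline list.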

Below is an example of the printouts:

    Starting Metric Learning Recognition evaluate.
    Experiment configuration:
    ...
    results_dir: $RESULTS_DIR
    Loading checkpoint: $RESULTS_DIR/train/ml_model_epoch=000.pth
    Constructing model graph...
    Skipped loading pretrained model as checkpoint is to load.
    Evaluating epoch eval mode
    ...
    Computing accuracy for the query split w.r.t ['gallery']
    running k-nn with k=106
    embedding dimensionality is 256
    /usr/local/lib/python3.8/dist-packages/torch/storage.py:315: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
      warnings.warn(message, UserWarning)
    running k-means clustering with k=5
    embedding dimensionality is 256
    ******************* Evaluation results **********************
    AMI: 0.8075
    NMI: 0.8118
    Mean Average Precision: 0.7560
    Mean Reciprocal Rank: 0.9922
    r-Precision: 0.7421
    Precision at Rank 1: 0.9882
    *************************************************************


Here is an example spec $INFERENCE_SPEC for running MLRecogNet model inference on an inference set.

    results_dir: /path/to/root/results/dir
    model:
      backbone: resnet_50
      input_width: 224
      input_height: 224
      feat_dim: 256
    dataset:
      workers: 8
      val_dataset:
        reference: /path/to/dataset/reference
        query: ""
    inference:
      input_path: /path/to/dataset/test
      inference_input_type: classification_folder
      checkpoint: /path/to/model/checkpoint
      results_dir: /path/to/results/dir
      batch_size: 128

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | None | The path to the .pth Torch model to run inference | |
| trt_engine | string | None | The path to the TensorRT (TRT) engine to run inference. Currently, trt_engine is only supported in TAO Deploy. | |
| gpu_id | unsigned int | 0 | The GPU ID for inference. Currently, inference is only supported on a single GPU. | >=0 |
| input_path | string | | The path to the data to run inference on | |
| inference_input_type | string | "image_folder" | Three options are supported: image_folder (used when input_path is a folder of images), classification_folder (used when input_path is an ImageNet structured folder), and image (used when input_path is an image file). | "image_folder"/"classification_folder"/"image" |
| batch_size | int | 64 | The batch size for the inference task | >0 |
| topk | int | 1 | The number of top results to be returned | >0 |
| results_dir | string | None | The path to the results directory of the inference task | |

Use the following command to run inference on MLRecogNet with the .pth model:

    tao model ml_recog inference -e <experiment_spec>
                                 inference.checkpoint=<inference model>
                                 dataset.val_dataset.reference=<path to gallery data>
                                 inference.input_path=<path to query data>
                                 inference.results_dir=<path to output file>
                                 [inference.gpu_id=<gpu index>]

The output will be a CSV file that contains the feature embeddings of all the query data and their predicted labels.

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up inference

  • inference.checkpoint: The .pth model to perform inference with

  • dataset.val_dataset.reference: The path to the reference set

  • inference.input_path: The path to the data to run inference on

  • inference.results_dir: The directory where the output CSV file is saved

Optional Argument

  • inference.gpu_id: The index of the GPU that will be used to run inference. You can specify this value when the machine has multiple GPUs installed. Note that inference can only run on a single GPU.

Here’s an example of using the MLRecogNet inference command:

    tao model ml_recog inference -e $INFERENCE_SPEC \
        inference.checkpoint=$TRAINED_PTH_MODEL \
        dataset.val_dataset.reference=$DATA/test/reference \
        inference.input_path=$DATA/test/query \
        inference.inference_input_type=classification_folder \
        inference.results_dir=$OUTPUT_DIR

The expected output is as follows:

    /path/to/images/c000001_10.png,"['c000001', 'c000005', 'c000001', 'c000005']","[5.0030694183078595e-06, 5.5495906963187736e-06, 5.976316515443614e-06, 6.004379429214168e-06]"
    /path/to/images/c000001_11.png,"['c000001', 'c000005', 'c000001', 'c000001']","[3.968068540416425e-06, 5.043690180173144e-06, 5.885293830942828e-06, 6.030047643434955e-06]"
    /path/to/images/c000001_120.png,"['c000001', 'c000001', 'c000005', 'c000003']","[1.9612791675172048e-06, 4.112744136364199e-06, 4.603011802828405e-06, 5.8091877690458205e-06]"

The first column contains the inference image paths, the second column contains the top-k predicted labels, and the third column contains the embedding vector distances of the top-k results.
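Rows in this format can be parsed with the Python standard library. The sketch below assumes the file has no header row and that the second and third columns are Python-style list literals, as in the sample. The function name `read_predictions` is hypothetical, and the inline sample is the first row shown above.

```python
import ast
import csv
import io

sample = '''/path/to/images/c000001_10.png,"['c000001', 'c000005', 'c000001', 'c000005']","[5.0030694183078595e-06, 5.5495906963187736e-06, 5.976316515443614e-06, 6.004379429214168e-06]"
'''


def read_predictions(handle):
    """Return (image_path, labels, distances) tuples from an inference CSV."""
    rows = []
    for path, labels, distances in csv.reader(handle):
        # literal_eval safely turns the quoted list text into Python lists.
        rows.append((path, ast.literal_eval(labels), ast.literal_eval(distances)))
    return rows


rows = read_predictions(io.StringIO(sample))
path, labels, distances = rows[0]
top1 = labels[0]
print(path, top1, distances[0])
```

For a real run, `io.StringIO(sample)` would be replaced by an open file handle on the saved result CSV.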

Here’s an example of output $RESULTS_DIR/inference/status.json:

    {"date": "6/2/2023", "time": "6:13:47", "status": "STARTED", "verbosity": "INFO", "message": "Starting Metric Learning Recognition inference."}
    {"date": "6/2/2023", "time": "6:13:47", "status": "STARTED", "verbosity": "INFO", "message": "Loading checkpoint:$RESULTS_DIR/train/ml_model_epoch=001.pth"}
    {"date": "6/2/2023", "time": "6:13:47", "status": "RUNNING", "verbosity": "INFO", "message": "Constructing model graph..."}
    {"date": "6/2/2023", "time": "6:13:48", "status": "SKIPPED", "verbosity": "INFO", "message": "Skipped loading pretrained model as checkpoint is to load."}
    {"date": "6/2/2023", "time": "6:14:6", "status": "SUCCESS", "verbosity": "INFO", "message": "result saved at$RESULTS_DIR/inference/result.csv"}
    {"date": "6/2/2023", "time": "6:14:6", "status": "SUCCESS", "verbosity": "INFO", "message": "Inference finished successfully."}

Below is an example of the printouts:

    Starting Metric Learning Recognition inference.
    Experiment configuration:
    ...
    Loading checkpoint: $RESULTS_DIR/train/ml_model_epoch=001.pth
    Constructing model graph...
    Skipped loading pretrained model as checkpoint is to load.
    /usr/local/lib/python3.8/dist-packages/torch/storage.py:315: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
      warnings.warn(message, UserWarning)
    ...
    result saved at $RESULTS_DIR/inference/result.csv
    Inference finished successfully.


Here is an example spec $EXPORT_SPEC for exporting the MLRecogNet model.

    results_dir: /path/to/root/results/dir
    model:
      backbone: resnet_50
      input_width: 224
      input_height: 224
      feat_dim: 256
    export:
      checkpoint: /path/to/checkpoint
      onnx_file: /path/to/results/model.onnx
      results_dir: /path/to/results
      batch_size: -1
      on_cpu: false
      verbose: true

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | None | The path to the .pth Torch model to be exported | |
| onnx_file | string | None | The path to the exported ONNX file. If this value is not specified, it defaults to model.onnx in export.results_dir. | |
| batch_size | int | -1 | The batch size of the exported ONNX model. If batch_size is -1, the exported ONNX model has a dynamic batch size. | >0; -1 |
| gpu_id | unsigned int | 0 | The GPU ID for Torch-to-ONNX export. Currently, the export task only supports running on a single GPU. | >=0 |
| on_cpu | bool | False | If True, the Torch-to-ONNX export will be performed on CPU | True/False |
| opset_version | unsigned int | 14 | The version of the default (ai.onnx) opset to target | >=7 and <=16 |
| verbose | bool | True | If True, prints a description of the model being exported to stdout. | True/False |
| results_dir | string | None | The path to the results directory of the export task | |

Use the following command to export MLRecogNet to the .onnx format for deployment:

    tao model ml_recog export -e <experiment_spec>
                              export.checkpoint=<.pth checkpoint to be exported>
                              [export.gpu_id=<gpu index>]
                              [export.onnx_file=<path to exported ONNX file>]

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up export

  • export.checkpoint: The .pth model to be exported

Optional Arguments

  • export.gpu_id: The index of the GPU that will be used to run the export. You can specify this value when the machine has multiple GPUs installed. Note that export can only run on a single GPU.

  • export.onnx_file: The path to save the exported model to. The default path is in the same directory as the export.results_dir (if any) or results_dir.

Here’s an example of using the MLRecogNet export command:

    tao model ml_recog export -e $EXPORT_SPEC export.checkpoint=$TRAINED_PTH_MODEL

Here’s an example of output $RESULTS_DIR/export/status.json:

    {"date": "6/2/2023", "time": "6:17:45", "status": "STARTED", "verbosity": "INFO", "message": "Starting Metric Learning Recognition export."}
    {"date": "6/2/2023", "time": "6:17:45", "status": "STARTED", "verbosity": "INFO", "message": "Loading checkpoint:$RESULTS_DIR/train/ml_model_epoch=001.pth"}
    {"date": "6/2/2023", "time": "6:17:45", "status": "RUNNING", "verbosity": "INFO", "message": "Constructing model graph..."}
    {"date": "6/2/2023", "time": "6:17:46", "status": "SKIPPED", "verbosity": "INFO", "message": "Skipped loading pretrained model as checkpoint is to load."}
    {"date": "6/2/2023", "time": "6:17:46", "status": "STARTED", "verbosity": "INFO", "message": "Exporting model to ONNX"}
    {"date": "6/2/2023", "time": "6:17:48", "status": "STARTED", "verbosity": "INFO", "message": "Simplifying ONNX model"}
    {"date": "6/2/2023", "time": "6:17:50", "status": "SUCCESS", "verbosity": "INFO", "message": "ONNX model saved at$RESULTS_DIR/export/ml_model_epoch=001.onnx"}
    {"date": "6/2/2023", "time": "6:17:50", "status": "SUCCESS", "verbosity": "INFO", "message": "Export finished successfully."}

Below is an example of the printouts:

    Starting Metric Learning Recognition export.
    Experiment configuration:
    ...
    Loading checkpoint: $RESULTS_DIR/train/ml_model_epoch=001.pth
    Constructing model graph...
    Skipped loading pretrained model as checkpoint is to load.
    Exporting model to ONNX
    Exported graph: graph(%input : Float(*, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cuda:0),
    ...
    ========== Diagnostic Run torch.onnx.export version 1.14.0a0+44dac51 ===========
    verbose: False, log level: Level.ERROR
    ======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
    Simplifying ONNX model
    Checking 0/3...
    Checking 1/3...
    Checking 2/3...
    ONNX model saved at $RESULTS_DIR/export/ml_model_epoch=001.onnx
    Export finished successfully.


You can use TAO Deploy to deploy trained deep-learning and computer-vision models on edge devices (such as a Jetson Xavier, Jetson Nano, or Tesla) or in the cloud with NVIDIA GPUs. TAO Deploy is an application in TAO Toolkit that converts an ONNX model to a TensorRT engine and runs inference through the TensorRT engine.

Running MLRecogNet Inference on TAO Deploy

The MLRecogNet ONNX file generated from export is taken as input to TAO Deploy to generate an optimized TensorRT engine. For more information about using TAO Deploy to run inference on an MLRecogNet TensorRT engine, refer to the TAO Deploy documentation.

© Copyright 2023, NVIDIA. Last updated on Jul 27, 2023.