MLRecogNet with TAO Deploy

To generate an optimized TensorRT engine for MLRecogNet, the gen_trt_engine action takes an ONNX file previously produced by the MLRecogNet export action. MLRecogNet supports FP32, FP16, and INT8 data types.

For more information about training an MLRecogNet model, refer to the MLRecogNet training documentation.

Each task is explained in detail in the following sections.

Converting an ONNX File into a TensorRT Engine

The following is an example specification file, $TRT_GEN_SPEC, for generating a TensorRT engine from the exported MLRecogNet ONNX model.

trt_config

The trt_config parameter provides options related to TensorRT engine generation. In the following example spec, these options are set under gen_trt_engine.tensorrt:

results_dir: /path/to/results/dir
dataset:
  val_dataset:
    reference: /path/to/reference/set
    query: /path/to/query/set
  pixel_mean: [0.485, 0.456, 0.406]
  pixel_std: [0.226, 0.226, 0.226]

model:
  input_channels: 3
  input_width: 224
  input_height: 224

gen_trt_engine:
  gpu_id: 0
  onnx_file: /path/to/exported/onnx/file
  trt_engine: /path/to/trt/engine/to/generate
  tensorrt:
    data_type: int8
    workspace_size: 1024
    min_batch_size: 1
    opt_batch_size: 10
    max_batch_size: 10
    calibration:
      cal_cache_file: /path/to/calibration/cache/file/to/generate
      cal_batch_size: 16
      cal_batches: 100
      cal_image_dir:
        - /path/to/calibration/image/folder

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| data_type | string | FP32 | The precision to be used for the TensorRT engine | FP32/FP16/INT8 |
| workspace_size | unsigned int | 1024 | The maximum workspace size for the TensorRT engine | >1024 |
| min_batch_size | unsigned int | 1 | The minimum batch size for the optimization profile shape | >0 |
| opt_batch_size | unsigned int | 1 | The optimal batch size for the optimization profile shape | >0 |
| max_batch_size | unsigned int | 1 | The maximum batch size for the optimization profile shape | >0 |
| calibration | dict config | None | The configuration for the INT8 calibration | |
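
The constraints in the table above can be checked before launching a build. The following is a minimal sketch, not part of TAO Deploy: `check_tensorrt_options` is a hypothetical helper, the case-insensitive comparison of `data_type` is an assumption (the sample spec uses lowercase `int8` while the table lists uppercase values), and the rule that INT8 needs a `calibration` section is an assumption drawn from the table's description. The ordering `min_batch_size <= opt_batch_size <= max_batch_size` is what TensorRT optimization profiles require.

```python
from typing import List

# Hypothetical helper for illustration; not a TAO Deploy API.
SUPPORTED_DATA_TYPES = {"fp32", "fp16", "int8"}


def check_tensorrt_options(tensorrt: dict) -> List[str]:
    """Return a list of problems found in a `tensorrt` options dict."""
    problems = []

    # Assumption: data_type is matched case-insensitively.
    data_type = str(tensorrt.get("data_type", "FP32")).lower()
    if data_type not in SUPPORTED_DATA_TYPES:
        problems.append(f"unsupported data_type: {data_type}")

    # Defaults follow the parameter table (all batch sizes default to 1).
    min_b = tensorrt.get("min_batch_size", 1)
    opt_b = tensorrt.get("opt_batch_size", 1)
    max_b = tensorrt.get("max_batch_size", 1)
    if min(min_b, opt_b, max_b) <= 0:
        problems.append("batch sizes must be greater than 0")
    if not (min_b <= opt_b <= max_b):
        problems.append(
            "expected min_batch_size <= opt_batch_size <= max_batch_size"
        )

    # Assumption: an INT8 build needs a calibration section.
    if data_type == "int8" and not tensorrt.get("calibration"):
        problems.append("int8 requires a calibration section")

    return problems


# Values copied from the sample spec in this section.
example = {
    "data_type": "int8",
    "workspace_size": 1024,
    "min_batch_size": 1,
    "opt_batch_size": 10,
    "max_batch_size": 10,
    "calibration": {"cal_batch_size": 16, "cal_batches": 100},
}
print(check_tensorrt_options(example))  # prints []
```

An empty list means none of the checks in this sketch failed; it does not guarantee that the engine build itself will succeed.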

Calibration Config

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| cal_cache_file | string | None | The path to the calibration cache file. If there’s no calibration cache file at this path, a cache file is generated based on the other calibration config parameters. | |
| cal_batch_size | unsigned int | 1 | The batch size of the calibration dataset | >0 |
| cal_batches | unsigned int | 1 | The number of batches used for calibration. In total, cal_batches x cal_batch_size calibration images are used. | >0 |
| cal_image_dir | string | None | The directory containing the calibration images | |
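
As a quick worked example of the cal_batches x cal_batch_size relationship, the sketch below computes how many calibration images the sample spec would consume. `calibration_image_count` is a hypothetical helper written for illustration, not a TAO Deploy function.

```python
def calibration_image_count(cal_batch_size: int, cal_batches: int) -> int:
    """Total calibration images = cal_batches x cal_batch_size."""
    if cal_batch_size <= 0 or cal_batches <= 0:
        # Both parameters must be greater than 0 per the table above.
        raise ValueError("cal_batch_size and cal_batches must be > 0")
    return cal_batch_size * cal_batches


# Values from the sample spec: cal_batch_size 16, cal_batches 100.
total = calibration_image_count(cal_batch_size=16, cal_batches=100)
print(total)  # prints 1600
```

With the sample values, the folders listed under cal_image_dir would need to supply 1600 images in total.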

Ask the agent to run the gen_trt_engine action against your spec. For example:

Build an FP16 TensorRT engine for MLRecogNet from the exported ONNX at
``s3://my-bucket/mlrecog/model.onnx`` using ``trt-spec.yaml``. Write the
engine to ``s3://my-bucket/mlrecog/model.engine``. Run on the local Docker daemon.

A successful run writes a status.json with a SUCCESS message to the configured results directory.
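
If you want to confirm the outcome from a script, something along these lines may help. This is a rough sketch under stated assumptions: `run_succeeded` is a hypothetical helper, and because the exact layout of status.json is not described here, it only searches the raw file text for the word SUCCESS rather than parsing specific fields.

```python
import tempfile
from pathlib import Path


def run_succeeded(results_dir: str) -> bool:
    """Return True if status.json in results_dir mentions SUCCESS."""
    status_path = Path(results_dir) / "status.json"
    if not status_path.is_file():
        return False
    # Assumption: the file layout is unknown, so search the raw text
    # instead of relying on particular JSON keys.
    return "SUCCESS" in status_path.read_text(encoding="utf-8")


# Demonstration with a stand-in file; the content below is made up
# for illustration and is not real TAO output.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "status.json").write_text(
        '{"status": "SUCCESS"}\n', encoding="utf-8"
    )
    print(run_succeeded(tmp))  # prints True
```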

Running Evaluation through TensorRT Engine

Use the same specification file as the TAO evaluation specification file. The following is a sample specification file:

evaluate:
  trt_engine: /path/to/generated/trt_engine
  batch_size: 8
  topk: 5
dataset:
  val_dataset:
    reference: /path/to/reference/set
    query: /path/to/query/set

Ask the agent to run the evaluate action against the engine you built. For example:

Evaluate the MLRecogNet TensorRT engine at
``s3://my-bucket/mlrecog/model.engine`` against ``eval-spec.yaml``. Run
on local Docker.

A successful run writes Top-K accuracy, a confusion matrix, and a classification report to the configured results directory.
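
For readers unfamiliar with the metric, Top-K accuracy is commonly defined as the fraction of query items whose true class appears among the K highest-ranked predictions. The sketch below illustrates that standard definition only; `topk_accuracy` and the toy labels are hypothetical and do not reflect how TAO Deploy computes or stores its results.

```python
from typing import Sequence


def topk_accuracy(
    ranked_labels: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of queries whose true label is in the top-k predictions.

    ranked_labels[i] holds the predicted labels for query i, best first.
    """
    if len(ranked_labels) != len(true_labels):
        raise ValueError("one ranked list is required per true label")
    if not true_labels:
        raise ValueError("at least one query is required")
    hits = sum(
        1
        for ranked, truth in zip(ranked_labels, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy data: three queries whose true class is "cat".
ranked = [
    ["cat", "dog", "bird"],   # correct at rank 1
    ["dog", "cat", "bird"],   # correct at rank 2
    ["bird", "dog", "cat"],   # correct at rank 3
]
truth = ["cat", "cat", "cat"]

print(topk_accuracy(ranked, truth, k=3))  # prints 1.0
```

On the same toy data, k=1 counts only the first query as correct and k=2 counts the first two, which shows why larger topk values can only keep or raise the reported accuracy.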

Running Inference through TensorRT Engine

Use the same specification file as the TAO inference specification file. The following is a sample specification file:

results_dir: "/path/to/output_dir"
model:
  input_channels: 3
  input_width: 224
  input_height: 224
inference:
  trt_engine: "/path/to/generated/trt_engine"
  batch_size: 10
  inference_input_type: classification_folder
  topk: 5
dataset:
  val_dataset:
    reference: "/path/to/reference/set"
    query: ""

Ask the agent to run the inference action against the engine you built. For example:

Run MLRecogNet inference with the TensorRT engine at
``s3://my-bucket/mlrecog/model.engine`` using ``infer-spec.yaml``. Run on
your chosen backend.

JSON-format results are written to trt_inference under the configured results directory.