MLRecogNet with TAO Deploy

To generate an optimized TensorRT engine for MLRecogNet, the gen_trt_engine action takes an ONNX file previously produced by the MLRecogNet export action. MLRecogNet supports FP32, FP16, and INT8 data types.

For more information about training an MLRecogNet model, refer to the MLRecogNet training documentation.

Each task is explained in detail in the following sections.

Converting ONNX File into TensorRT Engine

Here is an example spec, $TRT_GEN_SPEC, for generating a TensorRT engine from the exported MLRecogNet ONNX model.

trt_config

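The trt_config parameter provides options related to TensorRT generation. These options correspond to the keys of the `tensorrt` block under `gen_trt_engine` in the example spec.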

results_dir: /path/to/results/dir
dataset:
  val_dataset:
    reference: /path/to/reference/set
    query: /path/to/query/set
  pixel_mean: [0.485, 0.456, 0.406]
  pixel_std: [0.226, 0.226, 0.226]

model:
  input_channels: 3
  input_width: 224
  input_height: 224

gen_trt_engine:
  gpu_id: 0
  onnx_file: /path/to/exported/onnx/file
  trt_engine: /path/to/trt/engine/to/generate
  tensorrt:
    data_type: int8
    workspace_size: 1024
    min_batch_size: 1
    opt_batch_size: 10
    max_batch_size: 10
    calibration:
      cal_cache_file: /path/to/calibration/cache/file/to/generate
      cal_batch_size: 16
      cal_batches: 100
      cal_image_dir:
        - /path/to/calibration/image/folder

Parameter	Datatype	Default	Description	Supported Values
`data_type`	string	FP32	The precision to be used for the TensorRT engine	FP32/FP16/INT8
`workspace_size`	unsigned int	1024	The maximum workspace size for the TensorRT engine	>1024
`min_batch_size`	unsigned int	1	The minimum batch size for optimization profile shape	>0
`opt_batch_size`	unsigned int	1	The optimal batch size for optimization profile shape	>0
`max_batch_size`	unsigned int	1	The maximum batch size for optimization profile shape	>0
`calibration`	dict config	None	The configuration for the INT8 calibration
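
The constraints in the table can be checked before a build is launched. The following is a minimal sketch, not part of TAO Deploy: `check_tensorrt_block` is a hypothetical helper that takes the `tensorrt` block as a plain dictionary. It relies on the TensorRT requirement that an optimization profile satisfies min <= opt <= max, and it assumes that an INT8 build needs a `calibration` block.

```python
# Hypothetical helper for illustration; not part of TAO Deploy.
SUPPORTED_DATA_TYPES = {"fp32", "fp16", "int8"}


def check_tensorrt_block(cfg):
    """Return a list of problems found in a `tensorrt` config dict."""
    problems = []

    data_type = str(cfg.get("data_type", "fp32")).lower()
    if data_type not in SUPPORTED_DATA_TYPES:
        problems.append(f"unsupported data_type: {data_type!r}")

    # Defaults mirror the table above (all batch sizes default to 1).
    min_bs = cfg.get("min_batch_size", 1)
    opt_bs = cfg.get("opt_batch_size", 1)
    max_bs = cfg.get("max_batch_size", 1)
    if not (0 < min_bs <= opt_bs <= max_bs):
        problems.append(
            "batch sizes must satisfy 0 < min <= opt <= max, "
            f"got {min_bs}, {opt_bs}, {max_bs}"
        )

    # Assumption: an INT8 build needs a calibration block.
    if data_type == "int8" and not cfg.get("calibration"):
        problems.append("data_type is int8 but no calibration block is set")

    return problems


# Values copied from the example spec.
example = {
    "data_type": "int8",
    "workspace_size": 1024,
    "min_batch_size": 1,
    "opt_batch_size": 10,
    "max_batch_size": 10,
    "calibration": {"cal_batch_size": 16, "cal_batches": 100},
}
print(check_tensorrt_block(example))  # prints []
```

An empty list means that the block passed these checks. It does not guarantee that the engine build itself succeeds.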

Calibration Config

Parameter	Datatype	Default	Description	Supported Values
`cal_cache_file`	string	None	The path to the calibration cache file. If no calibration cache file exists at this path, one is generated based on the other `calibration` config parameters.
`cal_batch_size`	unsigned int	1	The batch size of the calibration dataset	>0
`cal_batches`	unsigned int	1	The number of batches used for calibration. In total, `cal_batches` x `cal_batch_size` calibration images are used.	>0
`cal_image_dir`	string	None	The directory containing the calibration images
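
To make the image count concrete, the example spec sets `cal_batch_size: 16` and `cal_batches: 100`, so calibration reads 16 x 100 = 1600 images. The sketch below computes that number and counts the images available under the calibration folders. The helper names and the set of image extensions are assumptions made for illustration, not TAO Deploy behavior.

```python
from pathlib import Path

# Assumption: which file extensions count as calibration images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_needed(cal_batch_size, cal_batches):
    """Total number of images read during calibration."""
    return cal_batch_size * cal_batches


def count_images(image_dirs):
    """Count image files (recursively) under each existing directory."""
    total = 0
    for directory in image_dirs:
        root = Path(directory)
        if not root.is_dir():
            continue  # skip paths that do not exist
        total += sum(
            1
            for p in root.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return total


needed = images_needed(cal_batch_size=16, cal_batches=100)
print(needed)  # prints 1600
```

Comparing `count_images([...])` on the `cal_image_dir` entries with `needed` is a quick way to spot a calibration set that is smaller than the configured batch count expects.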

Ask the agent to run the gen_trt_engine action against your spec. For example:

Build an FP16 TensorRT engine for MLRecogNet from the exported ONNX at
``s3://my-bucket/mlrecog/model.onnx`` using ``trt-spec.yaml``. Write the
engine to ``s3://my-bucket/mlrecog/model.engine``. Run on the local Docker daemon.

A successful run writes a status.json with a SUCCESS message to the configured results directory.

Running Evaluation through TensorRT Engine

Use the same specification file as the TAO evaluation specification file. The following is a sample specification file:

evaluate:
  trt_engine: /path/to/generated/trt_engine
  batch_size: 8
  topk: 5
dataset:
  val_dataset:
    reference: /path/to/reference/set
    query: /path/to/query/set

Ask the agent to run the evaluate action against the engine you built. For example:

Evaluate the MLRecogNet TensorRT engine at
``s3://my-bucket/mlrecog/model.engine`` against ``eval-spec.yaml``. Run
on local Docker.

A successful run writes Top-K accuracy, a confusion matrix, and a classification report to the configured results directory.
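
As a reminder of what a Top-K figure measures, the sketch below uses the generic definition: a query counts as correct when its true class is among the K highest-ranked predictions. This is not TAO's evaluation code. `topk_accuracy` and the sample labels are hypothetical, and the ranked lists stand in for whatever ordering the model produces when it matches queries against the reference set.

```python
def topk_accuracy(ranked_labels, true_labels, k):
    """Fraction of queries whose true label is in the first k ranked labels."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_labels, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical rankings for three queries, best match first.
ranked = [
    ["cat", "dog", "fox"],
    ["dog", "cat", "fox"],
    ["fox", "dog", "cat"],
]
truth = ["cat", "cat", "cat"]

print(topk_accuracy(ranked, truth, k=1))  # 1 of 3 queries correct
print(topk_accuracy(ranked, truth, k=2))  # 2 of 3 queries correct
print(topk_accuracy(ranked, truth, k=3))  # all queries correct
```

A larger `topk` value in the spec can only keep the reported accuracy the same or raise it, because every Top-1 hit is also a Top-5 hit.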

Running Inference through TensorRT Engine

Use the same specification file as the TAO inference specification file. The following is a sample specification file:

results_dir: "/path/to/output_dir"
model:
  input_channels: 3
  input_width: 224
  input_height: 224
inference:
  trt_engine: "/path/to/generated/trt_engine"
  batch_size: 10
  inference_input_type: classification_folder
  topk: 5
dataset:
  val_dataset:
    reference: "/path/to/reference/set"
    query: ""

Ask the agent to run the inference action against the engine you built. For example:

Run MLRecogNet inference with the TensorRT engine at
``s3://my-bucket/mlrecog/model.engine`` using ``infer-spec.yaml``. Run on
your chosen backend.

JSON-format results are written to trt_inference under the configured results directory.