Classification (PyTorch) with TAO Deploy

To generate an optimized TensorRT engine for classification (PyTorch), the gen_trt_engine action takes an ONNX file previously produced by the classification (PyTorch) export action. For more information about training a classification (PyTorch) model, refer to the Classification PyTorch training documentation. With TAO 5.0.0, INT8 precision is not supported for classification (PyTorch) models.

Converting .onnx File into TensorRT Engine

The gen_trt_engine section of the spec configures TensorRT engine generation. You can reuse the spec from the classification (PyTorch) export action as a starting point.

gen_trt_engine:
  onnx_file: /path/to/onnx_file
  trt_engine: /path/to/trt_engine
  input_channel: 3
  input_width: 224
  input_height: 224
  tensorrt:
    data_type: fp16
    workspace_size: 1024
    min_batch_size: 1
    opt_batch_size: 16
    max_batch_size: 16

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| onnx_file | string | | The path to the exported ONNX file used to build the engine | |
| trt_engine | string | | The path where the generated TensorRT engine is written | |
| input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
| input_width | unsigned int | 224 | The input width | >0 |
| input_height | unsigned int | 224 | The input height | >0 |
| batch_size | unsigned int | -1 | The batch size of the ONNX model | >=-1 |
| verbose | bool | False | Enables verbosity for the TensorRT log | |

tensorrt

The tensorrt parameter sets the precision, workspace size, and batch-size optimization profile used for TensorRT engine generation.

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| data_type | string | fp32 | The precision to be used for the TensorRT engine | fp32/fp16 |
| workspace_size | unsigned int | 1024 | The maximum workspace size for the TensorRT engine | >1024 |
| min_batch_size | unsigned int | 1 | The minimum batch size used for the optimization profile shape | >0 |
| opt_batch_size | unsigned int | 1 | The optimal batch size used for the optimization profile shape | >0 |
| max_batch_size | unsigned int | 1 | The maximum batch size used for the optimization profile shape | >0 |
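
The three batch-size fields together describe one TensorRT optimization profile, which requires min <= opt <= max. The following Python sketch checks a spec dictionary against the supported values listed above and derives the three profile shapes. The helper name profile_shapes and the NCHW input layout are assumptions made for illustration; this is not TAO Deploy's own validation code.

```python
def profile_shapes(spec):
    """Return (min, opt, max) input shapes for a gen_trt_engine spec dict.

    Assumes an NCHW input layout (an illustration-only assumption) and
    raises ValueError for values the documentation lists as unsupported.
    """
    trt = spec["tensorrt"]
    if spec["input_channel"] != 3:
        raise ValueError("input_channel must be 3")
    if spec["input_width"] <= 0 or spec["input_height"] <= 0:
        raise ValueError("input_width and input_height must be > 0")
    # INT8 is rejected because it is not supported for this model.
    if trt["data_type"] not in ("fp32", "fp16"):
        raise ValueError("data_type must be fp32 or fp16")
    batches = (trt["min_batch_size"], trt["opt_batch_size"], trt["max_batch_size"])
    if batches[0] <= 0 or not batches[0] <= batches[1] <= batches[2]:
        raise ValueError("need 0 < min_batch_size <= opt_batch_size <= max_batch_size")
    chw = (spec["input_channel"], spec["input_height"], spec["input_width"])
    return tuple((b,) + chw for b in batches)


# Same values as the sample gen_trt_engine spec.
spec = {
    "input_channel": 3,
    "input_width": 224,
    "input_height": 224,
    "tensorrt": {
        "data_type": "fp16",
        "min_batch_size": 1,
        "opt_batch_size": 16,
        "max_batch_size": 16,
    },
}
print(profile_shapes(spec))
```

Running a check like this before requesting the build can catch an inverted batch-size range early, rather than partway through engine generation.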

Ask the agent to run the gen_trt_engine action against your spec. For example:

Build an FP16 TensorRT engine for the classification PyTorch model from
the exported ONNX at ``s3://my-bucket/cls/model.onnx`` using
``trt-spec.yaml``. Write the engine to ``s3://my-bucket/cls/model.engine``.
Run on the local Docker daemon.

Running Evaluation through a TensorRT Engine

You can reuse the TAO evaluation specification file for evaluation through a TensorRT engine. The classes field is required only if you are using custom class names. If this field is not provided, class mapping is based on the alphanumerical order of the image folder names. The following is a sample specification file:

evaluate:
  trt_engine: /path/to/engine/file
  topk: 1
dataset:
  data:
    samples_per_gpu: 16
    test:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val
      classes: /raid/ImageNet2012/classnames.txt
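
When classes is omitted, the default mapping can be pictured as sorting the class folder names and numbering them in that order. The following is a minimal Python sketch, assuming each class is a subdirectory of data_prefix and that "alphanumerical order" matches Python's default string sort. default_class_mapping is a hypothetical helper, not a TAO function.

```python
import os
import tempfile


def default_class_mapping(data_prefix):
    """Map each class folder name under data_prefix to an integer index."""
    folders = sorted(
        name for name in os.listdir(data_prefix)
        if os.path.isdir(os.path.join(data_prefix, name))
    )
    return {name: index for index, name in enumerate(folders)}


# Demonstrate on a throwaway directory tree with three class folders.
with tempfile.TemporaryDirectory() as root:
    for name in ("dog", "cat", "bird"):
        os.mkdir(os.path.join(root, name))
    mapping = default_class_mapping(root)

print(mapping)  # {'bird': 0, 'cat': 1, 'dog': 2}
```

A plain string sort places a folder named class10 before class2, so supplying a classes file avoids depending on how the folders happen to be named.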

Ask the agent to run the evaluate action against the engine you built. For example:

Evaluate the classification PyTorch TensorRT engine at
``s3://my-bucket/cls/model.engine`` against ``eval-spec.yaml``. Run on
local Docker.
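
The topk field in the evaluation spec sets how many of the highest-scoring classes count toward a correct prediction. The following plain-Python sketch shows the standard top-k accuracy metric. The function name and sample scores are illustrative, and TAO Deploy's internal implementation may differ.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)


# Three samples, three classes; each row holds per-class scores.
scores = [
    [0.1, 0.7, 0.2],  # highest score: class 1
    [0.5, 0.3, 0.2],  # highest score: class 0
    [0.2, 0.5, 0.3],  # highest score: class 1
]
labels = [1, 2, 2]

print(topk_accuracy(scores, labels, k=1))  # 1 of 3 samples correct
print(topk_accuracy(scores, labels, k=2))  # 2 of 3 samples correct
```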

Note

TAO Deploy TensorRT evaluation currently shows an accuracy regression relative to TAO PyTorch evaluation for TAO Classification models that use LogisticRegressionHead. This will be addressed in the next release.

Running Inference through a TensorRT Engine

You can reuse the TAO inference specification file for inference through a TensorRT engine. The following is a sample specification file:

inference:
  trt_engine: /path/to/engine/file
dataset:
  data:
    samples_per_gpu: 16
    test:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val
      classes: /raid/ImageNet2012/classnames.txt

Ask the agent to run the inference action against the engine you built. For example:

Run classification PyTorch inference with the TensorRT engine at
``s3://my-bucket/cls/model.engine`` using ``infer-spec.yaml``. Run on
your chosen backend.

Annotated visualizations are written to images_annotated under the configured results directory, and KITTI-format predictions are written to labels.