Classification (PyTorch) with TAO Deploy

To generate an optimized TensorRT engine for classification (PyTorch), the gen_trt_engine action takes an ONNX file previously produced by the classification (PyTorch) export action. For more information about training a classification (PyTorch) model, refer to the Classification PyTorch training documentation. With TAO 5.0.0, INT8 precision is not supported for classification (PyTorch) models.

Converting .onnx File into TensorRT Engine

The gen_trt_engine section of the spec configures TensorRT engine generation. You can reuse the spec from the classification (PyTorch) export action as a starting point.

gen_trt_engine:
  onnx_file: /path/to/onnx_file
  trt_engine: /path/to/trt_engine
  input_channel: 3
  input_width: 224
  input_height: 224
  tensorrt:
    data_type: fp16
    workspace_size: 1024
    min_batch_size: 1
    opt_batch_size: 16
    max_batch_size: 16

Parameter	Datatype	Default	Description	Supported Values
`onnx_file`	string		The path to the .onnx file produced by the classification (PyTorch) export action
`trt_engine`	string		The path where the generated TensorRT engine is written
`input_channel`	unsigned int	3	The input channel size. Only the value 3 is supported.	3
`input_width`	unsigned int	224	The input width	>0
`input_height`	unsigned int	224	The input height	>0
`batch_size`	unsigned int	-1	The batch size of the ONNX model	>=-1
`verbose`	bool	False	Enables verbosity for the TensorRT log
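Before running gen_trt_engine, it can help to sanity-check the spec against the constraints in the table above. The following is a minimal sketch, assuming the spec has already been parsed into a Python dict (for example with a YAML loader). `check_gen_trt_engine` and `GEN_TRT_ENGINE_DEFAULTS` are hypothetical names, not part of TAO Deploy, which performs its own validation.

```python
from typing import Any, Dict, List

# Hypothetical defaults, copied from the parameter table above.
GEN_TRT_ENGINE_DEFAULTS: Dict[str, Any] = {
    "input_channel": 3,
    "input_width": 224,
    "input_height": 224,
    "batch_size": -1,
    "verbose": False,
}


def check_gen_trt_engine(section: Dict[str, Any]) -> List[str]:
    """Return the problems found in a gen_trt_engine spec section.

    Hypothetical helper: it only mirrors the constraints listed in the
    parameter table; it is not TAO Deploy's own validation logic.
    """
    # Fill in table defaults for any keys the spec omits.
    cfg = {**GEN_TRT_ENGINE_DEFAULTS, **section}
    problems: List[str] = []
    # Both paths have no default and must be provided.
    for key in ("onnx_file", "trt_engine"):
        if not cfg.get(key):
            problems.append(f"{key} must be set")
    if cfg["input_channel"] != 3:
        problems.append("input_channel must be 3")
    for key in ("input_width", "input_height"):
        if cfg[key] <= 0:
            problems.append(f"{key} must be > 0")
    if cfg["batch_size"] < -1:
        problems.append("batch_size must be >= -1")
    return problems
```

For the sample spec above, `check_gen_trt_engine({"onnx_file": "/path/to/onnx_file", "trt_engine": "/path/to/trt_engine"})` returns an empty list, because every omitted field falls back to a valid default.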

tensorrt

The tensorrt parameter defines TensorRT engine generation.

Parameter	Datatype	Default	Description	Supported Values
`data_type`	string	fp32	The precision to be used for the TensorRT engine	fp32/fp16 (int8 is not supported for classification (PyTorch) with TAO 5.0.0)
`workspace_size`	unsigned int	1024	The maximum workspace size for the TensorRT engine	>1024
`min_batch_size`	unsigned int	1	The minimum batch size used for the optimization profile shape	>0
`opt_batch_size`	unsigned int	1	The optimal batch size used for the optimization profile shape	>0
`max_batch_size`	unsigned int	1	The maximum batch size used for the optimization profile shape	>0
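The three batch-size fields, combined with the input dimensions, describe a single TensorRT optimization profile. A minimal sketch of that mapping follows, assuming an NCHW input layout. `profile_shapes` is a hypothetical helper, not a TAO Deploy function. TensorRT requires min <= opt <= max for every profile dimension, so the sketch checks that ordering.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int, int]


def profile_shapes(gen_cfg: Dict, trt_cfg: Dict) -> Dict[str, Shape]:
    """Build (min, opt, max) input shapes from the spec.

    Assumes an NCHW input layout; defaults follow the tables above.
    """
    c = gen_cfg.get("input_channel", 3)
    h = gen_cfg.get("input_height", 224)
    w = gen_cfg.get("input_width", 224)
    sizes = {
        "min": trt_cfg.get("min_batch_size", 1),
        "opt": trt_cfg.get("opt_batch_size", 1),
        "max": trt_cfg.get("max_batch_size", 1),
    }
    if any(n <= 0 for n in sizes.values()):
        raise ValueError("batch sizes must be > 0")
    # TensorRT rejects profiles where the optimal shape lies outside [min, max].
    if not sizes["min"] <= sizes["opt"] <= sizes["max"]:
        raise ValueError("expected min_batch_size <= opt_batch_size <= max_batch_size")
    # Only the batch dimension varies; C, H and W are fixed by the spec.
    return {name: (n, c, h, w) for name, n in sizes.items()}
```

With the sample spec above, this yields (1, 3, 224, 224) for min and (16, 3, 224, 224) for both opt and max. A TensorRT builder would pass tuples like these to `IOptimizationProfile.set_shape()` for the model input.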

Ask the agent to run the gen_trt_engine action against your spec. For example:

Build an FP16 TensorRT engine for the classification PyTorch model from
the exported ONNX at ``s3://my-bucket/cls/model.onnx`` using
``trt-spec.yaml``. Write the engine to ``s3://my-bucket/cls/model.engine``.
Run on the local Docker daemon.

Running Evaluation through a TensorRT Engine

You can reuse the TAO evaluation specification file for evaluation through a TensorRT engine. The classes field is only required if you use custom class names. If this field is not provided, the class mapping is based on the alphanumeric order of the image folder names. The following is a sample specification file:

evaluate:
  trt_engine: /path/to/engine/file
  topk: 1
dataset:
  data:
    samples_per_gpu: 16
    test:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val
      classes: /raid/ImageNet2012/classnames.txt
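The two ways of deriving class indices (from a classes file, or from sorted folder names) can be illustrated with a short sketch. `class_mapping` is a hypothetical helper, and it assumes the classes file lists one class name per line; TAO's own data loader may differ in details.

```python
import os
from typing import Dict, Optional


def class_mapping(data_prefix: str, classes_file: Optional[str] = None) -> Dict[int, str]:
    """Map class indices to class names.

    If classes_file is given, its non-empty lines are used in file order
    (assumed format: one class name per line). Otherwise the sub-folder
    names of data_prefix are sorted as plain strings.
    """
    if classes_file is not None:
        with open(classes_file, encoding="utf-8") as f:
            names = [line.strip() for line in f if line.strip()]
    else:
        # Only directories count as classes; stray files are ignored.
        names = sorted(
            d for d in os.listdir(data_prefix)
            if os.path.isdir(os.path.join(data_prefix, d))
        )
    return dict(enumerate(names))
```

Note that plain string sorting places a folder named `10` before one named `2`, so if folder names are numeric and that order is not what you want, providing a classes file avoids surprises.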

Ask the agent to run the evaluate action against the engine you built. For example:

Evaluate the classification PyTorch TensorRT engine at
``s3://my-bucket/cls/model.engine`` against ``eval-spec.yaml``. Run on
local Docker.
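Assuming the topk field selects the k used for top-k accuracy, a prediction counts as correct when the true class is among the k highest-scoring classes. A minimal sketch over plain Python lists follows; `topk_accuracy` is a hypothetical name, not a TAO API, and it assumes one score vector and one integer ground-truth label per image.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top:
            hits += 1
    return hits / len(labels)
```

For example, with scores `[[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]` and labels `[1, 1, 0]`, top-1 accuracy is 1/3 and top-2 accuracy is 2/3.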

Note

There is currently an accuracy regression for TAO Classification models with a LogisticRegressionHead when they are evaluated through a TensorRT engine in TAO Deploy, compared with evaluation in TAO PyTorch. This will be addressed in the next release.

Running Inference through a TensorRT Engine

You can reuse the TAO inference specification file for inference through a TensorRT engine. The following is a sample specification file:

inference:
  trt_engine: /path/to/engine/file
dataset:
  data:
    samples_per_gpu: 16
    test:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val
      classes: /raid/ImageNet2012/classnames.txt
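Turning per-image scores into a class name and confidence is commonly done with a softmax followed by an argmax. The sketch below assumes the engine returns unnormalized logits for each image and that class names come from an index-to-name mapping such as the one implied by classes or folder order. `decode_predictions` is hypothetical, and TAO Deploy's own post-processing may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def decode_predictions(
    logits: Sequence[Sequence[float]], class_names: Dict[int, str]
) -> List[Tuple[str, float]]:
    """Convert per-image logits into (class_name, confidence) pairs via softmax."""
    results: List[Tuple[str, float]] = []
    for row in logits:
        m = max(row)  # subtract the max before exp() for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        # Argmax; on ties, max() keeps the first (lowest) index.
        best = max(range(len(row)), key=lambda i: row[i])
        results.append((class_names[best], exps[best] / total))
    return results
```

For example, `decode_predictions([[2.0, 0.0]], {0: "cat", 1: "dog"})` returns `"cat"` with a confidence of about 0.88.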

Ask the agent to run the inference action against the engine you built. For example:

Run classification PyTorch inference with the TensorRT engine at
``s3://my-bucket/cls/model.engine`` using ``infer-spec.yaml``. Run on
your chosen backend.

Annotated visualizations are written to the images_annotated directory under the configured results directory, and KITTI-format predictions are written to the labels directory.