Classification (PyTorch) with TAO Deploy
To generate an optimized TensorRT engine for classification (PyTorch), the gen_trt_engine
action takes an ONNX file previously produced by the classification (PyTorch) export action.
For more information about training a classification (PyTorch) model, refer to the
Classification PyTorch training documentation. With TAO 5.0.0,
INT8 precision is not supported for classification (PyTorch) models.
Converting .onnx File into TensorRT Engine
The gen_trt_engine section of the spec configures TensorRT engine generation. You can reuse
the spec from the classification (PyTorch) export action as a starting point.
gen_trt_engine:
  onnx_file: /path/to/onnx_file
  trt_engine: /path/to/trt_engine
  input_channel: 3
  input_width: 224
  input_height: 224
  tensorrt:
    data_type: fp16
    workspace_size: 1024
    min_batch_size: 1
    opt_batch_size: 16
    max_batch_size: 16
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| onnx_file | string | | The path to the exported .onnx model | |
| trt_engine | string | | The path where the generated TensorRT engine is stored | |
| input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
| input_width | unsigned int | 224 | The input width | >0 |
| input_height | unsigned int | 224 | The input height | >0 |
| batch_size | unsigned int | -1 | The batch size of the ONNX model | >=-1 |
| verbose | bool | False | Enables verbosity for the TensorRT log | |
tensorrt
The tensorrt parameter configures the precision, workspace size, and batch-size optimization profile used for TensorRT engine generation.
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| data_type | string | fp32 | The precision to be used for the TensorRT engine | fp32/fp16 (int8 is not supported for classification (PyTorch) models) |
| workspace_size | unsigned int | 1024 | The maximum workspace size for the TensorRT engine | >1024 |
| min_batch_size | unsigned int | 1 | The minimum batch size used for the optimization profile shape | >0 |
| opt_batch_size | unsigned int | 1 | The optimal batch size used for the optimization profile shape | >0 |
| max_batch_size | unsigned int | 1 | The maximum batch size used for the optimization profile shape | >0 |
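The constraints on the gen_trt_engine spec can be checked before a build with a few lines of Python. The following is a minimal sketch, not part of TAO Deploy: the function name `profile_shapes` is hypothetical, the NCHW layout of the derived shapes is an assumption, and the `min <= opt <= max` ordering reflects TensorRT's general requirement for optimization profiles rather than anything stated on this page.

```python
# int8 is omitted: this page states it is not supported for this model.
SUPPORTED_DATA_TYPES = {"fp32", "fp16"}


def profile_shapes(spec):
    """Validate a gen_trt_engine spec dict and return (min, opt, max) shapes.

    Hypothetical helper; shapes are assumed to be laid out as (N, C, H, W).
    """
    trt = spec["tensorrt"]

    data_type = trt.get("data_type", "fp32")
    if data_type not in SUPPORTED_DATA_TYPES:
        raise ValueError(f"unsupported data_type: {data_type!r}")

    # Missing batch sizes fall back to the documented default of 1.
    keys = ("min_batch_size", "opt_batch_size", "max_batch_size")
    batches = [trt.get(key, 1) for key in keys]
    if batches[0] <= 0 or not (batches[0] <= batches[1] <= batches[2]):
        raise ValueError(f"expected 0 < min <= opt <= max, got {batches}")

    chw = (
        spec.get("input_channel", 3),
        spec.get("input_height", 224),
        spec.get("input_width", 224),
    )
    return tuple((n, *chw) for n in batches)


# The sample spec from this section, written as a plain dict.
spec = {
    "input_channel": 3,
    "input_width": 224,
    "input_height": 224,
    "tensorrt": {
        "data_type": "fp16",
        "workspace_size": 1024,
        "min_batch_size": 1,
        "opt_batch_size": 16,
        "max_batch_size": 16,
    },
}

min_shape, opt_shape, max_shape = profile_shapes(spec)
print(min_shape, opt_shape, max_shape)
```

A spec whose batch sizes are out of order (for example, a minimum of 8 with an optimum of 4) raises `ValueError` here instead of failing later during the engine build.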
Ask the agent to run the gen_trt_engine action against your spec. For example:
Build an FP16 TensorRT engine for the classification PyTorch model from
the exported ONNX at ``s3://my-bucket/cls/model.onnx`` using
``trt-spec.yaml``. Write the engine to ``s3://my-bucket/cls/model.engine``.
Run on the local Docker daemon.
Running Evaluation through a TensorRT Engine
You can reuse the TAO evaluation specification file for evaluation through a TensorRT engine.
The classes field is required only if you use custom class names. If this field is not provided, the class
mapping follows the alphanumerical order of the image folder names. The following is a sample specification file:
evaluate:
  trt_engine: /path/to/engine/file
  topk: 1
dataset:
  data:
    samples_per_gpu: 16
    test:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val
      classes: /raid/ImageNet2012/classnames.txt
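When the classes field is omitted, the folder-based class mapping described above can be pictured with the sketch below. It assumes that "alphanumerical order" means a plain lexicographic sort of the sub-folder names under data_prefix; the helper name `default_class_mapping` and the demo folder names are hypothetical, and TAO Deploy's actual implementation may differ in detail.

```python
import os
import tempfile


def default_class_mapping(data_prefix):
    """Map each class sub-folder name to an index, in sorted name order."""
    folders = sorted(
        entry.name for entry in os.scandir(data_prefix) if entry.is_dir()
    )
    return {name: index for index, name in enumerate(folders)}


# Demo on a throwaway directory containing three hypothetical class folders.
with tempfile.TemporaryDirectory() as root:
    for name in ("dog", "cat", "bird"):
        os.mkdir(os.path.join(root, name))
    mapping = default_class_mapping(root)

print(mapping)  # {'bird': 0, 'cat': 1, 'dog': 2}
```

Under this assumption, folder names such as `class10` sort before `class2`, so supplying an explicit classes file is the safer choice when a specific index order matters.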
Ask the agent to run the evaluate action against the engine you built. For example:
Evaluate the classification PyTorch TensorRT engine at
``s3://my-bucket/cls/model.engine`` against ``eval-spec.yaml``. Run on
local Docker.
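The topk field in the evaluate section is not described on this page. Assuming it carries the usual meaning of top-k accuracy (a sample counts as correct when its true label is among the k highest-scoring classes), the metric can be sketched as follows. The function name `topk_accuracy` and the scores are made up for illustration.

```python
def topk_accuracy(scores, labels, k=1):
    """Return the fraction of samples whose label is among the k top scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Four samples, three classes; illustrative values only.
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 2, 2, 1]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(top1, top2)  # 0.5 0.75
```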
Note
TAO Deploy TensorRT evaluation of TAO classification models that use LogisticRegressionHead currently shows an accuracy regression compared to TAO PyTorch evaluation. This will be addressed in the next release.
Running Inference through a TensorRT Engine
You can reuse the TAO inference specification file for inference through a TensorRT engine. The following is a sample specification file:
inference:
  trt_engine: /path/to/engine/file
dataset:
  data:
    samples_per_gpu: 16
    test:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val
      classes: /raid/ImageNet2012/classnames.txt
Ask the agent to run the inference action against the engine you built. For example:
Run classification PyTorch inference with the TensorRT engine at
``s3://my-bucket/cls/model.engine`` using ``infer-spec.yaml``. Run on
your chosen backend.
Annotated visualizations are written to images_annotated under the configured
results directory, and KITTI-format predictions are written to labels.