TAO Deploy Overview#

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It provides APIs and parsers to import trained models from all major deep learning frameworks; it then generates optimized runtime engines deployable in a data center, as well as in automotive and embedded environments. To understand TensorRT and its capabilities better, refer to the official TensorRT documentation.
Models trained in TAO are deployed to NVIDIA inference SDKs, such as DeepStream, via TensorRT. Computer vision models trained by TAO can be consumed by TensorRT via tao deploy, which is included as part of the tao launcher. TAO Deploy parses the exported .onnx model file and generates an optimized TensorRT engine. These engines can be generated to support inference at low precision (e.g. FP16 or INT8).
While most TAO models support direct integration of the .onnx files with DeepStream, DeepStream can also consume the optimized engine generated by tao deploy.
TAO Deploy separates the model training and optimization steps from deployment by parsing a .onnx file to generate an optimized TensorRT engine. TAO Deploy also provides tools to run evaluation and inference using the original TAO spec file. With TAO Deploy, you can perform the following tasks (typical invocations are sketched after the list):
gen_trt_engine
evaluate
inference
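For example, a typical workflow for a single network chains the three sub-tasks together. The commands below are only a sketch: the file paths are placeholders, and the evaluate and inference arguments shown (-m for the engine, -r for the results directory) are assumptions that should be verified against each task's --help output.
# Sketch of a typical TAO Deploy workflow for detectnet_v2 (paths are placeholders).
tao deploy detectnet_v2 gen_trt_engine -e /path/to/spec.txt \
                                       -m /path/to/model.onnx \
                                       --engine_file /path/to/model.engine
# Evaluate and run inference with the generated engine (flags assumed; see --help).
tao deploy detectnet_v2 evaluate -e /path/to/spec.txt \
                                 -m /path/to/model.engine \
                                 -r /path/to/results
tao deploy detectnet_v2 inference -e /path/to/spec.txt \
                                  -m /path/to/model.engine \
                                  -r /path/to/results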
TAO Deploy Installation#
This section describes how to install and run TAO Deploy using the following methods: through the TAO launcher, by invoking the TAO Deploy container directly, or by installing the nvidia-tao-deploy wheel (including on Google Colab and on Jetson platforms).
When you invoke the tao deploy command through the TAO launcher, the launcher pulls the tao deploy container from NGC and instantiates it.
The TAO Deploy container contains only a few lightweight Python packages, such as OpenCV, NumPy, Pillow, and ONNX. It is based on the NGC TensorRT container.
Running TAO Deploy with the Launcher#
Like other TAO commands, the TAO Deploy CLI follows a cascaded structure:
tao deploy <task> <sub-task> <args>
Currently, TAO Deploy only supports computer vision models. For example, DetectNet_v2 is a computer vision task for object detection in TAO that supports the gen_trt_engine, evaluate, and inference sub-tasks.
When you execute a command like tao deploy detectnet_v2 gen_trt_engine --help, the TAO Launcher does the following:
1. Pulls the TAO Deploy container with the entrypoint for detectnet_v2.
2. Creates an instance of the container.
3. Runs the detectnet_v2 entrypoint with the gen_trt_engine sub-task.
Once the TAO launcher has been installed, the workflow to run the launcher is as follows.
List the tasks supported in the deploy docker.
You can list the tao deploy tasks that are supported in the TAO Launcher using the tao deploy --help command:
usage: tao deploy [-h] {list,stop,info,dataset,deploy,model} ...
                  {classification_pyt,classification_tf1,classification_tf2,deformable_detr,detectnet_v2,dino,dssd,efficientdet_tf1,efficientdet_tf2,faster_rcnn,lprnet,mask_rcnn,ml_recog,multitask_classification,ocdnet,ocrnet,optical_inspection,retinanet,segformer,ssd,unet,yolo_v3,yolo_v4,yolo_v4_tiny}
                  ...

optional arguments:
  -h, --help            show this help message and exit

task_groups:
  {list,stop,info,dataset,deploy,model}

task:
  {classification_pyt,classification_tf1,classification_tf2,deformable_detr,detectnet_v2,dino,dssd,efficientdet_tf1,efficientdet_tf2,faster_rcnn,lprnet,mask_rcnn,ml_recog,multitask_classification,ocdnet,ocrnet,optical_inspection,retinanet,segformer,ssd,unet,yolo_v3,yolo_v4,yolo_v4_tiny}
Configure the launcher instance.
This step is identical to the regular tao launcher steps. For more details, refer to the TAO Launcher documentation.
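As a reminder, the launcher reads its drive mappings from ~/.tao_mounts.json. The snippet below is a minimal sketch assuming you want a host workspace visible inside the container; the destination path is only an example and should match the paths referenced in your spec files.
# Hedged sketch: minimal launcher mounts file (paths are placeholders).
cat > ~/.tao_mounts.json <<'EOF'
{
    "Mounts": [
        {
            "source": "/path/to/host/workspace",
            "destination": "/workspace/tao-experiments"
        }
    ]
}
EOF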
Run a task.
You use the following command format to run tasks supported by TAO:
tao deploy <task> <sub-task> <cli_args>
To view the sub-tasks supported by a certain task, you can use the --help option. For example, the following command lists the sub-tasks for detectnet_v2:
$ tao deploy detectnet_v2 --help
usage: detectnet_v2 [-h] [--gpu_index GPU_INDEX] [--log_file LOG_FILE]
                    {evaluate,gen_trt_engine,inference} ...

Transfer Learning Toolkit

optional arguments:
  -h, --help            show this help message and exit
  --gpu_index GPU_INDEX
                        The index of the GPU to be used.
  --log_file LOG_FILE   Path to the output log file.

tasks:
  {evaluate,gen_trt_engine,inference}
In addition to the NGC container, tao deploy is released as a public wheel on PyPI. Each TensorRT engine generated by tao deploy is specific to the GPU on which it is generated, so you must download the version of the tao deploy wheel built for the platform the model is being deployed to, install the corresponding TensorRT version for that platform, and then generate the engine on that platform.
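For instance, before installing the wheel on the deployment machine, you might confirm which TensorRT version is present so the generated engine matches that platform. This is only a sketch and assumes TensorRT's Python bindings are already installed there.
# Check the platform's TensorRT version, then install the TAO Deploy wheel and
# generate the engine on this same machine.
python -c "import tensorrt; print(tensorrt.__version__)"
pip install nvidia-tao-deploy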
Invoking the TAO Deploy Container Directly#
To deploy TAO models to TensorRT from the tao-deploy container, first identify the latest docker tag
associated with the tao launcher by running tao info --verbose
.
The following is sample output from TAO 5.0.0:
Configuration of the TAO Instance
task_group:
deploy:
dockers:
nvidia/tao/tao-toolkit:
5.0.0-deploy:
docker_registry: nvcr.io
tasks:
1. centerpose
2. classification_pyt
3. classification_tf1
4. classification_tf2
5. deformable_detr
6. detectnet_v2
7. dino
8. dssd
9. efficientdet_tf1
10. efficientdet_tf2
11. faster_rcnn
12. lprnet
13. mask_rcnn
14. ml_recog
15. multitask_classification
16. ocdnet
17. ocrnet
18. optical_inspection
19. retinanet
20. segformer
21. ssd
22. unet
23. visual_changenet
24. yolo_v3
25. yolo_v4
26. yolo_v4_tiny
format_version: 3.0
toolkit_version: 5.0.0
The container name associated with the task can be retrieved as $DOCKER_REGISTRY/$DOCKER_NAME:$DOCKER_TAG
.
For example, from the log above, the Docker name to run detectnet_v2
can be derived as follows:
export DOCKER_REGISTRY="nvcr.io"
export DOCKER_NAME="nvidia/tao/tao-toolkit"
export DOCKER_TAG="5.0.0-deploy"
export DOCKER_CONTAINER=$DOCKER_REGISTRY/$DOCKER_NAME:$DOCKER_TAG
Once you have the Docker name, invoke the container by running the commands defined by the network, without the tao deploy prefix. For example, the following command runs FP16 TensorRT engine generation for detectnet_v2:
docker run -it --rm --gpus all \
-v /path/in/host:/path/in/docker \
$DOCKER_CONTAINER \
detectnet_v2 gen_trt_engine -e /path/to/experiment/spec.txt \
-m /path/to/etlt/file \
-k $KEY \
--data_type fp16 \
--engine_file /path/to/engine/file
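Engines can also be generated at INT8 precision, which requires calibration data. The command below is a sketch only: the calibration flags shown (--cal_image_dir, --cal_cache_file, --batches, --batch_size) are assumptions based on common TAO gen_trt_engine options, so confirm the exact arguments with detectnet_v2 gen_trt_engine --help.
# Hedged sketch: INT8 engine generation with calibration (paths are placeholders).
docker run -it --rm --gpus all \
-v /path/in/host:/path/in/docker \
$DOCKER_CONTAINER \
detectnet_v2 gen_trt_engine -e /path/to/experiment/spec.txt \
-m /path/to/etlt/file \
-k $KEY \
--data_type int8 \
--batch_size 8 \
--batches 10 \
--cal_image_dir /path/to/calibration/images \
--cal_cache_file /path/to/cal.bin \
--engine_file /path/to/engine/file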
Installing TAO Deploy through wheel#
TAO Deploy is also distributed as a public wheel file on PyPI. The wheel does not include TensorRT or TensorRT OSS as dependencies, so you must either install these dependencies yourself, as described on the official TensorRT website, or run TAO Deploy inside the TensorRT container available on NGC.
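For example, you could start from the NGC TensorRT container so that TensorRT is already present. The tag below is only an assumption; pick a release whose TensorRT version matches your deployment target.
# Hedged sketch: run an NGC TensorRT container (tag is an example, not a recommendation).
docker run -it --rm --gpus all \
-v /path/in/host:/path/in/docker \
nvcr.io/nvidia/tensorrt:23.04-py3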
Run the following command to install the nvidia-tao-deploy wheel in your Python environment:
pip install nvidia-tao-deploy
You can then run TAO Deploy tasks directly, without the tao deploy prefix. For example, the following command runs FP16 TensorRT engine generation for detectnet_v2:
detectnet_v2 gen_trt_engine -e /path/to/experiment/spec.txt \
-m /path/to/etlt/file \
-k $KEY \
--data_type fp16 \
--engine_file /path/to/engine/file
Installing TAO Deploy on Google Colab#
You can download the nvidia-tao-deploy
wheel to Google Colab using the same commands as the x86 platform installation.
Follow these steps to run TAO Deploy on Google Colab:
Get the TensorRT TAR archive:
Visit the TensorRT web page at https://developer.nvidia.com/tensorrt.
Click Download Now on the TensorRT web page. This directs you to the login page at https://developer.nvidia.com/nvidia-tensorrt-download, where you must select either Login or Join Now for the NVIDIA Developer Program membership.
After logging in, choose TensorRT 8 from the available versions.
Agree to the Terms and Conditions.
On the next landing page, click TensorRT 8.5 GA to expand the available options.
Click TensorRT 8.5 GA for Linux x86_64 and CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7 and 11.8 TAR Package to download the TAR file.
Upload the TAR file to your Google Drive.
After you upload the TAR file, you can run and view this example notebook at https://colab.research.google.com/github/NVIDIA-AI-IOT/nvidia-tao/blob/main/ptm/tao_deploy.ipynb, which generates a TensorRT engine for TAO pretrained models (PTMs) and runs inference using TAO Deploy.
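If you prefer to set up the environment manually rather than relying on the notebook, the steps amount to roughly the following. The archive name, mount point, version string, and wheel filename below are assumptions; adjust them to the exact TensorRT 8.5 TAR file you downloaded and the Python version of your Colab runtime.
# Hedged sketch: assumes Google Drive is mounted at /content/drive and the TAR is in MyDrive.
tar -xzf /content/drive/MyDrive/TensorRT-8.5.x.x.Linux.x86_64-gnu.cuda-11.8.tar.gz -C /content
export LD_LIBRARY_PATH=/content/TensorRT-8.5.x.x/lib:$LD_LIBRARY_PATH
# Install the TensorRT Python wheel shipped inside the TAR (pick the one matching your Python),
# then install TAO Deploy itself.
pip install /content/TensorRT-8.5.x.x/python/tensorrt-8.5.x.x-cp310-none-linux_x86_64.whl
pip install nvidia-tao-deploy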
Installing TAO Deploy on a Jetson Platform#
You can download the nvidia-tao-deploy wheel to an NVIDIA® Jetson™ platform using the same commands as the x86 platform installation.
We recommend using the NVIDIA L4T TensorRT Docker container that already includes the TensorRT installation for aarch64.
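For example, on a Jetson device you might start the L4T TensorRT container as follows. The tag is only an assumption; choose the release that matches your JetPack version from the NGC catalog.
# Hedged sketch: run the L4T TensorRT container on Jetson (tag is an example).
docker run -it --rm --runtime nvidia \
-v /path/in/host:/path/in/docker \
nvcr.io/nvidia/l4t-tensorrt:r8.5.2-runtime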
Once you’ve successfully installed TensorRT, run the following command to install the nvidia-tao-deploy
wheel in your Python environment.
pip install nvidia-tao-deploy
TAO Deploy Modelwise Instructions#
- CenterPose with TAO Deploy
- Classification (PyTorch) with TAO Deploy
- Classification (TF1) with TAO Deploy
- Classification (TF2) with TAO Deploy
- Deformable DETR with TAO Deploy
- DINO with TAO Deploy
- Grounding DINO with TAO Deploy
- DetectNet_v2 with TAO Deploy
- DSSD with TAO Deploy
- EfficientDet (TF1) with TAO Deploy
- EfficientDet (TF2) with TAO Deploy
- Faster RCNN with TAO Deploy
- LPRNet with TAO Deploy
- MAE with TAO Deploy
- Mask RCNN with TAO Deploy
- Mask2former with TAO Deploy
- MLRecogNet with TAO Deploy
- Multitask Image Classification with TAO Deploy
- OCDNet with TAO Deploy
- RetinaNet with TAO Deploy
- RT-DETR with TAO Deploy
- SSD with TAO Deploy
- Segformer with TAO Deploy
- UNet with TAO Deploy
- YOLOv3 with TAO Deploy
- YOLOv4 with TAO Deploy
- YOLOv4-tiny with TAO Deploy
- OCRNet with TAO Deploy
- SiameseOI with TAO Deploy
- VisualChangeNet-Classification with TAO Deploy
- VisualChangeNet-Segmentation with TAO Deploy
- Mask Grounding DINO with TAO Deploy
- PointPillars with TAO Deploy