TAO Deploy Overview

NVIDIA TensorRT is an SDK for high-performance deep learning inference. It provides APIs and parsers to import trained models from all major deep learning frameworks; it then generates optimized runtime engines deployable in a data center, as well as in automotive and embedded environments. To understand TensorRT and its capabilities better, refer to the official TensorRT documentation.

Models trained in TAO Toolkit are deployed to NVIDIA inference SDKs, such as DeepStream and Riva, via TensorRT. Conversational AI models trained with TAO Toolkit can be consumed by TensorRT only through Riva, whereas computer vision models trained with TAO Toolkit can be consumed by TensorRT through tao-deploy, which is included as part of the TAO launcher. TAO Deploy parses the exported .etlt model file and generates an optimized TensorRT engine. These engines can be generated to support inference at reduced precision (e.g., FP16 or INT8). Although most TAO models support direct integration of the .etlt file with DeepStream 6.0, DeepStream can also consume the optimized engine generated by tao-deploy.

TAO Deploy separates the model training and optimization steps from deployment by parsing a .etlt file to generate an optimized TensorRT engine. TAO Deploy also provides tools to run evaluation and inference using the original TAO spec file. With TAO Deploy, you can perform the following tasks:

gen_trt_engine
evaluate
inference

Like other TAO commands, the TAO Deploy CLI follows a cascaded structure:

            tao-deploy <task> <sub-task> <args>

Currently, TAO Deploy supports only computer vision models. For example, DetectNet_v2 is a computer vision task for object detection in TAO Toolkit and supports the gen_trt_engine, evaluate, and inference sub-tasks. When you execute a command like tao-deploy detectnet_v2 gen_trt_engine --help, the TAO Toolkit Launcher does the following:

Pulls the TAO Deploy container with the entrypoint for detectnet_v2.
Creates an instance of the container.
Runs the detectnet_v2 entrypoint with the gen_trt_engine sub-task.
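
The cascaded `tao-deploy <task> <sub-task> <args>` structure can be illustrated with a small Python sketch that only assembles the command line and does not run it. The helper name `build_tao_deploy_command` and the `SUB_TASKS` tuple are hypothetical, introduced here for illustration; the sub-task names are the three listed in this overview.

```python
import shlex

# Sub-tasks named in this overview (hypothetical constant for illustration).
SUB_TASKS = ("gen_trt_engine", "evaluate", "inference")


def build_tao_deploy_command(task, sub_task, cli_args=()):
    """Return the argument list for `tao-deploy <task> <sub-task> <args>`."""
    if sub_task not in SUB_TASKS:
        raise ValueError(
            f"unknown sub-task {sub_task!r}; expected one of {SUB_TASKS}"
        )
    # The launcher name comes first, followed by task, sub-task, and arguments.
    return ["tao-deploy", task, sub_task, *cli_args]


cmd = build_tao_deploy_command("detectnet_v2", "gen_trt_engine", ["--help"])
print(shlex.join(cmd))  # tao-deploy detectnet_v2 gen_trt_engine --help
```

The resulting list could be passed to `subprocess.run` on a machine where the TAO launcher is installed; that step is left out here so the sketch runs anywhere.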

Running TAO Deploy with the Launcher

Once the TAO launcher has been installed, the workflow to run the launcher is as follows.

List the tasks supported in the deploy docker.

You can list the tao-deploy tasks that are supported in the TAO Toolkit Launcher using the tao-deploy --help command:

            usage: tao-deploy [-h]
            {list,stop,info,classification_tf1,classification_tf2,deformable_detr,detectnet_v2,dssd,efficientdet_tf1,efficientdet_tf2,faster_rcnn,lprnet,mask_rcnn,multitask_classification,retinanet,segformer,ssd,unet,yolo_v3,yolo_v4,yolo_v4_tiny}
            ...

            Launcher for TAO Toolkit Deploy.

            optional arguments:
              -h, --help            show this help message and exit

            tasks:
              {list,stop,info,classification_tf1,classification_tf2,deformable_detr,detectnet_v2,dssd,efficientdet_tf1,efficientdet_tf2,faster_rcnn,lprnet,mask_rcnn,multitask_classification,retinanet,segformer,ssd,unet,yolo_v3,yolo_v4,yolo_v4_tiny}

Configure the launcher instance.

This step is identical to the regular TAO launcher configuration. For more details, refer to the TAO Launcher documentation.

Run a task.

Use the following command format to run the tasks supported by TAO Deploy:

            tao-deploy <task> <sub-task> <cli_args>

To view the sub-tasks supported by a given task, use the help command. For example, the following command lists the sub-tasks for detectnet_v2:

            $ tao-deploy detectnet_v2 --help

            usage: detectnet_v2 [-h] [--gpu_index GPU_INDEX] [--log_file LOG_FILE] {evaluate,gen_trt_engine,inference} ...

            Transfer Learning Toolkit

            optional arguments:
              -h, --help            show this help message and exit
              --gpu_index GPU_INDEX
                                    The index of the GPU to be used.
              --log_file LOG_FILE   Path to the output log file.

            tasks:
              {evaluate,gen_trt_engine,inference}
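
If a script needs the list of sub-tasks (or top-level tasks) rather than a human reading it, the `{...}` group in the help text can be extracted with a regular expression. The following is a minimal sketch, assuming the help text has already been captured as a string; `parse_choices` is a hypothetical helper name, and the sample string is the usage line shown above.

```python
import re


def parse_choices(help_text):
    """Return the names inside the first {a,b,c} group of the help text."""
    match = re.search(r"\{([^{}]+)\}", help_text)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",")]


# Usage line copied from the detectnet_v2 help output.
sample = (
    "usage: detectnet_v2 [-h] [--gpu_index GPU_INDEX] "
    "[--log_file LOG_FILE] {evaluate,gen_trt_engine,inference} ..."
)
sub_tasks = parse_choices(sample)
print(sub_tasks)  # ['evaluate', 'gen_trt_engine', 'inference']
```

The same function should work on the top-level tao-deploy --help output, because it lists its tasks in the same brace-delimited form. On a machine with the launcher installed, the help text could presumably be captured by running the help command through `subprocess.run` with `capture_output=True` and `text=True`.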