TAO Deploy Installation
When the tao deploy command is invoked through the TAO launcher, the tao deploy container is pulled from NGC and instantiated.
The TAO Deploy container is based on the NGC TensorRT container and contains only a few lightweight Python packages, such as OpenCV, NumPy, Pillow, and ONNX.
Along with the NGC container, tao deploy is also released as a public wheel on PyPI.
The TensorRT engines generated by tao deploy are specific to the GPU on which they are generated. Depending on the platform you are deploying the model to, you therefore need to download the matching version of the tao deploy wheel and generate the engine on that platform, after installing the corresponding TensorRT version.
To deploy TAO models to TensorRT from the tao-deploy container, first identify the latest Docker tag associated with the TAO launcher by running tao info --verbose.
The following is sample output from TAO 5.0.0:
Configuration of the TAO Toolkit Instance
task_group:
    deploy:
        dockers:
            nvidia/tao/tao-toolkit-deploy:
                5.0.0-deploy:
                    docker_registry: nvcr.io
                    tasks:
                        1. centerpose
                        2. classification_pyt
                        3. classification_tf1
                        4. classification_tf2
                        5. deformable_detr
                        6. detectnet_v2
                        7. dino
                        8. dssd
                        9. efficientdet_tf1
                        10. efficientdet_tf2
                        11. faster_rcnn
                        12. lprnet
                        13. mask_rcnn
                        14. ml_recog
                        15. multitask_classification
                        16. ocdnet
                        17. ocrnet
                        18. optical_inspection
                        19. retinanet
                        20. segformer
                        21. ssd
                        22. unet
                        23. visual_changenet
                        24. yolo_v3
                        25. yolo_v4
                        26. yolo_v4_tiny
format_version: 3.0
toolkit_version: 5.0.0
The container name associated with a task can be constructed as $DOCKER_REGISTRY/$DOCKER_NAME:$DOCKER_TAG.
For example, from the log above, the Docker name to run detectnet_v2 can be derived as follows:
export DOCKER_REGISTRY="nvcr.io"
export DOCKER_NAME="nvidia/tao/tao-toolkit"
export DOCKER_TAG="5.0.0-deploy"
export DOCKER_CONTAINER=$DOCKER_REGISTRY/$DOCKER_NAME:$DOCKER_TAG
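As a quick sanity check before pulling anything, the three parts can be joined and validated in a small helper. This is only a minimal sketch: the function name compose_image_ref is hypothetical and is not part of the TAO launcher, and the values are the ones from the example above.

```shell
# Hypothetical helper: join registry, name, and tag into one image reference.
compose_image_ref() {
  # Refuse to build a reference if any of the three parts is empty.
  if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: compose_image_ref REGISTRY NAME TAG" >&2
    return 1
  fi
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

DOCKER_CONTAINER="$(compose_image_ref "nvcr.io" "nvidia/tao/tao-toolkit" "5.0.0-deploy")"
echo "$DOCKER_CONTAINER"

# Optionally pull the image ahead of time (requires Docker and access to NGC):
# docker pull "$DOCKER_CONTAINER"
```

Checking for empty parts up front avoids a malformed reference such as nvcr.io/:5.0.0-deploy when one of the variables was never exported.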
Once you have the Docker name, invoke the container by running the commands defined by the network without the tao deploy prefix. For example, the following command runs detectnet_v2 TensorRT engine generation for FP16:
docker run -it --rm --gpus all \
-v /path/in/host:/path/in/docker \
$DOCKER_CONTAINER \
detectnet_v2 gen_trt_engine -e /path/to/experiment/spec.txt \
-m /path/to/etlt/file \
-k $KEY \
--data_type fp16 \
--engine_file /path/to/engine/file
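When a command has this many flags, it can be useful to preview it before a container is actually started. The wrapper below is a sketch under assumptions: the function name run_in_tao_deploy and the DRY_RUN, HOST_DIR, and CONTAINER_DIR variables are hypothetical conveniences, not part of TAO Deploy, and the paths are placeholders.

```shell
# Hypothetical wrapper: print the docker command when DRY_RUN=1, otherwise run it.
run_in_tao_deploy() {
  # Prepend the docker invocation to the task arguments passed by the caller.
  set -- docker run -it --rm --gpus all \
    -v "${HOST_DIR}:${CONTAINER_DIR}" \
    "${DOCKER_CONTAINER}" "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # preview only, nothing is executed
  else
    "$@"                 # start the container with the assembled command
  fi
}

# Placeholder values; replace them with your own paths and image reference.
HOST_DIR="/path/in/host"
CONTAINER_DIR="/path/in/docker"
DOCKER_CONTAINER="nvcr.io/nvidia/tao/tao-toolkit:5.0.0-deploy"

# Preview a shortened command; the remaining flags are passed the same way.
DRY_RUN=1 run_in_tao_deploy detectnet_v2 gen_trt_engine \
  -e /path/to/experiment/spec.txt \
  --data_type fp16
```

Leaving DRY_RUN unset runs the same command for real, so the preview and the actual invocation cannot drift apart.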
TAO Deploy is also distributed as a public wheel on PyPI. The wheel does not include TensorRT or TensorRT OSS among its dependencies, so you must either install them from the official TensorRT website or use the TensorRT container available on NGC.
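Because TensorRT has to be provided separately, it can help to confirm that its Python bindings are importable before installing the wheel. The check below is a minimal sketch, assuming python3 is the interpreter of the target environment; the TRT_STATUS variable is a hypothetical name used only for illustration.

```shell
# Check whether the TensorRT Python bindings can be imported by python3.
if python3 -c "import tensorrt" >/dev/null 2>&1; then
  TRT_STATUS="found"
  echo "TensorRT Python bindings found."
else
  TRT_STATUS="missing"
  echo "TensorRT is not importable; install TensorRT or use the NGC TensorRT container first."
fi
```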
Run the following command to install the nvidia-tao-deploy wheel in your Python environment:
pip install nvidia-tao-deploy
Then, you can run TAO Deploy tasks without the tao deploy prefix.
For example, the following command runs detectnet_v2 TensorRT engine generation for FP16:
detectnet_v2 gen_trt_engine -e /path/to/experiment/spec.txt \
-m /path/to/etlt/file \
-k $KEY \
--data_type fp16 \
--engine_file /path/to/engine/file
Installing TAO Deploy on Google Colab
You can download the nvidia-tao-deploy wheel to Google Colab using the same commands as the x86 platform installation.
The general limitations of Colab are outlined here.
Follow these steps to run TAO Deploy on Google Colab:
Get the TensorRT TAR archive:
Visit the TensorRT webpage <https://developer.nvidia.com/tensorrt>
Click Download now on the TensorRT webpage. This takes you to the login page <https://developer.nvidia.com/nvidia-tensorrt-download>, where you must select either Login or Join Now for NVIDIA Developer Program Membership.
After logging in, choose TensorRT 8 from the available versions.
Agree to the Terms and Conditions.
On the next landing page, click TensorRT 8.5 GA to expand the available options.
Click TensorRT 8.5 GA for Linux x86_64 and CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7 and 11.8 TAR Package to download the TAR file.
Upload the TAR file to your Google Drive.
After you upload the TAR file, you can run or view this example notebook <https://colab.research.google.com/github/NVIDIA-AI-IOT/nvidia-tao/blob/main/ptm/tao_deploy.ipynb>, which generates a TensorRT engine for TAO PTMs and runs inference using TAO Deploy.
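For orientation, the unpacking step that typically follows the upload can be sketched as below. This is an illustration under assumptions, not the notebook's actual code: the function name setup_tensorrt_from_tar is hypothetical, the archive is assumed to unpack into a single top-level directory that contains a lib/ folder, and Google Drive is assumed to be mounted at /content/drive.

```shell
# Hypothetical helper: unpack a TensorRT TAR archive and expose its libraries.
setup_tensorrt_from_tar() {
  archive="$1"   # path to the uploaded TAR file
  dest="$2"      # directory to unpack into
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$dest"
  tar -xf "$archive" -C "$dest"
  # Assumption: the archive unpacks into one top-level directory with a lib/ folder.
  trt_root="$(find "$dest" -mindepth 1 -maxdepth 1 -type d | head -n 1)"
  export LD_LIBRARY_PATH="${trt_root}/lib:${LD_LIBRARY_PATH:-}"
  echo "$trt_root"
}

# Example call with a placeholder file name:
# setup_tensorrt_from_tar "/content/drive/MyDrive/<TensorRT TAR file>" /content/tensorrt
```

The function prints the unpacked root directory so that later steps can locate the libraries and Python packages shipped inside the archive.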
Installing TAO Deploy on a Jetson Platform
You can download the nvidia-tao-deploy wheel to a Jetson platform using the same commands as the x86 platform installation.
We recommend using the NVIDIA L4T TensorRT Docker container that already includes the TensorRT installation for aarch64.
Once you’ve successfully installed TensorRT, run the following command to install the nvidia-tao-deploy wheel in your Python environment:
pip install nvidia-tao-deploy
Due to memory constraints, first run the gen_trt_engine subtask on the x86 platform to generate the engine. You can then use the generated engine to run inference or evaluation on the Jetson platform with the target dataset.
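A script shared between both machines can branch on the reported architecture to follow this split. The snippet below is a minimal sketch; the TAO_DEPLOY_STEP variable and its values are hypothetical labels, not TAO Deploy subtask names.

```shell
# Suggest a workflow step based on the machine architecture reported by uname.
ARCH="$(uname -m)"
case "$ARCH" in
  x86_64)  TAO_DEPLOY_STEP="generate-engine" ;;         # x86 host: run gen_trt_engine here
  aarch64) TAO_DEPLOY_STEP="inference-or-evaluation" ;; # Jetson: reuse the generated engine
  *)       TAO_DEPLOY_STEP="unknown" ;;                 # any other architecture
esac
echo "Architecture: $ARCH -> suggested step: $TAO_DEPLOY_STEP"
```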