Overview#

NVIDIA TAO is a low-code AI toolkit built on TensorFlow and PyTorch that simplifies and accelerates model training by abstracting away the complexity of AI models and the underlying deep learning framework. With TAO, users can select one of 100+ pre-trained vision AI models from NGC, then fine-tune and customize it on their own dataset without writing a single line of code. The output of TAO is a trained model in ONNX format that can be deployed on any platform that supports ONNX.

TAO supports most of the popular CV tasks such as:

Image Classification
Multi-Modal Sensor Fusion for computer vision
Object Detection
Instance Segmentation
Semantic Segmentation
Optical character detection & recognition (OCD/OCR)
Body Pose Estimation
Key point estimation
Action Recognition
Siamese network
Change Detection
CenterPose
Segmentation-In Context

For image classification, object detection and segmentation, users can choose one of the many feature extractors and use it with one of many heads for classification, detection and segmentation tasks, opening a possibility of 100+ model combinations. TAO supports some of the leading Vision Transformers (ViT) like FAN, GC-ViT, SWIN, DINO, D-DETR and SegFormer.

Backbone	Image classification
NvDINOv2	X
GcViT	X
ViT	X
FAN	X
FasterViT	X
ResNet	X
Swin	X
EfficientNet	X
ST-GCN (graph convolutional network)
MIT-b

TAO provides means to enhance a user’s dataset. This class of features and tasks is included under the Data Services modality.
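As an illustration of the kind of dataset enhancement meant here, the sketch below applies a horizontal flip to a tiny image and its bounding box, a common augmentation for object-detection data. It is a minimal standalone example, not TAO's augmentation code: the `Box` type and the `hflip_image` and `hflip_box` helpers are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def hflip_image(pixels):
    """Mirror an image, given as a list of rows, left to right."""
    return [list(reversed(row)) for row in pixels]


def hflip_box(box, image_width):
    """Mirror a box across the vertical center line of the image."""
    return Box(
        x_min=image_width - box.x_max,
        y_min=box.y_min,
        x_max=image_width - box.x_min,
        y_max=box.y_max,
    )


# A 4x2 "image" with one box covering its leftmost column.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
box = Box(x_min=0, y_min=0, x_max=1, y_max=2)

flipped_image = hflip_image(image)
flipped_box = hflip_box(box, image_width=4)
print(flipped_image)
print(flipped_box)
```

The point of the sketch is that annotations have to be transformed together with the pixels; flipping the image without updating the box would leave the label pointing at the wrong region.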

TAO 5.5.0 introduces fine-tuning and inference support for Open Vocabulary Grounded Object Detection and Instance Segmentation through the GroundingDINO and Mask GroundingDINO models; see the GitHub repository.

NVIDIA also includes two new inference applications as part of TAO, along with related NIM resources:

A Gradio app to try out zero-shot in-context segmentation using the SEGIC model in the TAO PyTorch GitHub repository.
A Triton inference application for the FoundationPose model in TAO Triton Apps.
A catalog of NVIDIA Inference Microservices (NIMs) to try out different TAO models.
A GitHub repository called metropolis_nim_workflows containing reference workflows that use the published NIMs.

Note

As of version 5.5.0, the TAO containers run only on x86 platforms with discrete GPUs. For more information about the supported GPUs, refer to the Quick Start Guide.

Pretrained models#

TAO has an extensive selection of pre-trained models, trained either on public datasets such as ImageNet, COCO, and OpenImages or on proprietary datasets for task-specific use cases such as people detection, vehicle detection, and action recognition. The task-specific models can be used directly for inference or fine-tuned on custom datasets for better accuracy.

Go to the Model Zoo section to learn more about all the pre-trained models.

Key Features#

TAO packages several key features to help developers accelerate their AI training and optimization. Here are a few of the key features:

Computer vision workflows
- Model Pruning - Reduce the number of parameters in a model to reduce model size and improve inference throughput
- ONNX export - Supports model output in the industry-standard ONNX format, which can then be used directly on any platform that supports ONNX
- Quantization Aware Training - Emulates lower precision quantization during training to reduce accuracy loss from training to lower precision inference
- Multi-GPU - Accelerate training by parallelizing training jobs across multiple GPUs on a single node
- Multi-Node - Accelerate training by parallelizing training jobs across multiple nodes
- Training Visualization - Visualize training graphs and metrics in TensorBoard or in third-party services
Data Services
- Data Augmentation - Offline and online augmentation to add diversity to your dataset, which can help the model generalize
- AI-assisted annotation - Class-agnostic auto-labeler that generates segmentation masks from provided bounding boxes
- Data Analytics - Analyzes object-detection annotation files and image files, calculates insights, and generates graphs and a summary
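To make the Quantization Aware Training entry above more concrete, the following sketch shows the "fake quantization" step that QAT schemes commonly insert into the forward pass: values are rounded onto a low-precision grid and immediately converted back to floats, so the network trains against the rounding error it will see at inference time. This is a generic illustration under stated assumptions (symmetric, per-tensor, signed 8-bit), not TAO's implementation; `fake_quantize` is a hypothetical helper.

```python
def fake_quantize(values, num_bits=8):
    """Quantize to signed integers, then dequantize back to floats.

    Symmetric, per-tensor scheme: a single scale maps the largest
    absolute value onto the largest representable integer.
    """
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values), 1.0
    scale = max_abs / q_max
    # Round to the nearest integer step and clamp to the signed range.
    quantized = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    # Dequantize so downstream layers keep working in floating point.
    return [q * scale for q in quantized], scale


weights = [0.5, -1.27, 0.003, 1.0]
simulated, scale = fake_quantize(weights)
for original, approx in zip(weights, simulated):
    print(f"{original:+.4f} -> {approx:+.4f} (error {abs(original - approx):.4f})")
```

Small values such as 0.003 collapse to zero on this grid, which is exactly the kind of precision loss the model can learn to tolerate when it sees it during training.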

TAO also provides several features for service providers and NVIDIA partners looking to integrate TAO with their workflow to provide added services.

AutoML - Automatic hyperparameter sweeps and optimization to find the best accuracy on a given dataset.
REST APIs - Use cloud API endpoints to call into your managed TAO services in the cloud.
Kubernetes deployment - Deploy TAO services in a Kubernetes cluster, either on-premises or with a cloud-managed Kubernetes service.
Source code availability - Access the source code for TAO to add your own customizations.
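The idea behind an automatic hyperparameter sweep, as in the AutoML entry above, can be sketched with a plain random search. This is a deliberately simple stand-in and not the algorithm TAO AutoML uses: `toy_validation_score` is a hypothetical objective that replaces a real train-and-evaluate run, and the search space values are made up for the example.

```python
import random


def toy_validation_score(learning_rate, batch_size):
    """Stand-in for a real train-and-evaluate run (hypothetical objective)."""
    # Peaks at learning_rate=0.01 and batch_size=32; purely illustrative.
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 32) / 256


def random_search(objective, search_space, num_trials, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)  # seeded so the sweep is reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        params = {name: rng.choice(options) for name, options in search_space.items()}
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


search_space = {
    "learning_rate": [0.001, 0.003, 0.01, 0.03],
    "batch_size": [8, 16, 32, 64],
}
best_params, best_score = random_search(
    toy_validation_score, search_space, num_trials=20
)
print(best_params, best_score)
```

In a real sweep each call to the objective is a full training job, so the number of trials, rather than the search logic, dominates the cost.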

How to Get Started#

Detailed getting-started instructions are provided in the TAO Getting Started Guide.

The getting started package contains install scripts, Jupyter notebooks, and model configuration files for training and optimization. There are Jupyter notebooks for all the models, which can be used as templates to run your training. Every notebook includes a call that downloads a sample dataset for running training jobs; you can replace it with your own dataset.

TAO Architecture#

TAO is a Python package hosted on the NVIDIA Python Package Index. It interacts with lower-level TAO Docker containers available from the NVIDIA GPU Accelerated Container Registry (NGC); TAO containers come pre-installed with all dependencies required for training. The CLI is run from Jupyter notebooks packaged inside each Docker container and consists of a few simple commands, such as train, evaluate, infer, prune, export, and augment (data augmentation). The output of the TAO workflow is a trained model that can be deployed for inference on NVIDIA devices using DeepStream and TensorRT.

The TAO application layer is built on top of CUDA-X, which contains all the lower-level NVIDIA libraries, including NVIDIA Container Runtime for GPU acceleration, CUDA and cuDNN for deep learning (DL) operations, and TensorRT (the NVIDIA inference optimization and runtime engine) for optimizing models. Models that are generated with TAO are completely compatible with and accelerated for TensorRT, which ensures maximum inference performance without any extra effort.

Model Pruning#

Model pruning is one of the key differentiators for TAO. Pruning removes the nodes of a neural network that contribute least to the overall accuracy of the model. This reduces the model size, significantly reduces the memory footprint, and increases inference throughput, all of which are very important for edge deployment.
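As a rough illustration of the idea, the sketch below ranks the filters of a single layer by the L2 norm of their weights and drops those whose normalized norm falls below a threshold. It is a generic magnitude-based pruning example, not a description of TAO's pruning algorithm; `filter_norms`, `prune_filters`, and the threshold value are assumptions made for this example.

```python
import math


def filter_norms(filters):
    """L2 norm of each filter, where a filter is a flat list of weights."""
    return [math.sqrt(sum(w * w for w in f)) for f in filters]


def prune_filters(filters, threshold):
    """Keep filters whose norm, relative to the largest norm, is >= threshold.

    Returns the surviving filters and their original indices.
    """
    norms = filter_norms(filters)
    largest = max(norms)
    kept = [
        i for i, n in enumerate(norms)
        if largest == 0 or n / largest >= threshold
    ]
    return [filters[i] for i in kept], kept


# Four filters with norms 5, 0.1, 1 and 10.
filters = [
    [3.0, 4.0],
    [0.1, 0.0],
    [0.0, 1.0],
    [6.0, 8.0],
]
pruned, kept = prune_filters(filters, threshold=0.2)
print(f"kept filters {kept}: {len(filters)} -> {len(pruned)}")
```

In practice a pruned model is usually retrained afterwards to recover any accuracy lost when the filters were removed.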

Currently, pruning is supported for a subset of Computer Vision models. The following graph provides an example of performance gains achieved when going from an unpruned CV model to a pruned CV model (inference was run on an NVIDIA T4; TrafficCamNet, DashCamNet, and PeopleNet are three of the custom pre-trained models that are available on NGC).

Figure: Pruned vs Unpruned Performance

Learning Resources#

Tutorial Videos#

TAO Toolkit provides the following tutorial videos to cover popular use cases:

Developer blogs#

To learn more about using TAO, read the technical blogs, which provide a step-by-step guide to training with TAO:

Learn about the New Foundational Models and Training Capabilities with NVIDIA TAO 5.5
Learn about the latest features in TAO 5.0
Learn how to train like a pro using TAO AutoML
Learn how to train with PeopleNet and other pre-trained models using TAO Toolkit.
Learn how to create custom AI models with TAO on AzureML
Learn how to improve INT8 accuracy using quantization aware training (QAT) with TAO Toolkit.
Learn how to create a real time license plate detection and recognition app
Learn how to prepare state of the art models for classification and object detection with TAO Toolkit
Learn how to train and optimize a 2D body-pose estimation model with TAO: 2D Pose Estimation Part 1 | 2D Pose Estimation Part 2.
Read about the different use cases in this whitepaper

Webinars#

Support Information#

If you have any questions when using TAO to train a model and deploy to Riva or DeepStream, post them here: