NVIDIA TAO v5.5.0

Overview

NVIDIA TAO is a low-code AI toolkit built on TensorFlow and PyTorch that simplifies and accelerates model training by abstracting away the complexity of AI models and the underlying deep learning frameworks. With TAO, users can select one of 100+ pre-trained vision AI models from NGC, then fine-tune and customize it on their own dataset without writing a single line of code. The output of TAO is a trained model in ONNX format that can be deployed on any platform that supports ONNX.

Figure: TAO Overview (tao_overview_image.png)

TAO supports most of the popular CV tasks such as:

  • Image Classification

  • Multi-Modal Sensor Fusion for computer vision

  • Object Detection

  • Instance Segmentation

  • Semantic Segmentation

  • Optical character detection & recognition (OCD/OCR)

  • Body Pose Estimation

  • Key point estimation

  • Action Recognition

  • Siamese network

  • Change Detection

  • CenterPose

  • In-Context Segmentation

For image classification, object detection, and segmentation, users can pair one of the many feature extractors (backbones) with one of the many task heads, which opens up 100+ possible model combinations. TAO supports some of the leading Vision Transformers (ViT), such as FAN, GC-ViT, SWIN, DINO, D-DETR, and SegFormer. The tables below mark the supported backbone and head combinations with an X.

Backbone                                Image classification
NvDINOv2                                X
GcViT                                   X
ViT                                     X
FAN                                     X
FasterViT                               X
ResNet                                  X
Swin                                    X
EfficientNet                            X
ST-GCN (graph convolutional network)
MIT-b

Backbone                                DINO / D-DETR / Grounding DINO / EfficientDet
NvDINOv2                                X
GcViT                                   X X
ViT                                     X X
FAN                                     X
FasterViT
ResNet                                  X X
Swin                                    X
EfficientNet                            X
ST-GCN (graph convolutional network)
MIT-b

Backbone                                Mask2Former
NvDINOv2
GcViT
ViT
FAN
FasterViT
ResNet
Swin                                    X
EfficientNet
ST-GCN (graph convolutional network)
MIT-b

Backbone                                OCD    OCR
NvDINOv2
GcViT
ViT
FAN                                     X      X
FasterViT
ResNet                                  X      X
Swin
EfficientNet
ST-GCN (graph convolutional network)
MIT-b

Backbone                                MAL / Mask GroundingDINO / Mask2Former
NvDINOv2
GcViT
ViT                                     X
FAN
FasterViT
ResNet
Swin                                    X X
EfficientNet
ST-GCN (graph convolutional network)
MIT-b

Backbone                                SegFormer / Mask2Former
NvDINOv2
GcViT
ViT
FAN                                     X
FasterViT
ResNet
Swin                                    X
EfficientNet
ST-GCN (graph convolutional network)
MIT-b                                   X

Backbone                                Re-identification / Metric Learning Recognition
NvDINOv2                                X
GcViT
ViT                                     X
FAN
FasterViT
ResNet                                  X X
Swin                                    X
EfficientNet
ST-GCN (graph convolutional network)
MIT-b

Backbone                                Classification    Segmentation
NvDINOv2                                X                 X
GcViT
ViT                                     X                 X
FAN                                     X                 X
FasterViT
ResNet
Swin
EfficientNet
ST-GCN (graph convolutional network)
MIT-b

Backbone                                Pose Classification
NvDINOv2
GcViT
ViT
FAN
FasterViT
ResNet
Swin
EfficientNet
ST-GCN (graph convolutional network)    X
MIT-b
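
The backbone and head pairings in the tables above can be treated as a simple lookup. The sketch below is illustrative only: it transcribes a subset of the tables (image classification, OCD, OCR, and pose classification), and the dictionary layout and the helper name `tasks_for_backbone` are assumptions made for this example, not part of TAO.

```python
from typing import List

# Backbones marked with an X in a subset of the tables above.
# The data structure and names here are illustrative, not a TAO API.
SUPPORTED_BACKBONES = {
    "Image classification": [
        "NvDINOv2", "GcViT", "ViT", "FAN", "FasterViT",
        "ResNet", "Swin", "EfficientNet",
    ],
    "OCD": ["FAN", "ResNet"],
    "OCR": ["FAN", "ResNet"],
    "Pose Classification": ["ST-GCN"],
}


def tasks_for_backbone(backbone: str) -> List[str]:
    """Return the tasks (from the subset above) that list this backbone."""
    return sorted(
        task for task, names in SUPPORTED_BACKBONES.items() if backbone in names
    )


if __name__ == "__main__":
    print(tasks_for_backbone("FAN"))     # ['Image classification', 'OCD', 'OCR']
    print(tasks_for_backbone("ST-GCN"))  # ['Pose Classification']
```

A lookup like this makes it easy to check whether a planned backbone is listed for a task before preparing a training configuration.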

TAO provides means to enhance a user’s dataset. This class of features and tasks is included under the Data Services modality.

TAO 5.5.0 introduces fine-tuning and inference support for Open Vocabulary Grounded Object Detection and Instance Segmentation through the GroundingDINO and Mask GroundingDINO models.

NVIDIA also includes two new inference applications as part of TAO.

Note

As of version 5.5.0, the TAO containers run only on x86 platforms with discrete GPUs. For more information about the supported GPUs, refer to the Quick Start Guide.

TAO has an extensive selection of pre-trained models, trained either on public datasets such as ImageNet, COCO, and OpenImages, or on proprietary datasets for task-specific use cases such as people detection, vehicle detection, and action recognition. The task-specific models can be used directly for inference, and they can also be fine-tuned on custom datasets for better accuracy.

Go to the Model Zoo section to learn more about the pre-trained models.

TAO packages several key features to help developers accelerate their AI training and optimization. Here are a few of the key features:

  • Computer vision workflows

    • Model Pruning - Reduce the number of parameters in a model to reduce model size and improve inference throughput

    • ONNX export - Supports model output in the industry-standard ONNX format, which can then be used directly on any platform that supports ONNX

    • Quantization Aware Training - Emulates lower-precision quantization during training to reduce the accuracy loss when moving from training to lower-precision inference

    • Multi-GPU - Accelerate training by parallelizing training jobs across multiple GPUs on a single node

    • Multi-Node - Accelerate training by parallelizing training jobs across multiple nodes

    • Training Visualization - Visualize training graphs and metrics in TensorBoard or in third-party services

  • Data Services

    • Data Augmentation - Offline and online augmentation to add diversity to your dataset, which helps the model generalize

    • AI-assisted annotation - Class-agnostic auto-labeler that generates segmentation masks from bounding boxes

    • Data Analytics - Analyzes object-detection annotation files and image files, calculates insights, and generates graphs and a summary
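
To make the Quantization Aware Training item in the list above more concrete, here is a minimal sketch of "fake quantization", the general idea of emulating low-precision arithmetic while still working in floating point. It assumes symmetric, per-tensor quantization, and the function name `fake_quantize` is hypothetical. This is not TAO's implementation, which may use a different scheme.

```python
def fake_quantize(values, num_bits=8):
    """Quantize then dequantize: snap floats onto a signed integer grid.

    Assumes symmetric, per-tensor scaling (an assumption for this sketch).
    """
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                   # nothing to scale
    scale = max_abs / qmax                    # float value of one integer step
    out = []
    for v in values:
        q = round(v / scale)                  # nearest integer level
        q = max(-qmax, min(qmax, q))          # clamp to the representable range
        out.append(q * scale)                 # back to float ("dequantize")
    return out


if __name__ == "__main__":
    weights = [0.8, -0.31, 0.002, 0.5]
    for original, emulated in zip(weights, fake_quantize(weights)):
        print(f"{original:+.4f} -> {emulated:+.4f}")
```

During quantization-aware training, an operation like this is typically applied in the forward pass so that the network learns weights that tolerate the rounding error it will see at inference time.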

TAO also provides several features for service providers and NVIDIA partners looking to integrate TAO with their workflow to provide added services.

  • AutoML - Automatic hyperparameter sweeps and optimization to find the best accuracy on a given dataset.

  • REST APIs - Use cloud API endpoints to call into your managed TAO services in the cloud.

  • Kubernetes deployment - Deploy TAO services in a Kubernetes cluster, either on-premises or with one of the cloud-managed Kubernetes services.

  • Source code availability - Access the source code for TAO to add your own customization.
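
As an illustration of the hyperparameter sweeps mentioned in the AutoML item above, the sketch below runs a plain random search over a small search space. Random search is swapped in here as the simplest sweep strategy and is not a description of the algorithms TAO AutoML uses. The names `random_search` and `toy_objective`, and the hyperparameter values, are hypothetical.

```python
import random


def random_search(objective, search_space, num_trials=20, seed=0):
    """Sample random configurations and keep the best-scoring one."""
    rng = random.Random(seed)                 # seeded for reproducibility
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        params = {name: rng.choice(choices) for name, choices in search_space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Stand-in for 'train a model and return validation accuracy'."""
    lr_penalty = abs(params["learning_rate"] - 0.001) * 100
    batch_penalty = abs(params["batch_size"] - 16) / 64
    return 1.0 - lr_penalty - batch_penalty


if __name__ == "__main__":
    space = {
        "learning_rate": [0.0001, 0.001, 0.01],
        "batch_size": [8, 16, 32],
    }
    params, score = random_search(toy_objective, space, num_trials=30)
    print(params, round(score, 4))
```

In a real sweep, the objective would launch a training job and report a validation metric, which is far more expensive than this toy function.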

Detailed getting-started instructions are provided in the TAO Getting Started guide.

The getting started package contains install scripts, Jupyter notebooks, and model configuration files for training and optimization. There are Jupyter notebooks for all the models, which can be used as templates to run your training. All notebooks come with a call to download a sample dataset to run training jobs. These can be replaced with your own datasets.

TAO is a Python package hosted on the NVIDIA Python Package Index. It interacts with lower-level TAO Docker containers available from the NVIDIA GPU Accelerated Container Registry (NGC); the containers come pre-installed with all dependencies required for training. The CLI is run from Jupyter notebooks packaged inside each Docker container and consists of a few simple commands, such as train, evaluate, infer, prune, export, and augment (that is, data augmentation). The output of the TAO workflow is a trained model that can be deployed for inference on NVIDIA devices using DeepStream and TensorRT.
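
The command-style workflow described above can be sketched as a small helper that assembles a launcher call. The `tao model <network> <action> -e <spec>` layout, the network name `dino`, and the spec path are assumptions made for illustration; check the launcher's own help output for the exact syntax of your version. The helper only builds and prints the command and does not run it.

```python
import shlex
from typing import List

# Actions named in the text above. "augment" is left out because this page
# lists data augmentation under Data Services rather than model tasks.
MODEL_ACTIONS = ("train", "evaluate", "infer", "prune", "export")


def build_tao_command(network: str, action: str, spec_path: str) -> List[str]:
    """Build an argument list for a TAO launcher call.

    ASSUMPTION: the `tao model <network> <action> -e <spec>` layout is
    illustrative; confirm it against the launcher's help output.
    """
    if action not in MODEL_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["tao", "model", network, action, "-e", spec_path]


if __name__ == "__main__":
    # "dino" and the spec path are hypothetical placeholders.
    cmd = build_tao_command("dino", "train", "/workspace/specs/train.yaml")
    print(shlex.join(cmd))  # shows the command without executing it
```

Building the command as a list keeps arguments with spaces intact if the list is later passed to `subprocess.run`.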

The TAO application layer is built on top of CUDA-X, which contains all the lower-level NVIDIA libraries, including NVIDIA Container Runtime for GPU acceleration, CUDA and cuDNN for deep learning (DL) operations, and TensorRT (the NVIDIA inference optimization and runtime engine) for optimizing models. Models that are generated with TAO are completely compatible with and accelerated for TensorRT, which ensures maximum inference performance without any extra effort.

Figure: TAO stack (tao_stack.png)

Model pruning is one of the key differentiators for TAO. Pruning removes the nodes of a neural network that contribute less to the overall accuracy of the model. This reduces the overall size of the model, significantly reduces its memory footprint, and increases inference throughput, all of which are very important for edge deployment.
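
The idea of removing low-contribution nodes can be sketched with a simple magnitude-based channel pruning rule: compute a norm per channel, normalize by the largest norm, and drop channels that fall below a threshold. This is a generic illustration under those assumptions, not TAO's actual pruning algorithm, and the function name `prune_channels` is hypothetical.

```python
import math


def prune_channels(weights, threshold=0.1):
    """Keep channels whose L2 norm, relative to the largest norm, meets the threshold.

    weights: list of channels, each a list of floats.
    Returns (kept_indices, pruned_weights).
    """
    norms = [math.sqrt(sum(w * w for w in channel)) for channel in weights]
    max_norm = max(norms)
    if max_norm == 0.0:
        # All-zero layer: nothing to rank, so keep everything.
        return list(range(len(weights))), [list(c) for c in weights]
    kept = [i for i, n in enumerate(norms) if n / max_norm >= threshold]
    return kept, [list(weights[i]) for i in kept]


if __name__ == "__main__":
    layer = [[3.0, 4.0], [0.1, 0.0], [0.0, 1.0]]   # channel norms: 5.0, 0.1, 1.0
    kept, pruned = prune_channels(layer, threshold=0.1)
    print(kept)    # [0, 2]
    print(pruned)  # [[3.0, 4.0], [0.0, 1.0]]
```

A pruned model is usually retrained afterwards to recover accuracy lost by removing channels.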

Currently, pruning is supported for a subset of Computer Vision models. The following graph provides an example of performance gains achieved when going from an unpruned CV model to a pruned CV model (inference was run on an NVIDIA T4; TrafficCamNet, DashCamNet, and PeopleNet are three of the custom pre-trained models that are available on NGC).

Figure: Pruned vs Unpruned Performance (pruned_vs_unpruned.png)

Tutorial Videos

  • Getting started with NVIDIA TAO

  • Create Custom Multi-Modal Fusion Models

  • Use visual prompt for In-context segmentation with NVIDIA TAO

  • Estimate and track object poses with the NVIDIA TAO FoundationPose model

  • Open vocabulary object detection with NVIDIA Grounding-DINO

  • Use text prompts for auto-labeling with NVIDIA TAO

  • Visualize model training with TensorBoard

Developer blogs

To learn more about using TAO, read the technical blogs, which provide a step-by-step guide to training with TAO:

Webinars

If you have any questions when using TAO to train a model and deploy to Riva or DeepStream, post them here:

© Copyright 2024, NVIDIA. Last updated on Aug 30, 2024.