Overview#

NVIDIA TAO is a low-code AI toolkit built on TensorFlow and PyTorch. It simplifies and accelerates model training by abstracting away the complexity of AI models and the underlying deep learning frameworks, and it provides applications to visualize and validate the trained models.

TAO focuses on training, fine-tuning, and optimizing computer vision foundation models. Users can select from more than 100 pretrained vision AI models on NGC and fine-tune and customize them on their own datasets without writing a single line of code. The output of TAO is a trained model in ONNX format that can be deployed on any platform that supports ONNX. After training is complete, inference applications are provided to run and validate the trained models.

For training and fine-tuning Large Language Models (LLMs), refer to NVIDIA NeMo.
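Because the exported model is standard ONNX, you can sanity-check it with any ONNX-compatible runtime before deployment. Below is a minimal sketch using ONNX Runtime; the file name model.onnx and the 1x3x224x224 input shape are assumptions for illustration, since the actual input name and shape depend on the network you trained.

```python
# Minimal sketch: load and run a TAO-exported ONNX model with ONNX Runtime.
# "model.onnx" and the dummy input shape are placeholders for illustration.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_meta = session.get_inputs()[0]
print("input:", input_meta.name, input_meta.shape)

# Feed a dummy batch just to confirm the exported graph executes end to end.
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed NCHW shape
outputs = session.run(None, {input_meta.name: dummy})
print("output shapes:", [o.shape for o in outputs])
```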

[Figure: TAO overview]

TAO supports most of the popular CV tasks such as:

  • Self-supervised pretraining and domain adaptation for foundation models

  • Image classification

  • Object detection

  • Instance segmentation

  • Semantic segmentation

  • Optical character detection and recognition (OCD/OCR)

  • Body pose estimation

  • Key point estimation

  • Action recognition

  • Siamese network

  • Change detection

  • CenterPose

  • Segmentation in context

  • Synthetic data generation (using StyleGAN-XL)

  • Multimodal sensor fusion for computer vision

For image classification, object detection, and segmentation, you can choose one of many feature extractors and pair it with one of many task heads, giving you access to more than 100 model combinations. TAO supports some of the leading vision transformers (ViTs), such as RADIOv2, DINOv2, FAN, GC-ViT, SWIN, DINO, D-DETR, and SegFormer.

Backbones supported for image classification include NVCLIP, C-RADIOv2, NvDINOv2, GcViT, ViT, FAN, FasterViT, ResNet, Swin, and EfficientNet.
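Conceptually, each backbone/head combination pairs a feature extractor with a task-specific head. The sketch below illustrates that pattern in plain PyTorch using a torchvision ResNet-50 backbone and a linear classification head; it is an illustration of the idea only, not TAO's internal API.

```python
# Conceptual sketch: a feature-extraction backbone paired with a task head.
import torch
import torch.nn as nn
from torchvision import models

class ClassificationModel(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        resnet = models.resnet50(weights=None)  # any backbone could be swapped in here
        # Drop the original fully connected layer, keeping the feature extractor.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.head = nn.Linear(2048, num_classes)  # classification head

    def forward(self, x):
        features = self.backbone(x).flatten(1)  # (N, 2048) feature vectors
        return self.head(features)

model = ClassificationModel(num_classes=10)
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 10])
```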

TAO 6.0 introduces support for:

  • Pretraining and fine-tuning workflows for foundation models, such as the ConvNeXt series of backbones

  • Fine-tuning the NVIDIA-trained, commercially viable foundation model C-RADIOv2 for several downstream tasks

  • Fine-tuning a StyleGAN-XL model with limited seed data for synthetic data generation

  • Distilling foundation models into smaller and faster backbones on structured and unstructured data (a generic sketch of a distillation loss follows this list)

  • Distilling heavier object detection models into faster, smaller models for edge deployment

  • Deploying TAO Training workflows as scalable microservices via TAO Finetuning Microservices (FTMS)
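For the distillation workflows above, the core idea is to train a small student network to match a larger teacher's softened predictions in addition to the ground-truth labels. The snippet below is a generic sketch of that loss (the technique itself, not TAO's implementation); the temperature T and weight alpha are typical but arbitrary choices.

```python
# Generic knowledge-distillation loss: soft (teacher-matching) + hard (label) terms.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# In a training loop, the frozen teacher runs under torch.no_grad():
#   with torch.no_grad():
#       t_logits = teacher(images)
#   loss = distillation_loss(student(images), t_logits, labels)
```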

NVIDIA also includes several inference applications as part of TAO.

Note

NVIDIA Triton Inference Server is open-source software designed for deploying and scaling AI models in production environments. It supports various machine learning frameworks, hardware platforms (GPUs and CPUs), and deployment environments (cloud, data center, and edge).

NVIDIA Inference Microservices (NIMs) are a collection of pre-built, optimized, ready-to-use microservices that enable developers to deploy and scale AI models in production environments. NIMs are designed to simplify the deployment process by providing a set of preconfigured components that can be used to build and deploy AI models.

For more information about Triton Inference Server and NIMs, visit the Triton Inference Server documentation and NIMs pages.
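As a concrete illustration of serving a TAO-trained model behind Triton, the sketch below uses the tritonclient HTTP API. The server URL, model name, and tensor names are placeholders that depend on how your Triton model repository is configured.

```python
# Hedged sketch: query a model hosted on Triton Inference Server over HTTP.
# "tao_model", "input", and "output" are placeholder names from an assumed
# model repository configuration.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

image = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy batch
infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

response = client.infer(model_name="tao_model", inputs=[infer_input])
print(response.as_numpy("output").shape)
```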

Note

As of version 6.0.0, the TAO containers can run on x86 and ARM64 platforms with discrete GPUs. For more information about the supported GPUs, refer to the Quick Start Guide.

Pretrained Models#

TAO has an extensive selection of foundation models and lighter pretrained models that are trained either on public datasets like ImageNet, COCO, and OpenImages, or on proprietary datasets for task-specific use cases like people detection, vehicle detection, and action recognition. The task-specific models can be used directly for inference, but can also be fine-tuned on custom datasets for better accuracy.

Go to the Model Zoo section to learn more about all of the pretrained models.

Key Features#

TAO packages have several key features to help developers accelerate their AI training and optimization. Here are a few of them:

  • Computer vision workflows

    • Model Pruning: Removes redundant parameters from a model to reduce model size and accelerate inference without significantly compromising accuracy.

    • Self-Supervised Learning: Pretrains and performs domain adaptation on foundation models (Transformer- and CNN-based models).

    • Distillation: Distills learnings from a heavier model to a smaller and faster model for edge deployment.

    • ONNX export: Supports model output in industry-standard ONNX format, which can then be used directly on any platform.

    • Quantization-Aware Training: Emulates lower-precision quantization during training to minimize the accuracy loss when the model is deployed for lower-precision inference.

    • Multi-GPU: Accelerates training by parallelizing training jobs across multiple GPUs on a single node.

    • Multi-Node: Accelerates training by parallelizing training jobs across multiple nodes.

    • Training Visualization: Visualizes training graphs and metrics in TensorBoard or in third-party services.

  • Data Services

    • Data Augmentation: Performs offline and online augmentation to add diversity to your dataset, which helps the model generalize better.

    • AI-assisted annotation: A class-agnostic auto-labeler that generates segmentation masks from bounding boxes.

    • Data Analytics: Analyzes object-detection annotation files and image files, calculates insights, and generates graphs and a summary (a generic sketch of this kind of analysis follows this list).
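As a rough illustration of the kind of analysis the Data Analytics service performs, the snippet below computes simple class and bounding-box statistics from a COCO-format annotation file. It is a generic sketch, not the TAO service itself, and annotations.json is a placeholder path.

```python
# Generic object-detection dataset analytics on a COCO-format annotation file.
import json
from collections import Counter

with open("annotations.json") as f:  # placeholder path
    coco = json.load(f)

class_names = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(class_names[a["category_id"]] for a in coco["annotations"])
areas = [a["bbox"][2] * a["bbox"][3] for a in coco["annotations"]]  # bbox = [x, y, w, h]

print("images:", len(coco["images"]))
print("objects per class:", dict(counts))
print("mean box area (px^2):", sum(areas) / max(len(areas), 1))
```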

TAO also provides several features for service providers and NVIDIA partners looking to integrate TAO with their workflows to provide additional services.

  • AutoML: Automatic hyperparameter sweeps and optimization to achieve the best accuracy on a given dataset (a generic random-search sketch follows this list).

  • REST APIs: Call into your managed TAO services through cloud-hosted API endpoints.

  • Kubernetes deployment: Deploy TAO services in a Kubernetes cluster, either on-premises or with one of the cloud-managed Kubernetes services.

  • Source code availability: Access source code for TAO to add your own customization.
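To make the AutoML idea concrete, here is a generic random-search sketch over a small hyperparameter space. It illustrates the concept only and is not the FTMS AutoML API; train_and_evaluate is a placeholder for a training job that returns a validation metric.

```python
# Generic random-search sketch for hyperparameter optimization.
import random

SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [16, 32, 64],
    "optimizer": ["adamw", "sgd"],
}

def random_search(train_and_evaluate, n_trials=20):
    best_score, best_cfg = float("-inf"), None
    for _ in range(n_trials):
        cfg = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        score = train_and_evaluate(cfg)  # e.g., validation mAP or accuracy
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```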

How to Get Started#

You can find detailed information about getting started in the TAO Getting Started guide.

The “getting started” package contains install scripts, Jupyter notebooks, and model configuration files for training and optimization. Jupyter notebooks are provided for all of the models and can be used as templates for your own training runs. Each notebook downloads a sample dataset for its training job, which you can replace with your own dataset.

TAO Architecture#

TAO is a multi-container, cloud-native AI fine-tuning microservice. The component Docker containers are available from the NVIDIA GPU Accelerated Container Registry (NGC) and come preinstalled with all the dependencies required for training. You can run the CLI from the Jupyter notebooks packaged with the Getting Started guide or directly from the command line.

The output of the TAO workflow is a trained model that can be deployed for inference on NVIDIA devices using DeepStream and NVIDIA TensorRT.

Depending on the execution environment, the TAO CLI can be run on a local machine or as a cloud-based service running on a Kubernetes cluster.

The TAO application layer is built on top of NVIDIA CUDA-X, which contains all the lower-level NVIDIA libraries, including NVIDIA Container Runtime for GPU acceleration, NVIDIA® CUDA and NVIDIA cuDNN for deep learning (DL) operations, and TensorRT (the NVIDIA inference optimization and runtime engine) for optimizing models. Models that are generated with TAO are completely compatible with and accelerated for TensorRT, which ensures maximum inference performance without any extra effort.
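Because the exported ONNX model is TensorRT-compatible, a common next step is to build an optimized engine with the trtexec utility that ships with TensorRT. The sketch below drives trtexec from Python; the file names are placeholders and --fp16 is an optional precision setting.

```python
# Hedged sketch: build a TensorRT engine from a TAO-exported ONNX model via trtexec.
import subprocess

subprocess.run(
    [
        "trtexec",
        "--onnx=model.onnx",          # model exported by TAO (placeholder name)
        "--saveEngine=model.engine",  # serialized TensorRT engine for deployment
        "--fp16",                     # optional reduced-precision optimization
    ],
    check=True,
)
```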

[Figure: TAO software stack]

TAO Deployment Modes#

| Deployment Type | Use Case | User Level | Deployment Method | API/Access Type | Infrastructure | Data and Model Store |
|---|---|---|---|---|---|---|
| Local Single Machine | Quick and simple experiments | Beginner | Docker Run | Launcher CLI | Single Server/Workstation | Local (mounted at runtime) |
| Local Cluster | Larger models on multi-node cluster | Advanced | Docker Run | Docker CLI | Multi-node Cluster | Local (mounted at runtime) |
| Cloud Service Provider | At-scale deployment in cloud with multi-tenancy | Advanced | Kubernetes | FTMS API/Client CLI | Cloud | Cloud Storage |

Support for multi-GPU training, multi-node training, AutoML, and job orchestration also depends on the deployment mode.

TAO Workflows#

The following diagram provides a visual guide for building, adapting, and deploying deep learning models using TAO. It outlines the decision points and actions, and provides a recipe for an end-to-end workflow that includes:

  • Selecting models

  • Preparing data

  • Adapting to new domains

  • Optimizing models

  • Deploying models for inference

[Figure: TAO recommended workflow diagram]

Data Preparation#

  • Assess Data Availability

    • The process begins by determining if you have sufficient data for training your model.

    • If data is insufficient, TAO enables you to generate additional synthetic data using advanced tools like StyleGAN-XL, ensuring you have enough examples to achieve robust model performance.

  • Data Labeling Needs

    • If your data lacks labels, TAO offers built-in auto-labeling actions to generate annotations, preparing your dataset for fully supervised finetuning with tasks like object detection and instance segmentation.

    • If your data does not need labeling, you can proceed directly to the model training phase.

Training and Model Selection#

  • Model Selection

    • If you already know which model architecture suits your task, you can immediately begin training with TAO, leveraging its optimized pipelines for popular AI models.

    • If you are unsure what model to use, the workflow guides you to evaluate your available training compute resources.

  • Training Compute Assessment

    • With sufficient compute resources, you can pursue the highest degree of accuracy by training large, state-of-the-art foundation models (such as NV-DINOv2 or C-RADIOv2) within TAO.

    • If compute resources are limited, we recommend knowledge distillation from foundation models; previous-generation TAO models remain available as a fallback, although they are no longer the preferred option.

  • Domain Adaptation

    If your application requires adapting a model to a new domain, TAO supports domain adaptation using self-supervised learning (SSL) techniques, which can significantly improve performance on specialized datasets.

Inference Optimization and Deployment#

  • Inference Efficiency

    • Before you deploy, you can apply inference efficiency improvements so that your model runs faster and uses fewer resources in production environments.

    • If your model does not need further optimization, you can proceed to deployment.

  • Deployment

    • The final step is deploying your fine-tuned TAO model using the runtime engine, making it ready for efficient inference in your target application.

This workflow ensures a logical, efficient progression from raw data to a high-performance deployed AI model, utilizing NVIDIA TAO’s comprehensive suite of tools for data generation, labeling, training, adaptation, and inference optimization.

Learning Resources#

NVIDIA provides many tutorial videos, developer blogs, and other resources that can help you get started with TAO Toolkit.

Tutorial Videos#

TAO Toolkit provides the following tutorial videos to cover popular use cases:

Developer Blogs#

NVIDIA publishes several blogs that can help you learn to use TAO Toolkit:

White Papers#

You can learn more about different use cases for TAO Toolkit in the white paper Endless Ways to Adapt and Supercharge Your AI Workflows with Transfer Learning.

Webinars#

Support Information#

If you have any questions when using TAO to train a model and deploy it to NVIDIA Riva or DeepStream, post them here: