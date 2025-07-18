NVIDIA TAO is a low-code AI toolkit built on TensorFlow and PyTorch. It simplifies and accelerates model training by abstracting away the complexity of AI models and deep learning frameworks. With TAO, users can select one of more than 100 pre-trained vision AI models from NGC, then fine-tune and customize it on their own dataset without writing any code. TAO outputs the trained model in ONNX format, so it can be deployed on any platform that supports ONNX.


TAO supports most of the popular CV tasks such as:

Image Classification

Multi-Modal Sensor Fusion for computer vision

Object Detection

Instance Segmentation

Semantic Segmentation

Optical Character Detection and Recognition (OCD/OCR)

Body Pose Estimation

Keypoint Estimation

Action Recognition

Siamese network

Change Detection

CenterPose

Segmentation in Context

For image classification, object detection, and segmentation, users can pair one of many feature extractors (backbones) with one of many task heads, yielding more than 100 possible model combinations. TAO supports several leading transformer-based vision models, including FAN, GC-ViT, Swin, DINO, D-DETR, and SegFormer.
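The backbone-plus-head design can be illustrated with a minimal, framework-free sketch. The class and function names (`Backbone`, `Head`, `compose`) and the toy feature computation are hypothetical and do not reflect TAO's internal API. The point is only that any backbone whose output width matches a head's input width can be paired with that head.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Backbone:
    """Hypothetical feature extractor: maps an input vector to `out_dim` features."""
    name: str
    out_dim: int
    extract: Callable[[Sequence[float]], List[float]]


@dataclass
class Head:
    """Hypothetical task head: consumes `in_dim` features and produces a task output."""
    name: str
    in_dim: int
    predict: Callable[[List[float]], object]


def compose(backbone: Backbone, head: Head) -> Callable[[Sequence[float]], object]:
    """Pair a backbone with a head after checking that their feature widths agree."""
    if backbone.out_dim != head.in_dim:
        raise ValueError(
            f"{backbone.name} emits {backbone.out_dim} features, "
            f"but {head.name} expects {head.in_dim}"
        )

    def model(x: Sequence[float]) -> object:
        features = backbone.extract(x)  # backbone: input -> feature vector
        return head.predict(features)   # head: feature vector -> task output

    return model


# Toy backbone (stand-in for a real CNN or ViT): mean and range of the input.
stats_backbone = Backbone(
    name="toy_stats",
    out_dim=2,
    extract=lambda x: [sum(x) / len(x), max(x) - min(x)],
)

# Toy classification head: thresholds the first feature.
threshold_head = Head(
    name="toy_classifier",
    in_dim=2,
    predict=lambda f: "bright" if f[0] > 0.5 else "dark",
)

model = compose(stats_backbone, threshold_head)
print(model([0.9, 0.8, 0.7]))  # bright
```

Swapping in a different backbone only requires that its `out_dim` match the head's `in_dim`, which is the property that makes many backbone/head combinations possible.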

Supported backbones for each task and its heads:

Image Classification (head: image classification): NvDINOv2, GcViT, ViT, FAN, FasterViT, ResNet, Swin, EfficientNet

Object Detection (heads: DINO, D-DETR, Grounding DINO, EfficientDet): NvDINOv2, GcViT (two heads), ViT (two heads), FAN, ResNet (two heads), Swin, EfficientNet

Panoptic Segmentation (head: Mask2Former): Swin

Character Recognition (heads: OCD, OCR): FAN (both heads), ResNet (both heads)

Instance Segmentation (heads: MAL, Mask GroundingDINO, Mask2Former): ViT, Swin (two heads)

Semantic Segmentation (heads: SegFormer, Mask2Former): FAN, Swin, MIT-b

Object Recognition (heads: Re-identification, Metric Learning Recognition): NvDINOv2, ViT, ResNet (both heads), Swin

Visual ChangeNet (heads: Classification, Segmentation): NvDINOv2 (both heads), ViT (both heads), FAN (both heads)

Pose Classification (head: pose classification): ST-GCN (graph convolutional network)
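To make the backbone listing easier to query, here is a minimal Python sketch that encodes it as a dictionary. The names `SUPPORTED_BACKBONES`, `tasks_for_backbone`, and `is_supported` are hypothetical helpers, not part of TAO. The data records only task-level support (a backbone is included if at least one of the task's heads accepts it), not which specific head accepts each backbone.

```python
from typing import Dict, FrozenSet, List

# Task-level backbone support, transcribed from the table above (hypothetical helper data).
SUPPORTED_BACKBONES: Dict[str, FrozenSet[str]] = {
    "Image Classification": frozenset(
        {"NvDINOv2", "GcViT", "ViT", "FAN", "FasterViT", "ResNet", "Swin", "EfficientNet"}
    ),
    "Object Detection": frozenset(
        {"NvDINOv2", "GcViT", "ViT", "FAN", "ResNet", "Swin", "EfficientNet"}
    ),
    "Panoptic Segmentation": frozenset({"Swin"}),
    "Character Recognition": frozenset({"FAN", "ResNet"}),
    "Instance Segmentation": frozenset({"ViT", "Swin"}),
    "Semantic Segmentation": frozenset({"FAN", "Swin", "MIT-b"}),
    "Object Recognition": frozenset({"NvDINOv2", "ViT", "ResNet", "Swin"}),
    "Visual ChangeNet": frozenset({"NvDINOv2", "ViT", "FAN"}),
    "Pose Classification": frozenset({"ST-GCN"}),
}


def tasks_for_backbone(backbone: str) -> List[str]:
    """Return the tasks, in table order, that list `backbone` as supported."""
    return [task for task, names in SUPPORTED_BACKBONES.items() if backbone in names]


def is_supported(task: str, backbone: str) -> bool:
    """True if at least one head of `task` is marked as accepting `backbone`."""
    return backbone in SUPPORTED_BACKBONES.get(task, frozenset())


print(tasks_for_backbone("Swin"))
# ['Image Classification', 'Object Detection', 'Panoptic Segmentation',
#  'Instance Segmentation', 'Semantic Segmentation', 'Object Recognition']
```

Frozen sets keep the transcribed data immutable, and unknown task names simply return `False` from `is_supported` rather than raising.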

TAO also provides tools to enhance a user’s dataset. These features and tasks are grouped under the Data Services modality.

TAO 5.5.0 introduces fine-tuning and inference support for Open Vocabulary Grounded Object Detection and Instance Segmentation through the GroundingDINO and Mask GroundingDINO models.

NVIDIA also includes two new inference applications as part of TAO.