NVIDIA TAO Toolkit is a low-code AI toolkit built on TensorFlow and PyTorch. It simplifies and accelerates model training by abstracting away the complexity of AI models and the underlying deep learning frameworks. With TAO, users can select one of 100+ pre-trained vision AI models from NGC, then fine-tune and customize it on their own dataset without writing a single line of code. The output of TAO is a trained model in ONNX format that can be deployed on any platform that supports ONNX.
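Because the output is a standard ONNX file, it can be smoke-tested outside TAO with any ONNX runtime. The following is a minimal sketch, not part of TAO itself, that assumes the `onnxruntime` and `numpy` packages are installed and that every model input is a float32 tensor. The helper names `concrete_shape` and `run_dummy_inference` and the file name `model.onnx` are hypothetical.

```python
from typing import List, Sequence, Union


def concrete_shape(shape: Sequence[Union[int, str, None]], fill: int = 1) -> List[int]:
    """Replace symbolic or unknown dimensions (e.g. 'batch', None, -1) with `fill`."""
    return [d if isinstance(d, int) and d > 0 else fill for d in shape]


def run_dummy_inference(model_path: str) -> list:
    """Load an ONNX model and run it once on all-zero inputs.

    Returns the shape of each output tensor.
    """
    # Imported lazily so concrete_shape() works without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

    feeds = {}
    for inp in session.get_inputs():
        # Assumption: inputs are float32 tensors, which is common for
        # vision models but is not guaranteed for every exported model.
        feeds[inp.name] = np.zeros(concrete_shape(inp.shape), dtype=np.float32)

    # Passing None as the first argument requests all model outputs.
    outputs = session.run(None, feeds)
    return [out.shape for out in outputs]


# Example call (hypothetical path to a model exported from TAO):
# print(run_dummy_inference("model.onnx"))
```

A zero-filled input only confirms that the model loads and executes. Real inference still needs the preprocessing (resize, normalization, channel order) that matches how the model was trained.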

TAO supports most popular computer vision (CV) tasks, such as:

Image Classification

Object Detection

Instance Segmentation

Semantic Segmentation

Optical Character Detection and Recognition (OCD/OCR)

Body Pose Estimation

Keypoint Estimation

Action Recognition

Siamese Network

Change Detection

CenterPose

For image classification, object detection, and segmentation, users can pair one of many feature extractors (backbones) with one of many task-specific heads, which allows for 100+ model combinations. TAO also supports some of the leading Vision Transformers (ViTs), such as FAN, GC-ViT, DINO, D-DETR, and SegFormer.
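The way backbone and head choices multiply into a large number of models can be sketched in a few lines of Python. The lists below are illustrative placeholders only; they are not TAO's actual support matrix, and not every pairing shown here is necessarily valid in TAO. The names `cls_head`, `backbone_c`, and `model_combinations` are hypothetical.

```python
from itertools import product

# Illustrative placeholders, NOT the real list of supported components.
BACKBONES = ["FAN", "GC-ViT", "backbone_c"]
HEADS_BY_TASK = {
    "classification": ["cls_head"],
    "detection": ["DINO", "D-DETR"],
    "segmentation": ["SegFormer"],
}


def model_combinations(backbones, heads_by_task):
    """Return every (task, backbone, head) triple as a list of tuples."""
    return [
        (task, backbone, head)
        for task, heads in heads_by_task.items()
        for backbone, head in product(backbones, heads)
    ]


combos = model_combinations(BACKBONES, HEADS_BY_TASK)
print(len(combos))  # 3 backbones x (1 + 2 + 1) heads = 12
```

Even with three backbones and four heads the sketch yields 12 combinations, so a few dozen components are enough to reach 100+.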

TAO Toolkit also provides the means to enhance a user’s dataset. This class of features and tasks is included under the Data Services modality.

TAO Toolkit 5.2.0 introduces, as a Developer Preview feature, fine-tuning and inference support for Open Vocabulary Image Segmentation through a new, optimized version of the ODISE model released earlier by NVlabs. For more information about the model, refer to the TAO Toolkit PyTorch backend GitHub repository. NVIDIA also includes a Gradio app for trying out zero-shot inference with Open Vocabulary segmentation. Instructions for launching the Gradio demo app are provided in the GitHub repository.