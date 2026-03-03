Release Notes#

NVIDIA TAO is a Python package that lets you fine-tune pretrained models on your own data and export them for TensorRT-based inference on edge devices.

NVIDIA Transfer Learning Toolkit has been renamed to TAO. For a detailed migration guide, refer to this page.

TAO 6.25.11

Key Features

OneFormer: Universal image segmentation for semantic, instance, and panoptic segmentation tasks, using a single unified architecture for multiple segmentation tasks. Supports DiNAT and Swin Transformer backbones, with training, evaluation, inference, and ONNX export.

Referring Expression Segmentation with Mask Grounding DINOv2: Support for the RES (Referring Expression Segmentation) task with the ReLA module.

Multi-Architecture Support: ARM64 and x86 platform deployments, cross-platform Docker builds for ARM and x86 architectures, and platform-specific Docker image pulls, for more deployment flexibility across hardware platforms.

Stack Upgrades: Upgraded to the DLFW 25.09 base container (PyTorch 2.9.0 with CUDA 13.0), with updated dependencies and library versions for improved performance and security.

TAO Quant Enhancements: ModelOpt ONNX backend integration for quantization workflows, enhanced quantization support for classification_pyt and RT-DETR models, and improved TensorRT engine generation from quantized models.

Training Improvements: Segformer can now save the top-k best models during training, and logging is synchronized between the API and stdout for easier debugging.

API Enhancements: A new v2 API with improved performance and functionality; file upload and download progress tracking for cloud-based workflows; graceful job termination with checkpoint retention; and per-job timeout configuration for better resource management.

AutoML Improvements: Multi-learning-rate support for Cosmos-RL Supervised Fine-Tuning (SFT); Hyperband support for Cosmos-RL models; AutoML support for the LoRA alpha and r pattern configuration for Cosmos-RL; AutoML details merged into JobResult for better tracking; customizable bounds for any parameter included in AutoML; and weighted priority for certain hyperparameter options.

Cosmos-RL Enhancements: Multi-GPU support for the Cosmos evaluate and inference actions, quantization support for Cosmos Reason VLM models, custom dataloader support for Cosmos models, and validation during training.
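
The Hyperband support listed above refers to a bandit-based hyperparameter search that runs several brackets of successive halving, each trading off the number of trials against the budget per trial. As a minimal sketch of the textbook schedule from Li et al., assuming a maximum per-trial resource R (for example, epochs) and a downsampling rate eta, the hypothetical helper hyperband_brackets below computes how many configurations each rung keeps and how much budget each one receives. It illustrates the general algorithm only and does not reflect TAO's AutoML implementation or its configuration parameters.

```python
def hyperband_brackets(max_resource: int, eta: int = 3) -> list[list[tuple[int, float]]]:
    """Return the successive-halving rungs of each Hyperband bracket.

    Each bracket is a list of (num_configs, resource_per_config) pairs,
    following the standard Hyperband schedule of Li et al.
    """
    # s_max = floor(log_eta(max_resource)), computed with integers so that
    # floating-point rounding in math.log cannot shift the result.
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # Initial number of configurations: ceil((s_max + 1) * eta**s / (s + 1)).
        n = -(-((s_max + 1) * eta ** s) // (s + 1))
        rungs = []
        for i in range(s + 1):
            n_i = n // eta ** i                      # configurations kept at rung i
            r_i = max_resource / eta ** (s - i)      # budget per configuration at rung i
            rungs.append((n_i, r_i))
        brackets.append(rungs)
    return brackets


if __name__ == "__main__":
    for rungs in hyperband_brackets(81, eta=3):
        print(rungs)
```

For R = 81 and eta = 3, the most exploratory bracket starts 81 trials at 1 epoch each and keeps a third of them at every rung, while the most conservative bracket trains 5 trials for the full 81 epochs.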

Bugfixes

Fixed Segformer visualization rendering issues

Fixed checkpoint saving with register_buffer for proper state persistence

Resolved Mask2Former image format compatibility (PNG, JPG, JPEG)

Corrected TensorRT engine color space and discrete GPU handling issues

Fixed inference batch size limitations and related bugs

Fixed PointPillars dataclass defaults to include all KITTI dataset classes

Resolved Docker Compose pause, resume, and multi-GPU deployment issues

Fixed auto-label and PTM creation for certain model types

Fixed GPU scheduling for Docker Compose deployments

Fixed job pending issues when GPU request count exceeds available GPUs

Fixed AutoML ETA calculation when experiments have different epoch counts

Corrected AutoML brain job pod deletion with helm delete

Fixed Cosmos inference microservice Docker Compose configuration

Resolved microservice Docker handler issues for Tegra systems

Compute Stack

PyTorch 2.9.0 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.25.11-pyt
Software versions: Python 3.12, PyTorch 2.9.0, CUDA 13.0, cuDNN 9.7.0, TensorRT 10.13.3.9

Data Services Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.25.11-dataservices
Software versions: Python 3.12, PyTorch 2.9.0, CUDA 13.0, cuDNN 9.7.0, TensorRT 10.13.3.9

Deploy Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.25.11-deploy
Software versions: Python 3.12, CUDA 13.0, cuDNN 9.7.0, TensorRT 10.13.3.9

Known Issues and Limitations

Hugging Face model downloads for OneFormer may occasionally fail due to rate limiting when multiple jobs from the same IP address are triggered in succession.
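
Because TAO 6.25.11 adds ARM64 and x86 deployments with platform-specific image pulls, a small helper that picks the Docker platform for the current host can be handy when scripting pulls of the containers listed above. This is a sketch under stated assumptions, not part of TAO: the helper names are hypothetical, it relies on the standard --platform option of docker pull, and it assumes the requested tag is actually published for the target architecture.

```python
from __future__ import annotations

import platform

# Image name used by the TAO containers listed in these notes.
IMAGE = "nvcr.io/nvidia/tao/tao-toolkit"

# Values returned by platform.machine(), mapped to Docker platform strings.
_DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "aarch64": "linux/arm64",
    "arm64": "linux/arm64",
}


def docker_platform(machine: str | None = None) -> str:
    """Return the Docker platform string for an architecture (default: this host)."""
    machine = (machine or platform.machine()).lower()
    try:
        return _DOCKER_PLATFORMS[machine]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine!r}") from None


def pull_command(tag: str, machine: str | None = None) -> list[str]:
    """Build a `docker pull` argument list for one TAO container tag."""
    return ["docker", "pull", "--platform", docker_platform(machine), f"{IMAGE}:{tag}"]
```

On an ARM64 host, pull_command("6.25.11-pyt") returns the arguments for docker pull --platform linux/arm64 nvcr.io/nvidia/tao/tao-toolkit:6.25.11-pyt, which can be passed directly to subprocess.run.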

TAO 6.25.10

Key Features

Depth Estimation Workflows: End-to-end TAO FTMS workflow support for monocular and stereo depth estimation. nvDepthAnythingv2 is a state-of-the-art monocular depth estimation model that achieved 2nd place on the LayerDepth benchmark. FoundationStereo is integrated from a consolidated repository for stereo depth estimation, with enhanced configuration management and improved iteration handling. The workflows support the complete training, evaluation, inference, and TensorRT deployment pipeline. Monocular and stereo depth estimation models can be exported with a dynamic batch size, and stereo depth estimation models with a dynamic image size.

General Software Improvements: Improved error classification and handling across TAO workflows, with better error messages and diagnostics and more robust error handling in Hydra-based configurations. StateDictAdapter now supports model_type for Visual ChangeNet weights compatibility.

Vision-Language Model (VLM) Finetuning: FTMS now supports finetuning Cosmos-Reason VLMs, with end-to-end training, evaluation, and inference workflows via the TAO Toolkit API, multi-node distributed training for large-scale VLM finetuning, and Bayesian optimization-based AutoML for hyperparameter optimization.

Inference Microservices: Deploy persistent model servers for fast, repeated inference without model-reloading overhead. Long-running servers keep models loaded in memory for low-latency inference and provide health monitoring and status endpoints for service readiness. Deployment is supported with Kubernetes StatefulSets and Docker Compose.

TAO models are now compatible with Nsight DL Designer for visualization, debugging, and profiling.

Bugfixes

Fixed Segformer ViT adapter freezing issue during training

Fixed Visual ChangeNet FAN head compatibility issues with pretrained models

Fixed PointPillars voxel generator batch_idx error in PyTorch implementation

Fixed EMA (Exponential Moving Average) callback loading issue

Fixed object detection inference color_map NoneType exception for optional configurations

Fixed data augmentation coordinate parsing issues

Fixed dynamic image size handling for ViT architectures

Fixed Segformer activation checkpoint setting to prevent training issues

Compute Stack

PyTorch 2.1.0 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.25.10-pyt
Software versions: Python 3.12, CUDA 12.8, cuDNN 9.7.0, TensorRT 10.8.0

Cosmos-Reason Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.25.10-cosmos-rl
Software versions: Python 3.12, CUDA 12.8

Data Services Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.25.10-dataservices
Software versions: Python 3.12, CUDA 12.8, cuDNN 9.7.0, TensorRT 10.8.0

Deploy Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.25.10-deploy
Software versions: Python 3.12, CUDA 12.8, cuDNN 9.7.0, TensorRT 10.8.0

Known Issues and Limitations

Multi-GPU training on machines with RTX 5080 GPUs may encounter NCCL errors. If you experience such issues, use single-GPU training or machines with different GPU models.

Models quantized using TAO Quant may not be compatible with TensorRT ONNX export and deployment. Please refer to the TAO Quant documentation for more details.

Hugging Face model downloads for Cosmos-RL VLM may occasionally fail due to rate limiting when multiple jobs from the same IP address are triggered in succession

TAO 6.25.09

Key Features

TAO Quant: Extensible APIs for quantizing TAO models, with FP8/INT8 quantization support for classification_pyt and RT-DETR, runtime deployment support for quantized models in PyTorch, and TensorRT ONNX export and deployment (experimental).

Knowledge Distillation now supports Phi-Standard normalization.

C-RADIOv3 is integrated into TAO for downstream finetuning on the tasks that supported C-RADIOv2: classification_pyt, rtdetr, segformer, and visual_changenet.

Backbone distillation now supports backbones from the following downstream tasks: classification_pyt, rtdetr, dino, mask2former, segformer, visual_changenet, mask_grounding_dino, grounding_dino, and mal.

EfficientViT is supported as a teacher backbone in RT-DETR.

TAO APIs now support two new deployment modes: Helm chart deployment and Docker Compose deployment, both with support for air-gapped deployments.
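
TAO Quant's FP8/INT8 support builds on the general idea of mapping floating-point tensors onto a low-precision grid through a scale factor. The sketch below shows generic symmetric, per-tensor INT8 quantization with max-abs calibration on a plain Python list; the function names are hypothetical, it is not the TAO Quant API, and TAO Quant may use different calibration methods, per-channel scales, or FP8 formats.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor INT8 quantization with max-abs calibration."""
    amax = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor: amax maps to 127 and zero maps to 0.
    scale = amax / 127.0 if amax > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric range [-127, 127].
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map INT8 codes back to approximate floating-point values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.02, -0.75, 0.4, 1.5]
    codes, scale = quantize_int8(weights)
    print(codes, scale)
    print(dequantize_int8(codes, scale))
```

With round-to-nearest, each dequantized value differs from the original by at most half a quantization step (scale / 2), which is why outliers that inflate amax cost precision for every other value in the tensor.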

Bugfixes

Fixed a bug in rtdetr where the DeepStream labels config was generated with incorrect delimiters between classes

Fixed a bug in auto_label where multi-GPU auto-labeling with GroundingDINO failed intermittently due to race conditions

Fixed a bug in augmentation where the COCO coordinate representation was incorrectly parsed

Deprecations

All networks from TensorFlow 2.x are deprecated and removed from the TAO 6.25.09 package and will be removed in a future release. Affected networks include EfficientDet (TF2) and Image Classification (TF2).

TAO API support for TensorFlow 2.x models has been removed from this release.

Compute Stack

PyTorch 2.1.0 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.25.09-pyt
Software versions: Python 3.12, CUDA 12.8, cuDNN 9.7.0, TensorRT 10.8.0

Data Services Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.25.09-dataservices
Software versions: Python 3.12, CUDA 12.8, cuDNN 9.7.0, TensorRT 10.8.0

Deploy Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.25.09-deploy
Software versions: Python 3.12, CUDA 12.8, cuDNN 9.7.0, TensorRT 10.8.0

Known Issues and Limitations

Models quantized using TAO Quant may not be compatible with TensorRT ONNX export and deployment. Please refer to the TAO Quant documentation for more details.
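
The 6.25.09 augmentation fix concerns how COCO box coordinates are parsed. COCO annotations store a box as [x, y, width, height], with (x, y) the top-left corner, while KITTI labels store the left, top, right, and bottom pixel coordinates. The hypothetical helpers below convert between the two conventions; they only illustrate the formats and are not TAO's augmentation or dataset-conversion code, and the notes do not say what the original parsing error was.

```python
Box = tuple[float, float, float, float]


def coco_to_kitti(bbox: Box) -> Box:
    """COCO [x, y, width, height] -> KITTI (left, top, right, bottom)."""
    x, y, w, h = bbox
    if w < 0 or h < 0:
        raise ValueError(f"negative width or height in COCO box: {bbox}")
    return (x, y, x + w, y + h)


def kitti_to_coco(box: Box) -> Box:
    """KITTI (left, top, right, bottom) -> COCO [x, y, width, height]."""
    left, top, right, bottom = box
    if right < left or bottom < top:
        raise ValueError(f"malformed KITTI box: {box}")
    return (left, top, right - left, bottom - top)
```

Reading a COCO box as if it were already in (left, top, right, bottom) form would, for example, place the right edge of [10, 20, 30, 40] at x = 30 instead of x = 40, which is the class of mistake these helpers make explicit.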

TAO 6.25.7

Key Features

Multi-camera 3D object detection and tracking with Sparse4D

Pretrained models

Multi-camera 3D object detection and tracking with Sparse4D

State-of-the-art depth estimation models: NvDepthAnythingv2 and FoundationStereo

Compute Stack

PyTorch 2.1.0 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.25.7-pyt
Software versions: Python 3.12, CUDA 12.8, cuDNN 9.7.0, TensorRT 10.8.0

Data Services Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.25.7-dataservice
Software versions: Python 3.12, CUDA 12.8, cuDNN 9.7.0, TensorRT 10.8.0

TAO 6.0.0

Key Features

TAO Toolkit deployment as a Finetuning Microservice

New commercially viable foundation models

Multi-node support for training all networks in TAO via Fine-Tuning Micro-Services (FTMS)

Self-supervised training for vision backbones: NvDINOv2 for ViTs, and Masked Autoencoder (MAE) for CNNs and ViTs

Real-time DETR-based object detection model: RT-DETR

New knowledge distillation paradigms, as described in Knowledge Distillation: backbone distillation with logits and summary feature distillation, and IoU-aware single-stage feature distillation for object detection models

Synthetic Image Generation using StyleGAN-XL

Visual ChangeNet with multiple golden images as input for classification

Foundation model finetuning for object detection, semantic segmentation, Visual ChangeNet, and image classification

Pretrained models

C-RADIOv2: ViT-B, ViT-L, ViT-H

ConvNextv2

TrafficCamNet Transformer - Lite

Known Issues and Limitations

MAE is not supported for TensorRT inference and evaluate via tao-deploy

StyleGAN-XL is not supported for TensorRT inference and evaluate via tao-deploy

Grounding DINO and Mask Grounding DINO finetuning requires at least 16GB of RAM

Foundation model finetuning requires GPUs with at least 24GB VRAM.

Knowledge distillation is currently limited to object detection and backbone distillation: classification_pyt for backbone distillation, and rtdetr and dino for object detection

Mask Grounding DINO deploy can only run TensorRT inference via tao-deploy with a batch-size of 1

BEVFusion is not supported for TensorRT deployment with 5.5.0

Deprecations

All networks from TensorFlow 1.x are deprecated and removed from the TAO 6.0.0 package

TAO Converter is deprecated and removed from the TAO 6.0.0 package

Bring Your Own Model (BYOM) is deprecated and removed from the TAO 6.0.0 package

Compute Stack

PyTorch 2.1.0 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.0.0-pyt
Software versions: Python 3.12, CUDA 12.8, cuDNN 9.7.0, TensorRT 10.8.0

Deploy Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.0.0-deploy
Software versions: Python 3.12, CUDA 12.8, cuDNN 9.7.0, TensorRT 10.8.0

Data Services Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.0.0-dataservice
Software versions: Python 3.12, CUDA 12.8, cuDNN 9.7.0, TensorRT 10.8.0

TensorFlow 2.15.0 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 6.0.0-tf2
Software versions: Python 3.12, CUDA 12.8, cuDNN 9.7.0, TensorRT 10.8.0
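
TAO 6.0.0 introduces backbone distillation with logit and summary feature distillation. For readers new to the idea, the sketch below shows the classic logit-distillation loss of Hinton et al.: the KL divergence between temperature-softened teacher and student class distributions, scaled by the squared temperature. It is a per-sample, pure-Python illustration under those assumptions; the function names and default temperature are hypothetical, the notes do not describe TAO's actual loss terms or weighting, and summary feature distillation is not shown.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation_loss(student_logits: list[float], teacher_logits: list[float],
                            temperature: float = 4.0) -> float:
    """T^2 * KL(teacher || student) between temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    student = [1.5, 1.2, 0.3]
    print(logit_distillation_loss(student, teacher))
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative scores for the non-target classes rather than only its top prediction.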

TAO 5.5.0

Key Features

Open vocabulary object detection model (GroundingDINO)

Open vocabulary object detection model (Mask GroundingDINO)

Knowledge distillation for DINO object detection model

Multicamera and LIDAR early-fusion using BEVFusion

Semantic, Instance, and Panoptic Image Segmentation with Mask2Former

Interactive demo to run SEGIC (SEGmentation In Context)

Sample application to generate pose points for any object using the FoundationPose model

Pretrained Models

Purpose-built models: commercially usable Grounding DINO; TAO BEVFusion trained on synthetic data (TAO Synthetic BEVFusion); FoundationPose, a foundation model that returns the pose points of an object; commercially usable Mask GroundingDINO for segmentation; research-only Mask GroundingDINO finetuned on COCO; and NVCLIP, a commercial CLIP model

Known Issues and Limitations

Grounding DINO and Mask Grounding DINO finetuning requires at least 16GB of RAM

Foundation model finetuning requires GPUs with at least 24GB VRAM.

Knowledge distillation is currently limited to Object Detection

Mask Grounding DINO deploy can only run TensorRT inference via tao-deploy with a batch-size of 1

BEVFusion is not supported for TensorRT deployment with 5.5.0

FoundationPose doesn’t support finetuning via TAO

Breaking changes

TF1 networks are deprecated in the TAO API as of TAO 5.0

Several new changes in the TAO API are summarized in this migration guide.

Compute Stack

PyTorch 2.1.0 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.5.0-pyt
Software versions: Python 3.10, CUDA 12.4, cuDNN 9.1.0, TensorRT 8.6.3.1

Deploy Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.5.0-deploy
Software versions: Python 3.10, CUDA 12.4, cuDNN 9.1.0, TensorRT 8.6.3.1

Data Services Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.5.0-dataservice
Software versions: Python 3.10, CUDA 12.3, cuDNN 8.9.7, TensorRT 8.6.3.1

TensorFlow 2.15.0 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.5.0-tf2
Software versions: Python 3.10, CUDA 12.4, cuDNN 9.1.0, TensorRT 8.6.3.1

TAO 5.3.0

Key Features

Multiclass CenterPose model for 3D bounding-box detection

Integration of the foundation model (NvDINOv2) backbone into Visual ChangeNet

Migration of classification_pyt and segformer to PyTorch 2.1.0, collapsing all PyTorch networks into a single container

Pretrained Models

Purpose-built models: Multiclass CenterPose; Visual ChangeNet Classification with NvDINOv2 backbone; Visual ChangeNet Segmentation with NvDINOv2 backbone - LandSat-SCD; Visual ChangeNet Segmentation with NvDINOv2 backbone - LEVIR-CD; Retail object recognition head with FAN-S model

Known Issues and Limitations

Visual ChangeNet and foundation model finetuning are not supported via the TAO API

Foundation model finetuning requires GPUs with at least 24GB VRAM.

Breaking changes

Several new changes in the TAO API are summarized in this migration guide.

Compute Stack

PyTorch 2.1.0 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.3.0-pyt
Software versions: Python 3.10, CUDA 12.3, cuDNN 8.9.7, TensorRT 8.6.1.6

Deploy Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.3.0-deploy
Software versions: Python 3.10, CUDA 12.3, cuDNN 8.9.7, TensorRT 8.6.1.6

Data Services Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.3.0-dataservice
Software versions: Python 3.10, CUDA 12.3, cuDNN 8.9.7, TensorRT 8.6.1.6

TAO 5.2.0

Key Features

New computer vision solutions: an end-to-end training pipeline for the CenterPose model; a ViT Adapter implementation to integrate ViT backbones with DINO; finetuning of DINO object detection models with ViT backbones and NvDINOv2 foundation models; and finetuning and inference support for Open Vocabulary Image Segmentation as a developer preview feature on GitHub

TAO API: a nightly crawler that dynamically updates the list of TAO-compatible models on NGC; AutoML-enabled hyperparameter search for list-based parameters; foundation model finetuning for classification_pyt; and AutoML for Visual ChangeNet and CenterPose

Miscellaneous: a progress bar that shows docker pull status in the launcher

Pretrained Models

Purpose-built models: CenterPose, ODISE

Known Issues and Limitations

Visual ChangeNet and foundation model finetuning are not supported via the TAO API

Foundation model finetuning requires GPUs with at least 24GB VRAM.

DetectNet_v2 export via --onnx_route keras2onnx shows a 16x16 offset in visualized predictions.

FasterRCNN TensorRT engine generation reports a false-positive failure. Engine generation actually succeeds, with no regressions in performance or accuracy. The log looks like this:

[06/23/2023-13:19:40] [TRT] [F] Validation failed: libNamespace == nullptr
/workspace/trt_oss_src/TensorRT/plugin/proposalPlugin/proposalPlugin.cpp:528
[06/23/2023-13:19:40] [TRT] [E] std::exception
[06/23/2023-13:19:40] [TRT] [I] Successfully created plugin: ProposalDynamic
[06/23/2023-13:19:40] [TRT] [F] Validation failed: libNamespace == nullptr

OCRNet-ViT requires TensorRT 8.6 or above to reach the best accuracy. With TensorRT 8.5, export OCRNet-ViT with an opset version below 17; FP32 precision is recommended.

Breaking changes

Starting with TAO 5.2.0, new features for the TensorFlow backends are released only as source code on GitHub. NVIDIA recommends building the container from source to get the latest features and bugfixes.

Starting with TAO 5.0.0, the UNet ONNX model output is argmax_1/output instead of softmax_1.

Compute Stack

PyTorch 1.14.0 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.2.0-pyt1.14.0
Software versions: Python 3.8, CUDA 12.0, cuDNN 8.6.0, TensorRT 8.5.3.1

PyTorch 2.1.0 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.2.0-pyt2.1.0
Software versions: Python 3.10, CUDA 12.2, cuDNN 8.9.5, TensorRT 8.6.1

Deploy Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.2.0-deploy
Software versions: Python 3.10, CUDA 12.2, cuDNN 8.9.5, TensorRT 8.6.1

Data Services Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.2.0-dataservice
Software versions: Python 3.10, CUDA 12.2, cuDNN 8.9.5, TensorRT 8.6.1

TAO 5.1.0

Key Features

New computer vision solutions: an end-to-end training pipeline for Visual ChangeNet classification and segmentation, and finetuning for the following foundation image model backbones for classification: OpenCLIP and EvaCLIP.

Note: Refer to the Foundation Models section for model details.

Pretrained Models

Purpose-built models: Visual ChangeNet Classification; Visual ChangeNet Segmentation - LEVIRCD (research only); Visual ChangeNet Segmentation - LandSat-SCD

Known Issues and Limitations

Visual ChangeNet and foundation model finetuning are not supported via the TAO API

Foundation model finetuning requires GPUs with at least 24GB VRAM.

DetectNet_v2 export via --onnx_route keras2onnx shows a 16x16 offset in visualized predictions.

The DetectNet_v2 inferencer cannot set dbscan_min_samples > 1.

Breaking changes

The DetectNet_v2 inferencer configuration parameter dbscan_min_samples can only be set to an integer, as opposed to a float32 in TAO 4.0.x.

Compute Stack

PyTorch Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.1.0-pyt
Software versions: Python 3.8, CUDA 12.0, cuDNN 8.6.0, TensorRT 8.5.3.1

Deploy Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.1.0-deploy
Software versions: Python 3.8, CUDA 12.0, cuDNN 8.6.0, TensorRT 8.5.3.1

TAO 5.0.0

Key Features

New computer vision solutions: a custom Siamese network training pipeline for Optical Inspection with TensorBoard visualization; an end-to-end training pipeline for Metric Learning Recognition; Image Classification in TAO PyTorch with FAN and GCViT backbones; a new object detection architecture, DINO, with FAN, GCViT, and ResNet backbones; FAN-based backbones for SegFormer training; Deformable DETR with GCViT backbones; a training pipeline for the Mask Auto Labeller network; and an end-to-end TAO workflow pipeline for optical character detection and optical character recognition from a document.

New tools to enhance your datasets: generate segmentation masks for user datasets using the Mask Auto Labeller; multi-GPU offline dataset augmentation for object detection use cases; tools to visualize, inspect, validate, and correct annotations for object detection datasets; and a format converter between COCO and KITTI object detection datasets.

Launcher CLI: a new task_group hierarchy to help segregate task actions into model, dataset, and deploy

Pipeline features: export to deserialized ONNX models for direct integration with TensorRT (except MaskRCNN), and decrypted checkpoint serialization across all networks

RESTful APIs and cloud deployment: more networks added to the AutoML workflow, and quick start support extended to new Kubernetes (K8s) cloud service providers (CSPs): Google Cloud GKE and Microsoft Azure AKS

Source code is now available for all TAO components on GitHub. For more information, refer to the TAO Source Code section.

Pre-Trained Models

Purpose-built models: PeopleSemSegFormer, PCB Classification, OCDNet, OCRNet, Retail Object Detection, Retail Object Recognition, Optical Inspection

Pre-trained starter weights:

Classification: Pretrained GCViT NvImageNet, Pretrained FAN NvImageNet, Pretrained GCViT ImageNet, Pretrained FAN ImageNet

Object Detection: Pretrained DINO NVImageNet, Pretrained DINO ImageNet, Pretrained Deformable-DETR NVImageNet, Pretrained EfficientNet NVImageNet, EfficientDet COCO, Deformable-DETR COCO, DINO COCO

Segmentation: Pretrained SegFormer NVImageNet, Pretrained SegFormer ImageNet, Mask Auto Label, CityScapes Segformer

Deprecated Features

All TAO Conversational AI integrations are deprecated as of TAO version 5.0.0.

The ability to use tao-converter to generate TensorRT engines from .etlt files has been deprecated. All networks support direct integration with TensorRT and the trtexec sample. For more information, refer to the Profiling with TensorRT section.

The following computer vision training pipelines have been deprecated: Gaze Estimation, Emotion Classification, Heart-rate Estimation, and Gesture Recognition.
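
With tao-converter deprecated, engines for exported ONNX models are built with TensorRT's trtexec tool. The sketch below only assembles a trtexec command line from its standard --onnx, --saveEngine, and --fp16 options; the helper names and file names are placeholders, and some TAO networks may additionally need input-shape options (such as --minShapes, --optShapes, and --maxShapes for dynamic inputs) or TensorRT plugins, so check the per-network deployment documentation before relying on it.

```python
import shlex
import subprocess


def trtexec_command(onnx_path: str, engine_path: str, fp16: bool = False) -> list[str]:
    """Assemble a trtexec invocation that builds and saves a TensorRT engine."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow FP16 kernels in addition to FP32
    return cmd


def build_engine(onnx_path: str, engine_path: str, fp16: bool = False) -> None:
    """Run trtexec; raises CalledProcessError if the engine build fails."""
    subprocess.run(trtexec_command(onnx_path, engine_path, fp16), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so no TensorRT install is needed.
    print(shlex.join(trtexec_command("model.onnx", "model.engine", fp16=True)))
```

Keeping command assembly separate from execution makes it easy to log or review the exact trtexec invocation before build_engine runs it on a machine with TensorRT installed.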

Breaking changes

All PyTorch and TensorFlow 2 networks have a re-architected specification file based on the concept of an experiment specification

Common parameters have been renamed across all networks for configuration uniformity

SegFormer models from TAO version 4.0.0 cannot be loaded in version 5.0.0. For version 5.0.0, use the new pretrained models.

Models exported from TAO 5.0.0 will not work with tao-converter for TensorRT engine generation. You can use the trtexec command line wrapper from TensorRT directly to generate TensorRT engines.

All previous tao <network> <subtask> command hierarchies are now tao model <network> <subtask>. Therefore, sample notebooks released as part of TAO 4.0.x will not work directly with TAO 5.0.0. For more information about the new CLI structure, read the migration guide from TAO 4.0.x to TAO 5.0.0.

The offline augmentation tool tao augment is now tao dataset augment, under the dataset task_group.

Bug Fixes

Fixes for errors in .etlt inference for DetectNet_v2

Fixes to improve stability of multi-GPU jobs for TensorFlow 1.x networks

Known Issues and Limitations

FAN-based networks exported from TAO as .onnx files require TensorRT version 8.6.x or later for deployment.

tao deploy for optical inspection model doesn’t support dynamic batching.

BodyPoseNet and FPENet are not integrated with tao deploy for TAO version 5.0.0.

DetectNet-v2 export to .onnx for a QAT INT8 model is only supported via the tf2onnx backend.

MIG training is currently only supported for single GPUs. For more information, refer to the Running training on Multi-GPU instance section.

All DNN containers require NVIDIA CUDA Driver version 525.85 and above to run.

Re-identification trainer doesn’t support multi-GPU training in 5.0.0

Compute Stack

TF 1.15.5 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.0.0-tf1.15.5
Software versions: Python 3.8, CUDA 12.0, cuDNN 8.6.0, TensorRT 8.5.3.1

TF 2.11.0 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.0.0-tf2.11.0
Software versions: Python 3.8, CUDA 12.0, cuDNN 8.6.0, TensorRT 8.5.3.1

PyTorch Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.0.0-pyt
Software versions: Python 3.8, CUDA 12.0, cuDNN 8.6.0, TensorRT 8.5.3.1

Deploy Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.0.0-deploy
Software versions: Python 3.8, CUDA 12.0, cuDNN 8.6.0, TensorRT 8.5.3.1

Data Services Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 5.0.0-dataservice
Software versions: Python 3.8, CUDA 12.0, cuDNN 8.6.0, TensorRT 8.5.3.1
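
Because TAO 5.0.0 moves network commands under tao model and offline augmentation under tao dataset augment, commands copied from TAO 4.0.x notebooks need a new prefix. The hypothetical helper below rewrites a single launcher command using only those two mappings from the breaking changes above; it assumes its input is either a network command or tao augment, and it does not handle other launcher-level commands or spec-file changes, which the migration guide covers.

```python
import shlex

# Task groups introduced by TAO 5.0.0.
_TASK_GROUPS = {"model", "dataset", "deploy"}


def migrate_tao_command(command: str) -> str:
    """Rewrite a TAO 4.0.x launcher command into the TAO 5.0.0 task_group form."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "tao":
        raise ValueError(f"not a tao command: {command!r}")
    rest = tokens[1:]
    if rest and rest[0] in _TASK_GROUPS:
        # Already in the TAO 5.0.0 form; leave it unchanged.
        return shlex.join(tokens)
    if rest and rest[0] == "augment":
        # Offline augmentation moved under the dataset task group.
        return shlex.join(["tao", "dataset", *rest])
    # Network commands moved under the model task group.
    return shlex.join(["tao", "model", *rest])
```

Splitting and re-joining with shlex means arguments containing spaces are re-quoted correctly rather than being broken apart by a naive string replacement.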

TAO 4.0.2

Incremental changes over 4.0.1.

Bug Fixes

TAO API: AutoML hanging; HTTPS proxy and custom SSL CA certificate support; inaccessible service on wireless interfaces; MLOps visualization for MaskRCNN and UNet



TAO 4.0.1

Key Features

Enabled third-party MLOps providers, ClearML and Weights and Biases, for the following networks: MaskRCNN and UNet

Compute Stack

TF 1.15.5 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 4.0.1-tf1.15.5
Software versions: Python 3.6, CUDA 11.8, cuDNN 8.6.0, TensorRT 8.5.1.7

Bug Fixes

YOLOv4 visualizer fails when running multi-GPU training

Fix model cancel and resume function names in tao-client

TAO API: Replace FLIR Google Drive links with public links.

Bare metal Quick Start Script: Fix GPU Operator deployment issues when host drivers are installed; disable ingress-nginx controller admissionWebhooks, as they fail on some systems; add support for MIG-based nodes; add support for overriding GPU Operator and driver versions.

Known Issues/Limitations

MLOps visualization for MaskRCNN and UNet is not available via the REST APIs

TAO 4.0.0

Key Features

AutoML suite via TAO API

Integration with third-party MLOps providers: ClearML and Weights and Biases

Support for Transformer-based deep neural network training and export: Segformer for semantic segmentation and Deformable DETR for object detection

Support for reidentification network

Segregation of DNN commands into training and deploy containers

Pruning and finetuning of NGram language models

Add support for AWS EKS and Azure AKS

Quick start scripts for easy deployment of TAO via the launcher and the APIs (Bare Metal, AWS EKS, Azure AKS)

Compute Stack

TF 1.15.5 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 4.0.0-tf1.15.5
Software versions: Python 3.6, CUDA 11.8, cuDNN 8.6.0, TensorRT 8.5.1.7

TF 2.9.1 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 4.0.0-tf2.9.1
Software versions: Python 3.8, CUDA 11.8, cuDNN 8.6.0, TensorRT 8.5.0.12

PyTorch Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 4.0.0-pyt
Software versions: Python 3.8, CUDA 11.8, cuDNN 8.6.0, TensorRT 8.5.0.12

Deploy Container
Container name: nvcr.io/nvidia/tao/tao-toolkit
Tag: 4.0.0-deploy
Software versions: Python 3.8, CUDA 11.8, cuDNN 8.6.0, TensorRT 8.5.1.7

Known Issues and Limitations

Wandb integration requires that containers be instantiated by the root user.

The NLP Question Answering task doesn’t support Megatron-based models for TAO workflows.

TAO 3.0-22.05

Key Features

Bring your own models into TAO using the TAO BYOM converter.

Deploy TAO as a service on a Kubernetes cluster, detailed in this section

Integrate TAO into your workflow using RestAPIs

TensorBoard visualization is available for select models, as detailed in this section.

Train object detection networks from a point cloud data file via PointPillars.

Train a classification network to classify poses from a pose skeleton via a Graph convolutional network.

Intermediate checkpointing is available for ASR and TTS models.

Support Conformer-CTC for ASR: train, finetune, evaluate, infer, and export.

Compute Stack

TF 1.15.4 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit-tf
Tag: v3.22.05-tf1.15.4-py3
Software versions: Python 3.6, CUDA 11.4, cuDNN 8.2.1.32, TensorRT 8.2.5.1

TF 1.15.5 Container
Container name: nvcr.io/nvidia/tao/tao-toolkit-tf
Tag: v3.22.05-tf1.15.5-py3
Software versions: Python 3.6, CUDA 11.6, cuDNN 8.2.1.32, TensorRT 8.2.5.1

PyTorch Container
Container name: nvcr.io/nvidia/tao/tao-toolkit-pyt
Tag: v3.22.05-py3
Software versions: Python 3.8, CUDA 11.5, cuDNN 8.2.1.32, TensorRT 8.2.5.1

Language Model Container
Container name: nvcr.io/nvidia/tao/tao-toolkit-lm
Tag: v3.22.05-py3
Software versions: Python 3.8, CUDA 11.5, cuDNN 8.2.1.32, TensorRT 8.2.5.1

Model Updates

Computer Vision

Image Classification: Add verification for custom classmap file input. Add classmap file input to train. Add classmap file as an optional input for evaluate. Add status callback and results_dir command-line argument for evaluate and inference. Support TensorBoard visualization for the train endpoint. Perform initial updates for BYOM custom layers. Add EFF package and model loading. Enable BYOM in image classification.

DetectNet_v2: Limit GPU memory usage during tao detectnet_v2 evaluate. Add native support to convert COCO datasets to TFRecords. Expose the sampling mode parameter in the spec file under dataset_config. Enable TensorBoard visualization. Add a configuration element for the visualizer in dataset_config. Fix success state for TFRecords generation. Add status logging to all tasks as long as the --results_dir argument is set via the command line.

UNet: Update the --gen_ds_config option during UNet export. Add the dataset_convert endpoint to UNet. Add support for converting COCO datasets to TFRecords. Support evaluation on a pruned model. Add graph collect for functions to improve memory consumption. Optimize ONNX for UNet inference. Fix bugs for re-training a pruned model. Add unified status_logging to UNet endpoints. Support custom layer pruning and direct evaluate from .tltb via BYOM. Enable Bring Your Own Model for UNet. Implement support for Quantization Aware Training (QAT). Add end-to-end support for ShuffleNet. Enable status logging during training via StatusCallBack. Improve the operation of the dataloader during training. Enable TensorBoard visualization during training. Add a warning for output_width. Enable support for training with early stopping.

BYOM: Enable custom layer pruning for Bring Your Own Model (BYOM).

Common features: Fix error handling in model_io. Support COCO TFRecord conversion for object detection and segmentation networks. Fix a typo in SoftStartAnnealingLearningRateScheduler. Implement a status-logging callback.

YOLOv4: Enable smoothing for object loss. Support exponential moving average (EMA). Fix the YOLOv4 neck and head structure. Configure NMS per data-loader configuration. Fix YOLOv3 and YOLOv4 shapes. Enable manually setting class weighting. Enable TensorBoard visualization.

MaskRCNN: Enable skip_crowd_during_training=False. Add an evaluation summary and patch exporter. Enable TensorBoard visualization.

EfficientDet: Fix a typo in the TRT inferencer.

SSD: Enable status logging for all endpoints when --results_dir is added to the command line. Enable support for training with early stopping.

DSSD: Enable status logging for all endpoints when --results_dir is added to the command line. Enable support for training with early stopping.

RetinaNet: Enable support for training with early stopping. Enable status logging for all endpoints when --results_dir is added to the command line. Fix a bug with resuming from a checkpoint via the sequence dataloader. Enable backward compatibility for a TLT 2.0 trained model. Enable TensorBoard visualization during training. Enable manually setting class weights.

FasterRCNN Enable status logging for all endpoints when --results_dir is added to the command line. Accept the model as a CLI argument for evaluation and inference in TAO API. Enable TensorBoard visualization during training.

Conversational AI# Generic Add status logging to TTS models, similar to TAO CV models. Fix an issue in QA model evaluation for Chinese SQuAD-style datasets. Fix a bug where create_tokenizer silently always used the old corpus. Update the backend to use NeMo 1.7.0.

TTS Remove the duration check for TTS datasets from the Riva Custom Voice Recorder. Fix the infer ONNX endpoint when running inference from a finetuned model. Fix error handling for the vocoder. Enable intermediate .tlt model checkpoints.

PointPillars Enable transfer learning with pretrained models. Use TensorRT OSS 22.02 from GitHub.

Action Recognition Update the metrics module.

ASR Support early stopping. Support finetuning on NeMo models. Enable intermediate .tlt model checkpoints.

Pretrained models# New models PointPillarNet PoseClassificationNet

Updated models PeopleNet PeopleSemSegNet PeopleSegNet LPDNet

Known Issues/Limitations# TAO DSSD/FasterRCNN/RetinaNet/YOLOv3/YOLOv4 may intermittently hit illegal memory access errors when running the export or converter CLI commands. The root cause is unknown. If this occurs, run the command again to resolve the issue.

The TAO BYOM Semantic Segmentation workflow is only supported with UNet and Image Classification.

TAO Image Classification networks require driver version 510 or later for training.

TAO as a Service doesn’t support user authentication and per-user workspace management.

TTS finetuning is only supported for data originating from the NVIDIA Custom Voice Recorder.

TAO 3.0-22.02# Key Features# Features included in this release: TAO Resources: Jupyter notebook example showing the end-to-end workflow for the following model: TTS finetuning

TAO Conversational AI: Support for finetuning FastPitch and HiFiGAN models from a pretrained model. Updated the FastPitch and HiFiGAN export and infer endpoints to interface with RIVA.

Known Issues/Limitations# TAO FastPitch finetuning is only supported on text transcripts that are defined in the NVIDIA Custom Voice Recorder.

The data from the NVIDIA Custom Voice Recorder can only be used for finetuning a FastPitch or HiFiGAN model.

For finetuning FastPitch, you are required to resample the new speaker data to the sampling rate of the dataset used to train the pretrained model.

TAO 3.0-21.11# Key Features# Features included in this release: TAO Resources: Jupyter notebook examples showing the end-to-end workflow for the following models: ActionRecognitionNet, EfficientDet, Text-To-Speech using FastPitch and HiFiGAN

TAO CV: Pretrained models for several public architectures and reference applications serving computer vision object classification, detection, and segmentation use cases. Support for YOLOv4-tiny and EfficientDet object detection models. Support for pruning EfficientDet models. New pretrained models released on NGC: PeopleNet version 2.5, ActionRecognitionNet. Converter utility to generate device-specific optimized TensorRT engines: Jetson JP4.6; x86 + dGPU - TensorRT 8.0.1.6 with CUDA 11.4

TAO Conversational AI: Support for training FastPitch and HiFiGAN models from scratch. New encoders for Natural Language Processing tasks: DistilBERT, BioMegatron-BERT

Known Issues/Limitations# TAO CV

Transfer learning is not supported on pruned models across all applications.

When training with multiple GPUs, you might need to scale down the batch_size and/or scale up the learning rate to get the same accuracy seen in single-GPU training.

When training DetectNet_v2 for object detection use cases with more than 10 classes, you may need to either update the cost_weight parameter in the cost_function_config or balance the number of samples per class in the dataset for better training.

When training a DetectNet_v2 network on datasets with fewer than 20,000 images, use smaller batch sizes (1, 2, or 4) to get better accuracy.

The infer subtask of DetectNet_v2 doesn't output confidence and generates 0 as the value. You may ignore these values and consider only the bbox and class labels as valid outputs.

ResNet101 pretrained weights from NGC are not supported on YOLOv3, YOLOv4, YOLOv4-tiny, SSD, DSSD, and RetinaNet.

When generating an INT8 engine with tao-converter, use -s if a TensorRT error message says that weights are outside of the FP16 range.

Due to the complexity of larger EfficientDet models, the pruning process takes significantly longer to finish. For example, pruning the EfficientDet-D5 model may take at least 25 minutes on a V100 server.

When generating a TensorRT INT8 engine on A100 GPUs using tao-converter for MaskRCNN, enable --strict_data_type.

Our EfficientDet codebase has source code taken from the automl GitHub repo.
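The multi-GPU training note in the known issues can be made concrete with a little arithmetic. The sketch below assumes that batch_size is specified per GPU, so the global batch is batch_size multiplied by the number of GPUs, and it uses the commonly cited linear learning-rate scaling heuristic. The helper names are hypothetical and the scaling rule is a general rule of thumb, not a documented TAO behavior; actual values may need tuning.

```python
# Hypothetical helpers (not part of TAO). Assumption: batch_size is per GPU,
# so the effective (global) batch grows linearly with the number of GPUs.


def effective_batch_size(per_gpu_batch: int, num_gpus: int) -> int:
    """Global batch seen by one optimizer step across all GPUs."""
    return per_gpu_batch * num_gpus


def per_gpu_batch_for_parity(single_gpu_batch: int, num_gpus: int) -> int:
    """Option 1: scale down the per-GPU batch so the global batch
    matches what single-GPU training used."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if single_gpu_batch % num_gpus != 0:
        raise ValueError("single_gpu_batch must be divisible by num_gpus")
    return single_gpu_batch // num_gpus


def linearly_scaled_lr(single_gpu_lr: float, num_gpus: int) -> float:
    """Option 2: keep the per-GPU batch and grow the learning rate in
    proportion to the global batch (linear scaling heuristic)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return single_gpu_lr * num_gpus


if __name__ == "__main__":
    # Single-GPU baseline: batch 16, learning rate 1e-4; moving to 4 GPUs.
    print(effective_batch_size(16, 4))      # global batch if batch_size is unchanged
    print(per_gpu_batch_for_parity(16, 4))  # per-GPU batch that keeps the global batch at 16
    print(linearly_scaled_lr(1e-4, 4))      # learning rate scaled for the 4x larger global batch
```

Reducing the per-GPU batch keeps the optimization problem closest to the single-GPU run, while scaling the learning rate keeps GPU utilization high; large learning-rate increases often also need a warmup period.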

TAO Conversational AI

When running Conversational AI models on a cloud VM, users should have root access to the VM.

Text-To-Speech pipelines only support training from scratch for a single speaker.

The Text-To-Speech training pipeline requires the audio files to be in .wav format.

.riva files exported with TAO 3.0-21.11 are not supported in RIVA versions earlier than 21.09.

BioMegatron-BERT and Megatron-based NLP tasks don't support resuming a previously completed model with more epochs than the previously completed experiment.

When running the end-to-end Text-To-Speech sample, you may have to expand abbreviations.

Resolved Issues# TAO CV: .etlt model files exported from YOLOv4, YOLOv3, UNet, and LPRNet can be integrated directly into DeepStream 6.0.

TAO Conversational AI: The ASR model supports generating intermediate .tlt model files during training.

Deprecated Features# The TAO Computer Vision Inference Pipeline is deprecated. Users can now use DeepStream to deploy the following out-of-the-box models via reference applications provided here: HeartRateNet GestureNet EmotionNet FpeNet FaceDetect GazeNet BodyPoseNet

Release Contents# Components included in this release: TAO Launcher pip package

TAO - TF Docker

TAO - PyTorch Docker

TAO - Language Model Docker

Jupyter notebooks with sample workflows: Conversational AI, Computer Vision

Getting Started Guide containing usage and installation instructions

tao-converter for x86 + discrete GPU platforms

tao-converter for Jetson (ARM64) available here.

Pretrained weights trained on the Open Images dataset, available on NGC: Classification, Object Detection, Object Detection - DetectNet_v2, Instance Segmentation, Semantic Segmentation

Unpruned and pruned versions of purpose-built models: pruned models can be deployed out of the box with DeepStream, and unpruned models can be used for retraining. PeopleNet TrafficCamNet DashCamNet FaceDetectIR VehicleTypeNet VehicleMakeNet LPDNet

Trainable and out-of-the-box deployable models for: PeopleSegNet HeartRateNet GestureNet EmotionNet FpeNet FaceDetect GazeNet LPRNet BodyPoseNet

