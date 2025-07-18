Release Notes
NVIDIA TAO is a Python package that lets you fine-tune pretrained models with your own data and export them for TensorRT-based inference on edge devices.
NVIDIA Transfer Learning Toolkit has been renamed to TAO. For a detailed migration guide, refer to this page.
Key Features
Open vocabulary object detection model (GroundingDINO)
Open vocabulary object detection model (Mask GroundingDINO)
Knowledge distillation for DINO object detection model
Multicamera and LIDAR early-fusion using BEVFusion
Semantic, Instance, and Panoptic Image Segmentation with Mask2Former
Interactive demo to run SEGIC (SEGmentation In Context)
Sample application to generate pose points for any object using the FoundationPose model
Pretrained Models
Purpose-built models
Commercially usable Grounding DINO
TAO BEVFusion using synthetic data
TAO Synthetic BEVFusion
FoundationPose - Foundation model to return pose points of an object
Commercially usable Mask GroundingDINO for segmentation
Research-only Mask GroundingDINO finetuned on COCO
NVCLIP - Commercial CLIP model
-
Known Issues and Limitations
Grounding DINO and Mask Grounding DINO finetuning requires at least 16GB of RAM
Foundation model finetuning requires GPUs with at least 24GB VRAM.
Knowledge distillation is currently limited to Object Detection
Mask Grounding DINO deployment via tao-deploy supports TensorRT inference only with a batch size of 1
BEVFusion is not supported for TensorRT deployment in TAO 5.5.0
FoundationPose doesn’t support finetuning via TAO
Breaking changes
Several changes to the TAO API are summarized in this migration guide.
Compute Stack
PyTorch 2.1.0 Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.5.0-pyt
|Software|Version|
|Python|3.10|
|CUDA|12.4|
|CuDNN|9.1.0|
|TensorRT|8.6.3.1|
Deploy Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.5.0-deploy
|Software|Version|
|Python|3.10|
|CUDA|12.4|
|CuDNN|9.1.0|
|TensorRT|8.6.3.1|
Data Services Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.5.0-dataservice
|Software|Version|
|Python|3.10|
|CUDA|12.3|
|CuDNN|8.9.7|
|TensorRT|8.6.3.1|
TensorFlow 2.15.0 Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.5.0-tf2
|Software|Version|
|Python|3.10|
|CUDA|12.4|
|CuDNN|9.1.0|
|TensorRT|8.6.3.1|
Key Features
Multiclass Centerpose model for 3D bbox detection
Integration of foundation model (NvDINOv2) backbone to visual changenet
Migration of classification_pyt and segformer to PyTorch 2.1.0, consolidating all PyTorch networks into a single container
Pretrained Models
Purpose-built models
Multiclass CenterPose
Visual ChangeNet Classification with NvDINOv2 backbone
Visual ChangeNet Segmentation with NvDINOv2 backbone - LandSat-SCD
Visual ChangeNet Segmentation with NvDINOv2 backbone - LEVIR-CD
Retail object recognition head with FAN-S model
-
Known Issues and Limitations
Visual ChangeNet and foundation model finetuning are not supported via the TAO API
Foundation model finetuning requires GPUs with at least 24GB VRAM.
Breaking changes
Several changes to the TAO API are summarized in this migration guide.
Compute Stack
PyTorch 2.1.0 Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.3.0-pyt
|Software|Version|
|Python|3.10|
|CUDA|12.3|
|CuDNN|8.9.7|
|TensorRT|8.6.1.6|
Deploy Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.3.0-deploy
|Software|Version|
|Python|3.10|
|CUDA|12.3|
|CuDNN|8.9.7|
|TensorRT|8.6.1.6|
Data Services Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.3.0-dataservice
|Software|Version|
|Python|3.10|
|CUDA|12.3|
|CuDNN|8.9.7|
|TensorRT|8.6.1.6|
Key Features
New computer vision solutions
End-to-end training pipeline for CenterPose model
ViT Adaptor implementation to integrate ViT backbone with DINO
Finetuning DINO Object detection models with ViT backbones and NvDINOv2 foundation models
Finetuning and inference support for Open Vocabulary Image Segmentation as a developer preview feature on GitHub
-
TAO API
Nightly crawler to update the list of TAO-compatible models on NGC dynamically
AutoML enabled hyperparameter search for list based parameters
Foundation model finetuning supported for classification_pyt
AutoML enabled for visual changenet
AutoML enabled for CenterPose
-
Miscellaneous
Progress bar to show docker pull status via the launcher
-
Pretrained Models
Purpose-built models
CenterPose
ODISE
-
Known Issues and Limitations
Visual ChangeNet and foundation model finetuning are not supported via the TAO API
Foundation model finetuning requires GPUs with at least 24GB VRAM.
DetectNet_v2 export via --onnx_route keras2onnx shows a 16x16 offset in visualized predictions.
FasterRCNN TensorRT engine generation logs a false-positive failure message, but engine generation does not actually fail and there is no regression in performance or accuracy. The log looks like this:
[06/23/2023-13:19:40] [TRT] [F] Validation failed: libNamespace == nullptr
/workspace/trt_oss_src/TensorRT/plugin/proposalPlugin/proposalPlugin.cpp:528
[06/23/2023-13:19:40] [TRT] [E] std::exception
[06/23/2023-13:19:40] [TRT] [I] Successfully created plugin: ProposalDynamic
[06/23/2023-13:19:40] [TRT] [F] Validation failed: libNamespace == nullptr
OCRNet-ViT requires TensorRT 8.6 or later to reach the best accuracy. With TensorRT 8.5, export OCRNet-ViT with an opset version below 17; FP32 precision is recommended.
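The OCRNet-ViT guidance reduces to a simple rule on the TensorRT version used for deployment. The following is a minimal sketch, not part of TAO: the helper name and the returned keys are hypothetical, None means these notes state no constraint for that version, and TensorRT versions older than 8.5 are assumed to follow the 8.5 rule.

```python
def ocrnet_vit_export_settings(trt_major: int, trt_minor: int) -> dict:
    """Hypothetical helper: OCRNet-ViT export constraints by TensorRT version.

    - TensorRT 8.6 or later: the notes state no extra constraint (None).
    - TensorRT 8.5 (assumed: or older): opset below 17, FP32 precision.
    """
    if (trt_major, trt_minor) >= (8, 6):
        return {"max_opset": None, "precision": None}
    # "Opset below 17" means 16 is the highest allowed value.
    return {"max_opset": 16, "precision": "fp32"}
```

For example, ocrnet_vit_export_settings(8, 5) yields a maximum opset of 16 with FP32 precision.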
Breaking changes
From TAO 5.2.0, new features for the TensorFlow backends are released only as source code on GitHub. NVIDIA recommends building the container from source to get the latest features and bug fixes.
From TAO 5.0.0, the UNet ONNX model output is argmax_1/output instead of softmax_1.
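Inference code that fetched the UNet output by the name softmax_1 has to look for argmax_1/output instead. A minimal sketch, assuming the TAO version string has the form X.Y.Z and that a softmax_1 tensor holds per-pixel class probabilities laid out as [H][W][C] (a layout assumption, not confirmed by these notes). Both helper names are hypothetical.

```python
def unet_output_name(tao_version: str) -> str:
    """Expected UNet ONNX output name for a model exported by this TAO version."""
    major = int(tao_version.split(".")[0])  # assumes an "X.Y.Z" version string
    return "argmax_1/output" if major >= 5 else "softmax_1"


def to_class_map(output, output_name: str):
    """Normalize UNet output to a per-pixel class-index map (nested lists).

    Assumption: softmax_1 is shaped [H][W][C] (class probabilities per pixel),
    while argmax_1/output is already shaped [H][W] (class index per pixel).
    """
    if output_name == "softmax_1":
        # Pick the index of the highest-probability class for every pixel.
        return [[max(range(len(px)), key=px.__getitem__) for px in row]
                for row in output]
    return output
```

Normalizing to a class-index map lets the rest of the post-processing stay identical for models exported before and after TAO 5.0.0.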
Compute Stack
PyTorch 1.14.0 Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.2.0-pyt1.14.0
|Software|Version|
|Python|3.8|
|CUDA|12.0|
|CuDNN|8.6.0|
|TensorRT|8.5.3.1|
PyTorch 2.1.0 Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.2.0-pyt2.1.0
|Software|Version|
|Python|3.10|
|CUDA|12.2|
|CuDNN|8.9.5|
|TensorRT|8.6.1|
Deploy Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.2.0-deploy
|Software|Version|
|Python|3.10|
|CUDA|12.2|
|CuDNN|8.9.5|
|TensorRT|8.6.1|
Data Services Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.2.0-dataservice
|Software|Version|
|Python|3.10|
|CUDA|12.2|
|CuDNN|8.9.5|
|TensorRT|8.6.1|
Key Features
New computer vision solutions
End-to-end training pipeline for Visual ChangeNet classification and segmentation
Finetuning for the following foundation image model backbones for classification:
OpenCLIP
EvaCLIP
Note: Refer to the Foundation Models section for model details.
-
-
Pretrained Models
Purpose-built models
Visual Changenet Classification
Visual ChangeNet Segmentation - LEVIR-CD (research only)
Visual Changenet Segmentation - LandSat-SCD
-
Known Issues and Limitations
Visual ChangeNet and foundation model finetuning are not supported via the TAO API
Foundation model finetuning requires GPUs with at least 24GB VRAM.
DetectNet_v2 export via --onnx_route keras2onnx shows a 16x16 offset in visualized predictions.
The DetectNet_v2 inferencer cannot set dbscan_min_samples > 1.
Breaking changes
The DetectNet_v2 inferencer configuration parameter dbscan_min_samples can only be set to an integer, rather than a float32 as in TAO 4.0.x.
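A pre-flight check for the dbscan_min_samples change and the related inferencer limitation can be sketched as below. The function is hypothetical and not part of TAO: it converts an old whole-number float such as 1.0 to an integer, and rejects fractional values and values greater than 1.

```python
def validate_dbscan_min_samples(value) -> int:
    """Hypothetical check for the DetectNet_v2 inferencer's dbscan_min_samples.

    The parameter must be an integer (TAO 4.0.x accepted float32), and the
    inferencer cannot currently use values greater than 1.
    """
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("dbscan_min_samples must be an integer, not a bool")
    if isinstance(value, float):
        if not value.is_integer():
            raise ValueError(f"dbscan_min_samples must be a whole number, got {value}")
        value = int(value)  # migrate an old float-style value such as 1.0
    if not isinstance(value, int):
        raise TypeError(f"dbscan_min_samples must be an integer, got {type(value).__name__}")
    if value > 1:
        raise ValueError("the DetectNet_v2 inferencer cannot set dbscan_min_samples > 1")
    return value
```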
Compute Stack
PyTorch Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.1.0-pyt
|Software|Version|
|Python|3.8|
|CUDA|12.0|
|CuDNN|8.6.0|
|TensorRT|8.5.3.1|
Deploy Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.1.0-deploy
|Software|Version|
|Python|3.8|
|CUDA|12.0|
|CuDNN|8.6.0|
|TensorRT|8.5.3.1|
Key Features
New computer vision solutions
Custom Siamese network training pipeline for Optical Inspection with TensorBoard visualization
End-to-end training pipeline for Metric Learning Recognition
Image Classification in TAO PyTorch with FAN and GCViT backbones
New object detection architecture DINO with FAN, GCViT, and ResNet backbones
SegFormer training now supports FAN based backbones
Deformable DETR with GCViT backbones
Training pipeline for Mask Auto Labeller network
End-to-end TAO workflow pipeline for optical character detection and optical character recognition from a document
New tools to enhance your datasets:
Generate segmentation masks for user datasets using the Mask Auto Labeller
Multi-GPU offline dataset augmentation for object detection use cases
Tools to visualize, inspect, validate and correct annotations for object detection datasets
Format converter between COCO and KITTI Object detection datasets
-
-
Launcher CLI
-
New task_group hierarchy to help segregate task actions:
-
model
-
dataset
-
deploy
-
-
Pipeline features
Export to deserialize ONNX models for direct integration with TensorRT (except MaskRCNN)
Decrypted checkpoint serialization across all networks
-
RESTful APIs and Cloud deployment
More networks added to the AutoML workflow
Quick start support extended to the following new Kubernetes Cloud Service Providers (CSPs):
-
Google Cloud GKE
-
Microsoft Azure AKS
-
-
Source code is now available for all TAO components on GitHub. For more information, refer to the TAO Source Code section.
Pre-Trained Models
-
Purpose-built models
-
PeopleSemSegFormer
-
PCB Classification
-
OCDNet
-
OCRNet
-
Retail Object Detection
-
Retail Object Recognition
-
Optical Inspection
-
-
Pre-trained starter weights
-
Classification
-
Pretrained GCViT NvImageNet
-
Pretrained FAN NvImageNet
-
Pretrained GCViT ImageNet
-
Pretrained FAN ImageNet
-
-
Object Detection
-
Pretrained DINO NVImageNet
-
Pretrained DINO ImageNet
-
Pretrained Deformable-DETR NVImageNet
-
Pretrained EfficientNet NVImageNet
-
EfficientDet COCO
-
Deformable-DETR COCO
-
DINO COCO
-
-
Segmentation
-
Pretrained SegFormer NVImageNet
-
Pretrained SegFormer ImageNet
-
Mask Auto Label
-
CityScapes Segformer
-
-
Deprecated Features
All TAO Conversational AI integrations have been deprecated as of TAO 5.0.0.
The ability to use tao-converter to generate TensorRT engines from .etlt files has been deprecated. All networks support direct integration with TensorRT and the trtexec sample. For more information, refer to the Profiling with TensorRT section.
The following computer vision training pipelines have been deprecated:
-
Gaze Estimation
-
Emotion Classification
-
Heart-rate Estimation
-
Gesture Recognition
-
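With tao-converter deprecated, engines are built directly from the exported .onnx file with trtexec. The sketch below only assembles and runs a trtexec command line from Python; --onnx, --saveEngine, and --fp16 are standard trtexec options, but the file names are placeholders and network-specific options (input shapes, INT8 calibration) are deliberately left out. The helper names are hypothetical; the Profiling with TensorRT section is the reference.

```python
import shlex
import shutil
import subprocess


def trtexec_command(onnx_path: str, engine_path: str, fp16: bool = False) -> list[str]:
    """Build an argv list asking trtexec to serialize a TensorRT engine."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow FP16 kernels in addition to FP32
    return cmd


def build_engine(onnx_path: str, engine_path: str, fp16: bool = False) -> None:
    """Run trtexec (distributed with TensorRT) if it is on PATH."""
    if shutil.which("trtexec") is None:
        raise FileNotFoundError("trtexec not found on PATH")
    subprocess.run(trtexec_command(onnx_path, engine_path, fp16), check=True)


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(trtexec_command("model.onnx", "model.engine", fp16=True)))
```

Returning an argv list rather than a shell string avoids quoting problems with paths that contain spaces.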
Breaking changes
All PyTorch and TensorFlow 2 networks have a re-architected specification file built around the concept of an experiment specification
Common parameters have been renamed across all networks for configuration uniformity
SegFormer models from TAO version 4.0.0 cannot be loaded in version 5.0.0. For version 5.0.0, use the new pretrained models.
Models exported from TAO 5.0.0 will not work with tao-converter for TensorRT engine generation. You can use the trtexec command line wrapper from TensorRT directly to generate TensorRT engines.
All previous tao <network> <subtask> command hierarchies are now tao model <network> <subtask>. Therefore, sample notebooks released as part of TAO 4.0.x will not work directly with TAO 5.0.0. For more information about the new CLI structure, read the migration guide from TAO 4.0.x to TAO 5.0.0.
The offline augmentation tool tao augment is now tao dataset augment, under the dataset task_group.
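Updating older notebooks or scripts for the new CLI hierarchy is mostly mechanical: tao detectnet_v2 train ... becomes tao model detectnet_v2 train ..., and tao augment ... becomes tao dataset augment .... The sketch below is a hypothetical helper, not a TAO tool, and the migration guide remains the reference. It covers only these two renames, splits on whitespace (so quoted arguments are not preserved), and leaves commands that already start with a task_group unchanged.

```python
TASK_GROUPS = ("model", "dataset", "deploy")  # task_groups listed in these notes


def migrate_tao_command(old: str) -> str:
    """Rewrite a TAO 4.0.x command line into the TAO 5.0.0 task_group form.

      tao <network> <subtask> ...  ->  tao model <network> <subtask> ...
      tao augment ...              ->  tao dataset augment ...
    """
    parts = old.split()  # naive whitespace split; quoting is not handled
    if not parts or parts[0] != "tao":
        raise ValueError(f"not a tao command: {old!r}")
    if len(parts) < 2 or parts[1] in TASK_GROUPS:
        return old  # nothing to migrate, or already uses a task_group
    if parts[1] == "augment":
        return " ".join(["tao", "dataset"] + parts[1:])
    return " ".join(["tao", "model"] + parts[1:])
```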
Bug Fixes
Fixes for errors in .etlt inference for DetectNet_v2
Fixes to improve stability of MultiGPU jobs for TensorFlow 1.x networks
Known Issues and Limitations
Training on multi-GPU is currently limited to single-node instances via TAO API
FAN-based networks exported from TAO as .onnx files require TensorRT versions >= 8.6.x for deployment.
tao deploy for the optical inspection model doesn't support dynamic batching.
BodyPoseNet and FPENet are not integrated with tao deploy for TAO version 5.0.0.
DetectNet-v2 export to .onnx for a QAT INT8 model is only supported via the tf2onnx backend.
Multi-Node execution is only supported via the container execution model as explained in the Working with the Containers section.
MIG training is currently only supported for single GPUs. For more information, refer to the Running training on Multi-GPU instance section.
All DNN containers require NVIDIA CUDA Driver version 525.85 and above to run.
Re-identification trainer doesn’t support multi-GPU training in 5.0.0
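To check the driver requirement before pulling the containers, compare the installed driver version against 525.85. A minimal sketch with hypothetical helper names; the installed version string can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader.

```python
MIN_DRIVER = (525, 85)  # minimum driver stated for the TAO 5.0.0 DNN containers


def parse_driver_version(text: str) -> tuple[int, ...]:
    """Turn a driver string such as '525.85.12' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_supported(text: str, minimum: tuple[int, ...] = MIN_DRIVER) -> bool:
    """Compare only as many components as the minimum specifies.

    Tuple comparison is lexicographic, so 530.30 passes and 525.60 fails;
    a bare '525' is treated conservatively as older than 525.85.
    """
    version = parse_driver_version(text)
    return version[: len(minimum)] >= minimum
```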
Compute Stack
TF 1.15.5 Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.0.0-tf1.15.5
|Software|Version|
|Python|3.8|
|CUDA|12.0|
|CuDNN|8.6.0|
|TensorRT|8.5.3.1|
TF 2.11.0 Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.0.0-tf2.11.0
|Software|Version|
|Python|3.8|
|CUDA|12.0|
|CuDNN|8.6.0|
|TensorRT|8.5.3.1|
PyTorch Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.0.0-pyt
|Software|Version|
|Python|3.8|
|CUDA|12.0|
|CuDNN|8.6.0|
|TensorRT|8.5.3.1|
Deploy Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.0.0-deploy
|Software|Version|
|Python|3.8|
|CUDA|12.0|
|CuDNN|8.6.0|
|TensorRT|8.5.3.1|
Data Services Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.0.0-dataservice
|Software|Version|
|Python|3.8|
|CUDA|12.0|
|CuDNN|8.6.0|
|TensorRT|8.5.3.1|
Incremental changes over 4.0.1.
Bug Fixes
TAO API
TAO API AutoML hanging
TAO API support for HTTPS Proxy and Custom SSL CA Certificate
TAO API inaccessible service on wireless interfaces
TAO API MLOPs visualization for
MaskRCNN
UNet
-
-
Key Features
Enable third party MLOPs providers - ClearML and Weights and Biases for the following networks
MaskRCNN
UNet
-
Compute Stack
TF 1.15.5 Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 4.0.1-tf1.15.5
|Software|Version|
|Python|3.6|
|CUDA|11.8|
|CuDNN|8.6.0|
|TensorRT|8.5.1.7|
Bug Fixes
YOLOv4 visualizer fails when running multiGPU training
Fix model cancel and resume function names in tao-client
TAO API
Replace FLIR Google Drive links with public links
Bare metal Quick Start Script
Fix GPU Operator deployment issues when host drivers are installed
Disable ingress-nginx controller admissionWebhooks as they fail on some systems
Add support for MIG-based nodes
Add support for overriding GPU Operator and driver versions
-
-
Known Issues/Limitations
MLOps visualization for MaskRCNN and UNet is not available via the REST APIs
Key Features
AutoML suite via TAO API
Integration with Third party MLOPS providers - ClearML and Weights and Biases
Support for Transformer based Deep Neural Network training and export
Segformer - semantic segmentation
Deformable DETR - object detection
-
Support for reidentification network
Segregation of DNN commands into training and deploy containers
Pruning and finetuning of NGram language models
Add support for AWS EKS and Azure AKS
Quick start scripts for easy deployment of TAO via launcher and APIs
Launcher
APIs
Bare Metal
AWS EKS
Azure AKS
-
-
Compute Stack
TF 1.15.5 Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 4.0.0-tf1.15.5
|Software|Version|
|Python|3.6|
|CUDA|11.8|
|CuDNN|8.6.0|
|TensorRT|8.5.1.7|
TF 2.9.1 Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 4.0.0-tf2.9.1
|Software|Version|
|Python|3.8|
|CUDA|11.8|
|CuDNN|8.6.0|
|TensorRT|8.5.0.12|
PyTorch Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 4.0.0-pyt
|Software|Version|
|Python|3.8|
|CUDA|11.8|
|CuDNN|8.6.0|
|TensorRT|8.5.0.12|
Deploy Container
container name: nvcr.io/nvidia/tao/tao-toolkit tag: 4.0.0-deploy
|Software|Version|
|Python|3.8|
|CUDA|11.8|
|CuDNN|8.6.0|
|TensorRT|8.5.1.7|
Model Updates
Computer vision
Common
Upgrade TensorRT version to 8.5.1.7.
Integrate clearml and wandb into train tasks.
Pass target_opset to exporter for ONNX models.
Fix status.json for all networks required by TAO API.
Store calib_json and suppress TensorRT-related arguments.
-
Classification
Perform recursive walkthrough of image_dir.
Add valid input checks and corresponding logs.
-
FasterRCNN
Fix bug in pruning for VGG16.
-
UNet
Resolve BYOM Bug by adding param for removing FC head.
Add target opset to export model.
Fix resume training and save checkpoint.
Add calib_json option and remove tensorrt options from export.
Fix modifying the number of classes while finetuning.
Fix retraining for QAT models.
-
DetectNet_v2
Fix bug in early stopping validation.
Add config file for DNv2 in wandb and clearml.
Add thresholding to evaluate.
Add early stopping to DetectNetv2.
-
Multitask Classification
Fix multitask classification export with deepstream config.
-
YOLOv3
Enable Tensorboard visualization.
-
MaskRCNN
Enable adaptive export for mrcnn_resolution.
-
SSD
Fix resuming issue with DALI dataloader.
Reduce the call to create_quantized_keras_model when enabling QAT.
Fix dataset converter regression.
-
YOLOv4
Add automatic class weighting.
Support 16bit images.
-
Deformable-DETR
Initial commit for Deformable-DETR support.
-
Segformer
Initial commit for Segformer support
-
Core
Add logic for telemetry data upload.
-
ARNet
Enable block_mode dataloader for eval script.
Improve the inference script.
-
Conversational AI
ASR
Add opset, autocast and fold constants for ONNX export.
Fix misses in ASR metrics.
Update WER API changes for infer_onnx.
-
TTS
Fix logging for telemetry.
Fix vocoder multiGPU logging.
Fix multiGPU failures in TTS.
Fix CUDA error in train.
-
Known Issues and Limitations
Wandb integration requires that containers be instantiated by the root user.
The NLP Question Answering task doesn't support Megatron-based models for TAO workflows.
Key Features
Bring your own models into TAO using TAO BYOM converter.
Deploy TAO as a service on a Kubernetes cluster, detailed in this section
Integrate TAO into your workflow using RestAPIs
TensorBoard visualization is available for select models, as detailed in this section.
Train object detection networks from a pointcloud data file via PointPillars.
Train a classification network to classify poses from a pose skeleton via a Graph convolutional network.
Intermediate checkpointing is available for ASR and TTS models.
Support Conformer-CTC for ASR: train, finetune, evaluate, infer, and export.
Compute Stack
TF 1.15.4 Container
container name: nvcr.io/nvidia/tao/tao-toolkit-tf tag: v3.22.05-tf1.15.4-py3
|Software|Version|
|Python|3.6|
|CUDA|11.4|
|CuDNN|8.2.1.32|
|TensorRT|8.2.5.1|
TF 1.15.5 Container
container name: nvcr.io/nvidia/tao/tao-toolkit-tf tag: v3.22.05-tf1.15.5-py3
|Software|Version|
|Python|3.6|
|CUDA|11.6|
|CuDNN|8.2.1.32|
|TensorRT|8.2.5.1|
PyTorch Container
container name: nvcr.io/nvidia/tao/tao-toolkit-pyt tag: v3.22.05-py3
|Software|Version|
|Python|3.8|
|CUDA|11.5|
|CuDNN|8.2.1.32|
|TensorRT|8.2.5.1|
Language Model Container
container name: nvcr.io/nvidia/tao/tao-toolkit-lm tag: v3.22.05-py3
|Software|Version|
|Python|3.8|
|CUDA|11.5|
|CuDNN|8.2.1.32|
|TensorRT|8.2.5.1|
Model Updates
Computer Vision
Image Classification
Add verification for custom classmap file input.
Add classmap file input to train.
Add classmap file as optional input for evaluate.
Add status callback and results_dir command line argument for evaluate and inference.
Support TensorBoard visualization for train endpoint.
Perform initial updates for BYOM custom layer.
Add EFF package.
Add EFF package and model loading.
Enable BYOM in image classification.
-
DetectNet_v2
Limit GPU memory usage during tao detectnet_v2 evaluate.
Add native support to convert COCO Dataset to TFRecords.
Bring sampling mode parameter out in the spec file under dataset_config.
Enable TensorBoard visualization.
Add configuration element for visualizer in dataset_config.
Fix success state for TFRecords generation.
Add status logging to all tasks as long as the --results_dir argument is set via command line.
-
UNet
Update the --gen_ds_config option during UNet export.
Add the dataset_convert endpoint to UNet.
Add support for converting COCO Dataset to TFRecords.
Support evaluation on a pruned model.
Add graph collect for functions to improve memory consumption.
Optimize ONNX for UNet inference.
Fix bugs for re-training a pruned model.
Add unified status_logging to UNet endpoints.
Support custom layer pruning and direct evaluate from .tltb via BYOM.
Enable Bring Your Own Model for UNet.
Implement support for Quantization Aware Training (QAT).
Add end-to-end support for ShuffleNet.
Enable status logging during training via StatusCallBack.
Improve the operation of dataloader during training.
Enable TensorBoard visualization during training.
Add a warning for output_width.
Enable support for training with early stopping.
-
BYOM
Enable custom layer pruning for Bring Your Own Model (BYOM).
-
Common features
Fix error handling in model_io.
Support COCO TFRecord conversion for object detection and segmentation networks.
Fix a typo in SoftStartAnnealingLearningRateScheduler.
Implement status-logging callback.
-
YOLOv4
Enable smoothing to object loss.
Support exponential moving average (EMA).
Fix the YOLOv4 neck and head structure.
Configure NMS per data-loader configuration.
Fix YOLOv3 and YOLOv4 shapes.
Enable manually setting class weighting.
Enable TensorBoard visualization.
-
MaskRCNN
Enable skip_crowd_during_training=False.
Add an evaluation summary and patch exporter.
Enable TensorBoard visualization.
-
EfficientDet
Fix a typo in TRT inferencer.
-
SSD
Enable status logging for all endpoints when --results_dir is added to the command line.
Enable support for training with early stopping.
-
DSSD
Enable status logging for all endpoints when --results_dir is added to the command line.
Enable support for training with early stopping.
-
RetinaNet
Enable support for training with early stopping.
Enable status logging for all endpoints when --results_dir is added to the command line.
Fix a bug with resume checkpoint via sequence dataloader.
Enable backward compatibility for a TLT 2.0 trained model.
Enable Tensorboard visualization during training.
Enable manually setting class weights.
-
FasterRCNN
Enable status logging for all endpoints when --results_dir is added to the command line.
Enable model as a CLI argument of evaluation and inference for TAO API.
Enable Tensorboard visualization during training
-
Conversational AI
Generic
Add status logging to TTS models similar to TAO CV models
Fix issue in QA model evaluation for Chinese SQuAD-style dataset
Fix bug of create_tokenizer on always using old corpus silently
Update backend to use NeMo 1.7.0
-
TTS
Remove duration check for TTS dataset from Riva Custom Voice Recorder
Fix infer onnx endpoint when running infer from finetuned model
Fix error handling for Vocoder
Enable intermediate .tlt model checkpoint
-
PointPillars
Enabled transfer learning with pretrained models
Use TensorRT oss 22.02 from GitHub
-
Action Recognition
Update metrics module
-
ASR
Support Early Stopping
Finetune on NeMo models
Enable intermediate .tlt model checkpoint
-
Pretrained models
New models
PointPillarNet
PoseClassificationNet
-
Updated models
PeopleNet
PeopleSemSegNet
PeopleSegNet
LPDNet
-
Known Issues/Limitations
TAO DSSD/FasterRCNN/RetinaNet/YOLOv3/YOLOv4 can have intermittent illegal memory access errors with export or converter CLI commands. The root cause is unknown; if this happens, rerun the command.
The TAO BYOM Semantic Segmentation workflow is only supported with UNet and Image Classification.
TAO Image Classification networks require driver 510 or greater for training.
TAO as a Service doesn’t support user authentication and per-user workspace management.
TTS Finetuning is only supported for data originating from the NVIDIA Custom Voice Recorder.
Key Features
Features included in this release
TAO Resources
Jupyter notebook example for showing the end-to-end workflow for the following model
-
TAO Conversational AI
Support for finetuning a FastPitch and HiFiGAN from a pretrained model
Update FastPitch and HiFiGAN export and infer endpoint to interface with RIVA
-
Known Issues/Limitations
TAO FastPitch finetuning is only supported on text transcripts that are defined in the NVIDIA Custom Voice Recorder.
The data from the NVIDIA Custom Voice Recorder can only be used for finetuning a FastPitch or HiFiGAN model.
For finetuning FastPitch, you are required to resample the new speaker data to the sampling rate of the dataset used to train the pretrained model.
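Resampling new speaker data to the pretrained model's rate is a rational rate change. The sketch below assumes SciPy is available for the actual filtering (scipy.signal.resample_poly); the helper names are hypothetical, and the correct target rate is whatever the pretrained model's training dataset used, so check the model card rather than relying on any rate used in the examples here.

```python
from fractions import Fraction


def resample_factors(src_rate: int, dst_rate: int) -> tuple[int, int]:
    """Return (up, down) in lowest terms so that src_rate * up / down == dst_rate."""
    ratio = Fraction(dst_rate, src_rate)
    return ratio.numerator, ratio.denominator


def resample(samples, src_rate: int, dst_rate: int):
    """Polyphase resampling of a 1-D signal; SciPy is an assumed dependency."""
    from scipy.signal import resample_poly  # imported lazily

    up, down = resample_factors(src_rate, dst_rate)
    return resample_poly(samples, up, down)
```

For example, converting 48000 Hz recordings to a 22050 Hz model rate uses up=147 and down=320.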
Key Features
Features included in this release:
TAO Resources:
Jupyter notebook examples showing the end-to-end workflow for the following models
ActionRecognitionNet
EfficientDet
Text-To-Speech using FastPitch and HiFiGAN
-
-
TAO CV:
Pretrained models for several public architectures and reference applications serving computer vision related object classification, detection and segmentation use cases.
Support for YOLOv4-tiny and EfficientDet object detection models.
Support for pruning EfficientDet models
New pretrained models released on NGC
Converter utility to generate device specific optimized TensorRT engines
Jetson JP4.6
x86 + dGPU - TensorRT 8.0.1.6 with CUDA 11.4
-
-
TAO Conversational AI:
Support for training FastPitch and HiFiGAN model from scratch
Adding new encoders for Natural Language Processing tasks
DistilBERT
BioMegatron-BERT
-
-
Known Issues/Limitations
TAO CV
Transfer Learning is not supported on pruned models across all applications.
When training with multiple GPUs, you might need to scale down the batch_size and/or scale up the learning rate to get the same accuracy seen in single GPU training.
When training DetectNet_v2 for object detection use-cases with more than 10 classes, you may need to either update the cost_weight parameter in the cost_function_config, or balance the number of samples per class in the dataset for better training.
When training a DetectNet_v2 network for datasets with less than 20,000 images, please use smaller batch-sizes (1, 2 or 4) to get better accuracy.
The infer subtask of DetectNet_v2 doesn't output confidence and generates 0 as the value. You may ignore these values and consider only the bbox and class labels as valid outputs.
ResNet101 pretrained weights from NGC are not supported on YOLOv3, YOLOv4, YOLOv4-tiny, SSD, DSSD and RetinaNet.
When generating an INT8 engine with tao-converter, use -s if TensorRT reports an error saying weights are outside of the FP16 range.
Due to the complexity of larger EfficientDet models, the pruning process will take significantly longer to finish. For example, pruning the EfficientDet-D5 model may take at least 25 minutes on a V100 server.
When generating a TensorRT INT8 engine on A100 GPUs using tao-converter for MaskRCNN, enable --strict_data_type.
The EfficientDet codebase includes source code taken from the automl GitHub repository.
-
TAO Conversational AI
When running convAI models on a cloud VM, users should have root access to the VM
Text-To-Speech pipelines only support training from scratch for a single speaker
Text-To-Speech training pipeline requires the audio files to be in .wav format
TAO 3.0-21.11 exported .riva files will not be supported in RIVA < 21.09
BioMegatron-BERT and Megatron-based NLP tasks don't support resuming a previously completed experiment with more epochs than the original run
When running the end-to-end sample of Text-to-Speech, you may have to expand abbreviations
-
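The multi-GPU batch size and learning rate advice above can be made concrete. In data-parallel training, the effective batch size is the per-GPU batch size times the number of GPUs. The sketch below shows the two usual options: split the batch to keep the effective batch unchanged, or keep the per-GPU batch and scale the learning rate linearly with the number of GPUs. Linear scaling is a common heuristic rather than a TAO rule, and the function name is hypothetical; validate accuracy either way.

```python
def multi_gpu_settings(single_gpu_batch: int, single_gpu_lr: float, num_gpus: int,
                       keep_global_batch: bool = True) -> tuple[int, float]:
    """Suggest (per-GPU batch size, learning rate) for data-parallel training.

    keep_global_batch=True: divide the single-GPU batch across GPUs so the
    effective batch stays the same, and keep the learning rate.
    keep_global_batch=False: keep the per-GPU batch (the effective batch grows
    num_gpus times) and scale the learning rate linearly.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    if keep_global_batch:
        per_gpu = max(1, single_gpu_batch // num_gpus)  # floor, at least 1
        return per_gpu, single_gpu_lr
    return single_gpu_batch, single_gpu_lr * num_gpus
```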
Resolved Issues
TAO CV
YOLOv4, YOLOv3, UNet and LPRNet exported .etlt model files can be integrated directly into DeepStream 6.0.
-
TAO Conversational AI
ASR models support generating intermediate .tlt model files during training
-
Deprecated Features
The TAO Computer Vision Inference Pipeline is deprecated. Users can now use DeepStream to deploy the following out-of-the-box models via reference applications provided here:
Release Contents
Components included in this release:
TAO Launcher pip package
TAO - TF docker
TAO - Pytorch Docker
TAO - Language Model Docker
Jupyter notebook with sample workflows
Conversational AI
-
Getting Started Guide containing usage and installation instructions
tao-converter for x86 + discrete GPU platforms
tao-converter for Jetson (ARM64) available here.
Pre-trained weights trained on Open Image dataset available on NGC
Unpruned and Pruned models for Purpose-built models - Pruned models can be deployed out-of-box with DeepStream and unpruned models can be used for re-training.
Trainable and out-of-box Deployable models for:
Key Features
Transfer Learning Toolkit has been renamed to TAO
TAO Launcher:
Python3 pip package as a unified Command Line Interface (CLI)
Support for docker hosted from different registries
-
TAO Resources:
Jupyter notebook examples showing the end-to-end workflow for the following models
N-Gram Language model
-
-
TAO CV:
Support for MaskRCNN Instance segmentation model
Support for pruning MaskRCNN models
Support for serializing a template DeepStream config and labels file
Support for training highly accurate purpose-built models:
-
BodyPose Estimation
-
Instructions for running TAO in the cloud with Azure
Converter utility to generate device specific optimized TensorRT engines
New backbones added to UNet training
Vanilla UNet Dynamic
Efficient UNet
-
-
TAO Conversational AI:
Added support for validating an exported model for compliance with RIVA
Training an N-Gram language model implemented in KenLM
-
Known Issues/Limitations
TAO CV
Transfer Learning is not supported on pruned models across all applications.
When training with multiple GPUs, you might need to scale down the batch_size and/or scale up the learning rate to get the same accuracy seen in single GPU training.
When training DetectNet_v2 for object detection use-cases with more than 10 classes, you may need to either update the cost_weight parameter in the cost_function_config, or balance the number of samples per class in the dataset for better training.
When training a DetectNet_v2 network for datasets with less than 20,000 images, please use smaller batch-sizes (1, 2 or 4) to get better accuracy.
The infer subtask of DetectNet_v2 doesn't output confidence and generates 0 as the value. You may ignore these values and consider only the bbox and class labels as valid outputs.
ResNet101 pretrained weights from NGC are not supported on YOLOv3, YOLOv4, YOLOv4-tiny, SSD, DSSD and RetinaNet.
When generating an INT8 engine with tao-converter, use -s if TensorRT reports an error saying weights are outside of the FP16 range.
-
TAO Conversational AI
When running convAI models on a cloud VM, users should have root access to the VM.
TAO Conv AI models cannot generate intermediate model.tlt files.
-