NVIDIA TAO Toolkit v5.2.0

Release Notes

NVIDIA TAO Toolkit is a Python package that lets you fine-tune pretrained models with your own data and export them for TensorRT-based inference on an edge device.

NVIDIA Transfer Learning Toolkit has been renamed to TAO Toolkit. For a detailed migration guide, refer to this page.

Key Features

  • New computer vision solutions

    • End-to-end training pipeline for CenterPose model

    • ViT Adaptor implementation to integrate ViT backbone with DINO

    • Finetuning DINO Object detection models with ViT backbones and NvDINOv2 foundation models

    • Finetuning and inference support for Open Vocabulary Image Segmentation as a developer preview feature on GitHub

  • TAO Toolkit API

    • Nightly crawler to dynamically update the list of TAO Toolkit compatible models on NGC

    • AutoML enabled hyperparameter search for list based parameters

    • Foundation model finetuning supported for classification_pyt

    • AutoML enabled for visual changenet

    • AutoML enabled for CenterPose

  • Miscellaneous

    • Progress bar to show docker pull status via the launcher

Pretrained Models

  • Purpose-built models

    • CenterPose

    • ODISE

Known Issues and Limitations

  • Visual Changenet and Foundation model finetuning are not supported via TAO Toolkit API

  • Foundation model finetuning requires GPUs with at least 24GB VRAM.

  • DetectNet_v2 export via --onnx_route keras2onnx shows a 16x16 offset in visualized predictions.

  • FasterRCNN TensorRT engine generation reports a false-positive failure, shown in the log below. The engine is still generated, with no regression in performance or accuracy.

    [06/23/2023-13:19:40] [TRT] [F] Validation failed: libNamespace == nullptr /workspace/trt_oss_src/TensorRT/plugin/proposalPlugin/proposalPlugin.cpp:528
    [06/23/2023-13:19:40] [TRT] [E] std::exception
    [06/23/2023-13:19:40] [TRT] [I] Successfully created plugin: ProposalDynamic
    [06/23/2023-13:19:40] [TRT] [F] Validation failed: libNamespace == nullptr

  • OCRNet-ViT requires TensorRT 8.6 or above to reach the best accuracy. With TensorRT 8.5, export OCRNet-ViT with an opset version below 17; FP32 precision is recommended.
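
For the FasterRCNN item above, a log check can separate the known false positive from real errors. The following is a minimal sketch, assuming each log record sits on its own line and that only the message fragments quoted in the log excerpt are benign. The function name and the matching rules are illustrative and not part of TAO Toolkit or TensorRT.

```python
import re

# Message fragments copied from the log excerpt above.
KNOWN_FALSE_POSITIVE = "Validation failed: libNamespace == nullptr"
COMPANION_ERROR = "std::exception"

# Matches the severity letter in records such as "[TRT] [F] ...".
SEVERITY = re.compile(r"\[TRT\] \[([A-Z])\]")


def unexpected_failures(log_lines):
    """Return [E]/[F] records that are not the known false positive.

    The generic "[E] std::exception" record is ignored only when it directly
    follows the known "[F] Validation failed" record.
    """
    unexpected = []
    previous_was_known = False
    for line in log_lines:
        match = SEVERITY.search(line)
        level = match.group(1) if match else None
        if level == "F" and KNOWN_FALSE_POSITIVE in line:
            previous_was_known = True
            continue
        if level == "E" and COMPANION_ERROR in line and previous_was_known:
            previous_was_known = False
            continue
        previous_was_known = False
        if level in ("E", "F"):
            unexpected.append(line)
    return unexpected
```

An empty result for an engine-generation log suggests that only the documented false positive occurred. Any returned record still needs manual inspection.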

Breaking changes

  • From TAO Toolkit 5.2.0, new features for the TensorFlow backends are released only as source code on GitHub. NVIDIA recommends building the container from source to get the latest features and bug fixes.

Compute Stack

PyTorch 1.14.0 Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.2.0-pyt1.14.0

Software Version
Python 3.8
CUDA 12.0
CuDNN 8.6.0
TensorRT 8.5.3.1

PyTorch 2.1.0 Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.2.0-pyt2.1.0

Software Version
Python 3.10
CUDA 12.2
CuDNN 8.9.5
TensorRT 8.6.1

Deploy Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.2.0-deploy

Software Version
Python 3.10
CUDA 12.2
CuDNN 8.9.5
TensorRT 8.6.1

Data Services Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.2.0-dataservice

Software Version
Python 3.10
CUDA 12.2
CuDNN 8.9.5
TensorRT 8.6.1
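
The container names and tags listed above combine into image references in the usual name:tag form. The following sketch assembles them for the 5.2.0 containers. The dictionary keys and function names are illustrative choices, and pulling from nvcr.io may require logging in to the registry first.

```python
REGISTRY_REPO = "nvcr.io/nvidia/tao/tao-toolkit"

# Tags copied from the 5.2.0 compute stack listing above.
# The dictionary keys are illustrative labels, not official names.
TAGS_5_2_0 = {
    "pyt1.14.0": "5.2.0-pyt1.14.0",
    "pyt2.1.0": "5.2.0-pyt2.1.0",
    "deploy": "5.2.0-deploy",
    "dataservice": "5.2.0-dataservice",
}


def image_reference(kind):
    """Join the repository and tag in name:tag form."""
    return f"{REGISTRY_REPO}:{TAGS_5_2_0[kind]}"


def docker_pull_command(kind):
    """Return the argument list for a docker pull of the chosen container."""
    return ["docker", "pull", image_reference(kind)]


print(" ".join(docker_pull_command("deploy")))
# prints: docker pull nvcr.io/nvidia/tao/tao-toolkit:5.2.0-deploy
```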

Key Features

  • New computer vision solutions

    • End-to-end training pipeline for Visual ChangeNet classification and segmentation

    • Fine tuning for the following foundation image model backbones for classification:

      • OpenCLIP

      • EvaCLIP

      Note

      Refer to the Foundation Models section for model details.

Pretrained Models

  • Purpose-built models

    • Visual Changenet Classification

    • Visual Changenet Segmentation - LEVIRCD (research only)

    • Visual Changenet Segmentation - LandSat-SCD

Known Issues and Limitations

  • Visual Changenet and Foundation model finetuning are not supported via TAO Toolkit API

  • Foundation model finetuning requires GPUs with at least 24GB VRAM.

  • DetectNet_v2 export via --onnx_route keras2onnx shows a 16x16 offset in visualized predictions.

  • The DetectNet_v2 inferencer cannot set dbscan_min_samples > 1.

Breaking changes

  • The DetectNet_v2 inferencer configuration parameter dbscan_min_samples can only be set to an integer, as opposed to float32 from TAO 4.0.x.
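
When carrying a TAO 4.0.x inferencer configuration forward, a float value for dbscan_min_samples has to become an integer. The helper below is a hypothetical sketch of that conversion and is not part of TAO Toolkit. It converts only values that have an exact integer form and refuses fractional ones, on the assumption that choosing a replacement for those is a decision for the user.

```python
def migrate_dbscan_min_samples(value):
    """Convert a 4.0.x-style float setting to the integer expected now.

    Raises ValueError for fractional values instead of rounding silently.
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise ValueError("dbscan_min_samples must be a number, not a bool")
    if isinstance(value, int):
        return value
    if isinstance(value, float) and value.is_integer():
        return int(value)
    raise ValueError(f"dbscan_min_samples={value!r} has no exact integer form")


print(migrate_dbscan_min_samples(1.0))
# prints: 1
```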

Compute Stack

PyTorch Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.1.0-pyt

Software Version
Python 3.8
CUDA 12.0
CuDNN 8.6.0
TensorRT 8.5.3.1

Deploy Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.1.0-deploy

Software Version
Python 3.8
CUDA 12.0
CuDNN 8.6.0
TensorRT 8.5.3.1

Key Features

  • New computer vision solutions

    • Custom Siamese network training pipeline for Optical Inspection with TensorBoard visualization

    • End-to-end training pipeline for Metric Learning Recognition

    • Image Classification in TAO Toolkit PyTorch with FAN and GCViT backbones

    • New object detection architecture DINO with FAN, GCViT, and ResNet backbones

    • SegFormer training now supports FAN based backbones

    • Deformable DETR with GCViT backbones

    • Training pipeline for Mask Auto Labeller network

    • End-to-end TAO workflow pipeline for optical character detection and optical character recognition from a document

    • New tools to enhance your datasets:

      • Generate segmentation masks for user datasets using the Mask Auto Labeller

      • Multi-GPU offline dataset augmentation for object detection use cases

      • Tools to visualize, inspect, validate and correct annotations for object detection datasets

      • Format converter between COCO and KITTI Object detection datasets

  • Launcher CLI

    • New task_group hierarchy to help segregate task actions:

      • model

      • dataset

      • deploy

  • Pipeline features

    • Export to deserialize ONNX models for direct integration with TensorRT (except MaskRCNN)

    • Decrypted checkpoint serialization across all networks

  • RESTful APIs and Cloud deployment

    • More networks added to the AutoML workflow

    • Quick start support extended to the following new Kubernetes Cloud Service Providers (CSPs):

      • Google Cloud GKE

      • Microsoft Azure AKS

  • Source code is now available for all TAO Toolkit components on GitHub. For more information, refer to the TAO Toolkit Source Code section.

Pre-Trained Models

  • Purpose-built models

    • PeopleSemSegFormer

    • PCB Classification

    • OCDNet

    • OCRNet

    • Retail Object Detection

    • Retail Object Recognition

    • Optical Inspection

  • Pre-trained starter weights

    • Classification

      • Pretrained GCViT NvImageNet

      • Pretrained FAN NvImageNet

      • Pretrained GCViT ImageNet

      • Pretrained FAN ImageNet

    • Object Detection

      • Pretrained DINO NVImageNet

      • Pretrained DINO ImageNet

      • Pretrained Deformable-DETR NVImageNet

      • Pretrained Deformable-DETR NVImageNet

      • Pretrained EfficientNet NVImageNet

      • EfficientDet COCO

      • Deformable-DETR COCO

      • DINO COCO

    • Segmentation

      • Pretrained SegFormer NVImageNet

      • Pretrained SegFormer ImageNet

      • Mask Auto Label

      • CityScapes Segformer

Deprecated Features

  • All TAO Toolkit Conversational AI integrations have been deprecated from TAO Toolkit version 5.0.0

  • The ability to use tao-converter to generate a TensorRT engine from .etlt files has been deprecated. All networks support direct integration with TensorRT and the trtexec sample. For more information, refer to the Profiling with TensorRT section.

  • The following computer vision training pipelines have been deprecated:

    • Gaze Estimation

    • Emotion Classification

    • Heart-rate Estimation

    • Gesture Recognition
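
With tao-converter deprecated, engines are generated from exported ONNX files with trtexec. The sketch below only builds the command line. --onnx, --saveEngine, and --fp16 are standard trtexec options, while the file names are placeholders. Models with dynamic input shapes typically need additional shape options that are not shown here.

```python
import shlex


def trtexec_command(onnx_path, engine_path, fp16=False):
    """Build a trtexec argument list that converts an ONNX file to an engine.

    The paths are supplied by the caller; nothing is executed here.
    """
    command = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        # Allow FP16 kernels in addition to FP32.
        command.append("--fp16")
    return command


print(shlex.join(trtexec_command("model.onnx", "model.engine", fp16=True)))
# prints: trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
```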

Breaking changes

  • All PyTorch and TensorFlow 2 networks have a rearchitected specification file with a concept of experiment specification

  • Common parameters have been renamed across all networks for configuration uniformity

  • SegFormer models from TAO Toolkit version 4.0.0 cannot be loaded in version 5.0.0. For version 5.0.0, use the new pretrained models.

  • Models exported from TAO 5.0.0 will not work with tao-converter for TensorRT engine generation. You can use the trtexec command line wrapper from TensorRT directly to generate TensorRT engines.

  • All previous tao <network> <subtask> command hierarchies are now tao model <network> <subtask>. Therefore, sample notebooks released as part of TAO 4.0.x will not work directly with TAO 5.0.0. For more information about the new CLI structure, read the migration guide from TAO 4.0.x to TAO 5.0.0.

  • Offline augmentation tooling tao augment is now tao dataset augment under the dataset task_group.
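
The CLI changes above can be sketched as a small rewriting helper for commands in older notebooks. It assumes only the two renames described in this list: network commands moving under tao model, and tao augment moving to tao dataset augment. The function is hypothetical and does not handle other launcher commands, which it would wrongly place under model.

```python
import shlex

# Task groups introduced with the 5.0.0 launcher.
TASK_GROUPS = {"model", "dataset", "deploy"}


def migrate_tao_command(command):
    """Rewrite a TAO 4.0.x launcher command line into the 5.0.0 hierarchy."""
    parts = shlex.split(command)
    if len(parts) < 2 or parts[0] != "tao":
        return command  # not a launcher call; leave untouched
    if parts[1] in TASK_GROUPS:
        return command  # already uses a 5.0.0 task group
    if parts[1] == "augment":
        # tao augment moved under the dataset task group.
        parts.insert(1, "dataset")
    else:
        # tao <network> <subtask> became tao model <network> <subtask>.
        parts.insert(1, "model")
    return shlex.join(parts)


print(migrate_tao_command("tao detectnet_v2 train"))
# prints: tao model detectnet_v2 train
print(migrate_tao_command("tao augment"))
# prints: tao dataset augment
```

Renamed specification parameters are a separate breaking change and are not covered by this rewrite.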

Bug Fixes

  • Fixes for errors in .etlt inference for DetectNet_v2

  • Fixes to improve stability of MultiGPU jobs for TensorFlow 1.x networks

Known Issues and Limitations

  • Training on multi-GPU is currently limited to single-node instances via TAO Toolkit API

  • FAN-based networks exported from TAO as .onnx files require TensorRT versions >= 8.6.x for deployment.

  • tao deploy for optical inspection model doesn’t support dynamic batching.

  • BodyPoseNet and FPENet are not integrated with tao deploy for TAO Toolkit version 5.0.0.

  • DetectNet-v2 export to .onnx for a QAT INT8 model is only supported via the tf2onnx backend.

  • Multi-Node execution is only supported via the container execution model as explained in the Working with the Containers section.

  • MIG training is currently only supported for single GPUs. For more information, refer to the Running training on Multi-GPU instance section.

  • All DNN containers require NVIDIA CUDA Driver version 525.85 or above to run.

  • Re-identification trainer doesn’t support multi-GPU training in 5.0.0
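
Two of the limitations above are minimum-version requirements: driver 525.85 for the DNN containers and TensorRT 8.6.x for FAN-based ONNX models. A numeric comparison such as the following sketch can check an installed version against them. The helper names and the dictionary keys are illustrative, and obtaining the installed version strings is left to the caller.

```python
# Minimum versions taken from the list above; the keys are illustrative.
MINIMUMS = {
    "driver": "525.85",
    "tensorrt_for_fan_onnx": "8.6",
}


def parse_version(text):
    """Turn a dotted version such as '8.5.3.1' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """True when installed >= minimum, comparing component by component."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter version with zeros so '8.6' equals '8.6.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("8.5.3.1", MINIMUMS["tensorrt_for_fan_onnx"]))
# prints: False
```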

Compute Stack

TF 1.15.5 Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.0.0-tf1.15.5

Software Version
Python 3.8
CUDA 12.0
CuDNN 8.6.0
TensorRT 8.5.3.1

TF 2.11.0 Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.0.0-tf2.11.0

Software Version
Python 3.8
CUDA 12.0
CuDNN 8.6.0
TensorRT 8.5.3.1

PyTorch Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.0.0-pyt

Software Version
Python 3.8
CUDA 12.0
CuDNN 8.6.0
TensorRT 8.5.3.1

Deploy Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.0.0-deploy

Software Version
Python 3.8
CUDA 12.0
CuDNN 8.6.0
TensorRT 8.5.3.1

Data Services Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 5.0.0-dataservice

Software Version
Python 3.8
CUDA 12.0
CuDNN 8.6.0
TensorRT 8.5.3.1

Incremental changes over 4.0.1.

Bug Fixes

  • TAO Toolkit API

    • TAO API AutoML hanging

    • TAO API support for HTTPS Proxy and Custom SSL CA Certificate

    • TAO API inaccessible service on wireless interfaces

    • TAO API MLOPs visualization for

      • MaskRCNN

      • UNet

Key Features

  • Enable third party MLOPs providers - ClearML and Weights and Biases for the following networks

    • MaskRCNN

    • UNet

Compute Stack

TF 1.15.5 Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 4.0.1-tf1.15.5

Software Version
Python 3.6
CUDA 11.8
CuDNN 8.6.0
TensorRT 8.5.1.7

Bug Fixes

  • YOLOv4 visualizer fails when running multiGPU training

  • Fix model cancel and resume function names in tao-client

  • TAO Toolkit API

    • Replace FLIR Google Drive links with public links

    • Bare metal Quick Start Script

      • Fix GPU Operator deployment issues when host drivers are installed

      • Disable ingress-nginx controller admissionWebhooks as they fail on some systems

      • Add support for MIG-based nodes

      • Add support for overriding GPU Operator and driver versions

Known Issues/Limitations

  • MLOPs visualization for MaskRCNN and UNet are not available via the RestAPIs

Key Features

  • AutoML suite via TAO Toolkit API

  • Integration with Third party MLOPS providers - ClearML and Weights and Biases

  • Support for Transformer based Deep Neural Network training and export

    • Segformer - semantic segmentation

    • Deformable DETR - object detection

  • Support for reidentification network

  • Segregation of DNN commands into training and deploy containers

  • Pruning and finetuning of NGram language models

  • Add support for AWS EKS and Azure AKS

  • Quick start scripts for easy deployment of TAO Toolkit via launcher and APIs

    • Launcher

    • APIs

      • Bare Metal

      • AWS EKS

      • Azure AKS

Compute Stack

TF 1.15.5 Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 4.0.0-tf1.15.5

Software Version
Python 3.6
CUDA 11.8
CuDNN 8.6.0
TensorRT 8.5.1.7

TF 2.9.1 Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 4.0.0-tf2.9.1

Software Version
Python 3.8
CUDA 11.8
CuDNN 8.6.0
TensorRT 8.5.0.12

PyTorch Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 4.0.0-pyt

Software Version
Python 3.8
CUDA 11.8
CuDNN 8.6.0
TensorRT 8.5.0.12

Deploy Container

container name: nvcr.io/nvidia/tao/tao-toolkit tag: 4.0.0-deploy

Software Version
Python 3.8
CUDA 11.8
CuDNN 8.6.0
TensorRT 8.5.1.7

Model Updates

Computer vision

  • Common

    • Upgrade TensorRT version to 8.5.1.7.

    • Integrate clearml and wandb into train tasks.

    • Pass target_opset to exporter for ONNX models.

    • Fix status.json for all networks required by TAO Toolkit API.

    • Store calib_json and suppress TensorRT-related arguments.

  • Classification

    • Perform recursive walkthrough of image_dir.

    • Add valid input checks and corresponding logs.

  • FasterRCNN

    • Fix bug in pruning for VGG16.

  • UNet

    • Resolve BYOM Bug by adding param for removing FC head.

    • Add target opset to export model.

    • Fix resume training and save checkpoint.

    • Add calib_json option and remove tensorrt options from export.

    • Fix modifying the number of classes while finetuning.

    • Fix retraining for QAT models.

  • DetectNet_v2

    • Fix bug in early stopping validation.

    • Add config file for DNv2 in wandb and clearml.

    • Add thresholding to evaluate.

    • Add early stopping to DetectNetv2.

  • Multitask Classification

    • Fix multitask classification export with deepstream config.

  • YOLOv3

    • Enable Tensorboard visualization.

  • MaskRCNN

    • Enable adaptive export for mrcnn_resolution.

  • SSD

    • Fix resuming issue with DALI dataloader.

    • Reduce the call to create_quantized_keras_model when enabling QAT.

    • Fix dataset converter regression.

  • YOLOv4

    • Add automatic class weighting.

    • Support 16bit images.

  • Deformable-DETR

    • Initial commit for Deformable-DETR support.

  • Segformer

    • Initial commit for Segformer support.

  • Core

    • Add logic for telemetry data upload.

  • ARNet

    • Enable block_mode dataloader for eval script.

    • Improve the inference script.

Conversational AI

  • ASR

    • Add opset, autocast and fold constants for ONNX export.

    • Fix misses in ASR metrics.

    • Update WER API changes for infer_onnx.

  • TTS

    • Fix logging for telemetry.

    • Fix vocoder multiGPU logging.

    • Fix multiGPU failures in TTS.

    • Fix CUDA error in train.

Known Issues and Limitations

  • Wandb integration requires that containers be instantiated by the root user.

  • The NLP Question Answering task doesn’t support Megatron-based models for TAO workflows.

Key Features

  • Bring your own models into TAO Toolkit using TAO BYOM converter.

  • Deploy TAO as a service on a Kubernetes cluster, detailed in this section

  • Integrate TAO into your workflow using RestAPIs

  • TensorBoard visualization is available for select models, as detailed in this section.

  • Train object detection networks from a pointcloud data file via PointPillars.

  • Train a classification network to classify poses from a pose skeleton via a Graph convolutional network.

  • Intermediate checkpointing is available for ASR and TTS models.

  • Support Conformer-CTC for ASR: train, finetune, evaluate, infer, and export.

Compute Stack

TF 1.15.4 Container

container name: nvcr.io/nvidia/tao/tao-toolkit-tf tag: v3.22.05-tf1.15.4-py3

Software Version
Python 3.6
CUDA 11.4
CuDNN 8.2.1.32
TensorRT 8.2.5.1

TF 1.15.5 Container

container name: nvcr.io/nvidia/tao/tao-toolkit-tf tag: v3.22.05-tf1.15.5-py3

Software Version
Python 3.6
CUDA 11.6
CuDNN 8.2.1.32
TensorRT 8.2.5.1

PyTorch Container

container name: nvcr.io/nvidia/tao/tao-toolkit-pyt tag: v3.22.05-py3

Software Version
Python 3.8
CUDA 11.5
CuDNN 8.2.1.32
TensorRT 8.2.5.1

Language Model Container

container name: nvcr.io/nvidia/tao/tao-toolkit-lm tag: v3.22.05-py3

Software Version
Python 3.8
CUDA 11.5
CuDNN 8.2.1.32
TensorRT 8.2.5.1

Model Updates

Computer Vision

  • Image Classification

    • Add verification for custom classmap file input.

    • Add classmap file input to train.

    • Add classmap file as optional input for evaluate.

    • Add status callback and results_dir command line argument for evaluate and inference.

    • Support TensorBoard visualization for train endpoint.

    • Perform initial updates for BYOM custom layer.

    • Add EFF package.

    • Add EFF package and model loading.

    • Enable BYOM in image classification.

  • DetectNet_v2

    • Limit GPU memory usage during tao detectnet_v2 evaluate.

    • Add native support to convert COCO Dataset to TFRecords.

    • Bring sampling mode parameter out in the spec file under dataset_config.

    • Enable TensorBoard visualization.

    • Add configuration element for visualizer in dataset_config.

    • Fix success state for TFRecords generation.

    • Add status logging to all tasks as long as the --results_dir argument is set via command line.

  • UNet

    • Update the --gen_ds_config option during UNet export.

    • Add the dataset_convert endpoint to UNet.

    • Add support for converting COCO Dataset to TFRecords.

    • Support evaluation on a pruned model.

    • Add graph collect for functions to improve memory consumption.

    • Optimize ONNX for UNet inference.

    • Fix bugs for re-training a pruned model.

    • Add unified status_logging to UNet endpoints.

    • Support custom layer pruning and direct evaluate from .tltb via BYOM.

    • Enable Bring Your Own Model for UNet.

    • Implement support for Quantization Aware Training (QAT).

    • Add end-to-end support for ShuffleNet.

    • Enable status logging during training via StatusCallBack.

    • Improve the operation of dataloader during training.

    • Enable TensorBoard visualization during training.

    • Add a warning for output_width.

    • Enable support for training with early stopping.

  • BYOM

    • Enable custom layer pruning for Bring Your Own Model (BYOM).

  • Common features

    • Fix error handling in model_io.

    • Support COCO TFRecord conversion for object detection and segmentation networks.

    • Fix a typo in SoftStartAnnealingLearningRateScheduler.

    • Implement status-logging callback.

  • YOLOv4

    • Enable smoothing to object loss.

    • Support exponential moving average (EMA).

    • Fix the YOLOv4 neck and head structure.

    • Configure NMS per data-loader configuration.

    • Fix YOLOv3 and YOLOv4 shapes.

    • Enable manually setting class weighting.

    • Enable TensorBoard visualization.

  • MaskRCNN

    • Enable skip_crowd_during_training=False.

    • Add an evaluation summary and patch exporter.

    • Enable TensorBoard visualization.

  • EfficientDet

    • Fix a typo in TRT inferencer.

  • SSD

    • Enable status logging for all endpoints when --results_dir is added to the command line

    • Enable support for training with early stopping.

  • DSSD

    • Enable status logging for all endpoints when --results_dir is added to the command line.

    • Enable support for training with early stopping.

  • RetinaNet

    • Enable support for training with early stopping.

    • Enable status logging for all endpoints when --results_dir is added to the command line.

    • Fix a bug with resume checkpoint via sequence dataloader.

    • Enable backward compatibility for a TLT 2.0 trained model.

    • Enable Tensorboard visualization during training.

    • Enable manually setting class weights.

  • FasterRCNN

    • Enable status logging for all endpoints when --results_dir is added to the command line.

    • Enable model as a CLI argument of evaluation and inference for TAO API.

    • Enable Tensorboard visualization during training

Conversational AI

  • Generic

    • Add status logging to TTS models similar to TAO Toolkit CV models

    • Fix issue in QA model evaluation for Chinese SQuAD-style dataset

    • Fix bug where create_tokenizer always silently used the old corpus

    • Update backend to use NeMo 1.7.0

  • TTS

    • Remove duration check for TTS dataset from Riva Custom Voice Recorder

    • Fix infer onnx endpoint when running infer from finetuned model

    • Fix error handling for Vocoder

    • Enable intermediate .tlt model checkpoint

  • PointPillars

    • Enabled transfer learning with pretrained models

    • Use TensorRT oss 22.02 from GitHub

  • Action Recognition

    • Update metrics module

  • ASR

    • Support Early Stopping

    • Finetune on NeMo models

    • Enable intermediate .tlt model checkpoint

Pretrained models

  • New models

    • PointPillarNet

    • PoseClassificationNet

  • Updated models

    • PeopleNet

    • PeopleSemSegNet

    • PeopleSegNet

    • LPDNet

Known Issues/Limitations

  • TAO DSSD/FasterRCNN/RetinaNet/YOLOv3/YOLOv4 can have intermittent illegal memory access errors with export or converter CLI commands. The root cause is unknown. In this case, simply run it again to resolve this issue.

  • The TAO BYOM Semantic Segmentation workflow is only supported with UNet and Image Classification.

  • TAO Image Classification networks require driver 510 or greater for training.

  • TAO Toolkit as a Service doesn’t support user authentication and per-user workspace management.

  • TTS Finetuning is only supported for data originating from the NVIDIA Custom Voice Recorder.
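
The first item above advises rerunning an export or converter command that fails intermittently. A generic retry wrapper can automate that. The sketch below is an assumption about how one might script it and is not a TAO Toolkit feature. It retries on any non-zero exit code, not only on illegal memory access errors, so the attempt limit should stay small.

```python
import subprocess


def run_with_retries(command, attempts=3, runner=subprocess.run):
    """Run command until it exits with code 0 or attempts are exhausted.

    runner is injectable so the logic can be tested without TAO installed.
    Returns the 1-based number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        result = runner(command)
        if result.returncode == 0:
            return attempt
    raise RuntimeError(f"command still failing after {attempts} attempts")
```

Here command is the argument list of the export or converter call that would otherwise be rerun by hand.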

Key Features

Features included in this release

  • TAO Resources

    • Jupyter notebook example for showing the end-to-end workflow for the following model

  • TAO Conversational AI

    • Support for finetuning a FastPitch and HiFiGAN from a pretrained model

    • Update FastPitch and HiFiGAN export and infer endpoint to interface with RIVA

Known Issues/Limitations

  • TAO FastPitch finetuning is only supported on text transcripts that are defined in the NVIDIA Custom Voice Recorder.

  • The data from the NVIDIA Custom Voice Recorder can only be used for fine tuning a FastPitch or HiFiGAN model.

  • For finetuning FastPitch, you are required to resample the new speaker data to the sampling rate of the dataset used to train the pretrained model.
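
Before resampling, it helps to find which recordings differ from the sampling rate of the pretrained model's training data. The sketch below reads the rate from each WAV header with the Python standard library. The target rate is a value the caller must supply, since these notes do not state it, and the wave module handles only uncompressed PCM files. The resampling itself is not shown.

```python
import wave


def wav_sample_rate(path):
    """Read the sample rate in Hz from a WAV file header."""
    with wave.open(path, "rb") as handle:
        return handle.getframerate()


def files_needing_resample(paths, target_rate):
    """Return (path, rate) pairs whose rate differs from target_rate."""
    mismatched = []
    for path in paths:
        rate = wav_sample_rate(path)
        if rate != target_rate:
            mismatched.append((path, rate))
    return mismatched
```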

Key Features

Features included in this release:

  • TAO Resources:

    • Jupyter notebook examples showing the end-to-end workflow for the following models

      • ActionRecognitionNet

      • EfficientDet

      • Text-To-Speech using FastPitch and HiFiGAN

  • TAO CV:

    • Pretrained models for several public architectures and reference applications serving computer vision related object classification, detection and segmentation use cases.

    • Support for YOLOv4-tiny and EfficientDet object detection models.

    • Support for pruning EfficientDet models

    • New pretrained models released on NGC

    • Converter utility to generate device specific optimized TensorRT engines

      • Jetson JP4.6

      • x86 + dGPU - TensorRT 8.0.1.6 with CUDA 11.4

  • TAO Conversational AI:

    • Support for training FastPitch and HiFiGAN model from scratch

    • Adding new encoders for Natural Language Processing tasks

      • DistilBERT

      • BioMegatron-BERT

Known Issues/Limitations

  • TAO CV

    • Transfer Learning is not supported on pruned models across all applications.

    • When training with multiple GPUs, you might need to scale down the batch_size and/or scale up the learning rate to get the same accuracy seen in single GPU training.

    • When training DetectNet_v2 for object detection use-cases with more than 10 classes, you may need to either update the cost_weight parameter in the cost_function_config, or balance the number of samples per class in the dataset for better training.

    • When training a DetectNet_v2 network for datasets with less than 20,000 images, please use smaller batch-sizes (1, 2 or 4) to get better accuracy.

    • The infer subtask of DetectNet_v2 doesn’t output confidence and generates 0. as value. You may ignore these values and only consider the bbox and class labels as valid outputs.

    • ResNet101 pre-trained weights from NGC are not supported on YOLOv3, YOLOv4, YOLOv4-tiny, SSD, DSSD and RetinaNet.

    • When generating an INT8 engine with tao-converter, use -s if TensorRT reports an error saying that weights are outside of the FP16 range.

    • Due to the complexity of larger EfficientDet models, the pruning process will take significantly longer to finish. For example, pruning the EfficientDet-D5 model may take at least 25 minutes on a V100 server.

    • When generating a TensorRT INT8 engine on A100 GPUs using the tao-converter for MaskRCNN, enable --strict_data_type

    • Our EfficientDet codebase has source code taken from the automl github repo

  • TAO Conversational AI

    • When running convAI models on a cloud VM, users should have root access to the VM

    • Text-To-Speech pipelines only support training from scratch for a single speaker

    • Text-To-Speech training pipeline requires the audio files to be .wav format

    • TAO Toolkit 3.0-21.11 exported .riva files will not be supported in RIVA < 21.09

    • BioMegatron-BERT and Megatron-based NLP tasks don’t support resuming a previously completed model with more epochs than the previously completed experiment

    • When running the end-to-end sample of Text-to-Speech, you may have to expand abbreviations
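
The multi-GPU item in the TAO CV list above names two adjustments: a smaller batch_size or a larger learning rate. The sketch below illustrates both, assuming batch_size is a per-GPU value and using linear scaling of the learning rate with the number of GPUs. Linear scaling is a common heuristic rather than a formula given in these notes, so treat the results as starting points for tuning.

```python
def scale_for_multi_gpu(batch_size_per_gpu, learning_rate, num_gpus,
                        keep_global_batch=True):
    """Return an adjusted (batch_size_per_gpu, learning_rate) pair.

    keep_global_batch=True shrinks the per-GPU batch so the total batch
    matches the single-GPU run. Otherwise the per-GPU batch is kept and the
    learning rate is scaled linearly with the GPU count (a heuristic).
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if keep_global_batch:
        # Integer division; never go below one sample per GPU.
        return max(1, batch_size_per_gpu // num_gpus), learning_rate
    return batch_size_per_gpu, learning_rate * num_gpus


print(scale_for_multi_gpu(16, 0.001, 4))
# prints: (4, 0.001)
```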

Resolved Issues

  • TAO CV

    • YOLOv4, YOLOv3, UNet and LPRNet exported .etlt model files can be integrated directly into DeepStream 6.0.

  • TAO Conversational AI

    • ASR models support generating intermediate .tlt model files during training

Deprecated Features

Release Contents

Components included in this release:

  • TAO Launcher pip package

  • TAO - TF docker

  • TAO - Pytorch Docker

  • TAO - Language Model Docker

  • Jupyter notebook with sample workflows

Key Features

Transfer Learning Toolkit has been renamed to TAO Toolkit

  • TAO Toolkit Launcher:

    • Python3 pip package as a unified Command Line Interface (CLI)

    • Support for docker hosted from different registries

  • TAO Resources:

    • Jupyter notebook examples showing the end-to-end workflow for the following models

      • N-Gram Language model

  • TAO CV:

    • Support for MaskRCNN Instance segmentation model

    • Support for pruning MaskRCNN models

    • Support for serializing a template DeepStream config and labels file

    • Support for training highly accurate purpose-built models:

      • BodyPose Estimation

    • Instructions for running TAO in the cloud with Azure

    • Converter utility to generate device specific optimized TensorRT engines

    • New backbones added to UNet training

      • Vanilla UNet Dynamic

      • Efficient UNet

  • TAO Conversational AI:

    • Added support for validating an exported model for compliance with RIVA

    • Training an N-Gram language model implemented in KenLM

Known Issues/Limitations

  • TAO CV

    • Transfer Learning is not supported on pruned models across all applications.

    • When training with multiple GPUs, you might need to scale down the batch_size and/or scale up the learning rate to get the same accuracy seen in single GPU training.

    • When training DetectNet_v2 for object detection use-cases with more than 10 classes, you may need to either update the cost_weight parameter in the cost_function_config, or balance the number of samples per class in the dataset for better training.

    • When training a DetectNet_v2 network for datasets with less than 20,000 images, please use smaller batch-sizes (1, 2 or 4) to get better accuracy.

    • The infer subtask of DetectNet_v2 doesn’t output confidence and generates 0. as value. You may ignore these values and only consider the bbox and class labels as valid outputs.

    • ResNet101 pre-trained weights from NGC are not supported on YOLOv3, YOLOv4, YOLOv4-tiny, SSD, DSSD and RetinaNet.

    • When generating an INT8 engine with tao-converter, use -s if TensorRT reports an error saying that weights are outside of the FP16 range.

  • TAO Conversational AI

    • When running convAI models on a cloud VM, users should have root access to the VM.

    • TAO Conv AI models cannot generate intermediate model.tlt files.

© Copyright 2024, NVIDIA. Last updated on Mar 18, 2024.