Release Notes

NVIDIA TAO Toolkit is a Python package that enables NVIDIA customers to fine-tune pretrained models with their own data and export them for TensorRT-based inference on an edge device.

NVIDIA Transfer Learning Toolkit has been renamed to TAO Toolkit. For a detailed migration guide, go to this section.

Key Features

Features included in this release

  • TAO Resources

    • Jupyter notebook example showing the end-to-end workflow for the following model

  • TAO Conversational AI

    • Support for finetuning FastPitch and HiFiGAN from a pretrained model

    • Updated FastPitch and HiFiGAN export and infer endpoints to interface with RIVA

Known Issues/Limitations

  • TAO FastPitch finetuning is only supported on text transcripts that are defined in the NVIDIA Custom Voice Toolkit.

  • The data from the NVIDIA Custom Voice Toolkit can only be used for finetuning a FastPitch or HiFiGAN model.

  • For finetuning FastPitch, you are required to resample the new speaker data to the sampling rate of the dataset used to train the pretrained model.
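
Before resampling, it can help to find out which recordings actually differ from the rate of the pretrained model's dataset. The following is a minimal sketch using only the Python standard library; the function name `find_mismatched_wavs` is hypothetical and not part of TAO Toolkit, and the target rate is something you must look up for your pretrained model. The `wave` module reads uncompressed PCM .wav files, so other encodings may need a different reader.

```python
import wave
from pathlib import Path


def find_mismatched_wavs(data_dir, target_rate):
    """Return (path, rate) pairs for .wav files whose sample rate differs from target_rate.

    Hypothetical helper for illustration; it only inspects headers and does
    not resample anything.
    """
    mismatched = []
    for path in sorted(Path(data_dir).rglob("*.wav")):
        # wave.open reads the header of an uncompressed PCM .wav file.
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != target_rate:
            mismatched.append((path, rate))
    return mismatched
```

Calling `find_mismatched_wavs("new_speaker_data", 22050)` would list every file under that directory that is not at 22050 Hz; both the directory name and the rate are placeholders. Files it reports still need to be resampled with an audio tool of your choice.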

Key Features

Features included in this release:

  • TAO Resources:

    • Jupyter notebook examples showing the end-to-end workflow for the following models

      • ActionRecognitionNet

      • EfficientDet

      • Text-To-Speech using FastPitch and HiFiGAN

  • TAO CV:

    • Pretrained models for several public architectures and reference applications serving computer vision related object classification, detection and segmentation use cases.

    • Support for YOLOv4-tiny and EfficientDet object detection models.

    • Support for pruning EfficientDet models

    • New pretrained models released on NGC

    • Converter utility to generate device specific optimized TensorRT engines

      • Jetson JP4.6

      • x86 + dGPU - TensorRT 8.0.1.6 with CUDA 11.4

  • TAO Conversational AI:

    • Support for training FastPitch and HiFiGAN model from scratch

    • Adding new encoders for Natural Language Processing tasks

      • DistilBERT

      • BioMegatron-BERT

Known Issues/Limitations

  • TAO CV

    • Transfer Learning is not supported on pruned models across all applications.

    • When training with multiple GPUs, you might need to scale down the batch_size and/or scale up the learning rate to get the same accuracy seen in single GPU training.

    • When training DetectNet_v2 for object detection use-cases with more than 10 classes, you may need to either update the cost_weight parameter in the cost_function_config, or balance the number of samples per class in the dataset for better training.

    • When training a DetectNet_v2 network on datasets with fewer than 20,000 images, use smaller batch sizes (1, 2, or 4) to get better accuracy.

    • The infer subtask of DetectNet_v2 does not output confidence values and writes 0. in their place. Ignore these values and treat only the bbox and class labels as valid outputs.

    • ResNet101 pre-trained weights from NGC are not supported on YOLOv3, YOLOv4, YOLOv4-tiny, SSD, DSSD and RetinaNet.

    • When generating an INT8 engine with tao-converter, use -s if TensorRT reports an error saying that weights are outside of the FP16 range.

    • Due to the complexity of larger EfficientDet models, the pruning process will take significantly longer to finish. For example, pruning the EfficientDet-D5 model may take at least 25 minutes on a V100 server.

    • When generating a TensorRT INT8 engine on A100 GPUs using tao-converter for MaskRCNN, enable --strict_data_type.

    • The EfficientDet codebase includes source code taken from the automl GitHub repository.

  • TAO Conversational AI

    • When running convAI models on a cloud VM, users should have root access to the VM

    • Text-To-Speech pipelines support training from scratch for a single speaker only

    • The Text-To-Speech training pipeline requires the audio files to be in .wav format

    • TAO Toolkit 3.0-21.11 exported .riva files will not be supported in RIVA < 21.09

    • BioMegatron-BERT and Megatron-based NLP tasks do not support resuming a previously completed model with a larger number of epochs than the completed experiment used

    • When running the end-to-end sample of Text-to-Speech, you may have to expand abbreviations in the input text
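
As an illustration of the abbreviation note above, the following is a minimal sketch of expanding abbreviations in input text before passing it to a Text-to-Speech sample. The function `expand_abbreviations` and the mapping table are hypothetical and are not part of TAO Toolkit; the table would need to be filled in for your own transcripts, and ambiguous entries (for example "st." as "street" or "saint") have to be resolved by hand.

```python
import re

# Hypothetical mapping for illustration; extend it for your own transcripts.
ABBREVIATIONS = {
    "dr.": "doctor",
    "mr.": "mister",
    "st.": "street",
    "etc.": "et cetera",
}


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace standalone abbreviations (case-insensitive) with their spoken form."""
    def replace(match):
        return table[match.group(0).lower()]

    # Longest keys first so a longer abbreviation wins over a shorter one.
    keys = sorted(table, key=len, reverse=True)
    # The lookarounds keep the match from starting or ending inside a word.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern.sub(replace, text)
```

Because the match must not start inside a word, a sentence that merely ends in "st." (such as "the best.") is left untouched.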

Resolved Issues

  • TAO CV

    • YOLOv4, YOLOv3, UNet and LPRNet exported .etlt model files can be integrated directly into DeepStream 6.0.

  • TAO Conversational AI

    • ASR models support generating intermediate .tlt model files during training

Deprecated Features

Release Contents

Components included in this release:

  • TAO Launcher pip package

  • TAO - TF Docker

  • TAO - PyTorch Docker

  • TAO - Language Model Docker

  • Jupyter notebook with sample workflows

Key Features

Transfer Learning Toolkit has been renamed to TAO Toolkit

  • TAO Toolkit Launcher:

    • Python3 pip package as a unified Command Line Interface (CLI)

    • Support for Docker images hosted on different registries

  • TAO Resources:

    • Jupyter notebook examples showing the end-to-end workflow for the following models

      • N-Gram Language model

  • TAO CV:

    • Support for MaskRCNN instance segmentation model

    • Support for pruning MaskRCNN models

    • Support for serializing a template DeepStream config and labels file

    • Support for training highly accurate purpose-built models:

      • BodyPose Estimation

    • Instructions for running TAO in the cloud with Azure

    • Converter utility to generate device specific optimized TensorRT engines

    • New backbones added to UNet training

      • Vanilla UNet Dynamic

      • Efficient UNet

  • TAO Conversational AI:

    • Added support for validating an exported model for compliance with RIVA

    • Training an N-Gram language model implemented in KenLM

Known Issues/Limitations

  • TAO CV

    • Transfer Learning is not supported on pruned models across all applications.

    • When training with multiple GPUs, you might need to scale down the batch_size and/or scale up the learning rate to get the same accuracy seen in single GPU training.

    • When training DetectNet_v2 for object detection use-cases with more than 10 classes, you may need to either update the cost_weight parameter in the cost_function_config, or balance the number of samples per class in the dataset for better training.

    • When training a DetectNet_v2 network on datasets with fewer than 20,000 images, use smaller batch sizes (1, 2, or 4) to get better accuracy.

    • The infer subtask of DetectNet_v2 does not output confidence values and writes 0. in their place. Ignore these values and treat only the bbox and class labels as valid outputs.

    • ResNet101 pre-trained weights from NGC are not supported on YOLOv3, YOLOv4, YOLOv4-tiny, SSD, DSSD and RetinaNet.

    • When generating an INT8 engine with tao-converter, use -s if TensorRT reports an error saying that weights are outside of the FP16 range.

  • TAO Conversational AI

    • When running convAI models on a cloud VM, users should have root access to the VM.

    • TAO Conv AI models cannot generate intermediate model.tlt files.
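
The multi-GPU note in the TAO CV list above suggests scaling down batch_size and/or scaling up the learning rate. The following is a minimal sketch of that arithmetic, assuming that batch_size is applied per GPU and using the common linear scaling heuristic (learning rate proportional to the effective batch size). The heuristic and the helper name `scale_for_multi_gpu` are assumptions for illustration, not something these notes prescribe; treat the result as a starting point for tuning.

```python
def scale_for_multi_gpu(single_gpu_batch_size, single_gpu_lr, num_gpus,
                        keep_global_batch=True):
    """Return (per_gpu_batch_size, learning_rate) for a multi-GPU run.

    Assumes batch_size is applied per GPU, so the effective batch is
    per_gpu_batch_size * num_gpus. Hypothetical helper for illustration.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if keep_global_batch:
        # Option 1: shrink the per-GPU batch so the effective batch stays close
        # to the single-GPU value (integer division may round it down).
        per_gpu_batch = max(1, single_gpu_batch_size // num_gpus)
    else:
        # Option 2: keep the per-GPU batch; the effective batch grows with num_gpus.
        per_gpu_batch = single_gpu_batch_size
    effective_batch = per_gpu_batch * num_gpus
    # Linear scaling heuristic: learning rate proportional to effective batch size.
    learning_rate = single_gpu_lr * effective_batch / single_gpu_batch_size
    return per_gpu_batch, learning_rate
```

For a single-GPU setup with batch size 16 and learning rate 0.001 moved to 4 GPUs, the first option gives a per-GPU batch of 4 with the learning rate unchanged, and the second keeps the per-GPU batch at 16 and raises the learning rate fourfold.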

© Copyright 2022, NVIDIA. Last updated on Jun 6, 2022.