Release Notes
=============


Transfer Learning Toolkit V3.0
------------------------------

NVIDIA Transfer Learning Toolkit (TLT) is a Python package that enables NVIDIA customers to
fine-tune pretrained models with their own data and export them for TensorRT-based
inference on an edge device.

Key Features
^^^^^^^^^^^^

Features included in this release:

* TLT Launcher:
  
  * Python3 pip package as a unified Command Line Interface (CLI)
  * TLT Launcher now supports interactive sessions for debugging
  * Support for Docker images hosted on different registries
  * Support for overridable configs via environment variables

* TLT Resources:

  * Jupyter notebook examples showing how to use the pretrained models effectively.
  
* TLT CV:
  
  * Pretrained models for several public architectures and reference applications serving computer vision related 
    object classification, detection and segmentation use cases.
  * Support for YOLOv3, YOLOv4, FasterRCNN, SSD, RetinaNet and DSSD object detection models.
  * Support for the MaskRCNN instance segmentation model
  * Support for pruning MaskRCNN models
  * Support for the UNet semantic segmentation model
  * Support for pruning UNet semantic segmentation models
  * Support for serializing a template DeepStream config and labels file
  * Support for training highly accurate purpose-built models:

      * PeopleNet
      * PeopleSegNet
      * PeopleSemSegNet
      * TrafficCamNet
      * DashCamNet
      * FaceDetectIR
      * VehicleTypeNet
      * VehicleMakeNet
      * Facial Landmarks Estimation
      * FaceDetect
      * Gaze Estimation
      * Gesture Recognition
      * Emotion Classification
      * HeartRate Estimation

  * Instructions for running TLT in the cloud with AWS or GCP
  * Quantization aware training for accurate INT8 models
  * Support for Automatic Mixed Precision (AMP) training
  * Offline augmentation tool for object detection datasets
  * Model adaptation and retraining that are easy to use in heterogeneous multi-GPU environments.
  * Reference Application for deploying the following models to NVIDIA Triton:

      * DetectNet_v2
      * Classification

  * Converter utility to generate device-specific optimized TensorRT engines.
  * TLT uses the CUDA Multi-Process Service (MPS), which helps optimize GPU utilization during multi-GPU training.

* TLT Conversational AI:
  
  * Pretrained models for several public architectures and reference applications serving conversational AI
    speech-to-text and natural language processing use cases, namely:

    * Speech to Text
    * Speech to Text with Citrinet
    * Text classification
    * Token classification
    * Punctuation and capitalization
    * Intent and slot classification
    * Question Answering


Contents
^^^^^^^^

Components included in this release:

* TLT Launcher pip package
* TLT - TF docker
* TLT - PyTorch docker
* Jupyter notebook with sample workflows

  - :ref:`Conversational AI<integrating_tlt_trained_models_to_jarvis>`
  - `Computer Vision`_

.. _Computer Vision: https://ngc.nvidia.com/resources/nvidia:tlt_cv_samples

* Getting Started Guide containing usage and installation instructions
* tlt-converter for x86 + discrete GPU platforms
* tlt-converter for Jetson (ARM64)
* Pretrained weights trained on the Open Images dataset, available on NGC

  - `Classification`_
  - `Object Detection`_
  - `Object Detection - DetectNet_v2`_
  - `Instance Segmentation`_
  - `Semantic Segmentation`_

* Unpruned and pruned versions of the purpose-built models. Pruned models can be deployed out of the box
  with DeepStream, and unpruned models can be used for retraining.

  - `PeopleNet`_
  - `TrafficCamNet`_
  - `DashCamNet`_
  - `FaceDetectIR`_
  - `VehicleTypeNet`_
  - `VehicleMakeNet`_
  - `LPDNet`_

* Trainable and out-of-box Deployable models for:

  - `PeopleSegNet`_
  - `HeartRateNet`_
  - `GestureNet`_
  - `EmotionNet`_
  - `FpeNet`_
  - `FaceDetect`_
  - `GazeNet`_
  - `LPRNet`_

.. _Classification: https://ngc.nvidia.com/catalog/models/nvidia:tlt_pretrained_classification
.. _Object Detection: https://ngc.nvidia.com/catalog/models/nvidia:tlt_pretrained_object_detection
.. _Object Detection - DetectNet_v2: https://ngc.nvidia.com/catalog/models/nvidia:tlt_pretrained_detectnet_v2
.. _Instance Segmentation: https://ngc.nvidia.com/catalog/models/nvidia:tlt_instance_segmentation
.. _Semantic Segmentation: https://ngc.nvidia.com/catalog/models/nvidia:tlt_semantic_segmentation

.. _PeopleNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet
.. _PeopleSegNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplesegnet
.. _TrafficCamNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_trafficcamnet
.. _DashCamNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_dashcamnet
.. _FaceDetectIR: https://ngc.nvidia.com/catalog/models/nvidia:tlt_facedetectir
.. _VehicleTypeNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_vehicletypenet
.. _VehicleMakeNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_vehiclemakenet
.. _HeartRateNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_heartratenet
.. _GestureNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_gesturenet
.. _EmotionNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_emotionnet
.. _FpeNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_fpenet
.. _FaceDetect: https://ngc.nvidia.com/catalog/models/nvidia:tlt_facenet
.. _GazeNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_gazenet
.. _LPDNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_lpdnet
.. _LPRNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_lprnet


Software Requirements
^^^^^^^^^^^^^^^^^^^^^

* Ubuntu 18.04 LTS
* Docker API > 1.40
* Docker-ce > 19.03
* `NVIDIA GPU Cloud`_
* `nvidia-docker2`_
* Python > 3.6.9
* Jupyter Notebook
* NVIDIA GPU driver v455.xx or above

.. _NVIDIA GPU Cloud: https://ngc.nvidia.com/
.. _nvidia-docker2: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

.. Note:: `DeepStream 5.1`_ is recommended for inference and deployment.

.. _DeepStream 5.1: https://developer.nvidia.com/deepstream-sdk
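
The version floors listed above can be checked with a short script. The following is a minimal
sketch rather than an official TLT utility: the helper names :code:`version_tuple` and
:code:`meets_minimum` are hypothetical, and reading the ``>`` signs in the list as strict
comparisons is an assumption.

.. code-block:: python

    import re
    import sys


    def version_tuple(text):
        """Return the first dotted numeric version found in text as a tuple of ints."""
        match = re.search(r"\d+(?:\.\d+)*", text)
        if match is None:
            raise ValueError("no version number found in %r" % text)
        return tuple(int(part) for part in match.group(0).split("."))


    def meets_minimum(found, minimum, strict=True):
        """Compare two version strings.

        strict=True mirrors the ">" entries in the requirements list (assumption);
        strict=False mirrors "or above" entries such as the GPU driver.
        """
        found_version = version_tuple(found)
        minimum_version = version_tuple(minimum)
        if strict:
            return found_version > minimum_version
        return found_version >= minimum_version


    if __name__ == "__main__":
        python_version = "%d.%d.%d" % sys.version_info[:3]
        print("Python", python_version, "satisfies > 3.6.9:",
              meets_minimum(python_version, "3.6.9"))

The same helper accepts a full version banner, such as the line printed by
:code:`docker --version`, because it extracts the first dotted number it finds.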

Hardware Requirements
^^^^^^^^^^^^^^^^^^^^^

The following system configuration is recommended to achieve reasonable training performance with TLT
and the supported models provided:

* 32 GB system RAM
* 32 GB of GPU RAM
* 8 core CPU
* 1 NVIDIA GPU
* 100 GB of SSD space

TLT is supported on A100, V100 and RTX 30x0 GPUs.

Known Issues
^^^^^^^^^^^^

* TLT CV

  * The SSD, DSSD, YOLOv3, YOLOv4, RetinaNet, FasterRCNN and MaskRCNN integration with DeepStream
    requires custom plugins from the TensorRT Open Source Software (OSS) library. DeepStream 5.1
    doesn't natively support custom plugins from TensorRT OSS. Instructions to build TensorRT OSS
    and custom parsing code to run with DeepStream are provided `here`_.
  * Transfer learning is not supported on pruned models in any application.
  * When training with multiple GPUs, you might need to scale down the batch_size and/or scale up the
    learning rate to get the same accuracy seen in single GPU training.
  * When training DetectNet_v2 for object detection use-cases with more than 10 classes, you may
    need to either update the cost_weight parameter in the cost_function_config, or balance the
    number of samples per class in the dataset for better training.
  * When training a DetectNet_v2 network on datasets with fewer than 20,000 images, please use
    smaller batch sizes (1, 2 or 4) to get better accuracy.
  * The infer subtask of DetectNet_v2 doesn't output confidence scores and emits 0. as the value. You may
    ignore these values and consider only the bbox and class labels as valid outputs.
  * When using MaskRCNN, please make sure GPU 0 is free.
  * ResNet101 pretrained weights from NGC are not supported on YOLOv3, YOLOv4, SSD, DSSD and RetinaNet.
  * UNet is currently not supported for inference in DeepStream 5.0. DeepStream 5.1 supports it.
  * New backbones added to UNet training

    * Vanilla UNet Dynamic
    * Efficient UNet

  * UNet supports multi-source data loading
  * When generating an INT8 engine with :code:`tlt-converter`, please use :code:`-s` if TensorRT reports an error saying that weights are outside of the FP16 range.
  * YOLOv3, YOLOv4 and UNet :code:`.etlt` files are not directly compatible with DeepStream. Users must use :code:`tlt-converter` to generate a TensorRT engine
    when integrating with DeepStream.

.. _here: https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps
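
The multi-GPU note above (scale down :code:`batch_size` and/or scale up the learning rate) can be
illustrated with a little arithmetic. This is a rough sketch, not a TLT API: the function name
:code:`scale_for_multi_gpu` is hypothetical, it assumes that :code:`batch_size` is a per-GPU value,
and the linear learning-rate scaling is a common heuristic rather than a rule stated in these notes.
Treat the results as starting points for tuning.

.. code-block:: python

    def scale_for_multi_gpu(batch_size, learning_rate, num_gpus, keep_effective_batch=True):
        """Suggest per-GPU batch size and learning rate for a multi-GPU run.

        Assumption: batch_size is applied per GPU, so the effective batch is
        batch_size * num_gpus.
        """
        if num_gpus < 1:
            raise ValueError("num_gpus must be at least 1")
        if keep_effective_batch:
            # Option 1: shrink the per-GPU batch so the effective batch stays the same.
            return max(1, batch_size // num_gpus), learning_rate
        # Option 2: keep the per-GPU batch and scale the learning rate linearly (heuristic).
        return batch_size, learning_rate * num_gpus


    if __name__ == "__main__":
        print(scale_for_multi_gpu(16, 0.001, 4))
        print(scale_for_multi_gpu(16, 0.001, 4, keep_effective_batch=False))

The two branches correspond to the two adjustments named in the note. They can also be combined
by hand, for example halving the batch size and doubling the learning rate.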

* TLT Conversational AI
  
  * When running convAI models on a cloud VM, users should have root access to the VM.

* NGC CLI

  *  When running :code:`ngc config set`, the NGC CLI may not prompt the user to configure the team and org. In this case,
     downloading models may fail with the following error:

     .. code-block:: text

        Missing org - If apikey is set, org is also required.

     To work around this, back up your existing NGC API key from the ngc config at :code:`~/.ngc/config`,
     then clear the ngc config by running the following command:

     .. code-block:: text

        ngc config clear
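
The backup step for the NGC config can be scripted before the config is cleared. The snippet
below is a minimal sketch under stated assumptions: :code:`backup_ngc_config` is a hypothetical
helper, not part of the NGC CLI, and the timestamped backup file name is an arbitrary choice.
It only copies the file and does not run :code:`ngc config clear`.

.. code-block:: python

    import shutil
    import time
    from pathlib import Path


    def backup_ngc_config(config_path=None, backup_dir=None):
        """Copy the NGC config file to a timestamped backup and return the backup path."""
        # Default to the location named in the release notes: ~/.ngc/config
        config_path = Path(config_path) if config_path else Path.home() / ".ngc" / "config"
        if not config_path.is_file():
            raise FileNotFoundError("NGC config not found: %s" % config_path)

        # Store the backup next to the original unless another directory is given.
        backup_dir = Path(backup_dir) if backup_dir else config_path.parent
        backup_dir.mkdir(parents=True, exist_ok=True)

        backup_name = "config.backup-" + time.strftime("%Y%m%d-%H%M%S")
        backup_path = backup_dir / backup_name
        shutil.copy2(str(config_path), str(backup_path))
        return backup_path


    if __name__ == "__main__":
        print("Backed up NGC config to", backup_ngc_config())

Because the backup contains the API key, keep it in a location with the same access
restrictions as the original file.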

Resolved Issues
^^^^^^^^^^^^^^^

* Loading pretrained weights when retraining a pruned model