Release Notes
Transfer Learning Toolkit V3.0
NVIDIA Transfer Learning Toolkit (TLT) is a Python package that enables NVIDIA customers to fine-tune pretrained models with their own data and export them for TensorRT-based inference on an edge device.
Key Features
Features included in this release:
TLT Launcher:
Python3 pip package as a unified Command Line Interface (CLI)
TLT Resources:
Jupyter notebook examples showing how to use the pretrained models effectively.
TLT CV:
Pretrained models for several public architectures and reference applications serving computer vision related object classification, detection and segmentation use cases.
Support for YOLOv3, YOLOv4, FasterRCNN, SSD, RetinaNet and DSSD object detection models.
Support for MaskRCNN Instance segmentation model
Support for UNet Semantic segmentation model
Support for training highly accurate purpose-built models:
PeopleNet, PeopleSegNet, TrafficCamNet, DashCamNet, FaceDetectIR, VehicleTypeNet, VehicleMakeNet, FpeNet, FaceDetect, GazeNet, GestureNet, EmotionNet
Quantization aware training for accurate INT8 models
Support for Automatic Mixed Precision (AMP) training
Offline augmentation tool for object detection datasets
Model adaptation and retraining that is easy to use in heterogeneous multi-GPU environments.
Pruning API that compresses the size of the model during training.
Model export API for integrating the model directly into the DeepStream or TLT CV Inference pipeline environment.
Converter utility to generate device specific optimized TensorRT engines.
TLT uses the CUDA Multi-Process Service (MPS), which helps optimize GPU utilization during multi-GPU training.
TLT Conversational AI:
Pretrained models for several public architectures and reference applications serving conversational AI use cases in speech-to-text and natural language processing, namely:
Speech to Text
Text classification
Token classification
Punctuation and capitalization
Intent and slot classification
Question Answering
Contents
Components included in this release:
TLT Launcher pip package
TLT - TF docker
TLT - PyTorch docker
Jupyter notebook with sample workflows
Getting Started Guide containing usage and installation instructions
tlt-converter for x86 + discrete GPU platforms
tlt-converter for Jetson (ARM64) available here.
Pre-trained weights trained on the Open Images dataset, available on NGC
Unpruned and pruned versions of the purpose-built models. Pruned models can be deployed out-of-box with DeepStream, and unpruned models can be used for re-training.
Trainable and out-of-box Deployable models for:
Software Requirements
Ubuntu 18.04 LTS
Docker API > 1.40
Docker-ce > 19.03
Python > 3.6.9
Jupyter Notebook
NVIDIA GPU driver v455.xx or above
Note
DeepStream 5.0 for inference and deployment is recommended.
Hardware Requirements
The following system configuration is recommended to achieve reasonable training performance with TLT and the supported models:
32 GB system RAM
32 GB of GPU RAM
8 core CPU
1 NVIDIA GPU
100 GB of SSD space
TLT is supported on A100, V100 and RTX 30x0 GPUs.
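As a rough preflight check, the sketch below compares a host against two of the recommendations above: CPU core count and disk space. It is only an illustration and not part of TLT. The helper name `find_shortfalls` is hypothetical, it measures free space on the current drive instead of total SSD capacity, and it does not inspect system RAM, GPU memory, or the GPU model.

```python
import os
import shutil

# Recommended minimums taken from the hardware list above.
RECOMMENDED_CPU_CORES = 8
RECOMMENDED_DISK_GB = 100


def find_shortfalls(cpu_cores, free_disk_gb):
    """Return one message per recommendation that the host does not meet."""
    shortfalls = []
    if cpu_cores < RECOMMENDED_CPU_CORES:
        shortfalls.append(
            f"CPU cores: found {cpu_cores}, recommended {RECOMMENDED_CPU_CORES}"
        )
    if free_disk_gb < RECOMMENDED_DISK_GB:
        shortfalls.append(
            f"Disk space: found {free_disk_gb:.0f} GB free, "
            f"recommended {RECOMMENDED_DISK_GB} GB"
        )
    return shortfalls


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 0 in that case.
    cores = os.cpu_count() or 0
    # Free space (in GB) on the drive that holds the current directory.
    free_gb = shutil.disk_usage(os.getcwd()).free / 1e9
    messages = find_shortfalls(cores, free_gb)
    for message in messages or ["CPU and disk recommendations are met"]:
        print(message)
```

An empty list from `find_shortfalls` means both checked recommendations are met. The remaining items in the list still need to be verified by other means.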
Known Issues
TLT CV
SSD, DSSD, YOLOv3, RetinaNet, FasterRCNN and MaskRCNN integration with DeepStream is a feature and requires custom plugins from the TensorRT Open Source Software (OSS) library. DeepStream 5.0 doesn’t natively support custom plugins from TensorRT OSS. Instructions to build TensorRT OSS and custom parsing code to run with DeepStream have been provided here.
Transfer Learning is not supported on pruned models across all applications.
When training with multiple GPUs, you might need to scale down the batch_size and/or scale up the learning rate to get the same accuracy seen in single GPU training.
When resuming a DetectNet_v2 training from checkpoint, please make sure to maintain the same number of GPUs and the same command line to restart the training.
When training DetectNet_v2 for object detection use-cases with more than 10 classes, you may need to either update the cost_weight parameter in the cost_function_config, or balance the number of samples per class in the dataset for better training.
When training a DetectNet_v2 network on datasets with fewer than 20,000 images, please use smaller batch sizes (1, 2 or 4) to get better accuracy.
When using MaskRCNN, please make sure GPU 0 is free.
ResNet101 pre-trained weights from NGC are not supported on YOLOv3, SSD, DSSD and RetinaNet.
UNet is currently not supported for inference in DeepStream 5.0. DeepStream 5.1 supports it.
When generating an INT8 engine with tlt-converter, please use -s if TensorRT reports an error saying that weights are outside of the FP16 range.
YOLOv3/v4 models are compatible only with the following combinations of DeepStream and TensorRT:
Model | DeepStream | TensorRT | Compatible |
---|---|---|---|
YOLO v3/v4 | 5.0 / 5.1 | 7.0.x | Yes |
YOLO v3/v4 | 5.0 / 5.1 | 7.1.x | Yes |
YOLO v3/v4 | 5.0 / 5.1 | 7.2.1 | Yes |
YOLO v3/v4 | 5.0 / 5.1 | 7.2.2 | No |
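The multi-GPU note in the TLT CV list above suggests scaling down the batch size and/or scaling up the learning rate. The sketch below shows the arithmetic behind two common starting points. It assumes that batch_size is applied per GPU, so the effective batch size is batch_size multiplied by the number of GPUs. The function name `scale_for_multi_gpu` is hypothetical, and the linear learning-rate scaling in option B is a general heuristic and not a rule taken from TLT, so the resulting values may still need tuning.

```python
def scale_for_multi_gpu(single_gpu_batch_size, single_gpu_lr, num_gpus):
    """Suggest two alternative settings when moving from 1 GPU to num_gpus GPUs.

    Assumption: batch_size is per GPU, so the effective batch size is
    batch_size * num_gpus.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    # Option A: shrink the per-GPU batch size so the effective batch size
    # stays close to the single-GPU value, and keep the learning rate.
    per_gpu_batch_size = max(1, single_gpu_batch_size // num_gpus)

    # Option B: keep the per-GPU batch size and scale the learning rate
    # linearly with the growth in effective batch size (a heuristic).
    scaled_lr = single_gpu_lr * num_gpus

    return {
        "option_a": {
            "batch_size": per_gpu_batch_size,
            "learning_rate": single_gpu_lr,
        },
        "option_b": {
            "batch_size": single_gpu_batch_size,
            "learning_rate": scaled_lr,
        },
    }


if __name__ == "__main__":
    # Example: a single-GPU run used batch size 16 and learning rate 0.0005.
    for name, settings in scale_for_multi_gpu(16, 0.0005, 4).items():
        print(name, settings)
```

Either option is only a starting point. The accuracy obtained in single-GPU training is the reference to compare against.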
TLT Conversational AI
In this beta release, Jarvis does not support fine-tuning of the QA model when using Megatron. It is recommended to use BERT for customized QA models.
Conversational AI models currently do not support resuming from a checkpoint for multi-GPU training jobs. Resuming training from a checkpoint is supported for single-GPU training.
NGC CLI
When running ngc config set, the NGC CLI may not prompt the user to configure the team and org. In this case, users may run into an error when downloading models saying: Missing org - If apikey is set, org is also required.
Please maintain a back-up of your existing NGC API key from the ngc config at ~/.ngc/config and clear the ngc config by running the following command:
ngc config clear
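The back-up step can be scripted. The following is a minimal sketch that copies the NGC config file to a timestamped file in the same directory before the config is cleared. The function name `backup_ngc_config` and the backup file naming are illustrative choices and not part of the NGC CLI.

```python
import shutil
import time
from pathlib import Path


def backup_ngc_config(config_path=Path.home() / ".ngc" / "config"):
    """Copy the NGC config file next to itself and return the backup path."""
    config_path = Path(config_path)
    if not config_path.is_file():
        raise FileNotFoundError(f"No NGC config found at {config_path}")

    # A timestamp suffix avoids overwriting a backup made at another time.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = config_path.with_name(f"{config_path.name}.{stamp}.bak")

    # copy2 also preserves file metadata such as modification time.
    shutil.copy2(config_path, backup_path)
    return backup_path


if __name__ == "__main__":
    print(f"Backed up NGC config to {backup_ngc_config()}")
```

Once the backup exists, run ngc config clear and then ngc config set again, supplying the API key, org and team when prompted.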
Resolved Issues
Loading pretrained weights when retraining a pruned model