Release Notes
=============

Transfer Learning Toolkit for Intelligent Video Analytics V2.0
--------------------------------------------------------------

NVIDIA Transfer Learning Toolkit (TLT) is a Python package that enables NVIDIA customers to fine-tune pretrained models with their own data and export them for TensorRT-based inference on an edge device.

Key Features
^^^^^^^^^^^^

Features included in this release:

* Pretrained models for several public architectures and reference applications serving computer-vision object classification and detection for Intelligent Video Analytics (IVA) use cases.
* Support for YOLOv3, FasterRCNN, SSD, RetinaNet, and DSSD object detection models.
* Support for the MaskRCNN instance segmentation model.
* Support for training highly accurate purpose-built models: PeopleNet, TrafficCamNet, DashCamNet, FaceDetectIR, VehicleTypeNet, and VehicleMakeNet.
* Quantization-aware training for accurate INT8 models.
* Support for Automatic Mixed Precision (AMP) training.
* Offline augmentation tool.
* Jupyter notebook examples showing how to use the pretrained models effectively.
* Model adaptation and retraining that is easy to use in heterogeneous multi-GPU environments.
* Pruning API that compresses the size of the model during training.
* Model export API for integrating the model directly into the DeepStream environment.
* Converter utility to generate a device-specific, optimized TensorRT engine.
* Use of the CUDA multi-process service, which helps optimize GPU utilization during multi-GPU training.

Contents
^^^^^^^^

Components included in this release:

* TLT docker
* Jupyter notebooks with sample workflows
* Getting Started Guide containing usage and installation instructions
* tlt-converter for x86 (part of the package)
* tlt-converter for Jetson (ARM64), available here
* Pre-trained weights trained on the Open Images dataset, available on NGC:

  - `Classification`_
  - `Object Detection`_
  - `Object Detection - DetectNet_v2`_
  - `Instance Segmentation`_

* Unpruned and pruned versions of the purpose-built models. Pruned models can be deployed out of the box with DeepStream; unpruned models can be used for retraining.

  - `PeopleNet`_
  - `TrafficCamNet`_
  - `DashCamNet`_
  - `FaceDetectIR`_
  - `VehicleTypeNet`_
  - `VehicleMakeNet`_

.. _Classification: https://ngc.nvidia.com/catalog/models/nvidia:tlt_pretrained_classification
.. _Object Detection: https://ngc.nvidia.com/catalog/models/nvidia:tlt_pretrained_object_detection
.. _Object Detection - DetectNet_v2: https://ngc.nvidia.com/catalog/models/nvidia:tlt_pretrained_detectnet_v2
.. _Instance Segmentation: https://ngc.nvidia.com/catalog/models/nvidia:tlt_instance_segmentation
.. _PeopleNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet
.. _TrafficCamNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_trafficcamnet
.. _DashCamNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_dashcamnet
.. _FaceDetectIR: https://ngc.nvidia.com/catalog/models/nvidia:tlt_facedetectir
.. _VehicleTypeNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_vehicletypenet
.. _VehicleMakeNet: https://ngc.nvidia.com/catalog/models/nvidia:tlt_vehiclemakenet

Software Requirements
^^^^^^^^^^^^^^^^^^^^^

* Ubuntu 18.04 LTS
* `NVIDIA GPU Cloud`_
* `nvidia-docker 2`_
* NVIDIA GPU driver v410.xx or above

.. _NVIDIA GPU Cloud: https://ngc.nvidia.com/
.. _nvidia-docker 2: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)

.. Note:: `DeepStream 5.0`_ is recommended for inference and deployment.

.. _DeepStream 5.0: https://developer.nvidia.com/deepstream-sdk
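The requirements above assume the TLT container is pulled from NGC and launched through nvidia-docker 2. The sketch below shows that launch pattern; the image tag, mount paths, and workspace names are placeholders for illustration and should be checked against the NGC catalog and the Getting Started Guide.

.. code-block:: bash

   # Log in to the NGC registry ("$oauthtoken" as the username, your NGC API key as the password).
   docker login nvcr.io

   # Pull the TLT container; <tag> is a placeholder for the TLT 2.0 image tag listed on NGC.
   docker pull nvcr.io/nvidia/tlt-streamanalytics:<tag>

   # Launch an interactive session, mounting a local workspace for data, spec files, and results.
   docker run --runtime=nvidia -it \
       -v /path/to/local/workspace:/workspace/tlt-experiments \
       nvcr.io/nvidia/tlt-streamanalytics:<tag> /bin/bash

With nvidia-docker 2, ``--runtime=nvidia`` exposes the host GPUs inside the container, and the mounted workspace keeps datasets and experiment results outside the container filesystem.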
Hardware Requirements
^^^^^^^^^^^^^^^^^^^^^

Minimum
*******

* 4 GB system RAM
* 4 GB of GPU RAM
* Single-core CPU
* 1 NVIDIA GPU
* 50 GB of HDD space

Recommended
***********

* 32 GB system RAM
* 32 GB of GPU RAM
* 8-core CPU
* 4 NVIDIA V100 GPUs
* 100 GB of SSD space

Known Issues
^^^^^^^^^^^^

* Integrating the SSD, DSSD, YOLOv3, RetinaNet, FasterRCNN, and MaskRCNN models with DeepStream requires custom plugins from the TensorRT Open Source Software (OSS) library, which DeepStream 5.0 does not support natively. Instructions to build TensorRT OSS and the custom parsing code to run with DeepStream are provided `here`_.
* Transfer learning is not supported on pruned models across all applications.
* When training with multiple GPUs, you may need to scale down the batch_size and/or scale up the learning rate to reach the same accuracy seen in single-GPU training.
* When resuming a DetectNet_v2 training from a checkpoint, keep the same number of GPUs and the same command line used to start the training (a resume sketch is shown at the end of these notes).
* When training DetectNet_v2 for object detection use cases with more than 10 classes, you may need to either update the cost_weight parameter in the cost_function_config or balance the number of samples per class in the dataset for better training.
* When training a DetectNet_v2 network on a dataset with fewer than 20,000 images, use smaller batch sizes (1, 2, or 4) to get better accuracy.
* When using MaskRCNN, make sure GPU 0 is free.
* ResNet101 pre-trained weights from NGC are not supported with YOLOv3, SSD, DSSD, and RetinaNet.

.. _here: https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps

Resolved Issues
^^^^^^^^^^^^^^^

* Fixed an issue with loading pretrained weights when retraining a pruned model.
* Fixed an issue where an incorrect image_extension in the DetectNet_v2 dataset_config returned an unrelated error during tlt-evaluate.
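The multi-GPU and checkpoint-resume notes above refer to the ``tlt-train`` command line. The sketch below assumes a DetectNet_v2 experiment; the spec file, result directory, key, and model name are placeholders, and the exact flags should be verified against the Getting Started Guide for your TLT version.

.. code-block:: bash

   # Start a DetectNet_v2 training run on 4 GPUs (paths, key, and model name are placeholders).
   tlt-train detectnet_v2 --gpus 4 \
       -e /workspace/tlt-experiments/specs/detectnet_v2_train.txt \
       -r /workspace/tlt-experiments/results/detectnet_v2 \
       -k $API_KEY \
       -n resnet18_detector

   # To resume from the last checkpoint, re-run the identical command: same spec file,
   # same result directory, and the same number of GPUs (see Known Issues above).
   tlt-train detectnet_v2 --gpus 4 \
       -e /workspace/tlt-experiments/specs/detectnet_v2_train.txt \
       -r /workspace/tlt-experiments/results/detectnet_v2 \
       -k $API_KEY \
       -n resnet18_detector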