Transfer Learning Toolkit (TLT) Integration with DeepStream

NVIDIA TLT is a simple, easy-to-use training toolkit that requires minimal to zero coding to create vision AI models using the user’s own data. Using TLT, users can apply transfer learning to NVIDIA pre-trained models to create their own model. Users can add new classes to an existing pre-trained model, or they can re-train the model to adapt it to their use case. Users can also use the model pruning capability to reduce the overall size of the model.

Pre-trained models

There are two types of pre-trained models that users can start with: purpose-built pre-trained models and meta-architecture vision models. Purpose-built pre-trained models are highly accurate models that are trained on millions of objects for a specific task. The pre-trained weights for meta-architecture vision models merely act as a starting point to build more complex models. These weights are trained on the Open Images dataset and provide a much better starting point for training than starting from scratch with random weights. With meta-architecture vision models, users can choose from 100+ permutations of model architecture and backbone. See the illustration below.

Figure: TLT Pretrained Models

The purpose-built models are built for high accuracy and performance. They can be deployed out of the box for smart city or smart places applications, or re-trained with the user’s own data. All models are trained on millions of objects and achieve high accuracy on our test data. More information about each model is available in the Purpose-built models chapter of the TLT documentation or in the individual model cards. Typical use cases and some model KPIs are provided in the table below. PeopleNet can be used for detecting and counting people in smart buildings, retail, hospitals, etc. For smart traffic applications, TrafficCamNet and DashCamNet can be used to detect and track vehicles on the road.

TLT pretrained models - use cases

| Model Name | Network Architecture | Number of classes | Accuracy | Use case |
|---|---|---|---|---|
| TrafficCamNet | DetectNet_v2-ResNet18 | 4 | 83.50% | Detect and track cars |
| PeopleNet | DetectNet_v2-ResNet18/34 | 3 | 84% | People counting, heatmap generation, social distancing. |
| DashCamNet | DetectNet_v2-ResNet18 | 4 | 80% | Identify objects from a moving object |
| FaceDetectIR | DetectNet_v2-ResNet18 | 1 | 96% | Detect face in a dark environment with IR camera |
| VehicleMakeNet | ResNet18 | 20 | 91% | Classifying car models |
| VehicleTypeNet | ResNet18 | 6 | 96% | Classifying type of cars as coupe, sedan, truck, etc |
| License Plate Detection | DetectNet_v2-ResNet18 | 1 | 98% | Detect License plates on Vehicles |
| License Plate Recognition | Tuned ResNet18 | 36(US)/68(CH) | 97%(US)/99%(CH) | Recognize characters in License plates. Available in American and Chinese License plates |
| PeopleSegNet | MaskRCNN-ResNet50 | 1 | 85% | Detect and segment people in crowded environment |

Most models trained with TLT are natively integrated for inference with DeepStream. If the model is integrated, it is supported by the reference deepstream-app. If the model is not natively integrated in the SDK, you can find a reference application on the GitHub repo. See the table below for information on the models supported. For models integrated into deepstream-app, we have provided sample config files for each of the networks. The sample config files are available in the samples/configs/tlt_pretrained_models folder. The table below also lists config files for each model.
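As a minimal sketch of how one of the sample configs might be launched, the snippet below assembles a `deepstream-app -c <config>` command line for a file in samples/configs/tlt_pretrained_models. The SDK install root used here is an assumption (adjust it to your system), and the helper name `deepstream_app_command` is hypothetical.

```python
import shlex
from pathlib import PurePosixPath
from typing import List

# Assumed SDK install root; change this to match your installation.
SDK_ROOT = PurePosixPath("/opt/nvidia/deepstream/deepstream")
CONFIG_DIR = SDK_ROOT / "samples" / "configs" / "tlt_pretrained_models"


def deepstream_app_command(app_config: str,
                           config_dir: PurePosixPath = CONFIG_DIR) -> List[str]:
    """Return the argv list that runs deepstream-app with one sample config."""
    return ["deepstream-app", "-c", str(config_dir / app_config)]


cmd = deepstream_app_command("deepstream_app_source1_peoplenet.txt")
# Print a copy-pasteable shell command rather than executing it.
print(shlex.join(cmd))
```

To actually run the pipeline, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where DeepStream and the model files referenced by the config are available.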

The TLT pre-trained models table shows the deployment information of purpose-built pre-trained models.

TLT pre-trained models in DeepStream

| Pre-trained model | DeepStream reference app | Config files in SDK | DLA supported |
|---|---|---|---|
| TrafficCamNet | deepstream-app | deepstream_app_source1_trafficcamnet.txt, config_infer_primary_trafficcamnet.txt, labels_trafficnet.txt | Yes |
| PeopleNet | deepstream-app | deepstream_app_source1_peoplenet.txt, config_infer_primary_peoplenet.txt, labels_peoplenet.txt | Yes |
| DashCamNet | deepstream-app | deepstream_app_source1_dashcamnet_vehiclemakenet_vehicletypenet.txt, config_infer_primary_dashcamnet.txt, labels_dashcamnet.txt | Yes |
| FaceDetect-IR | deepstream-app | deepstream_app_source1_faceirnet.txt, config_infer_primary_faceirnet.txt, labels_faceirnet.txt | Yes |
| VehicleMakeNet | deepstream-app | deepstream_app_source1_dashcamnet_vehiclemakenet_vehicletypenet.txt, config_infer_secondary_vehiclemakenet.txt, labels_vehiclemakenet.txt | Yes |
| VehicleTypeNet | deepstream-app | deepstream_app_source1_dashcamnet_vehiclemakenet_vehicletypenet.txt, config_infer_secondary_vehicletypenet.txt, labels_vehicletypenet.txt | Yes |
| License Plate Detection | deepstream-lpr-app | | Yes |
| License Plate Recognition | deepstream-lpr-app | | No |
| PeopleSegNet | DeepStream TLT app - PeopleSegNet | | Yes |


The TLT model architecture table shows the deployment information of the open model architecture models from TLT.

TLT model architecture in DeepStream

| Model architecture | Model (download link) | Reference app | Config files |
|---|---|---|---|
| DetectNet_v2 | https://nvidia.box.com/shared/static/i1cer4s3ox4v8svbfkuj5js8yqm3yazo.zip | deepstream-app | config_infer_primary_detectnet_v2.txt, detectnet_v2_labels.txt |
| YoloV3 | https://nvidia.box.com/shared/static/i1cer4s3ox4v8svbfkuj5js8yqm3yazo.zip | deepstream-app | config_infer_primary_yolov3.txt, yolov3_labels.txt |
| YoloV4 | https://nvidia.box.com/shared/static/i1cer4s3ox4v8svbfkuj5js8yqm3yazo.zip | DeepStream TLT app - YOLOV4 | |
| FasterRCNN | https://nvidia.box.com/shared/static/i1cer4s3ox4v8svbfkuj5js8yqm3yazo.zip | deepstream-app | config_infer_primary_frcnn.txt, frcnn_labels.txt |
| SSD | https://nvidia.box.com/shared/static/i1cer4s3ox4v8svbfkuj5js8yqm3yazo.zip | deepstream-app | config_infer_primary_ssd.txt, ssd_labels.txt |
| DSSD | https://nvidia.box.com/shared/static/i1cer4s3ox4v8svbfkuj5js8yqm3yazo.zip | deepstream-app | config_infer_primary_dssd.txt, dssd_labels.txt |
| RetinaNet | https://nvidia.box.com/shared/static/i1cer4s3ox4v8svbfkuj5js8yqm3yazo.zip | deepstream-app | config_infer_primary_retinanet.txt, retinanet_labels.txt |
| MaskRCNN | https://nvidia.box.com/shared/static/i1cer4s3ox4v8svbfkuj5js8yqm3yazo.zip | deepstream-app | config_infer_primary_mrcnn.txt, mrcnn_labels.txt |
| UNET | https://nvidia.box.com/shared/static/i1cer4s3ox4v8svbfkuj5js8yqm3yazo.zip | DeepStream TLT app - UNET | |

Except for UNET and License Plate Recognition (LPR), all models can be deployed in the native etlt format, which is the TLT export format. For the UNET and LPR models, you need to convert the etlt file to a TensorRT engine before running with DeepStream. Use tlt-converter to convert the etlt file to a TensorRT engine. tlt-converter is hardware specific; see the TLT converter table for the appropriate version for your hardware.
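The difference between the two deployment paths can be illustrated with a small sketch that writes the model-source keys of a Gst-nvinfer `[property]` group. It assumes the `tlt-encoded-model`, `tlt-model-key`, and `model-engine-file` keys of the nvinfer configuration file; the file names, the key value, and the helper names are hypothetical, and a real config needs further properties (labels, network mode, and so on) that are omitted here.

```python
import configparser
import io
from typing import Dict

# Models that, per the text above, must be converted to a TensorRT engine first.
ENGINE_ONLY_MODELS = {"UNET", "LPR"}


def model_source_properties(model: str, etlt_path: str, key: str,
                            engine_path: str) -> Dict[str, str]:
    """Pick the nvinfer keys that tell DeepStream where the model comes from."""
    if model in ENGINE_ONLY_MODELS:
        # The engine must already have been built with tlt-converter.
        return {"model-engine-file": engine_path}
    # Native etlt deployment: point nvinfer at the encoded model and its key.
    return {"tlt-encoded-model": etlt_path, "tlt-model-key": key}


def render_property_group(props: Dict[str, str]) -> str:
    """Render the keys as an INI-style [property] group."""
    parser = configparser.ConfigParser()
    parser["property"] = props
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


# Hypothetical file names and key, for illustration only.
etlt_cfg = render_property_group(
    model_source_properties("PeopleNet", "peoplenet.etlt", "MY_KEY", "peoplenet.engine"))
engine_cfg = render_property_group(
    model_source_properties("UNET", "unet.etlt", "MY_KEY", "unet.engine"))
print(etlt_cfg)
print(engine_cfg)
```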

TLT converter

| Platform | Compute | Link |
|---|---|---|
| x86 + GPU | CUDA 10.2 / cuDNN 8.0 / TensorRT 7.1 | https://developer.nvidia.com/cuda102-cudnn80-trt71 |
| x86 + GPU | CUDA 10.2 / cuDNN 8.0 / TensorRT 7.2 | https://developer.nvidia.com/cuda102-cudnn80-trt72 |
| x86 + GPU | CUDA 11.0 / cuDNN 8.0 / TensorRT 7.1 | https://developer.nvidia.com/cuda110-cudnn80-trt71 |
| x86 + GPU | CUDA 11.0 / cuDNN 8.0 / TensorRT 7.2 | https://developer.nvidia.com/cuda110-cudnn80-trt72 |
| Jetson | JetPack 4.4 | https://developer.nvidia.com/cuda102-trt71-jp44 |
| Jetson | JetPack 4.5 | https://developer.nvidia.com/cuda102-trt71-jp45 |
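The converter selection and the conversion step can be sketched as follows. The links are copied from the TLT converter table; the lookup keys are a hypothetical naming scheme in which the TensorRT version is read from each link name. The `-k` (key), `-t` (data type), and `-e` (engine path) flags are given as recalled from the tlt-converter usage text, so confirm them with `tlt-converter -h`. Model-specific arguments such as input dimensions and output nodes are not shown and would be passed through `extra_args`.

```python
from typing import Dict, List, Sequence, Tuple

# (platform, software stack) -> tlt-converter download page, from the table above.
TLT_CONVERTER_LINKS: Dict[Tuple[str, str], str] = {
    ("x86", "cuda10.2-trt7.1"): "https://developer.nvidia.com/cuda102-cudnn80-trt71",
    ("x86", "cuda10.2-trt7.2"): "https://developer.nvidia.com/cuda102-cudnn80-trt72",
    ("x86", "cuda11.0-trt7.1"): "https://developer.nvidia.com/cuda110-cudnn80-trt71",
    ("x86", "cuda11.0-trt7.2"): "https://developer.nvidia.com/cuda110-cudnn80-trt72",
    ("jetson", "jetpack4.4"): "https://developer.nvidia.com/cuda102-trt71-jp44",
    ("jetson", "jetpack4.5"): "https://developer.nvidia.com/cuda102-trt71-jp45",
}


def converter_link(platform: str, stack: str) -> str:
    """Return the download page for the converter build matching the hardware."""
    try:
        return TLT_CONVERTER_LINKS[(platform, stack)]
    except KeyError:
        raise ValueError(f"no tlt-converter listed for {platform}/{stack}") from None


def tlt_converter_command(etlt_path: str, key: str, engine_path: str,
                          data_type: str = "fp16",
                          extra_args: Sequence[str] = ()) -> List[str]:
    """Build (but do not run) a tlt-converter command line."""
    return ["tlt-converter", "-k", key, "-t", data_type,
            "-e", engine_path, *extra_args, etlt_path]


print(converter_link("jetson", "jetpack4.5"))
# Hypothetical file names and key, for illustration only.
print(" ".join(tlt_converter_command("unet.etlt", "MY_KEY", "unet.engine")))
```

Because the engine is built for a specific GPU and TensorRT version, the conversion would normally be run on the target device itself rather than on the training machine.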

For more information about TLT and how to deploy TLT models with DeepStream, refer to the DeepStream-TLT integration chapter of the TLT user guide.

For more information about deploying architecture-specific models, refer to the https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps and https://github.com/NVIDIA-AI-IOT/deepstream_lpr_app GitHub repositories.