Purpose-Built Models

The purpose-built AI models are primarily built for applications in smart cities, parking management, and smart buildings, and are trained on millions of images. Both unpruned and pruned versions of these models are available on NGC. Use the unpruned models with TLT to re-train on your own dataset. The pruned models, by contrast, are deployment-ready and can be deployed directly on your edge device. Each pruned model also includes a calibration table for INT8 precision, and running the pruned model in INT8 provides the highest inference throughput.

The table below shows the network architecture and accuracy measured on our dataset.

Model Name	Network Architecture	Number of classes	Accuracy
TrafficCamNet	DetectNet_v2-ResNet18	4	83.5%
PeopleNet	DetectNet_v2-ResNet34	3	84%
PeopleNet	DetectNet_v2-ResNet18	3	80%
DashCamNet	DetectNet_v2-ResNet18	4	80%
FaceDetect-IR	DetectNet_v2-ResNet18	1	96%
VehicleMakeNet	ResNet18	20	91%
VehicleTypeNet	ResNet18	6	96%

Training

PeopleNet, TrafficCamNet, DashCamNet, and FaceDetect-IR are detection models based on DetectNet_v2 with either a ResNet18 or a ResNet34 backbone. To re-train these models with your data, use the unpruned model from NGC and follow the DetectNet_v2 object detection training guidelines, from the chapter Preparing the Input Data Structure through Exporting the model. The entire training workflow is given in the prior section.

VehicleMakeNet and VehicleTypeNet are classification models based on the ResNet18 backbone. To re-train these models, use the unpruned model from NGC and follow the Image classification training guidelines, from the chapter Preparing the Input Data Structure through Exporting the model.

Deployment

You can deploy either a model you trained yourself or one of the provided pruned models on any edge device using DeepStream. The deployment instructions are provided in Deploying to DeepStream.

TrafficCamNet

TrafficCamNet is a 4-class object detection network built on NVIDIA’s detectnet_v2 architecture with ResNet18 as the backbone feature extractor. It’s trained on 544x960 RGB images to detect cars, persons, road signs, and two-wheelers. The dataset contains images of real traffic intersections in US cities, captured from a vantage point of about 20 ft. The model is trained to separate the individual cars in a line as they come to a stop at a red traffic light or a stop sign. It is well suited to smart city applications where you want to count the number of cars on the road and understand the flow of traffic.
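The counting use case can be sketched in a few lines of Python. This is only an illustration and not part of TLT or DeepStream: the `Detection` record, the label strings, and the 0.5 confidence threshold are all assumptions standing in for whatever your inference pipeline actually emits.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Detection:
    """Hypothetical per-object detector output for one frame."""
    label: str         # assumed label string, e.g. "car"
    confidence: float  # assumed score in [0, 1]


def count_by_class(detections: Iterable[Detection],
                   min_confidence: float = 0.5) -> Counter:
    """Count detections per class, ignoring low-confidence ones."""
    return Counter(d.label for d in detections
                   if d.confidence >= min_confidence)


# Made-up detections for a single frame.
frame = [
    Detection("car", 0.91),
    Detection("car", 0.42),  # below the threshold, not counted
    Detection("person", 0.77),
    Detection("two_wheeler", 0.66),
]
counts = count_by_class(frame)
print(counts["car"])
```

A `Counter` returns 0 for classes that never appear, so per-class totals can be read without checking for missing keys.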

PeopleNet

PeopleNet is a 3-class object detection network built on NVIDIA’s detectnet_v2 architecture with ResNet34 as the backbone feature extractor. It’s trained on 544x960 RGB images to detect persons, bags, and faces. Several million images of both indoor and outdoor scenes were labeled in-house to adapt to a variety of use cases, such as airports, shopping malls, and retail stores. The dataset contains images from various vantage points. PeopleNet can be used for smart places or building applications where you need to accurately count people in a crowded environment for security or higher-level business insights.

DashCamNet

DashCamNet is a 4-class object detection network built on NVIDIA’s detectnet_v2 architecture with ResNet18 as the backbone feature extractor. It’s trained on 544x960 RGB images to detect cars, pedestrians, traffic signs, and two-wheelers. The training data for this network contains real images collected, annotated, and curated in-house from different dashboard cameras in cars, at a vantage point about 4-5 ft high. Unlike the other models, the camera in this case is moving. The use case for this model is to identify objects from a moving platform, such as a car or a robot.

FaceDetect-IR

FaceDetect-IR is a single-class face detection network built on NVIDIA’s detectnet_v2 architecture with ResNet18 as the backbone feature extractor. The model is trained on 384x240x3 IR (infrared) images augmented with synthetic noise. It targets use cases where the person’s face is close to the camera, such as a laptop camera during video conferencing or a camera placed inside a vehicle to observe a distracted driver. When infrared illuminators are used, this model can continue to work even when visible light conditions are too dark for normal color cameras.

VehicleMakeNet

VehicleMakeNet is a classification network based on ResNet18 that classifies car images of size 224 x 224. The model can identify 20 popular car makes. VehicleMakeNet is generally cascaded with DashCamNet or TrafficCamNet for smart city applications. For example, DashCamNet or TrafficCamNet acts as a primary detector that finds the objects of interest, and for each detected car VehicleMakeNet acts as a secondary classifier that determines the make of the car. Businesses such as smart parking operators or gas stations can use vehicle make insights to understand their customers.
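The primary-detector and secondary-classifier cascade can be sketched as follows. This is a minimal illustration of the control flow only, not the DeepStream implementation: `detect` and `classify_make` are hypothetical callables standing in for the two models, the image is a plain nested list, and the resize to 224 x 224 is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), assumed layout
Image = List[List[int]]          # rows of pixel values, a stand-in for a frame


@dataclass(frozen=True)
class Detection:
    """Hypothetical output of the primary detector."""
    label: str
    box: Box


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by box out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def cascade(image: Image,
            detect: Callable[[Image], Sequence[Detection]],
            classify_make: Callable[[Image], str],
            target_label: str = "car") -> List[Tuple[Detection, str]]:
    """Run the detector once, then classify each detected car crop."""
    results = []
    for det in detect(image):
        if det.label != target_label:
            continue  # only cars are passed to the secondary classifier
        # A real pipeline would resize the crop to 224 x 224 here.
        results.append((det, classify_make(crop(image, det.box))))
    return results


# Stub models so the sketch runs without any trained network.
def fake_detect(image: Image) -> Sequence[Detection]:
    return [Detection("car", (0, 0, 2, 2)), Detection("person", (1, 1, 3, 3))]


def fake_classify_make(patch: Image) -> str:
    return f"make_for_{len(patch)}x{len(patch[0])}_crop"


image = [[0] * 4 for _ in range(4)]
results = cascade(image, fake_detect, fake_classify_make)
print(results)
```

Passing the two models in as callables keeps the cascade logic independent of how each model is actually loaded and run.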

VehicleTypeNet

VehicleTypeNet is a classification network based on ResNet18 that classifies cropped vehicle images of size 224 x 224 into 6 classes: Coupe, Large Vehicle, Sedan, SUV, Truck, and Vans. The typical use case for this model is in smart city applications such as smart garages or toll booths, where you can charge based on the size of the vehicle.
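As a rough sketch of size-based charging, the predicted class can be mapped to a size tier and then to a fee. The six class names come from the list above. The tier assignments and the fee amounts are invented for illustration and would be set by the operator.

```python
# Hypothetical grouping of the six VehicleTypeNet classes into size tiers.
SIZE_TIER = {
    "Coupe": "small",
    "Sedan": "small",
    "SUV": "medium",
    "Vans": "medium",
    "Truck": "large",
    "Large Vehicle": "large",
}

# Hypothetical fees in cents; integers avoid floating-point rounding.
TOLL_CENTS_BY_TIER = {"small": 200, "medium": 350, "large": 600}


def toll_for(vehicle_type: str) -> int:
    """Return the toll in cents for a predicted vehicle type."""
    try:
        tier = SIZE_TIER[vehicle_type]
    except KeyError:
        raise ValueError(f"unknown vehicle type: {vehicle_type!r}") from None
    return TOLL_CENTS_BY_TIER[tier]


print(toll_for("Sedan"), toll_for("Truck"))
```

Rejecting unknown labels with an error, rather than charging a default fee, makes a mismatch between the model's label set and the pricing table visible early.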