PeopleNet

The PeopleNet models detect one or more physical objects from three categories within an image and return a bounding box and a category label for each detected object. The three categories of objects detected by these models are:

  • persons

  • bags

  • faces

These models are based on the NVIDIA DetectNet_v2 detector with ResNet34 as the feature extractor. This architecture, also known as GridBox object detection, uses bounding-box regression on a uniform grid over the input image. The GridBox system divides the input image into a grid of cells, and each cell predicts four normalized bounding-box parameters (xc, yc, w, h) and a confidence value per output class.
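
To illustrate this output format, the following is a minimal, hypothetical sketch of how such raw GridBox outputs could be decoded into candidate detections. The tensor layout, confidence threshold, and normalization convention are assumptions made for illustration; they are not taken from the model's actual parser.

    import numpy as np

    # Assumed layout, for illustration only:
    #   cov:  (num_classes, grid_h, grid_w)     per-cell confidence map
    #   bbox: (num_classes, 4, grid_h, grid_w)  per-cell normalized (xc, yc, w, h)
    INPUT_W, INPUT_H = 960, 544      # PeopleNet input resolution
    CONF_THRESHOLD = 0.4             # assumed threshold; tune per deployment

    def decode_gridbox(cov, bbox):
        """Turn raw grid predictions into (class, score, box) candidates."""
        detections = []
        num_classes = cov.shape[0]
        for c in range(num_classes):
            ys, xs = np.where(cov[c] >= CONF_THRESHOLD)
            for y, x in zip(ys, xs):
                xc, yc, w, h = bbox[c, :, y, x]
                # Convert normalized center/size to pixel corner coordinates.
                x1, y1 = (xc - w / 2) * INPUT_W, (yc - h / 2) * INPUT_H
                x2, y2 = (xc + w / 2) * INPUT_W, (yc + h / 2) * INPUT_H
                detections.append((c, float(cov[c, y, x]), (x1, y1, x2, y2)))
        return detections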

The raw normalized bounding-box and confidence detections need to be post-processed by a clustering algorithm such as DBSCAN or non-maximum suppression (NMS) to produce the final bounding-box coordinates and category labels.
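
For example, a simple greedy NMS pass over the decoded candidates could look like the sketch below. This is a generic NMS implementation for illustration, not the clustering code shipped with DeepStream or TAO.

    import numpy as np

    def nms(boxes, scores, iou_threshold=0.5):
        """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes."""
        order = np.argsort(scores)[::-1]          # highest score first
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(int(i))
            rest = order[1:]
            # Intersection of the best box with every remaining box.
            xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
            yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
            xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
            yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
            inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
            iou = inter / (area_i + areas - inter)
            # Drop boxes that overlap the kept box too strongly.
            order = rest[iou <= iou_threshold]
        return keep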

[Figure: example PeopleNet detections]

Model Card

The datasheet for the models is captured in the model card hosted on NGC, which includes detailed instructions for deploying the models with DeepStream.

TAO Fine-Tuning

You can retrain or fine-tune the PeopleNet models on custom datasets. These models use NVIDIA DetectNet_v2 as the object detector with ResNet34 as the feature extractor.

For more details, refer to the TAO tutorial notebook for DetectNet_v2 and the TAO documentation for PeopleNet.
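
As a rough illustration of what kicking off a fine-tuning run can look like, the sketch below invokes the TAO launcher from Python. The flag names follow the TAO 3.x `tao detectnet_v2 train` syntax; the spec path, results directory, and key are placeholders, and the exact command form for your TAO version is documented in the notebook and documentation referenced above.

    import subprocess

    # Hypothetical invocation of the TAO launcher. Paths and the encryption
    # key are placeholders; consult the DetectNet_v2 notebook and the TAO
    # documentation for the exact syntax used by your TAO version.
    subprocess.run(
        [
            "tao", "detectnet_v2", "train",
            "-e", "/workspace/tao-experiments/specs/peoplenet_retrain.txt",
            "-r", "/workspace/tao-experiments/results",
            "-k", "<your_ngc_encryption_key>",
            "--gpus", "1",
        ],
        check=True,
    )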

Accuracy

The accuracy of the PeopleNet v2.6 model was measured against more than 90,000 proprietary images across a variety of environments. The frames are high-resolution 1920x1080 images that are resized to 960x544 pixels before being passed to the PeopleNet detection model.

True positives, false positives, and false negatives are calculated using an intersection-over-union (IOU) criterion greater than 0.5. In addition, we have added KPIs with an IOU criterion greater than 0.8 for the low-density and extended-hand sequences, where a tight bounding box is a requirement for subsequent human pose estimation algorithms. The KPIs for the evaluation data are reported in the table below. The model is evaluated based on precision, recall, and accuracy.

Content                       Precision (FP16)   Recall (FP16)   Accuracy (FP16)   Precision (INT8)   Recall (INT8)   Accuracy (INT8)
Generic                       91.82%             88.60%          82.17%            92.97%             85.33%          80.21%
Office                        95.64%             91.92%          88.22%            96.08%             89.48%          86.33%
Cafe                          93.38%             86.97%          81.92%            94.62%             81.81%          78.18%
People (IOU > 0.8)            80.41%             78.13%          65.71%            78.71%             73.90%          61.68%
Extended-hands (IOU > 0.8)    93.98%             86.82%          82.25%            94.41%             86.80%          82.55%
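
For reference, the sketch below shows an IOU computation of the kind used in the matching criterion described above, together with precision and recall derived from true-positive, false-positive, and false-negative counts. The corner-format box convention is an assumption for illustration; this is not the evaluation code behind the table.

    def iou(box_a, box_b):
        """Intersection over union of two (x1, y1, x2, y2) boxes."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    # A detection counts as a true positive when it matches a ground-truth
    # box with IOU above the criterion (0.5, or 0.8 for the stricter KPIs).
    def precision_recall(tp, fp, fn):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return precision, recall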