The model described in this card detects one or more faces in a given image or video. Compared to the PeopleNet model, this model gives better results when detecting large faces, such as faces in webcam images.
The model is based on the NVIDIA DetectNet_v2 detector with ResNet18 as the feature extractor. This architecture, also known as GridBox object detection, uses bounding-box regression on a uniform grid over the input image. The GridBox system divides the input image into a grid, and each grid cell predicts four normalized bounding-box parameters (xc, yc, w, h) and a confidence value per output class.
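To illustrate how per-cell grid predictions become boxes, the following is a minimal sketch that assumes each cell emits (xc, yc, w, h, confidence) normalized to the full image for a single class. This layout and the helper name decode_grid are hypothetical; DetectNet_v2's actual output tensors are encoded differently, so treat this as an illustration of the idea rather than a decoder for this model.

```python
from typing import List, Tuple

# Hypothetical per-cell output: (xc, yc, w, h, confidence), normalized to [0, 1]
# relative to the full input image. This layout is an assumption for illustration.
Cell = Tuple[float, float, float, float, float]
Box = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score) in pixels


def decode_grid(cells: List[List[Cell]], img_w: int, img_h: int,
                conf_threshold: float = 0.5) -> List[Box]:
    """Turn per-cell normalized predictions for one class into pixel-space boxes."""
    boxes: List[Box] = []
    for row in cells:
        for xc, yc, w, h, conf in row:
            if conf < conf_threshold:
                continue  # discard low-confidence cells
            # Convert normalized center/size to corner coordinates in pixels.
            x1 = (xc - w / 2) * img_w
            y1 = (yc - h / 2) * img_h
            x2 = (xc + w / 2) * img_w
            y2 = (yc + h / 2) * img_h
            # Clip the box to the image bounds.
            boxes.append((max(0.0, x1), max(0.0, y1),
                          min(float(img_w), x2), min(float(img_h), y2), conf))
    return boxes
```

Many cells near the same face will typically fire, so the output of a step like this still contains overlapping duplicates that must be clustered.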
The raw normalized bounding-box and confidence detections need to be post-processed by a clustering algorithm such as DBSCAN or NMS to produce the final bounding-box coordinates and category labels.
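As one example of this post-processing step, here is a minimal sketch of standard greedy non-maximum suppression (NMS) over corner-format boxes. The function names and the default IoU threshold of 0.5 are illustrative assumptions, not the configuration used for this model; DBSCAN-based clustering would group boxes differently.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Keep only boxes that do not overlap the chosen box above the threshold.
        remaining = [b for b in remaining if iou(best, b) < iou_threshold]
    return kept
```

For example, two heavily overlapping boxes on the same face collapse to the higher-scoring one, while a box on a separate face survives.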
The training algorithm optimizes the network to minimize the localization and confidence loss for the objects. This model was trained in two phases using the DetectNet_v2 training app in TLT v3.0. In the first phase, the network is trained with regularization to facilitate pruning. The network is then pruned by removing channels whose kernel norms are below the pruning threshold. In the second phase, the pruned network is retrained without regularization.
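The pruning step can be pictured with the following minimal sketch of norm-based channel selection for a single convolution layer. It assumes the L2 norm and a hypothetical weight layout (one flattened kernel per output channel); the actual norm, normalization, and granularity used by the TLT pruning tool may differ. In a real network, removing an output channel also requires removing the matching input channel of the next layer, which this sketch does not do.

```python
import math
from typing import List, Tuple

# Hypothetical layout: one conv layer's weights as a list of output channels,
# each flattened to a list of floats (in_channels * kh * kw values).
Kernel = List[float]


def channel_norms(weights: List[Kernel]) -> List[float]:
    """L2 norm of each output channel's kernel."""
    return [math.sqrt(sum(w * w for w in k)) for k in weights]


def prune_channels(weights: List[Kernel],
                   threshold: float) -> Tuple[List[int], List[Kernel]]:
    """Keep output channels whose kernel norm is at or above the threshold.

    Returns the kept channel indices and their kernels.
    """
    kept = [i for i, n in enumerate(channel_norms(weights)) if n >= threshold]
    return kept, [weights[i] for i in kept]
```

The regularization in the first phase is intended to push the norms of less useful channels toward zero, so that a simple threshold like this separates them cleanly before retraining.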
The primary use case for this model is detecting faces from an IR (infrared) camera. The model can be used to detect faces in photos and videos given appropriate video or image decoding and pre-processing. As a secondary use case, the model can also detect faces in RGB (color) images; however, this is not its main intended use.
The datasheet for the model is captured in its model card hosted at NGC.