DashCamNet

The DashCamNet model detects one or more physical objects from four categories within an image and returns a box around each object, as well as a category label for each object. The four categories of objects detected by this model are

  • car

  • persons

  • road signs

  • bicycles

These models are based on NVIDIA DetectNet_v2 detector with ResNet 18 as a feature extractor. This architecture, also known as GridBox object detection, uses bounding-box regression on a uniform grid on the input image. Gridbox system divides an input image into a grid which predicts four normalized bounding-box parameters (xc, yc, w, h) and confidence value per output class.

The raw normalized bounding-box and confidence detections need to be post-processed by a clustering algorithm such as DBSCAN or NMS to produce the final bounding-box coordinates and category labels.

Training algorithm

The training algorithm optimizes the network to minimize the localization and confidence loss for the objects. This model was trained using the DetectNet_v2 training app in TAO Toolkit v3.0. The training is carried out in two phases. In the first phase, the network is trained with regularization to facilitate pruning. Following the first phase, we prune the network removing channels whose kernel norms are below the pruning threshold. In the second phase the pruned network is retrained. Regularization is not included in second phase.

Intended use case

Primary use case for this model is to detect cars from a moving camera. The model can be used to detect cars from photos and videos by using appropriate video or image decoding and pre-processing. As a secondary use case the model can also be used to detect persons, road signs and two-wheelers from images or videos. However, these additional classes are not the main intended use for this model.

The datasheet for the model is captured in it’s model card hosted at NGC.