Date: 2023-08-02

The PeopleNet Transformer model detects one or more physical objects from three categories within an image and returns a bounding box and a category label for each object. The three categories of objects detected by this model are:

persons

bags

faces
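
As a minimal sketch of how one such result could be represented in code, the Python snippet below models a detection as a bounding box plus one of the three category labels. The `Detection` class, its field names, the corner-based pixel coordinates, and the exact label strings are assumptions made for illustration; they are not the model's documented output format.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three categories named above; the exact label strings are an assumption.
CATEGORIES = ("person", "bag", "face")


@dataclass(frozen=True)
class Detection:
    """One detected object: a bounding box plus a category label (hypothetical layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str

    def __post_init__(self):
        # Reject labels outside the three supported categories.
        if self.label not in CATEGORIES:
            raise ValueError(f"unknown category: {self.label!r}")
        # Reject boxes whose corners are out of order.
        if self.x_max < self.x_min or self.y_max < self.y_min:
            raise ValueError("box corners are out of order")


def group_by_category(detections):
    """Group detections by label so each category can be handled separately."""
    groups = defaultdict(list)
    for det in detections:
        groups[det.label].append(det)
    return dict(groups)


# Usage: two detections from one image, grouped by category.
example = [
    Detection(10, 20, 110, 320, "person"),
    Detection(40, 30, 80, 70, "face"),
]
print(sorted(group_by_category(example)))
```

Validating the label at construction time keeps downstream code from silently handling a category the model does not produce.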

This model is based on the Deformable DETR object detector with ResNet50 as the feature extractor. The architecture uses attention modules that attend only to a small set of key sampling points around a reference point, which speeds up training and inference. PeopleNet Transformer modifies the original Deformable DETR by reducing the number of features taken from the backbone from 4 to 2 for optimized performance.
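
The sampling idea behind this kind of attention can be illustrated with a small, simplified Python sketch. It is not the PeopleNet Transformer implementation: it assumes a single attention head, a single feature map holding scalar values, and offsets and attention logits passed in by hand, whereas Deformable DETR predicts them from each query and samples vector features across several heads and feature levels. The function names are hypothetical.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Bilinearly interpolate a 2D grid (list of rows) at fractional (x, y).

    Neighbors that fall outside the grid contribute zero.
    """
    height, width = len(feature_map), len(feature_map[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < width and 0 <= yi < height:
                weight = (1.0 - abs(x - xi)) * (1.0 - abs(y - yi))
                value += weight * feature_map[yi][xi]
    return value


def deformable_attention(feature_map, reference, offsets, logits):
    """Weighted sum of K values sampled around one reference point.

    offsets: K (dx, dy) pairs relative to the reference point.
    logits:  K unnormalized attention scores, one per sampling point.
    """
    # Softmax over the K sampling points (shifted by the max for stability).
    peak = max(logits)
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    ref_x, ref_y = reference
    return sum(
        w * bilinear_sample(feature_map, ref_x + dx, ref_y + dy)
        for w, (dx, dy) in zip(weights, offsets)
    )


# Usage: four sampling points around the center of a 3x3 map.
fmap = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
out = deformable_attention(
    fmap,
    reference=(1.0, 1.0),
    offsets=[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)],
    logits=[0.0, 0.0, 0.0, 0.0],
)
print(out)
```

The cost per query depends on the number of sampling points K, not on the size of the feature map, which is why this form of attention is cheaper than attending to every location.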