The model described in this card segments one or more “person” objects within an image and returns a semantic segmentation mask covering all people in that image.
PeopleSemSegFormer is based on SegFormer, a real-time, state-of-the-art, transformer-based semantic segmentation model. SegFormer is a simple, efficient, yet powerful semantic segmentation framework that unifies Transformers with lightweight multilayer perceptron (MLP) decoders. It predicts a class label for every pixel in the input image.
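To illustrate what "a class label for every pixel" means in practice, the following is a minimal sketch of turning per-pixel class scores into a segmentation mask. It is not the model's actual post-processing code. The function name `logits_to_mask`, the `(C, H, W)` score layout, and the class indices (0 = background, 1 = person) are assumptions made for this example.

```python
import numpy as np

def logits_to_mask(logits: np.ndarray) -> np.ndarray:
    """Convert per-pixel class scores of shape (C, H, W) into a label mask of shape (H, W).

    Each output pixel holds the index of the highest-scoring class.
    """
    return np.argmax(logits, axis=0).astype(np.uint8)

# Hypothetical scores for a 2x3 image with two classes.
# Assumed class indices: 0 = background, 1 = person.
logits = np.array([
    [[2.0, 0.1, 0.3], [1.5, 0.2, 0.9]],  # background scores
    [[0.5, 1.2, 2.2], [0.1, 3.0, 0.4]],  # person scores
])

mask = logits_to_mask(logits)
print(mask)
```

Every pixel labeled 1 belongs to the single "person" region of the mask, regardless of how many individuals appear in the image.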
The training algorithm optimizes the network to minimize the cross-entropy loss for every pixel of the mask.
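As a rough illustration of the per-pixel cross-entropy objective, the sketch below computes the loss with NumPy for a single image. It is not the training code used for this model. The function name `pixelwise_cross_entropy`, the `(C, H, W)` layout, and the choice to average the loss over all pixels are assumptions for this example.

```python
import numpy as np

def pixelwise_cross_entropy(logits: np.ndarray, target: np.ndarray) -> float:
    """Mean cross-entropy over all pixels.

    logits: float array of shape (C, H, W) with raw class scores.
    target: integer array of shape (H, W) with ground-truth class indices.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))

    # Pick the log-probability of the true class at every pixel.
    rows, cols = np.indices(target.shape)
    picked = log_probs[target, rows, cols]

    # Assumption: reduce by averaging over pixels.
    return float(-picked.mean())

# With equal scores for two classes, every pixel contributes ln(2).
uniform_loss = pixelwise_cross_entropy(
    np.zeros((2, 2, 2)), np.array([[0, 1], [1, 0]])
)
print(uniform_loss)
```

The loss shrinks toward zero as the score of the correct class dominates at each pixel, which is the behavior the optimizer drives the network toward.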
The primary use case intended for the model is segmenting people in a color (RGB) image. The model can be used to segment people from photos and videos by using appropriate video or image decoding and pre-processing. Note that this model performs semantic segmentation, not instance-based segmentation: all people in an image share a single "person" class and are not separated into individual instances.
The datasheet for the model is captured in the model card hosted at NGC.