The model described in this card segments one or more “person” objects in an image and returns a single semantic segmentation mask covering all people in the image.
PeopleSemSegFormer is based on SegFormer, a real-time, state-of-the-art, Transformer-based semantic segmentation model. SegFormer is a simple yet efficient and powerful semantic segmentation framework that unifies Transformer encoders with lightweight multilayer perceptron (MLP) decoders. The network predicts a class label for every pixel in the input image.
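Per-pixel classification can be illustrated with a minimal sketch: given a grid of class scores (the decoder's output), each pixel is assigned the class with the highest score, yielding a dense label mask. The scores below are made-up values for a tiny 2x2 image with two hypothetical classes (0 = background, 1 = person), not output from the actual model.

```python
# Hypothetical per-pixel class scores, shape [H][W][num_classes].
scores = [
    [[2.0, 0.5], [0.1, 3.0]],
    [[1.5, 1.4], [0.2, 2.2]],
]

# A semantic segmentation head labels each pixel with the class that
# has the highest score, producing a dense mask of shape [H][W].
mask = [[max(range(len(px)), key=lambda c: px[c]) for px in row]
        for row in scores]

print(mask)  # [[0, 1], [0, 1]]
```

Because every pixel receives exactly one label, overlapping people share the same "person" class in the mask rather than getting separate instance IDs.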
![peoplesemsegnet.jpg](https://docscontent.nvidia.com/dims4/default/da8d495/2147483647/strip/true/crop/1200x675+0+0/resize/1200x675!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F0000018e-5838-dda8-a78f-d97fcb9b0000%2Ftao%2Ftao-toolkit-archive%2F5.2.0%2F_images%2Fpeoplesemsegnet.jpg)
PeopleSemSegNet use case
The training algorithm optimizes the network to minimize the cross-entropy loss for every pixel of the mask.
The primary intended use case for the model is segmenting people in a color (RGB) image. The model can be used to segment people from photos and videos by applying appropriate video or image decoding and pre-processing. Note that this model performs semantic segmentation, not instance segmentation.
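A common pre-processing step for RGB segmentation models is scaling 8-bit pixel values to floats and normalizing per channel. The sketch below assumes ImageNet-style mean/std constants purely for illustration; the actual constants and input resolution for this model are documented in its NGC model card.

```python
# Assumed ImageNet-style normalization statistics (illustrative only;
# not confirmed to be this model's actual pre-processing constants).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def preprocess(pixels):
    """pixels: [H][W][3] 8-bit RGB values -> per-channel normalized floats."""
    return [[[(c / 255.0 - MEAN[i]) / STD[i]
              for i, c in enumerate(px)]
             for px in row]
            for row in pixels]

# Toy 1x1 mid-gray "frame".
frame = [[[128, 128, 128]]]
out = preprocess(frame)
```

In a real pipeline the decoded frame would also be resized to the network's expected input resolution before normalization.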
The datasheet for the model is captured in the model card hosted at NGC.