PeopleSemSegFormer

The model described in this card segments one or more “person” objects within an image and returns a semantic segmentation mask covering all people in that image.

PeopleSemSegFormer is based on SegFormer, a real-time, state-of-the-art, transformer-based semantic segmentation model. SegFormer is a simple, efficient, yet powerful semantic segmentation framework that unifies Transformers with lightweight multilayer perceptron (MLP) decoders. It predicts a class label for every pixel in the input image.
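As a rough illustration of what "a class label for every pixel" means, the sketch below turns a map of per-pixel class scores into a label mask by taking the highest-scoring class at each pixel. This is not the TAO Toolkit API: the function name predict_mask, the nested-list H x W x C layout, and the class ids (0 = background, 1 = person) are all assumptions made for the example.

```python
def predict_mask(score_map):
    """Return an H x W mask of class ids from an H x W x C map of scores.

    score_map is a nested list (hypothetical layout): rows -> pixels -> class scores.
    Each pixel is assigned the index of its highest-scoring class (argmax).
    """
    mask = []
    for row in score_map:
        mask_row = []
        for scores in row:
            best_class = max(range(len(scores)), key=scores.__getitem__)
            mask_row.append(best_class)
        mask.append(mask_row)
    return mask


# Toy 2x2 image with two assumed classes: 0 = background, 1 = person.
scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.3, 0.7]],
]
mask = predict_mask(scores)
print(mask)
```

In practice the score map would come from the network's decoder output, and the argmax would typically be done with a tensor library rather than nested Python lists.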

Figure: PeopleSemSegNet use case

Training Algorithm

The training algorithm optimizes the network to minimize the cross-entropy loss for every pixel of the mask.
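The per-pixel cross-entropy objective can be sketched in a few lines. The example below is a minimal, dependency-free illustration and not the toolkit's training code: the helper names softmax and pixelwise_cross_entropy, the nested-list H x W x C layout, averaging over pixels, and the class ids (0 = background, 1 = person) are assumptions chosen for clarity.

```python
import math


def softmax(logits):
    """Convert a list of raw class scores into probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixelwise_cross_entropy(logit_map, label_map):
    """Mean cross-entropy over all pixels.

    logit_map: H x W x C nested lists of raw scores (hypothetical layout).
    label_map: H x W nested lists of integer ground-truth class ids.
    """
    total, count = 0.0, 0
    for logit_row, label_row in zip(logit_map, label_map):
        for logits, label in zip(logit_row, label_row):
            probs = softmax(logits)
            # Penalize low probability assigned to the true class of this pixel.
            total += -math.log(probs[label])
            count += 1
    return total / count


# Toy 1x2 image with two assumed classes: 0 = background, 1 = person.
logits = [[[2.0, 0.0], [0.0, 2.0]]]
labels = [[0, 1]]
loss = pixelwise_cross_entropy(logits, labels)
print(loss)
```

The loss shrinks toward zero as the network assigns higher probability to the correct class at each pixel, which is what minimizing it during training encourages.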

Intended Use

The primary intended use case for the model is segmenting people in a color (RGB) image. The model can be used to segment people in photos and videos by using appropriate video or image decoding and pre-processing. Note that this model performs semantic segmentation, not instance-based segmentation: all person pixels share a single class label, and individual people are not distinguished from one another.

The datasheet for the model is captured in the model card hosted at NGC.