The model described in this card segments one or more “person” objects in an image and returns a single semantic segmentation mask covering all people in the image.
PeopleSemSegFormer is based on SegFormer, a real-time, state-of-the-art, Transformer-based semantic segmentation model. SegFormer is a simple yet efficient and powerful semantic segmentation framework that unifies Transformer encoders with lightweight multilayer perceptron (MLP) decoders. The network predicts a class label for every pixel in the input image.
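Per-pixel classification can be illustrated with a minimal sketch: given a grid of class scores (the decoder's output), each pixel is assigned the class with the highest score, yielding a dense label mask. The scores below are made-up values for a tiny 2x2 image with two hypothetical classes (0 = background, 1 = person), not output from the actual model.

```python
# Hypothetical per-pixel class scores, shape [H][W][num_classes].
scores = [
    [[2.0, 0.5], [0.1, 3.0]],
    [[1.5, 1.4], [0.2, 2.2]],
]

# A semantic segmentation head labels each pixel with the class that
# has the highest score, producing a dense mask of shape [H][W].
mask = [[max(range(len(px)), key=lambda c: px[c]) for px in row]
        for row in scores]

print(mask)  # [[0, 1], [0, 1]]
```

Because every pixel receives exactly one label, overlapping people share the same "person" class in the mask rather than getting separate instance IDs.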
![peoplesemsegnet.jpg](https://docscontent.nvidia.com/dims4/default/da8d495/2147483647/strip/true/crop/1200x675+0+0/resize/1200x675!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F0000018e-5838-dda8-a78f-d97fcb9b0000%2Ftao%2Ftao-toolkit-archive%2F5.2.0%2F_images%2Fpeoplesemsegnet.jpg)
PeopleSemSegNet use case
The training algorithm optimizes the network to minimize the cross-entropy loss for every pixel of the mask.
The primary intended use case for the model is segmenting people in a color (RGB) image. The model can be used to segment people from photos and videos by applying appropriate video or image decoding and pre-processing. Note that this model performs semantic segmentation, not instance segmentation.
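A common pre-processing step for RGB segmentation models is scaling 8-bit pixel values to floats and normalizing per channel. The sketch below assumes ImageNet-style mean/std constants purely for illustration; the actual constants and input resolution for this model are documented in its NGC model card.

```python
# Assumed ImageNet-style normalization statistics (illustrative only;
# not confirmed to be this model's actual pre-processing constants).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def preprocess(pixels):
    """pixels: [H][W][3] 8-bit RGB values -> per-channel normalized floats."""
    return [[[(c / 255.0 - MEAN[i]) / STD[i]
              for i, c in enumerate(px)]
             for px in row]
            for row in pixels]

# Toy 1x1 mid-gray "frame".
frame = [[[128, 128, 128]]]
out = preprocess(frame)
```

In a real pipeline the decoded frame would also be resized to the network's expected input resolution before normalization.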
The datasheet for the model is captured in the model card hosted at NGC.