TAO Toolkit v5.3.0


The model described in this card segments one or more “person” objects within an image and returns a semantic segmentation mask covering all people in the image.

PeopleSemSegFormer is based on SegFormer, a real-time, state-of-the-art, transformer-based semantic segmentation model. SegFormer is a simple, efficient, yet powerful semantic segmentation framework that unifies Transformers with lightweight multilayer perceptron (MLP) decoders. The model predicts a class label for every pixel in the input image.
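The per-pixel prediction step can be illustrated with a minimal NumPy sketch of an all-MLP decoder head in the style of SegFormer. It assumes four encoder feature maps at strides 4, 8, 16, and 32 and uses random weights. It also substitutes nearest-neighbour upsampling for the bilinear interpolation SegFormer uses and omits normalization layers. The function names, channel widths, and parameter dictionary are hypothetical and do not reflect the TAO Toolkit implementation.

```python
import numpy as np


def linear(x, w, b):
    # Per-pixel linear layer (equivalent to a 1x1 convolution):
    # x is (H, W, C_in), w is (C_in, C_out), b is (C_out,).
    return x @ w + b


def upsample_nearest(x, out_h, out_w):
    # Nearest-neighbour resize of an (H, W, C) map to (out_h, out_w, C).
    h, w, _ = x.shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return x[rows][:, cols]


def mlp_decode(features, params):
    """Fuse multi-scale encoder features and return a class index per pixel.

    features: list of (H_i, W_i, C_i) arrays, finest (stride-4) map first.
    params: hypothetical dict of decoder weights.
    """
    out_h, out_w, _ = features[0].shape
    unified = []
    for i, f in enumerate(features):
        # 1. Project each scale to a shared channel width.
        proj = linear(f, params[f"w{i}"], params[f"b{i}"])
        # 2. Upsample every scale to the stride-4 grid.
        unified.append(upsample_nearest(proj, out_h, out_w))
    # 3. Concatenate along channels, fuse with one more linear layer + ReLU.
    fused = np.maximum(
        linear(np.concatenate(unified, axis=-1), params["w_fuse"], params["b_fuse"]),
        0.0,
    )
    # 4. Per-pixel classifier: one score per class; argmax picks the label.
    logits = linear(fused, params["w_cls"], params["b_cls"])
    return logits.argmax(axis=-1)


# Illustrative setup: a 64x64 input, MiT-B0-like stage widths, 2 classes.
rng = np.random.default_rng(0)
channels = [32, 64, 160, 256]
sizes = [(16, 16), (8, 8), (4, 4), (2, 2)]  # strides 4, 8, 16, 32
embed, num_classes = 64, 2  # hypothetical decoder width; background + person
features = [rng.standard_normal((h, w, c)) for (h, w), c in zip(sizes, channels)]
params = {}
for i, c in enumerate(channels):
    params[f"w{i}"] = rng.standard_normal((c, embed)) * 0.1
    params[f"b{i}"] = np.zeros(embed)
params["w_fuse"] = rng.standard_normal((len(channels) * embed, embed)) * 0.1
params["b_fuse"] = np.zeros(embed)
params["w_cls"] = rng.standard_normal((embed, num_classes)) * 0.1
params["b_cls"] = np.zeros(num_classes)

label_map = mlp_decode(features, params)
print(label_map.shape)  # (16, 16): one class index per stride-4 cell
```

The decoder stays light because every step is a per-pixel linear map; in SegFormer the resulting 1/4-resolution prediction is then upsampled to the input size to obtain the full-resolution mask.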


PeopleSemSegNet use case

The training algorithm optimizes the network to minimize the cross-entropy loss for every pixel of the mask.
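A minimal NumPy sketch of the per-pixel cross-entropy objective follows. It assumes the network emits unnormalised scores of shape (H, W, K) and that the ground-truth mask stores one integer class per pixel. The loss is a plain, unweighted average over pixels; the function name is hypothetical, and this card does not state whether TAO Toolkit training adds class weights or ignored labels.

```python
import numpy as np


def pixelwise_cross_entropy(logits, labels):
    """Mean cross-entropy over all pixels of the mask.

    logits: (H, W, K) unnormalised class scores.
    labels: (H, W) integer ground-truth class per pixel.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Select the log-probability of the true class at every pixel.
    true_log_probs = np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    # Average the negative log-likelihood over all H * W pixels.
    return -true_log_probs.mean()


labels = np.array([[0, 1], [1, 0]])  # 0 = background, 1 = person (hypothetical)
uniform = np.zeros((2, 2, 2))        # the model is undecided at every pixel
print(pixelwise_cross_entropy(uniform, labels))  # ln 2, about 0.693
```

An undecided prediction costs ln K per pixel; confident, correct scores drive the loss toward zero, which is what training pushes the network toward.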

The primary use case intended for the model is segmenting people in a color (RGB) image. The model can be used to segment people in photos and videos by using appropriate video or image decoding and pre-processing. Note that this model performs semantic segmentation, not instance segmentation.
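To illustrate the difference from instance segmentation, the sketch below converts per-pixel class scores into a binary person mask. The class index `PERSON_CLASS = 1` and the (H, W, K) score layout are hypothetical assumptions, not the model's documented output format.

```python
import numpy as np

PERSON_CLASS = 1  # hypothetical index of the "person" class in the output


def person_mask(logits):
    """Turn per-pixel class scores (H, W, K) into a binary person mask."""
    return logits.argmax(axis=-1) == PERSON_CLASS


# Two separate people in a 1x6 strip: pixels 1-2 and pixels 4-5.
logits = np.zeros((1, 6, 2))
logits[0, [1, 2, 4, 5], PERSON_CLASS] = 5.0
mask = person_mask(logits)
print(mask.astype(int))  # [[0 1 1 0 1 1]]
```

Both people receive the same label, so the mask alone does not say which pixels belong to which person. An instance segmentation model would instead assign each person a separate ID.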

The datasheet for the model is captured in the model card hosted at NGC.

© Copyright 2023, NVIDIA. Last updated on May 24, 2024.