The model described in this card detects people in an image and provides a semantic segmentation mask. The mask covers every pixel that belongs to a person in the image.

PeopleSemSegNet is based on UNet, which has an encoder-decoder architecture. PeopleSemSegNet supports two architecture variants: Vanilla UNet Dynamic and ShuffleSeg. Please refer to VanillaUnetDynamic and ShuffleSeg for more details on the architectures. VanillaUnetDynamic is a high-complexity, high-accuracy model, while ShuffleSeg is a low-complexity, high-throughput model with slightly lower accuracy than VanillaUnetDynamic.

Introduced in U-Net: Convolutional Networks for Biomedical Image Segmentation, UNet is a widely adopted network for semantic segmentation, with applications in autonomous vehicles, industrial inspection, smart cities, and more. Vanilla UNet Dynamic and ShuffleSeg differ in the encoders employed by their respective architectures. Vanilla UNet Dynamic is a fully convolutional network whose encoder consists of convolutional layers and whose decoder consists of transposed convolution or upsampling layers. ShuffleSeg uses ShuffleNet as its encoder and a series of transposed convolutional layers as its decoder. Both models predict a class label for every pixel in the input image.
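The per-pixel prediction described above can be sketched as follows: given the network's class scores for each pixel, the predicted label is the highest-scoring class, and the person mask is the set of pixels labelled "person". The array shapes and score values below are illustrative assumptions, not taken from the actual model.

```python
import numpy as np

# Hypothetical per-pixel class scores for a tiny 2x2 image with two
# classes (0 = background, 1 = person); shape is (classes, H, W).
logits = np.array([
    [[2.0, 0.1],
     [0.3, 4.0]],   # background scores
    [[0.5, 3.0],
     [2.5, 1.0]],   # person scores
])

# Each pixel is assigned the class with the highest score.
labels = np.argmax(logits, axis=0)     # [[0, 1], [1, 0]]

# The semantic segmentation mask is the set of pixels labelled "person".
person_mask = (labels == 1)            # [[False, True], [True, False]]
```

Because the mask is computed per pixel, two overlapping people produce one merged region rather than separate instances, which is the semantic (as opposed to instance) segmentation behaviour noted later in this card.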


PeopleSemSegNet use case

Training Algorithm

The training algorithm optimizes the network to minimize the per-pixel cross-entropy loss between the predicted class scores and the ground-truth class labels. This model was trained using the UNet training app in TAO Toolkit v3.0.
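A minimal sketch of this objective, assuming a softmax over the class axis; the shapes, helper name, and values are illustrative assumptions and do not come from the TAO Toolkit implementation:

```python
import numpy as np

def pixel_cross_entropy(logits, target):
    """logits: (classes, H, W) scores; target: (H, W) true class indices."""
    # Softmax over the class axis gives per-pixel class probabilities.
    exp = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs = exp / exp.sum(axis=0, keepdims=True)
    h, w = target.shape
    # Negative log-likelihood of the true class, averaged over all pixels.
    nll = -np.log(probs[target, np.arange(h)[:, None], np.arange(w)])
    return nll.mean()

# With uniform scores over 2 classes, every pixel's probability is 0.5,
# so the loss is ln(2) for any labelling.
logits = np.zeros((2, 2, 2))
target = np.array([[0, 1], [1, 0]])
loss = pixel_cross_entropy(logits, target)   # -> 0.6931... (ln 2)
```

Minimizing this quantity pushes the probability of the correct class toward 1 at every pixel, which is what makes the dense per-pixel labelling possible.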

Intended Use

The primary use case for the PeopleSemSegNet model is segmenting people in a color (RGB) image. The model can be used to segment people from photos and videos using appropriate video or image decoding and pre-processing. Note that PeopleSemSegNet performs semantic segmentation (i.e., it generates a single mask covering all the people in an image) and does not distinguish between individual person instances.
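The pre-processing step mentioned above typically converts a decoded RGB frame into the tensor layout the network expects. The sketch below shows a common recipe (scale to [0, 1], HWC to CHW, add a batch dimension); the exact input resolution and normalisation constants are assumptions and should be taken from the model card on NGC, not from this example.

```python
import numpy as np

def preprocess(image_hwc_uint8):
    """Convert an HxWx3 uint8 RGB image to a 1x3xHxW float32 tensor."""
    x = image_hwc_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    x = np.transpose(x, (2, 0, 1))                  # HWC -> CHW
    return x[np.newaxis, ...]                       # add batch dimension

frame = np.zeros((4, 4, 3), dtype=np.uint8)  # stand-in for a decoded frame
batch = preprocess(frame)                    # shape (1, 3, 4, 4)
```

Resizing to the network's expected resolution (omitted here) would normally happen before this step, using the image library of your choice.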

The datasheet for the model is captured in the model card hosted at NGC.