PeopleSemSegNet
===============

.. _VanillaUnetDynamic: https://keras.io/examples/vision/oxford_pets_image_segmentation/
.. _U-Net-Convolutional Networks for Biomedical Image Segmentation: https://arxiv.org/abs/1505.04597

The model described in this card detects people in an image and provides a semantic segmentation mask, which comprises every pixel that belongs to a person in the image. PeopleSemSegNet is based on UNet, which has an encoder-decoder architecture. Refer to `VanillaUnetDynamic`_ for architecture details.

Introduced in `U-Net-Convolutional Networks for Biomedical Image Segmentation`_, UNet is a widely adopted network for semantic segmentation, with applications in autonomous vehicles, industry, smart cities, and more. UNet is a fully convolutional network with an encoder composed of convolutional layers and a decoder composed of transposed convolutions or upsampling layers. It predicts a class label for every pixel in the input image.

Training Algorithm
------------------

The training algorithm optimizes the network to minimize the cross-entropy loss of the class prediction for every pixel. This model was trained using the :ref:`UNet` training app in TLT v3.0.

Intended Use
------------

The primary use case for the PeopleSemSegNet model is segmenting people in a color (RGB) image. The model can be used to segment people from photos and videos using appropriate video or image decoding and pre-processing. Note that PeopleSemSegNet performs semantic segmentation (i.e., it generates a single mask for all the people in an image) and does not distinguish between individual person instances.

The datasheet for the model is captured in the model card hosted at `NGC`_.

.. _NGC: https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplesemsegnet
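The per-pixel behaviour described above can be sketched in a few lines of NumPy. This is an illustrative example only, not the TLT UNet API: the image size, number of classes, and random scores below are hypothetical stand-ins for the network's actual output.

```python
import numpy as np

# Hypothetical per-pixel class scores for a tiny 4x4 image with two
# classes (0 = background, 1 = person). Shapes and values are
# illustrative only, not the real model's output format.
h, w, num_classes = 4, 4, 2
rng = np.random.default_rng(0)
logits = rng.standard_normal((h, w, num_classes))

# Inference: a class label is predicted for every pixel by taking the
# argmax over the class dimension; the person mask keeps the pixels
# labelled as class 1. All people share a single mask -- semantic
# segmentation does not separate individual person instances.
labels = logits.argmax(axis=-1)               # (h, w), values in {0, 1}
person_mask = (labels == 1).astype(np.uint8)  # binary segmentation mask

# Training: the network is optimized to minimize the cross-entropy
# between the softmax of the per-pixel scores and a ground-truth label
# map (random here, purely for illustration).
truth = rng.integers(0, num_classes, size=(h, w))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
loss = -np.log(probs[np.arange(h)[:, None], np.arange(w), truth]).mean()
```

The same argmax-then-threshold step applies after any post-processing of the real model's output tensor; only the tensor shapes differ.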