PeopleSemSegNet
===============

.. _VanillaUnetDynamic: https://keras.io/examples/vision/oxford_pets_image_segmentation/
.. _U-Net-Convolutional Networks for Biomedical Image Segmentation: https://arxiv.org/abs/1505.04597

The model described in this card detects people in an image and provides a semantic segmentation mask, which comprises every pixel that belongs to a person in the image. PeopleSemSegNet is based on UNet, which has an encoder-decoder architecture. Refer to `VanillaUnetDynamic`_ for architecture details.

Introduced in `U-Net-Convolutional Networks for Biomedical Image Segmentation`_, UNet is a widely adopted network for semantic segmentation, with applications in autonomous vehicles, industry, smart cities, and more. UNet is a fully convolutional network with an encoder composed of convolutional layers and a decoder composed of transposed convolutions or upsampling layers. It predicts a class label for every pixel in the input image.

Training Algorithm
------------------

The training algorithm optimizes the network to minimize the cross-entropy loss of the class prediction for every pixel. This model was trained using the :ref:`UNet` training app in TLT v3.0.

Intended Use
------------

The primary use case for the PeopleSemSegNet model is segmenting people in a color (RGB) image. The model can be used to segment people from photos and videos using appropriate video or image decoding and pre-processing. Note that PeopleSemSegNet performs semantic segmentation (i.e., it generates a single mask for all the people in an image) and does not distinguish between individual person instances.

The datasheet for the model is captured in the model card hosted at `NGC`_.

.. _NGC: https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplesemsegnet
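The per-pixel behaviour described above can be sketched in a few lines of NumPy. This is an illustrative example only, not the TLT UNet API: the image size, number of classes, and random scores below are hypothetical stand-ins for the network's actual output.

```python
import numpy as np

# Hypothetical per-pixel class scores for a tiny 4x4 image with two
# classes (0 = background, 1 = person). Shapes and values are
# illustrative only, not the real model's output format.
h, w, num_classes = 4, 4, 2
rng = np.random.default_rng(0)
logits = rng.standard_normal((h, w, num_classes))

# Inference: a class label is predicted for every pixel by taking the
# argmax over the class dimension; the person mask keeps the pixels
# labelled as class 1. All people share a single mask -- semantic
# segmentation does not separate individual person instances.
labels = logits.argmax(axis=-1)               # (h, w), values in {0, 1}
person_mask = (labels == 1).astype(np.uint8)  # binary segmentation mask

# Training: the network is optimized to minimize the cross-entropy
# between the softmax of the per-pixel scores and a ground-truth label
# map (random here, purely for illustration).
truth = rng.integers(0, num_classes, size=(h, w))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
loss = -np.log(probs[np.arange(h)[:, None], np.arange(w), truth]).mean()
```

The same argmax-then-threshold step applies after any post-processing of the real model's output tensor; only the tensor shapes differ.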