The model described in this card segments urban city classes in an image. The supported classes are listed in the model card hosted on NGC.
A semantic segmentation mask for a class consists of every pixel in the image that belongs to that class.
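To make the mask concept concrete, here is a minimal sketch (not part of the model's published API) showing how per-pixel class scores are turned into a semantic segmentation mask by taking the highest-scoring class at each pixel. The shapes and random scores are illustrative assumptions.

```python
import numpy as np

# Hypothetical per-pixel class scores with shape (num_classes, H, W).
num_classes, height, width = 3, 4, 4
rng = np.random.default_rng(0)
logits = rng.standard_normal((num_classes, height, width))

# The semantic segmentation mask assigns every pixel the class with
# the highest score, producing an (H, W) array of class IDs. All
# pixels sharing a class ID form that class's mask.
mask = logits.argmax(axis=0)

print(mask.shape)  # (4, 4)
```

Every pixel receives exactly one class ID, which is why the mask covers the full image.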
CitySemSegFormer is based on SegFormer, a simple and efficient, yet powerful, semantic segmentation framework that unifies Transformers with lightweight multilayer perceptron (MLP) decoders.
SegFormer's lightweight, efficient design allows it to achieve significantly better performance than earlier segmentation architectures. It predicts a class label for every pixel in the input image.
The training algorithm optimizes the network to minimize the per-pixel cross-entropy loss between the predicted and ground-truth classes. This model was trained using the SegFormer training app in TAO Toolkit version 4.0.
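The per-pixel cross-entropy objective can be sketched as follows. This is an illustrative NumPy implementation under assumed shapes, not the TAO Toolkit training code; in practice a framework routine such as a built-in cross-entropy loss would be used.

```python
import numpy as np

# Hypothetical logits (num_classes, H, W) and ground-truth class IDs (H, W).
rng = np.random.default_rng(1)
num_classes, height, width = 3, 2, 2
logits = rng.standard_normal((num_classes, height, width))
targets = rng.integers(0, num_classes, size=(height, width))

# Softmax over the class axis (stabilized by subtracting the max),
# then the negative log-likelihood of the target class at every
# pixel, averaged over all pixels.
exp = np.exp(logits - logits.max(axis=0, keepdims=True))
probs = exp / exp.sum(axis=0, keepdims=True)
rows, cols = np.indices((height, width))
loss = -np.log(probs[targets, rows, cols]).mean()

print(loss >= 0)  # True
```

Minimizing this loss pushes the predicted class distribution at each pixel toward the ground-truth label.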
The primary use case for the CitySemSegFormer model is segmenting city classes from a color (RGB) image or video using appropriate image/video decoding and pre-processing. Note that CitySemSegFormer performs semantic segmentation; that is, it generates a single mask for all instances of the same class in an image.
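A minimal sketch of the kind of RGB pre-processing such a pipeline performs is shown below. The input resolution and the ImageNet mean/std values are common conventions assumed here for illustration, not the values documented for this model; a real pipeline would also use a proper image library for resizing.

```python
import numpy as np

def preprocess(frame_rgb, size=(1024, 1024),
               mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Illustrative pre-processing: resize, scale to [0, 1], normalize
    per channel, and reorder HWC -> NCHW for the network."""
    h, w = frame_rgb.shape[:2]
    # Nearest-neighbour resize via index sampling (no external deps).
    ys = np.arange(size[0]) * h // size[0]
    xs = np.arange(size[1]) * w // size[1]
    resized = frame_rgb[ys][:, xs].astype(np.float32) / 255.0
    # Per-channel normalization, then add a leading batch dimension.
    normalized = (resized - mean) / std
    return normalized.transpose(2, 0, 1)[None]

# A blank 720p frame stands in for a decoded video frame.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
batch = preprocess(frame)
print(batch.shape)  # (1, 3, 1024, 1024)
```

The resulting batch tensor is what would be handed to the network, which then returns per-pixel class predictions to be up-sampled back to the original frame size.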
The datasheet for the model is captured in the model card hosted at NGC.