CitySemSegFormer

The model described in this card segments urban scene classes in an image. The classes include the following:

  • road
  • sidewalk
  • building
  • wall
  • fence
  • pole
  • traffic light
  • traffic sign
  • vegetation
  • terrain
  • sky
  • person
  • rider
  • car
  • truck
  • bus
  • train
  • motorcycle
  • bicycle

A semantic segmentation mask for a class consists of every pixel in the image that belongs to that class.
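
To make this concrete, the sketch below represents a mask as a 2-D array of class indices, one per pixel. It is a minimal illustration, not TAO-specific; the index ordering (0 = road, 1 = sidewalk, ...) is an assumption that follows the class list above.

```python
import numpy as np

# Minimal sketch: a semantic segmentation mask is a 2-D array of
# class indices with the same height and width as the input image.
# The index ordering follows the class list above (an assumption).
CLASSES = [
    "road", "sidewalk", "building", "wall", "fence", "pole",
    "traffic light", "traffic sign", "vegetation", "terrain", "sky",
    "person", "rider", "car", "truck", "bus", "train",
    "motorcycle", "bicycle",
]

# A tiny 4x4 "mask": every pixel holds the index of its class.
mask = np.array([
    [10, 10, 10, 10],   # sky
    [ 2,  2,  8,  8],   # building, vegetation
    [ 1, 13, 13,  1],   # sidewalk, car
    [ 0,  0,  0,  0],   # road
])

# All pixels that share one index form that class's segment.
car_pixels = (mask == CLASSES.index("car"))
print(f"car covers {car_pixels.sum()} of {mask.size} pixels")
```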

CitySemSegFormer is based on SegFormer, a simple and efficient, yet powerful, semantic segmentation framework that unifies Transformers with lightweight multilayer perceptron (MLP) decoders.

SegFormer's lightweight, efficient design allows it to achieve significantly better accuracy and efficiency than earlier semantic segmentation architectures. It predicts a class label for every pixel in the input image.
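
CitySemSegFormer itself is delivered through NGC and TAO Toolkit. Purely as a hands-on illustration of the underlying SegFormer architecture, the minimal sketch below runs the open-source SegFormer Cityscapes checkpoint via the Hugging Face transformers library; the checkpoint name and input file are assumptions, not artifacts of this model card.

```python
# Illustrative only: uses the open-source SegFormer Cityscapes
# checkpoint from Hugging Face, not the TAO CitySemSegFormer model.
import torch
from PIL import Image
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

checkpoint = "nvidia/segformer-b0-finetuned-cityscapes-1024-1024"  # assumed
processor = SegformerImageProcessor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

image = Image.open("street.jpg").convert("RGB")  # hypothetical input file
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_classes, H/4, W/4)

# One class label per pixel: argmax over the class dimension.
mask = logits.argmax(dim=1)[0]       # (H/4, W/4) array of class indices
```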

Figure: CitySemSegFormer use case

The training algorithm optimizes the network to minimize the per-pixel cross-entropy loss between the predicted and ground-truth class labels. This model was trained using the SegFormer training app in TAO Toolkit version 4.0.
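
A minimal sketch of that per-pixel cross-entropy objective, assuming PyTorch; the shapes and random tensors below are illustrative stand-ins, not the TAO training code.

```python
import torch
import torch.nn.functional as F

# Sketch of per-pixel cross-entropy (not the TAO training code).
# logits: raw class scores per pixel, shape (N, num_classes, H, W).
# target: ground-truth class index per pixel, shape (N, H, W).
num_classes = 19
logits = torch.randn(2, num_classes, 64, 64)
target = torch.randint(0, num_classes, (2, 64, 64))

# F.cross_entropy averages the loss over every pixel in the batch;
# training drives this value down.
loss = F.cross_entropy(logits, target)
print(loss.item())
```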

The primary use case for the CitySemSegFormer model is segmenting city classes from a color (RGB) image or video, using appropriate image/video decoding and pre-processing (see the sketch below). Note that CitySemSegFormer performs semantic segmentation; that is, it generates a single mask for all instances of the same class in an image.
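
As a rough sketch of that pre-processing step: the input resolution and normalization constants depend on the deployed model configuration, so the values below (ImageNet mean/std, a 1024x1024 input) are assumptions for illustration, not the documented CitySemSegFormer settings.

```python
import numpy as np
from PIL import Image

# Sketch of typical RGB pre-processing before inference; the input
# size and normalization constants are assumptions, not the
# documented CitySemSegFormer configuration.
MEAN = np.array([0.485, 0.456, 0.406])  # ImageNet mean (assumed)
STD = np.array([0.229, 0.224, 0.225])   # ImageNet std (assumed)

def preprocess(path, size=(1024, 1024)):
    """Decode an image file and return a normalized NCHW float array."""
    image = Image.open(path).convert("RGB").resize(size)
    x = np.asarray(image, dtype=np.float32) / 255.0  # HWC in [0, 1]
    x = (x - MEAN) / STD                             # per-channel normalize
    x = x.transpose(2, 0, 1)[np.newaxis]             # HWC -> NCHW
    return x.astype(np.float32)

batch = preprocess("street.jpg")  # hypothetical input file
print(batch.shape)                # (1, 3, 1024, 1024)
```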

The datasheet for the model is captured in the model card hosted at NGC.
