The model described in this card segments an image into urban scene classes. The classes are:

road

sidewalk

building

wall

fence

pole

traffic light

traffic sign

vegetation

terrain

sky

person

rider

car

truck

bus

train

motorcycle

bicycle

The semantic segmentation mask for a class consists of every pixel in the image that belongs to that class.
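As a rough illustration of what a per-class mask looks like, the sketch below builds a binary mask from a small label map. The names CLASS_NAMES, class_mask, and toy_labels are hypothetical, and the mapping from class index to class name simply follows the order of the list above, which is an assumption; the card does not state the indices the model actually uses.

```python
# Class names copied from the list in this card.
# ASSUMPTION: index i corresponds to the i-th name; the card does not confirm this.
CLASS_NAMES = [
    "road", "sidewalk", "building", "wall", "fence", "pole",
    "traffic light", "traffic sign", "vegetation", "terrain", "sky",
    "person", "rider", "car", "truck", "bus", "train", "motorcycle",
    "bicycle",
]


def class_mask(label_map, class_name):
    """Return a binary mask: 1 where the pixel's label equals class_name, else 0."""
    class_id = CLASS_NAMES.index(class_name)
    return [[1 if label == class_id else 0 for label in row] for row in label_map]


# Toy 2x3 label map with made-up values (not real model output).
toy_labels = [
    [0, 0, 13],
    [1, 13, 13],
]
car_mask = class_mask(toy_labels, "car")
```

Here car_mask marks the three pixels labeled with the assumed "car" index and leaves the road and sidewalk pixels at 0.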

CitySemSegFormer is based on SegFormer, a simple, efficient, yet powerful semantic segmentation framework that unifies Transformers with lightweight multilayer perceptron (MLP) decoders.

SegFormer's lightweight, efficient design allows it to achieve significantly better performance than earlier semantic segmentation methods. It predicts a class label for every pixel in the input image.
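Per-pixel prediction can be sketched as choosing the highest-scoring class at each pixel. This is a minimal illustration that assumes the network emits one score map per class and that labels are taken by argmax; the card does not describe the model's actual post-processing, and predict_labels and toy_scores are hypothetical names.

```python
def predict_labels(scores):
    """Pick the highest-scoring class at each pixel.

    scores[c][y][x] is the score of class c at pixel (y, x).
    Returns a label map with the same height and width.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Toy scores for 3 classes on a 2x2 image (made-up numbers, not model output).
toy_scores = [
    [[0.90, 0.20], [0.10, 0.30]],  # class 0
    [[0.05, 0.70], [0.20, 0.30]],  # class 1
    [[0.05, 0.10], [0.70, 0.40]],  # class 2
]
toy_prediction = predict_labels(toy_scores)
```

The result has one class index per pixel, which is the form a per-class mask can then be extracted from.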