SegFormer#

SegFormer is an NVIDIA-developed semantic-segmentation model that is included in TAO. SegFormer supports the following tasks:

  • train

  • evaluate

  • inference

  • export

  • quantize

Data Input for SegFormer#

SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.

Creating Training Experiment Specification File#

Configuration for Custom Dataset#

Here is an example specification file for training a SegFormer model with an NVDINOv2 backbone.

Note that this specification file is for reference only. Create your own specification file based on your dataset.

The experiment specification consists of several main components:

  • train

  • evaluate

  • inference

  • export

  • model

  • dataset

  • gen_trt_engine

train#

The train config contains the parameters related to training. They are described as follows:

train:
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
  num_epochs: 50
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 50
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    weight_decay: 0.0005

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| optim | dict config | | Optimizer config. | |
| pretrained_model_path | str | None | Pretrained model path. | |
| segment | dict config | | Segmentation loss Config. | |
| num_gpus | int | 1 | The number of GPUs to run the train job. | |
| gpu_ids | List[int] | [0] | List of GPU IDs to run the training on. | |
| num_nodes | int | 1 | Number of nodes to run the training on. | |
| seed | int | 1234 | The seed for the initializer in PyTorch. | |
| num_epochs | int | 10 | Number of epochs to run the training. | |
| checkpoint_interval | int | 1 | Checkpoint interval. | |
| validation_interval | int | 1 | Validation interval. | |
| resume_training_checkpoint_path | str | None | Path to the checkpoint to resume training | |
| results_dir | str | None | Path to where all the assets are stored. | |

optim#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| monitor_name | str | val_loss | Monitor Name | |
| optim | str | adamw | Optimizer | adamw,adam,sgd |
| lr | float | 0.00006 | Optimizer learning rate | |
| policy | str | linear | Optimizer policy | linear,step |
| momentum | float | 0.9 | The momentum for the AdamW optimizer. | |
| weight_decay | float | 0.01 | The weight decay coefficient. | |

segment#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| loss | str | ce | Segment loss | ce |
| weights | List[float] | [0.5, 0.5, 0.5, 0.8, 1.0] | Multi-scale Segment loss weight | |
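
The five-element default for weights suggests one weight per output scale. The following is a minimal sketch, assuming the total loss is a weighted sum of per-scale cross-entropy losses. That combination rule is an assumption and is not confirmed by this page, and the helper names cross_entropy and multi_scale_loss are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one pixel: -log(softmax(logits)[target])."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def multi_scale_loss(per_scale_losses, weights):
    """Weighted sum of per-scale losses (assumed combination rule)."""
    if len(per_scale_losses) != len(weights):
        raise ValueError("need exactly one weight per scale")
    return sum(w * loss for w, loss in zip(weights, per_scale_losses))


# Hypothetical per-scale losses for one pixel with two classes.
losses = [cross_entropy([2.0, 0.5], 0) for _ in range(5)]
total = multi_scale_loss(losses, [0.5, 0.5, 0.5, 0.8, 1.0])
```

With the default weights, the last scale contributes most to the total, and the first three scales contribute half as much each.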

tensorboard#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| enabled | bool | False | Flag to enable tensorboard | |
| infrequent_logging_frequency | int | 2 | infrequent_logging_frequency | |

evaluate#

The evaluate config contains the parameters related to evaluation. They are described as follows:

Set the evaluate checkpoint path in the evaluate specification:

evaluate:
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  vis_after_n_batches: 1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| vis_after_n_batches | int | 1 | Visualize evaluation segmentation results after n batches. | |
| batch_size | int | 8 | Batch Size. | |
| checkpoint | str | | Path to the checkpoint used for evaluation. | |
| num_gpus | int | 1 | The number of GPUs to run the evaluate job. | |
| gpu_ids | List[int] | [0] | List of GPU IDs to run the evaluation on. | |
| num_nodes | int | 1 | Number of nodes to run the evaluation on. | |
| trt_engine | Optional[str] | None | Path to the TensorRT engine to be used for evaluation. | |
| results_dir | Optional[str] | None | Path to where all the assets are stored. | |

inference#

The inference config contains the parameters related to inference. They are described as follows:

Set the inference checkpoint path in the inference specification:

inference:
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  vis_after_n_batches: 1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| vis_after_n_batches | int | 1 | Visualize inference segmentation results after n batches. | |
| batch_size | int | 8 | Batch Size. | |
| checkpoint | str | | Path to the checkpoint used for inference. | |
| num_gpus | int | 1 | The number of GPUs to run the inference job. | |
| gpu_ids | List[int] | [0] | List of GPU IDs to run the inference on. | |
| num_nodes | int | 1 | Number of nodes to run the inference on. | |
| trt_engine | Optional[str] | None | Path to the TensorRT engine to be used for inference. | |
| results_dir | Optional[str] | None | Path to where all the assets are stored. | |

export#

The export config contains the parameters related to export. They are described as follows:

Set the export checkpoint path in the export specification:

export:
  results_dir: "${results_dir}/export"
  gpu_id: 0
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  onnx_file: "${export.results_dir}/segformer.onnx"
  input_width: 224
  input_height: 224
  batch_size: -1
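
The `${results_dir}` and `${export.results_dir}` placeholders in the example above look like OmegaConf-style variable interpolation. As a minimal sketch of how such references could expand, the pure-Python resolver below substitutes dotted keys from a nested dictionary. The resolve helper and the /results/segformer root are hypothetical and only illustrate the expansion order; they are not the TAO implementation.

```python
import re


def resolve(value, config):
    """Expand every ${dotted.key} in value using the nested dict config."""

    def lookup(match):
        node = config
        for part in match.group(1).split("."):
            node = node[part]
        return str(node)

    previous = None
    # Repeat until nothing changes, so nested references are expanded too.
    while previous != value:
        previous, value = value, re.sub(r"\$\{([^${}]+)\}", lookup, value)
    return value


config = {
    "results_dir": "/results/segformer",  # hypothetical root directory
    "export": {"results_dir": "${results_dir}/export"},
}
checkpoint = resolve("${results_dir}/train/segformer_model_latest.pth", config)
onnx_file = resolve("${export.results_dir}/segformer.onnx", config)
```

Under these assumptions, onnx_file expands in two steps: first to `${results_dir}/export/segformer.onnx`, then to a path under the root directory.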

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| results_dir | Optional[str] | None | Path to where all the assets are stored. | |
| gpu_id | int | 0 | The index of the GPU to build the TensorRT engine. | |
| checkpoint | str | | Path to the checkpoint file to run export. | |
| onnx_file | str | | Path to the ONNX model file. | |
| on_cpu | bool | False | Flag to export CPU compatible model. | True,False |
| input_channel | int | 3 | Number of channels in the input Tensor. | 1,3 |
| input_width | int | 960 | Width of the input image tensor. | |
| input_height | int | 544 | Height of the input image tensor. | |
| opset_version | int | 17 | Operator set version. | |
| batch_size | int | -1 | The batch size of the input Tensor for the engine. | |

model#

The following example model provides options to define the SegFormer backbone and decoder head.

model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: <path_to_pretrained_weight>
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | dict config | | The configuration of the backbone. | |
| decode_head | dict config | | The configuration of the decoder head. | |

backbone#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| type | str | fan_small_12_p4_hybrid | The name of the backbone to be used | mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5, fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, vit_large_nvdinov2, vit_giant_nvdinov2, vit_base_nvclip_16_siglip, vit_huge_nvclip_14_siglip, c_radio_v2_vit_base_patch16_224, c_radio_v2_vit_large_patch16_224, c_radio_v2_vit_huge_patch16_224 |
| pretrained_backbone_path | str | | Path to the pretrained model | |
| freeze_backbone | bool | False | Flag to freeze backbone | True,False |

decode_head#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| feature_strides | List[int] | [4, 8, 16, 32] | Feature strides for the head. | |

dataset#

The dataset parameter defines the dataset source, training batch size, and augmentation. An example dataset is provided below.

dataset:
  segment:
    dataset: "SFDataset"
    root_dir: <dataset_root>
    batch_size: 32
    workers: 8
    num_classes: 6
    img_size: 224
    train_split: "train"
    validation_split: "val"
    test_split: "val"
    predict_split: "val"
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: False
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: False
    label_transform: None
    palette:
      - seg_class: urban
        rgb:
          - 0
          - 255
          - 255
        label_id: 0
        mapping_class: urban
      - seg_class: agriculture
        rgb:
          - 255
          - 255
          - 0
        label_id: 1
        mapping_class: agriculture
      - seg_class: rangeland
        rgb:
          - 255
          - 0
          - 255
        label_id: 2
        mapping_class: rangeland
      - seg_class: forest
        rgb:
          - 0
          - 255
          - 0
        label_id: 3
        mapping_class: forest
      - seg_class: water
        rgb:
          - 0
          - 0
          - 255
        label_id: 4
        mapping_class: water
      - seg_class: barren
        rgb:
          - 255
          - 255
          - 255
        label_id: 5
        mapping_class: barren
      - seg_class: unknown
        rgb:
          - 0
          - 0
          - 0
        label_id: 255
        mapping_class: unknown
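
A palette like the one above is easy to get subtly wrong, for example by reusing a color or by assigning a label_id that does not fit num_classes. The sketch below is a hypothetical pre-flight check, not part of TAO. It assumes that label_id 255 marks an ignored class (as the unknown entry in the example suggests) and that every other label_id must lie in the range 0 to num_classes - 1.

```python
def check_palette(palette, num_classes, ignore_id=255):
    """Return a list of human-readable problems found in a palette."""
    problems = []
    seen_colors = {}
    for entry in palette:
        name, label_id = entry["seg_class"], entry["label_id"]
        color = tuple(entry["rgb"])
        # Assumption: ignore_id is exempt from the num_classes range check.
        if label_id != ignore_id and not 0 <= label_id < num_classes:
            problems.append(
                f"{name}: label_id {label_id} is outside 0..{num_classes - 1}"
            )
        if color in seen_colors:
            problems.append(
                f"{name}: rgb {color} already used by {seen_colors[color]}"
            )
        else:
            seen_colors[color] = name
    return problems


palette = [
    {"seg_class": "urban", "rgb": [0, 255, 255], "label_id": 0, "mapping_class": "urban"},
    {"seg_class": "water", "rgb": [0, 0, 255], "label_id": 4, "mapping_class": "water"},
    {"seg_class": "unknown", "rgb": [0, 0, 0], "label_id": 255, "mapping_class": "unknown"},
]
problems = check_palette(palette, num_classes=6)
```

An empty list means the palette passed both checks. Running a check like this before training is cheaper than discovering a mislabeled class after several epochs.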

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| segment | dict config | | Segmentation Dataset Config. | |

segment#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| root_dir | str | | Path to root directory for dataset. | |
| dataset | str | SFDataset | Dataset class. | SFDataset |
| num_classes | int | 2 | The number of classes in the training data. | |
| img_size | int | 256 | The input image size. | |
| batch_size | int | 8 | Batch size. | |
| workers | int | 1 | Workers. | |
| shuffle | bool | True | Shuffle dataloader. | True,False |
| train_split | str | train | Train split folder name. | |
| validation_split | str | val | Validation split folder name. | |
| test_split | str | val | Test split folder name. | |
| predict_split | str | test | Predict split folder name. | |
| augmentation | dict config | | Augmentation. | |
| label_transform | str | norm | Label transform. | norm,None |
| palette | List[Dict] | [{"label_id": 0, "mapping_class": "foreground", "rgb": [0, 0, 0], "seg_class": "foreground"}, {"label_id": 1, "mapping_class": "background", "rgb": [1, 1, 1], "seg_class": "background"}] | The palette. The RGB range depends on label_transform: with norm, RGB values range from 0 to 1; otherwise they range from 0 to 255. | |
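
To illustrate how the palette and label_transform interact, here is a minimal sketch that converts an RGB mask into label IDs. It assumes that norm means mask pixel values are divided by 255 before they are matched against the palette, which is consistent with the default palette using [1, 1, 1] for white. The function rgb_mask_to_labels and the fallback to 255 for unmatched colors are hypothetical, not the SFDataset implementation.

```python
def rgb_mask_to_labels(mask, palette, label_transform=None, ignore_id=255):
    """Map each RGB pixel of a mask (rows of 3-tuples) to its label_id."""
    lookup = {tuple(entry["rgb"]): entry["label_id"] for entry in palette}
    labels = []
    for row in mask:
        label_row = []
        for pixel in row:
            if label_transform == "norm":
                # Assumed behavior: scale 0..255 pixel values to 0..1.
                pixel = tuple(channel / 255 for channel in pixel)
            # Colors missing from the palette fall back to ignore_id.
            label_row.append(lookup.get(tuple(pixel), ignore_id))
        labels.append(label_row)
    return labels


# Default-style binary palette, used together with label_transform "norm".
binary_palette = [
    {"label_id": 0, "mapping_class": "foreground", "rgb": [0, 0, 0], "seg_class": "foreground"},
    {"label_id": 1, "mapping_class": "background", "rgb": [1, 1, 1], "seg_class": "background"},
]
binary_labels = rgb_mask_to_labels(
    [[(0, 0, 0), (255, 255, 255)]], binary_palette, label_transform="norm"
)
```

With label_transform set to None, the same function expects palette colors in the 0 to 255 range, as in the multi-class example earlier in this section.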

augmentation#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| random_flip | dict config | | RandomFlip augmentation config. | |
| random_rotate | dict config | | RandomRotation augmentation config. | |
| random_color | dict config | | RandomColor augmentation config. | |
| with_scale_random_crop | dict config | | RandomCropWithScale augmentation config. | |
| with_random_blur | bool | | Flag to enable with_random_blur. | |
| with_random_crop | bool | | Flag to enable with_random_crop. | |
| mean | List[float] | | Mean for the augmentation. | |
| std | List[float] | | Standard deviation for the augmentation. | |

RandomFlip#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| vflip_probability | float | 0.5 | Vertical Flip probability. | |
| hflip_probability | float | 0.5 | Horizontal Flip probability. | |
| enable | bool | True | Flag to enable augmentation. | True,False |
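
For segmentation, a geometric augmentation has to be applied identically to the image and its mask, otherwise the labels no longer line up with the pixels. The sketch below illustrates this for the two flip probabilities, using nested lists as stand-ins for images. The function random_flip is hypothetical and assumes the two flips are sampled independently, which this page does not confirm.

```python
import random


def random_flip(image, mask, hflip_probability=0.5, vflip_probability=0.5, rng=random):
    """Flip image and mask together; each flip is sampled independently."""
    if rng.random() < hflip_probability:
        # Horizontal flip: reverse the pixels within every row.
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    if rng.random() < vflip_probability:
        # Vertical flip: reverse the order of the rows.
        image = image[::-1]
        mask = mask[::-1]
    return image, mask


image = [[1, 2], [3, 4]]
mask = [[0, 1], [1, 0]]
flipped_image, flipped_mask = random_flip(image, mask, rng=random.Random(0))
```

Passing a seeded random.Random instance, as in the usage above, makes the augmentation reproducible.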

RandomRotation#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| rotate_probability | float | 0.5 | Random Rotate probability. | |
| angle_list | List[float] | [90, 180, 270] | Random rotate angle. | |
| enable | bool | True | Flag to enable augmentation. | True,False |
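
The default angle_list contains only multiples of 90 degrees, which can be applied without interpolation. As a minimal sketch, assuming one angle is drawn uniformly from angle_list with probability rotate_probability and applied to both image and mask, the augmentation could look like this. The helpers rotate90 and random_rotate are hypothetical, and the counter-clockwise direction is an arbitrary choice for the illustration.

```python
import random


def rotate90(grid):
    """Rotate a 2D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def random_rotate(image, mask, rotate_probability=0.5, angle_list=(90, 180, 270), rng=random):
    """Rotate image and mask together by an angle drawn from angle_list."""
    if rng.random() >= rotate_probability:
        return image, mask
    angle = rng.choice(list(angle_list))
    if angle % 90 != 0:
        raise ValueError("this sketch only handles multiples of 90 degrees")
    for _ in range((angle // 90) % 4):
        image, mask = rotate90(image), rotate90(mask)
    return image, mask


image = [[1, 2], [3, 4]]
mask = [[0, 1], [1, 0]]
rotated_image, rotated_mask = random_rotate(image, mask, rng=random.Random(0))
```

Angles that are not multiples of 90 would require resampling, with nearest-neighbor interpolation for the mask so that label IDs are not blended.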

RandomColor#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| brightness | float | 0.3 | Random Color Brightness. | |
| contrast | float | 0.3 | Random Color Contrast. | |
| saturation | float | 0.3 | Random Color Saturation. | |
| hue | float | 0.3 | Random Color Hue. | |
| enable | bool | True | Flag to enable Random Color. | True,False |
| color_probability | float | 0.5 | Random Color Probability. | |

RandomCropWithScale#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| scale_range | List[float] | [1, 1.2] | Random Scale range. | |
| enable | bool | True | Flag to enable augmentation. | True,False |

Evaluating the model#

The evaluation metric of SegFormer is mIoU. For more details on the mean IoU metric, refer to mIoU.

Here’s an example of SegFormer evaluation output:

+------------+-------+-------+
| Class      | IoU   | Acc   |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
  ...
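
In the sample output above, the global mIoU (60.81) is the mean of the two per-class IoU values (37.81 and 83.81). The following sketch shows how such numbers are commonly derived from a confusion matrix. It uses the usual definitions (IoU = TP / (TP + FP + FN), per-class Acc = TP / (TP + FN), aAcc = correct pixels / all pixels), which are assumed here rather than taken from the TAO source. The function name and the example matrix are hypothetical.

```python
def segmentation_metrics(confusion):
    """Compute per-class IoU and Acc plus global mIoU, mAcc, and aAcc.

    confusion[t][p] counts pixels with true class t and predicted class p.
    This sketch assumes every class occurs at least once in the ground truth.
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(num_classes))
    iou, acc = [], []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # true class c, predicted as another class
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp
        iou.append(tp / (tp + fp + fn))
        acc.append(tp / (tp + fn))
    return {
        "IoU": iou,
        "Acc": acc,
        "mIoU": sum(iou) / num_classes,
        "mAcc": sum(acc) / num_classes,
        "aAcc": correct / total,
    }


# Hypothetical 2-class confusion matrix (rows: ground truth, columns: prediction).
metrics = segmentation_metrics([[40, 10], [20, 30]])
```

Because aAcc weights every pixel equally, a dominant class can keep it high even when a small class has a low IoU, which is why mIoU is the headline metric.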

Running Inference on the Model#

The output mask PNG images with class IDs are saved in vis_tao. The overlaid mask images are saved in mask_tao.

Quantization#

SegFormer supports PTQ via TAO Quant using either the torchao (weight-only) or modelopt (static PTQ) backends.

  • Add a quantize section to your experiment specification (see TAO Quant documentation for schema and backend options).

  • Use the quantized checkpoint by setting evaluate.is_quantized: true or inference.is_quantized: true and pointing to the artifact saved under results_dir (for example, quantized_model_torchao.pth or quantized_model_modelopt.pth). For ModelOpt artifacts, the model weights are stored under model_state_dict.
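
If you inspect a quantized artifact outside of TAO, the note about model_state_dict matters: the weights sit one level down in ModelOpt artifacts. The sketch below shows that lookup on an already-loaded checkpoint object. The helper extract_weights is hypothetical, loading the file itself is omitted, and the assumption that other artifacts store the weights at the top level is not confirmed by this page.

```python
def extract_weights(checkpoint):
    """Return the weight mapping from a loaded checkpoint object.

    ModelOpt artifacts keep the weights under "model_state_dict"; any other
    checkpoint is assumed (for this sketch) to already be the weight mapping.
    """
    if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint:
        return checkpoint["model_state_dict"]
    return checkpoint


# Stand-ins for loaded checkpoints; real ones would hold tensors.
modelopt_like = {"model_state_dict": {"decode_head.weight": [0.1, 0.2]}}
plain_like = {"decode_head.weight": [0.1, 0.2]}
weights = extract_weights(modelopt_like)
```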

Notes#

  • For modelopt static PTQ, ensure that your dataset configuration provides a representative calibration loader.

  • For torchao, activation settings in the configuration are ignored.

Calibration Dataset (ModelOpt)#

When you use the modelopt backend (static PTQ), provide a calibration dataset via dataset.segment.quant_calibration_dataset.

Minimal example:

quantize:
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
dataset:
  segment:
    quant_calibration_dataset:
      images_dir: "/path/to/calib/images"

See also: TAO Quant overview and its Configuration and backend pages.

Deploying to DeepStream#

Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.