SegFormer#

SegFormer is an NVIDIA-developed semantic-segmentation model that is included in TAO. SegFormer supports the following tasks:

train
evaluate
inference
export
quantize

Data Input for SegFormer#

SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.

Creating Training Experiment Specification File#

Configuration for Custom Dataset#

Here is an example specification file for training a SegFormer model with an NVDINOv2 backbone.

Note that this specification file is for reference only. Create your own specification file based on your dataset.

The experiment specification consists of several main components:

train
evaluate
inference
export
model
dataset
gen_trt_engine

train#

The train config contains the parameters related to training. They are described as follows:

train:
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
  num_epochs: 50
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 50
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    weight_decay: 0.0005

Parameter	Datatype	Default	Description	Supported Values
`optim`	dict config	–	Optimizer config.	–
`pretrained_model_path`	str	None	Pretrained model path.	–
`segment`	dict config	–	Segmentation loss Config.	–
`num_gpus`	int	1	The number of GPUs to run the train job.	–
`gpu_ids`	List[int]	[0]	List of GPU IDs to run the training on.	–
`num_nodes`	int	1	Number of nodes to run the training on.	–
`seed`	int	1234	The seed for the initializer in PyTorch.	–
`num_epochs`	int	10	Number of epochs to run the training.	–
`checkpoint_interval`	int	1	Checkpoint interval.	–
`validation_interval`	int	1	Validation interval.	–
`resume_training_checkpoint_path`	str	None	Path to the checkpoint to resume training	–
`results_dir`	str	None	Path to where all the assets are stored.	–

optim#

Parameter	Datatype	Default	Description	Supported Values
`monitor_name`	str	val_loss	Monitor Name	–
`optim`	str	adamw	Optimizer	adamw,adam,sgd
`lr`	float	0.00006	Optimizer learning rate	–
`policy`	str	linear	Optimizer policy	linear,step
`momentum`	float	0.9	The momentum for the AdamW optimizer.	–
`weight_decay`	float	0.01	The weight decay coefficient.	–

segment#

Parameter	Datatype	Default	Description	Supported Values
`loss`	str	ce	Segment loss	ce
`weights`	List[float]	[0.5, 0.5, 0.5, 0.8, 1.0]	Multi-scale Segment loss weight	–

tensorboard#

Parameter	Datatype	Default	Description	Supported Values
`enabled`	bool	False	Flag to enable tensorboard	–
`infrequent_logging_frequency`	int	2	infrequent_logging_frequency	–

evaluate#

The evaluate config contains the parameters related to evaluation. They are described as follows:

Set the evaluate checkpoint path in the evaluate specification:

evaluate:
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  vis_after_n_batches: 1

Parameter	Datatype	Default	Description	Supported Values
`vis_after_n_batches`	int	1	Visualize evaluation segmentation results after n batches.	–
`batch_size`	int	8	Batch Size.	–
`num_gpus`	int	1	The number of GPUs to run the evaluate job.	–
`gpu_ids`	List[int]	[0]	List of GPU IDs to run the evaluate on.	–
`num_nodes`	int	1	Number of nodes to run the evaluate on.	–
`checkpoint`	str	–	Path to the checkpoint used for evaluation.	–
`trt_engine`	Optional[str]	None	Path to the TensorRT engine to be used for evaluation.	–
`results_dir`	Optional[str]	None	Path to where all the assets are stored.	–

inference#

The inference config contains the parameters related to inference. They are described as follows:

Set the inference checkpoint path in the inference specification:

inference:
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  vis_after_n_batches: 1

Parameter	Datatype	Default	Description	Supported Values
`vis_after_n_batches`	int	1	Visualize inference segmentation results after n batches.	–
`batch_size`	int	8	Batch Size.	–
`num_gpus`	int	1	The number of GPUs to run the inference job.	–
`gpu_ids`	List[int]	[0]	List of GPU IDs to run the inference on.	–
`num_nodes`	int	1	Number of nodes to run the inference on.	–
`checkpoint`	str	–	Path to the checkpoint used for inference.	–
`trt_engine`	Optional[str]	None	Path to the TensorRT engine to be used for inference.	–
`results_dir`	Optional[str]	None	Path to where all the assets are stored.	–

export#

The export config contains the parameters related to export. They are described as follows:

Set the export checkpoint path in the export specification:

export:
  results_dir: "${results_dir}/export"
  gpu_id: 0
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  onnx_file: "${export.results_dir}/segformer.onnx"
  input_width: 224
  input_height: 224
  batch_size: -1

Parameter	Datatype	Default	Description	Supported Values
`results_dir`	Optional[str]	None	Path to where all the assets are stored.	–
`gpu_id`	int	0	The index of the GPU to build the TensorRT engine.	–
`checkpoint`	str	–	Path to the checkpoint file to run export.	–
`onnx_file`	str	–	Path to the onnx model file.	–
`on_cpu`	bool	False	Flag to export CPU compatible model.	True,False
`input_channel`	int	3	Number of channels in the input Tensor.	1,3
`input_width`	int	960	Width of the input image tensor.	–
`input_height`	int	544	Height of the input image tensor.	–
`opset_version`	int	17	Operator set version.	–
`batch_size`	int	-1	The batch size of the input Tensor for the engine.	–
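
The `${results_dir}` and `${export.results_dir}` references in the export example point at other keys in the same specification. The sketch below illustrates how such references expand, assuming OmegaConf-style interpolation (which the `${...}` syntax suggests but this page does not state) and a top-level `results_dir` key defined elsewhere in the spec. The `lookup` and `resolve` helpers and the `/results` value are hypothetical stand-ins, not the TAO loader.

```python
import re

# Matches ${some.dotted.key} references inside a string value.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def lookup(config, dotted_key):
    """Follow a dotted key such as 'export.results_dir' through nested dicts."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(config, value, depth=0):
    """Expand ${...} references in a string; non-string values pass through."""
    if not isinstance(value, str):
        return value
    if depth > 10:
        raise ValueError("interpolation is nested too deeply or is circular")

    def expand(match):
        target = lookup(config, match.group(1))
        return str(resolve(config, target, depth + 1))

    return _REFERENCE.sub(expand, value)


# Hypothetical spec: '/results' is a placeholder for the real results_dir.
spec = {
    "results_dir": "/results",
    "export": {
        "results_dir": "${results_dir}/export",
        "onnx_file": "${export.results_dir}/segformer.onnx",
    },
}
print(resolve(spec, spec["export"]["onnx_file"]))
```

With this placeholder value, the ONNX file resolves to a path inside the export results directory, which is why the example only needs `results_dir` to be set once.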

model#

The following example model configuration defines the SegFormer backbone and decoder head.

model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: <path_to_pretrained_weight>
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]

Parameter	Datatype	Default	Description	Supported Values
`backbone`	dict config	–	The configuration of the backbone.	–
`decode_head`	dict config	–	The configuration of the decoder head.	–

backbone#

Parameter	Datatype	Default	Description	Supported Values
`type`	str	fan_small_12_p4_hybrid	The name of the backbone to be used.	mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5, fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, vit_large_nvdinov2, vit_giant_nvdinov2, vit_base_nvclip_16_siglip, vit_huge_nvclip_14_siglip, c_radio_v2_vit_base_patch16_224, c_radio_v2_vit_large_patch16_224, c_radio_v2_vit_huge_patch16_224
`pretrained_backbone_path`	str	–	Path to the pretrained model	–
`freeze_backbone`	bool	False	Flag to freeze backbone	True,False

decode_head#

Parameter	Datatype	Default	Description	Supported Values
`feature_strides`	List[int]	[4, 8, 16, 32]	Feature strides for the head.	–
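
As a worked example, the default strides can be related to the input size. This sketch assumes that each stride is the downsampling factor of one feature level relative to the input image, which is the usual meaning but is not stated on this page. The helper name is hypothetical.

```python
def feature_map_sizes(img_size, feature_strides):
    """Side length of each feature level for a square input of img_size pixels."""
    return [img_size // stride for stride in feature_strides]


# img_size 224 with the default feature_strides from the table above.
print(feature_map_sizes(224, [4, 8, 16, 32]))  # [56, 28, 14, 7]
```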

dataset#

The dataset parameter defines the dataset source, training batch size, and augmentation. An example dataset configuration is provided below.

dataset:
  segment:
    dataset: "SFDataset"
    root_dir: <dataset_root>
    batch_size: 32
    workers: 8
    num_classes: 6
    img_size: 224
    train_split: "train"
    validation_split: "val"
    test_split: "val"
    predict_split: "val"
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: False
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: False
    label_transform: None
    palette:
      - seg_class: urban
        rgb:
          - 0
          - 255
          - 255
        label_id: 0
        mapping_class: urban
      - seg_class: agriculture
        rgb:
          - 255
          - 255
          - 0
        label_id: 1
        mapping_class: agriculture
      - seg_class: rangeland
        rgb:
          - 255
          - 0
          - 255
        label_id: 2
        mapping_class: rangeland
      - seg_class: forest
        rgb:
          - 0
          - 255
          - 0
        label_id: 3
        mapping_class: forest
      - seg_class: water
        rgb:
          - 0
          - 0
          - 255
        label_id: 4
        mapping_class: water
      - seg_class: barren
        rgb:
          - 255
          - 255
          - 255
        label_id: 5
        mapping_class: barren
      - seg_class: unknown
        rgb:
          - 0
          - 0
          - 0
        label_id: 255
        mapping_class: unknown

Parameter	Datatype	Default	Description	Supported Values
`segment`	dict config	–	Segmentation Dataset Config.	–

segment#

Parameter	Datatype	Default	Description	Supported Values
`root_dir`	str	–	Path to root directory for dataset.	–
`dataset`	str	SFDataset	dataset class.	SFDataset
`num_classes`	int	2	The number of classes in the training data.	–
`img_size`	int	256	The input image size.	–
`batch_size`	int	8	Batch size.	–
`workers`	int	1	Workers.	–
`shuffle`	bool	True	Shuffle dataloader.	True,False
`train_split`	str	train	Train split folder name.	–
`validation_split`	str	val	Validation split folder name.	–
`test_split`	str	val	Test split folder name.	–
`predict_split`	str	test	Predict split folder name.	–
`augmentation`	dict config	–	Augmentation.	–
`label_transform`	str	norm	Label transform.	norm,None
`palette`	List[Dict]	[{"label_id": 0, "mapping_class": "foreground", "rgb": [0, 0, 0], "seg_class": "foreground"}, {"label_id": 1, "mapping_class": "background", "rgb": [1, 1, 1], "seg_class": "background"}]	Class palette. The RGB range depends on `label_transform`: 0 to 1 if it is norm, otherwise 0 to 255.	–
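
The palette description ties the allowed RGB range to `label_transform`. A minimal sketch of a pre-flight check for that rule follows. The function name is hypothetical, and the uniqueness check on `label_id` is an added assumption rather than a documented requirement.

```python
def validate_palette(palette, label_transform="norm"):
    """Return a list of problems found in the palette (empty if none)."""
    # Rule from the table: RGB in 0..1 for "norm", otherwise 0..255.
    upper = 1 if label_transform == "norm" else 255
    problems = []
    seen_ids = set()
    for entry in palette:
        name = entry.get("seg_class", "<unnamed>")
        rgb = entry.get("rgb", [])
        if len(rgb) != 3 or any(not 0 <= value <= upper for value in rgb):
            problems.append(f"{name}: rgb {rgb} must be 3 values in 0..{upper}")
        label_id = entry.get("label_id")
        # Assumption: each label_id should appear only once.
        if label_id in seen_ids:
            problems.append(f"{name}: duplicate label_id {label_id}")
        seen_ids.add(label_id)
    return problems


# Two entries copied from the example dataset configuration.
example = [
    {"seg_class": "urban", "rgb": [0, 255, 255], "label_id": 0},
    {"seg_class": "unknown", "rgb": [0, 0, 0], "label_id": 255},
]
print(validate_palette(example, label_transform="None"))  # no problems
print(validate_palette(example, label_transform="norm"))  # flags 'urban'
```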

augmentation#

Parameter	Datatype	Default	Description	Supported Values
`random_flip`	dict config	–	RandomFlip augmentation config.	–
`random_rotate`	dict config	–	RandomRotation augmentation config.	–
`random_color`	dict config	–	RandomColor augmentation config.	–
`with_scale_random_crop`	dict config	–	RandomCropWithScale augmentation config.	–
`with_random_blur`	bool	–	Flag to enable with_random_blur.	–
`with_random_crop`	bool	–	Flag to enable with_random_crop.	–
`mean`	List[float]	–	Mean for the augmentation.	–
`std`	List[float]	–	Standard deviation for the augmentation.	–

RandomFlip#

Parameter	Datatype	Default	Description	Supported Values
`vflip_probability`	float	0.5	Vertical Flip probability.	–
`hflip_probability`	float	0.5	Horizontal Flip probability.	–
`enable`	bool	True	Flag to enable augmentation.	True,False

RandomRotation#

Parameter	Datatype	Default	Description	Supported Values
`rotate_probability`	float	0.5	Random Rotate probability.	–
`angle_list`	List[float]	[90, 180, 270]	Random rotate angle.	–
`enable`	bool	True	Flag to enable augmentation.	True,False
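
To illustrate how the flip and rotation parameters interact, here is a minimal sketch on nested lists. It is not TAO's implementation. It assumes that each transform is drawn once per sample and applied identically to the image and its mask so that labels stay aligned, and that rotation is counter-clockwise. All function names are hypothetical.

```python
import random


def hflip(grid):
    """Mirror left to right."""
    return [row[::-1] for row in grid]


def vflip(grid):
    """Mirror top to bottom."""
    return [list(row) for row in grid[::-1]]


def rotate90(grid):
    """Rotate by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def augment_pair(image, mask, rng, hflip_probability=0.5, vflip_probability=0.5,
                 rotate_probability=0.5, angle_list=(90, 180, 270)):
    """Apply the same randomly chosen flips and rotation to image and mask."""
    ops = []
    if rng.random() < hflip_probability:
        ops.append(hflip)
    if rng.random() < vflip_probability:
        ops.append(vflip)
    if rng.random() < rotate_probability:
        turns = rng.choice(list(angle_list)) // 90
        ops.extend([rotate90] * turns)
    for op in ops:
        image, mask = op(image), op(mask)
    return image, mask


rng = random.Random(0)
image = [[1, 2], [3, 4]]
mask = [[0, 1], [1, 0]]
print(augment_pair(image, mask, rng))
```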

RandomColor#

Parameter	Datatype	Default	Description	Supported Values
`brightness`	float	0.3	Random Color Brightness.	–
`contrast`	float	0.3	Random Color Contrast.	–
`saturation`	float	0.3	Random Color Saturation.	–
`hue`	float	0.3	Random Color Hue.	–
`enable`	bool	True	Flag to enable Random Color.	True,False
`color_probability`	float	0.5	Random Color Probability.	–

RandomCropWithScale#

Parameter	Datatype	Default	Description	Supported Values
`scale_range`	List[float]	[1, 1.2]	Random Scale range.	–
`enable`	bool	True	Flag to enable augmentation.	True,False

Evaluating the model#

The evaluation metric of SegFormer is mIoU. For more details on the mean IoU metric, refer to mIoU.

Here is an example of SegFormer evaluation output:

+------------+-------+-------+
| Class      | IoU   | Acc   |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
  ...
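
The summary row can be cross-checked against the per-class rows. The sketch below assumes the common definitions, which this page does not spell out: per-class IoU is TP / (TP + FP + FN) computed from a confusion matrix, and mIoU is the unweighted mean over classes. The function names are illustrative and not part of TAO.

```python
from statistics import mean


def per_class_iou(confusion):
    """IoU per class from a square confusion matrix.

    confusion[t][p] counts pixels whose ground-truth class is t and
    whose predicted class is p.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # class c pixels predicted as another class
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp
        union = tp + fp + fn
        ious.append(tp / union if union else float("nan"))
    return ious


def mean_iou(ious):
    """Unweighted mean of the per-class IoU values."""
    return mean(ious)


# Per-class IoU values copied from the example output above.
reported = {"foreground": 37.81, "background": 83.81}
print(round(mean_iou(reported.values()), 2))  # 60.81
```

The result matches the mIoU reported in the summary table of the example output.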

Running Inference on the Model#

The output mask PNG images with class IDs are saved in vis_tao. The overlaid mask images are saved in mask_tao.

Quantization#

SegFormer supports PTQ via TAO Quant using either the torchao (weight-only) or modelopt (static PTQ) backends.

Add a quantize section to your experiment specification (see TAO Quant documentation for schema and backend options).
Use the quantized checkpoint by setting evaluate.is_quantized: true or inference.is_quantized: true and pointing to the artifact saved under results_dir (for example, quantized_model_torchao.pth or quantized_model_modelopt.pth). For ModelOpt artifacts, the model weights are stored under model_state_dict.
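
If you inspect a quantized artifact outside TAO, the `model_state_dict` nesting matters. The helper below is a minimal sketch with a hypothetical name. It assumes the checkpoint has already been loaded into a Python object, and that artifacts without a `model_state_dict` key hold the weights at the top level, which this page does not confirm.

```python
def extract_state_dict(checkpoint):
    """Return the model weights from a loaded checkpoint object."""
    # ModelOpt artifacts store the weights under "model_state_dict".
    if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint:
        return checkpoint["model_state_dict"]
    # Assumption: anything else is already a plain state dict.
    return checkpoint


# Typical usage with PyTorch (not executed here):
#   checkpoint = torch.load("quantized_model_modelopt.pth", map_location="cpu")
#   state_dict = extract_state_dict(checkpoint)
modelopt_like = {"model_state_dict": {"layer.weight": [0.1, 0.2]}}
print(list(extract_state_dict(modelopt_like)))  # ['layer.weight']
```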

Notes#

For modelopt static PTQ, ensure that your dataset configuration provides a representative calibration loader.
For torchao, activation settings in the configuration are ignored.

Calibration Dataset (ModelOpt)#

When you use the modelopt backend (static PTQ), provide a calibration dataset via dataset.segment.quant_calibration_dataset.

Minimal example:

quantize:
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
dataset:
  segment:
    quant_calibration_dataset:
      images_dir: "/path/to/calib/images"

See also: TAO Quant overview and its Configuration and backend pages.

Deploying to DeepStream#

Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.