SegFormer#
SegFormer is an NVIDIA-developed semantic-segmentation model that is included in TAO. SegFormer supports the following tasks:
train, evaluate, inference, export, quantize
Data Input for SegFormer#
SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.
Creating Training Experiment Specification File#
Configuration for Custom Dataset#
Here is an example specification file for training a SegFormer model with an NVDINOv2 backbone.
Note that this specification file is for reference only. Create your own specification file based on your own dataset.
The experiment specification consists of several main components:
train, evaluate, inference, export, model, dataset, gen_trt_engine
train#
The train config contains the parameters related to training. They are described as follows:
train:
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
  num_epochs: 50
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 50
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    weight_decay: 0.0005
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| optim | dict config | – | Optimizer config. | – |
| | str | None | Pretrained model path. | – |
| segment | dict config | – | Segmentation loss config. | – |
| | int | 1 | The number of GPUs to run the train job. | – |
| | List[int] | [0] | List of GPU IDs to run the training on. | – |
| num_nodes | int | 1 | Number of nodes to run the training on. | – |
| | int | 1234 | The seed for the initializer in PyTorch. | – |
| num_epochs | int | 10 | Number of epochs to run the training. | – |
| checkpoint_interval | int | 1 | Checkpoint interval. | – |
| validation_interval | int | 1 | Validation interval. | – |
| resume_training_checkpoint_path | str | None | Path to the checkpoint to resume training from. | – |
| | str | None | Path to where all the assets are stored. | – |
optim#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| | str | val_loss | Monitor name. | – |
| optim | str | adamw | Optimizer. | adamw, adam, sgd |
| lr | float | 0.00006 | Optimizer learning rate. | – |
| policy | str | linear | Optimizer policy. | linear, step |
| | float | 0.9 | The momentum for the AdamW optimizer. | – |
| weight_decay | float | 0.01 | The weight decay coefficient. | – |
segment#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| loss | str | ce | Segment loss. | ce |
| | List[float] | [0.5, 0.5, 0.5, 0.8, 1.0] | Multi-scale segment loss weight. | – |
tensorboard#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| | bool | False | Flag to enable tensorboard. | – |
| | int | 2 | infrequent_logging_frequency | – |
evaluate#
The evaluate config contains the parameters related to evaluation. They are described as follows:
Set the evaluate checkpoint path in the evaluate specification:
evaluate:
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  vis_after_n_batches: 1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| vis_after_n_batches | int | 1 | Visualize evaluation segmentation results after n batches. | – |
| | int | 8 | Batch size. | – |
| | str | – | Path to checkpoint file. | – |
| | int | 1 | The number of GPUs to run the evaluate job. | – |
| | List[int] | [0] | List of GPU IDs to run the evaluation on. | – |
| | int | 1 | Number of nodes to run the evaluation on. | – |
| | str | – | Path to the checkpoint used for evaluation. | – |
| | Optional[str] | None | Path to the TensorRT engine to be used for evaluation. | – |
| | Optional[str] | None | Path to where all the assets are stored. | – |
inference#
The inference config contains the parameters related to inference. They are described as follows:
Set the inference checkpoint path in the inference specification:
inference:
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  vis_after_n_batches: 1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| vis_after_n_batches | int | 1 | Visualize inference segmentation results after n batches. | – |
| | int | 8 | Batch size. | – |
| | str | – | Path to checkpoint file. | – |
| | int | 1 | The number of GPUs to run the inference job. | – |
| | List[int] | [0] | List of GPU IDs to run the inference on. | – |
| | int | 1 | Number of nodes to run the inference on. | – |
| | str | – | Path to the checkpoint used for inference. | – |
| | Optional[str] | None | Path to the TensorRT engine to be used for inference. | – |
| | Optional[str] | None | Path to where all the assets are stored. | – |
export#
The export config contains the parameters related to export. They are described as follows:
Set the export checkpoint path in the export specification:
export:
  results_dir: "${results_dir}/export"
  gpu_id: 0
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  onnx_file: "${export.results_dir}/segformer.onnx"
  input_width: 224
  input_height: 224
  batch_size: -1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| results_dir | Optional[str] | None | Path to where all the assets are stored. | – |
| gpu_id | int | 0 | The index of the GPU to build the TensorRT engine. | – |
| checkpoint | str | – | Path to the checkpoint file to run export. | – |
| onnx_file | str | – | Path to the ONNX model file. | – |
| | bool | False | Flag to export CPU compatible model. | True, False |
| | int | 3 | Number of channels in the input tensor. | 1, 3 |
| input_width | int | 960 | Width of the input image tensor. | – |
| input_height | int | 544 | Height of the input image tensor. | – |
| | int | 17 | Operator set version. | – |
| batch_size | int | -1 | The batch size of the input tensor for the engine. | – |
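The example specifications on this page use `${...}` references such as `${results_dir}` and `${export.results_dir}`. The page does not say how these are resolved, so the following is only a minimal sketch, assuming OmegaConf-style interpolation where each reference is a dotted path into the same specification. The `lookup` and `resolve` helpers and the `/results` root are hypothetical and exist only for illustration.

```python
import re

# Matches ${some.dotted.key}; assumed syntax, modeled on the examples above.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def lookup(config, dotted_key):
    """Walk a nested dict using a dotted key such as 'export.results_dir'."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(config, value, depth=0):
    """Replace every ${key} in a string with the (recursively resolved) value."""
    if depth > 10:
        raise ValueError("interpolation is nested too deeply or is circular")
    if not isinstance(value, str):
        return value

    def substitute(match):
        referenced = lookup(config, match.group(1))
        return str(resolve(config, referenced, depth + 1))

    return _REFERENCE.sub(substitute, value)


# Hypothetical specification; only the key names come from the export example.
spec = {
    "results_dir": "/results",
    "export": {
        "results_dir": "${results_dir}/export",
        "checkpoint": "${results_dir}/train/segformer_model_latest.pth",
        "onnx_file": "${export.results_dir}/segformer.onnx",
    },
}

onnx_path = resolve(spec, spec["export"]["onnx_file"])
checkpoint_path = resolve(spec, spec["export"]["checkpoint"])
```

Under these assumptions, `onnx_file` depends on `export.results_dir`, which in turn depends on the top-level `results_dir`, so changing the top-level value moves every derived path.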
model#
The following example model config provides options to define the SegFormer backbone and decoder head.
model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: <path_to_pretrained_weight>
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | dict config | – | The configuration of the backbone. | |
| decode_head | dict config | – | The configuration of the decoder head. | |
backbone#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| type | str | fan_small_12_p4_hybrid | The name of the backbone to be used. | mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5, fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, vit_large_nvdinov2, vit_giant_nvdinov2, vit_base_nvclip_16_siglip, vit_huge_nvclip_14_siglip, c_radio_v2_vit_base_patch16_224, c_radio_v2_vit_large_patch16_224, c_radio_v2_vit_huge_patch16_224 |
| pretrained_backbone_path | str | – | Path to the pretrained model. | – |
| freeze_backbone | bool | False | Flag to freeze the backbone. | True, False |
decode_head#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| feature_strides | List[int] | [4, 8, 16, 32] | Feature strides for the head. | – |
dataset#
The dataset parameter defines the dataset source, training batch size, and
augmentation. An example dataset is provided below.
dataset:
  segment:
    dataset: "SFDataset"
    root_dir: <dataset_root>
    batch_size: 32
    workers: 8
    num_classes: 6
    img_size: 224
    train_split: "train"
    validation_split: "val"
    test_split: "val"
    predict_split: "val"
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: False
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: False
    label_transform: None
    palette:
      - seg_class: urban
        rgb:
          - 0
          - 255
          - 255
        label_id: 0
        mapping_class: urban
      - seg_class: agriculture
        rgb:
          - 255
          - 255
          - 0
        label_id: 1
        mapping_class: agriculture
      - seg_class: rangeland
        rgb:
          - 255
          - 0
          - 255
        label_id: 2
        mapping_class: rangeland
      - seg_class: forest
        rgb:
          - 0
          - 255
          - 0
        label_id: 3
        mapping_class: forest
      - seg_class: water
        rgb:
          - 0
          - 0
          - 255
        label_id: 4
        mapping_class: water
      - seg_class: barren
        rgb:
          - 255
          - 255
          - 255
        label_id: 5
        mapping_class: barren
      - seg_class: unknown
        rgb:
          - 0
          - 0
          - 0
        label_id: 255
        mapping_class: unknown
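Each palette entry in the example above pairs an RGB color with a label_id. The sketch below shows one way such a palette could be used to turn a color-coded mask into per-pixel label IDs. It is an illustration only, not TAO's data loader: the helper names are hypothetical, the mask is a plain nested list rather than an image file, and sending unmatched colors to 255 is an assumption made for this example.

```python
# A subset of the palette from the example specification (0-255 RGB values).
palette = [
    {"seg_class": "urban", "rgb": [0, 255, 255], "label_id": 0},
    {"seg_class": "agriculture", "rgb": [255, 255, 0], "label_id": 1},
    {"seg_class": "water", "rgb": [0, 0, 255], "label_id": 4},
    {"seg_class": "unknown", "rgb": [0, 0, 0], "label_id": 255},
]


def build_color_lookup(palette):
    """Map each RGB triple to its label_id for constant-time lookups."""
    return {tuple(entry["rgb"]): entry["label_id"] for entry in palette}


def rgb_mask_to_labels(mask, palette, default_label=255):
    """Convert a 2D grid of RGB triples into a 2D grid of label IDs.

    Colors that are not in the palette get default_label (an assumption
    made for this sketch).
    """
    lookup = build_color_lookup(palette)
    return [
        [lookup.get(tuple(pixel), default_label) for pixel in row]
        for row in mask
    ]


# Hypothetical 2x2 mask: urban, agriculture / unknown, a color not in the palette.
mask = [
    [(0, 255, 255), (255, 255, 0)],
    [(0, 0, 0), (12, 34, 56)],
]
labels = rgb_mask_to_labels(mask, palette)
```

Because the lookup is keyed on exact RGB triples, the palette values have to be on the same scale as the mask pixels, which is why the label_transform setting matters when you write the palette.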
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| segment | dict config | – | Segmentation dataset config. | – |
segment#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| root_dir | str | – | Path to root directory for dataset. | – |
| dataset | str | SFDataset | Dataset class. | SFDataset |
| num_classes | int | 2 | The number of classes in the training data. | – |
| img_size | int | 256 | The input image size. | – |
| batch_size | int | 8 | Batch size. | – |
| workers | int | 1 | Workers. | – |
| | bool | True | Shuffle dataloader. | True, False |
| train_split | str | train | Train split folder name. | – |
| validation_split | str | val | Validation split folder name. | – |
| test_split | str | val | Test split folder name. | – |
| predict_split | str | test | Predict split folder name. | – |
| augmentation | dict config | – | Augmentation. | – |
| label_transform | str | norm | Label transform. | norm, None |
| palette | List[Dict] | {"label_id": 0, "mapping_class": "foreground", "rgb": [0, 0, 0], "seg_class": "foreground"} {"label_id": 1, "mapping_class": "background", "rgb": [1, 1, 1], "seg_class": "background"} | Palette. The RGB scale depends on label_transform: if it is norm, RGB values range from 0 to 1; otherwise they range from 0 to 255. | – |
augmentation#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| random_flip | dict config | – | RandomFlip augmentation config. | – |
| random_rotate | dict config | – | RandomRotation augmentation config. | – |
| random_color | dict config | – | RandomColor augmentation config. | – |
| with_scale_random_crop | dict config | – | RandomCropWithScale augmentation config. | – |
| with_random_blur | bool | – | Flag to enable with_random_blur. | – |
| with_random_crop | bool | – | Flag to enable with_random_crop. | – |
| | List[float] | – | Mean for the augmentation. | – |
| | List[float] | – | Standard deviation for the augmentation. | – |
RandomFlip#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| vflip_probability | float | 0.5 | Vertical flip probability. | – |
| hflip_probability | float | 0.5 | Horizontal flip probability. | – |
| enable | bool | True | Flag to enable augmentation. | True, False |
RandomRotation#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| rotate_probability | float | 0.5 | Random rotate probability. | – |
| angle_list | List[float] | [90, 180, 270] | Random rotate angle. | – |
| enable | bool | True | Flag to enable augmentation. | True, False |
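In segmentation, a geometric augmentation such as random rotation has to be applied identically to the image and its mask, otherwise the labels no longer line up with the pixels. The sketch below illustrates how a rotation probability and an angle list like the defaults above could drive that step. It is not TAO's implementation: the function names are hypothetical, the grids are plain nested lists, and the clockwise direction is an assumption.

```python
import random


def rotate_grid(grid, angle):
    """Rotate a 2D list clockwise by a multiple of 90 degrees."""
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90")
    for _ in range((angle // 90) % 4):
        # Reversing the rows and transposing is one clockwise quarter turn.
        grid = [list(row) for row in zip(*grid[::-1])]
    return grid


def random_rotate(image, mask, rotate_probability, angle_list, rng=random):
    """With the given probability, rotate image and mask by the same angle."""
    if rng.random() < rotate_probability:
        angle = rng.choice(angle_list)
        return rotate_grid(image, angle), rotate_grid(mask, angle)
    return image, mask


# Hypothetical 2x2 image and mask; probability 1.0 forces the rotation.
image = [[10, 20], [30, 40]]
mask = [[0, 1], [2, 3]]
rotated_image, rotated_mask = random_rotate(
    image, mask, rotate_probability=1.0, angle_list=[90], rng=random.Random(0)
)
```

Passing a seeded `random.Random` instance keeps the sketch reproducible; with a probability of 0.0 the inputs are returned unchanged.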
RandomColor#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| brightness | float | 0.3 | Random color brightness. | – |
| contrast | float | 0.3 | Random color contrast. | – |
| saturation | float | 0.3 | Random color saturation. | – |
| hue | float | 0.3 | Random color hue. | – |
| enable | bool | True | Flag to enable random color. | True, False |
| | float | 0.5 | Random color probability. | – |
RandomCropWithScale#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| | float | [1, 1.2] | Random scale range. | – |
| enable | bool | True | Flag to enable augmentation. | True, False |
Evaluating the model#
The evaluation metric of SegFormer is mIoU. For more details on the mean IoU metric, refer to mIoU.
Here’s an example of SegFormer evaluation output:
+------------+-------+-------+
| Class | IoU | Acc |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:
+--------+-------+-------+-------+
| Scope | mIoU | mAcc | aAcc |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
...
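In the sample output, the global mIoU (60.81) is the mean of the per-class IoU values (37.81 and 83.81), and mAcc is the mean of the per-class Acc values. The sketch below shows how metrics of this kind are commonly derived from a confusion matrix. It is a generic illustration, not TAO's evaluation code; the function name and the toy confusion matrix are hypothetical, and scoring a class with no pixels as 0.0 is an assumption.

```python
def segmentation_metrics(confusion):
    """Compute per-class IoU/Acc and global mIoU/mAcc/aAcc.

    confusion[t][p] is the number of pixels whose true class is t
    and whose predicted class is p.
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    ious, accs = [], []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # true class c, predicted as something else
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        union = tp + fp + fn
        # Assumption: a class with no pixels at all scores 0.0.
        ious.append(tp / union if union else 0.0)
        accs.append(tp / (tp + fn) if (tp + fn) else 0.0)
    correct = sum(confusion[c][c] for c in range(num_classes))
    return {
        "IoU": ious,
        "Acc": accs,
        "mIoU": sum(ious) / num_classes,
        "mAcc": sum(accs) / num_classes,
        "aAcc": correct / total if total else 0.0,
    }


# Hypothetical two-class confusion matrix (rows: truth, columns: prediction).
confusion = [
    [3, 1],
    [2, 4],
]
metrics = segmentation_metrics(confusion)
```

Multiplying the returned fractions by 100 gives percentages on the same scale as the tables above.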
Running Inference on the Model#
The output mask PNG images with class IDs are saved in vis_tao.
The overlaid mask images are saved in mask_tao.
Quantization#
SegFormer supports PTQ via TAO Quant using either the torchao (weight-only) or modelopt (static PTQ) backends.
Add a quantize section to your experiment specification (see TAO Quant documentation for schema and backend options).
Use the quantized checkpoint by setting evaluate.is_quantized: true or inference.is_quantized: true and pointing to the artifact saved under results_dir (for example, quantized_model_torchao.pth or quantized_model_modelopt.pth). For ModelOpt artifacts, the model weights are stored under model_state_dict.
Notes#
For modelopt static PTQ, ensure that your dataset configuration provides a representative calibration loader.
For torchao, activation settings in the configuration are ignored.
Calibration Dataset (ModelOpt)#
When you use the modelopt backend (static PTQ), provide a calibration dataset via dataset.segment.quant_calibration_dataset.
Minimal example:
quantize:
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
dataset:
  segment:
    quant_calibration_dataset:
      images_dir: "/path/to/calib/images"
See also: TAO Quant overview and its Configuration and backend pages.
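To show why static PTQ needs a representative calibration set, the sketch below walks through a generic min-max calibration: the largest absolute value seen during calibration fixes the quantization scale, and anything larger at inference time is clipped. This is a textbook illustration under the assumption of symmetric signed 8-bit quantization, not the ModelOpt implementation; all function names and values are hypothetical.

```python
def minmax_scale(calibration_batches, num_bits=8):
    """Derive a symmetric quantization scale from calibration data."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    amax = max(abs(v) for batch in calibration_batches for v in batch)
    return amax / qmax if amax > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer level and clip to the signed range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map integer levels back to approximate real values."""
    return [q * scale for q in quantized]


# Hypothetical activations observed while running the calibration images.
calibration_batches = [[-1.27, 0.5], [0.25, 1.0]]
scale = minmax_scale(calibration_batches)

# 2.0 lies outside the calibrated range, so it saturates at the top level.
quantized = quantize([0.5, -1.27, 2.0], scale)
restored = dequantize(quantized, scale)
```

If the calibration images do not cover the value range seen in deployment, more values saturate as 2.0 does here, which is why the calibration set should resemble the real input data.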
Deploying to DeepStream#
Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.