SegFormer#

SegFormer is an NVIDIA-developed semantic-segmentation model that is included in TAO. SegFormer supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Launcher using the following convention on the command line:

SPECS=$(tao-client segformer get-spec --action <sub_task> --job_type experiment --id $EXPERIMENT_ID)

JOB_ID=$(tao-client segformer experiment-run-action --action <sub_task> --id $EXPERIMENT_ID --specs "$SPECS")

Required Arguments

  • --id: The unique identifier of the experiment from which to train the model
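As an illustration of the convention above, the following Python sketch assembles the two tao-client invocations as argument lists. It uses only the sub-commands and flags that appear on this page; the helper names get_spec_cmd and run_action_cmd are hypothetical, "exp-123" is a placeholder ID, and nothing is executed.

```python
import shlex


def get_spec_cmd(action, experiment_id):
    """Argument list for fetching the default spec of one action."""
    return [
        "tao-client", "segformer", "get-spec",
        "--action", action,
        "--job_type", "experiment",
        "--id", experiment_id,
    ]


def run_action_cmd(action, experiment_id, specs, parent_job_id=None):
    """Argument list for launching one action with a JSON spec string."""
    cmd = [
        "tao-client", "segformer", "experiment-run-action",
        "--action", action,
        "--id", experiment_id,
        "--specs", specs,
    ]
    if parent_job_id is not None:
        # This page passes --parent_job_id for evaluate, inference, and export.
        cmd += ["--parent_job_id", parent_job_id]
    return cmd


if __name__ == "__main__":
    # "exp-123" is a placeholder, not a real experiment ID.
    print(shlex.join(get_spec_cmd("train", "exp-123")))
    print(shlex.join(run_action_cmd("train", "exp-123", '{"train": {}}')))
```

Building the command as a list rather than a single string avoids quoting problems when the spec contains spaces or quotes.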

See also

For information on how to create an experiment using the FTMS client, refer to the Creating an experiment section in the Remote Client documentation.

Data Input for SegFormer#

SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.

Creating Training Experiment Spec File#

Configuration for Custom Dataset#

  • This documentation shows an example configuration and commands for training on a multi-class dataset. For more details, refer to the example notebook in the TAO Computer Vision samples.

Here is an example spec file for training a SegFormer model with an NVDINOv2 backbone.

Note that this spec file is for reference only. Create your own spec file based on your own dataset.

We first need to set the base_experiment.

FILTER_PARAMS='{"network_arch": "segformer"}'

BASE_EXPERIMENTS=$(tao-client segformer list-base-experiments --filter_params "$FILTER_PARAMS")

Retrieve the PTM_ID for NVDINOv2 backbone from $BASE_EXPERIMENTS before setting base_experiment.

PTM_INFORMATION="{\"base_experiment\": [$PTM_ID]}"

tao-client segformer patch-artifact-metadata --id $EXPERIMENT_ID --job_type experiment --update_info "$PTM_INFORMATION"
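Hand-escaped JSON in a shell variable is easy to get wrong. The sketch below shows one way to build the same base_experiment payload with a JSON serializer. The function name build_update_info and the placeholder ID are hypothetical, and the sketch assumes the ID is passed as a JSON string inside a one-element list.

```python
import json
import shlex


def build_update_info(ptm_id):
    """Serialize the base_experiment patch as a JSON string."""
    # Assumption: the ID is a JSON string inside a one-element list.
    return json.dumps({"base_experiment": [ptm_id]})


if __name__ == "__main__":
    payload = build_update_info("ptm-id-placeholder")  # placeholder ID
    print(payload)
    # shlex.quote yields one shell-safe argument for --update_info.
    print(shlex.quote(payload))
```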

Then retrieve the specifications.

TRAIN_SPECS=$(tao-client segformer get-spec --action train --job_type experiment --id $EXPERIMENT_ID)

Get specifications from $TRAIN_SPECS. You can override values as needed.
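A minimal sketch of overriding values, assuming the text in $TRAIN_SPECS is a JSON object whose nesting mirrors the spec sections described on this page. The helper name override and the stand-in JSON string are hypothetical; a real spec contains many more keys.

```python
import copy
import json


def override(spec, dotted_key, value):
    """Return a copy of spec with the nested key set to value."""
    updated = copy.deepcopy(spec)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        # Create intermediate sections if they are missing.
        node = node.setdefault(key, {})
    node[leaf] = value
    return updated


if __name__ == "__main__":
    # Stand-in for the JSON text held in $TRAIN_SPECS.
    train_specs_json = '{"train": {"num_epochs": 10, "optim": {"lr": 6e-05}}}'
    spec = json.loads(train_specs_json)
    spec = override(spec, "train.num_epochs", 50)
    spec = override(spec, "train.optim.lr", 0.0001)
    print(json.dumps(spec))
```

Working on a copy keeps the retrieved defaults available for comparison after the overrides are applied.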

The experiment specification consists of several main components:

  • train

  • evaluate

  • inference

  • export

  • model

  • dataset

  • gen_trt_engine

train#

The train config contains the parameters related to training. They are described as follows:

Note

For FTMS Client, these parameters are set in JSON format.

train:
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
  num_epochs: 50
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 50
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    weight_decay: 0.0005

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| optim | dict config | | Optimizer config. | |
| pretrained_model_path | str | None | Pretrained model path. | |
| segment | dict config | | Segmentation loss Config. | |
| num_gpus | int | 1 | The number of GPUs to run the train job. | |
| gpu_ids | List[int] | [0] | List of GPU IDs to run the training on. | |
| num_nodes | int | 1 | Number of nodes to run the training on. | |
| seed | int | 1234 | The seed for the initializer in PyTorch. | |
| num_epochs | int | 10 | Number of epochs to run the training. | |
| checkpoint_interval | int | 1 | Checkpoint interval. | |
| validation_interval | int | 1 | Validation interval. | |
| resume_training_checkpoint_path | str | None | Path to the checkpoint to resume training | |
| results_dir | str | None | Path to where all the assets are stored. | |

optim#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| monitor_name | str | val_loss | Monitor Name | |
| optim | str | adamw | Optimizer | adamw,adam,sgd |
| lr | float | 0.00006 | Optimizer learning rate | |
| policy | str | linear | Optimizer policy | linear,step |
| momentum | float | 0.9 | The momentum for the AdamW optimizer. | |
| weight_decay | float | 0.01 | The weight decay coefficient. | |
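Because optim and policy accept only the listed values, a spec can be checked before a job is submitted. The following is a sketch of such a check, not part of TAO: the names SUPPORTED and check_optim are hypothetical, and the rule that lr must be positive is an assumption rather than something stated above.

```python
# Supported values copied from the optim parameter list above.
SUPPORTED = {
    "optim": {"adamw", "adam", "sgd"},
    "policy": {"linear", "step"},
}


def check_optim(optim_cfg):
    """Return a list of problems found in a train.optim mapping."""
    problems = []
    for key, allowed in SUPPORTED.items():
        value = optim_cfg.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key}={value!r} not in {sorted(allowed)}")
    lr = optim_cfg.get("lr")
    if lr is not None and lr <= 0:
        # Assumption: a non-positive learning rate is never intended.
        problems.append(f"lr={lr!r} should be positive")
    return problems


if __name__ == "__main__":
    example = {"lr": 0.0001, "optim": "adamw", "policy": "linear",
               "weight_decay": 0.0005}
    print(check_optim(example))
    print(check_optim({"optim": "rmsprop", "policy": "cosine"}))
```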

segment#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| loss | str | ce | Segment loss | ce |
| weights | List[float] | [0.5, 0.5, 0.5, 0.8, 1.0] | Multi-scale Segment loss weight | |

tensorboard#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| enabled | bool | False | Flag to enable tensorboard | |
| infrequent_logging_frequency | int | 2 | infrequent_logging_frequency | |

evaluate#

The evaluate config contains the parameters related to evaluation. They are described as follows:

Note

For FTMS Client, these parameters are set in JSON format, and the evaluate checkpoint is deduced from the previous train job ID as specified with the --parent_job_id argument. For TAO Launcher, you must set the path in the evaluate specification:

evaluate:
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  vis_after_n_batches: 1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| vis_after_n_batches | int | 1 | Visualize evaluation segmentation results after n batches. | |
| batch_size | int | 8 | Batch Size. | |
| checkpoint | str | | Path to checkpoint file. | |
| num_gpus | int | 1 | The number of GPUs to run the evaluate job. | |
| gpu_ids | List[int] | [0] | List of GPU IDs to run the evaluate on. | |
| num_nodes | int | 1 | Number of nodes to run the evaluate on. | |
| checkpoint | str | | Path to the checkpoint used for evaluation. | |
| trt_engine | Optional[str] | None | Path to the TensorRT engine to be used for evaluation. | |
| results_dir | Optional[str] | None | Path to where all the assets are stored. | |

inference#

The inference config contains the parameters related to inference. They are described as follows:

Note

For FTMS Client, these parameters are set in JSON format, and the inference checkpoint is deduced from the previous train job ID as specified with the --parent_job_id argument. For TAO Launcher, you must set the path in the inference specification:

inference:
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  vis_after_n_batches: 1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| vis_after_n_batches | int | 1 | Visualize inference segmentation results after n batches. | |
| batch_size | int | 8 | Batch Size. | |
| checkpoint | str | | Path to checkpoint file. | |
| num_gpus | int | 1 | The number of GPUs to run the inference job. | |
| gpu_ids | List[int] | [0] | List of GPU IDs to run the inference on. | |
| num_nodes | int | 1 | Number of nodes to run the inference on. | |
| checkpoint | str | | Path to the checkpoint used for inference. | |
| trt_engine | Optional[str] | None | Path to the TensorRT engine to be used for inference. | |
| results_dir | Optional[str] | None | Path to where all the assets are stored. | |

export#

The export config contains the parameters related to export. They are described as follows:

Note

For FTMS Client, these parameters are set in JSON format, and the export checkpoint is deduced from the previous train job ID as specified with the --parent_job_id argument. For TAO Launcher, you must set the path in the export specification:

export:
  results_dir: "${results_dir}/export"
  gpu_id: 0
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  onnx_file: "${export.results_dir}/segformer.onnx"
  input_width: 224
  input_height: 224
  batch_size: -1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| results_dir | Optional[str] | None | Path to where all the assets are stored. | |
| gpu_ids | int | 0 | The index of the GPU to build the TensorRT engine. | |
| checkpoint | str | | Path to the checkpoint file to run export. | |
| onnx_file | str | | Path to the onnx model file. | |
| on_cpu | bool | False | Flag to export CPU compatible model. | True,False |
| input_channel | int | 3 | Number of channels in the input Tensor. | 1,3 |
| input_width | int | 960 | Width of the input image tensor. | |
| input_height | int | 544 | Height of the input image tensor. | |
| opset_version | int | 17 | Operator set version. | |
| batch_size | int | -1 | The batch size of the input Tensor for the engine. | |
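The export example uses ${...} references such as ${results_dir} and ${export.results_dir}. The sketch below shows how such references expand, assuming they behave as plain key lookups against the spec. It is an illustration only, not TAO's resolver: the names lookup and resolve and the top-level value "/results" are hypothetical, and cyclic references are not handled.

```python
import re

_REF = re.compile(r"\$\{([^}]+)\}")


def lookup(cfg, dotted_key):
    """Follow a dotted key such as 'export.results_dir' through nested dicts."""
    node = cfg
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(value, cfg):
    """Expand ${a.b} references in a string until none remain."""
    while isinstance(value, str):
        match = _REF.search(value)
        if match is None:
            break
        # The referenced value may itself contain references.
        target = resolve(lookup(cfg, match.group(1)), cfg)
        value = value[:match.start()] + str(target) + value[match.end():]
    return value


if __name__ == "__main__":
    cfg = {
        "results_dir": "/results",  # hypothetical top-level value
        "export": {
            "results_dir": "${results_dir}/export",
            "onnx_file": "${export.results_dir}/segformer.onnx",
        },
    }
    print(resolve(cfg["export"]["onnx_file"], cfg))
```

With these values, onnx_file expands to /results/export/segformer.onnx, which shows why changing the top-level results_dir moves every derived path with it.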

model#

The following example model provides options to define the SegFormer backbone and decoder head.

Note

For FTMS Client, these parameters are set in JSON format.

model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: <path_to_pretrained_weight>
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | dict config | | The configuration of the backbone. | |
| decode_head | dict config | | The configuration of the decoder head. | |

backbone#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| type | str | fan_small_12_p4_hybrid | The name of the backbone to be used | mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5, fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, vit_large_nvdinov2, vit_giant_nvdinov2, vit_base_nvclip_16_siglip, vit_huge_nvclip_14_siglip, c_radio_v2_vit_base_patch16_224, c_radio_v2_vit_large_patch16_224, c_radio_v2_vit_huge_patch16_224 |
| pretrained_backbone_path | str | | Path to the pretrained model | |
| freeze_backbone | bool | False | Flag to freeze backbone | True,False |

decode_head#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| feature_strides | List[int] | [4, 8, 16, 32] | Feature strides for the head. | |

dataset#

The dataset parameter defines the dataset source, training batch size, and augmentation. An example dataset is provided below.

Note

For FTMS Client, these parameters are set in JSON format.

dataset:
  segment:
    dataset: "SFDataset"
    root_dir: <dataset_root>
    batch_size: 32
    workers: 8
    num_classes: 6
    img_size: 224
    train_split: "train"
    validation_split: "val"
    test_split: "val"
    predict_split: "val"
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: False
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: False
    label_transform: None
    palette:
      - seg_class: urban
        rgb:
          - 0
          - 255
          - 255
        label_id: 0
        mapping_class: urban
      - seg_class: agriculture
        rgb:
          - 255
          - 255
          - 0
        label_id: 1
        mapping_class: agriculture
      - seg_class: rangeland
        rgb:
          - 255
          - 0
          - 255
        label_id: 2
        mapping_class: rangeland
      - seg_class: forest
        rgb:
          - 0
          - 255
          - 0
        label_id: 3
        mapping_class: forest
      - seg_class: water
        rgb:
          - 0
          - 0
          - 255
        label_id: 4
        mapping_class: water
      - seg_class: barren
        rgb:
          - 255
          - 255
          - 255
        label_id: 5
        mapping_class: barren
      - seg_class: unknown
        rgb:
          - 0
          - 0
          - 0
        label_id: 255
        mapping_class: unknown

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| segment | dict config | | Segmentation Dataset Config. | |

segment#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| root_dir | str | | Path to root directory for dataset. | |
| dataset | str | SFDataset | Dataset class. | SFDataset |
| num_classes | int | 2 | The number of classes in the training data. | |
| img_size | int | 256 | The input image size. | |
| batch_size | int | 8 | Batch size. | |
| workers | int | 1 | Workers. | |
| shuffle | bool | True | Shuffle dataloader. | True,False |
| train_split | str | train | Train split folder name. | |
| validation_split | str | val | Validation split folder name. | |
| test_split | str | val | Test split folder name. | |
| predict_split | str | test | Predict split folder name. | |
| augmentation | dict config | | Augmentation. | |
| label_transform | str | norm | Label transform. | norm,None |
| palette | List[Dict] | [{"label_id": 0, "mapping_class": "foreground", "rgb": [0, 0, 0], "seg_class": "foreground"}, {"label_id": 1, "mapping_class": "background", "rgb": [1, 1, 1], "seg_class": "background"}] | Palette. The RGB value range depends on label_transform: 0 to 1 if norm, otherwise 0 to 255. | |
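Each palette entry ties an rgb color to a label_id. The following sketch illustrates that relationship by converting a color-coded mask into label IDs. It is not TAO's data loader: the function names are hypothetical, and it assumes RGB masks with values from 0 to 255 (that is, label_transform is not norm) and that every pixel color appears in the palette.

```python
def build_color_lookup(palette):
    """Map each RGB triple to its label_id; reject duplicate colors."""
    lookup = {}
    for entry in palette:
        color = tuple(entry["rgb"])
        if color in lookup:
            raise ValueError(f"duplicate color {color}")
        lookup[color] = entry["label_id"]
    return lookup


def mask_to_label_ids(mask_rgb, palette):
    """Convert rows of RGB pixels into rows of label IDs."""
    lookup = build_color_lookup(palette)
    return [[lookup[tuple(pixel)] for pixel in row] for row in mask_rgb]


if __name__ == "__main__":
    # Two entries taken from the example palette above.
    palette = [
        {"seg_class": "urban", "rgb": [0, 255, 255], "label_id": 0},
        {"seg_class": "water", "rgb": [0, 0, 255], "label_id": 4},
    ]
    mask = [[[0, 255, 255], [0, 0, 255]]]  # one row, two pixels
    print(mask_to_label_ids(mask, palette))
```

Rejecting duplicate colors up front catches a palette in which two classes would be indistinguishable in the mask.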

augmentation#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| random_flip | dict config | | RandomFlip augmentation config. | |
| random_rotate | dict config | | RandomRotation augmentation config. | |
| random_color | dict config | | RandomColor augmentation config. | |
| with_scale_random_crop | dict config | | RandomCropWithScale augmentation config. | |
| with_random_blur | bool | | Flag to enable with_random_blur. | |
| with_random_crop | bool | | Flag to enable with_random_crop. | |
| mean | List[float] | | Mean for the augmentation. | |
| std | List[float] | | Standard deviation for the augmentation. | |

RandomFlip#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| vflip_probability | float | 0.5 | Vertical Flip probability. | |
| hflip_probability | float | 0.5 | Horizontal Flip probability. | |
| enable | bool | True | Flag to enable augmentation. | True,False |

RandomRotation#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| rotate_probability | float | 0.5 | Random Rotate probability. | |
| angle_list | List[float] | [90, 180, 270] | Random rotate angle. | |
| enable | bool | True | Flag to enable augmentation. | True,False |
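For segmentation, geometric augmentations have to be applied identically to the image and its mask. The sketch below shows one way the random_flip and random_rotate settings could drive such a transform on nested lists. It is not TAO's implementation: the function names, the counterclockwise rotation direction, and the order of operations are all assumptions.

```python
import random


def hflip(grid):
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]


def vflip(grid):
    """Reverse the order of the rows."""
    return grid[::-1]


def rot90(grid, times):
    """Rotate counterclockwise by 90 degrees, `times` times."""
    for _ in range(times % 4):
        grid = [list(col) for col in zip(*grid)][::-1]
    return grid


def augment(image, mask, cfg, rng):
    """Apply the same random flips and rotation to image and mask."""
    flip, rotate = cfg["random_flip"], cfg["random_rotate"]
    ops = []
    if flip["enable"]:
        if rng.random() < flip["hflip_probability"]:
            ops.append(hflip)
        if rng.random() < flip["vflip_probability"]:
            ops.append(vflip)
    if rotate["enable"] and rng.random() < rotate["rotate_probability"]:
        angle = rng.choice(rotate["angle_list"])
        ops.append(lambda g, k=int(angle) // 90: rot90(g, k))
    for op in ops:
        # One decision per operation, shared by image and mask.
        image, mask = op(image), op(mask)
    return image, mask


if __name__ == "__main__":
    cfg = {
        "random_flip": {"vflip_probability": 0.5, "hflip_probability": 0.5,
                        "enable": True},
        "random_rotate": {"rotate_probability": 0.5,
                          "angle_list": [90, 180, 270], "enable": True},
    }
    image = [[1, 2], [3, 4]]
    mask = [[0, 1], [1, 0]]
    print(augment(image, mask, cfg, random.Random(0)))
```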

RandomColor#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| brightness | float | 0.3 | Random Color Brightness. | |
| contrast | float | 0.3 | Random Color Contrast. | |
| saturation | float | 0.3 | Random Color Saturation. | |
| hue | float | 0.3 | Random Color Hue. | |
| enable | bool | True | Flag to enable Random Color. | True,False |
| color_probability | float | 0.5 | Random Color Probability. | |

RandomCropWithScale#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| scale_range | float | [1, 1.2] | Random Scale range. | |
| enable | bool | True | Flag to enable augmentation. | True,False |

Training the Model#

Use the following command to run SegFormer training:

TRAIN_JOB_ID=$(tao-client segformer experiment-run-action --action train --id $EXPERIMENT_ID --specs "$TRAIN_SPECS")

Evaluating the model#

The evaluation metric for SegFormer is the mean intersection over union (mean IoU). For more details on this metric, refer to meanIOU.

Use the following command to run SegFormer evaluation:

EVAL_JOB_ID=$(tao-client segformer experiment-run-action --action evaluate --id $EXPERIMENT_ID --specs "$EVAL_SPECS" --parent_job_id $TRAIN_JOB_ID)

Here is an example of the output from the SegFormer evaluation command:

Note

For FTMS Client, the job output will be in your experiment’s cloud workspace.

+------------+-------+-------+
| Class      | IoU   | Acc   |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
  ...
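To make the metric concrete, the sketch below computes per-class IoU from a confusion matrix and averages it. It uses the standard IoU definition (true positives divided by the union of predicted and ground-truth pixels) and is not taken from TAO's evaluation code; the function names and the small example matrix are hypothetical.

```python
def per_class_iou(confusion):
    """IoU per class from a square confusion matrix.

    Rows are ground-truth classes, columns are predicted classes.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        # A class absent from both truth and prediction has no defined IoU.
        ious.append(tp / union if union else float("nan"))
    return ious


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


if __name__ == "__main__":
    confusion = [[3, 1], [2, 4]]  # made-up pixel counts
    ious = per_class_iou(confusion)
    print(ious, mean(ious))
    # Averaging the per-class IoU values from the sample output above.
    print(round(mean([37.81, 83.81]), 2))
```

The sample output is consistent with an unweighted mean: the per-class IoU values 37.81 and 83.81 average to the reported mIoU of 60.81.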

Running Inference on the Model#

Use the following command to run inference on SegFormer with the .pth model:

INFER_JOB_ID=$(tao-client segformer experiment-run-action --action inference --id $EXPERIMENT_ID --specs "$INFER_SPECS" --parent_job_id $TRAIN_JOB_ID)

Note

For FTMS Client, the job output will be in your experiment’s cloud workspace.

The output mask PNG images with class IDs are saved in vis_tao. The overlaid mask images are saved in mask_tao.

Exporting the Model#

Use the following command to export the model.

EXPORT_JOB_ID=$(tao-client segformer experiment-run-action --action export --id $EXPERIMENT_ID --specs "$EXPORT_SPECS" --parent_job_id $TRAIN_JOB_ID)

TensorRT engine generation, validation, and int8 calibration#

For deployment, refer to the TAO Deploy documentation.

Deploying to DeepStream#

Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.