SegFormer#

SegFormer is an NVIDIA-developed semantic-segmentation model that is included in TAO. SegFormer supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Launcher using the following convention on the command-line:

tao model segformer <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
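
The convention above can be sketched in Python as a small helper that assembles the argument list. This is only an illustration: the helper name, the spec path, and the results directory are hypothetical, and the overrides shown are the key=value options described later on this page.

```python
import shlex


def build_tao_segformer_command(sub_task, spec_path, overrides=None):
    """Assemble the argument list for `tao model segformer <sub_task>`."""
    allowed = {"train", "evaluate", "inference", "export"}
    if sub_task not in allowed:
        raise ValueError(f"unsupported sub_task: {sub_task!r}")
    cmd = ["tao", "model", "segformer", sub_task, "-e", spec_path]
    # Overrides are appended as key=value pairs after the spec file.
    for key, value in (overrides or {}).items():
        cmd.append(f"{key}={value}")
    return cmd


cmd = build_tao_segformer_command(
    "train",
    "/specs/train_isbi.yaml",  # hypothetical spec path
    {"results_dir": "/results/isbi", "train.num_gpus": 1},  # hypothetical values
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run if the TAO Launcher is installed; the sketch itself only builds and prints the command.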

Data Input for SegFormer#

SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.

Creating Training Experiment Spec File#

Configuration for Custom Dataset#

  • This documentation shows example configurations and commands for training on the ISBI dataset (ISBI Challenge: Segmentation of neuronal structures in EM stacks), a binary segmentation dataset that contains grayscale images. Because the images are grayscale, input_type is set to grayscale. For more details, refer to the example notebook in the TAO Computer Vision samples.

  • For RGB input images, set input_type to rgb instead of grayscale.

  • Configure the img_norm_cfg mean and standard deviation based on your input dataset.
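
One possible way to obtain the img_norm_cfg values is to compute the per-channel mean and standard deviation over the training images. The sketch below uses plain Python lists as stand-ins for decoded images; the function name and the toy data are hypothetical, and a real dataset would be read with an image library of your choice.

```python
import math


def channel_mean_std(images):
    """Return per-channel (means, stds) over a list of images.

    Each image is a list of pixels, and each pixel is a tuple of channel
    values in the 0-255 range.
    """
    n_channels = len(images[0][0])
    sums = [0.0] * n_channels
    sq_sums = [0.0] * n_channels
    count = 0
    for image in images:
        for pixel in image:
            for c, value in enumerate(pixel):
                sums[c] += value
                sq_sums[c] += value * value
            count += 1
    means = [s / count for s in sums]
    # Population standard deviation: sqrt(E[x^2] - E[x]^2)
    stds = [
        math.sqrt(max(sq / count - m * m, 0.0))
        for sq, m in zip(sq_sums, means)
    ]
    return means, stds


# Two tiny hypothetical "images" of two pixels each, with identical channels
images = [
    [(0, 0, 0), (255, 255, 255)],
    [(0, 0, 0), (255, 255, 255)],
]
mean, std = channel_mean_std(images)
print("mean:", mean)
print("std:", std)
```

The resulting lists can then be copied into the mean and std entries of img_norm_cfg.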

Here is an example spec file for training a SegFormer model with an mit_b5 backbone on an ISBI dataset.

train:
  exp_config:
    manual_seed: 49
  checkpoint_interval: 200
  logging_interval: 50
  max_iters: 1000
  resume_training_checkpoint_path: null
  validate: True
  validation_interval: 500
  trainer:
    find_unused_parameters: True
    sf_optim:
      lr: 0.00006
model:
  input_height: 512
  input_width: 512
  pretrained_model_path: null
  backbone:
    type: "mit_b5"
dataset:
  data_root: /tlt-pytorch
  input_type: "grayscale"
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  train_dataset:
    img_dir:
      - /data/images/train
    ann_dir:
      - /data/masks/train
    pipeline:
      augmentation_config:
        random_crop:
          cat_max_ratio: 0.75
        resize:
          ratio_range:
            - 0.5
            - 2.0
        random_flip:
          prob: 0.5
  val_dataset:
    img_dir: /data/images/val
    ann_dir: /data/masks/val
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  repeat_data_times: 500
  batch_size: 4
  workers_per_gpu: 1

The SegFormer experiment specification consists of three main components:

  • train

  • dataset

  • model
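
As a rough illustration of this three-part layout, the following sketch checks that a spec, represented here as an in-memory dictionary, has all three top-level sections. The helper name is hypothetical and the dictionary is a trimmed-down stand-in for the YAML spec, not a complete configuration.

```python
REQUIRED_SECTIONS = ("train", "dataset", "model")


def missing_sections(spec):
    """Return the required top-level sections that are absent from a spec."""
    return [name for name in REQUIRED_SECTIONS if name not in spec]


# Trimmed-down stand-in for the example spec file
spec = {
    "train": {"max_iters": 1000, "validate": True},
    "model": {
        "input_height": 512,
        "input_width": 512,
        "backbone": {"type": "mit_b5"},
    },
    "dataset": {"input_type": "grayscale", "batch_size": 4},
}

print(missing_sections(spec))
print(missing_sections({"train": {}}))
```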

train#

The train config contains the parameters related to training. They are described as follows:

Parameter

Datatype

Default

Description

Supported Values

exp_config

Dict

None

The exp_config Dict contains the following parameter:

* manual_seed (int, default: 49): The random seed used to make training deterministic


max_iters

int

10

The maximum number of iterations (steps) for which training is run

checkpoint_interval

int

1

The interval, in steps, at which a checkpoint is saved

logging_interval

int

10

The interval, in steps, at which the experiment logs are saved. The logs are saved in the logs directory.

resume_training_checkpoint_path

str

None

The path to the checkpoint for resuming training

validate

bool

False

A flag to enable validation during training

validation_interval

int

int

The interval, in iterations, at which validation is performed during training.
Note that the validation interval should be at least 1 less than the checkpoint interval to prevent status overriding.


trainer

Dict

None

This config contains the parameters required by the MMSeg trainer:

* find_unused_parameters (bool, default: False): Sets this parameter in DDP. For more information, refer to the PyTorch DistributedDataParallel (DDP) documentation.
* sf_optim (Dict, default: None): The SegFormer optimizer config. For more information, refer to the sf_optim section.
* lr_config (Dict, default: None): The SegFormer learning-rate config. For more information, refer to the lr_config section.

sf_optim#

sf_optim:
  lr: 0.00006
  betas:
    - 0.0
    - 0.999
  paramwise_cfg:
    pos_block:
      decay_mult: 0.0
    norm:
      decay_mult: 0.0
    head:
      lr_mult: 10.0
  weight_decay: 5e-4

Parameter

Datatype

Default

Description

Supported Values

lr

float

0.00006

The learning rate

>=0.0

betas

List[float]

[0.0, 0.9]

The beta parameters in the Adam optimizer

>=0.0

paramwise_cfg

Dict

None

Configuration parameters for the Adam optimizer:

* pos_block (Dict, default: None)
  * decay_mult (float, default: 0.0)
* norm (Dict, default: None)
  * decay_mult (float, default: 0.0)
* head (Dict, default: None)
  * lr_mult (float, default: 10.0)

>=0.0

weight_decay

float

5e-4

weight_decay hyper-parameter for regularization.

>=0.0
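
To make the role of the multipliers in paramwise_cfg more concrete, here is a minimal sketch. It assumes that each key is matched as a substring of a parameter name and that lr_mult and decay_mult scale the base learning rate and weight decay, as is common in MMCV-style optimizer constructors. This page does not confirm the matching rule, and the function and parameter names below are hypothetical.

```python
def effective_hyperparams(param_name, base_lr, base_weight_decay, paramwise_cfg):
    """Apply the first matching paramwise rule to a parameter name."""
    lr, weight_decay = base_lr, base_weight_decay
    for key, rule in paramwise_cfg.items():
        if key in param_name:  # substring match (assumed behavior)
            lr = base_lr * rule.get("lr_mult", 1.0)
            weight_decay = base_weight_decay * rule.get("decay_mult", 1.0)
            break
    return lr, weight_decay


paramwise_cfg = {
    "pos_block": {"decay_mult": 0.0},
    "norm": {"decay_mult": 0.0},
    "head": {"lr_mult": 10.0},
}

# Hypothetical parameter names, for illustration only
names = (
    "backbone.norm1.weight",
    "decode_head.conv_seg.weight",
    "backbone.attn.q.weight",
)
for name in names:
    print(name, effective_hyperparams(name, 0.00006, 5e-4, paramwise_cfg))
```

Under these assumptions, normalization layers train without weight decay and the head trains with ten times the base learning rate, while all other parameters keep the base values.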

lr_config#

lr_config:
  warmup_iters: 1500
  warmup_ratio: 1e-6
  power: 1.0
  min_lr: 0.0

Parameter

Datatype

Default

Description

Supported Values

warmup_iters

int

1500

The number of iterations or epochs that warmup lasts.

>=0.0

warmup_ratio

float

1e-6

The LR used at the beginning of warmup is equal to warmup_ratio * initial_lr

>=0.0

power

float

1.0

The power to which the multiplying coefficient is raised

>=0.0

min_lr

float

0.0

The minimum LR to start the LR scheduler

>=0.0
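
The interaction of these four parameters can be sketched as follows. The sketch assumes a polynomial decay combined with a linear warmup, in the style of the "poly" policy found in MMCV-based trainers. This page does not state the exact schedule, so treat the formula, the function name, and the max_iters value as assumptions.

```python
def poly_lr_with_warmup(
    iteration,
    max_iters,
    base_lr=0.00006,
    warmup_iters=1500,
    warmup_ratio=1e-6,
    power=1.0,
    min_lr=0.0,
):
    """Learning rate at a given iteration (assumed poly decay + linear warmup)."""
    progress = min(iteration, max_iters) / max_iters
    coeff = (1.0 - progress) ** power
    regular_lr = (base_lr - min_lr) * coeff + min_lr
    if iteration < warmup_iters:
        # Linear warmup: starts at warmup_ratio * lr and ramps up to lr.
        k = (1.0 - iteration / warmup_iters) * (1.0 - warmup_ratio)
        return regular_lr * (1.0 - k)
    return regular_lr


# Hypothetical run length of 10000 iterations
for it in (0, 750, 1500, 5000, 10000):
    print(it, poly_lr_with_warmup(it, max_iters=10000))
```

At iteration 0 the value equals warmup_ratio * lr, which matches the description of warmup_ratio in the table, and at the final iteration it reaches min_lr.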

model#

The following example model config provides options to change the SegFormer architecture for training.

model:
  input_height: 512
  input_width: 512
  pretrained_model_path: null
  backbone:
    type: "mit_b5"

The following example model is used during Segformer evaluation/inference.

Parameter

Datatype

Default

Description

Supported Values

pretrained_model_path

string

None

The optional path to the pretrained backbone file

string to the path

backbone

Dict

None

A dictionary containing the following configurable parameter:

* type (string): The name of the backbone to be used

mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5, fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid

decode_head

Dict

None

A dictionary containing the decoder parameters:

* decoder_params: Contains the following network parameters:
  * embed_dims (int, default: 768): The embedding dimensions. Supported values: 256, 512, 768
* align_corners (Bool, default: False): If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. Supported values: True, False
* dropout_ratio (Float, default: 0.1): The dropout probability with which neurons in the network are dropped. Supported values: >=0.0
input_width

int

512

Input width of the model

>0

input_height

int

512

Input height of the model

>0
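
The constraints listed in the table can be checked before launching a job. The following is a minimal sketch with a hypothetical helper that validates the backbone name and the input dimensions of a model section held as a dictionary. It is not part of TAO.

```python
SUPPORTED_BACKBONES = {
    "mit_b0", "mit_b1", "mit_b2", "mit_b3", "mit_b4", "mit_b5",
    "fan_tiny_8_p4_hybrid", "fan_small_12_p4_hybrid",
    "fan_base_16_p4_hybrid", "fan_large_16_p4_hybrid",
}


def validate_model_config(model):
    """Return a list of human-readable problems found in a model section."""
    errors = []
    backbone = model.get("backbone", {}).get("type")
    if backbone not in SUPPORTED_BACKBONES:
        errors.append(f"unsupported backbone type: {backbone!r}")
    for key in ("input_height", "input_width"):
        value = model.get(key, 512)  # 512 is the documented default
        if not isinstance(value, int) or value <= 0:
            errors.append(f"{key} must be an integer > 0, got {value!r}")
    return errors


good = {"input_height": 512, "input_width": 512, "backbone": {"type": "mit_b5"}}
bad = {"input_height": 512, "input_width": 0, "backbone": {"type": "resnet50"}}
print(validate_model_config(good))
print(validate_model_config(bad))
```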

dataset#

The dataset parameter defines the dataset source, training batch size, and augmentation. An example dataset is provided below.

dataset:
  data_root: /tlt-pytorch
  input_type: "grayscale"
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  train_dataset:
    img_dir:
      - /data/images/train
    ann_dir:
      - /data/masks/train
    pipeline:
      augmentation_config:
        random_crop:
          cat_max_ratio: 0.75
        resize:
          ratio_range:
            - 0.5
            - 2.0
        random_flip:
          prob: 0.5
  val_dataset:
    img_dir: /data/images/val
    ann_dir: /data/masks/val
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  repeat_data_times: 500
  batch_size: 4
  workers_per_gpu: 1

Parameter

Datatype

Default

Description

Supported Values

img_norm_cfg

Dict

None

The image normalization config, which contains the following parameters:

* mean (List[float], default: [123.675, 116.28, 103.53]): The mean to be subtracted for pre-processing. Supported values: >=0, <=255
* std (List[float], default: [58.395, 57.12, 57.375]): The standard deviation to divide the image by. Supported values: >=0.0
* to_rgb (bool, default: True): Whether to convert the input format from BGR to RGB. Supported values: True, False

input_type

String

“rgb”

Whether the input type is RGB or grayscale

“rgb”, “grayscale”

palette

List[Dict]

None

The palette config:

* seg_class (string, default: background): The segmentation category
* mapping_class (string, default: background): The category to group it with
* label_id (int, default: 0): The integer class ID. Supported values: >=0
* rgb (List[int], default: [255, 255, 255]): The color to be overlaid for this class during inference. Supported values: >=0, <=255

batch_size

unsigned int

32

The batch size for training and validation

>0

workers_per_gpu

unsigned int

8

The number of parallel workers processing data

>0

train_dataset

dict config

None

The parameters to define the training dataset:

* img_dir (str, default: None): The path to the images directory
* ann_dir (str, default: None): The path to the PNG masks directory
* pipeline (dict config, default: None):
  * augmentation_config (dict config, default: None): The augmentation config details (refer to augmentation_config for more information)
  * Pad (dict config, default: None): The padding augmentation config:
    * size_ht (int, default: 1024): The height to which the image/mask is padded
    * size_wd (int, default: 1024): The width to which the image/mask is padded
    * pad_val (int, default: 0): The padding value for the input image
    * seg_pad_val (int, default: 255): The padding value for the segmentation

val_dataset

dict config

None

The validation config contains the following parameters for validation during training:

* img_dir (str, default: None): The path to the images directory
* ann_dir (str, default: None): The path to the PNG masks directory
* pipeline (dict config, default: None):
  * multi_scale (List[int], default: [2048, 1024]): The largest scale of the image. Supported values: >=0

test_dataset

dict config

None

The test config contains the following parameters for the test dataset:

* img_dir (str, default: None): The path to the images directory
* ann_dir (str, default: None): The path to the PNG masks directory
* pipeline (dict config, default: None):
  * multi_scale (List[int], default: [2048, 1024]): The largest scale of the image. Supported values: >=0
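
To illustrate how the palette ties class IDs to colors, the sketch below maps a small mask of label_id values to the rgb colors from the example palette. The function name and the toy mask are hypothetical, and TAO performs its own overlay during inference. This only shows the lookup.

```python
palette = [
    {"seg_class": "foreground", "rgb": [0, 0, 0],
     "label_id": 0, "mapping_class": "foreground"},
    {"seg_class": "background", "rgb": [255, 255, 255],
     "label_id": 1, "mapping_class": "background"},
]


def label_ids_to_rgb(mask, palette):
    """Map a 2-D mask of label IDs to RGB tuples using the palette."""
    colors = {entry["label_id"]: tuple(entry["rgb"]) for entry in palette}
    return [[colors[label] for label in row] for row in mask]


# Hypothetical 2x2 mask of class IDs
mask = [
    [0, 1],
    [1, 0],
]
for row in label_ids_to_rgb(mask, palette):
    print(row)
```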

augmentation_config#

Parameter

Datatype

Default

Description

Supported Values

random_crop

Dict

None

The random_crop config has the following parameters:

* crop_size (List[int], default: [1024, 1024]): The crop size for augmentation. Supported values: 0 < h, w <= img_ht, img_wd
* cat_max_ratio (Float, default: 0.75): The maximum ratio of the crop area that a single category may occupy. Supported values: >=0.0

resize

Dict

None

The resize config has the following configurable parameters:

* img_scale: The [height, width] scale to which the input image should be rescaled. Supported values: >=0
* ratio_range (default: [0.5, 2.0]): A ratio is randomly sampled from the range specified by ratio_range, then multiplied by img_scale to generate the sampled scale. Supported values: >=0.0
* keep_ratio (Bool, default: True): Whether to preserve the aspect ratio. Supported values: True, False

random_flip

Dict

None

The random_flip config contains the following parameter for flip augmentation:

* prob (default: 0.5): The probability with which the image should be flipped. Supported values: >=0.0
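
The resize and flip behavior described above can be sketched in a few lines. This is an illustration of the sampling logic only, not the MMSeg implementation; the function names and the img_scale value are hypothetical.

```python
import random


def sample_resize_scale(img_scale, ratio_range, rng):
    """Sample a ratio from ratio_range and multiply it with img_scale."""
    low, high = ratio_range
    ratio = rng.uniform(low, high)
    height, width = img_scale
    return int(height * ratio), int(width * ratio)


def should_flip(prob, rng):
    """Return True with probability `prob`."""
    return rng.random() < prob


rng = random.Random(49)  # seeded so that repeated runs give the same samples
for _ in range(3):
    scale = sample_resize_scale([512, 512], [0.5, 2.0], rng)
    print(scale, should_flip(0.5, rng))
```

With img_scale set to [512, 512] and ratio_range set to [0.5, 2.0], the sampled height and width always fall between 256 and 1024.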

Training the Model#

Use the following command to run Segformer training:

tao model segformer train [-h] -e <experiment_spec_file>
                    [results_dir=<global_results_dir>]
                    [model.<model_option>=<model_option_value>]
                    [dataset.<dataset_option>=<dataset_option_value>]
                    [train.<train_option>=<train_option_value>]
                    [train.gpu_ids=<gpu indices>]
                    [train.num_gpus=<number of gpus>]

Required Arguments#

The only required argument is the path to the experiment spec:

  • -e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

Note

For training, evaluation, and inference, two variables are exposed for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 and gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (in this example, num_gpus = 1 becomes num_gpus = 2).
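
The reconciliation rule in this note can be sketched as follows. The case where gpu_ids lists more devices than num_gpus follows the example in the note. The opposite case is not spelled out on this page, so the fallback used here (taking the first num_gpus device indices) is an assumption, as is the function name.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Make num_gpus and gpu_ids consistent, following the larger setting."""
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if len(gpu_ids) > num_gpus:
        # Documented case: num_gpus = 1, gpu_ids = [0, 1] -> num_gpus = 2
        num_gpus = len(gpu_ids)
    elif num_gpus > len(gpu_ids):
        # Assumption: fall back to the first num_gpus device indices
        gpu_ids = list(range(num_gpus))
    return num_gpus, gpu_ids


print(reconcile_gpus(1, [0, 1]))
print(reconcile_gpus())
```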

Evaluating the model#

The evaluation metric for SegFormer is the mean intersection over union (mean IoU). For more details on this metric, refer to meanIOU.

Use the following command to run Segformer evaluation:

tao model segformer evaluate -e <experiment_spec>
                    evaluate.checkpoint=<evaluation model>
                    results_dir=<path to output evaluation results>
                    [evaluate.gpu_ids=<gpu indices>]
                    [evaluate.num_gpus=<number of gpus>]

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to set up the evaluation experiment.

  • evaluate.checkpoint: The .pth model.

Here is an example of the output from the SegFormer evaluation command:

+------------+-------+-------+
| Class      | IoU   | Acc   |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
  ...
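
For reference, per-class IoU is commonly computed as TP / (TP + FP + FN) from a confusion matrix, and the mean IoU is the average over classes. The sketch below uses that standard definition with a small hypothetical confusion matrix. It is not the TAO evaluation code.

```python
def per_class_iou(confusion):
    """IoU per class from a square confusion matrix (rows are ground truth)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(ious):
    """Average the per-class IoU values."""
    return sum(ious) / len(ious)


# Hypothetical 2-class confusion matrix of pixel counts
confusion = [
    [3, 1],
    [2, 4],
]
ious = per_class_iou(confusion)
print(ious, mean_iou(ious))

# The mIoU in the table above is the mean of the two per-class IoU values
print(round(mean_iou([37.81, 83.81]), 2))
```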

Running Inference on the Model#

Use the following command to run inference on Segformer with the .pth model.

tao model segformer inference -e <experiment_spec>
                    inference.checkpoint=<inference model>
                    results_dir=<path to output directory for inference>
                    [inference.gpu_ids=<gpu indices>]
                    [inference.num_gpus=<number of gpus>]

The output mask PNG images with class IDs are saved in vis_tao. The overlaid mask images are saved in mask_tao.

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to set up inference

  • inference.checkpoint: The .pth model to perform inference with

  • results_dir: The path to save the inference masks and mask overlaid images to. Inference creates two directories.

Exporting the Model#

Use the following command to export the model.

tao model segformer export [-h] -e <experiment spec file>
                    results_dir=<path to results dir>
                    export.checkpoint=<trained pth model to be exported>
                    export.onnx_file=<onnx path>

Required Arguments#

  • -e, --experiment_spec: The path to an experiment spec file

  • results_dir: The path where the logs for export will be saved

  • export.checkpoint: The .pth model to be exported

  • export.onnx_file: The path to the .onnx file to be stored

TensorRT engine generation, validation, and int8 calibration#

For deployment, refer to the TAO Deploy documentation.

Deploying to DeepStream#

Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.