SegFormer

SegFormer is an NVIDIA-developed semantic-segmentation model that is included in the TAO Toolkit. SegFormer supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:


tao segformer <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.
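
For example, a dataset might be laid out as follows (a minimal sketch; the isbi directory names mirror the example spec files later on this page and are purely illustrative):

/data/isbi
  images
    train/
    val/
    test/
  masks
    train/
    val/
    test/

Each mask directory holds the PNG masks corresponding to the images in the matching image directory.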

Configuration for Custom Dataset

  • This documentation shows example configurations and commands for training on the ISBI dataset (ISBI Challenge: Segmentation of Neuronal Structures in EM Stacks), which is used here for binary segmentation of grayscale images. For more details, refer to the example notebook in the TAO Computer Vision samples. Because the images are grayscale, input_type is set to grayscale in the examples below.

  • For RGB input images, input_type should be set to rgb instead of grayscale.

  • Configure the img_norm_cfg mean and standard deviation based on your input dataset, as shown in the sketch below.
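
For instance, an RGB dataset normalized with the ImageNet statistics (the img_norm_cfg defaults listed in the dataset_config section below) would use the following dataset_config fields. This is a minimal sketch; replace the values with statistics appropriate for your own data:

dataset_config:
  img_norm_cfg:
    mean:
      - 123.675
      - 116.28
      - 103.53
    std:
      - 58.395
      - 57.12
      - 57.375
    to_rgb: True
  input_type: "rgb"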

Here is an example spec file for training a SegFormer model with an mit_b5 backbone on the ISBI dataset.


exp_config:
  manual_seed: 49
  distributed: True
train_config:
  runner:
    max_iters: 4000
  checkpoint_config:
    interval: 500
  logging:
    interval: 100
  resume_training_checkpoint_path: /path/to/checkpoint_to_resume
  sf_optim:
    lr: 0.00006
  validate: True
  validation_config:
    interval: 250
dataset_config:
  train_img_dirs:
    - /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/train
  train_ann_dirs:
    - /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/train
  val_img_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/val
  val_ann_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/val
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  input_type: "grayscale"
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  val_pipeline:
    multi_scale:
      - 2048
      - 512
  train_pipeline:
    augmentation_config:
      RandomCrop:
        crop_size:
          - 512
          - 512
        cat_max_ratio: 0.75
      Resize:
        img_scale:
          - 1024
          - 512
        ratio_range:
          - 0.5
          - 2.0
      RandomFlip:
        prob: 0.5
      colorAug:
        type: PhotoMetricDistortion
    Pad:
      size_ht: 512
      size_wd: 512
      pad_val: 0
      seg_pad_val: 255
  repeat_data_times: 500
  batch_size_per_gpu: 4
  workers_per_gpu: 1
model_config:
  pretrained: /path/to/your-pretrained-backbone-model
  backbone:
    type: "mit_b5"
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 300
  with_box_refine: True
  dropout_ratio: 0.3
output_dir: /path/to/experiment_results

The top-level parameters of the spec file are described below.

  • exp_config (dict config): The configuration of the experiment (detailed in exp_config below).

  • model_config (dict config): The configuration of the model architecture (detailed in model_config below).

  • dataset_config (dict config): The configuration of the dataset (detailed in dataset_config below).

  • train_config (dict config): The configuration of the training parameters (detailed in train_config).

  • output_dir (string): The path where the experiment logs and model checkpoints are saved.

  • runner (dict config, default: None): Contains the following configurable parameter:
    • max_iters (int, default: 100): The maximum number of iterations for which the training is conducted.

  • checkpoint_config (dict config, default: None): Contains the following configurable parameter:
    • interval (int, default: 1): The number of steps after which a checkpoint is saved.

  • logging (dict config, default: None): Contains the following configurable parameter:
    • interval (int, default: 10): The number of steps after which the experiment logs are saved. The logs are written to the logs directory inside the output directory.

  • sf_optim (dict config, default: None): The configurable parameters for the SegFormer optimizer (detailed in sf_optim below).

  • validation_config (dict config, default: None): Contains the following configurable parameter:
    • interval (int): The number of training iterations after which validation is performed.

  • validate (bool, default: False): A flag that enables validation during training.

sf_optim


sf_optim:
  lr: 0.00006
  betas:
    - 0.0
    - 0.999
  paramwise_cfg:
    pos_block:
      decay_mult: 0.0
    norm:
      decay_mult: 0.0
    head:
      lr_mult: 10.0
  weight_decay: 5e-4

  • lr (float, default: 0.00006): The learning rate. Supported values: >=0.0.

  • betas (List[float], default: [0.0, 0.9]): The beta parameters of the Adam optimizer. Supported values: >=0.0.

  • paramwise_cfg (dict config, default: None): Parameter-wise configuration for the Adam optimizer. Supported values: >=0.0.
    • pos_block (Dict, default: {"decay_mult": 0.0})
    • norm (Dict, default: {"decay_mult": 0.0})
    • head (Dict, default: {"lr_mult": 10.0})

  • weight_decay (float, default: 5e-4): The weight_decay hyperparameter for regularization. Supported values: >=0.0.

The exp_config parameter defines the hyperparameters of the experiment.


exp_config:
  manual_seed: 49

  • manual_seed (unsigned int, default: 49): The random seed to make training deterministic. Supported values: >0.

The following example model_config provides options to change the SegFormer architecture for training.


model_config:
  pretrained: /path/to/pretrained_mit_b5.pth
  backbone:
    type: "mit_b5"
  decode_head:
    decoder_params:
      embed_dims: 768
    align_corners: False
    dropout_ratio: 0.1

A reduced model_config containing only the backbone type is used during SegFormer evaluation and inference; see the testing spec file later on this page.

  • pretrained (string, default: None): The optional path to a pretrained backbone file. Supported values: a valid path string.

  • backbone (Dict, default: None): Contains the following configurable parameter:
    • type (string): The name of the backbone to be used. Supported values: mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5.

  • decode_head (Dict): A dictionary containing the decoder parameters:
    • decoder_params (Dict, default: None): Contains the following network parameter:
      • embed_dims (int, default: 768): The embedding dimensions. Supported values: 256, 512, 768.
    • align_corners (bool, default: False): If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. Supported values: True, False.
    • dropout_ratio (float, default: 0.1): The dropout probability for the neurons in the decode head. Supported values: >=0.0.

The dataset_config parameter defines the dataset source, training batch size, and augmentation. An example dataset_config is provided below.


dataset_config:
  train_img_dirs:
    - /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/train
  train_ann_dirs:
    - /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/train
  val_img_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/val
  val_ann_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/val
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  input_type: "grayscale"
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  train_pipeline:
    Pad:
      size_ht: 512
      size_wd: 512
      pad_val: 0
      seg_pad_val: 255
    augmentation_config:
      RandomCrop:
        crop_size:
          - 512
          - 512
        cat_max_ratio: 0.75
      resize:
        img_scale:
          - 1024
          - 512
        ratio_range:
          - 0.5
          - 2.0
      random_flip:
        prob: 0.5

  • train_img_dirs (List[string]): A list of paths to the training image directories.

  • train_ann_dirs (List[string]): A list of paths to the training PNG mask directories.

  • val_img_dir (string): The path to the validation image directory.

  • val_ann_dir (string): The path to the validation PNG mask directory.

  • img_norm_cfg (Dict, default: None): The image normalization config, which contains the following parameters:
    • mean (List[float], default: [123.675, 116.28, 103.53]): The per-channel mean to be subtracted during pre-processing. Supported values: >=0, <=255.
    • std (List[float], default: [58.395, 57.12, 57.375]): The per-channel standard deviation to divide the image by. Supported values: >=0.0.
    • to_rgb (bool, default: True): Whether to convert the input format from BGR to RGB. Supported values: True, False.

  • input_type (string, default: "rgb"): Whether the input is RGB or grayscale. Supported values: "rgb", "grayscale".

  • palette (List[Dict], default: None): The palette config; each entry contains:
    • seg_class (string, default: background): The segmentation category.
    • mapping_class (string, default: background): The category to group this class with.
    • label_id (int, default: 0): The integer class ID. Supported values: >=0.
    • rgb (List[int], default: [255, 255, 255]): The color to be overlaid for this class during inference. Supported values: >=0, <=255.

  • batch_size_per_gpu (unsigned int, default: 32): The batch size for training and validation. Supported values: >0.

  • workers_per_gpu (unsigned int, default: 8): The number of parallel workers processing the data. Supported values: >0.

  • train_pipeline (dict config, default: None): The parameters that define the augmentation methods:
    • augmentation_config (dict config, default: None): The augmentation config details (refer to augmentation_config below).
    • Pad (dict config, default: None): The padding augmentation config:
      • size_ht (int, default: 1024): The height to which the image/mask is padded.
      • size_wd (int, default: 1024): The width to which the image/mask is padded.
      • pad_val (int, default: 0): The padding value for the input image.
      • seg_pad_val (int, default: 255): The padding value for the segmentation mask.

  • val_pipeline (dict config, default: None): The pipeline config used for validation during training:
    • multi_scale (List[int], default: [2048, 1024]): The largest scale of the image. Supported values: >=0.

augmentation_config

  • random_crop (Dict, default: None): The random_crop config contains the following parameters:
    • crop_size (List[int], default: [1024, 1024]): The crop size for the augmentation. Supported values: 0 < h, w <= img_ht, img_wd.
    • cat_max_ratio (float, default: 0.75): The maximum ratio of the crop area that a single category may occupy. Supported values: >=0.0.

  • resize (Dict, default: None): The resize config contains the following configurable parameters:
    • img_scale (List[int]): The [height, width] scale to which the input image is rescaled. Supported values: >=0.
    • ratio_range (List[float], default: [0.5, 2.0]): A ratio is randomly sampled from this range and multiplied with img_scale to generate the sampled scale. For example, with an img_scale of [1024, 512] and a sampled ratio of 0.5, the resulting scale is [512, 256]. Supported values: >=0.0.
    • keep_ratio (bool, default: True): Whether to preserve the aspect ratio. Supported values: True, False.

  • random_flip (Dict, default: None): The random_flip config contains the following parameter for the flipping augmentation:
    • prob (float, default: 0.5): The probability with which the image is flipped. Supported values: >=0.0.

Training the Model

Use the following command to run SegFormer training:


tao segformer train -e <experiment_spec_file> -r <results_dir> -k <key> -g <num_gpus> [resume_training_checkpoint_path=<absolute path to *.tlt checkpoint>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

  • -k, --key: The user-specific encoding key to save or load a .tlt model.

Optional Arguments

  • resume_training_checkpoint_path: The path to a checkpoint to continue training

  • -g, --num_gpus: The number of GPUs to be used for training. The default value is 1.

Here’s an example of using the SegFormer training command:


tao segformer train -e $DEFAULT_SPEC -r $RESULTS_DIR -k $YOUR_KEY -g $NUM_GPUs


Creating a Testing Experiment Spec File

Here is an example spec file for evaluating and running inference with a trained SegFormer model.


exp_config:
  manual_seed: 49
  distributed: True
model_config:
  backbone:
    type: "mit_b5"
dataset_config:
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
  test_pipeline:
    multi_scale:
      - 2048
      - 512
  input_type: "grayscale"
  data_root: /tlt-pytorch
  test_img_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/test
  test_ann_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/test
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  batch_size_per_gpu: 4
  workers_per_gpu: 1

Evaluating the Model

The evaluation metric of SegFormer is the mean Intersection over Union (mIoU), whose standard definition is given below.
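
For reference, the per-class IoU and the mean IoU over C classes are computed as follows (these are the general formulas, not TAO-specific notation):

\mathrm{IoU}_c = \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FP}_c + \mathrm{FN}_c},
\qquad
\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{IoU}_c

where TP_c, FP_c, and FN_c are the numbers of true-positive, false-positive, and false-negative pixels for class c, accumulated over the evaluation set.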

Use the following command to run SegFormer evaluation:


tao segformer evaluate -e <experiment_spec> -k <key> model_path=<inference model> output_dir=<path to output file>

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.

  • -k, --key: The encoding key for the .tlt model.

  • model_path: The path to the .tlt model.

Optional Argument

  • -g, --num_gpus: The number of GPUs to be used for evaluation. The default value is 1.

Here’s an example of using the SegFormer evaluation command:

tao segformer evaluate -e $DEFAULT_SPEC -k $YOUR_KEY model=$TRAINED_TLT_MODEL data=$TEST_DATA label=$TEST_LABEL

The evaluation output reports per-class IoU and accuracy, along with a summary:

+------------+-------+-------+
|   Class    |  IoU  |  Acc  |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:
+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
...


Running Inference on the Model

Use the following command to run inference on SegFormer with the .tlt model:


tao segformer inference -e <experiment_spec> -k <key> model_path=<inference model> output_dir=<path to output file>

The output mask PNG images with class IDs are saved in vis_tao, and the overlaid mask images are saved in mask_tao.
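
For example, with output_dir set to a hypothetical /results/inference, the results would be organized as follows:

/results/inference
  vis_tao/     (mask PNGs with class IDs)
  mask_tao/    (masks overlaid on the input images)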

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up inference

  • -k, --key: The encoding key for the .tlt model

  • model_path: The .tlt model to perform inference with

  • output_dir: The path to save the inference masks and mask overlaid images to. Inference creates two directories.

Optional Argument

  • -g, --num_gpus: The number of GPUs to be used for inference. The default value is 1.

Here’s an example of using the SegFormer inference command:


tao segformer inference -e $DEFAULT_SPEC -k $KEY model_path=$TRAINED_TLT_MODEL output_dir=$OUTPUT_FOLDER


Exporting the Model

Use the following command to export the model:


tao segformer export [-h] -e <experiment spec file> -k <key> model_path=<trained tlt model to be exported> output_file=<etlt path>

Required Arguments

  • -e, --experiment_spec: The path to an experiment spec file

  • -k, --key: A user-specific encoding key to save or load a .tlt model

  • model_path: The .tlt model to be exported

  • output_file: The path where the exported .etlt file is stored

Sample Usage

The following is an example export command:


tao segformer export -e /path/to/spec.yaml -k $YOUR_KEY model_path=/path/to/model.tlt output_file=/path/to/model.etlt


For deployment, refer to the TAO Deploy documentation for SegFormer.

Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.

© Copyright 2022, NVIDIA. Last updated on Mar 23, 2023.