SegFormer - NVIDIA Docs

SegFormer is an NVIDIA-developed semantic-segmentation model that is included in the TAO Toolkit. SegFormer supports the following tasks:

train
evaluate
inference
export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command line:

tao model segformer <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
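
The convention above can be scripted. The following is a minimal Python sketch that assembles such a command as an argument list. The helper name `build_tao_command` and its `flags`/`overrides` parameters are hypothetical and are not part of the TAO Toolkit; only the sub-task names and the `-e`, `-r`, `-g`, and `key=value` arguments come from this page.

```python
import shlex

SUB_TASKS = {"train", "evaluate", "inference", "export"}

def build_tao_command(sub_task, spec_file, flags=None, overrides=None):
    """Assemble a `tao model segformer` command as a list of arguments.

    flags:     dash-style options, e.g. {"-r": "/results", "-g": 1}
    overrides: key=value arguments, e.g. {"results_dir": "/results"}
    """
    if sub_task not in SUB_TASKS:
        raise ValueError(f"unsupported SegFormer sub-task: {sub_task!r}")
    cmd = ["tao", "model", "segformer", sub_task, "-e", spec_file]
    for flag, value in (flags or {}).items():
        cmd += [flag, str(value)]
    for key, value in (overrides or {}).items():
        cmd.append(f"{key}={value}")
    return cmd

cmd = build_tao_command(
    "evaluate",
    "/path/to/spec.yaml",
    flags={"-g": 1},
    overrides={
        "evaluate.checkpoint": "/path/to/model.pth",
        "results_dir": "/path/to/results",
    },
)
# Print a shell-quoted version; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building the command as a list avoids shell-quoting problems when paths contain spaces.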

Data Input for SegFormer

SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.

Creating Training Experiment Spec File

Configuration for Custom Dataset

This documentation shows example configurations and commands for training on the ISBI dataset (ISBI Challenge: Segmentation of neuronal structures in EM stacks), which is used for binary segmentation. The dataset contains grayscale images, so input_type is set to grayscale. For more details, refer to the example notebook in TAO Computer Vision samples.
For RGB input images, set input_type to rgb instead of grayscale.
Configure the img_norm_cfg mean and standard deviation based on your input dataset.
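
As an illustration of choosing these values, the following Python sketch assumes the usual `(pixel - mean) / std` normalization, which matches the `mean` and `std` descriptions in the dataset table on this page. The helper names and the sample intensities are hypothetical; in practice the statistics would be computed over the pixels of your training images.

```python
import statistics

def normalize(pixel, mean, std):
    """Apply (pixel - mean) / std to a single intensity value."""
    return (pixel - mean) / std

# With mean = std = 127.5, the 8-bit range [0, 255] maps to [-1.0, 1.0].
low = normalize(0, 127.5, 127.5)
high = normalize(255, 127.5, 127.5)
print(low, high)

def channel_stats(pixels):
    """Return (mean, population standard deviation) of pixel intensities."""
    values = list(pixels)
    return statistics.fmean(values), statistics.pstdev(values)

# Hypothetical grayscale intensities sampled from a training set.
sample = [0, 64, 128, 192, 255]
mean, std = channel_stats(sample)
print(f"mean={mean:.2f}, std={std:.2f}")
```

For a grayscale dataset, the example spec on this page repeats the same value for all three entries of `mean` and `std`.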

Here is an example spec file for training a SegFormer model with an mit_b5 backbone on an ISBI dataset.

train:
  exp_config:
    manual_seed: 49
  checkpoint_interval: 200
  logging_interval: 50
  max_iters: 1000
  resume_training_checkpoint_path: null
  validate: True
  validation_interval: 500
  trainer:
    find_unused_parameters: True
    sf_optim:
      lr: 0.00006
model:
  input_height: 512
  input_width: 512
  pretrained_model_path: null
  backbone:
    type: "mit_b5"
dataset:
  data_root: /tlt-pytorch
  input_type: "grayscale"
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  train_dataset:
    img_dir:
      - /data/images/train
    ann_dir:
      - /data/masks/train
    pipeline:
      augmentation_config:
        random_crop:
          cat_max_ratio: 0.75
        resize:
          ratio_range:
            - 0.5
            - 2.0
        random_flip:
          prob: 0.5
  val_dataset:
    img_dir: /data/images/val
    ann_dir: /data/masks/val
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  repeat_data_times: 500
  batch_size: 4
  workers_per_gpu: 1

The SegFormer training experiment specification consists of three main components:

train
dataset
model

train

The train config contains the parameters related to training. They are described as follows:

Parameter	Datatype	Default	Description	Supported Values
`exp_config`	Dict	None	Contains the following parameter:	–
* `manual_seed`	int	49	The random seed used to make training deterministic	–
`max_iters`	int	10	The maximum number of iterations (steps) for which training is run
`checkpoint_interval`	int	1	The interval, in steps, at which a checkpoint is saved
`logging_interval`	int	10	The interval, in steps, at which experiment logs are saved. The logs are saved in the logs directory.
`resume_training_checkpoint_path`	str	None	The path to the checkpoint from which to resume training
`validate`	bool	False	A flag to enable validation during training
`validation_interval`	int	int	The interval, in iterations, at which validation is performed during training. The validation interval should be at least 1 less than the checkpoint interval to prevent status overriding.
`trainer`	Dict	None	Parameters required by the MMSeg trainer:	–
* `find_unused_parameters`	bool	False	Sets the `find_unused_parameters` option in DDP (DistributedDataParallel). For more information, refer to the PyTorch DistributedDataParallel documentation.	–
* `sf_optim`	Dict	None	The SegFormer optimizer config. For more information, refer to the sf_optim section.	–
* `lr_config`	Dict	None	The SegFormer learning-rate config. For more information, refer to the lr_config section.	–

sf_optim

sf_optim:
  lr: 0.00006
  betas:
    - 0.0
    - 0.999
  paramwise_cfg:
    pos_block:
      decay_mult: 0.0
    norm:
      decay_mult: 0.0
    head:
      lr_mult: 10.0
  weight_decay: 5e-4

Parameter	Datatype	Default	Description	Supported Values
`lr`	float	0.00006	The learning rate	>=0.0
`betas`	List[float]	[0.0, 0.9]	The beta parameters in the Adam optimizer	>=0.0
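`paramwise_cfg`	Dict	None	Configuration parameters for the Adam optimizer, set per parameter group:	–
* `pos_block`	Dict	None	Settings for the `pos_block` parameters	–
* `pos_block.decay_mult`	float	0.0	The weight-decay multiplier for the `pos_block` parameters	>=0.0
* `norm`	Dict	None	Settings for the `norm` parameters	–
* `norm.decay_mult`	float	0.0	The weight-decay multiplier for the `norm` parameters	>=0.0
* `head`	Dict	None	Settings for the `head` parameters	–
* `head.lr_mult`	float	10.0	The learning-rate multiplier for the `head` parameters	>=0.0
`weight_decay`	float	5e-4	The weight-decay hyperparameter for regularization	>=0.0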
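
The following Python sketch illustrates how multipliers of this kind are commonly applied. It assumes the convention used by MMCV-style optimizer constructors: a parameter whose name contains a key gets `lr * lr_mult` and `weight_decay * decay_mult`. This page does not confirm that TAO matches keys this way, and the function and parameter names below are hypothetical.

```python
def effective_hyperparams(param_name, base_lr, base_weight_decay, paramwise_cfg):
    """Return (lr, weight_decay) for one parameter after applying multipliers."""
    # Try longer keys first so that the most specific key wins.
    for key in sorted(paramwise_cfg, key=len, reverse=True):
        if key in param_name:
            mults = paramwise_cfg[key]
            lr = base_lr * mults.get("lr_mult", 1.0)
            weight_decay = base_weight_decay * mults.get("decay_mult", 1.0)
            return lr, weight_decay
    # No key matched: the base values apply unchanged.
    return base_lr, base_weight_decay

paramwise_cfg = {
    "pos_block": {"decay_mult": 0.0},
    "norm": {"decay_mult": 0.0},
    "head": {"lr_mult": 10.0},
}

# Hypothetical parameter names, chosen only to exercise each key.
names = (
    "backbone.pos_block.0.weight",
    "backbone.norm1.weight",
    "decode_head.conv_seg.weight",
    "backbone.block1.0.attn.q.weight",
)
for name in names:
    lr, wd = effective_hyperparams(name, 0.00006, 5e-4, paramwise_cfg)
    print(f"{name}: lr={lr:.2e}, weight_decay={wd:.1e}")
```

Under these assumptions, the `head` parameters train with a learning rate ten times the base value, while the `pos_block` and `norm` parameters are exempt from weight decay.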

lr_config

lr_config:
  warmup_iters: 1500
  warmup_ratio: 1e-6
  power: 1.0
  min_lr: 0.0

Parameter	Datatype	Default	Description	Supported Values
`warmup_iters`	int	1500	The number of iterations or epochs that warmup lasts	>=0
`warmup_ratio`	float	1e-6	The LR used at the beginning of warmup is equal to `warmup_ratio * initial_lr`	>=0.0
`power`	float	1.0	The power to which the multiplying coefficient is raised	>=0.0
`min_lr`	float	0.0	The minimum LR for the LR scheduler	>=0.0
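
To show how these four parameters can interact, the following Python sketch assumes a polynomial-decay schedule with linear warmup, in the style of the "poly" policy found in MMSegmentation-based trainers. This page does not state the exact formula TAO uses, so treat the functions below, and the `max_iters` value of 10000, as illustrative assumptions.

```python
def poly_lr(iteration, base_lr, max_iters, power=1.0, min_lr=0.0):
    """Polynomial decay from base_lr at iteration 0 to min_lr at max_iters."""
    coeff = (1 - iteration / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr

def lr_at(iteration, base_lr, max_iters,
          warmup_iters=1500, warmup_ratio=1e-6, power=1.0, min_lr=0.0):
    """Learning rate at `iteration`, with linear warmup over warmup_iters."""
    regular = poly_lr(iteration, base_lr, max_iters, power, min_lr)
    if iteration < warmup_iters:
        # Scale factor grows linearly from warmup_ratio to 1 during warmup.
        k = (1 - iteration / warmup_iters) * (1 - warmup_ratio)
        return regular * (1 - k)
    return regular

for it in (0, 750, 1500, 5000, 10000):
    print(it, f"{lr_at(it, base_lr=0.00006, max_iters=10000):.3e}")
```

With `power: 1.0` the decay is linear. At iteration 0 the sketch yields `warmup_ratio * base_lr`, which agrees with the `warmup_ratio` description in the table.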

model

The following example model provides options to change the SegFormer architecture for training.

model:
  input_height: 512
  input_width: 512
  pretrained_model_path: null
  backbone:
    type: "mit_b5"

The same model configuration is used during SegFormer evaluation and inference. The model parameters are described as follows:

Parameter	Datatype	Default	Description	Supported Values
`pretrained_model_path`	string	None	The optional path to the pretrained backbone file	A path string
`backbone`	Dict	None	A dictionary containing the following configurable parameter: * `type` (string): The name of the backbone to be used	mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5, fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid
`decode_head`	Dict	None	A dictionary containing the decoder parameters. `decoder_params` contains the network parameter `embed_dims`.	–
* `decoder_params.embed_dims`	int	768	The embedding dimensions	256, 512, 768
* `align_corners`	Bool	False	If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels	True, False
* `dropout_ratio`	Float	0.1	The dropout probability used to drop neurons in the network	>=0.0
`input_width`	int	512	Input width of the model	>0
`input_height`	int	512	Input height of the model	>0

dataset

The dataset parameter defines the dataset source, training batch size, and augmentation. An example dataset is provided below.

dataset:
  data_root: /tlt-pytorch
  input_type: "grayscale"
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  train_dataset:
    img_dir:
      - /data/images/train
    ann_dir:
      - /data/masks/train
    pipeline:
      augmentation_config:
        random_crop:
          cat_max_ratio: 0.75
        resize:
          ratio_range:
            - 0.5
            - 2.0
        random_flip:
          prob: 0.5
  val_dataset:
    img_dir: /data/images/val
    ann_dir: /data/masks/val
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  repeat_data_times: 500
  batch_size: 4
  workers_per_gpu: 1

Parameter	Datatype	Default	Description	Supported Values
`img_norm_cfg`	Dict	None	The image normalization config, which contains the following parameters:	–
* `mean`	List[float]	[123.675, 116.28, 103.53]	The mean to be subtracted for pre-processing	>=0, <=255
* `std`	List[float]	[58.395, 57.12, 57.375]	The standard deviation to divide the image by	>=0.0
* `to_rgb`	bool	True	Whether to convert the input format from BGR to RGB	True, False
`input_type`	String	“rgb”	Whether the input type is RGB or grayscale	“rgb”, “grayscale”
`palette`	List[Dict]	None	The palette config, which contains the following parameters for each class:	–
* `seg_class`	string	background	The segmentation category	string
* `mapping_class`	string	background	The category to group it with	string
* `label_id`	int	0	The integer class ID	>=0
* `rgb`	List[int]	[255, 255, 255]	The color to be overlaid for this class during inference	>=0, <=255
`batch_size`	unsigned int	32	The batch size for training and validation	>0
`workers_per_gpu`	unsigned int	8	The number of parallel workers processing data	>0
`train_dataset`	dict config	None	The parameters that define the training dataset:	–
* `img_dir`	str	None	The path to the images directory
* `ann_dir`	str	None	The path to the PNG masks directory
* `pipeline`	dict config	None	The training pipeline config, which contains `augmentation_config` and `Pad`
* `pipeline.augmentation_config`	dict config	None	The augmentation config details (refer to augmentation_config for more information)
* `pipeline.Pad`	dict config	None	The padding augmentation config
* `pipeline.Pad.size_ht`	int	1024	The height to which the image/mask is padded
* `pipeline.Pad.size_wd`	int	1024	The width to which the image/mask is padded
* `pipeline.Pad.pad_val`	int	0	The padding value for the input image
* `pipeline.Pad.seg_pad_val`	int	255	The padding value for the segmentation
`val_dataset`	dict config	None	The parameters that define the dataset used for validation during training:	–
* `img_dir`	str	None	The path to the images directory
* `ann_dir`	str	None	The path to the PNG masks directory
* `pipeline`	dict config	None	The validation pipeline config, which contains `multi_scale`
* `pipeline.multi_scale`	List[int]	[2048, 1024]	The largest scale of the image	>=0
`test_dataset`	dict config	None	The parameters that define the test dataset used for evaluation and inference:	–
* `img_dir`	str	None	The path to the images directory
* `ann_dir`	str	None	The path to the PNG masks directory
* `pipeline`	dict config	None	The test pipeline config, which contains `multi_scale`
* `pipeline.multi_scale`	List[int]	[2048, 1024]	The largest scale of the image	>=0
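
To make the role of the `palette` entries concrete, the following Python sketch builds a lookup from `label_id` to `rgb` and uses it to colorize a small mask of class IDs. It only illustrates the data structure; the helper names are hypothetical, and TAO's own overlay code is not shown on this page.

```python
# Palette entries copied from the example dataset config above.
palette = [
    {"seg_class": "foreground", "rgb": [0, 0, 0],
     "label_id": 0, "mapping_class": "foreground"},
    {"seg_class": "background", "rgb": [255, 255, 255],
     "label_id": 1, "mapping_class": "background"},
]

def build_color_lookup(palette):
    """Map each label_id to its RGB color as a tuple."""
    return {entry["label_id"]: tuple(entry["rgb"]) for entry in palette}

def colorize(mask, lookup):
    """Replace every class ID in a 2D mask (list of rows) with its color."""
    return [[lookup[label] for label in row] for row in mask]

lookup = build_color_lookup(palette)
mask = [[0, 1],
        [1, 0]]  # a tiny 2x2 mask of class IDs
colored = colorize(mask, lookup)
print(colored)
```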

augmentation_config

Parameter	Datatype	Default	Description	Supported Values
`random_crop`	Dict	None	The random_crop config has the following parameters:	–
* `crop_size`	List[int]	[1024, 1024]	The crop size for augmentation	0 < h, w <= img_ht, img_wd
* `cat_max_ratio`	Float	0.75	The maximum ratio of the crop that a single category may occupy	>=0.0
`resize`	Dict	None	The resize config has the following configurable parameters:	–
* `img_scale`			The [height, width] scale to which the input image is rescaled	>=0
* `ratio_range`		[0.5, 2.0]	A ratio is randomly sampled from the range specified by `ratio_range` and then multiplied by `img_scale` to generate the sampled scale	>=0.0
* `keep_ratio`	Bool	True	Whether to preserve the aspect ratio	True, False
`random_flip`	Dict	None	The random_flip config contains the following parameter for flip augmentation:	–
* `prob`		0.5	The probability with which the image is flipped	>=0.0
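
The `ratio_range` and `prob` behavior described above can be sketched in Python as follows. The sketch assumes the ratio is drawn uniformly from `ratio_range`, which this page does not specify, and it uses a hypothetical `img_scale` of 512 x 512. The function names are illustrative only.

```python
import random

def sample_scale(img_scale, ratio_range, rng):
    """Sample a ratio from ratio_range and apply it to img_scale."""
    height, width = img_scale
    low, high = ratio_range
    ratio = rng.uniform(low, high)  # assumption: uniform sampling
    return (int(height * ratio), int(width * ratio)), ratio

def should_flip(prob, rng):
    """Return True with probability `prob`."""
    return rng.random() < prob

rng = random.Random(0)  # seeded so that runs are repeatable
for _ in range(3):
    scale, ratio = sample_scale((512, 512), (0.5, 2.0), rng)
    flip = should_flip(0.5, rng)
    print(f"ratio={ratio:.3f} -> scale={scale}, flip={flip}")
```

With `ratio_range` set to [0.5, 2.0], the sampled scale under these assumptions lies between half and twice `img_scale`.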

Training the Model

Use the following command to run SegFormer training:

tao model segformer train -e <experiment_spec_file>
    -r <results_dir>
    -g <num_gpus>

Required Arguments

-e, --experiment_spec_file: The path to the experiment spec file.
-r, --results_dir: The path to a folder where the experiment outputs should be written.

Optional Arguments

-g, --num_gpus: The number of GPUs to be used for training. The default value is 1.

Here’s an example of using the SegFormer training command:

tao model segformer train -e $DEFAULT_SPEC -r $RESULTS_DIR -g $NUM_GPUS

Evaluating the Model

The evaluation metric for SegFormer is the mean Intersection over Union (mean IoU). For more details on this metric, refer to meanIOU.
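
As a worked illustration of the metric, the following Python sketch computes per-class IoU and accuracy, and the mIoU, mAcc, and aAcc summary values, from a confusion matrix. It uses the standard definitions of these quantities. The function name and the example matrix are hypothetical, and this is not TAO's evaluation code.

```python
def segmentation_metrics(confusion):
    """Compute metrics from confusion[i][j]: pixels of true class i
    that were predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    ious, accs = [], []
    for c in range(n):
        tp = confusion[c][c]                            # correctly labeled
        gt = sum(confusion[c])                          # true pixels of c
        pred = sum(confusion[r][c] for r in range(n))   # predicted as c
        union = gt + pred - tp
        ious.append(tp / union if union else float("nan"))
        accs.append(tp / gt if gt else float("nan"))
    correct = sum(confusion[c][c] for c in range(n))
    return {
        "IoU": ious,
        "Acc": accs,
        "mIoU": sum(ious) / n,    # mean of per-class IoU
        "mAcc": sum(accs) / n,    # mean of per-class accuracy
        "aAcc": correct / total,  # accuracy over all pixels
    }

# Hypothetical two-class confusion matrix (rows: truth, columns: prediction).
metrics = segmentation_metrics([[40, 10],
                                [20, 130]])
for name, value in metrics.items():
    print(name, value)
```

mIoU is the unweighted mean of the per-class IoU values, so a small class counts as much as a large one.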

Use the following command to run SegFormer evaluation:

tao model segformer evaluate -e <experiment_spec>
    -g <num GPUs>
    evaluate.checkpoint=<evaluation model>
    results_dir=<path to output evaluation results>

Required Arguments

-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model.

Optional Argument

-g, --num_gpus: The number of GPUs to be used for evaluation. The default value is 1.

Here’s an example of using the SegFormer evaluation command:

tao model segformer evaluate -e $DEFAULT_SPEC -g $NUM_GPUS evaluate.checkpoint=$TRAINED_PTH_MODEL results_dir=$PATH_TO_RESULTS_DIR

The evaluation prints per-class and summary metrics similar to the following:

+------------+-------+-------+
| Class      | IoU   | Acc   |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
  ...

Running Inference on the Model

Use the following command to run inference on SegFormer with the .pth model:

tao model segformer inference -e <experiment_spec>
    inference.checkpoint=<inference model>
    results_dir=<path to output directory for inference>

The output mask PNG images with class IDs are saved in vis_tao. The overlaid mask images are saved in mask_tao.

Required Arguments

-e, --experiment_spec: The experiment spec file to set up inference
inference.checkpoint: The .pth model to perform inference with
results_dir: The path to save the inference masks and mask overlaid images to. Inference creates two directories.

Optional Argument

-g, --num_gpus: The number of GPUs to be used for inference. The default value is 1.

Here’s an example of using the SegFormer inference command:

tao model segformer inference -e $DEFAULT_SPEC -g $NUM_GPUS inference.checkpoint=$TRAINED_PTH_MODEL results_dir=$OUTPUT_FOLDER

Exporting the Model

Use the following command to export the model.

tao model segformer export [-h] -e <experiment spec file>
    results_dir=<path to results dir>
    export.checkpoint=<trained pth model to be exported>
    export.onnx_file=<onnx path>

Required Arguments

-e, --experiment_spec: The path to an experiment spec file
results_dir: The path where the logs for export will be saved
export.checkpoint: The .pth model to be exported
export.onnx_file: The path where the exported .onnx file is stored

Sample Usage

The following is an example export command:

tao model segformer export -e /path/to/spec.yaml export.checkpoint=/path/to/model.pth export.onnx_file=/path/to/model.onnx results_dir=/path/to/export_result_dir

TensorRT Engine Generation, Validation, and INT8 Calibration

For deployment, refer to the TAO Deploy documentation.

Deploying to DeepStream

Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.