NVIDIA TAO Toolkit v4.0
NVIDIA TAO Release tlt.40

SegFormer

SegFormer is an NVIDIA-developed semantic-segmentation model that is included in the TAO Toolkit. SegFormer supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:

tao segformer <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
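The convention above can be sketched as a small helper that assembles the argument list for a given subtask. This is only an illustration: the function name build_tao_command is hypothetical and not part of TAO, and it assumes that spec-file and key flags (-e, -k) come first and that any key=value overrides follow them.

```python
import shlex

# Hypothetical helper (not part of TAO): assembles a `tao segformer` command.
SUB_TASKS = {"train", "evaluate", "inference", "export"}

def build_tao_command(sub_task, spec_file, key, **overrides):
    """Return the command as a list of arguments suitable for subprocess.run."""
    if sub_task not in SUB_TASKS:
        raise ValueError(f"unsupported sub-task: {sub_task}")
    cmd = ["tao", "segformer", sub_task, "-e", spec_file, "-k", key]
    # key=value overrides such as model_path=... are appended after the flags.
    cmd += [f"{name}={value}" for name, value in overrides.items()]
    return cmd

cmd = build_tao_command("inference", "spec.yaml", "my_key",
                        model_path="model.tlt", output_dir="out")
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.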

SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.

Configuration for Custom Dataset

  • This documentation shows example configurations and commands for training on the ISBI dataset (ISBI Challenge: Segmentation of neuronal structures in EM stacks), which is used for binary segmentation and contains grayscale images. For this reason, input_type is set to grayscale. For more details, refer to the example notebook in TAO Computer Vision samples.

  • For RGB input images, set input_type to rgb instead of grayscale.

  • Configure the img_norm_cfg mean and standard deviation based on your input dataset.
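One way to obtain dataset-specific values for img_norm_cfg is to compute the per-channel mean and standard deviation over the training pixels. The following is a minimal pure-Python sketch; it assumes the pixels are already loaded as (R, G, B) tuples in the 0-255 range, and the helper name channel_stats is hypothetical, not a TAO function.

```python
import math

# Hypothetical helper (not part of TAO): per-channel statistics for img_norm_cfg.
def channel_stats(pixels):
    """Return (mean, std) lists with one entry per channel."""
    count = 0
    sums = None
    sq_sums = None
    for px in pixels:
        if sums is None:
            sums = [0.0] * len(px)
            sq_sums = [0.0] * len(px)
        for c, value in enumerate(px):
            sums[c] += value
            sq_sums[c] += value * value
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    mean = [s / count for s in sums]
    # Population variance E[x^2] - E[x]^2, clamped at 0 to absorb rounding error.
    std = [math.sqrt(max(sq / count - m * m, 0.0)) for sq, m in zip(sq_sums, mean)]
    return mean, std

# Toy input: one black and one white pixel.
mean, std = channel_stats([(0, 0, 0), (255, 255, 255)])
print("mean:", mean)
print("std:", std)
```

For a real dataset, the pixel iterable would be fed from an image-loading library of your choice; streaming the sums as above avoids holding every image in memory.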

Here is an example spec file for training a SegFormer model with an mit_b5 backbone on an ISBI dataset.

exp_config:
  manual_seed: 49
  distributed: True
train_config:
  runner:
    max_iters: 4000
  checkpoint_config:
    interval: 500
  logging:
    interval: 100
  resume_training_checkpoint_path: /path/to/checkpoint_to_resume
  sf_optim:
    lr: 0.00006
  validate: True
  validation_config:
    interval: 250
dataset_config:
  train_img_dirs:
    - /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/train
  train_ann_dirs:
    - /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/train
  val_img_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/val
  val_ann_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/val
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  input_type: "grayscale"
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  val_pipeline:
    multi_scale:
      - 2048
      - 512
  train_pipeline:
    augmentation_config:
      RandomCrop:
        crop_size:
          - 512
          - 512
        cat_max_ratio: 0.75
      Resize:
        img_scale:
          - 1024
          - 512
        ratio_range:
          - 0.5
          - 2.0
      RandomFlip:
        prob: 0.5
      colorAug:
        type: PhotoMetricDistortion
    Pad:
      size_ht: 512
      size_wd: 512
      pad_val: 0
      seg_pad_val: 255
  repeat_data_times: 500
  batch_size_per_gpu: 4
  workers_per_gpu: 1
model_config:
  pretrained: /path/to/your-pretrained-backbone-model
  backbone:
    type: "mit_b5"
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 300
  with_box_refine: True
  dropout_ratio: 0.3
output_dir: /path/to/experiment_results

| Parameter | Data Type | Default | Description |
| --- | --- | --- | --- |
| exp_config | dict config | | The configuration of the experiment, detailed in exp_config |
| model_config | dict config | | The configuration of the model architecture, detailed in model_config |
| dataset_config | dict config | | The configuration of the dataset, detailed in dataset_config |
| train_config | dict config | | The configuration of the training parameters, detailed in train_config |
| output_dir | string | | The path where the experiment logs and model checkpoints are saved |

train_config

| Parameter | Data Type | Default | Description |
| --- | --- | --- | --- |
| runner | dict config | None | Contains the following configurable parameter: |
| runner.max_iters | int | 100 | The maximum number of iterations for which training is run |
| checkpoint_config | dict config | None | Contains the following configurable parameter: |
| checkpoint_config.interval | int | 1 | The number of steps between saved checkpoints |
| logging | dict config | None | Contains the following configurable parameter: |
| logging.interval | int | 10 | The number of steps between saved experiment logs. The logs are saved in the logs directory inside the output directory. |
| sf_optim | dict config | None | The configurable parameters of the SegFormer optimizer, detailed in sf_optim |
| validation_config | dict config | None | Contains the following configurable parameter: |
| validation_config.interval | int | | The number of iterations between validation runs during training |
| validate | bool | False | A flag that enables validation during training |
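To see how the interval settings interact, the sketch below lists the iterations at which checkpoints and validation runs would occur. It is an illustration only: it assumes iterations are counted from 1 and that each action fires whenever the iteration number is a multiple of its interval, which may differ in detail from the actual trainer. The function name schedule is hypothetical.

```python
# Hypothetical helper (not part of TAO): previews checkpoint/validation timing.
def schedule(max_iters, checkpoint_interval, validation_interval, validate=True):
    """Map iteration number -> list of actions taken at that iteration."""
    events = {}
    for it in range(1, max_iters + 1):
        actions = []
        if it % checkpoint_interval == 0:
            actions.append("checkpoint")
        # Validation is skipped entirely when the validate flag is False.
        if validate and it % validation_interval == 0:
            actions.append("validate")
        if actions:
            events[it] = actions
    return events

# Values taken from the example spec: 4000 iterations, checkpoint every 500,
# validation every 250.
events = schedule(max_iters=4000, checkpoint_interval=500, validation_interval=250)
for it in sorted(events)[:4]:
    print(it, events[it])
```

With the example values, every checkpoint coincides with a validation run, so each saved checkpoint has a matching validation result.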

sf_optim

sf_optim:
  lr: 0.00006
  betas:
    - 0.0
    - 0.999
  paramwise_cfg:
    pos_block:
      decay_mult: 0.0
    norm:
      decay_mult: 0.0
    head:
      lr_mult: 10.0
  weight_decay: 5e-4

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| lr | float | 0.00006 | The learning rate | >=0.0 |
| betas | List[float] | [0.0, 0.9] | The beta parameters in the Adam optimizer | >=0.0 |
| paramwise_cfg | dict config | None | Configuration parameters for the Adam optimizer: | |
| paramwise_cfg.pos_block | Dict | 0.0 | {"decay_mult": 0.0} | >=0.0 |
| paramwise_cfg.norm | Dict | 0.0 | {"decay_mult": 0.0} | >=0.0 |
| paramwise_cfg.head | Dict | 10.0 | {"lr_mult": 10.0} | >=0.0 |
| weight_decay | float | 5e-4 | The weight-decay hyperparameter for regularization | >=0.0 |
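The sketch below illustrates one plausible reading of paramwise_cfg: each named parameter group scales the base learning rate by lr_mult and the base weight decay by decay_mult, with a multiplier of 1.0 when a key is absent. This interpretation is an assumption modelled on common per-parameter optimizer configs and is not confirmed by this page; the function name effective_hyperparams is hypothetical.

```python
# Hypothetical helper (not part of TAO): applies per-group multipliers,
# assuming lr_mult and decay_mult scale the base lr and weight decay.
def effective_hyperparams(base_lr, base_weight_decay, paramwise_cfg):
    """Return {group: {"lr": ..., "weight_decay": ...}} for each named group."""
    result = {}
    for group, mults in paramwise_cfg.items():
        result[group] = {
            "lr": base_lr * mults.get("lr_mult", 1.0),
            "weight_decay": base_weight_decay * mults.get("decay_mult", 1.0),
        }
    return result

# Values taken from the sf_optim example and table.
paramwise_cfg = {
    "pos_block": {"decay_mult": 0.0},
    "norm": {"decay_mult": 0.0},
    "head": {"lr_mult": 10.0},
}
groups = effective_hyperparams(0.00006, 5e-4, paramwise_cfg)
for name, values in groups.items():
    print(name, values)
```

Under this reading, the decode head trains with a ten-times larger learning rate than the backbone, while normalization and position-embedding parameters are exempt from weight decay.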

The exp_config parameter defines the hyperparameters of the experiment.

exp_config:
  manual_seed: 49

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| manual_seed | unsigned int | 49 | The random seed used to make training deterministic | >0 |

The following example model_config provides options to change the SegFormer architecture for training.

model_config:
  pretrained: /path/to/pretrained_mit_b5.pth
  backbone:
    type: "mit_b5"
  decode_head:
    decoder_params:
      embed_dims: 768
    align_corners: False
    dropout_ratio: 0.1

The following example model_config is used during Segformer evaluation/inference.

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| pretrained | string | None | The optional path to the pretrained backbone file | A path string |
| backbone | Dict | None | Contains the following configurable parameter: | |
| backbone.type | string | | The name of the backbone to be used | mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5 |
| decode_head | Dict | None | A dictionary containing the decoder parameters: | |
| decode_head.decoder_params | Dict | None | Contains the following network parameter: | |
| decode_head.decoder_params.embed_dims | int | 768 | The embedding dimensions | 256, 512, 768 |
| decode_head.align_corners | Bool | False | If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels | True, False |
| decode_head.dropout_ratio | Float | 0.1 | The dropout probability with which neurons in the network are dropped | >=0.0 |

The dataset_config parameter defines the dataset source, training batch size, and augmentation. An example dataset_config is provided below.

dataset_config:
  train_img_dirs:
    - /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/train
  train_ann_dirs:
    - /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/train
  val_img_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/val
  val_ann_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/val
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  input_type: "grayscale"
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  train_pipeline:
    Pad:
      size_ht: 512
      size_wd: 512
      pad_val: 0
      seg_pad_val: 255
    augmentation_config:
      RandomCrop:
        crop_size:
          - 512
          - 512
        cat_max_ratio: 0.75
      resize:
        img_scale:
          - 1024
          - 512
        ratio_range:
          - 0.5
          - 2.0
      random_flip:
        prob: 0.5

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| train_img_dirs | list | | A list of paths to the training image directories | List of strings |
| train_ann_dirs | list | | A list of paths to the training PNG mask directories | List of strings |
| val_img_dir | str | | The path to the validation image directory | string |
| val_ann_dir | str | | The path to the validation PNG mask directory | string |
| img_norm_cfg | Dict | None | The image normalization config, which contains the following parameters: | |
| img_norm_cfg.mean | List[float] | [123.675, 116.28, 103.53] | The mean to be subtracted for pre-processing | >=0, <=255 |
| img_norm_cfg.std | List[float] | [58.395, 57.12, 57.375] | The standard deviation to divide the image by | >=0.0 |
| img_norm_cfg.to_rgb | bool | True | Whether to convert the input format from BGR to RGB | True, False |
| input_type | String | "rgb" | Whether the input type is RGB or grayscale | "rgb", "grayscale" |
| palette | List[Dict] | None | The palette config; each entry contains the following parameters: | |
| palette.seg_class | string | background | The segmentation category | string |
| palette.mapping_class | string | background | The category to group it with | string |
| palette.label_id | int | 0 | The integer class ID | >=0 |
| palette.rgb | List[int] | [255, 255, 255] | The color to be overlaid for this class during inference | >=0, <=255 |
| batch_size_per_gpu | unsigned int | 32 | The batch size for training and validation | >0 |
| workers_per_gpu | unsigned int | 8 | The number of parallel workers processing data | >0 |
| train_pipeline | dict config | None | The parameters that define the augmentation method: | |
| train_pipeline.augmentation_config | dict config | None | The augmentation config details (refer to augmentation_config) | |
| train_pipeline.Pad | dict config | None | The padding augmentation config: | |
| train_pipeline.Pad.size_ht | int | 1024 | The height to which the image/mask is padded | |
| train_pipeline.Pad.size_wd | int | 1024 | The width to which the image/mask is padded | |
| train_pipeline.Pad.pad_val | int | 0 | The padding value for the input image | |
| train_pipeline.Pad.seg_pad_val | int | 255 | The padding value for the segmentation mask | |
| val_pipeline | dict config | None | The validation config, which contains the following parameter for validation during training: | |
| val_pipeline.multi_scale | List[int] | [2048, 1024] | The largest scale of the image | >=0 |
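To make the role of the palette entries concrete, the sketch below turns a small mask of label_id values into RGB colors using the rgb field of each entry. It is an illustration only, not the TAO implementation; the helper names build_color_table and colorize are hypothetical, and the mask is represented as nested Python lists rather than an image.

```python
# Hypothetical helpers (not part of TAO): map class IDs to palette colors.
def build_color_table(palette):
    """Return {label_id: (r, g, b)} after checking the documented value ranges."""
    table = {}
    for entry in palette:
        rgb = tuple(entry["rgb"])
        if len(rgb) != 3 or not all(0 <= v <= 255 for v in rgb):
            raise ValueError(f"invalid rgb for class {entry['seg_class']}: {rgb}")
        if entry["label_id"] < 0:
            raise ValueError(f"label_id must be >= 0 for {entry['seg_class']}")
        table[entry["label_id"]] = rgb
    return table

def colorize(mask, table):
    """Replace every class ID in a 2-D mask with its palette color."""
    return [[table[label] for label in row] for row in mask]

# Palette from the example dataset_config.
palette = [
    {"seg_class": "foreground", "rgb": [0, 0, 0], "label_id": 0,
     "mapping_class": "foreground"},
    {"seg_class": "background", "rgb": [255, 255, 255], "label_id": 1,
     "mapping_class": "background"},
]
table = build_color_table(palette)
colored = colorize([[0, 1], [1, 0]], table)
print(colored)
```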

augmentation_config

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| random_crop | Dict | None | The random_crop config, which has the following parameters: | |
| random_crop.crop_size | List[int] | [1024, 1024] | The crop size for augmentation | 0 < h, w <= img_ht, img_wd |
| random_crop.cat_max_ratio | Float | 0.75 | The maximum ratio of the crop that a single category may occupy | >=0.0 |
| resize | Dict | None | The resize config, which has the following configurable parameters: | |
| resize.img_scale | | | The [height, width] scale to which the input image should be rescaled | >=0 |
| resize.ratio_range | | [0.5, 2.0] | A ratio is randomly sampled from the range specified by ratio_range and multiplied by img_scale to generate the sampled scale | >=0.0 |
| resize.keep_ratio | Bool | True | Whether to preserve the aspect ratio | True, False |
| random_flip | Dict | None | The random_flip config, which contains the following parameter for flip augmentation: | |
| random_flip.prob | | 0.5 | The probability with which the image is flipped | >=0.0 |
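The ratio_range behaviour described for resize can be sketched as follows: draw a ratio uniformly from the range, then multiply both entries of img_scale by it. This is a minimal illustration, not the TAO implementation; the function name sample_scale is hypothetical, and truncating the result to integers is an assumption.

```python
import random

# Hypothetical helper (not part of TAO): samples a training scale.
def sample_scale(img_scale, ratio_range, rng=random):
    """Return (scaled_img_scale, ratio) for one random draw from ratio_range."""
    low, high = ratio_range
    if low > high:
        raise ValueError("ratio_range must be [min, max]")
    ratio = rng.uniform(low, high)
    # Assumption: the scaled sides are truncated to whole pixels.
    scaled = [int(side * ratio) for side in img_scale]
    return scaled, ratio

# Values from the example spec: img_scale [1024, 512], ratio_range [0.5, 2.0].
rng = random.Random(0)  # seeded so repeated runs draw the same ratio
scaled, ratio = sample_scale([1024, 512], [0.5, 2.0], rng)
print(scaled, round(ratio, 3))
```

With these example values, the first entry of the sampled scale falls between 512 and 2048 and the second between 256 and 1024.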

Use the following command to run SegFormer training:

tao segformer train -e <experiment_spec_file> -r <results_dir> -k <key> -g <num_gpus> [resume_training_checkpoint_path=<absolute path to *.tlt checkpoint>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

  • -k, --key: The user-specific encoding key to save or load a .tlt model.

Optional Arguments

  • resume_training_checkpoint_path: The path to a checkpoint from which to continue training.

  • -g, --num_gpus: The number of GPUs to be used for training. The default value is 1.

Here’s an example of using the SegFormer training command:

tao segformer train -e $DEFAULT_SPEC -r $RESULTS_DIR -k $YOUR_KEY -g $NUM_GPUs


Here is an example spec file for evaluation and inference of a trained SegFormer model.

exp_config:
  manual_seed: 49
  distributed: True
model_config:
  backbone:
    type: "mit_b5"
dataset_config:
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
  test_pipeline:
    multi_scale:
      - 2048
      - 512
  input_type: "grayscale"
  data_root: /tlt-pytorch
  test_img_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/test
  test_ann_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/test
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  batch_size_per_gpu: 4
  workers_per_gpu: 1

The evaluation metric for SegFormer is the mean IoU (intersection over union). For more details on this metric, refer to meanIOU.
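As a rough illustration of the metric, the sketch below computes the per-class IoU as the number of pixels where prediction and ground truth agree on a class, divided by the number of pixels where either one assigns that class, and then averages over the classes that occur. It is a minimal pure-Python sketch operating on flat lists of class IDs, not the evaluation code used by TAO; the function name mean_iou is hypothetical.

```python
# Hypothetical helper (not part of TAO): mean IoU over flat label lists.
def mean_iou(pred, target, num_classes):
    """Return (mean IoU, {class_id: IoU}) for two equal-length label lists."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Classes that never occur in either list are left out of the average.
    per_class = {c: intersection[c] / union[c]
                 for c in range(num_classes) if union[c] > 0}
    if not per_class:
        raise ValueError("no labelled pixels")
    return sum(per_class.values()) / len(per_class), per_class

miou, per_class = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
print(round(miou, 4), per_class)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean IoU is 7/12.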

Use the following command to run SegFormer evaluation:

tao segformer evaluate -e <experiment_spec> -k <key> model_path=<inference model> output_dir=<path to output file>

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.

  • -k, --key: The encoding key for the .tlt model.

  • model_path: The .tlt model.

Optional Argument

  • -g, --num_gpus: The number of GPUs to be used for evaluation. The default value is 1.

Here’s an example of using the SegFormer evaluation command:

tao segformer evaluate -e $DEFAULT_SPEC -k $YOUR_KEY model_path=$TRAINED_TLT_MODEL output_dir=$OUTPUT_FOLDER

The evaluation prints per-class and summary metrics similar to the following:

+------------+-------+-------+
| Class      | IoU   | Acc   |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:
+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
...


Use the following command to run SegFormer inference with the .tlt model:

tao segformer inference -e <experiment_spec> -k <key> model_path=<inference model> output_dir=<path to output file>

The output mask PNG images with class IDs are saved in vis_tao. The overlaid mask images are saved in mask_tao.

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up inference

  • -k, --key: The encoding key for the .tlt model

  • model_path: The .tlt model to perform inference with

  • output_dir: The path where the inference masks and the mask-overlaid images are saved. Inference creates two directories under this path.

Optional Argument

  • -g, --num_gpus: The number of GPUs to be used for inference. The default value is 1.

Here’s an example of using the SegFormer inference command:

tao segformer inference -e $DEFAULT_SPEC -k $KEY model_path=$TRAINED_TLT_MODEL output_dir=$OUTPUT_FOLDER


Use the following command to export the model.

tao segformer export [-h] -e <experiment spec file> -k <key> model_path=<trained tlt model to be exported> output_file=<etlt path>

Required Arguments

  • -e, --experiment_spec: The path to an experiment spec file

  • -k, --key: A user-specific encoding key to save or load a .tlt model

  • model_path: The .tlt model to be exported

  • output_file: The path where the exported .etlt file is stored

Sample Usage

The following is an example export command:

tao segformer export -e /path/to/spec.yaml -k $YOUR_KEY model_path=/path/to/model.tlt output_file=/path/to/model.etlt


For deployment, refer to the TAO Deploy documentation for SegFormer.

Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.

© Copyright 2022, NVIDIA.. Last updated on Mar 23, 2023.