NVIDIA TAO Toolkit v5.2.0

SegFormer

SegFormer is an NVIDIA-developed semantic-segmentation model that is included in the TAO Toolkit. SegFormer supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:


tao model segformer <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.
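
As a quick sanity check before training, you can verify that every image has a matching mask file. The following short Python snippet is an illustrative helper only (not part of the TAO Toolkit); the directory paths and the assumption that a mask shares its image's base filename are placeholders, so follow the Data Annotation Format page for the authoritative layout.

# Hypothetical sanity check: confirm every training image has a matching PNG mask.
# Paths and the same-basename convention are assumptions for illustration only.
from pathlib import Path

def check_pairs(img_dir: str, ann_dir: str) -> None:
    image_stems = {p.stem for p in Path(img_dir).iterdir()
                   if p.suffix.lower() in {".png", ".jpg", ".jpeg"}}
    mask_stems = {p.stem for p in Path(ann_dir).glob("*.png")}
    missing = sorted(image_stems - mask_stems)
    if missing:
        print(f"{len(missing)} images have no mask, e.g. {missing[:5]}")
    else:
        print(f"All {len(image_stems)} images have a matching mask.")

check_pairs("/data/images/train", "/data/masks/train")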

Configuration for Custom Dataset

  • This documentation shows example configurations and commands for training on the ISBI dataset (ISBI Challenge: Segmentation of neuronal structures in EM stacks), a binary-segmentation dataset of grayscale images. For more details, refer to the example notebook in the TAO Computer Vision samples. Because the images are grayscale, input_type is set to grayscale.

  • For RGB input images, input_type should be set to rgb instead of grayscale.

  • Configure the img_norm_cfg mean and standard deviation based on your input dataset (see the sketch following this list).
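
The snippet below is a minimal sketch (outside of TAO, using Pillow and NumPy) of how per-channel mean and standard deviation could be estimated from a training image folder so the values can be pasted into img_norm_cfg. The folder path and the *.png pattern are assumptions.

# Illustrative only: estimate per-channel mean/std (0-255 scale) for img_norm_cfg.
from pathlib import Path

import numpy as np
from PIL import Image

def channel_stats(img_dir: str):
    pixel_sum = np.zeros(3)
    pixel_sq_sum = np.zeros(3)
    n_pixels = 0
    for path in Path(img_dir).glob("*.png"):
        arr = np.asarray(Image.open(path).convert("RGB"), dtype=np.float64)  # H x W x 3
        pixel_sum += arr.sum(axis=(0, 1))
        pixel_sq_sum += (arr ** 2).sum(axis=(0, 1))
        n_pixels += arr.shape[0] * arr.shape[1]
    mean = pixel_sum / n_pixels
    std = np.sqrt(pixel_sq_sum / n_pixels - mean ** 2)
    return mean, std

mean, std = channel_stats("/data/images/train")
print("mean:", mean.round(2), "std:", std.round(2))  # paste these into img_norm_cfg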

Here is an example spec file for training a SegFormer model with an mit_b5 backbone on the ISBI dataset.


train:
  exp_config:
    manual_seed: 49
  checkpoint_interval: 200
  logging_interval: 50
  max_iters: 1000
  resume_training_checkpoint_path: null
  validate: True
  validation_interval: 500
  trainer:
    find_unused_parameters: True
    sf_optim:
      lr: 0.00006
model:
  input_height: 512
  input_width: 512
  pretrained_model_path: null
  backbone:
    type: "mit_b5"
dataset:
  data_root: /tlt-pytorch
  input_type: "grayscale"
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  train_dataset:
    img_dir:
      - /data/images/train
    ann_dir:
      - /data/masks/train
    pipeline:
      augmentation_config:
        random_crop:
          cat_max_ratio: 0.75
        resize:
          ratio_range:
            - 0.5
            - 2.0
        random_flip:
          prob: 0.5
  val_dataset:
    img_dir: /data/images/val
    ann_dir: /data/masks/val
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  repeat_data_times: 500
  batch_size: 4
  workers_per_gpu: 1

The training experiment specification consists of three main components:

  • train

  • dataset

  • model

The train config contains the parameters related to training. They are described as follows:

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| exp_config | Dict | None | The exp_config dict contains manual_seed (int, default 49): the random seed to make the training deterministic | |
| max_iters | int | 10 | The maximum number of iterations/steps for which the training should be conducted | |
| checkpoint_interval | int | 1 | The number of steps at which the checkpoint needs to be saved | |
| logging_interval | int | 10 | The number of steps at which the experiment logs need to be saved. The logs are saved in the logs directory. | |
| resume_training_checkpoint_path | str | None | The path to the checkpoint for resuming training | |
| validate | bool | False | A flag to enable validation during training | |
| validation_interval | int | | The interval (in iterations) at which validation should be performed during training. Note that the validation interval should be at least 1 less than the checkpoint interval to prevent status overriding. | |
| trainer | Dict | None | Parameters required by the MMSeg trainer: find_unused_parameters (bool, default False): sets this parameter in DDP (refer to the PyTorch DDP documentation); sf_optim (Dict, default None): the SegFormer optimizer config (see the sf_optim section below); lr_config (Dict, default None): the SegFormer LR config (see the lr_config section below) | |




sf_optim


sf_optim:
  lr: 0.00006
  betas:
    - 0.0
    - 0.999
  paramwise_cfg:
    pos_block:
      decay_mult: 0.0
    norm:
      decay_mult: 0.0
    head:
      lr_mult: 10.0
  weight_decay: 5e-4

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| lr | float | 0.00006 | The learning rate | >=0.0 |
| betas | List[float] | [0.0, 0.9] | The beta parameters in the Adam optimizer | >=0.0 |
| paramwise_cfg | Dict | None | Parameter-wise configuration for the Adam optimizer: pos_block.decay_mult (float, default 0.0); norm.decay_mult (float, default 0.0); head.lr_mult (float, default 10.0) | >=0.0 |
| weight_decay | float | 5e-4 | The weight_decay hyper-parameter for regularization | >=0.0 |

lr_config


lr_config:
  warmup_iters: 1500
  warmup_ratio: 1e-6
  power: 1.0
  min_lr: 0.0

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| warmup_iters | int | 1500 | The number of iterations or epochs that warmup lasts | >=0.0 |
| warmup_ratio | float | 1e-6 | The LR used at the beginning of warmup, equal to warmup_ratio * initial_lr | >=0.0 |
| power | float | 1.0 | The power to which the multiplying coefficient is raised | >=0.0 |
| min_lr | float | 0.0 | The minimum LR to start the LR scheduler | >=0.0 |
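
For intuition, the sketch below shows how a polynomial-decay schedule with linear warmup of this form is commonly computed from the parameters above. It is only an approximation for illustration; the exact schedule is implemented inside MMSegmentation and may differ in detail.

# Rough illustration of a linear-warmup + polynomial-decay LR schedule.
# Parameter names mirror lr_config and sf_optim above; the formulas are an
# assumption for illustration, not the exact MMSegmentation implementation.
def lr_at_iter(i, base_lr=6e-5, max_iters=1000,
               warmup_iters=1500, warmup_ratio=1e-6, power=1.0, min_lr=0.0):
    if i < warmup_iters:
        # ramp linearly from warmup_ratio * base_lr up to base_lr
        k = (1.0 - i / warmup_iters) * (1.0 - warmup_ratio)
        return base_lr * (1.0 - k)
    # decay polynomially from base_lr towards min_lr
    return (base_lr - min_lr) * (1.0 - i / max_iters) ** power + min_lr

for step in (0, 100, 500, 1000):
    print(step, f"{lr_at_iter(step):.2e}")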

The following example model config provides options to change the SegFormer architecture for training.


model:
  input_height: 512
  input_width: 512
  pretrained_model_path: null
  backbone:
    type: "mit_b5"

The same model config is also used during SegFormer evaluation and inference. Its parameters are described below.

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| pretrained_model_path | string | None | The optional path to the pretrained backbone file | string path |
| backbone | Dict | None | A dictionary containing the following configurable parameter: type (string): the name of the backbone to be used | mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5, fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid |
| decode_head | Dict | None | A dictionary containing the decoder parameters: decoder_params.embed_dims (int, default 768): the embedding dimensions; align_corners (bool, default False): if True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels; dropout_ratio (float, default 0.1): the dropout probability | embed_dims: 256, 512, 768; align_corners: True, False; dropout_ratio: >=0.0 |
| input_width | int | 512 | The input width of the model | >0 |
| input_height | int | 512 | The input height of the model | >0 |
The dataset parameter defines the dataset source, training batch size, and augmentation. An example dataset config is provided below.


dataset:
  data_root: /tlt-pytorch
  input_type: "grayscale"
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  train_dataset:
    img_dir:
      - /data/images/train
    ann_dir:
      - /data/masks/train
    pipeline:
      augmentation_config:
        random_crop:
          cat_max_ratio: 0.75
        resize:
          ratio_range:
            - 0.5
            - 2.0
        random_flip:
          prob: 0.5
  val_dataset:
    img_dir: /data/images/val
    ann_dir: /data/masks/val
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  repeat_data_times: 500
  batch_size: 4
  workers_per_gpu: 1

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| img_norm_cfg | Dict | mean: [123.675, 116.28, 103.53]; std: [58.395, 57.12, 57.375]; to_rgb: True | The image normalization config, which contains the following parameters: mean (List[float]): the mean to be subtracted for pre-processing; std (List[float]): the standard deviation to divide the image by; to_rgb (bool): whether to convert the input format from BGR to RGB | mean: >=0, <=255; std: >=0.0; to_rgb: True, False |
| input_type | String | "rgb" | Whether the input type is RGB or grayscale | "rgb", "grayscale" |
| palette | List[Dict] | seg_class: background; mapping_class: background; label_id: 0; rgb: [255, 255, 255] | The palette config: seg_class (string): the segmentation category; mapping_class (string): the category to group it with; label_id (int): the integer class ID; rgb (List[int]): the color to be overlaid for this class during inference | seg_class, mapping_class: string; label_id: >=0; rgb: >=0, <=255 |
| batch_size | unsigned int | 32 | The batch size for training and validation | >0 |
| workers_per_gpu | unsigned int | 8 | The number of parallel workers processing data | >0 |
| train_dataset | Dict | None | The parameters to define the training dataset: img_dir (str): the path to the images directory; ann_dir (str): the path to the PNG masks directory; pipeline.augmentation_config (Dict): the augmentation config details (refer to augmentation_config below); Pad (Dict): the padding augmentation config, with size_ht (int, default 1024): the height at which to pad the image/mask; size_wd (int, default 1024): the width at which to pad the image/mask; pad_val (int, default 0): the padding value for the input image; seg_pad_val (int, default 255): the padding value for the segmentation mask | |
| val_dataset | Dict | None | The validation config contains the following parameters for validation during training: img_dir (str): the path to the images directory; ann_dir (str): the path to the PNG masks directory; pipeline.multi_scale (List[int], default [2048, 1024]): the largest scale of the image | >=0 |
| test_dataset | Dict | None | The test config contains the following parameters for evaluation and inference: img_dir (str): the path to the images directory; ann_dir (str): the path to the PNG masks directory; pipeline.multi_scale (List[int], default [2048, 1024]): the largest scale of the image | >=0 |

augmentation_config

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| random_crop | Dict | crop_size: [1024, 1024]; cat_max_ratio: 0.75 | The random_crop config has the following parameters: crop_size (List[int]): the crop size for augmentation; cat_max_ratio (float) | crop_size: 0 < h, w <= img_ht, img_wd; cat_max_ratio: >=0.0 |
| resize | Dict | ratio_range: [0.5, 2.0]; keep_ratio: True | The resize config has the following configurable parameters: img_scale: the [height, width] scale to which the input image should be rescaled; ratio_range: a ratio is randomly sampled from this range and multiplied with img_scale to generate the sampled scale; keep_ratio (bool): whether to preserve the aspect ratio | img_scale: >=0; ratio_range: >=0.0; keep_ratio: True/False |
| random_flip | Dict | prob: 0.5 | The random_flip config contains the following parameter for flip augmentation: prob: the probability with which the image should be flipped | >=0.0 |

Use the following command to run SegFormer training:


tao model segformer train -e <experiment_spec_file> -r <results_dir> -g <num_gpus>

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

Optional Arguments

  • -g, --num_gpus: The number of GPUs to be used for training. The default value is 1.

Here’s an example of using the SegFormer training command:


tao model segformer train -e $DEFAULT_SPEC -r $RESULTS_DIR -g $NUM_GPUs

The evaluation metric of SegFormer is the mean IoU (mIoU). For more details about the mean IoU metric, refer to the meanIOU documentation.
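
For reference, the per-class IoU is the intersection of the predicted and ground-truth pixels of a class divided by their union, and mIoU is the average over classes. The snippet below is a minimal NumPy sketch of that computation (an illustration only, not the TAO evaluator).

# Minimal sketch of per-class IoU and mean IoU over label-ID masks (illustration only).
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                      # skip classes absent from both masks
            ious.append(intersection / union)
    return float(np.mean(ious))

pred = np.array([[0, 0, 1], [1, 1, 1]])    # toy predicted label-ID mask
gt   = np.array([[0, 1, 1], [1, 1, 0]])    # toy ground-truth label-ID mask
print(f"mIoU: {mean_iou(pred, gt, num_classes=2):.3f}")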

Use the following command to run SegFormer evaluation:


tao model segformer evaluate -e <experiment_spec> -g <num GPUs> evaluate.checkpoint=<evaluation model> results_dir=<path to output evaluation results>

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.

  • evaluate.checkpoint: The .pth model.

Optional Argument

  • -g, --num_gpus: The number of GPUs to be used for evaluation. The default value is 1.

A sample evaluation output, reporting per-class IoU and accuracy along with an overall summary, is shown below:


+------------+-------+-------+
| Class      | IoU   | Acc   |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
...

Here's an example of using the SegFormer evaluation command:

tao model segformer evaluate -e $DEFAULT_SPEC -g $NUM_GPUS evaluate.checkpoint=$TRAINED_PTH_MODEL results_dir=$PATH_TO_RESULTS_DIR

Use the following command to run inference on SegFormer with the .pth model:


tao model segformer inference -e <experiment_spec> inference.checkpoint=<inference model> results_dir=<path to output directory for inference>

The output mask PNG images with class IDs are saved in vis_tao. The overlaid mask images are saved in mask_tao.

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up inference

  • inference.checkpoint: The .pth model to perform inference with

  • results_dir: The path to save the inference masks and mask overlaid images to. Inference creates two directories.

Optional Argument

  • -g, --num_gpus: The number of GPUs to be used for inference. The default value is 1.

Here's an example of using the SegFormer inference command:


tao model segformer inference -e $DEFAULT_SPEC -g $NUM_GPUS inference.checkpoint=$TRAINED_PTH_MODEL results_dir=$OUTPUT_FOLDER
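
The rgb values from the palette section of the dataset config determine the colors used for the overlays. As a rough, hypothetical illustration (not the exact TAO inference code), a class-ID mask such as those written to vis_tao could be colorized and blended over its input image as follows; the file paths and the 0.5 blending factor are placeholders.

# Hypothetical overlay helper: colorize a label-ID mask with the spec-file palette
# and blend it over the original image. Paths and alpha are illustrative assumptions.
import numpy as np
from PIL import Image

PALETTE = {0: (0, 0, 0), 1: (255, 255, 255)}  # label_id -> rgb, as in the spec above

def overlay(image_path: str, mask_path: str, alpha: float = 0.5) -> Image.Image:
    img = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float32)
    mask = np.asarray(Image.open(mask_path))                 # H x W of label IDs
    color = np.zeros_like(img)
    for label_id, rgb in PALETTE.items():
        color[mask == label_id] = rgb
    blended = (1.0 - alpha) * img + alpha * color
    return Image.fromarray(blended.astype(np.uint8))

overlay("/data/images/val/0.png", "/results/vis_tao/0.png").save("overlay_0.png")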

Use the following command to export the model.


tao model segformer export [-h] -e <experiment spec file> results_dir=<path to results dir> export.checkpoint=<trained pth model to be exported> export.onnx_file=<onnx path>

Required Arguments

  • -e, --experiment_spec: The path to an experiment spec file

  • results_dir: The path where the logs for export will be saved

  • export.checkpoint: The .pth model to be exported

  • export.onnx_file: The path where the exported .onnx file will be stored

Sample Usage

The following is an example export command:


tao model segformer export -e /path/to/spec.yaml export.checkpoint=/path/to/model.pth export.onnx_file=/path/to/model.onnx results_dir=/path/to/export_result_dir
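
After export, you may want to verify the ONNX file outside of TAO. The snippet below is an optional, assumed workflow using onnxruntime (not a TAO command); the NCHW layout and the 512x512 size from the example spec above are assumptions, and the input/output names are read from the model itself.

# Optional sanity check of the exported ONNX model with onnxruntime (illustration only).
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("/path/to/model.onnx", providers=["CPUExecutionProvider"])
model_input = session.get_inputs()[0]
print("input:", model_input.name, model_input.shape)

# Assumed NCHW layout and the 512x512 input size from the example spec file.
dummy = np.random.rand(1, 3, 512, 512).astype(np.float32)
outputs = session.run(None, {model_input.name: dummy})
print("output shapes:", [np.asarray(o).shape for o in outputs])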

Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.
