SegFormer
SegFormer is an NVIDIA-developed semantic-segmentation model that is included in the TAO Toolkit. SegFormer supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao segformer <sub_task> <args_per_subtask>
where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.
Configuration for Custom Dataset
In this documentation, we show example configuration and commands for training on the ISBI dataset (ISBI Challenge: Segmentation of neuronal structures in EM stacks), a binary segmentation dataset that contains grayscale images. For more details, refer to the example notebook in TAO Computer Vision samples. Because the images are grayscale, input_type is set to grayscale. For RGB input images, set input_type to rgb instead of grayscale. Configure the img_norm_cfg mean and standard deviation based on your input dataset.
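One way to derive the img_norm_cfg values from your own data is to compute the per-channel mean and standard deviation over all training pixels on the 0-255 scale. The sketch below assumes the images are already decoded into nested lists of pixel tuples (loading from disk is left out), and the helper names are hypothetical, not part of TAO. It also shows the usual meaning of the two fields, normalizing a value as (value - mean) / std; with mean = std = 127.5, as in the ISBI spec, 0 maps to -1.0 and 255 maps to 1.0.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical in-memory layout: an image is a list of rows, a row is a list of
# pixels, and a pixel is a tuple of channel values on the 0-255 scale
# (a 1-tuple for grayscale, a 3-tuple for RGB).
Image = List[List[Tuple[float, ...]]]


def channel_stats(images: Sequence[Image]) -> Tuple[List[float], List[float]]:
    """Return the per-channel (mean, std) over every pixel of every image."""
    sums: List[float] = []
    sq_sums: List[float] = []
    count = 0
    for image in images:
        for row in image:
            for pixel in row:
                if not sums:  # size the accumulators from the first pixel
                    sums = [0.0] * len(pixel)
                    sq_sums = [0.0] * len(pixel)
                for c, value in enumerate(pixel):
                    sums[c] += value
                    sq_sums[c] += value * value
                count += 1
    if count == 0:
        raise ValueError("no pixels found")
    means = [s / count for s in sums]
    # Population std: sqrt(E[x^2] - E[x]^2), clamped at 0 to absorb rounding error.
    stds = [math.sqrt(max(sq / count - m * m, 0.0)) for sq, m in zip(sq_sums, means)]
    return means, stds


def normalize(value: float, mean: float, std: float) -> float:
    """Normalize one channel value as (value - mean) / std."""
    return (value - mean) / std


if __name__ == "__main__":
    # Two tiny 1x2 grayscale "images" whose pixels are either 0 or 255.
    images = [[[(0,), (255,)]], [[(0,), (255,)]]]
    mean, std = channel_stats(images)
    print(mean, std)  # [127.5] [127.5]
    print(normalize(0, mean[0], std[0]), normalize(255, mean[0], std[0]))  # -1.0 1.0
```

For a grayscale dataset, the ISBI spec repeats the single value three times in img_norm_cfg. For very large datasets, a two-pass or Welford-style computation is numerically more robust than the sum-of-squares shortcut used here.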
Here is an example spec file for training a SegFormer model with an mit_b5 backbone on the ISBI dataset.
exp_config:
  manual_seed: 49
distributed: True
train_config:
  runner:
    max_iters: 4000
  checkpoint_config:
    interval: 500
  logging:
    interval: 100
  resume_training_checkpoint_path: /path/to/checkpoint_to_resume
  sf_optim:
    lr: 0.00006
  validate: True
  validation_config:
    interval: 250
dataset_config:
  train_img_dirs:
    - /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/train
  train_ann_dirs:
    - /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/train
  val_img_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/val
  val_ann_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/val
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  input_type: "grayscale"
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  val_pipeline:
    multi_scale:
      - 2048
      - 512
  train_pipeline:
    augmentation_config:
      RandomCrop:
        crop_size:
          - 512
          - 512
        cat_max_ratio: 0.75
      Resize:
        img_scale:
          - 1024
          - 512
        ratio_range:
          - 0.5
          - 2.0
      RandomFlip:
        prob: 0.5
      colorAug:
        type: PhotoMetricDistortion
    Pad:
      size_ht: 512
      size_wd: 512
      pad_val: 0
      seg_pad_val: 255
  repeat_data_times: 500
  batch_size_per_gpu: 4
  workers_per_gpu: 1
model_config:
  pretrained: /path/to/your-pretrained-backbone-model
  backbone:
    type: "mit_b5"
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 300
  with_box_refine: True
  dropout_ratio: 0.3
output_dir: /path/to/experiment_results
| Parameter | Data Type | Default | Description |
| exp_config | dict config | – | The configuration of the experiment, detailed in exp_config |
| model_config | dict config | – | The configuration of the model architecture, detailed in model_config |
| dataset_config | dict config | – | The configuration for the dataset, detailed in dataset_config |
| train_config | dict config | – | The configuration for training parameters, detailed in train_config |
| output_dir | string | – | The path to save the model experiment log outputs and model checkpoints |
| runner | dict config | None | Contains the following configurable parameter: max_iters (int, default 100), the maximum number of training iterations |
| checkpoint_config | dict config | None | Contains the following configurable parameter: interval (int, default 1), the interval in iterations at which checkpoints are saved |
| logging | dict config | None | Contains the following configurable parameter: interval (int, default 10), the interval in iterations at which training logs are written |
| sf_optim | dict config | None | Contains the configurable parameters for the SegFormer optimizer, detailed in sf_optim |
| validation_config | dict config | None | Contains the following configurable parameter: interval (int), the interval in iterations at which validation runs during training |
| validate | bool | False | A flag that enables validation during training |
sf_optim
sf_optim:
  lr: 0.00006
  betas:
    - 0.0
    - 0.999
  paramwise_cfg:
    pos_block:
      decay_mult: 0.0
    norm:
      decay_mult: 0.0
    head:
      lr_mult: 10.0
  weight_decay: 5e-4
| Parameter | Datatype | Default | Description | Supported Values |
| lr | float | 0.00006 | The learning rate | >=0.0 |
| betas | List[float] | [0.0, 0.9] | The beta parameters in the Adam optimizer | >=0.0 |
| paramwise_cfg | dict config | None | Configuration parameters for the Adam optimizer, with the following Dict entries: pos_block: {"decay_mult": 0.0}; norm: {"decay_mult": 0.0}; head: {"lr_mult": 10.0} | >=0.0 |
| weight_decay | float | 5e-4 | The weight_decay hyperparameter for regularization | >=0.0 |
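The paramwise_cfg entries scale the base lr and weight_decay for selected parameter groups: lr_mult multiplies the learning rate and decay_mult multiplies the weight decay. The sketch below resolves the effective values for a given parameter name, assuming mmcv-style custom_keys matching (a key applies when it is a substring of the parameter name, longer keys are checked first, and only the first match is used). The matching rule, the helper name, and the example parameter names are illustrative assumptions, not documented TAO behavior.

```python
from typing import Dict, Tuple

# Values copied from the sf_optim example above.
BASE_LR = 0.00006
BASE_WEIGHT_DECAY = 5e-4
PARAMWISE_CFG: Dict[str, Dict[str, float]] = {
    "pos_block": {"decay_mult": 0.0},
    "norm": {"decay_mult": 0.0},
    "head": {"lr_mult": 10.0},
}


def effective_lr_and_decay(param_name: str) -> Tuple[float, float]:
    """Return (lr, weight_decay) for one parameter under assumed mmcv-style matching."""
    # Sort keys alphabetically, then by length (longest first). Python's sort is
    # stable, so keys of equal length stay in alphabetical order.
    keys = sorted(sorted(PARAMWISE_CFG), key=len, reverse=True)
    for key in keys:
        if key in param_name:  # substring match on the parameter name
            cfg = PARAMWISE_CFG[key]
            return (BASE_LR * cfg.get("lr_mult", 1.0),
                    BASE_WEIGHT_DECAY * cfg.get("decay_mult", 1.0))
    return BASE_LR, BASE_WEIGHT_DECAY  # no key matched: base values


if __name__ == "__main__":
    # Hypothetical parameter names, chosen only to exercise each branch.
    for name in ("backbone.block1.0.attn.q.weight",
                 "backbone.norm1.weight",
                 "decode_head.linear_pred.weight"):
        print(name, effective_lr_and_decay(name))
```

Under these assumptions, decoder-head parameters train with 10x the base learning rate, and normalization parameters receive no weight decay.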
The exp_config parameter defines the hyperparameters of the experiment.
exp_config:
  manual_seed: 49
| Parameter | Datatype | Default | Description | Supported Values |
| manual_seed | unsigned int | 49 | The random seed to make the training deterministic | >0 |
The following example model_config provides options to change the SegFormer architecture for training.
model_config:
  pretrained: /path/to/pretrained_mit_b5.pth
  backbone:
    type: "mit_b5"
  decode_head:
    decoder_params:
      embed_dims: 768
    align_corners: False
    dropout_ratio: 0.1
The following example model_config is used during SegFormer evaluation/inference.
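| Parameter | Datatype | Default | Description | Supported Values |
| pretrained | string | None | The optional path to the pretrained backbone file | string to the path |
| backbone | Dict | None | Contains the following configurable parameter: type (string) | mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5 |
| decode_head | Dict | None | Contains the following configurable parameters: decoder_params (Dict, default None), which contains embed_dims (int, default 768); align_corners (Bool, default False); dropout_ratio (Float, default 0.1) | embed_dims: 256, 512, 768; align_corners: True, False; dropout_ratio: >=0.0 |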
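The supported values in the table above can be encoded in a small pre-flight check. This is a hypothetical helper, not part of TAO, that assumes the spec has already been parsed into Python dictionaries (for example, with a YAML loader); missing optional fields fall back to the defaults listed in the table.

```python
from typing import Any, Dict, List

# Supported values copied from the model_config table.
SUPPORTED_BACKBONES = {"mit_b0", "mit_b1", "mit_b2", "mit_b3", "mit_b4", "mit_b5"}
SUPPORTED_EMBED_DIMS = {256, 512, 768}


def check_model_config(model_config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed model_config (empty if none)."""
    problems: List[str] = []
    backbone_type = model_config.get("backbone", {}).get("type")
    if backbone_type not in SUPPORTED_BACKBONES:
        problems.append(f"unsupported backbone type: {backbone_type!r}")
    head = model_config.get("decode_head", {})
    # Table defaults: embed_dims 768, align_corners False, dropout_ratio 0.1.
    embed_dims = head.get("decoder_params", {}).get("embed_dims", 768)
    if embed_dims not in SUPPORTED_EMBED_DIMS:
        problems.append(f"unsupported embed_dims: {embed_dims}")
    if not isinstance(head.get("align_corners", False), bool):
        problems.append("align_corners must be True or False")
    if head.get("dropout_ratio", 0.1) < 0.0:
        problems.append("dropout_ratio must be >= 0.0")
    return problems
```

For the training example above, parsed into {"backbone": {"type": "mit_b5"}, "decode_head": {...}}, the check returns an empty list.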
The dataset_config parameter defines the dataset source, training batch size, and augmentation. An example dataset_config is provided below.
dataset_config:
  train_img_dirs:
    - /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/train
  train_ann_dirs:
    - /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/train
  val_img_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/val
  val_ann_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/val
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  input_type: "grayscale"
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  train_pipeline:
    Pad:
      size_ht: 512
      size_wd: 512
      pad_val: 0
      seg_pad_val: 255
    augmentation_config:
      RandomCrop:
        crop_size:
          - 512
          - 512
        cat_max_ratio: 0.75
      Resize:
        img_scale:
          - 1024
          - 512
        ratio_range:
          - 0.5
          - 2.0
      RandomFlip:
        prob: 0.5
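The palette ties each label_id in the PNG masks to a class name and an rgb triple. A minimal sketch of using those entries to colorize a class-ID mask for viewing, assuming for illustration that rgb is the display color of each class and that the mask is a nested list of label IDs; the helper names are hypothetical, and TAO's own visualization code may differ.

```python
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]

# The palette from the ISBI example above, as it would look after YAML parsing.
PALETTE = [
    {"seg_class": "foreground", "rgb": [0, 0, 0], "label_id": 0, "mapping_class": "foreground"},
    {"seg_class": "background", "rgb": [255, 255, 255], "label_id": 1, "mapping_class": "background"},
]


def label_to_color(palette: Sequence[dict]) -> Dict[int, RGB]:
    """Build a label_id -> (r, g, b) lookup, rejecting duplicate label IDs."""
    lookup: Dict[int, RGB] = {}
    for entry in palette:
        label_id = entry["label_id"]
        if label_id in lookup:
            raise ValueError(f"duplicate label_id {label_id}")
        r, g, b = entry["rgb"]
        lookup[label_id] = (r, g, b)
    return lookup


def colorize(mask: List[List[int]], palette: Sequence[dict]) -> List[List[RGB]]:
    """Replace every label ID in a 2-D mask with its palette color."""
    lookup = label_to_color(palette)
    return [[lookup[label] for label in row] for row in mask]


if __name__ == "__main__":
    print(colorize([[0, 1], [1, 0]], PALETTE))
    # [[(0, 0, 0), (255, 255, 255)], [(255, 255, 255), (0, 0, 0)]]
```

Rejecting duplicate label_id values early makes a copy-and-paste mistake in the palette fail loudly instead of silently overwriting a class color.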
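| Parameter | Datatype | Default | Description | Supported Values |
| train_img_dirs | list | – | A list of paths to image directories | List of strings |
| train_ann_dirs | list | – | A list of paths to PNG mask directories | List of strings |
| val_img_dir | str | – | The path to the validation image directory | string |
| val_ann_dir | str | – | The path to the validation PNG mask directory | string |
| img_norm_cfg | Dict | None | The image normalization config, which contains the following parameters: mean (List[float], default [123.675, 116.28, 103.53]); std (List[float], default [58.395, 57.12, 57.375]); to_rgb (bool, default True) | mean: >=0, <=255; std: >=0.0; to_rgb: True, False |
| input_type | String | "rgb" | Whether the input type is RGB or grayscale | "rgb", "grayscale" |
| palette | List[Dict] | None | The list of class entries; each entry contains seg_class (string, default background), mapping_class (string, default background), label_id (int, default 0), and rgb (List[int], default [255, 255, 255]) | seg_class, mapping_class: string; label_id: >=0; rgb: >=0, <=255 |
| batch_size_per_gpu | unsigned int | 32 | The batch size for training and validation | >0 |
| workers_per_gpu | unsigned int | 8 | The number of parallel workers processing data | >0 |
| train_pipeline | Dict Config | None | The parameters to define the augmentation method: augmentation_config (detailed in augmentation_config) and Pad, which contains size_ht (default 1024), size_wd (default 1024), pad_val (default 0), and seg_pad_val (default 255) | |
| val_pipeline | Dict | – | The validation config, which contains the parameters for validation during training | >=0 |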
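The Pad entry of train_pipeline brings the image and mask up to size_ht x size_wd, filling the image with pad_val and the mask with seg_pad_val. A minimal sketch on nested lists, assuming padding is added at the bottom and right edges (the default of mmcv's impad, which mmseg-style pipelines use) and that inputs are never larger than the target size; the helper names are hypothetical, and the defaults mirror the example spec.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def pad_2d(grid: List[List[T]], size_ht: int, size_wd: int, value: T) -> List[List[T]]:
    """Pad a 2-D grid on the bottom and right edges to size_ht x size_wd."""
    height = len(grid)
    width = len(grid[0]) if grid else 0
    if height > size_ht or width > size_wd:
        raise ValueError("input is larger than the pad size")
    padded = [list(row) + [value] * (size_wd - width) for row in grid]
    padded += [[value] * size_wd for _ in range(size_ht - height)]
    return padded


def pad_pair(image: List[List[int]], mask: List[List[int]],
             size_ht: int = 512, size_wd: int = 512,
             pad_val: int = 0, seg_pad_val: int = 255) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad image and mask to the same size, each with its own fill value."""
    return (pad_2d(image, size_ht, size_wd, pad_val),
            pad_2d(mask, size_ht, size_wd, seg_pad_val))


if __name__ == "__main__":
    img, msk = pad_pair([[7, 8]], [[1, 0]], size_ht=2, size_wd=3)
    print(img)  # [[7, 8, 0], [0, 0, 0]]
    print(msk)  # [[1, 0, 255], [255, 255, 255]]
```

Filling the mask with a value outside the class IDs (255 here) keeps padded pixels distinguishable from real labels; 255 is also the default ignore index in mmseg-style pipelines.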
augmentation_config
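| Parameter | Datatype | Default | Description | Supported Values |
| RandomCrop | Dict | None | Contains crop_size (List[int], default [1024, 1024]) and cat_max_ratio (Float, default 0.75) | crop_size: 0 < h, w <= img_ht, img_wd; cat_max_ratio: >= 0.0 |
| Resize | Dict | None | Contains ratio_range (default [0.5, 2.0]) and keep_ratio (Bool, default True) | >=0; ratio_range: >=0.0; keep_ratio: True, False |
| RandomFlip | Dict | None | Contains prob (default 0.5) | >=0.0 |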
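cat_max_ratio limits how much of a random crop a single class may occupy. The sketch below follows the retry scheme of mmseg's RandomCrop, which the TAO implementation is assumed to resemble: sample a window; if it contains fewer than two classes or the largest class covers at least cat_max_ratio of the labeled pixels (ignoring the 255 ignore value), resample, for up to 10 attempts. The mask is a nested list, the function names are hypothetical, and the returned window would then be applied to both the image and the mask.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (y1, y2, x1, x2), end-exclusive


def sample_bbox(height: int, width: int, crop_h: int, crop_w: int,
                rng: random.Random) -> BBox:
    """Pick a random crop window; the offset is 0 along a side shorter than the crop."""
    off_h = rng.randint(0, max(height - crop_h, 0))  # randint is inclusive on both ends
    off_w = rng.randint(0, max(width - crop_w, 0))
    return off_h, off_h + crop_h, off_w, off_w + crop_w


def random_crop_bbox(mask: List[List[int]], crop_size: Tuple[int, int] = (512, 512),
                     cat_max_ratio: float = 0.75, ignore_index: int = 255,
                     rng: Optional[random.Random] = None) -> BBox:
    """Choose a crop window, resampling windows dominated by a single class."""
    rng = rng or random.Random()
    height, width = len(mask), len(mask[0])
    crop_h, crop_w = crop_size
    bbox = sample_bbox(height, width, crop_h, crop_w, rng)
    if cat_max_ratio < 1.0:
        for _ in range(10):  # retry budget assumed to match mmseg's RandomCrop
            y1, y2, x1, x2 = bbox
            # Count labels inside the window; slicing clips windows that overrun the mask.
            counts = Counter(v for row in mask[y1:y2] for v in row[x1:x2] if v != ignore_index)
            if len(counts) > 1 and max(counts.values()) / sum(counts.values()) < cat_max_ratio:
                break  # at least two classes and none dominates: accept this window
            bbox = sample_bbox(height, width, crop_h, crop_w, rng)
    return bbox
```

With the ISBI settings (crop_size 512 x 512, cat_max_ratio 0.75), a crop that is, for example, 90% background is resampled, which biases training crops toward regions that contain both classes.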
Use the following command to run SegFormer training:
tao segformer train -e <experiment_spec_file>
                    -r <results_dir>
                    -k <key>
                    -g <num_gpus>
                    [resume_training_checkpoint_path=<absolute path to *.tlt checkpoint>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
-r, --results_dir: The path to a folder where the experiment outputs should be written.
-k, --key: The user-specific encoding key to save or load a .tlt model.
Optional Arguments
resume_training_checkpoint_path: The path to a checkpoint to continue training from.
-g, --num_gpus: The number of GPUs to be used for training. The default value is 1.
Here’s an example of using the SegFormer training command:
tao segformer train -e $DEFAULT_SPEC -r $RESULTS_DIR -k $YOUR_KEY -g $NUM_GPUs
Here is an example spec file for evaluation and inference with a trained SegFormer model.
exp_config:
  manual_seed: 49
distributed: True
model_config:
  backbone:
    type: "mit_b5"
dataset_config:
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
  test_pipeline:
    multi_scale:
      - 2048
      - 512
  input_type: "grayscale"
  data_root: /tlt-pytorch
  test_img_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/images/test
  test_ann_dir: /home/projects1_metropolis/tmp/subha/unet/data/isbi/masks/test
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  batch_size_per_gpu: 4
  workers_per_gpu: 1
The evaluation metric for SegFormer is the mean intersection over union (mIoU). For more details on the mean IoU metric, refer to meanIOU.
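As a reference for reading the evaluation output, the sketch below computes the per-class IoU and Acc columns and the mIoU, mAcc, and aAcc summary values from a pixel confusion matrix using the standard definitions, IoU = TP / (TP + FP + FN) and Acc = TP / (TP + FN). It follows those textbook formulas rather than TAO's source code, and the function names are hypothetical. Classes that never appear in either the ground truth or the predictions get NaN and are skipped in the means.

```python
import math
from typing import Dict, List


def _nanmean(values: List[float]) -> float:
    """Mean of the values that are not NaN (NaN if none remain)."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else float("nan")


def segmentation_metrics(confusion: List[List[int]]) -> Dict[str, object]:
    """Per-class IoU/Acc plus mIoU, mAcc, and aAcc from a square confusion matrix.

    confusion[gt][pred] counts pixels of ground-truth class gt predicted as pred.
    """
    n = len(confusion)
    ious: List[float] = []
    accs: List[float] = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # class-k pixels predicted as another class
        fp = sum(confusion[r][k] for r in range(n)) - tp  # other pixels predicted as class k
        ious.append(tp / (tp + fp + fn) if tp + fp + fn else float("nan"))
        accs.append(tp / (tp + fn) if tp + fn else float("nan"))
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    return {
        "IoU": ious,
        "Acc": accs,
        "mIoU": _nanmean(ious),
        "mAcc": _nanmean(accs),
        "aAcc": correct / total,
    }


if __name__ == "__main__":
    print(segmentation_metrics([[3, 1], [2, 4]]))
```

The log reports these values as percentages. For example, mIoU is the mean of the per-class IoU values, so in the sample output (37.81 + 83.81) / 2 = 60.81.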
Use the following command to run SegFormer evaluation:
tao segformer evaluate -e <experiment_spec>
                       -k <key>
                       model_path=<inference model>
                       output_dir=<path to output directory>
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
-k, --key: The encoding key for the .tlt model.
model_path: The .tlt model.
Optional Argument
-g, --num_gpus: The number of GPUs to be used for evaluation. The default value is 1.
Here’s an example of using the SegFormer evaluation command:
tao segformer evaluate -e $DEFAULT_SPEC -k $YOUR_KEY model_path=$TRAINED_TLT_MODEL output_dir=$OUTPUT_FOLDER
The evaluation log reports per-class results followed by a summary, for example:
+------------+-------+-------+
| Class      | IoU   | Acc   |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:
+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
...
Use the following command to run inference on SegFormer with the .tlt model.
tao segformer inference -e <experiment_spec>
                        -k <key>
                        model_path=<inference model>
                        output_dir=<path to output directory>
The output mask PNG images with class IDs are saved in vis_tao. The overlaid mask images are saved in mask_tao.
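The overlaid images combine each input image with its colorized mask. The exact blending TAO uses is not described here; a common approach is per-pixel alpha blending, sketched below on RGB tuples with a hypothetical default alpha of 0.5. The function name and the alpha value are assumptions for illustration only.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def overlay(image: List[List[RGB]], color_mask: List[List[RGB]],
            alpha: float = 0.5) -> List[List[RGB]]:
    """Blend each pixel as (1 - alpha) * image + alpha * mask_color, rounded to int."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [
        [tuple(round((1 - alpha) * p + alpha * m) for p, m in zip(px, mx))
         for px, mx in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, color_mask)
    ]
```

With alpha = 0 the image is returned unchanged and with alpha = 1 only the mask colors remain; values in between trade off how visible the underlying image is.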
Required Arguments
-e, --experiment_spec: The experiment spec file to set up inference.
-k, --key: The encoding key for the .tlt model.
model_path: The .tlt model to perform inference with.
output_dir: The path to save the inference masks and mask-overlaid images to. Inference creates two directories (vis_tao and mask_tao).
Optional Argument
-g, --num_gpus: The number of GPUs to be used for inference. The default value is 1.
Here’s an example of using the SegFormer inference command:
tao segformer inference -e $DEFAULT_SPEC -k $KEY model_path=$TRAINED_TLT_MODEL output_dir=$OUTPUT_FOLDER
Use the following command to export the model.
tao segformer export [-h] -e <experiment spec file>
                     -k <key>
                     model_path=<trained tlt model to be exported>
                     output_file=<etlt path>
Required Arguments
-e, --experiment_spec: The path to an experiment spec file.
-k, --key: A user-specific encoding key to save or load a .tlt model.
model_path: The .tlt model to be exported.
output_file: The .etlt file to be stored.
Sample Usage
The following is an example export command:
tao segformer export -e /path/to/spec.yaml -k $YOUR_KEY model_path=/path/to/model.tlt output_file=/path/to/model.etlt
For deployment, refer to the TAO Deploy documentation.
Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.