SegFormer#
SegFormer is an NVIDIA-developed semantic-segmentation model that is included in TAO. SegFormer supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked from the TAO Launcher using the following convention on the command-line:
tao model segformer <sub_task> <args_per_subtask>
where args_per_subtask
are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
Data Input for SegFormer#
SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.
Creating Training Experiment Spec File#
Configuration for Custom Dataset#
This documentation shows an example configuration and commands for training on the ISBI dataset (ISBI Challenge: Segmentation of neuronal structures in EM stacks), which is used for binary segmentation and contains grayscale images. For more details, refer to the example notebook in the TAO Computer Vision samples. Because the images are grayscale, input_type is set to grayscale. For RGB input images, set input_type to rgb instead of grayscale. Configure the img_norm_cfg mean and standard deviation based on your input dataset.
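To make the role of img_norm_cfg concrete, the following is a minimal sketch of per-channel normalization. It assumes the usual (pixel - mean) / std convention; the function name normalize_pixel is hypothetical and is not part of TAO.

```python
def normalize_pixel(pixel, mean, std):
    """Normalize one pixel channel-wise: (value - mean) / std."""
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


# Values from the example spec: mean and std are both 127.5 per channel.
mean = [127.5, 127.5, 127.5]
std = [127.5, 127.5, 127.5]

darkest = normalize_pixel([0, 0, 0], mean, std)
brightest = normalize_pixel([255, 255, 255], mean, std)
print(darkest, brightest)
```

With these values, 8-bit pixel intensities in [0, 255] are mapped to the range [-1, 1]. A dataset with a different intensity distribution would normally use its own mean and standard deviation.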
Here is an example spec file for training a SegFormer model with an mit_b5 backbone on an ISBI dataset.
train:
  exp_config:
    manual_seed: 49
  checkpoint_interval: 200
  logging_interval: 50
  max_iters: 1000
  resume_training_checkpoint_path: null
  validate: True
  validation_interval: 500
  trainer:
    find_unused_parameters: True
    sf_optim:
      lr: 0.00006
model:
  input_height: 512
  input_width: 512
  pretrained_model_path: null
  backbone:
    type: "mit_b5"
dataset:
  data_root: /tlt-pytorch
  input_type: "grayscale"
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  train_dataset:
    img_dir:
      - /data/images/train
    ann_dir:
      - /data/masks/train
    pipeline:
      augmentation_config:
        random_crop:
          cat_max_ratio: 0.75
        resize:
          ratio_range:
            - 0.5
            - 2.0
        random_flip:
          prob: 0.5
  val_dataset:
    img_dir: /data/images/val
    ann_dir: /data/masks/val
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  repeat_data_times: 500
  batch_size: 4
  workers_per_gpu: 1
The SegFormer training experiment specification consists of three main components:
train
dataset
model
train#
The train config contains the parameters related to training. They are described as follows:
Parameter | Datatype | Default | Description | Supported Values
exp_config | Dict | None | Contains the experiment-level parameters listed below | –
exp_config.manual_seed | int | 49 | The random seed to make the training deterministic | –
max_iters | int | 10 | The maximum number of iterations (steps) for which training is run |
checkpoint_interval | int | 1 | The interval, in steps, at which a checkpoint is saved |
logging_interval | int | 10 | The interval, in steps, at which the experiment logs are saved. The logs are saved in the logs directory. |
resume_training_checkpoint_path | str | None | The path to the checkpoint for resuming training |
validate | bool | False | A flag to enable validation during training |
validation_interval | int | | The interval, in iterations, at which validation is performed during training. Note that the validation interval should be at least 1 less than the checkpoint interval to prevent status overriding. |
trainer | Dict | None | Contains the parameters required by the MMSeg trainer, listed below | –
trainer.find_unused_parameters | bool | False | Sets this parameter in DDP. For more information, refer to DDP_PyT. | –
trainer.sf_optim | Dict | None | The SegFormer optimizer config. For more information, refer to optimizer_spec. | –
trainer.lr_config | Dict | None | The SegFormer LR config. For more information, refer to creating_lr_config_sf. | –
sf_optim#
sf_optim:
  lr: 0.00006
  betas:
    - 0.0
    - 0.999
  paramwise_cfg:
    pos_block:
      decay_mult: 0.0
    norm:
      decay_mult: 0.0
    head:
      lr_mult: 10.0
  weight_decay: 5e-4
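Parameter | Datatype | Default | Description | Supported Values
lr | float | 0.00006 | The learning rate | >=0.0
betas | List[float] | [0.0, 0.9] | The beta parameters in the Adam optimizer | >=0.0
paramwise_cfg | Dict | None | Configuration parameters for the Adam optimizer, listed below | –
paramwise_cfg.pos_block | Dict | None | Settings for the pos_block parameters | –
paramwise_cfg.pos_block.decay_mult | float | 0.0 | The weight-decay multiplier for pos_block | >=0.0
paramwise_cfg.norm | Dict | None | Settings for the norm parameters | –
paramwise_cfg.norm.decay_mult | float | 0.0 | The weight-decay multiplier for norm | >=0.0
paramwise_cfg.head | Dict | None | Settings for the head parameters | –
paramwise_cfg.head.lr_mult | float | 10.0 | The learning-rate multiplier for head | >=0.0
weight_decay | float | 5e-4 | The weight_decay hyper-parameter for regularization | >=0.0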
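The effect of paramwise_cfg can be sketched as follows. This is an illustration only: it assumes that a paramwise_cfg key applies to every parameter whose name contains that key, as in MMCV-style optimizer constructors, and the function and parameter names are hypothetical.

```python
def effective_hyperparams(param_name, base_lr, base_weight_decay, paramwise_cfg):
    """Return (lr, weight_decay) for one parameter after applying multipliers.

    Assumption: a paramwise_cfg key applies when it is a substring of the
    parameter name; the first matching key wins.
    """
    lr, weight_decay = base_lr, base_weight_decay
    for key, mults in paramwise_cfg.items():
        if key in param_name:
            lr = base_lr * mults.get("lr_mult", 1.0)
            weight_decay = base_weight_decay * mults.get("decay_mult", 1.0)
            break
    return lr, weight_decay


paramwise_cfg = {
    "pos_block": {"decay_mult": 0.0},
    "norm": {"decay_mult": 0.0},
    "head": {"lr_mult": 10.0},
}

# Hypothetical parameter names, used only for illustration.
head_lr, head_wd = effective_hyperparams(
    "decode_head.conv.weight", 6e-5, 5e-4, paramwise_cfg)
norm_lr, norm_wd = effective_hyperparams(
    "backbone.norm1.weight", 6e-5, 5e-4, paramwise_cfg)
print(head_lr, head_wd, norm_lr, norm_wd)
```

Under these assumptions, the decode head trains with ten times the base learning rate, while normalization layers keep the base learning rate and are excluded from weight decay.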
lr_config#
lr_config:
  warmup_iters: 1500
  warmup_ratio: 1e-6
  power: 1.0
  min_lr: 0.0
Parameter | Datatype | Default | Description | Supported Values
warmup_iters | int | 1500 | The number of iterations or epochs that warmup lasts | >=0.0
warmup_ratio | float | 1e-6 | The LR used at the beginning of warmup is equal to warmup_ratio * initial_lr | >=0.0
power | float | 1.0 | The power to which the multiplying coefficient is raised | >=0.0
min_lr | float | 0.0 | The minimum LR for the LR scheduler | >=0.0
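The interaction of these four parameters can be sketched as a polynomial decay schedule with linear warmup. This is an assumption based on the MMCV-style "poly" policy rather than a statement of the exact TAO implementation; the function name and the max_iters value of 160000 are hypothetical.

```python
def poly_lr_with_warmup(cur_iter, max_iters, base_lr,
                        warmup_iters=1500, warmup_ratio=1e-6,
                        power=1.0, min_lr=0.0):
    """Learning rate at cur_iter under poly decay with linear warmup."""
    # Polynomial decay from base_lr down to min_lr over max_iters.
    coeff = (1.0 - cur_iter / max_iters) ** power
    regular_lr = (base_lr - min_lr) * coeff + min_lr
    if cur_iter >= warmup_iters:
        return regular_lr
    # Linear warmup: start at warmup_ratio * regular_lr and ramp up to regular_lr.
    k = (1.0 - cur_iter / warmup_iters) * (1.0 - warmup_ratio)
    return regular_lr * (1.0 - k)


for step in (0, 750, 1500, 80000, 160000):
    print(step, poly_lr_with_warmup(step, max_iters=160000, base_lr=6e-5))
```

In this sketch, the rate starts near warmup_ratio * base_lr, reaches the regular schedule after warmup_iters, and then decays toward min_lr at max_iters.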
model#
The following example model config provides options to change the SegFormer architecture for training.
model:
  input_height: 512
  input_width: 512
  pretrained_model_path: null
  backbone:
    type: "mit_b5"
The model config is also used during SegFormer evaluation and inference. Its parameters are described in the following table.
Parameter | Datatype | Default | Description | Supported Values
pretrained_model_path | string | None | The optional path to the pretrained backbone file | A string containing the path
backbone | Dict | None | A dictionary containing the backbone parameters listed below |
backbone.type | string | | The name of the backbone to be used | mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5, fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid
decode_head | Dict | None | A dictionary containing the decoder parameters listed below |
decode_head.decoder_params.embed_dims | int | 768 | The embedding dimensions | 256, 512, 768
decode_head.align_corners | Bool | False | If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels | True, False
decode_head.dropout_ratio | Float | 0.1 | The dropout probability with which neurons in the network are dropped | >=0.0
input_height | int | 512 | The input height of the model | >0
input_width | int | 512 | The input width of the model | >0
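The align_corners description can be illustrated with the coordinate mapping commonly used for bilinear resizing in PyTorch-style frameworks. The sketch below is an assumption about that convention, not TAO code, and the function name source_coordinate is hypothetical.

```python
def source_coordinate(dst_index, in_size, out_size, align_corners):
    """Map an output pixel index to a (fractional) input coordinate."""
    if align_corners:
        # Corner pixel centers coincide: 0 -> 0 and out_size - 1 -> in_size - 1.
        if out_size == 1:
            return 0.0
        return dst_index * (in_size - 1) / (out_size - 1)
    # Pixels are treated as areas, so the outer edges are aligned instead.
    return (dst_index + 0.5) * in_size / out_size - 0.5


# Upsampling a row of 4 pixels to 8 pixels.
last_aligned = source_coordinate(7, 4, 8, align_corners=True)
last_unaligned = source_coordinate(7, 4, 8, align_corners=False)
print(last_aligned, last_unaligned)
```

With align_corners set to True, the last output pixel samples exactly the last input pixel (coordinate 3.0). With False, it samples slightly beyond it (3.25), which an interpolation routine would typically clamp to the image border.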
dataset#
The dataset parameter defines the dataset source, training batch size, and augmentation. An example dataset config is provided below.
dataset:
  data_root: /tlt-pytorch
  input_type: "grayscale"
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  train_dataset:
    img_dir:
      - /data/images/train
    ann_dir:
      - /data/masks/train
    pipeline:
      augmentation_config:
        random_crop:
          cat_max_ratio: 0.75
        resize:
          ratio_range:
            - 0.5
            - 2.0
        random_flip:
          prob: 0.5
  val_dataset:
    img_dir: /data/images/val
    ann_dir: /data/masks/val
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  repeat_data_times: 500
  batch_size: 4
  workers_per_gpu: 1
Parameter | Datatype | Default | Description | Supported Values
img_norm_cfg | Dict | None | The image normalization config, which contains the parameters listed below |
img_norm_cfg.mean | List[float] | [123.675, 116.28, 103.53] | The mean to be subtracted for pre-processing | >=0, <=255
img_norm_cfg.std | List[float] | [58.395, 57.12, 57.375] | The standard deviation to divide the image by | >=0.0
img_norm_cfg.to_rgb | bool | True | Whether to convert the input format from BGR to RGB | True, False
input_type | String | "rgb" | Whether the input type is RGB or grayscale | "rgb", "grayscale"
palette | List[Dict] | None | The palette config, which contains the parameters listed below for each class |
palette.seg_class | string | background | The segmentation category | string
palette.mapping_class | string | background | The category to group it with | string
palette.label_id | int | 0 | The integer class ID | >=0
palette.rgb | List[int] | [255, 255, 255] | The color to be overlaid for this class during inference | >=0, <=255
batch_size | unsigned int | 32 | The batch size for training and validation | >0
workers_per_gpu | unsigned int | 8 | The number of parallel workers processing data | >0
train_dataset | dict config | None | The parameters that define the training dataset, listed below | Dict Config
train_dataset.img_dir | str | None | The path to the images directory |
train_dataset.ann_dir | str | None | The path to the PNG masks directory |
train_dataset.pipeline | dict config | None | The data pipeline config, which contains the parameters listed below |
train_dataset.pipeline.augmentation_config | dict config | | The augmentation config details (refer to augmentation_config for more information) |
train_dataset.pipeline.Pad | dict config | | The padding augmentation config, which contains the parameters listed below |
train_dataset.pipeline.Pad.size_ht | int | 1024 | The height to which the image/mask is padded |
train_dataset.pipeline.Pad.size_wd | int | 1024 | The width to which the image/mask is padded |
train_dataset.pipeline.Pad.pad_val | int | 0 | The padding value for the input image |
train_dataset.pipeline.Pad.seg_pad_val | int | 255 | The padding value for the segmentation |
val_dataset | dict config | None | The validation dataset config, used for validation during training, which contains the parameters listed below |
val_dataset.img_dir | str | None | The path to the images directory |
val_dataset.ann_dir | str | None | The path to the PNG masks directory |
val_dataset.pipeline | dict config | None | The data pipeline config, which contains the parameter listed below |
val_dataset.pipeline.multi_scale | List[int] | [2048, 1024] | The largest scale of the image | >=0
test_dataset | dict config | None | The test dataset config, used for evaluation and inference, which contains the parameters listed below |
test_dataset.img_dir | str | None | The path to the images directory |
test_dataset.ann_dir | str | None | The path to the PNG masks directory |
test_dataset.pipeline | dict config | None | The data pipeline config, which contains the parameter listed below |
test_dataset.pipeline.multi_scale | List[int] | [2048, 1024] | The largest scale of the image | >=0
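As an illustration of how the palette entries relate class IDs to overlay colors, the following sketch converts a small grid of label IDs into RGB colors. The helper names build_label_to_color and colorize are hypothetical, and the sketch does not reflect how TAO renders its overlays internally.

```python
def build_label_to_color(palette):
    """Map each label_id in the palette to its RGB tuple."""
    return {entry["label_id"]: tuple(entry["rgb"]) for entry in palette}


def colorize(label_mask, label_to_color):
    """Convert a 2-D grid of label IDs into a 2-D grid of RGB tuples."""
    return [[label_to_color[label] for label in row] for row in label_mask]


# Palette entries taken from the example spec.
palette = [
    {"seg_class": "foreground", "rgb": [0, 0, 0],
     "label_id": 0, "mapping_class": "foreground"},
    {"seg_class": "background", "rgb": [255, 255, 255],
     "label_id": 1, "mapping_class": "background"},
]

label_mask = [[0, 1],
              [1, 1]]
color_mask = colorize(label_mask, build_label_to_color(palette))
print(color_mask)
```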
augmentation_config#
Parameter | Datatype | Default | Description | Supported Values
random_crop | Dict | None | The random_crop config, which contains the parameters listed below |
random_crop.crop_size | List[int] | [1024, 1024] | The crop size for augmentation | 0 < h, w <= img_ht, img_wd
random_crop.cat_max_ratio | Float | 0.75 | The maximum ratio that a single category can occupy in the crop | >=0.0
resize | Dict | None | The resize config, which contains the parameters listed below |
resize.img_scale | | | The [height, width] scale to which the input image should be rescaled | >=0
resize.ratio_range | | [0.5, 2.0] | A ratio is randomly sampled from the range specified by ratio_range and then multiplied by img_scale to generate the sampled scale | >=0.0
resize.keep_ratio | Bool | True | Whether to preserve the aspect ratio | True/False
random_flip | Dict | None | The random_flip config, which contains the parameter listed below for flipping augmentation |
random_flip.prob | | 0.5 | The probability with which the image is flipped | >=0.0
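The ratio_range behavior described in the resize row can be sketched as follows. This is a minimal illustration that assumes uniform sampling; the function name sample_scale and the 512 x 512 img_scale are hypothetical.

```python
import random


def sample_scale(img_scale, ratio_range, rng=random):
    """Sample a ratio uniformly from ratio_range and apply it to img_scale."""
    low, high = ratio_range
    ratio = rng.uniform(low, high)
    height, width = img_scale
    return int(height * ratio), int(width * ratio)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
scale = sample_scale((512, 512), (0.5, 2.0), rng)
print(scale)
```

With ratio_range set to [0.5, 2.0], each training sample is rescaled to between half and double the base img_scale before cropping and flipping are applied.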
Training the Model#
Use the following command to run Segformer training:
tao model segformer train [-h] -e <experiment_spec_file>
                          [results_dir=<global_results_dir>]
                          [model.<model_option>=<model_option_value>]
                          [dataset.<dataset_option>=<dataset_option_value>]
                          [train.<train_option>=<train_option_value>]
                          [train.gpu_ids=<gpu indices>]
                          [train.num_gpus=<number of gpus>]
Required Arguments#
The only required argument is the path to the experiment spec:
-e, --experiment_spec: The experiment specification file to set up the training experiment
Optional Arguments#
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.train_config.optimizer.<optim_option>: The optimizer options.
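When training runs are launched from a script, the command and its key=value overrides can be assembled programmatically. The sketch below only builds and prints the argument list. The helper build_train_command, the spec path, and the results directory are hypothetical, and the override keys are examples taken from the spec file shown earlier.

```python
import shlex


def build_train_command(spec_path, results_dir=None, overrides=None):
    """Assemble a `tao model segformer train` command as a list of arguments."""
    command = ["tao", "model", "segformer", "train", "-e", spec_path]
    if results_dir is not None:
        command.append(f"results_dir={results_dir}")
    for key, value in (overrides or {}).items():
        # Each override takes the form <section>.<option>=<value>.
        command.append(f"{key}={value}")
    return command


command = build_train_command(
    "/specs/train_isbi.yaml",      # hypothetical spec path
    results_dir="/results/isbi",   # hypothetical output directory
    overrides={"train.max_iters": 2000, "dataset.batch_size": 8},
)
print(shlex.join(command))
```

The resulting list could be passed to subprocess.run in an environment where the TAO Launcher is installed.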
Note
For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], they are modified to follow the setting with more GPUs, for example num_gpus = 1 -> num_gpus = 2.
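The reconciliation rule in the note can be sketched as below. The first branch mirrors the documented example. The case where num_gpus exceeds the number of listed IDs is not spelled out in the note, so filling in IDs 0 to num_gpus - 1 is an assumption, and the function name reconcile_gpus is hypothetical.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Return a consistent (num_gpus, gpu_ids) pair, favoring more GPUs."""
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if num_gpus == len(gpu_ids):
        return num_gpus, gpu_ids
    if len(gpu_ids) > num_gpus:
        # Documented case: num_gpus = 1, gpu_ids = [0, 1] -> num_gpus = 2.
        return len(gpu_ids), gpu_ids
    # Assumption: more GPUs requested than IDs listed, so use IDs 0..num_gpus-1.
    return num_gpus, list(range(num_gpus))


print(reconcile_gpus(1, [0, 1]))
print(reconcile_gpus(4, [0]))
```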
Evaluating the model#
The evaluation metric of SegFormer is the mean IoU (mean Intersection over Union). For more details on this metric, refer to meanIOU.
Use the following command to run SegFormer evaluation:
tao model segformer evaluate -e <experiment_spec>
                             evaluate.checkpoint=<evaluation model>
                             results_dir=<path to output evaluation results>
                             [evaluate.gpu_ids=<gpu indices>]
                             [evaluate.num_gpus=<number of gpus>]
Required Arguments#
-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model.
Here’s an example of the output from the SegFormer evaluation command:
+------------+-------+-------+
| Class | IoU | Acc |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:
+--------+-------+-------+-------+
| Scope | mIoU | mAcc | aAcc |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
...
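The per-class IoU and mIoU values reported above follow the standard definition, which can be sketched from a confusion matrix as shown below. The matrix values are made up for illustration and the function names are hypothetical; this is not the TAO evaluation code.

```python
def per_class_iou(confusion):
    """confusion[i][j] = number of pixels of true class i predicted as class j."""
    ious = []
    for c in range(len(confusion)):
        true_positive = confusion[c][c]
        false_negative = sum(confusion[c]) - true_positive
        false_positive = sum(row[c] for row in confusion) - true_positive
        union = true_positive + false_negative + false_positive
        ious.append(true_positive / union if union else 0.0)
    return ious


def mean_iou(confusion):
    """Average the per-class IoU values with equal weight per class."""
    ious = per_class_iou(confusion)
    return sum(ious) / len(ious)


# Illustrative two-class confusion matrix (rows: ground truth, columns: prediction).
confusion = [[40, 10],
             [10, 140]]
ious = per_class_iou(confusion)
print(ious, mean_iou(confusion))
```

The same averaging applies to the sample output: the global mIoU of 60.81 is the mean of the foreground (37.81) and background (83.81) IoU values.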
Running Inference on the Model#
Use the following command to run inference on SegFormer with the .pth model.
tao model segformer inference -e <experiment_spec>
                              inference.checkpoint=<inference model>
                              results_dir=<path to output directory for inference>
                              [inference.gpu_ids=<gpu indices>]
                              [inference.num_gpus=<number of gpus>]
The output mask PNG images with class IDs are saved in vis_tao. The overlaid mask images are saved in mask_tao.
Required Arguments#
-e, --experiment_spec: The experiment spec file to set up inference
inference.checkpoint: The .pth model to perform inference with
results_dir: The path to save the inference masks and mask overlaid images to. Inference creates two directories.
Exporting the Model#
Use the following command to export the model.
tao model segformer export [-h] -e <experiment spec file>
                           results_dir=<path to results dir>
                           export.checkpoint=<trained pth model to be exported>
                           export.onnx_file=<onnx path>
Required Arguments#
-e, --experiment_spec: The path to an experiment spec file
results_dir: The path where the logs for export will be saved
export.checkpoint: The .pth model to be exported
export.onnx_file: The .onnx file to be stored
TensorRT engine generation, validation, and int8 calibration#
For deployment, refer to the TAO Deploy documentation
Deploying to DeepStream#
Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.