SegFormer#
SegFormer is an NVIDIA-developed semantic-segmentation model that is included in TAO. SegFormer supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked through the FTMS client or the TAO Launcher. With the FTMS client, the convention on the command line is:
SPECS=$(tao-client segformer get-spec --action <sub_task> --job_type experiment --id $EXPERIMENT_ID)
JOB_ID=$(tao-client segformer experiment-run-action --action <sub_task> --id $EXPERIMENT_ID --specs "$SPECS")
Required Arguments
--id
: The unique identifier of the experiment to run the action on
See also
For information on how to create an experiment using the FTMS client, refer to the Creating an experiment section in the Remote Client documentation.
For the TAO Launcher, the convention is:
tao model segformer <sub_task> <args_per_subtask>
where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
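For example, a launcher-based training run looks like the following (the spec file path is a placeholder):
tao model segformer train -e /path/to/experiment_spec.yaml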
Data Input for SegFormer#
SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.
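For example, with the split names used in the spec file below, the dataset root might be organized as follows (a sketch only; the folder names are assumptions, so consult the Data Annotation Format page for the authoritative layout):
<dataset_root>/
    images/
        train/    # training images
        val/      # validation images
    masks/
        train/    # per-pixel label masks for the training images
        val/      # masks for the validation images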
Creating Training Experiment Spec File#
Configuration for Custom Dataset#
This documentation shows example configurations and commands for training on a multi-class dataset. For more details, refer to the example notebooks in the TAO Computer Vision samples.
Here is an example spec file for training a SegFormer model with an NVDINOv2 backbone.
Note that this spec file is for reference; create your own spec file based on your dataset.
First, set the base_experiment:
FILTER_PARAMS='{"network_arch": "segformer"}'
BASE_EXPERIMENTS=$(tao-client segformer list-base-experiments --filter_params "$FILTER_PARAMS")
Retrieve the PTM_ID for NVDINOv2 backbone from $BASE_EXPERIMENTS before setting base_experiment.
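For example, you can inspect the returned list and copy the ID of the NVDINOv2 entry (the response schema is an assumption here; adjust to match the actual fields):
echo "$BASE_EXPERIMENTS" | jq .
PTM_ID=<id_of_the_nvdinov2_base_experiment>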
PTM_INFORMATION="{\"base_experiment\": [$PTM_ID]}"
tao-client segformer patch-artifact-metadata --id $EXPERIMENT_ID --job_type experiment --update_info $PTM_INFORMATION
Then retrieve the specifications.
TRAIN_SPECS=$(tao-client segformer get-spec --action train --job_type experiment --id $EXPERIMENT_ID)
The retrieved specifications have the structure shown below; override values as needed.
encryption_key: tlt_encode
results_dir: <path_to_output_dir>
train:
resume_training_checkpoint_path: null
segment:
loss: "ce"
num_epochs: 50
num_nodes: 1
validation_interval: 1
checkpoint_interval: 50
optim:
lr: 0.0001
optim: "adamw"
policy: "linear"
weight_decay: 0.0005
evaluate:
checkpoint: ${results_dir}/train/segformer_model_latest.pth
vis_after_n_batches: 1
inference:
checkpoint: ${results_dir}/train/segformer_model_latest.pth
vis_after_n_batches: 1
export:
results_dir: "${results_dir}/export"
gpu_id: 0
checkpoint: ${results_dir}/train/segformer_model_latest.pth
onnx_file: "${export.results_dir}/segformer.onnx"
input_width: 224
input_height: 224
batch_size: -1
model:
backbone:
type: "vit_large_nvdinov2"
pretrained_backbone_path: <path_to_pretrained_weight>
freeze_backbone: False
decode_head:
feature_strides: [4, 8, 16, 32]
dataset:
segment:
dataset: "SFDataset"
root_dir: <dataset_root>
batch_size: 32
workers: 8
num_classes: 6
img_size: 224
train_split: "train"
validation_split: "val"
test_split: "val"
predict_split: "val"
augmentation:
random_flip:
vflip_probability: 0.5
hflip_probability: 0.5
enable: True
random_rotate:
rotate_probability: 0.5
angle_list: [90, 180, 270]
enable: True
random_color:
brightness: 0.3
contrast: 0.3
saturation: 0.3
hue: 0.3
enable: False
with_scale_random_crop:
enable: True
with_random_crop: True
with_random_blur: False
label_transform: None
palette:
- seg_class: urban
rgb:
- 0
- 255
- 255
label_id: 0
mapping_class: urban
- seg_class: agriculture
rgb:
- 255
- 255
- 0
label_id: 1
mapping_class: agriculture
- seg_class: rangeland
rgb:
- 255
- 0
- 255
label_id: 2
mapping_class: rangeland
- seg_class: forest
rgb:
- 0
- 255
- 0
label_id: 3
mapping_class: forest
- seg_class: water
rgb:
- 0
- 0
- 255
label_id: 4
mapping_class: water
- seg_class: barren
rgb:
- 255
- 255
- 255
label_id: 5
mapping_class: barren
- seg_class: unknown
rgb:
- 0
- 0
- 0
label_id: 255
mapping_class: unknown
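With the FTMS client, individual fields of $TRAIN_SPECS can be overridden before the job is submitted, for example (a sketch using jq; the JSON keys mirror the YAML structure above):
TRAIN_SPECS=$(echo "$TRAIN_SPECS" | jq '.train.num_epochs = 50 | .dataset.segment.batch_size = 32')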
The experiment specification consists of several main components:
train
evaluate
inference
export
model
dataset
gen_trt_engine
train#
The train config contains the parameters related to training. They are described as follows:
Note
For FTMS Client, these parameters are set in JSON format.
train:
resume_training_checkpoint_path: null
segment:
loss: "ce"
num_epochs: 50
num_nodes: 1
validation_interval: 1
checkpoint_interval: 50
optim:
lr: 0.0001
optim: "adamw"
policy: "linear"
weight_decay: 0.0005
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
optim | dict config | – | Optimizer config. | –
pretrained_model_path | str | None | Path to a pretrained model. | –
segment | dict config | – | Segmentation loss config. | –
num_gpus | int | 1 | The number of GPUs to run the train job. | –
gpu_ids | List[int] | [0] | List of GPU IDs to run the training on. | –
num_nodes | int | 1 | Number of nodes to run the training on. | –
seed | int | 1234 | The seed for the initializer in PyTorch. | –
num_epochs | int | 10 | Number of epochs to run the training. | –
checkpoint_interval | int | 1 | Checkpoint interval. | –
validation_interval | int | 1 | Validation interval. | –
resume_training_checkpoint_path | str | None | Path to the checkpoint to resume training from. | –
results_dir | str | None | Path to where all the assets are stored. | –
optim#
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
monitor_name | str | val_loss | Monitor name. | –
optim | str | adamw | Optimizer. | adamw, adam, sgd
lr | float | 0.00006 | Optimizer learning rate. | –
policy | str | linear | Optimizer policy. | linear, step
momentum | float | 0.9 | The momentum for the AdamW optimizer. | –
weight_decay | float | 0.01 | The weight decay coefficient. | –
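These optimizer settings can also be overridden from the launcher command line, for example:
tao model segformer train -e /path/to/experiment_spec.yaml train.optim.lr=0.00006 train.optim.policy=linear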
segment#
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
loss | str | ce | Segmentation loss. | ce
weights | List[float] | [0.5, 0.5, 0.5, 0.8, 1.0] | Multi-scale segmentation loss weights. | –
tensorboard#
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
enabled | bool | False | Flag to enable TensorBoard. | True, False
infrequent_logging_frequency | int | 2 | Frequency for infrequently logged values. | –
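For example, TensorBoard logging can be enabled as a train option override (a sketch; this assumes TensorBoard is installed and that event files are written under the train results directory):
tao model segformer train -e /path/to/experiment_spec.yaml train.tensorboard.enabled=True
tensorboard --logdir /path/to/results_dir/train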
evaluate#
The evaluate config contains the parameters related to evaluation. They are described as follows:
Note
For FTMS Client, these parameters are set in JSON format, and the evaluate checkpoint is deduced from the previous train job ID specified with the --parent_job_id argument. For the TAO Launcher, you must set the path in the evaluate specification:
evaluate:
checkpoint: ${results_dir}/train/segformer_model_latest.pth
vis_after_n_batches: 1
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
vis_after_n_batches | int | 1 | Visualize evaluation segmentation results after n batches. | –
batch_size | int | 8 | Batch size. | –
num_gpus | int | 1 | The number of GPUs to run the evaluate job. | –
gpu_ids | List[int] | [0] | List of GPU IDs to run the evaluation on. | –
num_nodes | int | 1 | Number of nodes to run the evaluation on. | –
checkpoint | str | – | Path to the checkpoint used for evaluation. | –
trt_engine | Optional[str] | None | Path to the TensorRT engine to be used for evaluation. | –
results_dir | Optional[str] | None | Path to where all the assets are stored. | –
inference#
The inference config contains the parameters related to inference. They are described as follows:
Note
For FTMS Client, these parameters are set in JSON format, and the inference checkpoint is deduced from the previous train job ID specified with the --parent_job_id argument. For the TAO Launcher, you must set the path in the inference specification:
inference:
checkpoint: ${results_dir}/train/segformer_model_latest.pth
vis_after_n_batches: 1
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
vis_after_n_batches | int | 1 | Visualize inference segmentation results after n batches. | –
batch_size | int | 8 | Batch size. | –
num_gpus | int | 1 | The number of GPUs to run the inference job. | –
gpu_ids | List[int] | [0] | List of GPU IDs to run the inference on. | –
num_nodes | int | 1 | Number of nodes to run the inference on. | –
checkpoint | str | – | Path to the checkpoint used for inference. | –
trt_engine | Optional[str] | None | Path to the TensorRT engine to be used for inference. | –
results_dir | Optional[str] | None | Path to where all the assets are stored. | –
export#
The export config contains the parameters related to export. They are described as follows:
Note
For FTMS Client, these parameters are set in JSON format, and the export checkpoint is deduced from the previous train job ID specified with the --parent_job_id argument. For the TAO Launcher, you must set the path in the export specification:
export:
results_dir: "${results_dir}/export"
gpu_id: 0
checkpoint: ${results_dir}/train/segformer_model_latest.pth
onnx_file: "${export.results_dir}/segformer.onnx"
input_width: 224
input_height: 224
batch_size: -1
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
results_dir | Optional[str] | None | Path to where all the assets are stored. | –
gpu_id | int | 0 | The index of the GPU to build the TensorRT engine. | –
checkpoint | str | – | Path to the checkpoint file to run export on. | –
onnx_file | str | – | Path to the ONNX model file. | –
on_cpu | bool | False | Flag to export a CPU-compatible model. | True, False
input_channel | int | 3 | Number of channels in the input tensor. | 1, 3
input_width | int | 960 | Width of the input image tensor. | –
input_height | int | 544 | Height of the input image tensor. | –
opset_version | int | 17 | Operator set version of the ONNX model. | –
batch_size | int | -1 | The batch size of the input tensor for the engine. | –
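For example, the export resolution and opset version can be overridden on the command line:
tao model segformer export -e /path/to/experiment_spec.yaml export.input_width=960 export.input_height=544 export.opset_version=17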
model#
The model config provides options to define the SegFormer backbone and decoder head, as shown in the following example.
Note
For FTMS Client, these parameters are set in JSON format.
model:
backbone:
type: "vit_large_nvdinov2"
pretrained_backbone_path: <path_to_pretrained_weight>
freeze_backbone: False
decode_head:
feature_strides: [4, 8, 16, 32]
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
backbone | dict config | – | The configuration of the backbone. | –
decode_head | dict config | – | The configuration of the decoder head. | –
backbone#
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
type | str | fan_small_12_p4_hybrid | The name of the backbone to be used. | mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5, fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, vit_large_nvdinov2, vit_giant_nvdinov2, vit_base_nvclip_16_siglip, vit_huge_nvclip_14_siglip, c_radio_v2_vit_base_patch16_224, c_radio_v2_vit_large_patch16_224, c_radio_v2_vit_huge_patch16_224
pretrained_backbone_path | str | – | Path to the pretrained backbone model. | –
freeze_backbone | bool | False | Flag to freeze the backbone. | True, False
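For example, to train with a lighter MiT backbone instead of the default, override the backbone type; note that pretrained weights must match the chosen architecture, so clear or replace pretrained_backbone_path accordingly:
tao model segformer train -e /path/to/experiment_spec.yaml model.backbone.type=mit_b1 model.backbone.pretrained_backbone_path=null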
decode_head#
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
feature_strides | List[int] | [4, 8, 16, 32] | Feature strides for the head. | –
dataset#
The dataset config defines the dataset source, training batch size, and augmentation. An example is provided below.
Note
For FTMS Client, these parameters are set in JSON format.
dataset:
segment:
dataset: "SFDataset"
root_dir: <dataset_root>
batch_size: 32
workers: 8
num_classes: 6
img_size: 224
train_split: "train"
validation_split: "val"
test_split: "val"
predict_split: "val"
augmentation:
random_flip:
vflip_probability: 0.5
hflip_probability: 0.5
enable: True
random_rotate:
rotate_probability: 0.5
angle_list: [90, 180, 270]
enable: True
random_color:
brightness: 0.3
contrast: 0.3
saturation: 0.3
hue: 0.3
enable: False
with_scale_random_crop:
enable: True
with_random_crop: True
with_random_blur: False
label_transform: None
palette:
- seg_class: urban
rgb:
- 0
- 255
- 255
label_id: 0
mapping_class: urban
- seg_class: agriculture
rgb:
- 255
- 255
- 0
label_id: 1
mapping_class: agriculture
- seg_class: rangeland
rgb:
- 255
- 0
- 255
label_id: 2
mapping_class: rangeland
- seg_class: forest
rgb:
- 0
- 255
- 0
label_id: 3
mapping_class: forest
- seg_class: water
rgb:
- 0
- 0
- 255
label_id: 4
mapping_class: water
- seg_class: barren
rgb:
- 255
- 255
- 255
label_id: 5
mapping_class: barren
- seg_class: unknown
rgb:
- 0
- 0
- 0
label_id: 255
mapping_class: unknown
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
segment | dict config | – | Segmentation dataset config. | –
segment#
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
root_dir | str | – | Path to the root directory of the dataset. | –
dataset | str | SFDataset | Dataset class. | SFDataset
num_classes | int | 2 | The number of classes in the training data. | –
img_size | int | 256 | The input image size. | –
batch_size | int | 8 | Batch size. | –
workers | int | 1 | Number of data-loading workers. | –
shuffle | bool | True | Flag to shuffle the dataloader. | True, False
train_split | str | train | Train split folder name. | –
validation_split | str | val | Validation split folder name. | –
test_split | str | val | Test split folder name. | –
predict_split | str | test | Predict split folder name. | –
augmentation | dict config | – | Augmentation config. | –
label_transform | str | norm | Label transform. | norm, None
palette | List[Dict] | [{"label_id": 0, "mapping_class": "foreground", "rgb": [0, 0, 0], "seg_class": "foreground"}, {"label_id": 1, "mapping_class": "background", "rgb": [1, 1, 1], "seg_class": "background"}] | Class palette. Be careful with label_transform: if set to norm, RGB values range from 0 to 1; otherwise from 0 to 255. | –
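As with the other components, dataset fields can be overridden on the launcher command line, for example:
tao model segformer train -e /path/to/experiment_spec.yaml dataset.segment.batch_size=16 dataset.segment.workers=4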
augmentation#
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
random_flip | dict config | – | RandomFlip augmentation config. | –
random_rotate | dict config | – | RandomRotation augmentation config. | –
random_color | dict config | – | RandomColor augmentation config. | –
with_scale_random_crop | dict config | – | RandomCropWithScale augmentation config. | –
with_random_blur | bool | – | Flag to enable random blur. | –
with_random_crop | bool | – | Flag to enable random crop. | –
mean | List[float] | – | Mean for normalization. | –
std | List[float] | – | Standard deviation for normalization. | –
RandomFlip#
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
vflip_probability | float | 0.5 | Vertical flip probability. | –
hflip_probability | float | 0.5 | Horizontal flip probability. | –
enable | bool | True | Flag to enable the augmentation. | True, False
RandomRotation#
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
rotate_probability | float | 0.5 | Random rotate probability. | –
angle_list | List[float] | [90, 180, 270] | Random rotate angles. | –
enable | bool | True | Flag to enable the augmentation. | True, False
RandomColor#
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
brightness | float | 0.3 | Random color brightness. | –
contrast | float | 0.3 | Random color contrast. | –
saturation | float | 0.3 | Random color saturation. | –
hue | float | 0.3 | Random color hue. | –
enable | bool | True | Flag to enable the augmentation. | True, False
color_probability | float | 0.5 | Random color probability. | –
RandomCropWithScale#
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
scale_range | List[float] | [1, 1.2] | Random scale range. | –
enable | bool | True | Flag to enable the augmentation. | True, False
Training the Model#
Use the following commands to run SegFormer training:
TRAIN_JOB_ID=$(tao-client segformer experiment-run-action --action train --id $EXPERIMENT_ID --specs "$TRAIN_SPECS")
tao model segformer train [-h] -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.gpu_ids=<gpu indices>]
[train.num_gpus=<number of gpus>]
Required Arguments
The only required argument is the path to the experiment spec:
-e, --experiment_spec
: The experiment specification file to set up the training experiment
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
-h, --help
: Show this help message and exit.
model.<model_option>
: The model options.
dataset.<dataset_option>
: The dataset options.
train.<train_option>
: The train options.
Note
For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 with gpu_ids = [0, 1], then they are modified to follow the setting with more GPUs, for example num_gpus = 1 becomes num_gpus = 2.
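For example, a consistent two-GPU training run is specified as:
tao model segformer train -e /path/to/experiment_spec.yaml train.num_gpus=2 train.gpu_ids=[0,1]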
In some cases, you may encounter an issue with multi-GPU training resulting in a segmentation fault. You can circumvent this by setting the OMP_NUM_THREADS environment variable to 1. Depending on your mode of execution, you can use one of the following methods to set this variable.
CLI Launcher
You can set this environment variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file, as described in bullet 3 of this section:
{
"Envs": [
{
"variable": "OMP_NUM_THREADSR",
"value": "1"
}
]
}
Docker
You can set environment variables in the container by passing the -e flag on the docker command line:
docker run -it --rm --gpus all \
-e OMP_NUM_THREADS=1 \
-v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e <path_to_spec_file>
Evaluating the Model#
The evaluation metric of SegFormer is the mean intersection over union (mIoU).
Use the following commands to run SegFormer evaluation:
EVAL_JOB_ID=$(tao-client segformer experiment-run-action --action evaluate --id $EXPERIMENT_ID --specs "$EVAL_SPECS" --parent_job_id $TRAIN_JOB_ID)
tao model segformer evaluate -e <experiment_spec>
evaluate.checkpoint=<evaluation model>
results_dir=<path to output evaluation results>
[evaluate.gpu_ids=<gpu indices>]
[evaluate.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required.
-e, --experiment_spec_file
: The experiment spec file to set up the evaluation experiment
evaluate.checkpoint
: The .pth model to evaluate
Here is sample output from the SegFormer evaluation command:
Note
For FTMS Client, the job output will be in your experiment’s cloud workspace.
+------------+-------+-------+
| Class | IoU | Acc |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:
+--------+-------+-------+-------+
| Scope | mIoU | mAcc | aAcc |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
...
Running Inference on the Model#
Use the following commands to run inference on SegFormer with the .pth model.
INFER_JOB_ID=$(tao-client segformer experiment-run-action --action inference --id $EXPERIMENT_ID --specs "$INFER_SPECS" --parent_job_id $TRAIN_JOB_ID)
tao model segformer inference -e <experiment_spec>
inference.checkpoint=<inference model>
results_dir=<path to output directory for inference>
[inference.gpu_ids=<gpu indices>]
[inference.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required.
-e, --experiment_spec
: The experiment spec file to set up inference
inference.checkpoint
: The .pth model to perform inference with
results_dir
: The path to save the inference masks and mask-overlaid images to. Inference creates two directories.
Note
For FTMS Client, the job output will be in your experiment’s cloud workspace.
The output mask PNG images with class IDs are saved in vis_tao.
The overlaid mask images are saved in mask_tao.
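For example, after a launcher run you can list both output directories (the inference subdirectory is an assumption based on the results_dir layout used in the spec above):
ls /path/to/results_dir/inference/vis_tao
ls /path/to/results_dir/inference/mask_tao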
Exporting the Model#
Use the following commands to export the model.
EXPORT_JOB_ID=$(tao-client segformer experiment-run-action --action export --id $EXPERIMENT_ID --specs "$EXPORT_SPECS" --parent_job_id $TRAIN_JOB_ID)
tao model segformer export [-h] -e <experiment spec file>
results_dir=<path to results dir>
export.checkpoint=<trained pth model to be exported>
export.onnx_file=<onnx path>
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec
: The path to an experiment spec file
results_dir
: The path where the logs for export are saved
export.checkpoint
: The .pth model to be exported
export.onnx_file
: The path where the exported .onnx file is stored
TensorRT Engine Generation, Validation, and INT8 Calibration#
For deployment, refer to the TAO Deploy documentation.
Deploying to DeepStream#
Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.