SegFormer#

SegFormer is an NVIDIA-developed semantic-segmentation model that is included in TAO. SegFormer supports the following tasks:

train
evaluate
inference
export

These tasks can be invoked with the FTMS Client using the following convention on the command line:

SPECS=$(tao-client segformer get-spec --action <sub_task> --job_type experiment --id $EXPERIMENT_ID)

JOB_ID=$(tao-client segformer experiment-run-action --action <sub_task> --id $EXPERIMENT_ID --specs "$SPECS")

Required Arguments

--id: The unique identifier of the experiment from which to train the model

Data Input for SegFormer#

SegFormer requires the data to be provided as image and mask folders. See the Data Annotation Format page for more information about the input data format for SegFormer.

Creating Training Experiment Spec File#

Configuration for Custom Dataset#

In this documentation, we show an example configuration and commands for training on a multi-class dataset. For more details, refer to the example notebook in TAO Computer Vision samples.

Here is an example spec file for training a SegFormer model with an NVDINOv2 backbone.

Note that this spec file is for reference only. Create your own spec file based on your dataset.

We first need to set the base_experiment.

FILTER_PARAMS='{"network_arch": "segformer"}'

BASE_EXPERIMENTS=$(tao-client segformer list-base-experiments --filter_params "$FILTER_PARAMS")

Retrieve the PTM_ID for NVDINOv2 backbone from $BASE_EXPERIMENTS before setting base_experiment.

PTM_INFORMATION="{\"base_experiment\": [$PTM_ID]}"

tao-client segformer patch-artifact-metadata --id $EXPERIMENT_ID --job_type experiment --update_info "$PTM_INFORMATION"

Then retrieve the specifications.

TRAIN_SPECS=$(tao-client segformer get-spec --action train --job_type experiment --id $EXPERIMENT_ID)

Get specifications from $TRAIN_SPECS. You can override values as needed.

encryption_key: tlt_encode
results_dir: <path_to_output_dir>

train:
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
  num_epochs: 50
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 50
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    weight_decay: 0.0005

evaluate:
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  vis_after_n_batches: 1

inference:
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  vis_after_n_batches: 1

export:
  results_dir: "${results_dir}/export"
  gpu_id: 0
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  onnx_file: "${export.results_dir}/segformer.onnx"
  input_width: 224
  input_height: 224
  batch_size: -1

model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: <path_to_pretrained_weight>
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]

dataset:
  segment:
    dataset: "SFDataset"
    root_dir: <dataset_root>
    batch_size: 32
    workers: 8
    num_classes: 6
    img_size: 224
    train_split: "train"
    validation_split: "val"
    test_split: "val"
    predict_split: "val"
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: False
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: False
    label_transform: None
    palette:
      - seg_class: urban
        rgb:
          - 0
          - 255
          - 255
        label_id: 0
        mapping_class: urban
      - seg_class: agriculture
        rgb:
          - 255
          - 255
          - 0
        label_id: 1
        mapping_class: agriculture
      - seg_class: rangeland
        rgb:
          - 255
          - 0
          - 255
        label_id: 2
        mapping_class: rangeland
      - seg_class: forest
        rgb:
          - 0
          - 255
          - 0
        label_id: 3
        mapping_class: forest
      - seg_class: water
        rgb:
          - 0
          - 0
          - 255
        label_id: 4
        mapping_class: water
      - seg_class: barren
        rgb:
          - 255
          - 255
          - 255
        label_id: 5
        mapping_class: barren
      - seg_class: unknown
        rgb:
          - 0
          - 0
          - 0
        label_id: 255
        mapping_class: unknown
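
Because the FTMS Client handles specs as JSON, overriding values amounts to editing a nested dictionary before passing it back through --specs. The following is a minimal Python sketch, assuming the output of get-spec is a JSON object whose keys mirror the sections of the spec shown here. The override_spec helper and the sample values are hypothetical and are not part of tao-client.

```python
import json


def override_spec(spec_json, overrides):
    """Apply dotted-key overrides (e.g. "train.optim.lr") to a JSON spec string."""
    spec = json.loads(spec_json)
    for dotted_key, value in overrides.items():
        node = spec
        *parents, leaf = dotted_key.split(".")
        for key in parents:
            # Create intermediate sections that are missing from the spec.
            node = node.setdefault(key, {})
        node[leaf] = value
    return json.dumps(spec)


# Stand-in for the JSON that get-spec would return (assumed shape).
train_specs = json.dumps({"train": {"num_epochs": 10, "optim": {"lr": 0.00006}}})

updated = override_spec(
    train_specs,
    {"train.num_epochs": 50, "train.optim.lr": 0.0001, "dataset.segment.num_classes": 6},
)
print(updated)
```

The resulting string can then be stored in a shell variable and passed as the --specs argument.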

The experiment specification consists of several main components:

train
evaluate
inference
export
model
dataset
gen_trt_engine

train#

The train config contains the parameters related to training. They are described as follows:

Note

For FTMS Client, these parameters are set in json format.

train:
  resume_training_checkpoint_path: null
  segment:
    loss: "ce"
  num_epochs: 50
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 50
  optim:
    lr: 0.0001
    optim: "adamw"
    policy: "linear"
    weight_decay: 0.0005

Parameter	Datatype	Default	Description	Supported Values
`optim`	dict config	–	Optimizer config.	–
`pretrained_model_path`	str	None	Pretrained model path.	–
`segment`	dict config	–	Segmentation loss Config.	–
`num_gpus`	int	1	The number of GPUs to run the train job.	–
`gpu_ids`	List[int]	[0]	List of GPU IDs to run the training on.	–
`num_nodes`	int	1	Number of nodes to run the training on.	–
`seed`	int	1234	The seed for the initializer in PyTorch.	–
`num_epochs`	int	10	Number of epochs to run the training.	–
`checkpoint_interval`	int	1	Checkpoint interval.	–
`validation_interval`	int	1	Validation interval.	–
`resume_training_checkpoint_path`	str	None	Path to the checkpoint to resume training	–
`results_dir`	str	None	Path to where all the assets are stored.	–

optim#

Parameter	Datatype	Default	Description	Supported Values
`monitor_name`	str	val_loss	Monitor Name	–
`optim`	str	adamw	Optimizer	adamw,adam,sgd
`lr`	float	0.00006	Optimizer learning rate	–
`policy`	str	linear	Optimizer policy	linear,step
`momentum`	float	0.9	The momentum for the AdamW optimizer.	–
`weight_decay`	float	0.01	The weight decay coefficient.	–

segment#

Parameter	Datatype	Default	Description	Supported Values
`loss`	str	ce	Segment loss	ce
`weights`	List[float]	[0.5, 0.5, 0.5, 0.8, 1.0]	Multi-scale Segment loss weight	–

tensorboard#

Parameter	Datatype	Default	Description	Supported Values
`enabled`	bool	False	Flag to enable tensorboard	–
`infrequent_logging_frequency`	int	2	infrequent_logging_frequency	–

evaluate#

The evaluate config contains the parameters related to evaluation. They are described as follows:

Note

For FTMS Client, these parameters are set in JSON format, and the evaluate checkpoint is deduced from the previous train job ID as specified with the --parent_job_id argument. For TAO Launcher, you must set the path in the evaluate specification:

evaluate:
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  vis_after_n_batches: 1

Parameter	Datatype	Default	Description	Supported Values
`vis_after_n_batches`	int	1	Visualize evaluation segmentation results after n batches.	–
`batch_size`	int	8	Batch Size.	–
`checkpoint`	str	–	Path to checkpoint file.	–
`num_gpus`	int	1	The number of GPUs to run the evaluate job.	–
`gpu_ids`	List[int]	[0]	List of GPU IDs to run the evaluate on.	–
`num_nodes`	int	1	Number of nodes to run the evaluate on.	–
`checkpoint`	str	–	Path to the checkpoint used for evaluation.	–
`trt_engine`	Optional[str]	None	Path to the TensorRT engine to be used for evaluation.	–
`results_dir`	Optional[str]	None	Path to where all the assets are stored.	–

inference#

The inference config contains the parameters related to inference. They are described as follows:

Note

For FTMS Client, these parameters are set in JSON format, and the inference checkpoint is deduced from the previous train job ID as specified with the --parent_job_id argument. For TAO Launcher, you must set the path in the inference specification:

inference:
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  vis_after_n_batches: 1

Parameter	Datatype	Default	Description	Supported Values
`vis_after_n_batches`	int	1	Visualize inference segmentation results after n batches.	–
`batch_size`	int	8	Batch Size.	–
`checkpoint`	str	–	Path to checkpoint file.	–
`num_gpus`	int	1	The number of GPUs to run the inference job.	–
`gpu_ids`	List[int]	[0]	List of GPU IDs to run the inference on.	–
`num_nodes`	int	1	Number of nodes to run the inference on.	–
`checkpoint`	str	–	Path to the checkpoint used for inference.	–
`trt_engine`	Optional[str]	None	Path to the TensorRT engine to be used for inference.	–
`results_dir`	Optional[str]	None	Path to where all the assets are stored.	–

export#

The export config contains the parameters related to export. They are described as follows:

Note

For FTMS Client, these parameters are set in JSON format, and the export checkpoint is deduced from the previous train job ID as specified with the --parent_job_id argument. For TAO Launcher, you must set the path in the export specification:

export:
  results_dir: "${results_dir}/export"
  gpu_id: 0
  checkpoint: ${results_dir}/train/segformer_model_latest.pth
  onnx_file: "${export.results_dir}/segformer.onnx"
  input_width: 224
  input_height: 224
  batch_size: -1

Parameter	Datatype	Default	Description	Supported Values
`results_dir`	Optional[str]	None	Path to where all the assets are stored.	–
`gpu_id`	int	0	The index of the GPU to build the TensorRT engine.	–
`checkpoint`	str	–	Path to the checkpoint file to run export.	–
`onnx_file`	str	–	Path to the onnx model file.	–
`on_cpu`	bool	False	Flag to export CPU compatible model.	True,False
`input_channel`	int	3	Number of channels in the input Tensor.	1,3
`input_width`	int	960	Width of the input image tensor.	–
`input_height`	int	544	Height of the input image tensor.	–
`opset_version`	int	17	Operator set version.	–
`batch_size`	int	-1	The batch size of the input Tensor for the engine.	–

model#

The following example model provides options to define the SegFormer backbone and decoder head.

Note

For FTMS Client, these parameters are set in json format.

model:
  backbone:
    type: "vit_large_nvdinov2"
    pretrained_backbone_path: <path_to_pretrained_weight>
    freeze_backbone: False
  decode_head:
    feature_strides: [4, 8, 16, 32]

Parameter	Datatype	Default	Description	Supported Values
`backbone`	dict config	–	The configuration of the backbone.	–
`decode_head`	dict config	–	The configuration of the decoder head.	–

backbone#

Parameter	Datatype	Default	Description	Supported Values
`type`	str	fan_small_12_p4_hybrid	The name of the backbone to be used	mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5, fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, vit_large_nvdinov2, vit_giant_nvdinov2, vit_base_nvclip_16_siglip, vit_huge_nvclip_14_siglip, c_radio_v2_vit_base_patch16_224, c_radio_v2_vit_large_patch16_224, c_radio_v2_vit_huge_patch16_224
`pretrained_backbone_path`	str	–	Path to the pretrained model	–
`freeze_backbone`	bool	False	Flag to freeze backbone	True,False

decode_head#

Parameter	Datatype	Default	Description	Supported Values
`feature_strides`	List[int]	[4, 8, 16, 32]	Feature strides for the head.	–

dataset#

The dataset parameter defines the dataset source, training batch size, and augmentation. An example dataset is provided below.

Note

For FTMS Client, these parameters are set in json format.

dataset:
  segment:
    dataset: "SFDataset"
    root_dir: <dataset_root>
    batch_size: 32
    workers: 8
    num_classes: 6
    img_size: 224
    train_split: "train"
    validation_split: "val"
    test_split: "val"
    predict_split: "val"
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: False
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: False
    label_transform: None
    palette:
      - seg_class: urban
        rgb:
          - 0
          - 255
          - 255
        label_id: 0
        mapping_class: urban
      - seg_class: agriculture
        rgb:
          - 255
          - 255
          - 0
        label_id: 1
        mapping_class: agriculture
      - seg_class: rangeland
        rgb:
          - 255
          - 0
          - 255
        label_id: 2
        mapping_class: rangeland
      - seg_class: forest
        rgb:
          - 0
          - 255
          - 0
        label_id: 3
        mapping_class: forest
      - seg_class: water
        rgb:
          - 0
          - 0
          - 255
        label_id: 4
        mapping_class: water
      - seg_class: barren
        rgb:
          - 255
          - 255
          - 255
        label_id: 5
        mapping_class: barren
      - seg_class: unknown
        rgb:
          - 0
          - 0
          - 0
        label_id: 255
        mapping_class: unknown

Parameter	Datatype	Default	Description	Supported Values
`segment`	dict config	–	Segmentation Dataset Config.	–

segment#

Parameter	Datatype	Default	Description	Supported Values
`root_dir`	str	–	Path to root directory for dataset.	–
`dataset`	str	SFDataset	dataset class.	SFDataset
`num_classes`	int	2	The number of classes in the training data.	–
`img_size`	int	256	The input image size.	–
`batch_size`	int	8	Batch size.	–
`workers`	int	1	Workers.	–
`shuffle`	bool	True	Shuffle dataloader.	True,False
`train_split`	str	train	Train split folder name.	–
`validation_split`	str	val	Validation split folder name.	–
`test_split`	str	val	Test split folder name.	–
`predict_split`	str	test	Predict split folder name.	–
`augmentation`	dict config	–	Augmentation.	–
`label_transform`	str	norm	label transform.	norm,None
`palette`	List[Dict]	[{"label_id": 0, "mapping_class": "foreground", "rgb": [0, 0, 0], "seg_class": "foreground"}, {"label_id": 1, "mapping_class": "background", "rgb": [1, 1, 1], "seg_class": "background"}]	Palette. Mind the label_transform setting: if it is norm, RGB values range from 0 to 1; otherwise, they range from 0 to 255.	–
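
The interaction between palette and label_transform can be checked before launching a job. The following Python sketch is a hypothetical helper (check_palette is not part of TAO) that assumes only the rule stated in the table: RGB values lie in 0 to 1 when label_transform is norm, and in 0 to 255 otherwise.

```python
def check_palette(palette, label_transform):
    """Return a list of problems found in a palette (an empty list means OK)."""
    # Upper bound of each RGB channel depends on label_transform.
    upper = 1 if label_transform == "norm" else 255
    problems = []
    for entry in palette:
        if any(not 0 <= channel <= upper for channel in entry["rgb"]):
            problems.append(
                f"{entry['seg_class']}: rgb {entry['rgb']} outside 0..{upper}"
            )
    return problems


palette = [
    {"seg_class": "urban", "rgb": [0, 255, 255], "label_id": 0, "mapping_class": "urban"},
    {"seg_class": "water", "rgb": [0, 0, 255], "label_id": 4, "mapping_class": "water"},
]

print(check_palette(palette, "None"))  # 0..255 palette, no normalization
print(check_palette(palette, "norm"))  # same palette checked against 0..1
```

With label_transform set to None the palette passes, while the same palette is flagged for both classes under norm.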

augmentation#

Parameter	Datatype	Default	Description	Supported Values
`random_flip`	dict config	–	RandomFlip augmentation config.	–
`random_rotate`	dict config	–	RandomRotation augmentation config.	–
`random_color`	dict config	–	RandomColor augmentation config.	–
`with_scale_random_crop`	dict config	–	RandomCropWithScale augmentation config.	–
`with_random_blur`	bool	–	Flag to enable with_random_blur.	–
`with_random_crop`	bool	–	Flag to enable with_random_crop.	–
`mean`	List[float]	–	Mean for the augmentation.	–
`std`	List[float]	–	Standard deviation for the augmentation.	–

RandomFlip#

Parameter	Datatype	Default	Description	Supported Values
`vflip_probability`	float	0.5	Vertical Flip probability.	–
`hflip_probability`	float	0.5	Horizontal Flip probability.	–
`enable`	bool	True	Flag to enable augmentation.	True,False

RandomRotation#

Parameter	Datatype	Default	Description	Supported Values
`rotate_probability`	float	0.5	Random Rotate probability.	–
`angle_list`	List[float]	[90, 180, 270]	Random rotate angle.	–
`enable`	bool	True	Flag to enable augmentation.	True,False

RandomColor#

Parameter	Datatype	Default	Description	Supported Values
`brightness`	float	0.3	Random Color Brightness.	–
`contrast`	float	0.3	Random Color Contrast.	–
`saturation`	float	0.3	Random Color Saturation.	–
`hue`	float	0.3	Random Color Hue.	–
`enable`	bool	True	Flag to enable Random Color.	True,False
`color_probability`	float	0.5	Random Color Probability.	–

RandomCropWithScale#

Parameter	Datatype	Default	Description	Supported Values
`scale_range`	List[float]	[1, 1.2]	Random Scale range.	–
`enable`	bool	True	Flag to enable augmentation.	True,False

Training the Model#

Use one of the following commands to run SegFormer training. The first uses the FTMS Client; the second uses the TAO Launcher:

TRAIN_JOB_ID=$(tao-client segformer experiment-run-action --action train --id $EXPERIMENT_ID --specs "$TRAIN_SPECS")

tao model segformer train [-h] -e <experiment_spec_file>
                    [results_dir=<global_results_dir>]
                    [model.<model_option>=<model_option_value>]
                    [dataset.<dataset_option>=<dataset_option_value>]
                    [train.<train_option>=<train_option_value>]
                    [train.gpu_ids=<gpu indices>]
                    [train.num_gpus=<number of gpus>]

Required Arguments

The only required argument is the path to the experiment spec:

-e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.

Note

For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 and gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (in this example, num_gpus = 1 becomes num_gpus = 2).
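
The reconciliation described in this note can be sketched in Python as follows. This is an illustration only, not the TAO implementation: reconcile_gpus is a hypothetical name, and the behavior when num_gpus is the larger setting (expanding gpu_ids to consecutive indices starting at 0) is an assumption, since the note only gives an example for the opposite case.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Return (num_gpus, gpu_ids) following whichever setting requests more GPUs."""
    gpu_ids = list(gpu_ids) if gpu_ids is not None else [0]
    if len(gpu_ids) > num_gpus:
        # gpu_ids lists more devices, so num_gpus is raised to match.
        num_gpus = len(gpu_ids)
    elif num_gpus > len(gpu_ids):
        # Assumption: expand gpu_ids to indices 0..num_gpus-1.
        gpu_ids = list(range(num_gpus))
    return num_gpus, gpu_ids


print(reconcile_gpus(num_gpus=1, gpu_ids=[0, 1]))
print(reconcile_gpus(num_gpus=4, gpu_ids=[0]))
```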

In some cases, multi-GPU training may fail with a segmentation fault. You can work around this by setting the OMP_NUM_THREADS environment variable to 1. Depending on your mode of execution, use one of the following methods to set this variable.

CLI Launcher

You can set this environment variable by adding the following entry to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 in this section.

{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADS",
            "value": "1"
        }
    ]
}
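
If the mounts file is managed by a script, the same entry can be added programmatically. The Python sketch below works on an in-memory dictionary and assumes only the Envs layout shown in this section (a list of objects with variable and value keys). The add_env helper is hypothetical; reading and writing ~/.tao_mounts.json is left out.

```python
import json


def add_env(mounts, variable, value):
    """Return a copy of the config with `variable` set to `value` under "Envs"."""
    # Drop any existing entry for the same variable, then append the new one.
    envs = [dict(e) for e in mounts.get("Envs", []) if e.get("variable") != variable]
    envs.append({"variable": variable, "value": value})
    return {**mounts, "Envs": envs}


# In-memory stand-in for the parsed contents of ~/.tao_mounts.json.
mounts = {"Envs": [{"variable": "OMP_NUM_THREADS", "value": "4"}]}
updated_mounts = add_env(mounts, "OMP_NUM_THREADS", "1")
print(json.dumps(updated_mounts, indent=4))
```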

Docker

You can set environment variables in the Docker container by passing the -e flag on the docker command line.

docker run -it --rm --gpus all \
    -e OMP_NUM_THREADS=1 \
    -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e

Evaluating the Model#

SegFormer is evaluated with the mean intersection over union (mean IoU) metric. For more details on this metric, refer to meanIOU.

Use one of the following commands to run SegFormer evaluation. The first uses the FTMS Client; the second uses the TAO Launcher:

EVAL_JOB_ID=$(tao-client segformer experiment-run-action --action evaluate --id $EXPERIMENT_ID --specs "$EVAL_SPECS" --parent_job_id $TRAIN_JOB_ID)

tao model segformer evaluate -e <experiment_spec>
                    evaluate.checkpoint=<evaluation model>
                    results_dir=<path to output evaluation results>
                    [evaluate.gpu_ids=<gpu indices>]
                    [evaluate.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model.

Here is an example of the output of the SegFormer evaluation command:

Note

For FTMS Client, the job output will be in your experiment’s cloud workspace.

+------------+-------+-------+
| Class      | IoU   | Acc   |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
  ...
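
In the sample output, the global mIoU (60.81) is the mean of the two per-class IoU values (37.81 and 83.81). The Python sketch below shows how such numbers can be derived from a confusion matrix, assuming the commonly used definitions: per-class IoU is TP / (TP + FP + FN), per-class Acc is TP / (TP + FN), mIoU and mAcc are their unweighted class means, and aAcc is the overall pixel accuracy. It illustrates the metric and is not the TAO evaluation code; the pixel counts are made up.

```python
def segmentation_metrics(confusion):
    """Compute per-class IoU/Acc and global mIoU/mAcc/aAcc.

    confusion[i][j] counts pixels of ground-truth class i predicted as class j.
    Classes with no pixels are scored 0.0 here for simplicity.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    ious, accs = [], []
    for c in range(n):
        tp = confusion[c][c]
        gt = sum(confusion[c])                         # pixels labeled c
        pred = sum(confusion[r][c] for r in range(n))  # pixels predicted as c
        union = gt + pred - tp
        ious.append(tp / union if union else 0.0)
        accs.append(tp / gt if gt else 0.0)
    return {
        "IoU": ious,
        "Acc": accs,
        "mIoU": sum(ious) / n,
        "mAcc": sum(accs) / n,
        "aAcc": sum(confusion[c][c] for c in range(n)) / total,
    }


# Toy 2-class confusion matrix (made-up pixel counts, not TAO output).
metrics = segmentation_metrics([[40, 10], [20, 130]])
for name, value in metrics.items():
    print(name, value)
```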

Running Inference on the Model#

Use one of the following commands to run inference on SegFormer with the .pth model. The first uses the FTMS Client; the second uses the TAO Launcher:

INFER_JOB_ID=$(tao-client segformer experiment-run-action --action inference --id $EXPERIMENT_ID --specs "$INFER_SPECS" --parent_job_id $TRAIN_JOB_ID)

tao model segformer inference -e <experiment_spec>
                    inference.checkpoint=<inference model>
                    results_dir=<path to output directory for inference>
                    [inference.gpu_ids=<gpu indices>]
                    [inference.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec: The experiment spec file to set up inference
inference.checkpoint: The .pth model to perform inference with
results_dir: The path to save the inference masks and mask overlaid images to. Inference creates two directories.

Note

For FTMS Client, the job output will be in your experiment’s cloud workspace.

The output mask PNG images with class IDs are saved in vis_tao. The overlaid mask images are saved in mask_tao.

Exporting the Model#

Use one of the following commands to export the model. The first uses the FTMS Client; the second uses the TAO Launcher:

EXPORT_JOB_ID=$(tao-client segformer experiment-run-action --action export --id $EXPERIMENT_ID --specs "$EXPORT_SPECS" --parent_job_id $TRAIN_JOB_ID)

tao model segformer export [-h] -e <experiment spec file>
                    results_dir=<path to results dir>
                    export.checkpoint=<trained pth model to be exported>
                    export.onnx_file=<onnx path>

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The path to an experiment spec file
results_dir: The path where the logs for export will be saved
export.checkpoint: The .pth model to be exported
export.onnx_file: The path where the .onnx file will be stored

TensorRT engine generation, validation, and int8 calibration#

For deployment, refer to the TAO Deploy documentation.

Deploying to DeepStream#

Refer to the Integrating a SegFormer Model page for more information about deploying a SegFormer model to DeepStream.