Mask2Former
Mask2Former supports the following tasks:
train
evaluate
inference
export
These tasks may be invoked from the TAO Launcher using the following convention on the command line:
tao model mask2former <sub_task> <args_per_subtask>
where `args_per_subtask` are the command-line arguments required for a given subtask. Each of these subtasks is explained below.
Mask2Former supports three types of dataloaders, corresponding to the semantic, panoptic, and instance segmentation tasks. Each dataloader requires a specific annotation format.
For the semantic segmentation task, each line of the JSONL annotation file encodes the locations of the raw image and the groundtruth mask.
For the panoptic and instance segmentation tasks, the annotation format follows the COCO panoptic format and the COCO format, respectively.
The category IDs and annotation IDs must be greater than 0.
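For reference, a single line of a semantic-segmentation JSONL annotation file pairs one image with its mask, roughly as sketched below. The field names here (image and label) are illustrative assumptions rather than the exact keys, so check an annotation file generated for your dataset:

{"image": "/datasets/my_dataset/images/0001.jpg", "label": "/datasets/my_dataset/masks/0001.png"}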
Below is a sample Mask2Former spec file. It has six components (`model`, `inference`, `evaluate`, `dataset`, `export`, and `train`) as well as several global parameters, which are described below. The spec file is in YAML format.
Here's a sample of the Mask2Former spec file:
results_dir: /workspace/mask2former_coco_swint
data:
  contiguous_id: False
  label_map: /tlt3_experiments/mask2former_coco_effvit_b2/colormap.json
  type: 'coco_panoptic'
  train:
    panoptic_json: "/datasets/coco/annotations/panoptic_train2017.json"
    img_dir: "/datasets/coco/train2017"
    panoptic_dir: "/datasets/coco/panoptic_train2017"
    batch_size: 16
    num_workers: 20
  val:
    panoptic_json: "/datasets/coco/annotations/panoptic_val2017.json"
    img_dir: "/datasets/coco/val2017"
    panoptic_dir: "/datasets/coco/panoptic_val2017"
    batch_size: 1
    num_workers: 2
    target_size: [1024, 1024]
  test:
    img_dir: /workspace/test_images/
    batch_size: 1
  augmentation:
    train_min_size: [1024]
    train_max_size: 2560
    train_crop_size: [1024, 1024]
    test_min_size: 1024
    test_max_size: 2560
train:
  precision: 'fp16'
  num_gpus: 1
  checkpoint_interval: 1
  validation_interval: 5
  num_epochs: 50
  optim:
    lr_scheduler: "MultiStep"
    milestones: [44, 48]
    type: "AdamW"
    lr: 0.0001
    weight_decay: 0.05
model:
  object_mask_threshold: 0.
  overlap_threshold: 0.8
  mode: "semantic"
  backbone:
    pretrained_weights: "/workspace/mask2former_coco_swint/swin_tiny_patch4_window7_224_22k.pth"
    type: "swin"
    swin:
      type: "tiny"
      window_size: 7
      ape: False
      pretrain_img_size: 224
  mask_former:
    num_object_queries: 100
  sem_seg_head:
    norm: "GN"
    num_classes: 200
inference:
  checkpoint: "/workspace/mask2former_coco_swint/train/model_epoch=049.pth"
evaluate:
  checkpoint: "/workspace/mask2former_coco_swint/train/model_epoch=049.pth"
export:
  checkpoint: "/workspace/mask2former_coco_swint/train/model_epoch=049.pth"
  input_channel: 3
  input_width: 1024
  input_height: 1024
  opset_version: 17
| Parameter | Data Type | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| export | dict config | – | The configuration of the ONNX export task | |
Model Config
The model configuration (`model`) defines the Mask2Former model structure. This model is used for training, evaluation, and inference. A detailed description is included in the table below. Currently, Mask2Former only supports Swin Transformer and EfficientViT (experimental) backbones.

| Field | Description | Data Type and Constraints | Supported Values |
| --- | --- | --- | --- |
| backbone | The backbone configuration | Dict | |
| sem_seg_head | The configuration for the segmentation head | Dict | |
| mask_former | The configuration for the Mask2Former architecture | Dict | |
| mode | The postprocessing mode | string | 'panoptic', 'semantic', 'instance' |
| object_mask_threshold | The classification confidence threshold | float | 0.4 |
| overlap_threshold | The overlap threshold for panoptic inference | float | 0.8 |
| test_topk_per_image | The number of top-k instances to keep per image for instance inference | Unsigned int | 100 |
Backbone Config
The backbone configuration (`backbone`) defines the backbone structure. A detailed description is included in the table below. Currently, Mask2Former only supports Swin Transformer and EfficientViT backbones.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| type | The backbone type | str | "swin" |
| pretrained_weights | The path to the pretrained backbone model | str | |
| swin | The configuration for the Swin backbones | Dict | |
| efficientvit | The configuration for the EfficientViT backbones | Dict | |
Swin Config
The Swin configuration (`swin`) specifies the key parameters of a Swin Transformer backbone.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| type | The type of Swin Transformer (from tiny to huge) | str | "large" |
| pretrain_img_size | The image size used in pretraining | Unsigned int | 384 |
| out_indices | The stages from which to extract feature maps | List | [0, 1, 2, 3] |
| out_features | The names of the extracted feature maps | List | ["res2", "res3", "res4", "res5"] |
EfficientViT Config
The EfficientViT configuration (`efficientvit`) specifies the key parameters of an EfficientViT backbone.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| name | The name of the EfficientViT model ("b0"-"b3", "l0"-"l3") | str | "l2" |
| pretrain_img_size | The image size used in pretraining | Unsigned int | 384 |
| out_indices | The stages from which to extract feature maps | List | [0, 1, 2, 3] |
| out_features | The names of the extracted feature maps | List | ["res2", "res3", "res4", "res5"] |
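For example, switching the sample spec above from the Swin backbone to an EfficientViT backbone might look like the following sketch. The pretrained-weights path is a placeholder, and the exact set of supported fields should be verified against your TAO version:

model:
  backbone:
    type: "efficientvit"
    pretrained_weights: "/workspace/pretrained/efficientvit_l2.pth"   # placeholder path
    efficientvit:
      name: "l2"
      pretrain_img_size: 384
      out_indices: [0, 1, 2, 3]
      out_features: ["res2", "res3", "res4", "res5"]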
Data Config
The data configuration (`data`) defines the data sources, augmentation methods, and pre-processing hyperparameters.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| pixel_mean | The image mean in RGB order | List | [0.485, 0.456, 0.406] |
| pixel_std | The image standard deviation in RGB order | List | [0.229, 0.224, 0.225] |
| augmentation | The augmentation settings | Dict | |
| contiguous_id | Whether to use contiguous IDs | bool | |
| label_map | The path to the label mapping file | string | |
| workers | The number of workers to load data for each GPU | Unsigned int | |
| train | The train dataset config | Dict | |
| val | The validation dataset config | Dict | |
| test | The test dataset config | Dict | |
Augmentation Config
The augmentation configuration (`augmentation`) defines the augmentation methods.

| Parameter | Datatype | Description | Supported Values |
| --- | --- | --- | --- |
| train_min_size | int list | A list of sizes for random resizing of the training data | int list |
| train_max_size | unsigned int | The maximum resize size for the training data | >0 |
| train_crop_size | int list | The random crop size for the training data in [H, W] | int list |
| test_min_size | unsigned int | The minimum resize size for the test data | >0 |
| test_max_size | unsigned int | The maximum resize size for the test data | >0 |
Dataset Config
The dataset configuration (`dataset`) defines the dataset directories, annotation file, and batch size for either `train`, `val`, or `test`.

| Parameter | Datatype | Description |
| --- | --- | --- |
| type | str | The dataset type ("ade", "coco", "coco_panoptic") |
| panoptic_json | str | The JSON file in COCO panoptic format |
| img_dir | str | The image directory (can be a relative path to root_dir) |
| panoptic_dir | str | The directory of panoptic segmentation annotation images |
| root_dir | str | The root directory of img_dir |
| annot_file | str | A JSON file in COCO/COCO panoptic format, or a JSONL file of image/mask pairs |
| batch_size | unsigned int | The batch size |
| num_workers | unsigned int | The number of workers to process the input data |
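As an illustration, an instance-segmentation setup that reads COCO-style JSON annotations through root_dir and annot_file could be sketched as follows. The paths and batch sizes are placeholders to adapt to your dataset:

data:
  contiguous_id: False
  type: 'coco'
  train:
    root_dir: "/datasets/coco"                                          # placeholder root directory
    img_dir: "train2017"                                                # relative to root_dir
    annot_file: "/datasets/coco/annotations/instances_train2017.json"   # COCO-format JSON
    batch_size: 8
    num_workers: 8
  val:
    root_dir: "/datasets/coco"
    img_dir: "val2017"
    annot_file: "/datasets/coco/annotations/instances_val2017.json"
    batch_size: 1
    num_workers: 2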
Train Config
The `train` configuration defines the hyperparameters of the training process.

train:
  precision: 'fp16'
  num_gpus: 1
  checkpoint_interval: 10
  validation_interval: 10
  num_epochs: 50
  optim:
    type: "AdamW"
    lr: 0.0001
    weight_decay: 0.05
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, NumPy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| optim | dict config | | The config for the optimizer, including the learning rate, learning rate scheduler, and weight decay | |
| clip_grad_type | str | full | The type of gradient clip method | |
| clip_grad_norm | float | 0.1 | The amount to clip the gradient by the L2 norm; a value of 0.0 specifies no clipping | >=0 |
| precision | string | fp32 | Specifying "fp16" enables mixed-precision training, which can help save GPU memory | fp32, fp16 |
| distributed_strategy | string | ddp | The multi-GPU training strategy; DDP (Distributed Data Parallel) and Sharded DDP are supported | ddp, ddp_sharded |
| activation_checkpoint | bool | True | A True value instructs train to recompute activations in the backward pass instead of storing them, which saves GPU memory | True, False |
| pretrained_model_path | string | | The path to a pretrained model checkpoint to load for fine-tuning | |
| num_nodes | unsigned int | 1 | The number of nodes; if the value is larger than 1, multi-node training is enabled | >0 |
| freeze | string list | [] | The list of layer names in the model to freeze, for example ["backbone", "transformer.encoder", "input_proj"] | |
| verbose | bool | False | Whether to print detailed learning rate scaling from the optimizer | True, False |
| iters_per_epoch | unsigned int | | The number of samples per epoch | |
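For instance, fine-tuning from a pretrained checkpoint with a frozen backbone could be configured as in the following sketch; the checkpoint path and epoch count are placeholders:

train:
  num_gpus: 2
  num_epochs: 20
  precision: 'fp16'
  pretrained_model_path: "/workspace/pretrained/mask2former.pth"   # placeholder checkpoint
  freeze: ["backbone"]                                             # freeze the backbone during fine-tuning
  optim:
    type: "AdamW"
    lr: 0.0001
    weight_decay: 0.05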
Optimizer Config
The `optim` parameter defines the config for the optimizer in training, including the learning rate, learning rate scheduler, and weight decay.

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| lr | float | 2e-4 | The initial learning rate for training the model, excluding the backbone | >0.0 |
| momentum | float | 0.9 | The momentum for the AdamW optimizer | >0.0 |
| weight_decay | float | 1e-4 | The weight decay coefficient | >0.0 |
| lr_scheduler | string | MultiStep | The learning rate scheduler | MultiStep/StepLR |
| gamma | float | 0.1 | The decreasing factor for the learning rate scheduler | >0.0 |
| milestones | int list | [11] | The steps at which to decrease the learning rate for the MultiStep scheduler | int list |
| monitor_name | string | val_loss | The monitor value for the AutoReduce scheduler | val_loss/train_loss |
| type | string | AdamW | The type of optimizer to use during training | AdamW/SGD |
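For example, a MultiStep schedule that reduces the learning rate by a factor of 10 at the given epochs could be written as:

optim:
  type: "AdamW"
  lr: 0.0001
  weight_decay: 0.05
  lr_scheduler: "MultiStep"
  milestones: [44, 48]   # epochs at which the learning rate is decreased
  gamma: 0.1             # multiply the learning rate by 0.1 at each milestone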
Evaluation Config
The `evaluate` parameter defines the hyperparameters of the evaluation process.

evaluate:
  checkpoint: /path/to/model.pth
  num_gpus: 1

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| checkpoint | string | | The path to the PyTorch model to evaluate | |
| trt_engine | string | | The path to the TensorRT model to evaluate (only used with tao deploy) | |
| num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
| gpu_ids | List[int] | [0] | The GPU IDs to use | |
| results_dir | string | /results/evaluate | The path to the evaluation results directory | |
Inference Config
The `inference` parameter defines the hyperparameters of the inference process.

inference:
  checkpoint: /path/to/model.pth
  num_gpus: 1

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| checkpoint | string | | The path to the PyTorch model to run inference with | |
| trt_engine | string | | The path to the TensorRT model to run inference with (only used with tao deploy) | |
| num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
| gpu_ids | List[int] | [0] | The GPU IDs to use | |
| results_dir | string | /results/inference | The path to the inference results directory | |
Export Config
The `export` parameter defines the hyperparameters of the export process.

export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  on_cpu: False
  opset_version: 12
  input_channel: 3
  input_width: 960
  input_height: 544
  batch_size: -1

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| checkpoint | string | | The path to the PyTorch model to export | |
| onnx_file | string | | The path to the .onnx file | |
| on_cpu | bool | True | If True, the DMHA module is exported as standard PyTorch; if False, the module is exported using the TensorRT plugin | True, False |
| opset_version | unsigned int | 12 | The opset version of the exported ONNX model | >0 |
| input_channel | unsigned int | 3 | The input channel size; only the value 3 is supported | 3 |
| input_width | unsigned int | 960 | The input width | >0 |
| input_height | unsigned int | 544 | The input height | >0 |
| batch_size | unsigned int | -1 | The batch size of the ONNX model; if set to -1, the export uses a dynamic batch size | >=-1 |
To train a Mask2Former model, use this command:
tao model mask2former train [-h] -e <experiment_spec>
                            [results_dir=<global_results_dir>]
                            [model.<model_option>=<model_option_value>]
                            [dataset.<dataset_option>=<dataset_option_value>]
                            [train.<train_option>=<train_option_value>]
                            [train.gpu_ids=<gpu indices>]
                            [train.num_gpus=<number of gpus>]
Required Arguments
* `-e, --experiment_spec`: The experiment specification file to set up the training experiment.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
* `-h, --help`: Show this help message and exit.
* `model.<model_option>`: The model options.
* `dataset.<dataset_option>`: The dataset options.
* `train.<train_option>`: The train options.
* `train.optim.<optim_option>`: The optimizer options.
For training, evaluation, and inference, we expose two variables for each respective task: `num_gpus` and `gpu_ids`, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 with gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (in this example, num_gpus is changed to 2).
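For example, to train on two GPUs, you could override both values on the command line. This is a sketch; the spec file path is a placeholder:

tao model mask2former train -e /path/to/experiment_spec.yaml train.num_gpus=2 train.gpu_ids=[0,1]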
Checkpointing and Resuming Training
At every `train.checkpoint_interval`, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. These checkpoints are saved in `train.results_dir`, like so:

$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'

The latest checkpoint is also saved as mask2former_model_latest.pth.
Training automatically resumes from mask2former_model_latest.pth if it exists in `train.results_dir`. This is superseded by `train.resume_training_checkpoint_path` if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, either:
* Specify a new, empty results directory (recommended)
* Remove the latest checkpoint from the results directory
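To resume from a specific checkpoint instead, you can point `train.resume_training_checkpoint_path` at it, either in the spec file or as a command-line override. The paths below are placeholders:

tao model mask2former train -e /path/to/experiment_spec.yaml train.resume_training_checkpoint_path=/results/train/model_epoch_004.pth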
Optimizing Resources for Training Mask2Former
Training Mask2Former requires powerful GPUs (for example, V100/A100) with at least 15 GB of VRAM and a large amount of CPU memory when training on a standard dataset such as COCO. This section outlines some strategies you can use to launch training with limited resources.
Optimize GPU Memory
There are various ways to optimize GPU memory usage. A typical option is to reduce `dataset.batch_size`; however, this can cause your training to take longer than usual. We recommend the following settings to reduce GPU memory consumption; a combined example follows this list.
* Set `train.precision` to fp16 to enable automatic mixed precision training. This can reduce your GPU memory usage by about 50%.
* Set `train.activation_checkpoint` to True to enable activation checkpointing. By recomputing the activations instead of caching them in memory, memory usage is reduced.
* Set `train.distributed_strategy` to ddp_sharded to enable Sharded DDP training. This shards the gradients and optimizer states across processes to help reduce GPU memory.
* Try using a more lightweight backbone, or freeze the backbone by setting `train.freeze`.
* Try changing the augmentation resolution in `dataset.augmentation` depending on your dataset.
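Put together, a memory-constrained train section might look like the following sketch; the values are illustrative and should be tuned for your setup:

train:
  precision: 'fp16'                     # automatic mixed precision
  activation_checkpoint: True           # recompute activations in the backward pass
  distributed_strategy: 'ddp_sharded'   # Sharded DDP
  freeze: ["backbone"]                  # optionally freeze the backbone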
Optimize CPU Memory
To speed up data loading, it is common practice to set a high number of workers, which spawns multiple processes. However, if your annotation file is very large, this can exhaust CPU memory. If you encounter CPU out-of-memory errors, consider lowering the `num_workers` values in the dataset configuration.
To run evaluation with a Mask2Former model, use this command:
tao model mask2former evaluate [-h] -e <experiment_spec>
                               evaluate.checkpoint=<model to be evaluated>
                               [evaluate.<evaluate_option>=<evaluate_option_value>]
                               [evaluate.gpu_ids=<gpu indices>]
                               [evaluate.num_gpus=<number of gpus>]
Required Arguments
* `-e, --experiment_spec`: The experiment spec file to set up the evaluation experiment.
* `evaluate.checkpoint`: The .pth model to be evaluated.
Optional Arguments
* `evaluate.<evaluate_option>`: The evaluate options.
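For example, the following sketch evaluates the latest training checkpoint; the paths are placeholders:

tao model mask2former evaluate -e /path/to/experiment_spec.yaml evaluate.checkpoint=/results/train/mask2former_model_latest.pth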
The inference tool for Mask2Former models can be used to visualize bounding boxes and masks.
tao model mask2former inference [-h] -e <experiment spec file>
                                inference.checkpoint=<inference model>
                                [inference.<inference_option>=<inference_option_value>]
                                [inference.gpu_ids=<gpu indices>]
                                [inference.num_gpus=<number of gpus>]
Required Arguments
* `-e, --experiment_spec`: The experiment spec file to set up the inference experiment.
* `inference.checkpoint`: The .pth model to run inference on.
Optional Arguments
* `inference.<inference_option>`: The inference options.
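For example, the following sketch runs inference with the latest training checkpoint; the paths are placeholders:

tao model mask2former inference -e /path/to/experiment_spec.yaml inference.checkpoint=/results/train/mask2former_model_latest.pth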
To export a Mask2Former model to ONNX, use this command:
tao model mask2former export [-h] -e <experiment spec file>
                             [results_dir=<results_dir>]
                             export.checkpoint=<model to export>
                             export.onnx_file=<onnx path>
Required Arguments
* `-e, --experiment_spec`: The path to an experiment spec file.
* `export.checkpoint`: The .pth model to export.
* `export.onnx_file`: The path where the .etlt or .onnx model is saved.
Optional Arguments
* `export.<export_option>`: The export options.
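For example, the following sketch exports the latest training checkpoint to ONNX; the paths are placeholders:

tao model mask2former export -e /path/to/experiment_spec.yaml export.checkpoint=/results/train/mask2former_model_latest.pth export.onnx_file=/results/export/mask2former.onnx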
For deployment, refer to the TAO Deploy documentation for Mask2Former.
Refer to the Integrating a Mask2Former Model page for more information about deploying a Mask2Former model to DeepStream.