Mask Auto Labeler
Mask Auto Labeler (MAL) is a high-quality, transformer-based mask auto-labeling framework for instance segmentation using only box annotations. It supports the following tasks:
train
evaluate
inference
These tasks may be invoked from the TAO Launcher using the following convention on the command line:

tao model mal <sub_task> <args_per_subtask>

where args_per_subtask represents the command-line arguments required for a given subtask. Each of these subtasks is explained in detail below.
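For example, the following command launches a training run (the spec-file path here is only illustrative):

tao model mal train -e /path/to/spec.yaml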
Experiment Spec File

Below is a sample MAL spec file. The spec file is in YAML format and has five components (model, inference, evaluate, dataset, and train), as well as several global parameters, all of which are described below.
strategy: 'fsdp'
results_dir: '/path/to/result/dir'
dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  load_mask: True
  crop_size: 512
inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
model:
  arch: 'vit-mae-base/16'
train:
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  batch_size: 4
  seed: 1234
  num_gpus: 1
  gpu_ids: [0]
  use_amp: True
  optim_momentum: 0.9
  lr: 0.0000015
  min_lr_rate: 0.2
  wd: 0.0005
  warmup_epochs: 1
  crf_kernel_size: 3
  crf_num_iter: 100
  loss_mil_weight: 4
  loss_crf_weight: 0.5
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| strategy | string | 'ddp' | The distributed training strategy | 'ddp', 'fsdp' |
Dataset Config
The dataset configuration (dataset) defines the data source and input size.
| Field | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| train_ann_path | string | – | The path to the training annotation JSON file | |
| val_ann_path | string | – | The path to the validation annotation JSON file | |
| train_img_dir | string | – | The path to the training image directory | |
| val_img_dir | string | – | The path to the validation image directory | |
| crop_size | unsigned int | 512 | The effective input size of the model | |
| load_mask | boolean | True | A flag specifying whether to load the segmentation mask from the JSON file | |
| min_obj_size | float | 2048 | The minimum object size for training | |
| max_obj_size | float | 1e10 | The maximum object size for training | |
| num_workers_per_gpu | unsigned int | – | The number of data-loading workers per GPU | |
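Any of these fields can be set in the spec file or overridden on the launcher command line (overrides are described under the training command below). For example, the following illustrative overrides train on larger crops while skipping very small objects:

tao model mal train -e /path/to/spec.yaml dataset.crop_size=768 dataset.min_obj_size=1024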
Model Config
The model configuration (model) defines the model architecture.
| Field | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| arch | string | vit-mae-base/16 | The backbone architecture | |
| frozen_stages | List[int] | [-1] | The indices of the frozen blocks | |
| mask_head_num_convs | unsigned int | 4 | The number of conv layers in the mask head | |
| mask_head_hidden_channel | unsigned int | 256 | The number of conv channels in the mask head | |
| mask_head_out_channel | unsigned int | 256 | The number of output channels in the mask head | |
| teacher_momentum | float | 0.996 | The momentum of the teacher model | |
Train Config
The training configuration (train) specifies the parameters for the training process.
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, numpy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which validation is run | >0 |
| resume_training_checkpoint_path | string | – | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| batch_size | unsigned int | – | The training batch size | |
| use_amp | boolean | True | A flag specifying whether to use automatic mixed precision | |
| optim_momentum | float | 0.9 | The momentum of the AdamW optimizer | |
| lr | float | 0.0000015 | The learning rate | |
| min_lr_rate | float | 0.2 | The minimum learning rate ratio | |
| wd | float | 0.0005 | The weight decay | |
| warmup_epochs | unsigned int | 1 | The number of warmup epochs | |
| crf_kernel_size | unsigned int | 3 | The kernel size of the mean-field approximation | |
| crf_num_iter | unsigned int | 100 | The number of iterations to run mask refinement | |
| loss_mil_weight | float | 4 | The weight of the multiple instance learning loss | |
| loss_crf_weight | float | 0.5 | The weight of the conditional random field loss | |
Evaluation Config
The evaluation configuration (evaluate) specifies the parameters for validation during training as well as for standalone evaluation.
| Field | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| checkpoint | string | – | The path to the PyTorch (.pth) model to evaluate | |
| results_dir | string | /results/evaluate | The directory to save evaluation results | |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed evaluation | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed evaluation | |
| batch_size | unsigned int | – | The evaluation batch size | |
| use_mixed_model_test | boolean | False | A flag specifying whether to evaluate with a mixed model | |
| use_teacher_test | boolean | False | A flag specifying whether to evaluate with the teacher model | |
Inference Config
The inference configuration (inference) specifies the parameters for generating pseudo masks given the ground-truth bounding boxes in COCO format.
| Field | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| checkpoint | string | – | The path to the PyTorch (.pth) model to run inference with | |
| results_dir | string | /results/inference | The directory to save inference results | |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed inference | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed inference | |
| ann_path | string | – | The path to the annotation JSON file | |
| img_dir | string | – | The image directory | |
| label_dump_path | string | – | The path to save the output JSON file with pseudo masks | |
| batch_size | unsigned int | – | The inference batch size | |
| load_mask | boolean | False | A flag specifying whether to load masks if the annotation file has them | |
Training the Model

Use the following command to run MAL training:
tao model mal train [-h] -e <experiment_spec>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.gpu_ids=<gpu indices>]
[train.num_gpus=<number of gpus>]
Required Arguments
The only required argument is the path to the experiment spec:
-e, --experiment_spec: The experiment specification file to set up the training experiment
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
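For example, the following command (all paths and values are illustrative) trains on two GPUs and writes results to a custom directory:

tao model mal train -e /path/to/spec.yaml results_dir=/results train.num_gpus=2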
For training, evaluation, and inference, two variables are exposed for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 and gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (in this example, num_gpus is increased to 2).
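As an illustrative example, the following overrides are inconsistent, so training would proceed on the two GPUs listed in train.gpu_ids:

tao model mal train -e /path/to/spec.yaml train.num_gpus=1 train.gpu_ids=[0,1]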
Checkpointing and Resuming Training
At every train.checkpoint_interval, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved. These checkpoints are saved in train.results_dir, like so:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint is also saved as mal_model_latest.pth. Training automatically resumes from mal_model_latest.pth if it exists in train.results_dir. This behavior is superseded by train.resume_training_checkpoint_path, if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, you should either:

Specify a new, empty results directory (recommended)
Remove the latest checkpoint from the results directory
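For example, to resume from a specific checkpoint instead of the latest one (the paths below are illustrative):

tao model mal train -e /path/to/spec.yaml train.resume_training_checkpoint_path=/results/train/model_epoch_002.pth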
Evaluating the Model

To run evaluation for a MAL model, use this command:
tao model mal evaluate [-h] -e <experiment_spec_file>
evaluate.checkpoint=<model to be evaluated>
[evaluate.<evaluate_option>=<evaluate_option_value>]
[evaluate.gpu_ids=<gpu indices>]
[evaluate.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment
evaluate.checkpoint: The .pth model to be evaluated
Optional Arguments
evaluate.<evaluate_option>: The evaluate options.
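For example, the following command (with illustrative paths) evaluates the latest training checkpoint:

tao model mal evaluate -e /path/to/spec.yaml evaluate.checkpoint=/results/train/mal_model_latest.pth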
Running Inference

The inference tool for MAL networks can be used to generate pseudo masks. Here is an example of using this tool:
tao model mal inference [-h] -e <experiment spec file>
inference.checkpoint=<model to be inferenced>
[inference.<inference_option>=<inference_option_value>]
[inference.gpu_ids=<gpu indices>]
[inference.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the inference experiment
inference.checkpoint: The .pth model to run inference with
Optional Arguments
inference.<inference_option>: The inference options.
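For example, the following command (with illustrative paths) generates pseudo masks using the latest training checkpoint; the output JSON location is taken from inference.label_dump_path in the spec file:

tao model mal inference -e /path/to/spec.yaml inference.checkpoint=/results/train/mal_model_latest.pth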