Mask Auto Labeler
Mask Auto Labeler (MAL) is a high-quality, transformer-based mask auto-labeling framework for instance segmentation using only box annotations. It supports the following tasks:
train
evaluate
inference
These tasks may be invoked from the TAO Toolkit Launcher using the following convention on the command line:
tao mal <sub_task> <args_per_subtask>
where args_per_subtask refers to the command-line arguments required for a given subtask. Each of these subtasks is explained in detail below.
Below is a sample MAL spec file. It has five components: model, inference, evaluate, dataset, and train, as well as several global parameters, which are described below. The spec file is in YAML format.
gpus: [0, 1]
strategy: 'ddp_sharded'
results_dir: '/path/to/result/dir'
dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  load_mask: True
  crop_size: 512
inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
model:
  arch: 'vit-mae-base/16'
train:
  num_epochs: 10
  batch_size: 4
  use_amp: True
Field | Description | Data Type and Constraints | Recommended/Typical Value |
gpus | A list of GPU indices to use | List of int | |
strategy | The distributed training strategy | string | ‘ddp_sharded’ |
num_nodes | The number of nodes in multinode training | Unsigned int | – |
checkpoint | Either a pretrained model or a MAL checkpoint to load | string | – |
results_dir | The directory to save experiment results to | string | – |
dataset | The dataset config | Dict | – |
train | The training config | Dict | – |
model | The model config | Dict | – |
evaluate | The evaluation config | Dict | – |
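For example, the global parameters could be set at the top of the spec file as in the sketch below. The checkpoint and results_dir paths are placeholders, and the num_nodes value of 1 is only an illustrative single-node setting.
gpus: [0, 1]
strategy: 'ddp_sharded'
num_nodes: 1                                  # illustrative single-node setting
checkpoint: '/path/to/pretrained_or_mal_checkpoint'   # placeholder path
results_dir: '/path/to/result/dir'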
Dataset Config
The dataset configuration (dataset) defines the data source and input size.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
train_ann_path | The path to the training annotation JSON file | string | |
val_ann_path | The path to the validation annotation JSON file | string | |
train_img_dir | The path to the training image directory | string | |
val_img_dir | The path to the validation image directory | string | |
crop_size | The effective input size of the model | Unsigned int | 512 |
load_mask | A flag specifying whether to load the segmentation mask from the JSON file | boolean | |
min_obj_size | The minimum object size for training | float | 2048 |
max_obj_size | The maximum object size for training | float | 1e10 |
num_workers_per_gpu | The number of workers to load data for each GPU | Unsigned int | |
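As a sketch, a dataset section that sets all of these fields could look like the following. The dataset paths are placeholders in the style of the sample spec above, min_obj_size and max_obj_size use the typical values from the table, and the num_workers_per_gpu value is only an illustrative choice.
dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/datasets/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  load_mask: True
  crop_size: 512
  min_obj_size: 2048
  max_obj_size: 1e10
  num_workers_per_gpu: 2              # illustrative value; not a documented default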
Model Config
The model configuration (model) defines the model architecture.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
arch | The backbone architecture | string | vit-mae-base/16 |
frozen_stages | The indices of the frozen blocks | List[int] | -1 |
mask_head_num_convs | The number of conv layers in the mask head | Unsigned int | 4 |
mask_head_hidden_channel | The number of conv channels in the mask head | Unsigned int | 256 |
mask_head_out_channel | The number of output channels in the mask head | Unsigned int | 256 |
teacher_momentum | The momentum of the teacher model | float | 0.996 |
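As an illustration, a model section that sets every field explicitly might look like the sketch below. The values are the typical values listed in the table; frozen_stages is written as a single-element list because the field is typed List[int].
model:
  arch: 'vit-mae-base/16'
  frozen_stages: [-1]                 # -1 from the table, written as a list
  mask_head_num_convs: 4
  mask_head_hidden_channel: 256
  mask_head_out_channel: 256
  teacher_momentum: 0.996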
Train Config
The training configuration (train) specifies the parameters for the training process.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
num_epochs | The number of epochs | Unsigned int | 10 |
save_every_k_epoch | The interval (in epochs) at which checkpoints are saved | Unsigned int | 1 |
val_interval | The validation interval | Unsigned int | 1 |
batch_size | The training batch size | Unsigned int | |
use_amp | A flag specifying whether to use mixed precision | boolean | True |
optim_momentum | The momentum of the AdamW optimizer | float | 0.9 |
lr | The learning rate | float | 0.0000015 |
min_lr_rate | The minimum learning rate ratio | float | 0.2 |
wd | The weight decay | float | 0.0005 |
warmup_epochs | The number of epochs for warmup | Unsigned int | 1 |
crf_kernel_size | The kernel size of the mean field approximation | Unsigned int | 3 |
crf_num_iter | The number of iterations to run mask refinement | Unsigned int | 100 |
loss_mil_weight | The weight of multiple instance learning loss | float | 4 |
loss_crf_weight | The weight of conditional random field loss | float | 0.5 |
results_dir | The directory to save training results | string | |
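Putting the typical values from the table together, a train section might look like the following sketch. The batch_size of 4 is taken from the sample spec above, and results_dir is a placeholder.
train:
  num_epochs: 10
  save_every_k_epoch: 1
  val_interval: 1
  batch_size: 4                       # from the sample spec above
  use_amp: True
  optim_momentum: 0.9
  lr: 0.0000015
  min_lr_rate: 0.2
  wd: 0.0005
  warmup_epochs: 1
  crf_kernel_size: 3
  crf_num_iter: 100
  loss_mil_weight: 4
  loss_crf_weight: 0.5
  results_dir: '/path/to/result/dir/train'    # placeholder path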
Evaluation Config
The evaluation configuration (evaluate) specifies the parameters for validation during training, as well as for standalone evaluation.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
batch_size | The evaluation batch size | Unsigned int | |
use_mixed_model_test | A flag specifying whether to evaluate with a mixed model | boolean | False |
use_teacher_test | A flag specifying whether to evaluate with the teacher model | boolean | False |
results_dir | The directory to save the evaluation log | string | |
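The sample spec above does not include an evaluate section, so the sketch below shows what one might look like. The batch_size value is only illustrative (the table gives no typical value), and results_dir is a placeholder.
evaluate:
  batch_size: 4                       # illustrative value; not a documented default
  use_mixed_model_test: False
  use_teacher_test: False
  results_dir: '/path/to/result/dir/evaluate'   # placeholder path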
Inference Config
The inference configuration (inference) specifies the parameters for generating pseudo masks from the ground-truth bounding boxes in COCO format.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
ann_path | The path to the annotation JSON file | string | |
img_dir | The image directory | string | |
label_dump_path | The path to save the output JSON file with pseudo masks | string | |
batch_size | The inference batch size | Unsigned int | |
load_mask | A flag specifying whether to load masks if the annotation file has them | boolean | False |
results_dir | The directory to save the inference log | string | |
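For reference, an inference section with all of these fields might look like the sketch below. The paths follow the sample spec above, the batch_size value is only illustrative, and results_dir is a placeholder.
inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
  batch_size: 4                       # illustrative value; not a documented default
  load_mask: False
  results_dir: '/path/to/result/dir/inference'  # placeholder path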
Train the MAL model using this command:
tao model mal train [-h] -e <experiment_spec>
[-r <results_dir>]
[--gpus <num_gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
--gpus: The number of GPUs to use for training. The default value is 1.
-h, --help: Show this help message and exit.
Sample Usage
Here’s an example of using the train command on a MAL model:
tao model mal train --gpus 2 -e /path/to/spec.yaml
To run evaluation for a MAL model, use this command:
tao model mal evaluate [-h] -e <experiment_spec_file>
[-r <results_dir>]
[--gpus <num_gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
--gpus: The number of GPUs to use for evaluation. The default value is 1.
-h, --help: Show this help message and exit.
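Sample Usage
Here’s an example of using the evaluate command on a MAL model; the spec-file path is a placeholder, as in the training example above:
tao model mal evaluate -e /path/to/spec.yaml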
The inference tool for MAL networks can be used to generate pseudo masks. Use this command:
tao model mal inference [-h] -e <experiment spec file>
[-r <results_dir>]
[--gpus <num_gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
--gpus: The number of GPUs to use for inference. The default value is 1.
-h, --help: Show this help message and exit.
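Sample Usage
Here’s an example of using the inference command to generate pseudo masks, which are written to the label_dump_path set in the spec file; the spec-file path is a placeholder:
tao model mal inference -e /path/to/spec.yaml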