Mask Auto Labeler
Mask Auto Labeler (MAL) is a high-quality, transformer-based mask auto-labeling framework for instance segmentation using only box annotations. It supports the following tasks:
train
evaluate
inference
These tasks may be invoked from the TAO Toolkit Launcher using the following convention on the command line:
tao mal <sub_task> <args_per_subtask>
where args_per_subtask represents the command-line arguments required for a given subtask. Each of these subtasks is explained in detail below.
Below is a sample MAL spec file. It has five components: model, inference, evaluate, dataset, and train, as well as several global parameters, which are described below. The spec file is in YAML format.
gpus: [0, 1]
strategy: 'ddp_sharded'
results_dir: '/path/to/result/dir'
dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/datasets/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  load_mask: True
  crop_size: 512
inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
model:
  arch: 'vit-mae-base/16'
train:
  num_epochs: 10
  batch_size: 4
  use_amp: True
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|-------|-------------|---------------------------|---------------------------|
| gpus | A list of GPU indices to use | List of int | |
| strategy | The distributed training strategy | string | 'ddp_sharded' |
| num_nodes | The number of nodes in multi-node training | Unsigned int | – |
| checkpoint | Either a pretrained model or a MAL checkpoint to load | string | – |
| results_dir | The directory to save experiment results to | string | – |
| dataset | The dataset config | Dict | – |
| train | The training config | Dict | – |
| model | The model config | Dict | – |
| evaluate | The evaluation config | Dict | – |
Dataset Config
The dataset configuration (dataset) defines the data source and input size.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|-------|-------------|---------------------------|---------------------------|
| train_ann_path | The path to the training annotation JSON file | string | |
| val_ann_path | The path to the validation annotation JSON file | string | |
| train_img_dir | The path to the training image directory | string | |
| val_img_dir | The path to the validation image directory | string | |
| crop_size | The effective input size of the model | Unsigned int | 512 |
| load_mask | A flag specifying whether to load the segmentation mask from the JSON file | boolean | |
| min_obj_size | The minimum object size for training | float | 2048 |
| max_obj_size | The maximum object size for training | float | 1e10 |
| num_workers_per_gpu | The number of workers to load data for each GPU | Unsigned int | |
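For reference, a dataset configuration that combines the sample spec above with the typical values in this table might look like the following sketch (the paths are placeholders, and the num_workers_per_gpu value is only an assumed starting point):

dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/datasets/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  load_mask: True
  crop_size: 512
  min_obj_size: 2048
  max_obj_size: 1e10
  num_workers_per_gpu: 2   # assumed value; tune to your machine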
Model Config
The model configuration (model) defines the model architecture.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|-------|-------------|---------------------------|---------------------------|
| arch | The backbone architecture | string | vit-mae-base/16 |
| frozen_stages | The indices of the frozen blocks | List[int] | -1 |
| mask_head_num_convs | The number of conv layers in the mask head | Unsigned int | 4 |
| mask_head_hidden_channel | The number of conv channels in the mask head | Unsigned int | 256 |
| mask_head_out_channel | The number of output channels in the mask head | Unsigned int | 256 |
| teacher_momentum | The momentum of the teacher model | float | 0.996 |
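For reference, a model configuration using the typical values in this table might look like the following sketch:

model:
  arch: 'vit-mae-base/16'
  frozen_stages: [-1]
  mask_head_num_convs: 4
  mask_head_hidden_channel: 256
  mask_head_out_channel: 256
  teacher_momentum: 0.996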
Train Config
The training configuration (train) specifies the parameters for the training process.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|-------|-------------|---------------------------|---------------------------|
| num_epochs | The number of epochs | Unsigned int | 10 |
| save_every_k_epoch | Save a checkpoint every K epochs | Unsigned int | 1 |
| val_interval | The validation interval | Unsigned int | 1 |
| batch_size | The training batch size | Unsigned int | |
| use_amp | A flag specifying whether to use mixed precision | boolean | True |
| optim_momentum | The momentum of the AdamW optimizer | float | 0.9 |
| lr | The learning rate | float | 0.0000015 |
| min_lr_rate | The minimum learning rate ratio | float | 0.2 |
| wd | The weight decay | float | 0.0005 |
| warmup_epochs | The number of epochs for warmup | Unsigned int | 1 |
| crf_kernel_size | The kernel size of the mean field approximation | Unsigned int | 3 |
| crf_num_iter | The number of iterations to run mask refinement | Unsigned int | 100 |
| loss_mil_weight | The weight of the multiple instance learning loss | float | 4 |
| loss_crf_weight | The weight of the conditional random field loss | float | 0.5 |
| results_dir | The directory to save training results | string | |
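For reference, a train configuration that spells out the typical values in this table might look like the following sketch (the batch size is taken from the sample spec above, and the results path is a placeholder):

train:
  num_epochs: 10
  save_every_k_epoch: 1
  val_interval: 1
  batch_size: 4
  use_amp: True
  optim_momentum: 0.9
  lr: 0.0000015
  min_lr_rate: 0.2
  wd: 0.0005
  warmup_epochs: 1
  crf_kernel_size: 3
  crf_num_iter: 100
  loss_mil_weight: 4
  loss_crf_weight: 0.5
  results_dir: '/path/to/result/dir/train'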
Evaluation Config
The evaluation configuration (evaluate) specifies the parameters for validation during training as well as for standalone evaluation.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|-------|-------------|---------------------------|---------------------------|
| batch_size | The evaluation batch size | Unsigned int | |
| use_mixed_model_test | A flag specifying whether to evaluate with a mixed model | boolean | False |
| use_teacher_test | A flag specifying whether to evaluate with the teacher model | boolean | False |
| results_dir | The directory to save the evaluation log | string | |
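For reference, an evaluate configuration using the typical values in this table might look like the following sketch (the batch size and results path are assumed placeholders):

evaluate:
  batch_size: 4
  use_mixed_model_test: False
  use_teacher_test: False
  results_dir: '/path/to/result/dir/evaluate'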
Inference Config
The inference configuration (inference) specifies the parameters for generating pseudo masks given the ground-truth bounding boxes in COCO format.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|-------|-------------|---------------------------|---------------------------|
| ann_path | The path to the annotation JSON file | string | |
| img_dir | The image directory | string | |
| label_dump_path | The path to save the output JSON file with pseudo masks | string | |
| batch_size | The inference batch size | Unsigned int | |
| load_mask | A flag specifying whether to load masks if the annotation file has them | boolean | False |
| results_dir | The directory to save the inference log | string | |
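For reference, an inference configuration that combines the sample spec above with the values in this table might look like the following sketch (the paths are placeholders and the batch size is assumed):

inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
  batch_size: 4
  load_mask: False
  results_dir: '/dataset/inference_results'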
Training the Model
Train the MAL model using this command:
tao model mal train [-h] -e <experiment_spec>
[-r <results_dir>]
[--gpus <num_gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
--gpus: The number of GPUs to use for training. The default value is 1.
-h, --help: Show this help message and exit.
Sample Usage
Here’s an example of using the train command on a MAL model:
tao model mal train --gpus 2 -e /path/to/spec.yaml
Evaluating the Model
To run evaluation for a MAL model, use this command:
tao model mal evaluate [-h] -e <experiment_spec_file>
[-r <results_dir>]
[--gpus <num_gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
--gpus: The number of GPUs to use for evaluation. The default value is 1.
-h, --help: Show this help message and exit.
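Sample Usage
Here’s an example of using the evaluate command on a MAL model (the spec path is a placeholder):
tao model mal evaluate -e /path/to/spec.yaml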
Running Inference on the Model
The inference tool for MAL networks can be used to generate pseudo masks. Here’s an example of using this tool:
tao model mal inference [-h] -e <experiment spec file>
[-r <results_dir>]
[--gpus <num_gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
--gpus: The number of GPUs to use for inference. The default value is 1.
-h, --help: Show this help message and exit.
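Sample Usage
Here’s an example of using the inference command to generate pseudo masks (the spec path is a placeholder); the masks are written to the label_dump_path set in the spec file:
tao model mal inference -e /path/to/spec.yaml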