Mask Auto Labeler
Mask Auto Labeler (MAL) is a high-quality, transformer-based mask auto-labeling framework for instance segmentation using only box annotations. It supports the following tasks:
train
evaluate
inference
These tasks may be invoked from the TAO Toolkit Launcher using the following convention on the command line:
tao mal <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each of
these subtasks is explained in detail below.
Below is a sample MAL spec file. It has five components (model, inference,
evaluate, dataset, and train) as well as several global parameters,
which are described below. The spec file is in YAML format.
gpus: [0, 1]
strategy: 'ddp_sharded'
results_dir: '/path/to/result/dir'
dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/datasets/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  load_mask: True
  crop_size: 512
inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
model:
  arch: 'vit-mae-base/16'
train:
  num_epochs: 10
  batch_size: 4
  use_amp: True
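A spec like the one above can be sanity-checked before a run is launched. The following Python sketch is purely illustrative and not part of TAO; the section names come from the sample above, and the spec is written as a plain dict rather than parsed from YAML:

```python
# Minimal sanity check for a parsed MAL spec (illustrative, not part of TAO).
REQUIRED_SECTIONS = ("dataset", "model", "train")

def missing_sections(spec):
    """Return the top-level sections a training run needs that are absent."""
    return [s for s in REQUIRED_SECTIONS if s not in spec]

spec = {
    "gpus": [0, 1],
    "strategy": "ddp_sharded",
    "results_dir": "/path/to/result/dir",
    "dataset": {"crop_size": 512, "load_mask": True},
    "model": {"arch": "vit-mae-base/16"},
    "train": {"num_epochs": 10, "batch_size": 4, "use_amp": True},
}

print(missing_sections(spec))  # []
```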
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| gpus | A list of GPU indices to use | List of int | – |
| strategy | The distributed training strategy | string | 'ddp_sharded' |
| num_nodes | The number of nodes in multi-node training | Unsigned int | – |
| checkpoint | Either a pretrained model or a MAL checkpoint to load | string | – |
| results_dir | The directory to save experiment results to | string | – |
| dataset | The dataset config | Dict | – |
| train | The training config | Dict | – |
| model | The model config | Dict | – |
| evaluate | The evaluation config | Dict | – |
| inference | The inference config | Dict | – |
Dataset Config
The dataset configuration (dataset) defines the data source and input size.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| train_ann_path | The path to the training annotation JSON file | string | – |
| val_ann_path | The path to the validation annotation JSON file | string | – |
| train_img_dir | The path to the training image directory | string | – |
| val_img_dir | The path to the validation image directory | string | – |
| crop_size | The effective input size of the model | Unsigned int | 512 |
| load_mask | A flag specifying whether to load the segmentation mask from the JSON file | boolean | – |
| min_obj_size | The minimum object size for training | float | 2048 |
| max_obj_size | The maximum object size for training | float | 1e10 |
| num_workers_per_gpu | The number of workers to load data for each GPU | Unsigned int | – |
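min_obj_size and max_obj_size bound which objects are used for training. As an illustration only (assuming the size refers to box area in pixels and using the COCO [x, y, width, height] bbox convention; the actual filtering logic is internal to MAL), a filter over annotations might look like:

```python
def in_size_range(ann, min_obj_size=2048.0, max_obj_size=1e10):
    """Illustrative filter: keep annotations whose box area is in range."""
    # COCO bboxes are [x, y, width, height]
    _, _, w, h = ann["bbox"]
    return min_obj_size <= w * h <= max_obj_size

anns = [
    {"id": 1, "bbox": [0, 0, 10, 10]},  # area 100: filtered out
    {"id": 2, "bbox": [0, 0, 64, 64]},  # area 4096: kept
]
kept = [a for a in anns if in_size_range(a)]
```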
Model Config
The model configuration (model) defines the model architecture.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| arch | The backbone architecture. The supported backbone is vit-mae-base/16 | string | vit-mae-base/16 |
| frozen_stages | The indices of the frozen blocks | List[int] | [-1] |
| mask_head_num_convs | The number of conv layers in the mask head | Unsigned int | 4 |
| mask_head_hidden_channel | The number of conv channels in the mask head | Unsigned int | 256 |
| mask_head_out_channel | The number of output channels in the mask head | Unsigned int | 256 |
| teacher_momentum | The momentum of the teacher model | float | 0.996 |
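MAL maintains a teacher model whose weights track the student via an exponential moving average controlled by teacher_momentum. A schematic update over plain floats (not the framework's actual code) shows the effect of the 0.996 default:

```python
def ema_update(teacher, student, momentum=0.996):
    """Move each teacher parameter slightly toward the student parameter."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

teacher = {"w": 1.0}
student = {"w": 0.0}
teacher = ema_update(teacher, student)  # w moves from 1.0 to 0.996
```

A momentum close to 1 makes the teacher a slowly varying average of the student, which stabilizes the pseudo-mask targets it produces.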
Train Config
The training configuration (train) specifies the parameters for the training process.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| num_epochs | The number of epochs | Unsigned int | 10 |
| save_every_k_epoch | The interval, in epochs, at which checkpoints are saved | Unsigned int | 1 |
| val_interval | The validation interval, in epochs | Unsigned int | 1 |
| batch_size | The training batch size | Unsigned int | – |
| use_amp | A flag specifying whether to use mixed precision | boolean | True |
| optim_momentum | The momentum of the AdamW optimizer | float | 0.9 |
| lr | The learning rate | float | 0.0000015 |
| min_lr_rate | The minimum learning rate ratio | float | 0.2 |
| wd | The weight decay | float | 0.0005 |
| warmup_epochs | The number of warmup epochs | Unsigned int | 1 |
| crf_kernel_size | The kernel size of the mean-field approximation | Unsigned int | 3 |
| crf_num_iter | The number of iterations to run mask refinement | Unsigned int | 100 |
| loss_mil_weight | The weight of the multiple instance learning loss | float | 4 |
| loss_crf_weight | The weight of the conditional random field loss | float | 0.5 |
| results_dir | The directory to save training results | string | – |
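The fields lr, min_lr_rate, and warmup_epochs together shape the learning-rate schedule. The exact schedule is internal to the trainer; the sketch below assumes linear warmup followed by cosine decay down to min_lr_rate * lr, a common pattern, using the defaults from the table:

```python
import math

def lr_at_epoch(epoch, num_epochs=10, lr=1.5e-6, min_lr_rate=0.2, warmup_epochs=1):
    """Illustrative schedule: linear warmup, then cosine decay to min_lr_rate * lr."""
    if epoch < warmup_epochs:
        # ramp linearly from lr/warmup_epochs up to lr
        return lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, num_epochs - warmup_epochs)
    min_lr = min_lr_rate * lr
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With min_lr_rate of 0.2, the rate never falls below 20% of the configured lr.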
Evaluation Config
The evaluation configuration (evaluate) specifies the parameters for validation during training as well as for standalone evaluation.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| batch_size | The evaluation batch size | Unsigned int | – |
| use_mixed_model_test | A flag specifying whether to evaluate with a mixed model | boolean | False |
| use_teacher_test | A flag specifying whether to evaluate with the teacher model | boolean | False |
| results_dir | The directory to save the evaluation log | string | – |
Inference Config
The inference configuration (inference) specifies the parameters for generating pseudo masks given the ground-truth bounding boxes in COCO format.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| ann_path | The path to the annotation JSON file | string | – |
| img_dir | The image directory | string | – |
| label_dump_path | The path to save the output JSON file with pseudo masks | string | – |
| batch_size | The inference batch size | Unsigned int | – |
| load_mask | A flag specifying whether to load masks if the annotation file has them | boolean | False |
| results_dir | The directory to save the inference log | string | – |
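The file written to label_dump_path is typically a COCO-style annotation file with the segmentation fields filled in by the generated pseudo masks. A small sketch of inspecting such a file (the excerpt below is hypothetical; real masks are usually RLE-encoded):

```python
import json

# Hypothetical excerpt of a COCO-style file written to label_dump_path
dump = json.loads("""
{
  "annotations": [
    {"id": 1, "image_id": 7, "bbox": [10, 10, 50, 40],
     "segmentation": {"size": [512, 512], "counts": "RLE-encoded-mask"}}
  ]
}
""")

with_masks = [a for a in dump["annotations"] if a.get("segmentation")]
print(f"{len(with_masks)} of {len(dump['annotations'])} annotations carry pseudo masks")
```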
Training the Model
Train the MAL model using the following command:
tao model mal train [-h] -e <experiment_spec_file>
[-r <results_dir>]
[--gpus <num_gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-r, --results_dir: The directory to save the results
--gpus: The number of GPUs to use for training. The default value is 1.
-h, --help: Show this help message and exit.
Sample Usage
Here’s an example of using the train command on a MAL model:
tao model mal train --gpus 2 -e /path/to/spec.yaml
Evaluating the Model
To run evaluation for a MAL model, use the following command:
tao model mal evaluate [-h] -e <experiment_spec_file>
[-r <results_dir>]
[--gpus <num_gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-r, --results_dir: The directory to save the results
--gpus: The number of GPUs to use for evaluation. The default value is 1.
-h, --help: Show this help message and exit.
Running Inference
The inference tool for MAL networks can be used to generate pseudo masks.
Here’s an example of using this tool:
tao model mal inference [-h] -e <experiment_spec_file>
[-r <results_dir>]
[--gpus <num_gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-r, --results_dir: The directory to save the results
--gpus: The number of GPUs to use for inference. The default value is 1.
-h, --help: Show this help message and exit.