Mask Auto Labeler
Mask Auto Labeler (MAL) is a high-quality, transformer-based mask auto-labeling framework for instance segmentation using only box annotations. It supports the following tasks:
train
evaluate
inference
These tasks may be invoked from the TAO Toolkit Launcher using the following convention on the command line:
tao mal <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each of
these subtasks is explained in detail below.
Below is a sample MAL spec file. It has five components (model, inference,
evaluate, dataset, and train) as well as several global parameters,
which are described below. The spec file is in YAML format.
gpus: [0, 1]
strategy: 'ddp_sharded'
results_dir: '/path/to/result/dir'
dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/datasets/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  load_mask: True
  crop_size: 512
inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
model:
  arch: 'vit-mae-base/16'
train:
  num_epochs: 10
  batch_size: 4
  use_amp: True
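A spec like the one above can be sanity-checked before a run is launched. The following Python sketch is purely illustrative and not part of TAO; the section names come from the sample above, and the spec is written as a plain dict rather than parsed from YAML:

```python
# Minimal sanity check for a parsed MAL spec (illustrative, not part of TAO).
REQUIRED_SECTIONS = ("dataset", "model", "train")

def missing_sections(spec):
    """Return the top-level sections a training run needs that are absent."""
    return [s for s in REQUIRED_SECTIONS if s not in spec]

spec = {
    "gpus": [0, 1],
    "strategy": "ddp_sharded",
    "results_dir": "/path/to/result/dir",
    "dataset": {"crop_size": 512, "load_mask": True},
    "model": {"arch": "vit-mae-base/16"},
    "train": {"num_epochs": 10, "batch_size": 4, "use_amp": True},
}

print(missing_sections(spec))  # []
```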
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| gpus | A list of GPU indices to use | List of int | – |
| strategy | The distributed training strategy | string | 'ddp_sharded' |
| num_nodes | The number of nodes in multi-node training | Unsigned int | – |
| checkpoint | Either a pretrained model or a MAL checkpoint to load | string | – |
| results_dir | The directory to save experiment results to | string | – |
| dataset | The dataset config | Dict | – |
| train | The training config | Dict | – |
| model | The model config | Dict | – |
| evaluate | The evaluation config | Dict | – |
| inference | The inference config | Dict | – |
Dataset Config
The dataset configuration (dataset) defines the data source and input size.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| train_ann_path | The path to the training annotation JSON file | string | – |
| val_ann_path | The path to the validation annotation JSON file | string | – |
| train_img_dir | The path to the training image directory | string | – |
| val_img_dir | The path to the validation image directory | string | – |
| crop_size | The effective input size of the model | Unsigned int | 512 |
| load_mask | A flag specifying whether to load the segmentation mask from the JSON file | boolean | – |
| min_obj_size | The minimum object size for training | float | 2048 |
| max_obj_size | The maximum object size for training | float | 1e10 |
| num_workers_per_gpu | The number of workers to load data for each GPU | Unsigned int | – |
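min_obj_size and max_obj_size bound which objects are used for training. As an illustration only (assuming the size refers to box area in pixels and using the COCO [x, y, width, height] bbox convention; the actual filtering logic is internal to MAL), a filter over annotations might look like:

```python
def in_size_range(ann, min_obj_size=2048.0, max_obj_size=1e10):
    """Illustrative filter: keep annotations whose box area is in range."""
    # COCO bboxes are [x, y, width, height]
    _, _, w, h = ann["bbox"]
    return min_obj_size <= w * h <= max_obj_size

anns = [
    {"id": 1, "bbox": [0, 0, 10, 10]},  # area 100: filtered out
    {"id": 2, "bbox": [0, 0, 64, 64]},  # area 4096: kept
]
kept = [a for a in anns if in_size_range(a)]
```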
Model Config
The model configuration (model) defines the model architecture.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| arch | The backbone architecture. The supported backbone is vit-mae-base/16 | string | vit-mae-base/16 |
| frozen_stages | The indices of the frozen blocks | List[int] | [-1] |
| mask_head_num_convs | The number of conv layers in the mask head | Unsigned int | 4 |
| mask_head_hidden_channel | The number of conv channels in the mask head | Unsigned int | 256 |
| mask_head_out_channel | The number of output channels in the mask head | Unsigned int | 256 |
| teacher_momentum | The momentum of the teacher model | float | 0.996 |
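MAL maintains a teacher model whose weights track the student via an exponential moving average controlled by teacher_momentum. A schematic update over plain floats (not the framework's actual code) shows the effect of the 0.996 default:

```python
def ema_update(teacher, student, momentum=0.996):
    """Move each teacher parameter slightly toward the student parameter."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

teacher = {"w": 1.0}
student = {"w": 0.0}
teacher = ema_update(teacher, student)  # w moves from 1.0 to 0.996
```

A momentum close to 1 makes the teacher a slowly varying average of the student, which stabilizes the pseudo-mask targets it produces.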
Train Config
The training configuration (train) specifies the parameters for the training process.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| num_epochs | The number of epochs | Unsigned int | 10 |
| save_every_k_epoch | The interval, in epochs, at which checkpoints are saved | Unsigned int | 1 |
| val_interval | The validation interval, in epochs | Unsigned int | 1 |
| batch_size | The training batch size | Unsigned int | – |
| use_amp | A flag specifying whether to use mixed precision | boolean | True |
| optim_momentum | The momentum of the AdamW optimizer | float | 0.9 |
| lr | The learning rate | float | 0.0000015 |
| min_lr_rate | The minimum learning rate ratio | float | 0.2 |
| wd | The weight decay | float | 0.0005 |
| warmup_epochs | The number of warmup epochs | Unsigned int | 1 |
| crf_kernel_size | The kernel size of the mean-field approximation | Unsigned int | 3 |
| crf_num_iter | The number of iterations to run mask refinement | Unsigned int | 100 |
| loss_mil_weight | The weight of the multiple instance learning loss | float | 4 |
| loss_crf_weight | The weight of the conditional random field loss | float | 0.5 |
| results_dir | The directory to save training results | string | – |
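The fields lr, min_lr_rate, and warmup_epochs together shape the learning-rate schedule. The exact schedule is internal to the trainer; the sketch below assumes linear warmup followed by cosine decay down to min_lr_rate * lr, a common pattern, using the defaults from the table:

```python
import math

def lr_at_epoch(epoch, num_epochs=10, lr=1.5e-6, min_lr_rate=0.2, warmup_epochs=1):
    """Illustrative schedule: linear warmup, then cosine decay to min_lr_rate * lr."""
    if epoch < warmup_epochs:
        # ramp linearly from lr/warmup_epochs up to lr
        return lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, num_epochs - warmup_epochs)
    min_lr = min_lr_rate * lr
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With min_lr_rate of 0.2, the rate never falls below 20% of the configured lr.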
Evaluation Config
The evaluation configuration (evaluate) specifies the parameters for validation during training as well as for standalone evaluation.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| batch_size | The evaluation batch size | Unsigned int | – |
| use_mixed_model_test | A flag specifying whether to evaluate with a mixed model | boolean | False |
| use_teacher_test | A flag specifying whether to evaluate with the teacher model | boolean | False |
| results_dir | The directory to save the evaluation log | string | – |
Inference Config
The inference configuration (inference) specifies the parameters for generating pseudo masks given the ground-truth bounding boxes in COCO format.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| --- | --- | --- | --- |
| ann_path | The path to the annotation JSON file | string | – |
| img_dir | The image directory | string | – |
| label_dump_path | The path to save the output JSON file with pseudo masks | string | – |
| batch_size | The inference batch size | Unsigned int | – |
| load_mask | A flag specifying whether to load masks if the annotation file has them | boolean | False |
| results_dir | The directory to save the inference log | string | – |
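The file written to label_dump_path is typically a COCO-style annotation file with the segmentation fields filled in by the generated pseudo masks. A small sketch of inspecting such a file (the excerpt below is hypothetical; real masks are usually RLE-encoded):

```python
import json

# Hypothetical excerpt of a COCO-style file written to label_dump_path
dump = json.loads("""
{
  "annotations": [
    {"id": 1, "image_id": 7, "bbox": [10, 10, 50, 40],
     "segmentation": {"size": [512, 512], "counts": "RLE-encoded-mask"}}
  ]
}
""")

with_masks = [a for a in dump["annotations"] if a.get("segmentation")]
print(f"{len(with_masks)} of {len(dump['annotations'])} annotations carry pseudo masks")
```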
Training the Model
Train the MAL model using the following command:
tao model mal train [-h] -e <experiment_spec_file>
[-r <results_dir>]
[--gpus <num_gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-r, --results_dir: The directory to save the results
--gpus: The number of GPUs to use for training. The default value is 1.
-h, --help: Show this help message and exit.
Sample Usage
Here’s an example of using the train command on a MAL model:
tao model mal train --gpus 2 -e /path/to/spec.yaml
Evaluating the Model
To run evaluation for a MAL model, use the following command:
tao model mal evaluate [-h] -e <experiment_spec_file>
[-r <results_dir>]
[--gpus <num_gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-r, --results_dir: The directory to save the results
--gpus: The number of GPUs to use for evaluation. The default value is 1.
-h, --help: Show this help message and exit.
Running Inference
The inference tool for MAL networks can be used to generate pseudo masks.
Here’s an example of using this tool:
tao model mal inference [-h] -e <experiment_spec_file>
[-r <results_dir>]
[--gpus <num_gpus>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-r, --results_dir: The directory to save the results
--gpus: The number of GPUs to use for inference. The default value is 1.
-h, --help: Show this help message and exit.