Mask Auto Labeler
Mask Auto Labeler (MAL) is a high-quality, transformer-based mask auto-labeling framework for instance segmentation using only box annotations. It supports the following tasks:
train
evaluate
inference
These tasks can be invoked from the TAO Launcher using the following convention on the command line:
tao model mal <sub_task> <args_per_subtask>
where sub_task is one of the tasks listed above, and args_per_subtask are the command-line arguments required for that subtask. Each subtask is explained in detail below.
Below is a sample MAL spec file in YAML format. It has five components (model, inference, evaluate, dataset, and train) as well as several global parameters, all of which are described below.
strategy: 'fsdp'
results_dir: '/path/to/result/dir'
dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/datasets/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  load_mask: True
  crop_size: 512
inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
model:
  arch: 'vit-mae-base/16'
train:
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  batch_size: 4
  seed: 1234
  num_gpus: 1
  gpu_ids: [0]
  use_amp: True
  optim_momentum: 0.9
  lr: 0.0000015
  min_lr_rate: 0.2
  wd: 0.0005
  warmup_epochs: 1
  crf_kernel_size: 3
  crf_num_iter: 100
  loss_mil_weight: 4
  loss_crf_weight: 0.5
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| strategy | string | 'ddp' | The distributed training strategy | 'ddp', 'fsdp' |
Dataset Config
The dataset configuration (dataset) defines the data source and input size.
| Field | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_ann_path | string | – | The path to the training annotation JSON file | |
| val_ann_path | string | – | The path to the validation annotation JSON file | |
| train_img_dir | string | – | The path to the training image directory | |
| val_img_dir | string | – | The path to the validation image directory | |
| crop_size | unsigned int | 512 | The effective input size of the model | |
| load_mask | boolean | True | A flag specifying whether to load the segmentation mask from the JSON file | |
| min_obj_size | float | 2048 | The minimum object size for training | |
| max_obj_size | float | 1e10 | The maximum object size for training | |
| num_workers_per_gpu | unsigned int | | The number of workers to load data for each GPU | |
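The min_obj_size and max_obj_size fields bound which objects are used for training. The page does not say how "object size" is measured; the sketch below assumes it is the bounding-box area (width × height, in pixels) with inclusive bounds. The helper name filter_by_object_size is hypothetical and is not part of TAO.

```python
"""Hypothetical sketch: filter COCO-style annotations by object size.

Assumption: "object size" = bounding-box area (width * height, in pixels),
with inclusive bounds. The real MAL data loader may measure size differently.
"""
from typing import Dict, List


def filter_by_object_size(
    annotations: List[Dict],
    min_obj_size: float = 2048,
    max_obj_size: float = 1e10,
) -> List[Dict]:
    """Keep annotations whose box area lies in [min_obj_size, max_obj_size]."""
    kept = []
    for ann in annotations:
        # COCO boxes are stored as [x, y, width, height].
        _, _, w, h = ann["bbox"]
        area = w * h
        if min_obj_size <= area <= max_obj_size:
            kept.append(ann)
    return kept


anns = [
    {"id": 1, "bbox": [0, 0, 32, 32]},    # area 1024: below the default minimum
    {"id": 2, "bbox": [10, 10, 64, 48]},  # area 3072: within bounds
]
print([a["id"] for a in filter_by_object_size(anns)])  # [2]
```

With the default min_obj_size of 2048, small objects such as the 32 × 32 box above would be skipped under this assumption.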
Model Config
The model configuration (model) defines the model architecture.
| Field | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| arch | string | vit-mae-base/16 | The backbone architecture | Supported backbones include the following: |
| frozen_stages | List[int] | [-1] | The indices of the frozen blocks | |
| mask_head_num_convs | unsigned int | 4 | The number of conv layers in the mask head | |
| mask_head_hidden_channel | unsigned int | 256 | The number of conv channels in the mask head | |
| mask_head_out_channel | unsigned int | 256 | The number of output channels in the mask head | |
| teacher_momentum | float | 0.996 | The momentum of the teacher model | |
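teacher_momentum controls how the teacher model tracks the student. The sketch below assumes the common mean-teacher scheme, an exponential moving average (EMA) in which each teacher weight becomes m · teacher + (1 − m) · student after a training step. The helper ema_update and the plain-list weights are illustrative stand-ins, not TAO code.

```python
"""Hypothetical sketch: exponential-moving-average (EMA) teacher update.

Assumption: teacher_momentum (m) is applied as
    teacher = m * teacher + (1 - m) * student
after each training step. Weights are plain Python lists to stay dependency-free.
"""
from typing import Dict, List


def ema_update(
    teacher: Dict[str, List[float]],
    student: Dict[str, List[float]],
    momentum: float = 0.996,
) -> None:
    """Move every teacher weight a small step toward the student, in place."""
    for name, t_weights in teacher.items():
        s_weights = student[name]
        for i, (t, s) in enumerate(zip(t_weights, s_weights)):
            t_weights[i] = momentum * t + (1.0 - momentum) * s


teacher = {"mask_head.weight": [1.0, 0.0]}
student = {"mask_head.weight": [0.0, 1.0]}
ema_update(teacher, student, momentum=0.996)
print(teacher["mask_head.weight"])
```

A momentum close to 1 (such as the default 0.996) makes the teacher change slowly, which is why it can act as a more stable target than the student.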
Train Config
The training configuration (train) specifies the parameters for the training process.
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, numpy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which validation is run | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| batch_size | unsigned int | | The training batch size | |
| use_amp | boolean | True | A flag specifying whether to use mixed precision | |
| optim_momentum | float | 0.9 | The momentum of the AdamW optimizer | |
| lr | float | 0.0000015 | The learning rate | |
| min_lr_rate | float | 0.2 | The minimum learning rate ratio | |
| wd | float | 0.0005 | The weight decay | |
| warmup_epochs | unsigned int | 1 | The number of epochs for warmup | |
| crf_kernel_size | unsigned int | 3 | The kernel size of the mean field approximation | |
| crf_num_iter | unsigned int | 100 | The number of iterations to run mask refinement | |
| loss_mil_weight | float | 4 | The weight of the multiple instance learning (MIL) loss | |
| loss_crf_weight | float | 0.5 | The weight of the conditional random field (CRF) loss | |
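The lr, min_lr_rate, and warmup_epochs fields together describe a learning-rate schedule. Reading min_lr_rate as a ratio of the base rate, the floor with the sample values is 0.0000015 × 0.2 = 0.0000003. The shape of the curve is not stated on this page; the sketch below assumes linear warmup followed by cosine decay to that floor, and the function lr_at is hypothetical.

```python
"""Hypothetical sketch: learning rate as a function of (fractional) epoch.

Assumptions, not confirmed by the MAL docs:
  * linear warmup from 0 to lr over warmup_epochs,
  * cosine decay from lr to lr * min_lr_rate over the remaining epochs.
"""
import math


def lr_at(
    epoch: float,
    num_epochs: int = 10,
    lr: float = 0.0000015,
    min_lr_rate: float = 0.2,
    warmup_epochs: int = 1,
) -> float:
    """epoch may be fractional, e.g. 0.5 = halfway through the first epoch."""
    min_lr = lr * min_lr_rate  # min_lr_rate is a ratio of the base lr
    if epoch < warmup_epochs:
        # Linear warmup reaches the full lr at the end of warmup.
        return lr * epoch / warmup_epochs
    decay_span = max(num_epochs - warmup_epochs, 1)
    progress = min((epoch - warmup_epochs) / decay_span, 1.0)
    # Cosine curve: progress 0 -> lr, progress 1 -> min_lr.
    return min_lr + (lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 0.5, 1, 5.5, 10):
    print(f"epoch {epoch:>4}: lr = {lr_at(epoch):.2e}")
```

In practice, schedules like this are usually stepped per iteration rather than per epoch, which is why the sketch accepts a fractional epoch.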
Evaluation Config
The evaluation configuration (evaluate) specifies the parameters for validation during training as well as for standalone evaluation.
| Field | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to evaluate | |
| results_dir | string | /results/evaluate | The directory to save evaluation results | |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed evaluation | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed evaluation | |
| batch_size | unsigned int | | The evaluation batch size | |
| use_mixed_model_test | boolean | False | A flag specifying whether to evaluate with a mixed model | |
| use_teacher_test | boolean | False | A flag specifying whether to evaluate with the teacher model | |
Inference Config
The inference configuration (inference) specifies the parameters for generating pseudo masks from the ground-truth bounding boxes in a COCO-format annotation file.
| Field | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to run inference with | |
| results_dir | string | /results/inference | The directory to save inference results | |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed inference | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed inference | |
| ann_path | string | | The path to the annotation JSON file | |
| img_dir | string | | The image directory | |
| label_dump_path | string | | The path to save the output JSON file with pseudo masks | |
| batch_size | unsigned int | | The inference batch size | |
| load_mask | boolean | False | A flag specifying whether to load masks if the annotation file has them | |
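Because inference only needs boxes, the file at inference.ann_path can be a box-only COCO annotation file. The sketch below writes a minimal one using the standard COCO "images", "annotations" (with bbox as [x, y, width, height]), and "categories" sections. It assumes MAL accepts such a file without "segmentation" entries (consistent with load_mask defaulting to False); the file name, image size, box, and category are made-up illustrative values.

```python
"""Hypothetical sketch: write a minimal box-only COCO file for inference.ann_path.

Assumption: MAL inference accepts standard COCO images/annotations/categories
without "segmentation" entries. All values below are illustrative.
"""
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def make_box_only_coco() -> Dict[str, Any]:
    return {
        "images": [
            {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480},
        ],
        "annotations": [
            {
                "id": 1,
                "image_id": 1,
                "category_id": 1,
                "bbox": [100.0, 120.0, 200.0, 150.0],  # x, y, width, height
                "area": 200.0 * 150.0,
                "iscrowd": 0,
                # No "segmentation": pseudo masks are generated from the boxes.
            },
        ],
        "categories": [{"id": 1, "name": "person"}],
    }


def write_json(data: Dict[str, Any], path: Path) -> None:
    path.write_text(json.dumps(data, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    ann_path = Path(tmp) / "sample.json"
    write_json(make_box_only_coco(), ann_path)
    loaded = json.loads(ann_path.read_text())
print(loaded["annotations"][0]["bbox"])
```

The image referenced by file_name would be expected under inference.img_dir.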
Use the following command to run MAL training:
tao model mal train [-h] -e <experiment_spec>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.gpu_ids=<gpu indices>]
[train.num_gpus=<number of gpus>]
Required Arguments
The only required argument is the path to the experiment spec:
-e, --experiment_spec: The experiment specification file to set up the training experiment
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
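Overrides use dotted keys that address the nested sections of the spec, for example train.num_epochs=20. The sketch below is a simplified, hypothetical stand-in that only illustrates how such key=value pairs map onto the nested YAML structure; it is not the TAO implementation, which also validates keys and types against its config schema.

```python
"""Hypothetical sketch: apply dotted key=value overrides to a nested spec.

Illustration only; not how TAO parses its command line.
"""
import ast
import copy
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Interpret numbers, booleans, and lists; fall back to a plain string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(spec: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    merged = copy.deepcopy(spec)  # leave the original spec untouched
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = merged
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = parse_value(raw_value)
    return merged


spec = {"strategy": "fsdp", "train": {"num_epochs": 10, "num_gpus": 1, "gpu_ids": [0]}}
updated = apply_overrides(
    spec, ["train.num_epochs=20", "train.gpu_ids=[0,1]", "strategy=ddp"]
)
print(updated["train"]["num_epochs"], updated["train"]["gpu_ids"], updated["strategy"])
# 20 [0, 1] ddp
```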
For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 and gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (in this example, num_gpus = 1 -> num_gpus = 2).
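The reconciliation rule can be sketched as follows. The case where gpu_ids lists more devices than num_gpus follows the example in the text; for the opposite case (num_gpus larger than the list), the sketch assumes consecutive indices 0..num_gpus-1 are used, which this page does not state. The helper name reconcile_gpus is hypothetical.

```python
"""Hypothetical sketch: reconcile inconsistent num_gpus and gpu_ids.

Rule from the docs: follow whichever setting implies more GPUs.
Assumption: if num_gpus is larger, indices 0..num_gpus-1 are used.
"""
from typing import List, Tuple


def reconcile_gpus(num_gpus: int, gpu_ids: List[int]) -> Tuple[int, List[int]]:
    if num_gpus == len(gpu_ids):
        return num_gpus, list(gpu_ids)
    if len(gpu_ids) > num_gpus:
        # More indices than the count: raise the count to match the list.
        return len(gpu_ids), list(gpu_ids)
    # Larger count than indices: assume consecutive device indices.
    return num_gpus, list(range(num_gpus))


print(reconcile_gpus(1, [0, 1]))  # (2, [0, 1])
print(reconcile_gpus(2, [0]))     # (2, [0, 1])
```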
Checkpointing and Resuming Training
Every train.checkpoint_interval epochs, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved in train.results_dir, like so:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint is also saved as mal_model_latest.pth. Training automatically resumes from mal_model_latest.pth if it exists in train.results_dir. This is superseded by train.resume_training_checkpoint_path, if it is provided.
The main implication of this logic is that, to trigger fresh training from scratch, you must do one of the following:
Specify a new, empty results directory (recommended).
Remove the latest checkpoint from the results directory.
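The resume rules above can be expressed as a small decision function: an explicit resume_training_checkpoint_path wins; otherwise mal_model_latest.pth in train.results_dir is used if present; otherwise training starts from scratch. The function name resolve_resume_checkpoint is hypothetical; it only mirrors the documented precedence and is not the TAO code.

```python
"""Hypothetical sketch of the documented resume precedence.

1. train.resume_training_checkpoint_path, if provided
2. <train.results_dir>/mal_model_latest.pth, if it exists
3. otherwise None, meaning fresh training
"""
import tempfile
from pathlib import Path
from typing import Optional


def resolve_resume_checkpoint(
    results_dir: str, resume_training_checkpoint_path: Optional[str] = None
) -> Optional[Path]:
    if resume_training_checkpoint_path:
        return Path(resume_training_checkpoint_path)
    latest = Path(results_dir) / "mal_model_latest.pth"
    if latest.is_file():
        return latest
    return None  # no checkpoint found: train from scratch


with tempfile.TemporaryDirectory() as tmp:
    fresh = resolve_resume_checkpoint(tmp)  # empty directory
    (Path(tmp) / "mal_model_latest.pth").touch()
    auto = resolve_resume_checkpoint(tmp)  # picks up the latest checkpoint
    explicit = resolve_resume_checkpoint(tmp, "/results/train/model_epoch_003.pth")

print(fresh, auto.name, explicit)
```

This is why an empty results directory guarantees a fresh run: step 2 finds nothing to resume from.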
To run evaluation for a MAL model, use this command:
tao model mal evaluate [-h] -e <experiment_spec_file>
evaluate.checkpoint=<model to be evaluated>
[evaluate.<evaluate_option>=<evaluate_option_value>]
[evaluate.gpu_ids=<gpu indices>]
[evaluate.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments
evaluate.<evaluate_option>: The evaluate options.
The inference tool for MAL networks can be used to generate pseudo masks. Here is an example of using this tool:
tao model mal inference [-h] -e <experiment_spec_file>
inference.checkpoint=<model to run inference with>
[inference.<inference_option>=<inference_option_value>]
[inference.gpu_ids=<gpu indices>]
[inference.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the inference experiment.
inference.checkpoint: The .pth model to use for inference.
Optional Arguments
inference.<inference_option>: The inference options.