Mask Auto Labeler#

Mask Auto Labeler (MAL) is a high-quality, transformer-based mask auto-labeling framework for instance segmentation using only box annotations. It supports the following tasks:

  • train

  • evaluate

  • inference

These tasks may be invoked from the TAO Launcher using the following convention on the command line:

tao model mal <sub_task> <args_per_subtask>

Here, args_per_subtask stands for the command-line arguments required by a given subtask. Each subtask is explained in detail below.

Creating a Configuration File#

Below is a sample MAL spec file in YAML format. It has five components (model, inference, evaluate, dataset, and train) as well as several global parameters, which are described below.

strategy: 'fsdp'
results_dir: '/path/to/result/dir'
dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/datasets/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  load_mask: True
  crop_size: 512
inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
model:
  arch: 'vit-mae-base/16'
train:
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  batch_size: 4
  seed: 1234
  num_gpus: 1
  gpu_ids: [0]
  use_amp: True
  optim_momentum: 0.9
  lr: 0.0000015
  min_lr_rate: 0.2
  wd: 0.0005
  warmup_epochs: 1
  crf_kernel_size: 3
  crf_num_iter: 100
  loss_mil_weight: 4
  loss_crf_weight: 0.5

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | | The configuration of the model architecture | |
| dataset | dict config | | The configuration of the dataset | |
| train | dict config | | The configuration of the training task | |
| evaluate | dict config | | The configuration of the evaluation task | |
| inference | dict config | | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| strategy | string | 'ddp' | The distributed training strategy | 'ddp', 'fsdp' |

Dataset Config#

The dataset configuration (dataset) defines the data source and input size.

| Field | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_ann_path | string | | The path to the training annotation JSON file | |
| val_ann_path | string | | The path to the validation annotation JSON file | |
| train_img_dir | string | | The path to the training image directory | |
| val_img_dir | string | | The path to the validation image directory | |
| crop_size | unsigned int | 512 | The effective input size of the model | |
| load_mask | boolean | True | A flag specifying whether to load the segmentation mask from the JSON file | |
| min_obj_size | float | 2048 | The minimum object size for training | |
| max_obj_size | float | 1e10 | The maximum object size for training | |
| num_workers_per_gpu | unsigned int | | The number of workers to load data for each GPU | |

Model Config#

The model configuration (model) defines the model architecture.

| Field | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| arch | string | vit-mae-base/16 | The backbone architecture. Supported backbones include the following: vit-deit-tiny/16, vit-deit-small/16, vit-mae-base/16, vit-mae-large/16, vit-mae-huge/14, fan_tiny_12_p16_224, fan_small_12_p16_224, fan_base_18_p16_224, fan_large_24_p16_224, fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid | |
| frozen_stages | List[int] | [-1] | The indices of the frozen blocks | |
| mask_head_num_convs | unsigned int | 4 | The number of conv layers in the mask head | |
| mask_head_hidden_channel | unsigned int | 256 | The number of conv channels in the mask head | |
| mask_head_out_channel | unsigned int | 256 | The number of output channels in the mask head | |
| teacher_momentum | float | 0.996 | The momentum of the teacher model | |
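
To make teacher_momentum more concrete: a momentum teacher is commonly maintained as an exponential moving average (EMA) of the student weights. The sketch below assumes that standard update rule; this page does not spell out the exact update MAL uses, and the function name ema_update and the plain-float weights are purely illustrative.

```python
def ema_update(teacher, student, momentum=0.996):
    """Return new teacher weights: momentum * teacher + (1 - momentum) * student.

    Assumption: the teacher is a standard EMA of the student weights.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Toy weights: with momentum close to 1, the teacher moves slowly toward the student.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student)
print(teacher)
```

Under this assumption, a value closer to 1 makes the teacher change more slowly from step to step.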

Train Config#

The training configuration (train) specifies the parameters for the training process.

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, numpy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| batch_size | unsigned int | | The training batch size | |
| use_amp | boolean | True | A flag specifying whether to use mixed precision | |
| optim_momentum | float | 0.9 | The momentum of the AdamW optimizer | |
| lr | float | 0.0000015 | The learning rate | |
| min_lr_rate | float | 0.2 | The minimum learning rate ratio | |
| wd | float | 0.0005 | The weight decay | |
| warmup_epochs | unsigned int | 1 | The number of epochs for warmup | |
| crf_kernel_size | unsigned int | 3 | The kernel size of the mean field approximation | |
| crf_num_iter | unsigned int | 100 | The number of iterations to run mask refinement | |
| loss_mil_weight | float | 4 | The weight of multiple instance learning loss | |
| loss_crf_weight | float | 0.5 | The weight of conditional random field loss | |
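
The following sketch shows how lr, min_lr_rate, warmup_epochs, and num_epochs could interact. It assumes a linear warmup followed by a cosine decay down to lr * min_lr_rate, which is a common schedule but is not confirmed by this page; the function name lr_at is hypothetical.

```python
import math


def lr_at(epoch, base_lr=1.5e-6, min_lr_rate=0.2, warmup_epochs=1, num_epochs=10):
    """Learning rate at a (possibly fractional) epoch.

    Assumption: linear warmup, then cosine decay to base_lr * min_lr_rate.
    Requires num_epochs > warmup_epochs.
    """
    if epoch < warmup_epochs:
        # Ramp linearly from 0 up to base_lr during warmup.
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (num_epochs - warmup_epochs)
    floor = base_lr * min_lr_rate
    # Cosine decay from base_lr (progress = 0) to floor (progress = 1).
    return floor + (base_lr - floor) * 0.5 * (1.0 + math.cos(math.pi * progress))


for epoch in (0.5, 1, 5, 10):
    print(epoch, lr_at(epoch))
```

With the defaults from the table, the learning rate under this assumed schedule would peak at 0.0000015 after warmup and end near 0.0000003.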

Evaluation Config#

The evaluation configuration (evaluate) specifies the parameters for the validation during training as well as the standalone evaluation.

| Field | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to evaluate | |
| results_dir | string | /results/evaluate | The directory to save evaluation results | |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed evaluation | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed evaluation | |
| batch_size | unsigned int | | The evaluation batch size | |
| use_mixed_model_test | boolean | False | A flag specifying whether to evaluate with a mixed model | |
| use_teacher_test | boolean | False | A flag specifying whether to evaluate with the teacher model | |

Inference Config#

The inference configuration (inference) specifies the parameters for generating pseudo masks from ground-truth bounding boxes in COCO format.

| Field | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to run inference with | |
| results_dir | string | /results/inference | The directory to save inference results | |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed inference | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed inference | |
| ann_path | string | | The path to the annotation JSON file | |
| img_dir | string | | The image directory | |
| label_dump_path | string | | The path to save the output JSON file with pseudo masks | |
| batch_size | unsigned int | | The inference batch size | |
| load_mask | boolean | False | A flag specifying whether to load masks if the annotation file has them | |
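
Because inference relies on box annotations in COCO format, it can help to inspect the annotation file before pointing inference.ann_path at it. The sketch below counts how many annotations carry a bbox and how many already carry a segmentation. It assumes the standard COCO layout (top-level images and annotations lists); the helper name summarize_coco and the tiny sample file are illustrative only.

```python
import json
import os
import tempfile


def summarize_coco(ann_path):
    """Count images, annotations, and annotations with boxes or masks."""
    with open(ann_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    anns = data.get("annotations", [])
    return {
        "images": len(data.get("images", [])),
        "annotations": len(anns),
        "with_bbox": sum(1 for a in anns if a.get("bbox")),
        "with_mask": sum(1 for a in anns if a.get("segmentation")),
    }


# Hypothetical two-annotation file: both have boxes, only one has a mask.
sample = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 50]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [5, 5, 40, 40],
         "segmentation": [[5, 5, 45, 5, 45, 45, 5, 45]]},
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    summary = summarize_coco(path)
print(summary)
```

The same helper could be run on the file written to label_dump_path, assuming the output keeps the COCO annotation layout.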

Training the Model#

Use the following command to run MAL training:

tao model mal train [-h] -e <experiment_spec>
              [results_dir=<global_results_dir>]
              [model.<model_option>=<model_option_value>]
              [dataset.<dataset_option>=<dataset_option_value>]
              [train.<train_option>=<train_option_value>]
              [train.gpu_ids=<gpu indices>]
              [train.num_gpus=<number of gpus>]

Required Arguments#

The only required argument is the path to the experiment spec:

  • -e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.
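
As an illustration of the dotted key=value override form shown in the command above, the sketch below flattens a nested dictionary into override arguments and assembles a command line. The helper name to_overrides, the spec path, and the chosen option values are hypothetical; only the tao model mal train -e <experiment_spec> form and the option names come from this page.

```python
def to_overrides(options, prefix=""):
    """Flatten a nested dict into dotted key=value override strings."""
    overrides = []
    for key, value in options.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_overrides(value, dotted))
        elif isinstance(value, (list, tuple)):
            # Write lists without spaces so the argument stays a single token.
            overrides.append(f"{dotted}=[{','.join(str(v) for v in value)}]")
        else:
            overrides.append(f"{dotted}={value}")
    return overrides


# Hypothetical overrides for a second experiment.
options = {
    "results_dir": "/results/exp2",
    "train": {"num_epochs": 20, "gpu_ids": [0, 1], "num_gpus": 2},
}
command = ["tao", "model", "mal", "train", "-e", "/path/to/spec.yaml"] + to_overrides(options)
print(" ".join(command))
```

Building the command as a list keeps each override a separate argument, which avoids shell quoting issues if it is later passed to a process launcher.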

Note

Training, evaluation, and inference each expose two variables, num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are set but inconsistent (for example, num_gpus = 1 and gpu_ids = [0, 1]), they are adjusted to follow the setting with more GPUs; in this example, num_gpus becomes 2.
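
The reconciliation rule in this note can be sketched as follows. The case where gpu_ids lists more devices than num_gpus comes directly from the note; the opposite case (num_gpus larger than the list) is an assumption here, filled in by taking the first num_gpus device indices. The function name reconcile_gpus is hypothetical.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=(0,)):
    """Return a consistent (num_gpus, gpu_ids) pair, following the larger setting."""
    gpu_ids = list(gpu_ids)
    if len(gpu_ids) > num_gpus:
        # Documented case: more indices than num_gpus, so num_gpus grows.
        num_gpus = len(gpu_ids)
    elif num_gpus > len(gpu_ids):
        # Assumption: fall back to the first num_gpus device indices.
        gpu_ids = list(range(num_gpus))
    return num_gpus, gpu_ids


print(reconcile_gpus(num_gpus=1, gpu_ids=[0, 1]))
```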

Checkpointing and Resuming Training#

Every train.checkpoint_interval epochs, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved in train.results_dir, like so:

$ ls /results/train

'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'

The latest checkpoint is also saved as mal_model_latest.pth. Training automatically resumes from mal_model_latest.pth, if it exists in train.results_dir. This is superseded by train.resume_training_checkpoint_path, if it is provided.

The main implication of this logic is that, to start a fresh training run from scratch, you must do one of the following:

  • Specify a new, empty results directory (Recommended)

  • Remove the latest checkpoint from the results directory
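
The resume rules described above can be summarized in a short sketch: an explicit train.resume_training_checkpoint_path takes precedence, otherwise mal_model_latest.pth in train.results_dir is used if it exists, otherwise training starts fresh. The function name resolve_resume_checkpoint is hypothetical and this is not the actual implementation; it only mirrors the precedence stated in this section.

```python
import tempfile
from pathlib import Path


def resolve_resume_checkpoint(results_dir, resume_training_checkpoint_path=None):
    """Return the checkpoint to resume from, or None for a fresh run."""
    if resume_training_checkpoint_path:
        # An explicitly provided checkpoint supersedes the latest one.
        return Path(resume_training_checkpoint_path)
    latest = Path(results_dir) / "mal_model_latest.pth"
    return latest if latest.exists() else None


with tempfile.TemporaryDirectory() as tmp:
    fresh = resolve_resume_checkpoint(tmp)  # empty directory: fresh run
    (Path(tmp) / "mal_model_latest.pth").touch()
    resumed = resolve_resume_checkpoint(tmp)  # latest checkpoint is picked up
    explicit = resolve_resume_checkpoint(tmp, "/results/train/model_epoch_004.pth")

print(fresh, resumed.name, explicit)
```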

Evaluating the Model#

To run evaluation for a MAL model, use this command:

tao model mal evaluate [-h] -e <experiment_spec_file>
              evaluate.checkpoint=<model to be evaluated>
              [evaluate.<evaluate_option>=<evaluate_option_value>]
              [evaluate.gpu_ids=<gpu indices>]
              [evaluate.num_gpus=<number of gpus>]

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to set up the evaluation experiment.

  • evaluate.checkpoint: The .pth model to be evaluated.

Optional Arguments#

  • evaluate.<evaluate_option>: Overrides the corresponding field of the evaluation configuration in the spec file.

  • evaluate.gpu_ids: The indices of the GPUs to use for evaluation.

  • evaluate.num_gpus: The number of GPUs to use for evaluation.

Running Inference#

The inference tool for MAL networks generates pseudo masks from box annotations. Use the following command to run it:

tao model mal inference [-h] -e <experiment spec file>
              inference.checkpoint=<model to be inferenced>
              [inference.<inference_option>=<inference_option_value>]
              [inference.gpu_ids=<gpu indices>]
              [inference.num_gpus=<number of gpus>]

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to set up the inference experiment.

  • inference.checkpoint: The .pth model to run inference with.

Optional Arguments#

  • inference.<inference_option>: Overrides the corresponding field of the inference configuration in the spec file.

  • inference.gpu_ids: The indices of the GPUs to use for inference.

  • inference.num_gpus: The number of GPUs to use for inference.