TAO v5.5.0

Mask Auto Labeler

Mask Auto Labeler (MAL) is a high-quality, transformer-based mask auto-labeling framework for instance segmentation using only box annotations. It supports the following tasks:

  • train

  • evaluate

  • inference

These tasks may be invoked from the TAO Launcher using the following convention on the command line:


tao model mal <sub_task> <args_per_subtask>

Here, args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail below.
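If you drive the launcher from a script, the convention above amounts to a fixed prefix, the `-e <spec>` argument, and dotted `key=value` overrides. The sketch below is a minimal illustration under that reading; `build_mal_command` and the spec path are hypothetical, and the helper only builds the argument list rather than running anything.

```python
import shlex

# The three subtasks documented for MAL.
SUBTASKS = ("train", "evaluate", "inference")


def build_mal_command(sub_task, spec_path, overrides=None):
    """Build argv for `tao model mal <sub_task> -e <spec> key=value ...`.

    Hypothetical helper: it only assembles the argument list and never
    invokes the TAO Launcher.
    """
    if sub_task not in SUBTASKS:
        raise ValueError(f"unsupported sub_task: {sub_task!r}")
    argv = ["tao", "model", "mal", sub_task, "-e", spec_path]
    # Each override becomes one dotted `section.option=value` argument.
    for key, value in (overrides or {}).items():
        argv.append(f"{key}={value}")
    return argv


if __name__ == "__main__":
    cmd = build_mal_command(
        "train",
        "/path/to/spec.yaml",  # hypothetical spec location
        {"train.num_epochs": 20, "train.gpu_ids": "[0,1]"},
    )
    # shlex.join quotes arguments that need it, such as the bracketed list.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` directly, rather than a single shell string, avoids quoting problems with values such as `[0,1]`.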

Below is a sample MAL spec file. A MAL spec can contain five components (model, inference, evaluate, dataset, and train) as well as several global parameters, all of which are described below; the sample sets four of the components and omits evaluate. The spec file is written in YAML.


strategy: 'fsdp'
results_dir: '/path/to/result/dir'
dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  load_mask: True
  crop_size: 512
inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
model:
  arch: 'vit-mae-base/16'
train:
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  batch_size: 4
  seed: 1234
  num_gpus: 1
  gpu_ids: [0]
  use_amp: True
  optim_momentum: 0.9
  lr: 0.0000015
  min_lr_rate: 0.2
  wd: 0.0005
  warmup_epochs: 1
  crf_kernel_size: 3
  crf_num_iter: 100
  loss_mil_weight: 4
  loss_crf_weight: 0.5

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| strategy | string | 'ddp' | The distributed training strategy | 'ddp', 'fsdp' |

Dataset Config

The dataset configuration (dataset) defines the data source and input size.

| Field | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_ann_path | string | – | The path to the training annotation JSON file | |
| val_ann_path | string | – | The path to the validation annotation JSON file | |
| train_img_dir | string | – | The path to the training image directory | |
| val_img_dir | string | – | The path to the validation image directory | |
| crop_size | unsigned int | 512 | The effective input size of the model | |
| load_mask | boolean | True | A flag specifying whether to load the segmentation mask from the JSON file | |
| min_obj_size | float | 2048 | The minimum object size for training | |
| max_obj_size | float | 1e10 | The maximum object size for training | |
| num_workers_per_gpu | unsigned int | | The number of workers to load data for each GPU | |
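As a rough illustration of how min_obj_size and max_obj_size bound the objects used for training, the sketch below filters COCO annotations by box area. Treating "object size" as the bounding-box area in pixels, with inclusive bounds, is an assumption; the table does not define the measure. `filter_by_object_size` is a hypothetical name.

```python
def filter_by_object_size(annotations, min_obj_size=2048.0, max_obj_size=1e10):
    """Keep COCO annotations whose box area lies in [min_obj_size, max_obj_size].

    Assumption: "object size" is the bounding-box area in pixels, i.e.
    width * height of the COCO [x, y, w, h] box. The measure MAL actually
    uses is not stated in this section.
    """
    kept = []
    for ann in annotations:
        _, _, w, h = ann["bbox"]
        area = w * h
        if min_obj_size <= area <= max_obj_size:
            kept.append(ann)
    return kept


if __name__ == "__main__":
    anns = [
        {"id": 1, "bbox": [0, 0, 32, 32]},    # area 1024: below the default minimum
        {"id": 2, "bbox": [10, 10, 64, 48]},  # area 3072: kept
    ]
    print([a["id"] for a in filter_by_object_size(anns)])  # [2]
```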

Model Config

The model configuration (model) defines the model architecture.

| Field | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| arch | string | vit-mae-base/16 | The backbone architecture | vit-deit-tiny/16, vit-deit-small/16, vit-mae-base/16, vit-mae-large/16, vit-mae-huge/14, fan_tiny_12_p16_224, fan_small_12_p16_224, fan_base_18_p16_224, fan_large_24_p16_224, fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid |
| frozen_stages | List[int] | [-1] | The indices of the frozen blocks | |
| mask_head_num_convs | unsigned int | 4 | The number of conv layers in the mask head | |
| mask_head_hidden_channel | unsigned int | 256 | The number of conv channels in the mask head | |
| mask_head_out_channel | unsigned int | 256 | The number of output channels in the mask head | |
| teacher_momentum | float | 0.996 | The momentum of the teacher model | |
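teacher_momentum controls how quickly the teacher model follows the student. The sketch below assumes the standard exponential moving average (EMA) update used in mean-teacher setups; it is not taken from the MAL source, `ema_update` is a hypothetical name, and weights are plain Python lists rather than tensors.

```python
def ema_update(teacher, student, momentum=0.996):
    """Return teacher weights after one EMA step.

    Assumes the common mean-teacher rule, applied element-wise:
        teacher <- momentum * teacher + (1 - momentum) * student
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same shape")
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    teacher = [1.0, 0.0]
    student = [0.0, 1.0]
    # The teacher moves only slightly toward the student in one step.
    print(ema_update(teacher, student))
```

With the default of 0.996, each step moves the teacher only 0.4% of the way toward the student, so the teacher changes smoothly over many iterations.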

Train Config

The training configuration (train) specifies the parameters for the training process.

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, numpy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which validation is run | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| batch_size | unsigned int | | The training batch size | |
| use_amp | boolean | True | A flag specifying whether to use mixed precision | |
| optim_momentum | float | 0.9 | The momentum of the AdamW optimizer | |
| lr | float | 0.0000015 | The learning rate | |
| min_lr_rate | float | 0.2 | The minimum learning rate ratio | |
| wd | float | 0.0005 | The weight decay | |
| warmup_epochs | unsigned int | 1 | The number of epochs for warmup | |
| crf_kernel_size | unsigned int | 3 | The kernel size of the mean field approximation | |
| crf_num_iter | unsigned int | 100 | The number of iterations to run mask refinement | |
| loss_mil_weight | float | 4 | The weight of the multiple instance learning (MIL) loss | |
| loss_crf_weight | float | 0.5 | The weight of the conditional random field (CRF) loss | |
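lr, warmup_epochs, and min_lr_rate interact to shape the learning rate over training. Reading min_lr_rate as the ratio of the final to the peak learning rate, the sketch below assumes linear warmup followed by cosine decay; the actual schedule shape MAL uses is not stated here, so treat `lr_at_epoch` (a hypothetical helper) only as an illustration of how the three parameters relate.

```python
import math


def lr_at_epoch(epoch, lr=1.5e-6, min_lr_rate=0.2, warmup_epochs=1, num_epochs=10):
    """Learning rate at a (possibly fractional) epoch.

    Assumed schedule, for illustration only: linear warmup from 0 to `lr`
    over `warmup_epochs`, then cosine decay down to `lr * min_lr_rate`
    at `num_epochs`.
    """
    min_lr = lr * min_lr_rate
    if warmup_epochs > 0 and epoch < warmup_epochs:
        return lr * epoch / warmup_epochs
    # Fraction of the post-warmup phase that has elapsed, clamped to [0, 1].
    decay_epochs = max(num_epochs - warmup_epochs, 1)
    progress = min(max((epoch - warmup_epochs) / decay_epochs, 0.0), 1.0)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 0.5, 1, 5.5, 10):
        print(f"epoch {e}: lr = {lr_at_epoch(e):.3e}")
```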

Evaluation Config

The evaluation configuration (evaluate) specifies the parameters for validation during training as well as for standalone evaluation.

| Field | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to evaluate | |
| results_dir | string | /results/evaluate | The directory to save evaluation results | |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed evaluation | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed evaluation | |
| batch_size | unsigned int | | The evaluation batch size | |
| use_mixed_model_test | boolean | False | A flag specifying whether to evaluate with a mixed model | |
| use_teacher_test | boolean | False | A flag specifying whether to evaluate with the teacher model | |

Inference Config

The inference configuration (inference) specifies the parameters for generating pseudo masks from ground-truth bounding boxes in COCO format.

| Field | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to use for inference | |
| results_dir | string | /results/inference | The directory to save inference results | |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed inference | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed inference | |
| ann_path | string | | The path to the annotation JSON file | |
| img_dir | string | | The image directory | |
| label_dump_path | string | | The path to save the output JSON file with pseudo masks | |
| batch_size | unsigned int | | The inference batch size | |
| load_mask | boolean | False | A flag specifying whether to load masks if the annotation file has them | |
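Since inference starts from bounding boxes in COCO format (passed through ann_path), a box-only annotation file is the natural input. The sketch below assembles a minimal box-only COCO dictionary using the public COCO field names; `make_box_only_coco` and the sample values are hypothetical, and MAL may accept or require fields not shown here.

```python
import json


def make_box_only_coco(images, boxes, categories):
    """Assemble a minimal COCO-format dict with boxes but no masks.

    images:     list of (image_id, file_name, width, height)
    boxes:      list of (image_id, category_id, [x, y, w, h])
    categories: list of (category_id, name)
    """
    return {
        "images": [
            {"id": i, "file_name": f, "width": w, "height": h}
            for i, f, w, h in images
        ],
        "annotations": [
            {
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_id,
                "bbox": bbox,                 # COCO boxes are [x, y, width, height]
                "area": bbox[2] * bbox[3],    # box area, since no mask is present
                "iscrowd": 0,
            }
            for ann_id, (img_id, cat_id, bbox) in enumerate(boxes, start=1)
        ],
        "categories": [{"id": c, "name": n} for c, n in categories],
    }


if __name__ == "__main__":
    coco = make_box_only_coco(
        images=[(1, "000001.jpg", 640, 480)],
        boxes=[(1, 1, [100.0, 120.0, 200.0, 150.0])],
        categories=[(1, "person")],
    )
    print(json.dumps(coco, indent=2))
```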

Use the following command to run MAL training:


tao model mal train [-h] -e <experiment_spec> [results_dir=<global_results_dir>] [model.<model_option>=<model_option_value>] [dataset.<dataset_option>=<dataset_option_value>] [train.<train_option>=<train_option_value>] [train.gpu_ids=<gpu indices>] [train.num_gpus=<number of gpus>]

Required Arguments

The only required argument is the path to the experiment spec:

  • -e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.
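Overrides such as train.num_epochs=20 use dotted keys that address nested sections of the spec. The sketch below shows that mapping on a plain dictionary. `apply_overrides` is a hypothetical helper, not the launcher's parser; values are passed already parsed here, whereas a real command-line parser also has to turn the string `20` into an integer.

```python
import copy


def apply_overrides(spec, overrides):
    """Return a copy of `spec` with dotted-key overrides applied.

    `overrides` maps keys such as "train.num_epochs" to already-parsed values.
    Missing sections are created here; a real config system may instead
    reject unknown keys.
    """
    result = copy.deepcopy(spec)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


if __name__ == "__main__":
    spec = {
        "results_dir": "/path/to/result/dir",
        "train": {"num_epochs": 10, "num_gpus": 1, "gpu_ids": [0]},
    }
    updated = apply_overrides(spec, {"train.num_epochs": 20, "results_dir": "/results/run2"})
    print(updated["train"]["num_epochs"], updated["results_dir"])  # 20 /results/run2
    print(spec["train"]["num_epochs"])  # 10 (the original spec is unchanged)
```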

Note

For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 and gpu_ids = [0, 1]), they are adjusted to follow the setting that uses more GPUs; in this example, num_gpus is changed from 1 to 2.
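The reconciliation rule in this note can be sketched as a small function. Only the case where gpu_ids lists more GPUs than num_gpus is documented; filling gpu_ids with 0..num_gpus-1 when num_gpus is the larger setting is an assumption made for this sketch, and `reconcile_gpus` is a hypothetical name.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Make num_gpus and gpu_ids agree by following whichever asks for more GPUs.

    Documented case: gpu_ids is longer, so num_gpus grows to len(gpu_ids).
    Assumed case: num_gpus is larger, so gpu_ids becomes 0..num_gpus-1.
    """
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if len(gpu_ids) > num_gpus:
        num_gpus = len(gpu_ids)
    elif num_gpus > len(gpu_ids):
        gpu_ids = list(range(num_gpus))
    return num_gpus, gpu_ids


if __name__ == "__main__":
    print(reconcile_gpus(1, [0, 1]))  # (2, [0, 1])
```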

Checkpointing and Resuming Training

Every train.checkpoint_interval epochs, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved in train.results_dir, for example:


$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'

The latest checkpoint is also saved as mal_model_latest.pth. If mal_model_latest.pth exists in train.results_dir, training automatically resumes from it. If train.resume_training_checkpoint_path is provided, it takes precedence over mal_model_latest.pth.
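The resume precedence described here can be expressed as a short lookup. `resolve_resume_checkpoint` is a hypothetical helper that only reproduces the documented order; it does not load or validate any checkpoint.

```python
import os
import tempfile


def resolve_resume_checkpoint(results_dir, resume_training_checkpoint_path=None):
    """Return the checkpoint training would resume from, or None to start fresh.

    Documented precedence:
      1. train.resume_training_checkpoint_path, if provided;
      2. otherwise <results_dir>/mal_model_latest.pth, if that file exists;
      3. otherwise None, meaning training starts from scratch.
    """
    if resume_training_checkpoint_path:
        return resume_training_checkpoint_path
    latest = os.path.join(results_dir, "mal_model_latest.pth")
    return latest if os.path.isfile(latest) else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as results_dir:
        print(resolve_resume_checkpoint(results_dir))  # None: empty directory
        open(os.path.join(results_dir, "mal_model_latest.pth"), "w").close()
        print(resolve_resume_checkpoint(results_dir))  # path ending in mal_model_latest.pth
```

This also shows why a new, empty results directory guarantees a fresh start: with no mal_model_latest.pth and no explicit resume path, the lookup falls through to None.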

As a result, if you want to start fresh training from scratch, do one of the following:

  • Specify a new, empty results directory (Recommended)

  • Remove the latest checkpoint from the results directory

To run evaluation for a MAL model, use this command:


tao model mal evaluate [-h] -e <experiment_spec_file> evaluate.checkpoint=<model to be evaluated> [evaluate.<evaluate_option>=<evaluate_option_value>] [evaluate.gpu_ids=<gpu indices>] [evaluate.num_gpus=<number of gpus>]

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up the evaluation experiment.

  • evaluate.checkpoint: The .pth model to be evaluated.

Optional Arguments

The inference tool for MAL networks can be used to generate pseudo masks. Here’s an example of using this tool:


tao model mal inference [-h] -e <experiment spec file> inference.checkpoint=<model to run inference with> [inference.<inference_option>=<inference_option_value>] [inference.gpu_ids=<gpu indices>] [inference.num_gpus=<number of gpus>]

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up the inference experiment.

  • inference.checkpoint: The .pth model to use for inference.
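After inference finishes, a quick sanity check is to count how many annotations in the label_dump_path file received a mask. The sketch below assumes the output is COCO-style JSON with each pseudo mask stored under the annotation's segmentation key (polygons or RLE); verify this against your actual output. `summarize_pseudo_masks` is a hypothetical helper, and the demo data is made up.

```python
import json
import os
import tempfile


def summarize_pseudo_masks(label_dump_path):
    """Count annotations with and without a non-empty "segmentation" entry.

    Assumes COCO-style JSON with pseudo masks under each annotation's
    "segmentation" key; adjust if your output differs.
    """
    with open(label_dump_path) as fh:
        coco = json.load(fh)
    anns = coco.get("annotations", [])
    with_mask = sum(1 for a in anns if a.get("segmentation"))
    return {
        "annotations": len(anns),
        "with_mask": with_mask,
        "without_mask": len(anns) - with_mask,
    }


if __name__ == "__main__":
    # Self-contained demo on a tiny made-up file; for real output, pass the
    # path configured as inference.label_dump_path instead.
    demo = {
        "annotations": [
            {"id": 1, "bbox": [0, 0, 10, 10], "segmentation": [[0, 0, 10, 0, 10, 10]]},
            {"id": 2, "bbox": [5, 5, 10, 10]},
        ]
    }
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample_output.json")
        with open(path, "w") as fh:
            json.dump(demo, fh)
        print(summarize_pseudo_masks(path))
        # {'annotations': 2, 'with_mask': 1, 'without_mask': 1}
```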

Optional Arguments

© Copyright 2024, NVIDIA. Last updated on Aug 30, 2024.