
Mask Auto Labeler

Mask Auto Labeler (MAL) is a high-quality, transformer-based mask auto-labeling framework for instance segmentation using only box annotations. It supports the following tasks:

  • train

  • evaluate

  • inference

These tasks may be invoked from the TAO Toolkit Launcher using the following convention on the command line:


tao mal <sub_task> <args_per_subtask>

Here, args_per_subtask refers to the command-line arguments required for a given subtask. Each of these subtasks is explained in detail below.

Below is a sample MAL spec file. It has five components (model, inference, evaluate, dataset, and train) as well as several global parameters, all of which are described below. The spec file uses the YAML format.


gpus: [0, 1]
strategy: 'ddp_sharded'
results_dir: '/path/to/result/dir'
dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  load_mask: True
  crop_size: 512
inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
model:
  arch: 'vit-mae-base/16'
train:
  num_epochs: 10
  batch_size: 4
  use_amp: True

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| gpus | A list of GPU indices to use | List of int | – |
| strategy | The distributed training strategy | string | 'ddp_sharded' |
| num_nodes | The number of nodes for multi-node training | Unsigned int | – |
| checkpoint | Either a pretrained model or a MAL checkpoint to load | string | – |
| results_dir | The directory to save experiment results to | string | – |
| dataset | The dataset config | Dict | – |
| train | The training config | Dict | – |
| model | The model config | Dict | – |
| evaluate | The evaluation config | Dict | – |

Dataset Config

The dataset configuration (dataset) defines the data source and input size.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| train_ann_path | The path to the training annotation JSON file | string | – |
| val_ann_path | The path to the validation annotation JSON file | string | – |
| train_img_dir | The path to the training image directory | string | – |
| val_img_dir | The path to the validation image directory | string | – |
| crop_size | The effective input size of the model | Unsigned int | 512 |
| load_mask | A flag specifying whether to load the segmentation mask from the JSON file | boolean | – |
| min_obj_size | The minimum object size for training | float | 2048 |
| max_obj_size | The maximum object size for training | float | 1e10 |
| num_workers_per_gpu | The number of workers to load data for each GPU | Unsigned int | – |
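
Putting these fields together, a dataset block might look like the following sketch. The paths follow the COCO layout from the sample spec above; the num_workers_per_gpu value is an assumption to tune for your machine.

dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/datasets/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  crop_size: 512
  load_mask: True
  min_obj_size: 2048
  max_obj_size: 1e10
  num_workers_per_gpu: 2   # assumed value; adjust to your CPU/GPU setup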

Model Config

The model configuration (model) defines the model architecture.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| arch | The backbone architecture (one of the supported backbones listed below) | string | vit-mae-base/16 |
| frozen_stages | The indices of the frozen blocks | List[int] | -1 |
| mask_head_num_convs | The number of conv layers in the mask head | Unsigned int | 4 |
| mask_head_hidden_channel | The number of conv channels in the mask head | Unsigned int | 256 |
| mask_head_out_channel | The number of output channels in the mask head | Unsigned int | 256 |
| teacher_momentum | The momentum of the teacher model | float | 0.996 |

Supported backbones include the following:

  • vit-deit-tiny/16
  • vit-deit-small/16
  • vit-mae-base/16
  • vit-mae-large/16
  • vit-mae-huge/14
  • fan_tiny_12_p16_224
  • fan_small_12_p16_224
  • fan_base_18_p16_224
  • fan_large_24_p16_224
  • fan_tiny_8_p4_hybrid
  • fan_small_12_p4_hybrid
  • fan_base_16_p4_hybrid
  • fan_large_16_p4_hybrid
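
A model block built from these fields might look like the following sketch, using the typical values from the table (frozen_stages is written as a one-element list to match its List[int] type):

model:
  arch: 'vit-mae-base/16'
  frozen_stages: [-1]
  mask_head_num_convs: 4
  mask_head_hidden_channel: 256
  mask_head_out_channel: 256
  teacher_momentum: 0.996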

Train Config

The training configuration (train) specifies the parameters for the training process.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| num_epochs | The number of epochs | Unsigned int | 10 |
| save_every_k_epoch | The interval (in epochs) at which checkpoints are saved | Unsigned int | 1 |
| val_interval | The validation interval | Unsigned int | 1 |
| batch_size | The training batch size | Unsigned int | – |
| use_amp | A flag specifying whether to use mixed precision | boolean | True |
| optim_momentum | The momentum of the AdamW optimizer | float | 0.9 |
| lr | The learning rate | float | 0.0000015 |
| min_lr_rate | The minimum learning rate ratio | float | 0.2 |
| wd | The weight decay | float | 0.0005 |
| warmup_epochs | The number of epochs for warmup | Unsigned int | 1 |
| crf_kernel_size | The kernel size of the mean field approximation | Unsigned int | 3 |
| crf_num_iter | The number of iterations to run mask refinement | Unsigned int | 100 |
| loss_mil_weight | The weight of the multiple instance learning loss | float | 4 |
| loss_crf_weight | The weight of the conditional random field loss | float | 0.5 |
| results_dir | The directory to save training results | string | – |
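
For reference, a train block using the typical values above might look like this sketch; the batch_size is taken from the sample spec and should be sized to your GPU memory.

train:
  num_epochs: 10
  batch_size: 4          # from the sample spec above; size to GPU memory
  use_amp: True
  lr: 0.0000015
  min_lr_rate: 0.2
  wd: 0.0005
  warmup_epochs: 1
  save_every_k_epoch: 1
  val_interval: 1
  crf_kernel_size: 3
  crf_num_iter: 100
  loss_mil_weight: 4
  loss_crf_weight: 0.5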

Evaluation Config

The evaluation configuration (evaluate) specifies the parameters for validation during training, as well as for standalone evaluation.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| batch_size | The evaluation batch size | Unsigned int | – |
| use_mixed_model_test | A flag specifying whether to evaluate with a mixed model | boolean | False |
| use_teacher_test | A flag specifying whether to evaluate with the teacher model | boolean | False |
| results_dir | The directory to save the evaluation log | string | – |
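
A minimal evaluate block might look like the following sketch; the batch size and results path are placeholders.

evaluate:
  batch_size: 4               # assumed value
  use_mixed_model_test: False
  use_teacher_test: False
  results_dir: '/path/to/evaluate/results'   # placeholder path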

Inference Config

The inference configuration (inference) specifies the parameters for generating pseudo masks given the ground-truth bounding boxes in COCO format.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| ann_path | The path to the annotation JSON file | string | – |
| img_dir | The image directory | string | – |
| label_dump_path | The path to save the output JSON file with pseudo masks | string | – |
| batch_size | The inference batch size | Unsigned int | – |
| load_mask | A flag specifying whether to load masks if the annotation file has them | boolean | False |
| results_dir | The directory to save the inference log | string | – |
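
An inference block following the sample spec above might look like this sketch; the paths are placeholders and the batch size is an assumption.

inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
  batch_size: 4        # assumed value
  load_mask: False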

Train the MAL model using this command:


tao model mal train [-h] -e <experiment_spec> [-r <results_dir>] [--gpus <num_gpus>]

Required Arguments

  • -e, --experiment_spec_file: The experiment specification file

Optional Arguments

  • --gpus: The number of GPUs to use for training. The default value is 1.

  • -h, --help: Show this help message and exit.

Sample Usage

Here’s an example of using the train command on a MAL model:


tao model mal train --gpus 2 -e /path/to/spec.yaml


To run evaluation for a MAL model, use this command:


tao model mal evaluate [-h] -e <experiment_spec_file> [-r <results_dir>] [--gpus <num_gpus>]

Required Arguments

  • -e, --experiment_spec_file: The experiment specification file

Optional Arguments

  • --gpus: The number of GPUs to use for evaluation. The default value is 1.

  • -h, --help: Show this help message and exit.
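
Sample Usage

Here’s an example of using the evaluate command on a MAL model; the spec path is a placeholder:

tao model mal evaluate -e /path/to/spec.yaml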

The inference tool for MAL networks can be used to generate pseudo masks. To run inference on a MAL model, use this command:


tao model mal inference [-h] -e <experiment_spec_file> [-r <results_dir>] [--gpus <num_gpus>]

Required Arguments

  • -e, --experiment_spec_file: The experiment specification file

Optional Arguments

  • --gpus: The number of GPUs to use for inference. The default value is 1.

  • -h, --help: Show this help message and exit.
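
Sample Usage

Here’s an example of using the inference command to generate pseudo masks; the spec path is a placeholder:

tao model mal inference -e /path/to/spec.yaml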
