NVIDIA TAO Toolkit v4.0.1

Mask Auto Labeler

Mask Auto Labeler (MAL) is a high-quality, transformer-based mask auto-labeling framework for instance segmentation using only box annotations. It supports the following tasks:

  • train

  • evaluate

  • inference

These tasks may be invoked from the TAO Toolkit Launcher using the following convention on the command line:

tao mal <sub_task> <args_per_subtask>

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail below.

Below is a sample MAL spec file. It has five components (model, inference, evaluate, dataset, and train) as well as several global parameters, which are described below. The spec file is written in YAML.

gpus: [0, 1]
strategy: 'ddp_sharded'
results_dir: '/path/to/result/dir'
dataset:
  train_ann_path: '/datasets/coco/annotations/instances_train2017.json'
  train_img_dir: '/datasets/coco/raw-data/train2017'
  val_ann_path: '/coco/annotations/instances_val2017.json'
  val_img_dir: '/datasets/coco/raw-data/val2017'
  load_mask: True
  crop_size: 512
inference:
  ann_path: '/dataset/sample.json'
  img_dir: '/dataset/sample_dir'
  label_dump_path: '/dataset/sample_output.json'
model:
  arch: 'vit-mae-base/16'
train:
  num_epochs: 10
  batch_size: 4
  use_amp: True
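When specs are generated by a script (for example, to sweep over backbones), it can help to build the nested structure in code and render it to YAML. The sketch below uses only the Python standard library; the `to_yaml` helper is a hypothetical illustration, not part of TAO, and it handles only the value types that appear in the sample spec (strings, numbers, booleans, flat lists, and nested dicts).

```python
def to_yaml(node, indent=0):
    """Render a nested dict as block-style YAML lines (hypothetical helper)."""
    lines = []
    pad = " " * indent
    for key, value in node.items():
        if isinstance(value, dict):
            # Nested component: emit the key, then its fields indented by two spaces.
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 2))
        elif isinstance(value, str):
            lines.append(f"{pad}{key}: '{value}'")
        elif isinstance(value, list):
            lines.append(f"{pad}{key}: [{', '.join(str(v) for v in value)}]")
        else:
            # Numbers and booleans are written as-is (True/False).
            lines.append(f"{pad}{key}: {value}")
    return lines


# Same values as the sample spec above.
spec = {
    "gpus": [0, 1],
    "strategy": "ddp_sharded",
    "results_dir": "/path/to/result/dir",
    "dataset": {
        "train_ann_path": "/datasets/coco/annotations/instances_train2017.json",
        "train_img_dir": "/datasets/coco/raw-data/train2017",
        "val_ann_path": "/coco/annotations/instances_val2017.json",
        "val_img_dir": "/datasets/coco/raw-data/val2017",
        "load_mask": True,
        "crop_size": 512,
    },
    "inference": {
        "ann_path": "/dataset/sample.json",
        "img_dir": "/dataset/sample_dir",
        "label_dump_path": "/dataset/sample_output.json",
    },
    "model": {"arch": "vit-mae-base/16"},
    "train": {"num_epochs": 10, "batch_size": 4, "use_amp": True},
}

spec_text = "\n".join(to_yaml(spec)) + "\n"
print(spec_text)
```

Writing `spec_text` to a file gives a spec that can be passed to the `-e` argument of the commands described later on this page.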

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| gpus | A list of GPU indices to use | List of int | |
| strategy | The distributed training strategy | string | 'ddp_sharded' |
| num_nodes | The number of nodes in multinode training | Unsigned int | |
| checkpoint | Either a pretrained model or a MAL checkpoint to load | string | |
| results_dir | The directory to save experiment results to | string | |
| dataset | The dataset config | Dict | |
| train | The training config | Dict | |
| model | The model config | Dict | |
| evaluate | The evaluation config | Dict | |

Dataset Config

The dataset configuration (dataset) defines the data source and input size.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| train_ann_path | The path to the training annotation JSON file | string | |
| val_ann_path | The path to the validation annotation JSON file | string | |
| train_img_dir | The path to the training image directory | string | |
| val_img_dir | The path to the validation image directory | string | |
| crop_size | The effective input size of the model | Unsigned int | 512 |
| load_mask | A flag specifying whether to load the segmentation mask from the JSON file | boolean | |
| min_obj_size | The minimum object size for training | float | 2048 |
| max_obj_size | The maximum object size for training | float | 1e10 |
| num_workers_per_gpu | The number of workers to load data for each GPU | Unsigned int | |
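The effect of `min_obj_size` and `max_obj_size` can be pictured as a filter over the training annotations. This page does not say how object size is measured, so the sketch below assumes it is the bounding-box area in pixels (width times height of a COCO `bbox`) and that both bounds are inclusive. The function name `filter_by_object_size` is hypothetical and is not a TAO API.

```python
def filter_by_object_size(annotations, min_obj_size=2048, max_obj_size=1e10):
    """Keep annotations whose box area lies within [min_obj_size, max_obj_size].

    Assumption: "object size" means box area in pixels; the source does not define it.
    """
    kept = []
    for ann in annotations:
        _, _, width, height = ann["bbox"]  # COCO boxes are [x, y, width, height]
        area = width * height
        if min_obj_size <= area <= max_obj_size:
            kept.append(ann)
    return kept


sample = [
    {"id": 1, "bbox": [10, 10, 32, 32]},  # area 1024, below the default minimum
    {"id": 2, "bbox": [0, 0, 64, 64]},    # area 4096, kept
]
kept_ids = [ann["id"] for ann in filter_by_object_size(sample)]
print(kept_ids)
```

Under this reading, the typical values in the table would drop boxes smaller than roughly 45 by 45 pixels and impose no practical upper limit.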

Model Config

The model configuration (model) defines the model architecture.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| arch | The backbone architecture. Supported backbones: vit-deit-tiny/16, vit-deit-small/16, vit-mae-base/16, vit-mae-large/16, vit-mae-huge/14, fan_tiny_12_p16_224, fan_small_12_p16_224, fan_base_18_p16_224, fan_large_24_p16_224, fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid | string | vit-mae-base/16 |
| frozen_stages | The indices of the frozen blocks | List[int] | -1 |
| mask_head_num_convs | The number of conv layers in the mask head | Unsigned int | 4 |
| mask_head_hidden_channel | The number of conv channels in the mask head | Unsigned int | 256 |
| mask_head_out_channel | The number of output channels in the mask head | Unsigned int | 256 |
| teacher_momentum | The momentum of the teacher model | float | 0.996 |
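A teacher momentum in this setting usually refers to an exponential moving average (EMA) of the student weights. The sketch below illustrates that update on plain Python lists; it assumes the EMA interpretation, which this page does not spell out, and `ema_update` is a hypothetical name rather than a TAO function.

```python
def ema_update(teacher, student, momentum=0.996):
    """Return teacher weights after one EMA step toward the student weights.

    Assumption: teacher_momentum is the EMA coefficient, i.e.
    teacher <- momentum * teacher + (1 - momentum) * student.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student)
print(teacher)  # each weight moves 0.4% of the way toward the student
```

With a momentum close to 1, the teacher changes slowly and smooths out step-to-step noise in the student.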

Train Config

The training configuration (train) specifies the parameters for the training process.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| num_epochs | The number of epochs | Unsigned int | 10 |
| save_every_k_epoch | The interval, in epochs, at which to save a checkpoint | Unsigned int | 1 |
| val_interval | The validation interval | Unsigned int | 1 |
| batch_size | The training batch size | Unsigned int | |
| use_amp | A flag specifying whether to use mixed precision | boolean | True |
| optim_momentum | The momentum of the AdamW optimizer | float | 0.9 |
| lr | The learning rate | float | 0.0000015 |
| min_lr_rate | The minimum learning rate ratio | float | 0.2 |
| wd | The weight decay | float | 0.0005 |
| warmup_epochs | The number of epochs for warmup | Unsigned int | 1 |
| crf_kernel_size | The kernel size of the mean field approximation | Unsigned int | 3 |
| crf_num_iter | The number of iterations to run mask refinement | Unsigned int | 100 |
| loss_mil_weight | The weight of multiple instance learning loss | float | 4 |
| loss_crf_weight | The weight of conditional random field loss | float | 0.5 |
| results_dir | The directory to save training results | string | |
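To see how `lr`, `min_lr_rate`, and `warmup_epochs` could interact, the sketch below computes a learning rate as a function of training progress. The schedule shape is an assumption: this page does not state it, so the example uses a linear warmup followed by a cosine decay down to `lr * min_lr_rate`. The function name `lr_at` and its `epoch` argument (a fractional epoch count) are hypothetical.

```python
import math


def lr_at(epoch, lr=1.5e-6, min_lr_rate=0.2, warmup_epochs=1, num_epochs=10):
    """Learning rate at a (possibly fractional) epoch.

    Assumed schedule, not confirmed by the source:
    linear warmup from 0 to lr, then cosine decay to lr * min_lr_rate.
    """
    if epoch < warmup_epochs:
        return lr * epoch / warmup_epochs
    min_lr = lr * min_lr_rate
    progress = (epoch - warmup_epochs) / max(1, num_epochs - warmup_epochs)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for epoch in (0.0, 0.5, 1.0, 5.5, 10.0):
    print(f"epoch {epoch:>4}: lr = {lr_at(epoch):.3e}")
```

Under these assumptions, the typical values in the table give a peak of 1.5e-6 after the first epoch and a floor of 3e-7 at the end of training.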

Evaluation Config

The evaluation configuration (evaluate) specifies the parameters for the validation during training as well as the standalone evaluation.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| batch_size | The evaluation batch size | Unsigned int | |
| use_mixed_model_test | A flag specifying whether to evaluate with a mixed model | boolean | False |
| use_teacher_test | A flag specifying whether to evaluate with the teacher model | boolean | False |
| results_dir | The directory to save the evaluation log | string | |

Inference Config

The inference configuration (inference) specifies the parameters for generating pseudo masks from ground-truth bounding boxes in COCO format.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| ann_path | The path to the annotation JSON file | string | |
| img_dir | The image directory | string | |
| label_dump_path | The path to save the output JSON file with pseudo masks | string | |
| batch_size | The inference batch size | Unsigned int | |
| load_mask | A flag specifying whether to load masks if the annotation file has them | boolean | False |
| results_dir | The directory to save the inference log | string | |
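As an illustration of the kind of input `ann_path` points to, the sketch below builds a minimal COCO-style annotation file that contains boxes but no masks. The field layout follows the public COCO detection format (`bbox` is `[x, y, width, height]`); the file name, image size, category name, and the `make_box_annotation` helper are hypothetical, and this page does not state which optional fields MAL requires.

```python
import json


def make_box_annotation(ann_id, image_id, category_id, bbox):
    """Build one box-only COCO annotation (no 'segmentation' field)."""
    x, y, width, height = bbox
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, width, height],
        "area": width * height,  # box area, since no mask is available
        "iscrowd": 0,
    }


coco = {
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "object"}],
    "annotations": [make_box_annotation(1, 1, 1, [100, 120, 200, 150])],
}

coco_text = json.dumps(coco, indent=2)
print(coco_text)
```

After inference, the file written to `label_dump_path` is described above as a JSON file with pseudo masks for these boxes.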

Train the MAL model using this command:

tao model mal train [-h] -e <experiment_spec_file> [-r <results_dir>] [--gpus <num_gpus>]

Required Arguments

  • -e, --experiment_spec_file: The experiment specification file

Optional Arguments

  • --gpus: The number of GPUs to use for training. The default value is 1.

  • -h, --help: Show this help message and exit.

Sample Usage

Here’s an example of using the train command on a MAL model:

tao model mal train --gpus 2 -e /path/to/spec.yaml


To run evaluation for a MAL model, use this command:

tao model mal evaluate [-h] -e <experiment_spec_file> [-r <results_dir>] [--gpus <num_gpus>]

Required Arguments

  • -e, --experiment_spec_file: The experiment specification file

Optional Arguments

  • --gpus: The number of GPUs to use for evaluation. The default value is 1.

  • -h, --help: Show this help message and exit.

The inference tool for MAL networks can be used to generate pseudo masks. To run inference, use this command:

tao model mal inference [-h] -e <experiment_spec_file> [-r <results_dir>] [--gpus <num_gpus>]

Required Arguments

  • -e, --experiment_spec_file: The experiment specification file

Optional Arguments

  • --gpus: The number of GPUs to use for inference. The default value is 1.

  • -h, --help: Show this help message and exit.
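The three subtasks share the same argument pattern, so a script that chains them can assemble the command lines with one helper. The sketch below only builds and prints the argument list and does not run anything. It uses the `tao model mal` form and the `-e`, `-r`, and `--gpus` flags shown above; the helper name `build_mal_command` is hypothetical.

```python
import shlex

SUBTASKS = ("train", "evaluate", "inference")


def build_mal_command(subtask, spec_path, results_dir=None, gpus=None):
    """Return the argv list for a MAL subtask (hypothetical helper, runs nothing)."""
    if subtask not in SUBTASKS:
        raise ValueError(f"unknown subtask: {subtask!r}")
    argv = ["tao", "model", "mal", subtask, "-e", spec_path]
    if results_dir is not None:
        argv += ["-r", results_dir]
    if gpus is not None:
        argv += ["--gpus", str(gpus)]
    return argv


train_cmd = build_mal_command("train", "/path/to/spec.yaml", gpus=2)
print(shlex.join(train_cmd))
```

The resulting list can be handed to `subprocess.run` in an environment where the TAO Launcher is installed.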

© Copyright 2023, NVIDIA. Last updated on Jul 27, 2023.