Annotating an image dataset is tedious and time-consuming, especially for segmentation tasks: drawing a good polygon around an object can take 10 times longer than drawing a bounding box. The Auto-Label service of TAO Data Services is designed to reduce the time spent annotating an image dataset. Currently, this service supports:

* Automatically generating bounding box annotations from category names or referring expressions.
* Automatically generating instance segmentation masks from ground-truth bounding boxes.

Data Input for Auto-Label

The Auto-Label service expects the ground-truth annotations for a directory of images to be stored in a COCO-format JSON file.
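As a rough illustration of what a COCO-format annotation file contains, the sketch below builds a minimal detection-style dictionary in Python and serializes it to JSON. The helper `make_coco` and the sample file name, sizes, and classes are hypothetical. The page does not state which COCO fields Auto-Label actually requires, so treat this as the standard COCO detection layout rather than a specification of the service's input.

```python
import json


def make_coco(images, boxes, class_names):
    """Build a COCO-style detection dictionary (hypothetical helper).

    images: list of (file_name, width, height)
    boxes:  list of (image_index, class_name, x, y, w, h) in pixels,
            where (x, y) is the top-left corner of the box
    """
    # COCO category and image ids are conventionally positive integers.
    categories = [{"id": i + 1, "name": n} for i, n in enumerate(class_names)]
    name_to_id = {c["name"]: c["id"] for c in categories}

    coco_images = [
        {"id": i + 1, "file_name": f, "width": w, "height": h}
        for i, (f, w, h) in enumerate(images)
    ]

    annotations = []
    for ann_id, (img_idx, name, x, y, w, h) in enumerate(boxes, start=1):
        annotations.append({
            "id": ann_id,
            "image_id": img_idx + 1,          # matches the image id above
            "category_id": name_to_id[name],
            "bbox": [x, y, w, h],             # COCO boxes are [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
        })

    return {
        "images": coco_images,
        "annotations": annotations,
        "categories": categories,
    }


# Sample values below are made up for illustration only.
coco = make_coco(
    images=[("0001.jpg", 640, 480)],
    boxes=[(0, "person", 10, 20, 100, 200)],
    class_names=["person", "car"],
)
coco_json = json.dumps(coco, indent=2)
```

Writing `coco_json` to a file next to the image directory gives a single annotation file that links every box to its image through `image_id` and to its class through `category_id`.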

Configuring Spec File for Auto-Label

| Parameter | Datatype | Description |
|---|---|---|
| mal | collection | The configuration of MAL |
| grounding_dino | collection | The configuration of Grounding DINO |
| gpu_ids | list | Indices of GPUs to use |
| num_gpus | int32 | Number of GPUs to use |
| batch_size | int32 | Batch size |
| num_workers | int32 | Number of workers for dataloader |
| results_dir | string | Result directory |
| autolabel_type | string | Type of auto-labeling to run (“mal” or “grounding_dino”) |

Grounding DINO Configuration

| Field | value_type | Description |
|---|---|---|
| model | collection | The GroundingDINO model config |
| train | collection | The data source for testing |
| dataset | collection | The data source for inference, with these fields: image_dir (the list of directories that contain the inference images), class_names (the list of classes to auto-label), noun_chunk_path (the JSONL file that stores noun chunks), augmentation (the GroundingDINO augmentation config) |
| results_dir | string | Result directory |
| iteration_scheduler | string | The list of iteration schedules. The default is one iteration with a confidence threshold of 0.5. Each subsequent iteration eliminates classes/noun chunks that have already been detected. |
| visualize | bool | Flag to enable visualization of bounding boxes |
| checkpoint | string | Grounding DINO model checkpoint path |

The process of using Grounding DINO to iteratively auto-label an image dataset is as follows:

* A single forward pass of the candidate images is run through a Grounding DINO model, which generates bounding box annotations for the list of grounded noun chunks or class names.
* The labels from this iteration are aggregated with the labels from the previous iteration. The aggregation clusters similar annotations using a method such as NMS or DBSCAN.
* The iterative labeling process terminates when a predefined criterion is met, such as:
  * The current iteration number exceeds the maximum number of iterations.
  * All the classes mentioned in the input list of noun chunks and class names have corresponding labels, and no new labels have been added across iterations.
* If the termination condition isn’t met, another forward pass through the open-vocabulary model inferencer is triggered, this time at a lower confidence threshold. The rate at which the confidence threshold decreases is determined by the confidence threshold annealing scheduler (“confidence annealing”), which could use stepwise annealing, exponential decay, or cosine annealing.

MAL Configuration

See the Mask Auto Labeler (MAL) documentation for more information about creating a spec file.
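To make the iterative Grounding DINO procedure and confidence annealing described earlier more concrete, here is a minimal Python sketch. It is not the service's implementation: the function names, the default thresholds (0.5 down to 0.1), the decay factor, and the stand-in detector `fake_detect` with its made-up scores are all assumptions chosen for illustration. The NMS/DBSCAN aggregation step is omitted, so labels from each pass are simply merged.

```python
import math


def stepwise(i, start=0.5, end=0.1, num_iters=5):
    """Lower the threshold by an equal step at every iteration."""
    if num_iters <= 1:
        return start
    return start - (start - end) * i / (num_iters - 1)


def exponential(i, start=0.5, end=0.1, decay=0.7):
    """Multiply the threshold by `decay` each iteration, floored at `end`."""
    return max(end, start * decay ** i)


def cosine(i, start=0.5, end=0.1, num_iters=5):
    """Follow half a cosine period from `start` down to `end`."""
    if num_iters <= 1:
        return start
    phase = math.pi * i / (num_iters - 1)
    return end + 0.5 * (start - end) * (1 + math.cos(phase))


def auto_label(names, detect, schedule, max_iters=5):
    """Run detection passes at decreasing thresholds (hypothetical loop).

    detect(names, threshold) must return {name: [box, ...]} for the
    names it found at or above the given confidence threshold.
    """
    labels = {}             # class or noun chunk -> list of boxes
    pending = list(names)   # names that still have no label
    thresholds = []
    for i in range(max_iters):      # criterion 1: iteration upper bound
        if not pending:             # criterion 2: every name is labeled,
            break                   # so no new labels can be added
        threshold = schedule(i)
        thresholds.append(threshold)
        for name, boxes in detect(pending, threshold).items():
            if boxes:
                labels.setdefault(name, []).extend(boxes)
        # Names that were detected are dropped from the next pass.
        pending = [n for n in pending if n not in labels]
    return labels, pending, thresholds


# Stand-in for a real model: fixed, made-up confidence per name.
FAKE_SCORES = {"person": 0.62, "car": 0.33, "unicorn": 0.02}


def fake_detect(names, threshold):
    box = [0, 0, 10, 10]  # placeholder [x, y, width, height]
    return {n: [box] for n in names if FAKE_SCORES[n] >= threshold}


labels, pending, thresholds = auto_label(list(FAKE_SCORES), fake_detect, stepwise)
```

With these made-up scores, "person" is labeled in the first pass, "car" only once the threshold has dropped far enough, and "unicorn" stays in `pending` when the iteration limit is reached. Passing `exponential` or `cosine` as `schedule` changes only how quickly the threshold falls.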