Annotations#

The annotation service, which is part of TAO Data Services, offers tools for manipulating ground-truth labels. tao dataset annotations supports the following tasks:

  • convert

  • slice

  • merge

These tasks can be invoked from the TAO Launcher using the following convention on the command line:

tao dataset annotations <sub_task> <args_per_subtask>

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in the following sections.

Supported Data Formats#

The annotation service supports the KITTI, ODVG, and COCO formats.

Configuring a Spec File for Annotation Conversion#

A spec file for converting a COCO dataset to KITTI format has three key components (data, kitti, and coco) as well as a global parameter (results_dir), all of which are described below. An odvg component, also described below, holds the ODVG settings.

Use the following command to get an experiment spec file for annotation format conversion:

SPECS=$(tao-client annotations get-spec --action annotation_format_convert --job_type dataset --id $DATASET_ID)

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| results_dir | The directory to save the annotation-conversion log to | string | |
| data | The dataset config | Dict | |
| kitti | The KITTI config | Dict | |
| coco | The COCO config | Dict | |
| odvg | The ODVG config | Dict | |

Data Config#

The data configuration (data) specifies the source and target formats of the label conversion, as well as the output path. See the Data Annotation Format page for more information about the data formats including KITTI, COCO, and ODVG.

  • Note 1: Direct KITTI to ODVG or ODVG to KITTI conversion is not supported, but you can use COCO as an intermediate format to bridge KITTI and ODVG.

  • Note 2: When input_format and output_format are both COCO, a new COCO annotation file is saved with category IDs remapped to contiguous IDs.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| input_format | The input data format (“KITTI”, “ODVG”, or “COCO”) | string | |
| output_format | The output data format (“KITTI”, “ODVG”, or “COCO”) | string | |
| output_dir | The path to save the converted annotations | string | |
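The bridging rule in Note 1 can be sketched as a small planning helper. This is only an illustration: plan_conversion is a hypothetical name, not part of TAO, and treating every pair other than KITTI/ODVG as a direct conversion is an assumption that the notes above confirm only for the pairs they mention.

```python
# Hypothetical helper (not a TAO API): plan which conversion jobs to run.
FORMATS = {"KITTI", "COCO", "ODVG"}
# Pairs that Note 1 says cannot be converted directly.
UNSUPPORTED_DIRECT = {("KITTI", "ODVG"), ("ODVG", "KITTI")}


def plan_conversion(input_format: str, output_format: str) -> list[tuple[str, str]]:
    """Return the (input_format, output_format) pairs to run, in order."""
    src, dst = input_format.upper(), output_format.upper()
    if src not in FORMATS or dst not in FORMATS:
        raise ValueError(f"unknown format: {input_format!r} -> {output_format!r}")
    if (src, dst) in UNSUPPORTED_DIRECT:
        # Bridge through COCO with two separate conversion jobs.
        return [(src, "COCO"), ("COCO", dst)]
    # Assumption: any other pair is handled by a single job.
    return [(src, dst)]


print(plan_conversion("KITTI", "ODVG"))  # [('KITTI', 'COCO'), ('COCO', 'ODVG')]
print(plan_conversion("COCO", "KITTI"))  # [('COCO', 'KITTI')]
```

Each returned pair would correspond to one spec file whose input_format and output_format are set accordingly, with the output of the first job feeding the second.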

KITTI Config#

The KITTI configuration (kitti) specifies the KITTI dataset information.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| image_dir | The image directory | string | |
| label_dir | The label directory | string | |
| project | The project name, which is used as the scene_id when converting to COCO format. The default value is the parent directory name of the image_dir. | string | |
| mapping | A YAML file specifying the category mappings. If this value is not specified, all categories in the label_dir are used. | string | |
| no_skip | If True, do not skip images without any valid annotations. | bool | |
| preserve_hierarchy | If True, preserve the KITTI folder structure. | bool | |

The following is an example of a category mapping file:

- person:
  - person
  - Person
  - person_group
  - rider
- bag:
  - hand_bag
  - backpack
  - personal_bag
- face:
  - face
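One way to read the mapping file above is as a lookup from raw KITTI label to output category. The sketch below assumes that each top-level key is the output category and that the nested entries are the source labels folded into it; build_lookup is a hypothetical helper, and how the converter treats labels missing from the file is not stated here.

```python
# Parsed form of the mapping file above (what a YAML loader would return):
# a list of single-key dicts, each mapping a target category to source labels.
mapping = [
    {"person": ["person", "Person", "person_group", "rider"]},
    {"bag": ["hand_bag", "backpack", "personal_bag"]},
    {"face": ["face"]},
]


def build_lookup(mapping):
    """Invert the mapping so each source label points to its target category."""
    lookup = {}
    for entry in mapping:
        for target, sources in entry.items():
            for source in sources:
                if source in lookup and lookup[source] != target:
                    raise ValueError(f"label {source!r} is mapped to two categories")
                lookup[source] = target
    return lookup


lookup = build_lookup(mapping)
print(lookup["rider"])    # person
print(lookup.get("car"))  # None: this label is not in the mapping
```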

COCO Config#

The COCO configuration (coco) specifies the COCO annotation file location.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| ann_file | The annotation file | string | |
| refine_box | Whether to refine boxes with segmentation when converting to KITTI | bool | |
| use_all_categories | Whether to use all categories | bool | |
| add_background | Whether to add background categories | bool | |
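When both input_format and output_format are COCO, the category IDs are remapped to contiguous IDs. A minimal sketch of what such a remapping could look like is shown here. Starting the new IDs at 1 and ordering them by the original ID are assumptions, and remap_category_ids is a hypothetical function rather than the service's implementation.

```python
def remap_category_ids(coco: dict) -> dict:
    """Return a copy of a COCO dict whose category IDs are 1..N with no gaps."""
    old_ids = sorted(cat["id"] for cat in coco["categories"])
    # Assumption: new IDs start at 1 and follow the order of the old IDs.
    new_id = {old: new for new, old in enumerate(old_ids, start=1)}
    return {
        **coco,
        "categories": [{**c, "id": new_id[c["id"]]} for c in coco["categories"]],
        "annotations": [
            {**a, "category_id": new_id[a["category_id"]]} for a in coco["annotations"]
        ],
    }


coco = {
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "categories": [{"id": 3, "name": "person"}, {"id": 17, "name": "bag"}],
    "annotations": [{"id": 10, "image_id": 1, "category_id": 17, "bbox": [0, 0, 5, 5]}],
}
remapped = remap_category_ids(coco)
print([c["id"] for c in remapped["categories"]])  # [1, 2]
print(remapped["annotations"][0]["category_id"])  # 2
```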

ODVG Config#

The ODVG configuration (odvg) specifies the ODVG annotation file location.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| ann_file | The annotation file | string | |
| labelmap_file | The label map file | string | |
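Putting the fields above together, a COCO-to-KITTI conversion spec might be assembled as in the following sketch. The field names come from the tables above, but every path and value is a placeholder rather than a documented default, the kitti and odvg sections are left out for brevity, and serializing the spec as JSON is an assumption about what the --specs argument expects.

```python
import json

# Illustrative COCO-to-KITTI conversion spec. All paths and values below are
# placeholders chosen for this example, not defaults of the service.
spec = {
    "results_dir": "/results/convert",
    "data": {
        "input_format": "COCO",
        "output_format": "KITTI",
        "output_dir": "/results/convert/kitti_labels",
    },
    "coco": {
        "ann_file": "/data/coco/annotations.json",
        "refine_box": False,
        "use_all_categories": True,
        "add_background": False,
    },
}

# Assumption: the spec is passed around as a JSON string.
print(json.dumps(spec, indent=2))
```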

Running the Annotation Conversion#

The annotation conversion service can be invoked from the TAO Launcher using the following convention on the command line:

DS_FORMAT_CONVERT_JOB_ID=$(tao-client annotations dataset-run-action --action annotations_format_convert --id $DATASET_ID --specs "$SPECS")

Configuring a Spec File for the Annotation Slicing Service#

A spec file for slicing a COCO annotation file has two key components (data and filter) as well as a global parameter (results_dir), all of which are described below.

Use the following command to get an experiment spec file for annotation slicing:

SLICE_SPECS=$(tao-client annotations get-spec --action annotation_slice --job_type dataset --id $DATASET_ID)

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| results_dir | The directory to save the output annotation files and logs to | string | |
| data | The dataset configuration | Dict | |
| filter | The filter configuration | Dict | |

Data Config#

The dataset configuration (data) specifies the input format and annotation file.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| format | The configuration format. Currently, only “COCO” is supported. | string | “COCO” |
| annotation_file | The input COCO annotation file | string | |

Filter Config#

The filter configuration (filter) specifies how to slice the annotation data, which can be done in one of four modes:

  1. random: Randomly split the annotation file into N partitions or sample the annotation file by a certain percentage

  2. category: Filter annotation labels by the desired categories

  3. number: Pick N samples in order from the annotations

  4. filename: Filter the annotations by their file names
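To make two of these modes concrete, the sketch below applies a category filter and a file name filter to a tiny in-memory COCO dictionary. It is an illustration under assumptions, not the service's implementation: slice_by_category and slice_by_filename are hypothetical names, category filtering is assumed to drop annotations while keeping all images, and patterns are assumed to be matched with a regular-expression search.

```python
import re


def slice_by_category(coco, included=None, excluded=None):
    """Keep annotations whose category name is included and not excluded."""
    names = {c["id"]: c["name"] for c in coco["categories"]}

    def keep(ann):
        name = names[ann["category_id"]]
        if included and name not in included:
            return False
        return not (excluded and name in excluded)

    return {**coco, "annotations": [a for a in coco["annotations"] if keep(a)]}


def slice_by_filename(coco, re_patterns):
    """Keep images whose file_name matches any pattern, plus their annotations."""
    patterns = [re.compile(p) for p in re_patterns]
    images = [
        im for im in coco["images"] if any(p.search(im["file_name"]) for p in patterns)
    ]
    kept_ids = {im["id"] for im in images}
    annotations = [a for a in coco["annotations"] if a["image_id"] in kept_ids]
    return {**coco, "images": images, "annotations": annotations}


coco = {
    "images": [{"id": 1, "file_name": "cam0_0001.jpg"}, {"id": 2, "file_name": "cam1_0001.jpg"}],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "bag"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 2},
        {"id": 3, "image_id": 2, "category_id": 1},
    ],
}
people = slice_by_category(coco, included=["person"])
cam1 = slice_by_filename(coco, [r"^cam1_"])
print([a["id"] for a in people["annotations"]])  # [1, 3]
print([a["id"] for a in cam1["annotations"]])    # [3]
```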

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| mode | The filter mode (“random”, “category”, “number”, or “filename”) | string | |
| dump_remainder | A flag specifying whether to dump the remaining annotations. This parameter only applies when the mode parameter is set to “random” and the split parameter is a float value. | bool | |
| split | The integer number of splits or the float sampling percentage (in “random” mode) | float or integer | |
| num_samples | The number of annotations to keep (in “number” mode) | integer | |
| included_categories | Categories to keep (in “category” mode) | list | |
| excluded_categories | Categories to exclude (in “category” mode) | list | |
| re_patterns | List of file name patterns to match (in “filename” mode) | list | |
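Because each filter field applies to only one mode, it can help to check a filter configuration before submitting a job. The following sketch encodes the mode-to-field pairing from the table above. check_filter is a hypothetical helper, and whether the service itself rejects or silently ignores fields that do not apply is not stated here.

```python
# Which filter fields apply in which mode, taken from the table above.
FIELDS_BY_MODE = {
    "random": {"split", "dump_remainder"},
    "category": {"included_categories", "excluded_categories"},
    "number": {"num_samples"},
    "filename": {"re_patterns"},
}


def check_filter(cfg: dict) -> list[str]:
    """Return human-readable problems found in a filter config (empty if none)."""
    mode = cfg.get("mode")
    if mode not in FIELDS_BY_MODE:
        return [f"unknown mode: {mode!r}"]
    problems = []
    # Flag fields that belong to a different mode.
    for field in sorted(set(cfg) - {"mode"} - FIELDS_BY_MODE[mode]):
        problems.append(f"{field} has no effect in {mode!r} mode")
    # dump_remainder is documented to apply only when split is a float.
    if mode == "random" and cfg.get("dump_remainder") and not isinstance(cfg.get("split"), float):
        problems.append("dump_remainder only applies when split is a float")
    return problems


print(check_filter({"mode": "category", "included_categories": ["person"]}))
print(check_filter({"mode": "number", "num_samples": 100, "split": 2}))
print(check_filter({"mode": "random", "split": 4, "dump_remainder": True}))
```

The first call reports no problems, the second flags split as irrelevant in “number” mode, and the third flags dump_remainder because split is an integer.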

Running the Annotation Slicer#

The annotation slicing service can be invoked from the TAO Launcher using the following convention on the command line:

DS_SLICE_JOB_ID=$(tao-client annotations dataset-run-action --action annotations_slice --id $DATASET_ID --specs "$SLICE_SPECS")

Configuring a Spec File for Annotation Merge#

A spec file for merging COCO annotation files has one key component (data) as well as a global parameter (results_dir), both of which are described below.

Use the following command to get an experiment spec file for annotation merging:

MERGE_SPECS=$(tao-client annotations get-spec --action annotation_merge --job_type dataset --id $DATASET_ID)

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| results_dir | The directory to save the output annotation file and logs | string | |
| data | The dataset config | Dict | |

Data Config#

The data configuration (data) specifies the input format and annotation files.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| format | The configuration format. Currently, only “COCO” is supported. | string | “COCO” |
| annotations | A list of COCO annotation files | list | |

Note

All COCO annotation files must share the same categories.
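The shared-category requirement can be illustrated with a small in-memory merge. This is a sketch under assumptions rather than the service's implementation: merge_coco is a hypothetical function, and renumbering image and annotation IDs sequentially to avoid collisions is one possible policy that the text above does not confirm.

```python
def merge_coco(datasets: list[dict]) -> dict:
    """Merge COCO dicts that share one category list, renumbering image and annotation IDs."""
    if not datasets:
        raise ValueError("need at least one annotation file")
    categories = datasets[0]["categories"]
    merged = {"images": [], "annotations": [], "categories": categories}
    for ds in datasets:
        if ds["categories"] != categories:
            raise ValueError("all annotation files must share the same categories")
        image_id = {}  # old image ID -> new image ID, per input file
        for im in ds["images"]:
            image_id[im["id"]] = len(merged["images"]) + 1
            merged["images"].append({**im, "id": image_id[im["id"]]})
        for ann in ds["annotations"]:
            merged["annotations"].append(
                {**ann, "id": len(merged["annotations"]) + 1, "image_id": image_id[ann["image_id"]]}
            )
    return merged


cats = [{"id": 1, "name": "person"}]
first = {
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
    "categories": cats,
}
second = {
    "images": [{"id": 1, "file_name": "b.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
    "categories": cats,
}
merged = merge_coco([first, second])
print([im["id"] for im in merged["images"]])            # [1, 2]
print([a["image_id"] for a in merged["annotations"]])   # [1, 2]
```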

Running the Annotation Merge#

The annotation merging service can be invoked from the TAO Launcher using the following convention on the command line:

DS_MERGE_JOB_ID=$(tao-client annotations dataset-run-action --action annotations_merge --id $DATASET_ID --specs "$MERGE_SPECS")