Annotations#

The annotation service, which is part of TAO Data Services, provides tools for manipulating ground-truth labels. The tao dataset annotations command supports the following tasks:

convert
slice
merge

These tasks can be invoked from the TAO Launcher using the following command-line convention:

tao dataset annotations <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required by the given subtask. Each subtask is explained in the following sections.

Supported Data Formats#

The annotation service supports the KITTI, ODVG, and COCO formats.

Configuring a Spec File for Annotation Conversion#

The following is a sample spec file for converting a COCO dataset to KITTI format. It has three key components (data, kitti, and coco) as well as a global parameter (results_dir), all of which are described below.

Use the following command to get an experiment spec file for annotations format conversion:

SPECS=$(tao-client annotations get-spec --action annotation_format_convert --job_type dataset --id $DATASET_ID)

data:
  input_format: "COCO"
  output_format: "KITTI"
  output_dir: "/workspace/output"
kitti:
  image_dir: "/workspace/kitti/images"
  label_dir: "/workspace/kitti/labels"
  mapping: "/workspace/kitti_mapping.json"
coco:
  ann_file: "/workspace/coco.json"
results_dir: "/path/to/results"

Field	Description	Data Type and Constraints	Recommended/Typical Value
results_dir	The directory to save the annotation-conversion log to	string	–
data	The dataset config	Dict	–
kitti	The KITTI config	Dict	–
coco	The COCO config	Dict	–
odvg	The ODVG config	Dict	–

Data Config#

The data configuration (data) specifies the source and target formats of the label conversion, as well as the output path. See the Data Annotation Format page for more information about the data formats, including KITTI, COCO, and ODVG.

Note 1: Direct KITTI to ODVG or ODVG to KITTI conversion is not supported, but you can use COCO as an intermediate format to bridge KITTI and ODVG.
Note 2: When `input_format` and `output_format` are both COCO, a new COCO annotation file is saved with category IDs remapped to contiguous IDs.
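To make Note 2 concrete, the following is a minimal Python sketch of remapping COCO category IDs to contiguous IDs. It is an illustration, not TAO's implementation: the helper name `remap_category_ids` is hypothetical, and it assumes new IDs are assigned in ascending order of the original IDs starting at 1; the actual ordering and offset used by the converter may differ.

```python
def remap_category_ids(coco, start=1):
    """Return a copy of a COCO dict with contiguous category IDs.

    Hypothetical helper: IDs are reassigned in ascending order of the original
    IDs, starting at `start`; TAO's actual ordering and offset may differ.
    """
    old_ids = sorted(cat["id"] for cat in coco["categories"])
    id_map = {old: new for new, old in enumerate(old_ids, start=start)}
    categories = [dict(cat, id=id_map[cat["id"]]) for cat in coco["categories"]]
    annotations = [
        dict(ann, category_id=id_map[ann["category_id"]])
        for ann in coco["annotations"]
    ]
    # Images and any other top-level keys are carried over unchanged.
    return {**coco, "categories": categories, "annotations": annotations}


coco = {
    "images": [{"id": 1, "file_name": "0001.jpg"}],
    "categories": [{"id": 3, "name": "car"}, {"id": 18, "name": "dog"}],
    "annotations": [{"id": 7, "image_id": 1, "category_id": 18, "bbox": [0, 0, 5, 5]}],
}
remapped = remap_category_ids(coco)
print([c["id"] for c in remapped["categories"]])  # [1, 2]
print(remapped["annotations"][0]["category_id"])  # 2
```

The original dict is left untouched, so the sparse and contiguous versions can be compared side by side.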

Field	Description	Data Type and Constraints	Recommended/Typical Value
input_format	The input data format (“KITTI”, “ODVG”, or “COCO”)	string
output_format	The output data format (“KITTI”, “ODVG”, or “COCO”)	string
output_dir	The path to save the converted annotations	string
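The format rules in Note 1 can be captured in a small planning helper. This is a hypothetical Python sketch (the name `plan_conversion` is not part of TAO): it only encodes the rule that KITTI and ODVG must be bridged through COCO, and it does not check any other restrictions the converter may impose. Each returned pair corresponds to one conversion run with its own `input_format` and `output_format`.

```python
SUPPORTED_FORMATS = ("KITTI", "COCO", "ODVG")


def plan_conversion(input_format, output_format):
    """Return the (input_format, output_format) pair(s) to run, in order.

    Hypothetical helper: KITTI <-> ODVG has no direct path, so it is split into
    two conversions that use COCO as the intermediate format.
    """
    for fmt in (input_format, output_format):
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {fmt!r}")
    if {input_format, output_format} == {"KITTI", "ODVG"}:
        return [(input_format, "COCO"), ("COCO", output_format)]
    return [(input_format, output_format)]


print(plan_conversion("KITTI", "ODVG"))  # [('KITTI', 'COCO'), ('COCO', 'ODVG')]
print(plan_conversion("COCO", "KITTI"))  # [('COCO', 'KITTI')]
```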

KITTI Config#

The KITTI configuration (kitti) specifies the KITTI dataset information.

Field	Description	Data Type and Constraints	Recommended/Typical Value
image_dir	The image directory	string
label_dir	The label directory	string
project	The project name, which is used as the `scene_id` when converting to COCO format. The default value is the parent directory name of the `image_dir`.	string
mapping	A YAML file specifying the category mappings. If this value is not specified, all categories in the `label_dir` are used.	string
no_skip	If True, do not skip images without any valid annotations.	bool
preserve_hierarchy	If True, preserve the KITTI folder structure.	bool
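KITTI and COCO store 2D boxes differently: a KITTI label line holds the class name followed by truncation, occlusion, alpha, and the box as left, top, right, bottom (columns 5 to 8), while COCO uses [x, y, width, height]. The following hypothetical Python sketch shows that box conversion in both directions. When writing KITTI, it assumes that truncation, occlusion, alpha, and the seven 3D fields are zero-filled, which is common for 2D-only datasets but may not match what the TAO converter writes.

```python
def kitti_line_to_coco_bbox(line):
    """Parse a KITTI label line into (class_name, [x, y, width, height])."""
    fields = line.split()
    # Columns 5-8 (1-based) hold the 2D box: left, top, right, bottom.
    left, top, right, bottom = map(float, fields[4:8])
    return fields[0], [left, top, right - left, bottom - top]


def coco_bbox_to_kitti_line(class_name, bbox):
    """Format a COCO [x, y, width, height] box as a KITTI label line.

    Assumption: truncation, occlusion, alpha and the seven 3D fields are
    zero-filled, which is common for 2D-only detection datasets.
    """
    x, y, w, h = bbox
    box = [f"{v:.2f}" for v in (x, y, x + w, y + h)]
    return " ".join([class_name, "0.00", "0", "0.00", *box, *(["0.00"] * 7)])


line = coco_bbox_to_kitti_line("person", [10, 20, 100, 200])
print(line)  # person 0.00 0 0.00 10.00 20.00 110.00 220.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
print(kitti_line_to_coco_bbox(line))  # ('person', [10.0, 20.0, 100.0, 200.0])
```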

The following is an example of a category mapping file:

- person:
  - person
  - Person
  - person_group
  - rider
- bag:
  - hand_bag
  - backpack
  - personal_bag
- face:
  - face
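Each entry in the mapping file names a target category and lists the source labels that are merged into it. The following Python sketch shows one way to turn that structure into a lookup table and apply it to KITTI class names. The helper names are hypothetical, the mapping is hard-coded in the shape that a YAML loader such as PyYAML's `yaml.safe_load` produces for the file above, and dropping labels that are not listed in the mapping is an assumption about the converter's behavior.

```python
# The mapping file above, as a YAML loader would return it:
# a list of single-key dicts, target category -> list of source labels.
MAPPING = [
    {"person": ["person", "Person", "person_group", "rider"]},
    {"bag": ["hand_bag", "backpack", "personal_bag"]},
    {"face": ["face"]},
]


def invert_mapping(mapping):
    """Build a source-label -> target-category lookup table."""
    lookup = {}
    for entry in mapping:
        for target, sources in entry.items():
            for source in sources:
                lookup[source] = target
    return lookup


def remap_labels(labels, lookup):
    """Rename class names; drop labels the mapping does not list (assumption)."""
    return [lookup[name] for name in labels if name in lookup]


lookup = invert_mapping(MAPPING)
print(remap_labels(["Person", "car", "hand_bag"], lookup))  # ['person', 'bag']
```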

COCO Config#

The COCO configuration (coco) specifies the COCO annotation file location.

Field	Description	Data Type and Constraints	Recommended/Typical Value
ann_file	The annotation file	string
refine_box	Whether to refine boxes with segmentation when converting to KITTI	bool
use_all_categories	Whether to use all categories	bool
add_background	Whether to add background categories	bool
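One plausible reading of `refine_box` is that each box is tightened to the extent of the object's segmentation. The following Python sketch illustrates that idea for COCO polygon segmentations (lists of flat [x1, y1, x2, y2, ...] coordinate lists); RLE masks are not handled. The helper names are hypothetical, and the sketch is not a description of TAO's exact refinement logic.

```python
def bbox_from_polygons(segmentation):
    """Return the tight [x, y, width, height] box around COCO polygon(s)."""
    xs = [v for poly in segmentation for v in poly[0::2]]
    ys = [v for poly in segmentation for v in poly[1::2]]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


def refine_box(ann):
    """Replace the bbox with the polygon extent when polygons are present."""
    seg = ann.get("segmentation")
    # Polygon form is a non-empty list of lists; RLE (a dict) is left as-is.
    if isinstance(seg, list) and any(seg):
        return dict(ann, bbox=bbox_from_polygons(seg))
    return ann


seg = [[10, 10, 50, 12, 40, 60], [55, 20, 70, 25, 60, 30]]
print(refine_box({"bbox": [0, 0, 100, 100], "segmentation": seg})["bbox"])  # [10, 10, 60, 50]
```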

ODVG Config#

The ODVG configuration (odvg) specifies the ODVG annotation file location.

Field	Description	Data Type and Constraints	Recommended/Typical Value
ann_file	The annotation file	string
labelmap_file	The label map file	string
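For orientation, the following Python sketch converts one COCO image and its annotations into an ODVG-style record. It assumes the ODVG layout used by Open-GroundingDino (one JSON object per image per line, boxes as [x1, y1, x2, y2], and a label map file from label strings to category names); the Data Annotation Format page remains the authoritative reference. The helper names are hypothetical, and assigning 0-based labels in ascending category-ID order is an assumption.

```python
import json


def build_labelmap(categories):
    """Map COCO category ID -> (0-based label, name); ordering is an assumption."""
    ordered = sorted(categories, key=lambda c: c["id"])
    return {cat["id"]: (idx, cat["name"]) for idx, cat in enumerate(ordered)}


def coco_image_to_odvg(image, annotations, labelmap):
    """Build one ODVG-style record (one line of the ann_file) for an image."""
    instances = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]  # COCO: [x, y, width, height]
        label, name = labelmap[ann["category_id"]]
        instances.append({"bbox": [x, y, x + w, y + h], "label": label, "category": name})
    return {
        "filename": image["file_name"],
        "height": image["height"],
        "width": image["width"],
        "detection": {"instances": instances},
    }


def labelmap_json(labelmap):
    """Serialize the label map as {"<label>": "<category name>"}."""
    return json.dumps({str(label): name for label, name in labelmap.values()})


labelmap = build_labelmap([{"id": 1, "name": "person"}, {"id": 3, "name": "car"}])
record = coco_image_to_odvg(
    {"file_name": "a.jpg", "height": 480, "width": 640},
    [{"bbox": [10, 20, 30, 40], "category_id": 3}],
    labelmap,
)
print(json.dumps(record))  # one JSON Lines entry
print(labelmap_json(labelmap))
```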

Running the Annotation Conversion#

The annotation conversion service can be invoked with tao-client or from the TAO Launcher using the following conventions on the command line:

DS_FORMAT_CONVERT_JOB_ID=$(tao-client annotations dataset-run-action --action annotations_format_convert --id $DATASET_ID --specs "$SPECS")

tao dataset annotations convert [-h] -e <experiment spec>
                                [results_dir=<results_dir>]

Required Arguments

-e, --experiment_spec_file: The experiment specification file

Optional Arguments

-h, --help: Show this help message and exit.

Example

The following is an example of using the convert command in Data Services:

tao dataset annotations convert -e /path/to/spec.yaml

Configuring a Spec File for the Annotation Slicing Service#

The following is a sample spec file for slicing a COCO annotation file. It has two key components (data and filter) as well as a global parameter (results_dir), all of which are described below.

Use the following command to get an experiment spec file for annotations slicing:

SLICE_SPECS=$(tao-client annotations get-spec --action annotation_slice --job_type dataset --id $DATASET_ID)

data:
  format: "COCO"
  annotation_file: /datasets/coco/annotations/instances_val2017.json
filter:
  mode: "category" # one of: random, category, number, filename
  num_samples: 10
  split: 5
  excluded_categories:
    - person
results_dir: /output/dir/

Field	Description	Data Type and Constraints	Recommended/Typical Value
results_dir	The directory to save the output annotation files and logs to	string	–
data	The dataset configuration	Dict	–
filter	The filter configuration	Dict	–

Data Config#

The dataset configuration (data) specifies the input format and annotation file.

Field	Description	Data Type and Constraints	Recommended/Typical Value
format	The annotation format. Currently, only “COCO” is supported.	string	“COCO”
annotation_file	The input COCO annotation file	string

Filter Config#

The filter configuration (filter) specifies how to slice the annotation data, which can be done in one of four modes:

random: Randomly split the annotation file into N partitions or sample the annotation file by a certain percentage
category: Filter annotation labels by the desired categories
number: Pick N samples in order from the annotations
filename: Filter the annotations by their file names
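The "random" mode described above can be sketched in Python as follows. This is a hypothetical illustration rather than TAO's code: an integer split yields that many disjoint partitions, and a float split is assumed to be a fraction in (0, 1) that yields a sample plus the remainder (compare the `dump_remainder` flag in the table below). Images are shuffled with a fixed seed, and each annotation follows its image.

```python
import random


def _subset(coco, images):
    """Restrict a COCO dict to `images` and the annotations that refer to them."""
    ids = {img["id"] for img in images}
    anns = [a for a in coco["annotations"] if a["image_id"] in ids]
    return {**coco, "images": images, "annotations": anns}


def random_slice(coco, split, seed=0):
    """Hypothetical sketch of "random" mode.

    int split   -> `split` disjoint partitions of roughly equal size
    float split -> [sample, remainder], where the sample holds round(split * N)
                   images (split is assumed to be a fraction in (0, 1))
    """
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # seeded for reproducible slices
    if isinstance(split, int):
        return [_subset(coco, images[i::split]) for i in range(split)]
    keep = round(len(images) * split)
    return [_subset(coco, images[:keep]), _subset(coco, images[keep:])]


coco = {
    "images": [{"id": i} for i in range(10)],
    "annotations": [{"id": i, "image_id": i, "category_id": 1} for i in range(10)],
    "categories": [{"id": 1, "name": "person"}],
}
print([len(p["images"]) for p in random_slice(coco, 3)])  # [4, 3, 3]
sample, remainder = random_slice(coco, 0.2)
print(len(sample["images"]), len(remainder["images"]))  # 2 8
```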

Field	Description	Data Type and Constraints	Recommended/Typical Value
mode	The filter mode (“random”, “category”, “number”, or “filename”)	string
dump_remainder	A flag specifying whether to dump the remainder annotations. This parameter only applies when the `mode` parameter is set to “random” and the `split` parameter is a float value.	bool
split	The integer number of splits or the float sampling percentage (in “random” mode)	float or integer
num_samples	The number of annotations to keep (in “number” mode)	integer
included_categories	Categories to keep (in “category” mode)	list
excluded_categories	Categories to exclude (in “category” mode)	list
re_patterns	List of file name patterns to match (in “filename” mode)	list
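The "category" and "filename" modes can be illustrated with the following Python sketch over a COCO dict. The helper names are hypothetical, and several behaviors are assumptions: category filtering keeps every image (even ones left without annotations) and also trims the categories list, and `re_patterns` entries are treated as Python regular expressions matched anywhere in `file_name` (`re.search`).

```python
import re


def filter_by_category(coco, included=None, excluded=None):
    """Keep annotations whose category is included (if given) and not excluded."""
    def keep(name):
        if included is not None and name not in included:
            return False
        return excluded is None or name not in excluded

    kept_ids = {cat["id"] for cat in coco["categories"] if keep(cat["name"])}
    return {
        **coco,
        "categories": [c for c in coco["categories"] if c["id"] in kept_ids],
        "annotations": [a for a in coco["annotations"] if a["category_id"] in kept_ids],
    }


def filter_by_filename(coco, re_patterns):
    """Keep images whose file_name matches any pattern, plus their annotations."""
    compiled = [re.compile(p) for p in re_patterns]
    images = [img for img in coco["images"]
              if any(p.search(img["file_name"]) for p in compiled)]
    ids = {img["id"] for img in images}
    return {**coco, "images": images,
            "annotations": [a for a in coco["annotations"] if a["image_id"] in ids]}


coco = {
    "images": [{"id": 1, "file_name": "cam0/0001.jpg"}, {"id": 2, "file_name": "cam1/0002.jpg"}],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "car"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1},
                    {"id": 2, "image_id": 2, "category_id": 2}],
}
print([a["id"] for a in filter_by_category(coco, excluded=["person"])["annotations"]])  # [2]
print([i["id"] for i in filter_by_filename(coco, [r"^cam0/"])["images"]])  # [1]
```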

Running the Annotation Slicer#

The annotation slicing service can be invoked with tao-client or from the TAO Launcher using the following conventions on the command line:

DS_SLICE_JOB_ID=$(tao-client annotations dataset-run-action --action annotations_slice --id $DATASET_ID --specs "$SLICE_SPECS")

tao dataset annotations slice [-h] -e <experiment spec>
                              [results_dir=<results_dir>]

Required Arguments

-e, --experiment_spec_file: The experiment specification file

Optional Arguments

-h, --help: Show this help message and exit.

Example

The following is an example of using the slice command in Data Services:

tao dataset annotations slice -e /path/to/spec.yaml

Configuring a Spec File for Annotation Merge#

The following is a sample spec file for merging COCO annotation files. It has one key component (data) as well as a global parameter (results_dir), both of which are described below.

Use the following command to get an experiment spec file for annotations merging:

MERGE_SPECS=$(tao-client annotations get-spec --action annotation_merge --job_type dataset --id $DATASET_ID)

data:
  format: "COCO"
  annotations:
    - /datasets/part_0.json
    - /datasets/part_1.json
    - /datasets/part_2.json
    - /datasets/part_3.json
    - /datasets/part_4.json
results_dir: /output/dir/

Field	Description	Data Type and Constraints	Recommended/Typical Value
results_dir	The directory to save the output annotation file and logs to	string	–
data	The dataset config	Dict	–

Data Config#

The data configuration (data) specifies the input format and annotation files.

Field	Description	Data Type and Constraints	Recommended/Typical Value
format	The annotation format. Currently, only “COCO” is supported.	string	“COCO”
annotations	A list of COCO annotation files	list

Note

All COCO annotation files must share the same categories.
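The following Python sketch shows why the shared-categories requirement matters when merging: images and annotations can simply be concatenated, but their IDs must be renumbered so that files do not collide, while category IDs are reused as-is. The helper name `merge_coco` is hypothetical, and renumbering image and annotation IDs from 1 is an assumption; the TAO merge may assign IDs differently.

```python
def _category_key(categories):
    """Order-insensitive signature of a categories list."""
    return sorted((c["id"], c["name"]) for c in categories)


def merge_coco(parts):
    """Merge COCO dicts that share identical categories, renumbering IDs."""
    categories = parts[0]["categories"]
    merged = {"images": [], "annotations": [], "categories": categories}
    next_img, next_ann = 1, 1
    for part in parts:
        if _category_key(part["categories"]) != _category_key(categories):
            raise ValueError("All COCO annotation files must share the same categories")
        id_map = {}  # old image ID in this part -> new global image ID
        for img in part["images"]:
            id_map[img["id"]] = next_img
            merged["images"].append(dict(img, id=next_img))
            next_img += 1
        for ann in part["annotations"]:
            merged["annotations"].append(
                dict(ann, id=next_ann, image_id=id_map[ann["image_id"]]))
            next_ann += 1
    return merged


cats = [{"id": 1, "name": "person"}]
a = {"images": [{"id": 1, "file_name": "a.jpg"}],
     "annotations": [{"id": 1, "image_id": 1, "category_id": 1}], "categories": cats}
b = {"images": [{"id": 1, "file_name": "b.jpg"}],
     "annotations": [{"id": 1, "image_id": 1, "category_id": 1}], "categories": cats}
merged = merge_coco([a, b])
print([img["id"] for img in merged["images"]])  # [1, 2]
```

To merge files from disk, the parsed files can be passed in directly, for example `merge_coco([json.load(open(path)) for path in paths])`, with `json` imported and `paths` taken from the annotations list in the spec.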

Running the Annotation Merge#

The annotation merging service can be invoked with tao-client or from the TAO Launcher using the following conventions on the command line:

DS_MERGE_JOB_ID=$(tao-client annotations dataset-run-action --action annotations_merge --id $DATASET_ID --specs "$MERGE_SPECS")

tao dataset annotations merge [-h] -e <experiment spec>
                              [results_dir=<results_dir>]

Required Arguments

-e, --experiment_spec_file: The experiment specification file

Optional Arguments

-h, --help: Show this help message and exit.

Example

The following is an example of using the merge command in Data Services:

tao dataset annotations merge -e /path/to/spec.yaml