Annotations

The annotation service, part of TAO Data Services, provides tools for manipulating ground-truth labels. The tao dataset annotations command supports the following tasks:

  • convert

  • slice

  • merge

These tasks can be invoked from the TAO Toolkit Launcher using the following command-line convention:

tao dataset annotations <sub_task> <args_per_subtask>

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in the following sections.
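
When the launcher is driven from a script, the convention above can be assembled programmatically. The following is a minimal Python sketch, not part of TAO itself: the helper name build_command is hypothetical, and it uses only the subtask names and the -e and -r flags that appear in the usage lines on this page.

```python
import shlex

# The three subtasks listed on this page.
SUBTASKS = ("convert", "slice", "merge")


def build_command(sub_task, spec_file, results_dir=None):
    """Return the argument list for `tao dataset annotations <sub_task>`.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if sub_task not in SUBTASKS:
        raise ValueError(f"unknown subtask {sub_task!r}; expected one of {SUBTASKS}")
    cmd = ["tao", "dataset", "annotations", sub_task, "-e", str(spec_file)]
    if results_dir is not None:
        # -r is shown as optional in the usage lines for each subtask.
        cmd += ["-r", str(results_dir)]
    return cmd


cmd = build_command("convert", "/path/to/spec.yaml")
print(shlex.join(cmd))  # tao dataset annotations convert -e /path/to/spec.yaml
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where the TAO Toolkit Launcher is installed.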

Currently, the annotation service supports the KITTI and COCO formats.

The following is a sample spec file for converting a COCO dataset to KITTI format. It has three key components (data, kitti, and coco) as well as a global parameter (results_dir), all of which are described below.

data:
  input_format: "COCO"
  output_format: "KITTI"
  output_dir: "/workspace/output"
kitti:
  image_dir: "/workspace/kitti/images"
  label_dir: "/workspace/kitti/labels"
  mapping: "/workspace/kitti_mapping.json"
coco:
  ann_file: "/workspace/coco.json"
results_dir: "/path/to/results"

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| results_dir | The directory to save the annotation-conversion log to | string | |
| data | The dataset config | Dict | |
| kitti | The KITTI config | Dict | |
| coco | The COCO config | Dict | |

Data Config

The data configuration (data) specifies the source and target formats of the label conversion, as well as the output path.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| input_format | The input data format (either “KITTI” or “COCO”) | string | |
| output_format | The output data format (either “KITTI” or “COCO”) | string | |
| output_dir | The path to save the converted annotations | string | |
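
To make the difference between the two label formats concrete, here is a minimal Python sketch of the bounding-box part of a KITTI-to-COCO conversion. A KITTI label line stores the box as left, top, right, bottom pixel coordinates in its fifth through eighth fields, whereas a COCO annotation stores it as [x, y, width, height]. The function name kitti_line_to_coco_bbox is hypothetical, and this is an illustration of the geometry only, not the converter's actual implementation.

```python
def kitti_line_to_coco_bbox(line):
    """Parse one KITTI label line and return (category, COCO-style bbox).

    KITTI fields: type, truncated, occluded, alpha, left, top, right, bottom, ...
    COCO bbox:    [x, y, width, height] with (x, y) the top-left corner.
    """
    fields = line.split()
    category = fields[0]
    left, top, right, bottom = (float(value) for value in fields[4:8])
    return category, [left, top, right - left, bottom - top]


# Illustrative label line (made-up values, 15 KITTI fields).
sample = "person 0 0 0 10 20 110 70 0 0 0 0 0 0 0"
print(kitti_line_to_coco_bbox(sample))  # ('person', [10.0, 20.0, 100.0, 50.0])
```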

KITTI Config

The KITTI configuration (kitti) specifies the KITTI dataset information.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| image_dir | The image directory | string | |
| label_dir | The label directory | string | |
| project | The project name, which is used as the scene_id when converting to COCO format. Defaults to the name of the parent directory of image_dir. | string | |
| mapping | A YAML file specifying the category mappings. If this value is not specified, all categories in the label_dir are used. | string | |

Here is an example of a category mapping file:

- person:
    - person
    - Person
    - person_group
    - rider
- bag:
    - hand_bag
    - backpack
    - personal_bag
- face:
    - face
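
One way to read the mapping file above is as a many-to-one lookup: each top-level key is an output category, and the labels listed under it are the source labels folded into that category. This reading is an assumption based on the example, not a documented guarantee. The Python sketch below builds such a lookup from the same data written as a Python literal; build_lookup is a hypothetical helper, and how the tool treats labels missing from the mapping is not specified here.

```python
# The example mapping file, expressed as the structure a YAML parser would return.
mapping = [
    {"person": ["person", "Person", "person_group", "rider"]},
    {"bag": ["hand_bag", "backpack", "personal_bag"]},
    {"face": ["face"]},
]


def build_lookup(mapping):
    """Invert {target: [sources]} entries into a flat {source: target} dict."""
    lookup = {}
    for entry in mapping:
        for target, sources in entry.items():
            for source in sources:
                lookup[source] = target
    return lookup


lookup = build_lookup(mapping)
print(lookup["rider"])     # person
print(lookup["backpack"])  # bag
print(lookup.get("car"))   # None, because "car" is not in the mapping
```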


COCO Config

The COCO configuration (coco) specifies the COCO annotation file location.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| ann_file | The annotation file | string | |

The annotation conversion service can be invoked from the TAO Toolkit Launcher using the following command-line convention:

tao dataset annotations convert [-h] -e <experiment spec> [-r <results_dir>]

Required Arguments

  • -e, --experiment_spec_file: The experiment specification file

Optional Arguments

  • -h, --help: Show this help message and exit.

Example

Here’s an example of using the convert command in Data Services:

tao dataset annotations convert -e /path/to/spec.yaml


The following is a sample spec file for slicing a COCO annotation file. It has two key components (data and filter) as well as a global parameter (results_dir), all of which are described below.

data:
  annotation_file: /datasets/coco/annotations/instances_val2017.json
filter:
  mode: "category"  # random, number
  num_samples: 10
  split: 5
  excluded_categories:
    - person
results_dir: /output/dir/

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| results_dir | The directory to save the output annotation files and logs to | string | |
| data | The dataset configuration | Dict | |
| filter | The filter configuration | Dict | |

Data Config

The dataset configuration (data) specifies the input format and annotation file.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| format | The configuration format. Currently, only “COCO” is supported. | string | “COCO” |
| annotation_file | The input COCO annotation file | string | |

Filter Config

The filter configuration (filter) specifies how the annotation data will be sliced, which can be done in one of four modes:

  1. random: Randomly split the annotation file into N partitions or sample the annotation file by a certain percentage

  2. category: Filter annotation labels by the desired categories

  3. number: Pick N samples in order from the annotations

  4. filename: Filter the annotations by their file names
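
The three deterministic modes can be sketched in a few lines of Python operating on an in-memory COCO dictionary. This is a hedged illustration of the semantics described above, not the service's implementation: the function names are hypothetical, the data is made up, the sketch assumes the standard COCO keys (images, annotations, categories), and whether the real tool also prunes unreferenced images is not stated on this page.

```python
import re


def filter_by_category(coco, included=None, excluded=None):
    """'category' mode: keep annotations whose category passes both lists."""
    keep_ids = {
        c["id"]
        for c in coco["categories"]
        if (included is None or c["name"] in included)
        and (excluded is None or c["name"] not in excluded)
    }
    return {
        **coco,
        "categories": [c for c in coco["categories"] if c["id"] in keep_ids],
        "annotations": [a for a in coco["annotations"] if a["category_id"] in keep_ids],
    }


def filter_by_number(coco, num_samples):
    """'number' mode: keep the first num_samples annotations, in order."""
    return {**coco, "annotations": coco["annotations"][:num_samples]}


def filter_by_filename(coco, re_patterns):
    """'filename' mode: keep images whose file name matches any pattern."""
    patterns = [re.compile(p) for p in re_patterns]
    images = [
        im for im in coco["images"]
        if any(p.search(im["file_name"]) for p in patterns)
    ]
    image_ids = {im["id"] for im in images}
    return {
        **coco,
        "images": images,
        "annotations": [a for a in coco["annotations"] if a["image_id"] in image_ids],
    }


# Made-up miniature dataset for illustration.
coco = {
    "images": [{"id": 1, "file_name": "a_001.jpg"}, {"id": 2, "file_name": "b_001.jpg"}],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "bag"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 2},
        {"id": 3, "image_id": 2, "category_id": 2},
    ],
}

no_person = filter_by_category(coco, excluded=["person"])
print([a["id"] for a in no_person["annotations"]])                     # [2, 3]
print([a["id"] for a in filter_by_number(coco, 2)["annotations"]])     # [1, 2]
print([im["id"] for im in filter_by_filename(coco, [r"^a_"])["images"]])  # [1]
```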

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| mode | The filter mode (“random”, “category”, “number”, or “filename”) | string | |
| dump_remainder | A flag specifying whether to dump the remaining annotations. This parameter only applies when the mode parameter is set to “random” and the split parameter is a float value. | bool | |
| split | The integer number of splits or the float sampling percentage (in “random” mode) | float or integer | |
| num_samples | The number of annotations to keep (in “number” mode) | integer | |
| included_categories | Categories to keep (in “category” mode) | list | |
| excluded_categories | Categories to exclude (in “category” mode) | list | |
| re_patterns | List of file name patterns to match (in “filename” mode) | list | |
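
The interaction between split and dump_remainder in “random” mode can be sketched as follows. This is a minimal illustration under stated assumptions rather than the tool's actual algorithm: random_slice is a hypothetical function, the seed parameter is added only to make the sketch reproducible, and a float split is assumed to be a fraction between 0 and 1 (for example, 0.3 for 30%).

```python
import random


def random_slice(items, split, dump_remainder=False, seed=0):
    """Return a list of partitions drawn from a shuffled copy of items.

    int split   -> that many roughly equal partitions.
    float split -> one sample of that fraction, plus the remainder
                   as a second partition when dump_remainder is True.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    if isinstance(split, int):
        # Deal the shuffled items round-robin into `split` partitions.
        return [shuffled[i::split] for i in range(split)]
    count = round(len(shuffled) * split)
    parts = [shuffled[:count]]
    if dump_remainder:
        parts.append(shuffled[count:])
    return parts


annotation_ids = list(range(10))

partitions = random_slice(annotation_ids, 5)
print([len(p) for p in partitions])  # [2, 2, 2, 2, 2]

sampled = random_slice(annotation_ids, 0.3, dump_remainder=True)
print([len(p) for p in sampled])     # [3, 7]
```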

The annotation slicing service can be invoked from the TAO Toolkit Launcher using the following command-line convention:

tao dataset annotations slice [-h] -e <experiment spec> [-r <results_dir>]

Required Arguments

  • -e, --experiment_spec_file: The experiment specification file

Optional Arguments

  • -h, --help: Show this help message and exit.

Example

Here’s an example of using the slice command in Data Services:

tao dataset annotations slice -e /path/to/spec.yaml


The following is a sample spec file for merging COCO annotation files. It has one key component (data) as well as a global parameter (results_dir), both of which are described below.

data:
  format: "COCO"
  annotations:
    - /datasets/part_0.json
    - /datasets/part_1.json
    - /datasets/part_2.json
    - /datasets/part_3.json
    - /datasets/part_4.json
results_dir: /output/dir/

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| results_dir | The directory to save the output annotation file and logs to | string | |
| data | The dataset config | Dict | |

Data Config

The data configuration (data) specifies the input format and the annotation files to merge.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| format | The configuration format. Currently, only “COCO” is supported. | string | “COCO” |
| annotations | A list of COCO annotation files | list of strings | |

Note

All COCO annotation files must share the same categories.
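
The shared-categories requirement is easiest to see in a sketch of what a merge has to do. The Python below is an illustration under assumptions, not the service's implementation: merge_coco is a hypothetical function, the data is made up, and renumbering image and annotation IDs to avoid collisions is one plausible strategy that this page does not confirm.

```python
def merge_coco(datasets):
    """Merge COCO dictionaries that share an identical categories list."""
    merged = {"images": [], "annotations": [], "categories": datasets[0]["categories"]}
    for dataset in datasets:
        if dataset["categories"] != merged["categories"]:
            raise ValueError("all annotation files must share the same categories")
        # Renumber images so IDs from different files cannot collide.
        id_map = {}
        for image in dataset["images"]:
            new_id = len(merged["images"]) + 1
            id_map[image["id"]] = new_id
            merged["images"].append({**image, "id": new_id})
        # Renumber annotations and point them at the renumbered images.
        for ann in dataset["annotations"]:
            merged["annotations"].append(
                {
                    **ann,
                    "id": len(merged["annotations"]) + 1,
                    "image_id": id_map[ann["image_id"]],
                }
            )
    return merged


categories = [{"id": 1, "name": "person"}]
part_0 = {
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
    "categories": categories,
}
part_1 = {
    "images": [{"id": 1, "file_name": "b.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
    "categories": categories,
}

merged = merge_coco([part_0, part_1])
print([im["id"] for im in merged["images"]])            # [1, 2]
print([a["image_id"] for a in merged["annotations"]])   # [1, 2]
```

If the category lists differed, category_id values would no longer mean the same thing across files, which is why the sketch raises an error instead of guessing a mapping.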


The annotation merging service can be invoked from the TAO Toolkit Launcher using the following command-line convention:

tao dataset annotations merge [-h] -e <experiment spec> [-r <results_dir>]

Required Arguments

  • -e, --experiment_spec_file: The experiment specification file

Optional Arguments

  • -h, --help: Show this help message and exit.

Example

Here’s an example of using the merge command in Data Services:

tao dataset annotations merge -e /path/to/spec.yaml


© Copyright 2023, NVIDIA. Last updated on Dec 8, 2023.