Annotations
The annotation service, which is part of TAO Data Services, offers tools for users to easily manipulate groundtruth labels. tao dataset annotations supports the following tasks:
convert
slice
merge
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao dataset annotations <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in the following sections.
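When these tasks are driven from a script, the convention above can be assembled programmatically. The following is a minimal sketch, not part of the TAO Toolkit: the helper name build_command is hypothetical, and it only uses the -e and -r options that appear in the usage lines of the subtask sections. It builds the argument list without running anything.

```python
from typing import List, Optional

# The three subtasks listed for "tao dataset annotations".
SUBTASKS = ("convert", "slice", "merge")


def build_command(sub_task: str, spec_file: str,
                  results_dir: Optional[str] = None) -> List[str]:
    """Return the argument list for one annotations subtask (hypothetical helper)."""
    if sub_task not in SUBTASKS:
        raise ValueError(f"unknown subtask: {sub_task!r}")
    cmd = ["tao", "dataset", "annotations", sub_task, "-e", spec_file]
    if results_dir is not None:
        # -r <results_dir> is shown as optional in the usage lines.
        cmd += ["-r", results_dir]
    return cmd


print(" ".join(build_command("convert", "/path/to/spec.yaml")))
# tao dataset annotations convert -e /path/to/spec.yaml
```

The returned list can be passed to subprocess.run on a machine where the TAO Toolkit Launcher is installed.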
The following is a sample spec file for converting a COCO dataset to KITTI format. It has three key components (data, kitti, and coco) as well as a global parameter (results_dir), all of which are described below.
data:
  input_format: "COCO"
  output_format: "KITTI"
  output_dir: "/workspace/output"
kitti:
  image_dir: "/workspace/kitti/images"
  label_dir: "/workspace/kitti/labels"
  mapping: "/workspace/kitti_mapping.json"
coco:
  ann_file: "/workspace/coco.json"
results_dir: "/path/to/results"
Field | Description | Data Type and Constraints | Recommended/Typical Value |
results_dir | The directory to save the annotation-conversion log to | string | – |
data | The dataset config | Dict | – |
kitti | The KITTI config | Dict | – |
coco | The COCO config | Dict | – |
Data Config
The data configuration (data) specifies the source and target formats of the label conversion, as well as the output path.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
input_format | The input data format (either “KITTI” or “COCO”) | string | |
output_format | The output data format (either “KITTI” or “COCO”) | string | |
output_dir | The path to save the converted annotations | string |
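To make the COCO-to-KITTI direction concrete, the sketch below converts one COCO bounding box into one KITTI label line. It is an illustration of the two formats, not the converter that TAO ships: COCO stores boxes as [x, y, width, height], while a KITTI object label has 15 space-separated fields with the box stored as left, top, right, bottom. Filling the truncation, occlusion, alpha, and 3D fields with zeros is an assumption made here for illustration, and the function name is hypothetical.

```python
from typing import Sequence


def coco_bbox_to_kitti_line(category: str, bbox: Sequence[float]) -> str:
    """Format one COCO box [x, y, w, h] as a 15-field KITTI label line."""
    x, y, w, h = bbox
    left, top, right, bottom = x, y, x + w, y + h
    fields = [
        category,          # object type (must not contain spaces)
        "0.00",            # truncated (placeholder, assumed)
        "0",               # occluded (placeholder, assumed)
        "0.00",            # alpha (placeholder, assumed)
        f"{left:.2f}", f"{top:.2f}", f"{right:.2f}", f"{bottom:.2f}",
        "0.00", "0.00", "0.00",   # 3D dimensions (placeholders)
        "0.00", "0.00", "0.00",   # 3D location (placeholders)
        "0.00",                   # rotation_y (placeholder)
    ]
    return " ".join(fields)


print(coco_bbox_to_kitti_line("person", [10.0, 20.0, 30.0, 40.0]))
# person 0.00 0 0.00 10.00 20.00 40.00 60.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
```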
KITTI Config
The KITTI configuration (kitti) specifies the KITTI dataset information.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
image_dir | The image directory | string | |
label_dir | The label directory | string | |
project | The project name, which is used as the scene_id when converting to COCO format. The default value is the parent directory name of the image_dir. | string | |
mapping | A YAML file specifying the category mappings. If this value is not specified, all categories in the label_dir will be used. | string | |
Here is an example of a category mapping file:
- person:
    - person
    - Person
    - person_group
    - rider
- bag:
    - hand_bag
    - backpack
    - personal_bag
- face:
    - face
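One way to read the mapping file above is as a lookup from each label found in the dataset to the category it should be merged into. The sketch below builds that lookup; it is an illustration rather than TAO's implementation. It assumes that each top-level key is the output category and that the names listed under it are the source labels, and it writes the mapping as the Python object a YAML parser would return so that no third-party package is needed. The function name build_lookup is hypothetical.

```python
from typing import Dict, List

# The example mapping, as a parsed YAML list of single-key dictionaries.
mapping: List[Dict[str, List[str]]] = [
    {"person": ["person", "Person", "person_group", "rider"]},
    {"bag": ["hand_bag", "backpack", "personal_bag"]},
    {"face": ["face"]},
]


def build_lookup(entries: List[Dict[str, List[str]]]) -> Dict[str, str]:
    """Invert the mapping: source label -> output category (assumed semantics)."""
    lookup: Dict[str, str] = {}
    for entry in entries:
        for target, sources in entry.items():
            for source in sources:
                if lookup.get(source, target) != target:
                    raise ValueError(f"{source!r} is mapped to two categories")
                lookup[source] = target
    return lookup


lookup = build_lookup(mapping)
print(lookup["rider"])     # person
print(lookup["backpack"])  # bag
```

Labels that do not appear in the lookup are not covered by this sketch; how the converter treats them is not stated here.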
COCO Config
The COCO configuration (coco) specifies the COCO annotation file location.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
ann_file | The annotation file | string |
The annotation conversion service can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao dataset annotations convert [-h] -e <experiment spec>
[-r <results_dir>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-h, --help: Show this help message and exit.
Example
Here’s an example of using the convert command in Data Services:
tao dataset annotations convert -e /path/to/spec.yaml
The following is a sample spec file for slicing a COCO annotation file. It has two key components (data and filter) as well as a global parameter (results_dir), all of which are described below.
data:
  annotation_file: /datasets/coco/annotations/instances_val2017.json
filter:
  mode: "category"  # random, number
  num_samples: 10
  split: 5
  excluded_categories:
    - person
results_dir: /output/dir/
Field | Description | Data Type and Constraints | Recommended/Typical Value |
results_dir | The directory to save the output annotation files and logs to | string | – |
data | The dataset configuration | Dict | – |
filter | The filter configuration | Dict | – |
Data Config
The dataset configuration (data) specifies the input format and annotation file.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
format | The configuration format. Currently, only “COCO” is supported. | string | “COCO” |
annotation_file | The input COCO annotation file | string |
Filter Config
The filter configuration (filter) specifies how the annotation data will be sliced, which can be done in one of four modes:
random: Randomly split the annotation file into N partitions or sample the annotation file by a certain percentage
category: Filter annotation labels by the desired categories
number: Pick N samples in order from the annotations
filename: Filter the annotations by their file names
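As a rough picture of what the random mode does with an integer split, the sketch below shuffles the images of a COCO dictionary and deals them into N partitions, keeping each annotation with its image. This is only an illustration under stated assumptions: splitting at the image level, the seed argument, and the function name split_random are choices made here, not documented TAO behavior.

```python
import random
from typing import Any, Dict, List

Coco = Dict[str, List[Dict[str, Any]]]


def split_random(coco: Coco, num_splits: int, seed: int = 0) -> List[Coco]:
    """Shuffle the images and deal them into num_splits partitions."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # seeded so the split is reproducible
    parts: List[Coco] = []
    for i in range(num_splits):
        part_images = images[i::num_splits]  # every num_splits-th image
        image_ids = {img["id"] for img in part_images}
        parts.append({
            "images": part_images,
            "annotations": [a for a in coco["annotations"]
                            if a["image_id"] in image_ids],
            "categories": coco["categories"],  # shared by every partition
        })
    return parts


coco = {
    "images": [{"id": i, "file_name": f"{i:04d}.jpg"} for i in range(1, 11)],
    "annotations": [{"id": i, "image_id": i, "category_id": 1}
                    for i in range(1, 11)],
    "categories": [{"id": 1, "name": "person"}],
}
parts = split_random(coco, num_splits=5)
print([len(p["images"]) for p in parts])  # [2, 2, 2, 2, 2]
```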
Field | Description | Data Type and Constraints | Recommended/Typical Value |
mode | The filter mode (“random”, “category”, “number”, or “filename”) | string | |
dump_remainder | A flag specifying whether to dump the remainder annotations. This parameter only applies when the mode parameter is set to “random” and the split parameter is a float value. | bool | |
split | The integer number of splits or the float sampling percentage (in “random” mode) | float or integer | |
num_samples | The number of annotations to keep (in “number” mode) | integer | |
included_categories | Categories to keep (in “category” mode) | list | |
excluded_categories | Categories to exclude (in “category” mode) | list | |
re_patterns | List of file name patterns to match (in “filename” mode) | list |
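The category and filename parameters can be pictured with the sketch below, which operates on a COCO dictionary already loaded into memory. It is an illustration, not the TAO implementation, and it makes several assumptions: categories are matched by name, images are left untouched in category mode, and re_patterns are applied with re.search against each image's file_name. Both function names are hypothetical.

```python
import re
from typing import Any, Dict, Iterable, List, Optional

Coco = Dict[str, List[Dict[str, Any]]]


def filter_by_category(coco: Coco,
                       included: Optional[Iterable[str]] = None,
                       excluded: Optional[Iterable[str]] = None) -> Coco:
    """Keep annotations whose category is included and not excluded."""
    all_names = {c["name"] for c in coco["categories"]}
    keep = set(included) if included else set(all_names)
    keep -= set(excluded or [])
    kept_cats = [c for c in coco["categories"] if c["name"] in keep]
    kept_ids = {c["id"] for c in kept_cats}
    return {
        "images": coco["images"],  # assumption: images are not dropped
        "annotations": [a for a in coco["annotations"]
                        if a["category_id"] in kept_ids],
        "categories": kept_cats,
    }


def filter_by_filename(coco: Coco, patterns: Iterable[str]) -> Coco:
    """Keep images whose file_name matches any pattern, plus their annotations."""
    compiled = [re.compile(p) for p in patterns]
    images = [img for img in coco["images"]
              if any(r.search(img["file_name"]) for r in compiled)]
    image_ids = {img["id"] for img in images}
    return {
        "images": images,
        "annotations": [a for a in coco["annotations"]
                        if a["image_id"] in image_ids],
        "categories": coco["categories"],
    }


coco = {
    "images": [{"id": 1, "file_name": "train/0001.jpg"},
               {"id": 2, "file_name": "val/0002.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1},
                    {"id": 2, "image_id": 1, "category_id": 2},
                    {"id": 3, "image_id": 2, "category_id": 2}],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "bag"}],
}
no_person = filter_by_category(coco, excluded=["person"])
print([a["id"] for a in no_person["annotations"]])  # [2, 3]
val_only = filter_by_filename(coco, [r"^val/"])
print([a["id"] for a in val_only["annotations"]])   # [3]
```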
The annotation slicing service can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao dataset annotations slice [-h] -e <experiment spec>
[-r <results_dir>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-h, --help: Show this help message and exit.
Example
Here’s an example of using the slice command in Data Services:
tao dataset annotations slice -e /path/to/spec.yaml
The following is a sample spec file for merging COCO annotation files. It has one key component (data) as well as a global parameter (results_dir), both of which are described below.
data:
  format: "COCO"
  annotations:
    - /datasets/part_0.json
    - /datasets/part_1.json
    - /datasets/part_2.json
    - /datasets/part_3.json
    - /datasets/part_4.json
results_dir: /output/dir/
Field | Description | Data Type and Constraints | Recommended/Typical Value |
results_dir | The directory to save the output annotation file and logs | string | – |
data | The dataset config | Dict | – |
Data Config
The data configuration (data) specifies the input format and annotation files.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
format | The configuration format. Currently, only “COCO” is supported. | string | “COCO” |
annotations | A list of COCO annotation files | list | |
All COCO annotation files must share the same categories.
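The shared-categories requirement can be checked before merging. The sketch below verifies that every part lists the same category IDs and names, then concatenates images and annotations. It is an illustration only: renumbering image and annotation IDs so they stay unique is an assumption made here, the function name merge_coco is hypothetical, and the ID handling of the actual merge task may differ.

```python
from typing import Any, Dict, List

Coco = Dict[str, List[Dict[str, Any]]]


def merge_coco(parts: List[Coco]) -> Coco:
    """Merge COCO dictionaries that share one category list."""
    if not parts:
        raise ValueError("nothing to merge")

    def signature(cats: List[Dict[str, Any]]) -> list:
        return sorted((c["id"], c["name"]) for c in cats)

    categories = parts[0]["categories"]
    for index, part in enumerate(parts[1:], start=1):
        if signature(part["categories"]) != signature(categories):
            raise ValueError(f"part {index} has different categories")

    merged: Coco = {"images": [], "annotations": [], "categories": categories}
    next_image_id, next_ann_id = 1, 1
    for part in parts:
        id_map = {}  # old image id -> new image id, per part
        for img in part["images"]:
            id_map[img["id"]] = next_image_id
            merged["images"].append({**img, "id": next_image_id})
            next_image_id += 1
        for ann in part["annotations"]:
            merged["annotations"].append(
                {**ann, "id": next_ann_id, "image_id": id_map[ann["image_id"]]})
            next_ann_id += 1
    return merged


cats = [{"id": 1, "name": "person"}]
part_0 = {"images": [{"id": 1, "file_name": "a.jpg"}],
          "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
          "categories": cats}
part_1 = {"images": [{"id": 1, "file_name": "b.jpg"}],
          "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
          "categories": cats}
merged = merge_coco([part_0, part_1])
print([img["id"] for img in merged["images"]])            # [1, 2]
print([a["image_id"] for a in merged["annotations"]])     # [1, 2]
```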
The annotation merging service can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao dataset annotations merge [-h] -e <experiment spec>
[-r <results_dir>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-h, --help: Show this help message and exit.
Example
Here’s an example of using the merge command in Data Services:
tao dataset annotations merge -e /path/to/spec.yaml