Annotations#
The annotation service, which is part of TAO Data Services, offers tools for manipulating ground-truth labels. The tao dataset annotations command supports the following tasks:
convert
slice
merge
These tasks can be invoked from the TAO Launcher using the following convention on the command-line:
tao dataset annotations <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in the following sections.
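As a rough illustration of this convention, the sketch below assembles the argument list for one of the three subtasks. The helper name build_command is hypothetical and not part of TAO; the -e flag and the optional results_dir=<results_dir> override are taken from the usage strings shown later on this page.

```python
import shlex

# The three subtasks listed on this page.
SUB_TASKS = ("convert", "slice", "merge")


def build_command(sub_task, spec_file, results_dir=None):
    """Assemble the argv list for `tao dataset annotations <sub_task>`.

    Hypothetical helper for illustration only; it builds the command
    but does not run it.
    """
    if sub_task not in SUB_TASKS:
        raise ValueError(f"unsupported subtask: {sub_task!r}")
    cmd = ["tao", "dataset", "annotations", sub_task, "-e", spec_file]
    if results_dir is not None:
        # Optional key=value override from the documented usage string.
        cmd.append(f"results_dir={results_dir}")
    return cmd


# Render the command as a shell-safe string for inspection.
command_line = shlex.join(build_command("convert", "/path/to/spec.yaml"))
```

The resulting list can be passed to subprocess.run on a machine where the TAO Launcher is installed.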
Supported Data Formats#
The annotation service supports the KITTI, ODVG, and COCO formats.
Configuring a Spec File for Annotation Conversion#
The following is a sample spec file for converting a COCO dataset to KITTI format.
It has three key components (data, kitti, and coco) as well as a global parameter (results_dir), all of which are described below.
Use the following command to get an experiment spec file for annotations format conversion:
SPECS=$(tao-client annotations get-spec --action annotation_format_convert --job_type dataset --id $DATASET_ID)
data:
  input_format: "COCO"
  output_format: "KITTI"
  output_dir: "/workspace/output"
kitti:
  image_dir: "/workspace/kitti/images"
  label_dir: "/workspace/kitti/labels"
  mapping: "/workspace/kitti_mapping.json"
coco:
  ann_file: "/workspace/coco.json"
results_dir: "/path/to/results"
Field | Description | Data Type and Constraints | Recommended/Typical Value
results_dir | The directory to save the annotation-conversion log to | string | –
data | The dataset config | Dict | –
kitti | The KITTI config | Dict | –
coco | The COCO config | Dict | –
odvg | The ODVG config | Dict | –
Data Config#
The data configuration (data) specifies the source and target formats of the label conversion, as well as the output path.
See the Data Annotation Format page for more information about the data formats including KITTI, COCO, and ODVG.
Note 1: Direct KITTI to ODVG or ODVG to KITTI conversion is not supported, but you can use COCO as an intermediate format to bridge KITTI and ODVG.
Note 2: When input_format and output_format are both COCO, a new COCO annotation file is saved with category IDs remapped to contiguous IDs.
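To make the COCO-to-COCO case concrete, here is a minimal sketch of remapping category IDs to a contiguous range. It is not the TAO implementation: the function name remap_category_ids, the starting ID of 1, and the choice to order by the original IDs are all assumptions made for illustration.

```python
def remap_category_ids(coco, start=1):
    """Return a copy of a COCO dict whose category IDs are contiguous.

    Assumptions (not confirmed by the TAO docs): new IDs begin at
    `start` and follow the ascending order of the original IDs.
    """
    old_ids = sorted(cat["id"] for cat in coco["categories"])
    id_map = {old: new for new, old in enumerate(old_ids, start=start)}
    return {
        **coco,
        # Rewrite each category with its new ID.
        "categories": [
            {**cat, "id": id_map[cat["id"]]} for cat in coco["categories"]
        ],
        # Point every annotation at the remapped category ID.
        "annotations": [
            {**ann, "category_id": id_map[ann["category_id"]]}
            for ann in coco["annotations"]
        ],
    }
```

For example, a file whose categories use the sparse IDs 1, 3, and 90 would come out with IDs 1, 2, and 3 under these assumptions, and the input dictionary is left unmodified.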
Field | Description | Data Type and Constraints | Recommended/Typical Value
input_format | The input data format (“KITTI”, “ODVG”, or “COCO”) | string |
output_format | The output data format (“KITTI”, “ODVG”, or “COCO”) | string |
output_dir | The path to save the converted annotations | string |
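Because direct KITTI to ODVG and ODVG to KITTI conversions are not supported, a pipeline may need two conversion runs with COCO in between. The sketch below plans the hops for a given pair of formats; plan_conversion is a hypothetical helper, and each returned hop would correspond to one spec file with its own input_format and output_format.

```python
SUPPORTED_FORMATS = {"KITTI", "ODVG", "COCO"}
# Pairs that the notes above describe as unsupported for direct conversion.
UNSUPPORTED_DIRECT = {("KITTI", "ODVG"), ("ODVG", "KITTI")}


def plan_conversion(input_format, output_format):
    """Return the list of (input_format, output_format) hops to run.

    Hypothetical helper: it only encodes the rule stated in Note 1 and
    does not verify anything else about what the tool accepts.
    """
    src, dst = input_format.upper(), output_format.upper()
    for fmt in (src, dst):
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {fmt!r}")
    if (src, dst) in UNSUPPORTED_DIRECT:
        # Bridge through COCO with two separate conversion runs.
        return [(src, "COCO"), ("COCO", dst)]
    return [(src, dst)]
```

With this rule, a KITTI to ODVG request yields two hops, and the output_dir of the first run becomes the source of the COCO annotation file for the second.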
KITTI Config#
The KITTI configuration (kitti) specifies the KITTI dataset information.
Field | Description | Data Type and Constraints | Recommended/Typical Value
image_dir | The image directory | string |
label_dir | The label directory | string |
project | The project name, which is used as the | string |
mapping | A YAML file specifying the category mappings. If this value is not specified, all categories in the | string |
no_skip | If True, do not skip images without any valid annotations. | bool |
preserve_hierarchy | If True, preserve the KITTI folder structure. | bool |
The following is an example of a category mapping file:
- person:
    - person
    - Person
    - person_group
    - rider
- bag:
    - hand_bag
    - backpack
    - personal_bag
- face:
    - face
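One plausible reading of this file is that each top-level key is a target category and the entries beneath it are the source labels that collapse into it. Under that assumption (the page does not spell out the semantics), the sketch below inverts the parsed structure into a lookup table from source label to target category. The mapping is written as a Python literal so the example needs no YAML parser, and build_lookup is a hypothetical name.

```python
def build_lookup(mapping):
    """Invert a list of {target: [source, ...]} entries.

    Assumption: keys are target categories and list items are the raw
    labels mapped onto them. Returns a dict of source -> target.
    """
    lookup = {}
    for entry in mapping:
        for target, sources in entry.items():
            for source in sources:
                lookup[source] = target
    return lookup


# The example mapping file above, as it would look once parsed.
mapping = [
    {"person": ["person", "Person", "person_group", "rider"]},
    {"bag": ["hand_bag", "backpack", "personal_bag"]},
    {"face": ["face"]},
]
lookup = build_lookup(mapping)
```

In this reading, a label such as rider resolves to person, and a label that appears nowhere in the file has no entry in the lookup.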
COCO Config#
The COCO configuration (coco) specifies the COCO annotation file location.
Field | Description | Data Type and Constraints | Recommended/Typical Value
ann_file | The annotation file | string |
refine_box | Whether to refine boxes with segmentation when converting to KITTI | bool |
use_all_categories | Whether to use all categories | bool |
add_background | Whether to add background categories | bool |
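The page does not describe how refine_box works internally. One common way to refine a box from a segmentation is to take the tight bounding box of the polygon vertices, and the sketch below shows that idea for COCO polygon segmentations, where each polygon is a flat list [x1, y1, x2, y2, ...] and boxes are [x, y, width, height]. Treat it as an illustration of the concept under that assumption, not as the tool's actual behavior; bbox_from_polygons is a hypothetical name.

```python
def bbox_from_polygons(polygons):
    """Tight [x, y, width, height] box around COCO polygon segmentations.

    `polygons` is a list of flat coordinate lists [x1, y1, x2, y2, ...].
    """
    xs = [value for poly in polygons for value in poly[0::2]]
    ys = [value for poly in polygons for value in poly[1::2]]
    if not xs or not ys:
        raise ValueError("segmentation has no vertices")
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]
```

For a rectangle with corners (10, 20), (30, 20), (30, 50), and (10, 50), this returns [10, 20, 20, 30].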
ODVG Config#
The ODVG configuration (odvg) specifies the ODVG annotation file location.
Field | Description | Data Type and Constraints | Recommended/Typical Value
ann_file | The annotation file | string |
labelmap_file | The label map file | string |
Running the Annotation Conversion#
The annotation conversion service can be invoked from the TAO Launcher using the following convention on the command-line:
DS_FORMAT_CONVERT_JOB_ID=$(tao-client annotations dataset-run-action --action annotations_format_convert --id $DATASET_ID --specs "$SPECS")
tao dataset annotations convert [-h] -e <experiment spec>
[results_dir=<results_dir>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-h, --help: Show this help message and exit.
Example
The following is an example of using the convert command in Data Services:
tao dataset annotations convert -e /path/to/spec.yaml
Configuring a Spec File for the Annotation Slicing Service#
The following is a sample spec file for slicing a COCO annotation file.
It has two key components (data and filter) as well as a global parameter (results_dir), all of which are described below.
Use the following command to get an experiment spec file for annotations slicing:
SLICE_SPECS=$(tao-client annotations get-spec --action annotation_slice --job_type dataset --id $DATASET_ID)
data:
  annotation_file: /datasets/coco/annotations/instances_val2017.json
filter:
  mode: "category" # random, number
  num_samples: 10
  split: 5
  excluded_categories:
    - person
results_dir: /output/dir/
Field | Description | Data Type and Constraints | Recommended/Typical Value
results_dir | The directory to save the output annotation files and logs to | string | –
data | The dataset configuration | Dict | –
filter | The filter configuration | Dict | –
Data Config#
The dataset configuration (data) specifies the input format and annotation file.
Field | Description | Data Type and Constraints | Recommended/Typical Value
format | The configuration format. Currently, only “COCO” is supported. | string | “COCO”
annotation_file | The input COCO annotation file | string |
Filter Config#
The filter configuration (filter) specifies how to slice the annotation data, which can be done in one of four modes:
random: Randomly split the annotation file into N partitions or sample the annotation file by a certain percentage
category: Filter annotation labels by the desired categories
number: Pick N samples in order from the annotations
filename: Filter the annotations by their file names
Field | Description | Data Type and Constraints | Recommended/Typical Value
mode | The filter mode (“random”, “category”, “number”, “filename”) | string |
dump_remainder | A flag specifying whether to dump the remainder annotations. This parameter only applies when the | bool |
split | The integer number of splits or the float sampling percentage (in “random” mode) | float or integer |
num_samples | The number of annotations to keep (in “number” mode) | integer |
included_categories | Categories to keep (in “category” mode) | list |
excluded_categories | Categories to exclude (in “category” mode) | list |
re_patterns | List of file name patterns to match (in “filename” mode) | list |
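To show how the four modes differ, the following sketch applies each one to an in-memory COCO dictionary. It is an approximation built from the descriptions above, not the slicer's source: the function names are hypothetical, a float split is assumed to be a fraction between 0 and 1, and the random and filename modes are assumed to select images while the category and number modes select annotations.

```python
import random
import re


def filter_by_category(coco, included=None, excluded=None):
    """'category' mode: keep annotations whose category name passes."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    kept = []
    for ann in coco["annotations"]:
        name = names[ann["category_id"]]
        if included and name not in included:
            continue
        if excluded and name in excluded:
            continue
        kept.append(ann)
    return kept


def filter_by_number(coco, num_samples):
    """'number' mode: keep the first N annotations in file order."""
    return coco["annotations"][:num_samples]


def filter_by_filename(coco, re_patterns):
    """'filename' mode: keep images whose file_name matches any pattern."""
    compiled = [re.compile(pattern) for pattern in re_patterns]
    return [
        img for img in coco["images"]
        if any(p.search(img["file_name"]) for p in compiled)
    ]


def split_randomly(coco, split, seed=0):
    """'random' mode: an int gives N partitions, a float one sampled subset.

    Assumption: a float `split` is a fraction in (0, 1).
    """
    images = list(coco["images"])
    random.Random(seed).shuffle(images)
    if isinstance(split, int):
        # Deal the shuffled images round-robin into `split` partitions.
        return [images[i::split] for i in range(split)]
    return [images[: round(len(images) * split)]]
```

In the sample spec above, mode is "category" with excluded_categories set to person, which corresponds to calling filter_by_category(coco, excluded=["person"]) in this sketch; the num_samples and split values would only matter in the other modes.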
Running the Annotation Slicer#
The annotation slicing service can be invoked from the TAO Launcher using the following convention on the command-line:
DS_SLICE_JOB_ID=$(tao-client annotations dataset-run-action --action annotations_slice --id $DATASET_ID --specs "$SLICE_SPECS")
tao dataset annotations slice [-h] -e <experiment spec>
[results_dir=<results_dir>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-h, --help: Show this help message and exit.
Example
The following is an example of using the slice command in Data Services:
tao dataset annotations slice -e /path/to/spec.yaml
Configuring a Spec File for Annotation Merge#
The following is a sample spec file for merging COCO annotation files.
It has one key component (data) as well as a global parameter (results_dir), both of which are described below.
Use the following command to get an experiment spec file for annotations merging:
MERGE_SPECS=$(tao-client annotations get-spec --action annotation_merge --job_type dataset --id $DATASET_ID)
data:
  format: "COCO"
  annotations:
    - /datasets/part_0.json
    - /datasets/part_1.json
    - /datasets/part_2.json
    - /datasets/part_3.json
    - /datasets/part_4.json
results_dir: /output/dir/
Field | Description | Data Type and Constraints | Recommended/Typical Value
results_dir | The directory to save the output annotation file and logs | string | –
data | The dataset config | Dict | –
Data Config#
The data configuration (data) specifies the input format and the annotation files to merge.
Field | Description | Data Type and Constraints | Recommended/Typical Value
format | The configuration format. Currently, only “COCO” is supported. | string | “COCO”
annotations | A list of COCO annotation files | string |
Note: All COCO annotation files must share the same categories.
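The shared-categories requirement can be checked before a merge job is submitted. The sketch below compares the category lists of several parsed COCO dictionaries and then concatenates their images and annotations. It is a simplified stand-in for the merge task, not its implementation: merge_coco and categories_key are hypothetical names, and renumbering image and annotation IDs to avoid collisions is a design choice of this sketch that the page does not confirm.

```python
def categories_key(coco):
    """Order-independent fingerprint of a COCO category list."""
    return sorted((cat["id"], cat["name"]) for cat in coco["categories"])


def merge_coco(parts):
    """Merge COCO dicts that share the same categories.

    Assumption: image and annotation IDs are renumbered from 1 so that
    IDs from different parts cannot collide.
    """
    if not parts:
        raise ValueError("nothing to merge")
    reference = categories_key(parts[0])
    merged = {
        "images": [],
        "annotations": [],
        "categories": list(parts[0]["categories"]),
    }
    for index, part in enumerate(parts):
        if categories_key(part) != reference:
            raise ValueError(f"part {index} has different categories")
        id_map = {}
        for img in part["images"]:
            new_id = len(merged["images"]) + 1
            id_map[img["id"]] = new_id
            merged["images"].append({**img, "id": new_id})
        for ann in part["annotations"]:
            merged["annotations"].append({
                **ann,
                "id": len(merged["annotations"]) + 1,
                "image_id": id_map[ann["image_id"]],
            })
    return merged
```

Running the category check alone on the files listed under annotations is a cheap way to catch a mismatch before launching the job.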
Running the Annotation Merge#
The annotation merging service can be invoked from the TAO Launcher using the following convention on the command-line:
DS_MERGE_JOB_ID=$(tao-client annotations dataset-run-action --action annotations_merge --id $DATASET_ID --specs "$MERGE_SPECS")
tao dataset annotations merge [-h] -e <experiment spec>
[results_dir=<results_dir>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-h, --help: Show this help message and exit.
Example
The following is an example of using the merge command in Data Services:
tao dataset annotations merge -e /path/to/spec.yaml