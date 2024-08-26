Annotations
The annotation service, which is part of TAO Data Services, offers tools for users to easily manipulate groundtruth labels. tao dataset annotations supports the following tasks:
convert
slice
merge
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao dataset annotations <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in the following sections.
The following is a sample spec file for converting a COCO dataset to KITTI format.
It has three key components (data, kitti, and coco) as well as a global parameter (results_dir), all of which are described below.
data:
  input_format: "COCO"
  output_format: "KITTI"
  output_dir: "/workspace/output"
kitti:
  image_dir: "/workspace/kitti/images"
  label_dir: "/workspace/kitti/labels"
  mapping: "/workspace/kitti_mapping.json"
coco:
  ann_file: "/workspace/coco.json"
results_dir: "/path/to/results"
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| results_dir | The directory to save the annotation-conversion log to | string | – |
| data | The dataset config | Dict | – |
| kitti | The KITTI config | Dict | – |
| coco | The COCO config | Dict | – |
Data Config
The data configuration (data) specifies the source and target formats of the label conversion, as well as the output path.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| input_format | The input data format (either “KITTI” or “COCO”) | string | |
| output_format | The output data format (either “KITTI” or “COCO”) | string | |
| output_dir | The path to save the converted annotations | string | |
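The constraints on the data section can be checked before launching a conversion. The following is a minimal sketch, not part of TAO: the helper name check_convert_data_config and the SUPPORTED_FORMATS constant are hypothetical, and the spec is assumed to be already loaded into a Python dict.

```python
# Hypothetical pre-flight check for the `data` section of a convert spec.
# The allowed values come from the field descriptions in this section.
SUPPORTED_FORMATS = {"KITTI", "COCO"}


def check_convert_data_config(data):
    """Return a list of problems found in the `data` section (empty if none)."""
    problems = []
    for key in ("input_format", "output_format"):
        if data.get(key) not in SUPPORTED_FORMATS:
            problems.append(f"{key} must be one of {sorted(SUPPORTED_FORMATS)}")
    output_dir = data.get("output_dir")
    if not isinstance(output_dir, str) or not output_dir:
        problems.append("output_dir must be a non-empty string")
    return problems


data = {
    "input_format": "COCO",
    "output_format": "KITTI",
    "output_dir": "/workspace/output",
}
print(check_convert_data_config(data))  # []
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a spec at once.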
KITTI Config
The KITTI configuration (kitti) specifies the KITTI dataset information.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| image_dir | The image directory | string | |
| label_dir | The label directory | string | |
| project | The project name, which is used as the scene_id when converting to COCO format. The default value is the parent directory name of the image_dir. | string | |
| mapping | A YAML file specifying the category mappings. If this value is not specified, all categories in the label_dir will be used. | string | |
Here is an example of a category mapping file:
- person:
    - person
    - Person
    - person_group
    - rider
- bag:
    - hand_bag
    - backpack
    - personal_bag
- face:
    - face
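One way to read the mapping file is as a lookup from raw label to output category. The sketch below assumes that each top-level key is the output category and that the names listed under it are the raw labels folded into it; this interpretation and the helper build_label_lookup are illustrative assumptions, not TAO's documented behavior.

```python
# The category mapping example, written as plain Python data.
category_mapping = [
    {"person": ["person", "Person", "person_group", "rider"]},
    {"bag": ["hand_bag", "backpack", "personal_bag"]},
    {"face": ["face"]},
]


def build_label_lookup(mapping):
    """Invert {target: [source labels]} entries into {source label: target}."""
    lookup = {}
    for entry in mapping:
        for target, sources in entry.items():
            for source in sources:
                # A raw label claimed by two targets would be ambiguous.
                if source in lookup and lookup[source] != target:
                    raise ValueError(f"{source!r} is mapped to two categories")
                lookup[source] = target
    return lookup


lookup = build_label_lookup(category_mapping)
print(lookup["rider"])     # person
print(lookup["backpack"])  # bag
```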
COCO Config
The COCO configuration (coco) specifies the COCO annotation file location.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| ann_file | The annotation file | string | |
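For orientation, the two formats encode a 2D box differently: a COCO annotation stores [x, y, width, height], while a KITTI label is a 15-field text line whose box is given as left, top, right, bottom. The following is a minimal sketch of that change of encoding, not the converter TAO ships. The function name is hypothetical, and filling the non-2D KITTI fields with zeros is an assumption made for this illustration.

```python
def coco_box_to_kitti_line(category_name, bbox):
    """Format one COCO-style box as a 15-field KITTI label line.

    bbox is [x_min, y_min, width, height]. Truncation, occlusion, alpha,
    3D dimensions, location, and rotation are written as zeros here, which
    is an assumption of this sketch.
    """
    x, y, w, h = bbox
    left, top, right, bottom = x, y, x + w, y + h
    fields = [
        category_name, "0.00", "0", "0.00",          # type, truncated, occluded, alpha
        f"{left:.2f}", f"{top:.2f}", f"{right:.2f}", f"{bottom:.2f}",
        "0.00", "0.00", "0.00",                       # 3D dimensions
        "0.00", "0.00", "0.00",                       # 3D location
        "0.00",                                       # rotation_y
    ]
    return " ".join(fields)


line = coco_box_to_kitti_line("person", [10.0, 20.0, 30.0, 40.0])
print(line)
# person 0.00 0 0.00 10.00 20.00 40.00 60.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
```

Category names that contain spaces would break the space-delimited layout, so a real conversion would need to normalize them first.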
The annotation conversion service can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao dataset annotations convert [-h] -e <experiment spec> [-r <results_dir>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-h, --help: Show this help message and exit.
Example
Here’s an example of using the convert command in Data Services:
tao dataset annotations convert -e /path/to/spec.yaml
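When the annotation tasks are driven from a script, the command line can be assembled programmatically. The sketch below uses only the sub-tasks and the -e and -r flags shown in this document; the helper build_annotations_command is hypothetical, and actually running the command requires the TAO Toolkit Launcher to be installed.

```python
import shlex


def build_annotations_command(sub_task, spec_path, results_dir=None):
    """Assemble the argument list for `tao dataset annotations <sub_task>`."""
    if sub_task not in ("convert", "slice", "merge"):
        raise ValueError(f"unknown sub-task: {sub_task}")
    argv = ["tao", "dataset", "annotations", sub_task, "-e", spec_path]
    if results_dir is not None:
        argv += ["-r", results_dir]
    return argv


argv = build_annotations_command("convert", "/path/to/spec.yaml")
print(shlex.join(argv))
# tao dataset annotations convert -e /path/to/spec.yaml

# To launch it for real (assumes `tao` is on PATH):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing an argument list rather than a single shell string avoids quoting problems with paths that contain spaces.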
The following is a sample spec file for slicing a COCO annotation file.
It has two key components (data and filter) as well as a global parameter (results_dir), all of which are described below.
data:
  annotation_file: /datasets/coco/annotations/instances_val2017.json
filter:
  mode: "category"  # other modes: "random", "number", "filename"
  num_samples: 10
  split: 5
  excluded_categories:
    - person
results_dir: /output/dir/
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| results_dir | The directory to save the output annotation files and logs to | string | – |
| data | The dataset configuration | Dict | – |
| filter | The filter configuration | Dict | – |
Data Config
The dataset configuration (data) specifies the input format and annotation file.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| format | The configuration format. Currently, only “COCO” is supported. | string | “COCO” |
| annotation_file | The input COCO annotation file | string | |
Filter Config
The filter configuration (filter) specifies how the annotation data will be sliced, which can be done in one of four modes:
random: Randomly split the annotation file into N partitions or sample the annotation file by a certain percentage
category: Filter annotation labels by the desired categories
number: Pick N samples in order from the annotations
filename: Filter the annotations by their file names
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| mode | The filter mode (“random”, “category”, “number”, “filename”) | string | |
| dump_remainder | A flag specifying whether to dump the remainder annotations. This parameter only applies when the mode parameter is set to “random” and the split parameter is a float value. | bool | |
| split | The integer number of splits or the float sampling percentage (in “random” mode) | float or integer | |
| num_samples | The number of annotations to keep (in “number” mode) | integer | |
| included_categories | Categories to keep (in “category” mode) | list | |
| excluded_categories | Categories to exclude (in “category” mode) | list | |
| re_patterns | List of file name patterns to match (in “filename” mode) | list | |
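The four filter modes can be illustrated on a toy COCO-style dictionary. This is a sketch for intuition only and does not reproduce TAO's implementation. All function names are hypothetical, the selection is done per annotation, and a float split is treated as a fraction between 0 and 1; each of these is an assumption.

```python
import random
import re


def slice_by_category(coco, included=None, excluded=None):
    """'category' mode: keep annotations that pass the include/exclude lists."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    kept = []
    for ann in coco["annotations"]:
        name = names[ann["category_id"]]
        if included and name not in included:
            continue
        if excluded and name in excluded:
            continue
        kept.append(ann)
    return kept


def slice_by_number(coco, num_samples):
    """'number' mode: take the first num_samples annotations in order."""
    return coco["annotations"][:num_samples]


def slice_by_filename(coco, re_patterns):
    """'filename' mode: keep annotations whose image file name matches a pattern."""
    patterns = [re.compile(p) for p in re_patterns]
    image_ids = {
        img["id"] for img in coco["images"]
        if any(p.search(img["file_name"]) for p in patterns)
    }
    return [a for a in coco["annotations"] if a["image_id"] in image_ids]


def slice_random(coco, split, seed=0):
    """'random' mode: int -> N partitions; float -> [sample, remainder]."""
    anns = list(coco["annotations"])
    random.Random(seed).shuffle(anns)
    if isinstance(split, int):
        return [anns[i::split] for i in range(split)]
    if not 0.0 < split < 1.0:
        raise ValueError("a float split is assumed to be a fraction in (0, 1)")
    cut = round(len(anns) * split)
    return [anns[:cut], anns[cut:]]


coco = {
    "images": [{"id": 1, "file_name": "cam0_0001.jpg"},
               {"id": 2, "file_name": "cam1_0001.jpg"}],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "bag"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 2, "category_id": 1},
        {"id": 13, "image_id": 2, "category_id": 2},
    ],
}
print([a["id"] for a in slice_by_category(coco, excluded=["person"])])  # [11, 13]
print([a["id"] for a in slice_by_number(coco, 3)])                      # [10, 11, 12]
print([a["id"] for a in slice_by_filename(coco, [r"^cam1_"])])          # [12, 13]
print([len(p) for p in slice_random(coco, 2)])                          # [2, 2]
```

In this sketch, the second list returned for a float split plays the role of the remainder that dump_remainder refers to.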
The annotation slicing service can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao dataset annotations slice [-h] -e <experiment spec> [-r <results_dir>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-h, --help: Show this help message and exit.
Example
Here’s an example of using the slice command in Data Services:
tao dataset annotations slice -e /path/to/spec.yaml
The following is a sample spec file for merging COCO annotation files.
It has one key component (data) as well as a global parameter (results_dir), both of which are described below.
data:
  format: "COCO"
  annotations:
    - /datasets/part_0.json
    - /datasets/part_1.json
    - /datasets/part_2.json
    - /datasets/part_3.json
    - /datasets/part_4.json
results_dir: /output/dir/
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| results_dir | The directory to save the output annotation file and logs | string | – |
| data | The dataset config | Dict | – |
Data Config
The data configuration (data) specifies the input format and annotation files.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
| format | The configuration format. Currently, only “COCO” is supported. | string | “COCO” |
| annotations | A list of COCO annotation files | list of strings | |
All COCO annotation files must share the same categories.
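The shared-categories requirement can be verified before merging. The sketch below is a hypothetical helper, not TAO's merge logic: it compares category ids and names across already-loaded COCO dictionaries and then simply concatenates images and annotations, which assumes their ids are already unique across files.

```python
def merge_coco(datasets):
    """Concatenate COCO dicts after checking that categories are identical."""
    reference = sorted((c["id"], c["name"]) for c in datasets[0]["categories"])
    for index, dataset in enumerate(datasets[1:], start=1):
        current = sorted((c["id"], c["name"]) for c in dataset["categories"])
        if current != reference:
            raise ValueError(f"dataset {index} has different categories")
    return {
        "categories": datasets[0]["categories"],
        # Naive concatenation: id collisions between files are not resolved.
        "images": [img for d in datasets for img in d["images"]],
        "annotations": [ann for d in datasets for ann in d["annotations"]],
    }


categories = [{"id": 1, "name": "person"}]
part_0 = {
    "categories": categories,
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
}
part_1 = {
    "categories": categories,
    "images": [{"id": 2, "file_name": "b.jpg"}],
    "annotations": [{"id": 2, "image_id": 2, "category_id": 1}],
}
merged = merge_coco([part_0, part_1])
print(len(merged["images"]), len(merged["annotations"]))  # 2 2
```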
The annotation merging service can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao dataset annotations merge [-h] -e <experiment spec> [-r <results_dir>]
Required Arguments
-e, --experiment_spec_file: The experiment specification file
Optional Arguments
-h, --help: Show this help message and exit.
Example
Here’s an example of using the merge command in Data Services:
tao dataset annotations merge -e /path/to/spec.yaml