Data Analytics
Data Analytics currently supports only object-detection datasets in KITTI or COCO format.
The Data Analytics service analyzes object-detection annotation files and image files, calculates insights, and generates graphs and a summary. This service supports the following tasks:
analyze
- This task analyzes the input files and generates graphs for the calculated statistics. It can also generate images with rendered bounding boxes.
validate
- This task validates the input files by identifying invalid coordinates and suggesting whether the data needs to be revised.
kpi_analyze
- This task calculates the accuracy and average precision (AP) for a given test set.
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command line:
tao dataset analytics <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
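The convention above can be wrapped in a small helper when the tasks are driven from a script. The following is a minimal sketch, not part of TAO: the build_command function and the SUB_TASKS tuple are hypothetical names, and the only flag used is the -e option described in this section.

```python
import shlex

# Sub-tasks listed in this section.
SUB_TASKS = ("analyze", "validate", "kpi_analyze")


def build_command(sub_task, spec_path):
    """Return the argv list for `tao dataset analytics <sub_task> -e <spec>`."""
    if sub_task not in SUB_TASKS:
        raise ValueError(f"unknown sub-task: {sub_task!r}")
    return ["tao", "dataset", "analytics", sub_task, "-e", spec_path]


if __name__ == "__main__":
    argv = build_command("validate", "/path/to/spec.yaml")
    # Print the command as it would be typed in a shell.
    print(shlex.join(argv))
```

On a machine with the TAO Toolkit Launcher installed, the returned list could be passed to subprocess.run(argv, check=True).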
Here is an example spec file for analyzing KITTI input data.
data:
  input_format: "KITTI"
  output_dir: /path/to/results_dir/
  image_dir: /path/to/images_dir/
  ann_path: /path/to/annotation_dir/
workers: 36
image:
  generate_image_with_bounding_box: False
  image_sample_size: 100
graph:
  generate_summary_and_graph: True
  height: 15
  width: 15
  show_all: False
wandb:
  visualize: False
  project: "tao data analytics"
Parameter | Data Type | Default | Description |
data | dict config | – | The configuration for the dataset |
workers | int | – | The number of worker processes for data loading |
image | dict config | – | The configuration for the image generation |
graph | dict config | – | The configuration for the generated graphs |
wandb | dict config | – | The configuration for wandb |
data
Parameter | Datatype | Default | Description | Supported Values |
input_format | string | | The format of the input annotation files | KITTI, COCO |
output_dir | string | | The path to the output result directory | |
image_dir | string | | The path to the input image directory | |
ann_path | string | | The path to the annotation directory | |
image
Parameter | Datatype | Default | Description |
sample_size | int | 100 | The image sample size to visualize |
generate_image_with_bounding_box | bool | False | A flag specifying whether to generate images with rendered bounding boxes |
graph
Parameter | Datatype | Default | Description |
generate_summary_and_graph | bool | True | A flag specifying whether to generate graphs and a summary for the calculated statistics |
height | int | 15 | The height of the graphs (not applicable to wandb visualization) |
width | int | 15 | The width of the graphs (not applicable to wandb visualization) |
show_all | bool | False | A flag specifying whether to put all the data on the graph (True) or visualize only the top 100 candidates (False) |
wandb
Parameter | Datatype | Default | Description |
project | string | | The name of the project that the experiment data is uploaded to |
entity | string | | The name of the entity (group) under which the project is created |
name | string | | The name of the experiment |
notes | string | | A short description of the experiment |
tags | list | | A list of strings that can be used to tag the experiment |
visualize | bool | False | A flag specifying whether to enable the visualization over wandb |
Here is an example spec file for validating COCO input data.
apply_correction: True
data:
  output_dir: /path/to/result_dir/
  input_format: "COCO"
  image_dir: /path/to/images_dir/
  ann_path: /path/to/annotation_dir/
workers: 36
Parameter | Data Type | Default | Description |
data | dict config | – | The configuration for the dataset |
workers | int | – | The number of worker processes for data loading |
apply_correction | bool | False | A flag specifying whether to apply data correction |
data
Parameter | Datatype | Default | Description | Supported Values |
input_format | string | | The format of the input annotation files | KITTI, COCO |
output_dir | string | | The path to the output results directory | |
image_dir | string | | The path to the input image directory | |
ann_path | string | | The path to the annotation directory | |
Use the following command to analyze the data:
tao dataset analytics analyze -e <experiment_spec>
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the analyze experiment
Here's an example of using the data analyze command:
tao dataset analytics analyze -e $DEFAULT_SPEC
Result
The result directory or wandb contains the generated images with bounding boxes and graph PDFs for the following attributes:
Bounding box area
Object count
Occlusion (KITTI input only)
Truncation (KITTI input only)
Image size
Invalid bounding box coordinates (contains information about inverted and out-of-bounds coordinates)
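To make these attributes concrete, here is a minimal sketch of how they could be derived from a single KITTI label line. It is not the service's implementation. It assumes the standard KITTI field order (class name, truncation, occlusion, alpha, then xmin, ymin, xmax, ymax), and the function names parse_kitti_line, bbox_area, and bbox_issues are hypothetical.

```python
def parse_kitti_line(line):
    """Parse the first eight fields of one KITTI label line."""
    fields = line.split()
    return {
        "label": fields[0],
        "truncation": float(fields[1]),
        "occlusion": int(float(fields[2])),
        # fields[3] is the observation angle (alpha); unused here.
        "bbox": tuple(float(v) for v in fields[4:8]),  # xmin, ymin, xmax, ymax
    }


def bbox_area(bbox):
    """Area of a box; inverted boxes are treated as having zero area."""
    xmin, ymin, xmax, ymax = bbox
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def bbox_issues(bbox, image_width, image_height):
    """Return a list naming the problems found in the box coordinates."""
    xmin, ymin, xmax, ymax = bbox
    issues = []
    if xmin > xmax or ymin > ymax:
        issues.append("inverted")
    out_of_bounds = (
        min(bbox) < 0
        or max(xmin, xmax) > image_width
        or max(ymin, ymax) > image_height
    )
    if out_of_bounds:
        issues.append("out_of_bounds")
    return issues


if __name__ == "__main__":
    line = "car 0.00 0 0.00 100.00 50.00 300.00 200.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
    obj = parse_kitti_line(line)
    print(obj["label"], bbox_area(obj["bbox"]), bbox_issues(obj["bbox"], 1242, 375))
```

COCO annotations store boxes as x, y, width, height in a JSON file, so a COCO variant would need a different parser but could reuse the same checks after converting to corner coordinates.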
Use the following command to validate the data:
tao dataset analytics validate -e <experiment_spec>
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the validate experiment
Here's an example of using the data validate command:
tao dataset analytics validate -e $DEFAULT_SPEC
Result
The console output contains the validation summary. The results directory contains the corrected input files if apply_correction=True is specified. The following corrections are applied to bounding-box coordinates:
Set negative coordinates to 0.
Swap the inverted coordinates.
If xmax is greater than image_width, then set xmax = image_width.
If ymax is greater than image_height, then set ymax = image_height.
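The correction conditions above can be sketched in a few lines. This is an illustration only, not the service's code: the function name correct_bbox is hypothetical, and the order in which the rules are applied (zeroing, then swapping, then clamping) is an assumption, since the text lists the conditions without stating an order.

```python
def correct_bbox(xmin, ymin, xmax, ymax, image_width, image_height):
    """Apply the documented correction conditions to one bounding box."""
    # Set negative coordinates to 0.
    xmin, ymin, xmax, ymax = (max(0, v) for v in (xmin, ymin, xmax, ymax))

    # Swap the inverted coordinates.
    if xmin > xmax:
        xmin, xmax = xmax, xmin
    if ymin > ymax:
        ymin, ymax = ymax, ymin

    # Clamp the maxima to the image size.
    xmax = min(xmax, image_width)
    ymax = min(ymax, image_height)
    return xmin, ymin, xmax, ymax


if __name__ == "__main__":
    # Negative xmin, inverted y coordinates, and ymax beyond the image height.
    print(correct_bbox(-5, 300, 50, 20, image_width=100, image_height=200))
```

Cases that the listed conditions do not cover, such as xmin larger than image_width, are left unchanged by this sketch.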
Here is an example spec file for calculating the KPI accuracy and average precision (AP) using KITTI data.
data:
  input_format: KITTI
  output_dir: /path/to/result_dir/
  kpi_sources:
    - image_dir: /path/to/raw_images_dir/
      ground_truth_ann_path: /path/to/gt_annotation_dir/
      inference_ann_path: /path/to/infer_annotation_dir/
  mapping: /path/to/mapping_json/
visualize:
  platform: wandb
kpi:
  iou_threshold: 0.5
  filter: False
  num_recall_points: 11
  conf_threshold: 0.3
  ignore_sqwidth: 40
wandb:
  visualize: True
  project: kpi_calculation
Parameter | Data Type | Default | Description |
data | dict config | – | The configuration for the dataset |
visualize | dict config | – | The configuration for visualization |
kpi | dict config | – | The configuration for KPI calculation |
wandb | dict config | – | The configuration for WandB |
data
Parameter | Datatype | Default | Description | Supported Values |
input_format | string | | The format of the input annotation files | KITTI, COCO |
output_dir | string | | The path to the output result directory | |
image_dir | string | | The path to the input image directory | |
ann_path | string | | The path to the annotation directory | |
mapping | string | | The path to the JSON file for class mapping | |
kpi_sources | dict | - | A list of dictionaries for the KPI sequences. In the example spec above, each dictionary provides image_dir, ground_truth_ann_path, and inference_ann_path | |
visualize
Parameter | Datatype | Default | Description | Supported Values |
platform | string | local | The location of the visualization | local, wandb |
tag | string | | The tag to be added to the final metric table | |
kpi
Parameter | Datatype | Default | Description | Supported Values |
iou_threshold | float | 0.5 | The IoU threshold for matching bounding boxes | >=0, <=1 |
filter | bool | False | A flag specifying whether to filter bounding boxes smaller than ignore_sqwidth | |
ignore_sqwidth | int | 0 | Bounding boxes with area smaller than | >=0 |
num_recall_points | int | 11 | The number of recall points to use for plotting the precision-recall curve | >0 |
conf_threshold | float | 0.5 | The confidence threshold for filtering predictions | >=0, <=1 |
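The roles of iou_threshold and conf_threshold can be illustrated with a short sketch of intersection over union (IoU) between two boxes. The code below is not taken from TAO. The function names iou and is_match are hypothetical, boxes are assumed to be (xmin, ymin, xmax, ymax) tuples, and treating the thresholds as inclusive (>=) is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, pred_conf, gt_box, iou_threshold=0.5, conf_threshold=0.5):
    """Keep a prediction only if it is confident enough and overlaps the ground truth."""
    return pred_conf >= conf_threshold and iou(pred_box, gt_box) >= iou_threshold


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    pred = (5, 0, 15, 10)
    print(iou(gt, pred), is_match(pred, 0.9, gt))
```

With the defaults from the table, a prediction that overlaps only half of a ground-truth box of the same size has an IoU of one third and is not counted as a match.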
wandb
Parameter | Datatype | Default | Description |
project | string | | The name of the project that the experiment data is uploaded to |
entity | string | | The name of the entity (group) under which the project is created |
name | string | | The name of the experiment |
notes | string | | A short description of the experiment |
tags | list | | A list of strings that can be used to tag the experiment |
visualize | bool | False | A flag specifying whether to enable the visualization over WandB |
Use the following command to calculate KPI on the data:
tao dataset analytics kpi_analyze -e <experiment_spec>
Required Arguments
-e, --experiment_spec_file: The experiment spec file to configure the kpi_analyze experiment
Here's an example of using the data kpi_analyze command:
tao dataset analytics kpi_analyze -e $DEFAULT_SPEC
Result
The precision-recall curve is saved as an image in the output results directory (output_dir) or displayed in WandB.
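For readers unfamiliar with how a precision-recall curve is reduced to a single AP value, the following sketch shows N-point interpolated average precision, the scheme popularized by PASCAL VOC with 11 points. Whether kpi_analyze uses exactly this interpolation is an assumption. The function name interpolated_ap is hypothetical, and num_recall_points here simply mirrors the spec parameter of the same name.

```python
def interpolated_ap(recalls, precisions, num_recall_points=11):
    """N-point interpolated AP from parallel lists of recall and precision values."""
    if num_recall_points < 1:
        raise ValueError("num_recall_points must be positive")

    # Evenly spaced recall levels in [0, 1]; a single point samples recall 0.
    if num_recall_points == 1:
        levels = [0.0]
    else:
        levels = [i / (num_recall_points - 1) for i in range(num_recall_points)]

    total = 0.0
    for level in levels:
        # Interpolated precision: the best precision at any recall >= level.
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(candidates, default=0.0)
    return total / len(levels)


if __name__ == "__main__":
    # Two operating points: (recall 0.5, precision 1.0) and (recall 1.0, precision 0.5).
    print(interpolated_ap([0.5, 1.0], [1.0, 0.5]))
```

In this example, six of the eleven recall levels (0.0 through 0.5) see a best precision of 1.0 and the remaining five see 0.5, so the result is 8.5 / 11.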