Data Analytics

Note

Data Analytics currently supports only object-detection datasets in KITTI or COCO format.

The Data Analytics service analyzes object-detection annotation files and image files, calculates insights, and generates graphs and a summary. This service supports the following tasks:

  • analyze - This task analyzes the input files and generates graphs for calculated statistics. It can also generate the images with bounding boxes.

  • validate - This task validates the input files by identifying invalid coordinates and indicating whether the data needs to be revised.

  • kpi_analyze - This task calculates the accuracy and average precision (AP) for a given test set.

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command line:

tao dataset analytics <sub_task> <args_per_subtask>

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

Data Analytics expects a directory of images and a directory of annotated KITTI text files or a COCO JSON file.

Refer to the Data Annotation Format KITTI and COCO sections for more information about the data formats.
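
For orientation, the sketch below parses one line of a KITTI label file into the fields that matter for analysis (class name, truncation, occlusion, and bounding box). It assumes the standard 15-column KITTI object layout; `KittiBox` and `parse_kitti_line` are illustrative names and are not part of TAO.

```python
from dataclasses import dataclass


@dataclass
class KittiBox:
    label: str
    truncation: float
    occlusion: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def parse_kitti_line(line: str) -> KittiBox:
    """Parse one KITTI label line (hypothetical helper, not a TAO API)."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError(f"expected at least 15 fields, got {len(fields)}")
    return KittiBox(
        label=fields[0],                  # object class
        truncation=float(fields[1]),      # 0.0 (fully visible) to 1.0
        occlusion=int(float(fields[2])),  # occlusion state code
        xmin=float(fields[4]),            # bbox left
        ymin=float(fields[5]),            # bbox top
        xmax=float(fields[6]),            # bbox right
        ymax=float(fields[7]),            # bbox bottom
    )


box = parse_kitti_line("car 0.00 0 0.00 100.0 120.0 300.0 260.0 0 0 0 0 0 0 0")
print(box.label, box.xmax - box.xmin, box.ymax - box.ymin)
```

The remaining columns (observation angle, 3D dimensions, location, and rotation) are ignored here because the analysis described on this page concerns 2D boxes.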

Here is an example spec file for analyzing KITTI input data.

data:
  input_format: "KITTI"
  output_dir: /path/to/results_dir/
  image_dir: /path/to/images_dir/
  ann_path: /path/to/annotation_dir/
workers: 36
image:
  generate_image_with_bounding_box: False
  image_sample_size: 100
graph:
  generate_summary_and_graph: True
  height: 15
  width: 15
  show_all: False
wandb:
  visualize: False
  project: "tao data analytics"

| Parameter | Data Type | Default | Description |
|---|---|---|---|
| data | dict config | | The configuration for the dataset |
| workers | int | | The number of worker processes for data loading |
| image | dict config | | The configuration for the image generation |
| graph | dict config | | The configuration for the generated graphs |
| wandb | dict config | | The configuration for wandb |

data

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_format | string | | The format of the input annotation files | KITTI, COCO |
| output_dir | string | | The path to the output result directory | |
| image_dir | string | | The path to the input image directory | |
| ann_path | string | | The path to the annotation directory | |

image

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| sample_size | int | 100 | The image sample size to visualize |
| generate_image_with_bounding_box | bool | False | A flag specifying whether to generate images with rendered bounding boxes |

graph

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| generate_summary_and_graph | bool | True | A flag specifying whether to generate graphs and a summary for the calculated statistics |
| height | int | 15 | The height of the graphs (not applicable to wandb visualization) |
| width | int | 15 | The width of the graphs (not applicable to wandb visualization) |
| show_all | bool | False | A flag specifying whether to put all the data on the graph (True) or visualize only the top 100 candidates (False) |

wandb

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| project | string | | The name of the project that the experiment data is uploaded to |
| entity | string | | The name of the entity (group) under which the project is created |
| name | string | | The name of the experiment |
| notes | string | | A short description of the experiment |
| tags | list | | A list of strings that can be used to tag the experiment |
| visualize | bool | False | A flag specifying whether to enable the visualization over wandb |

Here is an example spec file for validating COCO input data.

apply_correction: True
data:
  output_dir: /path/to/result_dir/
  input_format: "COCO"
  image_dir: /path/to/images_dir/
  ann_path: /path/to/annotation_dir/
workers: 36

| Parameter | Data Type | Default | Description |
|---|---|---|---|
| data | dict config | | The configuration for the dataset |
| workers | int | | The number of worker processes for data loading |
| apply_correction | bool | False | A flag specifying whether to apply data correction |

data

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_format | string | | The format of the input annotation files | KITTI, COCO |
| output_dir | string | | The path to the output results directory | |
| image_dir | string | | The path to the input image directory | |
| ann_path | string | | The path to the annotation directory | |

Use the following command to analyze the data:

tao dataset analytics analyze -e <experiment_spec>

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the analyze experiment

Here’s an example of using the data analyze command:

tao dataset analytics analyze -e $DEFAULT_SPEC

Result

The result directory or wandb contains the generated images with bounding boxes and graph PDFs for the following attributes:

  • Bounding box area

  • Object count

  • Occlusion (KITTI input only)

  • Truncation (KITTI input only)

  • Image size

  • Invalid bounding box coordinates (contains information about inverted and out-of-bounds coordinates)
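
To make the two kinds of invalid coordinates concrete, here is a minimal sketch that flags a box as inverted, out of bounds, or both. The function name, the issue labels, and the exact comparison rules are assumptions for illustration; the service's internal checks may differ.

```python
def find_coordinate_issues(xmin, ymin, xmax, ymax, image_width, image_height):
    """Return a list of issue names for one box (illustrative only)."""
    issues = []

    # Inverted: a minimum coordinate is larger than its maximum.
    if xmin > xmax or ymin > ymax:
        issues.append("inverted")

    # Out of bounds: any coordinate falls outside the image rectangle.
    negative = min(xmin, ymin, xmax, ymax) < 0
    too_wide = max(xmin, xmax) > image_width
    too_tall = max(ymin, ymax) > image_height
    if negative or too_wide or too_tall:
        issues.append("out_of_bounds")

    return issues


print(find_coordinate_issues(-5, 10, 120, 50, image_width=100, image_height=100))
```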

Use the following command to validate the data:

tao dataset analytics validate -e <experiment_spec>

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the validate experiment

Here’s an example of using the data validate command:

tao dataset analytics validate -e $DEFAULT_SPEC

Result

The console output contains the validation summary. The results directory contains the corrected input files if apply_correction is set to True. The following corrections are applied to bounding-box coordinates:

  • Set negative coordinates to 0.

  • Swap the inverted coordinates.

  • If xmax is greater than image_width, then set xmax = image_width.

  • If ymax is greater than image_height, then set ymax = image_height.
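
The four correction rules above can be sketched as a single function. This is an illustration only: `correct_box` is a hypothetical helper, and the order in which the rules are applied is an assumption, since the list does not specify one.

```python
def correct_box(xmin, ymin, xmax, ymax, image_width, image_height):
    """Apply the four listed corrections to one box (illustrative only)."""
    # Set negative coordinates to 0.
    xmin, ymin, xmax, ymax = (max(0, v) for v in (xmin, ymin, xmax, ymax))

    # Swap inverted coordinates so that min <= max on each axis.
    if xmin > xmax:
        xmin, xmax = xmax, xmin
    if ymin > ymax:
        ymin, ymax = ymax, ymin

    # Clamp the maxima to the image size.
    xmax = min(xmax, image_width)
    ymax = min(ymax, image_height)
    return xmin, ymin, xmax, ymax


print(correct_box(-5, 10, 120, 50, image_width=100, image_height=100))
print(correct_box(60, 80, 20, 30, image_width=100, image_height=100))
```

The sketch implements only the listed rules, so a box whose xmin or ymin lies beyond the image edge is not repaired by it.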

Here is an example spec file for calculating the KPI accuracy and average precision (AP) using KITTI data.

data:
  input_format: KITTI
  output_dir: /path/to/result_dir/
  kpi_sources:
    - image_dir: /path/to/raw_images_dir/
      ground_truth_ann_path: /path/to/gt_annotation_dir/
      inference_ann_path: /path/to/infer_annotation_dir/
  mapping: /path/to/mapping_json/
visualize:
  platform: wandb
kpi:
  iou_threshold: 0.5
  filter: False
  num_recall_points: 11
  conf_threshold: 0.3
  ignore_sqwidth: 40
wandb:
  visualize: True
  project: kpi_calculation

| Parameter | Data Type | Default | Description |
|---|---|---|---|
| data | dict config | | The configuration for the dataset |
| visualize | dict config | | The configuration for visualization |
| kpi | dict config | | The configuration for KPI calculation |
| wandb | dict config | | The configuration for WandB |

data

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_format | string | | The format of the input annotation files | KITTI, COCO |
| output_dir | string | | The path to the output result directory | |
| image_dir | string | | The path to the input image directory | |
| ann_path | string | | The path to the annotation directory | |
| mapping | string | | The path to the JSON file for class mapping | |
| kpi_sources | dict | - | A list of dictionaries for the KPI sequences. The required values are image_dir, ground_truth_ann_path, and inference_ann_path. | |

visualize

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| platform | string | local | The location of the visualization | local, wandb |
| tag | string | | The tag to be added to the final metric table | |

kpi

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| iou_threshold | float | 0.5 | The IoU threshold for matching bounding boxes | >=0, <=1 |
| filter | bool | False | A flag specifying whether to filter bounding boxes smaller than ignore_sqwidth | |
| ignore_sqwidth | int | 0 | Bounding boxes with area smaller than ignore_sqwidth x ignore_sqwidth will be filtered (if filter is set to True). | >=0 |
| num_recall_points | int | 11 | The number of recall points to use for plotting the Precision Recall Curve | >0 |
| conf_threshold | float | 0.5 | The confidence threshold for filtering predictions | >=0, <=1 |
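
As a rough illustration of how iou_threshold and ignore_sqwidth act on boxes, the sketch below computes intersection over union for two boxes and applies the area filter described in the table. Boxes are assumed to be (xmin, ymin, xmax, ymax) tuples; `iou` and `keep_box` are hypothetical helpers, not TAO functions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Overlap along each axis, floored at zero when the boxes are disjoint.
    overlap_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    overlap_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = overlap_w * overlap_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def keep_box(box, ignore_sqwidth):
    """Keep a box unless its area is smaller than ignore_sqwidth squared."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    return area >= ignore_sqwidth * ignore_sqwidth


ground_truth = (0, 0, 100, 100)
prediction = (50, 0, 150, 100)
score = iou(ground_truth, prediction)
print(round(score, 3), score >= 0.5)      # compare against iou_threshold
print(keep_box((0, 0, 30, 30), ignore_sqwidth=40))
```

In this example the two boxes overlap by one third, which falls below an iou_threshold of 0.5, and the 30 x 30 box would be dropped when filter is True and ignore_sqwidth is 40.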

wandb

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| project | string | | The name of the project that the experiment data is uploaded to |
| entity | string | | The name of the entity (group) under which the project is created |
| name | string | | The name of the experiment |
| notes | string | | A short description of the experiment |
| tags | list | | A list of strings that can be used to tag the experiment |
| visualize | bool | False | A flag specifying whether to enable the visualization over WandB |

Use the following command to calculate KPI on the data:

tao dataset analytics kpi_analyze -e <experiment_spec>

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to configure the kpi_analyze experiment

Here’s an example of using the data kpi_analyze command:

tao dataset analytics kpi_analyze -e $DEFAULT_SPEC

Result

The precision-recall curve is saved as an image in the output results directory (output_dir) or displayed in WandB.
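
For readers unfamiliar with how a precision-recall curve is reduced to a single AP value, the following sketch shows N-point interpolated average precision, with N playing the role of num_recall_points. Whether kpi_analyze uses exactly this interpolation is an assumption; `interpolated_ap` is a hypothetical helper and the input values are made up for illustration.

```python
def interpolated_ap(recalls, precisions, num_recall_points=11):
    """N-point interpolated AP over recall levels evenly spaced in [0, 1]."""
    if num_recall_points < 1:
        raise ValueError("num_recall_points must be positive")

    total = 0.0
    for i in range(num_recall_points):
        # Evenly spaced recall level; a single point sits at recall 0.
        level = i / (num_recall_points - 1) if num_recall_points > 1 else 0.0
        # Interpolated precision: best precision at any recall >= level.
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(candidates, default=0.0)
    return total / num_recall_points


# Two operating points: (recall 0.5, precision 1.0) and (recall 1.0, precision 0.5).
print(interpolated_ap([0.5, 1.0], [1.0, 0.5]))
```

With 11 points, the six recall levels up to 0.5 take precision 1.0 and the remaining five take 0.5, so the result is 8.5 / 11.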

© Copyright 2024, NVIDIA. Last updated on Mar 22, 2024.