Data Analytics

NVIDIA TAO Release 4.0.1
Note: Data Analytics currently supports only object-detection datasets in KITTI or COCO format.

The Data Analytics service analyzes object-detection annotation files and image files, calculates insights, and generates graphs and a summary. This service supports the following tasks:

  • analyze - This task analyzes the input files and generates graphs for calculated statistics. It can also generate the images with bounding boxes.

  • validate - This task validates the input files by detecting invalid bounding-box coordinates and reporting whether the data needs to be revised.

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command line:

tao dataset analytics <sub_task> <args_per_subtask>

Here, args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
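When the launcher is driven from a script, the convention above can be assembled programmatically. The following is a minimal Python sketch and is not part of TAO; the helper name build_analytics_command is hypothetical, and it relies only on the analyze and validate subtasks and the -e argument documented on this page.

```python
import shlex

# Subtasks documented for `tao dataset analytics`.
SUBTASKS = ("analyze", "validate")


def build_analytics_command(sub_task, spec_path):
    """Return the argument list for `tao dataset analytics <sub_task> -e <spec>`.

    Hypothetical helper for illustration: it only assembles the command
    and does not run it.
    """
    if sub_task not in SUBTASKS:
        raise ValueError(f"unknown subtask {sub_task!r}; expected one of {SUBTASKS}")
    return ["tao", "dataset", "analytics", sub_task, "-e", str(spec_path)]


if __name__ == "__main__":
    cmd = build_analytics_command("analyze", "/path/to/analyze_spec.yaml")
    # Print a shell-safe rendering of the assembled command.
    print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True) on a machine where the TAO Toolkit Launcher is installed.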

Data Analytics expects a directory of images and a directory of annotated KITTI text files or a COCO JSON file.

Refer to the Data Annotation Format KITTI and COCO sections for more information about the data formats.
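Before running either task on KITTI data, it can help to confirm that the image and annotation directories line up. The sketch below is an illustrative pre-flight check, not a TAO feature. It assumes the usual KITTI convention that an image name.ext is annotated by name.txt; the function name and the set of image extensions are assumptions.

```python
from pathlib import Path

# Image extensions to consider; this set is an assumption, adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_unpaired_files(image_dir, label_dir):
    """Compare image stems with KITTI label stems.

    Returns (images_without_labels, labels_without_images) as sorted lists
    of file stems. Assumes each image `name.ext` is annotated by `name.txt`.
    """
    image_stems = {
        p.stem
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    }
    label_stems = {
        p.stem
        for p in Path(label_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".txt"
    }
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Two empty lists indicate that every image has a label file and every label file has an image.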

Here is an example spec file for analyzing KITTI input data.

data:
  input_format: "KITTI"
  output_dir: /path/to/results_dir/
  image_dir: /path/to/images_dir/
  ann_path: /path/to/annotation_dir/
workers: 36
image:
  generate_image_with_bounding_box: False
  image_sample_size: 100
graph:
  generate_summary_and_graph: True
  height: 15
  width: 15
  show_all: False
wandb:
  visualize: False
  project: "tao data analytics"

| Parameter | Data Type | Default | Description |
|---|---|---|---|
| data | dict config | | The configuration for the dataset |
| workers | int | | The number of worker processes for data loading |
| image | dict config | | The configuration for the image generation |
| graph | dict config | | The configuration for the generated graphs |
| wandb | dict config | | The configuration for the wandb |

data

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_format | string | | The format of the input annotation files | KITTI, COCO |
| output_dir | string | | The path to the output result directory | |
| image_dir | string | | The path to the input image directory | |
| ann_path | string | | The path to the annotation directory | |

image

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| sample_size | int | 100 | The image sample size to visualize |
| generate_image_with_bounding_box | bool | False | A flag specifying whether to generate images with rendered bounding boxes |

graph

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| generate_summary_and_graph | bool | True | A flag specifying whether to generate graphs and a summary for the calculated statistics |
| height | int | 15 | The height of the graphs (not applicable for wandb visualization) |
| width | int | 15 | The width of the graphs (not applicable for wandb visualization) |
| show_all | bool | False | A flag specifying whether to plot all the data on the graph (True) or visualize only the top 100 candidates (False) |

wandb

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| project | string | | The name of the project that the experiment data is uploaded to |
| entity | string | | The name of the entity (group) under which the project is created |
| name | string | | The name of the experiment |
| notes | string | | A short description of the experiment |
| tags | list | | A list of strings that can be used to tag the experiment |
| visualize | bool | False | A flag specifying whether to enable the visualization over wandb |

Here is an example spec file for validating COCO input data.

apply_correction: True
data:
  output_dir: /path/to/result_dir/
  input_format: "COCO"
  image_dir: /path/to/images_dir/
  ann_path: /path/to/annotation_dir/
workers: 36

| Parameter | Data Type | Default | Description |
|---|---|---|---|
| data | dict config | | The configuration for the dataset |
| workers | int | | The number of worker processes for data loading |
| apply_correction | bool | False | A flag specifying whether to apply data correction |

data

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_format | string | | The format of the input annotation files | KITTI, COCO |
| output_dir | string | | The path to the output results directory | |
| image_dir | string | | The path to the input image directory | |
| ann_path | string | | The path to the annotation directory | |
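When several datasets need validating, the spec can be generated rather than edited by hand. The sketch below is a hedged illustration, not a TAO utility: make_validate_spec and render_yaml are hypothetical helpers that use only the validate parameters documented above, and the small renderer handles just the string, integer, and boolean values that appear in this spec.

```python
SUPPORTED_FORMATS = ("KITTI", "COCO")


def make_validate_spec(image_dir, ann_path, output_dir, workers,
                       input_format="COCO", apply_correction=False):
    """Build a validate spec as a nested dict, in the order of the example spec."""
    if input_format not in SUPPORTED_FORMATS:
        raise ValueError(f"input_format must be one of {SUPPORTED_FORMATS}")
    return {
        "apply_correction": apply_correction,
        "data": {
            "output_dir": output_dir,
            "input_format": input_format,
            "image_dir": image_dir,
            "ann_path": ann_path,
        },
        "workers": workers,
    }


def render_yaml(mapping, indent=0):
    """Render a nested dict of str/int/bool values as block-style YAML text.

    Values are written unquoted, so strings containing characters that are
    special in YAML (for example ': ' or '#') are not supported by this sketch.
    """
    pad = "  " * indent
    lines = []
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(render_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    spec = make_validate_spec(
        image_dir="/path/to/images_dir/",
        ann_path="/path/to/annotation_dir/",
        output_dir="/path/to/result_dir/",
        workers=36,
        apply_correction=True,
    )
    print(render_yaml(spec))
```

Writing the rendered text to a file yields a spec that can be passed to the validate task with -e.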

Use the following command to analyze the data:

tao dataset analytics analyze -e <experiment_spec>

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the analyze experiment

Here’s an example of using the data analyze command:

tao dataset analytics analyze -e $DEFAULT_SPEC


Result

The results directory or wandb contains the generated images with bounding boxes and graph PDFs for the following attributes:

  • Bounding box area

  • Object count

  • Occlusion (only for KITTI input)

  • Truncation (only for KITTI input)

  • Image size

  • Invalid bounding box coordinates (contains information about inverted and out-of-bounds coordinates)
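To make these attributes concrete, the following Python sketch derives a few of them from a single KITTI label line. It is an illustration under stated assumptions, not the code TAO runs: the function names are hypothetical, and the field order (class, truncation, occlusion, alpha, then xmin, ymin, xmax, ymax) follows the standard KITTI label layout.

```python
def parse_kitti_bbox(line):
    """Parse one KITTI label line.

    Returns (class_name, truncation, occlusion, bbox), where bbox is
    (xmin, ymin, xmax, ymax) in pixels.
    """
    fields = line.split()
    if len(fields) < 8:
        raise ValueError("a KITTI label line needs at least 8 fields")
    truncation = float(fields[1])
    occlusion = int(float(fields[2]))
    bbox = tuple(float(v) for v in fields[4:8])
    return fields[0], truncation, occlusion, bbox


def bbox_area(bbox):
    """Area of the box in square pixels (absolute, so inverted boxes still count)."""
    xmin, ymin, xmax, ymax = bbox
    return abs(xmax - xmin) * abs(ymax - ymin)


def bbox_issues(bbox, image_width, image_height):
    """Return a list naming the invalid-coordinate conditions the box triggers."""
    xmin, ymin, xmax, ymax = bbox
    issues = []
    if xmin > xmax or ymin > ymax:
        issues.append("inverted")
    if (min(bbox) < 0
            or max(xmin, xmax) > image_width
            or max(ymin, ymax) > image_height):
        issues.append("out_of_bounds")
    return issues
```

Aggregating bbox_area and bbox_issues per class over every label file gives the kind of per-attribute statistics listed above.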

Use the following command to validate the data:

tao dataset analytics validate -e <experiment_spec>

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the validate experiment

Here’s an example of using the data validate command:

tao dataset analytics validate -e $DEFAULT_SPEC


Result

The console output contains the validation summary. If apply_correction is set to True in the spec file, the results directory contains the corrected input files. The following corrections are applied to bounding-box coordinates:

  • Set negative coordinates to 0.

  • Swap the inverted coordinates.

  • If xmax is greater than image_width, then set xmax = image_width.

  • If ymax is greater than image_height, then set ymax = image_height.
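The correction conditions above can be sketched as a single function. This is a minimal Python illustration of the listed rules, not TAO's actual implementation; the function name correct_bbox and the order in which the rules are applied are assumptions.

```python
def correct_bbox(bbox, image_width, image_height):
    """Apply the documented correction rules to (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox

    # Set negative coordinates to 0.
    xmin, ymin, xmax, ymax = (max(0, v) for v in (xmin, ymin, xmax, ymax))

    # Swap the inverted coordinates.
    if xmin > xmax:
        xmin, xmax = xmax, xmin
    if ymin > ymax:
        ymin, ymax = ymax, ymin

    # Clamp the maximum coordinates to the image size.
    xmax = min(xmax, image_width)
    ymax = min(ymax, image_height)

    return (xmin, ymin, xmax, ymax)


if __name__ == "__main__":
    # A box that extends past every edge of a 640x480 image.
    print(correct_bbox((-10, 20, 700, 500), 640, 480))  # (0, 20, 640, 480)
```

A box that needs no correction is returned unchanged, so the function can be applied to every annotation.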

© Copyright 2023, NVIDIA. Last updated on Jul 27, 2023.