Date: 2026-03-03

Data Analytics#

Note: Data Analytics currently supports only object-detection datasets in KITTI or COCO format.

The Data Analytics service analyzes object-detection annotation files and image files, calculates insights, and generates graphs and a summary. This service supports the following tasks:

analyze - This task analyzes the input files and generates graphs for calculated statistics. It can also generate the images with bounding boxes.

validate - This task validates the input files by identifying invalid coordinates and suggests whether the data needs to be revised.

kpi_analyze - This task calculates the accuracy and average precision (AP) for a given test set.

These tasks can be invoked from the TAO Launcher using the following convention on the command line:

tao dataset analytics <sub_task> <args_per_subtask>

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
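The calling convention above can be sketched as a small Python helper that assembles the argument list for a given subtask. The helper name `build_command` is hypothetical, and passing the spec file with `-e` for every subtask is an assumption: this page documents `-e` only for the analyze task.

```python
import shlex

# Subtask names taken from the task list on this page.
SUBTASKS = ("analyze", "validate", "kpi_analyze")


def build_command(sub_task, spec_path):
    """Return the argv list for `tao dataset analytics <sub_task> -e <spec>`.

    Assumption: every subtask accepts the spec file through `-e`.
    """
    if sub_task not in SUBTASKS:
        raise ValueError(f"unknown subtask: {sub_task!r}")
    return ["tao", "dataset", "analytics", sub_task, "-e", spec_path]


cmd = build_command("analyze", "/path/to/spec.yaml")
# Print the command as a single shell-safe string.
print(shlex.join(cmd))
```

The list form can be handed directly to `subprocess.run` on a machine where the TAO Launcher is installed, which avoids shell-quoting problems with paths that contain spaces.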

Data input for Data Analytics#

Data Analytics expects a directory of images and either a directory of annotated KITTI text files or a COCO JSON file. Refer to the Data Annotation Format KITTI and COCO sections for more information about the data formats.
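As a rough illustration of what a KITTI annotation file contains, the sketch below parses label lines using the standard KITTI object-detection layout (class name, truncation, occlusion, alpha, then the 2D box as xmin, ymin, xmax, ymax) and derives an object count and box areas from them. The function name `parse_kitti_line` and the sample values are made up for illustration; the Data Annotation Format section remains the reference for what the service actually accepts.

```python
from collections import Counter


def parse_kitti_line(line):
    """Parse one KITTI label line, keeping only the fields used below."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError("a KITTI label line needs at least 8 fields")
    return {
        "class": fields[0],
        "truncation": float(fields[1]),
        "occlusion": int(float(fields[2])),
        # 2D bounding box in pixels: xmin, ymin, xmax, ymax
        "bbox": tuple(float(v) for v in fields[4:8]),
    }


# Two made-up label lines (15 fields each, unused 3D fields set to 0).
sample = """car 0.00 0 0.0 100.0 120.0 200.0 220.0 0 0 0 0 0 0 0
pedestrian 0.30 1 0.0 50.0 60.0 80.0 140.0 0 0 0 0 0 0 0"""

objects = [parse_kitti_line(line) for line in sample.splitlines()]
counts = Counter(obj["class"] for obj in objects)
areas = [
    (obj["bbox"][2] - obj["bbox"][0]) * (obj["bbox"][3] - obj["bbox"][1])
    for obj in objects
]
print(counts)
print(areas)
```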

Creating an Experiment Spec File for the Analyze Task#

Here is an example spec file for analyzing KITTI input data.

data:
  input_format: "KITTI"
  output_dir: /path/to/results_dir/
  image_dir: /path/to/images_dir/
  ann_path: /path/to/annotation_dir/
workers: 36
image:
  generate_image_with_bounding_box: False
  sample_size: 100
graph:
  generate_summary_and_graph: True
  height: 15
  width: 15
  show_all: False
wandb:
  visualize: False
  project: "tao data analytics"

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| data | dict config | – | The configuration for the dataset |
| workers | int | – | The number of worker processes for data loading |
| image | dict config | – | The configuration for the image generation |
| graph | dict config | – | The configuration for the generated graphs |
| wandb | dict config | – | The configuration for WandB |

data#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_format | string | | The format of the input annotation files | KITTI, COCO |
| output_dir | string | | The path to the output result directory | |
| image_dir | string | | The path to the input image directory | |
| ann_path | string | | The path to the annotation directory | |

image#

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| sample_size | int | 100 | The image sample size to visualize |
| generate_image_with_bounding_box | bool | False | A flag specifying whether to generate images with rendered bounding boxes |

graph#

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| generate_summary_and_graph | bool | True | A flag specifying whether to generate graphs and a summary for the calculated statistics |
| height | int | 15 | The height of the graphs (not applicable for WandB visualization) |
| width | int | 15 | The width of the graphs (not applicable for WandB visualization) |
| show_all | bool | False | A flag specifying whether to put all the data on the graph (True) or visualize only the top 100 candidates (False) |

wandb#

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| project | string | | The name of the project that the experiment data is uploaded to |
| entity | string | | The name of the entity (group) under which the project is created |
| name | string | | The name of the experiment |
| notes | string | | A short description of the experiment |
| tags | list | | A list of strings that can be used to tag the experiment |
| visualize | bool | False | A flag specifying whether to enable visualization over WandB |
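Before launching a job, it can help to sanity-check a spec against the parameter descriptions above. The following is a minimal sketch, not part of TAO: `check_analyze_spec` is a hypothetical helper, and treating the three data paths as required non-empty strings is an assumption, since the tables do not state which keys are mandatory.

```python
SUPPORTED_FORMATS = {"KITTI", "COCO"}  # supported values listed for data.input_format


def check_analyze_spec(spec):
    """Return a list of problems found in an analyze spec given as a nested dict."""
    problems = []
    data = spec.get("data", {})
    if data.get("input_format") not in SUPPORTED_FORMATS:
        problems.append("data.input_format must be KITTI or COCO")
    # Assumption: all three paths are required, non-empty strings.
    for key in ("output_dir", "image_dir", "ann_path"):
        if not isinstance(data.get(key), str) or not data[key]:
            problems.append(f"data.{key} must be a non-empty string")
    # Boolean flags, checked with their documented defaults as fallbacks.
    flags = [
        ("image", "generate_image_with_bounding_box", False),
        ("graph", "generate_summary_and_graph", True),
        ("graph", "show_all", False),
        ("wandb", "visualize", False),
    ]
    for section, key, default in flags:
        if not isinstance(spec.get(section, {}).get(key, default), bool):
            problems.append(f"{section}.{key} must be a bool")
    return problems


spec = {
    "data": {
        "input_format": "KITTI",
        "output_dir": "/path/to/results_dir/",
        "image_dir": "/path/to/images_dir/",
        "ann_path": "/path/to/annotation_dir/",
    },
    "workers": 36,
    "image": {"generate_image_with_bounding_box": False},
    "graph": {"generate_summary_and_graph": True, "height": 15, "width": 15, "show_all": False},
    "wandb": {"visualize": False, "project": "tao data analytics"},
}
print(check_analyze_spec(spec))
```

Returning a list of messages rather than raising on the first failure lets a caller report every issue in one pass.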

Running the Analyze Task on the Data#

Use the following command to analyze the data:

FTMS Client:

DS_ANALYZE_JOB_ID=$(tao-client analytics dataset-run-action --action analyze --id $DATASET_ID --specs "$SPECS")

TAO Launcher:

tao dataset analytics analyze -e <experiment_spec>

Required Arguments

-e, --experiment_spec_file: The experiment spec file to set up the analyze experiment

Here's an example of using the data analyze command:

tao dataset analytics analyze -e $DEFAULT_SPEC

Result#

The result directory or WandB contains the generated images with bounding boxes and graph PDFs for the following attributes:

Bounding box area

Object count

Occlusion (only for KITTI input)

Truncation (only for KITTI input)

Image size

Invalid bounding box coordinates (contains information about inverted and out-of-bounds coordinates)
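To make the last attribute concrete, here is a minimal sketch of how a bounding box might be classified as inverted or out of bounds. The exact rules the service applies are not described on this page, so the definitions used here (inverted when a maximum is smaller than its minimum, out of bounds when any coordinate falls outside the image) and the function name `classify_box` are assumptions.

```python
def classify_box(xmin, ymin, xmax, ymax, img_w, img_h):
    """Return the list of issues found for one box in an img_w x img_h image."""
    issues = []
    # Assumed rule: a box is inverted when a max coordinate is below its min.
    if xmax < xmin or ymax < ymin:
        issues.append("inverted")
    # Assumed rule: any coordinate outside [0, img_w] x [0, img_h] is out of bounds.
    if (
        min(xmin, ymin, xmax, ymax) < 0
        or max(xmin, xmax) > img_w
        or max(ymin, ymax) > img_h
    ):
        issues.append("out_of_bounds")
    return issues


print(classify_box(10, 10, 50, 50, 100, 100))   # valid box
print(classify_box(50, 10, 10, 50, 100, 100))   # xmin and xmax swapped
print(classify_box(-5, 10, 50, 120, 100, 100))  # extends past the image
```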

Creating an Experiment Spec File for the KPI Analyze Task#

Here is an example spec file for calculating the KPI accuracy and average precision (AP) using KITTI data.

data:
  input_format: KITTI
  output_dir: /path/to/result_dir/
  kpi_sources:
    - image_dir: /path/to/raw_images_dir/
      ground_truth_ann_path: /path/to/gt_annotation_dir/
      inference_ann_path: /path/to/infer_annotation_dir/
  mapping: /path/to/mapping_json/
visualize:
  platform: wandb
kpi:
  iou_threshold: 0.5
  filter: False
  num_recall_points: 11
  conf_threshold: 0.3
  ignore_sqwidth: 40
wandb:
  visualize: True
  project: kpi_calculation

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| data | dict config | – | The configuration for the dataset |
| visualize | dict config | – | The configuration for visualization |
| kpi | dict config | – | The configuration for KPI calculation |
| wandb | dict config | – | The configuration for WandB |

data#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_format | string | | The format of the input annotation files | KITTI, COCO |
| output_dir | string | | The path to the output result directory | |
| image_dir | string | | The path to the input image directory | |
| ann_path | string | | The path to the annotation directory | |
| mapping | string | | The path to the JSON file for class mapping | |
| kpi_sources | dict | - | A list of dictionaries for the KPI sequences. The required values are image_dir, ground_truth_ann_path, and inference_ann_path | |



visualize#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| platform | string | local | The location of the visualization | local, wandb |
| tag | string | | The tag to be added to the final metric table | |

kpi#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| iou_threshold | float | 0.5 | The IoU threshold for matching bounding boxes | >=0, <=1 |
| filter | bool | False | A flag specifying whether to filter bounding boxes smaller than ignore_sqwidth | |
| ignore_sqwidth | int | 0 | Bounding boxes with an area smaller than ignore_sqwidth x ignore_sqwidth will be filtered (if filter is set to True) | >=0 |
| num_recall_points | int | 11 | The number of recall points to use for plotting the precision-recall curve | >0 |
| conf_threshold | float | 0.5 | The confidence threshold for filtering predictions | >=0, <=1 |

wandb#

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| project | string | | The name of the project that the experiment data is uploaded to |
| entity | string | | The name of the entity (group) under which the project is created |
| name | string | | The name of the experiment |
| notes | string | | A short description of the experiment |
| tags | list | | A list of strings that can be used to tag the experiment |
| visualize | bool | False | A flag specifying whether to enable visualization over WandB |
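The kpi parameters map onto a few standard object-detection computations, sketched below in Python. This is illustrative only: the function names are hypothetical, and the N-point interpolated AP shown is the classic Pascal VOC-style definition, which is assumed here rather than confirmed as what the service computes internally.

```python
def area(box):
    """Area of an (xmin, ymin, xmax, ymax) box; zero if the box is degenerate."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two boxes, compared against iou_threshold."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def keep_box(box, ignore_sqwidth, filter_enabled):
    """Apply the documented size filter: drop boxes with area below ignore_sqwidth squared."""
    if not filter_enabled:
        return True
    return area(box) >= ignore_sqwidth * ignore_sqwidth


def interpolated_ap(recalls, precisions, num_recall_points=11):
    """N-point interpolated AP (assumed definition, Pascal VOC style)."""
    steps = max(num_recall_points - 1, 1)
    total = 0.0
    for i in range(num_recall_points):
        threshold = i / steps
        # Best precision achieved at any recall at or above this threshold.
        total += max(
            (p for r, p in zip(recalls, precisions) if r >= threshold),
            default=0.0,
        )
    return total / num_recall_points


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
print(keep_box((0, 0, 30, 30), ignore_sqwidth=40, filter_enabled=True))
print(interpolated_ap([0.5, 1.0], [1.0, 0.5]))
```

In this sketch, a prediction would count as a match when its IoU with a ground-truth box is at least `iou_threshold`, after predictions below `conf_threshold` have been discarded.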