Dataset Info

To work with the TAO API, datasets must reside in cloud storage. For each model available through the TAO API, the required dataset folder structure is listed below; tar the folder and upload it to your cloud storage.

Example dataset-preparation Python functions are provided in the dataset_prepare folder of the notebooks downloaded from the TAO API.
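The tar-and-upload step described above can be sketched with Python's standard tarfile module. This is a minimal, hypothetical helper (the function name, `data_dir`, and `output_tar` are placeholders for your prepared dataset folder and the archive you intend to upload):

```python
import tarfile
from pathlib import Path

def tar_dataset(data_dir: str, output_tar: str) -> str:
    """Pack a prepared dataset folder into a .tar.gz for upload.

    data_dir and output_tar are placeholder paths: point them at your
    prepared DATA_DIR and at the archive you will upload to cloud storage.
    """
    root = Path(data_dir)
    with tarfile.open(output_tar, "w:gz") as tar:
        # Store contents under the folder name so the archive root matches
        # the layouts shown below (annotations.json, images/, ...).
        tar.add(root, arcname=root.name)
    return output_tar
```

The same sketch applies to every section below that asks for a tar file; only the folder layout inside DATA_DIR changes.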

Auto Labeling

Models: mal

The COCO dataset is used as an example in the provided notebook.

API Dataset Type: instance_segmentation

API Dataset Format: coco

API Dataset Accepted Intents: training, evaluation

You must provide one tar file for train and one for val.

DATA_DIR
├── annotations.json
├── images
    ├── image_name_1.jpg
    ├── image_name_2.jpg
    ├── ...

Classification

Models: classification_tf2, classification_pyt

API Dataset Type: image_classification

API Dataset Format:

  • classification_tf2 format for classification_tf2 model

  • classification_pyt format for classification_pyt model

Provide three separate datasets, one each for train, val, and test.

API Dataset Accepted Intents: training, evaluation, testing

Notes on the dataset:

  • Each class name folder must contain the images corresponding to that class.

  • Same class name folders must be present across images_test, images_train, and images_val.

  • classes.txt is a file that contains the names of all classes, one name per line.

  • classmap.json is a JSON file that maps each class name to an integer index. For example, for VOC:

{"aeroplane": 0, "bicycle": 1, "bird": 2, "boat": 3, "bottle": 4, "bus": 5, "car": 6, "cat": 7, "chair": 8, "cow": 9, "diningtable": 10, "dog": 11, "horse": 12, "motorbike": 13, "person": 14, "pottedplant": 15, "sheep": 16, "sofa": 17, "train": 18, "tvmonitor": 19}
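Both files can be derived from the class folders themselves. A minimal sketch, assuming indices are assigned by sorting class names alphabetically (which matches the VOC example above); the function name is hypothetical and `images_train` is the split folder to scan:

```python
import json
from pathlib import Path

def write_class_files(data_dir: str) -> dict:
    """Derive classes.txt and classmap.json from the class-name folders.

    Assumes integer indices follow alphabetical order of class names,
    matching the VOC example above. Adjust the split folder name
    (images_train) to your DATA_DIR layout.
    """
    root = Path(data_dir)
    classes = sorted(p.name for p in (root / "images_train").iterdir()
                     if p.is_dir())
    (root / "classes.txt").write_text("\n".join(classes) + "\n")
    classmap = {name: idx for idx, name in enumerate(classes)}
    (root / "classmap.json").write_text(json.dumps(classmap))
    return classmap
```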
DATA_DIR_TEST - testing intent dataset
├── classes.txt
├── classmap.json
├── images_test
    ├── class_name_1
    │   ├── image_name_1.jpg
    │   ├── image_name_2.jpg
    │   ├── ...
    |   ...
    └── class_name_n
        ├── image_name_3.jpg
        ├── image_name_4.jpg
        ├── ...

DATA_DIR_TRAIN - training intent dataset
├── classes.txt
├── classmap.json
├── images_train
    ├── class_name_1
    │   ├── image_name_5.jpg
    │   ├── image_name_6.jpg
    |   ...
    └── class_name_n
        ├── image_name_7.jpg
        ├── image_name_8.jpg
        ├── ...


DATA_DIR_VAL - eval intent dataset
├── classes.txt
├── classmap.json
└── images_val
    ├── class_name_1
    │   ├── image_name_9.jpg
    │   ├── image_name_10.jpg
    │   ├── ...
    |   ...
    └── class_name_n
        ├── image_name_11.jpg
        ├── image_name_12.jpg
        ├── ...
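The note above requires the same class-name folders in images_test, images_train, and images_val. A quick validation sketch (the function name is hypothetical; the arguments are the three DATA_DIR_* roots shown above):

```python
from pathlib import Path

def check_class_folders(train_dir: str, val_dir: str, test_dir: str) -> set:
    """Verify the same class-name folders exist across the three splits.

    Returns the set of class names if consistent; raises ValueError
    listing the differing folders otherwise.
    """
    splits = {
        "images_train": Path(train_dir) / "images_train",
        "images_val": Path(val_dir) / "images_val",
        "images_test": Path(test_dir) / "images_test",
    }
    names = {split: {p.name for p in d.iterdir() if p.is_dir()}
             for split, d in splits.items()}
    ref = names["images_train"]
    for split, found in names.items():
        if found != ref:
            raise ValueError(f"{split} classes differ: {sorted(found ^ ref)}")
    return ref
```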

Object Detection

Models: deformable_detr, dino, efficientdet_tf2

API Dataset Type: object_detection

API Dataset Format: coco

API Dataset Accepted Intents: training, evaluation, testing

The COCO dataset format is used for the object detection models.

The same format applies to all three dataset intents: train, val, and test.

Provide three separate datasets, one each for train, val, and test.

DATA_DIR
├── annotations.json
├── images
    ├── image_name_1.jpg
    ├── image_name_2.jpg
    ├── ...
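Before tarring, it can help to sanity-check annotations.json. This sketch relies only on the standard COCO top-level keys (`images`, `annotations`, `categories`); the helper name is hypothetical:

```python
import json
from pathlib import Path

def summarize_coco(annotations_path: str) -> dict:
    """Sanity-check a COCO annotations.json before tarring.

    Returns simple counts plus the ids of annotations whose image_id
    does not appear in the images list.
    """
    coco = json.loads(Path(annotations_path).read_text())
    image_ids = {img["id"] for img in coco["images"]}
    orphans = [a["id"] for a in coco["annotations"]
               if a["image_id"] not in image_ids]
    return {
        "images": len(image_ids),
        "annotations": len(coco["annotations"]),
        "categories": [c["name"] for c in coco["categories"]],
        "orphan_annotations": orphans,
    }
```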

Purpose Built Models

Action Recognition

We use the HMDB51 dataset for the tutorial. After downloading, preprocess the dataset into the format described below. The preprocessing snippets/workflow are in the downloaded notebooks under dataset_prepare/purpose_built_models.ipynb. Every action name must exist in both train and test; the sub-directory names can differ.

API Dataset Type: action_recognition

API Dataset Format: default

API Dataset Accepted Intents: training

Provide only one tar file. Do not provide a tar file per intent.

DATA_DIR
├── train
    ├── action_name_1
        ├── subdirectories with images
    ├── action_name_2
        ├── subdirectories with images
├── test
    ├── action_name_1
        ├── subdirectories with images
    ├── action_name_2
        ├── subdirectories with images

MLRecog

We use the Retail Product Checkout Dataset. After downloading, preprocess the dataset into the format described below. The preprocessing snippets/workflow are in the downloaded notebooks under dataset_prepare/purpose_built_models.ipynb. The sub-directory names, and the image names within them, must be the same across reference, train, test, and val.

API Dataset Type: ml_recog

API Dataset Format: default

API Dataset Accepted Intents: training

Provide only one tar file. Do not provide a tar file per intent.

DATA_DIR
├── metric_learning_recognition
    ├── retail-product-checkout-dataset_classification_demo
        ├── known_classes
            ├── reference
                ├── subdir1 with images
            ├── train
                ├── subdir1 with images
            ├── test
                ├── subdir1 with images
            ├── val
                ├── subdir1 with images
        ├── unknown_classes
            ├── reference
                ├── subdir2 with images
            ├── train
                ├── subdir2 with images
            ├── test
                ├── subdir2 with images
            ├── val
                ├── subdir2 with images

OCDNET

We use the ICDAR2015 dataset; access it from Task 4.1: Text Localization. The names of the files in the gt folder must be prefixed with gt_.

API Dataset Type: ocdnet

API Dataset Format: default

API Dataset Accepted Intents: training, evaluation

You must provide one tar file for train and one for val.

DATA_DIR_TRAIN - training intent dataset
├── train
    ├── img
    |   ├── img_1.jpg
    ├── gt
        ├── gt_img_1.txt

DATA_DIR_TEST - eval/testing intent dataset
├── test
    ├── img
    |   ├── img_2.jpg
    ├── gt
        ├── gt_img_2.txt

OCRNET

We use the ICDAR15 word-recognition dataset; for more details, see Incidental Scene Text. Download the ICDAR15 word-recognition train and test datasets from the downloads page. After downloading, preprocess the dataset into the format described below. The preprocessing snippets/workflow are in the downloaded notebooks under dataset_prepare/purpose_built_models.ipynb.

API Dataset Type: ocrnet

API Dataset Format: default

API Dataset Accepted Intents: training, evaluation

You must provide one tar file for train and one for val.

DATA_DIR_TRAIN - training intent dataset
├── character_list
├── train
    ├── coords.txt
    ├── gt.txt
    ├── gt_new.txt
    ├── word1.png

DATA_DIR_TEST - eval/testing intent dataset
├── character_list
├── test
    ├── Challenge4_Test_Task3_GT.txt
    ├── coords.txt
    ├── gt_new.txt
    ├── word2.png

Pointpillars

We use the KITTI object detection dataset for this example. For more details, see the KITTI Vision Benchmark Suite. After downloading, preprocess the dataset into the format described below. The preprocessing snippets/workflow are in the downloaded notebooks under dataset_prepare/purpose_built_models.ipynb.

API Dataset Type: pointpillars

API Dataset Format: default

API Dataset Accepted Intents: training

File names must match across the label and lidar folders.

Provide a single dataset tar containing both the train and val folders; do not provide a tar file per intent.

DATA_DIR
├── train
│   ├── label
|   |   ├── img1.txt
│   ├── lidar
|       ├── img1.bin
├── val
    ├── label
    |   ├── img2.txt
    ├── lidar
        ├── img2.bin

Pose Classification

We use the Kinetics dataset from DeepMind or an NVIDIA-created dataset shared via Google Drive.

API Dataset Type: pose_classification

API Dataset Format: default

API Dataset Accepted Intents: training

Provide only one tar file. Do not provide a tar file per intent.

DATA_DIR
├── kinetics (or nvidia)
    ├── train_data.npy
    ├── train_label.pkl
    ├── val_data.npy
    ├── val_label.pkl
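A quick presence check for the four expected files can be sketched as follows (the function name is hypothetical; pass "kinetics" or "nvidia" depending on which dataset you prepared):

```python
from pathlib import Path

REQUIRED = ["train_data.npy", "train_label.pkl", "val_data.npy", "val_label.pkl"]

def check_pose_dataset(data_dir: str, subfolder: str = "kinetics") -> list:
    """Return the expected pose-classification files that are missing
    from DATA_DIR/<subfolder>."""
    root = Path(data_dir) / subfolder
    return [name for name in REQUIRED if not (root / name).exists()]
```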

Re-Identification

We use the Market-1501 dataset. Download it from the shared Google Drive.

API Dataset Type: re_identification

API Dataset Format: default

API Dataset Accepted Intents: training

Provide only one tar file. Do not provide a tar file per intent.

DATA_DIR
├── sample_train
├── sample_test
├── sample_query

Optical Inspection

Bring your own dataset according to the format described for TAO.

API Dataset Type: optical_inspection

API Dataset Format: default

API Dataset Accepted Intents: training, evaluation, testing

Provide three separate datasets, one each for train, val, and test.

DATA_DIR
├── dataset.csv
├── images

Visual ChangeNet-Classification

Bring your own dataset according to the format described for TAO.

API Dataset Type: visual_changenet

API Dataset Format: visual_changenet_classify

API Dataset Accepted Intents: training, evaluation, testing

Provide three separate datasets, one each for train, val, and test.

DATA_DIR
├── dataset.csv
├── images

Visual ChangeNet-Segmentation

Bring your own dataset according to the format described for TAO.

API Dataset Type: visual_changenet

API Dataset Format: visual_changenet_segment

API Dataset Accepted Intents: training

Provide only one tar file. Do not provide a tar file per intent.

DATA_DIR
|── A
│   ├── image1.jpg
│   ├── image2.jpg
|── B
│   ├── image1.jpg
│   ├── image2.jpg
|── label
│   ├── image1.jpg
│   ├── image2.jpg
|── list
    ├── train.txt
    ├── val.txt
    ├── test.txt
    ├── predict.txt
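In the layout above, A, B, and label hold files with the same image names. A validation sketch (the function name is hypothetical; it reports names present in A but missing from B or label):

```python
from pathlib import Path

def check_changenet_pairs(data_dir: str) -> dict:
    """Verify the A, B, and label folders contain the same image
    filenames, as shown in the layout above."""
    root = Path(data_dir)
    names = {d: {p.name for p in (root / d).iterdir() if p.is_file()}
             for d in ("A", "B", "label")}
    return {
        "missing_in_B": sorted(names["A"] - names["B"]),
        "missing_in_label": sorted(names["A"] - names["label"]),
    }
```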

CenterPose

We use the Google Objectron dataset.

API Dataset Type: centerpose

API Dataset Format: default

API Dataset Accepted Intents: training

Provide only one tar file. Do not provide a tar file per intent.

Each image must have a corresponding JSON.

DATA_DIR
├── train
│   ├── image1.jpg
│   ├── image1.json
│   ├── image2.jpg
│   ├── image2.json
├── val
│   ├── image3.jpg
│   ├── image3.json
│   ├── image4.jpg
│   ├── image4.json
├── test
    ├── image5.jpg
    ├── image5.json
    ├── image6.jpg
    ├── image6.json
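The requirement above, that each image has a corresponding JSON file, can be checked per split with a short sketch (the function name is hypothetical; pass the train, val, or test folder):

```python
from pathlib import Path

def check_centerpose_split(split_dir: str) -> list:
    """Return images in a split (train/val/test) that lack the
    corresponding .json file required above."""
    d = Path(split_dir)
    return sorted(p.name for p in d.glob("*.jpg")
                  if not p.with_suffix(".json").exists())
```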

Segmentation

Models: segformer

We use the ISBI dataset for Segformer; download the data from the open-source repo.

API Dataset Type: semantic_segmentation

API Dataset Format: unet

API Dataset Accepted Intents: training, evaluation

Provide one tar file for train and one for val.

Image and mask filenames must match.

DATA_DIR
├── images
│   ├── test
│   │   ├── image_0.png
│   │   ├── image_1.png
|   |   ├── ...
│   ├── train
│   │   ├── image_2.png
│   │   ├── image_3.png
|   |   ├── ...
│   └── val
│       ├── image_4.png
│       ├── image_5.png
|       ├── ...
├── masks
    ├── train
    │   ├── image_2.png
    │   ├── image_3.png
    |   ├── ...
    └── val
        ├── image_4.png
        ├── image_5.png
        ├── ...
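The image/mask filename requirement above can be checked per split with a short sketch (the function name is hypothetical; `split` is "train" or "val", since only those have masks in the layout above):

```python
from pathlib import Path

def check_mask_names(data_dir: str, split: str) -> set:
    """Return filenames under images/<split> that have no same-named
    file under masks/<split>."""
    root = Path(data_dir)
    imgs = {p.name for p in (root / "images" / split).iterdir()}
    masks = {p.name for p in (root / "masks" / split).iterdir()}
    return imgs - masks
```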