Dataset Info
To work with the TAO API, datasets must reside in cloud storage. For each model available through the TAO API, the required dataset folder structure is listed below; tar each dataset in this structure and upload it to your cloud storage.
Example dataset preparation steps are exposed as Python functions in the dataset_prepare folder of the notebooks downloaded from the TAO API.
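As a minimal sketch of the packaging step, the following Python snippet tars a prepared local dataset folder so that its contents sit at the archive root. The tar_dataset helper and the DATA_DIR/dataset.tar names are illustrative placeholders, not part of the TAO API; upload the resulting tar file with whatever tooling your cloud storage provider offers.

import os
import tarfile

def tar_dataset(data_dir: str, output_tar: str) -> None:
    # Package a prepared dataset folder into an uncompressed tar archive.
    with tarfile.open(output_tar, "w") as tar:
        # Add each top-level entry so the archive root mirrors DATA_DIR.
        for entry in sorted(os.listdir(data_dir)):
            tar.add(os.path.join(data_dir, entry), arcname=entry)

tar_dataset("DATA_DIR", "dataset.tar")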
Auto Labeling
Models: mal
The COCO dataset is used as an example in the provided notebook.
API Dataset Type: instance_segmentation
API Dataset Format: coco
API Dataset Accepted Intents: training, evaluation
You must provide one tar file for train and one for val.
DATA_DIR
├── annotations.json
└── images
    ├── image_name_1.jpg
    ├── image_name_2.jpg
    └── ...
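Before uploading, it can help to confirm that every image referenced by annotations.json is actually present. A minimal sketch, assuming only the standard COCO keys (check_coco_images is an illustrative name):

import json
import os

def check_coco_images(data_dir: str) -> None:
    # Report images listed in annotations.json that are missing on disk.
    with open(os.path.join(data_dir, "annotations.json")) as f:
        coco = json.load(f)
    for image in coco["images"]:
        path = os.path.join(data_dir, "images", image["file_name"])
        if not os.path.isfile(path):
            print(f"missing: {path}")

check_coco_images("DATA_DIR")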
Classification
Models: classification_tf2, classification_pyt
API Dataset Type: image_classification
API Dataset Format: classification_tf2 for the classification_tf2 model; classification_pyt for the classification_pyt model
API Dataset Accepted Intents: training, evaluation, testing
Provide three separate datasets, one each for train, val, and test.
Notes on the dataset:
Each class name folder must contain the images corresponding to that class.
The same class name folders must be present across images_test, images_train, and images_val.
classes.txt is a file that contains the names of all classes, one name per line.
classmap.json is a JSON file that maps each class name to an integer index. For example, for VOC:
{"aeroplane": 0, "bicycle": 1, "bird": 2, "boat": 3, "bottle": 4, "bus": 5, "car": 6, "cat": 7, "chair": 8, "cow": 9, "diningtable": 10, "dog": 11, "horse": 12, "motorbike": 13, "person": 14, "pottedplant": 15, "sheep": 16, "sofa": 17, "train": 18, "tvmonitor": 19}
DATA_DIR_TEST - testing intent dataset
├── classes.txt
├── classmap.json
└── images_test
    ├── class_name_1
    │   ├── image_name_1.jpg
    │   ├── image_name_2.jpg
    │   └── ...
    ├── ...
    └── class_name_n
        ├── image_name_3.jpg
        ├── image_name_4.jpg
        └── ...
DATA_DIR_TRAIN - training intent dataset
├── classes.txt
├── classmap.json
└── images_train
    ├── class_name_1
    │   ├── image_name_5.jpg
    │   ├── image_name_6.jpg
    │   └── ...
    ├── ...
    └── class_name_n
        ├── image_name_7.jpg
        ├── image_name_8.jpg
        └── ...
DATA_DIR_VAL - evaluation intent dataset
├── classes.txt
├── classmap.json
└── images_val
    ├── class_name_1
    │   ├── image_name_9.jpg
    │   ├── image_name_10.jpg
    │   └── ...
    ├── ...
    └── class_name_n
        ├── image_name_11.jpg
        ├── image_name_12.jpg
        └── ...
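If the class-name folders are already in place, classes.txt and classmap.json can be derived from them. A minimal sketch, assuming zero-based indices as in the VOC example above (write_class_files is an illustrative helper, not a TAO API function); because the mapping must be consistent, run it with the same class folders for all three datasets:

import json
import os

def write_class_files(data_dir: str, split_folder: str) -> None:
    # Derive classes.txt and classmap.json from the class-name folders.
    split_dir = os.path.join(data_dir, split_folder)
    classes = sorted(
        d for d in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, d))
    )
    with open(os.path.join(data_dir, "classes.txt"), "w") as f:
        f.write("\n".join(classes) + "\n")
    with open(os.path.join(data_dir, "classmap.json"), "w") as f:
        json.dump({name: idx for idx, name in enumerate(classes)}, f)

write_class_files("DATA_DIR_TRAIN", "images_train")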
Object Detection
Models: deformable_detr, dino, efficientdet_tf2
API Dataset Type: object_detection
API Dataset Format: coco
API Dataset Accepted Intents: training, evaluation, testing
The COCO dataset format is used for the object detection models. The same format applies to all three dataset intents: train, val, and test.
Provide three separate datasets, one each for train, val, and test.
DATA_DIR
├── annotations.json
└── images
    ├── image_name_1.jpg
    ├── image_name_2.jpg
    └── ...
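To sanity-check a COCO annotations file before tarring, a short sketch like the following prints basic counts; it assumes only the standard COCO top-level keys:

import json

def summarize_coco(annotations_path: str) -> None:
    # Print image/annotation counts and the category names.
    with open(annotations_path) as f:
        coco = json.load(f)
    print("images:", len(coco["images"]))
    print("annotations:", len(coco["annotations"]))
    print("categories:", [c["name"] for c in coco["categories"]])

summarize_coco("DATA_DIR/annotations.json")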
Purpose Built Models
Action Recognition
We use the HMDB51 dataset for the tutorial. After downloading, we preprocess the dataset to arrive at the format described below. The preprocessing snippets/workflow are present in the downloaded notebooks under dataset_prepare/purpose_built_models.ipynb. Every action name must exist in both train and test; the sub-directory names beneath the action folders can differ.
API Dataset Type: action_recognition
API Dataset Format: default
API Dataset Accepted Intents: training
Provide only one tar file. Do not provide a tar file per intent.
DATA_DIR
├── train
│   ├── action_name_1
│   │   └── subdirectories with images
│   └── action_name_2
│       └── subdirectories with images
└── test
    ├── action_name_1
    │   └── subdirectories with images
    └── action_name_2
        └── subdirectories with images
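Because every action name must appear in both train and test, a quick consistency check like this sketch can catch mismatches before upload (check_action_names is an illustrative name):

import os

def check_action_names(data_dir: str) -> None:
    # Compare the action-name folders under train and test.
    train = set(os.listdir(os.path.join(data_dir, "train")))
    test = set(os.listdir(os.path.join(data_dir, "test")))
    if train != test:
        print("only in train:", sorted(train - test))
        print("only in test:", sorted(test - train))

check_action_names("DATA_DIR")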
MLRecog
We use the Retail Product Checkout dataset. After downloading, we preprocess the dataset to arrive at the format described below. The preprocessing snippets/workflow are present in the downloaded notebooks under dataset_prepare/purpose_built_models.ipynb.
The sub-directory names, and the image names within them, must be the same across reference, train, test, and val.
API Dataset Type: ml_recog
API Dataset Format: default
API Dataset Accepted Intents: training
Provide only one tar file. Do not provide a tar file per intent.
DATA_DIR
└── metric_learning_recognition
    └── retail-product-checkout-dataset_classification_demo
        ├── known_classes
        │   ├── reference
        │   │   └── subdir1 with images
        │   ├── train
        │   │   └── subdir1 with images
        │   ├── test
        │   │   └── subdir1 with images
        │   └── val
        │       └── subdir1 with images
        └── unknown_classes
            ├── reference
            │   └── subdir2 with images
            ├── train
            │   └── subdir2 with images
            ├── test
            │   └── subdir2 with images
            └── val
                └── subdir2 with images
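A sketch for checking the naming constraint above: it collects subdir/image paths under each split and compares them against reference. The helper names are illustrative; point class_dir at known_classes or unknown_classes.

import os

def relative_files(root: str) -> set:
    # Collect file paths relative to root, e.g. "subdir1/image.jpg".
    return {
        os.path.relpath(os.path.join(dirpath, name), root)
        for dirpath, _, names in os.walk(root)
        for name in names
    }

def check_splits_match(class_dir: str) -> None:
    # reference, train, test, and val must hold identical names.
    reference = relative_files(os.path.join(class_dir, "reference"))
    for split in ("train", "test", "val"):
        files = relative_files(os.path.join(class_dir, split))
        if files != reference:
            print(split, "differs by", len(files ^ reference), "entries")

check_splits_match(
    "DATA_DIR/metric_learning_recognition/"
    "retail-product-checkout-dataset_classification_demo/known_classes")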
OCDNET
We use the ICDAR2015 dataset. Access the dataset from Task 4.1: Text Localization.
The names of the files in the gt folder must be prefixed with gt_.
API Dataset Type: ocdnet
API Dataset Format: default
API Dataset Accepted Intents: training, evaluation
You must provide one tar file for train and one for val.
DATA_DIR_TRAIN - training intent dataset
└── train
    ├── img
    │   └── img_1.jpg
    └── gt
        └── gt_img_1.txt
DATA_DIR_TEST - evaluation intent dataset
└── test
    ├── img
    │   └── img_2.jpg
    └── gt
        └── gt_img_2.txt
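A small sketch to flag ground-truth files missing the required gt_ prefix (check_gt_prefix is an illustrative name):

import os

def check_gt_prefix(gt_dir: str) -> None:
    # Report files in the gt folder that lack the gt_ prefix.
    for name in sorted(os.listdir(gt_dir)):
        if not name.startswith("gt_"):
            print("needs gt_ prefix:", name)

check_gt_prefix("DATA_DIR_TRAIN/train/gt")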
OCRNET
We use the ICDAR15 word recognition dataset. For more details, see Incidental Scene Text. Download the ICDAR15 word recognition train and test datasets from the downloads page. After downloading, we preprocess the dataset to arrive at the format described below. The preprocessing snippets/workflow are present in the downloaded notebooks under dataset_prepare/purpose_built_models.ipynb.
API Dataset Type: ocrnet
API Dataset Format: default
API Dataset Accepted Intents: training, evaluation
You must provide one tar file for train and one for val.
DATA_DIR_TRAIN - training intent dataset
├── character_list
└── train
    ├── coords.txt
    ├── gt.txt
    ├── gt_new.txt
    └── word1.png
DATA_DIR_TEST - evaluation intent dataset
├── character_list
└── test
    ├── Challenge4_Test_Task3_GT.txt
    ├── coords.txt
    ├── gt_new.txt
    └── word2.png
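As a quick integrity check, the following sketch verifies that every image referenced in gt_new.txt exists next to it. It assumes each line is an image file name followed by its text label, which is the layout the preprocessing notebook produces; adjust the parsing if yours differs.

import os

def check_gt_list(split_dir: str, gt_file: str = "gt_new.txt") -> None:
    # Confirm each image referenced in the label file exists on disk.
    with open(os.path.join(split_dir, gt_file)) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            path = os.path.join(split_dir, parts[0])
            if not os.path.isfile(path):
                print("missing:", path)

check_gt_list("DATA_DIR_TRAIN/train")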
Pointpillars
We use the KITTI object detection dataset for this example. For more details, see the KITTI Vision Benchmark Suite. After downloading, we preprocess the dataset to arrive at the format described below. The preprocessing snippets/workflow are present in the downloaded notebooks under dataset_prepare/purpose_built_models.ipynb.
API Dataset Type: pointpillars
API Dataset Format: default
API Dataset Accepted Intents: training
The file names across the label and lidar folders must match.
Provide a single dataset tar file containing both the train and val folders; do not provide a tar file per intent.
DATA_DIR
├── train
│   ├── label
│   │   └── img1.txt
│   └── lidar
│       └── img1.bin
└── val
    ├── label
    │   └── img2.txt
    └── lidar
        └── img2.bin
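Since the label and lidar file names must match, a sketch like this compares the file stems per split (check_label_lidar_match is an illustrative name):

import os

def check_label_lidar_match(split_dir: str) -> None:
    # label/*.txt and lidar/*.bin must share the same base names.
    label_dir = os.path.join(split_dir, "label")
    lidar_dir = os.path.join(split_dir, "lidar")
    labels = {os.path.splitext(n)[0] for n in os.listdir(label_dir)}
    lidars = {os.path.splitext(n)[0] for n in os.listdir(lidar_dir)}
    if labels != lidars:
        print("label without lidar:", sorted(labels - lidars))
        print("lidar without label:", sorted(lidars - labels))

for split in ("train", "val"):
    check_label_lidar_match(os.path.join("DATA_DIR", split))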
Pose Classification
We use the Kinetics dataset from DeepMind or an NVIDIA-created dataset from Google Drive.
API Dataset Type: pose_classification
API Dataset Format: default
API Dataset Accepted Intents: training
Provide only one tar file. Do not provide a tar file per intent.
DATA_DIR
└── kinetics/nvidia
    ├── train_data.npy
    ├── train_label.pkl
    ├── val_data.npy
    └── val_label.pkl
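To spot obvious packaging mistakes, you can load the arrays and label pickles and print their shapes. A minimal sketch, assuming only that the .npy files load with NumPy and the .pkl files unpickle; it makes no assumption about the label layout beyond its top-level length:

import pickle
import numpy as np

def inspect_split(data_path: str, label_path: str) -> None:
    # Print the data array shape and the label object's top-level size.
    data = np.load(data_path)
    with open(label_path, "rb") as f:
        labels = pickle.load(f)
    print(data_path, "shape:", data.shape)
    print(label_path, "top-level entries:", len(labels))

inspect_split("DATA_DIR/kinetics/train_data.npy",
              "DATA_DIR/kinetics/train_label.pkl")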
Re-Identification
We use the Market-1501 dataset. Download the dataset from the shared Google Drive.
API Dataset Type: re_identification
API Dataset Format: default
API Dataset Accepted Intents: training
Provide only one tar file. Do not provide a tar file per intent.
DATA_DIR
├── sample_train
├── sample_test
└── sample_query
Optical Inspection
Bring your own dataset according to the format described for TAO.
API Dataset Type: optical_inspection
API Dataset Format: default
API Dataset Accepted Intents: training, evaluation, testing
Provide three separate datasets, one each for train, val, and test.
DATA_DIR
├── dataset.csv
└── images
Visual ChangeNet-Classification
Bring your own dataset according to the format described for TAO.
API Dataset Type: visual_changenet
API Dataset Format: visual_changenet_classify
API Dataset Accepted Intents: training, evaluation, testing
Provide three separate datasets, one each for train, val, and test.
DATA_DIR
├── dataset.csv
└── images
Visual ChangeNet-Segmentation
Bring your own dataset according to the format described for TAO.
API Dataset Type: visual_changenet
API Dataset Format: visual_changenet_segment
API Dataset Accepted Intents: training
Provide only one tar file. Do not provide a tar file per intent.
DATA_DIR
|── A
│ ├── image1.jpg
│ ├── image2.jpg
|── B
│ ├── image1.jpg
│ ├── image2.jpg
|── label
│ ├── image1.jpg
│ ├── image2.jpg
|── list
├── train.txt
├── val.txt
├── test.txt
├── predict.txt
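If you need to create the list files yourself, a sketch like the following writes train.txt and val.txt from a random split of the names under A. It assumes each list file holds one image file name per line, so verify that against your ChangeNet configuration before relying on it.

import os
import random

def write_split_lists(data_dir: str, val_fraction: float = 0.1) -> None:
    # Randomly split the image names under A/ into train and val lists.
    names = sorted(os.listdir(os.path.join(data_dir, "A")))
    random.shuffle(names)
    n_val = int(len(names) * val_fraction)
    splits = {"val.txt": names[:n_val], "train.txt": names[n_val:]}
    os.makedirs(os.path.join(data_dir, "list"), exist_ok=True)
    for filename, split_names in splits.items():
        with open(os.path.join(data_dir, "list", filename), "w") as f:
            f.write("\n".join(split_names) + "\n")

write_split_lists("DATA_DIR")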
CenterPose
We use the Google Objectron dataset.
API Dataset Type: centerpose
API Dataset Format: default
API Dataset Accepted Intents: training
Provide only one tar file. Do not provide a tar file per intent.
Each image must have a corresponding JSON file.
DATA_DIR
├── train
│   ├── image1.jpg
│   ├── image1.json
│   ├── image2.jpg
│   └── image2.json
├── val
│   ├── image3.jpg
│   ├── image3.json
│   ├── image4.jpg
│   └── image4.json
└── test
    ├── image5.jpg
    ├── image5.json
    ├── image6.jpg
    └── image6.json
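A short sketch to confirm the image/JSON pairing across all three splits (check_image_json_pairs is an illustrative name; extend the extension check if your images are not .jpg):

import os

def check_image_json_pairs(split_dir: str) -> None:
    # Report .jpg images without a matching .json annotation file.
    for name in sorted(os.listdir(split_dir)):
        stem, ext = os.path.splitext(name)
        if ext == ".jpg":
            json_path = os.path.join(split_dir, stem + ".json")
            if not os.path.isfile(json_path):
                print("missing annotation:", json_path)

for split in ("train", "val", "test"):
    check_image_json_pairs(os.path.join("DATA_DIR", split))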
Segmentation
Models: segformer
We use the ISBI dataset for Segformer. Access the open source repo to download the data.
API Dataset Type: semantic_segmentation
API Dataset Format: unet
API Dataset Accepted Intents: training, evaluation
Provide one tar file for train and one for val.
The file names must match between the images and masks folders.
DATA_DIR
├── images
│   ├── test
│   │   ├── image_0.png
│   │   ├── image_1.png
│   │   └── ...
│   ├── train
│   │   ├── image_2.png
│   │   ├── image_3.png
│   │   └── ...
│   └── val
│       ├── image_4.png
│       ├── image_5.png
│       └── ...
└── masks
    ├── train
    │   ├── image_2.png
    │   ├── image_3.png
    │   └── ...
    └── val
        ├── image_4.png
        ├── image_5.png
        └── ...
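Since image and mask file names must match, a sketch like this compares each split that has masks (train and val in the tree above; check_mask_names is an illustrative name):

import os

def check_mask_names(data_dir: str, split: str) -> None:
    # Mask file names must mirror the image file names for the split.
    images = set(os.listdir(os.path.join(data_dir, "images", split)))
    masks = set(os.listdir(os.path.join(data_dir, "masks", split)))
    if images != masks:
        print(split, "images without masks:", sorted(images - masks))
        print(split, "masks without images:", sorted(masks - images))

for split in ("train", "val"):
    check_mask_names("DATA_DIR", split)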