Data Annotation Format
======================

.. _data_annotation_format:

This page describes the dataset formats for computer-vision apps supported by TLT.

Image Classification Format
---------------------------

.. _image_classification_format:

Image classification expects a directory of images with the following structure, where each class
has its own directory with the class name. The naming convention for :code:`train/val/test` can
be different because the path of each set is individually specified in the spec file. See the
:ref:`Specification File for Classification<specification_file_for_classification>` section for more
information.

.. code::

   |--dataset_root:
       |--train
           |--audi:
               |--1.jpg
               |--2.jpg
           |--bmw:
               |--01.jpg
               |--02.jpg
       |--val
           |--audi:
               |--3.jpg
               |--4.jpg
           |--bmw:
               |--03.jpg
               |--04.jpg
       |--test
           |--audi:
               |--5.jpg
               |--6.jpg
           |--bmw:
               |--05.jpg
               |--06.jpg
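
As a rough illustration of this layout, the following Python sketch counts the files in each class
directory of one split. The function name :code:`count_images_per_class` is hypothetical and is not
part of TLT; the sketch only assumes the directory structure shown above.

.. code:: python

    from pathlib import Path


    def count_images_per_class(split_dir):
        """Return {class_name: file_count} for one split, e.g. dataset_root/train."""
        counts = {}
        for class_dir in sorted(Path(split_dir).iterdir()):
            # Each sub-directory name is treated as a class name.
            if class_dir.is_dir():
                counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
        return counts

Running it on each of the :code:`train`, :code:`val`, and :code:`test` directories is a quick way
to spot empty or misspelled class folders before training.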

Object Detection -- KITTI Format
--------------------------------

.. _object_detection_kitti_format:

Using the KITTI format requires data to be organized in this structure:

.. code::

   .
   |--dataset root
     |-- images
         |-- 000000.jpg
         |-- 000001.jpg
               .
               .
         |-- xxxxxx.jpg
     |-- labels
         |-- 000000.txt
         |-- 000001.txt
               .
               .
         |-- xxxxxx.txt
     |-- kitti_seq_to_map.json

Here's a description of the structure:

* The images directory contains the images to train on.
* The labels directory contains the labels for the corresponding images. Details of these files are
  included in the :ref:`Label Files<label_files>` section.

  .. Note:: The images and labels have the same file IDs before the extension. The image to label
           correspondence is maintained using this file name.

* The :code:`kitti_seq_to_map.json` file contains a sequence to frame ID mapping for the frames in
  the images directory. This file is optional and is useful if the data needs to be split into
  N folds sequence-wise. If the data is to be split randomly into an 80:20 train:val split, this
  file can be ignored.
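
Because images and labels are matched by file ID, it can help to verify the pairing before
training. The sketch below is only an illustration: the helper :code:`find_unpaired_ids` is a
hypothetical name, and it assumes that labels use the :code:`.txt` extension as in the tree above.

.. code:: python

    from pathlib import Path


    def find_unpaired_ids(images_dir, labels_dir):
        """Return (image IDs without a label, label IDs without an image)."""
        # The file ID is the file name without its extension, e.g. "000000".
        image_ids = {p.stem for p in Path(images_dir).iterdir() if p.is_file()}
        label_ids = {p.stem for p in Path(labels_dir).glob("*.txt")}
        return sorted(image_ids - label_ids), sorted(label_ids - image_ids)

Both returned lists are empty when every image has a label and every label has an image.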

.. Note:: For DetectNet_v2 and FasterRCNN, the :code:`train` tool does not support
   training on images of multiple resolutions, or resizing images during training. All of the
   images must be resized offline to the final training size and the corresponding bounding boxes
   must be scaled accordingly. Online resizing is supported for other detection model architectures.
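
When images are resized offline, each bounding box has to be scaled by the same factors as its
image. The following is a minimal sketch of that arithmetic, not a TLT utility; the function name
:code:`scale_bbox` and the :code:`(width, height)` tuple convention are assumptions made for this
example.

.. code:: python

    def scale_bbox(bbox, orig_size, new_size):
        """Scale [xmin, ymin, xmax, ymax] from orig_size to new_size, both (width, height)."""
        xmin, ymin, xmax, ymax = bbox
        scale_x = new_size[0] / orig_size[0]
        scale_y = new_size[1] / orig_size[1]
        # x coordinates scale with the width ratio, y coordinates with the height ratio.
        return [xmin * scale_x, ymin * scale_y, xmax * scale_x, ymax * scale_y]

For instance, halving both image dimensions halves all four coordinates of every box.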

Label Files
^^^^^^^^^^^

.. _label_files:

A KITTI format label file is a simple text file containing one line per object. Each line has
multiple fields. Here is a description of these fields:

+------------------+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| **Num elements** | **Parameter name**                                 | **Description**                                                                              | **Type**             | **Range**                                                                                     | **Example**            |
+==================+====================================================+==============================================================================================+======================+===============================================================================================+========================+
| 1                | Class names                                        | The class to which the object belongs.                                                       | String               | N/A                                                                                           | Person, car, Road_Sign |
+------------------+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| 1                | Truncation                                         | How much of the object has left image boundaries.                                            | Float                | [0.0, 1.0]                                                                                    | 0.0                    |
+------------------+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| 1                | Occlusion                                          | Occlusion state [ 0 = fully visible, 1 = partly visible, 2 = largely occluded, 3 = unknown]. | Integer              | [0,3]                                                                                         | 2                      |
+------------------+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| 1                | Alpha                                              | Observation Angle of object                                                                  | Float                | [-pi, pi]                                                                                     | 0.146                  |
+------------------+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| 4                | Bounding box coordinates: [xmin, ymin, xmax, ymax] | Location of the object in the image                                                          | Float(0 based index) | [0 to image width],[0 to image_height], [top_left, image_width], [bottom_right, image_height] | 100 120                |
|                  |                                                    |                                                                                              |                      |                                                                                               | 180 160                |
+------------------+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| 3                | 3-D dimension                                      | Height, width, length of the object (in meters)                                              | Float                | N/A                                                                                           | 1.65, 1.67, 3.64       |
+------------------+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| 3                | Location                                           | 3-D object location x, y, z in camera coordinates (in meters)                                | Float                | N/A                                                                                           | -0.65,1.71, 46.7       |
+------------------+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| 1                | Rotation_y                                         | Rotation ry around the Y-axis in camera coordinates                                          | Float                | [-pi, pi]                                                                                     | -1.59                  |
+------------------+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+

Each object line therefore has 15 fields in total. Here is a sample label file:

.. code::

   car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59
   cyclist 0.00 0 -2.46 665.45 160.00 717.93 217.99 1.72 0.47 1.65 2.45 1.35 22.10 -2.35
   pedestrian 0.00 2 0.21 423.17 173.67 433.17 224.03 1.60 0.38 0.30 -5.87 1.63 23.11 -0.03

This file describes an image containing three objects with the parameters listed above. For
detection, the toolkit currently requires only the class name and bounding-box coordinate fields to
be populated, because the TLT training pipeline trains only on class and bounding-box coordinates.
The remaining fields may be set to 0. Here is a sample file for a custom annotated dataset:

.. code::

   car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   cyclist 0.00 0 0.00 665.45 160.00 717.93 217.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   pedestrian 0.00 0 0.00 423.17 173.67 433.17 224.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00
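
The 15-field layout can be read and written with a few lines of Python. This is an illustrative
sketch only; :code:`parse_kitti_line` and :code:`format_detection_line` are hypothetical helper
names, and the dictionary keys are chosen for this example rather than defined by TLT.

.. code:: python

    def parse_kitti_line(line):
        """Split one KITTI label line into its 15 fields."""
        parts = line.split()
        if len(parts) != 15:
            raise ValueError(f"expected 15 fields, got {len(parts)}")
        return {
            "class_name": parts[0],
            "truncation": float(parts[1]),
            "occlusion": int(parts[2]),
            "alpha": float(parts[3]),
            "bbox": [float(v) for v in parts[4:8]],         # xmin, ymin, xmax, ymax
            "dimensions": [float(v) for v in parts[8:11]],  # height, width, length
            "location": [float(v) for v in parts[11:14]],   # x, y, z
            "rotation_y": float(parts[14]),
        }


    def format_detection_line(class_name, bbox):
        """Write a label line with only class and bbox populated; other fields are 0."""
        xmin, ymin, xmax, ymax = bbox
        return (
            f"{class_name} 0.00 0 0.00 "
            f"{xmin:.2f} {ymin:.2f} {xmax:.2f} {ymax:.2f} "
            "0.00 0.00 0.00 0.00 0.00 0.00 0.00"
        )

Calling :code:`format_detection_line("car", [587.01, 173.33, 614.12, 200.12])` reproduces the
first line of the custom dataset sample above.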

Sequence Mapping File
^^^^^^^^^^^^^^^^^^^^^

This is an optional JSON file that maps the frames in the :code:`images` directory to the names of
the video sequences from which they were extracted. This information is needed for an N-fold split
of the dataset: it ensures that frames from one sequence do not appear in more than one fold, so
that one of the folds can be used for validation. The JSON dictionary has the following form:

.. code::

   {
     "video_sequence_name": [list of strings(frame idx)]
   }

Here's an example of a :code:`kitti_seq_to_map.json` file for a sample dataset with six
sequences:

.. code::

   {
     "2011_09_28_drive_0165_sync": ["003193", "003185", "002857", "001864", "003838",
     "007320", "003476", "007308", "000337", "004165", "006573"],
     "2011_09_28_drive_0191_sync": ["005724", "002529", "004136", "005746"],
     "2011_09_28_drive_0179_sync": ["005107", "002485", "006089", "000695"],
     "2011_09_26_drive_0079_sync": ["005421", "000673", "002064", "000783", "003068"],
     "2011_09_28_drive_0035_sync": ["005540", "002424", "004949", "004996", "003969"],
     "2011_09_28_drive_0117_sync": ["007150", "003797", "002554", "001509"]
   }
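
To show why the mapping matters, here is a minimal sketch of a sequence-wise split. It is not the
splitting logic used by TLT, which is not described on this page; the function name
:code:`split_sequences_into_folds` and the round-robin assignment are assumptions made for
illustration.

.. code:: python

    def split_sequences_into_folds(seq_to_frames, num_folds):
        """Assign whole sequences to folds so no sequence is shared between folds."""
        folds = [[] for _ in range(num_folds)]
        # Place the longest sequences first (ties broken by name) in round-robin order.
        ordered = sorted(seq_to_frames, key=lambda name: (-len(seq_to_frames[name]), name))
        for index, name in enumerate(ordered):
            folds[index % num_folds].extend(seq_to_frames[name])
        return folds

Each returned fold is a list of frame IDs; holding out one fold for validation keeps all frames of
a given video sequence on the same side of the split.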

Instance Segmentation -- COCO format
------------------------------------

.. _instance_segmentation_coco_format:

Using the COCO format requires data to be organized in this structure:

.. code::

    annotation{
    "id": int, 
    "image_id": int, 
    "category_id": int, 
    "segmentation": RLE or [polygon], 
    "area": float, 
    "bbox": [x,y,width,height], 
    "iscrowd": 0 or 1,
    }

    image{
    "id": int,
    "width": int,
    "height": int,
    "file_name": str,
    "license": int,
    "flickr_url": str,
    "coco_url": str,
    "date_captured": datetime,
    }

    categories[{
    "id": int, 
    "name": str, 
    "supercategory": str,
    }]

An example COCO annotation file is shown below:

.. code::

    "annotations": [{"segmentation": [[510.66,423.01,511.72,420.03,510.45,416.0,510.34,413.02,510.77,410.26,510.77,407.5,510.34,405.16,511.51,402.83,511.41,400.49,510.24,398.16,509.39,397.31,504.61,399.22,502.17,399.64,500.89,401.66,500.47,402.08,499.09,401.87,495.79,401.98,490.59,401.77,488.79,401.77,485.39,398.58,483.9,397.31,481.56,396.35,478.48,395.93,476.68,396.03,475.4,396.77,473.92,398.79,473.28,399.96,473.49,401.87,474.56,403.47,473.07,405.59,473.39,407.71,476.68,409.41,479.23,409.73,481.56,410.69,480.4,411.85,481.35,414.93,479.86,418.65,477.32,420.03,476.04,422.58,479.02,422.58,480.29,423.01,483.79,419.93,486.66,416.21,490.06,415.57,492.18,416.85,491.65,420.24,492.82,422.9,493.56,424.39,496.43,424.6,498.02,423.01,498.13,421.31,497.07,420.03,497.07,415.15,496.33,414.51,501.1,411.96,502.06,411.32,503.02,415.04,503.33,418.12,501.1,420.24,498.98,421.63,500.47,424.39,505.03,423.32,506.2,421.31,507.69,419.5,506.31,423.32,510.03,423.01,510.45,423.01]],"area": 702.1057499999998,"iscrowd": 0,"image_id": 289343,"bbox": [473.07,395.93,38.65,28.67],"category_id": 18,"id": 1768}],
    "images": [{"license": 1,"file_name": "000000407646.jpg","coco_url": "http://images.cocodataset.org/val2017/000000407646.jpg","height": 400,"width": 500,"date_captured": "2013-11-23 03:58:53","flickr_url": "http://farm4.staticflickr.com/3110/2855627782_17b93a684e_z.jpg","id": 407646}],
    "categories": [{"supercategory": "person","id": 1,"name": "person"},{"supercategory": "vehicle","id": 2,"name": "bicycle"},{"supercategory": "vehicle","id": 3,"name": "car"},{"supercategory": "vehicle","id": 4,"name": "motorcycle"}]
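
Two properties of this format are easy to check in code: :code:`bbox` stores
:code:`[x, y, width, height]` rather than corner coordinates, and every annotation refers to an
image and a category by ID. The sketch below is illustrative only; both function names are
hypothetical, and it assumes the annotation file has already been loaded into a Python dictionary.

.. code:: python

    def coco_bbox_to_corners(bbox):
        """Convert a COCO [x, y, width, height] box to [xmin, ymin, xmax, ymax]."""
        x, y, width, height = bbox
        return [x, y, x + width, y + height]


    def annotations_with_unknown_refs(coco):
        """Return IDs of annotations whose image_id or category_id is not defined."""
        image_ids = {image["id"] for image in coco.get("images", [])}
        category_ids = {category["id"] for category in coco.get("categories", [])}
        return [
            ann["id"]
            for ann in coco.get("annotations", [])
            if ann["image_id"] not in image_ids or ann["category_id"] not in category_ids
        ]

Note that the example above is an abbreviated excerpt: its single annotation refers to an image and
a category that are not among the entries shown, so this check would report it.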

For more details, refer to the COCO data format documentation. A COCO dataset preparation script
is provided in the TLT container; it automatically downloads the dataset and converts it to
TFRecords. In the MaskRCNN notebook, you can run the script as follows:

.. code::

   download_and_preprocess_coco.sh $data_dir

When using a custom dataset, you should follow the COCO format closely and convert the dataset to
TFRecords using the following command (refer to lines 68-75 of
:code:`download_and_preprocess_coco.sh` for more detail).

.. code::

    python create_coco_tf_record.py \
      --logtostderr \
      --include_masks \
      --train_image_dir=$TRAIN_IMAGE_DIR \
      --val_image_dir=$VAL_IMAGE_DIR \
      --train_object_annotations_file=$TRAIN_COCO_ANNOTATION_FILE \
      --val_object_annotations_file=$VAL_ANNOTATION_FILE \
      --output_dir=$OUTPUT_DIR

Semantic Segmentation -- UNet Format
------------------------------------

.. _semantic_segmentation_unet_format:

This section describes the dataset formats for training a semantic segmentation UNet in TLT, namely:

- Structured Images and Masks Folders
- Image and Mask Text Files

.. _semantic_segmentation_unet_format_folders:

Structured Images and Masks Folders
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

UNet expects the images and corresponding masks encoded as images. Each mask image is a
single-channel image, where every pixel is assigned an integer value that represents the
segmentation class. The data folder structure for images and masks must be in the following format:

.. code::

  /Dataset_01
      /images
        /train
          0000.png
          0001.png
          ...
          ...
          N.png
        /val
          0000.png
          0001.png
          ...
          ...
          N.png
        /test
          0000.png
          0001.png
          ...
          ...
          N.png
      /masks 
        /train
          0000.png
          0001.png
          ...
          ...
          N.png
        /val
          0000.png
          0001.png
          ...
          ...
          N.png

* See the :ref:`Folders based Dataset Config<dataset_config_unet_folders>` section for further details about
  configuring these image and mask folder paths in the experiment spec.

* Each image and its mask have the same file ID before the extension. The image-to-mask
  correspondence is maintained using this filename. The :code:`test` folder in the above
  directory structure is optional; any folder can be used for inference.

.. _semantic_segmentation_unet_format_text:      

Image and Mask Text Files
^^^^^^^^^^^^^^^^^^^^^^^^^

This format uses an image text file containing the paths to all the images and a mask text file
containing the paths to the corresponding mask files. The image and mask paths should be full
absolute Unix paths.

The contents of an example image text file, :code:`images_source1.txt`, are shown below:

.. code::

    /home/user/workspace/exports/images_final/00001.jpg
    /home/user/workspace/exports/images_final/00002.jpg

The contents of the corresponding mask text file, :code:`labels_source1.txt`, are shown below:

.. code::
    
    /home/user/workspace/exports/masks_final/00001.png
    /home/user/workspace/exports/masks_final/00002.png


* The text file method additionally allows multiple sequences to be specified.
* The paths to these text files should be provided in the spec file.

See the :ref:`Text files based Dataset Config<dataset_config_unet_text_files>` section for further details about configuring multiple data sources using text files in the dataset config.

.. Note:: The images do not need to match the model input dimensions; they are resized internally to the model input dimensions.
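
The two text files can be generated from a pair of image and mask folders. The following sketch is
one possible way to do this and is not a TLT tool: the function name :code:`write_path_lists` is
hypothetical, and it assumes that an image and its mask share the same file ID and that the two
files list their entries in the same order.

.. code:: python

    from pathlib import Path


    def write_path_lists(images_dir, masks_dir, images_txt, masks_txt):
        """Write matching image and mask path lists; return the number of pairs."""
        images = {p.stem: p.resolve() for p in Path(images_dir).iterdir() if p.is_file()}
        masks = {p.stem: p.resolve() for p in Path(masks_dir).iterdir() if p.is_file()}
        # Keep only file IDs present in both folders, in a stable sorted order.
        common = sorted(images.keys() & masks.keys())
        Path(images_txt).write_text("".join(f"{images[key]}\n" for key in common))
        Path(masks_txt).write_text("".join(f"{masks[key]}\n" for key in common))
        return len(common)

Paths are resolved to absolute paths before they are written, in line with the requirement above.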


Gesture Recognition -- Custom Format
------------------------------------

.. _gesture_recognition_custom_format:

A gesture recognition model should perform well on users outside the training dataset. Model
training therefore requires user segregation when splitting the data into train, validation, and
test datasets. To enable this, each subject needs a unique identifier, :code:`user_id`. In
addition, each subject may record multiple videos.

Organize the dataset in the following format:

.. code::

   .
   |-- original dataset root
     |-- uid_1
         |-- session_1     
             |-- 000000.png
             |-- 000001.png
                   .
                   .
             |-- xxxxxx.png
         |-- session_2     
             |-- 000000.png
             |-- 000001.png
                   .
                   .
             |-- xxxxxx.png
     |-- uid_2
         |-- session_1     
             |-- 000000.png
             |-- 000001.png
                   .
                   .
             |-- xxxxxx.png
         |-- session_2     
             |-- 000000.png
             |-- 000001.png
                   .
                   .
             |-- xxxxxx.png
     |-- uid_3
         |-- session_1     
             |-- 000000.png
             |-- 000001.png
                   .
                   .
             |-- xxxxxx.png


For each set we also prepare a metadata file that captures fields that can be used for dataset sampling.

.. code::

    {
        "set": "data", 
        "users": {
            "uid_1": {
                "location": "outdoor", 
                "illumination": "good",
                "class_fps": {
                    "session_1": 30, 
                    "session_2": 30
                }
            }, 
            "uid_2": {
                "location": "indoor", 
                "illumination": "good",
                "class_fps": {
                    "session_1": 10, 
                    "session_2": 15
                }
            }, 
            "uid_3": {
                "location": "indoor", 
                "illumination": "poor",
                "class_fps": {
                    "session_1": 10
                }
            }
        }
    }
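
The sketch below illustrates how such a metadata file could support sampling and user segregation.
It is not the GestureNet implementation; :code:`filter_users` and :code:`split_users` are
hypothetical helpers, and the validation fraction and seed are arbitrary values chosen for this
example.

.. code:: python

    import random


    def filter_users(metadata, **required):
        """Return user IDs whose metadata fields equal all the required values."""
        return sorted(
            uid
            for uid, info in metadata["users"].items()
            if all(info.get(key) == value for key, value in required.items())
        )


    def split_users(user_ids, val_fraction=0.2, seed=0):
        """Split user IDs into (train, val) so that no user appears in both sets."""
        ids = sorted(user_ids)
        random.Random(seed).shuffle(ids)
        num_val = max(1, round(len(ids) * val_fraction))
        return ids[num_val:], ids[:num_val]

For the metadata above, :code:`filter_users(metadata, location="indoor")` selects ``uid_2`` and
``uid_3``. Splitting by user ID rather than by image keeps every session of a subject in a single
set.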

Label Format
^^^^^^^^^^^^
Each image corresponds to a subject performing a gesture. Each image requires a corresponding label
JSON file that contains a bounding box for the hand of interest and a gesture label. The labels
follow the `Label Studio <https://labelstud.io/>`_ format. A sample label for an image is:

.. code::

    {
      "completions": [
        {
          "result": [
            {
              "type": "rectanglelabels",
              "original_width": 320,
              "original_height": 240,
              "value": {
                "x": 58.1,
                "y": 18.3,
                "width": 18.8,
                "height": 49.5
              }
            },
            {
              "type": "choices",
              "value": {
                "choices": [
                  "Thumbs-up"
                ]
              }
            }
          ]
        }
      ],
      "task_path": "/workspace/tlt-experiments/gesturenet/data/uid_1/session_1/image_0001.png"
    }

* :code:`task_path`: specifies the full path to the image.

* :code:`completions`: This is a chunk that contains the labels under :code:`result`.

The bounding box and gesture class are separate entries with the following :code:`type` values:

* :code:`rectanglelabels`: specifies the label corresponding to hand bounding box.

+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+----------------+
| **Parameter name**                                 | **Description**                                                                              | **Type**             | **Range**      |
+====================================================+==============================================================================================+======================+================+
| type                                               | The type of label                                                                            | String               | rectanglelabels| 
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+----------------+
| original_width                                     | Width of image being labelled (in pixels)                                                    | Integer              | [1, inf)       |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+----------------+
| original_height                                    | Height of image being labelled (in pixels)                                                   | Integer              | [1, inf)       |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+----------------+
| value["x"]                                         | x coordinate of top left corner of hand bounding box (as a percentage of image width)        | Float                | [0, 100]       |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+----------------+
| value["y"]                                         | y coordinate of top left corner of hand bounding box (as a percentage of image height)       | Float                | [0, 100]       |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+----------------+
| value["width"]                                     | Width of the hand bounding box (as a percentage of image width)                              | Float                | [0, 100]       |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+----------------+
| value["height"]                                    | Height of the hand bounding box (as a percentage of image height)                            | Float                | [0, 100]       |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+----------------+

* :code:`choices`: specifies the label corresponding to gesture class.

+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+------------------------+
| **Parameter name**                                 | **Description**                                                                              | **Type**             | **Range**              |
+====================================================+==============================================================================================+======================+========================+
| type                                               | The type of label                                                                            | String               | choices                | 
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+------------------------+
| value["choices"]                                   | List of attributes. For GestureNet app this will be a single entry with gesture class name   | List of strings      | Valid gesture classes  |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+------------------------+
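
Because the bounding box values are percentages of the image size, they need to be converted before
they can be used as pixel coordinates. The following is a minimal sketch of reading one label; the
function name :code:`parse_gesture_label` is hypothetical, and it assumes the label JSON has
already been loaded into a dictionary with the structure shown above.

.. code:: python

    def parse_gesture_label(label):
        """Return ([xmin, ymin, xmax, ymax] in pixels, gesture class name)."""
        bbox, gesture = None, None
        for completion in label["completions"]:
            for result in completion["result"]:
                value = result["value"]
                if result["type"] == "rectanglelabels":
                    # x and width are percentages of the width; y and height of the height.
                    image_w = result["original_width"]
                    image_h = result["original_height"]
                    xmin = value["x"] / 100.0 * image_w
                    ymin = value["y"] / 100.0 * image_h
                    box_w = value["width"] / 100.0 * image_w
                    box_h = value["height"] / 100.0 * image_h
                    bbox = [xmin, ymin, xmin + box_w, ymin + box_h]
                elif result["type"] == "choices":
                    # For GestureNet the list holds a single gesture class name.
                    gesture = value["choices"][0]
        return bbox, gesture

For the sample label above, this yields the gesture ``Thumbs-up`` and a box of approximately
``[185.9, 43.9, 246.1, 162.7]`` pixels in the 320 x 240 image.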

The :code:`dataset_convert` tool requires extraction and experiment configuration spec files as
input. The details of the configuration files and sample usage examples are included on the
:ref:`Gesture Recognition <gesturenet>` page.

Heart Rate Estimation -- Custom Format 
--------------------------------------

.. _heart_rate_estimation_custom_format:

HeartRateNet expects directories of images in the format shown below. The images and ground truth
labels are then converted to TFRecords for training.

.. code::

    Subject_001/
        ground_truth.csv
        image_timestamps.csv
        images/
            0000.bmp
            0001.bmp
                .
                .
            N.bmp
    .
    .
    Subject_M/
        ground_truth.csv
        image_timestamps.csv
        images/
            0000.bmp
            0001.bmp
                .
                .
            Y.bmp


EmotionNet, FPENet, GazeNet -- JSON Label Data Format
-----------------------------------------------------

.. _json_label_data_format:

EmotionNet, FPENet, and GazeNet use the same JSON data format labeled by the NVIDIA data factory
team. These apps expect data in this JSON data format for training and evaluation.
For all three apps, this data is converted to TFRecords for training.
TFRecords help iterate faster through the data. Please refer to the corresponding section for the
JSON data format descriptions.

Using the JSON label data format requires data to be organized in a JSON file with the following structure:

.. code::

   .
   {
        "filename": "data/001_01_02_200_06.png", 
        "class": "image", 
        "annotations": [
            {
                "class": "FaceBbox",
                "tool-version": "1.0",
                "Occlusion": 0, 
                "face_outer_bboxx": 269.0082935424086,
                "face_outer_bboxy": 44.33839032556304, 
                "face_outer_bboxwidth": 182.97858097042064, 
                "face_outer_bboxheight": 276.28773076003836,
                "face_tight_bboxx": 269.211755426433, 
                "face_tight_bboxy": 147.9049289218409, 
                "face_tight_bboxwidth": 182.58110482105968, 
                "face_tight_bboxheight": 172.5088694283426
            }, 
            { 
                "class": "FiducialPoints",
                "tool-version": "1.0",
                "P1x": 304.8502837500011,
                "P1y": 217.10946645000078,
                "P2x": 311.0173699500011,
                "P2y": 237.15249660000086,
                .
                .
                "P26occluded": true,
                "P46occluded": true,
                .
                .
                "P68x": 419.5885050000024, 
                "P68y": 267.6976650000015,
                .
                .
                "P104x": 429.6,
                "P104y": 189.5
            },
            {
                "class": "eyes",
                "tool-version": "1.0",
                "l_eyex": 389.1221901922325,
                "l_eyey": 197.94528259092206,
                "r_eyex": 633.489814294182,
                "r_eyey": 10.52527209626886,
                "l_status": "open",
                "r_status": "occluded"
            }
        ]
    }

Here's a description of the structure:

* :code:`filename` field: specifies the path to the image to train on.
* :code:`class` field: category of the labels for the respective section.
* :code:`annotations` field: list of annotation chunks.

Three chunks are supported in the annotations: FaceBbox, FiducialPoints, and eyes.
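
Since each chunk carries its own :code:`class` value, a record can be indexed by chunk class once
it has been loaded. The sketch below is illustrative only; :code:`chunks_by_class` and
:code:`outer_face_box` are hypothetical helper names, and the field names are taken from the
sample record above.

.. code:: python

    def chunks_by_class(record):
        """Map each annotation chunk's class name to the chunk itself."""
        return {chunk["class"]: chunk for chunk in record.get("annotations", [])}


    def outer_face_box(record):
        """Return the outer face box as [xmin, ymin, xmax, ymax], or None if absent."""
        face = chunks_by_class(record).get("FaceBbox")
        if face is None:
            return None
        xmin = face["face_outer_bboxx"]
        ymin = face["face_outer_bboxy"]
        # The chunk stores the top-left corner plus width and height.
        return [
            xmin,
            ymin,
            xmin + face["face_outer_bboxwidth"],
            ymin + face["face_outer_bboxheight"],
        ]

A record without a :code:`FaceBbox` chunk returns :code:`None`, which makes missing annotations
easy to detect.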

* :code:`FaceBbox` chunk: This chunk describes the face bounding box labeling information.

+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| **Parameter name**                                 | **Description**                                                                              | **Type**             | **Range**                                                                                     | **Example**            |
+====================================================+==============================================================================================+======================+===============================================================================================+========================+
| class                                              | The class for the annotation chunk                                                           | String               | N/A                                                                                           | FaceBbox               |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`tool-version`                               | Version of the labeling tool for this chunk                                                  | Float                | N/A                                                                                           | :code:`1.0`            |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`Occlusion`                                  | Occlusion state [ 0 = not occluded, 1 = occluded ]                                           | Integer              | 0 or 1                                                                                        | :code:`0`              |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`face_outer_bboxx`                           | x coordinate of top left corner of outer face bounding box                                   | Float                | [0, image_width]                                                                              | :code:`269.05`         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`face_outer_bboxy`                           | y coordinate of top left corner of outer face bounding box                                   | Float                | [0, image_height]                                                                             | :code:`44.33`          |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`face_outer_bboxwidth`                       | Width of the outer face bounding box                                                         | Float                | [0, image_width]                                                                              | :code:`182.97`         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`face_outer_bboxheight`                      | Height of the outer face bounding box                                                        | Float                | [0, image_height]                                                                             | :code:`276.28`         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`face_tight_bboxx`                           | x coordinate of top left corner of tight face bounding box                                   | Float                | [0, image_width]                                                                              | :code:`269.21`         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`face_tight_bboxy`                           | y coordinate of top left corner of tight face bounding box                                   | Float                | [0, image_height]                                                                             | :code:`147.90`         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`face_tight_bboxwidth`                       | Width of the tight face bounding box                                                         | Float                | [0, image_width]                                                                              | :code:`182.58`         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`face_tight_bboxheight`                      | Height of the tight face bounding box                                                        | Float                | [0, image_height]                                                                             | :code:`172.50`         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+

* :code:`FiducialPoints` chunk: This chunk describes fiducial point labeling information.

+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| **Parameter name**                                 | **Description**                                                                              | **Type**             | **Range**                                                                                     | **Example**            |
+====================================================+==============================================================================================+======================+===============================================================================================+========================+
| class                                              | The class for the annotation chunk                                                           | String               | N/A                                                                                           | FiducialPoints         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`tool-version`                               | Version of the labeling tool for this chunk                                                  | Float                | N/A                                                                                           | :code:`1.0`            |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`Occlusion`                                  | Occlusion status [ 0 = not occluded, 1 = occluded ]                                          | Integer              | 0 or 1                                                                                        | :code:`0`              |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`Pix`                                        | x coordinate of the ith landmark point                                                       | Float                | [0, image_width]                                                                              | :code:`304.85`         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`Piy`                                        | y coordinate of the ith landmark point                                                       | Float                | [0, image_height]                                                                             | :code:`217.10`         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`Pioccluded`                                 | Whether the ith landmark is occluded                                                         | String               | N/A                                                                                           | :code:`true`           |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+

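The per-point keys in this chunk can be regrouped by landmark index. The following is a minimal
sketch rather than part of TLT: the helper name :code:`collect_landmarks` is hypothetical, and
treating a missing :code:`Pioccluded` key as "not occluded" is an assumption that this page does
not confirm.

.. code:: python

    import re

    # Matches keys such as "P1x", "P104y", or "P74occluded".
    _KEY = re.compile(r"^P(\d+)(x|y|occluded)$")

    def collect_landmarks(chunk):
        """Group Pix / Piy / Pioccluded keys into one record per landmark index."""
        points = {}
        for key, value in chunk.items():
            match = _KEY.match(key)
            if match is None:
                continue  # skip "class", "tool-version", "version", ...
            index = int(match.group(1))
            field = match.group(2)
            # Assumption: a landmark without a Pioccluded key is not occluded.
            record = points.setdefault(index, {"x": None, "y": None, "occluded": False})
            record[field] = value
        # Return the records ordered by landmark index.
        return dict(sorted(points.items()))

    # Illustrative values only, shortened from the kind of data shown on this page.
    sample = {
        "class": "FiducialPoints",
        "tool-version": "1.0",
        "P2x": 311.01, "P2y": 237.15,
        "P1x": 304.85, "P1y": 217.10,
        "P74occluded": True,
    }
    landmarks = collect_landmarks(sample)
    print(landmarks[1])

Coordinates stay :code:`None` for a landmark that only carries an occlusion flag, so callers can
tell missing coordinates apart from a point at the origin.
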
* :code:`eyes` chunk: This chunk describes eye labeling information. It is optional.

+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| **Parameter name**                                 | **Description**                                                                              | **Type**             | **Range**                                                                                     | **Example**            |
+====================================================+==============================================================================================+======================+===============================================================================================+========================+
| class                                              | The class for the annotation chunk                                                           | String               | N/A                                                                                           | eyes                   |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`tool-version`                               | Version of the labeling tool for this chunk                                                  | Float                | N/A                                                                                           | :code:`1.0`            |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`l_eyex`                                     | x coordinate of left eye center                                                              | Float                | [0, image_width]                                                                              | :code:`389.12`         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`l_eyey`                                     | y coordinate of left eye center                                                              | Float                | [0, image_height]                                                                             | :code:`197.94`         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`r_eyex`                                     | x coordinate of right eye center                                                             | Float                | [0, image_width]                                                                              | :code:`633.48`         |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`r_eyey`                                     | y coordinate of right eye center                                                             | Float                | [0, image_height]                                                                             | :code:`10.52`          |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`l_status`                                   | Status of the left eye                                                                       | String               | open/close/barely open/half open/occluded                                                     | :code:`open`           |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+
| :code:`r_status`                                   | Status of the right eye                                                                      | String               | open/close/barely open/half open/occluded                                                     | :code:`occluded`       |
+----------------------------------------------------+----------------------------------------------------------------------------------------------+----------------------+-----------------------------------------------------------------------------------------------+------------------------+

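Because every chunk carries a :code:`class` key, the chunks of one frame can be looked up by
name, and a bounding box given as x, y, width, and height can be turned into corner coordinates.
The sketch below is illustrative only: the helper names :code:`index_chunks` and
:code:`bbox_corners`, the file name :code:`data/example.png`, and all numeric values are
hypothetical, and it assumes each class appears at most once per frame.

.. code:: python

    import json

    def index_chunks(frame):
        """Map each annotation chunk's "class" value to the chunk itself."""
        return {chunk["class"]: chunk for chunk in frame.get("annotations", [])}

    def bbox_corners(chunk, kind="tight"):
        """Convert (x, y, width, height) into (x_min, y_min, x_max, y_max)."""
        prefix = f"face_{kind}_bbox"  # kind is "tight" or "outer"
        x, y = chunk[prefix + "x"], chunk[prefix + "y"]
        return (x, y, x + chunk[prefix + "width"], y + chunk[prefix + "height"])

    # Hypothetical one-frame dataset with made-up values.
    text = """[{"filename": "data/example.png", "class": "image", "annotations": [
      {"class": "FaceBbox", "tool-version": "1.0",
       "face_tight_bboxx": 10.0, "face_tight_bboxy": 20.0,
       "face_tight_bboxwidth": 30.0, "face_tight_bboxheight": 40.0},
      {"class": "eyes", "tool-version": "1.0",
       "l_status": "open", "r_status": "occluded"}
    ]}]"""

    frames = json.loads(text)
    chunks = index_chunks(frames[0])
    corners = bbox_corners(chunks["FaceBbox"])
    print(corners)  # (10.0, 20.0, 40.0, 60.0)

Checking :code:`"eyes" in chunks` before reading eye fields keeps the code safe for frames that
omit that chunk.
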
Here's an example of a JSON file for a sample dataset with two image frames:

.. code::

    [
        {
            "filename": "data/001_01_02_200_06.png", 
            "class": "image", 
            "annotations": [
                {
                    "face_outer_bboxy": 44.33839032556304, 
                    "face_outer_bboxx": 269.0082935424086, 
                    "face_tight_bboxx": 269.211755426433, 
                    "face_tight_bboxy": 147.9049289218409, 
                    "tool-version": "1.0", 
                    "face_tight_bboxwidth": 182.58110482105968, 
                    "face_tight_bboxheight": 172.5088694283426, 
                    "face_outer_bboxwidth": 182.97858097042064, 
                    "Occlusionx": 0, 
                    "class": "FaceBbox", 
                    "face_outer_bboxheight": 276.28773076003836
                }, 
                {
                    "P91x": 395.3500000000004, 
                    "P91y": 196.6500000000002, 
                    "P74occluded": true, 
                    "P28x": 436.44144340908053, 
                    "P28y": 174.67157210032852, 
                    "P52y": 252.53100000000143, 
                    "P52x": 428.9925000000024, 
                    "P32y": 236.48449500000103, 
                    "P32x": 416.6063550000018, 
                    "P44x": 427.65443026467267, 
                    "P44y": 186.9615161604129, 
                    "P99x": 425.75, 
                    "P36occluded": true, 
                    "P75x": 428.85, 
                    "P75y": 190.95000000000002, 
                    "P20x": 389.46879000000166, 
                    "P20y": 178.13376000000076, 
                    "P8y": 313.8318038340011, 
                    "P8x": 407.70466707150143, 
                    "P81y": 192.2500000000002, 
                    "P94x": 427.70000000000005, 
                    "P81x": 393.5500000000004, 
                    "P12y": 268.179948238501, 
                    "P12x": 408.69280247400155, 
                    "P65y": 260.04348000000147, 
                    "P65x": 429.0319800000024, 
                    "P84x": 396.8500000000004, 
                    "P84y": 194.4500000000002, 
                    "P93occluded": true, 
                    "P46occluded": true, 
                    "P43y": 193.31428917697824, 
                    "P43x": 421.12354211680173, 
                    "P14occluded": true, 
                    "P92y": 187.5, 
                    "P54occluded": true, 
                    "P53x": 433.50450000000245, 
                    "P53y": 251.9670000000014, 
                    "P45occluded": true, 
                    "P33x": 426.3480450000019, 
                    "P33y": 238.67140500000104, 
                    "P60x": 413.82301500000233, 
                    "P100occluded": true, 
                    "P60y": 272.07148500000153, 
                    "P23y": 174.7903155211989, 
                    "P23x": 428.12940394815394, 
                    "P90y": 194.9000000000002, 
                    "P13x": 399.2067026100015, 
                    "P13y": 257.903340052501, 
                    "P7x": 388.1395861020014, 
                    "P7y": 304.93858521150105, 
                    "P61y": 262.1309850000015, 
                    "P104x": 429.6, 
                    "P104y": 189.5, 
                    "P83y": 193.2500000000002, 
                    "P83x": 395.0000000000004, 
                    "P61x": 404.5783500000023, 
                    "P50y": 254.6756100000014, 
                    "P50x": 414.2206350000023, 
                    "P100x": 424.8, 
                    "P100y": 191.3, 
                    "P34y": 240.46069500000107, 
                    "P34x": 435.9903300000019, 
                    "P18y": 188.2730700000008, 
                    "P18x": 366.50623500000154, 
                    "P25occluded": true, 
                    "P102occluded": true, 
                    "P46x": 436.0852131464696, 
                    "P46y": 191.82999641609848, 
                    "P58y": 275.0536350000016, 
                    "P58x": 429.2307900000024, 
                    "P77x": 306.5418228495726, 
                    "P77y": 258.61884245799524, 
                    "P97occluded": true, 
                    "P99y": 192.9, 
                    "P10y": 293.87146870350114, 
                    "P10x": 434.97720418050164, 
                    "P48occluded": true, 
                    "P26x": 436.0258414360342, 
                    "P26y": 171.99984513074497, 
                    "version": "v1", 
                    "P27occluded": true, 
                    "P86x": 397.8000000000004, 
                    "P86y": 198.45000000000022, 
                    "P73occluded": true, 
                    "P98occluded": true, 
                    "P2y": 237.15249660000086, 
                    "P90x": 393.3500000000004, 
                    "P29y": 203.3826300000009, 
                    "P29x": 433.6046100000019, 
                    "P101y": 188.85000000000002, 
                    "P101x": 425.65000000000003, 
                    "P51x": 423.6641100000023, 
                    "P51y": 252.5881050000014, 
                    "P35x": 436.78557000000194, 
                    "P35y": 239.26783500000104, 
                    "P66x": 433.70401500000247, 
                    "P66y": 268.0952850000015, 
                    "P19x": 378.4348350000016, 
                    "P19y": 181.61293500000076, 
                    "P98y": 193.45000000000002, 
                    "P98x": 427.85, 
                    "P45y": 187.0802595812833, 
                    "P45x": 433.2353710455805, 
                    "P21y": 176.44387500000076, 
                    "P21x": 398.1170250000017, 
                    "P59x": 422.1730350000024, 
                    "P59y": 274.25839500000154, 
                    "P9x": 431.0246625705015, 
                    "P9y": 312.25078719000106, 
                    "P17occluded": true, 
                    "P11x": 422.7243251895016, 
                    "P11y": 281.81621679300105, 
                    "P70y": 195.95000000000002, 
                    "P79occluded": true, 
                    "P95occluded": true, 
                    "P70x": 395.20000000000005, 
                    "P1x": 304.8502837500011, 
                    "P13occluded": true, 
                    "P85y": 196.6500000000002, 
                    "P85x": 398.1000000000004, 
                    "P69y": 196.95000000000002, 
                    "P24x": 433.0572559142747, 
                    "P36y": 236.88211500000105, 
                    "P36x": 427.5409050000019, 
                    "P94occluded": true, 
                    "P104occluded": true, 
                    "P47occluded": true, 
                    "P40x": 401.35650000000186, 
                    "P40y": 197.40000000000092, 
                    "P71x": 396.40000000000003, 
                    "P71y": 196.8, 
                    "P65occluded": true, 
                    "P26occluded": true, 
                    "P56y": 273.06553500000155, 
                    "P56x": 433.0081800000024, 
                    "P16occluded": true, 
                    "P89y": 196.2500000000002, 
                    "P89x": 392.4500000000004, 
                    "P48x": 428.54500592120047, 
                    "P48y": 195.45167075264504, 
                    "P16y": 216.4016531475008, 
                    "P16x": 360.47179483200136, 
                    "P15occluded": true, 
                    "P24y": 170.63429579073562, 
                    "P78x": 276.3975906000002, 
                    "class": "FiducialPoints", 
                    "P74y": 190.10000000000002, 
                    "P4y": 270.1562190435009, 
                    "P4x": 329.2467161130011, 
                    "P96y": 191.10000000000002, 
                    "P74x": 427.85, 
                    "P103y": 195.00000000000003, 
                    "P103x": 396.4500000000001, 
                    "P80x": 330.41417158035716, 
                    "P80y": 178.5832276794402, 
                    "P37x": 381.05250000000177, 
                    "P37y": 200.64300000000094, 
                    "P47y": 195.09544049003392, 
                    "P47x": 433.47285788732125, 
                    "P64x": 432.80937000000245, 
                    "P64y": 255.47085000000143, 
                    "P76y": 191.60000000000002, 
                    "P57y": 271.77327000000156, 
                    "P99occluded": true, 
                    "P43occluded": true, 
                    "P88x": 392.8500000000004, 
                    "P88y": 198.45000000000022, 
                    "P17x": 335.9660368500013, 
                    "P17y": 206.7179262030008, 
                    "P96x": 431.05, 
                    "P67y": 268.3935000000015, 
                    "P27y": 173.42476618118954, 
                    "P27x": 436.38207169864535, 
                    "P87y": 199.45000000000022, 
                    "P87x": 395.1000000000004, 
                    "P3x": 316.76397300000116, 
                    "P67x": 426.8450700000024, 
                    "P96occluded": true, 
                    "P12occluded": true, 
                    "P97x": 430.35, 
                    "P97y": 193.05, 
                    "P101occluded": true, 
                    "P55occluded": true, 
                    "P93x": 429.05, 
                    "P93y": 195.4, 
                    "P42x": 388.6665000000018, 
                    "P42y": 200.64300000000094, 
                    "P79y": 238.89320075909347, 
                    "P54y": 252.24900000000142, 
                    "P54x": 431.5305000000024, 
                    "P73x": 427.05, 
                    "P73y": 191, 
                    "P68y": 267.6976650000015, 
                    "P30y": 214.61539500000094, 
                    "P30x": 440.86117500000194, 
                    "P14y": 243.47656317600092, 
                    "P14x": 384.18704449200146, 
                    "P63y": 254.87442000000144, 
                    "P76occluded": true, 
                    "P22x": 406.8646650000017, 
                    "P22y": 176.94090000000077, 
                    "P28occluded": true, 
                    "P6y": 296.24299366950106, 
                    "P6x": 367.5863697300013, 
                    "P92x": 428.85, 
                    "P38y": 193.3815000000009, 
                    "P38x": 388.5255000000018, 
                    "P94y": 188.5, 
                    "P72y": 197.70000000000002, 
                    "P72x": 395.65000000000003, 
                    "P78y": 210.5218971000002, 
                    "P63x": 427.8391200000024, 
                    "P35occluded": true, 
                    "P82x": 393.8000000000004, 
                    "P82y": 200.95000000000022, 
                    "P11occluded": true, 
                    "tool-version": "1.0", 
                    "P41y": 200.99550000000093, 
                    "P41x": 396.5625000000018, 
                    "P56occluded": true, 
                    "P55x": 425.0508679558401, 
                    "P55y": 259.9172483306748, 
                    "P31x": 449.410005000002, 
                    "P31y": 225.351135000001, 
                    "P1y": 217.10946645000078, 
                    "P75occluded": true, 
                    "P62x": 420.38374500000236, 
                    "P62y": 256.06728000000146, 
                    "P15x": 373.5151821450014, 
                    "P15y": 228.45690505800087, 
                    "P49y": 261.4140000000014, 
                    "P49x": 400.0875000000022, 
                    "P25y": 170.87178263247637, 
                    "P25x": 435.25400920037674, 
                    "P2x": 311.0173699500011, 
                    "P80occluded": true, 
                    "P3y": 251.86940685000093, 
                    "P39x": 397.33800000000184, 
                    "P39y": 192.1830000000009, 
                    "P69x": 394.6, 
                    "P5x": 347.3103508088991, 
                    "P5y": 287.4697160411496, 
                    "P95x": 430, 
                    "P95y": 189.25, 
                    "P79x": 368.8999131564783, 
                    "P57x": 434.7974700000025, 
                    "P102x": 428.1, 
                    "P102y": 190.85000000000002, 
                    "P76x": 428.25
                }, 
                {
                    "l_eyex": 389.1221901922325, 
                    "l_eyey": 197.94528259092206, 
                    "tool-version": "1.0", 
                    "l_status": "open", 
                    "r_status": "occluded", 
                    "r_eyex": 633.489814294182, 
                    "r_eyey": 10.52527209626886, 
                    "class": "eyes"
                }
            ]
        }, 
        {
            "filename": "data/001_03_01_130_05.png", 
            "class": "image", 
            "annotations": [
                {
                    "face_outer_bboxy": 36.21548211860577, 
                    "face_outer_bboxx": 259.54428851667467, 
                    "face_tight_bboxx": 265.58020220310897, 
                    "face_tight_bboxy": 116.19133846386018, 
                    "tool-version": "1.0", 
                    "face_tight_bboxwidth": 191.64025954428882, 
                    "face_tight_bboxheight": 192.64624515869457, 
                    "face_outer_bboxwidth": 198.68215884512887, 
                    "Occlusionx": 0, 
                    "class": "FaceBbox", 
                    "face_outer_bboxheight": 273.62808711835464
                }, 
                {
                    "P91x": 283.35, 
                    "P91y": 179.55, 
                    "P28x": 304.14947850000084, 
                    "P28y": 176.3226009000005, 
                    "P5occluded": true, 
                    "P52y": 244.28250000000094, 
                    "P52x": 305.0535000000012, 
                    "P32y": 220.38088500000066, 
                    "P32x": 289.76557500000087, 
                    "P44x": 334.8750000000012, 
                    "P44y": 168.63600000000062, 
                    "P99x": 340.20000000000005, 
                    "P99y": 174.75, 
                    "P75x": 343.90000000000003, 
                    "P75y": 171.70000000000002, 
                    "P20x": 269.9839800000006, 
                    "P20y": 158.94859500000035, 
                    "P8y": 299.437842994699, 
                    "P8x": 301.7845345542186, 
                    "P94x": 342.70000000000005, 
                    "P12y": 272.68555921617576, 
                    "P12x": 389.08146056834715, 
                    "P65y": 249.500000000001, 
                    "P65x": 321.9500000000013, 
                    "P84x": 285.8, 
                    "P84y": 175.5, 
                    "P43y": 176.03850000000065, 
                    "P43x": 329.9400000000012, 
                    "P68x": 302.05, 
                    "P68y": 252.55, 
                    "P92y": 165.70000000000002, 
                    "P92x": 343.40000000000003, 
                    "P53x": 311.11650000000117, 
                    "P53y": 241.18050000000093, 
                    "P33x": 295.53106500000086, 
                    "P33y": 224.95351500000066, 
                    "P60x": 297.5100000000011, 
                    "P60y": 258.382500000001, 
                    "P23y": 149.55274915302633, 
                    "P23x": 325.5457633496816, 
                    "P90y": 177.15, 
                    "P13x": 406.681647264744, 
                    "P13y": 256.7280566114426, 
                    "P7x": 292.6324374720922, 
                    "P90x": 280.25, 
                    "P58x": 309.7065000000012, 
                    "P61y": 253.51800000000097, 
                    "P104x": 346.0500000000002, 
                    "P104y": 171.35000000000008, 
                    "P83y": 174.70000000000002, 
                    "P83x": 282.15000000000003, 
                    "P61x": 296.31150000000116, 
                    "P50y": 249.21750000000097, 
                    "P50x": 294.19650000000115, 
                    "P100x": 339.45000000000005, 
                    "P100y": 171.75, 
                    "P34y": 224.85411000000067, 
                    "P34x": 300.8989350000009, 
                    "P18y": 170.97660000000036, 
                    "P18x": 268.59231000000057, 
                    "P46x": 357.7170000000013, 
                    "P46y": 172.86600000000064, 
                    "P58y": 258.664500000001, 
                    "P4occluded": true, 
                    "P77x": 300.22496910000007, 
                    "P77y": 221.17413690000006, 
                    "tool-version": "1.0", 
                    "P10y": 298.73383552684317, 
                    "P10x": 341.67829106605154, 
                    "P26x": 361.5470170921228, 
                    "P26y": 148.3723801778643, 
                    "version": "v1", 
                    "P86x": 286.6, 
                    "P86y": 181.9, 
                    "P2y": 204.16216567820388, 
                    "P2x": 300.6111887744588, 
                    "P29y": 189.63790065000055, 
                    "P29x": 301.90690170000084, 
                    "P101y": 168.8, 
                    "P101x": 340.40000000000003, 
                    "P51x": 298.49700000000115, 
                    "P51y": 243.57750000000092, 
                    "P35x": 310.83943500000095, 
                    "P35y": 223.36303500000068, 
                    "P66x": 313.85, 
                    "P66y": 251.05, 
                    "P19x": 267.8964750000006, 
                    "P19y": 165.11170500000037, 
                    "P98y": 176.55, 
                    "P98x": 342.85, 
                    "P45y": 165.8865000000006, 
                    "P45x": 347.2125000000013, 
                    "P21y": 156.26466000000033, 
                    "P21x": 276.4453050000006, 
                    "P59x": 303.2910000000012, 
                    "P59y": 258.664500000001, 
                    "P9x": 316.33402222324, 
                    "P9y": 303.89655695778623, 
                    "P17occluded": true, 
                    "P11x": 366.0838832850552, 
                    "P11y": 286.0617011054374, 
                    "P70y": 178.35000000000002, 
                    "P70x": 283.15000000000003, 
                    "P1x": 307.6512634530176, 
                    "P1y": 189.14333969727852, 
                    "P85y": 178.60000000000002, 
                    "P85x": 287.5, 
                    "P69y": 179.3, 
                    "P69x": 282.65000000000003, 
                    "P36y": 219.78445500000066, 
                    "P36x": 326.446020000001, 
                    "P77occluded": true, 
                    "P81y": 173.25, 
                    "P81x": 281.95, 
                    "P40x": 298.00350000000105, 
                    "P40y": 178.85850000000062, 
                    "P71x": 283.95, 
                    "P71y": 178.9, 
                    "P56y": 254.505000000001, 
                    "P56x": 323.24250000000126, 
                    "P7y": 284.1843478578217, 
                    "P89y": 179.85000000000002, 
                    "P89x": 279.90000000000003, 
                    "P48x": 338.96400000000125, 
                    "P48y": 177.51900000000066, 
                    "P16y": 205.1008423020117, 
                    "P16x": 420.9964657778135, 
                    "P24x": 338.1757113839151, 
                    "P24y": 146.9559374076699, 
                    "class": "FiducialPoints", 
                    "P74y": 170.85000000000002, 
                    "P4y": 234.6691559519585, 
                    "P4x": 290.9897533804285, 
                    "P96y": 172, 
                    "P74x": 342.95000000000005, 
                    "P3occluded": true, 
                    "P78occluded": true, 
                    "P103y": 179.4, 
                    "P103x": 285.55, 
                    "P80x": 444.78450000000055, 
                    "P80y": 173.78250000000023, 
                    "P37x": 275.44350000000094, 
                    "P37y": 182.80650000000063, 
                    "P47y": 176.95500000000064, 
                    "P47x": 350.5965000000013, 
                    "P64x": 313.45000000000124, 
                    "P64y": 249.95000000000098, 
                    "P76y": 172.65, 
                    "P57y": 257.113500000001, 
                    "P6occluded": true, 
                    "P88x": 281.1, 
                    "P88y": 182.60000000000002, 
                    "P17x": 420.2924583099576, 
                    "P17y": 187.96999391751874, 
                    "P96x": 347.70000000000005, 
                    "P67y": 252.45000000000002, 
                    "P27y": 153.68404056609333, 
                    "P27x": 370.04567371328926, 
                    "P87y": 183.5, 
                    "P87x": 283.95, 
                    "P3x": 295.44846734351574, 
                    "P67x": 307.95000000000005, 
                    "P2occluded": true, 
                    "P97x": 346.25, 
                    "P97y": 175.35000000000002, 
                    "P93x": 344.35, 
                    "P93y": 177.8, 
                    "P42x": 280.30800000000096, 
                    "P42y": 185.34450000000064, 
                    "P54y": 243.78900000000093, 
                    "P54x": 321.0570000000012, 
                    "P73x": 342.5, 
                    "P73y": 171.95000000000002, 
                    "P30y": 201.83191200000059, 
                    "P30x": 298.26271440000085, 
                    "P14y": 239.83187738290155, 
                    "P14x": 416.77242097067824, 
                    "P63y": 251.20000000000098, 
                    "P63x": 307.7500000000012, 
                    "P22x": 284.2983000000006, 
                    "P22y": 157.95454500000034, 
                    "P1occluded": true, 
                    "P6y": 270.10419850070423, 
                    "P6x": 289.34706928876477, 
                    "P38y": 176.17950000000062, 
                    "P38x": 277.06500000000096, 
                    "P94y": 166.65, 
                    "P72y": 180.35000000000002, 
                    "P72x": 283.65000000000003, 
                    "P78y": 176.32260090000005, 
                    "P78x": 318.5860666500001, 
                    "P82x": 285.6, 
                    "P82y": 185.35000000000002, 
                    "P32occluded": true, 
                    "P41y": 183.37050000000062, 
                    "P41x": 290.460000000001, 
                    "P55x": 334.9455000000013, 
                    "P55y": 251.89650000000097, 
                    "P31x": 295.59965445000086, 
                    "P31y": 213.04479600000062, 
                    "P79y": 232.39037411700184, 
                    "P62x": 301.6000000000012, 
                    "P62y": 251.30000000000098, 
                    "P15x": 420.05778915400566, 
                    "P15y": 222.93569815436055, 
                    "P49y": 256.690500000001, 
                    "P49x": 291.65850000000114, 
                    "P25y": 145.8936053300241, 
                    "P25x": 350.45154872559993, 
                    "P3y": 218.94632250317727, 
                    "P39x": 286.30050000000097, 
                    "P39y": 173.50050000000059, 
                    "P5x": 288.8777309768609, 
                    "P5y": 253.44268842811516, 
                    "P95x": 346.6, 
                    "P95y": 168.25, 
                    "P79x": 431.0439612134159, 
                    "P57x": 315.4875000000012, 
                    "P102x": 343.1, 
                    "P102y": 172, 
                    "P76x": 343.20000000000005
                }, 
                {
                    "l_eyex": 289.90000000000003, 
                    "l_eyey": 179.60000000000002, 
                    "tool-version": "1.0", 
                    "l_status": "open", 
                    "r_status": "open", 
                    "r_eyex": 337.4000000000001, 
                    "r_eyey": 173.35000000000005, 
                    "class": "eyes"
                }
            ]
        } 


BodyposeNet -- COCO Format
--------------------------

.. _bodyposenet_coco_format:

Using the COCO format requires data to be organized in this structure:

.. code::

    |--dataset root
        |-- train2017
            |-- 000000001000.jpg
            |-- 000000001001.jpg
                .
                .
            |-- xxxxxxxxxxxx.jpg
        |-- val2017
            |-- 000000002000.jpg
            |-- 000000002001.jpg
                .
                .
            |-- xxxxxxxxxxxx.jpg
        |-- annotations
            |-- person_keypoints_train2017.json
            |-- person_keypoints_val2017.json

You can use a nested directory structure for the train and validation images, as long as there is
a single dataset root and the :code:`images->file_name` field in the annotations is adjusted
accordingly.
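
Before training, it can help to confirm that every :code:`file_name` in an annotation file points
to an existing image. The following is a minimal sketch and is not part of TLT: the helper name
:code:`missing_images` is hypothetical, and it assumes that :code:`file_name` values are relative
to the image directory you pass in.

.. code:: python

    import json
    from pathlib import Path


    def missing_images(annotation_file, image_dir):
        """Return the file_name entries that do not exist under image_dir.

        Hypothetical helper: assumes file_name is relative to image_dir.
        """
        with open(annotation_file) as f:
            images = json.load(f)["images"]
        image_dir = Path(image_dir)
        return [
            img["file_name"]
            for img in images
            if not (image_dir / img["file_name"]).is_file()
        ]

An empty result means every listed image was found. With a nested layout, the same check applies as
long as :code:`file_name` carries the nested relative path.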

Label Files
^^^^^^^^^^^

.. _bodyposenet_coco_label_files:

This section outlines the COCO annotation format that BodyposeNet data must follow.
Although COCO annotations have more fields, only the attributes needed by BodyposeNet
are described here. You may use the exact same format as COCO. The dataset should use the following
overall structure (in a :code:`.json` file):

.. code::

    "images": [
        {
            "file_name": "000000001000.jpg",
            "height": 480,
            "width": 640,
            "id": 1000
        },
        {
            "file_name": "000000580197.jpg",
            "height": 480,
            "width": 640,
            "id": 580197
        },
        ...
    ],
    "annotations": [
        {
            "segmentation": [[162.46,152.13,150.73,...,173.92,156.23]],
            "num_keypoints": 17,
            "area": 8720.28915,
            "iscrowd": 0,
            "keypoints": [162,174,2,...,149,352,2],
            "image_id": 1000,
            "bbox": [115.16,152.13,83.23,228.41],
            "category_id": 1,
            "id": 1234574
        },
        ...
    ],
    "categories": [
        {
            "supercategory": "person",
            "id": 1,
            "name": "person",
            "keypoints": [
                "nose","left_eye","right_eye","left_ear","right_ear",
                "left_shoulder","right_shoulder","left_elbow","right_elbow",
                "left_wrist","right_wrist","left_hip","right_hip",
                "left_knee","right_knee","left_ankle","right_ankle"
            ],
            "skeleton": [
                [16,14],[14,12],[17,15],[15,13],[12,13],[6,12],[7,13],[6,7],
                [6,8],[7,9],[8,10],[9,11],[2,3],[1,2],[1,3],[2,4],[3,5],[4,6],[5,7]
            ]
        }
    ]


* The :code:`images` section contains the complete list of images in the dataset with some metadata.

.. Note:: Each image ID must be unique within the dataset.

+--------------------+-----------------------------------+----------+-----------+
| **Parameter name** |    **Description**                | **Type** | **Range** |
+====================+===================================+==========+===========+
| :code:`file_name`  | The path to the image             | String   | N/A       |
+--------------------+-----------------------------------+----------+-----------+
| :code:`height`     | The height of the image           | Integer  | N/A       |
+--------------------+-----------------------------------+----------+-----------+
| :code:`width`      | The width of the image            | Integer  | N/A       |
+--------------------+-----------------------------------+----------+-----------+
| :code:`id`         | The unique ID of the image        | Integer  | N/A       |
+--------------------+-----------------------------------+----------+-----------+
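
The uniqueness requirement on image IDs can be checked with a few lines of Python. This is a
minimal sketch under stated assumptions: :code:`duplicate_image_ids` is a hypothetical helper name,
and its input is the list stored under the :code:`images` key of the annotation file.

.. code:: python

    from collections import Counter


    def duplicate_image_ids(images):
        """Return the sorted image IDs that appear more than once (hypothetical helper)."""
        counts = Counter(img["id"] for img in images)
        return sorted(image_id for image_id, n in counts.items() if n > 1)


    # Example entries modeled on the sample above, with one ID deliberately repeated.
    images = [
        {"file_name": "000000001000.jpg", "height": 480, "width": 640, "id": 1000},
        {"file_name": "000000580197.jpg", "height": 480, "width": 640, "id": 580197},
        {"file_name": "copy_of_1000.jpg", "height": 480, "width": 640, "id": 1000},
    ]
    print(duplicate_image_ids(images))  # [1000]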

* The :code:`annotations` section contains the labels for the images. Each entry is one annotation,
  and each image can have multiple annotations.

+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+
|  **Parameter name**   |                                                                **Description**                                                                | **Type** |       **Range**        |
+=======================+===============================================================================================================================================+==========+========================+
| :code:`segmentation`  | A list of polygons, each a list of vertices, for a given person/group.                                                                        | List     | N/A                    |
+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+
| :code:`num_keypoints` | The number of keypoints that are labeled                                                                                                      | Integer  | [0, *total_keypoints*] |
+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+
| :code:`area`          | The area of the segmentation/bbox                                                                                                             | Float    | N/A                    |
+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+
| :code:`iscrowd`       | If :code:`1`, indicates that the annotation mask is for multiple people                                                                       | Integer  | [0, 1]                 |
+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+
| :code:`keypoints`     | A list of keypoints with the following format: :code:`[x1, y1, v1, x2, y2, v2 ...]`, where x and y are pixel locations, and v is the          | List     | N/A                    |
|                       | visibility/occlusion flag.                                                                                                                    |          |                        |
+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+
| :code:`bbox`          | The bounding box of the object/person, as :code:`[x, y, width, height]`                                                                       | List     | N/A                    |
+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+
| :code:`image_id`      | The unique ID of the associated image                                                                                                         | Integer  | N/A                    |
+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+
| :code:`category_id`   | The object category (always :code:`1` for person)                                                                                             | Integer  | 1                      |
+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+
| :code:`id`            | The unique ID of the annotation                                                                                                               | Integer  | N/A                    |
+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+
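
The flat :code:`keypoints` list can be regrouped into :code:`(x, y, v)` triplets for inspection.
The sketch below is illustrative only: the helper names are hypothetical, and it assumes that a
flag of :code:`0` marks an unlabeled keypoint, as in the COCO convention, so that the number of
nonzero flags should match :code:`num_keypoints`.

.. code:: python

    def split_keypoints(flat):
        """Group a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) tuples."""
        if len(flat) % 3 != 0:
            raise ValueError("keypoints length must be a multiple of 3")
        return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


    def count_labeled(flat):
        """Count keypoints whose visibility flag is nonzero (assumed to mean labeled)."""
        return sum(1 for _, _, v in split_keypoints(flat) if v > 0)


    # Shortened, made-up example with three keypoints instead of the full 17.
    example = [162, 174, 2, 0, 0, 0, 149, 352, 1]
    print(split_keypoints(example))  # [(162, 174, 2), (0, 0, 0), (149, 352, 1)]
    print(count_labeled(example))    # 2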

* The COCO dataset uses the following occlusion flag labeling format: :code:`[visible: 2, occluded: 1, not_labeled: 0]`

* The :code:`categories` section contains the keypoint convention that is followed in the dataset.

+-----------------------+--------------------------------------------------------------------------------------------------------------+----------+-----------+
|  **Parameter name**   |                                               **Description**                                                | **Type** | **Range** |
+=======================+==============================================================================================================+==========+===========+
| :code:`supercategory` | The supercategory                                                                                            | String   | *person*  |
+-----------------------+--------------------------------------------------------------------------------------------------------------+----------+-----------+
| :code:`id`            | The ID of the category                                                                                       | Integer  | 1         |
+-----------------------+--------------------------------------------------------------------------------------------------------------+----------+-----------+
| :code:`name`          | The name of the category                                                                                     | String   | *person*  |
+-----------------------+--------------------------------------------------------------------------------------------------------------+----------+-----------+
| :code:`keypoints`     | The keypoint names and ordering convention as used in labeling                                               | List     | N/A       |
+-----------------------+--------------------------------------------------------------------------------------------------------------+----------+-----------+
| :code:`skeleton`      | A list of skeleton edges with the following format: :code:`[[j1, j2], [j2, j3] ...]`, where j is the         | List     | N/A       |
|                       | keypoint/joint index.                                                                                        |          |           |
+-----------------------+--------------------------------------------------------------------------------------------------------------+----------+-----------+
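
To see which joints each :code:`skeleton` edge connects, the indices can be mapped back to the
names in :code:`keypoints`. This is a minimal sketch, assuming the 1-based indexing used in COCO
annotation files (the largest index in the sample, 17, equals the number of keypoints, which is
consistent with that assumption). The function name :code:`skeleton_edge_names` is hypothetical.

.. code:: python

    KEYPOINT_NAMES = [
        "nose", "left_eye", "right_eye", "left_ear", "right_ear",
        "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
        "left_wrist", "right_wrist", "left_hip", "right_hip",
        "left_knee", "right_knee", "left_ankle", "right_ankle",
    ]
    SKELETON = [
        [16, 14], [14, 12], [17, 15], [15, 13], [12, 13], [6, 12], [7, 13], [6, 7],
        [6, 8], [7, 9], [8, 10], [9, 11], [2, 3], [1, 2], [1, 3], [2, 4], [3, 5],
        [4, 6], [5, 7],
    ]


    def skeleton_edge_names(keypoints, skeleton):
        """Translate 1-based skeleton index pairs into pairs of keypoint names."""
        return [(keypoints[a - 1], keypoints[b - 1]) for a, b in skeleton]


    edges = skeleton_edge_names(KEYPOINT_NAMES, SKELETON)
    print(edges[0])  # ('left_ankle', 'left_knee')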

For more details, refer to the COCO keypoint annotations file and the `COCO Keypoint Detection Task <https://cocodataset.org/#keypoints-2020>`_ page.