Data Input for Instance Segmentation
------------------------------------

.. _dataset_format:

Instance segmentation expects directories of images for training or validation, along with annotation files in COCO format. The naming convention for the train/val split may differ, because the path to each set is specified individually in the data preparation script in the IPython notebook example. The image data and the corresponding annotation files are then converted to TFRecords for training.

COCO format for Instance Segmentation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using the COCO format requires the data to be organized in this structure:

.. code::

    annotation{
        "id": int,
        "image_id": int,
        "category_id": int,
        "segmentation": RLE or [polygon],
        "area": float,
        "bbox": [x,y,width,height],
        "iscrowd": 0 or 1,
    }

    image{
        "id": int,
        "width": int,
        "height": int,
        "file_name": str,
        "license": int,
        "flickr_url": str,
        "coco_url": str,
        "date_captured": datetime,
    }

    categories[{
        "id": int,
        "name": str,
        "supercategory": str,
    }]

An example COCO annotation file is shown below:

.. code::

    "annotations": [{
        "segmentation": [[510.66,423.01,511.72,420.03,510.45,416.0,510.34,413.02,
                          510.77,410.26,510.77,407.5,510.34,405.16,511.51,402.83,
                          511.41,400.49,510.24,398.16,509.39,397.31,504.61,399.22,
                          502.17,399.64,500.89,401.66,500.47,402.08,499.09,401.87,
                          495.79,401.98,490.59,401.77,488.79,401.77,485.39,398.58,
                          483.9,397.31,481.56,396.35,478.48,395.93,476.68,396.03,
                          475.4,396.77,473.92,398.79,473.28,399.96,473.49,401.87,
                          474.56,403.47,473.07,405.59,473.39,407.71,476.68,409.41,
                          479.23,409.73,481.56,410.69,480.4,411.85,481.35,414.93,
                          479.86,418.65,477.32,420.03,476.04,422.58,479.02,422.58,
                          480.29,423.01,483.79,419.93,486.66,416.21,490.06,415.57,
                          492.18,416.85,491.65,420.24,492.82,422.9,493.56,424.39,
                          496.43,424.6,498.02,423.01,498.13,421.31,497.07,420.03,
                          497.07,415.15,496.33,414.51,501.1,411.96,502.06,411.32,
                          503.02,415.04,503.33,418.12,501.1,420.24,498.98,421.63,
                          500.47,424.39,505.03,423.32,506.2,421.31,507.69,419.5,
                          506.31,423.32,510.03,423.01,510.45,423.01]],
        "area": 702.1057499999998,
        "iscrowd": 0,
        "image_id": 289343,
        "bbox": [473.07,395.93,38.65,28.67],
        "category_id": 18,
        "id": 1768
    }],
    "images": [{
        "license": 1,
        "file_name": "000000407646.jpg",
        "coco_url": "http://images.cocodataset.org/val2017/000000407646.jpg",
        "height": 400,
        "width": 500,
        "date_captured": "2013-11-23 03:58:53",
        "flickr_url": "http://farm4.staticflickr.com/3110/2855627782_17b93a684e_z.jpg",
        "id": 407646
    }],
    "categories": [{"supercategory": "person","id": 1,"name": "person"},
                   {"supercategory": "vehicle","id": 2,"name": "bicycle"},
                   {"supercategory": "vehicle","id": 3,"name": "car"},
                   {"supercategory": "vehicle","id": 4,"name": "motorcycle"}]

For more details, refer to the official COCO data format description.

A COCO dataset preparation script is provided in the TLT container, which automatically downloads the dataset and converts it to TFRecords. In the MaskRCNN notebook, you can run the script as follows:

.. code::

    download_and_preprocess_coco.sh $data_dir

When using a custom dataset, follow the COCO format closely and convert the dataset to TFRecords using the following command (refer to L68-75 in download_and_preprocess_coco.sh for more detail):

.. code::

    python create_coco_tf_record.py --logtostderr --include_masks \
        --train_image_dir=$TRAIN_IMAGE_DIR \
        --val_image_dir=$VAL_IMAGE_DIR \
        --train_object_annotations_file=$TRAIN_COCO_ANNOTATION_FILE \
        --val_object_annotations_file=$VAL_ANNOTATION_FILE \
        --output_dir=$OUTPUT_DIR
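The annotation, image, and category records described above can be assembled into a minimal COCO file with only the Python standard library. The sketch below is illustrative: the file name, ids, coordinates, and category are placeholder values, not data from a real dataset.

.. code:: python

    import json

    # Minimal sketch of a COCO-format annotation file: one image with a
    # single polygon annotation. All ids, file names, and coordinates are
    # illustrative placeholders.
    coco = {
        "images": [{
            "id": 1,
            "width": 640,
            "height": 480,
            "file_name": "example_000001.jpg",
        }],
        "annotations": [{
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            # A single polygon: [x1, y1, x2, y2, ...] in pixel coordinates.
            "segmentation": [[100.0, 100.0, 200.0, 100.0,
                              200.0, 200.0, 100.0, 200.0]],
            "area": 10000.0,
            "bbox": [100.0, 100.0, 100.0, 100.0],  # [x, y, width, height]
            "iscrowd": 0,
        }],
        "categories": [{"id": 1, "name": "person", "supercategory": "person"}],
    }

    with open("instances_example.json", "w") as f:
        json.dump(coco, f)

    # Round-trip to confirm the structure is valid JSON.
    loaded = json.load(open("instances_example.json"))
    print(len(loaded["annotations"]))  # prints 1

A file built this way can be passed to create_coco_tf_record.py as the train or val annotations file, provided the referenced images exist in the corresponding image directory.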
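Before converting a custom dataset, it can help to sanity-check the annotation file for the most common COCO-format mistakes: annotations that reference missing images or categories, and boxes that fall outside their image. The helper below is a sketch using only the standard library; it is not part of the TLT tooling, and the function name is hypothetical.

.. code:: python

    import json

    def check_coco(path):
        """Lightweight consistency checks for a COCO annotation file.

        Verifies that every annotation references an existing image and
        category, and that each bbox lies inside its image bounds.
        Returns a list of human-readable problem descriptions (empty if
        the file passes). Illustrative sketch, not part of TLT.
        """
        data = json.load(open(path))
        image_by_id = {img["id"]: img for img in data["images"]}
        category_ids = {cat["id"] for cat in data["categories"]}
        problems = []
        for ann in data["annotations"]:
            img = image_by_id.get(ann["image_id"])
            if img is None:
                problems.append(
                    f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
                continue
            if ann["category_id"] not in category_ids:
                problems.append(
                    f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
            x, y, w, h = ann["bbox"]
            if x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
                problems.append(
                    f"annotation {ann['id']}: bbox {ann['bbox']} outside image bounds")
        return problems

Running the check on both the train and val annotation files before invoking create_coco_tf_record.py surfaces format errors early, rather than partway through TFRecord conversion.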