Data Input for Semantic Segmentation
This section describes the format of the dataset for training a semantic segmentation UNet in TLT.
UNet expects both the input images and their corresponding masks to be encoded as image files. Each mask is a single-channel image in which every pixel holds an integer value that represents its segmentation class. The data folder structure for images and masks must be in the following format:
/Dataset_01
    /images
        /train
            0000.png
            0001.png
            ...
            ...
            N.png
        /val
            0000.png
            0001.png
            ...
            ...
            N.png
        /test
            0000.png
            0001.png
            ...
            ...
            N.png
    /masks
        /train
            0000.png
            0001.png
            ...
            ...
            N.png
        /val
            0000.png
            0001.png
            ...
            ...
            N.png
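The folder layout above could be prepared with a small helper before copying files in. The following is a minimal sketch using only the Python standard library; the function name create_dataset_skeleton and the constants are hypothetical and not part of TLT. The split names are taken from the layout above, where the masks folder has no test split.

```python
from pathlib import Path

# Split names as they appear in the layout above (assumption: masks have no test split).
IMAGE_SPLITS = ("train", "val", "test")
MASK_SPLITS = ("train", "val")


def create_dataset_skeleton(root):
    """Create empty images/ and masks/ split folders under root.

    Hypothetical helper, not a TLT API. Returns the list of folders it ensured exist.
    """
    root = Path(root)
    created = []
    for kind, splits in (("images", IMAGE_SPLITS), ("masks", MASK_SPLITS)):
        for split in splits:
            folder = root / kind / split
            # exist_ok=True makes the helper safe to run on an existing dataset.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created
```

Calling create_dataset_skeleton("Dataset_01") would create the five split folders relative to the current working directory.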
Note
See the Dataset Config section for further details about configuring the dataset, classes, and dataset type.
Note
Each image and label has the same file ID before the extension. The image-to-label correspondence is maintained using this filename. The test folder in the above directory structure is optional; any folder can be used for inference.
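Because the image-to-label correspondence relies only on the file ID, it can be worth checking a split for unpaired files before training. The following is a minimal sketch using the Python standard library; find_unpaired is a hypothetical helper, not a TLT function, and it assumes the file ID is the filename without its extension.

```python
from pathlib import Path


def find_unpaired(images_dir, masks_dir):
    """Compare file IDs (filename without extension) between two folders.

    Hypothetical helper, not a TLT API. Returns a pair of sorted lists:
    (IDs of images with no mask, IDs of masks with no image).
    """
    image_ids = {p.stem for p in Path(images_dir).iterdir() if p.is_file()}
    mask_ids = {p.stem for p in Path(masks_dir).iterdir() if p.is_file()}
    return sorted(image_ids - mask_ids), sorted(mask_ids - image_ids)
```

For example, find_unpaired("Dataset_01/images/train", "Dataset_01/masks/train") returns two empty lists when every training image has a matching mask.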
Note
The image size need not match the model input dimensions, because the images are resized internally to the model input dimensions. However, ensure that all images in the images and masks folders for train, validation, and test are the same size.
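The equal-size requirement can be checked without an imaging library, since a PNG file stores its width and height in the IHDR chunk at a fixed offset after the 8-byte signature. The following is a minimal sketch under that assumption; png_size and sizes_in_folders are hypothetical helpers, not TLT functions, and they only handle files with a .png extension.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type, 16-23: width and height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    return struct.unpack(">II", header[16:24])


def sizes_in_folders(folders):
    """Map each distinct (width, height) to the list of PNG files that have it."""
    sizes = {}
    for folder in folders:
        for path in sorted(Path(folder).glob("*.png")):
            sizes.setdefault(png_size(path), []).append(path)
    return sizes
```

If sizes_in_folders returns a dictionary with more than one key for the images and masks folders of a dataset, the files listed under the less common sizes are the ones to resize or remove.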