Data Input for Semantic Segmentation

This section describes the format of the dataset for training a semantic segmentation UNet in TLT.

UNet expects both the images and their corresponding masks to be stored as image files. Each mask is a single-channel image in which every pixel holds an integer value representing its segmentation class. The images and masks must be organized in the following folder structure:

/Dataset_01
    /images
      /train
        0000.png
        0001.png
        ...
        ...
        N.png
      /val
        0000.png
        0001.png
        ...
        ...
        N.png
      /test
        0000.png
        0001.png
        ...
        ...
        N.png
    /masks
      /train
        0000.png
        0001.png
        ...
        ...
        N.png
      /val
        0000.png
        0001.png
        ...
        ...
        N.png
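The sketch below shows the mask format described above. It writes a 2D grid of integer class IDs as an 8-bit, single-channel (grayscale) PNG using only the Python standard library. The helper name `write_class_mask_png`, the 8-bit depth, and the example class IDs are assumptions for illustration and are not part of the TLT tooling. An image library would normally handle this, but the standard-library version makes the on-disk layout explicit.

```python
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    # A PNG chunk is: 4-byte length, 4-byte type, data, CRC-32 of type + data.
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def write_class_mask_png(path, class_ids):
    """Hypothetical helper: save a 2D list of class IDs as an 8-bit grayscale PNG.

    Each pixel value is the integer class ID of that pixel (assumed to fit in 0-255).
    """
    height = len(class_ids)
    width = len(class_ids[0]) if height else 0
    if width == 0:
        raise ValueError("mask must have at least one row and one column")
    raw = bytearray()
    for row in class_ids:
        if len(row) != width:
            raise ValueError("all rows must have the same width")
        if any(not 0 <= v <= 255 for v in row):
            raise ValueError("class IDs must fit in 8 bits (0-255)")
        raw.append(0)    # scanline filter type 0 (no filtering)
        raw.extend(row)  # one byte per pixel: the class ID
    # IHDR: width, height, bit depth 8, color type 0 (grayscale = single channel),
    # compression 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    png = (
        PNG_SIGNATURE
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(bytes(raw)))
        + _png_chunk(b"IEND", b"")
    )
    Path(path).write_bytes(png)
```

For example, `write_class_mask_png("masks/train/0000.png", [[0, 0, 1], [2, 1, 0]])` would store a 3x2 mask whose pixels carry the hypothetical class IDs 0, 1, and 2. Map these IDs to the label values declared in your dataset configuration.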

Note

See the Dataset Config section for further details about configuring the dataset, the classes, and the dataset type.

Note

Each image and its corresponding mask share the same file ID before the extension, and this filename is what links an image to its mask. The test folder in the directory structure above is optional; any folder of images can be used for inference.
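You can check the filename-based correspondence before training. The following minimal sketch uses hypothetical helper names. It pairs files in an images folder with files in the matching masks folder by file ID (the filename without its extension) and reports IDs that appear on only one side. By default it checks only train and val, because the test folder is optional and has no masks in the layout above.

```python
from pathlib import Path


def pair_by_file_id(image_dir, mask_dir):
    """Pair images with masks that share a file ID (filename minus extension)."""
    # Assumes at most one file per file ID in each folder.
    images = {p.stem: p for p in Path(image_dir).iterdir() if p.is_file()}
    masks = {p.stem: p for p in Path(mask_dir).iterdir() if p.is_file()}
    shared = sorted(images.keys() & masks.keys())
    pairs = [(images[file_id], masks[file_id]) for file_id in shared]
    images_without_mask = sorted(images.keys() - masks.keys())
    masks_without_image = sorted(masks.keys() - images.keys())
    return pairs, images_without_mask, masks_without_image


def check_splits(root, splits=("train", "val")):
    """Run pair_by_file_id for each split under root/images and root/masks."""
    root = Path(root)
    return {
        split: pair_by_file_id(root / "images" / split, root / "masks" / split)
        for split in splits
    }
```

Any non-empty "without mask" or "without image" list points to a filename that breaks the image-to-mask correspondence.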

Note

The images do not need to match the model input dimensions, because they are resized internally to those dimensions. However, all images in the images and masks folders for train, validation, and test must be the same size.
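The following minimal sketch checks the size requirement. It reads the note literally, so every image and mask across all splits must share one size. It also assumes PNG files, matching the `.png` names in the layout above. The helpers read only the 24-byte PNG header instead of decoding pixels. The function names are hypothetical.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from a PNG file's IHDR chunk, without decoding pixels."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", 4-byte width, 4-byte height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    return struct.unpack(">II", header[16:24])


def find_size_mismatches(root, splits=("train", "val", "test")):
    """Compare every PNG under root/{images,masks}/{split} with the first one found.

    Returns (reference_size, mismatches); missing folders (e.g. masks/test) are skipped.
    """
    root = Path(root)
    files = []
    for kind in ("images", "masks"):
        for split in splits:
            folder = root / kind / split
            if folder.is_dir():
                files.extend(sorted(folder.glob("*.png")))
    if not files:
        return None, []
    sizes = {p: png_size(p) for p in files}
    reference = sizes[files[0]]
    mismatches = [(p, size) for p, size in sizes.items() if size != reference]
    return reference, mismatches
```

Reading only the header keeps the check fast even on large datasets. An empty mismatch list means every file matches the reference size.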