Data Input for Semantic Segmentation
------------------------------------

.. _dataset_format_unet:

This section describes the format of the dataset for training a semantic segmentation UNet in TLT.

UNet expects the images and their corresponding masks to be encoded as image files. Each mask is a
single-channel image in which every pixel holds an integer value that identifies its
segmentation class. The data folder for images and masks must have the following structure:

.. code::

  /Dataset_01
      /images
        /train
          0000.png
          0001.png
          ...
          ...
          N.png
        /val
          0000.png
          0001.png
          ...
          ...
          N.png
        /test
          0000.png
          0001.png
          ...
          ...
          N.png
      /masks 
        /train
          0000.png
          0001.png
          ...
          ...
          N.png
        /val
          0000.png
          0001.png
          ...
          ...
          N.png
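
The single-channel requirement for masks can be spot-checked before training. The following is a
minimal sketch, not part of TLT: it assumes the masks are PNG files and reads the color type from
the PNG header using only the Python standard library. The helper names :code:`png_color_type` and
:code:`is_single_channel_png` are hypothetical. Palette-indexed PNGs (color type 3) are treated as
not single-channel here, which is an assumption of this sketch rather than documented UNet behavior.

.. code:: python

  PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

  def png_color_type(path):
      """Return the color type byte stored in the IHDR chunk of a PNG file."""
      with open(path, "rb") as f:
          header = f.read(26)
      # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width (4),
      # height (4), bit depth (1), and color type (1).
      if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
          raise ValueError(f"{path} does not look like a PNG file")
      return header[25]

  def is_single_channel_png(path):
      """Return True if the PNG is grayscale without alpha (color type 0)."""
      return png_color_type(path) == 0

Calling :code:`is_single_channel_png` on each file under :code:`masks` flags RGB or RGBA masks
that would need to be converted to a single channel first.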

.. Note:: See the :ref:`Dataset Config<dataset_config_unet>` section for further details about
          configuring the dataset, classes, and dataset type.

.. Note:: Each image and its label (mask) must have the same file ID before the extension. The
          image-to-label correspondence is maintained using this filename. The :code:`test` folder
          in the above directory structure is optional; any folder can be used for inference.

.. Note:: The images do not need to match the model input dimensions; they are resized internally
          to the model input dimensions. However, ensure that all images in the :code:`images` and
          :code:`masks` folders for train, validation, and test are the same size.
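
The equal-size requirement can be checked by grouping files by their dimensions. The sketch below
is an illustration only: it assumes all files are PNGs, assumes the requirement applies across the
whole dataset root, and reads width and height from the PNG header with the standard library. The
names :code:`png_size` and :code:`group_by_size` are hypothetical.

.. code:: python

  import struct
  from collections import defaultdict
  from pathlib import Path

  PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

  def png_size(path):
      """Return (width, height) read from the IHDR chunk of a PNG file."""
      with open(path, "rb") as f:
          header = f.read(24)
      if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
          raise ValueError(f"{path} does not look like a PNG file")
      # Width and height are big-endian 32-bit integers at byte offsets 16 and 20.
      return struct.unpack(">II", header[16:24])

  def group_by_size(root):
      """Map each (width, height) found under root to the PNG files that have it."""
      sizes = defaultdict(list)
      for path in sorted(Path(root).rglob("*.png")):
          sizes[png_size(path)].append(path)
      return dict(sizes)

If :code:`group_by_size("Dataset_01")` returns more than one key, the files listed under the less
common sizes are the ones to resize or remove.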