Augmenting a Dataset
====================

.. _augmenting_a_dataset:

Training a deep neural network can be a daunting task, and the most important component of training a model is the data. Acquiring a curated and annotated dataset can be a tedious, manual process, involving thousands of person-hours of painstaking labeling. Even with careful planning and data collection, it is very difficult to anticipate all the corner cases that a network may encounter, and repeating the process of collecting and annotating the missing data is expensive and has long turnaround times.

Online augmentation in the training data loader is a good way to increase the variation in the dataset. However, the augmented data is generated randomly, based on the distribution the data loader follows when sampling the data, so the model may need to be trained for a long time to achieve good accuracy. To circumvent this and give the user control over which augmentations are generated, TLT provides an offline augmentation tool called :code:`tlt-augment`. Offline augmentation can dramatically increase the size of the dataset when collecting and labeling data is expensive or not possible. The :code:`tlt-augment` tool provides several custom GPU-accelerated augmentation routines, categorized into:

* Spatial augmentation
* Color space augmentation
* Image blur

Spatial augmentation comprises routines where data is augmented in space. The following spatial augmentation operations are supported in TLT:

* Rotate
* Resize
* Translate
* Shear
* Flip

Color space augmentation comprises routines where the image data is augmented in the color space. The following color augmentation operators are supported:

* Hue rotation
* Brightness offset
* Contrast shift

Along with the augmentation operations listed above, :code:`tlt-augment` also enables you to blur images using a Gaussian blur operator. More information about the operation is described in :ref:`Blur config <blur_config>`.

All augmentation routines currently provided with :code:`tlt-augment` are supported only for object detection datasets. The spatial augmentation routines are applied to the images as well as the labeled coordinates, while the color augmentation routines and the channel-wise blur operator are applied only to the images, since the object labels are not affected.

The sample workflow for using :code:`tlt-augment` is as follows:

.. image:: ../content/augmenting1.png

.. Note::
   The data is expected in KITTI format, as described in Data input for object detection.

The following sections describe how to use the augmentation tool.

Configuring the Augmentor
-------------------------

The augmentor has several components, which the user can configure using a simple protobuf-based configuration file. The configuration file is divided into four major components:

1. Spatial augmentation config
2. Color augmentation config
3. Blur config
4. Data dimensions: output image width, output image height, output image channel, and image extension

This configuration file contains several nested protobuf elements and global parameters, which are defined below.

+----------------------+------------------+--------------------------------------------------------------+----------------------------------+
| **Parameter**        | **Datatype**     | **Description**                                              | **Supported Values**             |
+======================+==================+==============================================================+==================================+
| spatial_config       | Protobuf message | This protobuf message configures the spatial augmentation.   | Protobuf definition provided in  |
|                      |                  |                                                              | Spatial augmentation config.     |
+----------------------+------------------+--------------------------------------------------------------+----------------------------------+
| color_config         | Protobuf message | This protobuf message configures the color space             | Protobuf definition provided in  |
|                      |                  | augmentation operator.                                       | Color augmentation config.       |
+----------------------+------------------+--------------------------------------------------------------+----------------------------------+
| blur_config          | Protobuf message | This protobuf message configures the Gaussian blur operator  | Protobuf definition provided in  |
|                      |                  | applied to the image. The blur is computed channel-wise and  | Blur config.                     |
|                      |                  | then concatenated based on the number of image channels.     |                                  |
+----------------------+------------------+--------------------------------------------------------------+----------------------------------+
| dataset_config       | Protobuf message | This protobuf message configures the paths of the images and | Protobuf definition provided in  |
|                      |                  | labels, relative to the input dataset root given on the      | Dataset config.                  |
|                      |                  | tlt-augment command line.                                    |                                  |
+----------------------+------------------+--------------------------------------------------------------+----------------------------------+
| output_image_width   | int32            | This parameter defines the width of the output image.        |                                  |
+----------------------+------------------+--------------------------------------------------------------+----------------------------------+
| output_image_height  | int32            | This parameter defines the height of the output image.       |                                  |
+----------------------+------------------+--------------------------------------------------------------+----------------------------------+
| output_image_channel | int32            | This parameter defines the number of channels in the output  | 1, 3                             |
|                      |                  | image.                                                       |                                  |
+----------------------+------------------+--------------------------------------------------------------+----------------------------------+
| image_extension      | string           | The extension of the input images. All the images in the     | .png, .jpeg, .jpg                |
|                      |                  | input dataset are expected to have the same extension.       |                                  |
+----------------------+------------------+--------------------------------------------------------------+----------------------------------+

Spatial Augmentation Config
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Spatial augmentation config contains parameters to configure the spatial augmentation routines. This is a nested protobuf element called :code:`spatial_config`, containing protobuf elements for all the spatial augmentation operations.

+--------------------+------------------+------------------------------------------------+----------------------------------------------------+
| **Parameter**      | **Datatype**     | **Description**                                | **Supported Values**                               |
+====================+==================+================================================+====================================================+
| rotation_config    | Protobuf message | This protobuf message configures the rotation  | .. code::                                          |
|                    |                  | augmentation operator. Defining this activates |                                                    |
|                    |                  | rotation.                                      |     {                                              |
|                    |                  |                                                |       angle: 0.5                                   |
|                    |                  |                                                |       units: "degrees"                             |
|                    |                  |                                                |     }                                              |
|                    |                  |                                                |                                                    |
|                    |                  |                                                | See :ref:`Rotation config <rotation_config>`       |
+--------------------+------------------+------------------------------------------------+----------------------------------------------------+
| flip_config        | Protobuf message | This protobuf message configures the flip      | .. code::                                          |
|                    |                  | augmentation operator. Defining this activates |                                                    |
|                    |                  | flipping along the horizontal and/or vertical  |     {                                              |
|                    |                  | axes.                                          |       flip_vertical: true                          |
|                    |                  |                                                |       flip_horizontal: true                        |
|                    |                  |                                                |     }                                              |
|                    |                  |                                                |                                                    |
|                    |                  |                                                | See :ref:`Flip config <flip_config>`               |
+--------------------+------------------+------------------------------------------------+----------------------------------------------------+
| translation_config | Protobuf message | This protobuf message configures the           | .. code::                                          |
|                    |                  | translation augmentation operator. Defining    |                                                    |
|                    |                  | this activates translating the images along    |     {                                              |
|                    |                  | the x and/or y axes.                           |       translate_x: 8                               |
|                    |                  |                                                |       translate_y: 8                               |
|                    |                  |                                                |     }                                              |
|                    |                  |                                                |                                                    |
|                    |                  |                                                | See :ref:`Translation config <translation_config>` |
+--------------------+------------------+------------------------------------------------+----------------------------------------------------+
| shear_config       | Protobuf message | This protobuf message configures the shear     | .. code::                                          |
|                    |                  | augmentation operator. Defining this adds a    |                                                    |
|                    |                  | shear to the images along the x and/or y       |     {                                              |
|                    |                  | axes.                                          |       shear_ratio_x: 0.2                           |
|                    |                  |                                                |       shear_ratio_y: 0.2                           |
|                    |                  |                                                |     }                                              |
|                    |                  |                                                |                                                    |
|                    |                  |                                                | See :ref:`Shear config <shear_config>`             |
+--------------------+------------------+------------------------------------------------+----------------------------------------------------+

An augmentation operator is enabled simply by defining its corresponding proto element.
When multiple proto elements are defined, all the corresponding augmentation operations are cascaded. To skip any of the supported augmentation operations, simply omit its field. The configurable parameters for the individual spatial augmentation operators are described in the tables below.

Rotation Config
***************

.. _rotation_config:

The rotation operation rotates the image by an angle. The transformation matrix for the rotation operation is computed as:

::

    [x_new, y_new, 1] = [x, y, 1] * [[cos(angle),  sin(angle), 0],
                                     [-sin(angle), cos(angle), 0],
                                     [x_t,         y_t,        1]]

    where x_t and y_t are defined as

    x_t = height * sin(angle) / 2.0 - width * cos(angle) / 2.0 + width / 2.0
    y_t = -1 * height * cos(angle) / 2.0 + height / 2.0 - width * sin(angle) / 2.0

Here, height is the height of the output image and width is the width of the output image.

+---------------+--------------+------------------------------------------------+-----------------------+
| **Parameter** | **Datatype** | **Description**                                | **Supported Values**  |
+===============+==============+================================================+=======================+
| angle         | float        | The angle of the rotation to be applied to the | +/- 0 - 360 (degrees) |
|               |              | image and the coordinates.                     | +/- 0 - 2π (radians)  |
+---------------+--------------+------------------------------------------------+-----------------------+
| units         | string       | The unit in which the angle parameter is       | "degrees", "radians"  |
|               |              | specified.                                     |                       |
+---------------+--------------+------------------------------------------------+-----------------------+

Shear Config
************

.. _shear_config:

The shear operation introduces a slant to the object along the x or the y dimension.
The transformation matrix for the shear operation is computed as:

::

    [x_new, y_new, 1] = [x, y, 1] * [[1.0,           shear_ratio_y, 0],
                                     [shear_ratio_x, 1.0,           0],
                                     [x_t,           y_t,           1.0]]

    x_t = -height * shear_ratio_x / 2.
    y_t = -width * shear_ratio_y / 2.

Here, height is the height of the output image and width is the width of the output image.

+---------------+--------------+--------------------------------------------+----------------------+
| **Parameter** | **Datatype** | **Description**                            | **Supported Values** |
+===============+==============+============================================+======================+
| shear_ratio_x | float32      | The amount of horizontal shift per y row.  |                      |
+---------------+--------------+--------------------------------------------+----------------------+
| shear_ratio_y | float32      | The amount of vertical shift per x column. |                      |
+---------------+--------------+--------------------------------------------+----------------------+

Flip Config
***********

.. _flip_config:

This element configures the flip operator of :code:`tlt-augment`. The operator flips an image and its bounding box coordinates along the horizontal and/or vertical axis.

+-----------------+--------------+----------------------------------------------------+----------------------+
| **Parameter**   | **Datatype** | **Description**                                    | **Supported Values** |
+=================+==============+====================================================+======================+
| flip_horizontal | bool         | The flag to enable flipping an image horizontally. | true, false          |
+-----------------+--------------+----------------------------------------------------+----------------------+
| flip_vertical   | bool         | The flag to enable flipping an image vertically.   | true, false          |
+-----------------+--------------+----------------------------------------------------+----------------------+

Translation Config
******************

.. _translation_config:

This protobuf message configures the translation operator for :code:`tlt-augment`.
The operator translates the image and polygon coordinates along the x and/or y axis.

+---------------+--------------+---------------------------------------------------------------+----------------------+
| **Parameter** | **Datatype** | **Description**                                               | **Supported Values** |
+===============+==============+===============================================================+======================+
| translate_x   | int          | The number of pixels to translate the image along the x axis. | 0 - image_width      |
+---------------+--------------+---------------------------------------------------------------+----------------------+
| translate_y   | int          | The number of pixels to translate the image along the y axis. | 0 - image_height     |
+---------------+--------------+---------------------------------------------------------------+----------------------+

Color Augmentation Config
^^^^^^^^^^^^^^^^^^^^^^^^^

Color augmentation config contains parameters to configure the color space augmentation routines. This is a nested protobuf element called :code:`color_config`, containing protobuf elements for all the color augmentation operations.

+-----------------------+------------------+---------------------------------------------+----------------------------------------------------------+
| **Parameter**         | **Datatype**     | **Description**                             | **Supported Values**                                     |
+=======================+==================+=============================================+==========================================================+
| hue_saturation_config | Protobuf message | This augmentation operator applies hue      | .. code::                                                |
|                       |                  | rotation and color saturation augmentation. |                                                          |
|                       |                  |                                             |     {                                                    |
|                       |                  |                                             |       hue_rotation_angle: 30                             |
|                       |                  |                                             |       saturation_shift: 1.0                              |
|                       |                  |                                             |     }                                                    |
|                       |                  |                                             |                                                          |
|                       |                  |                                             | See :ref:`Hue saturation config <hue_saturation_config>` |
+-----------------------+------------------+---------------------------------------------+----------------------------------------------------------+
| contrast_config       | Protobuf message | This augmentation operator applies contrast | .. code::                                                |
|                       |                  | scaling.                                    |                                                          |
|                       |                  |                                             |     {                                                    |
|                       |                  |                                             |       contrast: 0.0                                      |
|                       |                  |                                             |       center: 127.5                                      |
|                       |                  |                                             |     }                                                    |
|                       |                  |                                             |                                                          |
|                       |                  |                                             | See :ref:`Contrast config <contrast_config>`             |
+-----------------------+------------------+---------------------------------------------+----------------------------------------------------------+
| brightness_config     | Protobuf message | This augmentation operator applies a        | .. code::                                                |
|                       |                  | channel-wise brightness shift.              |                                                          |
|                       |                  |                                             |     {                                                    |
|                       |                  |                                             |       offset: 8                                          |
|                       |                  |                                             |     }                                                    |
|                       |                  |                                             |                                                          |
|                       |                  |                                             | See :ref:`Brightness config <brightness_config>`         |
+-----------------------+------------------+---------------------------------------------+----------------------------------------------------------+

An augmentation operator is enabled simply by defining its corresponding proto element. When multiple proto elements are defined, all the corresponding augmentation operations are cascaded. To skip any of the supported augmentation operations, simply omit its field. The configurable parameters for the individual color augmentation operators are described in the tables below.

Hue Saturation Config
*********************

.. _hue_saturation_config:

This augmentation operator manipulates the color space by converting the RGB image to HSV, applying the hue rotation and saturation shift, and then converting the result back to the corresponding RGB image.

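
The RGB to HSV round trip described above can be illustrated for a single pixel. This is a minimal sketch using Python's standard :code:`colorsys` module, not the :code:`tlt-augment` implementation; the function name :code:`hue_saturation_shift` is hypothetical, and clamping the scaled saturation to the range 0 to 1 is an assumption.

```python
import colorsys


def hue_saturation_shift(rgb, hue_rotation_angle, saturation_shift):
    """Apply a hue rotation (degrees) and a saturation multiplier to one pixel.

    rgb is a tuple of three values in the range 0-255.
    Hypothetical helper for illustration only.
    """
    # Scale the 8-bit channels to the 0-1 range expected by colorsys.
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)

    # Hue is stored as a fraction of a full turn, so rotate modulo 360 degrees.
    h = (h + (hue_rotation_angle % 360.0) / 360.0) % 1.0

    # Saturation is multiplied; clamping to [0, 1] is an assumption.
    s = min(max(s * saturation_shift, 0.0), 1.0)

    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(channel * 255.0) for channel in (r, g, b))


# Rotating pure red by 120 degrees yields pure green.
print(hue_saturation_shift((255, 0, 0), 120.0, 1.0))
# A saturation multiplier of 0 makes all channels equal.
print(hue_saturation_shift((255, 0, 0), 0.0, 0.0))
```

With :code:`hue_rotation_angle` set to 0 and :code:`saturation_shift` set to 1.0, the pixel is returned unchanged, which matches the parameter descriptions in this section.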
+--------------------+--------------+------------------------------------------------------------+--------------------------+
| **Parameter**      | **Datatype** | **Description**                                            | **Supported Values**     |
+====================+==============+============================================================+==========================+
| hue_rotation_angle | float32      | Hue rotation in degrees (scalar or vector). A value of 0.0 | 0 - 360 (the angles are  |
|                    |              | (modulo 360) leaves the hue unchanged.                     | computed as angle % 360) |
+--------------------+--------------+------------------------------------------------------------+--------------------------+
| saturation_shift   | float32      | Saturation shift multiplier. A value of 1.0 leaves the     | 0.0 - 1.0                |
|                    |              | saturation unchanged. A value of 0 removes all saturation  |                          |
|                    |              | from the image and makes all channels equal in value.      |                          |
+--------------------+--------------+------------------------------------------------------------+--------------------------+

Brightness Config
*****************

.. _brightness_config:

This augmentation operator applies a channel-wise brightness shift.

+---------------+--------------+--------------------------------+----------------------+
| **Parameter** | **Datatype** | **Description**                | **Supported Values** |
+===============+==============+================================+======================+
| offset        | float32      | Offset value per color channel | 0 - 255              |
+---------------+--------------+--------------------------------+----------------------+

Contrast Config
***************

.. _contrast_config:

This augmentation operator applies contrast scaling about a center point to an image.

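
The exact arithmetic behind the brightness and contrast operators is not spelled out in this section. The sketch below shows one common formulation that is consistent with the parameter descriptions: a contrast value of 0 leaves a pixel unchanged, and pixels are scaled about a center value. The formula, the function names, and the clamping to the 0 to 255 range are all assumptions made for illustration.

```python
def adjust_brightness(pixel, offset):
    """Add a per-channel offset to one pixel value (assumed formulation)."""
    # Clamping to the 8-bit range is an assumption.
    return min(max(pixel + offset, 0.0), 255.0)


def adjust_contrast(pixel, contrast, center=127.5):
    """Scale one pixel value about a center point (assumed formulation).

    A contrast of 0.0 gives a scale factor of 1.0, leaving the pixel unchanged.
    """
    scaled = (pixel - center) * (1.0 + contrast) + center
    return min(max(scaled, 0.0), 255.0)


print(adjust_contrast(200.0, 0.0))    # unchanged: 200.0
print(adjust_contrast(200.0, 0.5))    # pushed away from the center: 236.25
print(adjust_brightness(250.0, 10.0)) # clamped at 255.0
```

Under this formulation a pixel equal to the center value is never changed by the contrast operator, which is why the center is usually chosen as the midpoint of the pixel range.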
+---------------+--------------+---------------------------------------------------------+----------------------+
| **Parameter** | **Datatype** | **Description**                                         | **Supported Values** |
+===============+==============+=========================================================+======================+
| contrast      | float32      | Contrast scale value. A value of 0 leaves the contrast  | 0 - 1.0              |
|               |              | unchanged.                                              |                      |
+---------------+--------------+---------------------------------------------------------+----------------------+
| center        | float32      | Center value for the image. Images are scaled between   | 0.0 - 1.0            |
|               |              | 0 and 255 (8-bit images), so 127.5 is the common value. |                      |
+---------------+--------------+---------------------------------------------------------+----------------------+

Dataloader
^^^^^^^^^^

.. code::

    dataset_config {
      data_sources: {
        tfrecords_path: "/path/to/tfrecords/root/*"
        image_directory_path: "/path/to/dataset/root"
      }
      image_extension: "png"
      target_class_mapping {
        key: "car"
        value: "car"
      }
      target_class_mapping {
        key: "pedestrian"
        value: "pedestrian"
      }
      target_class_mapping {
        key: "cyclist"
        value: "cyclist"
      }
      target_class_mapping {
        key: "van"
        value: "car"
      }
      target_class_mapping {
        key: "person_sitting"
        value: "pedestrian"
      }
      validation_fold: 0
    }

See :ref:`Dataloader` for more information.

Blur Config
^^^^^^^^^^^

.. _blur_config:

This protobuf element configures the Gaussian blur operator applied to an image. A Gaussian kernel is built from the parameters listed below, and a 2D convolution is then performed between the image and this kernel, one channel at a time.

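
The kernel construction and per-channel convolution described above can be sketched in plain Python. This is an illustration only and not the GPU implementation used by :code:`tlt-augment`; the function names are hypothetical, the kernel size is assumed to be odd, and replicating edge pixels at the image border is an assumption.

```python
import math


def gaussian_kernel(size, std):
    """Build a normalized size x size Gaussian kernel (size assumed odd)."""
    half = (size - 1) / 2.0
    weights = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * std ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in weights)
    # Normalize so the weights sum to 1 and overall brightness is preserved.
    return [[w / total for w in row] for row in weights]


def blur_channel(channel, kernel):
    """Convolve one 2D channel (list of rows) with the kernel.

    Border pixels are replicated outside the image; this is an assumption.
    """
    height, width = len(channel), len(channel[0])
    size = len(kernel)
    pad = size // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(size):
                for kx in range(size):
                    yy = min(max(y + ky - pad, 0), height - 1)
                    xx = min(max(x + kx - pad, 0), width - 1)
                    acc += channel[yy][xx] * kernel[ky][kx]
            out[y][x] = acc
    return out


def blur_image(channels, size, std):
    """Blur each channel independently and return the list of blurred channels."""
    kernel = gaussian_kernel(size, std)
    return [blur_channel(channel, kernel) for channel in channels]


# A single bright pixel is spread over its 3 x 3 neighborhood.
impulse = [[0.0, 0.0, 0.0], [0.0, 255.0, 0.0], [0.0, 0.0, 0.0]]
for row in blur_image([impulse], size=3, std=1.0)[0]:
    print([round(value, 1) for value in row])
```

Because the Gaussian kernel is symmetric, convolution and correlation give the same result here, so the kernel is not flipped.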
+---------------+--------------+--------------------------------------------------------+----------------------+
| **Parameter** | **Datatype** | **Description**                                        | **Supported Values** |
+===============+==============+========================================================+======================+
| size          | int          | Size of the kernel to be convolved.                    | >0                   |
+---------------+--------------+--------------------------------------------------------+----------------------+
| std           | float        | Standard deviation of the Gaussian blur filter.        | >0.0                 |
+---------------+--------------+--------------------------------------------------------+----------------------+

For example, the following configuration file augments the image by:

1. Rotating the image by 5 degrees
2. Shearing along the x axis by a ratio of 0.3
3. Translating along the x axis by 8 pixels
4. Rotating the hue by 25 degrees

.. code::

    # Spec file for tlt-augment.
    spatial_config{
      rotation_config{
        angle: 5.0
        units: "degrees"
      }
      shear_config{
        shear_ratio_x: 0.3
      }
      translation_config{
        translate_x: 8
      }
    }
    color_config{
      hue_saturation_config{
        hue_rotation_angle: 25.0
        saturation_shift: 1.0
      }
    }
    # Setting up dataset config.
    dataset_config{
      image_path: "image_2"
      label_path: "label_2"
    }
    output_image_width: 1248
    output_image_height: 384
    output_image_channel: 3
    image_extension: ".png"

Running the Augmentor Tool
--------------------------

The :code:`tlt-augment` tool has a simple command line interface, which is defined as follows:

.. code::

    tlt-augment [-h] -d /path/to/the/dataset/root
                     -a /path/to/augmentation/spec/file
                     -o /path/to/the/augmented/output
                     [-v]

Here are the command line parameters:

* :code:`-h, --help`: Show this help message and exit
* :code:`-d, --dataset-folder`: Path to the detection dataset
* :code:`-a, --augmentation-proto`: Path to the augmentation spec file
* :code:`-o, --output-dataset`: Path to the augmented output dataset
* :code:`-v, --verbose`: Flag to get detailed logs during the augmentation process

The augmented images and labels are generated in the path given by the :code:`output-dataset` parameter, under the following directories:

* Augmented images: :code:`/path/to/augmented/output/images`
* Augmented labels: :code:`/path/to/augmented/output/labels`

.. Note::
   When running :code:`tlt-augment` with the verbose flag set, :code:`tlt-augment` generates augmented images with the bbox outputs rendered under :code:`/path/to/augmented/output/images/annotated`.

The log from a successful run of :code:`tlt-augment` is shown below:

.. code-block:: bash

    Using TensorFlow backend.
    2020-07-10 16:19:18,980 [INFO] iva.augment.spec_handler.spec_loader: Merging specification from /path/to/augmentor/spec/file.txt
    2020-07-10 16:19:18,992 [INFO] iva.augment.build_augmentor: Input dataset: /path/to/input/dataset/root
    2020-07-10 16:19:18,992 [INFO] iva.augment.build_augmentor: Output dataset: /path/to/augmented/output
    2%|███▉ | 167/7481 [00:13<10:04, 12.09it/s]

The generated dataset may then be used with the :code:`tlt-dataset-convert` tool to be converted to TFRecords so that it can be ingested by :code:`tlt-train`. The details of converting the data to TFRecords are described in the :ref:`Data Input for Object Detection ` section, and training a model with this dataset is described in :ref:`Preparing the Input Data Structure `.

.. Note::
   The :code:`tlt-augment` tool applies the spatial augmentation operators only to the bounding box coordinate fields in the label files of the input dataset, since only the bounding box coordinates are affected. All other fields are propagated unchanged from the input labels to the output labels.

Sample rendered augmented images are shown below.

.. figure:: ../content/rendered_aug_images1.png

   Input image rotated by 5 degrees

.. figure:: ../content/rendered_aug_images2.png

   Image rotated by 5 degrees, hue rotation by 25 degrees and saturation shift of 0.0

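
The rotation and shear matrices given in the Rotation Config and Shear Config sections can be checked numerically. The following is a minimal Python sketch, not the :code:`tlt-augment` implementation. The function names are hypothetical, angles are taken in radians, and replacing a transformed box with the axis-aligned box that encloses its four transformed corners is an assumption about how a label could be updated.

```python
import math


def rotation_matrix(angle, width, height):
    """3x3 rotation matrix (row-vector convention) from the Rotation Config section."""
    c, s = math.cos(angle), math.sin(angle)
    x_t = height * s / 2.0 - width * c / 2.0 + width / 2.0
    y_t = -1 * height * c / 2.0 + height / 2.0 - width * s / 2.0
    return [[c, s, 0.0], [-s, c, 0.0], [x_t, y_t, 1.0]]


def shear_matrix(shear_ratio_x, shear_ratio_y, width, height):
    """3x3 shear matrix (row-vector convention) from the Shear Config section."""
    x_t = -height * shear_ratio_x / 2.0
    y_t = -width * shear_ratio_y / 2.0
    return [[1.0, shear_ratio_y, 0.0], [shear_ratio_x, 1.0, 0.0], [x_t, y_t, 1.0]]


def transform_point(x, y, matrix):
    """Compute [x_new, y_new, 1] = [x, y, 1] * matrix."""
    x_new = x * matrix[0][0] + y * matrix[1][0] + matrix[2][0]
    y_new = x * matrix[0][1] + y * matrix[1][1] + matrix[2][1]
    return x_new, y_new


def transform_bbox(bbox, matrix):
    """Transform (x1, y1, x2, y2) and return the enclosing axis-aligned box.

    Using the enclosing box of the four corners is an assumption.
    """
    x1, y1, x2, y2 = bbox
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    points = [transform_point(x, y, matrix) for x, y in corners]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


width, height = 1248, 384
rotate = rotation_matrix(math.radians(5.0), width, height)
# The translation terms keep the image center fixed under rotation.
print(transform_point(width / 2.0, height / 2.0, rotate))
print(transform_bbox((600.0, 150.0, 700.0, 250.0), rotate))
```

The translation terms :code:`x_t` and :code:`y_t` in both matrices are what keep the image center in place, so the transform acts about the center of the output image rather than about its top-left corner.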