NVIDIA TAO Toolkit v4.0
NVIDIA TAO Release tlt.40

Offline Data Augmentation

Training a deep neural network can be a daunting task, and the most important component of training a model is the data. Acquiring a curated and annotated dataset can be a tedious, manual process involving thousands of hours of painstaking labeling. Even with careful planning and data collection, it is very difficult to anticipate all the corner cases that a network may encounter. Repeating the process of collecting and annotating the missing data is expensive and has long turnaround times.

Online augmentation in the training data loader is a good way to increase the variation in the dataset. However, the augmented data is generated randomly based on the distribution the data loader follows when sampling the data, so the model may need to be trained for a long time to achieve good accuracy. To circumvent this and give the user control over generating a dataset with the required augmentations, TAO Toolkit provides an offline augmentation tool called augment. Offline augmentation can dramatically increase the size of the dataset when collecting and labeling data is expensive or not possible. The augment tool provides several custom GPU-accelerated augmentation routines categorized into:

  • Spatial augmentation

  • Color space augmentation

  • Image blur

Spatial augmentation comprises routines where data is augmented in space. The following spatial augmentation operations are supported in TAO Toolkit:

  • Rotate

  • Resize

  • Translate

  • Shear

  • Flip

Color space augmentation comprises routines where the image data is augmented in the color space. The following color augmentation operators are supported:

  • Hue Rotation

  • Brightness offset

  • Contrast shift

Along with the above-mentioned augmentation operations, augment also enables you to blur images using a Gaussian blur operator. More information about this operation is provided in Blur config.

All augmentation routines currently provided with augment are supported only for object detection datasets. The spatial augmentation routines are applied to the images as well as the labeled data coordinates, while the color augmentation routines and the channel-wise blur operator are applied only to the images, since the object labels are not affected. The sample workflow for using augment is as follows:

augmenting1.png

Note

The data is expected in KITTI format, as described in Data Annotation Format. The following sections describe how to use the augmentation tool.

Configuring the Augmentor

The augmentor has several components that the user can configure using a simple protobuf-based configuration file. The configuration file is divided into four major components:

  1. Spatial augmentation config

  2. Color augmentation config

  3. Blur config

  4. Data dimensions - output image width, output image height, output image channel, image extension.

This configuration file contains several nested protobuf elements and global parameters which are defined below.

| Parameter | Datatype | Description | Supported Values |
|---|---|---|---|
| spatial_config | Protobuf message | This protobuf message configures the spatial augmentation. | Protobuf definition provided in Spatial augmentation config. |
| color_config | Protobuf message | This protobuf message configures the color space augmentation operator. | Protobuf definition provided in Color augmentation config. |
| blur_config | Protobuf message | This protobuf message configures the Gaussian blur operator to be applied to the image. The blur is computed channel-wise and then concatenated based on the number of image channels. | Protobuf definition provided in Blur config. |
| dataset_config | Protobuf message | This protobuf message configures the paths of the images and labels relative to the input dataset root defined on the augment command line. | Protobuf definition provided in Dataset config. |
| output_image_width | int32 | This parameter defines the width of the output image. | |
| output_image_height | int32 | This parameter defines the height of the output image. | |
| output_image_channel | int32 | This parameter defines the number of channels in the output image. | 1, 3 |
| image_extension | string | The extension of the input image. Note that all the images in the input dataset are expected to have the same extension. | .png, .jpeg, .jpg |

Spatial Augmentation Config

Spatial augmentation config contains parameters to configure the spatial augmentation routines. This is a nested protobuf element called spatial_config containing protobuf elements for all the spatial augmentation operations.

| Parameter | Datatype | Description | Supported Values |
|---|---|---|---|
| rotation_config | Protobuf message | This protobuf message configures the rotate augmentation operator. Defining this activates rotation. | { angle: 0.5 units: "degrees" } See Rotation config. |
| flip_config | Protobuf message | This protobuf message configures the flip augmentation operator. Defining this activates flip along the horizontal and/or vertical axes. | { flip_vertical: true flip_horizontal: true } See Flip config. |
| translation_config | Protobuf message | This protobuf message configures the translation augmentation operator. Defining this activates translating the images across the x and/or y axes. | { translate_x: 8 translate_y: 8 } See Translation config. |
| shear_config | Protobuf message | This protobuf message configures the shear augmentation operator. Defining this activates a shear to the images across the x and/or y axes. | { shear_ratio_x: 0.2 shear_ratio_y: 0.2 } See Shear config. |

An augmentation operator is enabled by simply defining the proto element associated with it. Defining multiple proto elements implies that all the corresponding augmentation operations are cascaded.

If you don’t wish to introduce a particular augmentation operation, omit the corresponding field. The configurable parameters for the individual spatial augmentation operators are described in the tables below.

Rotation Config

The rotation operation rotates the image by an angle. The transformation matrix for the rotation operation is computed as:

[x_new, y_new, 1] = [x, y, 1] * [[cos(angle),  sin(angle), 0],
                                 [-sin(angle), cos(angle), 0],
                                 [x_t,         y_t,        1]]

where x_t and y_t are defined as:

x_t = height * sin(angle) / 2.0 - width * cos(angle) / 2.0 + width / 2.0
y_t = -1 * height * cos(angle) / 2.0 + height / 2.0 - width * sin(angle) / 2.0

Here, height = height of the output image, width = width of the output image.

| Parameter | Datatype | Description | Supported Values |
|---|---|---|---|
| angle | float | The angle of the rotation to be applied to the image and the coordinates. | +/- 0 - 360 (degrees), +/- 0 - 2π (radians) |
| units | string | The unit in which the angle parameter is specified. | "degrees", "radians" |
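The rotation transform above can be checked numerically. The following is a minimal Python sketch, not the TAO implementation: the helper names `rotation_matrix` and `transform_point` are hypothetical, and it assumes the matrix is applied directly to pixel coordinates in the row-vector form shown. It illustrates that the offsets x_t and y_t keep the image center fixed, so the rotation is about the center of the output image.

```python
import math


def rotation_matrix(angle, units, width, height):
    """Build the 3x3 row-vector rotation matrix from the formula above."""
    if units == "degrees":
        angle = math.radians(angle)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    x_t = height * sin_a / 2.0 - width * cos_a / 2.0 + width / 2.0
    y_t = -1 * height * cos_a / 2.0 + height / 2.0 - width * sin_a / 2.0
    return [
        [cos_a, sin_a, 0.0],
        [-sin_a, cos_a, 0.0],
        [x_t, y_t, 1.0],
    ]


def transform_point(x, y, matrix):
    """Apply [x_new, y_new, 1] = [x, y, 1] * matrix."""
    x_new = x * matrix[0][0] + y * matrix[1][0] + matrix[2][0]
    y_new = x * matrix[0][1] + y * matrix[1][1] + matrix[2][1]
    return x_new, y_new


# 5 degree rotation of a 1248 x 384 output image.
m = rotation_matrix(5.0, "degrees", width=1248, height=384)
center = transform_point(624.0, 192.0, m)  # the image center maps to itself
corner = transform_point(0.0, 0.0, m)      # the top-left corner maps to (x_t, y_t)
print(center, corner)
```

Whether the tool uses this matrix as a forward or an inverse pixel mapping is not stated here, so treat the direction of rotation in this sketch as an assumption.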

Shear Config

The shear operation introduces a slant to the object along the x or the y dimension. The transformation matrix for shear operation is computed as:

[x_new, y_new, 1] = [x, y, 1] * [[1.0,           shear_ratio_y, 0],
                                 [shear_ratio_x, 1.0,           0],
                                 [x_t,           y_t,           1.0]]

x_t = -height * shear_ratio_x / 2.0
y_t = -width * shear_ratio_y / 2.0

Here, height = height of the output image, width = width of the output image.

| Parameter | Datatype | Description | Supported Values |
|---|---|---|---|
| shear_ratio_x | float32 | The amount of horizontal shift per y row. | |
| shear_ratio_y | float32 | The amount of vertical shift per x column. | |
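To make the shear formula concrete, here is a small Python sketch under the same assumptions as the matrix above (row-vector convention, pixel coordinates). The function names are hypothetical and this is not the tool's own code. With shear_ratio_x = 0.3 on a 1248 x 384 image, each row is shifted horizontally in proportion to its y coordinate, and the offset x_t keeps the middle row in place.

```python
def shear_matrix(shear_ratio_x, shear_ratio_y, width, height):
    """Build the 3x3 row-vector shear matrix from the formula above."""
    x_t = -height * shear_ratio_x / 2.0
    y_t = -width * shear_ratio_y / 2.0
    return [
        [1.0, shear_ratio_y, 0.0],
        [shear_ratio_x, 1.0, 0.0],
        [x_t, y_t, 1.0],
    ]


def transform_point(x, y, matrix):
    """Apply [x_new, y_new, 1] = [x, y, 1] * matrix."""
    x_new = x * matrix[0][0] + y * matrix[1][0] + matrix[2][0]
    y_new = x * matrix[0][1] + y * matrix[1][1] + matrix[2][1]
    return x_new, y_new


m = shear_matrix(0.3, 0.0, width=1248, height=384)
top = transform_point(624.0, 0.0, m)       # top row shifts left by 57.6 px
middle = transform_point(624.0, 192.0, m)  # middle row does not move
bottom = transform_point(624.0, 384.0, m)  # bottom row shifts right by 57.6 px
print(top, middle, bottom)
```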

Flip Config

This element configures the flip operator of augment. The operator flips an image and the bounding box coordinates along the horizontal and/or vertical axes.

| Parameter | Datatype | Description | Supported Values |
|---|---|---|---|
| flip_horizontal | bool | The flag to enable flipping an image horizontally. | true, false |
| flip_vertical | bool | The flag to enable flipping an image vertically. | true, false |
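The effect of a flip on bounding box coordinates can be sketched as follows. This is an illustrative Python example, not the augment implementation: the function `flip_bbox` is hypothetical, "horizontal" is assumed to mean a left-right mirror, and coordinates are assumed to map as x -> width - x (the tool may use a different pixel convention).

```python
def flip_bbox(bbox, width, height, flip_horizontal=False, flip_vertical=False):
    """Flip an (xmin, ymin, xmax, ymax) box inside a width x height image."""
    xmin, ymin, xmax, ymax = bbox
    if flip_horizontal:
        # Mirror left-right: the old right edge becomes the new left edge.
        xmin, xmax = width - xmax, width - xmin
    if flip_vertical:
        # Mirror top-bottom: the old bottom edge becomes the new top edge.
        ymin, ymax = height - ymax, height - ymin
    return (xmin, ymin, xmax, ymax)


box = (100.0, 50.0, 300.0, 200.0)
flipped_h = flip_bbox(box, 1248, 384, flip_horizontal=True)
flipped_hv = flip_bbox(box, 1248, 384, flip_horizontal=True, flip_vertical=True)
print(flipped_h)   # (948.0, 50.0, 1148.0, 200.0)
print(flipped_hv)  # (948.0, 184.0, 1148.0, 334.0)
```

Swapping the edges after mirroring keeps xmin <= xmax and ymin <= ymax, which is why the box width and height are preserved.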

Translation Config

This protobuf message configures the translation operator for augment. The operator translates the image and polygon coordinates along the x and/or y axis.

| Parameter | Datatype | Description | Supported Values |
|---|---|---|---|
| translate_x | int | The number of pixels to translate the image along the x axis. | 0 - image_width |
| translate_y | int | The number of pixels to translate the image along the y axis. | 0 - image_height |

Color Augmentation Config

Color augmentation config contains parameters to configure the color space augmentation routines. This is a nested protobuf element called color_config containing protobuf elements for all the color augmentation operations.

| Parameter | Datatype | Description | Supported Values |
|---|---|---|---|
| hue_saturation_config | Protobuf message | This augmentation operator applies hue rotation and color saturation augmentation. | { hue_rotation_angle: 30 saturation_shift: 1.0 } See Hue saturation config. |
| contrast_config | Protobuf message | This augmentation operator applies contrast scaling. | { contrast: 0.0 center: 127.5 } See Contrast config. |
| brightness_config | Protobuf message | This augmentation operator applies a channel-wise brightness shift. | { offset: 100 } See Brightness config. |

An augmentation operator is enabled by simply defining the proto element associated with it. Defining multiple proto elements implies that all the corresponding augmentation operations are cascaded.

If you don’t want to introduce a particular augmentation operation, simply omit the corresponding field. The configurable parameters for the individual color augmentation operators are described in the tables below.

Hue Saturation Config

This augmentation operator applies a color space manipulation by converting the RGB image to HSV, applying the hue rotation and saturation shift, and then converting the result back to RGB.

| Parameter | Datatype | Description | Supported Values |
|---|---|---|---|
| hue_rotation_angle | float32 | Hue rotation in degrees (scalar or vector). A value of 0.0 (modulo 360) leaves the hue unchanged. | 0 - 360 (the angles are computed as angle % 360) |
| saturation_shift | float32 | Saturation shift multiplier. A value of 1.0 leaves the saturation unchanged. A value of 0 removes all saturation from the image and makes all channels equal in value. | 0.0 - 1.0 |
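The RGB to HSV round trip described in this section can be illustrated on a single pixel with Python's standard `colorsys` module. This is only a sketch of the idea, not the GPU routine used by augment; the function `hue_saturation` is hypothetical, and clamping the saturation to [0, 1] is an assumption.

```python
import colorsys


def hue_saturation(rgb, hue_rotation_angle, saturation_shift):
    """Rotate the hue and scale the saturation of one (r, g, b) pixel in 0-255."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # colorsys stores hue in [0, 1), so convert the angle from degrees.
    h = (h + (hue_rotation_angle % 360) / 360.0) % 1.0
    # The saturation shift acts as a multiplier (assumed clamped to [0, 1]).
    s = min(1.0, max(0.0, s * saturation_shift))
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r, g, b))


unchanged = hue_saturation((200, 40, 40), 0.0, 1.0)  # angle 0, shift 1.0: no change
gray = hue_saturation((200, 40, 40), 0.0, 0.0)       # shift 0: all channels equal
shifted = hue_saturation((255, 0, 0), 120.0, 1.0)    # red rotated by 120 degrees
print(unchanged, gray, shifted)
```

The three calls mirror the properties stated in the table: an angle of 0 and a shift of 1.0 leave the pixel unchanged, and a shift of 0 makes all channels equal.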

Brightness Config

This augmentation operator applies a channel-wise brightness shift.

| Parameter | Datatype | Description | Supported Values |
|---|---|---|---|
| offset | float32 | Offset value per color channel. | 0 - 255 |
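As a rough illustration of a channel-wise brightness shift, the Python sketch below adds the offset to every channel of a pixel. The helper `apply_brightness` is hypothetical, and clipping the result to the 8-bit range 0 to 255 is an assumption; how augment handles overflow is not documented here.

```python
def apply_brightness(pixel, offset):
    """Add the same offset to each channel, clipping to the 0-255 range."""
    return tuple(min(255.0, max(0.0, c + offset)) for c in pixel)


brighter = apply_brightness((200, 40, 40), 100)
print(brighter)  # (255.0, 140, 140)
```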

Contrast Config

This augmentation operator applies contrast scaling across a center point to an image.

| Parameter | Datatype | Description | Supported Values |
|---|---|---|---|
| contrast | float32 | Contrast scale value. A value of 0 leaves the contrast unchanged. | 0 - 1.0 |
| center | float32 | Center value for the image. In our case, the images are scaled between 0-255 (8-bit images), so 127.5 is the commonly used value. | 0.0 - 1.0 |
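The exact contrast formula is not given in this section. One formula that is consistent with the stated behavior (scaling about a center point, with a contrast value of 0 leaving the image unchanged) is out = center + (1 + contrast) * (in - center). The Python sketch below uses that assumed formula; both the formula and the helper `apply_contrast` are illustrative and should not be read as the tool's actual implementation.

```python
def apply_contrast(pixel, contrast, center=127.5):
    """Scale each channel about `center`; contrast=0 is the identity (assumed formula)."""
    scale = 1.0 + contrast
    return tuple(min(255.0, max(0.0, center + scale * (c - center))) for c in pixel)


same = apply_contrast((200, 40, 40), 0.0)       # unchanged
stretched = apply_contrast((200, 40, 40), 0.5)  # values pushed away from the center
print(same, stretched)
```

Under this assumption, values above the center get brighter and values below it get darker as the contrast value grows, with the result clipped to the 8-bit range.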

Dataloader

dataset_config {
  data_sources: {
    tfrecords_path: "/path/to/tfrecords/root/*"
    image_directory_path: "/path/to/dataset/root"
  }
  image_extension: "png"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "cyclist"
    value: "cyclist"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  target_class_mapping {
    key: "person_sitting"
    value: "pedestrian"
  }
  validation_fold: 0
}

See Dataloader for more information.

Blur Config

This protobuf element configures the Gaussian blur operator applied to an image. A Gaussian kernel is formulated based on the parameters described below, and a 2D convolution is then performed between the image and the kernel, per channel.

| Parameter | Datatype | Description | Supported Values |
|---|---|---|---|
| size | int | Size of the kernel to be convolved. | >0 |
| std | float | Standard deviation of the Gaussian filter used for blurring. | >0.0 |

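The two blur parameters map directly onto the standard construction of a Gaussian kernel. The pure-Python sketch below builds a normalized size x size kernel from `size` and `std` and convolves it with a single channel. It is a slow reference illustration rather than the GPU routine used by augment; the function names are hypothetical, and clamping at the image border is an assumption.

```python
import math


def gaussian_kernel(size, std):
    """Return a size x size Gaussian kernel whose weights sum to 1."""
    half = (size - 1) / 2.0
    weights = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * std ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def blur_channel(channel, kernel):
    """Convolve one 2D channel (list of rows) with an odd-sized kernel."""
    height, width = len(channel), len(channel[0])
    size = len(kernel)
    half = size // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(size):
                for kx in range(size):
                    # Clamp coordinates so border pixels reuse the nearest edge value.
                    sy = min(height - 1, max(0, y + ky - half))
                    sx = min(width - 1, max(0, x + kx - half))
                    acc += channel[sy][sx] * kernel[ky][kx]
            out[y][x] = acc
    return out


kernel = gaussian_kernel(3, 1.0)
flat = [[10.0] * 4 for _ in range(4)]
blurred = blur_channel(flat, kernel)  # a constant image stays constant
print(kernel[1][1], blurred[0][0])
```

For a multi-channel image, the same function would be applied to each channel separately and the results stacked back together, matching the per-channel description above.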
For example, the following configuration file augments the image by:

  1. Rotating an image by 5 deg.

  2. Shearing along x axis by a ratio of 0.3.

  3. Translating along x axis by 8 pixels.

  4. Rotating the hue by 25 degrees, with a saturation shift of 1.0 (saturation unchanged).

# Spec file for augment.
spatial_config{
  rotation_config{
    angle: 5.0
    units: "degrees"
  }
  shear_config{
    shear_ratio_x: 0.3
  }
  translation_config{
    translate_x: 8
  }
}
color_config{
  hue_saturation_config{
    hue_rotation_angle: 25.0
    saturation_shift: 1.0
  }
}
# Setting up dataset config.
dataset_config{
  image_path: "image_2"
  label_path: "label_2"
}
output_image_width: 1248
output_image_height: 384
output_image_channel: 3
image_extension: ".png"

Running the Augmentor Tool

The augment tool has a simple command line interface, which is defined as follows:


tao augment [-h] -d /path/to/the/dataset/root -a /path/to/augmentation/spec/file -o /path/to/the/augmented/output [-v]

Here are the command line parameters:

  • -h, --help: Show this help message and exit

  • -d, --dataset-folder: Path to the detection dataset

  • -a, --augmentation-proto: Path to augmentation spec file

  • -o, --output-dataset: Path to the augmented output dataset

  • -v, --verbose: Flag to get detailed logs during the augmentation process
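When the tool is driven from a script, the command line can be assembled programmatically. The following Python sketch builds the argument list using only the flags listed above; the helper `build_augment_command` is hypothetical, the paths are the same placeholders used in the usage line, and the sketch prints the command rather than running it.

```python
import shlex


def build_augment_command(dataset_folder, spec_file, output_dataset, verbose=False):
    """Assemble the `tao augment` argument list from the documented flags."""
    cmd = [
        "tao", "augment",
        "-d", dataset_folder,   # path to the detection dataset
        "-a", spec_file,        # path to the augmentation spec file
        "-o", output_dataset,   # path to the augmented output dataset
    ]
    if verbose:
        cmd.append("-v")        # detailed logs during augmentation
    return cmd


cmd = build_augment_command(
    "/path/to/the/dataset/root",
    "/path/to/augmentation/spec/file",
    "/path/to/the/augmented/output",
    verbose=True,
)
print(shlex.join(cmd))
# To launch it, pass the list to subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.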

The augmented images and labels are generated in the path specified by the output-dataset parameter, under the directories named in dataset_config. For the sample config file shown above, the images and labels are created in:

  • Augmented images: /path/to/augmented/output/image_2

  • Augmented labels: /path/to/augmented/output/label_2

Note

When running augment with the verbose flag set, augment generates augmented images with the bbox outputs rendered under /path/to/augmented/output/images_annotated.

The log from a successful run of augment is mentioned below:

The dataset thus generated may then be converted to TFRecords with the dataset-convert tool so that it can be ingested by the train subtask. Converting the data to TFRecords is described in the Data Input for Object Detection section, and training a model with this dataset is described in the Data Annotation Format section.

Note

The augment tool applies the spatial augmentation operators only to the bounding box coordinate fields in the label files of the input dataset, since only the bbox coordinates are affected. All the other fields are propagated from the input labels to the output labels.
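The behavior described in this note can be sketched on a single KITTI label line. In the KITTI format, columns 4 to 7 (zero-based) hold the bounding box as left, top, right, bottom. The Python example below applies a translation to those four fields only and copies every other field through untouched. The function `translate_kitti_label` and the sample label values are illustrative, and the sketch does not clip boxes to the image bounds.

```python
def translate_kitti_label(line, translate_x, translate_y):
    """Shift the bbox columns of one KITTI label line; keep all other fields."""
    fields = line.split()
    # Columns 4..7 are the bbox: left, top, right, bottom (in pixels).
    left, top, right, bottom = (float(v) for v in fields[4:8])
    bbox = [left + translate_x, top + translate_y,
            right + translate_x, bottom + translate_y]
    fields[4:8] = [f"{v:.2f}" for v in bbox]
    return " ".join(fields)


label = "car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
translated = translate_kitti_label(label, 8, 0)
print(translated)
```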

Sample rendered augmented images are shown below.

rendered_aug_images1.png

Input image rotated by 5 degrees

rendered_aug_images2.png

Image rotated by 5 degrees, hue rotation by 25 degrees and saturation shift of 0.0

© Copyright 2022, NVIDIA. Last updated on Mar 23, 2023.