Augmenting a Dataset
Training a deep neural network can be a daunting task, and the most important component of training a model is the data. Acquiring a curated and annotated dataset can be a tedious, manual process involving thousands of person-hours of painstaking labeling. Even with careful planning and data collection, it is very difficult to anticipate all the corner cases that a network may encounter, and repeating the process of collecting and annotating the missing data is expensive and has long turnaround times.
Online augmentation in the training data loader is a good way to increase the variation in the dataset. However, the augmented data is generated randomly, based on the distribution the data loader follows when sampling the data, so the model may need to be trained for a long time to achieve good accuracy. To circumvent this, and to give the user control over generating a dataset with the required augmentations, TLT provides an offline augmentation tool called tlt-augment. Offline augmentation can dramatically increase the size of the dataset when collecting and labeling data is expensive or not possible. The tlt-augment tool provides several custom GPU-accelerated augmentation routines categorized into:
Spatial augmentation
Color space augmentation
Image Blur
Spatial augmentation comprises routines where data is augmented in space. The following spatial augmentation operations are supported in TLT.
Rotate
Resize
Translate
Shear
Flip
Color space augmentation comprises routines where the image data is augmented in the color space. The following color augmentation operators are supported.
Hue Rotation
Brightness offset
Contrast shift
Along with the above-mentioned augmentation operations, tlt-augment also enables users to blur images using a Gaussian blur operator. More information about the operator is provided in Blur config.
All augmentation routines currently provided with tlt-augment are supported only for object detection datasets. The spatial augmentation routines are applied to the images as well as the labelled data coordinates, while the color augmentation routines and the channel-wise blur operator are applied only to the images, as the object labels are not affected. The sample workflow of using tlt-augment is as follows:
The data is expected in KITTI format, as described in Data input for object detection. The following sections describe how to use the augmentation tool.
The augmentor has several components which the user can configure by using a simple protobuf-based configuration file. The configuration file is divided into 4 major components.
Spatial augmentation config
Color augmentation config
Blur config
Data dimensions - output image width, output image height, output image channel, image extension.
This configuration file contains several nested protobuf elements, and global parameters which are defined below.
Parameter | Datatype | Description | Supported Values |
---|---|---|---|
spatial_config | Protobuf message | This protobuf message configures the spatial augmentation. | Protobuf definition provided in Spatial augmentation config. |
color_config | Protobuf message | This protobuf message configures the color space augmentation operator. | Protobuf definition provided in Color augmentation config. |
blur_config | Protobuf message | This protobuf message configures the Gaussian blur operator to be applied on the image. The blur is computed channel-wise and then concatenated based on the number of image channels. | Protobuf definition provided in Blur config. |
dataset_config | Protobuf message | This protobuf message configures the paths of the images and labels, relative to the input dataset root defined on the tlt-augment command line. | Protobuf definition provided in Dataset config. |
output_image_width | int32 | This parameter defines the width of the output image. | |
output_image_height | int32 | This parameter defines the height of the output image. | |
output_image_channel | int32 | This parameter defines the number of channels in the output image. | 1, 3 |
image_extension | string | The extension of the input image. Note that all the images in the input dataset are expected to have the same extension. | .png, .jpeg, .jpg |
Spatial Augmentation Config
Spatial augmentation config contains parameters to configure the spatial augmentation routines. This is a nested protobuf element called spatial_config, containing protobuf elements for all the spatial augmentation operations.
Parameter | Datatype | Description | Supported Values |
---|---|---|---|
rotation_config | Protobuf message | This protobuf message configures the rotate augmentation operator. Defining this activates rotation. | See Rotation config |
flip_config | Protobuf message | This protobuf message configures the flip augmentation operator. Defining this activates flip along the horizontal and/or vertical axes. | See Flip config |
translation_config | Protobuf message | This protobuf message configures the translation augmentation operator. Defining this activates translating the images across the x and/or y axes. | See Translation config |
shear_config | Protobuf message | This protobuf message configures the shear augmentation operator. Defining this adds a shear to the images across the x and/or y axes. | See Shear config |
An augmentation operator is enabled simply by defining the proto associated with it. When multiple proto elements are defined, the corresponding augmentation operations are cascaded.
To skip any of the supported augmentation operations, simply omit the corresponding field. The configurable parameters for the individual spatial augmentation operators are described in the sections below.
Rotation Config
The rotation operation rotates the image by an angle. The transformation matrix for the rotation operation is computed as:
$$
\begin{bmatrix} x_{new} & y_{new} & 1 \end{bmatrix} = \begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} \cos(\text{angle}) & \sin(\text{angle}) & 0 \\ -\sin(\text{angle}) & \cos(\text{angle}) & 0 \\ x_t & y_t & 1 \end{bmatrix}
$$
where x_t and y_t are defined as
$$
x_t = \text{height} \cdot \sin(\text{angle}) / 2.0 - \text{width} \cdot \cos(\text{angle}) / 2.0 + \text{width} / 2.0
$$
$$
y_t = -\text{height} \cdot \cos(\text{angle}) / 2.0 + \text{height} / 2.0 - \text{width} \cdot \sin(\text{angle}) / 2.0
$$
Here height = height of the output image, width = width of the output image.
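The rotation transform can be checked numerically with a minimal sketch in plain Python. The helper names `rotation_matrix` and `apply_transform` are hypothetical and are not part of tlt-augment; the code only restates the matrix and the x_t, y_t terms given in this section.

```python
import math


def rotation_matrix(angle, width, height, units="degrees"):
    """Build the 3x3 row-vector rotation matrix from this section.

    Hypothetical helper for illustration; not part of tlt-augment.
    """
    if units == "degrees":
        angle = math.radians(angle)
    c, s = math.cos(angle), math.sin(angle)
    x_t = height * s / 2.0 - width * c / 2.0 + width / 2.0
    y_t = -height * c / 2.0 + height / 2.0 - width * s / 2.0
    return [[c, s, 0.0], [-s, c, 0.0], [x_t, y_t, 1.0]]


def apply_transform(point, matrix):
    """Compute [x, y, 1] * matrix and return (x_new, y_new)."""
    row = (point[0], point[1], 1.0)
    x_new = sum(row[i] * matrix[i][0] for i in range(3))
    y_new = sum(row[i] * matrix[i][1] for i in range(3))
    return x_new, y_new


if __name__ == "__main__":
    m = rotation_matrix(5.0, width=1248, height=384)
    print(apply_transform((624.0, 192.0), m))  # the image center
```

With these translation terms the image center maps to itself, so the rotation is taken about the center of the output image rather than about the origin.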
Parameter | Datatype | Description | Supported Values |
---|---|---|---|
angle | float | The angle of the rotation to be applied to the image and the coordinates. | +/- 0 - 360 (degrees), +/- 0 - 2π (radians) |
units | string | The unit in which the angle parameter is specified. | "degrees", "radians" |
Shear Config
The shear operation introduces a slant to the object along the x or the y dimension. The transformation matrix for shear operation is computed as:
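$$
\begin{bmatrix} x_{new} & y_{new} & 1 \end{bmatrix} = \begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} 1.0 & \text{shear\_ratio\_y} & 0 \\ \text{shear\_ratio\_x} & 1.0 & 0 \\ x_t & y_t & 1.0 \end{bmatrix}
$$
where x_t and y_t are defined as
$$
x_t = -\text{height} \cdot \text{shear\_ratio\_x} / 2
$$
$$
y_t = -\text{width} \cdot \text{shear\_ratio\_y} / 2
$$
Here height = height of the output image, width = width of the output image.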
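As a rough illustration, the shear transform can be written in a few lines of plain Python. The helper names `shear_matrix` and `apply_transform` are hypothetical and not part of tlt-augment; the sketch only restates the matrix and translation terms from this section.

```python
def shear_matrix(shear_ratio_x, shear_ratio_y, width, height):
    """Build the 3x3 row-vector shear matrix from this section.

    Hypothetical helper for illustration; not part of tlt-augment.
    """
    x_t = -height * shear_ratio_x / 2.0
    y_t = -width * shear_ratio_y / 2.0
    return [
        [1.0, shear_ratio_y, 0.0],
        [shear_ratio_x, 1.0, 0.0],
        [x_t, y_t, 1.0],
    ]


def apply_transform(point, matrix):
    """Compute [x, y, 1] * matrix and return (x_new, y_new)."""
    row = (point[0], point[1], 1.0)
    x_new = sum(row[i] * matrix[i][0] for i in range(3))
    y_new = sum(row[i] * matrix[i][1] for i in range(3))
    return x_new, y_new


if __name__ == "__main__":
    m = shear_matrix(0.3, 0.0, width=1248, height=384)
    print(apply_transform((0.0, 0.0), m))      # top-left corner shifts left
    print(apply_transform((624.0, 192.0), m))  # image center stays in place
```

The translation terms offset the shear so that the image center stays fixed, while rows above and below the center shift in opposite directions.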
Parameter | Datatype | Description | Supported Values |
---|---|---|---|
shear_ratio_x | float32 | The amount of horizontal shift per y row. | |
shear_ratio_y | float32 | The amount of vertical shift per x column. | |
Flip Config
This element configures the flip operator of tlt-augment. The operator flips an image and the bounding box coordinates along the horizontal and/or vertical axes.
Parameter | Datatype | Description | Supported Values |
---|---|---|---|
flip_horizontal | bool | The flag to enable flipping an image horizontally. | true, false |
flip_vertical | bool | The flag to enable flipping an image vertically. | true, false |
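To show how a flip affects the labels, here is a minimal sketch that mirrors a bounding box. The function `flip_bbox` is hypothetical, and the convention that a flipped coordinate equals `width - x` (or `height - y`) is an assumption; the exact pixel convention used by tlt-augment is not stated in this section.

```python
def flip_bbox(bbox, width, height, flip_horizontal=False, flip_vertical=False):
    """Mirror an (x1, y1, x2, y2) box inside a width x height image.

    Hypothetical helper; assumes x -> width - x and y -> height - y.
    """
    x1, y1, x2, y2 = bbox
    if flip_horizontal:
        # Left and right edges swap roles after mirroring.
        x1, x2 = width - x2, width - x1
    if flip_vertical:
        # Top and bottom edges swap roles after mirroring.
        y1, y2 = height - y2, height - y1
    return (x1, y1, x2, y2)


if __name__ == "__main__":
    print(flip_bbox((100, 50, 300, 200), 1248, 384, flip_horizontal=True))
```

Swapping the edges after mirroring keeps the box in (left, top, right, bottom) order, which is what the KITTI label format expects.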
Translation Config
This protobuf message configures the translation operator for tlt-augment. The operator translates the image and polygon coordinates along the x and/or y axis.
Parameter | Datatype | Description | Supported Values |
---|---|---|---|
translate_x | int | The number of pixels to translate the image along the x axis. | 0 - image_width |
translate_y | int | The number of pixels to translate the image along the y axis. | 0 - image_height |
Color Augmentation Config
Color augmentation config contains parameters to configure the color space augmentation routines. This is a nested protobuf element called color_config, containing protobuf elements for all the color augmentation operations.
Parameter | Datatype | Description | Supported Values |
---|---|---|---|
hue_saturation_config | Protobuf message | This augmentation operator applies hue rotation and color saturation augmentation. | See Hue saturation config |
contrast_config | Protobuf message | This augmentation operator applies contrast scaling. | See Contrast config |
brightness_config | Protobuf message | This augmentation operator applies a channel-wise brightness shift. | See Brightness config |
An augmentation operator is enabled simply by defining the proto associated with it. When multiple proto elements are defined, the corresponding augmentation operations are cascaded.
To skip any of the supported augmentation operations, simply omit the corresponding field. The configurable parameters for the individual color augmentation operators are described in the sections below.
Hue Saturation Config
This augmentation operator applies a color space manipulation by converting the RGB image to HSV, applying hue rotation and saturation shift, and then converting the result back to RGB.
Parameter | Datatype | Description | Supported Values |
---|---|---|---|
hue_rotation_angle | float32 | Hue rotation in degrees (scalar or vector). A value of 0.0 (modulo 360) leaves the hue unchanged. | 0 - 360 (the angles are computed as angle % 360) |
saturation_shift | float32 | Saturation shift multiplier. A value of 1.0 leaves the saturation unchanged. A value of 0 removes all saturation from the image and makes all channels equal in value. | 0.0 - 1.0 |
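The RGB to HSV round trip described above can be sketched for a single pixel using Python's standard `colorsys` module. The function `hue_saturation` is a hypothetical illustration, and treating `saturation_shift` as a simple multiplier on the HSV saturation channel is an assumption based on the parameter description; the GPU implementation inside tlt-augment may differ in detail.

```python
import colorsys


def hue_saturation(rgb, hue_rotation_angle=0.0, saturation_shift=1.0):
    """Rotate hue and scale saturation of one 8-bit RGB pixel.

    Hypothetical per-pixel sketch; not the tlt-augment implementation.
    """
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # colorsys stores hue in [0, 1), so convert degrees to a fraction of a turn.
    h = (h + (hue_rotation_angle % 360.0) / 360.0) % 1.0
    # Assumed behaviour: multiply saturation and clamp it to [0, 1].
    s = min(1.0, max(0.0, s * saturation_shift))
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255.0) for c in (r, g, b))


if __name__ == "__main__":
    print(hue_saturation((255, 0, 0), hue_rotation_angle=120.0))
    print(hue_saturation((200, 100, 50), saturation_shift=0.0))
```

Rotating pure red by 120 degrees yields pure green, and a saturation multiplier of 0 collapses all three channels to the same value, matching the parameter descriptions in the table.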
Brightness Config
This augmentation operator applies a channel-wise brightness shift.
Parameter | Datatype | Description | Supported Values |
---|---|---|---|
offset | float32 | Offset value per color channel. | 0 - 255 |
Contrast Config
This augmentation operator applies contrast scaling across a center point to an image.
Parameter | Datatype | Description | Supported Values |
---|---|---|---|
contrast | float32 | Contrast scale value. A value of 0 leaves the contrast unchanged. | 0 - 1.0 |
center | float32 | Center value for the image. Images are scaled between 0 and 255 (8-bit images), so 127.5 is the common choice. | 0.0 - 1.0 |
Dataloader
dataset_config {
  data_sources: {
    tfrecords_path: "/path/to/tfrecords/root/*"
    image_directory_path: "/path/to/dataset/root"
  }
  image_extension: "png"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "cyclist"
    value: "cyclist"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  target_class_mapping {
    key: "person_sitting"
    value: "pedestrian"
  }
  validation_fold: 0
}
See Dataloader for more information.
Blur Config
This protobuf element configures the Gaussian blur operator applied to an image. A Gaussian kernel is formulated based on the parameters mentioned below, and a 2D convolution is then performed between the image and the kernel, per channel.
Parameter | Datatype | Description | Supported Values |
---|---|---|---|
size | int | Size of the kernel to be convolved. | >0 |
std | float | Standard deviation of the Gaussian filter used for blurring. | >0.0 |
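To make the roles of `size` and `std` concrete, the following sketch builds a normalized 2D Gaussian kernel in plain Python. The function `gaussian_kernel` is hypothetical, and the exact kernel formulation used by tlt-augment is not specified in this section; this is the standard separable Gaussian, normalized so that the weights sum to 1.

```python
import math


def gaussian_kernel(size, std):
    """Return a size x size Gaussian kernel whose entries sum to 1.

    Hypothetical helper; assumes the textbook separable Gaussian.
    """
    if size <= 0 or std <= 0.0:
        raise ValueError("size and std must both be positive")
    center = (size - 1) / 2.0
    # 1D weights; the 2D kernel is their outer product.
    weights = [
        math.exp(-((i - center) ** 2) / (2.0 * std ** 2)) for i in range(size)
    ]
    kernel = [[wy * wx for wx in weights] for wy in weights]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


if __name__ == "__main__":
    for row in gaussian_kernel(size=3, std=1.0):
        print(["%.4f" % value for value in row])
```

Convolving each image channel with such a kernel, and then stacking the channels again, corresponds to the per-channel blur described above. A larger `std` spreads the weights more evenly across the kernel and produces a stronger blur.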
For example, the following configuration file augments each image by
rotating the image by 5 degrees
shearing along the x axis by a ratio of 0.3
translating along the x axis by 8 pixels
rotating the hue by 25 degrees
# Spec file for tlt-augment.
spatial_config{
  rotation_config{
    angle: 5.0
    units: "degrees"
  }
  shear_config{
    shear_ratio_x: 0.3
  }
  translation_config{
    translate_x: 8
  }
}
color_config{
  hue_saturation_config{
    hue_rotation_angle: 25.0
    saturation_shift: 1.0
  }
}
# Setting up dataset config.
dataset_config{
  image_path: "image_2"
  label_path: "label_2"
}
output_image_width: 1248
output_image_height: 384
output_image_channel: 3
image_extension: ".png"
The tlt-augment tool has a simple command line interface, which is defined as follows:
tlt-augment [-h] -d /path/to/the/dataset/root
-a /path/to/augmentation/spec/file
-o /path/to/the/augmented/output
[-v]
Here are the command line parameters:
-h, --help: Show this help message and exit
-d, --dataset-folder: Path to the detection dataset
-a, --augmentation-proto: Path to augmentation spec file
-o, --output-dataset: Path to the augmented output dataset
-v, --verbose: Flag to get detailed logs during the augmentation process
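When the tool is driven from a script, the command line can be assembled programmatically. The sketch below only builds the argument list from the flags documented above; the function name `build_augment_command` and the example paths are hypothetical.

```python
def build_augment_command(dataset_dir, spec_file, output_dir, verbose=False):
    """Assemble a tlt-augment argument list from the documented flags.

    Hypothetical helper; it does not run the tool itself.
    """
    cmd = [
        "tlt-augment",
        "-d", dataset_dir,  # path to the detection dataset
        "-a", spec_file,    # path to the augmentation spec file
        "-o", output_dir,   # path to the augmented output dataset
    ]
    if verbose:
        cmd.append("-v")
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with real locations.
    print(" ".join(build_augment_command(
        "/path/to/the/dataset/root",
        "/path/to/augmentation/spec/file",
        "/path/to/the/augmented/output",
        verbose=True,
    )))
```

In an environment where tlt-augment is installed, the returned list can be passed to `subprocess.run(cmd, check=True)` so that a failed run raises an exception.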
The augmented images and labels are generated in the path mentioned in the output-dataset parameter under the following directories.
Augmented images:
/path/to/augmented/output/images
Augmented labels:
/path/to/augmented/output/labels
When tlt-augment is run with the verbose flag set, it generates augmented images with the bbox outputs rendered under /path/to/augmented/output/images/annotated.
The log from a successful run of tlt-augment is shown below:
The generated dataset can then be converted to TFRecords with the tlt-dataset-convert tool so that it can be ingested by tlt-train. Converting the data to TFRecords is described in the Data Input for Object Detection section, and training a model with this dataset is described in the Preparing the Input Data Structure section.
tlt-augment applies the spatial augmentation operators only to the bounding box coordinate fields in the label files of the input dataset, as only the bbox coordinates are relevant. All other fields are propagated unchanged from the input labels to the output labels.
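The label handling described above can be illustrated with a small sketch for a single KITTI label line. In the KITTI format, the four values after the type, truncation, occlusion, and alpha fields are the bbox left, top, right, and bottom coordinates. The function `translate_kitti_label` is hypothetical, and the assumptions that a positive translation moves the box toward larger coordinates and that coordinates are written with two decimals are mine; clipping to the image bounds is omitted.

```python
def translate_kitti_label(line, translate_x=0, translate_y=0):
    """Shift the bbox fields of one KITTI label line; copy the rest as-is.

    Hypothetical helper; not the tlt-augment implementation.
    """
    fields = line.split()
    # Fields 4..7 hold bbox left, top, right, bottom.
    x1, y1, x2, y2 = (float(value) for value in fields[4:8])
    fields[4:8] = [
        "%.2f" % (x1 + translate_x),
        "%.2f" % (y1 + translate_y),
        "%.2f" % (x2 + translate_x),
        "%.2f" % (y2 + translate_y),
    ]
    return " ".join(fields)


if __name__ == "__main__":
    label = ("car 0.00 0 0.00 100.00 50.00 300.00 200.00 "
             "0.00 0.00 0.00 0.00 0.00 0.00 0.00")
    print(translate_kitti_label(label, translate_x=8))
```

Only the four bbox values change; the class name and the remaining fields are written back exactly as they were read.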
Sample rendered augmented images are shown below.