
Offline Data Augmentation

Object Detection

Training a deep neural network can be a daunting task, and the most important component of training a model is the data. Acquiring a curated and annotated dataset can be a tiring, manual process involving thousands of hours of painstaking labeling. Even with careful planning and data collection, it is very difficult to anticipate all the corner cases that a network may encounter. Repeating the process of collecting and annotating the missing data is expensive and has long turnaround times.

Online augmentation in the training data loader is a good way to increase the variation in the dataset. However, the augmented data is generated randomly based on the distribution the data loader follows when sampling the data, so the model may need to be trained for a long time to achieve good accuracy. To circumvent this and give you control over which augmentations the dataset contains, TAO Toolkit provides an offline augmentation tool called augment. Offline augmentation can dramatically increase the size of the dataset when collecting and labeling data is expensive or not possible. The augment tool provides several custom GPU-accelerated augmentation routines, categorized into:

Spatial augmentation
Color space augmentation
Image blur

Spatial augmentation comprises routines where data is augmented in space. The following spatial augmentation operations are supported in TAO Toolkit:

Rotate
Resize
Translate
Shear
Flip

Color space augmentation comprises routines where the image data is augmented in the color space. The following color augmentation operators are supported:

Hue Rotation
Brightness offset
Contrast shift

Along with the above-mentioned augmentation operations, augment also enables you to blur images using a Gaussian blur operator. More information about the operation is provided in Blur config.

All augmentation routines currently provided with augment are supported only for object detection datasets. The spatial augmentation routines are applied to the images as well as to the labeled bounding-box coordinates, while the color augmentation routines and the channel-wise blur operator are applied only to the images, because the object labels are not affected. The sample workflow of using augment is as follows:

Note

The data is expected in KITTI format, as described in Data Annotation Format. The following sections describe how to use the augmentation tool.

Configuring the Augmentor

The augmentor has several components, which the user can configure with a simple protobuf-based configuration file. The configuration file is divided into four major components:

Spatial augmentation config
Color augmentation config
Blur config
Data dimensions - output image width, output image height, output image channel, image extension.

This configuration file contains several nested protobuf elements and global parameters which are defined below.

Parameter	Datatype	Description	Supported Values
spatial_config	Protobuf message	This protobuf message configures the spatial augmentation.	Protobuf definition provided in Spatial augmentation config.
color_config	Protobuf message	This protobuf message configures the color space augmentation operator.	Protobuf definition provided in Color augmentation config.
blur_config	Protobuf message	This protobuf message configures the Gaussian blur operator to be applied on the image. The blur is computed channel-wise and then concatenated based on the number of image channels.	Protobuf definition provided in Blur config.
dataset_config	Protobuf message	This protobuf message configures the relative paths of the images and labels path from the input dataset root defined over the `augment` command line.	Protobuf definition provided in Dataset config.
output_image_width	int32	This parameter defines the width of the output image.
output_image_height	int32	This parameter defines the height of the output image.
output_image_channel	int32	This parameter defines the number of channels in the output image.	1, 3
image_extension	string	The extension of the input image. Note that all the images in the input dataset are expected to be of the same extension.	.png, .jpeg, .jpg

Spatial Augmentation Config

Spatial augmentation config contains parameters to configure the spatial augmentation routines. This is a nested protobuf element called spatial_config containing protobuf elements for all the spatial augmentation operations.

Parameter	Datatype	Description	Supported Values
rotation_config	Protobuf message	This protobuf message configures the rotate augmentation operator. Defining this activates rotation.	`{ angle: 0.5 units: degrees }` See Rotation config
flip_config	Protobuf message	This protobuf message configures the flip augmentation operator. Defining this activates flip along the horizontal and/or vertical axes.	`{ flip_vertical: true flip_horizontal: true }` See Flip config
translation_config	Protobuf message	This protobuf message configures the translation augmentation operator. Defining this activates translating the images across the x and/or y axes.	`{ translate_x: 8 translate_y: 8 }` See Translation config
shear_config	Protobuf message	This protobuf message configures the shear augmentation operator. Defining this activates a shear to the images across the x and/or y axes.	`{ shear_ratio_x: 0.2 shear_ratio_y: 0.2 }` See Shear config

An augmentation operator is enabled by defining its corresponding protobuf element. When multiple elements are defined, the augmentation operations are cascaded.

If you don’t wish to introduce one of the supported augmentation operations, omit the corresponding field. The configurable parameters for the individual spatial augmentation operators are listed in the tables below.

Rotation Config

The rotation operation rotates the image by an angle. The transformation matrix for the rotation operation is computed as:

[x_new, y_new, 1] = [x, y, 1] * [[ cos(angle),  sin(angle),  0],
                                 [-sin(angle),  cos(angle),  0],
                                 [ x_t,         y_t,         1]]

where x_t and y_t are defined as:

x_t = height * sin(angle) / 2.0 - width * cos(angle) / 2.0 + width / 2.0
y_t = -1 * height * cos(angle) / 2.0 + height / 2.0 - width * sin(angle) / 2.0

Here height = height of the output image, width = width of the output image.

Parameter	Datatype	Description	Supported Values
angle	float	The angle of the rotation to be applied to the image and the coordinates.	+/- 0 - 360 (degrees) +/- 0 - 2π (radians)
units	string	The unit in which the angle parameter is specified.	“degrees”, “radians”
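
As a rough illustration of the rotation matrix above, the following standard-library Python sketch builds the matrix and applies it to a point and to a bounding box. The function names (`rotation_matrix`, `transform_point`, `rotate_bbox`) are hypothetical and are not part of augment. Re-fitting an axis-aligned box around the rotated corners is an assumption made for illustration; the source does not state how augment derives the output box.

```python
import math


def rotation_matrix(angle, width, height, units="degrees"):
    """Return the 3x3 matrix M for [x_new, y_new, 1] = [x, y, 1] * M."""
    if units == "degrees":
        angle = math.radians(angle)
    c, s = math.cos(angle), math.sin(angle)
    # Translation terms from the formula above; they keep the image center fixed.
    x_t = height * s / 2.0 - width * c / 2.0 + width / 2.0
    y_t = -height * c / 2.0 + height / 2.0 - width * s / 2.0
    return [[c, s, 0.0], [-s, c, 0.0], [x_t, y_t, 1.0]]


def transform_point(x, y, matrix):
    """Multiply the row vector [x, y, 1] by the matrix."""
    row = (x, y, 1.0)
    x_new = sum(row[k] * matrix[k][0] for k in range(3))
    y_new = sum(row[k] * matrix[k][1] for k in range(3))
    return x_new, y_new


def rotate_bbox(bbox, matrix):
    """Assumption: re-fit an axis-aligned box around the rotated corners."""
    left, top, right, bottom = bbox
    corners = [(left, top), (right, top), (right, bottom), (left, bottom)]
    xs, ys = zip(*(transform_point(x, y, matrix) for x, y in corners))
    return min(xs), min(ys), max(xs), max(ys)


m = rotation_matrix(5.0, width=1248, height=384)
print(transform_point(624.0, 192.0, m))  # the image center maps to itself
print(rotate_bbox((100.0, 50.0, 300.0, 150.0), m))
```

Because the translation terms cancel the rotation at the image center, any point at (width / 2, height / 2) is left unchanged, which is a quick sanity check on the formula.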

Shear Config

The shear operation introduces a slant to the object along the x or the y dimension. The transformation matrix for the shear operation is computed as:

[x_new, y_new, 1] = [x, y, 1] * [[1.0,            shear_ratio_y,  0],
                                 [shear_ratio_x,  1.0,            0],
                                 [x_t,            y_t,            1.0]]

where x_t and y_t are defined as:

x_t = -height * shear_ratio_x / 2.
y_t = -width * shear_ratio_y / 2.

Here height = height of the output image, width = width of the output image.

Parameter	Datatype	Description	Supported Values
shear_ratio_x	float32	The amount of horizontal shift per y row.
shear_ratio_y	float32	The amount of vertical shift per x column.
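
The shear matrix above can be sketched in plain Python as follows. The helper names (`shear_matrix`, `transform_point`) are hypothetical and only illustrate the arithmetic; they are not part of augment.

```python
def shear_matrix(shear_ratio_x, shear_ratio_y, width, height):
    """Return the 3x3 matrix M for [x_new, y_new, 1] = [x, y, 1] * M."""
    # Translation terms from the formula above; they keep the center fixed.
    x_t = -height * shear_ratio_x / 2.0
    y_t = -width * shear_ratio_y / 2.0
    return [
        [1.0, shear_ratio_y, 0.0],
        [shear_ratio_x, 1.0, 0.0],
        [x_t, y_t, 1.0],
    ]


def transform_point(x, y, matrix):
    """Multiply the row vector [x, y, 1] by the matrix."""
    row = (x, y, 1.0)
    x_new = sum(row[k] * matrix[k][0] for k in range(3))
    y_new = sum(row[k] * matrix[k][1] for k in range(3))
    return x_new, y_new


# Shear along x only, as in a spec with shear_ratio_x: 0.3.
m = shear_matrix(0.3, 0.0, width=1248, height=384)
print(transform_point(100.0, 192.0, m))  # middle row: x is unchanged
print(transform_point(100.0, 0.0, m))    # top row: x shifts left
```

With `shear_ratio_x = 0.3` and a 384-pixel-high image, a point on the top row moves 57.6 pixels to the left, a point on the middle row does not move, and a point on the bottom row moves 57.6 pixels to the right.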

Flip Config

This element configures the flip operator of augment. The operator flips an image and the bounding box coordinates along the horizontal and/or vertical axes.

Parameter	Datatype	Description	Supported Values
flip_horizontal	bool	The flag to enable flipping an image horizontally.	true, false
flip_vertical	bool	The flag to enable flipping an image vertically.	true, false
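
To illustrate what flipping does to a label, here is a minimal sketch that mirrors a bounding box given as (left, top, right, bottom). The function `flip_bbox` is hypothetical, and the mirroring convention (`x_new = width - x`) is an assumption; the exact pixel convention used by augment is not stated in the source.

```python
def flip_bbox(bbox, width, height, flip_horizontal=False, flip_vertical=False):
    """Mirror a (left, top, right, bottom) box inside a width x height image."""
    left, top, right, bottom = bbox
    if flip_horizontal:
        # Assumption: mirror about the vertical center line of the image.
        left, right = width - right, width - left
    if flip_vertical:
        # Assumption: mirror about the horizontal center line of the image.
        top, bottom = height - bottom, height - top
    return (left, top, right, bottom)


print(flip_bbox((100, 50, 300, 150), 1248, 384, flip_horizontal=True))
```

Swapping the two edges after mirroring keeps `left <= right` and `top <= bottom`, so the result is still a valid box.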

Translation Config

This protobuf message configures the translation operator for augment. The operator translates the image and polygon coordinates along the x and/or y axis.

Parameter	Datatype	Description	Supported Values
translate_x	int	The number of pixels to translate the image along the x axis.	0 - image_width
translate_y	int	The number of pixels to translate the image along the y axis.	0 - image_height
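
A translation shifts every coordinate by a fixed number of pixels. The sketch below applies `translate_x` and `translate_y` to a (left, top, right, bottom) box. The function `translate_bbox` is hypothetical, and clamping the result to the image bounds is an assumption made for illustration; the source does not describe how augment treats boxes pushed past the image edge.

```python
def translate_bbox(bbox, translate_x, translate_y, width, height):
    """Shift a (left, top, right, bottom) box and clamp it to the image."""

    def clamp(value, upper):
        # Assumption: coordinates outside the image are clipped to its border.
        return max(0.0, min(float(upper), float(value)))

    left, top, right, bottom = bbox
    return (
        clamp(left + translate_x, width),
        clamp(top + translate_y, height),
        clamp(right + translate_x, width),
        clamp(bottom + translate_y, height),
    )


print(translate_bbox((100, 50, 300, 150), 8, 0, width=1248, height=384))
```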

Color Augmentation Config

Color augmentation config contains parameters to configure the color space augmentation routines. This is a nested protobuf element called color_config containing protobuf elements for all the color augmentation operations.

Parameter	Datatype	Description	Supported Values
hue_saturation_config	Protobuf message	This augmentation operator applies hue rotation and color saturation augmentation.	`{ hue_rotation_angle: 30 saturation_shift: 1.0 }` See Hue saturation config
contrast_config	Protobuf message	This augmentation operator applies contrast scaling.	`{ contrast: 0.0 center: 127.5 }` See Contrast config
brightness_config	Protobuf message	This augmentation operator applies a channel-wise brightness shift.	`{ offset: 100 }` See Brightness config

If you don’t want to introduce one of the supported augmentation operations, simply omit the corresponding field. The configurable parameters for the individual color augmentation operators are listed in the tables below.

Hue Saturation Config

This augmentation operator applies a color space manipulation by converting the RGB image to HSV, applying the hue rotation and saturation shift, and then converting the result back to RGB.

Parameter	Datatype	Description	Supported Values
hue_rotation_angle	float32	Hue rotation in degrees (scalar or vector). A value of 0.0 (modulo 360) leaves the hue unchanged.	0 - 360 (the angles are computed as angle % 360)
saturation_shift	float32	Saturation shift multiplier. A value of 1.0 leaves the saturation unchanged. A value of 0 removes all saturation from the image and makes all channels equal in value.	0.0 - 1.0
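
The RGB to HSV round trip described above can be sketched for a single pixel with Python's standard `colorsys` module. This is only an illustration of the idea, not the GPU implementation used by augment; the function name `hue_saturation` and the clamping of the saturation to [0, 1] are assumptions.

```python
import colorsys


def hue_saturation(rgb, hue_rotation_angle, saturation_shift):
    """Rotate the hue and scale the saturation of one 8-bit RGB pixel."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Hue is stored in [0, 1), so the angle is taken modulo 360 and rescaled.
    h = (h + (hue_rotation_angle % 360) / 360.0) % 1.0
    # The shift acts as a multiplier: 1.0 keeps saturation, 0.0 removes it.
    s = max(0.0, min(1.0, s * saturation_shift))
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(channel * 255) for channel in (r, g, b))


print(hue_saturation((255, 0, 0), 120.0, 1.0))  # red rotated toward green
print(hue_saturation((255, 0, 0), 0.0, 0.0))    # all channels become equal
```

A rotation of 0 or 360 degrees with a multiplier of 1.0 returns the input pixel, which matches the parameter descriptions in the table.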

Brightness Config

This augmentation operator applies a channel-wise brightness shift.

Parameter	Datatype	Description	Supported Values
offset	float32	Offset value per color channel	0 - 255

Contrast Config

This augmentation operator applies contrast scaling across a center point to an image.

Parameter	Datatype	Description	Supported Values
contrast	float32	Contrast scale value. A value of 0 leaves the contrast unchanged.	0 - 1.0
center	float32	Center value for the image. Because the images are 8-bit (pixel values between 0 and 255), 127.5 is the common choice.	0.0 - 1.0
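
The brightness and contrast operators can be sketched per pixel as shown below. Both functions are hypothetical. The source does not give the exact formulas, so the contrast formula `(value - center) * (1 + contrast) + center` is an assumption chosen only because it leaves the image unchanged when `contrast` is 0, as the table states. Clipping to the 0-255 range is also an assumption.

```python
def clip_8bit(value):
    """Round and clip a value to the 8-bit range (assumed behavior)."""
    return max(0, min(255, round(value)))


def brightness_offset(pixel, offset):
    """Add the same offset to every channel of one pixel."""
    return tuple(clip_8bit(channel + offset) for channel in pixel)


def contrast_scale(pixel, contrast, center=127.5):
    """Scale each channel about `center`; contrast = 0 is the identity.

    Assumed formula, not taken from the TAO Toolkit source.
    """
    return tuple(
        clip_8bit((channel - center) * (1.0 + contrast) + center)
        for channel in pixel
    )


print(brightness_offset((10, 200, 250), 100))
print(contrast_scale((100, 150, 200), 0.5))
```

Under this assumed formula, values above the center move up and values below it move down, which is the usual meaning of contrast scaling across a center point.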

Dataloader

dataset_config {
  data_sources: {
    tfrecords_path: "/path/to/tfrecords/root/*"
    image_directory_path: "/path/to/dataset/root"
  }
  image_extension: "png"
  target_class_mapping {
      key: "car"
      value: "car"
  }
  target_class_mapping {
      key: "pedestrian"
      value: "pedestrian"
  }
  target_class_mapping {
      key: "cyclist"
      value: "cyclist"
  }
  target_class_mapping {
      key: "van"
      value: "car"
  }
  target_class_mapping {
      key: "person_sitting"
      value: "pedestrian"
  }
  validation_fold: 0
}

See Dataloader for more information.
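
The `target_class_mapping` entries above map each class name found in the labels (the key) to the class the model is trained on (the value), so several source classes can be merged into one. A minimal sketch of that lookup follows; the function `map_class` is hypothetical, and returning `None` for a class with no mapping is an assumption made for illustration.

```python
# Mirrors the target_class_mapping entries in the dataset_config above.
TARGET_CLASS_MAPPING = {
    "car": "car",
    "pedestrian": "pedestrian",
    "cyclist": "cyclist",
    "van": "car",
    "person_sitting": "pedestrian",
}


def map_class(source_class, mapping=TARGET_CLASS_MAPPING):
    """Return the target class, or None if the source class is unmapped."""
    return mapping.get(source_class)


print(map_class("van"))             # merged into "car"
print(map_class("person_sitting"))  # merged into "pedestrian"
```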

Blur Config

This protobuf element configures the Gaussian blur operator applied to an image. A Gaussian kernel is built from the parameters below, and a 2D convolution between the image and this kernel is then performed per channel.

Parameter	Datatype	Description	Supported Values
size	int	Size of the kernel to be convolved.	>0
std	float	Standard deviation of the Gaussian filter used for blurring.	>0.0
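
As an illustration of how `size` and `std` determine the kernel, the sketch below builds a 2D Gaussian kernel in plain Python. The function `gaussian_kernel` is hypothetical. Centering the kernel on its middle element and normalizing it to sum to 1 are common choices assumed here; the source does not state how augment constructs its kernel.

```python
import math


def gaussian_kernel(size, std):
    """Build a size x size Gaussian kernel that sums to 1."""
    if size <= 0 or std <= 0.0:
        raise ValueError("size and std must both be positive")
    center = (size - 1) / 2.0
    # 1D Gaussian weights; the 2D kernel is their outer product.
    weights = [
        math.exp(-((i - center) ** 2) / (2.0 * std ** 2)) for i in range(size)
    ]
    kernel = [[a * b for b in weights] for a in weights]
    total = sum(sum(row) for row in kernel)
    # Assumption: normalize so that blurring preserves overall brightness.
    return [[value / total for value in row] for row in kernel]


for row in gaussian_kernel(3, 1.0):
    print([round(value, 4) for value in row])
```

A larger `std` spreads the weight away from the center and produces a stronger blur, while `size` bounds how far the blur can reach.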

For example, the following configuration file augments the image by:

Rotating the image by 5 degrees.
Shearing along the x axis by a ratio of 0.3.
Translating along the x axis by 8 pixels.
Rotating the hue by 25 degrees.

# Spec file for augment.
spatial_config{
  rotation_config{
    angle: 5.0
    units: "degrees"
  }
  shear_config{
    shear_ratio_x: 0.3
  }
  translation_config{
    translate_x: 8
  }
}
color_config{
  hue_saturation_config{
    hue_rotation_angle: 25.0
    saturation_shift: 1.0
  }
}
# Setting up dataset config.
dataset_config{
  image_path: "image_2"
  label_path: "label_2"
}
output_image_width: 1248
output_image_height: 384
output_image_channel: 3
image_extension: ".png"
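
When several augmented variants of a dataset are needed, it can be convenient to generate spec files like the one above from a script. The following sketch renders a nested Python dictionary in the same text layout. The `render_spec` helper is hypothetical and handles only the value types that appear in this document (nested messages, strings, booleans, and numbers).

```python
def render_spec(spec, indent=0):
    """Render a nested dict in the layout of the sample spec file above."""
    pad = "  " * indent
    lines = []
    for key, value in spec.items():
        if isinstance(value, dict):
            # Nested message: name, opening brace, fields, closing brace.
            lines.append(f"{pad}{key}{{")
            lines.append(render_spec(value, indent + 1))
            lines.append(f"{pad}}}")
        elif isinstance(value, bool):
            lines.append(f"{pad}{key}: {'true' if value else 'false'}")
        elif isinstance(value, str):
            lines.append(f'{pad}{key}: "{value}"')
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


spec = {
    "spatial_config": {
        "rotation_config": {"angle": 5.0, "units": "degrees"},
        "translation_config": {"translate_x": 8},
    },
    "dataset_config": {"image_path": "image_2", "label_path": "label_2"},
    "output_image_width": 1248,
    "output_image_height": 384,
    "output_image_channel": 3,
    "image_extension": ".png",
}
print(render_spec(spec))
```

Writing the returned string to a file gives a spec that can be passed to augment; only field names that appear in this document are used in the example.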

Running the Augmentor Tool

The augment tool has a simple command line interface, which is defined as follows:

tao augment [-h] -d /path/to/the/dataset/root
            -a /path/to/augmentation/spec/file
            -o /path/to/the/augmented/output
            [-v]

Here are the command line parameters:

-h, --help: Show this help message and exit
-d, --dataset-folder: Path to the detection dataset
-a, --augmentation-proto: Path to augmentation spec file
-o, --output-dataset: Path to the augmented output dataset
-v, --verbose: Flag to get detailed logs during the augmentation process
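
If augment is driven from a script, the command line above can be assembled as an argument list. The sketch below uses only the flags listed in this section; the helper `build_augment_command` is hypothetical, and the actual invocation is left commented out because it requires a working TAO Toolkit installation.

```python
import subprocess


def build_augment_command(dataset_root, spec_file, output_root, verbose=False):
    """Assemble the `tao augment` argument list from the documented flags."""
    command = [
        "tao", "augment",
        "-d", str(dataset_root),  # path to the detection dataset
        "-a", str(spec_file),     # path to the augmentation spec file
        "-o", str(output_root),   # path to the augmented output dataset
    ]
    if verbose:
        command.append("-v")
    return command


command = build_augment_command(
    "/path/to/the/dataset/root",
    "/path/to/augmentation/spec/file",
    "/path/to/the/augmented/output",
    verbose=True,
)
print(" ".join(command))
# subprocess.run(command, check=True)  # uncomment where `tao` is installed
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.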

The augmented images and labels are generated in the path given by the output-dataset parameter, under the directories specified in dataset_config. For the sample config file above, the images and labels are created in:

Augmented images: /path/to/augmented/output/image_2
Augmented labels: /path/to/augmented/output/label_2

Note

When running augment with the verbose flag set, augment generates augmented images with the bbox outputs rendered under /path/to/augmented/output/images_annotated.

The log from a successful run of augment is mentioned below:

The generated dataset may then be converted to TFRecords with the dataset-convert tool so that it can be ingested by the train subtask. The details about converting the data to TFRecords are described in the Data Input for Object Detection section, and training a model with this dataset is described in the Data Annotation Format section.

Note

The augment tool applies the spatial augmentation operators only to the bounding-box coordinate fields in the label files of the input dataset. All other fields are propagated unchanged from the input labels to the output labels.
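
The note above can be illustrated with a small sketch that rewrites one KITTI label line. In the KITTI layout, the bounding box occupies fields 4 to 7 (left, top, right, bottom, counting from 0), so only those fields are changed and everything else is copied through. The function `update_kitti_bbox` and the two-decimal output formatting are assumptions for illustration, not the behavior of augment itself.

```python
def update_kitti_bbox(label_line, transform):
    """Apply `transform` to the bbox fields of one KITTI label line."""
    fields = label_line.split()
    # KITTI fields 4..7 hold the box as left, top, right, bottom.
    bbox = tuple(float(value) for value in fields[4:8])
    new_bbox = transform(bbox)
    # Assumption: write coordinates back with two decimal places.
    fields[4:8] = [f"{value:.2f}" for value in new_bbox]
    # All other fields are propagated unchanged.
    return " ".join(fields)


def shift_right_8(bbox):
    """Example transform: translate the box 8 pixels along the x axis."""
    left, top, right, bottom = bbox
    return (left + 8, top, right + 8, bottom)


line = "car 0.00 0 0.00 100.00 50.00 300.00 150.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
print(update_kitti_bbox(line, shift_right_8))
```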

Sample rendered augmented images are shown below.

Input image rotated by 5 degrees

Image rotated by 5 degrees, hue rotation by 25 degrees and saturation shift of 0.0