Offline Data Augmentation

Note

Offline Data Augmentation is currently only designed for object-detection datasets using KITTI or COCO format.

Training a deep neural network can be a daunting task, and the most important component of training a model is the data. Acquiring curated and annotated datasets is often a manual process involving thousands of person-hours of painstaking labeling. Even if the data is carefully collected and planned, it is very difficult to anticipate all the corner cases that a network may encounter, and repeatedly collecting and annotating the missing data is expensive and has long turnaround times.

Online augmentation with the training data loader is a good way to increase variation in the dataset. However, the augmented data is generated randomly, based on the distribution the data loader follows when sampling the data, so the model may need to be trained for a long time to reach a high level of accuracy.

To address these limitations and give users control over which augmentations are generated, TAO Data Services provides an Offline Data Augmentation service. Offline augmentation can dramatically increase the size of a dataset when collecting and labeling data is expensive or not possible. The augmentation service provides several custom GPU-accelerated augmentation routines:

Spatial augmentation
Color-space augmentation
Image blur

Spatial Augmentation

Spatial augmentation comprises routines where data is augmented in space. The following spatial augmentation operations are supported:

Rotate (optionally, with AI-assisted bounding-box refinement)
Resize
Translate
Shear
Flip

Color-Space Augmentation

Color-space augmentation comprises routines where the image data is augmented in the color space. The following operators are supported:

Hue Rotation
Brightness offset
Contrast shift

Image Blur

Along with the above augmentation operations, the augmentation service also supports image blurring or sharpening, which is described further in the KernelFilter Config section.

The spatial augmentation routines are applied to the images as well as the groundtruth labels, while the color augmentation routines and channel-wise blur operator are applied only to images. The sample workflow is as follows:

Data Input for Augmentation

Offline Data Augmentation expects a directory of images and a directory containing KITTI text labels or a COCO JSON file.

Refer to the Data Annotation Format KITTI and COCO sections for more information about the data formats.

Configuring a Spec File for Augmentation

The configuration YAML file for augmentation experiments contains four major components, as well as a few global parameters for launching multi-GPU jobs and saving the job log.

Spatial-augmentation config
Color-augmentation config
Blur config
Data config (output image dimension, dataset type, etc.)

Parameter	Datatype	Description
spatial_aug	Dict config	The configuration of the spatial-augmentation operators
color_aug	Dict config	The configuration of the color-augmentation operators
blur_aug	Dict config	The configuration of the Gaussian-blur operator to be applied to the image. The blur is computed channel-wise and then concatenated based on the number of image channels.
data	Dict config	The configuration of the input data type and input/output dataset location
random_seed	int32	The seed for the randomized augmentation operations

Spatial-Augmentation Config

The spatial-augmentation config contains parameters to configure the spatial augmentation routines.

Parameter	Datatype	Description	Supported Values
rotation	Dict config	Configures the rotate-augmentation operator. Defining this value activates rotation.	`{ angle: 0.5, units: degrees }` See Rotation config
flip	Dict config	Configures the flip-augmentation operator. Defining this value activates flip along the horizontal and/or vertical axes.	`{ flip_vertical: true, flip_horizontal: true }` See Flip config
translation	Dict config	Configures the translation-augmentation operator. Defining this value activates translating the images across the x and/or y axes.	`{ translate_x: 8, translate_y: 8 }` See Translation config
shear	Dict config	Configures the shear-augmentation operator. Defining this value activates a shear to the images across the x and/or y axes.	`{ shear_ratio_x: 0.2, shear_ratio_y: 0.2 }` See Shear config

Rotation Config

The rotation operation rotates the image by an angle. The transformation matrix for the rotation operation is computed as follows:

    [x_new, y_new, 1] = [x, y, 1] * [[ cos(angle),  sin(angle),  0],
                                     [-sin(angle),  cos(angle),  0],
                                     [ x_t,         y_t,         1]]

where x_t and y_t are defined as:

    x_t = height * sin(angle) / 2.0 - width * cos(angle) / 2.0 + width / 2.0
    y_t = -height * cos(angle) / 2.0 + height / 2.0 - width * sin(angle) / 2.0

Here, height is the height of the output image and width is the width of the output image.
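
The rotation transform can be checked numerically with a short sketch. The code below is only an illustration of the formula in this section, not the service's implementation: the function names are hypothetical, and returning the axis-aligned box that encloses the rotated corners is an assumption about how a label could be updated.

```python
import math


def rotate_point(x, y, angle_deg, width, height):
    """Apply the row-vector rotation transform from this section to one point."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    # Translation terms from the formula; they keep the image center fixed.
    x_t = height * sin_a / 2.0 - width * cos_a / 2.0 + width / 2.0
    y_t = -height * cos_a / 2.0 + height / 2.0 - width * sin_a / 2.0
    # [x, y, 1] * M, expanded column by column.
    x_new = x * cos_a - y * sin_a + x_t
    y_new = x * sin_a + y * cos_a + y_t
    return x_new, y_new


def rotate_box(box, angle_deg, width, height):
    """Rotate the corners of (x1, y1, x2, y2) and return the enclosing box.

    Assumption: the label is replaced by the axis-aligned box that encloses
    the four rotated corners.
    """
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    rotated = [rotate_point(x, y, angle_deg, width, height) for x, y in corners]
    xs, ys = zip(*rotated)
    return min(xs), min(ys), max(xs), max(ys)


# The image center is a fixed point of the transform.
print(rotate_point(320.0, 240.0, 30.0, 640, 480))
```

An enclosing box computed this way is generally looser than the rotated object itself, which is a plausible motivation for the optional bounding-box refinement.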

Parameter	Datatype	Description	Supported Values
angle	float	The angle of the rotation to be applied to the image and the coordinates	0 to ±360 (degrees), 0 to ±2π (radians)
units	string	The units in which the `angle` parameter is measured	“degrees”, “radians”
refine_box	Dict config	Parameters for enabling AI-assisted bounding box refinement

The refine_box config supports the following parameters:

Parameter	Datatype	Description	Supported Values
gt_cache	string	The path to the groundtruth mask labels or the pseudo mask labels generated by the TAO Auto Labeling service
enabled	boolean	A flag specifying whether to enable AI-assisted bounding box refinement	True, False

Shear Config

The shear operation introduces a slant to the object along the x or y dimension. The transformation matrix for shear operation is computed as follows:

    [x_new, y_new, 1] = [x, y, 1] * [[1.0,            shear_ratio_y,  0],
                                     [shear_ratio_x,  1.0,            0],
                                     [x_t,            y_t,            1.0]]

where x_t and y_t are defined as:

    x_t = -height * shear_ratio_x / 2.0
    y_t = -width * shear_ratio_y / 2.0

Here, height is the height of the output image and width is the width of the output image.

Parameter	Datatype	Description	Supported Values
shear_ratio_x	float32	The amount of horizontal shift per y row.
shear_ratio_y	float32	The amount of vertical shift per x column.
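
As a quick illustration of the shear formula, the sketch below applies it to individual points. The function name is hypothetical, and the code only evaluates the matrix product from this section; it does not reproduce how the service resamples pixels.

```python
def shear_point(x, y, shear_ratio_x, shear_ratio_y, width, height):
    """Apply the row-vector shear transform from this section to one point."""
    # Translation terms from the formula; they keep the image center fixed.
    x_t = -height * shear_ratio_x / 2.0
    y_t = -width * shear_ratio_y / 2.0
    # [x, y, 1] * M, expanded column by column.
    x_new = x + shear_ratio_x * y + x_t
    y_new = shear_ratio_y * x + y + y_t
    return x_new, y_new


# With shear_ratio_x = 0.2 on a 640x480 image, the top row moves left and
# the bottom row moves right by the same amount.
print(shear_point(0.0, 0.0, 0.2, 0.0, 640, 480))
print(shear_point(0.0, 480.0, 0.2, 0.0, 640, 480))
```

Each row y is shifted horizontally by shear_ratio_x * y, which matches the "horizontal shift per y row" description in the table.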

Flip Config

The flip operator flips an image and its bounding-box coordinates along the horizontal and/or vertical axis.

Parameter	Datatype	Description	Supported Values
flip_horizontal	bool	The flag to enable flipping an image horizontally	true, false
flip_vertical	bool	The flag to enable flipping an image vertically	true, false
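
The effect of a flip on a bounding box can be sketched as follows. This is an illustration under stated assumptions, not the service's code: the function name is hypothetical, boxes are taken as (x1, y1, x2, y2) in continuous pixel coordinates, and flip_horizontal is assumed to mirror the image left to right.

```python
def flip_box(box, width, height, flip_horizontal=False, flip_vertical=False):
    """Mirror an (x1, y1, x2, y2) box inside a width x height image."""
    x1, y1, x2, y2 = box
    if flip_horizontal:
        # Assumption: a horizontal flip mirrors x coordinates. The old right
        # edge becomes the new left edge, so x1 <= x2 still holds.
        x1, x2 = width - x2, width - x1
    if flip_vertical:
        # A vertical flip mirrors y coordinates in the same way.
        y1, y2 = height - y2, height - y1
    return x1, y1, x2, y2


print(flip_box((10, 20, 110, 220), 640, 480, flip_horizontal=True))
```

Swapping the two edges while mirroring keeps the box well formed, so no separate sorting step is needed afterward.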

Translation Config

The operator translates the image and polygon coordinates along the x and/or y axis.

Parameter	Datatype	Description	Supported Values
translate_x	int	The number of pixels to translate the image along the x axis	0 - image_width
translate_y	int	The number of pixels to translate the image along the y axis	0 - image_height
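
A translation shifts every label coordinate by the same pixel offset. The sketch below illustrates this for one box. The function name is hypothetical, and two behaviors are assumptions rather than documented facts: positive values are taken to move content right and down, and boxes are clipped to the frame and dropped when nothing remains visible.

```python
def translate_box(box, translate_x, translate_y, width, height):
    """Shift an (x1, y1, x2, y2) box and clip it to the image frame."""
    x1, y1, x2, y2 = box
    # Assumption: positive offsets move content right (x) and down (y).
    x1, x2 = x1 + translate_x, x2 + translate_x
    y1, y2 = y1 + translate_y, y2 + translate_y
    # Assumption: coordinates outside the frame are clipped to its border.
    x1, x2 = max(0, min(width, x1)), max(0, min(width, x2))
    y1, y2 = max(0, min(height, y1)), max(0, min(height, y2))
    if x2 <= x1 or y2 <= y1:
        return None  # The box moved completely out of the frame.
    return x1, y1, x2, y2


print(translate_box((600, 400, 630, 470), 8, 8, 640, 480))
```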

Color Augmentation Config

The color augmentation config contains parameters to configure the color space augmentation routines, including the following:

Parameter	Datatype	Description
hue	Dict config	This augmentation operator applies hue rotation augmentation.
contrast	Dict config	This augmentation operator applies contrast scaling.
brightness	Dict config	This configures the brightness shift augmentation operator.
saturation	Dict config	This augmentation operator applies color saturation augmentation.

Hue Config

This augmentation operator applies color-space manipulation by converting the RGB image to HSV, performing hue rotation, and then converting the result back to RGB.

Parameter	Datatype	Description	Supported Values
hue_rotation_angle	float32	Hue rotation in degrees (scalar or vector). A value of 0.0 (modulo 360) leaves the hue unchanged.	0 - 360 (the angles are computed as angle % 360)

Saturation Config

This augmentation operator applies color-space manipulation by converting the RGB image to HSV, performing a saturation shift, and then converting the result back to RGB.

Parameter	Datatype	Description	Supported Values
saturation_shift	float32	Saturation shift multiplier. A value of 1.0 leaves the saturation unchanged. A value of 0 removes all saturation from the image and makes all channels equal in value.	0.0 - 1.0
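
The RGB to HSV round trip described for the hue and saturation operators can be sketched on a single pixel with Python's standard colorsys module. This is an illustration only: the function name is hypothetical, the service itself runs GPU-accelerated routines on whole images, and clamping the saturation to [0, 1] is an assumption.

```python
import colorsys


def adjust_hue_saturation(rgb, hue_rotation_angle=0.0, saturation_shift=1.0):
    """Rotate hue (degrees) and scale saturation for one 8-bit RGB pixel."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # colorsys stores hue in [0, 1), so a rotation in degrees is divided by 360.
    h = (h + (hue_rotation_angle % 360.0) / 360.0) % 1.0
    # The saturation shift acts as a multiplier; 1.0 leaves it unchanged.
    s = min(1.0, max(0.0, s * saturation_shift))
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255.0) for c in (r, g, b))


# Rotating pure red by 120 degrees lands on pure green.
print(adjust_hue_saturation((255, 0, 0), hue_rotation_angle=120.0))
# A saturation multiplier of 0 makes all channels equal.
print(adjust_hue_saturation((200, 100, 50), saturation_shift=0.0))
```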

Brightness Config

This augmentation operator applies a channel-wise brightness shift.

Parameter	Datatype	Description	Supported Values
offset	float32	Offset value per color channel	0 - 255

Contrast Config

This augmentation operator applies contrast scaling across a center point to an image.

Parameter	Datatype	Description	Supported Values
contrast	float32	The contrast scale value. A value of 0 leaves the contrast unchanged.	0 - 1.0
center	float32	The center value for the image. In this case, the images are scaled between 0-255 (8 bit images), so setting a value of 127.5 is common.	0.0 - 1.0
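
The brightness and contrast operators can be pictured with a per-pixel sketch. The formula here is an assumption, chosen only so that a contrast value of 0 leaves the pixel unchanged as the table states; the exact scaling, the order of the two operations, and the clipping to the 8-bit range are not specified in this section. The function name is hypothetical.

```python
def adjust_brightness_contrast(value, offset=0.0, contrast=0.0, center=127.5):
    """Scale one channel value around `center`, then add a brightness offset.

    Assumed formula (not confirmed by the documentation):
        out = (value - center) * (1 + contrast) + center + offset
    """
    scaled = (value - center) * (1.0 + contrast) + center
    shifted = scaled + offset
    # Assumption: results are clipped to the 8-bit range [0, 255].
    return max(0.0, min(255.0, shifted))


print(adjust_brightness_contrast(147.5, contrast=0.5))  # pushed away from center
print(adjust_brightness_contrast(100.0, offset=20.0))   # uniformly brighter
```

Scaling around a center value pushes pixels above the center up and pixels below it down, which is why a mid-range center such as 127.5 is a natural choice for 8-bit images.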

KernelFilter Config

The KernelFilter config applies the Gaussian blur operator to an image. A Gaussian kernel is formulated based on the following parameters, and then a per-channel 2D convolution is performed between the image and the kernel.

Parameter	Datatype	Description	Supported Values
size	int	The size of the kernel for convolution	>0
std	float	The standard deviation of the Gaussian filter for blurring	>0.0
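
To make the role of size and std concrete, the sketch below builds a normalized Gaussian kernel and convolves it with a single channel in pure Python. It is an illustration, not the service's GPU implementation: the function names are hypothetical, an odd kernel size is assumed, and replicating edge pixels at the border is an assumption.

```python
import math


def gaussian_kernel(size, std):
    """Return a size x size Gaussian kernel whose weights sum to 1."""
    center = (size - 1) / 2.0
    weights = [
        [
            math.exp(-((i - center) ** 2 + (j - center) ** 2) / (2.0 * std * std))
            for j in range(size)
        ]
        for i in range(size)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def blur_channel(channel, kernel):
    """Convolve one image channel (list of rows) with an odd-sized kernel."""
    h, w = len(channel), len(channel[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Assumption: border pixels are replicated outside the image.
                    yy = min(max(y + i - r, 0), h - 1)
                    xx = min(max(x + j - r, 0), w - 1)
                    acc += channel[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


kernel = gaussian_kernel(size=3, std=1.0)
channel = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
print(blur_channel(channel, kernel))
```

For a color image, the same kernel would be applied to each channel independently. A larger std spreads the weights more evenly across the kernel and gives a stronger blur.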

Data Config

The data config parameters configure the input and output data dimension, type, and location.

Parameter	Datatype	Description
dataset_type	string	The dataset type. Only “coco” and “kitti” values are supported
output_image_width	Optional[int]	The output image width. If this value is not specified, the input image width is preserved.
output_image_height	Optional[int]	The output image height. If this value is not specified, the input image height is preserved.
image_dir	string	The input image directory
ann_path	string	The annotation path (i.e. the label directory for the KITTI dataset and annotation JSON for COCO)
output_dataset	string	The directory to save the augmented images and labels
batch_size	int	The batch size of the DALI data loader
include_masks	boolean	A flag specifying whether to load segmentation annotation when reading a COCO JSON file
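
The parameters documented above can be assembled into a spec file. The sketch below builds the spec as a nested Python dictionary and writes it out as YAML text with a small hand-written emitter, so that it needs no third-party packages. The parameter names come from the tables in this section; the exact nesting (for example, size and std placed directly under blur_aug), the chosen values, the helper name to_yaml, and all paths are assumptions made for illustration.

```python
def to_yaml(node, indent=0):
    """Render a nested dict of scalars as a list of YAML lines."""
    lines = []
    pad = "  " * indent
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 1))
        else:
            if isinstance(value, bool):
                value = "true" if value else "false"
            lines.append(f"{pad}{key}: {value}")
    return lines


spec = {
    "random_seed": 42,
    "spatial_aug": {
        "rotation": {"angle": 5.0, "units": "degrees"},
        "flip": {"flip_horizontal": True, "flip_vertical": False},
    },
    "color_aug": {
        "hue": {"hue_rotation_angle": 25.0},
        "saturation": {"saturation_shift": 0.0},
    },
    # Assumption: the kernel parameters sit directly under blur_aug.
    "blur_aug": {"size": 3, "std": 1.0},
    "data": {
        "dataset_type": "kitti",
        # Hypothetical paths; replace them with real dataset locations.
        "image_dir": "/data/kitti/images",
        "ann_path": "/data/kitti/labels",
        "output_dataset": "/data/kitti_augmented",
        "batch_size": 8,
    },
}

spec_text = "\n".join(to_yaml(spec))
print(spec_text)
```

Operators that are left out of the dictionary, such as translation and shear here, are simply not defined in the resulting file.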

Running the Augmentation Service

The offline augmentation service has a simple command line interface, which is defined as follows:

    tao dataset augmentation generate [-h] -e <experiment spec file>
                                      [--gpus <num_gpus>]

Required Arguments

-e, --experiment_spec_file: The path to the YAML spec file

Optional Arguments

--gpus: The number of GPUs to use. The default value is 1.
-h, --help: Show this help message and exit.
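
When the service is launched from a script, the command line above can be assembled as an argument list. The sketch below only builds and prints the command; the helper name and the spec path are hypothetical, and actually running it (for example with subprocess.run) assumes the tao launcher is installed and on the PATH.

```python
import shlex


def build_augment_command(spec_path, gpus=1):
    """Build the argument list for the augmentation command shown above."""
    cmd = ["tao", "dataset", "augmentation", "generate", "-e", spec_path]
    if gpus != 1:
        # --gpus is optional and defaults to 1, so it is added only when needed.
        cmd += ["--gpus", str(gpus)]
    return cmd


# Hypothetical spec path, used only for illustration.
cmd = build_augment_command("/workspace/specs/augment.yaml", gpus=2)
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids shell quoting problems when the spec path contains spaces.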

Once the dataset is generated, you can use the dataset-convert tool to convert it to TFRecords so that it can be ingested by the train sub-task. Details about converting the data to TFRecords are described in the Data Input for Object Detection section, and training a model with this dataset is described in the Data Annotation Format section.

Below are some sample rendered augmented images:

Input image rotated by 5 degrees

Image rotated by 5 degrees, with hue rotation by 25 degrees and a saturation shift of 0.0