RandAugment

RandAugment, as described in https://arxiv.org/abs/1909.13719, is an automatic augmentation scheme that simplifies AutoAugment. In RandAugment, the policy is just a list of augmentations, and the search space is limited to two parameters, n and m.

n is the number of randomly selected augmentations applied to each input sample.
m is a fixed magnitude used for all of the augmentations.
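
The roles of n and m can be sketched in plain Python. This is only an illustration of the sampling scheme, not DALI's implementation: the helper rand_augment_sketch and the toy operations are hypothetical, and drawing operations independently (with replacement) is an assumption.

```python
import random


def rand_augment_sketch(sample, ops, n, m, rng=None):
    """Apply n operations drawn at random from ops, each with magnitude m."""
    rng = rng or random.Random()
    for _ in range(n):
        # Assumption: each operation is drawn independently, with replacement.
        op = rng.choice(ops)
        sample = op(sample, m)
    return sample


# Toy "augmentations" that act on a number instead of an image.
def add_magnitude(x, m):
    return x + m


def subtract_magnitude(x, m):
    return x - m


toy_ops = [add_magnitude, subtract_magnitude]
result = rand_augment_sketch(0, toy_ops, n=3, m=17, rng=random.Random(0))
```

Every sample goes through exactly n operations, and all of them share the same magnitude m; only the choice of operations is random.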

For example, to apply 3 random operations to each sample, each with a fixed magnitude of 17, call rand_augment() as follows:

from nvidia.dali import fn, pipeline_def, types
from nvidia.dali.auto_aug import rand_augment

@pipeline_def(enable_conditionals=True)
def training_pipe(data_dir, image_size):

    jpegs, labels = fn.readers.file(file_root=data_dir, ...)
    shapes = fn.peek_image_shape(jpegs)
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)

    augmented_images = rand_augment.rand_augment(images, shape=shapes, n=3, m=17)

    resized_images = fn.resize(augmented_images, size=[image_size, image_size])

    return resized_images, labels

rand_augment() uses the set of augmentations described in the paper. To apply custom augmentations, refer to the section "Invoking custom RandAugment policies".

Warning

You need to define the pipeline with the @pipeline_def decorator and set enable_conditionals to True to use automatic augmentations.

Invoking predefined RandAugment policies

To invoke the predefined RandAugment policy, use the following function.

nvidia.dali.auto_aug.rand_augment.rand_augment(data, n, m, num_magnitude_bins=31, shape=None, fill_value=128, interp_type=None, max_translate_abs=None, max_translate_rel=None, seed=None, monotonic_mag=True, excluded=None)

Applies RandAugment (https://arxiv.org/abs/1909.13719) augmentation scheme to the provided batch of samples.

Parameters:

data (DataNode) – A batch of samples to be processed. The supported samples are images in HWC layout and videos in FHWC layout. The supported data type is uint8.
n (int) – The number of randomly sampled operations to be applied to a sample.
m (int) – The magnitude (strength) of each operation to be applied. It must be an integer within [0, num_magnitude_bins - 1].
num_magnitude_bins (int, optional) – The number of bins to divide the magnitude ranges into.
shape (DataNode or Tuple[int, int], optional) – The size (height and width) of the image or frames in the video sequence passed as the data. If specified, the magnitude of translation operations depends on the image/frame shape and spans from 0 to max_translate_rel * shape. Otherwise, the magnitude range is [0, max_translate_abs] for any sample.
fill_value (int, optional) – A value to be used as a padding for images/frames transformed with warp_affine ops (translation, shear and rotate). If None is specified, the images/frames are padded with the border value repeated (clamped).
interp_type (DALIInterpType, optional) – Interpolation method used by the warp_affine ops (translation, shear and rotate). Supported values are types.INTERP_LINEAR (default) and types.INTERP_NN.
max_translate_abs (int or (int, int), optional) – Only valid when the shape argument is not provided. Specifies the maximal shift (in pixels) in the translation augmentations. If a tuple is specified, the first component limits the height, the second the width. Defaults to 100, which means the maximal magnitude shifts the image by 100 pixels.
max_translate_rel (float or (float, float), optional) – Only valid when the shape argument is provided. Specifies the maximal shift as a fraction of the image shape in the translation augmentations. If a tuple is specified, the first component limits the height, the second the width. Defaults to around 0.45 (100/224).
seed (int, optional) – Seed to be used to randomly sample operations (and to negate magnitudes).
monotonic_mag (bool, optional) – There are two flavours of RandAugment available in different frameworks. With the default monotonic_mag=True, the strength of operations that accept magnitude bins increases with the bin index. If set to False, the magnitude ranges for some color operations differ: the strength of posterize() and solarize() decreases with increasing magnitude bins, and the enhance operations (brightness(), contrast(), color(), sharpness()) use the (0.1, 1.9) range, which means that the strength decreases the closer the magnitude is to the center of the range. See get_rand_augment_non_monotonic_suite().
excluded (List[str], optional) – A list of names of the operations to be excluded from the default suite of augmentations. If, instead of just limiting the set of operations, you need to include some custom operations or fine-tune the existing ones, you can use apply_rand_augment() directly, which accepts a list of augmentations.
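
The relation between m and num_magnitude_bins can be illustrated with a short sketch. It assumes that the bins are spaced evenly across an operation's magnitude range, which is not stated above. The helper magnitude_for_bin is hypothetical, and the 0 to 30 degree rotation range is an illustrative value only.

```python
def magnitude_for_bin(m, lo, hi, num_magnitude_bins=31):
    """Map the bin index m to a value in [lo, hi], assuming evenly spaced bins."""
    if not 0 <= m <= num_magnitude_bins - 1:
        raise ValueError(
            f"m must be within [0, {num_magnitude_bins - 1}], got {m}"
        )
    if num_magnitude_bins == 1:
        return lo
    return lo + (hi - lo) * m / (num_magnitude_bins - 1)


# Illustrative range: a rotation whose magnitude spans 0 to 30 degrees.
weakest = magnitude_for_bin(0, 0.0, 30.0)
chosen = magnitude_for_bin(17, 0.0, 30.0)
strongest = magnitude_for_bin(30, 0.0, 30.0)
```

Under this assumption, raising num_magnitude_bins only makes the grid finer; the weakest and strongest magnitudes stay at the ends of the operation's range.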

Returns:

A batch of transformed samples.

Return type:

DataNode

Invoking custom RandAugment policies

Thanks to the simpler nature of RandAugment, its policies are defined as lists of augmentations that can be passed as the first argument to apply_rand_augment() when it is invoked inside a pipeline definition.

nvidia.dali.auto_aug.rand_augment.apply_rand_augment(augmentations, data, n, m, num_magnitude_bins=31, seed=None, **kwargs)

Applies the list of augmentations in RandAugment (https://arxiv.org/abs/1909.13719) fashion. Each sample is transformed with n operations in a sequence randomly selected from the augmentations list. Each operation uses m as the magnitude bin.

Parameters:

augmentations (List[core._Augmentation]) – List of augmentations to be sampled and applied in RandAugment fashion.
data (DataNode) – A batch of samples to be processed.
n (int) – The number of randomly sampled operations to be applied to a sample.
m (int) – The magnitude bin (strength) of each operation to be applied. It must be an integer within [0, num_magnitude_bins - 1].
num_magnitude_bins (int) – The number of bins to divide the magnitude ranges into.
seed (int) – Seed to be used to randomly sample operations (and to negate magnitudes).
kwargs – Any extra parameters to be passed when calling augmentations. The signature of each augmentation is checked for any extra arguments, and if the name of an argument matches one from the kwargs, the value is passed as that argument. For example, some augmentations from the default RandAugment suite accept shape, fill_value and interp_type.
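
The signature matching described for kwargs can be sketched with the standard inspect module. This is a minimal illustration of the idea, not DALI's code: call_with_matching_kwargs, toy_rotate and toy_invert are hypothetical names introduced here.

```python
import inspect


def call_with_matching_kwargs(augmentation, data, magnitude, **kwargs):
    """Forward only those extra kwargs that the augmentation's signature names."""
    params = inspect.signature(augmentation).parameters
    accepted = {name: value for name, value in kwargs.items() if name in params}
    return augmentation(data, magnitude, **accepted)


# Hypothetical stand-ins for augmentations with different optional parameters.
def toy_rotate(data, angle, fill_value=0, interp_type=None):
    return ("rotate", data, angle, fill_value, interp_type)


def toy_invert(data, _):
    return ("invert", data)


extra = {"fill_value": 128, "interp_type": "linear", "shape": (224, 224)}
rotated = call_with_matching_kwargs(toy_rotate, "img", 15, **extra)
inverted = call_with_matching_kwargs(toy_invert, "img", None, **extra)
```

Here toy_rotate receives fill_value and interp_type, while toy_invert receives none of the extras, so one set of kwargs can serve a mixed list of augmentations.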

Returns:

A batch of transformed samples.

Return type:

DataNode

Accessing predefined policies

To obtain the predefined policy definitions, refer to the following functions.

nvidia.dali.auto_aug.rand_augment.get_rand_augment_suite(use_shape=False, max_translate_abs=None, max_translate_rel=None)

Creates a list of RandAugment augmentations.

Parameters:

use_shape (bool) – If true, the translation offset is computed as a percentage of the image/frame shape. Useful if the samples processed with the auto augment have different shapes. If false, the offsets range is bounded by a constant (max_translate_abs).
max_translate_abs (int or (int, int), optional) – Only valid with use_shape=False. Specifies the maximal shift (in pixels) in the translation augmentations. If a tuple is specified, the first component limits the height, the second the width. Defaults to 100.
max_translate_rel (float or (float, float), optional) – Only valid with use_shape=True, specifies the maximal shift as a fraction of image/frame shape in the translation augmentations. If a tuple is specified, the first component limits height, the second the width. Defaults to around 0.45 (100/224).
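
How the two translation limits relate can be sketched as follows. The helper max_translation_pixels is hypothetical and only mirrors the parameter descriptions above (a scalar limits both axes, a tuple is ordered height then width). It is not part of DALI.

```python
def max_translation_pixels(use_shape, shape=None,
                           max_translate_abs=100, max_translate_rel=100 / 224):
    """Return the (height, width) upper bound of the translation offset in pixels."""

    def as_pair(value):
        # A scalar limits both axes; a tuple is (height limit, width limit).
        return tuple(value) if isinstance(value, (tuple, list)) else (value, value)

    if use_shape:
        # Relative mode: the bound scales with the sample's own shape.
        rel_h, rel_w = as_pair(max_translate_rel)
        height, width = shape
        return (rel_h * height, rel_w * width)
    # Absolute mode: the same pixel bound for every sample.
    return tuple(float(v) for v in as_pair(max_translate_abs))


fixed = max_translation_pixels(use_shape=False)
relative = max_translation_pixels(use_shape=True, shape=(224, 224))
wide = max_translation_pixels(use_shape=True, shape=(480, 640),
                              max_translate_rel=(0.5, 0.25))
```

With the default values, a 224x224 image gets roughly the same 100-pixel bound in both modes, which is where the 100/224 default comes from.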

nvidia.dali.auto_aug.rand_augment.get_rand_augment_non_monotonic_suite(use_shape=False, max_translate_abs=None, max_translate_rel=None)

Like get_rand_augment_suite(), this function creates a list of RandAugment augmentations.

This variant uses brightness, contrast, color, sharpness, posterize, and solarize with the magnitude ranges used by AutoAugment. However, those ranges do not match the intuition that a bigger magnitude bin corresponds to a stronger operation.
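
The non-monotonic behaviour of the enhance operations can be seen in a small sketch. It assumes evenly spaced bins over the (0.1, 1.9) range and that a factor of 1.0 leaves the image unchanged (as in PIL-style enhance operations). The helpers enhance_factor and enhance_strength are hypothetical.

```python
def enhance_factor(m, lo=0.1, hi=1.9, num_magnitude_bins=31):
    """Factor for bin m, assuming evenly spaced bins over the (lo, hi) range."""
    return lo + (hi - lo) * m / (num_magnitude_bins - 1)


def enhance_strength(m, neutral=1.0):
    """Distance from the neutral factor (assumed to leave the image unchanged)."""
    return abs(enhance_factor(m) - neutral)


# Strength at the lowest, middle and highest bin of the default 31 bins.
strengths = [enhance_strength(m) for m in (0, 15, 30)]
```

Under these assumptions the middle bin is close to a no-op, while both ends of the range are the strongest settings, so increasing m does not monotonically increase the effect.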