AutoAugment

AutoAugment, as described in https://arxiv.org/abs/1805.09501, builds policies out of pairs of augmentations called subpolicies. Each subpolicy specifies a sequence of operations, each with a probability of application and a magnitude parameter. When AutoAugment is used, a random subpolicy is selected and applied to each sample.
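The sampling scheme described above can be illustrated with a minimal pure-Python sketch. It is a conceptual model only, not DALI's implementation: apply_random_subpolicy and the toy operations are hypothetical names introduced for illustration, and the "sample" is a plain list that records which operations ran.

```python
import random


def apply_random_subpolicy(sample, sub_policies, rng):
    """Pick one subpolicy uniformly at random and run its operations in order.

    Each operation is an (op, probability, magnitude) tuple; the op is applied
    with the given probability and skipped otherwise.
    """
    sub_policy = rng.choice(sub_policies)
    for op, probability, magnitude in sub_policy:
        if rng.random() < probability:
            sample = op(sample, magnitude)
    return sample


# Toy operations (hypothetical stand-ins for real image augmentations) that
# only record what was applied.
def rotate(sample, magnitude):
    return sample + [("rotate", magnitude)]


def shear_x(sample, magnitude):
    return sample + [("shear_x", magnitude)]


sub_policies = [
    [(rotate, 1.0, 8), (shear_x, 0.0, 4)],  # always rotate, never shear
    [(shear_x, 1.0, 7), (rotate, 0.0, 3)],  # always shear, never rotate
]

rng = random.Random(0)
applied = [apply_random_subpolicy([], sub_policies, rng) for _ in range(4)]
```

With probabilities of 1.0 and 0.0, every entry of applied contains exactly one operation, which shows that a single subpolicy is drawn per sample and that each operation within it is gated independently by its own probability.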

To use the predefined policy that was discovered on ImageNet, import and invoke auto_augment() inside the pipeline definition, for example:

from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.auto_aug import auto_augment

@pipeline_def(enable_conditionals=True)
def training_pipe(data_dir, image_size):

    jpegs, labels = fn.readers.file(file_root=data_dir, ...)
    shapes = fn.peek_image_shape(jpegs)
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)

    augmented_images = auto_augment.auto_augment(images, shape=shapes)

    resized_images = fn.resize(augmented_images, size=[image_size, image_size])

    return resized_images, labels

Warning

You need to define the pipeline with the @pipeline_def decorator and set enable_conditionals to True to use automatic augmentations.

Refer to the "Building and invoking custom policies" section to read more about using custom policies.

Invoking predefined AutoAugment policies

To invoke one of the predefined policies use the following functions.

nvidia.dali.auto_aug.auto_augment.auto_augment(data, policy_name='image_net', shape=None, fill_value=128, interp_type=None, max_translate_abs=None, max_translate_rel=None, seed=None)

Applies one of the predefined policies from the AutoAugment paper (https://arxiv.org/abs/1805.09501) to the provided batch of samples.

Parameters:
  • data (DataNode) – A batch of samples to be processed. The supported samples are images of HWC layout and videos of FHWC layout, the supported data type is uint8.

  • policy_name (str, optional) – The name of a predefined policy. Acceptable values are: image_net, reduced_image_net, svhn, reduced_cifar10. Defaults to image_net.

  • shape (DataNode or Tuple[int, int], optional) – The size (height and width) of the image or frames in the video sequence passed as the data. If specified, the magnitude of translation operations depends on the image/frame shape and spans from 0 to max_translate_rel * shape. Otherwise, the magnitude range is [0, max_translate_abs] for any sample.

  • fill_value (int, optional) – A value to be used as a padding for images/frames transformed with warp_affine ops (translation, shear and rotate). If None is specified, the images/frames are padded with the border value repeated (clamped).

  • interp_type (DALIInterpType, optional) – Interpolation method used by the warp_affine ops (translation, shear and rotate). Supported values are types.INTERP_LINEAR (default) and types.INTERP_NN.

  • max_translate_abs (int or (int, int), optional) – Only valid when shape is not provided. Specifies the maximal shift (in pixels) in the translation augmentation. If a tuple is specified, the first component limits height, the second the width. Defaults to 250, which means the maximal magnitude shifts the image by 250 pixels.

  • max_translate_rel (float or (float, float), optional) – Only valid when shape argument is provided. Specifies the maximal shift as a fraction of image shape in the translation augmentations. If a tuple is specified, the first component limits the height, the second the width. Defaults to 1, which means the maximal magnitude shifts the image entirely out of the canvas.

  • seed (int, optional) – Seed to be used to randomly sample operations (and to negate magnitudes).

Returns:

A batch of transformed samples.

Return type:

DataNode
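The interaction between shape, max_translate_abs and max_translate_rel can be sketched in plain Python. The helper below is hypothetical (translation_range is not a DALI function) and only mirrors the rule stated in the parameter descriptions: a relative limit scaled by the sample shape when the shape is given, and a fixed pixel limit otherwise. It does not reproduce DALI's argument validation.

```python
def translation_range(shape=None, max_translate_abs=250, max_translate_rel=1.0):
    """Return the (height, width) upper bounds of the translation shift in pixels."""

    def as_pair(value):
        # A scalar limit applies to both height and width.
        return tuple(value) if isinstance(value, (tuple, list)) else (value, value)

    if shape is None:
        # No shape: the same absolute limit is used for every sample.
        return as_pair(max_translate_abs)
    # Shape given: the limit is a fraction of the sample's height and width.
    rel_h, rel_w = as_pair(max_translate_rel)
    height, width = shape
    return (rel_h * height, rel_w * width)


fixed = translation_range()
relative = translation_range(shape=(480, 640), max_translate_rel=0.5)
per_axis = translation_range(shape=(480, 640), max_translate_rel=(0.25, 0.5))
```

Here fixed is (250, 250), relative is (240.0, 320.0) and per_axis is (120.0, 320.0). The relative form is the one to prefer when samples in a batch differ in size, because the maximal shift then scales with each sample.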

nvidia.dali.auto_aug.auto_augment.auto_augment_image_net(data, shape=None, fill_value=128, interp_type=None, max_translate_abs=None, max_translate_rel=None, seed=None)

Applies image_net_policy in AutoAugment (https://arxiv.org/abs/1805.09501) fashion to the provided batch of samples.

Equivalent to an auto_augment() call with policy_name set to 'image_net'. See the auto_augment() function for details.

Building and invoking custom policies

DALI’s AutoAugment implementation relies on the Policy class to define the policies to execute. A policy can be invoked within the pipeline using the apply_auto_augment() function.

A convenient approach is to wrap the policy creation in a function:

from nvidia.dali.auto_aug import augmentations
from nvidia.dali.auto_aug.core import Policy

def my_custom_policy() -> Policy:
    """
    Creates a simple AutoAugment policy with 3 sub-policies using custom magnitude ranges.
    """

    # Override the magnitude range of each augmentation; the second argument
    # (True) requests that the magnitude be randomly negated.
    shear_x = augmentations.shear_x.augmentation((0, 0.5), True)
    shear_y = augmentations.shear_y.augmentation((0, 0.5), True)
    rotate = augmentations.rotate.augmentation((0, 40), True)
    # invert takes no magnitude, so its magnitude is None below.
    invert = augmentations.invert
    return Policy(
        name="SimplePolicy", num_magnitude_bins=11, sub_policies=[
            # Each tuple: (augmentation, probability, magnitude)
            [(shear_x, 0.8, 7), (shear_y, 0.8, 4)],
            [(invert, 0.4, None), (rotate, 0.6, 8)],
            [(rotate, 0.6, 7), (shear_y, 0.6, 3)],
        ])

The tuple within the subpolicy definition specifies:

  • the augmentation to use,

  • the probability of applying that augmentation (if this subpolicy is selected),

  • the magnitude to be used.

class nvidia.dali.auto_aug.core.Policy(name, num_magnitude_bins, sub_policies)
__init__(name, num_magnitude_bins, sub_policies)

Describes the augmentation policy as introduced in AutoAugment (https://arxiv.org/abs/1805.09501).

Parameters:
  • name (str) – A name of the policy, for presentation purposes.

  • num_magnitude_bins (int) – The number of bins that augmentations’ magnitude ranges should be divided into.

  • sub_policies (Sequence[Sequence[Tuple[Augmentation, float, Optional[int]]]]) – A list of sequences of transformations. For each processed sample, one of the sequences is chosen uniformly at random. Then, the tuples from the sequence are considered one by one. Each tuple describes which augmentation to apply at that point, the probability of applying it, and the magnitude to use with it.
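
To make the role of num_magnitude_bins concrete, the sketch below maps a magnitude given as a bin index to a value within an augmentation's magnitude range. It assumes that the range is divided into evenly spaced values with both endpoints included; this binning scheme is an assumption, not something stated in this section, and magnitude_for_bin is a hypothetical helper rather than part of DALI.

```python
def magnitude_for_bin(mag_range, num_magnitude_bins, bin_idx):
    """Map a bin index to a magnitude, assuming evenly spaced bins over mag_range."""
    lo, hi = mag_range
    if not 0 <= bin_idx < num_magnitude_bins:
        raise ValueError(
            f"bin_idx must be in [0, {num_magnitude_bins - 1}], got {bin_idx}"
        )
    if num_magnitude_bins == 1:
        return float(lo)
    # Assumption: both endpoints of the range are valid magnitudes.
    step = (hi - lo) / (num_magnitude_bins - 1)
    return lo + bin_idx * step


# Values mirroring the rotate augmentation from the custom policy example:
# range (0, 40) split into 11 bins.
rotate_bin_8 = magnitude_for_bin((0, 40), 11, 8)    # 32.0
rotate_bin_10 = magnitude_for_bin((0, 40), 11, 10)  # 40.0
shear_bin_7 = magnitude_for_bin((0, 0.5), 11, 7)    # approximately 0.35
```

Under this assumption, the tuple (rotate, 0.6, 8) from the example policy would correspond to a rotation of 32 degrees, applied with probability 0.6.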

nvidia.dali.auto_aug.auto_augment.apply_auto_augment(policy, data, seed=None, **kwargs)

Applies AutoAugment (https://arxiv.org/abs/1805.09501) augmentation scheme to the provided batch of samples.

Parameters:
  • policy (Policy) – Set of sequences of augmentations to be applied in AutoAugment fashion.

  • data (DataNode) – A batch of samples to be processed.

  • seed (int, optional) – Seed to be used to randomly sample operations (and to negate magnitudes).

  • kwargs – A dictionary of extra parameters to be passed when calling augmentations. The signature of each augmentation is checked for any extra arguments and if the name of the argument matches one from the kwargs, the value is passed as an argument. For example, some augmentations from the default AutoAugment suite accept shape, fill_value and interp_type.

Returns:

A batch of transformed samples.

Return type:

DataNode
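
The kwargs matching described above can be sketched with the standard inspect module. This is an illustration of the general technique, not DALI's code: call_with_matching_kwargs and the toy augmentations are hypothetical names, and the samples are plain strings.

```python
import inspect


def call_with_matching_kwargs(augmentation, data, magnitude, **kwargs):
    """Call `augmentation`, forwarding only the extra kwargs its signature names."""
    params = inspect.signature(augmentation).parameters
    accepted = {name: value for name, value in kwargs.items() if name in params}
    return augmentation(data, magnitude, **accepted)


# Toy augmentations (hypothetical): one accepts extra arguments, one does not.
def toy_rotate(data, angle, fill_value=None, interp_type=None):
    return ("rotate", data, angle, fill_value, interp_type)


def toy_invert(data, _):
    return ("invert", data)


extra = {"shape": (480, 640), "fill_value": 128}
rotated = call_with_matching_kwargs(toy_rotate, "img", 30, **extra)
inverted = call_with_matching_kwargs(toy_invert, "img", None, **extra)
```

toy_rotate receives fill_value=128 but not shape, which it does not declare, while toy_invert receives neither. This is why the same set of extra arguments can be passed once for a whole policy even though individual augmentations accept different subsets of them.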

Accessing predefined policies

To obtain the predefined policy definition refer to the following functions.

nvidia.dali.auto_aug.auto_augment.get_image_net_policy(use_shape=False, max_translate_abs=None, max_translate_rel=None)

Creates an augmentation policy tuned for ImageNet, as described in the AutoAugment paper (https://arxiv.org/abs/1805.09501). The returned policy can be run with apply_auto_augment().

Parameters:
  • use_shape (bool) – If true, the translation offset is computed as a percentage of the image/frame shape. Useful if the samples processed with AutoAugment have different shapes. If false, the offset range is bounded by a constant (max_translate_abs).

  • max_translate_abs (int or (int, int), optional) – Only valid with use_shape=False, specifies the maximal shift (in pixels) in the translation augmentations. If a tuple is specified, the first component limits height, the second the width. Defaults to 250.

  • max_translate_rel (float or (float, float), optional) – Only valid with use_shape=True, specifies the maximal shift as a fraction of image/frame shape in the translation augmentations. If a tuple is specified, the first component limits height, the second the width. Defaults to 1.

nvidia.dali.auto_aug.auto_augment.get_reduced_cifar10_policy(use_shape=False, max_translate_abs=None, max_translate_rel=None)

Creates an augmentation policy tuned with the reduced CIFAR-10, as described in the AutoAugment paper (https://arxiv.org/abs/1805.09501). The returned policy can be run with apply_auto_augment().

Parameters:
  • use_shape (bool) – If true, the translation offset is computed as a percentage of the image/frame shape. Useful if the samples processed with AutoAugment have different shapes. If false, the offset range is bounded by a constant (max_translate_abs).

  • max_translate_abs (int or (int, int), optional) – Only valid with use_shape=False, specifies the maximal shift (in pixels) in the translation augmentations. If a tuple is specified, the first component limits height, the second the width. Defaults to 250.

  • max_translate_rel (float or (float, float), optional) – Only valid with use_shape=True, specifies the maximal shift as a fraction of image/frame shape in the translation augmentations. If a tuple is specified, the first component limits height, the second the width. Defaults to 1.

nvidia.dali.auto_aug.auto_augment.get_svhn_policy(use_shape=False, max_translate_abs=None, max_translate_rel=None)

Creates an augmentation policy tuned with SVHN, as described in the AutoAugment paper (https://arxiv.org/abs/1805.09501). The returned policy can be run with apply_auto_augment().

Parameters:
  • use_shape (bool) – If true, the translation offset is computed as a percentage of the image/frame shape. Useful if the samples processed with AutoAugment have different shapes. If false, the offset range is bounded by a constant (max_translate_abs).

  • max_translate_abs (int or (int, int), optional) – Only valid with use_shape=False, specifies the maximal shift (in pixels) in the translation augmentations. If a tuple is specified, the first component limits height, the second the width. Defaults to 250.

  • max_translate_rel (float or (float, float), optional) – Only valid with use_shape=True, specifies the maximal shift as a fraction of image/frame shape in the translation augmentations. If a tuple is specified, the first component limits height, the second the width. Defaults to 1.

nvidia.dali.auto_aug.auto_augment.get_reduced_image_net_policy()

Creates an augmentation policy tuned with the reduced ImageNet, as described in the AutoAugment paper (https://arxiv.org/abs/1805.09501). The returned policy can be run with apply_auto_augment().