RandAugment
RandAugment, as described in https://arxiv.org/abs/1909.13719, is an automatic augmentation scheme
that simplifies AutoAugment.
For RandAugment, the policy is just a list of augmentations
with a search space limited to two parameters, n and m.
n describes how many randomly selected augmentations are applied to an input sample.
m is a fixed magnitude used for all of the augmentations.
For example, to use 3 random operations for each sample, each with fixed magnitude 17,
you can call rand_augment() as follows:
from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.auto_aug import rand_augment

@pipeline_def(enable_conditionals=True)
def training_pipe(data_dir, image_size):
    jpegs, labels = fn.readers.file(file_root=data_dir, ...)
    shapes = fn.peek_image_shape(jpegs)
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    augmented_images = rand_augment.rand_augment(images, shape=shapes, n=3, m=17)
    resized_images = fn.resize(augmented_images, size=[image_size, image_size])
    return resized_images, labels
rand_augment() uses the set of augmentations described in the paper.
To apply custom augmentations, refer to the "Invoking custom RandAugment policies" section.
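To make the roles of n and m concrete, here is a minimal pure-Python sketch of the sampling idea. It is not DALI's implementation: the toy operations (brighten, darken, scale) and the helper rand_augment_sketch are hypothetical, they act on a plain number rather than an image, and the sketch assumes operations are drawn uniformly with replacement.

```python
import random


# Hypothetical toy operations: each takes a value and a magnitude.
def brighten(x, mag):
    return x + mag


def darken(x, mag):
    return x - mag


def scale(x, mag):
    return x * (1 + mag)


def rand_augment_sketch(sample, ops, n, m, rng):
    """Apply n randomly chosen ops to sample, each with the same magnitude m."""
    applied = []
    for _ in range(n):
        # Assumption: uniform sampling with replacement.
        op = rng.choice(ops)
        sample = op(sample, m)
        applied.append(op.__name__)
    return sample, applied


rng = random.Random(0)
result, applied = rand_augment_sketch(10, [brighten, darken, scale], n=3, m=17, rng=rng)
print(applied, result)
```

The only two knobs are n (how many operations are chained) and m (the single magnitude shared by all of them); everything else is fixed by the list of operations.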
Warning
You need to define the pipeline with the @pipeline_def decorator and set enable_conditionals to True to use automatic augmentations.
Invoking predefined RandAugment policies
To invoke the predefined RandAugment policy, use the following function.
- nvidia.dali.auto_aug.rand_augment.rand_augment(data, n, m, num_magnitude_bins=31, shape=None, fill_value=128, interp_type=None, max_translate_abs=None, max_translate_rel=None, seed=None, monotonic_mag=True, excluded=None)
Applies the RandAugment (https://arxiv.org/abs/1909.13719) augmentation scheme to the provided batch of samples.
- Parameters:
data (DataNode) – A batch of samples to be processed. The supported samples are images of HWC layout and videos of FHWC layout; the supported data type is uint8.
n (int) – The number of randomly sampled operations to be applied to a sample.
m (int) – The magnitude (strength) of each operation to be applied. It must be an integer within [0, num_magnitude_bins - 1].
num_magnitude_bins (int, optional) – The number of bins to divide the magnitude ranges into.
shape (DataNode or Tuple[int, int], optional) – The size (height and width) of the image or frames in the video sequence passed as the data. If specified, the magnitude of translation operations depends on the image/frame shape and spans from 0 to max_translate_rel * shape. Otherwise, the magnitude range is [0, max_translate_abs] for any sample.
fill_value (int, optional) – A value to be used as padding for images/frames transformed with warp_affine ops (translation, shear and rotate). If None is specified, the images/frames are padded with the border value repeated (clamped).
interp_type (DALIInterpType, optional) – Interpolation method used by the warp_affine ops (translation, shear and rotate). Supported values are types.INTERP_LINEAR (default) and types.INTERP_NN.
max_translate_abs (int or (int, int), optional) – Only valid when shape is not provided. Specifies the maximal shift (in pixels) in the translation augmentations. If a tuple is specified, the first component limits the height, the second the width. Defaults to 100, which means the maximal magnitude shifts the image by 100 pixels.
max_translate_rel (float or (float, float), optional) – Only valid when the shape argument is provided. Specifies the maximal shift as a fraction of the image shape in the translation augmentations. If a tuple is specified, the first component limits the height, the second the width. Defaults to around 0.45 (100/224).
seed (int, optional) – Seed to be used to randomly sample operations (and to negate magnitudes).
monotonic_mag (bool, optional) – There are two flavours of RandAugment available in different frameworks. With the default monotonic_mag=True, the strength of operations that accept magnitude bins increases with increasing bins. If set to False, the magnitude ranges for some color operations differ: the strength of posterize() and solarize() decreases with increasing magnitude bins, and the enhance operations (brightness(), contrast(), color(), sharpness()) use the (0.1, 1.9) range, which means that the strength decreases the closer the magnitudes are to the center of the range. See get_rand_augment_non_monotonic_suite().
excluded (List[str], optional) – A list of names of the operations to be excluded from the default suite of augmentations. If, instead of just limiting the set of operations, you need to include some custom operations or fine-tune the existing ones, you can use apply_rand_augment() directly, which accepts a list of augmentations.
- Returns:
A batch of transformed samples.
- Return type:
DataNode
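The relationship between m and num_magnitude_bins can be illustrated with a short pure-Python sketch. It assumes the bins are evenly spaced over an operation's magnitude range, which is an assumption made for illustration; the helper magnitude_for_bin and the (0, 30) range are hypothetical and are not part of the DALI API.

```python
def magnitude_for_bin(m, mag_range, num_magnitude_bins=31):
    """Map the integer bin m onto a value in mag_range (lo, hi).

    Assumption: bins are evenly spaced, bin 0 maps to lo and the last
    bin (num_magnitude_bins - 1) maps to hi.
    """
    if not 0 <= m <= num_magnitude_bins - 1:
        raise ValueError(
            f"m must be within [0, {num_magnitude_bins - 1}], got {m}"
        )
    lo, hi = mag_range
    if num_magnitude_bins == 1:
        return float(lo)
    step = (hi - lo) / (num_magnitude_bins - 1)
    return lo + m * step


# Hypothetical operation whose magnitude spans 0 to 30 (e.g. degrees).
print(magnitude_for_bin(17, (0, 30)))  # 17.0
```

Under this assumption, raising num_magnitude_bins only makes the grid of available strengths finer; the extremes of the range stay the same.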
Invoking custom RandAugment policies
Thanks to the simpler nature of RandAugment, its policies are defined as lists of
augmentations that can be passed as the first argument to
apply_rand_augment() when it is invoked inside a pipeline definition.
- nvidia.dali.auto_aug.rand_augment.apply_rand_augment(augmentations, data, n, m, num_magnitude_bins=31, seed=None, **kwargs)
Applies the list of augmentations in RandAugment (https://arxiv.org/abs/1909.13719) fashion. Each sample is transformed with n operations in a sequence randomly selected from the augmentations list. Each operation uses m as the magnitude bin.
- Parameters:
augmentations (List[core._Augmentation]) – List of augmentations to be sampled and applied in RandAugment fashion.
data (DataNode) – A batch of samples to be processed.
n (int) – The number of randomly sampled operations to be applied to a sample.
m (int) – The magnitude bin (strength) of each operation to be applied. It must be an integer within [0, num_magnitude_bins - 1].
num_magnitude_bins (int) – The number of bins to divide the magnitude ranges into.
seed (int) – Seed to be used to randomly sample operations (and to negate magnitudes).
kwargs – Any extra parameters to be passed when calling augmentations. The signature of each augmentation is checked for any extra arguments, and if the name of an argument matches one from the kwargs, the value is passed as an argument. For example, some augmentations from the default RandAugment suite accept shape, fill_value and interp_type.
- Returns:
A batch of transformed samples.
- Return type:
DataNode
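The kwargs matching described above can be sketched in plain Python with inspect.signature. This is only an illustration of the idea, not DALI's code: the helper call_with_matching_kwargs and the two toy operations are hypothetical, and they return tuples instead of processing images.

```python
import inspect


def call_with_matching_kwargs(op, data, magnitude, **kwargs):
    """Call op, forwarding only the kwargs that appear in its signature."""
    params = inspect.signature(op).parameters
    accepted = {name: value for name, value in kwargs.items() if name in params}
    return op(data, magnitude, **accepted)


# Hypothetical toy operations with different optional arguments.
def toy_translate(data, offset, shape=None, fill_value=128):
    return ("translate", data, offset, shape, fill_value)


def toy_brightness(data, factor):
    return ("brightness", data, factor)


extra = {"shape": (224, 224), "fill_value": 0, "interp_type": "linear"}
# toy_translate receives shape and fill_value; interp_type is dropped.
print(call_with_matching_kwargs(toy_translate, "img", 5, **extra))
# toy_brightness accepts none of the extras, so none are passed.
print(call_with_matching_kwargs(toy_brightness, "img", 5, **extra))
```

The practical consequence is that one set of extra arguments can be supplied for a mixed list of augmentations without every augmentation having to accept all of them.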
Accessing predefined policies
To obtain the predefined policy definition, refer to the following functions.
- nvidia.dali.auto_aug.rand_augment.get_rand_augment_suite(use_shape=False, max_translate_abs=None, max_translate_rel=None)
Creates a list of RandAugment augmentations.
- Parameters:
use_shape (bool) – If true, the translation offset is computed as a percentage of the image/frame shape. Useful if the samples processed with the auto augment have different shapes. If false, the offset range is bounded by a constant (max_translate_abs).
max_translate_abs (int or (int, int), optional) – Only valid with use_shape=False, specifies the maximal shift (in pixels) in the translation augmentations. If a tuple is specified, the first component limits height, the second the width. Defaults to 100.
max_translate_rel (float or (float, float), optional) – Only valid with use_shape=True, specifies the maximal shift as a fraction of image/frame shape in the translation augmentations. If a tuple is specified, the first component limits height, the second the width. Defaults to around 0.45 (100/224).
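The difference between the absolute and the shape-relative translation bound can be sketched as follows. This is an illustrative pure-Python helper, not part of DALI: max_translation and _as_pair are hypothetical names, and the defaults (100 pixels and 100/224) are the ones stated in the parameter descriptions above.

```python
DEFAULT_MAX_TRANSLATE_ABS = 100
DEFAULT_MAX_TRANSLATE_REL = 100 / 224  # around 0.45


def _as_pair(value):
    """Expand a scalar into a (height, width) pair; pass pairs through."""
    if isinstance(value, (tuple, list)):
        return tuple(value)
    return (value, value)


def max_translation(use_shape, shape=None, max_translate_abs=None,
                    max_translate_rel=None):
    """Return the maximal (height, width) shift in pixels."""
    if use_shape:
        if shape is None:
            raise ValueError("shape is required when use_shape=True")
        rel = DEFAULT_MAX_TRANSLATE_REL if max_translate_rel is None else max_translate_rel
        rel_h, rel_w = _as_pair(rel)
        height, width = shape
        # The bound scales with each sample's own size.
        return (rel_h * height, rel_w * width)
    absolute = DEFAULT_MAX_TRANSLATE_ABS if max_translate_abs is None else max_translate_abs
    # The bound is the same constant for every sample.
    return _as_pair(absolute)


print(max_translation(False))
print(max_translation(True, shape=(480, 640)))
```

With use_shape=True a 480x640 frame and a 224x224 image get different pixel limits, which is why that mode suits batches of differently sized samples.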
- nvidia.dali.auto_aug.rand_augment.get_rand_augment_non_monotonic_suite(use_shape=False, max_translate_abs=None, max_translate_rel=None)
Similarly to get_rand_augment_suite(), creates a list of RandAugment augmentations.
This variant uses brightness, contrast, color, sharpness, posterize, and solarize with magnitude ranges as used by AutoAugment. However, those ranges do not match the intuition that a bigger magnitude bin corresponds to a stronger operation.
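Why the (0.1, 1.9) range is non-monotonic can be seen in a small numeric sketch. It rests on two assumptions that are not taken from the text above: bins are evenly spaced over the range, and an enhancement factor of 1.0 leaves the image unchanged, so the distance from 1.0 is used as a stand-in for strength. The helpers enhance_factor and enhance_strength are hypothetical.

```python
def enhance_factor(m, num_magnitude_bins=31, factor_range=(0.1, 1.9)):
    """Map bin m to an enhancement factor (assumes evenly spaced bins)."""
    lo, hi = factor_range
    return lo + (hi - lo) * m / (num_magnitude_bins - 1)


def enhance_strength(m, num_magnitude_bins=31, factor_range=(0.1, 1.9)):
    """Distance of the factor from 1.0 (assumed to be the identity)."""
    return abs(enhance_factor(m, num_magnitude_bins, factor_range) - 1.0)


# Strength is highest at both ends of the bin range and lowest in the middle.
for m in (0, 8, 15, 22, 30):
    print(m, round(enhance_factor(m), 3), round(enhance_strength(m), 3))
```

Under these assumptions the middle bin produces a factor close to 1.0 (almost no change), while bins 0 and 30 are equally strong in opposite directions, which is the behaviour the monotonic suite avoids.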