Normalize operator

This notebook illustrates the usage of the Normalize operator.

Introduction

Normalization is the process of shifting and scaling the data values to match a desired distribution. It is done by calculating the mean \(\mu\) and standard deviation \(\sigma\) of the data and modifying each value as follows:

\[Y_i = \frac{X_i - \mu}{\sigma}\]

There are more advanced features in Normalize, which we’ll explore after we’ve had the first glance at Normalize in its default setting.
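
As a rough illustration of the formula above, here is a minimal NumPy sketch. It does not use DALI; the `normalize` helper and the sample data are hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption made for this example.

```python
import numpy as np

def normalize(x):
    """Shift and scale x so that it has zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=np.float32)
    mu = x.mean()
    sigma = x.std()  # population standard deviation (ddof=0), assumed for this sketch
    return (x - mu) / sigma

# Sample data with mean 5 and standard deviation 2
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
result = normalize(data)
print(result)
```

After normalization, `result.mean()` is 0 and `result.std()` is 1, which is the "desired distribution" in the default setting.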

Using the `Normalize` operator

First, we need some boilerplate code to import DALI and some other useful libraries and also to visualize the results.

[1]:

from nvidia.dali.pipeline import Pipeline
import math
import nvidia.dali.ops as ops
import nvidia.dali.types as types

batch_size = 10
image_filename = "../data/images"

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

def display(outputs, idx, columns = 2, captions = None):
    rows = int(math.ceil(len(outputs) / columns))
    fig = plt.figure()
    fig.set_size_inches(16, 6 * rows)
    gs = gridspec.GridSpec(rows, columns)
    for i, out in enumerate(outputs):
        plt.subplot(gs[i])
        plt.axis("off")
        if captions is not None:
            plt.title(captions[i])
        plt.imshow(out.at(idx))

def show(pipe_class, idx, columns = 2, captions = None):
    pipe = pipe_class(batch_size=batch_size, num_threads=1, device_id=0)
    pipe.build()
    display(pipe.run(), idx, columns, captions)

A simple pipeline

Let’s start with a simple pipeline which just loads some images and normalizes their dynamic range.

[2]:

class NormalizeSimple(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(NormalizeSimple, self).__init__(batch_size, num_threads, device_id, seed=42)
        self.input = ops.FileReader(device="cpu", file_root=image_filename)
        self.decode = ops.ImageDecoder(device="cpu", output_type=types.RGB)
        self.norm = ops.Normalize(device="cpu")

    def define_graph(self):
        read, _ = self.input()
        image = self.decode(read)
        normalized = self.norm(image)
        return image, normalized

[3]:

show(NormalizeSimple, 1)

Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

../../_images/examples_general_normalize_6_1.png

Adjusting output dynamic range

As you can see in the example above, the image intensity values have been scaled and shifted, with many pixels forced below 0 (and displayed as black). This may be desired in many use cases, but if the output type has limited dynamic range (e.g. uint8), we may want to map the mean and standard deviation to values that more effectively utilize that limited range of values. For this purpose, Normalize offers two scalar arguments: shift and scale. Now the normalization formula becomes:

\[Y_i = \frac{X_i - \mu}{\sigma} \cdot {scale} + {shift}\]

Let’s modify the pipeline to produce uint8 output with the mean mapped to 128 and the standard deviation to 64, which allows values within the \(\mu \pm 2\sigma\) range to be correctly represented in the output.

[4]:

class NormalizeScaleShift(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(NormalizeScaleShift, self).__init__(batch_size, num_threads, device_id, seed=42)
        self.input = ops.FileReader(device="cpu", file_root=image_filename)
        self.decode = ops.ImageDecoder(device="cpu", output_type=types.RGB)
        self.norm = ops.Normalize(device="cpu", scale=64, shift=128, dtype=types.UINT8)

    def define_graph(self):
        read, _ = self.input()
        image = self.decode(read)
        normalized = self.norm(image)
        return image, normalized

[5]:

show(NormalizeScaleShift, 1)

../../_images/examples_general_normalize_9_0.png
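
The mapping used in the pipeline above can be checked by hand with NumPy. This is only a sketch of the formula with `scale=64` and `shift=128`: the `normalize_to_uint8` helper is hypothetical, and the rounding and saturation applied when converting to uint8 are assumptions of this example rather than a statement about DALI's conversion rules.

```python
import numpy as np

def normalize_to_uint8(x, scale=64.0, shift=128.0):
    """Apply (x - mean) / stddev * scale + shift and convert to uint8."""
    x = np.asarray(x, dtype=np.float32)
    y = (x - x.mean()) / x.std() * scale + shift
    # Round to nearest and clamp to [0, 255]; this conversion policy is an assumption
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)

# Sample data with mean 5 and standard deviation 2
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
out = normalize_to_uint8(data)
print(out)
```

The mean (5) lands on 128 and a value one standard deviation above it (7) lands on 192. The value 9 sits exactly at \(\mu + 2\sigma\), maps to 256 and is clamped to 255 in this sketch, which shows where the usable range ends.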

Directional reductions

In the case of multidimensional data, it may be useful to calculate the mean and standard deviation only for a subset of dimensions. For example, the dimensions may correspond to height (0), width (1) and color channels (2) of an image. Reducing the dimensions 0, 1 (height, width) will produce a separate mean and standard deviation for each channel. Normalize supports two arguments to specify directions:

axes - a tuple of dimension indices, 0 being outermost
axis_names - axis symbols looked up in the input layout

The example below normalizes the data along WC, H, HW and C.

[6]:

class NormalizeDirectional(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(NormalizeDirectional, self).__init__(batch_size, num_threads, device_id, seed=42)
        self.input = ops.FileReader(device="cpu", file_root=image_filename)
        self.decode = ops.ImageDecoder(device="cpu", output_type=types.RGB)
        self.normwc  = ops.Normalize(device="cpu", axes = (1, 2), scale=64, shift=128, dtype=types.UINT8)
        self.normh  = ops.Normalize(device="cpu", axis_names = "H", scale=64, shift=128, dtype=types.UINT8)
        self.normhw = ops.Normalize(device="cpu", axis_names = "HW", scale=64, shift=128, dtype=types.UINT8)
        self.normc  = ops.Normalize(device="cpu", axes = (2,), scale=64, shift=128, dtype=types.UINT8)

    def define_graph(self):
        read, _ = self.input()
        image = self.decode(read)
        return [image, self.normwc(image), self.normh(image), self.normhw(image), self.normc(image)]

[7]:

titles = ["Original", "Width and channels", "Height", "Height and width", "Channel"]
show(NormalizeDirectional, 9, captions = titles)

../../_images/examples_general_normalize_12_0.png
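
To make the axis semantics concrete, here is a NumPy sketch of a directional reduction on a single HWC array. The helpers `normalize_axes` and `axes_from_names` are hypothetical and only mimic the idea behind the `axes` and `axis_names` arguments; they are not DALI functions.

```python
import numpy as np

def axes_from_names(layout, names):
    """Translate axis symbols (e.g. "HW") into indices within a layout (e.g. "HWC")."""
    return tuple(layout.index(name) for name in names)

def normalize_axes(x, axes):
    """Normalize x using statistics reduced over the given axes only."""
    x = np.asarray(x, dtype=np.float32)
    # keepdims=True keeps the reduced axes with extent 1 so the result broadcasts over x
    mu = x.mean(axis=axes, keepdims=True)
    sigma = x.std(axis=axes, keepdims=True)
    return (x - mu) / sigma, mu.shape

rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(4, 6, 3))  # height, width, channels

axes = axes_from_names("HWC", "HW")
per_channel, stats_shape = normalize_axes(image, axes)
print(axes, stats_shape)
```

Reducing over height and width leaves statistics of shape `(1, 1, 3)`, that is, one mean and one standard deviation per channel, and each channel of `per_channel` ends up with zero mean and unit standard deviation.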

Externally provided parameters

By default, Normalize calculates the mean and standard deviation internally. However, they can be provided externally via the mean and stddev arguments. These arguments can be either scalar values or inputs. When mean or stddev is provided as an input, the directions of reduction can be inferred from the parameter’s shape. If both mean and stddev are inputs, they must have the same shape.

[8]:

class NormalizeWithParam(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(NormalizeWithParam, self).__init__(batch_size, num_threads, device_id, seed=42)
        self.input = ops.FileReader(device="cpu", file_root=image_filename)
        self.decode = ops.ImageDecoder(device="cpu", output_type=types.RGB)

        self.norm_mean  = ops.Normalize(device="cpu", mean=64,
                                        axis_names="HW", scale=64, shift=128, dtype=types.UINT8)

        self.norm_stddev  = ops.Normalize(device="cpu", stddev=200,
                                          axis_names="HW", scale = 64, shift=128, dtype=types.UINT8)

    def define_graph(self):
        read, _ = self.input()
        image = self.decode(read)
        return [image, self.norm_mean(image), self.norm_stddev(image)]

[9]:

show(NormalizeWithParam, 1, captions = ["Original", "Fixed mean", "Fixed standard deviation"])

../../_images/examples_general_normalize_15_0.png
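
The following NumPy sketch shows how externally supplied parameters enter the formula. Both helpers are hypothetical. In particular, `reduced_axes` encodes one plausible reading of "inferred from the parameter’s shape" (an axis counts as reduced when the parameter has extent 1 there); that rule is an assumption of this example, not a description of DALI's implementation.

```python
import numpy as np

def normalize_with_params(x, mean, stddev, scale=1.0, shift=0.0):
    """Normalize x with externally supplied mean and stddev (scalars or arrays)."""
    x = np.asarray(x, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32)
    stddev = np.asarray(stddev, dtype=np.float32)
    # Parameters broadcast against x, so shape (1, 1, 3) means per-channel values
    return (x - mean) / stddev * scale + shift

def reduced_axes(param_shape):
    """Assumed rule: axes where the parameter has extent 1 are treated as reduced."""
    return tuple(i for i, extent in enumerate(param_shape) if extent == 1)

# A 2x2 RGB image, constant per channel except for one pixel
image = np.zeros((2, 2, 3), dtype=np.float32) + np.array([64, 128, 192], dtype=np.float32)
image[0, 0] = [96, 192, 256]

mean = np.array([64, 128, 192], dtype=np.float32).reshape(1, 1, 3)
stddev = np.array([32, 64, 64], dtype=np.float32).reshape(1, 1, 3)

out = normalize_with_params(image, mean, stddev)
print(reduced_axes(mean.shape))
print(out[0, 0], out[1, 1])
```

Because `mean` and `stddev` have shape `(1, 1, 3)`, they act as per-channel parameters, which corresponds to a reduction over height and width.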

Batch normalization

Normalize can calculate the mean and standard deviation for the whole batch instead of per item. We can enable this behavior by setting the batch argument to True. Batch normalization requires that the extents of the non-reduced dimensions match for all samples in the batch. For example, the pipeline below expects that all images have three channels, because we’re normalizing channels separately.

[10]:

class NormalizeBatch(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(NormalizeBatch, self).__init__(batch_size, num_threads, device_id, seed=42)
        self.input = ops.FileReader(device="cpu", file_root=image_filename)
        self.decode = ops.ImageDecoder(device="cpu", output_type=types.RGB)

        self.norm_sample  = ops.Normalize(device="cpu", batch=False,
                                          axis_names="HW", scale=64, shift=128, dtype=types.UINT8)

        self.norm_batch  = ops.Normalize(device="cpu", batch=True, axis_names="HW",
                                         scale = 64, shift=128, dtype=types.UINT8)

    def define_graph(self):
        read, _ = self.input()
        image = self.decode(read)
        return [image, self.norm_sample(image), self.norm_batch(image)]

[11]:

show(NormalizeBatch, 1, columns = 3, captions = ["Original", "Per-sample normalization", "Batch normalization"])
show(NormalizeBatch, 4, columns = 3, captions = ["Original", "Per-sample normalization", "Batch normalization"])
show(NormalizeBatch, 7, columns = 3, captions = ["Original", "Per-sample normalization", "Batch normalization"])
show(NormalizeBatch, 9, columns = 3, captions = ["Original", "Per-sample normalization", "Batch normalization"])

../../_images/examples_general_normalize_18_0.png

../../_images/examples_general_normalize_18_1.png

../../_images/examples_general_normalize_18_2.png

../../_images/examples_general_normalize_18_3.png
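
The difference between per-sample and batch-wide statistics can be sketched in NumPy as follows. The helper functions are hypothetical and operate on a plain Python list of HWC arrays; they only illustrate the idea and the channel-count requirement, not DALI's implementation.

```python
import numpy as np

def channel_stats(pixels):
    """Per-channel mean and standard deviation of an (N, C) array of pixels."""
    return pixels.mean(axis=0), pixels.std(axis=0)

def normalize_per_sample(batch):
    """Normalize each image with its own per-channel statistics."""
    out = []
    for img in batch:
        mu, sigma = channel_stats(img.reshape(-1, img.shape[-1]))
        out.append((img - mu) / sigma)
    return out

def normalize_batch(batch):
    """Normalize all images with per-channel statistics shared across the batch."""
    channels = {img.shape[-1] for img in batch}
    if len(channels) != 1:
        # The non-reduced dimension (channels) must match for all samples
        raise ValueError("all samples must have the same number of channels")
    c = channels.pop()
    all_pixels = np.concatenate([img.reshape(-1, c) for img in batch])
    mu, sigma = channel_stats(all_pixels)
    return [(img - mu) / sigma for img in batch]

rng = np.random.default_rng(0)
# Two images of different sizes: a dark one and a bright one
batch = [rng.uniform(0, 100, size=(4, 5, 3)), rng.uniform(100, 255, size=(2, 3, 3))]

per_sample = normalize_per_sample(batch)
whole_batch = normalize_batch(batch)
print([float(img.mean()) for img in per_sample])
print([float(img.mean()) for img in whole_batch])
```

With per-sample statistics every image is centered on zero individually. With batch statistics the dark image stays below zero and the bright one above it, so relative brightness between samples is preserved.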