Geometric Transforms

In this example, we demonstrate the operators from the transforms module and how they can be used to transform images and point clouds.

Affine Transform

The operators from the transforms module can generate and combine transform matrices for different kinds of affine transforms. An affine transform is defined by the formula:

\[X_{out} = \begin{bmatrix} M & T \end{bmatrix} \begin{bmatrix} X_{in} \\ 1 \end{bmatrix}\]

where \(X_{in}\) is an input point, \(X_{out}\) is the corresponding output point, \(M\) is the linear part of the transformation, and \(T\) is a translation vector.

For points in 2D space, the formula can be written as:

\[\begin{bmatrix} x_{out} \\ y_{out} \end{bmatrix} = \begin{bmatrix} m_{00} & m_{01} & t_x \\ m_{10} & m_{11} & t_y \end{bmatrix} \begin{bmatrix} x_{in} \\ y_{in} \\ 1 \end{bmatrix}\]
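As a quick numerical check of the formula, the following NumPy sketch appends a 1 to each point (homogeneous coordinates) and multiplies by the 2x3 matrix [M | T]. It is independent of DALI, and the helper name apply_affine is hypothetical.

```python
import numpy as np


def apply_affine(mt, points):
    """Apply an affine transform given as a [M | T] matrix to a batch of points.

    mt has shape (ndim, ndim + 1); points has shape (num_points, ndim).
    """
    mt = np.asarray(mt, dtype=np.float64)
    points = np.asarray(points, dtype=np.float64)
    assert mt.shape == (points.shape[1], points.shape[1] + 1)
    # Append a 1 to each point to form homogeneous coordinates [x, y, ..., 1].
    homogeneous = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    # X_out = [M | T] @ [X_in; 1], computed for all points at once.
    return homogeneous @ mt.T


# m00 = 2, m11 = 3, t = (10, -5); no rotation or shear terms.
mt = np.array([[2.0, 0.0, 10.0],
               [0.0, 3.0, -5.0]])
out = apply_affine(mt, [[1.0, 1.0], [0.0, 0.0]])
```

With this matrix, the point (1, 1) maps to (2 * 1 + 10, 3 * 1 - 5) = (12, -2), and the origin maps to the translation vector (10, -5).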

Transform Catalogue

There are several transforms available in the transforms module. Each of these operators can generate an affine transform matrix and combine it with a pre-existing transform. Here is the list of available transforms:

  • rotation - rotate by a given angle (in degrees) around a given point and, in 3D only, a given axis

  • translation - translate by a given offset

  • scale - scale by a given factor

  • shear - shear by given factors or angles; there are 2 shear factors for 2D and 6 for 3D

  • crop - translate and scale so that the input corners (from_start, from_end) map to the output corners (to_start, to_end)

The documentation of each operator describes its parameters in detail.
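To make the catalogue concrete, here is a NumPy sketch that builds a 2D [M | T] matrix for each kind of transform. The helper names are hypothetical and follow textbook conventions (the rotation matrix [[cos, -sin], [sin, cos]] around the origin, the shear matrix [[1, shx], [shy, 1]]); DALI's own sign and parameter conventions may differ, so check the operator documentation before relying on them.

```python
import numpy as np

# Hypothetical helpers returning 2x3 [M | T] matrices (2D case only).
# Textbook conventions are assumed; DALI's conventions may differ.


def translation(offset):
    tx, ty = offset
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty]])


def scale(factors):
    sx, sy = factors
    return np.array([[sx, 0.0, 0.0], [0.0, sy, 0.0]])


def rotation(angle_deg):
    # Standard rotation matrix [[cos, -sin], [sin, cos]] around the origin.
    a = np.deg2rad(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0]])


def shear(factors):
    # x' = x + shx * y, y' = shy * x + y
    shx, shy = factors
    return np.array([[1.0, shx, 0.0], [shy, 1.0, 0.0]])


def crop(from_start, from_end, to_start, to_end):
    f0, f1, t0, t1 = (np.asarray(v, dtype=np.float64)
                      for v in (from_start, from_end, to_start, to_end))
    # Per-axis scale mapping the source extent onto the destination extent...
    s = (t1 - t0) / (f1 - f0)
    # ...and the translation that sends from_start exactly to to_start.
    t = t0 - s * f0
    return np.array([[s[0], 0.0, t[0]], [0.0, s[1], t[1]]])


def apply(mt, point):
    """Apply a 2x3 [M | T] matrix to a single (x, y) point."""
    return mt[:, :2] @ np.asarray(point, dtype=np.float64) + mt[:, 2]


# Usage: a crop from the box (100, 50)-(300, 250) onto a 400x400 window.
mt = crop((100, 50), (300, 250), (0, 0), (400, 400))
corners = apply(mt, (100, 50)), apply(mt, (300, 250))
```

The crop matrix sends the source corners (100, 50) and (300, 250) to (0, 0) and (400, 400): it scales each axis by 2 and then shifts by (-200, -100).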

There is also the combine operator, which combines multiple affine transforms.
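Composing affine transforms amounts to matrix multiplication once each 2x3 matrix is extended to 3x3 homogeneous form with the row [0, 0, 1]. The sketch below assumes that the first transform listed is applied first (our reading of how transforms.combine orders its inputs); to_3x3 and combine are hypothetical NumPy helpers, not DALI functions.

```python
import numpy as np


def to_3x3(mt):
    """Extend a 2x3 [M | T] matrix with the row [0, 0, 1]."""
    return np.vstack([np.asarray(mt, dtype=np.float64), [0.0, 0.0, 1.0]])


def combine(*mts):
    """Compose 2x3 affine matrices so that the first one listed is applied first."""
    result = np.eye(3)
    for mt in mts:
        # Left-multiplying applies `mt` after everything accumulated so far.
        result = to_3x3(mt) @ result
    return result[:2]


translate = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 0.0]])  # shift x by 5
double = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])     # scale by 2
p = np.array([1.0, 1.0, 1.0])  # the point (1, 1) in homogeneous coordinates

translate_first = combine(translate, double) @ p  # ((1 + 5) * 2, 1 * 2)
scale_first = combine(double, translate) @ p      # (1 * 2 + 5, 1 * 2)
```

The order matters: the point (1, 1) goes to (12, 2) when the translation comes first, but to (7, 2) when the scaling comes first.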

Case Study: Transforming Keypoints

To illustrate the capabilities of the transforms, we'll apply them to images with corresponding keypoint data, in this case face landmarks. We start by importing the necessary modules, defining the location of the data, and writing a utility that displays images with keypoints drawn on them.

[1]:
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import math
import os

dali_extra_dir = os.environ["DALI_EXTRA_PATH"]
root_dir = os.path.join(dali_extra_dir, "db", "face_landmark")

# images are in JPEG format
image_files = ["{}.jpeg".format(i) for i in range(6)]
# keypoints are in NumPy files
keypoint_files = ["{}.npy".format(i) for i in range(6)]
[2]:
def show(images, landmarks):
    if hasattr(images, "as_cpu"):
        images = images.as_cpu()
    batch_size = len(images)

    import matplotlib.gridspec as gridspec

    fig = plt.figure(figsize=(16, 14))
    plt.suptitle(None)
    columns = 3
    rows = int(math.ceil(batch_size / columns))
    gs = gridspec.GridSpec(rows, columns)
    for i in range(batch_size):
        ax = plt.subplot(gs[i])
        plt.axis("off")
        plt.title("")
        img = images.at(i)
        r = 0.002 * max(img.shape[0], img.shape[1])
        for p in landmarks.at(i):
            circle = patches.Circle(p, r, color=(0, 1, 0, 1))
            ax.add_patch(circle)
        plt.imshow(img)

First, let’s build a pipeline that just loads the images and keypoints, without any augmentations:

[3]:
@pipeline_def
def basic_pipe():
    jpegs, _ = fn.readers.file(file_root=root_dir, files=image_files)
    images = fn.decoders.image(jpegs, device="mixed")
    keypoints = fn.readers.numpy(file_root=root_dir, files=keypoint_files)
    return images, keypoints


pipe = basic_pipe(batch_size=6, num_threads=3, device_id=0)
[4]:
pipe.build()
images, keypoints = pipe.run()
[5]:
show(images, keypoints)
[Figure: the original images with their keypoints drawn on them]

Adding Transforms to the Pipeline

In this step, we apply a transform to the images and keypoints. We use warp_affine to transform the images and coord_transform to transform the keypoints. By default, warp_affine uses the transform matrix to perform inverse mapping: destination pixel coordinates are mapped to source coordinates. This effectively transforms the locations of image features by the inverse of the transform matrix. To transform the keypoints and the images in the same way, we need to specify inverse_map=False in warp_affine.
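The relationship between the two mapping directions is just matrix inversion: if y = M x + T, then x = inv(M) (y - T). The NumPy sketch below (hypothetical helpers, independent of DALI) shows that the inverse matrix sends each destination point back to its source, which is the direction the text above describes for warp_affine's default behavior.

```python
import numpy as np


def invert_affine(mt):
    """Invert a 2x3 [M | T] matrix: if y = M x + T, then x = inv(M) y - inv(M) T."""
    m, t = mt[:, :2], mt[:, 2]
    m_inv = np.linalg.inv(m)
    return np.hstack([m_inv, (-m_inv @ t)[:, None]])


def apply(mt, point):
    """Apply a 2x3 [M | T] matrix to a single (x, y) point."""
    return mt[:, :2] @ point + mt[:, 2]


# A 90-degree rotation followed by a shift of 100 along x.
mt = np.array([[0.0, -1.0, 100.0],
               [1.0, 0.0, 0.0]])
src = np.array([10.0, 20.0])
dst = apply(mt, src)                  # forward map: where a source feature ends up
back = apply(invert_affine(mt), dst)  # inverse map: which source pixel to sample for dst
```

By this reasoning, passing invert_affine(mt) with the default inverse mapping should move image features the same way as passing mt with inverse_map=False. We have not checked this against DALI's implementation, so treat it as an illustration of the idea.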

[6]:
@pipeline_def
def rotate_pipe():
    jpegs, _ = fn.readers.file(file_root=root_dir, files=image_files)
    images = fn.decoders.image(jpegs, device="mixed")
    keypoints = fn.readers.numpy(file_root=root_dir, files=keypoint_files)
    mt = fn.transforms.rotation(angle=fn.random.uniform(range=(-45, 45)))
    images = fn.warp_affine(images, matrix=mt, fill_value=0, inverse_map=False)
    keypoints = fn.coord_transform(keypoints, MT=mt)
    return images, keypoints


pipe = rotate_pipe(batch_size=6, num_threads=3, device_id=0, seed=1234)
pipe.build()
images, keypoints = pipe.run()
[7]:
show(images, keypoints)
[Figure: the images and keypoints rotated around the top-left corner]

As we can see, the images have been rotated around the point (0, 0), which is the top-left corner. To rotate around the center, we can pass an additional center argument to transforms.rotation. We can't use the shape of the decoded images to calculate the center, because the images are on the GPU. We can, however, look up the image shapes before decoding with the peek_image_shape operator.
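Rotating around a center c keeps the linear part unchanged and only alters the translation: x' = R (x - c) + c = R x + (c - R c). The NumPy sketch below builds that matrix with a hypothetical helper, using the textbook rotation convention (DALI's sign convention may differ), and shows why the center must be assembled as (W / 2, H / 2) from an HWC shape: keypoints are stored in (x, y) order.

```python
import numpy as np


def rotation_about(angle_deg, center):
    """2x3 matrix rotating by angle_deg around `center`: x' = R (x - c) + c."""
    a = np.deg2rad(angle_deg)
    r = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    c = np.asarray(center, dtype=np.float64)
    # The linear part is plain R; only the translation T = c - R c depends on the center.
    return np.hstack([r, (c - r @ c)[:, None]])


# A hypothetical HWC shape, as peek_image_shape would report it for an encoded image.
h, w, _ = (480, 640, 3)
# Keypoints are (x, y), so the center is (W / 2, H / 2), not (H / 2, W / 2).
center = np.array([w, h]) / 2

mt = rotation_about(30, center)
fixed = mt[:, :2] @ center + mt[:, 2]  # the center should map onto itself
```

The center is a fixed point of this transform, which is exactly what keeps the rotated image in place instead of swinging it around the top-left corner.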

[8]:
def encoded_images_sizes(jpegs):
    shapes = fn.peek_image_shape(jpegs)  # the shapes are HWC
    h, w = shapes[0], shapes[1]  # extract H and W ...
    return fn.stack(w, h)  # ...and stack them as (W, H) to match the (x, y) keypoint order


@pipeline_def
def center_rotate_pipe():
    jpegs, _ = fn.readers.file(file_root=root_dir, files=image_files)
    images = fn.decoders.image(jpegs, device="mixed")

    size = encoded_images_sizes(jpegs)
    center = size / 2

    keypoints = fn.readers.numpy(file_root=root_dir, files=keypoint_files)
    mt = fn.transforms.rotation(angle=fn.random.uniform(range=(-45, 45)), center=center)
    images = fn.warp_affine(images, matrix=mt, fill_value=0, inverse_map=False)
    keypoints = fn.coord_transform(keypoints, MT=mt)
    return images, keypoints


pipe = center_rotate_pipe(batch_size=6, num_threads=3, device_id=0, seed=1234)
pipe.build()
images, keypoints = pipe.run()
[9]:
show(images, keypoints)
[Figure: the images and keypoints rotated around the image centers]

Combining Transforms

We can also combine multiple transforms. This can be achieved in two ways:

  1. by passing an existing transform matrix as an input to a transform operator,

  2. by explicitly using transforms.combine.

In the example below, we apply a rotation followed by a horizontal translation, by passing the rotation matrix as the input to transforms.translation.
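Applying a translation after an existing transform only changes its translation column: the offset is added to T, while M stays the same. The NumPy sketch below illustrates this under the assumption that the operator's own transform is applied after its input transform, which matches the "rotation followed by translation" order described above; then_translate is a hypothetical helper.

```python
import numpy as np


def then_translate(mt, offset):
    """Apply a translation after the 2x3 transform mt: x -> (M x + T) + offset."""
    out = np.array(mt, dtype=np.float64)
    # The linear part M is untouched; the offset is added to the translation column.
    out[:, 2] += offset
    return out


# A 20-degree rotation around the origin (textbook convention).
a = np.deg2rad(20)
rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                [np.sin(a), np.cos(a), 0.0]])
mt = then_translate(rot, (400, 0))

p = np.array([50.0, 80.0])
moved = mt[:, :2] @ p + mt[:, 2]  # rotate p, then shift it 400 units along x
```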

[10]:
@pipeline_def
def multi_transform_pipe():
    jpegs, _ = fn.readers.file(file_root=root_dir, files=image_files)
    images = fn.decoders.image(jpegs, device="mixed")

    size = encoded_images_sizes(jpegs)
    center = size / 2

    keypoints = fn.readers.numpy(file_root=root_dir, files=keypoint_files)
    mt = fn.transforms.rotation(angle=fn.random.uniform(range=(-45, 45)), center=center)
    mt = fn.transforms.translation(mt, offset=(400, 0))
    images = fn.warp_affine(images, matrix=mt, fill_value=0, inverse_map=False)
    keypoints = fn.coord_transform(keypoints, MT=mt)
    return images, keypoints


pipe = multi_transform_pipe(batch_size=6, num_threads=3, device_id=0, seed=1234)
pipe.build()
images, keypoints = pipe.run()
[11]:
show(images, keypoints)
[Figure: the images and keypoints rotated around their centers and shifted horizontally]

Combining Multiple Transforms with transforms.combine

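This section demonstrates how to use the combine operator with both the results of other transform operators and constant matrices. Here, we translate the image center to the origin, rotate, apply a constant shear, and translate back.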
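The constant matrix [[1, 1, 0], [0, 1, 0]] is a horizontal shear, x' = x + y, y' = y. Sandwiching the rotation and shear between a translation by -center and a translation by +center makes the center a fixed point of the whole chain. The NumPy sketch below checks this, assuming (as in the pipeline's argument order) that the first matrix listed is applied first; the center value is hypothetical.

```python
import numpy as np


def homogeneous(mt):
    """Promote a 2x3 [M | T] matrix to 3x3 homogeneous form."""
    return np.vstack([np.asarray(mt, dtype=np.float64), [0.0, 0.0, 1.0]])


center = np.array([320.0, 240.0])  # hypothetical image center (W / 2, H / 2)
a = np.deg2rad(30)
tr1 = np.array([[1.0, 0.0, -center[0]], [0.0, 1.0, -center[1]]])
rot = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0]])
shear = np.float32([[1, 1, 0], [0, 1, 0]])  # the constant from the pipeline: x' = x + y
tr2 = np.array([[1.0, 0.0, center[0]], [0.0, 1.0, center[1]]])

# tr1 is applied first, so it is the rightmost factor of the product.
mt = (homogeneous(tr2) @ homogeneous(shear) @ homogeneous(rot) @ homogeneous(tr1))[:2]
center_out = mt @ np.append(center, 1.0)
```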

[12]:
@pipeline_def
def transform_combine_pipe():
    jpegs, _ = fn.readers.file(file_root=root_dir, files=image_files)
    images = fn.decoders.image(jpegs, device="mixed")

    size = encoded_images_sizes(jpegs)
    center = size / 2

    keypoints = fn.readers.numpy(file_root=root_dir, files=keypoint_files)
    tr1 = fn.transforms.translation(offset=-center)
    tr2 = fn.transforms.translation(offset=center)
    rot = fn.transforms.rotation(angle=fn.random.uniform(range=(-45, 45)))
    mt = fn.transforms.combine(tr1, rot, np.float32([[1, 1, 0], [0, 1, 0]]), tr2)
    images = fn.warp_affine(images, matrix=mt, fill_value=0, inverse_map=False)
    keypoints = fn.coord_transform(keypoints, MT=mt)
    return images, keypoints


pipe = transform_combine_pipe(batch_size=6, num_threads=3, device_id=0, seed=1234)
pipe.build()
images, keypoints = pipe.run()
[13]:
show(images, keypoints)
[Figure: the images and keypoints rotated and sheared around their centers]

Keypoint Cropping

In the example below, we apply some randomized transforms and crop the result so that the face is in the center of the output image.
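The bounding-box arithmetic used for the crop can be checked in isolation with NumPy. The keypoints below are randomly generated stand-ins (hypothetical data, not the face landmarks), and the crop is written out per axis as x' = s * x + t. This mirrors the steps in the pipeline's comments without calling DALI.

```python
import numpy as np

# Hypothetical landmarks in (x, y) order, standing in for the transformed keypoints.
rng = np.random.default_rng(0)
keypoints = rng.uniform(low=(120, 80), high=(260, 300), size=(68, 2))

# Bounding box of the keypoints and its larger extent.
lo = keypoints.min(axis=0)
hi = keypoints.max(axis=0)
size = (hi - lo).max()
center = (lo + hi) / 2

# Square region of side 2 * size around the box center: this leaves a margin of
# size / 2 on each side of the larger extent.
lo, hi = center - size, center + size

# Crop transform mapping [lo, hi] onto a 400x400 window: x' = s * x + t per axis.
out_size = 400.0
s = out_size / (hi - lo)  # equal on both axes, because the region is square
t = -s * lo
cropped = keypoints * s + t
```

In the output window, the bounding-box center lands at (200, 200) and all keypoints fall within [100, 300] on each axis, because the square region is twice as large as the bounding box's larger extent.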

[14]:
@pipeline_def
def crop_pipe():
    jpegs, _ = fn.readers.file(file_root=root_dir, files=image_files)
    images = fn.decoders.image(jpegs, device="mixed")
    keypoints = fn.readers.numpy(file_root=root_dir, files=keypoint_files)

    # This part defines the augmentations: shear + rotation
    mt = fn.transforms.shear(shear=fn.random.uniform(range=(-1, 1), shape=[2]))
    mt = fn.transforms.rotation(mt, angle=fn.random.uniform(range=(-45, 45)))

    # Now, let's see where the keypoints would be after applying this transform
    uncropped = fn.coord_transform(keypoints, MT=mt)

    # Find the bounding box of the keypoints
    lo = fn.reductions.min(uncropped, axes=[0])
    hi = fn.reductions.max(uncropped, axes=[0])
    # ...and get its larger extent (width or height)
    size = fn.reductions.max(hi - lo)
    center = (lo + hi) / 2
    # make a square region centered at the center of the bounding box
    lo = center - size  # use the full extent as the half-width: this adds a 50% margin on each side
    hi = center + size  # likewise

    # Now we can calculate a crop transform that will map the bounding box to a 400x400 window
    # and combine it with the previous transform.
    mt = fn.transforms.crop(mt, from_start=lo, from_end=hi, to_start=[0, 0], to_end=[400, 400])

    # Apply the transform to the images (with an output size of 400x400) and to the keypoints.
    images = fn.warp_affine(images, size=[400, 400], matrix=mt, fill_value=0, inverse_map=False)
    keypoints = fn.coord_transform(keypoints, MT=mt)
    return images, keypoints


pipe = crop_pipe(batch_size=6, num_threads=3, device_id=0, seed=1234)
pipe.build()
images, keypoints = pipe.run()
[15]:
show(images, keypoints)
[Figure: 400x400 crops centered on the faces, with transformed keypoints]