Serialization

Overview

This sample shows how to serialize a DALI pipeline to a string and how to rebuild an identical pipeline from that string.

Serialization

To use the C API or the TensorFlow plugin (or simply to save the pipeline together with a model, so that the training process is fully reproducible), we need to serialize the pipeline.
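To keep a pipeline next to a model checkpoint, its serialized form can simply be written to disk and read back later. The sketch below is a minimal illustration using only the standard library. The helper names save_pipeline and load_pipeline, the file name, and the placeholder payload are hypothetical, and the sketch assumes the serialized form is a bytes object (as a stand-in for the result of pipe.serialize() used later in this example).

```python
import os
import tempfile


def save_pipeline(serialized: bytes, path: str) -> None:
    """Write a serialized pipeline to disk (hypothetical helper).

    Assumes the serialized form is raw bytes, written unchanged.
    """
    with open(path, "wb") as f:
        f.write(serialized)


def load_pipeline(path: str) -> bytes:
    """Read a serialized pipeline back as raw bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        return f.read()


# Usage: a placeholder payload stands in for pipe.serialize().
payload = b"\x0a\x03fake-serialized-pipeline"
checkpoint_dir = tempfile.mkdtemp()
path = os.path.join(checkpoint_dir, "pipeline.dali")  # hypothetical file name
save_pipeline(payload, path)
restored = load_pipeline(path)
print(restored == payload)  # True
```

Writing in binary mode keeps the bytes exactly as produced, which matters because a binary serialization format is not guaranteed to be valid text.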

Let us make a simple pipeline reading from the MXNet RecordIO format (for examples using other data formats, please see the other examples in the examples directory).

In [1]:
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types
import numpy as np
import matplotlib.pyplot as plt

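# Location of the ImageNet training set in MXNet RecordIO format;
# adjust this path to where the data lives on your machine.
base = "/data/imagenet/train-480-val-256-recordio/"
idx_files = [base + "train.idx"]
rec_files = [base + "train.rec"]


class SerializedPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, seed):
        super(SerializedPipeline, self).__init__(batch_size,
                                                 num_threads,
                                                 device_id,
                                                 seed = seed)
        # Reads (encoded JPEG, label) pairs from the RecordIO files
        self.input = ops.MXNetReader(path = rec_files, index_path = idx_files)
        # Decodes JPEGs to RGB; "mixed" means CPU input, GPU output
        self.decode = ops.nvJPEGDecoder(device = "mixed", output_type = types.RGB)
        # Bilinear resize on the GPU; the target size is supplied per call
        self.resize = ops.Resize(device = "gpu",
                                 image_type = types.RGB,
                                 interp_type = types.INTERP_LINEAR)
        # Crops 224 x 224 and converts to float; mean 0 and std 1 leave values unscaled
        self.cmnp = ops.CropMirrorNormalize(device = "gpu",
                                            output_dtype = types.FLOAT,
                                            crop = (224, 224),
                                            image_type = types.RGB,
                                            mean = [0., 0., 0.],
                                            std = [1., 1., 1.])
        # Random source for the shorter-side length, between 256 and 480
        self.res_uniform = ops.Uniform(range = (256.,480.))

    def define_graph(self):
        # Encoded images and their labels
        inputs, labels = self.input(name="Reader")
        images = self.decode(inputs)
        # Each sample gets its own randomly drawn shorter-side length
        images = self.resize(images, resize_shorter = self.res_uniform())
        output = self.cmnp(images)
        return (output, labels)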
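The graph above resizes each decoded image so that its shorter side equals a value drawn from the range 256 to 480, then crops a 224 x 224 window and normalizes it. The pure-Python sketch below reproduces only the shape arithmetic of these steps. The helper names are hypothetical, and both the rounding rule and the centered crop position are assumptions of this sketch rather than a description of DALI's exact implementation.

```python
def resize_shorter_shape(height, width, shorter):
    """Output (H, W) when the shorter side is scaled to `shorter`.

    The longer side is scaled by the same factor, preserving aspect ratio;
    rounding to the nearest integer is an assumption of this sketch.
    """
    scale = shorter / min(height, width)
    return round(height * scale), round(width * scale)


def center_crop_window(height, width, crop_h, crop_w):
    """Top-left corner and size of a centered crop (assumed crop position)."""
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return top, left, crop_h, crop_w


def normalize(value, mean, std):
    """Per-channel normalization (x - mean) / std."""
    return (value - mean) / std


# A 480 x 640 image whose shorter side was drawn as 256:
h, w = resize_shorter_shape(480, 640, 256.0)
print(h, w)                                # 256 341
print(center_crop_window(h, w, 224, 224))  # (16, 58, 224, 224)
# With mean 0 and std 1, as in this pipeline, values pass through unchanged.
print(normalize(200.0, 0.0, 1.0))          # 200.0
```

Because the shorter side is never below 256, the 224 x 224 crop always fits inside the resized image.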
In [2]:
batch_size = 16

pipe = SerializedPipeline(batch_size=batch_size, num_threads=2, device_id = 0, seed = 12)

We will now serialize this pipeline using the serialize method of the Pipeline class.

In [3]:
s = pipe.serialize()

To deserialize the pipeline in Python, we need to create another pipeline, this time using the generic Pipeline class. We give the new pipeline the same seed so that we can compare the results.
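The shared seed matters because the graph contains a random operator: ops.Uniform draws a new resize_shorter value for every sample. The standard-library sketch below is only an analogy for that behavior and does not use DALI's random number generator. The helper draw_shorter_sides is hypothetical; it shows that two generators created with the same seed produce the same sequence of draws.

```python
import random


def draw_shorter_sides(seed, batch_size, low=256.0, high=480.0):
    """Draw one shorter-side length per sample, mimicking ops.Uniform.

    Python's random module stands in for DALI's internal RNG here.
    """
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(batch_size)]


a = draw_shorter_sides(seed=12, batch_size=16)
b = draw_shorter_sides(seed=12, batch_size=16)
c = draw_shorter_sides(seed=13, batch_size=16)

print(a == b)  # True: same seed, same draws
print(a == c)  # different seed: almost surely different draws
```

Without a common seed, the two pipelines would resize the same images to different sizes and the comparison below would report nonzero differences even if deserialization were perfect.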

In [4]:
pipe2 = Pipeline(batch_size = batch_size, num_threads = 2, device_id = 0, seed = 12)

Let us now use the serialized form of the pipe object to make pipe2 a copy of it.

In [5]:
pipe2.deserialize_and_build(s)
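Conceptually, serialization turns the graph (its operators, their arguments, and how their inputs and outputs connect) into a self-describing string, and deserialize_and_build recreates that graph inside a pipeline object that knows nothing about the SerializedPipeline subclass. DALI's actual format is protobuf-based; the toy sketch below only illustrates the idea with JSON on a simplified subset of the graph above, and every name in it (OpSpec, GenericGraph, serialize_graph, the field names) is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class OpSpec:
    """One operator node: type, device, arguments, named inputs and outputs."""
    name: str
    device: str
    args: Dict[str, object] = field(default_factory=dict)
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def serialize_graph(ops: List[OpSpec], seed: int) -> str:
    """Encode the whole graph and its seed as a JSON string."""
    return json.dumps({"seed": seed, "ops": [asdict(op) for op in ops]},
                      sort_keys=True)


class GenericGraph:
    """A generic container that can rebuild any graph from its string form."""

    def __init__(self):
        self.ops: List[OpSpec] = []
        self.seed = None

    def deserialize(self, s: str) -> None:
        data = json.loads(s)
        self.seed = data["seed"]
        self.ops = [OpSpec(**op) for op in data["ops"]]


original = [
    OpSpec("MXNetReader", "cpu", {"path": ["train.rec"]}, [], ["jpegs", "labels"]),
    OpSpec("nvJPEGDecoder", "mixed", {"output_type": "RGB"}, ["jpegs"], ["images"]),
    OpSpec("Resize", "gpu", {"interp_type": "INTERP_LINEAR"}, ["images"], ["resized"]),
]
s = serialize_graph(original, seed=12)

rebuilt = GenericGraph()
rebuilt.deserialize(s)
print(rebuilt.ops == original, rebuilt.seed)  # True 12
```

The key design point mirrored here is that the string carries everything needed to rebuild the graph, so the receiving side only needs a generic container, not the original Python class.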

Now we can compare the outputs of the two pipelines: the original one and the deserialized one.

In [6]:
pipe.build()
original_pipe_out = pipe.run()
serialized_pipe_out = pipe2.run()
In [7]:
def check_difference(batch_1, batch_2):
    return [np.sum(np.abs(batch_1.at(i) - batch_2.at(i))) for i in range(batch_size)]
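check_difference sums the absolute differences within each sample, which is zero exactly when the two samples match element by element, provided their shapes agree; NumPy broadcasting could otherwise subtract arrays of different shapes without raising an error. A slightly stricter variant is sketched below, assuming each batch is given as a list of NumPy arrays (for example, batch.at(i) for every i). The helper name samples_identical is hypothetical; it relies on np.array_equal, which also returns False when shapes differ.

```python
import numpy as np


def samples_identical(batch_1, batch_2):
    """Return one bool per sample: True only if shapes and all values match.

    batch_1, batch_2: equal-length sequences of NumPy arrays.
    """
    if len(batch_1) != len(batch_2):
        raise ValueError("batches have different sizes")
    return [bool(np.array_equal(a, b)) for a, b in zip(batch_1, batch_2)]


x = [np.zeros((2, 3)), np.ones((2, 3))]
y = [np.zeros((2, 3)), np.ones((1, 3))]  # second sample has a different shape

print(samples_identical(x, x))  # [True, True]
print(samples_identical(x, y))  # [True, False]
# The sum-of-differences check reports 0.0 for the mismatched pair,
# because (2, 3) - (1, 3) broadcasts instead of failing.
print(np.sum(np.abs(x[1] - y[1])))  # 0.0
```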
In [8]:
original_images, _ = original_pipe_out
serialized_images, _ = serialized_pipe_out
In [9]:
check_difference(original_images.asCPU(), serialized_images.asCPU())
Out[9]:
[0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0]

Every per-sample difference is zero: both pipelines give exactly the same results.