Serialization

Overview

This sample shows how to serialize the pipeline to a string.

Serialization

To use the C API or the TensorFlow plugin, or simply to save the pipeline alongside a model so that the training process is fully reproducible, we need to serialize the pipeline.

Let us make a simple pipeline that reads from the MXNet RecordIO format (for other data formats, please see the other examples).

[1]:
from nvidia.dali import pipeline_def, Pipeline
import nvidia.dali.fn as fn
import nvidia.dali.types as types
import numpy as np
import os

test_data_root = os.environ["DALI_EXTRA_PATH"]
base = os.path.join(test_data_root, "db", "recordio")

idx_files = [os.path.join(base, "train.idx")]
rec_files = [os.path.join(base, "train.rec")]


@pipeline_def
def example_pipe():
    encoded, labels = fn.readers.mxnet(path=rec_files, index_path=idx_files)
    images = fn.decoders.image(encoded, device="mixed", output_type=types.RGB)
    images = fn.resize(
        images,
        interp_type=types.INTERP_LINEAR,
        resize_shorter=fn.random.uniform(range=(256.0, 480.0)),
    )
    images = fn.crop_mirror_normalize(
        images, dtype=types.FLOAT, crop=(224, 224), mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0]
    )
    return images, labels
[2]:
batch_size = 16

pipe = example_pipe(batch_size=batch_size, num_threads=2, device_id=0, seed=12)

We will now serialize this pipeline using the serialize function of the Pipeline class.

[3]:
s = pipe.serialize()
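
Since one motivation above is saving the pipeline with a model, the following is a minimal sketch of writing the serialized form to disk and reading it back with plain Python file I/O. It assumes `s` is a bytes-like object; the helper names `save_serialized` and `load_serialized`, the file name `pipeline.dali`, and the stand-in payload are hypothetical and used only for illustration, so the sketch runs without DALI installed.

```python
import os
import tempfile


def save_serialized(payload, path):
    """Write a serialized pipeline (assumed bytes-like) to a binary file."""
    with open(path, "wb") as f:
        f.write(bytes(payload))


def load_serialized(path):
    """Read the serialized pipeline back as bytes."""
    with open(path, "rb") as f:
        return f.read()


# Stand-in for the value returned by pipe.serialize(); hypothetical content.
s_example = b"\x0a\x05dummy\x10\x01"

with tempfile.TemporaryDirectory() as tmp_dir:
    # "pipeline.dali" is an arbitrary, hypothetical file name.
    path = os.path.join(tmp_dir, "pipeline.dali")
    save_serialized(s_example, path)
    restored = load_serialized(path)

# The bytes read back must match what was written, byte for byte.
print(restored == s_example)
```

Opening the file in binary mode avoids any newline or encoding translation, which would corrupt a binary payload.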

To deserialize the pipeline in Python, we need to create another pipeline, this time using the generic Pipeline class. We give the new pipeline the same seed so that we can compare the results.

[4]:
pipe2 = Pipeline(batch_size=batch_size, num_threads=2, device_id=0, seed=12)

Let us now use the serialized form of the pipe object to make pipe2 a copy of it.

[5]:
pipe2.deserialize_and_build(s)

Now we can compare the results of the two pipelines: the original and the deserialized one.

[6]:
pipe.build()
original_pipe_out = pipe.run()
serialized_pipe_out = pipe2.run()
[7]:
def check_difference(batch_1, batch_2):
    # Sum of absolute element-wise differences for each sample in the batch.
    return [np.sum(np.abs(batch_1.at(i) - batch_2.at(i))) for i in range(batch_size)]
[8]:
original_images, _ = original_pipe_out
serialized_images, _ = serialized_pipe_out
[9]:
check_difference(original_images.as_cpu(), serialized_images.as_cpu())
[9]:
[0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0]

Both pipelines give exactly the same results.
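
The comparison above subtracts float images, so the subtraction is safe. As a hedged aside, if the same check were applied to unsigned integer outputs (for example, images that were not normalized to float), the subtraction could wrap around. The sketch below shows one way to guard against that by casting to a wider type first. It works on plain lists of NumPy arrays rather than DALI tensor lists; the function name `per_sample_abs_diff` and the stand-in data are hypothetical.

```python
import numpy as np


def per_sample_abs_diff(samples_1, samples_2):
    """Return the sum of absolute differences for each pair of samples."""
    if len(samples_1) != len(samples_2):
        raise ValueError("Both batches must contain the same number of samples")
    diffs = []
    for a, b in zip(samples_1, samples_2):
        # Cast to float64 first so that uint8 subtraction cannot wrap around.
        a = np.asarray(a, dtype=np.float64)
        b = np.asarray(b, dtype=np.float64)
        diffs.append(float(np.sum(np.abs(a - b))))
    return diffs


# Stand-in uint8 "batches"; the first pair differs by one in a single element.
batch_a = [np.array([0, 10], dtype=np.uint8), np.array([5, 5], dtype=np.uint8)]
batch_b = [np.array([1, 10], dtype=np.uint8), np.array([5, 5], dtype=np.uint8)]

diffs = per_sample_abs_diff(batch_a, batch_b)
print(diffs)
```

A result of all zeros indicates that the two batches are identical element by element.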