Video pipeline reading labelled videos from a directory

Goal

In this example, we will go through the creation of a pipeline using the VideoReader operator to read videos along with their labels. The pipeline will return a pair of outputs from VideoReader: a batch of sequences and respective labels.

For more information on the VideoReader parameters, please look at the documentation reference.

To make it clearer, let’s look at how we can obtain these sequences and how to use them!

Setting up

First let’s start with the imports:

[18]:
from __future__ import print_function
from __future__ import division
import os
import numpy as np

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

We need some video containers to process. We can use the Sintel trailer, an mp4 container holding an H.264 video, distributed under the Creative Commons license. We’ve split it into 5-second clips and divided the clips into labelled groups. This can be done easily with the ffmpeg standalone tool. The DALI_EXTRA_PATH environment variable should point to the location where the data from the DALI extra repository was downloaded. Please make sure that the proper release tag is checked out. The snippet below verifies that you have defined DALI_EXTRA_PATH as an environment variable.

[19]:
print(os.listdir(os.environ['DALI_EXTRA_PATH']))
['image_info.txt', 'LICENSE', 'README.rst', 'db', '.gitattributes', '.git', 'NVIDIA_CLA_v1.0.1.docx']
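If DALI_EXTRA_PATH is missing, the os.environ lookup above fails with a bare KeyError. A slightly more defensive check could look like the sketch below. The helper name resolve_dali_extra and its error messages are illustrative assumptions, not part of DALI.

```python
import os


def resolve_dali_extra(env=None):
    """Return the DALI_extra root directory or raise a descriptive error.

    Hypothetical helper for illustration; DALI does not provide it.
    """
    env = os.environ if env is None else env
    root = env.get("DALI_EXTRA_PATH")
    if not root:
        raise RuntimeError("DALI_EXTRA_PATH is not set")
    if not os.path.isdir(root):
        raise RuntimeError("DALI_EXTRA_PATH is not a directory: " + root)
    return root


# Usage (requires the variable to be set in your environment):
# print(os.listdir(resolve_dali_extra()))
```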

Then we can set the parameters that will be used in the pipeline. The sequence_length parameter defines how many frames we want in each sequence sample.

We can replace video_directory with any other directory containing labelled subdirectories and video container files recognized by FFmpeg.

[20]:
batch_size=2
sequence_length=8

initial_prefetch_size=11

video_directory = os.path.join(os.environ['DALI_EXTRA_PATH'], "db", "video", "sintel", "labelled_videos")

shuffle=True

n_iter=6
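Before pointing the reader at a custom directory, it can help to confirm that it really has the labelled layout described above: one subdirectory per label, each holding video files. The sketch below only counts files per subdirectory. The helper name summarize_labelled_dir is an assumption made for illustration, and it does not check that the files are containers FFmpeg can decode.

```python
import os


def summarize_labelled_dir(root):
    """Map each subdirectory of root to the number of regular files in it.

    Hypothetical helper for illustration; files placed directly in root
    are ignored because they do not belong to any labelled group.
    """
    summary = {}
    for entry in sorted(os.listdir(root)):
        sub = os.path.join(root, entry)
        if os.path.isdir(sub):
            files = [f for f in os.listdir(sub)
                     if os.path.isfile(os.path.join(sub, f))]
            summary[entry] = len(files)
    return summary


# Usage, with video_directory defined as in the cell above:
# print(summarize_labelled_dir(video_directory))
```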

Running the pipeline

We can then define a minimal Pipeline that directly returns the VideoReader outputs:

[11]:
class VideoPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data, shuffle):
        super(VideoPipe, self).__init__(batch_size, num_threads, device_id, seed=16)
        self.input = ops.VideoReader(device="gpu", file_root=data, sequence_length=sequence_length,
                                     shard_id=0, num_shards=1,
                                     random_shuffle=shuffle, initial_fill=initial_prefetch_size)


    def define_graph(self):
        output, labels = self.input(name="Reader")
        return output, labels

Caution: It is important to tune initial_fill, which corresponds to the initial size of the Loader prefetch buffer. Since this buffer is filled with initial_fill sequences, the total number of frames held in memory can be very large, so choose a value small enough to avoid running out of memory during training.
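To get a feel for the numbers, here is a rough back-of-the-envelope estimate. It assumes that every buffered sequence holds fully decoded 8-bit RGB frames, which may not match how the reader manages memory internally, so treat the result as an order of magnitude only. The helper name prefetch_bytes is illustrative. The frame size of 720 x 1280 is the one produced by the clips used in this example.

```python
def prefetch_bytes(initial_fill, sequence_length, height, width,
                   channels=3, bytes_per_value=1):
    """Estimate the bytes needed for initial_fill decoded sequences.

    Assumption: one byte per channel value (uint8) and no padding.
    """
    frames = initial_fill * sequence_length
    return frames * height * width * channels * bytes_per_value


# Values used in this example: 11 sequences of 8 frames at 720 x 1280.
estimate = prefetch_bytes(initial_fill=11, sequence_length=8,
                          height=720, width=1280)
print("%.1f MiB" % (estimate / 2**20))  # prints "232.0 MiB"
```

Under the same assumption, raising initial_fill to 1000 would require roughly 20 GiB, which is why the default-sized buffers used for images are often too large for video.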

Let’s try to build and run a VideoPipe on device 0 that will output batch_size sequences of sequence_length frames and batch_size labels at each iteration.

[12]:
pipe = VideoPipe(batch_size=batch_size, num_threads=2, device_id=0, data=video_directory, shuffle=shuffle)
pipe.build()
for i in range(n_iter):
    sequences_out, labels = pipe.run()
    sequences_out = sequences_out.as_cpu().as_array()
    labels = labels.as_cpu().as_array()
    print(sequences_out.shape)
    print(labels.shape)
(2, 8, 720, 1280, 3)
(2, 1)
(2, 8, 720, 1280, 3)
(2, 1)
(2, 8, 720, 1280, 3)
(2, 1)
(2, 8, 720, 1280, 3)
(2, 1)
(2, 8, 720, 1280, 3)
(2, 1)
(2, 8, 720, 1280, 3)
(2, 1)
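The printed shapes read as (batch, frames, height, width, channels) for the sequences and (batch, 1) for the labels. The sketch below shows how to index such arrays. It uses NumPy stand-ins rather than real pipeline output: the frame size is scaled down to keep it light, and the label values are made up for illustration.

```python
import numpy as np

# Stand-in arrays with the same layout as the pipeline outputs
# (frame size reduced from 720 x 1280; label values are made up).
sequences = np.zeros((2, 8, 72, 128, 3), dtype=np.uint8)
labels = np.array([[0], [1]])

n_samples, n_frames, height, width, channels = sequences.shape

# Second sample of the batch: all of its frames and its scalar label.
sample = sequences[1]            # shape (8, 72, 128, 3)
first_frame = sample[0]          # shape (72, 128, 3), one RGB image
label_value = int(labels[1, 0])  # labels carry a trailing axis of size 1

print(sample.shape, first_frame.shape, label_value)
```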

Visualizing the results

The previous iterations seem to have yielded batches of the expected shape. But let’s visualize the results to be sure.

[13]:
sequences_out, labels = pipe.run()
sequences_out = sequences_out.as_cpu().as_array()
labels = labels.as_cpu().as_array()

We will use matplotlib to display the frames we obtained in the last batch.

[14]:
%matplotlib inline
from matplotlib import pyplot as plt
import matplotlib.gridspec as gridspec

[15]:
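def show_sequence(sequence, label):
    columns = 4
    n_frames = len(sequence)
    # Round up so that a partially filled last row still gets a grid row
    rows = (n_frames + columns - 1) // columns
    fig = plt.figure(figsize = (32,(16 // columns) * rows))
    gs = gridspec.GridSpec(rows, columns)
    # Set the figure title once instead of once per frame
    plt.suptitle("label " + str(label[0]), fontsize=30)
    # Plot only the frames that exist to avoid indexing past the sequence
    for j in range(n_frames):
        plt.subplot(gs[j])
        plt.axis("off")
        plt.imshow(sequence[j])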

And now let’s generate 5 batches of (sequence, label) pairs and display the second sample of each batch:

[16]:
ITER = 5
for i in range(ITER):
    sequences_out, labels = pipe.run()
    sequences_out = sequences_out.as_cpu().as_array()
    labels = labels.as_cpu().as_array()
    show_sequence(sequences_out[1], labels[1])
[Output: five figures, one per iteration, each showing the frames of one sequence with its label as the title]