Simple Video pipeline reading from multiple files¶
Goal¶
In this example, we will go through the creation of a pipeline using the VideoReader operator. The pipeline will return the output of VideoReader: a batch of sequences. Each sequence is an ordered set of frames (images). The difference is that a single image has dimensions HWC, whereas a sequence has dimensions FHWC, with an extra leading frame dimension F.
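To make the distinction concrete, here is a minimal sketch (plain NumPy, independent of DALI; the 1280x720 resolution and 8-frame length are just illustrative values) of the two layouts:

# Not part of the pipeline: just illustrating the HWC vs. FHWC layouts.
import numpy as np

image = np.zeros((720, 1280, 3), dtype=np.uint8)   # H, W, C
sequence = np.stack([image] * 8)                    # F, H, W, C
print(image.shape, sequence.shape)                  # (720, 1280, 3) (8, 720, 1280, 3)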
For more information on the VideoReader parameters, please look at the documentation reference.
To make it clearer, let’s look at how we can obtain these sequences and how to use them!
Setting up¶
First let’s start with the imports:
[2]:
from __future__ import print_function
from __future__ import division
import os
import numpy as np
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types
We need some video containers to process. We can use the Sintel trailer, an mp4 container with an H.264 encoded video stream, released under the Creative Commons license. Let's split it into 10-second clips in order to check how VideoReader
handles multiple video files. This can be done easily with the ffmpeg
standalone tool.
[3]:
%%bash
mkdir -p video_files
container_name=prepared.mp4
# Download video sample
wget -q -O ${container_name} https://download.blender.org/durian/trailer/sintel_trailer-720p.mp4
IFS='.' read -a splitted <<< "$container_name"
for i in {0..4};
do
    ffmpeg -ss 00:00:${i}0 -t 00:00:10 -i $container_name -vcodec copy -acodec copy -y video_files/${splitted[0]}_$i.${splitted[1]}
done
Then we can set the parameters that will be used in the pipeline. The sequence_length
parameter defines how many frames we want in each sequence sample.
We can replace video_directory
with any other directory containing video container files recognized by FFmpeg.
[4]:
batch_size=2
sequence_length=8
initial_prefetch_size=16
video_directory = "video_files"
video_files=[video_directory + '/' + f for f in os.listdir(video_directory)]
shuffle=True
n_iter=6
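If you want to double-check which clips the reader will consume (the exact file names depend on the splitting step above), you can simply print the list:

# Optional sanity check: inspect the clip paths handed to VideoReader.
print(sorted(video_files))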
Running the pipeline¶
We can then define a minimal Pipeline that directly returns the VideoReader outputs:
[5]:
class VideoPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data, shuffle):
        super(VideoPipe, self).__init__(batch_size, num_threads, device_id, seed=16)
        self.input = ops.VideoReader(device="gpu", filenames=data, sequence_length=sequence_length,
                                     shard_id=0, num_shards=1,
                                     random_shuffle=shuffle, initial_fill=initial_prefetch_size)

    def define_graph(self):
        output = self.input(name="Reader")
        return output
Caution: one important thing here is tuning initial_fill
, which corresponds to the initial size of the Loader's prefetch buffer. Since this buffer will be filled with initial_fill
sequences, the total number of frames it holds can be really large! Set it accordingly so you do not run out of memory during training.
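As a rough sanity check (a sketch only: it assumes uint8 RGB frames at the 1280x720 resolution of our clips and ignores any additional internal copies DALI may keep), we can estimate the footprint of that prefetch buffer:

# Rough estimate of the prefetch buffer footprint.
# Assumes uint8 RGB frames at 1280x720; actual usage may differ.
frame_bytes = 720 * 1280 * 3
buffer_bytes = initial_prefetch_size * sequence_length * frame_bytes
print("prefetch buffer ~ {:.0f} MiB".format(buffer_bytes / 2**20))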
Let’s try to build and run a VideoPipe
on device 0 that will output batch_size
sequences of sequence_length
frames at each iteration.
[6]:
pipe = VideoPipe(batch_size=batch_size, num_threads=2, device_id=0, data=video_files, shuffle=shuffle)
pipe.build()
for i in range(n_iter):
    pipe_out = pipe.run()
    sequences_out = pipe_out[0].as_cpu().as_array()
    print(sequences_out.shape)
(2, 8, 720, 1280, 3)
(2, 8, 720, 1280, 3)
(2, 8, 720, 1280, 3)
(2, 8, 720, 1280, 3)
(2, 8, 720, 1280, 3)
(2, 8, 720, 1280, 3)
Visualizing the results¶
The previous iterations seem to have yielded batches of the expected shape. But let's visualize the results to be sure.
[7]:
pipe_out = pipe.run()
sequences_out = pipe_out[0].as_cpu().as_array()
We will use matplotlib to display the frames we obtained in the last batch.
[8]:
%matplotlib inline
from matplotlib import pyplot as plt
import matplotlib.gridspec as gridspec
[9]:
def show_sequence(sequence):
    columns = 4
    rows = (sequence_length + 1) // (columns)
    fig = plt.figure(figsize = (32,(16 // columns) * rows))
    gs = gridspec.GridSpec(rows, columns)
    for j in range(rows*columns):
        plt.subplot(gs[j])
        plt.axis("off")
        plt.imshow(sequence[j])
[10]:
show_sequence(sequences_out[0])
And let’s check a second sequence:
[11]:
pipe_out = pipe.run()
sequences_out = pipe_out[0].as_cpu().as_array()
show_sequence(sequences_out[1])
And a third one…
[12]:
pipe_out = pipe.run()
sequences_out = pipe_out[0].as_cpu().as_array()
show_sequence(sequences_out[0])
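Finally, since we passed name="Reader" to the reader, we can ask the pipeline how many sequences one epoch contains; this gives a quick idea of how many iterations a full pass over the clips would take (assuming the Pipeline.epoch_size method, which takes the reader name):

# Optional: number of sequences per epoch for the reader named "Reader".
print(pipe.epoch_size("Reader"))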