Simple Video pipeline reading from multiple files

Goal

In this example, we will go through the creation of a pipeline using the VideoReader operator. The pipeline will return the output of VideoReader: a batch of sequences. Each sequence is an arbitrary number of frames (images). The difference is that images have dimensions HWC (height, width, channels), whereas sequences have dimensions FHWC, with a leading frame axis.
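
To make the two layouts concrete, here is a minimal NumPy sketch. It does not use DALI at all; the 720x1280 frame size and the 8 frames are illustrative values chosen for this sketch.

```python
import numpy as np

# A single image: height x width x channels (HWC).
image = np.zeros((720, 1280, 3), dtype=np.uint8)

# A sequence stacks F images along a new leading axis (FHWC).
sequence = np.stack([image] * 8, axis=0)

print(image.shape)     # (720, 1280, 3)
print(sequence.shape)  # (8, 720, 1280, 3)

# Indexing the frame axis gives back an HWC image.
first_frame = sequence[0]
print(first_frame.shape)  # (720, 1280, 3)
```

A batch of sequences simply adds one more leading axis for the samples in the batch.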

For more information on the VideoReader parameters, please look at the documentation reference.

To make it clearer, let’s look at how we can obtain these sequences and how to use them!

Setting up

First let’s start with the imports:

[1]:
from __future__ import print_function
from __future__ import division
import os
import numpy as np

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

We need some video containers to process. We can use the Sintel trailer, which is an mp4 container holding an H.264 video stream, released under a Creative Commons license. Let’s split it into 10 s clips in order to check how VideoReader handles multiple video files. This can be done easily with the ffmpeg standalone tool.

[2]:
%%bash
mkdir -p video_files

container_name=prepared.mp4

# Download video sample
wget -q -O ${container_name} https://download.blender.org/durian/trailer/sintel_trailer-720p.mp4

IFS='.' read -a splitted <<< "$container_name"

for i in {0..4};
do
    ffmpeg -ss 00:00:${i}0 -t 00:00:10 -i $container_name -vcodec copy -acodec copy -y video_files/${splitted[0]}_$i.${splitted[1]}
done
ffmpeg version 2.8.15-0ubuntu0.16.04.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.10) 20160609
  configuration: --prefix=/usr --extra-version=0ubuntu0.16.04.1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --disable-decoder=libschroedinger --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzvbi --enable-openal --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzmq --enable-frei0r --enable-libx264 --enable-libopencv
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'prepared.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    creation_time   : 1970-01-01 00:00:00
    title           : Sintel Trailer
    artist          : Durian Open Movie Team
    encoder         : Lavf52.62.0
    copyright       : (c) copyright Blender Foundation | durian.blender.org
    description     : Trailer for the Sintel open movie project
  Duration: 00:00:52.21, start: 0.000000, bitrate: 1165 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720, 1033 kb/s, 24 fps, 24 tbr, 24 tbn, 48 tbc (default)
    Metadata:
      creation_time   : 1970-01-01 00:00:00
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 126 kb/s (default)
    Metadata:
      creation_time   : 1970-01-01 00:00:00
      handler_name    : SoundHandler
[mp4 @ 0xc46040] Codec for stream 0 does not use global headers but container format requires global headers
[mp4 @ 0xc46040] Codec for stream 1 does not use global headers but container format requires global headers
Output #0, mp4, to 'video_files/prepared_0.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    description     : Trailer for the Sintel open movie project
    title           : Sintel Trailer
    artist          : Durian Open Movie Team
    copyright       : (c) copyright Blender Foundation | durian.blender.org
    encoder         : Lavf56.40.101
    Stream #0:0(und): Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 1280x720, q=2-31, 1033 kb/s, 24 fps, 24 tbr, 12288 tbn, 24 tbc (default)
    Metadata:
      creation_time   : 1970-01-01 00:00:00
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac ([64][0][0][0] / 0x0040), 48000 Hz, stereo, 126 kb/s (default)
    Metadata:
      creation_time   : 1970-01-01 00:00:00
      handler_name    : SoundHandler
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
frame=  242 fps=0.0 q=-1.0 Lsize=    1414kB time=00:00:10.00 bitrate=1157.8kbits/s
video:1248kB audio:158kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.582740%
Output #0, mp4, to 'video_files/prepared_1.mp4':
frame=  498 fps=0.0 q=-1.0 Lsize=    3862kB time=00:00:20.01 bitrate=1580.4kbits/s
video:3521kB audio:326kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.390393%
Output #0, mp4, to 'video_files/prepared_2.mp4':
frame=  868 fps=0.0 q=-1.0 Lsize=    5457kB time=00:00:30.00 bitrate=1489.7kbits/s
video:4868kB audio:563kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.462312%
Output #0, mp4, to 'video_files/prepared_3.mp4':
frame=  595 fps=0.0 q=-1.0 Lsize=    2725kB time=00:00:22.12 bitrate=1009.0kbits/s
video:2331kB audio:376kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.641874%
Output #0, mp4, to 'video_files/prepared_4.mp4':
frame=  481 fps=0.0 q=-1.0 Lsize=    1562kB time=00:00:12.12 bitrate=1055.4kbits/s
video:1245kB audio:303kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.923752%
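
The same split can also be driven from Python. The sketch below only builds the ffmpeg argument lists and prints them; it does not run ffmpeg. The helper name build_split_commands is hypothetical, and the flags are the ones used in the bash cell above, with the start offset and duration given in seconds.

```python
import os

def build_split_commands(container, out_dir, n_clips=5, clip_seconds=10):
    """Return one ffmpeg argument list per clip, mirroring the bash loop."""
    stem, ext = os.path.splitext(os.path.basename(container))
    commands = []
    for i in range(n_clips):
        start = i * clip_seconds
        out_path = os.path.join(out_dir, "{}_{}{}".format(stem, i, ext))
        commands.append([
            "ffmpeg",
            "-ss", str(start),         # seek to the clip start (seconds)
            "-t", str(clip_seconds),   # clip duration (seconds)
            "-i", container,
            "-vcodec", "copy",         # copy the streams, no re-encoding
            "-acodec", "copy",
            "-y",                      # overwrite existing output files
            out_path,
        ])
    return commands

commands = build_split_commands("prepared.mp4", "video_files")
for cmd in commands:
    print(" ".join(cmd))
```

Each list can then be passed to subprocess.check_call to actually produce the clip.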

Then we can set the parameters that will be used in the pipeline. The sequence_length parameter defines how many frames we want in each sequence sample.

We can replace video_directory with any other directory containing video container files recognized by FFmpeg.

[3]:
batch_size=2
sequence_length=8

initial_prefetch_size=16

video_directory = "video_files"

video_files=[video_directory + '/' + f for f in os.listdir(video_directory)]

shuffle=True

n_iter=6
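
To get a feel for what sequence_length means for a clip, here is a rough sketch of generic sliding-window arithmetic. The helper count_sequences and its step argument are hypothetical, and the default of non-overlapping windows is an assumption of this sketch; refer to the VideoReader documentation for how the operator actually enumerates sequences.

```python
def count_sequences(n_frames, sequence_length, step=None):
    """Number of full windows of `sequence_length` frames in `n_frames` frames."""
    if step is None:
        step = sequence_length  # assumption: consecutive windows do not overlap
    if n_frames < sequence_length:
        return 0
    return (n_frames - sequence_length) // step + 1

# A 10 s clip at 24 fps has 240 frames.
print(count_sequences(240, 8))          # 30
print(count_sequences(240, 8, step=1))  # 233
```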

Running the pipeline

We can then define a minimal Pipeline that directly returns the VideoReader outputs:

[4]:
class VideoPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data, shuffle):
        super(VideoPipe, self).__init__(batch_size, num_threads, device_id, seed=16)
        self.input = ops.VideoReader(device="gpu", filenames=data, sequence_length=sequence_length,
                                     shard_id=0, num_shards=1,
                                     random_shuffle=shuffle, initial_fill=initial_prefetch_size)


    def define_graph(self):
        output = self.input(name="Reader")
        return output

Caution: One important point here is tuning initial_fill, which corresponds to the initial size of the Loader prefetch buffer. Since this buffer will be filled with initial_fill sequences, the total number of frames can be really huge! So set it accordingly, to avoid running out of memory (OOM) during training.
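
As a back-of-the-envelope check, the size of the prefetch buffer can be estimated before building the pipeline. This is only a rough sketch: the helper prefetch_buffer_bytes is hypothetical, and it assumes that every buffered sequence holds fully decoded 3-channel uint8 frames, which may not match what the Loader really stores.

```python
def prefetch_buffer_bytes(initial_fill, sequence_length, height, width,
                          channels=3, bytes_per_value=1):
    """Estimate the bytes needed to hold `initial_fill` decoded sequences."""
    return (initial_fill * sequence_length
            * height * width * channels * bytes_per_value)

# Values used in this example: 16 sequences of 8 frames at 1280x720.
n_bytes = prefetch_buffer_bytes(16, 8, 720, 1280)
print("{:.1f} MiB".format(n_bytes / (1024.0 ** 2)))  # 337.5 MiB
```

Under these assumptions the estimate grows linearly with both initial_fill and sequence_length, so doubling either one doubles the footprint.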

Let’s try to build and run a VideoPipe on device 0 that will output batch_size sequences of sequence_length frames at each iteration.

[5]:
pipe = VideoPipe(batch_size=batch_size, num_threads=2, device_id=0, data=video_files, shuffle=shuffle)
pipe.build()
# for i in range(n_iter):
#     pipe_out = pipe.run()
#     sequences_out = pipe_out[0].as_cpu().as_array()
#     print(sequences_out.shape)

pipe_out = pipe.run()
sequences_out = pipe_out[0].as_cpu().as_array()
print(sequences_out.shape)
(2, 8, 720, 1280, 3)

Visualizing the results

The previous iteration seems to have yielded a batch of the expected shape. But let’s visualize the results to be sure.

[7]:
pipe_out = pipe.run()
sequences_out = pipe_out[0].as_cpu().as_array()

We will use matplotlib to display the frames we obtained in the last batch.

[8]:
%matplotlib inline
from matplotlib import pyplot as plt
import matplotlib.gridspec as gridspec
[9]:
def show_sequence(sequence):
    columns = 4
    # Round up so that every frame gets a cell in the grid
    rows = (len(sequence) + columns - 1) // columns
    fig = plt.figure(figsize = (32,(16 // columns) * rows))
    gs = gridspec.GridSpec(rows, columns)
    # Iterate over the frames actually present, not over all grid cells
    for j in range(len(sequence)):
        plt.subplot(gs[j])
        plt.axis("off")
        plt.imshow(sequence[j])
[10]:
show_sequence(sequences_out[0])
../../_images/examples_video_video_reader_simple_example_17_0.png

And let’s check a second sequence:

[11]:
pipe_out = pipe.run()
sequences_out = pipe_out[0].as_cpu().as_array()
show_sequence(sequences_out[1])
../../_images/examples_video_video_reader_simple_example_19_0.png

And a third one…

[12]:
pipe_out = pipe.run()
sequences_out = pipe_out[0].as_cpu().as_array()
show_sequence(sequences_out[0])
../../_images/examples_video_video_reader_simple_example_21_0.png