Temporal Shift Module Inference in PaddlePaddle#
This demo shows how to use DALI pipeline for video classification in PaddlePaddle.
The model used for this demo is TSM: Temporal Shift Module for Efficient Video Understanding.
It is trained on kinetics400 and weights will be downloaded automatically.
For inference, videos should be resized to 300p and clipped to 10 second in length, which can be done with ffpmeg
.
Run the following commands to download and preprocess some videos from kinetics400 valset.
mkdir demo
youtube-dl --quiet --no-warnings -f mp4 -o demo/tmp.mp4 \
'https://www.youtube.com/watch?v=iU3ByohkPaM'
ffmpeg -y -i demo/tmp.mp4 -filter:v scale=-1:300 -ss 0 -t 10 -c:a copy demo/1.mp4
youtube-dl --quiet --no-warnings -f mp4 -o demo/tmp.mp4 \
'https://www.youtube.com/watch?v=C0J6EQYYLzI'
ffmpeg -y -i demo/tmp.mp4 -filter:v scale=-1:300 -ss 0 -t 10 -c:a copy demo/2.mp4
rm demo/tmp.mp4
The script will extract 8 frames from the input videos with a stride of s
(30 by default), and will output top k
predicted (1 by default) labels for each video.
python infer.py -k 1 -s 30 demo
# will output
# prediction for demo/1.mp4 is: ['carving_pumpkin']
# prediction for demo/2.mp4 is: ['blowing_out_candles']
Requirements#
Install the following python packages via pip or other means.
PaddlePaddle (1.6 or above)
Optionally, the following programs are needed for preparing the input videos.
Usage#
usage: infer.py [-h] [--topk K] [--stride S] DIR
Paddle Temporal Shift Module Inference
positional arguments:
DIR Path to video files
optional arguments:
-h, --help show this help message and exit
--topk K, -k K Top k results (default: 1)
--stride S, -s S Distance between frames (default: 30)