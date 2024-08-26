ActionRecognitionNet requires RGB video frames for the RGB input stream and optical flow vectors for the optical flow (OF) input stream. For training, the x and y components of the raw optical flow vectors should each be mapped to a grayscale image. A simple tool is provided to preprocess sample videos: it converts each video to frames and generates optical flow images using the NVIDIA Optical Flow (NVOF) SDK.
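
To illustrate what mapping a flow component to a grayscale image can look like, here is a minimal sketch. It assumes a common convention: clip each displacement to a symmetric range and rescale it linearly to 0-255. The function name `flow_to_gray` and the `bound` value are hypothetical and are not taken from the preprocessing tool, which may use a different mapping.

```python
def flow_to_gray(values, bound=20.0):
    """Map raw optical flow components to 8-bit grayscale values.

    values: iterable of displacements (in pixels) along one axis.
    bound:  assumed clipping range; displacements outside [-bound, bound]
            saturate to 0 or 255. This value is an illustration only.
    """
    scale = 255.0 / (2.0 * bound)
    gray = []
    for v in values:
        # Clip to the symmetric range, then shift and scale to 0..255.
        clipped = max(-bound, min(bound, v))
        gray.append(int(round((clipped + bound) * scale)))
    return gray


if __name__ == "__main__":
    # One row of x-axis displacements; large motions saturate.
    print(flow_to_gray([-35.0, -5.0, 0.0, 5.0, 35.0]))
```

Under this assumed mapping, zero motion lands near mid-gray, and the x and y components would be written to separate images, matching the u and v folders described in this section.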

The data should be organized in the following structure:

/Dataset
    /class_a
        /video_1
            /rgb
                000000.png
                000001.png
                ...
                N.png
            /u
                000000.jpg
                000001.jpg
                ...
                N-1.jpg
            /v
                000000.jpg
                000001.jpg
                ...
                N-1.jpg

The root directory of the dataset contains one sub-directory per class. Each class directory has one sub-folder per video, and each video folder contains rgb, u, and v folders that hold the RGB frames, the optical flow x-axis grayscale images, and the optical flow y-axis grayscale images, respectively. The u and v folders can be empty if you want to train an RGB-only model. A simple script is provided to generate RGB frames only.
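
Before training, it can help to check that a dataset follows this layout. The following is a minimal sketch using only the Python standard library. The helper names `check_video_dir` and `check_dataset` are hypothetical and are not part of the preprocessing tool. The checks are also assumptions drawn from the description above: an rgb folder with frames, and matching u and v image counts unless the model is RGB-only.

```python
from pathlib import Path


def count_files(folder):
    """Count regular files in a folder; return 0 if the folder is missing."""
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.iterdir() if p.is_file())


def check_video_dir(video_dir, rgb_only=False):
    """Return a list of problems found in one video directory."""
    video_dir = Path(video_dir)
    problems = []
    if count_files(video_dir / "rgb") == 0:
        problems.append("rgb folder is missing or empty")
    if not rgb_only:
        n_u = count_files(video_dir / "u")
        n_v = count_files(video_dir / "v")
        if n_u == 0 or n_v == 0:
            problems.append("u or v folder is missing or empty")
        elif n_u != n_v:
            problems.append(f"u has {n_u} images but v has {n_v}")
    return problems


def check_dataset(root, rgb_only=False):
    """Walk root/<class>/<video> and map each bad video path to its problems."""
    report = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for video_dir in sorted(p for p in class_dir.iterdir() if p.is_dir()):
            problems = check_video_dir(video_dir, rgb_only)
            if problems:
                report[str(video_dir)] = problems
    return report


if __name__ == "__main__":
    import sys

    # Usage: python check_layout.py /Dataset
    for path, problems in check_dataset(sys.argv[1]).items():
        print(path, "->", "; ".join(problems))
```

Pass `rgb_only=True` when the u and v folders are intentionally left empty, so that only the RGB frames are checked.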

Note: The preprocessing tool is released on GitHub under the MIT license. All-in-one scripts are also provided for processing the HMDB51 dataset.

The common data processing pipeline is depicted in the following diagrams: