Load video data for curation using NeMo Curator.
NeMo Curator loads videos with a composite stage that discovers files and extracts metadata:

- VideoReader is a composite stage that breaks down into a partitioning stage and a reader stage (VideoReaderStage).
- Local paths use FilePartitioningStage to list files.
- Remote URLs (for example, s3://, gcs://) use ClientPartitioningStage backed by fsspec.
- For remote datasets, input_list_json_path allows explicit file lists under a root prefix.
- VideoReaderStage reads each listed file (locally or via FSPath) and calls video.populate_metadata() to extract resolution, fps, duration, encoding format, and other fields.

You can set:

- video_limit to limit the number of files to be processed; use None for unlimited.
- verbose=True to log detailed per-video information.

Use VideoReader to load videos from local paths or remote URLs.
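
The discovery step described above can be illustrated with a short standard-library sketch. It does not call NeMo Curator: `pick_partitioning_stage` and `discover_local_videos` are hypothetical helpers, and the scheme check, the recursive listing, and the sorted order are assumptions about how such a step could behave, not the library's implementation.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

# Default extensions this page lists for the loader.
DEFAULT_VIDEO_EXTENSIONS = (".mp4", ".mov", ".avi", ".mkv", ".webm")


def pick_partitioning_stage(input_path: str) -> str:
    """Return the name of the stage this page associates with the path type.

    Hypothetical helper: any URL scheme longer than one character (s3, gcs, ...)
    is treated as remote; empty or single-letter schemes (Windows drive
    letters) and file:// are treated as local.
    """
    scheme = urlparse(input_path).scheme
    if len(scheme) <= 1 or scheme == "file":
        return "FilePartitioningStage"
    return "ClientPartitioningStage"


def discover_local_videos(root: str, video_limit: Optional[int] = None) -> list[str]:
    """List video files under root, capped at video_limit (None = unlimited)."""
    matches = sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in DEFAULT_VIDEO_EXTENSIONS
    )
    return matches if video_limit is None else matches[:video_limit]
```

For example, `pick_partitioning_stage("s3://bucket/videos/")` selects the remote branch, while `discover_local_videos("/data/videos", video_limit=10)` returns at most ten matching local files.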
For remote datasets, ClientPartitioningStage can use an explicit file list JSON. Each entry must be an absolute path under the specified root.
If any entry is outside the root, the stage raises an error.
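
The root check can be sketched in plain Python. This is only an illustration of the rule stated above, not the stage's own code: `load_file_list` is a hypothetical helper, and it assumes the file list is a JSON array of path strings and that a simple prefix comparison is enough.

```python
import json


def load_file_list(json_text: str, root: str) -> list[str]:
    """Parse a JSON array of paths and verify that each entry lies under root.

    Hypothetical helper: raises ValueError on the first entry outside root.
    The check is a plain prefix comparison and does not resolve ".." segments.
    """
    entries = json.loads(json_text)
    # Normalize the root so "s3://bucket/videos" does not match "s3://bucket/videos2/...".
    prefix = root.rstrip("/") + "/"
    for entry in entries:
        if not isinstance(entry, str) or not entry.startswith(prefix):
            raise ValueError(f"Entry {entry!r} is not under root {root!r}")
    return entries
```

Normalizing the root with a trailing slash keeps sibling prefixes from passing the check by accident.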
The loader filters these video extensions by default:

- .mp4
- .mov
- .avi
- .mkv
- .webm

After a successful read, the loader populates the following metadata fields for each video:

- size (bytes)
- width, height
- framerate
- num_frames
- duration (seconds)
- video_codec, pixel_format, audio_codec
- bit_rate_k

With verbose=True, the loader logs size, resolution, fps, duration, weight, and bit rate for each processed video.
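
To make the shape of this metadata concrete, here is a minimal sketch of a record holding the fields listed above, with a one-line summary similar in spirit to the verbose log. `VideoMetadataSketch` and `summarize` are hypothetical names, the field types are assumptions, and the summary format is invented for illustration. The weight value is omitted because this page does not say how it is computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoMetadataSketch:
    """Hypothetical record mirroring the metadata fields this page lists."""

    size: int                    # bytes
    width: int
    height: int
    framerate: float
    num_frames: int
    duration: float              # seconds
    video_codec: str
    pixel_format: str
    audio_codec: Optional[str]   # assumed optional: a video may have no audio track
    bit_rate_k: int              # field name as listed on this page


def summarize(meta: VideoMetadataSketch) -> str:
    """Format a per-video summary line (illustrative format only)."""
    return (
        f"size={meta.size}B res={meta.width}x{meta.height} "
        f"fps={meta.framerate:.2f} duration={meta.duration:.2f}s "
        f"bit_rate_k={meta.bit_rate_k}"
    )
```

A record like this makes it straightforward to filter or sort videos by resolution or duration before later curation steps.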