Load video data for curation using NeMo Curator.
NeMo Curator loads videos with a composite stage that discovers files and extracts metadata:
- VideoReader decomposes into a partitioning stage plus a reader stage.
- Local paths use FilePartitioningStage to list files; remote URLs (for example, s3://, gcs://, http(s)://) use ClientPartitioningStage backed by fsspec.
- For remote datasets, you can pass an explicit file list to ClientPartitioningStage through input_list_json_path.
- VideoReaderStage downloads bytes (local or via FSPath) and calls video.populate_metadata() to extract resolution, fps, duration, encoding format, and other fields.
- Set video_limit to cap discovery; use None for unlimited. Set verbose=True to log detailed per-video information.

Use VideoReader to load videos from local paths or remote URLs.
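The routing rule above (local paths go to FilePartitioningStage, URLs with a protocol prefix go to ClientPartitioningStage) can be sketched as a small function. This is a hypothetical illustration, not NeMo Curator's own code: choose_partitioning_stage is an invented name, and it assumes POSIX-style local paths, treating any path without a URL scheme as local.

```python
from urllib.parse import urlparse


def choose_partitioning_stage(path: str) -> str:
    """Hypothetical helper: name the partitioning stage that would handle `path`.

    Assumption: POSIX-style local paths, which carry no URL scheme.
    """
    scheme = urlparse(path).scheme
    if scheme == "":
        # No protocol prefix: list files on the local filesystem.
        return "FilePartitioningStage"
    # Any protocol prefix (s3://, gcs://, https://, s3a://, abfs://, ...) goes through fsspec.
    return "ClientPartitioningStage"


for p in ["/data/videos/", "s3://bucket/path/", "https://host/path/"]:
    print(p, "->", choose_partitioning_stage(p))
```

Keying on the scheme keeps the decision independent of which fsspec backends are installed. Whether a given protocol actually works then depends on fsspec having a filesystem registered for it.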
Local paths:
- Examples: /data/videos/, /mnt/datasets/av/
- Uses FilePartitioningStage to recursively discover files.
- Filters by extension: .mp4, .mov, .avi, .mkv, .webm.
- Set video_limit to cap discovery during testing (None means unlimited).

Remote paths:
- Examples: s3://bucket/path/, gcs://bucket/path/, https://host/path/, and other fsspec-supported protocols such as s3a:// and abfs://.
- Uses ClientPartitioningStage backed by fsspec to list files.
- Optional input_list_json_path allows explicit file lists under a root prefix.
- Works with FSPath for efficient byte access during reading.

To stream from cloud storage, pass an object storage prefix (for example, s3://my-bucket/videos/). Configure credentials in your environment or client configuration.
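The local discovery behavior described above (recursive listing, an extension filter, and a video_limit cap where None means unlimited) can be sketched with the standard library. This is an illustration under assumptions, not FilePartitioningStage itself: discover_videos is a hypothetical name, and the case-insensitive extension match and sorted traversal order are choices made for the sketch.

```python
import os
from pathlib import Path

# Default extensions listed in this section.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def discover_videos(root, video_limit=None):
    """Hypothetical sketch: recursively list video files under `root`.

    video_limit caps how many files are returned; None means no cap.
    Assumption: extensions are matched case-insensitively.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a deterministic order
        for name in sorted(filenames):
            if Path(name).suffix.lower() not in VIDEO_EXTENSIONS:
                continue  # skip non-video files
            if video_limit is not None and len(found) >= video_limit:
                return found  # cap reached, stop discovery early
            found.append(os.path.join(dirpath, name))
    return found
```

Stopping the walk as soon as the cap is reached is what makes a small video_limit useful for quick test runs on large directory trees.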
For remote datasets, ClientPartitioningStage can read an explicit file list from a JSON file (input_list_json_path). Each entry must be an absolute path under the specified root.
If any entry is outside the root, the stage raises an error.
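The root check described above can be sketched as a prefix comparison. This is a hypothetical illustration: load_file_list is an invented name, and the sketch assumes the JSON file holds a plain array of absolute path strings, which may not match the exact format ClientPartitioningStage expects.

```python
import json


def load_file_list(json_text, root):
    """Hypothetical sketch: parse a JSON array of absolute paths and
    raise if any entry falls outside `root`.

    Assumption: the file is a flat JSON list of strings.
    """
    entries = json.loads(json_text)
    # Normalize to exactly one trailing slash so "videos2/" does not match "videos/".
    prefix = root.rstrip("/") + "/"
    for entry in entries:
        if not entry.startswith(prefix):
            raise ValueError(f"{entry!r} is outside root {root!r}")
    return entries
```

Appending a trailing slash before comparing avoids a sibling prefix such as s3://my-bucket/videos2/ being accepted under s3://my-bucket/videos. A plain string comparison does not resolve ".." segments, so a stricter validator would normalize paths first.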
By default, the loader keeps only files with these video extensions:
- .mp4
- .mov
- .avi
- .mkv
- .webm

After a successful read, the loader populates the following metadata fields for each video:
- size (bytes)
- width, height
- framerate
- num_frames
- duration (seconds)
- video_codec, pixel_format, audio_codec
- bit_rate_k

With verbose=True, the loader logs size, resolution, fps, duration, weight, and bit rate for each processed video.
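To make the field list concrete, the sketch below groups the documented metadata fields into one record and formats a verbose-style summary line. VideoMetadataSketch and its summary() method are hypothetical; they mirror the field names listed above but are not NeMo Curator's video object, and the log format is an assumption rather than the loader's actual output.

```python
from dataclasses import dataclass


@dataclass
class VideoMetadataSketch:
    """Hypothetical container mirroring the documented metadata fields."""

    size: int          # bytes
    width: int
    height: int
    framerate: float
    num_frames: int
    duration: float    # seconds
    video_codec: str
    pixel_format: str
    audio_codec: str
    bit_rate_k: int

    def summary(self) -> str:
        """Format a one-line, verbose-style summary (format is an assumption)."""
        return (
            f"size={self.size} B, resolution={self.width}x{self.height}, "
            f"fps={self.framerate:.2f}, duration={self.duration:.1f}s, "
            f"bit_rate={self.bit_rate_k}k"
        )


meta = VideoMetadataSketch(
    size=1_048_576, width=1920, height=1080, framerate=30.0, num_frames=300,
    duration=10.0, video_codec="h264", pixel_format="yuv420p",
    audio_codec="aac", bit_rate_k=838,
)
print(meta.summary())
```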