---
description: >-
  Extract frames from clips or full videos for embeddings, filtering, and analysis
categories:
  - video-curation
tags:
  - frames
  - extraction
  - fps
  - ffmpeg
  - nvdec
personas:
  - data-scientist-focused
  - mle-focused
difficulty: intermediate
content_type: howto
modality: video-only
---

# Frame Extraction

Extract frames from clips or full videos at target rates and resolutions. Use the frames for embeddings (such as Cosmos-Embed1), aesthetic filtering, previews, and custom analysis.

## Use Cases

* Prepare inputs for embedding models that expect frame sequences.
* Run aesthetic filtering that operates on sampled frames.
* Generate lightweight previews or QA snapshots.
* Provide frames for scene-change detection before clipping (TransNetV2).

## Before You Start

If you only need saved media files, frame extraction is optional. [Embeddings](/curate-video/process-data/embeddings) and [aesthetic filtering](/curate-images/process-data/filters/aesthetic) require frames.

***

## Quickstart

Use the pipeline stages or the example script flags to extract frames for embeddings, filtering, and analysis.
```python
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.video.clipping.clip_extraction_stages import FixedStrideExtractorStage
from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
from nemo_curator.stages.video.embedding.cosmos_embed1 import (
    CosmosEmbed1FrameCreationStage,
    CosmosEmbed1EmbeddingStage,
)

pipe = Pipeline(name="clip_frames_embeddings")
# Assumes earlier stages (not shown here) read the input videos;
# refer to the example script for a complete stage order.

# Split each video into fixed 10-second clips
pipe.add_stage(FixedStrideExtractorStage(clip_len_s=10.0, clip_stride_s=10.0))
# Extract a uniform frame sequence from each clip, keeping the original resolution
pipe.add_stage(
    ClipFrameExtractionStage(
        extraction_policies=(FrameExtractionPolicy.sequence,),
        extract_purposes=(FramePurpose.EMBEDDINGS,),
        target_res=(-1, -1),
        verbose=True,
    )
)
# Build Cosmos-Embed1 inputs at 2 FPS, then compute embeddings
pipe.add_stage(CosmosEmbed1FrameCreationStage(model_dir="/models", variant="224p", target_fps=2.0, verbose=True))
pipe.add_stage(CosmosEmbed1EmbeddingStage(model_dir="/models", variant="224p", gpu_memory_gb=20.0, verbose=True))

pipe.run()
```

```bash
# Clip frames are extracted implicitly when generating embeddings or aesthetic scores
python tutorials/video/getting-started/video_split_clip_example.py \
  ... \
  --generate-embeddings \
  --clip-extraction-target-res -1

# Full-video frames for TransNetV2 scene-change detection
python tutorials/video/getting-started/video_split_clip_example.py \
  ... \
  --splitting-algorithm transnetv2 \
  --transnetv2-frame-decoder-mode pynvc
```

## Options in NeMo Curator

NeMo Curator provides two complementary stages:

* `ClipFrameExtractionStage`: Extracts frames from already-split clips. Supports several target FPS values and computes an LCM rate to reduce decode work.
* `VideoFrameExtractionStage`: Extracts frames from full videos (for example, before scene-change detection). Supports PyNvCodec (NVDEC) or `ffmpeg` CPU/GPU decode.
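To make the LCM idea concrete, here is a minimal sketch in plain Python. It is not the stage's implementation: `lcm_sampling_plan` and `subsample` are hypothetical helpers, integer indices stand in for decoded frames, and it assumes integer FPS values (the case in which the stage uses LCM sampling).

```python
import math


def lcm_sampling_plan(target_fps: list[int]) -> tuple[int, dict[int, int]]:
    """Return one decode rate (the LCM) and the keep-every-k stride per target FPS."""
    decode_fps = math.lcm(*target_fps)
    strides = {fps: decode_fps // fps for fps in target_fps}
    return decode_fps, strides


def subsample(decoded_frames: list, stride: int) -> list:
    """Keep every stride-th frame from frames decoded at the LCM rate."""
    return decoded_frames[::stride]


# Hypothetical 4-second clip; integer indices stand in for decoded frames.
decode_fps, strides = lcm_sampling_plan([1, 2])
decoded = list(range(decode_fps * 4))  # one decode pass at 2 FPS -> 8 frames
per_rate = {fps: subsample(decoded, k) for fps, k in strides.items()}

print(decode_fps, strides)  # 2 {1: 2, 2: 1}
print(per_rate)  # {1: [0, 2, 4, 6], 2: [0, 1, 2, 3, 4, 5, 6, 7]}
```

With `[2, 3]`, the decode rate becomes 6 FPS and the strides are 3 and 2, so one decode pass still serves both rates. Coprime values can push the LCM well above either target (for example, `[5, 7]` gives 35 FPS).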
### Extract Frames

Extract frames from already-split clips:

```python
from nemo_curator.stages.video.clipping.clip_frame_extraction import (
    ClipFrameExtractionStage,
)
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose

extract_frames = ClipFrameExtractionStage(
    extraction_policies=(FrameExtractionPolicy.sequence,),
    extract_purposes=(FramePurpose.EMBEDDINGS,),  # sets default FPS if target_fps not provided
    target_res=(-1, -1),  # keep original resolution
    # target_fps=[1, 2],  # optional: override with explicit FPS values
    verbose=True,
)
```

Extract frames from full videos:

```python
from nemo_curator.stages.video.clipping.video_frame_extraction import VideoFrameExtractionStage

frame_extractor = VideoFrameExtractionStage(
    decoder_mode="pynvc",  # or "ffmpeg_gpu", "ffmpeg_cpu"
    output_hw=(27, 48),  # (height, width) of extracted frames
    pyncv_batch_size=64,  # batch size for PyNvCodec
    verbose=True,
)
```

## Parameters

| Parameter | Description |
| --------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| `extraction_policies` | Frame selection strategy. Use `sequence` for uniform sampling; `middle` selects a single middle frame. |
| `target_fps` | For clips: sampling rate in frames per second. If you provide several integer values, the stage uses LCM sampling. |
| `extract_purposes` | Shortcut that sets a default FPS for specific purposes (such as embeddings). You can still pass `target_fps` to override it. |
| `target_res` | Output frame resolution `(height, width)`. Use `(-1, -1)` to keep the original resolution. |
| `num_cpus` | Number of CPU cores for frame extraction. Default: `3`. |
| `decoder_mode` | For full-video extraction: `pynvc` (NVDEC), `ffmpeg_gpu`, or `ffmpeg_cpu`. |
| `output_hw` | For full-video extraction: `(height, width)` tuple for frame dimensions. Default: `(27, 48)`. |
| `pyncv_batch_size` | For full-video extraction: batch size for PyNvCodec processing. Default: `64`. |

### LCM Sampling for Several FPS Values

If you provide several integer `target_fps` values (such as `1` and `2`), the clip stage decodes once at their least common multiple (LCM) and then keeps every k-th frame, where k = LCM / target FPS, to produce each target rate. This avoids decoding the clip once per rate.

```python
from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy

extract_frames = ClipFrameExtractionStage(
    extraction_policies=(FrameExtractionPolicy.sequence,),
    target_fps=[1, 2],  # LCM = 2; decode once at 2 FPS, then subsample
)
```

## Hardware and Performance

* Prefer `pynvc` (NVDEC) or `ffmpeg_gpu` for high throughput when GPU hardware is available; otherwise use `ffmpeg_cpu`.
* Tune batch sizes where the stage exposes them (such as `pyncv_batch_size` for full-video extraction), and monitor CPU, GPU, and memory use per worker.
* If memory is constrained, lower the output resolution with `target_res` (clips) or `output_hw` (full videos).

## Downstream Dependencies

* **Embeddings**: Cosmos-Embed1 expects frames at specific rates. Refer to [Embeddings](/curate-video/process-data/embeddings).
* **Aesthetic Filtering**: Requires frames extracted earlier in the pipeline. Refer to [Filtering](/curate-video/process-data/filtering).
* **Clipping with TransNetV2**: Uses full-video frame extraction before scene-change detection. Refer to [Clipping](/curate-video/process-data/clipping).

## Troubleshooting

* "Frame extraction failed": Check the decoder mode and its availability; for GPU modes, confirm that `ffmpeg` and the GPU drivers are installed.
* Not enough frames for embeddings: Increase `target_fps` or adjust the clip length; certain embedding stages can re-extract at a higher rate when needed.
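Before raising `target_fps`, it can help to estimate how many frames a clip yields. The sketch below assumes uniform sampling produces about `floor(clip_len_s * target_fps)` frames; the exact count depends on the stage and the clip's timestamps. The helper names and the `required_frames` value are hypothetical placeholders, not Cosmos-Embed1's documented requirement.

```python
import math


def expected_frame_count(clip_len_s: float, target_fps: float) -> int:
    """Approximate frame count from uniform sampling (assumes floor(duration * fps))."""
    return math.floor(clip_len_s * target_fps)


def min_integer_fps(clip_len_s: float, required_frames: int) -> int:
    """Smallest integer FPS whose expected frame count reaches required_frames."""
    return math.ceil(required_frames / clip_len_s)


# Hypothetical numbers: a 3-second clip and a model that needs 8 frames.
clip_len_s = 3.0
required_frames = 8  # placeholder; check your embedding stage's actual requirement

current = expected_frame_count(clip_len_s, target_fps=2)
if current < required_frames:
    suggested = min_integer_fps(clip_len_s, required_frames)
    print(f"{current} frames at 2 FPS; try target_fps={suggested}")  # 6 frames at 2 FPS; try target_fps=3
```

Rounding the suggestion up to an integer keeps it compatible with LCM sampling when you combine it with other integer rates.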