Frame Extraction | NeMo Curator

Extract frames from clips or full videos at target rates and resolutions. Use frames for embeddings (such as Cosmos‑Embed1), aesthetic filtering, previews, and custom analysis.

Use Cases

Prepare inputs for embedding models that expect frame sequences.
Run aesthetic filtering that operates on sampled frames.
Generate lightweight previews or QA snapshots.
Provide frames for scene-change detection before clipping (TransNetV2).

Before You Start

If you only need saved media files, frame extraction is optional. Embeddings and aesthetic filtering both require extracted frames.

Quickstart

Use the pipeline stages or the example script flags to extract frames for embeddings, filtering, and analysis.

Pipeline Stage

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.video.clipping.clip_extraction_stages import FixedStrideExtractorStage
from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
from nemo_curator.stages.video.embedding.cosmos_embed1 import (
    CosmosEmbed1FrameCreationStage,
    CosmosEmbed1EmbeddingStage,
)

pipe = Pipeline(name="clip_frames_embeddings")
# Split each video into 10-second clips; stride equals length, so clips do not overlap.
pipe.add_stage(FixedStrideExtractorStage(clip_len_s=10.0, clip_stride_s=10.0))
# Extract uniformly sampled frames from each clip at the original resolution.
pipe.add_stage(
    ClipFrameExtractionStage(
        extraction_policies=(FrameExtractionPolicy.sequence,),
        extract_purposes=(FramePurpose.EMBEDDINGS,),
        target_res=(-1, -1),
        verbose=True,
    )
)
# Prepare frame sequences for Cosmos-Embed1, then compute embeddings.
pipe.add_stage(CosmosEmbed1FrameCreationStage(model_dir="/models", variant="224p", target_fps=2.0, verbose=True))
pipe.add_stage(CosmosEmbed1EmbeddingStage(model_dir="/models", variant="224p", gpu_memory_gb=20.0, verbose=True))
pipe.run()

Options in NeMo Curator

NeMo Curator provides two complementary stages:

ClipFrameExtractionStage: Extracts frames from already‑split clips. Supports several target FPS values and decodes once at their least common multiple (LCM) to reduce decode work.
VideoFrameExtractionStage: Extracts frames from full videos (for example, before scene‑change detection). Supports PyNvCodec (NVDEC) or ffmpeg CPU/GPU decode.

Extract Frames

From Clips

from nemo_curator.stages.video.clipping.clip_frame_extraction import (
    ClipFrameExtractionStage,
)
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose

extract_frames = ClipFrameExtractionStage(
    extraction_policies=(FrameExtractionPolicy.sequence,),
    extract_purposes=(FramePurpose.EMBEDDINGS,),  # sets default FPS if target_fps not provided
    target_res=(-1, -1),  # keep original resolution
    # target_fps=[1, 2],  # optional: override with explicit FPS values
    verbose=True,
)

Parameters

Parameter	Description
`extraction_policies`	Frame selection strategy. Use `sequence` for uniform sampling. `middle` selects a single middle frame.
`target_fps`	For clips: sampling rate in frames per second. If you provide several integer values, the stage uses LCM sampling.
`extract_purposes`	Shortcut that sets default FPS for specific purposes (such as embeddings). You can still pass `target_fps` to override.
`target_res`	Output frame resolution `(height, width)`. Use `(-1, -1)` to keep original.
`num_cpus`	Number of CPU cores for frame extraction. Default: `3`.
`decoder_mode`	For full‑video extraction: `pynvc` (NVDEC), `ffmpeg_gpu`, or `ffmpeg_cpu`.
`output_hw`	For full‑video extraction: `(height, width)` tuple for frame dimensions. Default: `(27, 48)`.
`pyncv_batch_size`	For full‑video extraction: batch size for PyNvCodec processing. Default: `64`.
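As a rough sizing aid for `output_hw` and `pyncv_batch_size`, the sketch below estimates the raw size of one decoded batch. It assumes uncompressed 8-bit frames with 3 channels, which is an illustrative assumption and not a statement about the stage's internal buffers. `batch_bytes` is a hypothetical helper, not part of NeMo Curator.

```python
def batch_bytes(height: int, width: int, batch_size: int, channels: int = 3) -> int:
    """Raw size in bytes of a batch of uint8 frames (assumed layout: N x H x W x C)."""
    return batch_size * height * width * channels


# Table defaults for full-video extraction: output_hw=(27, 48), batch size 64.
default_batch = batch_bytes(27, 48, 64)
# Same batch size at 1080p, for comparison.
hd_batch = batch_bytes(1080, 1920, 64)

print(default_batch)              # 248832 bytes (about 243 KiB)
print(hd_batch // default_batch)  # 1600
```

Under these assumptions, a batch at the default `(27, 48)` size is 1600 times smaller than the same batch at 1080p.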

LCM Sampling for Several FPS Values

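If you provide several integer `target_fps` values (such as 1 and 2), the clip stage decodes once at the least common multiple (LCM) of those rates and then keeps every k‑th frame to produce each target rate, where k is the LCM divided by that rate. Decoding once instead of once per rate reduces decode cost.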

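from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy

extract_frames = ClipFrameExtractionStage(
    extraction_policies=(FrameExtractionPolicy.sequence,),
    target_fps=[1, 2],  # LCM = 2; decode once at 2 FPS, then subsample
)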
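The subsampling arithmetic can be sketched in plain Python. This illustrates the LCM idea described in this section, not the stage's actual implementation. `lcm_sampling_plan` is a hypothetical helper, and `math.lcm` requires Python 3.9 or later.

```python
import math


def lcm_sampling_plan(target_fps: list[int]) -> tuple[int, dict[int, int]]:
    """Return the decode rate (LCM of the targets) and the stride k for each target rate."""
    if not target_fps or any(fps <= 0 for fps in target_fps):
        raise ValueError("target_fps must contain positive integers")
    decode_fps = math.lcm(*target_fps)
    strides = {fps: decode_fps // fps for fps in target_fps}
    return decode_fps, strides


decode_fps, strides = lcm_sampling_plan([1, 2])
print(decode_fps, strides)  # 2 {1: 2, 2: 1}

# Subsample a 3-second clip decoded at the LCM rate (indices stand in for frames).
decoded = list(range(decode_fps * 3))
per_rate = {fps: decoded[::k] for fps, k in strides.items()}
print(per_rate)  # {1: [0, 2, 4], 2: [0, 1, 2, 3, 4, 5]}
```

For rates such as 2 and 3 the LCM is 6, so the decode rate can exceed every individual target. The saving comes from decoding once rather than once per rate.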

Hardware and Performance

Prefer pynvc (NVDEC) or ffmpeg_gpu for high throughput when GPU hardware is available; otherwise use ffmpeg_cpu.
Use batching where applicable (for example, `pyncv_batch_size` for full‑video extraction) and monitor worker resource usage.
Keep resolution modest if memory limits apply; set target_res when needed.
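One way to apply the decoder guidance above is to pick a `decoder_mode` value from what the host provides. The sketch below is a hypothetical helper, not part of NeMo Curator. The detection heuristics (an `nvidia-smi` or `ffmpeg` binary on `PATH`, and an importable `PyNvCodec` module) are assumptions and may not match how your environment exposes GPU decode.

```python
import importlib.util
import shutil


def choose_decoder_mode(gpu_available: bool, pynvc_available: bool) -> str:
    """Prefer NVDEC via PyNvCodec, then GPU ffmpeg, then CPU ffmpeg."""
    if gpu_available and pynvc_available:
        return "pynvc"
    if gpu_available:
        return "ffmpeg_gpu"
    return "ffmpeg_cpu"


def detect_decoder_mode() -> str:
    """Heuristic detection (assumption): nvidia-smi on PATH implies a usable GPU."""
    gpu = shutil.which("nvidia-smi") is not None
    pynvc = importlib.util.find_spec("PyNvCodec") is not None
    # The ffmpeg modes need an ffmpeg binary; fail early if no mode can work.
    if shutil.which("ffmpeg") is None and not (gpu and pynvc):
        raise RuntimeError("ffmpeg not found on PATH; the ffmpeg decode modes need it")
    return choose_decoder_mode(gpu, pynvc)


print(choose_decoder_mode(gpu_available=True, pynvc_available=False))  # ffmpeg_gpu
```

The returned string could then be passed as `decoder_mode` to the full‑video extraction stage.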

Downstream Dependencies

Embeddings: Cosmos‑Embed1 expects frames at specific rates. Refer to Embeddings.
Aesthetic Filtering: Requires frames extracted earlier. Refer to Filtering.
Clipping with TransNetV2: Uses full‑video frame extraction before scene‑change detection. Refer to Clipping.

Troubleshooting

“Frame extraction failed”: Check decoder mode and availability; confirm ffmpeg and drivers for GPU modes.
Not enough frames for embeddings: Increase target_fps or adjust clip length; certain embedding stages can re‑extract at a higher rate when needed.
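To check whether a clip can yield enough frames, a back-of-the-envelope calculation helps. The sketch below assumes uniform sampling yields about `clip_len_s * target_fps` frames. Both helpers are hypothetical, and `required_frames = 8` is a placeholder for illustration, not the requirement of any specific model.

```python
import math


def frames_available(clip_len_s: float, target_fps: float) -> int:
    """Approximate frame count from uniform sampling (assumes floor of duration * rate)."""
    return math.floor(clip_len_s * target_fps)


def min_fps_for(clip_len_s: float, required_frames: int) -> float:
    """Smallest sampling rate that yields at least required_frames from the clip."""
    if clip_len_s <= 0:
        raise ValueError("clip_len_s must be positive")
    return required_frames / clip_len_s


required_frames = 8  # placeholder; check your embedding model's actual requirement

print(frames_available(10.0, 2.0))  # 20
print(frames_available(3.0, 2.0))   # 6, short of the placeholder 8
print(round(min_fps_for(3.0, required_frames), 2))  # 2.67
```

In this example a 3-second clip sampled at 2 FPS falls short, so either a rate of at least about 2.67 FPS or a longer clip is needed.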