
Frame Extraction


Extract frames from clips or full videos at target rates and resolutions. Use frames for embeddings (such as Cosmos‑Embed1), aesthetic filtering, previews, and custom analysis.

Use Cases

  • Prepare inputs for embedding models that expect frame sequences.
  • Run aesthetic filtering that operates on sampled frames.
  • Generate lightweight previews or QA snapshots.
  • Provide frames for scene-change detection before clipping (TransNetV2).

Before You Start

If you only need to save clips as media files, frame extraction is optional. Embedding and aesthetic filtering stages require extracted frames.


Quickstart

Use the pipeline stages or the example script flags to extract frames for embeddings, filtering, and analysis.

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.video.clipping.clip_extraction_stages import FixedStrideExtractorStage
from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
from nemo_curator.stages.video.embedding.cosmos_embed1 import (
    CosmosEmbed1FrameCreationStage,
    CosmosEmbed1EmbeddingStage,
)

pipe = Pipeline(name="clip_frames_embeddings")
# A stage that reads the input videos (not shown) must be added before the stages below.
# Split each video into fixed, non-overlapping 10-second clips.
pipe.add_stage(FixedStrideExtractorStage(clip_len_s=10.0, clip_stride_s=10.0))
# Extract frames from each clip for embeddings, keeping the original resolution.
pipe.add_stage(
    ClipFrameExtractionStage(
        extraction_policies=(FrameExtractionPolicy.sequence,),
        extract_purposes=(FramePurpose.EMBEDDINGS,),
        target_res=(-1, -1),
        verbose=True,
    )
)
# Build Cosmos-Embed1 inputs at 2 FPS, then compute clip embeddings on the GPU.
pipe.add_stage(CosmosEmbed1FrameCreationStage(model_dir="/models", variant="224p", target_fps=2.0, verbose=True))
pipe.add_stage(CosmosEmbed1EmbeddingStage(model_dir="/models", variant="224p", gpu_memory_gb=20.0, verbose=True))
pipe.run()

Options in NeMo Curator

NeMo Curator provides two complementary stages:

  • ClipFrameExtractionStage: Extracts frames from already‑split clips. Supports several target FPS values and computes an LCM rate to reduce decode work.
  • VideoFrameExtractionStage: Extracts frames from full videos (for example, before scene‑change detection). Supports PyNvCodec (NVDEC) or ffmpeg CPU/GPU decode.
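
The decoder choice for full-video extraction can be made explicit with a small helper. The following is a hypothetical sketch, not part of NeMo Curator: choose_decoder_mode and detect_decoder_mode are illustrative names, preferring pynvc over ffmpeg_gpu is an assumption, and the caller must determine gpu_available. It returns one of the decoder_mode strings listed under Parameters.

```python
import importlib.util
import shutil


def choose_decoder_mode(has_nvdec_bindings: bool, has_ffmpeg: bool, gpu_available: bool) -> str:
    """Pick a decoder_mode string (hypothetical helper; preference order is an assumption)."""
    if gpu_available and has_nvdec_bindings:
        return "pynvc"  # NVDEC through PyNvCodec
    if gpu_available and has_ffmpeg:
        return "ffmpeg_gpu"  # assumes this ffmpeg build supports NVIDIA hardware decode
    if has_ffmpeg:
        return "ffmpeg_cpu"
    raise RuntimeError("No supported decoder found: install ffmpeg or PyNvCodec")


def detect_decoder_mode(gpu_available: bool) -> str:
    """Probe the environment; shutil.which only confirms ffmpeg is on PATH, not its GPU support."""
    return choose_decoder_mode(
        has_nvdec_bindings=importlib.util.find_spec("PyNvCodec") is not None,
        has_ffmpeg=shutil.which("ffmpeg") is not None,
        gpu_available=gpu_available,
    )


print(choose_decoder_mode(has_nvdec_bindings=False, has_ffmpeg=True, gpu_available=False))
```

Keeping the decision in a pure function makes it easy to test, while the probing step stays separate because it depends on the machine.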

Extract Frames

from nemo_curator.stages.video.clipping.clip_frame_extraction import (
    ClipFrameExtractionStage,
)
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose

extract_frames = ClipFrameExtractionStage(
    extraction_policies=(FrameExtractionPolicy.sequence,),
    extract_purposes=(FramePurpose.EMBEDDINGS,),  # sets default FPS if target_fps not provided
    target_res=(-1, -1),  # keep original resolution
    # target_fps=[1, 2],  # optional: override with explicit FPS values
    verbose=True,
)

Parameters

| Parameter | Description |
| --- | --- |
| extraction_policies | Frame selection strategy. Use sequence for uniform sampling; middle selects a single middle frame. |
| target_fps | For clips: sampling rate in frames per second. If you provide several integer values, the stage uses LCM sampling. |
| extract_purposes | Shortcut that sets a default FPS for specific purposes (such as embeddings). You can still pass target_fps to override it. |
| target_res | Output frame resolution as (height, width). Use (-1, -1) to keep the original resolution. |
| num_cpus | Number of CPU cores for frame extraction. Default: 3. |
| decoder_mode | For full-video extraction: pynvc (NVDEC), ffmpeg_gpu, or ffmpeg_cpu. |
| output_hw | For full-video extraction: (height, width) tuple for frame dimensions. Default: (27, 48). |
| pyncv_batch_size | For full-video extraction: batch size for PyNvCodec processing. Default: 64. |
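
To make the two extraction policies concrete, here is a hypothetical sketch of how frame indices could be chosen for a single clip. It is not the stage's implementation; select_frame_indices is an illustrative name, and the sketch assumes you know the clip's frame count and native frame rate.

```python
def select_frame_indices(num_frames: int, native_fps: float, policy: str,
                         target_fps: float = 1.0) -> list[int]:
    """Hypothetical frame selection for one clip.

    "sequence": uniform sampling at target_fps across the clip.
    "middle": a single frame at the clip's midpoint.
    """
    if num_frames <= 0:
        return []
    if policy == "middle":
        return [num_frames // 2]
    if policy == "sequence":
        if target_fps <= 0 or native_fps <= 0:
            raise ValueError("frame rates must be positive")
        duration_s = num_frames / native_fps
        num_out = max(1, int(duration_s * target_fps))
        step = native_fps / target_fps  # source frames between consecutive samples
        # Map each output timestamp to the source frame at or just before it.
        return [min(num_frames - 1, int(i * step)) for i in range(num_out)]
    raise ValueError(f"unknown policy: {policy!r}")


print(select_frame_indices(300, 30.0, "sequence", target_fps=2.0)[:4])
print(select_frame_indices(300, 30.0, "middle"))
```

For a 10-second clip recorded at 30 FPS (300 frames), sequence at 2 FPS picks frames 0, 15, 30, and so on up to 285, while middle picks only frame 150.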

LCM Sampling for Several FPS Values

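If you provide several integer target_fps values (such as 1 and 2), the clip stage decodes each clip once at the least common multiple (LCM) of those rates, then keeps every k-th decoded frame, where k = LCM / target FPS, to produce each target rate. This avoids decoding the same clip once per rate.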

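from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy

multi_fps_extract = ClipFrameExtractionStage(
    extraction_policies=(FrameExtractionPolicy.sequence,),
    target_fps=[1, 2],  # LCM = 2: decode once at 2 FPS; keep every frame for 2 FPS, every 2nd for 1 FPS
)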
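
The arithmetic behind LCM sampling can be sketched in plain Python. This illustrates the idea described above and is not NeMo Curator code; lcm_sampling_plan and subsample are hypothetical helpers, decoded frames are represented by placeholder integers, and math.lcm with several arguments requires Python 3.9 or later.

```python
from math import lcm


def lcm_sampling_plan(target_fps: list[int]) -> tuple[int, dict[int, int]]:
    """Return the shared decode rate (LCM) and the stride k = LCM // fps for each target."""
    if not target_fps or any(f <= 0 for f in target_fps):
        raise ValueError("target_fps must be a non-empty list of positive integers")
    decode_fps = lcm(*target_fps)
    return decode_fps, {fps: decode_fps // fps for fps in target_fps}


def subsample(decoded_frames: list, stride: int) -> list:
    """Keep every stride-th frame from the frames decoded at the shared rate."""
    return decoded_frames[::stride]


decode_fps, strides = lcm_sampling_plan([1, 2])
# A 10 s clip decoded once at 2 FPS gives 20 frames (integers stand in for images).
decoded = list(range(int(10 * decode_fps)))
per_rate = {fps: subsample(decoded, k) for fps, k in strides.items()}
print(decode_fps, strides, {fps: len(frames) for fps, frames in per_rate.items()})
```

The savings grow when the rates do not divide each other: with [2, 3], the plan decodes once at 6 FPS and keeps every 3rd frame for 2 FPS and every 2nd frame for 3 FPS.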

Hardware and Performance

  • Prefer pynvc (NVDEC) or ffmpeg_gpu for high throughput when GPU hardware is available; otherwise use ffmpeg_cpu.
  • For full-video extraction with pynvc, tune pyncv_batch_size, and track CPU, GPU, and memory use per worker.
  • If memory is limited, lower the output resolution with target_res (clips) or output_hw (full videos).
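
A back-of-the-envelope estimate helps decide how far to lower the resolution. The sketch below assumes each decoded frame is held as an 8-bit RGB array (3 bytes per pixel); frame_buffer_bytes is a hypothetical helper, and real memory use also includes decoder buffers and model inputs.

```python
def frame_buffer_bytes(num_frames: int, height: int, width: int,
                       channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Approximate memory for decoded frames, assuming uint8 RGB arrays by default."""
    return num_frames * height * width * channels * bytes_per_channel


# 20 frames per clip (for example, a 10 s clip sampled at 2 FPS).
full_hd = frame_buffer_bytes(20, 1080, 1920)
small = frame_buffer_bytes(20, 224, 224)
print(f"1080x1920: {full_hd / 2**20:.1f} MiB, 224x224: {small / 2**20:.1f} MiB")
```

Under these assumptions, 20 full-HD frames take about 119 MiB per clip, while the same frames at 224x224 take about 3 MiB.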

Downstream Dependencies

  • Embeddings: Cosmos‑Embed1 expects frames at specific rates. Refer to Embeddings.
  • Aesthetic Filtering: Requires frames extracted earlier. Refer to Filtering.
  • Clipping with TransNetV2: Uses full‑video frame extraction before scene‑change detection. Refer to Clipping.

Troubleshooting

  • “Frame extraction failed”: Check that the selected decoder_mode is available. Confirm that ffmpeg is installed and, for GPU modes (pynvc, ffmpeg_gpu), that the GPU drivers are installed.
  • Not enough frames for embeddings: Increase target_fps or adjust clip length; certain embedding stages can re‑extract at a higher rate when needed.
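
To check whether a clip yields enough frames before running embeddings, compare clip duration × sampling rate with the count your model needs. In this hypothetical sketch, required_frames is a placeholder you must take from the embedding model's documentation; the value 32 is only an example, not a Cosmos-Embed1 requirement.

```python
import math


def frames_per_clip(clip_len_s: float, target_fps: float) -> int:
    """Frames produced by uniform sampling of one clip: floor(duration * rate)."""
    # Small epsilon guards against float results such as 6.999999999 being floored to 6.
    return math.floor(clip_len_s * target_fps + 1e-9)


def min_target_fps(clip_len_s: float, required_frames: int) -> float:
    """Lowest sampling rate that yields at least required_frames for this clip length."""
    return required_frames / clip_len_s


def min_clip_len_s(target_fps: float, required_frames: int) -> float:
    """Shortest clip that yields at least required_frames at this sampling rate."""
    return required_frames / target_fps


required_frames = 32  # hypothetical requirement; check your embedding model's docs
clip_len_s, target_fps = 10.0, 2.0  # clip length and rate used in the Quickstart pipeline
have = frames_per_clip(clip_len_s, target_fps)
if have < required_frames:
    print(f"{have} frames < {required_frames}: raise target_fps to >= "
          f"{min_target_fps(clip_len_s, required_frames)} or clip_len_s to >= "
          f"{min_clip_len_s(target_fps, required_frames)}")
```

With 10-second clips at 2 FPS, each clip yields 20 frames, so a 32-frame requirement would call for at least 3.2 FPS or clips of at least 16 seconds.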