---
description: >-
  Extract frames from clips or full videos for embeddings, filtering, and
  analysis
categories:
  - video-curation
tags:
  - frames
  - extraction
  - fps
  - ffmpeg
  - nvdec
personas:
  - data-scientist-focused
  - mle-focused
difficulty: intermediate
content_type: howto
modality: video-only
---

# Frame Extraction

Extract frames from clips or full videos at target rates and resolutions. Use frames for embeddings (such as Cosmos‑Embed1), aesthetic filtering, previews, and custom analysis.

## Use Cases

* Prepare inputs for embedding models that expect frame sequences.
* Run aesthetic filtering that operates on sampled frames.
* Generate lightweight previews or QA snapshots.
* Provide frames for scene-change detection before clipping (TransNetV2).

## Before You Start

If you only need saved media files, frame extraction is optional. [Embeddings](/curate-video/process-data/embeddings) and [aesthetic filtering](/curate-images/process-data/filters/aesthetic) both require extracted frames.

***

## Quickstart

Use the pipeline stages or the example script flags to extract frames for embeddings, filtering, and analysis.

<Tabs>
  <Tab title="Pipeline Stage">
    ```python
    from nemo_curator.pipeline import Pipeline
    from nemo_curator.stages.video.clipping.clip_extraction_stages import FixedStrideExtractorStage
    from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
    from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
    from nemo_curator.stages.video.embedding.cosmos_embed1 import (
        CosmosEmbed1FrameCreationStage,
        CosmosEmbed1EmbeddingStage,
    )

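    pipe = Pipeline(name="clip_frames_embeddings")
    # This excerpt omits the stages that read input videos and prepare clips;
    # add them before the stages below.
    pipe.add_stage(FixedStrideExtractorStage(clip_len_s=10.0, clip_stride_s=10.0))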
    pipe.add_stage(
        ClipFrameExtractionStage(
            extraction_policies=(FrameExtractionPolicy.sequence,),
            extract_purposes=(FramePurpose.EMBEDDINGS,),
            target_res=(-1, -1),
            verbose=True,
        )
    )
    pipe.add_stage(CosmosEmbed1FrameCreationStage(model_dir="/models", variant="224p", target_fps=2.0, verbose=True))
    pipe.add_stage(CosmosEmbed1EmbeddingStage(model_dir="/models", variant="224p", gpu_memory_gb=20.0, verbose=True))
    pipe.run()
    ```
  </Tab>

  <Tab title="Script Flags">
    ```bash
    # Clip frames are extracted implicitly when you generate embeddings or aesthetic scores
    python tutorials/video/getting-started/video_split_clip_example.py \
      ... \
      --generate-embeddings \
      --clip-extraction-target-res -1

    # Full-video frames for TransNetV2 scene-change detection
    python tutorials/video/getting-started/video_split_clip_example.py \
      ... \
      --splitting-algorithm transnetv2 \
      --transnetv2-frame-decoder-mode pynvc
    ```
  </Tab>
</Tabs>

## Options in NeMo Curator

NeMo Curator provides two complementary stages:

* `ClipFrameExtractionStage`: Extracts frames from already‑split clips. Supports several target FPS values and computes an LCM rate to reduce decode work.
* `VideoFrameExtractionStage`: Extracts frames from full videos (for example, before scene‑change detection). Supports PyNvCodec (NVDEC) or `ffmpeg` CPU/GPU decode.

### Extract Frames

<Tabs>
  <Tab title="From Clips">
    ```python
    from nemo_curator.stages.video.clipping.clip_frame_extraction import (
        ClipFrameExtractionStage,
    )
    from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose

    extract_frames = ClipFrameExtractionStage(
        extraction_policies=(FrameExtractionPolicy.sequence,),
        extract_purposes=(FramePurpose.EMBEDDINGS,),  # sets default FPS if target_fps not provided
        target_res=(-1, -1),  # keep original resolution
        # target_fps=[1, 2],  # optional: override with explicit FPS values
        verbose=True,
    )
    ```
  </Tab>

  <Tab title="From Full Videos (Scene Change)">
    ```python
    from nemo_curator.stages.video.clipping.video_frame_extraction import VideoFrameExtractionStage

    frame_extractor = VideoFrameExtractionStage(
        decoder_mode="pynvc",  # or "ffmpeg_gpu", "ffmpeg_cpu"
        output_hw=(27, 48),    # (height, width) for frame extraction
        pyncv_batch_size=64,   # batch size for PyNvCodec
        verbose=True,
    )
    ```
  </Tab>
</Tabs>

## Parameters

| Parameter             | Description                                                                                                             |
| --------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| `extraction_policies` | Frame selection strategy. Use `sequence` for uniform sampling. `middle` selects a single middle frame.                  |
| `target_fps`          | For clips: sampling rate in frames per second. If you provide several integer values, the stage uses LCM sampling.      |
| `extract_purposes`    | Shortcut that sets default FPS for specific purposes (such as embeddings). You can still pass `target_fps` to override. |
| `target_res`          | Output frame resolution `(height, width)`. Use `(-1, -1)` to keep original.                                             |
| `num_cpus`            | Number of CPU cores for frame extraction. Default: `3`.                                                                 |
| `decoder_mode`        | For full‑video extraction: `pynvc` (NVDEC), `ffmpeg_gpu`, or `ffmpeg_cpu`.                                              |
| `output_hw`           | For full‑video extraction: `(height, width)` tuple for frame dimensions. Default: `(27, 48)`.                           |
| `pyncv_batch_size`    | For full‑video extraction: batch size for PyNvCodec processing. Default: `64`.                                          |

### LCM Sampling for Several FPS Values

If you provide several integer `target_fps` values (such as `1` and `2`), the clip stage decodes once at the LCM rate and then samples every k‑th frame to produce each target rate. This reduces decode cost.

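```python
from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy

multi_rate_frames = ClipFrameExtractionStage(
    extraction_policies=(FrameExtractionPolicy.sequence,),
    target_fps=[1, 2],  # LCM = 2; decode once at 2 FPS, then subsample
)
```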
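
The stride arithmetic behind LCM sampling can be sketched in plain Python. This is a minimal illustration of the idea, not the stage's actual implementation. The function name `lcm_sampling_plan` is hypothetical, and the "decoded frames" are just integer indices standing in for real images.

```python
from math import lcm  # requires Python 3.9+


def lcm_sampling_plan(target_fps: list[int]) -> tuple[int, dict[int, int]]:
    """Return the shared decode rate and the subsampling stride for each target rate."""
    decode_fps = lcm(*target_fps)
    strides = {fps: decode_fps // fps for fps in target_fps}
    return decode_fps, strides


decode_fps, strides = lcm_sampling_plan([1, 2])

# Stand-in for 4 seconds of frames decoded once at the shared rate.
decoded = list(range(decode_fps * 4))

# Keep every k-th decoded frame to obtain each target rate.
frames_by_fps = {fps: decoded[::stride] for fps, stride in strides.items()}
print(decode_fps, strides, {fps: len(f) for fps, f in frames_by_fps.items()})
```

With targets of 1 and 2 FPS, the shared rate is 2 FPS: the 2 FPS output keeps every decoded frame and the 1 FPS output keeps every second one. The same plan for 2 and 3 FPS would decode at 6 FPS with strides of 3 and 2.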

## Hardware and Performance

* Prefer `pynvc` (NVDEC) or `ffmpeg_gpu` for high throughput when GPU hardware is available; otherwise use `ffmpeg_cpu`.
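* Use batching where applicable (for example, `pyncv_batch_size` with the `pynvc` decoder) and monitor worker CPU, GPU, and memory use.
* If memory is limited, set `target_res` to a modest resolution instead of keeping the original with `(-1, -1)`.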
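
The decoder preference above can be expressed as a small helper. This is a sketch under stated assumptions: `pick_decoder_mode` is a hypothetical function, the presence of `nvidia-smi` on `PATH` is used as a rough proxy for an available GPU, and `PyNvCodec` is assumed to be the importable module name for PyNvCodec. Adjust the checks to match your environment.

```python
import importlib.util
import shutil
from typing import Optional


def pick_decoder_mode(has_gpu: Optional[bool] = None, has_pynvc: Optional[bool] = None) -> str:
    """Choose a decoder_mode string, preferring GPU decode when it looks available."""
    if has_gpu is None:
        # Assumption: nvidia-smi on PATH suggests an NVIDIA GPU and driver are present.
        has_gpu = shutil.which("nvidia-smi") is not None
    if has_pynvc is None:
        # Assumption: PyNvCodec is importable under this module name.
        has_pynvc = importlib.util.find_spec("PyNvCodec") is not None

    if has_gpu and has_pynvc:
        return "pynvc"
    if has_gpu:
        return "ffmpeg_gpu"
    return "ffmpeg_cpu"


decoder_mode = pick_decoder_mode()
print(decoder_mode)
```

The returned string can then be passed as `decoder_mode` to `VideoFrameExtractionStage`.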

## Downstream Dependencies

* **Embeddings**: Cosmos‑Embed1 expects frames at specific rates. Refer to [Embeddings](/curate-video/process-data/embeddings).
* **Aesthetic Filtering**: Requires frames extracted earlier. Refer to [Filtering](/curate-video/process-data/filtering).
* **Clipping with TransNetV2**: Uses full‑video frame extraction before scene‑change detection. Refer to [Clipping](/curate-video/process-data/clipping).

## Troubleshooting

* "Frame extraction failed": Check that the selected `decoder_mode` is available on the worker. Confirm that `ffmpeg` is installed and, for GPU modes, that the GPU drivers are present.
* Not enough frames for embeddings: Increase `target_fps` or use longer clips; certain embedding stages can re‑extract at a higher rate when needed.
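
When a model reports too few frames, a quick estimate of how many frames a clip yields can help pick a better `target_fps` or clip length. The sketch below is plain arithmetic, not a NeMo Curator API. The helper names are hypothetical, and the required frame count of `8` is only a placeholder; check your embedding model's documentation for the real value.

```python
import math


def expected_frame_count(clip_len_s: float, target_fps: float) -> int:
    """Approximate number of frames sampled from a clip at target_fps."""
    return math.floor(clip_len_s * target_fps)


def min_fps_for(clip_len_s: float, required_frames: int) -> float:
    """Lowest sampling rate that yields at least required_frames from the clip."""
    return required_frames / clip_len_s


required_frames = 8  # placeholder value, not taken from any model specification

# A 2-second clip sampled at 2 FPS yields about 4 frames, which is too few here.
available = expected_frame_count(clip_len_s=2.0, target_fps=2.0)
needed_fps = min_fps_for(clip_len_s=2.0, required_frames=required_frames)
print(available, needed_fps)
```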
