---
description: Generate clip-level embeddings using Cosmos-Embed1
categories:
  - video-curation
tags:
  - embeddings
  - cosmos-embed1
  - video
personas:
  - data-scientist-focused
  - mle-focused
difficulty: intermediate
content_type: howto
modality: video-only
---

# Embeddings

Generate clip-level embeddings for search, question answering, filtering, and duplicate removal.

## Use Cases

* Prepare semantic vectors for search, clustering, and near-duplicate detection.
* Score optional text prompts against clip content.
* Enable downstream filtering or retrieval tasks that need clip-level vectors.

## Before You Start

* Create clips upstream. Refer to [Clipping](/curate-video/process-data/clipping).
* Provide frames for embeddings or sample at the required rate. Refer to [Frame Extraction](/curate-video/process-data/frame-extraction).
* Ensure that model weights are accessible on each node (the stages download weights if missing).

***

## Quickstart

Use the pipeline stages or the example script flags to generate clip-level embeddings.

<Tabs>
  <Tab title="Pipeline Stage">
    ```python
    from nemo_curator.pipeline import Pipeline
    from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
    from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
    from nemo_curator.stages.video.embedding.cosmos_embed1 import (
        CosmosEmbed1FrameCreationStage,
        CosmosEmbed1EmbeddingStage,
    )

    pipe = Pipeline(name="video_embeddings_example")
    pipe.add_stage(
        ClipFrameExtractionStage(
            extraction_policies=(FrameExtractionPolicy.sequence,),
            extract_purposes=(FramePurpose.EMBEDDINGS,),
            target_res=(-1, -1),
            verbose=True,
        )
    )
    pipe.add_stage(CosmosEmbed1FrameCreationStage(model_dir="/models", variant="224p", target_fps=2.0, verbose=True))
    pipe.add_stage(CosmosEmbed1EmbeddingStage(model_dir="/models", variant="224p", gpu_memory_gb=20.0, verbose=True))
    pipe.run()
    ```
  </Tab>

  <Tab title="Script Flags">
    ```bash
    # Cosmos-Embed1 (224p)
    python tutorials/video/getting-started/video_split_clip_example.py \
      ... \
      --generate-embeddings \
      --embedding-algorithm cosmos-embed1-224p \
      --embedding-gpu-memory-gb 20.0
    ```
  </Tab>
</Tabs>

## Embedding Options

### Cosmos-Embed1

1. Add `CosmosEmbed1FrameCreationStage` to transform extracted frames into model-ready tensors.

   ```python
   from nemo_curator.stages.video.embedding.cosmos_embed1 import (
       CosmosEmbed1FrameCreationStage,
       CosmosEmbed1EmbeddingStage,
   )

   frames = CosmosEmbed1FrameCreationStage(
       model_dir="/models",
       variant="224p",  # or 336p, 448p
       target_fps=2.0,
       verbose=True,
   )
   ```

2. Add `CosmosEmbed1EmbeddingStage` to generate `clip.cosmos_embed1_embedding` and optional `clip.cosmos_embed1_text_match`.

   ```python
   embed = CosmosEmbed1EmbeddingStage(
       model_dir="/models",
       variant="224p",
       gpu_memory_gb=20.0,
       verbose=True,
   )
   ```
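   When you pass `texts_to_verify`, the stage also populates `clip.cosmos_embed1_text_match`. Conceptually, text matching ranks candidate prompts by their similarity to the clip embedding. The following is a minimal NumPy sketch of that idea using toy vectors — it illustrates the concept only and is not the stage's actual implementation:

   ```python
   import numpy as np

   def match_texts(
       clip_embedding: np.ndarray,
       text_embeddings: np.ndarray,
       texts: list[str],
   ) -> tuple[str, float]:
       """Rank candidate text prompts against a clip embedding by cosine similarity."""
       # Normalize so that dot products become cosine similarities.
       clip_norm = clip_embedding / np.linalg.norm(clip_embedding)
       text_norms = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
       sims = text_norms @ clip_norm  # one similarity score per prompt
       # Softmax turns similarities into a probability-like score.
       probs = np.exp(sims) / np.exp(sims).sum()
       best = int(np.argmax(probs))
       return texts[best], float(probs[best])

   # Toy vectors standing in for real model outputs.
   clip_vec = np.array([1.0, 0.0, 0.0])
   text_vecs = np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]])
   label, score = match_texts(clip_vec, text_vecs, ["a person cooking", "a car driving"])
   ```

   In the real stage, the text embeddings come from the Cosmos-Embed1 text encoder; the sketch above only shows how similarity ranking selects the best-matching prompt.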

#### Parameters

<Tabs>
  <Tab title="CosmosEmbed1FrameCreationStage">
    | Parameter    | Type                     | Default                  | Description                                                                         |
    | ------------ | ------------------------ | ------------------------ | ----------------------------------------------------------------------------------- |
    | `model_dir`  | str                      | `"models/cosmos_embed1"` | Directory for model utilities and configs used to format input frames.              |
    | `variant`    | {"224p", "336p", "448p"} | `"336p"`                 | Resolution preset that controls the model’s expected input size.                    |
    | `target_fps` | float                    | 2.0                      | Source sampling rate used to select frames; may re-extract at higher FPS if needed. |
    | `num_cpus`   | int                      | 3                        | CPU cores used when on-the-fly re-extraction is required.                           |
    | `verbose`    | bool                     | `False`                  | Log per-clip decisions and re-extraction messages.                                  |
  </Tab>

  <Tab title="CosmosEmbed1EmbeddingStage">
    | Parameter         | Type                     | Default                  | Description                                                      |
    | ----------------- | ------------------------ | ------------------------ | ---------------------------------------------------------------- |
    | `model_dir`       | str                      | `"models/cosmos_embed1"` | Directory for model weights; downloaded on each node if missing. |
    | `variant`         | {"224p", "336p", "448p"} | `"336p"`                 | Resolution preset used by the model weights.                     |
    | `gpu_memory_gb`   | float                    | 20.0                     | Approximate GPU memory reservation per worker.                   |
    | `texts_to_verify` | list\[str] \| None       | `None`                   | Optional text prompts to score against the clip embedding.       |
    | `verbose`         | bool                     | `False`                  | Log setup and per-clip outcomes.                                 |
  </Tab>
</Tabs>

#### Outputs

* `clip.cosmos_embed1_frames` → temporary tensors used by the embedding stage
* `clip.cosmos_embed1_embedding` → final clip-level vector (NumPy array)
* Optional: `clip.cosmos_embed1_text_match`
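Since `clip.cosmos_embed1_embedding` is a NumPy array, downstream tasks such as near-duplicate detection reduce to vector comparisons. Here is a small, self-contained sketch of pairwise cosine-similarity deduplication over a batch of clip embeddings; the threshold value is an illustrative assumption, not a library default:

```python
import numpy as np

def find_near_duplicates(embeddings: np.ndarray, threshold: float = 0.98) -> list[tuple[int, int]]:
    """Return index pairs of clips whose embeddings exceed a cosine-similarity threshold."""
    # Normalize rows so the Gram matrix holds cosine similarities.
    norms = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = norms @ norms.T
    pairs = []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] >= threshold:
                pairs.append((i, j))
    return pairs

# Toy clip embeddings: clips 0 and 1 are nearly identical, clip 2 is distinct.
vecs = np.array([[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]])
dupes = find_near_duplicates(vecs)
```

For production-scale deduplication, use the dedicated workflow in [Duplicate Removal](/curate-video/process-data/dedup) rather than the brute-force pairwise loop above.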

## Troubleshooting

* Not enough frames for embeddings: Increase `target_fps` during frame extraction or adjust clip length so that the model receives the required number of frames.
* Out of memory during embedding: Lower `gpu_memory_gb`, reduce batch size if exposed, or use a smaller resolution variant.
* Weights not found on node: Confirm `model_dir` and network access. The stages download weights if missing.

## Next Steps

* Use embeddings for duplicate removal. Refer to [Duplicate Removal](/curate-video/process-data/dedup).
* Generate captions and previews for review workflows. Refer to [Captions & Preview](/curate-video/process-data/captions-preview).
