---
description: Generate clip-level embeddings using Cosmos-Embed1
categories:
  - video-curation
tags:
  - embeddings
  - cosmos-embed1
  - video
personas:
  - data-scientist-focused
  - mle-focused
difficulty: intermediate
content_type: howto
modality: video-only
---

# Embeddings

Generate clip-level embeddings for search, question answering, filtering, and duplicate removal.

## Use Cases

* Prepare semantic vectors for search, clustering, and near-duplicate detection.
* Score optional text prompts against clip content.
* Enable downstream filtering or retrieval tasks that need clip-level vectors.

## Before You Start

* Create clips upstream. Refer to [Clipping](/curate-video/process-data/clipping).
* Provide frames for embeddings, or sample frames at the required rate. Refer to [Frame Extraction](/curate-video/process-data/frame-extraction).
* Ensure model weights are accessible on each node (the stages download weights if missing).

---

## Quickstart

Use the pipeline stages or the example script flags to generate clip-level embeddings.

```python
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
from nemo_curator.stages.video.embedding.cosmos_embed1 import (
    CosmosEmbed1FrameCreationStage,
    CosmosEmbed1EmbeddingStage,
)

pipe = Pipeline(name="video_embeddings_example")
pipe.add_stage(
    ClipFrameExtractionStage(
        extraction_policies=(FrameExtractionPolicy.sequence,),
        extract_purposes=(FramePurpose.EMBEDDINGS,),
        target_res=(-1, -1),
        verbose=True,
    )
)
pipe.add_stage(CosmosEmbed1FrameCreationStage(model_dir="/models", variant="224p", target_fps=2.0, verbose=True))
pipe.add_stage(CosmosEmbed1EmbeddingStage(model_dir="/models", variant="224p", gpu_memory_gb=20.0, verbose=True))
pipe.run()
```

```bash
# Cosmos-Embed1 (224p)
python tutorials/video/getting-started/video_split_clip_example.py \
  ... \
  --generate-embeddings \
  --embedding-algorithm cosmos-embed1-224p \
  --embedding-gpu-memory-gb 20.0
```

## Embedding Options

### Cosmos-Embed1

1. Add `CosmosEmbed1FrameCreationStage` to transform extracted frames into model-ready tensors.

   ```python
   from nemo_curator.stages.video.embedding.cosmos_embed1 import (
       CosmosEmbed1FrameCreationStage,
       CosmosEmbed1EmbeddingStage,
   )

   frames = CosmosEmbed1FrameCreationStage(
       model_dir="/models",
       variant="224p",  # or "336p", "448p"
       target_fps=2.0,
       verbose=True,
   )
   ```

2. Add `CosmosEmbed1EmbeddingStage` to generate `clip.cosmos_embed1_embedding` and the optional `clip.cosmos_embed1_text_match`.

   ```python
   embed = CosmosEmbed1EmbeddingStage(
       model_dir="/models",
       variant="224p",
       gpu_memory_gb=20.0,
       verbose=True,
   )
   ```

#### Parameters

`CosmosEmbed1FrameCreationStage`:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model_dir` | str | `"models/cosmos_embed1"` | Directory for model utilities and configs used to format input frames. |
| `variant` | {"224p", "336p", "448p"} | `"336p"` | Resolution preset that controls the model's expected input size. |
| `target_fps` | float | `2.0` | Source sampling rate used to select frames; may re-extract at a higher FPS if needed. |
| `num_cpus` | int | `3` | CPU cores used when on-the-fly re-extraction is required. |
| `verbose` | bool | `False` | Log per-clip decisions and re-extraction messages. |

`CosmosEmbed1EmbeddingStage`:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model_dir` | str | `"models/cosmos_embed1"` | Directory for model weights; downloaded on each node if missing. |
| `variant` | {"224p", "336p", "448p"} | `"336p"` | Resolution preset used by the model weights. |
| `gpu_memory_gb` | int | `20` | Approximate GPU memory reservation per worker. |
| `texts_to_verify` | list[str] \| None | `None` | Optional text prompts to score against the clip embedding. |
| `verbose` | bool | `False` | Log setup and per-clip outcomes. |

#### Outputs

* `clip.cosmos_embed1_frames` → temporary tensors used by the embedding stage
* `clip.cosmos_embed1_embedding` → final clip-level vector (NumPy array)
* Optional: `clip.cosmos_embed1_text_match`

## Troubleshooting

* **Not enough frames for embeddings**: Increase `target_fps` during frame extraction, or adjust clip length so the model receives the required number of frames.
* **Out of memory during embedding**: Lower `gpu_memory_gb`, reduce the batch size if exposed, or use a smaller resolution variant.
* **Weights not found on a node**: Confirm `model_dir` and network access. The stages download weights if missing.

## Next Steps

* Use embeddings for duplicate removal. Refer to [Duplicate Removal](/curate-video/process-data/dedup).
* Generate captions and previews for review workflows. Refer to [Captions & Preview](/curate-video/process-data/captions-preview).
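As a minimal sketch of what the duplicate-removal step consumes: each clip carries a `clip.cosmos_embed1_embedding` NumPy vector, and near-duplicates can be flagged by pairwise cosine similarity. The `cosine_similarity` and `find_near_duplicates` helpers and the `0.95` threshold below are illustrative, not part of NeMo Curator.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def find_near_duplicates(
    embeddings: dict[str, np.ndarray], threshold: float = 0.95
) -> list[tuple[str, str, float]]:
    """Return (clip_id, clip_id, score) pairs whose similarity meets the threshold."""
    ids = list(embeddings)
    pairs = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            score = cosine_similarity(embeddings[ids[i]], embeddings[ids[j]])
            if score >= threshold:
                pairs.append((ids[i], ids[j], score))
    return pairs
```

The O(n²) pairwise loop is fine for small batches; for large corpora, an approximate nearest-neighbor index is the usual choice — refer to the Duplicate Removal guide above.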