Embeddings

Generate clip-level embeddings for search, question answering, filtering, and duplicate removal.

Use Cases

Prepare semantic vectors for search, clustering, and near-duplicate detection.
Score optional text prompts against clip content.
Enable downstream filtering or retrieval tasks that need clip-level vectors.
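Once clip vectors are collected (for example from clip.cosmos_embed1_embedding), search and near-duplicate detection both come down to comparing vectors. The sketch below assumes the vectors are stacked into a NumPy matrix with one row per clip and compares them with cosine similarity. The function names, the toy vectors, and the 0.95 threshold are hypothetical illustrations, not NeMo Curator APIs, and Curator's own duplicate-removal workflow may work differently.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors


def top_k_search(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k corpus rows most similar to the query (hypothetical helper)."""
    scores = l2_normalize(corpus) @ l2_normalize(query[None, :])[0]
    return np.argsort(-scores)[:k].tolist()  # highest similarity first


def near_duplicate_pairs(embeddings: np.ndarray, threshold: float = 0.95) -> list[tuple[int, int]]:
    """Return (i, j) pairs, i < j, whose cosine similarity is at least `threshold`."""
    normed = l2_normalize(embeddings)
    sims = normed @ normed.T
    rows, cols = np.triu_indices(len(embeddings), k=1)  # upper triangle, no diagonal
    keep = sims[rows, cols] >= threshold
    return list(zip(rows[keep].tolist(), cols[keep].tolist()))


# Toy 3-dimensional vectors standing in for real clip embeddings.
clip_vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.99, 0.01, 0.0],
    [0.0, 1.0, 0.0],
])
print(top_k_search(np.array([1.0, 0.0, 0.0]), clip_vectors, k=2))  # [0, 1]
print(near_duplicate_pairs(clip_vectors))  # [(0, 1)]
```

The pairwise similarity matrix grows quadratically with the number of clips, so large collections usually call for approximate or cluster-based search instead of this brute-force comparison.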

Before You Start

Create clips upstream. Refer to Clipping.
Provide frames for embeddings or sample at the required rate. Refer to Frame Extraction.
Make model weights available on each node. If the weights are missing, the stages download them.

Quickstart

Use the pipeline stages or the example script flags to generate clip-level embeddings.

Pipeline Stage

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
from nemo_curator.stages.video.embedding.cosmos_embed1 import (
    CosmosEmbed1FrameCreationStage,
    CosmosEmbed1EmbeddingStage,
)

# Clips must already exist (see Clipping); in a full pipeline, add the
# video reading and clipping stages before the stages below.
pipe = Pipeline(name="video_embeddings_example")

# Extract frames from each clip for the embeddings purpose.
pipe.add_stage(
    ClipFrameExtractionStage(
        extraction_policies=(FrameExtractionPolicy.sequence,),
        extract_purposes=(FramePurpose.EMBEDDINGS,),
        target_res=(-1, -1),
        verbose=True,
    )
)

# Format frames into model-ready tensors, then compute clip-level embeddings.
# Use the same model_dir and variant in both Cosmos-Embed1 stages.
pipe.add_stage(CosmosEmbed1FrameCreationStage(model_dir="/models", variant="224p", target_fps=2.0, verbose=True))
pipe.add_stage(CosmosEmbed1EmbeddingStage(model_dir="/models", variant="224p", gpu_memory_gb=20.0, verbose=True))
pipe.run()

Embedding Options

Cosmos-Embed1

Add CosmosEmbed1FrameCreationStage to transform extracted frames into model-ready tensors.

from nemo_curator.stages.video.embedding.cosmos_embed1 import (
    CosmosEmbed1FrameCreationStage,
    CosmosEmbed1EmbeddingStage,
)

frames = CosmosEmbed1FrameCreationStage(
    model_dir="/models",
    variant="224p",  # or "336p", "448p"
    target_fps=2.0,
    verbose=True,
)

Add CosmosEmbed1EmbeddingStage to generate clip.cosmos_embed1_embedding and optional clip.cosmos_embed1_text_match.

embed = CosmosEmbed1EmbeddingStage(
    model_dir="/models",
    variant="224p",
    gpu_memory_gb=20.0,
    verbose=True,
)

Parameters

CosmosEmbed1FrameCreationStage

Parameter	Type	Default	Description
`model_dir`	str	`"models/cosmos_embed1"`	Directory for model utilities and configs used to format input frames.
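`variant`	str	`"336p"`	Resolution preset (`"224p"`, `"336p"`, or `"448p"`) that controls the model’s expected input size.
`target_fps`	float	2.0	Frame sampling rate, in frames per second, used to select frames; the stage may re-extract frames at a higher FPS if the clip does not yield enough frames.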
`num_cpus`	int	3	CPU cores used when on-the-fly re-extraction is required.
`verbose`	bool	`False`	Log per-clip decisions and re-extraction messages.
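The `target_fps` parameter decides how densely frames are sampled, which in turn decides whether a clip yields enough frames for the model. The sketch below illustrates that arithmetic with two hypothetical helpers: `sample_frame_indices` keeps roughly one source frame every 1/target_fps seconds, and `pick_sampling_fps` doubles the rate until the clip yields enough frames, mimicking the "re-extract at a higher FPS" behavior described in the table. This is not Curator's implementation, and the required count of 8 frames is an assumption; the real value depends on the Cosmos-Embed1 model configuration.

```python
import math
from typing import Optional


def sample_frame_indices(num_frames: int, source_fps: float, target_fps: float) -> list[int]:
    """Keep about one source frame every 1/target_fps seconds (hypothetical helper)."""
    if num_frames <= 0 or source_fps <= 0 or target_fps <= 0:
        raise ValueError("num_frames and fps values must be positive")
    target_fps = min(target_fps, source_fps)  # cannot sample faster than the source
    num_samples = max(1, math.floor(num_frames * target_fps / source_fps))
    step = source_fps / target_fps  # source frames between consecutive samples
    return [min(num_frames - 1, math.floor(i * step)) for i in range(num_samples)]


def pick_sampling_fps(num_frames: int, source_fps: float, target_fps: float,
                      required_frames: int) -> Optional[float]:
    """Double the sampling rate until the clip yields `required_frames` frames."""
    fps = min(target_fps, source_fps)
    while True:
        if len(sample_frame_indices(num_frames, source_fps, fps)) >= required_frames:
            return fps
        if fps >= source_fps:
            return None  # too short even when every source frame is used
        fps = min(fps * 2, source_fps)


REQUIRED_FRAMES = 8  # assumption: the real count comes from the model config

# A 5-second clip at 30 fps has 150 frames; sampling at 2 fps keeps every 15th frame.
print(sample_frame_indices(150, 30.0, 2.0))
print(pick_sampling_fps(150, 30.0, 2.0, REQUIRED_FRAMES))  # 2.0: 10 frames suffice
print(pick_sampling_fps(90, 30.0, 2.0, REQUIRED_FRAMES))   # 4.0: 6 frames at 2 fps is too few
print(pick_sampling_fps(5, 30.0, 2.0, REQUIRED_FRAMES))    # None: clip is too short
```

The same arithmetic explains the "not enough frames" case: the number of sampled frames is roughly clip duration multiplied by the sampling rate, so short clips need either a higher rate or a longer clip.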

Outputs

clip.cosmos_embed1_frames → temporary tensors used by the embedding stage
clip.cosmos_embed1_embedding → final clip-level vector (NumPy array)
Optional: clip.cosmos_embed1_text_match
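The optional text match scores text prompts against clip content. A common way to compute such a score, shown here as an assumption rather than Curator's documented method, is cosine similarity between the clip embedding and each prompt's text embedding, followed by a softmax over the prompts. The `text_match` function, its `logit_scale` temperature value, and the toy two-dimensional vectors are all hypothetical.

```python
import numpy as np


def text_match(clip_embedding: np.ndarray, text_embeddings: np.ndarray,
               prompts: list[str], logit_scale: float = 100.0) -> tuple[str, float]:
    """Return the best-matching prompt and its softmax probability (hypothetical helper)."""
    clip_n = clip_embedding / np.linalg.norm(clip_embedding)
    text_n = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    logits = logit_scale * (text_n @ clip_n)  # scaled cosine similarity per prompt
    probs = np.exp(logits - logits.max())     # subtract the max for numerical stability
    probs /= probs.sum()
    best = int(np.argmax(probs))
    return prompts[best], float(probs[best])


prompts = ["a car driving on a highway", "a person cooking in a kitchen"]
clip_embedding = np.array([1.0, 0.0])                 # toy clip vector
text_embeddings = np.array([[0.9, 0.1], [0.0, 1.0]])  # toy prompt vectors
best_prompt, prob = text_match(clip_embedding, text_embeddings, prompts)
print(best_prompt, round(prob, 3))
```

A larger `logit_scale` makes the softmax sharper, so the top prompt's probability approaches 1 whenever its similarity clearly exceeds the others.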

Troubleshooting

Not enough frames for embeddings: Increase target_fps during frame extraction or adjust clip length so that the model receives the required number of frames.
Out of memory during embedding: Lower gpu_memory_gb, reduce the batch size if the stage exposes one, or use a smaller resolution variant such as 224p.
Weights not found on node: Confirm that model_dir points to the intended location and that the node has network access; the stages download weights if they are missing.

Next Steps

Use embeddings for duplicate removal. Refer to Duplicate Removal.
Generate captions and previews for review workflows. Refer to Captions & Preview.