Copyright © 2026, NVIDIA Corporation.

Embeddings

Generate clip-level embeddings for search, question answering, filtering, and duplicate removal.

Use Cases

  • Prepare semantic vectors for search, clustering, and near-duplicate detection.
  • Score optional text prompts against clip content.
  • Enable downstream filtering or retrieval tasks that need clip-level vectors.
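
As an illustration of the search use case, the following minimal sketch ranks clip vectors by cosine similarity to a query vector using NumPy. The toy 4-dimensional vectors and the helper names `l2_normalize` and `top_k_similar` are assumptions made for this example; they are not part of NeMo Curator, and real embeddings have many more dimensions.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length; all-zero rows are left as zeros."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.where(norms == 0.0, 1.0, norms)


def top_k_similar(query: np.ndarray, clip_vectors: np.ndarray, k: int = 3) -> list:
    """Return (row index, cosine similarity) for the k rows closest to query."""
    unit_clips = l2_normalize(clip_vectors.astype(np.float64))
    unit_query = l2_normalize(query.astype(np.float64)[None, :])[0]
    scores = unit_clips @ unit_query      # cosine similarity per clip
    order = np.argsort(-scores)[:k]       # highest similarity first
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for real clip embeddings (assumption).
clip_vectors = np.array(
    [
        [1.0, 0.0, 0.0, 0.0],
        [0.9, 0.1, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
)
results = top_k_similar(clip_vectors[0], clip_vectors, k=2)
print(results)
```

Normalizing both sides first means the dot product equals cosine similarity, so clips of different vector magnitude are compared on direction alone.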

Before You Start

  • Create clips upstream. Refer to Clipping.
  • Provide frames for embeddings or sample at the required rate. Refer to Frame Extraction.
  • Ensure each node can access the model weights (the stages download weights if missing).

Quickstart

Use the pipeline stages or the example script flags to generate clip-level embeddings.

Pipeline Stage

    from nemo_curator.pipeline import Pipeline
    from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
    from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
    from nemo_curator.stages.video.embedding.cosmos_embed1 import (
        CosmosEmbed1FrameCreationStage,
        CosmosEmbed1EmbeddingStage,
    )

    pipe = Pipeline(name="video_embeddings_example")

    # Extract frames from each clip for the embedding model.
    pipe.add_stage(
        ClipFrameExtractionStage(
            extraction_policies=(FrameExtractionPolicy.sequence,),
            extract_purposes=(FramePurpose.EMBEDDINGS,),
            target_res=(-1, -1),
            verbose=True,
        )
    )

    # Convert the extracted frames into model-ready tensors.
    pipe.add_stage(
        CosmosEmbed1FrameCreationStage(
            model_dir="/models",
            variant="224p",
            target_fps=2.0,
            verbose=True,
        )
    )

    # Generate the clip-level embeddings.
    pipe.add_stage(
        CosmosEmbed1EmbeddingStage(
            model_dir="/models",
            variant="224p",
            gpu_memory_gb=20.0,
            verbose=True,
        )
    )
    pipe.run()

Embedding Options

Cosmos-Embed1

  1. Add CosmosEmbed1FrameCreationStage to transform extracted frames into model-ready tensors.

    from nemo_curator.stages.video.embedding.cosmos_embed1 import (
        CosmosEmbed1FrameCreationStage,
        CosmosEmbed1EmbeddingStage,
    )

    frames = CosmosEmbed1FrameCreationStage(
        model_dir="/models",
        variant="224p",  # or "336p", "448p"
        target_fps=2.0,
        verbose=True,
    )

  2. Add CosmosEmbed1EmbeddingStage to generate clip.cosmos_embed1_embedding and optional clip.cosmos_embed1_text_match.

    embed = CosmosEmbed1EmbeddingStage(
        model_dir="/models",
        variant="224p",
        gpu_memory_gb=20.0,
        verbose=True,
    )

Parameters

CosmosEmbed1FrameCreationStage

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model_dir | str | "models/cosmos_embed1" | Directory for model utilities and configs used to format input frames. |
| variant | "224p", "336p", or "448p" | "336p" | Resolution preset that controls the model's expected input size. |
| target_fps | float | 2.0 | Source sampling rate used to select frames; may re-extract at higher FPS if needed. |
| num_cpus | int | 3 | CPU cores used when on-the-fly re-extraction is required. |
| verbose | bool | False | Log per-clip decisions and re-extraction messages. |

Outputs

  • clip.cosmos_embed1_frames → temporary tensors used by the embedding stage
  • clip.cosmos_embed1_embedding → final clip-level vector (NumPy array)
  • Optional: clip.cosmos_embed1_text_match
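
The following sketch shows one way the clip-level vectors might be consumed downstream: stack them into a matrix and flag near-duplicate pairs by cosine similarity. `ClipRecord` is a hypothetical stand-in that only mirrors the `cosmos_embed1_embedding` attribute named above; it is not the NeMo Curator clip class. The helper names and the 0.95 threshold are likewise assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class ClipRecord:
    """Hypothetical stand-in for a processed clip; not a NeMo Curator class."""

    clip_id: str
    cosmos_embed1_embedding: Optional[np.ndarray] = None


def stack_embeddings(clips):
    """Return (ids, matrix) for the clips that carry an embedding."""
    kept = [c for c in clips if c.cosmos_embed1_embedding is not None]
    ids = [c.clip_id for c in kept]
    if not kept:
        return ids, np.empty((0, 0), dtype=np.float32)
    matrix = np.stack(
        [np.asarray(c.cosmos_embed1_embedding, dtype=np.float32).ravel() for c in kept]
    )
    return ids, matrix


def near_duplicate_pairs(ids, matrix, threshold=0.95):
    """Return id pairs whose cosine similarity is at or above threshold."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = matrix / np.where(norms == 0.0, 1.0, norms)
    sims = unit @ unit.T
    rows, cols = np.triu_indices(len(ids), k=1)  # each unordered pair once
    return [(ids[i], ids[j]) for i, j in zip(rows, cols) if sims[i, j] >= threshold]


# Toy 3-dimensional vectors; clip "d" has no embedding and is skipped.
clips = [
    ClipRecord("a", np.array([1.0, 0.0, 0.0])),
    ClipRecord("b", np.array([0.99, 0.05, 0.0])),
    ClipRecord("c", np.array([0.0, 1.0, 0.0])),
    ClipRecord("d"),
]
ids, matrix = stack_embeddings(clips)
pairs = near_duplicate_pairs(ids, matrix)
print(ids, matrix.shape, pairs)
```

The pairwise comparison is quadratic in the number of clips, so this approach suits small batches or spot checks rather than large-scale duplicate removal.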

Troubleshooting

  • Not enough frames for embeddings: Increase target_fps during frame extraction or adjust clip length so that the model receives the required number of frames.
  • Out of memory during embedding: Lower gpu_memory_gb, reduce batch size if exposed, or use a smaller resolution variant.
  • Weights not found on node: Confirm model_dir and network access. The stages download weights if missing.
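
For the "not enough frames" case, a rough back-of-the-envelope check can help pick a sampling rate before rerunning the pipeline. The sketch below is plain arithmetic, not NeMo Curator code: `REQUIRED_FRAMES = 8` is an assumed placeholder for the number of frames the model needs (check the requirement for your variant), and the actual frame count produced by extraction may differ by one depending on how the first and last frames are handled.

```python
import math

# Assumed placeholder; confirm the real requirement for your model variant.
REQUIRED_FRAMES = 8


def frames_sampled(clip_duration_s: float, target_fps: float) -> int:
    """Approximate number of frames obtained by sampling a clip at target_fps."""
    return math.floor(clip_duration_s * target_fps)


def min_fps_for(clip_duration_s: float, required_frames: int) -> float:
    """Smallest sampling rate that yields required_frames over the clip."""
    return required_frames / clip_duration_s


duration_s = 3.0
available = frames_sampled(duration_s, target_fps=2.0)
if available < REQUIRED_FRAMES:
    needed_fps = min_fps_for(duration_s, REQUIRED_FRAMES)
    print(f"{available} frames at 2.0 FPS; need at least {needed_fps:.2f} FPS")
else:
    print(f"{available} frames at 2.0 FPS is sufficient")
```

Under these assumptions, a 3-second clip sampled at 2.0 FPS yields about 6 frames, so either a higher target_fps or a longer clip would be needed to reach 8.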

Next Steps

  • Use embeddings for duplicate removal. Refer to Duplicate Removal.
  • Generate captions and previews for review workflows. Refer to Captions & Preview.