*** description: >- Key abstractions in video curation including stages, pipelines, and execution modes for scalable processing categories: * concepts-architecture tags: * abstractions * pipeline * stages * video-curation * distributed * ray personas: * data-scientist-focused * mle-focused difficulty: intermediate content\_type: concept modality: video-only only: not ga *** # Key Abstractions NeMo Curator introduces core abstractions to organize and scale video curation workflows: * **Pipelines**: Ordered sequences of stages forming an end-to-end workflow. * **Stages**: Individual processing units that perform a single step (for example, reading, splitting, format conversion, filtering, embedding, captioning, writing). * **Tasks**: The unit of data that flows through a pipeline (for video, `VideoTask` holding a `Video` and its `Clip` objects). * **Executors**: Components that run pipelines on a backend (Ray) with automatic scaling. ![Stages and Pipelines](https://files.buildwithfern.com/nemo-curator.docs.buildwithfern.com/nemo/curator/3fc21d7c5143ec2fdd9672c83b1448d48bb22b6c0cb4c1caf55918c171ee5cee/assets/images/stages-pipelines-diagram.png) ## Pipelines A pipeline orchestrates stages into an end-to-end workflow. Key characteristics: * **Stage Sequence**: Stages must follow a logical order where each stage's output feeds into the next * **Input Configuration**: Specifies the data source location * **Stage Configuration**: Stages accept their own parameters, including model paths and algorithm settings * **Execution Mode**: Supports streaming and batch processing through the executor ## Stages A stage represents a single step in your data curation workflow. Video stages are organized into several functional categories: * **Input/Output**: Read video files and write processed outputs to storage ([Save & Export Documentation](/curate-video/save-export)) * **Video Clipping**: Split videos into clips using fixed stride or scene-change detection ([Video Clipping Documentation](/curate-video/process-data/clipping)) * **Frame Extraction**: Extract frames from videos or clips for analysis and embeddings ([Frame Extraction Documentation](/curate-video/process-data/frame-extraction)) * **Embedding Generation**: Generate clip-level embeddings using Cosmos-Embed1 models ([Embeddings Documentation](/curate-video/process-data/embeddings)) * **Filtering**: Filter clips based on motion analysis and aesthetic quality scores ([Filtering Documentation](/curate-video/process-data/filtering)) * **Caption and Preview**: Generate captions and preview images from video clips ([Captions & Preview Documentation](/curate-video/process-data/captions-preview)) * **Deduplication**: Remove near-duplicate clips using embedding-based clustering ([Duplicate Removal Documentation](/curate-video/process-data/dedup)) ### Stage Architecture Each processing stage: 1. Inherits from `ProcessingStage` 2. Declares a stable `name` and `resources: Resources` (CPU cores, GPU memory, entire GPU flag, or multiple GPUs) 3. Defines `inputs()`/`outputs()` to document required attributes and produced attributes on tasks 4. Implements `setup(worker_metadata)` for model initialization and `process(task)` to transform tasks This design enables map-style execution with executor-managed fault tolerance and dynamic scaling per stage. Stages can optionally provide `process_batch()` to support vectorized batch processing. Composite stages provide a user-facing convenience API and decompose into one or more execution stages at build time. ```python class MyStage(ProcessingStage[X, Y]): name: str = "..." resources: Resources = Resources(...) def inputs(self) -> tuple[list[str], list[str]]: ... def outputs(self) -> tuple[list[str], list[str]]: ... def setup(self, worker_metadata: WorkerMetadata | None = None) -> None: ... def process(self, task: X) -> Y | list[Y]: ... ``` Refer to the stage base and resources definitions in Curator for full details. ### Resource Semantics `Resources` support both fractional and whole‑GPU semantics: * `gpu_memory_gb`: Request a fraction of a single GPU by memory; Curator rounds to a fractional GPU share and enforces that `gpu_memory_gb` stays within one device. * `entire_gpu`: Request an entire GPU regardless of memory (also implies access to hardware decoders and encoders on that device). * `gpus`: Request more than one GPU for a stage that is multi‑GPU aware. Choose one of `gpu_memory_gb` (single‑GPU fractional) or `gpus` (multi‑GPU). Combining both is not allowed. ## Tasks Video pipelines operate on task types defined in Curator: * `VideoTask`: Wraps a single input `Video` * `Video`: Holds decoded metadata, frames, and lists of `Clip` * `Clip`: Holds buffers, extracted frames, embeddings, and caption windows Stages transform tasks stage by stage (for example, `VideoReader` populates `Video`, splitting stages create `Clip` objects, embedding and captioning stages annotate clips, and writer stages persist outputs). ## Executors Executors run pipelines on a backend. Curator uses [`XennaExecutor`](https://github.com/nvidia-cosmos/cosmos-xenna) to translate `ProcessingStage` definitions into Cosmos-Xenna stage specifications and run them on Ray with automatic scaling. Execution modes include streaming (default) and batch.