> Beginner-friendly tutorial for running your first Ray-based video-splitting pipeline using the Python example.

# Create a Video Pipeline

Learn the basics of creating a video pipeline in Curator by following a split-and-clip pipeline example.

```{contents} Tutorial Steps:
:local:
:depth: 2
```

## Before You Start

* Follow the [Get Started guide](/get-started/video) to install the package, prepare the model directory, and set up your data paths.

### Concepts and Mental Model

Use this overview to understand how stages pass data through the pipeline.

```mermaid
flowchart LR
  V[Videos] --> R[VideoReader]
  R --> S1[Split into clips]
  S1 --> T[Encode/Transcode]
  T --> F[Frame extraction]
  F --> E[Embeddings]
  T --> W[Write clips/metadata]
  E --> W
  classDef dim fill:#f6f8fa,stroke:#d0d7de,color:#24292f;
  class R,S1,T,F,E,W dim;
```

* **Pipeline**: An ordered list of stages that process data.
* **Stage**: A modular operation (for example, read, split, encode, embed, write).
* **Executor**: Runs the pipeline (Ray/Xenna backend).
* **Data units**: Input videos → clip windows → frames → embeddings + files.
* **Common choices**:
  * **Splitting**: fixed stride vs. scene-change (TransNetV2)
  * **Encoding**: `h264_nvenc` (NVENC-equipped GPU) or `libvpx-vp9` (CPU fallback for non-NVENC GPUs such as A100/H100)
  * **Embeddings**: Cosmos-Embed1
* **Outputs**: Clips (mp4), previews (optional), and parquet embeddings for downstream tasks (such as semantic duplicate removal).

For more information, refer to the [Video Concepts](/about/concepts/video) section.
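
To make the mental model concrete, here is a minimal pure-Python sketch of an ordered list of stages passing data units along. `ToyStage` and `ToyPipeline` are hypothetical stand-ins invented for illustration; they are not Curator classes, and a real executor distributes work across workers rather than looping in-process.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyStage:
    """Hypothetical stage: a name plus a function from one batch to the next."""
    name: str
    fn: Callable[[list], list]


@dataclass
class ToyPipeline:
    """Hypothetical pipeline: an ordered list of stages."""
    name: str
    stages: list = field(default_factory=list)

    def add_stage(self, stage: ToyStage) -> None:
        self.stages.append(stage)

    def run(self, inputs: list) -> list:
        # Each stage consumes the previous stage's output, in insertion order.
        data = inputs
        for stage in self.stages:
            data = stage.fn(data)
        return data


toy = ToyPipeline(name="toy_video_splitting")
# read: attach (made-up) metadata to each input path
toy.add_stage(ToyStage("read", lambda paths: [{"video": p, "duration_s": 25.0} for p in paths]))
# split: turn each video into clip windows of at most 10 seconds
toy.add_stage(ToyStage("split", lambda videos: [
    {"video": v["video"], "span": (start, min(start + 10.0, v["duration_s"]))}
    for v in videos
    for start in (0.0, 10.0, 20.0)
]))
# write: produce one label per clip instead of writing files
toy.add_stage(ToyStage("write", lambda clips: [
    f'{c["video"]}:{c["span"][0]:.0f}-{c["span"][1]:.0f}' for c in clips
]))

result = toy.run(["a.mp4"])
print(result)
```

The point of the sketch is the data flow: one input video becomes several clip windows, and each later stage only sees what the previous stage emitted.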

***

## 1. Define Imports and Paths

Import required classes and define paths used throughout the example.

```python
from nemo_curator.pipeline import Pipeline

from nemo_curator.stages.video.io.video_reader import VideoReader
from nemo_curator.stages.video.clipping.clip_extraction_stages import (
    FixedStrideExtractorStage,
    ClipTranscodingStage,
)
from nemo_curator.stages.video.clipping.clip_frame_extraction import (
    ClipFrameExtractionStage,
)
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
from nemo_curator.stages.video.embedding.cosmos_embed1 import (
    CosmosEmbed1FrameCreationStage,
    CosmosEmbed1EmbeddingStage,
)
from nemo_curator.stages.video.io.clip_writer import ClipWriterStage

VIDEO_DIR = "/path/to/videos"
MODEL_DIR = "/path/to/models"
OUT_DIR = "/path/to/output_clips"
```

## 2. Create the Pipeline

Instantiate a named pipeline to orchestrate the stages.

```python
pipeline = Pipeline(name="video_splitting", description="Split videos into clips")
```

## 3. Define Stages

Add modular stages to read, split, encode, extract frames, embed, and write outputs.

### Read Input Videos

Read videos from storage and extract metadata to prepare for clipping.

```python
pipeline.add_stage(
    VideoReader(input_video_path=VIDEO_DIR, video_limit=None, verbose=True)
)
```

### Split into Clips

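[Create clip windows](/curate-video/process-data/clipping) using fixed intervals or scene-change detection. The two options below are alternatives; add only one of them to the pipeline. `FixedStrideExtractorStage` cuts fixed-length windows, while `VideoFrameExtractionStage` followed by `TransNetV2ClipExtractionStage` cuts at detected scene changes.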

```python
pipeline.add_stage(
    FixedStrideExtractorStage(
        clip_len_s=10.0,
        clip_stride_s=10.0,
        min_clip_length_s=2.0,
        limit_clips=0,
    )
)
```

```python
from nemo_curator.stages.video.clipping.video_frame_extraction import VideoFrameExtractionStage
from nemo_curator.stages.video.clipping.transnetv2_extraction import TransNetV2ClipExtractionStage

pipeline.add_stage(VideoFrameExtractionStage(decoder_mode="pynvc", verbose=True))
pipeline.add_stage(
    TransNetV2ClipExtractionStage(
        model_dir=MODEL_DIR,
        threshold=0.4,
        min_length_s=2.0,
        max_length_s=10.0,
        max_length_mode="stride",
        crop_s=0.5,
        gpu_memory_gb=10,
        limit_clips=0,
        verbose=True,
    )
)
```
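
To build intuition for the fixed-stride parameters, the sketch below computes clip windows for a video of a given duration. It is an illustration only, not the stage's implementation: the helper name `fixed_stride_windows` is hypothetical, and it assumes that a trailing window shorter than `min_clip_length_s` is dropped.

```python
def fixed_stride_windows(
    duration_s: float,
    clip_len_s: float = 10.0,
    clip_stride_s: float = 10.0,
    min_clip_length_s: float = 2.0,
) -> list:
    """Return (start, end) spans in seconds. Hypothetical helper for illustration."""
    if clip_stride_s <= 0:
        raise ValueError("clip_stride_s must be positive")
    windows = []
    start = 0.0
    while start < duration_s:
        # The last window is truncated at the end of the video.
        end = min(start + clip_len_s, duration_s)
        # Assumption: windows shorter than the minimum length are discarded.
        if end - start >= min_clip_length_s:
            windows.append((start, end))
        start += clip_stride_s
    return windows


windows = fixed_stride_windows(25.0)     # 5-second tail is kept
short_tail = fixed_stride_windows(21.0)  # 1-second tail is dropped
print(windows)
print(short_tail)
```

With `clip_stride_s` equal to `clip_len_s`, as in the example above, windows do not overlap; a smaller stride would produce overlapping clips.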

### Encode Clips

Convert clip buffers using the selected encoder and settings. Choose `h264_nvenc` on NVENC-equipped GPUs or `libvpx-vp9` (CPU) on GPUs without NVENC such as A100/H100. Refer to [Clip Encoding](/curate-video/process-data/transcoding) for encoder details and NVENC setup.

```python
pipeline.add_stage(
    ClipTranscodingStage(
        num_cpus_per_worker=6.0,
        encoder="h264_nvenc",  # or "libvpx-vp9" for non-NVENC GPUs
        encoder_threads=1,
        encode_batch_size=16,
        use_hwaccel=True,
        use_input_bit_rate=False,
        num_clips_per_chunk=32,
        verbose=True,
    )
)
```
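
If the same script runs on different machines, the encoder choice can be made from the GPU name. The helper below is a hypothetical sketch, not part of Curator: it only knows the two GPU families named above as lacking NVENC, and it assumes every other GPU has NVENC, which you should verify for your hardware.

```python
import re

# GPU families this tutorial names as lacking NVENC. Extend the pattern as needed.
# Word boundaries avoid false matches such as "A1000".
_NO_NVENC = re.compile(r"\b(A100|H100)\b", re.IGNORECASE)


def pick_encoder(gpu_name: str) -> str:
    """Hypothetical helper: return an encoder string for ClipTranscodingStage."""
    if _NO_NVENC.search(gpu_name):
        return "libvpx-vp9"  # CPU fallback
    # Assumption: any GPU not listed above supports NVENC.
    return "h264_nvenc"


encoder = pick_encoder("NVIDIA A100-SXM4-80GB")
print(encoder)
```

The returned string can then be passed as the `encoder` argument shown in the stage configuration above.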

### Prepare Frames for Embeddings (Optional)

[Extract frames](/curate-video/process-data/frame-extraction) at target rates for downstream embedding models.

```python
pipeline.add_stage(
    ClipFrameExtractionStage(
        extraction_policies=(FrameExtractionPolicy.sequence,),
        extract_purposes=(FramePurpose.EMBEDDINGS,),
        target_res=(-1, -1),  # no resize
        verbose=True,
    )
)
```

### Generate Embeddings (Cosmos-Embed1)

Create Cosmos-Embed1-ready frames and compute clip-level embeddings.

```python
pipeline.add_stage(
    CosmosEmbed1FrameCreationStage(model_dir=MODEL_DIR, target_fps=2.0, verbose=True)
)
pipeline.add_stage(
    CosmosEmbed1EmbeddingStage(model_dir=MODEL_DIR, gpu_memory_gb=20.0, verbose=True)
)
```
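
As a rough illustration of what `target_fps=2.0` means, the sketch below picks evenly spaced frame indices from a clip. The helper `sample_frame_indices` is hypothetical, and the stage's actual sampling logic may differ; the sketch only shows the arithmetic of reducing a source frame rate to a target rate.

```python
import math


def sample_frame_indices(num_frames: int, source_fps: float, target_fps: float = 2.0) -> list:
    """Return indices of frames spaced to approximate target_fps. Illustration only."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Never step by less than one frame, so no index is repeated.
    step = max(source_fps / target_fps, 1.0)
    count = math.ceil(num_frames / step)
    return [int(i * step) for i in range(count)]


# A 10-second clip at 30 fps has 300 frames; at 2 fps we keep every 15th frame.
indices = sample_frame_indices(num_frames=300, source_fps=30.0, target_fps=2.0)
print(len(indices), indices[:3], indices[-1])
```

Under these assumptions, a 10-second clip contributes about 20 frames to the embedding model regardless of its original frame rate.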

### Write Clips and Metadata

Write clips, embeddings, and metadata to the output directory. Refer to [Save & Export](/curate-video/save-export) for a full list of parameters.

```python
pipeline.add_stage(
    ClipWriterStage(
        output_path=OUT_DIR,
        input_path=VIDEO_DIR,
        upload_clips=True,
        dry_run=False,
        generate_embeddings=True,
        generate_previews=False,
        generate_captions=False,
        embedding_algorithm="cosmos-embed1",
        caption_models=[],
        enhanced_caption_models=[],
        verbose=True,
    )
)
```

When running the example script instead of building the pipeline in Python, configure the writer with command-line flags. Append any of the optional flags listed in the comments:

```bash
# Optional writer flags:
#   --no-upload-clips      do not write mp4s
#   --dry-run              write nothing, validate only
#   --generate-embeddings  enable embedding outputs
#   --generate-captions    enable captions JSON
#   --generate-previews    enable .webp previews
python tutorials/video/getting-started/video_split_clip_example.py \
  --video-dir "$VIDEO_DIR" \
  --model-dir "$MODEL_DIR" \
  --output-clip-path "$OUT_DIR" \
  --generate-embeddings
```
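
The flags correspond closely to the `ClipWriterStage` arguments shown earlier. The sketch below shows one way such flags could be parsed and mapped to writer keyword arguments. It is an assumption-based illustration: `build_parser` and `writer_kwargs` are hypothetical names, and the example script's real parser may define defaults and options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in this tutorial."""
    parser = argparse.ArgumentParser(description="Sketch of writer-related flags")
    parser.add_argument("--video-dir", required=True)
    parser.add_argument("--model-dir", required=True)
    parser.add_argument("--output-clip-path", required=True)
    # store_false: clips are written unless --no-upload-clips is passed.
    parser.add_argument("--no-upload-clips", dest="upload_clips", action="store_false")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--generate-embeddings", action="store_true")
    parser.add_argument("--generate-captions", action="store_true")
    parser.add_argument("--generate-previews", action="store_true")
    return parser


def writer_kwargs(args: argparse.Namespace) -> dict:
    """Map parsed flags to the ClipWriterStage keyword names used above."""
    return {
        "output_path": args.output_clip_path,
        "input_path": args.video_dir,
        "upload_clips": args.upload_clips,
        "dry_run": args.dry_run,
        "generate_embeddings": args.generate_embeddings,
        "generate_captions": args.generate_captions,
        "generate_previews": args.generate_previews,
    }


args = build_parser().parse_args([
    "--video-dir", "/path/to/videos",
    "--model-dir", "/path/to/models",
    "--output-clip-path", "/path/to/output_clips",
    "--generate-embeddings",
])
kwargs = writer_kwargs(args)
print(kwargs)
```

Using `store_false` for `--no-upload-clips` keeps the default behavior (write clips) unless the user opts out, which matches how the flag is described above.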

## 4. Run the Pipeline

Run the configured pipeline. The executor (Ray/Xenna backend) runs the stages in the order you added them.

```python
pipeline.run()
```