---
layout: overview
slug: nemo-curator/nemo_curator/tasks/video
title: nemo_curator.tasks.video
---

## Module Contents

### Classes

| Name                                                       | Description                                                                          |
| ---------------------------------------------------------- | ------------------------------------------------------------------------------------ |
| [`Clip`](#nemo_curator-tasks-video-Clip)                   | Container for video clip data including metadata, frames, and processing results.    |
| [`ClipStats`](#nemo_curator-tasks-video-ClipStats)         | Statistics for video clips including filtering, transcoding, and captioning results. |
| [`Video`](#nemo_curator-tasks-video-Video)                 | Container for video content including metadata, frames, and processing results.      |
| [`VideoMetadata`](#nemo_curator-tasks-video-VideoMetadata) | Metadata for video content including dimensions, timing, and codec information.      |
| [`VideoTask`](#nemo_curator-tasks-video-VideoTask)         | Task for processing a single video.                                                  |
| [`_Window`](#nemo_curator-tasks-video-_Window)             | Container for video window data including metadata, frames, and processing results.  |

### API

<Anchor id="nemo_curator-tasks-video-Clip">
  <CodeBlock links={{"nemo_curator.tasks.video._Window":"#nemo_curator-tasks-video-_Window"}} showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.tasks.video.Clip(
        uuid: uuid.UUID,
        source_video: str,
        span: tuple[float, float],
        buffer: bytes | None = None,
        extracted_frames: dict[str, numpy.typing.NDArray[numpy.uint8]] = dict(),
        decoded_motion_data: None = None,
        motion_score_global_mean: float | None = None,
        motion_score_per_patch_min_256: float | None = None,
        aesthetic_score: float | None = None,
        cosmos_embed1_frames: numpy.typing.NDArray[numpy.float32] | None = None,
        cosmos_embed1_embedding: numpy.typing.NDArray[numpy.float32] | None = None,
        windows: list[nemo_curator.tasks.video._Window] = list(),
        egomotion: dict[str, bytes] = dict(),
        cosmos_embed1_text_match: tuple[str, float] | None = None,
        errors: dict[str, str] = dict()
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Badge>
    Dataclass
  </Badge>

  Container for video clip data including metadata, frames, and processing results.

  This class stores information about a video segment, including its source, timing,
  extracted frames, motion data, aesthetic scores, and generated captions.

  <ParamField path="aesthetic_score" type="float | None = None" />

  <ParamField path="buffer" type="bytes | None = None" />

  <ParamField path="cosmos_embed1_embedding" type="NDArray[float32] | None = None" />

  <ParamField path="cosmos_embed1_frames" type="NDArray[float32] | None = None" />

  <ParamField path="cosmos_embed1_text_match" type="tuple[str, float] | None = None" />

  <ParamField path="decoded_motion_data" type="None = None" />

  <ParamField path="duration" type="float">
    Calculate the duration of the clip.
  </ParamField>

  <ParamField path="egomotion" type="dict[str, bytes] = field(default_factory=dict)" />

  <ParamField path="errors" type="dict[str, str] = field(default_factory=dict)" />

  <ParamField path="extracted_frames" type="dict[str, NDArray[uint8]] = field(default_factory=dict)" />

  <ParamField path="motion_score_global_mean" type="float | None = None" />

  <ParamField path="motion_score_per_patch_min_256" type="float | None = None" />

  <ParamField path="source_video" type="str" />

  <ParamField path="span" type="tuple[float, float]" />

  <ParamField path="uuid" type="UUID" />

  <ParamField path="windows" type="list[_Window] = field(default_factory=list)" />

  <Anchor id="nemo_curator-tasks-video-Clip-extract_metadata">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.tasks.video.Clip.extract_metadata() -> dict[str, typing.Any] | None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Extract metadata from the clip's buffer.

    **Returns:** `dict[str, Any] | None`

    A dictionary containing the extracted metadata (width, height, framerate, and other fields), or `None`.

    **Raises:**

    * `Exception`: Any exception from extract\_video\_metadata is propagated.
  </Indent>

  <Anchor id="nemo_curator-tasks-video-Clip-get_major_size">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.tasks.video.Clip.get_major_size() -> int
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Calculate total memory size of the clip.

    **Returns:** `int`

    Total size in bytes.
  </Indent>
</Indent>
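
To illustrate how `span` and `duration` relate, here is a minimal sketch. `ClipSketch` is a hypothetical stand-in that mirrors three of the documented fields; it is not the library class. The source only says that `duration` calculates the duration of the clip, so the formula used here (end of `span` minus start of `span`) is an assumption.

```python
import uuid
from dataclasses import dataclass


@dataclass
class ClipSketch:
    """Hypothetical stand-in mirroring a few documented Clip fields."""

    uuid: uuid.UUID
    source_video: str
    span: tuple[float, float]

    @property
    def duration(self) -> float:
        # Assumption: duration is the end of the span minus its start.
        return self.span[1] - self.span[0]


clip = ClipSketch(uuid=uuid.uuid4(), source_video="example.mp4", span=(2.0, 7.5))
print(clip.duration)  # 5.5
```

Exposing `duration` as a read-only property keeps it consistent with `span`, which matches how the reference lists it alongside the stored fields without a default value.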

<Anchor id="nemo_curator-tasks-video-ClipStats">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.tasks.video.ClipStats(
        num_filtered_by_motion: int = 0,
        num_filtered_by_aesthetic: int = 0,
        num_passed: int = 0,
        num_transcoded: int = 0,
        num_with_embeddings: int = 0,
        num_with_caption: int = 0,
        num_with_webp: int = 0,
        total_clip_duration: float = 0.0,
        max_clip_duration: float = 0.0
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Badge>
    Dataclass
  </Badge>

  Statistics for video clips including filtering, transcoding, and captioning results.

  This class accumulates statistics about the number of clips processed through
  different stages of the video processing pipeline, including motion filtering,
  aesthetic filtering, and captioning.

  <ParamField path="max_clip_duration" type="float = 0.0" />

  <ParamField path="num_filtered_by_aesthetic" type="int = 0" />

  <ParamField path="num_filtered_by_motion" type="int = 0" />

  <ParamField path="num_passed" type="int = 0" />

  <ParamField path="num_transcoded" type="int = 0" />

  <ParamField path="num_with_caption" type="int = 0" />

  <ParamField path="num_with_embeddings" type="int = 0" />

  <ParamField path="num_with_webp" type="int = 0" />

  <ParamField path="total_clip_duration" type="float = 0.0" />

  <Anchor id="nemo_curator-tasks-video-ClipStats-combine">
    <CodeBlock links={{"nemo_curator.tasks.video.ClipStats":"#nemo_curator-tasks-video-ClipStats"}} showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.tasks.video.ClipStats.combine(
          other: nemo_curator.tasks.video.ClipStats
      ) -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Combine the statistics from another ClipStats object into this one.

    **Parameters:**

    <ParamField path="other" type="ClipStats">
      ClipStats object to combine with.
    </ParamField>
  </Indent>
</Indent>
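
The following sketch shows one plausible way `combine` could merge two sets of statistics. `ClipStatsSketch` is a hypothetical stand-in with a subset of the documented fields. The merge rule is an assumption, not confirmed by the source: counters and the total duration are summed, and `max_clip_duration` keeps the larger value. The in-place update follows from the documented `None` return type.

```python
from dataclasses import dataclass


@dataclass
class ClipStatsSketch:
    """Hypothetical stand-in with a subset of the documented ClipStats fields."""

    num_filtered_by_motion: int = 0
    num_passed: int = 0
    total_clip_duration: float = 0.0
    max_clip_duration: float = 0.0

    def combine(self, other: "ClipStatsSketch") -> None:
        # Assumption: counters and total duration add up,
        # while the maximum duration keeps the larger of the two values.
        self.num_filtered_by_motion += other.num_filtered_by_motion
        self.num_passed += other.num_passed
        self.total_clip_duration += other.total_clip_duration
        self.max_clip_duration = max(self.max_clip_duration, other.max_clip_duration)


a = ClipStatsSketch(num_filtered_by_motion=1, num_passed=3,
                    total_clip_duration=12.0, max_clip_duration=5.0)
b = ClipStatsSketch(num_filtered_by_motion=2, num_passed=4,
                    total_clip_duration=20.0, max_clip_duration=8.0)
a.combine(b)  # updates `a` in place and returns None
print(a.num_passed, a.max_clip_duration)  # 7 8.0
```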

<Anchor id="nemo_curator-tasks-video-Video">
  <CodeBlock links={{"nemo_curator.tasks.video.VideoMetadata":"#nemo_curator-tasks-video-VideoMetadata","nemo_curator.tasks.video.Clip":"#nemo_curator-tasks-video-Clip","nemo_curator.tasks.video.ClipStats":"#nemo_curator-tasks-video-ClipStats"}} showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.tasks.video.Video(
        input_video: pathlib.Path,
        source_bytes: bytes | None = None,
        metadata: nemo_curator.tasks.video.VideoMetadata = VideoMetadata(),
        frame_array: numpy.typing.NDArray[numpy.uint8] | None = None,
        clips: list[nemo_curator.tasks.video.Clip] = list(),
        filtered_clips: list[nemo_curator.tasks.video.Clip] = list(),
        num_total_clips: int = 0,
        num_clip_chunks: int = 0,
        clip_chunk_index: int = 0,
        clip_stats: nemo_curator.tasks.video.ClipStats = ClipStats(),
        errors: dict[str, str] = dict()
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Badge>
    Dataclass
  </Badge>

  Container for video content including metadata, frames, and processing results.

  This class stores information about a whole video, including its input path, source bytes,
  metadata, frame array, extracted and filtered clips, and clip statistics.

  <ParamField path="clip_chunk_index" type="int = 0" />

  <ParamField path="clip_stats" type="ClipStats = field(default_factory=ClipStats)" />

  <ParamField path="clips" type="list[Clip] = field(default_factory=list)" />

  <ParamField path="errors" type="dict[str, str] = field(default_factory=dict)" />

  <ParamField path="filtered_clips" type="list[Clip] = field(default_factory=list)" />

  <ParamField path="fraction" type="float">
    Calculate the fraction of processed clips.
  </ParamField>

  <ParamField path="frame_array" type="NDArray[uint8] | None = None" />

  <ParamField path="input_path" type="str">
    Get the input path of the video.
  </ParamField>

  <ParamField path="input_video" type="Path" />

  <ParamField path="metadata" type="VideoMetadata = field(default_factory=VideoMetadata)" />

  <ParamField path="num_clip_chunks" type="int = 0" />

  <ParamField path="num_total_clips" type="int = 0" />

  <ParamField path="source_bytes" type="bytes | None = None" />

  <ParamField path="weight" type="float">
    Calculate the weight of the video.
  </ParamField>

  <Anchor id="nemo_curator-tasks-video-Video-get_major_size">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.tasks.video.Video.get_major_size() -> int
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Calculate total memory size of the video.

    **Returns:** `int`

    Total size in bytes.
  </Indent>

  <Anchor id="nemo_curator-tasks-video-Video-has_metadata">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.tasks.video.Video.has_metadata() -> bool
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Check if all metadata fields are present.

    **Returns:** `bool`

    True if all metadata fields are present, False otherwise.
  </Indent>

  <Anchor id="nemo_curator-tasks-video-Video-is_10_bit_color">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.tasks.video.Video.is_10_bit_color() -> bool | None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Heuristic function to determine if the input video has 10-bit color.
  </Indent>

  <Anchor id="nemo_curator-tasks-video-Video-populate_metadata">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.tasks.video.Video.populate_metadata() -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Extract and assign video metadata from source\_bytes.

    This method extracts metadata from the video data in source\_bytes and
    assigns it to self.metadata.

    **Raises:**

    * `ValueError`: If source\_bytes is None.
    * `Exception`: Any exception from extract\_video\_metadata is propagated.
  </Indent>
</Indent>
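
The interaction between `populate_metadata` and `has_metadata` can be sketched as follows. `VideoSketch` and `VideoMetadataSketch` are hypothetical stand-ins, not the library classes. Only the `ValueError` on a missing `source_bytes` comes from the reference above. Treating "present" as "not `None`" is an assumption, and the hard-coded metadata values are placeholders for what the real method reads from the video data.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class VideoMetadataSketch:
    """Hypothetical stand-in with a subset of the documented VideoMetadata fields."""

    height: Optional[int] = None
    width: Optional[int] = None
    framerate: Optional[float] = None


@dataclass
class VideoSketch:
    """Hypothetical stand-in for Video; not the library class."""

    source_bytes: Optional[bytes] = None
    metadata: VideoMetadataSketch = field(default_factory=VideoMetadataSketch)

    def has_metadata(self) -> bool:
        # Assumption: a field counts as present when it is not None.
        return all(getattr(self.metadata, f.name) is not None
                   for f in fields(self.metadata))

    def populate_metadata(self) -> None:
        if self.source_bytes is None:
            # Documented behaviour: ValueError when source_bytes is None.
            raise ValueError("source_bytes is None")
        # Placeholder values standing in for real metadata extraction.
        self.metadata = VideoMetadataSketch(height=1080, width=1920, framerate=30.0)


video = VideoSketch()
try:
    video.populate_metadata()
except ValueError:
    raised = True
else:
    raised = False

video.source_bytes = b"\x00"
video.populate_metadata()
print(raised, video.has_metadata())  # True True
```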

<Anchor id="nemo_curator-tasks-video-VideoMetadata">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.tasks.video.VideoMetadata(
        size: int | None = None,
        height: int | None = None,
        width: int | None = None,
        framerate: float | None = None,
        num_frames: int | None = None,
        duration: float | None = None,
        video_codec: str | None = None,
        pixel_format: str | None = None,
        audio_codec: str | None = None,
        bit_rate_k: int | None = None
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Badge>
    Dataclass
  </Badge>

  Metadata for video content including dimensions, timing, and codec information.

  This class stores essential video properties such as resolution, frame rate,
  duration, and encoding details.

  <ParamField path="audio_codec" type="str | None = None" />

  <ParamField path="bit_rate_k" type="int | None = None" />

  <ParamField path="duration" type="float | None = None" />

  <ParamField path="framerate" type="float | None = None" />

  <ParamField path="height" type="int | None = None" />

  <ParamField path="num_frames" type="int | None = None" />

  <ParamField path="pixel_format" type="str | None = None" />

  <ParamField path="size" type="int | None = None" />

  <ParamField path="video_codec" type="str | None = None" />

  <ParamField path="width" type="int | None = None" />
</Indent>

<Anchor id="nemo_curator-tasks-video-VideoTask">
  <CodeBlock links={{"nemo_curator.tasks.video.Video":"#nemo_curator-tasks-video-Video","nemo_curator.utils.performance_utils.StagePerfStats":"/nemo-curator/nemo_curator/utils/performance_utils#nemo_curator-utils-performance_utils-StagePerfStats"}} showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.tasks.video.VideoTask(
        task_id: str,
        dataset_name: str,
        data: nemo_curator.tasks.video.Video = Video(),
        _stage_perf: list[nemo_curator.utils.performance_utils.StagePerfStats] = list(),
        _metadata: dict[str, typing.Any] = dict()
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Badge>
    Dataclass
  </Badge>

  **Bases:** [Task\[Video\]](/nemo-curator/nemo_curator/tasks/tasks#nemo_curator-tasks-tasks-Task)

  Task for processing a single video.

  <ParamField path="data" type="Video = field(default_factory=Video)" />

  <ParamField path="num_items" type="int">
    Get the number of items in this task.
  </ParamField>

  <Anchor id="nemo_curator-tasks-video-VideoTask-validate">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.tasks.video.VideoTask.validate() -> bool
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Validate the task data.
  </Indent>
</Indent>

<Anchor id="nemo_curator-tasks-video-_Window">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.tasks.video._Window(
        start_frame: int,
        end_frame: int,
        mp4_bytes: bytes | None = None,
        qwen_llm_input: dict[str, typing.Any] | None = None,
        x1_input: typing.Any | None = None,
        caption: dict[str, str] = dict(),
        enhanced_caption: dict[str, str] = dict(),
        webp_bytes: bytes | None = None
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Badge>
    Dataclass
  </Badge>

  Container for video window data including metadata, frames, and processing results.

  This class stores information about a video window, including its frame range,
  encoded MP4 and WebP bytes, model inputs, and generated captions.

  <ParamField path="caption" type="dict[str, str] = field(default_factory=dict)" />

  <ParamField path="end_frame" type="int" />

  <ParamField path="enhanced_caption" type="dict[str, str] = field(default_factory=dict)" />

  <ParamField path="mp4_bytes" type="bytes | None = None" />

  <ParamField path="qwen_llm_input" type="dict[str, Any] | None = None" />

  <ParamField path="start_frame" type="int" />

  <ParamField path="webp_bytes" type="bytes | None = None" />

  <ParamField path="x1_input" type="Any | None = None" />

  <Anchor id="nemo_curator-tasks-video-_Window-get_major_size">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.tasks.video._Window.get_major_size() -> int
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Calculate total memory size of the window.

    **Returns:** `int`

    Total size in bytes.
  </Indent>
</Indent>
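
As a rough illustration of what a "major size" calculation might look like, the sketch below sums the lengths of a window's byte payloads. `WindowSketch` is a hypothetical stand-in with a subset of the documented fields. Which fields the real `get_major_size` counts is not stated in the source, so counting only `mp4_bytes` and `webp_bytes` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowSketch:
    """Hypothetical stand-in with a subset of the documented _Window fields."""

    start_frame: int
    end_frame: int
    mp4_bytes: Optional[bytes] = None
    webp_bytes: Optional[bytes] = None

    def get_major_size(self) -> int:
        # Assumption: only the byte payloads are counted; None contributes 0.
        return len(self.mp4_bytes or b"") + len(self.webp_bytes or b"")


window = WindowSketch(start_frame=0, end_frame=256, mp4_bytes=b"\x00" * 1024)
print(window.get_major_size())  # 1024
```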
