# nemo_curator.stages.audio.alm.alm_data_overlap

ALM data overlap stage, implemented natively in NeMo Curator.

Filters overlapping windows based on an overlap threshold.
The stage follows the same pattern as the NeMo Curator audio stages in
[https://github.com/NVIDIA-NeMo/Curator/blob/main/nemo\_curator/stages/audio/common.py](https://github.com/NVIDIA-NeMo/Curator/blob/main/nemo_curator/stages/audio/common.py)

Produces output identical to the SDP implementation.

## Module Contents

### Classes

| Name                                                                                         | Description                     |
| -------------------------------------------------------------------------------------------- | ------------------------------- |
| [`ALMDataOverlapStage`](#nemo_curator-stages-audio-alm-alm_data_overlap-ALMDataOverlapStage) | Filter overlapping ALM windows. |

### Functions

| Name                                                                                                       | Description                                                                   |
| ---------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| [`_calculate_duration_list`](#nemo_curator-stages-audio-alm-alm_data_overlap-_calculate_duration_list)     | Calculate list of durations from windows data.                                |
| [`_calculate_timestamps`](#nemo_curator-stages-audio-alm-alm_data_overlap-_calculate_timestamps)           | Calculate (end, start) timestamp pairs from windows data.                     |
| [`_calculate_total_dur`](#nemo_curator-stages-audio-alm-alm_data_overlap-_calculate_total_dur)             | Calculate total duration from windows data.                                   |
| [`_filter_segments`](#nemo_curator-stages-audio-alm-alm_data_overlap-_filter_segments)                     | Filter out segments that have overlap greater than threshold.                 |
| [`_get_filepath_from_stats`](#nemo_curator-stages-audio-alm-alm_data_overlap-_get_filepath_from_stats)     | -                                                                             |
| [`_get_filtered_windows`](#nemo_curator-stages-audio-alm-alm_data_overlap-_get_filtered_windows)           | Get complete window objects that correspond to filtered timestamps.           |
| [`_overlap_ratio`](#nemo_curator-stages-audio-alm-alm_data_overlap-_overlap_ratio)                         | Calculate overlap ratio between two segments (stored as (end, start) tuples). |
| [`_process_filtered_dur`](#nemo_curator-stages-audio-alm-alm_data_overlap-_process_filtered_dur)           | Get total duration of qualified segments.                                     |
| [`_process_filtered_dur_list`](#nemo_curator-stages-audio-alm-alm_data_overlap-_process_filtered_dur_list) | Get duration list of qualified segments.                                      |

### Data

[`MAX_OVERLAP_PERCENTAGE`](#nemo_curator-stages-audio-alm-alm_data_overlap-MAX_OVERLAP_PERCENTAGE)

### API

<Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-ALMDataOverlapStage">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.stages.audio.alm.alm_data_overlap.ALMDataOverlapStage(
        name: str = 'alm_data_overlap',
        overlap_percentage: int = 0,
        target_duration: float = 120.0
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Badge>
    Dataclass
  </Badge>

  **Bases:** [ProcessingStage\[AudioTask, AudioTask\]](/nemo-curator/nemo_curator/stages/base#nemo_curator-stages-base-ProcessingStage)

  Filter overlapping ALM windows.

  Removes windows whose overlap exceeds the threshold set by `overlap_percentage`,
  keeping the windows whose duration is closest to `target_duration`.

  <ParamField path="name" type="str = 'alm_data_overlap'" />

  <ParamField path="overlap_percentage" type="int = 0" />

  <ParamField path="target_duration" type="float = 120.0" />

  <Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-ALMDataOverlapStage-__post_init__">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.stages.audio.alm.alm_data_overlap.ALMDataOverlapStage.__post_init__() -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Validate parameters.
  </Indent>

  <Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-ALMDataOverlapStage-_filter_overlaps">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.stages.audio.alm.alm_data_overlap.ALMDataOverlapStage._filter_overlaps(
          entry: dict[str, typing.Any]
      ) -> dict[str, typing.Any]
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Filter overlapping windows from entry.
  </Indent>

  <Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-ALMDataOverlapStage-inputs">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.stages.audio.alm.alm_data_overlap.ALMDataOverlapStage.inputs() -> tuple[list[str], list[str]]
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-ALMDataOverlapStage-process">
    <CodeBlock links={{"nemo_curator.tasks.AudioTask":"/nemo-curator/nemo_curator/tasks/audio_task#nemo_curator-tasks-audio_task-AudioTask"}} showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.stages.audio.alm.alm_data_overlap.ALMDataOverlapStage.process(
          task: nemo_curator.tasks.AudioTask
      ) -> nemo_curator.tasks.AudioTask
      ```
    </CodeBlock>
  </Anchor>

  <Indent />
</Indent>
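
The parameter validation described for `__post_init__` can be illustrated with a small stand-in dataclass. This is a sketch under stated assumptions, not the stage's actual code: `OverlapConfig` and its `threshold` property are hypothetical names, and the specific checks (a percentage between 0 and `MAX_OVERLAP_PERCENTAGE`, a positive target duration, a percentage-to-ratio conversion) are guesses based on the signature and the constant documented on this page.

```python
from dataclasses import dataclass

# Mirrors the module-level constant documented on this page.
MAX_OVERLAP_PERCENTAGE = 100


@dataclass
class OverlapConfig:
    """Hypothetical stand-in for the stage parameters (not the real class)."""

    overlap_percentage: int = 0
    target_duration: float = 120.0

    def __post_init__(self) -> None:
        # Assumed check: the percentage must lie in [0, MAX_OVERLAP_PERCENTAGE].
        if not 0 <= self.overlap_percentage <= MAX_OVERLAP_PERCENTAGE:
            raise ValueError(
                f"overlap_percentage must be in [0, {MAX_OVERLAP_PERCENTAGE}], "
                f"got {self.overlap_percentage}"
            )
        # Assumed check: a non-positive target duration is not meaningful.
        if self.target_duration <= 0:
            raise ValueError(
                f"target_duration must be positive, got {self.target_duration}"
            )

    @property
    def threshold(self) -> float:
        # Assumed conversion from an integer percentage to a ratio in [0, 1].
        return self.overlap_percentage / MAX_OVERLAP_PERCENTAGE
```

With the defaults, `OverlapConfig().threshold` is `0.0`, which under these assumptions means no overlap between kept windows is tolerated.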

<Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-_calculate_duration_list">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.audio.alm.alm_data_overlap._calculate_duration_list(
        windows: list[dict[str, typing.Any]]
    ) -> list[float]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Calculate list of durations from windows data.
</Indent>

<Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-_calculate_timestamps">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.audio.alm.alm_data_overlap._calculate_timestamps(
        windows: list[dict[str, typing.Any]]
    ) -> list[tuple[float, float]]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Calculate (end, start) timestamp pairs from windows data.
</Indent>
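
The `(end, start)` ordering is easy to get backwards, so a short sketch may help. The window schema is not documented on this page; the example below assumes each window carries a `segments` list whose items have `start` and `end` keys, and `calculate_timestamps` is a hypothetical stand-in rather than the library function.

```python
from typing import Any


def calculate_timestamps(windows: list[dict[str, Any]]) -> list[tuple[float, float]]:
    """Return one (end, start) pair per window, under an assumed schema."""
    timestamps = []
    for window in windows:
        segments = window.get("segments", [])  # assumed key, not confirmed
        if not segments:
            continue  # a window without segments has no span to report
        start = segments[0]["start"]  # start of the first segment
        end = segments[-1]["end"]  # end of the last segment
        timestamps.append((end, start))  # note the (end, start) order
    return timestamps


windows = [
    {"segments": [{"start": 0.0, "end": 40.0}, {"start": 40.0, "end": 110.0}]},
    {"segments": [{"start": 90.0, "end": 200.0}]},
]
print(calculate_timestamps(windows))  # [(110.0, 0.0), (200.0, 90.0)]
```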

<Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-_calculate_total_dur">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.audio.alm.alm_data_overlap._calculate_total_dur(
        windows: list[dict[str, typing.Any]]
    ) -> float
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Calculate total duration from windows data.
</Indent>

<Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-_filter_segments">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.audio.alm.alm_data_overlap._filter_segments(
        segments: list[tuple[float, float]],
        threshold: float,
        target_duration: float
    ) -> list[tuple[float, float]]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Filter out segments that have overlap greater than threshold.
</Indent>
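
One plausible way to filter by overlap while preferring segments near the target duration is a greedy pass, sketched below. This is an assumption about the strategy, not the library's confirmed algorithm: the function names are hypothetical, the overlap ratio is assumed to be normalized by the shorter segment, and `threshold` is assumed to be a ratio in `[0, 1]`.

```python
Segment = tuple[float, float]  # stored as (end, start), as on this page


def overlap_ratio(seg1: Segment, seg2: Segment) -> float:
    """Shared time divided by the shorter segment's duration (assumed)."""
    end1, start1 = seg1
    end2, start2 = seg2
    shared = max(0.0, min(end1, end2) - max(start1, start2))
    shorter = min(end1 - start1, end2 - start2)
    return shared / shorter if shorter > 0 else 0.0


def filter_segments(
    segments: list[Segment], threshold: float, target_duration: float
) -> list[Segment]:
    """Greedily keep segments, visiting those closest to the target first."""
    ranked = sorted(segments, key=lambda s: abs((s[0] - s[1]) - target_duration))
    kept: list[Segment] = []
    for seg in ranked:
        # Keep a segment only if it overlaps every kept one by <= threshold.
        if all(overlap_ratio(seg, other) <= threshold for other in kept):
            kept.append(seg)
    return sorted(kept, key=lambda s: s[1])  # restore chronological order


segments = [(100.0, 0.0), (170.0, 50.0), (300.0, 180.0)]
print(filter_segments(segments, threshold=0.0, target_duration=120.0))
# [(170.0, 50.0), (300.0, 180.0)]
print(filter_segments(segments, threshold=0.5, target_duration=120.0))
# [(100.0, 0.0), (170.0, 50.0), (300.0, 180.0)]
```

In the first call the 100-second segment is dropped because it overlaps the 120-second segment that was ranked ahead of it; raising the threshold to 0.5 lets it through.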

<Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-_get_filepath_from_stats">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.audio.alm.alm_data_overlap._get_filepath_from_stats(
        stats: dict[str, typing.Any] | None,
        key: str
    ) -> str | None
    ```
  </CodeBlock>
</Anchor>

<Indent />

<Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-_get_filtered_windows">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.audio.alm.alm_data_overlap._get_filtered_windows(
        windows: list[dict[str, typing.Any]],
        filtered_timestamps: list[tuple[float, float]]
    ) -> list[dict[str, typing.Any]]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Get complete window objects that correspond to filtered timestamps.
</Indent>
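
Mapping filtered timestamps back to full window objects can be sketched as a set lookup. The example assumes the same hypothetical window schema as above (a `segments` list with `start` and `end` keys) and uses a stand-in function name; the real matching logic is not shown on this page.

```python
from typing import Any


def get_filtered_windows(
    windows: list[dict[str, Any]],
    filtered_timestamps: list[tuple[float, float]],
) -> list[dict[str, Any]]:
    """Keep windows whose (end, start) pair survived filtering (assumed logic)."""
    wanted = set(filtered_timestamps)
    result = []
    for window in windows:
        segments = window.get("segments", [])  # assumed key, not confirmed
        if not segments:
            continue
        key = (segments[-1]["end"], segments[0]["start"])
        if key in wanted:
            result.append(window)  # the whole window dict is preserved
    return result


windows = [
    {"id": "a", "segments": [{"start": 0.0, "end": 100.0}]},
    {"id": "b", "segments": [{"start": 50.0, "end": 170.0}]},
]
kept = get_filtered_windows(windows, [(170.0, 50.0)])
print([w["id"] for w in kept])  # ['b']
```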

<Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-_overlap_ratio">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.audio.alm.alm_data_overlap._overlap_ratio(
        seg1: tuple[float, float],
        seg2: tuple[float, float]
    ) -> float
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Calculate overlap ratio between two segments (stored as (end, start) tuples).
</Indent>
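
A minimal sketch of an overlap ratio on `(end, start)` tuples follows. The normalization is an assumption: here the shared time is divided by the shorter segment's duration, so a segment fully contained in another scores 1.0. The library may normalize differently, and `overlap_ratio` is a hypothetical name.

```python
def overlap_ratio(seg1: tuple[float, float], seg2: tuple[float, float]) -> float:
    """Overlap between two (end, start) segments as a ratio in [0, 1]."""
    end1, start1 = seg1  # unpack in (end, start) order
    end2, start2 = seg2
    # Shared time is the gap between the later start and the earlier end.
    shared = max(0.0, min(end1, end2) - max(start1, start2))
    shorter = min(end1 - start1, end2 - start2)
    return shared / shorter if shorter > 0 else 0.0  # guard zero-length input


print(overlap_ratio((120.0, 0.0), (180.0, 60.0)))   # 0.5
print(overlap_ratio((100.0, 0.0), (300.0, 180.0)))  # 0.0
print(overlap_ratio((100.0, 0.0), (60.0, 40.0)))    # 1.0
```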

<Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-_process_filtered_dur">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.audio.alm.alm_data_overlap._process_filtered_dur(
        timestamps: list[tuple[float, float]]
    ) -> float
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Get total duration of qualified segments.
</Indent>

<Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-_process_filtered_dur_list">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.audio.alm.alm_data_overlap._process_filtered_dur_list(
        timestamps: list[tuple[float, float]]
    ) -> list[float]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Get duration list of qualified segments.
</Indent>
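
Because timestamps are stored as `(end, start)`, each duration is the first element minus the second. The sketch below shows the per-segment list and the total under that reading; the function names are hypothetical stand-ins for the two helpers above.

```python
def filtered_duration_list(timestamps: list[tuple[float, float]]) -> list[float]:
    """Duration of each (end, start) pair."""
    return [end - start for end, start in timestamps]


def filtered_duration(timestamps: list[tuple[float, float]]) -> float:
    """Sum of all segment durations."""
    return sum(filtered_duration_list(timestamps))


timestamps = [(170.0, 50.0), (300.0, 180.0)]
print(filtered_duration_list(timestamps))  # [120.0, 120.0]
print(filtered_duration(timestamps))       # 240.0
```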

<Anchor id="nemo_curator-stages-audio-alm-alm_data_overlap-MAX_OVERLAP_PERCENTAGE">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.audio.alm.alm_data_overlap.MAX_OVERLAP_PERCENTAGE = 100
    ```
  </CodeBlock>
</Anchor>