
Copyright © 2026, NVIDIA Corporation.

nemo_curator.stages.audio.inference.speaker_diarization.pyannote

PyAnnote Diarization and Overlap Detection Stage.

Module Contents

Classes

Name                      Description
PyAnnoteDiarizationStage  Stage that performs speaker diarization and overlap detection using PyAnnote.

Functions

Name         Description
has_overlap  Check if a given turn overlaps with any segment in the overlaps list.

API

class nemo_curator.stages.audio.inference.speaker_diarization.pyannote.PyAnnoteDiarizationStage(
hf_token: str,
model_name: str = 'pyannote/speaker-diarizati...,
segmentation_batch_size: int = 128,
embedding_batch_size: int = 128,
min_length: float = 0.5,
max_length: float = 40.0,
audio_filepath_key: str = 'resampled_audio_filepath',
segments_key: str = 'segments',
overlap_segments_key: str = 'overlap_segments',
name: str = 'PyAnnoteDiarization',
resources: nemo_curator.stages.resources.Resources = (lambda: Resources(gpus=1))(),
xenna_num_workers: int | None = None,
_pipeline: typing.Any = None,
_vad_model: typing.Any = None,
_rng: random.Random | None = None
)
Dataclass

Bases: ProcessingStage[AudioTask, AudioTask]

Stage that performs speaker diarization and overlap detection using PyAnnote.

Identifies different speakers and detects overlapping speech segments.

Parameters:

hf_token
str

HuggingFace authentication token

segmentation_batch_size
int, defaults to 128

Batch size for segmentation

embedding_batch_size
int, defaults to 128

Batch size for speaker embeddings

min_length
float, defaults to 0.5

Minimum segment length in seconds

max_length
float, defaults to 40.0

Maximum segment length in seconds

xenna_num_workers
int | None, defaults to None

If set, passes num_workers to Xenna (cluster-wide cap). Unset uses Xenna autoscaling.

_device
str

Derive device from resources configuration.

_pipeline: Any = field(default=None, repr=False)
_rng: Random | None = field(default=None, repr=False)
_vad_model: Any = field(default=None, repr=False)
audio_filepath_key: str = 'resampled_audio_filepath'
embedding_batch_size: int = 128
hf_token: str
max_length: float = 40.0
min_length: float = 0.5
model_name: str = 'pyannote/speaker-diarization-3.1'
name: str = 'PyAnnoteDiarization'
overlap_segments_key: str = 'overlap_segments'
resources: Resources = field(default_factory=(lambda: Resources(gpus=1)))
segmentation_batch_size: int = 128
segments_key: str = 'segments'
xenna_num_workers: int | None = None

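
The fields above map directly onto constructor keyword arguments. The following is a minimal sketch of building the stage with its documented defaults. It assumes the HuggingFace token is stored in an environment variable named HF_TOKEN (an assumption, not something this page specifies), and build_stage is a hypothetical helper that imports the library lazily so the snippet loads even where NeMo Curator is not installed.

```python
import os

# Keyword arguments taken from the documented signature. Every value is the
# documented default except hf_token, which has no default.
stage_kwargs = {
    # Assumption: the token lives in the HF_TOKEN environment variable.
    "hf_token": os.environ.get("HF_TOKEN", ""),
    "model_name": "pyannote/speaker-diarization-3.1",
    "segmentation_batch_size": 128,
    "embedding_batch_size": 128,
    "min_length": 0.5,   # seconds
    "max_length": 40.0,  # seconds
    "audio_filepath_key": "resampled_audio_filepath",
    "segments_key": "segments",
    "overlap_segments_key": "overlap_segments",
}


def build_stage():
    """Hypothetical helper: construct the stage from stage_kwargs."""
    # Imported here so that defining the helper does not require the library.
    from nemo_curator.stages.audio.inference.speaker_diarization.pyannote import (
        PyAnnoteDiarizationStage,
    )

    return PyAnnoteDiarizationStage(**stage_kwargs)
```

Omitting a key falls back to the default listed above. Only hf_token must always be supplied.
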
nemo_curator.stages.audio.inference.speaker_diarization.pyannote.PyAnnoteDiarizationStage.add_vad_segments(
audio: torch.Tensor,
fs: int,
start: float,
end: float,
segments: list[dict],
speaker_id: str
) -> None

Add VAD segments for a given audio region to the segments list.

nemo_curator.stages.audio.inference.speaker_diarization.pyannote.PyAnnoteDiarizationStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.audio.inference.speaker_diarization.pyannote.PyAnnoteDiarizationStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.audio.inference.speaker_diarization.pyannote.PyAnnoteDiarizationStage.process(
task: nemo_curator.tasks.AudioTask
) -> nemo_curator.tasks.AudioTask

Process a single entry for diarization and overlap detection.

nemo_curator.stages.audio.inference.speaker_diarization.pyannote.PyAnnoteDiarizationStage.setup(
_: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None

Load models to device (called per replica before processing).

nemo_curator.stages.audio.inference.speaker_diarization.pyannote.PyAnnoteDiarizationStage.setup_on_node(
_node_info: nemo_curator.backends.base.NodeInfo | None = None,
_worker_metadata: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None

Download model weights (called once per node).
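
The split between setup_on_node (once per node) and setup (once per replica) can be illustrated with a toy simulation. This is a sketch only: ToyStage and run_on_node are hypothetical stand-ins and do not reflect how NeMo Curator backends actually schedule these calls.

```python
class ToyStage:
    """Hypothetical stand-in for a stage; not part of NeMo Curator."""

    downloads = 0  # class-level counter shared by every replica on the node

    def __init__(self):
        self.model = None

    def setup_on_node(self):
        # Once per node: fetch weights into a shared cache (simulated).
        ToyStage.downloads += 1

    def setup(self):
        # Once per replica: load the cached weights into this worker.
        self.model = "loaded"

    def process(self, task):
        if self.model is None:
            raise RuntimeError("setup() must run before process()")
        return {**task, "segments": []}


def run_on_node(num_replicas, tasks):
    """Simulate one node: a single download, then one load per replica."""
    ToyStage().setup_on_node()
    replicas = [ToyStage() for _ in range(num_replicas)]
    for replica in replicas:
        replica.setup()
    # Distribute tasks round-robin across the replicas.
    return [replicas[i % num_replicas].process(t) for i, t in enumerate(tasks)]


results = run_on_node(num_replicas=2, tasks=[{"id": 1}, {"id": 2}, {"id": 3}])
```

Keeping the download separate from the load avoids several replicas on the same node fetching the same weights concurrently.
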

nemo_curator.stages.audio.inference.speaker_diarization.pyannote.PyAnnoteDiarizationStage.xenna_stage_spec() -> dict[str, typing.Any]
nemo_curator.stages.audio.inference.speaker_diarization.pyannote.has_overlap(
turn: pyannote.core.Segment,
overlaps: list
) -> bool

Check if a given turn overlaps with any segment in the overlaps list.

Parameters:

turn
Segment

A segment representing a speech turn

overlaps
list

List of overlap segments, sorted by start time

Returns: bool

True if the turn overlaps with any segment, False otherwise
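
Because the overlaps list is sorted by start time, a check of this kind can stop scanning early. The sketch below shows one way to do that. It is not the library's implementation: Seg is a hypothetical stand-in for pyannote.core.Segment, has_overlap_sketch is an illustrative name, and treating segments that merely touch at an endpoint as non-overlapping is an assumption.

```python
from typing import NamedTuple


class Seg(NamedTuple):
    """Hypothetical stand-in for pyannote.core.Segment (times in seconds)."""

    start: float
    end: float


def has_overlap_sketch(turn, overlaps):
    """Return True if turn intersects any segment in overlaps.

    overlaps must be sorted by start time.
    """
    for seg in overlaps:
        if seg.start >= turn.end:
            # Sorted by start, so no later segment can intersect the turn.
            break
        if seg.end > turn.start:
            return True
    return False


overlapping = has_overlap_sketch(Seg(1.0, 2.0), [Seg(0.0, 0.5), Seg(1.5, 3.0)])
touching_only = has_overlap_sketch(Seg(1.0, 2.0), [Seg(0.0, 1.0), Seg(2.0, 3.0)])
```

Here overlapping is True, because the turn intersects the segment from 1.5 to 3.0, while touching_only is False under the stated endpoint assumption.
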