nemo_curator.stages.audio.tagging.merge_alignment_diarization


Merge Alignment and Diarization Stage.

Module Contents

Classes

Name | Description
MergeAlignmentDiarizationStage | Stage that merges alignment and diarization information.

API

class nemo_curator.stages.audio.tagging.merge_alignment_diarization.MergeAlignmentDiarizationStage(
text_key: str = 'text',
words_key: str = 'words',
name: str = 'MergeAlignmentDiarization'
)
Dataclass

Bases: ProcessingStage[AudioTask, AudioTask]

Stage that merges alignment and diarization information.

Takes JSONL data containing both alignment and diarization information and merges the alignment information into the diarization segments.

Example (YAML):

  - _target_: nemo_curator.stages.audio.tagging.merge_alignment_diarization.MergeAlignmentDiarizationStage
    text_key: "text"
    words_key: "words"

Parameters:

text_key
str. Defaults to 'text'

Key under which the merged text is added to each segment

words_key
str. Defaults to 'words'

Key under which the word alignments are added to each segment

Returns:

The same data as in the input manifest, but with alignment information merged into the diarization segments.

name: str = 'MergeAlignmentDiarization'
text_key: str = 'text'
words_key: str = 'words'
nemo_curator.stages.audio.tagging.merge_alignment_diarization.MergeAlignmentDiarizationStage.align_words_to_segments(
alignment: list[dict],
segments: list[dict],
text_key: str,
words_key: str
) -> None
staticmethod

Align words to segments based on timestamps.

Iterates through the alignment and finds words that belong in each segment, joining them together to form the text for the segment.

Alignment example: [ { "word": "Hello", "start": 0.0, "end": 1.0 }, … ]

Segments example: [ { "speaker": "speaker1", "start": 0.0, "end": 3.0 }, … ]

Output: [ { "speaker": "speaker1", "start": 0.0, "end": 3.0, "text": "Hello there", "words": [ { "word": "Hello", "start": 0.0, "end": 1.0 }, { "word": "there", "start": 1.0, "end": 3.0 }, … ] }, … ]

Parameters:

alignment
list[dict]

List of words with start and end times

segments
list[dict]

List of segments with start and end times

text_key
str

Key under which the joined text is added to each segment

words_key
str

Key under which the matched words are added to each segment

Returns: None
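The timestamp-based merge described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `merge_words_into_segments` is hypothetical, and the membership rule (a word belongs to a segment when its midpoint falls inside the segment's time span) is an assumption chosen for illustration. The actual stage may use a different rule for words that straddle a segment boundary.

```python
def merge_words_into_segments(alignment, segments, text_key="text", words_key="words"):
    """Attach words to segments in place.

    Hypothetical stand-in for illustration only, not the NeMo Curator code.
    """
    for segment in segments:
        # Assumption: a word belongs to a segment when its midpoint
        # lies in the half-open interval [segment start, segment end).
        words = [
            word
            for word in alignment
            if segment["start"] <= (word["start"] + word["end"]) / 2 < segment["end"]
        ]
        segment[words_key] = words
        segment[text_key] = " ".join(word["word"] for word in words)


alignment = [
    {"word": "Hello", "start": 0.0, "end": 1.0},
    {"word": "there", "start": 1.0, "end": 3.0},
]
segments = [{"speaker": "speaker1", "start": 0.0, "end": 3.0}]

# Mirrors the documented behavior: nothing is returned, segments are updated.
merge_words_into_segments(alignment, segments)
print(segments[0]["text"])
```

Passing different values for `text_key` and `words_key` changes only the names of the keys written to each segment, which matches how the two parameters are described above.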

nemo_curator.stages.audio.tagging.merge_alignment_diarization.MergeAlignmentDiarizationStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.audio.tagging.merge_alignment_diarization.MergeAlignmentDiarizationStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.audio.tagging.merge_alignment_diarization.MergeAlignmentDiarizationStage.process(
task: nemo_curator.tasks.AudioTask
) -> nemo_curator.tasks.AudioTask

Process entry to merge alignment and diarization.