nemo_curator.stages.audio.tagging.merge_alignment_diarization
Merge Alignment and Diarization Stage.
Module Contents
Classes
API
Bases: ProcessingStage[AudioTask, AudioTask]
Stage that merges alignment and diarization information.
Takes JSONL data containing both alignment and diarization information and merges the alignment information into the diarization segments.
Example:

.. code-block:: yaml

   - target: nemo_curator.stages.audio.tagging.merge_alignment_diarization.MergeAlignmentDiarizationStage
     text_key: "text"
     words_key: "words"
Parameters:
text_key: Key under which the merged text is added to each segment
words_key: Key under which the word alignments are added to each segment
Returns:
The same data as in the input manifest, but with alignment information merged into the diarization segments.
Align words to segments based on timestamps.
Iterates through the alignment, finds the words that belong to each segment, and joins them to form the text for that segment.
Alignment example: ``[{"word": "Hello", "start": 0.0, "end": 1.0}, ...]``
Segments example: ``[{"speaker": "speaker1", "start": 0.0, "end": 3.0}, ...]``
Output: ``[{"speaker": "speaker1", "start": 0.0, "end": 3.0, "text": "Hello there", "words": [{"word": "Hello", "start": 0.0, "end": 1.0}, {"word": "there", "start": 1.0, "end": 3.0}, ...]}, ...]``
Parameters:
List of words with start and end times
List of segments with start and end times
Key to add text to segment
Key to add words to segment
Returns: None
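The word-to-segment alignment described above can be illustrated with a minimal sketch. This is not the library's implementation. The function name ``align_words_to_segments_sketch``, the parameter names ``alignment`` and ``segments``, and the membership rule (a word belongs to a segment when its start time falls within the segment's time range) are assumptions made for illustration. Consistent with the documented ``None`` return value, the sketch updates the segments in place.

```python
from typing import Any


def align_words_to_segments_sketch(
    alignment: list[dict[str, Any]],
    segments: list[dict[str, Any]],
    text_key: str = "text",
    words_key: str = "words",
) -> None:
    """Attach words to segments in place (hypothetical helper, not library code)."""
    for segment in segments:
        # Assumption: a word belongs to a segment when its start time lies in
        # the half-open interval [segment start, segment end).
        words = [
            word
            for word in alignment
            if segment["start"] <= word["start"] < segment["end"]
        ]
        # Store the matched word entries and their space-joined text.
        segment[words_key] = words
        segment[text_key] = " ".join(word["word"] for word in words)


alignment = [
    {"word": "Hello", "start": 0.0, "end": 1.0},
    {"word": "there", "start": 1.0, "end": 3.0},
    {"word": "Hi", "start": 3.0, "end": 3.5},
]
segments = [
    {"speaker": "speaker1", "start": 0.0, "end": 3.0},
    {"speaker": "speaker2", "start": 3.0, "end": 4.0},
]

result = align_words_to_segments_sketch(alignment, segments)
print(segments[0]["text"])
print(segments[1]["text"])
```

The half-open interval keeps a word that starts exactly on a segment boundary from being assigned to two adjacent segments. The actual stage may use a different rule, for example one based on overlap or on word end times.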
Process entry to merge alignment and diarization.