NeMo TTS Collection API

Model Classes

Mel-Spectrogram Generators

Speech-to-Text Aligner Models

Two-Stage Models

Vocoders

Base Classes

The classes below form the foundation of the TTS pipeline. To read more about them, see the Base Classes section of the introduction page.

Dataset Processing Classes

class nemo.collections.tts.data.dataset.MixerTTSXDataset(*args: Any, **kwargs: Any)

Bases: TTSDataset

add_lm_tokens(**kwargs)

class nemo.collections.tts.data.dataset.TTSDataset(*args: Any, **kwargs: Any)

Bases: Dataset

add_align_prior_matrix(**kwargs)
add_durations(**kwargs)
add_energy(**kwargs)
add_log_mel(**kwargs)
add_p_voiced(**kwargs)
add_pitch(**kwargs)
add_reference_audio(**kwargs)
add_speaker_id(**kwargs)
add_voiced_mask(**kwargs)
static filter_files(data, ignore_file, min_duration, max_duration, total_duration)
general_collate_fn(batch)
get_log_mel(audio)
get_spec(audio)
join_data(data_dict)
pitch_shift(audio, sr, rel_audio_path_as_text_id)
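
The static filter_files(data, ignore_file, min_duration, max_duration, total_duration) method takes duration bounds and an ignore list. Judging from those parameters, it prunes manifest entries before training. The sketch below shows that kind of duration and ignore-list filtering in plain Python. It is an illustration under stated assumptions, not NeMo's implementation. The function name filter_manifest_entries is hypothetical. The in-memory ignore_paths set stands in for the ignore_file argument, the manifest keys audio_filepath and duration are assumed, and total_duration is not modeled.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def filter_manifest_entries(
    data: Iterable[Dict],
    ignore_paths: Optional[Set[str]] = None,  # hypothetical stand-in for ignore_file
    min_duration: Optional[float] = None,
    max_duration: Optional[float] = None,
) -> Tuple[List[Dict], int, float]:
    """Hypothetical sketch: keep entries whose duration lies within
    [min_duration, max_duration] and whose audio path is not ignored.

    Returns (kept_entries, number_of_pruned_entries, pruned_seconds).
    """
    ignore_paths = ignore_paths or set()
    kept: List[Dict] = []
    pruned_items = 0
    pruned_seconds = 0.0
    for item in data:
        duration = item["duration"]  # assumed manifest key, in seconds
        too_short = min_duration is not None and duration < min_duration
        too_long = max_duration is not None and duration > max_duration
        ignored = item["audio_filepath"] in ignore_paths  # assumed manifest key
        if too_short or too_long or ignored:
            # Track what was dropped so the pruning can be reported.
            pruned_items += 1
            pruned_seconds += duration
            continue
        kept.append(item)
    return kept, pruned_items, pruned_seconds


# Usage with a toy, made-up manifest.
manifest = [
    {"audio_filepath": "a.wav", "duration": 0.25},  # shorter than min_duration
    {"audio_filepath": "b.wav", "duration": 4.0},   # kept
    {"audio_filepath": "c.wav", "duration": 25.0},  # longer than max_duration
    {"audio_filepath": "d.wav", "duration": 6.5},   # listed in ignore_paths
]
kept, n_pruned, pruned_secs = filter_manifest_entries(
    manifest, ignore_paths={"d.wav"}, min_duration=0.5, max_duration=20.0
)
print([e["audio_filepath"] for e in kept])  # ['b.wav']
print(n_pruned, pruned_secs)  # 3 31.75
```

The sketch filters before any audio is loaded, so out-of-range clips never reach feature extraction or batching.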

class nemo.collections.tts.data.dataset.VocoderDataset(*args: Any, **kwargs: Any)

Bases: Dataset