NeMo TTS Collection API#

Model Classes#

Mel-Spectrogram Generators#

class nemo.collections.tts.models.FastPitchModel(*args: Any, **kwargs: Any)[source]#

Bases: SpectrogramGenerator, Exportable, FastPitchAdapterModelMixin

FastPitch model (https://arxiv.org/abs/2006.06873) that is used to generate mel spectrograms from text.

configure_callbacks()[source]#
property disabled_deployment_input_names#

Implement this method to return a set of input names disabled for export

forward(*, text, durs=None, pitch=None, energy=None, speaker=None, pace=1.0, spec=None, attn_prior=None, mel_lens=None, input_lens=None, reference_spec=None, reference_spec_lens=None)[source]#
forward_for_export(text, pitch, pace, volume=None, batch_lengths=None, speaker=None)[source]#
generate_spectrogram(tokens: torch.tensor, speaker: Optional[int] = None, pace: float = 1.0, reference_spec: Optional[torch.tensor] = None, reference_spec_lens: Optional[torch.tensor] = None) torch.tensor[source]#

Accepts a batch of text or text_tokens and returns a batch of spectrograms

Parameters

tokens – A torch tensor representing the text to be generated

Returns

spectrograms

input_example(max_batch=1, max_dim=44)[source]#

Generates input examples for tracing etc. :returns: A tuple of input examples.

property input_types#

Define these to enable input neural type checks

interpolate_speaker(original_speaker_1, original_speaker_2, weight_speaker_1, weight_speaker_2, new_speaker_id)[source]#

This method performs speaker interpolation between two original speakers the model is trained on.

Inputs:

  • original_speaker_1 – Integer speaker ID of the first existing speaker in the model

  • original_speaker_2 – Integer speaker ID of the second existing speaker in the model

  • weight_speaker_1 – Floating point weight applied to the first speaker in the weighted combination

  • weight_speaker_2 – Floating point weight applied to the second speaker in the weighted combination

  • new_speaker_id – Integer speaker ID assigned to the new interpolated speaker in the model
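A minimal sketch of what such an interpolation presumably computes, namely a weighted combination of two learned speaker embedding vectors. The function name and list-based embeddings here are illustrative only, not the NeMo internals:

```python
# Illustrative sketch: interpolate two speaker embedding vectors.
# The actual NeMo implementation operates on the model's speaker
# embedding table; this only shows the weighted combination.

def interpolate_embeddings(emb_1, emb_2, weight_1, weight_2):
    """Return the element-wise weighted combination of two embeddings."""
    return [weight_1 * a + weight_2 * b for a, b in zip(emb_1, emb_2)]

# A 50/50 blend of two toy 3-dimensional speaker embeddings:
blended = interpolate_embeddings([1.0, 0.0, 2.0], [3.0, 4.0, 0.0], 0.5, 0.5)
# blended == [2.0, 2.0, 1.0]
```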

classmethod list_available_models() List[PretrainedModelInfo][source]#

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud. :returns: List of available pre-trained models.

property output_types#

Define these to enable output neural type checks

parse(str_input: str, normalize=True) torch.tensor[source]#

A helper function that accepts raw Python strings and turns them into a tensor. The tensor should have 2 dimensions. The first is the batch, which should be of size 1. The second should represent time. The tensor should represent either tokenized or embedded text, depending on the model.

Note that some models have a normalize parameter in this function which will apply the normalizer if it is available.
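The shape contract described above (a batch dimension of size 1, then a time dimension) can be illustrated with a toy character tokenizer. This is not the NeMo parser, just a demonstration of the expected output shape:

```python
# Toy illustration of the parse() output contract: a 2-D structure of
# shape [1, T] holding token IDs. Not the actual NeMo tokenizer.

def toy_parse(text, vocab):
    """Map characters to IDs and wrap the sequence in a batch of size 1."""
    tokens = [vocab[ch] for ch in text if ch in vocab]
    return [tokens]  # shape [1, T]

vocab = {"h": 0, "i": 1, " ": 2}
batch = toy_parse("hi", vocab)
# batch == [[0, 1]]; len(batch) == 1 (batch dim), len(batch[0]) == 2 (time)
```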

property parser#
property tb_logger#
class nemo.collections.tts.models.MixerTTSModel(*args: Any, **kwargs: Any)[source]#

Bases: SpectrogramGenerator, Exportable

Mixer-TTS and Mixer-TTS-X models (https://arxiv.org/abs/2110.03584) that are used to generate mel spectrograms from text.

forward(text, text_len, pitch=None, spect=None, spect_len=None, attn_prior=None, lm_tokens=None)[source]#
forward_for_export(text, lm_tokens=None)[source]#
generate_spectrogram(tokens: Optional[torch.Tensor] = None, tokens_len: Optional[torch.Tensor] = None, lm_tokens: Optional[torch.Tensor] = None, raw_texts: Optional[List[str]] = None, norm_text_for_lm_model: bool = True, lm_model: str = 'albert')[source]#

Accepts a batch of text or text_tokens and returns a batch of spectrograms

Parameters

tokens – A torch tensor representing the text to be generated

Returns

spectrograms

infer(text, text_len=None, text_mask=None, spect=None, spect_len=None, attn_prior=None, use_gt_durs=False, lm_tokens=None, pitch=None)[source]#
input_example(max_text_len=10, max_lm_tokens_len=10)[source]#
property input_types#

Define these to enable input neural type checks

classmethod list_available_models() List[PretrainedModelInfo][source]#

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud. :returns: List of available pre-trained models.

property output_types#

Define these to enable output neural type checks

parse(text: str, normalize=True) torch.Tensor[source]#

A helper function that accepts raw Python strings and turns them into a tensor. The tensor should have 2 dimensions. The first is the batch, which should be of size 1. The second should represent time. The tensor should represent either tokenized or embedded text, depending on the model.

Note that some models have a normalize parameter in this function which will apply the normalizer if it is available.

run_aligner(text, text_len, text_mask, spect, spect_len, attn_prior)#
class nemo.collections.tts.models.RadTTSModel(cfg: omegaconf.DictConfig, trainer: pytorch_lightning.Trainer = None)[source]#

Bases: SpectrogramGenerator, Exportable

batch_dict(batch_data)[source]#
configure_optimizers()[source]#
forward_for_export(text, batch_lengths, speaker_id, speaker_id_text, speaker_id_attributes, pitch, pace, volume)[source]#
generate_spectrogram(tokens: torch.tensor, speaker: int = 0, sigma: float = 1.0) torch.tensor[source]#

Accepts a batch of text or text_tokens and returns a batch of spectrograms

Parameters

tokens – A torch tensor representing the text to be generated

Returns

spectrograms

input_example(max_batch=1, max_dim=400)[source]#
property input_types#

Define these to enable input neural type checks

load_state_dict(state_dict, strict=True)[source]#
property output_types#

Define these to enable output neural type checks

parse(text: str, normalize=False) torch.Tensor[source]#

A helper function that accepts raw Python strings and turns them into a tensor. The tensor should have 2 dimensions. The first is the batch, which should be of size 1. The second should represent time. The tensor should represent either tokenized or embedded text, depending on the model.

Note that some models have a normalize parameter in this function which will apply the normalizer if it is available.

property parser#
property tb_logger#
class nemo.collections.tts.models.Tacotron2Model(*args: Any, **kwargs: Any)[source]#

Bases: SpectrogramGenerator

Tacotron 2 Model that is used to generate mel spectrograms from text

forward(*, tokens, token_len, audio=None, audio_len=None)[source]#
generate_spectrogram(*, tokens)[source]#

Accepts a batch of text or text_tokens and returns a batch of spectrograms

Parameters

tokens – A torch tensor representing the text to be generated

Returns

spectrograms

property input_types#

Define these to enable input neural type checks

classmethod list_available_models() List[PretrainedModelInfo][source]#

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud. :returns: List of available pre-trained models.

property output_types#

Define these to enable output neural type checks

parse(text: str, normalize=True) torch.Tensor[source]#

A helper function that accepts raw Python strings and turns them into a tensor. The tensor should have 2 dimensions. The first is the batch, which should be of size 1. The second should represent time. The tensor should represent either tokenized or embedded text, depending on the model.

Note that some models have a normalize parameter in this function which will apply the normalizer if it is available.

property parser#
class nemo.collections.tts.models.SpectrogramEnhancerModel(*args: Any, **kwargs: Any)[source]#

Bases: ModelPT, Exportable

GAN-based model to add details to blurry spectrograms from TTS models like Tacotron or FastPitch. Based on StyleGAN 2 [1].

[1] Karras et al. - Analyzing and Improving the Image Quality of StyleGAN (https://arxiv.org/abs/1912.04958)

configure_optimizers()[source]#
forward(*, input_spectrograms: torch.Tensor, lengths: torch.Tensor, mixing: bool = False, normalize: bool = True)[source]#

Generator forward pass. Noise inputs will be generated.

input_spectrograms – batch of spectrograms, typically synthetic
lengths – length for every spectrogram in the batch
mixing – style mixing, usually True during training
normalize – normalize spectrogram range to ~[0, 1], True for normal use

returns: batch of enhanced spectrograms

For an explanation of style mixing, refer to [1].

[1] Karras et al. - A Style-Based Generator Architecture for Generative Adversarial Networks, 2018 (https://arxiv.org/abs/1812.04948)

forward_with_custom_noise(input_spectrograms: torch.Tensor, lengths: torch.Tensor, zs: Optional[List[torch.Tensor]] = None, ws: Optional[List[torch.Tensor]] = None, noise: Optional[torch.Tensor] = None, mixing: bool = False, normalize: bool = True)[source]#

Generator forward pass. Noise inputs will be generated if None.

input_spectrograms – batch of spectrograms, typically synthetic
lengths – length for every spectrogram in the batch
zs – latent noise inputs on the unit sphere (either this or ws or neither)
ws – latent noise inputs in the style space (either this or zs or neither)
noise – per-pixel independent Gaussian noise
mixing – style mixing, usually True during training
normalize – normalize spectrogram range to ~[0, 1], True for normal use

returns: batch of enhanced spectrograms

For an explanation of style mixing, refer to [1]. For definitions of z and w, see [2].

[1] Karras et al. - A Style-Based Generator Architecture for Generative Adversarial Networks, 2018 (https://arxiv.org/abs/1812.04948)
[2] Karras et al. - Analyzing and Improving the Image Quality of StyleGAN, 2019 (https://arxiv.org/abs/1912.04958)

generate_noise(batch_size: int = 1) torch.Tensor[source]#
generate_zs(batch_size: int = 1, mixing: bool = False)[source]#
classmethod list_available_models()[source]#

Should list all pre-trained models available via NVIDIA NGC cloud. Note: There is no check that requires model names and aliases to be unique. In the case of a collision, whatever model (or alias) is listed first in the returned list will be instantiated.

Returns

A list of PretrainedModelInfo entries

log_illustration(target_spectrograms, input_spectrograms, enhanced_spectrograms, lengths)[source]#
move_to_correct_device(e)[source]#
normalize_spectrograms(spectrogram: torch.Tensor, lengths: torch.Tensor) torch.Tensor[source]#
pad_spectrograms(spectrograms)[source]#
unnormalize_spectrograms(spectrogram: torch.Tensor, lengths: torch.Tensor) torch.Tensor[source]#

Speech-to-Text Aligner Models#

class nemo.collections.tts.models.AlignerModel(*args: Any, **kwargs: Any)[source]#

Bases: ModelPT

Speech-to-text alignment model (https://arxiv.org/pdf/2108.10447.pdf) that is used to learn alignments between mel spectrogram and text.

forward(*, spec, spec_len, text, text_len, attn_prior=None)[source]#
classmethod list_available_models() List[PretrainedModelInfo][source]#

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud. :returns: List of available pre-trained models.

Two-Stage Models#

class nemo.collections.tts.models.TwoStagesModel(*args: Any, **kwargs: Any)[source]#

Bases: Vocoder

Two-stage model used to convert mel spectrograms to linear spectrograms, and then to audio.

convert_spectrogram_to_audio(spec: torch.Tensor, **kwargs) torch.Tensor[source]#

Accepts a batch of spectrograms and returns a batch of audio.

Parameters

spec – [‘B’, ‘n_freqs’, ‘T’], A torch tensor representing the spectrograms to be vocoded.

Returns

audio

cuda(*args, **kwargs)[source]#
PTL overrides this method and changes the default PyTorch behavior of a module.

The PTL LightningModule override will move the module to device 0 if device is None. See the PTL method here: Lightning-AI/lightning

Here we are overriding this to maintain the default PyTorch nn.Module behavior: pytorch/pytorch

Moves all model parameters and buffers to the GPU.

This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on GPU while being optimized.

Note

This method modifies the module in-place.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device

Returns

self

Return type

Module

forward(*, mel)[source]#
property input_types#

Define these to enable input neural type checks

classmethod list_available_models() List[PretrainedModelInfo][source]#

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud. :returns: List of available pre-trained models.

property output_types#

Define these to enable output neural type checks

set_linear_vocoder(linvocoder: Vocoder)[source]#
set_mel_to_spec_model(mel2spec: MelToSpec)[source]#

Vocoders#

class nemo.collections.tts.models.GriffinLimModel(*args: Any, **kwargs: Any)[source]#

Bases: Vocoder

convert_spectrogram_to_audio(spec, Ts=None)[source]#

Accepts a batch of spectrograms and returns a batch of audio.

Parameters

spec – [‘B’, ‘n_freqs’, ‘T’], A torch tensor representing the spectrograms to be vocoded.

Returns

audio

cuda(*args, **kwargs)[source]#
PTL overrides this method and changes the default PyTorch behavior of a module.

The PTL LightningModule override will move the module to device 0 if device is None. See the PTL method here: Lightning-AI/lightning

Here we are overriding this to maintain the default PyTorch nn.Module behavior: pytorch/pytorch

Moves all model parameters and buffers to the GPU.

This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on GPU while being optimized.

Note

This method modifies the module in-place.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device

Returns

self

Return type

Module

class nemo.collections.tts.models.HifiGanModel(*args: Any, **kwargs: Any)[source]#

Bases: Vocoder, Exportable

HiFi-GAN model (https://arxiv.org/abs/2010.05646) that is used to generate audio from mel spectrogram.

configure_callbacks()[source]#
configure_optimizers()[source]#
convert_spectrogram_to_audio(spec: torch.tensor) torch.tensor[source]#

Accepts a batch of spectrograms and returns a batch of audio.

Parameters

spec – [‘B’, ‘n_freqs’, ‘T’], A torch tensor representing the spectrograms to be vocoded.

Returns

audio
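Putting a spectrogram generator and this vocoder together gives the typical two-stage inference pipeline. The following is a sketch; the checkpoint names are examples of pretrained models on NGC and may differ by NeMo release (use list_available_models() to see what your version offers), and running it requires the NeMo toolkit plus a model download:

```python
# Sketch of a typical two-stage TTS inference pipeline: FastPitch produces
# a mel spectrogram, HiFi-GAN vocodes it into a waveform.
# Checkpoint names are examples; check list_available_models() for the
# names available in your NeMo release.
import torch
from nemo.collections.tts.models import FastPitchModel, HifiGanModel

spec_generator = FastPitchModel.from_pretrained("tts_en_fastpitch")
vocoder = HifiGanModel.from_pretrained("tts_en_hifigan")

with torch.no_grad():
    tokens = spec_generator.parse("Hello world")          # shape [1, T_text]
    spectrogram = spec_generator.generate_spectrogram(tokens=tokens)
    audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
```

Note that both from_pretrained calls download checkpoints from NGC on first use.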

forward(*, spec)[source]#

Runs the generator; for inputs and outputs, see input_types and output_types.

forward_for_export(spec)[source]#

Runs the generator; for inputs and outputs, see input_types and output_types.

static get_warmup_steps(max_steps, warmup_steps, warmup_ratio)[source]#
input_example(max_batch=1, max_dim=256)[source]#

Generates input examples for tracing etc. :returns: A tuple of input examples.

property input_types#

Define these to enable input neural type checks

classmethod list_available_models() Optional[Dict[str, str]][source]#

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud. :returns: List of available pre-trained models.

load_state_dict(state_dict, strict=True)[source]#
property max_steps#
on_train_epoch_end() None[source]#
property output_types#

Define these to enable output neural type checks

update_lr(interval='step')[source]#
class nemo.collections.tts.models.UnivNetModel(*args: Any, **kwargs: Any)[source]#

Bases: Vocoder, Exportable

UnivNet model (https://arxiv.org/abs/2106.07889) that is used to generate audio from mel spectrogram.

configure_optimizers()[source]#
convert_spectrogram_to_audio(spec: torch.tensor) torch.tensor[source]#

Accepts a batch of spectrograms and returns a batch of audio.

Parameters

spec – [‘B’, ‘n_freqs’, ‘T’], A torch tensor representing the spectrograms to be vocoded.

Returns

audio

forward(*, spec)[source]#

Runs the generator; for inputs and outputs, see input_types and output_types.

forward_for_export(spec)[source]#

Runs the generator; for inputs and outputs, see input_types and output_types.

static get_warmup_steps(max_steps, warmup_steps, warmup_ratio)[source]#
input_example(max_batch=1, max_dim=256)[source]#

Generates input examples for tracing etc. :returns: A tuple of input examples.

property input_types#

Define these to enable input neural type checks

classmethod list_available_models() Optional[Dict[str, str]][source]#

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud. :returns: List of available pre-trained models.

property output_types#

Define these to enable output neural type checks

class nemo.collections.tts.models.WaveGlowModel(*args: Any, **kwargs: Any)[source]#

Bases: GlowVocoder, Exportable

WaveGlow model (https://arxiv.org/abs/1811.00002) that is used to generate audio from mel spectrogram.

convert_spectrogram_to_audio(spec: torch.Tensor, sigma: float = 1.0, denoise: bool = True, denoiser_strength: float = 0.01) torch.Tensor[source]#

Accepts a batch of spectrograms and returns a batch of audio.

Parameters

spec – [‘B’, ‘n_freqs’, ‘T’], A torch tensor representing the spectrograms to be vocoded.

Returns

audio

forward(*, audio, audio_len, run_inverse=True)[source]#
forward_for_export(spec, z=None)[source]#
property input_module#
property input_types#

Define these to enable input neural type checks

classmethod list_available_models() List[PretrainedModelInfo][source]#

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud. :returns: List of available pre-trained models.

load_state_dict(state_dict, strict=True)[source]#
property mode#
property output_module#
property output_types#

Define these to enable output neural type checks

Codecs#

class nemo.collections.tts.models.AudioCodecModel(cfg: omegaconf.DictConfig, trainer: pytorch_lightning.Trainer = None)[source]#

Bases: ModelPT

configure_callbacks()[source]#
configure_optimizers()[source]#
decode(tokens: torch.Tensor, tokens_len: torch.Tensor) Tuple[torch.Tensor, torch.Tensor][source]#

Convert discrete tokens into a continuous time-domain signal.

Parameters
  • tokens – discrete tokens for each codebook for each time frame, shape (batch, number of codebooks, number of frames)

  • tokens_len – valid lengths, shape (batch,)

Returns

Decoded output audio in the time domain and its length in number of samples audio_len. Note that audio_len will be a multiple of self.samples_per_frame.

decode_audio(inputs: torch.Tensor, input_len: torch.Tensor) Tuple[torch.Tensor, torch.Tensor][source]#

Apply the decoder to the input. Note that the input is a non-quantized encoder output or a dequantized representation.

Parameters
  • inputs – encoded signal

  • input_len – valid length for each example in the batch

Returns

Decoded output audio in the time domain and its length in number of samples audio_len. Note that audio_len will be a multiple of self.samples_per_frame.

dequantize(tokens: torch.Tensor, tokens_len: torch.Tensor) torch.Tensor[source]#

Convert the discrete tokens into a continuous encoded representation.

Parameters
  • tokens – discrete tokens for each codebook for each time frame

  • tokens_len – valid length of each example in the batch

Returns

Continuous encoded representation of the discrete input representation.

property disc_update_prob: float#

Probability of updating the discriminator.

encode(audio: torch.Tensor, audio_len: torch.Tensor) Tuple[torch.Tensor, torch.Tensor][source]#

Convert input time-domain audio signal into a discrete representation (tokens).

Parameters
  • audio – input time-domain signal, shape (batch, number of samples)

  • audio_len – valid length for each example in the batch, shape (batch size,)

Returns

Tokens for each codebook for each frame, shape (batch, number of codebooks, number of frames), and the corresponding valid lengths, shape (batch,)

encode_audio(audio: torch.Tensor, audio_len: torch.Tensor) Tuple[torch.Tensor, torch.Tensor][source]#

Apply the encoder to the input audio signal. The input will be padded with zeros so the last frame has a full self.samples_per_frame samples.

Parameters
  • audio – input time-domain signal

  • audio_len – valid length for each example in the batch

Returns

Encoder output encoded and its length in number of frames encoded_len

forward(audio: torch.Tensor, audio_len: torch.Tensor) Tuple[torch.Tensor, torch.Tensor][source]#

Apply the encoder, quantizer, and decoder to the input time-domain signal.

Parameters
  • audio – input time-domain signal

  • audio_len – valid length for each example in the batch

Returns

Reconstructed time-domain signal output_audio and its length in number of samples output_audio_len.
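The encode/decode round trip documented above can be sketched as follows. The checkpoint name here is a placeholder, not a real model name (use list_available_models() to find available checkpoints), and running this requires the NeMo toolkit and a model download:

```python
# Sketch of an encode/decode round trip with a neural audio codec.
# "audio_codec_checkpoint" is a placeholder, not a real checkpoint name.
import torch
from nemo.collections.tts.models import AudioCodecModel

codec = AudioCodecModel.from_pretrained("audio_codec_checkpoint")

audio = torch.randn(1, 16000)            # (batch, number of samples)
audio_len = torch.tensor([16000])        # valid lengths, shape (batch,)

with torch.no_grad():
    tokens, tokens_len = codec.encode(audio=audio, audio_len=audio_len)
    # tokens: (batch, number of codebooks, number of frames)
    reconstructed, reconstructed_len = codec.decode(
        tokens=tokens, tokens_len=tokens_len
    )
    # reconstructed_len is a multiple of codec.samples_per_frame
```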

get_dataset(cfg)[source]#
classmethod list_available_models() List[PretrainedModelInfo][source]#

Should list all pre-trained models available via NVIDIA NGC cloud. Note: There is no check that requires model names and aliases to be unique. In the case of a collision, whatever model (or alias) is listed first in the returned list will be instantiated.

Returns

A list of PretrainedModelInfo entries

property max_steps#
on_train_epoch_end()[source]#
pad_audio(audio, audio_len)[source]#

Zero pad the end of the audio so that we do not have a partial end frame. The output will be zero-padded to have an integer number of frames of length self.samples_per_frame.

Parameters
  • audio – input time-domain signal

  • audio_len – valid length for each example in the batch

Returns

Padded time-domain signal padded_audio and its length padded_len.
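The padding arithmetic described above amounts to rounding the length up to the next multiple of samples_per_frame. A pure-Python sketch of that calculation (the NeMo implementation operates on batched tensors):

```python
import math

def padded_length(audio_len, samples_per_frame):
    """Round a length in samples up to a whole number of frames."""
    num_frames = math.ceil(audio_len / samples_per_frame)
    return num_frames * samples_per_frame

# With 160 samples per frame, 1000 samples pad out to 7 full frames:
assert padded_length(1000, 160) == 1120   # 7 * 160
assert padded_length(1120, 160) == 1120   # already frame-aligned, no padding
```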

quantize(encoded: torch.Tensor, encoded_len: torch.Tensor) torch.Tensor[source]#

Quantize the continuous encoded representation into a discrete representation for each frame.

Parameters
  • encoded – encoded signal representation

  • encoded_len – valid length of the encoded representation in frames

Returns

A tensor of tokens for each codebook for each frame.

should_update_disc(batch_idx) bool[source]#

Decide whether to update the discriminator based on the batch index and the configured discriminator update period.
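A plausible sketch of this decision, assuming the model is configured with an integer update period; the actual implementation reads the period from the model config and may also factor in the disc_update_prob property documented above:

```python
# Illustrative sketch: update the discriminator once every `period` batches.
# Not the exact NeMo logic, which takes the period from the model config.

def should_update_disc(batch_idx, period):
    return batch_idx % period == 0

# With a period of 2, the discriminator updates on even batch indices:
assert should_update_disc(0, 2) is True
assert should_update_disc(3, 2) is False
```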

update_lr(interval='step')[source]#

Base Classes#

The classes below are the base of the TTS pipeline. To read more about them, see the Base Classes section of the intro page.

class nemo.collections.tts.models.base.MelToSpec(*args: Any, **kwargs: Any)[source]#

Bases: ModelPT, ABC

A base class for models that convert mel spectrograms to linear (magnitude) spectrograms

abstract convert_mel_spectrogram_to_linear(mel: torch.tensor, **kwargs) torch.tensor[source]#

Accepts a batch of mel spectrograms and returns a batch of linear spectrograms

Parameters

mel – A torch tensor representing the mel spectrograms [‘B’, ‘mel_freqs’, ‘T’]

Returns

A torch tensor representing the linear spectrograms [‘B’, ‘n_freqs’, ‘T’]

Return type

spec

classmethod list_available_models() List[PretrainedModelInfo][source]#

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud. :returns: List of available pre-trained models.

class nemo.collections.tts.models.base.SpectrogramGenerator(*args: Any, **kwargs: Any)[source]#

Bases: ModelPT, ABC

Base class for all TTS models that turn text into a spectrogram

abstract generate_spectrogram(tokens: torch.tensor, **kwargs) torch.tensor[source]#

Accepts a batch of text or text_tokens and returns a batch of spectrograms

Parameters

tokens – A torch tensor representing the text to be generated

Returns

spectrograms

classmethod list_available_models() List[PretrainedModelInfo][source]#

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud. :returns: List of available pre-trained models.

abstract parse(str_input: str, **kwargs) torch.tensor[source]#

A helper function that accepts raw Python strings and turns them into a tensor. The tensor should have 2 dimensions. The first is the batch, which should be of size 1. The second should represent time. The tensor should represent either tokenized or embedded text, depending on the model.

Note that some models have a normalize parameter in this function which will apply the normalizer if it is available.

set_export_config(args)[source]#
class nemo.collections.tts.models.base.Vocoder(*args: Any, **kwargs: Any)[source]#

Bases: ModelPT, ABC

A base class for models that convert spectrograms to audio. Note that this class takes as input either linear or mel spectrograms.

abstract convert_spectrogram_to_audio(spec: torch.tensor, **kwargs) torch.tensor[source]#

Accepts a batch of spectrograms and returns a batch of audio.

Parameters

spec – [‘B’, ‘n_freqs’, ‘T’], A torch tensor representing the spectrograms to be vocoded.

Returns

audio

classmethod list_available_models() List[PretrainedModelInfo][source]#

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud. :returns: List of available pre-trained models.

Dataset Processing Classes#

class nemo.collections.tts.data.dataset.MixerTTSXDataset(*args: Any, **kwargs: Any)[source]#

Bases: TTSDataset

add_lm_tokens(**kwargs)[source]#
class nemo.collections.tts.data.dataset.TTSDataset(*args: Any, **kwargs: Any)[source]#

Bases: Dataset

add_align_prior_matrix(**kwargs)[source]#
add_durations(**kwargs)[source]#
add_energy(**kwargs)[source]#
add_log_mel(**kwargs)[source]#
add_p_voiced(**kwargs)[source]#
add_pitch(**kwargs)[source]#
add_reference_audio(**kwargs)[source]#
add_speaker_id(**kwargs)[source]#
add_voiced_mask(**kwargs)[source]#
static filter_files(data, ignore_file, min_duration, max_duration, total_duration)[source]#
general_collate_fn(batch)[source]#
get_log_mel(audio)[source]#
get_spec(audio)[source]#
join_data(data_dict)[source]#
pitch_shift(audio, sr, rel_audio_path_as_text_id)[source]#
class nemo.collections.tts.data.dataset.VocoderDataset(*args: Any, **kwargs: Any)[source]#

Bases: Dataset