NeMo TTS Collection API

TTS Base Classes

The classes below form the basis of the TTS pipeline. To read more about them, see the Base Classes section of the introduction page.

class nemo.collections.tts.models.base.SpectrogramGenerator(*args: Any, **kwargs: Any)

Bases: nemo.core.classes.modelPT.ModelPT, abc.ABC

Base class for all TTS models that turn text into a spectrogram

abstract generate_spectrogram(tokens: torch.tensor, **kwargs) → torch.tensor

Accepts a batch of text or text_tokens and returns a batch of spectrograms

Parameters

tokens – A torch tensor representing the text to be converted into spectrograms

Returns

spectrograms
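The abstract-method contract can be illustrated with a minimal pure-Python sketch. It assumes nested lists as stand-ins for torch tensors so that it runs without NeMo or PyTorch; ToySpectrogramGenerator, FixedFrameGenerator, and the n_mels and frames_per_token parameters are hypothetical names chosen for illustration, and frames_per_token is only a crude placeholder for whatever duration modelling a real model performs.

```python
import abc
from typing import List

# Nested lists stand in for torch tensors (hypothetical, for illustration only).
Batch = List[List[int]]                  # [batch, time] token IDs
Spectrograms = List[List[List[float]]]   # [batch, n_mels, frames]


class ToySpectrogramGenerator(abc.ABC):
    """Hypothetical stand-in mirroring the abstract generate_spectrogram contract."""

    @abc.abstractmethod
    def generate_spectrogram(self, tokens: Batch, **kwargs) -> Spectrograms:
        """Map a batch of token sequences to a batch of spectrograms."""


class FixedFrameGenerator(ToySpectrogramGenerator):
    """Illustrative subclass: emits frames_per_token all-zero frames per token."""

    def __init__(self, n_mels: int = 80, frames_per_token: int = 2):
        self.n_mels = n_mels
        self.frames_per_token = frames_per_token

    def generate_spectrogram(self, tokens: Batch, **kwargs) -> Spectrograms:
        specs = []
        for seq in tokens:
            n_frames = len(seq) * self.frames_per_token
            # One row per mel channel, one column per output frame.
            specs.append([[0.0] * n_frames for _ in range(self.n_mels)])
        return specs


# The abstract base cannot be instantiated until the method is implemented.
try:
    ToySpectrogramGenerator()
except TypeError as err:
    print("TypeError:", err)

gen = FixedFrameGenerator(n_mels=4, frames_per_token=3)
out = gen.generate_spectrogram([[1, 2, 3], [4, 5, 6]])
print(len(out), len(out[0]), len(out[0][0]))  # → 2 4 9
```

The output keeps the batch dimension first, followed by the mel channels and the frame axis.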

classmethod list_available_models() → List[PretrainedModelInfo]

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud.

Returns

List of available pre-trained models.
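One way a base-class classmethod like list_available_models could gather the pre-trained models declared by its subclasses is sketched below. This is an assumption made for illustration, not a description of how NeMo implements it; ModelInfo is a hypothetical stand-in for PretrainedModelInfo, and the model names are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelInfo:
    # Hypothetical stand-in for PretrainedModelInfo.
    pretrained_model_name: str
    description: str


class BaseModel:
    @classmethod
    def list_available_models(cls) -> List[ModelInfo]:
        # Walk the direct subclasses and concatenate whatever each one reports.
        # A subclass that does not override this method falls back to the same
        # logic for its own subclasses.
        models: List[ModelInfo] = []
        for subclass in cls.__subclasses__():
            models.extend(subclass.list_available_models() or [])
        return models


class ModelA(BaseModel):
    @classmethod
    def list_available_models(cls) -> List[ModelInfo]:
        return [ModelInfo("toy_model_a", "hypothetical model A")]


class ModelB(BaseModel):
    @classmethod
    def list_available_models(cls) -> List[ModelInfo]:
        return [ModelInfo("toy_model_b", "hypothetical model B")]


names = sorted(m.pretrained_model_name for m in BaseModel.list_available_models())
print(names)  # → ['toy_model_a', 'toy_model_b']
```

Calling the method on the base class aggregates every subclass, while calling it on a concrete subclass returns only that subclass's entries.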

abstract parse(str_input: str, **kwargs) → torch.tensor

A helper function that accepts a raw Python string and turns it into a tensor. The tensor should have two dimensions: the first is the batch dimension, which should be of size 1, and the second represents time. Depending on the model, the tensor holds either tokenized or embedded text.

Note that some models add a normalize parameter to this function, which applies the text normalizer if one is available.
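The parse contract, including an optional normalize flag, can be sketched under stated assumptions: a hypothetical character-level vocabulary (VOCAB), a hypothetical normalizer (simple_normalizer, which only lower-cases and spells out "&"), and a nested list standing in for a 2-D tensor of shape [1, T]. Real models use their own tokenizers and normalizers.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical character vocabulary; ID 0 is left free (e.g. for padding).
VOCAB: Dict[str, int] = {ch: i + 1 for i, ch in enumerate(" abcdefghijklmnopqrstuvwxyz")}


def simple_normalizer(text: str) -> str:
    # Hypothetical normalizer: spell out "&" and lower-case everything.
    return text.replace("&", "and").lower()


def parse(
    str_input: str,
    normalize: bool = True,
    normalizer: Optional[Callable[[str], str]] = simple_normalizer,
) -> List[List[int]]:
    """Return token IDs shaped [batch=1, time]."""
    # Apply the normalizer only if requested and one is available.
    if normalize and normalizer is not None:
        str_input = normalizer(str_input)
    # Characters outside the vocabulary are silently dropped in this sketch.
    ids = [VOCAB[ch] for ch in str_input if ch in VOCAB]
    return [ids]  # the outer list is the batch dimension of size 1


tokens = parse("Hi & bye")
print(len(tokens), len(tokens[0]))  # → 1 10

raw = parse("Hi & bye", normalize=False)
print(len(raw), len(raw[0]))        # → 1 6
```

Without normalization, the uppercase "H" and the "&" fall outside the toy vocabulary and are dropped, which is the kind of loss a normalizer is meant to prevent.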

class nemo.collections.tts.models.base.Vocoder(*args: Any, **kwargs: Any)

Bases: nemo.core.classes.modelPT.ModelPT, abc.ABC

Base class for all TTS models that generate audio conditioned on a spectrogram

abstract convert_spectrogram_to_audio(spec: torch.tensor, **kwargs) → torch.tensor

Accepts a batch of spectrograms and returns a batch of audio

Parameters

spec – A torch tensor representing the spectrograms to be vocoded

Returns

audio
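The vocoder contract can be shown with a pure-Python sketch that assumes nested lists for tensors and a hypothetical hop_length, the number of audio samples produced per spectrogram frame. ToyVocoder and RepeatVocoder are hypothetical names; the toy subclass just holds each frame's mean value for hop_length samples, whereas a real vocoder synthesizes an actual waveform.

```python
import abc
from typing import List

Spectrograms = List[List[List[float]]]  # [batch, n_mels, frames]
Audio = List[List[float]]               # [batch, samples]


class ToyVocoder(abc.ABC):
    """Hypothetical stand-in mirroring the abstract convert_spectrogram_to_audio contract."""

    @abc.abstractmethod
    def convert_spectrogram_to_audio(self, spec: Spectrograms, **kwargs) -> Audio:
        """Map a batch of spectrograms to a batch of waveforms."""


class RepeatVocoder(ToyVocoder):
    """Illustrative subclass: repeats each frame's mean value hop_length times."""

    def __init__(self, hop_length: int = 256):
        self.hop_length = hop_length

    def convert_spectrogram_to_audio(self, spec: Spectrograms, **kwargs) -> Audio:
        audio = []
        for mel in spec:
            n_mels, n_frames = len(mel), len(mel[0])
            samples = []
            for t in range(n_frames):
                # Average across mel channels for this frame.
                frame_mean = sum(mel[m][t] for m in range(n_mels)) / n_mels
                samples.extend([frame_mean] * self.hop_length)
            audio.append(samples)
        return audio


spec = [[[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]]  # batch=1, n_mels=2, frames=3
wav = RepeatVocoder(hop_length=4).convert_spectrogram_to_audio(spec)
print(len(wav), len(wav[0]))  # → 1 12
print(wav[0][:5])             # → [1.0, 1.0, 1.0, 1.0, 2.0]
```

The output length is frames × hop_length per batch item, which is the usual relationship between a spectrogram and the waveform it is vocoded into.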

classmethod list_available_models() → List[PretrainedModelInfo]

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud.

Returns

List of available pre-trained models.

class nemo.collections.tts.models.base.TextToWaveform(*args: Any, **kwargs: Any)

Bases: nemo.core.classes.modelPT.ModelPT, abc.ABC

Base class for all end-to-end TTS models that generate a waveform from text

abstract convert_text_to_waveform(*, tokens: torch.tensor, **kwargs) → List[torch.tensor]

Accepts a batch of text and returns a list of waveforms, one per item in the batch

Parameters

tokens – A torch tensor representing the text to be converted to speech

Returns

A list of length batch_size containing torch tensors representing the waveform output

Return type

List[torch.tensor]
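The bare * in the convert_text_to_waveform signature makes tokens a keyword-only argument, and the method returns a Python list with one waveform per batch item; a list, unlike a single batched tensor, can hold waveforms of different lengths. The pure-Python sketch below illustrates both points. ToyTextToWaveform, EchoTextToWaveform, samples_per_token, and the padding ID PAD_ID = 0 are hypothetical, and nested lists stand in for tensors.

```python
import abc
from typing import List

Tokens = List[List[int]]  # [batch, time], padded with PAD_ID
PAD_ID = 0                # hypothetical padding token ID


class ToyTextToWaveform(abc.ABC):
    """Hypothetical stand-in mirroring the keyword-only convert_text_to_waveform contract."""

    @abc.abstractmethod
    def convert_text_to_waveform(self, *, tokens: Tokens, **kwargs) -> List[List[float]]:
        """Return one waveform per batch item."""


class EchoTextToWaveform(ToyTextToWaveform):
    """Illustrative subclass: emits samples_per_token samples for each non-pad token."""

    def __init__(self, samples_per_token: int = 2):
        self.samples_per_token = samples_per_token

    def convert_text_to_waveform(self, *, tokens: Tokens, **kwargs) -> List[List[float]]:
        # Padding is skipped, so each waveform keeps its own length (a ragged list).
        return [
            [float(tok) for tok in seq if tok != PAD_ID for _ in range(self.samples_per_token)]
            for seq in tokens
        ]


model = EchoTextToWaveform(samples_per_token=2)
waves = model.convert_text_to_waveform(tokens=[[1, 2, PAD_ID], [3, 4, 5]])
print([len(w) for w in waves])  # → [4, 6]

try:
    model.convert_text_to_waveform([[1, 2]])  # positional call is rejected
except TypeError as err:
    print("TypeError:", err)
```

Passing tokens positionally raises a TypeError, so callers must always write tokens=....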

classmethod list_available_models() → List[PretrainedModelInfo]

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud.

Returns

List of available pre-trained models.

abstract parse(str_input: str, **kwargs) → torch.tensor

A helper function that accepts a raw Python string and turns it into a tensor. The tensor should have two dimensions: the first is the batch dimension, which should be of size 1, and the second represents time. Depending on the model, the tensor holds either tokenized or embedded text.

TTS Datasets