Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.

Speech and Audio Processing

Speech and audio processing covers systems that process audio signals such as speech, music, and environmental sounds. This collection includes models for speech enhancement, restoration, and extraction.

Details are covered in the sections below.

Resources and Documentation

Tutorial notebooks are in the audio tutorials folder. If you are new to NeMo, consider starting with the NeMo Primer and NeMo Model tutorials. You can run these tutorials on Google Colab by opening the notebook's GitHub link in Colab.
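The following is a minimal sketch of the Colab step. The hypothetical helper `github_to_colab` rewrites a GitHub notebook URL into the form Colab opens directly by placing the GitHub path after `https://colab.research.google.com/github/`. The example URL is a placeholder and does not point to the actual audio tutorials.

```python
from urllib.parse import urlparse

COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github/"


def github_to_colab(github_url: str) -> str:
    """Hypothetical helper: turn a github.com notebook URL into a Colab URL."""
    parsed = urlparse(github_url)
    if parsed.scheme != "https" or parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {github_url}")
    # Expected path shape: /<org>/<repo>/blob/<branch>/<path/to/notebook.ipynb>
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "blob" or not parts[-1].endswith(".ipynb"):
        raise ValueError(
            f"expected .../<org>/<repo>/blob/<branch>/<file>.ipynb, got: {github_url}"
        )
    # Colab reuses the GitHub path unchanged after its /github/ prefix.
    return COLAB_GITHUB_PREFIX + "/".join(parts)


# Placeholder URL for illustration only.
print(github_to_colab(
    "https://github.com/example-org/example-repo/blob/main/tutorials/demo.ipynb"
))
```

The helper rejects anything that is not a `blob` link to an `.ipynb` file. This way, a link to a folder or a raw file fails loudly and does not produce a Colab URL that will not open.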

If you are looking for information about a particular model, or want to learn more about the model architectures available in nemo.collections.audio, refer to the Models section.

The Checkpoints section explains how to load model checkpoints, either local files or pretrained checkpoints from NGC, and lists the checkpoints available on NGC.
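This minimal sketch shows the two loading paths. It assumes the audio models follow the standard NeMo `ModelPT` interface: `restore_from` for a local `.nemo` file and `from_pretrained` for a named pretrained checkpoint. The wrapper `load_audio_model` is hypothetical and not part of NeMo. Check the Checkpoints section for the supported model classes and checkpoint names.

```python
from typing import Any, Optional


def load_audio_model(model_cls: Any, source: str, map_location: Optional[str] = None) -> Any:
    """Hypothetical wrapper (not a NeMo API) choosing between local and pretrained loading.

    `model_cls` is assumed to be a NeMo ModelPT subclass, which provides the
    classmethods `restore_from` and `from_pretrained` called below.
    """
    if source.endswith(".nemo"):
        # Local checkpoint, e.g. one written earlier with model.save_to(...).
        # A missing file surfaces as an error here and never falls through to a download.
        return model_cls.restore_from(restore_path=source, map_location=map_location)
    # Otherwise treat `source` as a pretrained model name, such as one listed by
    # model_cls.list_available_models(); it is downloaded and cached on first use.
    return model_cls.from_pretrained(model_name=source, map_location=map_location)
```

With NeMo installed, you might call this as `load_audio_model(AudioToAudioModel, "my_model.nemo")`. This assumes `AudioToAudioModel` is importable from `nemo.collections.audio.models` in your release. Dispatching on the `.nemo` suffix, with no check for file existence, keeps a mistyped local path from being silently treated as a model name.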

Documentation for the configuration files specific to NeMo audio models is in the Configuration Files section.