NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures.
Conversational AI architectures are typically large and require a lot of data and compute for training. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node mixed-precision training.
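In NeMo, these PyTorch Lightning `Trainer` settings are usually exposed through a `trainer` section of the experiment's Hydra/YAML config. The fragment below is an illustrative sketch only; the field names follow PyTorch Lightning's `Trainer`, but the values shown are examples, not NeMo defaults:

```yaml
# Illustrative trainer section of a NeMo experiment config.
trainer:
  devices: 8            # GPUs per node
  num_nodes: 2          # number of nodes for multi-node training
  accelerator: gpu
  precision: 16-mixed   # mixed-precision training
  max_epochs: 100
```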
Pre-trained NeMo models are available in 14+ languages.
Before you begin using NeMo, make sure you meet the following prerequisites:
You have Python version 3.10 or above.
You have PyTorch version 1.13.1 or 2.0+.
You have access to an NVIDIA GPU if you intend to do model training.
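The prerequisites above can be checked from Python before installing NeMo. The helper below is a minimal sketch (the function name is ours, not part of NeMo):

```python
import sys

def meets_python_requirement(version_info, minimum=(3, 10)):
    """Return True if the given interpreter version satisfies NeMo's minimum."""
    return tuple(version_info[:2]) >= minimum

# Check the running interpreter.
print(meets_python_requirement(sys.version_info))

# PyTorch and GPU availability can be probed similarly
# (a GPU is only needed for training, not for trying out inference).
try:
    import torch
    print(torch.__version__, torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed")
```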
Quick Start Guide#
You can try out NeMo’s ASR, NLP and TTS functionality with the example below, which is based on the Audio Translation tutorial.
Once you have installed NeMo, you can run the code below:
```python
# Import NeMo's ASR, NLP and TTS collections
import nemo.collections.asr as nemo_asr
import nemo.collections.nlp as nemo_nlp
import nemo.collections.tts as nemo_tts

# Download an audio file that we will transcribe, translate, and convert
# the written translation to speech
import wget
wget.download("https://nemo-public.s3.us-east-2.amazonaws.com/zh-samples/common_voice_zh-CN_21347786.mp3")

# Instantiate a Mandarin speech recognition model and transcribe an audio file.
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_zh_citrinet_1024_gamma_0_25")
mandarin_text = asr_model.transcribe(['common_voice_zh-CN_21347786.mp3'])
print(mandarin_text)

# Instantiate a neural machine translation model and translate the text
nmt_model = nemo_nlp.models.MTEncDecModel.from_pretrained(model_name="nmt_zh_en_transformer24x6")
english_text = nmt_model.translate(mandarin_text)
print(english_text)

# Instantiate a spectrogram generator (which converts text -> spectrogram)
# and a vocoder model (which converts spectrogram -> audio waveform)
spectrogram_generator = nemo_tts.models.FastPitchModel.from_pretrained(model_name="tts_en_fastpitch")
vocoder = nemo_tts.models.HifiGanModel.from_pretrained(model_name="tts_en_hifigan")

# Parse the text input, generate the spectrogram, and convert it to audio.
# translate() returns a list of strings, so take the first translation.
parsed_text = spectrogram_generator.parse(english_text[0])
spectrogram = spectrogram_generator.generate_spectrogram(tokens=parsed_text)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)

# Save the audio to a file (the vocoder returns a batch, so take the first waveform)
import soundfile as sf
sf.write("output_audio.wav", audio.to('cpu').detach().numpy()[0], 22050)
```
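The example above saves audio with `soundfile`; if that package is unavailable, 16-bit PCM WAV output can also be written with the standard-library `wave` module. The sketch below uses a synthetic sine wave as a stand-in for the model's float waveform:

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # matches the output rate used in the example above

# Stand-in for the model output: one second of a 440 Hz sine in [-1.0, 1.0].
samples = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

# Clamp each float sample, scale to 16-bit PCM, and write a mono WAV file.
with wave.open("output_audio.wav", "wb") as f:
    f.setnchannels(1)          # mono
    f.setsampwidth(2)          # 16-bit samples
    f.setframerate(SAMPLE_RATE)
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    f.writeframes(frames)
```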
You can also learn more about NeMo in the NeMo Primer tutorial, which introduces NeMo, PyTorch Lightning, and OmegaConf, and shows how to use, modify, save, and restore NeMo models. Additionally, the NeMo Models tutorial explains the fundamentals of how NeMo models are created. These concepts are also explained in detail in the NeMo Core documentation.
See the two introductory videos below for a high-level overview of NeMo.
Developing State-Of-The-Art Conversational AI Models in Three Lines of Code
NVIDIA NeMo: Toolkit for Conversational AI at PyData Yerevan 2022
The simplest way to install NeMo is via pip; see the instructions below.
Full NeMo installation instructions (with more ways to install NeMo, and how to handle optional dependencies) can be found in the GitHub README.
We recommend installing NeMo in a fresh Conda environment.
```shell
conda create --name nemo python==3.10.12
conda activate nemo
```
Install PyTorch using the configurator on the PyTorch website.
Use this installation mode if you want the latest released version.
```shell
apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython
pip install nemo_toolkit['all']
```
Depending on the shell you use, you may need to write "nemo_toolkit[all]" instead in the above command.
For more information and questions, visit the NVIDIA NeMo Discussion Board.
We welcome community contributions! Refer to the CONTRIBUTING.md file for the process.
NeMo is released under an Apache 2.0 license.