Introduction

NVIDIA NeMo Framework is an end-to-end, cloud-native framework to build, customize, and deploy generative AI models anywhere. To learn more about using NeMo in generative AI workflows, please refer to the NeMo Framework User Guide.

NVIDIA NeMo Framework has separate collections for Large Language Models (LLMs), Multimodal (MM), Computer Vision (CV), Automatic Speech Recognition (ASR), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new generative AI model architectures.

Generative AI architectures are typically large and require a lot of data and compute for training. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node mixed-precision training.

Pre-trained NeMo models are available in 14+ languages.

Before you begin using NeMo, make sure you meet the following prerequisites.

  1. You have Python version 3.10 or above.

  2. You have PyTorch version 1.13.1 or 2.0+.

  3. You have access to an NVIDIA GPU, if you intend to do model training.
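The prerequisites above can be checked from Python before installing anything else. The following is a minimal sketch, not part of NeMo: the helper names `torch_version_ok` and `check_prerequisites` are hypothetical, and the version rule is a literal reading of "1.13.1 or 2.0+". If PyTorch is not installed, the sketch reports that instead of failing.

```python
import importlib.util
import re
import sys


def torch_version_ok(version: str) -> bool:
    """Return True for PyTorch 1.13.1 or any 2.0+ release (hypothetical helper)."""
    # Read the leading "major.minor[.patch]" and ignore suffixes such as "+cu118".
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        return False
    major, minor, patch = (int(g) if g is not None else 0 for g in match.groups())
    return major >= 2 or (major, minor, patch) == (1, 13, 1)


def check_prerequisites() -> dict:
    """Collect the three prerequisite checks into one report (hypothetical helper)."""
    report = {
        "python_ok": sys.version_info[:2] >= (3, 10),
        "torch_version": None,
        "torch_ok": False,
        "cuda_available": False,
    }
    # Only import torch if it is installed, so the check itself never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_version"] = torch.__version__
        report["torch_ok"] = torch_version_ok(torch.__version__)
        # A visible CUDA device is only needed if you intend to train models.
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

A `cuda_available` value of `False` does not block inference-only experiments; it only matters for the training prerequisite.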

You can try out NeMo's ASR, NLP, and TTS functionality with the example below, which is based on the Audio Translation tutorial.

Once you have installed NeMo, you can run the code below:


    # Import NeMo's ASR, NLP and TTS collections
    import nemo.collections.asr as nemo_asr
    import nemo.collections.nlp as nemo_nlp
    import nemo.collections.tts as nemo_tts

    # Download an audio file that we will transcribe, translate, and convert the written translation to speech
    import wget
    wget.download("https://nemo-public.s3.us-east-2.amazonaws.com/zh-samples/common_voice_zh-CN_21347786.mp3")

    # Instantiate a Mandarin speech recognition model and transcribe an audio file.
    asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_zh_citrinet_1024_gamma_0_25")
    mandarin_text = asr_model.transcribe(['common_voice_zh-CN_21347786.mp3'])
    print(mandarin_text)

    # Instantiate Neural Machine Translation model and translate the text
    nmt_model = nemo_nlp.models.MTEncDecModel.from_pretrained(model_name="nmt_zh_en_transformer24x6")
    english_text = nmt_model.translate(mandarin_text)
    print(english_text)

    # Instantiate a spectrogram generator (which converts text -> spectrogram)
    # and vocoder model (which converts spectrogram -> audio waveform)
    spectrogram_generator = nemo_tts.models.FastPitchModel.from_pretrained(model_name="tts_en_fastpitch")
    vocoder = nemo_tts.models.HifiGanModel.from_pretrained(model_name="tts_en_hifigan")

    # Parse the text input, generate the spectrogram, and convert it to audio
    parsed_text = spectrogram_generator.parse(english_text[0])
    spectrogram = spectrogram_generator.generate_spectrogram(tokens=parsed_text)
    audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)

    # Save the audio to a file
    import soundfile as sf
    sf.write("output_audio.wav", audio.to('cpu').detach().numpy()[0], 22050)

You can learn more about the specific tasks you are interested in by checking out the NeMo tutorials or the documentation (e.g., read here to learn more about ASR).

You can also learn more about NeMo in the NeMo Primer tutorial, which introduces NeMo, PyTorch Lightning, and OmegaConf, and shows how to use, modify, save, and restore NeMo models. Additionally, the NeMo Models tutorial explains the fundamentals of how NeMo models are created. These concepts are also explained in detail in the NeMo Core documentation.

See the two introductory videos below for a high-level overview of NeMo.

Developing State-Of-The-Art Conversational AI Models in Three Lines of Code


The simplest way to install NeMo is via pip; see the instructions below.

Note

Full NeMo installation instructions (with more ways to install NeMo, and how to handle optional dependencies) can be found in the GitHub README.

Conda

We recommend installing NeMo in a fresh Conda environment.


    conda create --name nemo python==3.10.12
    conda activate nemo

Install PyTorch using the PyTorch configurator.

Pip

Use this installation mode if you want the latest released version.


    apt-get update && apt-get install -y libsndfile1 ffmpeg
    pip install Cython
    pip install nemo_toolkit['all']

Depending on the shell used, you may need to write the package specifier as "nemo_toolkit[all]" in the command above, because some shells treat unquoted square brackets as filename patterns.
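One way to avoid the shell quoting question is to invoke pip from Python with an argument list, so no shell ever interprets the square brackets. The following is a minimal sketch using only the standard library; `pip_install_command` and `install` are hypothetical helper names, and `install` defaults to a dry run so that nothing is installed unless you request it.

```python
import shlex
import subprocess
import sys


def pip_install_command(package: str = "nemo_toolkit", extra: str = "all") -> list:
    """Build the pip command as an argument list (hypothetical helper)."""
    # "python -m pip" targets the interpreter that is currently running.
    return [sys.executable, "-m", "pip", "install", f"{package}[{extra}]"]


def install(dry_run: bool = True) -> list:
    """Return the command, and run it only when dry_run is False."""
    command = pip_install_command()
    if not dry_run:
        # An argument list bypasses the shell, so the brackets need no quoting.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # shlex.join shows how the same command must be quoted for a POSIX shell.
    print(shlex.join(["pip", "install", "nemo_toolkit[all]"]))
    # prints: pip install 'nemo_toolkit[all]'
    print(install(dry_run=True))
```

Calling `install(dry_run=False)` would perform the actual installation inside the active environment.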

For more information and questions, visit the NVIDIA NeMo Discussion Board.

We welcome community contributions! Refer to the CONTRIBUTING.md file for the process.

NeMo is released under an Apache 2.0 license.

© Copyright 2023-2024, NVIDIA. Last updated on Apr 22, 2024.