Important
NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to NeMo 2.0 overview for information on getting started.
Overview
NVIDIA NeMo Framework is a scalable and cloud-native generative AI framework built for researchers and developers working on Large Language Models (LLMs), Multimodal Models (MMs), and Speech AI (e.g. Automatic Speech Recognition and Text-to-Speech). It enables users to efficiently create, customize, and deploy new generative AI models by leveraging existing code and pre-trained model checkpoints.
Large Language Models and Multimodal Models
NeMo Framework offers comprehensive functionality for developing both Large Language Models (LLMs) and Multimodal Models (MMs), covering the entire model development process. You have the flexibility to use this framework either on-premises or with a cloud provider of your choice.
- Data Curation
NeMo Curator is a Python library that includes a suite of data-mining modules. These modules are optimized for GPUs and designed to scale, making them ideal for curating natural language data to train LLMs. With NeMo Curator, researchers in Natural Language Processing (NLP) can efficiently extract high-quality text from extensive raw web data sources.
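As a toy sketch of the kind of rule-based quality filtering that curation pipelines of this sort apply at scale (this is an illustration of the idea only, not NeMo Curator's actual API; the thresholds are arbitrary assumptions):

```python
# Toy document-quality filter: keep documents that are long enough and not
# dominated by non-alphanumeric symbols. Real curation pipelines chain many
# such filters plus deduplication, language ID, and PII removal on GPUs.
def keep(doc: str, min_words: int = 5, max_symbol_ratio: float = 0.3) -> bool:
    words = doc.split()
    if len(words) < min_words:
        return False  # too short to be useful training text
    symbols = sum(1 for ch in doc if not ch.isalnum() and not ch.isspace())
    return symbols / max(len(doc), 1) <= max_symbol_ratio

docs = [
    "Just a few words",                                      # too short
    "A clean, well-formed sentence about model training.",   # kept
    "@@## $$%% ~~ || >>",                                    # too symbol-heavy
]
print([d for d in docs if keep(d)])
```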
- Model Training and Customization
NeMo Framework provides a comprehensive set of tools for the efficient training and customization of LLMs and Multimodal models. This includes the setup of the compute cluster, data downloading, and model hyperparameter selection. Each model and task comes with default configurations that are regularly tested; these configurations can be adjusted to train on new datasets or to test new model hyperparameters. For customization, NeMo Framework supports not only full Supervised Fine-Tuning (SFT), but also a range of Parameter-Efficient Fine-Tuning (PEFT) techniques, including P-Tuning, LoRA, Adapters, and IA3. These techniques typically achieve nearly the same accuracy as SFT at a fraction of the computational cost.
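To make the PEFT idea concrete, here is a minimal NumPy sketch of LoRA (an illustration of the technique itself, not NeMo's implementation): the pretrained weight stays frozen, and only a low-rank update B @ A is trained, so far fewer parameters receive gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 4, 2, 16  # r << min(d_out, d_in) in practice

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; W itself is never updated.
    return x @ W.T + (x @ A.T) @ B.T * (alpha / r)

x = rng.standard_normal((3, d_in))
# With B initialized to zero, the adapted layer starts out identical to the
# frozen base layer, so fine-tuning begins from the pretrained behavior.
assert np.allclose(lora_forward(x), x @ W.T)
```

The parameter savings come from training only A and B (r * d_in + d_out * r values) instead of the full d_out * d_in weight matrix, which grows substantial at realistic model dimensions.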
- Model Alignment
NeMo-Aligner, a component of NeMo Framework, is a scalable toolkit for efficient model alignment. The toolkit supports Supervised Fine-Tuning (SFT) and other state-of-the-art (SOTA) model alignment algorithms such as SteerLM, DPO, and Reinforcement Learning from Human Feedback (RLHF). These algorithms enable users to align language models to be safer and more helpful.
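As a sense of what one of these algorithms optimizes, here is a toy single-pair DPO loss (an illustration of the published DPO objective, not NeMo-Aligner's implementation). The inputs are total log-probabilities of the human-preferred (w) and rejected (l) responses under the policy being trained and under a frozen reference model:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # DPO pushes the policy to increase its preference margin for the chosen
    # response relative to the reference model, scaled by beta.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return math.log(1.0 + math.exp(-margin))  # -log(sigmoid(margin))

# Here the policy already prefers the chosen response more strongly than the
# reference does, so the margin is positive and the loss falls below log(2).
loss = dpo_loss(logp_w=-10.0, logp_l=-14.0, ref_logp_w=-12.0, ref_logp_l=-12.0)
print(loss)
```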
- Launcher
NeMo Launcher streamlines your experience with the NeMo Framework by providing an intuitive interface for constructing comprehensive workflows. This allows for effective organization and management of experiments across different environments. Based on the Hydra framework, NeMo Launcher enables users to easily create and modify hierarchical configurations using both configuration files and command-line arguments. It simplifies the process of initiating large-scale training, customization, or alignment tasks. These tasks can be run locally (supporting single node), on NVIDIA Base Command Manager (Slurm), or on cloud providers such as AWS, Azure, and Oracle Cloud Infrastructure (OCI). This is all made possible through Launcher scripts, eliminating the need for writing any code.
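The hierarchical-override mechanism that Hydra-based tools rely on can be sketched in a few lines of plain Python (the config keys below are hypothetical, not NeMo Launcher's actual schema):

```python
# Toy illustration of Hydra-style dotted-key overrides: a command-line
# argument such as `trainer.devices=8` updates one leaf of a nested config
# without touching the rest of the hierarchy.
def apply_override(config: dict, dotted_key: str, value):
    """Set a nested key, e.g. 'trainer.devices' -> config['trainer']['devices']."""
    *path, leaf = dotted_key.split(".")
    node = config
    for key in path:
        node = node.setdefault(key, {})
    node[leaf] = value

config = {"trainer": {"devices": 1}, "model": {"num_layers": 12}}
# A launch like `python main.py trainer.devices=8` amounts to:
apply_override(config, "trainer.devices", 8)
assert config["trainer"]["devices"] == 8
assert config["model"]["num_layers"] == 12  # untouched defaults survive
```

Real Hydra additionally handles config composition, type checking, and multirun sweeps, but the layering of defaults plus targeted overrides shown here is the core idea.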
- Model Inference
NeMo Framework seamlessly integrates with enterprise-level model deployment tools through NVIDIA NIM. This integration is powered by NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server, ensuring optimized and scalable inference.
- Model Support
NeMo Framework supports end-to-end model development workflows for a variety of models. This includes popular community models such as Gemma, Starcoder 2, Llama 1/2, Baichuan 2, Falcon, Mixtral, Mistral, and others, as well as NVIDIA Nemotron models. The support extends to both pretraining and fine-tuning for all language and multimodal models. More specific details can be found in the support matrix provided below.
Speech AI
Developing conversational AI models is a complex process that involves defining, constructing, and training models within particular domains. It typically requires multiple iterations to achieve high accuracy, fine-tuning on various tasks and domain-specific data, ensuring training performance, and preparing models for inference deployment.
NeMo Framework provides support for the training and customization of Speech AI models. This includes tasks like Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) synthesis. It offers a smooth transition to enterprise-level production deployment with NVIDIA Riva. To assist developers and researchers, NeMo Framework includes state-of-the-art pre-trained checkpoints, tools for reproducible speech data processing, and features for interactive exploration and analysis of speech datasets. The components of the NeMo Framework for Speech AI are as follows:
- Training and Customization
NeMo Framework contains everything needed to train and customize speech models (ASR, Speech Classification, Speaker Recognition, Speaker Diarization, and TTS) in a reproducible manner.
- SOTA Pre-trained Models
NeMo Framework provides state-of-the-art recipes and pre-trained checkpoints of several ASR and TTS models, as well as instructions on how to load them.
- Speech Tools
NeMo Framework provides a set of tools useful for developing ASR and TTS models, including:
NeMo Forced Aligner (NFA) for generating token-, word- and segment-level timestamps of speech in audio using NeMo’s CTC-based Automatic Speech Recognition models.
Speech Data Processor (SDP), a toolkit that simplifies speech data processing. It lets you represent data processing operations in a config file, minimizing boilerplate code and making pipelines reproducible and shareable.
Speech Data Explorer (SDE), a Dash-based web application for interactive exploration and analysis of speech datasets.
Dataset Creation Tool, which aligns long audio files with their corresponding transcripts and splits them into shorter fragments suitable for ASR model training.
Comparison Tool for ASR Models, for comparing the predictions of different ASR models at the word-accuracy and utterance level.
ASR Evaluator, for evaluating the performance of ASR models and of related features such as Voice Activity Detection.
Text Normalization Tool for converting text from the written form to the spoken form and vice versa (e.g. “31st” vs “thirty first”).
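To illustrate the written-to-spoken direction of that task, here is a toy normalizer for two-digit English ordinals (an illustration of the problem only; NeMo's Text Normalization Tool is far more general, covering many token categories and languages):

```python
import re

ONES = ["zeroth", "first", "second", "third", "fourth", "fifth",
        "sixth", "seventh", "eighth", "ninth"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}

def spoken_ordinal(text: str) -> str:
    """Rewrite written ordinals like '31st' into spoken form ('thirty first')."""
    def repl(match):
        n = int(match.group(1))
        if n < 10:
            return ONES[n]
        if n < 20:
            return match.group(0)  # teens left as-is in this toy example
        tens, ones = divmod(n, 10)
        if ones == 0:
            return TENS[tens][:-1] + "ieth"  # "twenty" -> "twentieth"
        return f"{TENS[tens]} {ONES[ones]}"
    return re.sub(r"\b(\d{1,2})(st|nd|rd|th)\b", repl, text)

print(spoken_ordinal("the 31st of May"))  # -> "the thirty first of May"
```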
- Path to Deployment
NeMo models that have been trained or customized using the NeMo Framework can be optimized and deployed with NVIDIA Riva. Riva provides containers and Helm charts specifically designed to automate the steps for push-button deployment.
Programming Languages and Frameworks
Python
PyTorch
Bash
Resources
GitHub Repos
Where to Get Help
Licenses
NeMo is licensed under the NVIDIA AI PRODUCT AGREEMENT. By pulling and using the container, you accept the terms and conditions of this license.
This container contains Llama materials governed by the Meta Llama3 Community License Agreement, and is built with Meta Llama3.