NeMo-Aligner#
Introduction#
NeMo-Aligner is a scalable toolkit for efficient model alignment. The toolkit supports state-of-the-art model alignment algorithms such as SteerLM, Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF). These algorithms enable users to align language models to be safe, harmless, and helpful. Users can perform end-to-end model alignment on a wide range of model sizes and take advantage of all of the available parallelism techniques to ensure that alignment runs in a performant and resource-efficient manner. For more technical details, please refer to our paper.
The NeMo-Aligner toolkit is built on the NeMo Toolkit, which allows training to scale to thousands of GPUs using tensor, data, and pipeline parallelism for all components of alignment. All of our checkpoints are cross-compatible with the NeMo ecosystem, allowing for inference deployment and further customization.
The toolkit is currently in its early stages. We are committed to improving the toolkit to make it easier for developers to pick and choose different alignment algorithms to build safe, helpful, and reliable models.
Get Started#
NeMo-Aligner comes preinstalled in NVIDIA NeMo containers. New NeMo containers are released alongside NeMo version updates.
To get access to the container, log in to the NVIDIA GPU Cloud (NGC) platform or create a free NGC account here: NVIDIA NGC. Once you have logged in, you can get the container here: NVIDIA NGC NeMo Framework.
To run interactively using a pre-built container, run the following code:
docker run --rm -it \
  --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --shm-size=8g \
  --workdir /opt/NeMo-Aligner \
  nvcr.io/nvidia/nemo:24.09

Please use the latest tag in the form yy.mm.(patch).
Important
Some of the subsequent tutorials require accessing gated Hugging Face models. For details on how to access these models, refer to this document.
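One common way to do this, assuming your Hugging Face account has already been granted access to the gated model, is to authenticate inside the container with the standard Hugging Face CLI before downloading checkpoints:

# Authenticate with a Hugging Face access token (created under Settings -> Access Tokens
# on huggingface.co) so that gated checkpoints can be downloaded.
huggingface-cli login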
If you run into any problems, refer to NeMo’s Known Issues page. The page enumerates known issues and provides suggested workarounds where appropriate.
Build a NeMo-Aligner Dockerfile#
NeMo-Aligner also provides its own Dockerfile if you want to customize the environment. Run the following to build the image:
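The exact image name is up to you; the command below is a minimal sketch that assumes it is run from the root of a NeMo-Aligner checkout, and the tag is illustrative.

# Build a custom image from the NeMo-Aligner Dockerfile.
# "my-nemo-aligner:latest" is an arbitrary example tag; pick any name you like.
docker build -t my-nemo-aligner:latest .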
- Obtain a Pretrained Model
- Model Alignment by Supervised Fine-Tuning (SFT)
- Supervised Fine-Tuning (SFT) with Knowledge Distillation
- Model Alignment by DPO, RPO, and IPO
- Model Alignment by RLHF
- Model Alignment by SteerLM Method
- SteerLM 2.0: Iterative Training for Attribute-Conditioned Language Model Alignment
- Model Alignment by Rejection Sampling
- Model Alignment by Self-Play Fine-Tuning (SPIN)
- Fine-Tuning Stable Diffusion with DRaFT+
- Constitutional AI: Harmlessness from AI Feedback
- CAI
- Motivation
- Train a CAI Model
- Step 1: Download the models and datasets
- Step 2: Generate and revise responses to harmful prompts creating the SL-CAI dataset
- Step 3: Fine-tune Mistral-7B on the revised responses to create a Mistral-7B-SL-CAI model
- Step 4: Generate the RL-CAI (preference) dataset for RM and PPO training
- Step 5: Train the Reward Model (RM)
- Step 6: Fine-tune the Mistral-7B-SL-CAI with PPO and the RM to train a Mistral-7B-RL-CAI model
- Step 7: Run inference
- Prerequisite: Obtain a Pre-Trained Model
This section provides instructions on how to download pre-trained LLMs in .nemo format. The subsequent sections will use these base LLMs for further fine-tuning and alignment.
- Model Alignment by Supervised Fine-Tuning (SFT)
In this section, we walk you through the most straightforward alignment method. We use a supervised dataset of prompt-response pairs to fine-tune the base model toward the desired behavior (a minimal launch sketch follows this list).
- Supervised Fine-Tuning (SFT) with Knowledge Distillation
In this section, we walk through a variation of SFT that uses knowledge distillation, where we train a smaller "student" model using a larger "teacher" model.
- Model Alignment by DPO, RPO and IPO
DPO, RPO, and IPO are simpler alignment methods compared to RLHF. DPO introduces a novel parameterization of the reward model in RLHF, which allows us to extract the corresponding optimal policy. Similarly, RPO and IPO provide alternative parameterizations or optimization strategies, each contributing unique approaches to refining model alignment.
- Model Alignment by RLHF
RLHF is the next step up in alignment and is still responsible for most state-of-the-art chat models. In this section, we walk you through the process of RLHF alignment, including training a reward model and RLHF training with the PPO algorithm.
- Model Alignment by SteerLM Method
SteerLM is a novel approach developed by NVIDIA that simplifies alignment compared to RLHF. It is based on SFT but enables user-steerable AI by letting you adjust attributes at inference time.
- Model Alignment by SteerLM 2.0 Method
SteerLM 2.0 is an extension of the SteerLM method that introduces an iterative training procedure to explicitly enforce that the generated responses follow the desired attribute distribution.
- Model Alignment by Rejection Sampling (RS)
RS is a simple online alignment algorithm. In RS, the policy model generates several responses. These responses are assigned a score by the reward model, and the highest scoring responses are used for SFT.
- Fine-tuning Stable Diffusion with DRaFT+
DRaFT+ is an algorithm for fine-tuning text-to-image generative diffusion models. It achieves this by directly backpropagating through a reward model. This approach addresses the mode collapse issues from the original DRaFT algorithm and improves diversity through regularization.
- Constitutional AI: Harmlessness from AI Feedback
CAI, an alignment method developed by Anthropic, enables the incorporation of AI feedback for aligning LLMs. This feedback is grounded in a small set of principles (referred to as the ‘Constitution’) that guide the model toward desired behaviors, emphasizing helpfulness, honesty, and harmlessness.
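As a rough illustration of how these tutorials are typically launched, the sketch below shows an SFT run driven by Hydra-style command-line overrides. The script path, config keys, and file paths are placeholders modeled on the SFT tutorial rather than an exact command; follow the linked Model Alignment by Supervised Fine-Tuning (SFT) section for the release-specific invocation.

# Illustrative only: the script path, config keys, and paths below are placeholders;
# see the SFT tutorial for the exact, release-specific command.
python examples/nlp/gpt/train_gpt_sft.py \
    trainer.num_nodes=1 \
    trainer.devices=8 \
    model.restore_from_path=/workspace/base_model.nemo \
    model.data.train_ds.file_path=/workspace/sft_train.jsonl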
Support matrix: each alignment algorithm is tracked against the GPT 2B, LLaMA2, LLaMA3, Mistral, Nemotron-4, and Mixtral model families, along with whether it is TRTLLM accelerated. Supported combinations are marked "Yes", verified combinations are marked with (✓), and one combination is still in active development.
| Algorithm | Stable Diffusion |
|---|---|
| DRaFT+ | Yes (✓) |
Note
(✓): Indicates the model is verified to work with the algorithm. Models without this demarcation are expected to work but have not been formally verified yet.
Hardware Requirements#
NeMo-Aligner builds on other NVIDIA libraries and supports a range of NVIDIA GPUs. It is tested on H100 and also works on A100. Several tutorials assume 80 GB of VRAM, so if you are following along on 40 GB GPUs, adjust your config accordingly.
Examples of config adjustments are increasing node count, introducing more tensor/pipeline parallelism, lowering batch size, and increasing gradient accumulation.
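For instance, a recipe written for 80 GB GPUs might be adapted to 40 GB GPUs with overrides along these lines; the config keys follow common NeMo/Hydra conventions and, like the script path, are illustrative rather than exact.

# Illustrative only: exact keys depend on the training script and release.
# More nodes, more tensor/pipeline parallelism, and a smaller micro batch size;
# keeping the global batch size fixed then increases gradient accumulation.
python examples/nlp/gpt/train_gpt_sft.py \
    trainer.num_nodes=2 \
    model.tensor_model_parallel_size=4 \
    model.pipeline_model_parallel_size=2 \
    model.micro_batch_size=1 \
    model.global_batch_size=128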