NeMo RL Documentation

Welcome to the NeMo RL documentation. NeMo RL is an open-source post-training library developed by NVIDIA, designed to streamline and scale reinforcement learning methods for multimodal models (LLMs, VLMs, etc.).

This documentation provides comprehensive guides, examples, and references to help you get started with NeMo RL and build powerful post-training pipelines for your models.

Getting Started

Overview

Learn about NeMo RL’s architecture, design philosophy, and key features that make it ideal for scalable reinforcement learning.

Overview

Quick Start

Get up and running quickly with examples for both the DTensor and Megatron Core training backends.

Quick Start

Installation

Step-by-step instructions for installing NeMo RL, including prerequisites, system dependencies, and environment setup.

Installation and Prerequisites

Features

Explore the current features and upcoming enhancements in NeMo RL, including distributed training, advanced parallelism, and more.

Features and Roadmap

Tips and Tricks

Troubleshoot common issues, including missing submodules and Ray dashboard access, and learn debugging techniques.

Tips and Tricks

Training and Generation

Training Backends

Learn about the DTensor and Megatron Core training backends, their capabilities, and how to choose the right one for your use case.

Training and Generation Backends

Algorithms

Discover supported algorithms including GRPO, SFT, DPO, RM, and on-policy distillation with detailed guides and examples.

Algorithms

Evaluation

Learn how to evaluate your models using built-in evaluation datasets and custom evaluation pipelines.

Evaluation

Cluster Setup

Configure and deploy NeMo RL on multi-node Slurm or Kubernetes clusters for distributed computing.

Installation: Set Up Clusters

Guides and Examples

GRPO on DeepScaleR

Reproduce DeepScaleR results with NeMo RL using GRPO on mathematical reasoning tasks.

GRPO on DeepScaler

SFT on OpenMathInstruct-2

Step-by-step guide for supervised fine-tuning on the OpenMathInstruct-2 dataset.

SFT on OpenMathInstruct-2

Environments

Create custom reward environments and integrate them with NeMo RL training pipelines.

Environments for GRPO Training

Adding New Models

Learn how to add support for new model architectures in NeMo RL.

Add New Models

Advanced Topics

Design and Philosophy

Deep dive into NeMo RL’s architecture, APIs, and design decisions for scalable RL.

Design and Philosophy

Debugging

Tools and techniques for debugging distributed Ray applications and RL training runs.

Debug NeMo RL Applications

FP8 Quantization

Optimize large language models with FP8 quantization for faster training and inference.

FP8 Quantization in NeMo RL

Docker Containers

Build and use Docker containers for reproducible NeMo RL environments.

Build Docker Images

API Reference

Complete API Documentation

Comprehensive reference for all NeMo RL modules, classes, functions, and methods. Browse the complete Python API with detailed docstrings and usage examples.

API Reference