Understanding Concepts for NeMo Gym

NeMo Gym concepts explain the mental model behind building RL training environments: when to use RL over SFT, how environment components work together, and how verification signals drive learning. Use this page as a compass to decide which explanation to read next.

Tip

New to RL for LLMs? Start with Training Approaches for context on SFT, RL, and RLVR, or refer to Key Terminology for a quick glossary.


Concept Highlights

Each explainer below covers one foundational idea and links to deeper material.

Training Approaches

Understand the differences between SFT, DPO, and GRPO, and the rise of RLVR.

Environment Components

Understand the three server components that make up a training environment.

Configuration System

Understand how servers are configured and connected.

Architecture

Understand how components interact during startup and rollout collection.

Task Verification

Understand the importance of verification and common implementation patterns.

Key Terminology

Essential vocabulary for agent training, RL workflows, and NeMo Gym.
