Concepts | NeMo Gym

NeMo Gym concepts explain the mental model behind building RL training environments: when to use RL over SFT, how environment components work together, and how verification signals drive learning. Use this page as a compass to decide which explanation to read next.

New to RL for LLMs? Start with Training Approaches for context on SFT, RL, and RLVR, or refer to Key Terminology for a quick glossary.

Concept Highlights

Each explainer below covers one foundational idea and links to deeper material.

Training Approaches

Understand the differences between SFT, DPO, and GRPO, and the rise of RLVR.

Environment Components

Understand the three server components that make up a training environment.

Configuration System

Understand how servers are configured and connected.

Architecture

Understand how components interact during startup and rollout collection.

Task Verification

Understand the importance of verification and common implementation patterns.

Key Terminology

Essential vocabulary for agent training, RL workflows, and NeMo Gym.