
# Key Terminology

Essential vocabulary for model training, RL workflows, and NeMo Gym. This glossary defines terms you'll encounter throughout the tutorials and documentation.

## Rollout & Data Collection Terms

**Rollout / Trajectory**

**Rollout** (verb) refers to the process of executing a policy in an environment to generate data: stepping through the environment, taking actions, and recording what happens. **Rollout** (noun) is also used synonymously with **trajectory**, the ordered record of what happened: the resulting sequence of states, actions, and rewards. In practice, many people use "rollout" and "trajectory" interchangeably because a single rollout produces exactly one trajectory.
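
To make the verb/noun distinction concrete, here is a minimal sketch in plain Python. The `CountdownEnv` environment, the `policy` function, and the `Step` record are all hypothetical stand-ins invented for illustration; they are not part of NeMo Gym's API.

```python
from dataclasses import dataclass


@dataclass
class Step:
    state: int
    action: int
    reward: float


class CountdownEnv:
    """Hypothetical toy environment: the state is an integer that counts down to 0."""

    def __init__(self, start: int = 3):
        self.state = start

    def step(self, action: int):
        self.state -= action
        done = self.state <= 0
        reward = 1.0 if self.state == 0 else 0.0
        return self.state, reward, done


def policy(state: int) -> int:
    """Hypothetical fixed policy: always subtract 1."""
    return 1


def rollout(env, policy, max_steps: int = 10):
    """Rollout (verb): run the policy in the environment and record each step."""
    trajectory = []
    state = env.state
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append(Step(state, action, reward))
        state = next_state
        if done:
            break
    # Rollout (noun) / trajectory: the ordered record of what happened.
    return trajectory


trajectory = rollout(CountdownEnv(start=3), policy)
```

Calling `rollout` is the verb; the returned list of `Step` records is the noun.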

**Rollout Batch**

A collection of multiple rollouts generated together, typically for the same task. Used for efficient parallel processing.

**Environment**

The conditions in which your model operates. Functionally, this typically refers to tools the model has access to.

**Task**

An input prompt paired with environment setup (tools + verification). What you want models to learn to do.

**Task Instance**

A single rollout attempt for a specific task. Multiple instances per task capture different approaches.
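
The relationship between a task and its instances can be sketched as follows. The `Task` and `TaskInstance` classes, their field names, and the hard-coded responses (standing in for model outputs) are assumptions made for illustration, not NeMo Gym data structures.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Task:
    prompt: str                      # the input prompt
    tools: tuple[str, ...]           # names of tools available in the environment
    verify: Callable[[str], float]   # verification logic that returns a reward


@dataclass
class TaskInstance:
    task: Task
    response: str
    reward: float


def run_instances(task: Task, responses: list[str]) -> list[TaskInstance]:
    """Score several attempts at the same task; each attempt is one instance."""
    return [TaskInstance(task, r, task.verify(r)) for r in responses]


task = Task(
    prompt="What is 2 + 3?",
    tools=("calculator",),
    verify=lambda answer: 1.0 if answer.strip() == "5" else 0.0,
)

# Stand-ins for three separate model attempts at the same task.
instances = run_instances(task, ["5", "6", " 5 "])
rewards = [inst.reward for inst in instances]
```

One task yields many instances, and each instance carries its own reward.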

**Training Environment**

A set of tasks that share the same environment setup, compiled into a single prompt dataset.

**Trace**

Detailed log of a rollout including metadata for debugging or interpretability.

**Data Generation Process**

The complete pipeline from input prompt to scored rollout, involving rollout orchestration, model inference, tool usage, and verification.

**Rollout Collection**

The process of applying your data generation pipeline to input prompts at scale.

**Demonstration Data**

Training data format for SFT consisting of input prompts paired with successful rollouts. Shows models examples of correct behavior.

**Preference Pairs**

Training data format for DPO consisting of the same prompt with two different responses, where one is preferred over the other.
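
As a rough illustration of how the two formats differ, the sketch below derives both from the same scored rollouts. The record layout and the key names (`prompt`, `response`, `reward`, `chosen`, `rejected`) are assumptions chosen for this example; actual training frameworks may expect different schemas.

```python
# Hypothetical scored rollouts; in practice these come from rollout collection.
scored_rollouts = [
    {"prompt": "What is 2 + 3?", "response": "5", "reward": 1.0},
    {"prompt": "What is 2 + 3?", "response": "6", "reward": 0.0},
    {"prompt": "Capital of France?", "response": "Paris", "reward": 1.0},
]


def to_demonstrations(rollouts, threshold=1.0):
    """Demonstration data for SFT: keep only prompts paired with successful rollouts."""
    return [
        {"prompt": r["prompt"], "response": r["response"]}
        for r in rollouts
        if r["reward"] >= threshold
    ]


def to_preference_pairs(rollouts):
    """Preference pairs for DPO: same prompt, one preferred and one rejected response."""
    by_prompt = {}
    for r in rollouts:
        by_prompt.setdefault(r["prompt"], []).append(r)

    pairs = []
    for prompt, group in by_prompt.items():
        best = max(group, key=lambda r: r["reward"])
        worst = min(group, key=lambda r: r["reward"])
        # A pair only makes sense when the two responses differ in quality.
        if best["reward"] > worst["reward"]:
            pairs.append(
                {"prompt": prompt, "chosen": best["response"], "rejected": worst["response"]}
            )
    return pairs


demonstrations = to_demonstrations(scored_rollouts)
preference_pairs = to_preference_pairs(scored_rollouts)
```

Note that the second prompt contributes a demonstration but no preference pair, because it has only one response.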

***

## Architecture Terms

**Policy Model**

The primary LLM being trained or evaluated: the "decision-making brain" you want to improve.

**Orchestration**

Coordination logic that manages when to call models, which tools to use, and how to sequence multi-step operations.

**Verifier**

Component that scores rollouts, producing reward signals. Colloquially, "verifier" is also used to mean a training environment with verifiable rewards.

**Service Discovery**

Mechanism by which distributed NeMo Gym components find and communicate with each other across machines.

**Reward / Reward Signal**

Numerical score (typically 0.0-1.0) indicating how well a task was accomplished.
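
A verifier can be as simple as a function that maps a response to a score. The sketch below is a hypothetical keyword-based verifier written for illustration; real verifiers are usually task-specific (for example, running unit tests or checking a final answer), and this is not NeMo Gym's verifier interface.

```python
def keyword_verifier(expected_keywords, response):
    """Return the fraction of expected keywords found in the response (0.0 to 1.0)."""
    if not expected_keywords:
        raise ValueError("expected_keywords must not be empty")
    text = response.lower()
    found = sum(1 for keyword in expected_keywords if keyword.lower() in text)
    return found / len(expected_keywords)


partial_reward = keyword_verifier(["paris", "france"], "Paris is the capital.")
full_reward = keyword_verifier(["paris", "france"], "Paris, France")
```

Here the first response earns partial credit because only one of the two keywords appears, while the second earns the full reward.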

## Training Approaches

**SFT (Supervised Fine-Tuning)**

Training approach using examples of good model behavior. Shows successful rollouts as training data.

**RL (Reinforcement Learning)**

Training approach where models learn through trial-and-error interaction with environments using reward signals.

**Online vs Offline Training**

* **Online**: The model learns while interacting with the environment in real time.
* **Offline**: The model learns from pre-collected rollout data.

**DPO (Direct Preference Optimization)**

An offline RL training approach using pairs of rollouts where one is preferred over another. Teaches the model to distinguish better responses from worse ones.

**GRPO (Group Relative Policy Optimization)**

Reinforcement learning algorithm that optimizes policies by comparing groups of rollouts relative to each other. Used for online RL training with language models.
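
The "relative to each other" part can be illustrated with the group-normalized advantage: each rollout's reward is compared with the mean of its group and scaled by the group's standard deviation. The sketch below covers only that normalization step, under the assumption that the population standard deviation and a small `eps` are used; the policy-gradient update, clipping, and KL regularization are omitted, and implementations differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout in the group scored the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one prompt: two succeeded, two failed.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Rollouts that beat the group average get a positive advantage and the rest get a negative one. If all rewards in a group are equal, every advantage is zero and the group carries no learning signal.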

## Interaction Patterns

**Multi-turn**

Conversations spanning multiple exchanges where context and state persist across turns.

**Multi-step**

Complex tasks requiring agents to break problems into sequential steps, often using tools and intermediate reasoning.

**Tool Use / Function Calling**

Models invoking external capabilities (APIs, calculators, databases) to accomplish tasks beyond text generation.
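
Mechanically, tool use often comes down to the model emitting a structured request that the surrounding system executes and feeds back. The following sketch assumes a simple JSON shape with `name` and `arguments` keys and a hypothetical in-process `TOOLS` registry; it does not reproduce any specific API's wire format.

```python
import json

# Hypothetical tool registry mapping tool names to Python callables.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda text: text.upper(),
}


def execute_tool_call(call_json: str) -> str:
    """Parse a tool call emitted by the model, run the tool, and return a JSON result."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call["arguments"])
    return json.dumps({"result": result})


# Stand-in for a tool call produced by the model.
tool_output = execute_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```

The returned string would then be appended to the conversation so the model can use the result in its next step.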

## Technical Infrastructure

**Responses API**

OpenAI's standard interface for rollouts, including function calls and multi-turn conversations. NeMo Gym's native format.

**Chat Completions API**

OpenAI's simpler interface for basic LLM interactions. NeMo Gym includes middleware to convert formats.

**vLLM**

High-performance inference server for running open-source language models locally. Alternative to commercial APIs.