For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DocumentationAPI Reference
DocumentationAPI Reference
  • Documentation
    • Home
  • About
    • Concepts
    • Ecosystem
  • Get Started
    • Quickstart
    • Detailed Setup Guide
    • Install from PyPI
    • Rollout Collection
  • Agent Server
  • Model Server
    • vLLM
  • Resources Server
  • Data
    • Prepare and Validate
    • Download from Hugging Face
    • Prompt Config
  • Environment Tutorials
    • Single-Step Environment
    • Multi-Step Environment
    • Stateful Environment
    • Real-World Environment
    • Integrate external libraries
    • Aggregate Metrics
    • LLM-as-Judge Verification
  • Benchmarks
    • Run benchmarks
    • Add a benchmark
    • Design a customer evaluation
  • Training Tutorials
    • NeMo RL
    • Unsloth
    • Multi-Environment Training
    • Offline Training (SFT/DPO)
  • Model Recipes
    • Nemotron 3 Nano
    • Nemotron 3 Super
  • Infrastructure
    • Deployment Topology
    • Engineering Notes
  • Reference
    • Configuration
    • RL Framework Compatibility
    • CLI Commands
    • FAQ
  • Troubleshooting
    • Configuration Errors
  • Contribute
    • Development Setup
    • Environments
    • Integrate RL Frameworks
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Manage My Privacy | Do Not Sell or Share My Data | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoNeMo Gym
On this page
  • Tutorials
  • Environment Properties
  • Rollout Structure
  • Core Capabilities

Overview

||View as Markdown|
Previous

Prompt Config

Next

Single-Step Environment

Learn how to build custom environments for training or evaluation using NeMo Gym.

Looking to use an existing environment rather than build your own? See the Available Environments in the README.

Key Concepts

Before diving in, review these foundational pages:

  • core-components — Model, Resources, and Agent servers
  • architecture — How components interact during startup and execution
  • task-verification — Reward computation and verification patterns
  • configuration-concepts — YAML configuration system

Tutorials

Start with the single-step tutorial, then progress through increasingly complex patterns:

Single-Step Environment

Build a complete environment from scratch: scaffolding, task data, tools, verification, testing, and rollout collection.

start here
Multi-Step Environment

Multiple sequential tool calls with ground-truth verification.

intermediate
Stateful Environment

Per-episode session state with SESSION_ID_KEY.

intermediate
Real-World Environment

Production environment with dynamic routing and state-based verification.

advanced
LLM-as-Judge Verification

Configure a second model to score rollouts from verify() when ground truth is semantic or rubric-based.

verification

The single-step tutorial is a hands-on walkthrough. The multi-step, stateful, and real-world tutorials are pattern-oriented deep dives — each explains a key concept through annotated source excerpts and rollout transcripts from existing example servers.


Environment Properties

Training environments can be broadly characterized along five dimensions:

  1. Rollout structure: The interaction pattern between the model, environment, and user.
  2. Core capabilities: The behaviors or skills that a model needs in order to succeed in a given use case.
  3. Knowledge domain: What subject area, area of expertise, or field of study is involved.
  4. Task type: The high-level use case that is represented in the training environment.
  5. Verification method: How the environment computes rewards from model responses. See Task Verification for details.

Below are a subset of rollout structures and core capabilities found across NeMo Gym environments. We plan to add these as structured metadata to environments in the future. If you have ideas for additional properties, please let us know by opening an issue.

Rollout Structure

Rollout structureDescription
Multi-stepInterleaved assistant and tool messages
Multi-turnInterleaved user and assistant messages
Multi-modalInterleaved text, image, video, and/or audio messages
Long contextMessage content is very large or the number of messages is very large

Core Capabilities

Core capabilityDeveloper/User needRollout Structures Required
Information dependencyThe model receives environment responses that may require changes to subsequent actions.Multi-step
Proactive askingDevelopers put the model in a situation where user context is missing. The model needs to recognize user context is missing and ask the user for the missing context.Multi-turn
Schema adherenceUsers need more than one piece of information delivered by the model at one time in a specified delivery format.
Meta data instruction followingUser constrains the meta-properties of the model response e.g. “respond in 5 words”.
Counterintuitive instruction followingUser provides instructions that are against conventional wisdom, typically making sense in the specific context in which the model is being used
Information relevanceGiven a large volume of inputs, the model needs to ignore content irrelevant to the task at hand.Long context
Multiple intent synthesisUsers provide multiple tasks for the model to accomplish.Multi-step, Multi-turn