
Copyright © 2026, NVIDIA Corporation.

Architecture

This section describes how NeMo Gym components interact during startup and execution. For an overview of the three server types (Model, Resources, Agent), see core-components.

Control Plane: Server Startup

When you run ng_run, the system starts up in four phases:

Phase 1: Parse CLI

The ng_run command uses Hydra to parse command-line arguments. Users specify configuration files via +config_paths:

$ ng_run "+config_paths=[resources_servers/math/configs/math.yaml, responses_api_models/openai_model/configs/openai_model.yaml]"

Phase 2: Load and Merge Configs

Configuration is loaded from multiple sources in order of priority (later sources override earlier):

  1. YAML files specified in config_paths
  2. Local env.yaml file (for sensitive values like API keys)
  3. Command-line arguments (highest priority)
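
The override order above can be illustrated with a plain-Python deep merge. This is a minimal sketch, not NeMo Gym's loader (which goes through Hydra); the helper names and the config keys and values below are hypothetical.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def merge_in_priority_order(*sources: dict[str, Any]) -> dict[str, Any]:
    """Merge sources left to right; later sources take priority."""
    result: dict[str, Any] = {}
    for source in sources:
        result = deep_merge(result, source)
    return result


# Hypothetical values standing in for the three sources.
from_yaml_files = {"policy_model": {"base_url": "https://example.invalid/v1", "api_key": "???"}}
from_env_yaml = {"policy_model": {"api_key": "secret-from-env-yaml"}}
from_cli = {"policy_model": {"base_url": "http://localhost:8000/v1"}}

final_config = merge_in_priority_order(from_yaml_files, from_env_yaml, from_cli)
```

Here the API key comes from env.yaml and the base URL from the command line, because each later source overrides only the keys it defines.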

Port allocation: Users can explicitly specify host and port in their config. If they are omitted, the framework allocates a free system port automatically and tracks the ports it has already assigned to prevent conflicts.
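
One common way to implement this kind of allocation is to bind a socket to port 0 and let the operating system choose a free port. The sketch below assumes that technique; the `PortAllocator` class is hypothetical and is not taken from the NeMo Gym source.

```python
import socket


class PortAllocator:
    """Hands out free TCP ports and remembers which ones it has assigned."""

    def __init__(self) -> None:
        self.used_ports: set[int] = set()

    def allocate(self, host: str = "127.0.0.1") -> int:
        """Return a free port that this allocator has not assigned before."""
        while True:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                # Binding to port 0 asks the OS for any currently free port.
                sock.bind((host, 0))
                port = sock.getsockname()[1]
            if port not in self.used_ports:
                self.used_ports.add(port)
                return port

    def reserve(self, port: int) -> int:
        """Record an explicitly configured port so it is never handed out again."""
        if port in self.used_ports:
            raise ValueError(f"port {port} is already assigned to another server")
        self.used_ports.add(port)
        return port
```

The probe socket is closed before the server binds, so another process could still claim the port in between. Tracking assigned ports prevents conflicts only among servers launched by the same allocator.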

Phase 3: Initialize Ray

The system initializes a Ray cluster for distributed coordination. If ray_head_node_address is specified in the config, it connects to an existing cluster; otherwise, it starts a new one.
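
The branch described above maps onto the `address` argument of `ray.init()`. The following is a hedged sketch: the helper names are hypothetical, and the global config is assumed to behave like a flat dictionary.

```python
from typing import Any


def ray_init_kwargs(global_config: dict[str, Any]) -> dict[str, Any]:
    """Choose arguments for ray.init() based on the merged config."""
    address = global_config.get("ray_head_node_address")
    if address:
        # Connect to the existing cluster at this address.
        return {"address": address}
    # No address configured: leave it to ray.init() to start a local instance.
    return {}


def initialize_ray(global_config: dict[str, Any]) -> None:
    # Imported lazily so ray_init_kwargs() works without Ray installed.
    import ray

    ray.init(**ray_init_kwargs(global_config))
```

When no address is passed, `ray.init()` may still attach to a cluster it discovers (for example through the `RAY_ADDRESS` environment variable) before falling back to starting a new local instance.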

Phase 4: Start Servers

Servers are started in three stages:

  1. Head Server: Started as a background thread in the main process. Provides endpoints for config discovery (/global_config_dict_yaml) and server instance listing (/server_instances).

  2. Server Subprocesses: Each configured server is spawned as an independent OS process:

    • Each server has its own Python virtual environment to isolate dependencies.
    • Each runs uvicorn with a FastAPI application listening on http://{host}:{port}.
    • The global config is passed via environment variable NEMO_GYM_CONFIG_DICT.
    • The specific server identity is passed via NEMO_GYM_CONFIG_PATH.
    • Server URLs are registered in the global config, allowing other servers to discover and call them.
  3. Health Check: The main process polls each server’s HTTP endpoint until all return 200, then reports “All servers ready!”
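
The environment handoff and the health check can be sketched with the standard library. The two environment variable names come from the text above; everything else is an assumption, including the JSON serialization of the config, the function names, and the polling interval.

```python
import http.client
import json
import os
import time
import urllib.error
import urllib.request


def build_child_env(global_config: dict, server_name: str) -> dict[str, str]:
    """Environment for one server subprocess (serialization format is assumed)."""
    env = dict(os.environ)
    env["NEMO_GYM_CONFIG_DICT"] = json.dumps(global_config)
    env["NEMO_GYM_CONFIG_PATH"] = server_name
    return env


def is_ready(url: str, timeout: float = 1.0) -> bool:
    """True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_all_ready(urls: list[str], deadline_s: float = 30.0,
                         interval_s: float = 0.2) -> bool:
    """Poll every URL until all return 200 or the deadline passes."""
    pending = set(urls)
    give_up_at = time.monotonic() + deadline_s
    while pending and time.monotonic() < give_up_at:
        pending = {url for url in pending if not is_ready(url)}
        if pending:
            time.sleep(interval_s)
    return not pending
```

A launcher could pass `build_child_env(...)` as the `env` argument of `subprocess.Popen` and print "All servers ready!" once `wait_until_all_ready(...)` returns `True`.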

Running State

Once all servers are healthy, the system enters steady state:

  • The main process sleeps and periodically polls subprocess health
  • Each server process runs its own uvicorn event loop, handling requests asynchronously
  • Servers communicate with each other only via HTTP (no shared memory)
  • Session state is maintained via cookies for multi-step rollouts
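
The sleep-and-poll behavior of the main process might look like the loop below. It reads "polls subprocess health" as checking whether each OS process is still alive, which is an assumption; the function names and the interval are hypothetical.

```python
import subprocess
import time


def find_exited(processes: dict[str, subprocess.Popen]) -> dict[str, int]:
    """Return {server_name: exit_code} for every subprocess that has exited."""
    exited = {}
    for name, proc in processes.items():
        code = proc.poll()  # None while the process is still running
        if code is not None:
            exited[name] = code
    return exited


def monitor(processes: dict[str, subprocess.Popen], interval_s: float = 1.0) -> dict[str, int]:
    """Sleep and poll until at least one subprocess has exited."""
    while True:
        exited = find_exited(processes)
        if exited:
            return exited
        time.sleep(interval_s)
```

Because `Popen.poll()` does not block, the main process spends almost all of its time sleeping between checks.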

Shutdown

When the user presses Ctrl+C (or the process receives SIGINT):

  1. SIGINT is forwarded to all server subprocesses
  2. Main process waits for subprocesses to terminate (with timeout)
  3. Head server thread is stopped
  4. Process exits cleanly
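
Steps 1 and 2 can be sketched with `subprocess` on a POSIX system. The source states only that the wait has a timeout; escalating to `kill()` when the timeout expires is an assumption made for this example, as are the function name and the default timeout.

```python
import signal
import subprocess


def shut_down(processes: list[subprocess.Popen], timeout_s: float = 10.0) -> None:
    """Forward SIGINT to all subprocesses, then wait for them to exit."""
    # Step 1: forward SIGINT to every subprocess that is still running.
    for proc in processes:
        if proc.poll() is None:
            proc.send_signal(signal.SIGINT)
    # Step 2: wait for each one; kill it if the timeout expires (assumed policy).
    for proc in processes:
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
```

Signalling every subprocess before waiting on any of them lets the servers shut down in parallel rather than one after another.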

HTTP Request Flow: Example

During a single rollout, servers communicate via HTTP. This example shows a math problem with tool use:

Key Design Points:

  • HTTP-only communication: All servers communicate via HTTP, enabling language-agnostic implementations and deployment flexibility
  • Stateless model servers: Model servers perform single-call generation without memory; the agent maintains conversation state
  • Session state in resources: Resources servers use session cookies to maintain per-rollout state across multiple tool calls
  • OpenAI API compatibility: Model servers expose /v1/responses endpoints compatible with the OpenAI Responses API
  • uvicorn + FastAPI: All servers use uvicorn as the ASGI server with FastAPI for HTTP routing and request handling
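
The session-cookie design point can be demonstrated with a small self-contained example. The standard-library HTTP server here stands in for a FastAPI resources server to avoid extra dependencies, and the cookie name `session`, the `/tool` path, and the call counter are all hypothetical.

```python
import http.cookiejar
import http.server
import threading
import urllib.request
import uuid
from http.cookies import SimpleCookie

SESSIONS: dict[str, int] = {}  # session id -> number of tool calls seen


class ToolHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Drain the request body so the connection closes cleanly.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        cookie = SimpleCookie(self.headers.get("Cookie", ""))
        session_id = cookie["session"].value if "session" in cookie else uuid.uuid4().hex
        SESSIONS[session_id] = SESSIONS.get(session_id, 0) + 1
        body = str(SESSIONS[session_id]).encode()
        self.send_response(200)
        self.send_header("Set-Cookie", f"session={session_id}; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:  # keep the demo quiet
        pass


def new_rollout_client() -> urllib.request.OpenerDirector:
    """One cookie jar per rollout keeps each rollout's session separate."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # local demo: bypass any system proxy
        urllib.request.HTTPCookieProcessor(jar),
    )


def call_tool(client: urllib.request.OpenerDirector, url: str) -> int:
    """POST one tool call and return the server's per-session call count."""
    request = urllib.request.Request(url, data=b"{}", method="POST")
    with client.open(request) as response:
        return int(response.read())


server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), ToolHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/tool"

rollout_a, rollout_b = new_rollout_client(), new_rollout_client()
counts = [call_tool(rollout_a, url), call_tool(rollout_a, url), call_tool(rollout_b, url)]
server.shutdown()
```

The two calls from `rollout_a` share one session and accumulate state, while `rollout_b` starts a fresh session because its cookie jar is empty.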

Data Plane: Rollout Collection

When you run ng_collect_rollouts, the system collects training data by executing rollouts in parallel:

The client first queries the Head Server to discover server addresses from the global config, then reads input JSONL and dispatches prompts to the Agent. Completed rollouts are written to output JSONL.

Concurrency behavior differs by use case:

  • Standalone rollout collection (ng_collect_rollouts): A semaphore gates concurrency via num_samples_in_parallel to control load.
  • Training framework integration (e.g., NeMo RL): All requests are sent without gating; the training framework manages concurrency externally.