Architecture | NeMo Gym

Goal: Understand how NeMo Gym implements environments as composable server components.

Read first: Environments — what an environment is and how it decomposes into dataset, agent harness, verifier, and state.

How NeMo Gym Implements Environments

NeMo Gym implements environments as composable FastAPI servers that communicate over async HTTP:

Concept	NeMo Gym Component Implementation
Dataset	JSONL: Responses API input — each row is a task (a single problem or challenge) for the agent to solve
Agent Harness	FastAPI Agent Server
Verifier + State	FastAPI Resources Server
Model	FastAPI Model Server or managed by your own agent harness

All components are composable. Use different datasets with the same resources server, the same resources server with different models, or the same model with different agent harnesses.

This gives you flexibility to integrate with your existing models and agents:

Bring your own agent — Integrate your existing agent to use it with any Gym environment components.
Use a built-in agent — NeMo Gym includes some native agents, e.g. general-purpose multi-step tool calling, as well as built-in integrations with external harnesses like OpenHands.
Train with any model endpoint — The Model Server standardizes different LLM endpoints behind the Responses API and provides token IDs and log probabilities needed for RL training.

How an Agent Runs a Task in NeMo Gym Environments

Each task attempt flows through three steps. The resulting trajectory is called a rollout:

Initialize — The agent receives a task row from the dataset and initializes a session on the Resources Server, which sets up isolated state for this task.
Agent Loop — The agent calls a model for inference, then routes any tool calls to either its own tools or the Resources Server. This repeats until the agent decides the task is complete.
Verify — The agent asks the Resources Server to score the attempt. The verifier inspects the final state and returns a reward signal.

  Dataset (JSONL - one row per task)
       │
       ▼
┌──────────────────────────────────────────┐
│               Agent Server               │
│                                          │
│  run():                                  │
│    1. resources.seed_session()  ─────────────►  Resources Server
│    2. agent loop:                        │
│         model.responses()       ─────────────►  Model Server
│         resources.my_tool()     ─────────────►  Resources Server
│    3. resources.verify()        ─────────────►  Resources Server
└──────────────────────────────────────────┘
┌───────────────────────────┐   ┌────────────────────────────────────┐
│       Model Server        │   │        Resources Server            │
│                           │   │                                    │
│  responses():             │   │  seed_session(): init env state    │
│    → text, tool calls,    │   │  my_tool():      execute action    │
│      or code              │   │  verify():       evaluate → reward │
└───────────────────────────┘   └────────────────────────────────────┘

Server Types

Agent Server

Hosts agent harnesses that orchestrate rollouts. Use your own harness, or use built-in harnesses such as OpenHands or NeMo Gym’s native harnesses such as Simple Agent.

Model Server

A stateless LLM inference endpoint that standardizes different model providers behind the Responses API. Supports local inference and inference providers.

Resources Server

Manages environment-specific tools, per-task state isolation, and verification:

Environment-Specific Tools — capabilities the environment provides to any agent (e.g., code execution, database queries, API calls)
State Isolation — each rollout gets its own session, so attempts never interfere with each other. Environments range from lightweight (verify a math answer, no setup needed) to heavyweight (provision a Docker container with a specific repo checkout for SWE-Bench-style tasks).
Verification — scoring logic that evaluates the agent’s output and returns a reward

Where Tools Live

Tools exist on a spectrum — some belong to the agent and can be used with any environment, some belong to the environment and can be used with any agent:

Agent-specific tools are part of the agent harness. They’re capabilities the agent brings regardless of which environment it runs in (e.g., OpenHands brings file editing and terminal tools).
Environment-specific tools are part of the Resources Server. They’re capabilities the environment provides to any agent that connects (e.g., a run_tests endpoint, a database query tool, a sandbox execution API).

An agent can use both simultaneously — its own tools and the environment’s tools in the same task. NeMo Gym’s server split reflects this: agent-specific logic in the harness, environment-specific logic in the Resources Server.

Communication

Servers communicate over async HTTP (aiohttp) with:

Session cookies propagated through the call stack for stateful environments
Retry logic with exponential backoff (3 attempts)
Connection pooling via a singleton aiohttp client for high-concurrency workloads

Next Steps

Concepts

Understand environments, evaluation, and training before diving into implementation.

Browse Environments

Browse available environments for evaluation and training.

Agents

Explore available agent harnesses and learn how to integrate your own agent.

Training

Improve your agent or model with RL or fine-tuning.

Build Custom Environments

Create your own evaluation or training environments.