> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/gym/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/gym/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/gym/_mcp/server.

# Architecture

> Understand how NeMo Gym implements environments as composable server components.

**Goal**: Understand how NeMo Gym implements environments as composable server components.

**Read first**: [Environments](/latest/about/concepts/environments) — what an environment is and how it decomposes into dataset, agent harness, verifier, and state.

## How NeMo Gym Implements Environments

NeMo Gym implements environments as composable FastAPI servers that communicate over async HTTP:

| Concept          | NeMo Gym Component Implementation                                                                      |
| ---------------- | ------------------------------------------------------------------------------------------------------ |
| Dataset          | JSONL: Responses API input — each row is a task (a single problem or challenge) for the agent to solve |
| Agent Harness    | FastAPI Agent Server                                                                                   |
| Verifier + State | FastAPI Resources Server                                                                               |
| Model            | FastAPI Model Server or managed by your own agent harness                                              |

All components are composable. Use different datasets with the same resources server, the same resources server with different models, or the same model with different agent harnesses.

This gives you flexibility to integrate with your existing models and agents:

* **Bring your own agent** — Integrate your existing agent to use it with any Gym environment components.
* **Use a built-in agent** — NeMo Gym includes some native agents, e.g. general-purpose multi-step tool calling, as well as built-in integrations with external harnesses like OpenHands.
* **Train with any model endpoint** — The Model Server standardizes different LLM endpoints behind the Responses API and provides token IDs and log probabilities needed for RL training.

## How an Agent Runs a Task in NeMo Gym Environments

Each task attempt flows through three steps. The resulting trajectory is called a *rollout*:

1. **Initialize** — The agent receives a task row from the dataset and initializes a session on the Resources Server, which sets up isolated state for this task.
2. **Agent Loop** — The agent calls a model for inference, then routes any tool calls to either its own tools or the Resources Server. This repeats until the agent decides the task is complete.
3. **Verify** — The agent asks the Resources Server to score the attempt. The verifier inspects the final state and returns a reward signal.

```
  Dataset (JSONL - one row per task)
       │
       ▼
┌──────────────────────────────────────────┐
│               Agent Server               │
│                                          │
│  run():                                  │
│    1. resources.seed_session()  ─────────────►  Resources Server
│    2. agent loop:                        │
│         model.responses()       ─────────────►  Model Server
│         resources.my_tool()     ─────────────►  Resources Server
│    3. resources.verify()        ─────────────►  Resources Server
└──────────────────────────────────────────┘

┌───────────────────────────┐   ┌────────────────────────────────────┐
│       Model Server        │   │        Resources Server            │
│                           │   │                                    │
│  responses():             │   │  seed_session(): init env state    │
│    → text, tool calls,    │   │  my_tool():      execute action    │
│      or code              │   │  verify():       evaluate → reward │
└───────────────────────────┘   └────────────────────────────────────┘
```

## Server Types

### Agent Server

Hosts agent harnesses that orchestrate rollouts. Use your own harness, or use built-in harnesses such as OpenHands or NeMo Gym's native harnesses such as Simple Agent.

### Model Server

A stateless LLM inference endpoint that standardizes different model providers behind the Responses API. Supports local inference and inference providers.

### Resources Server

Manages environment-specific tools, per-task state isolation, and verification:

* **Environment-Specific Tools** — capabilities the environment provides to any agent (e.g., code execution, database queries, API calls)
* **State Isolation** — each rollout gets its own session, so attempts never interfere with each other. Environments range from lightweight (verify a math answer, no setup needed) to heavyweight (provision a Docker container with a specific repo checkout for SWE-Bench-style tasks).
* **Verification** — scoring logic that evaluates the agent's output and returns a reward

## Where Tools Live

Tools exist on a spectrum — some belong to the agent and can be used with any environment, some belong to the environment and can be used with any agent:

* **Agent-specific tools** are part of the agent harness. They're capabilities the agent brings regardless of which environment it runs in (e.g., OpenHands brings file editing and terminal tools).
* **Environment-specific tools** are part of the Resources Server. They're capabilities the environment provides to any agent that connects (e.g., a `run_tests` endpoint, a database query tool, a sandbox execution API).

An agent can use both simultaneously — its own tools and the environment's tools in the same task. NeMo Gym's server split reflects this: agent-specific logic in the harness, environment-specific logic in the Resources Server.

## Communication

Servers communicate over async HTTP (aiohttp) with:

* **Session cookies** propagated through the call stack for stateful environments
* **Retry logic** with exponential backoff (3 attempts)
* **Connection pooling** via a singleton aiohttp client for high-concurrency workloads

## Next Steps

Understand environments, evaluation, and training before diving into implementation.

Browse available environments for evaluation and training.

Explore available agent harnesses and learn how to integrate your own agent.

Improve your agent or model with RL or fine-tuning.

Create your own evaluation or training environments.