Architecture
Goal: Understand how NeMo Gym implements environments as composable server components.
Read first: Environments — what an environment is and how it decomposes into dataset, agent harness, verifier, and state.
How NeMo Gym Implements Environments
NeMo Gym implements environments as composable FastAPI servers that communicate over async HTTP:
All components are composable. Use different datasets with the same resources server, the same resources server with different models, or the same model with different agent harnesses.
This gives you flexibility to integrate with your existing models and agents:
- Bring your own agent — Integrate your existing agent to use it with any Gym environment components.
- Use a built-in agent — NeMo Gym includes some native agents, e.g. general-purpose multi-step tool calling, as well as built-in integrations with external harnesses like OpenHands.
- Train with any model endpoint — The Model Server standardizes different LLM endpoints behind the Responses API and provides token IDs and log probabilities needed for RL training.
How an Agent Runs a Task in NeMo Gym Environments
Each task attempt flows through three steps. The resulting trajectory is called a rollout:
- Initialize — The agent receives a task row from the dataset and initializes a session on the Resources Server, which sets up isolated state for this task.
- Agent Loop — The agent calls a model for inference, then routes any tool calls to either its own tools or the Resources Server. This repeats until the agent decides the task is complete.
- Verify — The agent asks the Resources Server to score the attempt. The verifier inspects the final state and returns a reward signal.
Server Types
Agent Server
Hosts agent harnesses that orchestrate rollouts. Use your own harness, or use built-in harnesses such as OpenHands or NeMo Gym’s native harnesses such as Simple Agent.
Model Server
A stateless LLM inference endpoint that standardizes different model providers behind the Responses API. Supports local inference and inference providers.
Resources Server
Manages environment-specific tools, per-task state isolation, and verification:
- Environment-Specific Tools — capabilities the environment provides to any agent (e.g., code execution, database queries, API calls)
- State Isolation — each rollout gets its own session, so attempts never interfere with each other. Environments range from lightweight (verify a math answer, no setup needed) to heavyweight (provision a Docker container with a specific repo checkout for SWE-Bench-style tasks).
- Verification — scoring logic that evaluates the agent’s output and returns a reward
Where Tools Live
Tools exist on a spectrum — some belong to the agent and can be used with any environment, some belong to the environment and can be used with any agent:
- Agent-specific tools are part of the agent harness. They’re capabilities the agent brings regardless of which environment it runs in (e.g., OpenHands brings file editing and terminal tools).
- Environment-specific tools are part of the Resources Server. They’re capabilities the environment provides to any agent that connects (e.g., a
run_testsendpoint, a database query tool, a sandbox execution API).
An agent can use both simultaneously — its own tools and the environment’s tools in the same task. NeMo Gym’s server split reflects this: agent-specific logic in the harness, environment-specific logic in the Resources Server.
Communication
Servers communicate over async HTTP (aiohttp) with:
- Session cookies propagated through the call stack for stateful environments
- Retry logic with exponential backoff (3 attempts)
- Connection pooling via a singleton aiohttp client for high-concurrency workloads
Next Steps
Understand environments, evaluation, and training before diving into implementation.
Browse available environments for evaluation and training.
Explore available agent harnesses and learn how to integrate your own agent.
Improve your agent or model with RL or fine-tuning.
Create your own evaluation or training environments.