MCP Resources Server

View as Markdown

This tutorial shows how to expose environment tools over the Model Context Protocol (MCP) so that an MCP-native agent — such as Claude Code — can discover and call them, while the Resources Server still owns verification. The pattern is: MCP tool implementations + a verify() function = a Resources Server.

← Stateful Environment

Two ways to combine MCP with a Resources Server

There are two distinct integration shapes, and they need different amounts of plumbing:

FlowWhenWhat you build
Gym-owned MCP serverYou want the tools and their verification to live in Gym, with per-rollout session isolationSubclass MCPResourcesServer — Gym mounts a Streamable-HTTP MCP endpoint at /mcp on the same app as /seed_session and /verify
Existing / external MCP serverThe MCP server already runs outside Gym (a third-party or shared service)Point the agent at it directly with a static mcp_config; write a plain SimpleResourcesServer.verify() that scores the resulting trajectory

The rest of this page builds the Gym-owned flow (the one that needs new infrastructure) and then explains the external flow at the end.

Why a Gym-owned MCP server at all? Mounting the MCP endpoint inside the Resources Server lets a tool call be bound to the same per-rollout session as /seed_session and /verify. That is what makes “was this tool actually used in this episode?” a verifiable, isolated question. An external MCP server can’t offer that — Gym can’t observe its calls — so external-server verification has to work off the agent’s trajectory instead.


What You’ll Build

A weather environment with a single MCP tool, get_weather(city). The agent must call the tool and then answer with exactly the sentence the tool returned. The Resources Server rewards the rollout only if the tool was called in this session and the final answer contains the returned sentence.

Episode Flow

Goal
- Learn MCP tool usage bound to a Gym session: call an MCP tool, then answer using its result.
Inputs
- seed input: expected_city (e.g., "Paris")
Flow (the MCP endpoint and /verify share one session_id)
1) Agent -> ResourcesServer POST /seed_session {"verifier_metadata": {"expected_city": "Paris"}}
- returns hidden MCP metadata: a per-rollout X-NeMo-Gym-Session-Token bound to this session_id
2) Agent writes a per-rollout mcp_config and launches Claude Code with --mcp-config
3) Claude Code -> ResourcesServer POST /mcp (tools/call get_weather, carrying the token header)
- the tool resolves the token back to session_id and records the call
4) Agent -> ResourcesServer POST /verify {"verifier_metadata": {"expected_city": "Paris"}, "response": ...}
- reward = 1.0 iff the tool was called in this session AND the answer contains the sentence

Implementation

The base class MCPResourcesServer (in nemo_gym/base_resources_server.py) mounts the MCP endpoint and manages the per-rollout token. You write a @gym_tool method (your tool), a seed_session() that returns the MCP metadata so the agent can connect, and a verify() that scores the rollout.

File (resources_servers/example_mcp_weather/app.py):

1# simplified
2from typing import Any, Optional
3
4from fastapi import Request
5from pydantic import ConfigDict, Field
6
7from nemo_gym.base_resources_server import (
8 BaseResourcesServerConfig,
9 BaseSeedSessionRequest,
10 BaseSeedSessionResponse,
11 BaseVerifyRequest,
12 BaseVerifyResponse,
13 MCPResourcesServer,
14 MCPServerMetadata,
15 gym_tool,
16)
17from nemo_gym.server_utils import SESSION_ID_KEY
18
19
20def _weather_sentence(city: str) -> str:
21 return f"The weather in {city} is sunny and 72 F."
22
23
24class ExampleMCPWeatherResourcesServerConfig(BaseResourcesServerConfig):
25 pass
26
27
28class ExampleMCPWeatherSeedSessionRequest(BaseSeedSessionRequest):
29 model_config = ConfigDict(extra="allow")
30 # Task ground truth travels in verifier_metadata, e.g. {"expected_city": "Paris"}.
31 verifier_metadata: Optional[dict[str, Any]] = None
32
33
34# seed_session returns the MCP metadata under the `mcp` key
35class ExampleMCPWeatherSeedSessionResponse(BaseSeedSessionResponse):
36 mcp: MCPServerMetadata
37
38
39class ExampleMCPWeatherVerifyRequest(BaseVerifyRequest):
40 model_config = ConfigDict(extra="allow")
41 verifier_metadata: Optional[dict[str, Any]] = None
42
43
44class ExampleMCPWeatherResourcesServer(MCPResourcesServer):
45 config: ExampleMCPWeatherResourcesServerConfig
46 session_id_to_state: dict[str, dict[str, Any]] = Field(default_factory=dict)
47
48 async def seed_session(
49 self, request: Request, body: ExampleMCPWeatherSeedSessionRequest
50 ) -> ExampleMCPWeatherSeedSessionResponse:
51 session_id = request.session[SESSION_ID_KEY]
52 expected_city = (body.verifier_metadata or {}).get("expected_city", "Paris")
53 self.session_id_to_state[session_id] = {"expected_city": expected_city, "weather_calls": []}
54 # build_mcp_session_metadata() mints a per-rollout token bound to this session_id
55 return ExampleMCPWeatherSeedSessionResponse(mcp=self.build_mcp_session_metadata(request))
56
57 # Decorate a method with @gym_tool and it is auto-registered as an MCP tool named `get_weather`.
58 # Declare a `session_id: str` param to receive the Gym session; it is injected from the per-rollout
59 # token and hidden from the tool's input schema, so the model only sees `city`.
60 @gym_tool
61 def get_weather(self, session_id: str, city: str) -> str:
62 """Get a deterministic weather report for a city."""
63 state = self.session_id_to_state.setdefault(session_id, {"weather_calls": []})
64 weather = _weather_sentence(city)
65 state["weather_calls"].append({"city": city, "weather": weather})
66 return weather
67
68 async def verify(
69 self, request: Request, body: ExampleMCPWeatherVerifyRequest
70 ) -> BaseVerifyResponse:
71 session_id = request.session[SESSION_ID_KEY]
72 state = self.session_id_to_state.get(session_id, {"weather_calls": []})
73 expected_city_value = (body.verifier_metadata or {}).get("expected_city", "Paris")
74 expected_city = expected_city_value.casefold()
75 expected = _weather_sentence(expected_city_value)
76 # reward iff the tool was called for this city in this session AND the final answer repeats it
77 # (match case-insensitively, so a correct call/answer that used different casing still counts)
78 tool_called = any(str(c.get("city", "")).casefold() == expected_city for c in state["weather_calls"])
79 final_text = _extract_assistant_text(body) # join the assistant message text from body.response
80 reward = float(tool_called and expected.casefold() in final_text.casefold())
81 return BaseVerifyResponse(**body.model_dump(), reward=reward)
82
83
84if __name__ == "__main__":
85 ExampleMCPWeatherResourcesServer.run_webserver()

Key Pattern

Writing a tool is just decorating a method:

  1. @gym_tool — mark a method and the base class auto-registers it as an MCP tool (name = method name), mounted at /mcp (Streamable HTTP). The MCP input schema is derived from the method’s typed parameters. To receive the Gym session, declare a session_id: str parameter — it is injected from the per-rollout token and hidden from the tool’s input schema (the model only sees the real args). Omit it for a stateless tool. A missing/invalid token raises MCPSessionError, which — because MCP runs over JSON-RPC — FastMCP surfaces to the client as a tool error (isError: true) on an HTTP 200 response, not an HTTP status code. Both sync and async methods work. Tool names may not collide with reserved endpoints (verify, seed_session, aggregate_metrics, mcp), and a tool must not take a request parameter (there is no FastAPI Request on the MCP path — use session_id).
  2. build_mcp_session_metadata(request) — call this from seed_session and return it under the response’s mcp key. It mints the one-time X-NeMo-Gym-Session-Token bound to the current session_id.

Need full control (e.g. a hand-written @mcp.tool() with custom schema)? Override register_mcp_tools(self, mcp) — call super().register_mcp_tools(mcp) first to keep the auto-registered @gym_tool ones.

MCPResourcesServer disables the MCP SDK’s default DNS-rebinding protection (TransportSecuritySettings(enable_dns_rebinding_protection=False)). That protection only accepts loopback Host headers and returns HTTP 421 otherwise — which would break multi-node / use_absolute_ip=True deployments where the agent reaches the server by a routable host. The endpoint is instead protected by the per-rollout session token. You don’t need to set this yourself; the base class handles it.


Wiring the agent (Claude Code)

The claude_code_agent reads the mcp metadata from /seed_session, writes a per-rollout gym_mcp_config.json, and launches Claude Code with --mcp-config. The generated config looks like:

1{
2 "mcpServers": {
3 "example_mcp_weather": {
4 "type": "http",
5 "url": "http://<resources-server-host>:<port>/mcp",
6 "headers": { "X-NeMo-Gym-Session-Token": "<per-rollout-token>" }
7 }
8 }
9}

A minimal config (resources_servers/example_mcp_weather/configs/example_mcp_weather.yaml) wires the server and the agent together:

1example_mcp_weather:
2 resources_servers:
3 example_mcp_weather:
4 entrypoint: app.py
5 domain: agent
6
7example_mcp_weather_claude_code_agent:
8 responses_api_agents:
9 claude_code_agent:
10 entrypoint: app.py
11 resources_server: { type: resources_servers, name: example_mcp_weather }
12 model: claude-sonnet-4-6
13 anthropic_api_key: ${anthropic_api_key}
14 datasets:
15 - { name: example, type: example, jsonl_fpath: resources_servers/example_mcp_weather/data/example.jsonl }

Run it

Put your key in a repo-root env.yaml (the config above interpolates ${anthropic_api_key}):

1anthropic_api_key: sk-ant-...

Then start the servers:

$gym env start --config resources_servers/example_mcp_weather/configs/example_mcp_weather.yaml

Then collect rollouts against the example dataset and reward-profile as in the quickstart. A correct rollout shows Claude Code calling mcp__example_mcp_weather__get_weather and a reward of 1.0.

To watch the MCP round-trip without a full gym env start, start the Resources Server on its own and drive /seed_session → /mcp tools/call → /verify directly (a requests.Session preserves the session cookie). This is also the fastest way to confirm the endpoint is reachable from another host.


Pointing at an existing / external MCP server

If the MCP server already runs outside Gym, the agent talks to it directly — you do not need an MCPResourcesServer. Give the agent a static mcp_config pointing at the external server, and write a plain SimpleResourcesServer.verify() that scores the agent’s trajectory:

1my_external_mcp_agent:
2 responses_api_agents:
3 claude_code_agent:
4 entrypoint: app.py
5 resources_server: { type: resources_servers, name: my_verifier } # a SimpleResourcesServer with verify()
6 mcp_config: /abs/path/to/external_mcp_config.json # static config passed via --mcp-config

Things to know about this flow:

  • No cookie/session entanglement. Gym’s session cookie flows only between the agent server and the Resources Server (/seed_session/verify). The agent-to-external-MCP connection is a separate channel with its own auth (whatever headers you put in the static config). They don’t interfere.
  • Verify off the trajectory. Gym can’t observe the external server’s calls, so verify() must score the function_call / function_call_output items in the agent’s Responses-API output — not server-side session state.
  • Static + per-rollout compose. When both are present, the agent merges your static mcp_config with the per-rollout Gym-owned entry, so a single rollout can use external tools and a Gym-owned MCP server at once. If a static server happens to share the same name as the Gym resources server, the per-rollout Gym entry takes precedence and overwrites it.

Real-World Environment →