Example Resources Servers


Resources Servers can model anything from a stateful workplace simulation to an isolated code execution sandbox. These examples show the shape of real server implementations and how task setup, tool actions, and verification fit together.

workplace_assistant

workplace_assistant implements multi-step tool calling in a workplace setting.

  • Task: Execute business activities such as sending emails, scheduling meetings, and managing projects.
  • Actions: 26 tools across 5 databases: email, calendar, analytics, project management, and CRM. Each tool can read and mutate the database state.
  • Verification: State matching. The server executes both the agent’s actions and the ground-truth actions against fresh databases, then compares the resulting states.
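The state-matching idea can be sketched in a few lines. This is a minimal illustration, not the actual workplace_assistant code: the `send_email` tool, the `TOOLS` registry, the dict-based "database", and the `replay` and `verify` helpers are all hypothetical names chosen for the example.

```python
import copy
from typing import Any, Callable

# Hypothetical in-memory "database": table name -> list of rows.
State = dict[str, list[dict[str, Any]]]
# An action is a tool name plus its keyword arguments.
Action = tuple[str, dict[str, Any]]


def send_email(state: State, to: str, subject: str) -> None:
    """Hypothetical tool that mutates the email table."""
    state["email"].append({"to": to, "subject": subject})


# Hypothetical tool registry; the real server exposes many more tools.
TOOLS: dict[str, Callable[..., None]] = {"send_email": send_email}


def replay(initial: State, actions: list[Action]) -> State:
    """Apply actions to a fresh copy of the initial state and return the result."""
    state = copy.deepcopy(initial)  # fresh database for every replay
    for name, kwargs in actions:
        TOOLS[name](state, **kwargs)
    return state


def verify(initial: State, agent: list[Action], truth: list[Action]) -> float:
    """Return 1.0 if both action sequences produce the same final state, else 0.0."""
    return 1.0 if replay(initial, agent) == replay(initial, truth) else 0.0
```

Comparing final states rather than action sequences means the agent is not penalized for taking a different route, such as extra read-only calls, as long as the databases end up in the same state as the ground truth.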

math_with_code

math_with_code implements mathematical reasoning with code execution.

  • Task: Solve math problems using Python as a reasoning tool.
  • Actions: execute_python() runs code in an isolated per-session process with numpy, scipy, and pandas available. State persists across steps so the agent can build on previous computations.
  • Verification: Answer correctness. The server extracts the boxed answer from the model’s final response and compares it against the expected result.
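The answer-correctness check can be sketched as follows. This is an illustrative sketch under assumptions, not the math_with_code implementation: it assumes the answer is wrapped in a LaTeX-style `\boxed{...}`, that the last such box in the response is the final answer, and that a whitespace-insensitive string comparison is sufficient. The names `extract_boxed` and `verify_answer` are hypothetical.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # matching close for \boxed{
                return "".join(chars).strip()
        chars.append(ch)
    return None  # unbalanced braces: treat as no answer


def verify_answer(response: str, expected: str) -> float:
    """Return 1.0 if the boxed answer matches expected (ignoring spaces), else 0.0."""
    answer = extract_boxed(response)
    if answer is None:
        return 0.0
    normalize = lambda s: s.replace(" ", "")
    return 1.0 if normalize(answer) == normalize(expected) else 0.0
```

Tracking brace depth lets nested expressions such as `\boxed{\frac{1}{2}}` be extracted whole. A production verifier would likely also need mathematical equivalence checking (for example, treating `0.5` and `\frac{1}{2}` as equal), which this sketch does not attempt.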