
# Multi-Step Environment

This tutorial focuses on the **Resources Server** implementation for a multi-step tool-calling environment. The full workflow — task data preparation, agent/model configuration, rollout collection, and training — follows the same steps as the [single-step tutorial](/v0.2/environment-tutorials/single-step-environment). What changes here is the complexity of the tools and verification logic.

<NavButton href="/v0.2/environment-tutorials/single-step-environment" label="Previous: Single-Step Environment" direction="prev" />

***

## What You'll Build

Many real tasks require a model to call tools in sequence, where the output of one call informs the next. For example: look up several pieces of information, then combine them into a final answer. This tutorial shows how to build and verify that kind of environment.

The example environment has two tools: `get_synonym_value` (look up a numeric value for a word) and `extract_synonym_values` (submit the collected values). The agent must look up the value for each synonym one by one, then submit the complete list. The reward is 1.0 for an exact match and 0.0 otherwise.

### Episode Flow

```text
Goal (what the agent is learning)
  - Learn a multi-step tool workflow: call the right tool(s), carry values forward, and submit them in the required format.
  - It is *not* "learning ASCII math" as a capability. The ASCII-sum is just a deterministic placeholder tool so we can grade behavior reliably.

Inputs (from one JSONL row)
  - expected_synonyms:         ["Warm", "Blazing", ...]
  - expected_synonym_values:   [407, 711, ...]          # ground truth for grading
  - minefield_label/value:     ("Hot", 299)             # optional failure-mode tracking

What does synonym_value mean?
  - `synonym_value` is the numeric output returned by the tool `/get_synonym_value`.
  - In this example implementation, it's computed as the sum of character code points for the synonym string (e.g., "Warm" -> 407).

+--------------------------- ResponsesAPIAgent (/run) ---------------------------+
|                                                                                |
|  1) Initialize episode state                                                   |
|     POST ResourcesServer /seed_session                                         |
|                                                                                |
|  2) Interaction loop (repeat up to max_steps)                                  |
|     POST ModelServer /v1/responses                                             |
|       +- if output contains text: keep it in the conversation                  |
|       +- if output contains function_call(name=TOOL, arguments=...):           |
|              POST ResourcesServer /{TOOL}                                      |
|                - /get_synonym_value(synonym="Warm")   -> synonym_value=407     |
|                - /get_synonym_value(synonym="Blazing")-> synonym_value=711     |
|              append tool result back into the conversation                     |
|                                                                                |
|     (agent eventually submits)                                                 |
|       function_call: extract_synonym_values(synonym_values=[407, 711, ...])    |
|                                                                                |
|  3) Grade the rollout (reward)                                                 |
|     POST ResourcesServer /verify                                               |
|       - parse the final extract_synonym_values(...) arguments from the response|
|       - compare to expected_synonym_values                                     |
|       - reward = 1.0 if exact match else 0.0 (plus extra metrics)              |
+--------------------------------------------------------------------------------+
```
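
The numeric values in the diagram can be reproduced directly. As a quick sanity check (plain Python, not part of the server):

```python
# Reproduce the deterministic synonym_value used by the example tool:
# the sum of Unicode code points of the synonym string.
def synonym_value(synonym: str) -> int:
    return sum(map(ord, synonym))

print(synonym_value("Warm"))     # 407
print(synonym_value("Blazing"))  # 711
```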

This is the `simple_agent` loop from the [single-step tutorial](/v0.2/environment-tutorials/single-step-environment) in action: the model is called, it emits a `function_call`, the tool executes, the result is appended back into the conversation, and the model is called again with the updated context. This cycle repeats until the model produces a final text response or hits `max_steps`. In a multi-step environment, this iterative loop is where the agent learns to chain tool calls correctly.
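
As a rough sketch of that loop (not the actual NeMo Gym implementation), with hypothetical `call_model` and `call_tool` helpers standing in for the HTTP calls to the model server and the resources server:

```python
# Sketch of the simple_agent interaction loop. call_model and call_tool
# are hypothetical stand-ins for POST ModelServer /v1/responses and the
# resources-server tool endpoints.
import json

def run_episode(call_model, call_tool, messages, max_steps=8):
    for _ in range(max_steps):
        output = call_model(messages)           # POST /v1/responses
        messages.append(output)
        if output["type"] != "function_call":   # plain text ends the episode
            break
        # Execute the requested tool and feed its result back in.
        result = call_tool(output["name"], json.loads(output["arguments"]))
        messages.append({"type": "function_call_output",
                         "output": json.dumps(result)})
    return messages
```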

***

## Implementation

**File (simplified from [`resources_servers/example_multi_step/app.py`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/example_multi_step/app.py), with added defensive guards):**

```python
# simplified
import json
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

from nemo_gym.base_resources_server import (
    BaseResourcesServerConfig,
    BaseRunRequest,
    BaseVerifyRequest,
    BaseVerifyResponse,
    SimpleResourcesServer,
)

class ExampleMultiStepResourcesServerConfig(BaseResourcesServerConfig):
    pass

# Custom request types with task-specific metadata
class ExampleMultiStepRunRequest(BaseRunRequest):
    id: int
    expected_synonym_values: List[int]
    expected_synonyms: List[str]
    minefield_label: str
    minefield_label_value: int

class ExampleMultiStepVerifyRequest(ExampleMultiStepRunRequest, BaseVerifyRequest):
    pass

# Extended verify response with detailed metrics
class ExampleMultiStepVerifyResponse(BaseVerifyResponse):
    parsed_synonym_values: List[int]
    accuracy: bool
    set_overlap: float
    original_term_minefield_hit: bool
    order_instruction_following_failure: bool

# Tool request/response models
class GetSynonymValueRequest(BaseModel):
    synonym: str

class GetSynonymValueResponse(BaseModel):
    synonym_value: int

class ExtractSynonymValuesRequest(BaseModel):
    synonym_values: List[int]

class ExtractSynonymValuesResponse(BaseModel):
    success: bool

class ExampleMultiStepResourcesServer(SimpleResourcesServer):
    config: ExampleMultiStepResourcesServerConfig

    def setup_webserver(self) -> FastAPI:
        app = super().setup_webserver()

        # Register multiple tool endpoints
        app.post("/get_synonym_value")(self.get_synonym_value)
        app.post("/extract_synonym_values")(self.extract_synonym_values)

        return app

    # Tool 1: Get the numeric value for a synonym
    async def get_synonym_value(self, body: GetSynonymValueRequest) -> GetSynonymValueResponse:
        # Simple deterministic function: sum of character code points
        return GetSynonymValueResponse(synonym_value=sum(map(ord, body.synonym)))

    # Tool 2: Extract/submit the final answer
    async def extract_synonym_values(
        self, body: ExtractSynonymValuesRequest
    ) -> ExtractSynonymValuesResponse:
        return ExtractSynonymValuesResponse(success=True)

    # The reward function: compute the RL training signal from the rollout
    async def verify(
        self, body: ExampleMultiStepVerifyRequest
    ) -> ExampleMultiStepVerifyResponse:
        expected = body.expected_synonym_values  # Pulls the ground truth

        # Parse the agent's final answer from its response
        actual = []
        for output in reversed(body.response.output):
            if output.type == "function_call" and output.name == "extract_synonym_values":
                try:
                    actual = json.loads(output.arguments)["synonym_values"]
                except (json.JSONDecodeError, KeyError, TypeError):
                    actual = []
                break

        # Compute reward based on exact match
        accuracy = expected == actual
        set_overlap = len(set(actual) & set(expected)) / len(expected) if expected else 0.0

        return ExampleMultiStepVerifyResponse(
            **body.model_dump(),
            reward=float(accuracy),  # 1.0 if correct, 0.0 otherwise
            parsed_synonym_values=actual,
            accuracy=accuracy,
            set_overlap=set_overlap,
            # `actual` is unvalidated model output, so it may contain strings
            # as well as ints; check both the label and its numeric value.
            original_term_minefield_hit=body.minefield_label in actual or body.minefield_label_value in actual,
            order_instruction_following_failure=not accuracy and set_overlap == 1.0,
        )

if __name__ == "__main__":
    ExampleMultiStepResourcesServer.run_webserver()
```

### Key Insight

The `verify()` function parses the agent's tool calls from `body.response.output` and computes a reward by comparing against ground truth (`body.expected_synonym_values`). The ground truth fields come from the JSONL dataset row and are passed through the `ExampleMultiStepVerifyRequest`.
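
For concreteness, a dataset row for this environment might look like the following. The values are illustrative, not taken from the real dataset; the fields mirror `ExampleMultiStepRunRequest`:

```python
# Illustrative JSONL row for this environment (example values only);
# the field names mirror ExampleMultiStepRunRequest.
import json

row = {
    "id": 0,
    "expected_synonyms": ["Warm", "Blazing"],
    "expected_synonym_values": [407, 711],  # sum of code points per synonym
    "minefield_label": "Hot",
    "minefield_label_value": 299,           # sum of code points of "Hot"
}
print(json.dumps(row))  # one line of the JSONL file
```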

<Tip>
  The `json.loads(output.arguments)` call is wrapped in `try/except` to handle cases where the model produces malformed JSON. Always guard against unparseable model output in your verify function. For more verification patterns, see [task-verification](/v0.2/about/concepts/task-verification).
</Tip>

<Tip>
  Multi-step environments where tool outputs depend on earlier calls pair naturally with `parallel_tool_calls: false` in the JSONL data, which forces the model to call tools sequentially rather than in parallel.
</Tip>
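
As a sketch, that flag is just an extra field on each JSONL line (the surrounding fields here are illustrative; `parallel_tool_calls` mirrors the Responses API parameter of the same name):

```python
# Hedged sketch: a per-row flag forcing sequential tool calls.
import json

row = {
    "expected_synonyms": ["Warm", "Blazing"],
    "parallel_tool_calls": False,  # model must call tools one at a time
}
print(json.dumps(row))
```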

***

## Rollout Transcript

```text
[Episode start]

Agent -> ResourcesServer: POST /seed_session
  (environment is initialized for this episode)

User: "For the synonyms ['Warm', 'Blazing'], look up each synonym_value and then submit the list."

Agent -> ModelServer: POST /v1/responses (tools available: get_synonym_value, extract_synonym_values)
Model decides to call a tool:
  function_call: get_synonym_value({"synonym": "Warm"})

Agent -> ResourcesServer: POST /get_synonym_value {"synonym": "Warm"}
ResourcesServer -> Agent:
  {"synonym_value": 407}

Agent -> ModelServer: POST /v1/responses (now includes tool output 407)
Model calls next tool:
  function_call: get_synonym_value({"synonym": "Blazing"})

Agent -> ResourcesServer: POST /get_synonym_value {"synonym": "Blazing"}
ResourcesServer -> Agent:
  {"synonym_value": 711}

Agent -> ModelServer: POST /v1/responses (now includes tool output 711)
Model submits final answer via the "submit" tool:
  function_call: extract_synonym_values({"synonym_values": [407, 711]})

[Episode end -> grading]

Agent -> ResourcesServer: POST /verify (includes the full response trace + ground truth fields)
ResourcesServer:
  - parses the extract_synonym_values(...) arguments -> actual=[407, 711]
  - compares to expected_synonym_values from the dataset row
  - returns reward: 1.0 if exact match else 0.0
```

***

<NavButton href="/v0.2/environment-tutorials/stateful-environment" label="Continue to Stateful Environment" direction="next" />