Single-Step Environment
Single-Step Environment
Single-Step Environment
Build a complete environment end-to-end, from scaffolding to RL-ready rollouts.
Goal: Build a weather assistant environment with tool calling and verification.
Time: ~30 minutes | Cost: ~$0.05 (OpenAI API)
In this tutorial, you will:
Complete Detailed Setup before starting — clone the repository, install dependencies, configure your API key, and verify servers start correctly.
If you followed the Quickstart, you’re ready to proceed.
Run all commands from the repository root directory (where pyproject.toml is located).
NeMo Gym uses a decoupled three-component architecture: the Agent Server orchestrates the loop, the Model Server runs inference, and the Resources Server provides tools and verification. All three are async FastAPI servers communicating over HTTP, which allows many rollouts to run concurrently across episodes. See core-components for the full architecture and diagram.
In most cases, the Resources Server is where your changes go: define your tool endpoints and a verify() method that returns a reward. NeMo Gym ships several pre-built agent servers (simple_agent, swe_agents, etc.) and model servers (openai_model, vllm_model) that you can use as-is, or you can bring your own.
Resource servers live in the resources_servers/ directory. Scaffold a weather server that provides weather information to models:
This generates the following structure along with a paired simple agent configuration:
Understanding the task is the first step in designing the environment itself.
Every environment starts with task data — the scenarios your model will practice on. Task data is stored in JSONL format (one JSON object per line), where each line represents a single training example. To get started, it’s not atypical for a domain-expert to hand-craft a few examples from scratch. Once the environment is developed and tested with these examples, you can scale up by collecting more data or using synthetic data generation using libraries like NeMo Data Designer.
Each line contains a responses_create_params object with the conversation messages, tool definitions, and any ground-truth metadata needed for verification:
Create resources_servers/my_weather_tool/data/example.jsonl with five weather examples:
This section covers the key aspects of building the environment itself: building or using an existing Agent server, creating the Resources Server, and writing tool and verification logic.
While this tutorial is about a single-step environment, it still can use the built-in simple_agent, which handles even multi-step tool calling out of the box. No custom agent code is needed. Here is simplified pseudocode showing the core flow (actual implementation):
This tutorial uses simple_agent. For other patterns (multi-turn correction, custom orchestration), see the other agents in responses_api_agents/, or build your own by extending SimpleResponsesAPIAgent.
While the agent handles orchestration, the Resources Server is where you define what makes your environment unique. It is the backbone of tool-based interactions in NeMo Gym.
It provides:
Some agents may come with predefined tools, and you can use the Resources Server to supplement them with additional external tools. When building a new environment, prefer defining tools in the Resources Server rather than the Agent Server. This separation lets multiple agents share the same tool logic without duplicating it.
Open resources_servers/my_weather_tool/app.py and implement:
The verify() function is the heart of your RL environment — it computes the reward signal that drives model training. In this example, verification is simple: return 1.0 if the model called the get_weather tool, 0.0 otherwise. Real environments will have more sophisticated logic, but the principle is the same — inspect the model’s output and score it.
This example checks tool usage, not argument correctness. See task-verification for the full verification patterns and best practices, or jump to Advanced: Verification Patterns at the end of this tutorial for quick examples.
Open resources_servers/my_weather_tool/configs/my_weather_tool.yaml. This file contains both the resource server and its paired simple agent configuration.
Update the domain field from other to agent:
The domain field categorizes your resource server and is required. Common values: math, coding, agent, knowledge, instruction_following, long_context, safety, games, e2e, other.
The domain is used for metrics grouping and dataset naming. Choose the category that best describes your task.
The agent entry references the resource server and model server by name, wiring all three components together.
If your server needs external packages, add them to requirements.txt:
Update resources_servers/my_weather_tool/tests/test_app.py to test your implementation:
Run the tests:
For detailed test output:
Start the servers:
ng_run reads the config files and starts all three components from the architecture diagram:
my_weather_tool_simple_agent) — the simple_agent that orchestrates the seed → model → tool → verify loopopenai_model) — proxies LLM inference requests to the OpenAI APImy_weather_tool_resources_server) — serves your get_weather tool endpoint and verify() logicConfigure your OpenAI API key in env.yaml (located in the repository root). The env.yaml is never committed to Git and is designed to hold secrets like API keys:
Set your API key as an environment variable before running the next command:
Never commit API keys directly in YAML files.
If you don’t want to use the OpenAI API, you can try using a local vLLM server (requires GPU access) instead! See model-server-vllm.
You can do a quick spot-check by pointing the built-in client at your agent. Inside responses_api_agents/simple_agent/client.py, change the server name to my_weather_tool_simple_agent, then run:
This client calls /v1/responses, which tests tool-calling but does not exercise the full episode lifecycle (seed_session → responses → verify). End-to-end validation happens during rollout collection below.
Before training, you collect rollouts to validate that your environment works end-to-end and to profile and establish a baseline. Each rollout runs a task through the full agent loop (prompt → model → tool calls → verification) and records the complete interaction along with the reward. This serves two purposes:
With your servers still running, collect rollouts against your example inputs:
Ensure your servers are running before collecting rollouts. The command processes each input example, runs it through the servers, and saves the complete interaction including tool calls and verification rewards to example_rollouts.jsonl.
Once you’ve collected rollouts and validated your environment, run training with your preferred RL framework:
Update resources_servers/my_weather_tool/README.md with licensing and usage information:
We’d love to see your contributions! Please make sure your PR includes accurate licensing information.
You’ve learned how to:
ng_init_resources_serverdomain field and wire components togetherFor tasks requiring multiple tool calls, define a custom verify request model to carry ground-truth data, then parse the final output to compute accuracy:
See resources_servers/example_multi_step/app.py for a complete example.
The custom request model (MultiStepVerifyRequest) is required for extra fields like expected_values to survive Pydantic parsing. Using BaseVerifyRequest directly would silently drop any fields not defined on the base class.
For tasks with multiple valid answers, use an LLM to judge correctness.
See resources_servers/math_with_judge/app.py for implementation details.
For code generation tasks, run unit tests against model output.
See resources_servers/code_gen/app.py for implementation details.
If you encounter the error "A domain is required for resource servers", ensure the domain field is set in your config YAML file.
Ensure you are running commands from the repository root directory and have installed dependencies:
Check that:
app.py are correctEnsure:
Check server status and logs:
Server logs appear in the terminal where ng_run was executed.