Creating a Training Environment#
Learn how to create a custom resource server to implement tools, verifiers, and business logic for your training environment.
Goal: Build a custom resource server with tools and verification logic.
Time: ~30 minutes | Cost: ~$0.05 (OpenAI API)
In this tutorial, you will:
Initialize a resource server from template
Implement tool endpoints
Add verification logic for rewards
Test with rollout collection
Prerequisites#
Complete both of the following before starting this tutorial:
Detailed Setup Guide — Clone the repository, install dependencies, configure your API key, and verify servers start correctly.
Rollout Collection — Collect and view your first batch of rollouts. This tutorial builds on rollout concepts and uses `ng_collect_rollouts` in later steps.
Tip
If you followed the Quickstart, you’ve already completed both. You’re ready to proceed.
Important
Run all commands from the repository root directory (where pyproject.toml is located).
What You’ll Build#
By the end of this tutorial, you’ll have:
A runnable resource server with `ng_run`
Unit tests in `tests/test_app.py`
Configuration with the required `domain` field
Example data in `data/example.jsonl` (5 examples)
Example rollouts in `data/example_rollouts.jsonl`
Documentation with licensing information
What is a Resource Server?#
Resource servers are the backbone of tool-based interactions in NeMo Gym. They provide:
Tool implementations: APIs that models can call to perform actions or retrieve information
Verification logic: Functions to evaluate model performance and compute rewards
Business logic abstraction: Clean separation between model logic and domain-specific functionality
Each resource server must implement a verify function that evaluates the model’s interactions and returns a reward signal for reinforcement learning.
Key term: A rollout is a complete interaction trace—the model’s inputs, tool calls, and final outputs—used for training and evaluation.
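The full implementation appears in step 3; the contract itself is a single async method that receives the completed rollout and attaches a reward. A minimal sketch mirroring the code you will write below:

# Minimal shape of the verification contract (full version in step 3)
async def verify(self, body: BaseVerifyRequest) -> BaseVerifyResponse:
    reward = 1.0  # computed by inspecting the rollout carried in `body`
    return BaseVerifyResponse(**body.model_dump(), reward=reward)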
1. Initialize the Resource Server#
Resource servers live in the resources_servers/ directory. Create a weather server that provides weather information to models.
Run the initialization command from the repository root:
ng_init_resources_server +entrypoint=resources_servers/my_weather_tool
This command creates a new directory structure with template files:
resources_servers/my_weather_tool/
├── app.py # Main server implementation
├── configs/
│ └── my_weather_tool.yaml # Configuration files
├── data/
│ └── .gitignore # Data directory for examples/datasets
├── tests/
│ └── test_app.py # Unit tests
├── requirements.txt # Python dependencies
└── README.md # Documentation
Tip
The initialization command also creates a paired simple agent configuration that references your resource server, making it easy to test end-to-end.
2. Configure the Domain#
Open resources_servers/my_weather_tool/configs/my_weather_tool.yaml and update the domain field:
my_weather_tool_resources_server:
  resources_servers:
    my_weather_tool:
      entrypoint: app.py
      domain: agent  # Change from 'other' to 'agent' for this use case
The domain field categorizes your resource server and is required. Common categories include:
`math` — Mathematical problem-solving
`coding` — Code generation and programming
`agent` — Agent-based interactions and tool calling
`knowledge` — Knowledge-based question answering
`instruction_following` — Instruction following benchmarks
`long_context` — Long context handling
`safety` — Safety and alignment
`games` — Game-playing scenarios
`e2e` — End-to-end workflows
`other` — General purpose
Tip
The domain is used for metrics grouping and dataset naming. Choose the category that best describes your task.
3. Implement the Server#
Open resources_servers/my_weather_tool/app.py and add the complete implementation:
from fastapi import FastAPI
from pydantic import BaseModel

from nemo_gym.base_resources_server import (
    BaseResourcesServerConfig,
    BaseVerifyRequest,
    BaseVerifyResponse,
    SimpleResourcesServer,
)


# 1. Define the server configuration
class MyWeatherResourcesServerConfig(BaseResourcesServerConfig):
    """Configuration for the weather resource server."""

    pass


# 2. Define request and response schemas for your tools
class GetWeatherRequest(BaseModel):
    """Request schema for getting weather information."""

    city: str


class GetWeatherResponse(BaseModel):
    """Response schema for weather information."""

    city: str
    weather_description: str


# 3. Implement the resource server
class MyWeatherResourcesServer(SimpleResourcesServer):
    config: MyWeatherResourcesServerConfig

    def setup_webserver(self) -> FastAPI:
        """Register API routes."""
        app = super().setup_webserver()
        # Register your tool endpoints
        app.post("/get_weather")(self.get_weather)
        return app

    async def get_weather(self, body: GetWeatherRequest) -> GetWeatherResponse:
        """
        Tool implementation: Get weather for a city.

        In a production implementation, this would call a weather API.
        For this example, we return a simple static response.
        """
        return GetWeatherResponse(
            city=body.city,
            weather_description=f"The weather in {body.city} is cold.",
        )

    async def verify(self, body: BaseVerifyRequest) -> BaseVerifyResponse:
        """
        Verification function: Evaluate rollout performance.

        This function is called after a rollout completes.
        Return a reward between 0.0 and 1.0.
        """
        # Check if the model called the get_weather tool
        used_tool = False
        for output in body.response.output:
            if output.type == "function_call" and output.name == "get_weather":
                used_tool = True
                break

        # Return higher reward if the tool was used correctly
        reward = 1.0 if used_tool else 0.0
        return BaseVerifyResponse(**body.model_dump(), reward=reward)


if __name__ == "__main__":
    MyWeatherResourcesServer.run_webserver()
Key Components#
Configuration Class: Extends `BaseResourcesServerConfig` and holds server-specific settings
Request/Response Schemas: Pydantic models defining the API contract
Server Class: Extends `SimpleResourcesServer` and implements tools and verification
`setup_webserver()`: Registers FastAPI routes for your tools
Tool Methods: Async functions that implement the actual tool logic
`verify()`: Required method that evaluates task performance and returns a reward
4. Add Dependencies (Optional)#
If your server needs external packages, add them to requirements.txt:
-e nemo-gym[dev] @ ../../
# Add any other dependencies here
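For example, if your tool fetched live data instead of returning a static string, you might add `httpx` here and call an external API. A hypothetical sketch (the URL and response shape are placeholders, not a real service):

import httpx

async def fetch_weather(city: str) -> str:
    """Hypothetical helper: fetch a live weather description for a city."""
    async with httpx.AsyncClient(timeout=10.0) as client:
        # Placeholder endpoint; substitute a real weather API and its auth scheme
        resp = await client.get("https://api.example.com/weather", params={"q": city})
        resp.raise_for_status()
        return resp.json()["description"]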
5. Write Tests#
Update resources_servers/my_weather_tool/tests/test_app.py to test your implementation:
import pytest
from unittest.mock import MagicMock

from nemo_gym.server_utils import ServerClient
from resources_servers.my_weather_tool.app import (
    MyWeatherResourcesServer,
    MyWeatherResourcesServerConfig,
    GetWeatherRequest,
)


@pytest.fixture
def server():
    """Create a server instance for testing."""
    config = MyWeatherResourcesServerConfig(
        host="0.0.0.0",
        port=8080,
        entrypoint="",
        name="my_weather_tool",
    )
    return MyWeatherResourcesServer(
        config=config, server_client=MagicMock(spec=ServerClient)
    )


@pytest.mark.asyncio
async def test_get_weather(server):
    """Test the get_weather tool."""
    request = GetWeatherRequest(city="San Francisco")
    response = await server.get_weather(request)
    assert response.city == "San Francisco"
    assert "cold" in response.weather_description.lower()


@pytest.mark.asyncio
async def test_verify(server):
    """Test the verify function."""
    from nemo_gym.base_resources_server import BaseVerifyRequest
    from nemo_gym.openai_utils import (
        NeMoGymResponse,
        NeMoGymResponseCreateParamsNonStreaming,
    )

    # Create a proper BaseVerifyRequest with required fields
    verify_request = BaseVerifyRequest(
        responses_create_params=NeMoGymResponseCreateParamsNonStreaming(
            input=[{"type": "text", "text": "What's the weather?"}]
        ),
        response=NeMoGymResponse(
            output=[{"type": "text", "text": "It's cold."}]
        ),
    )
    response = await server.verify(verify_request)
    assert response.reward >= 0.0
    assert response.reward <= 1.0
Run the tests:
ng_test +entrypoint=resources_servers/my_weather_tool
For detailed test output:
cd resources_servers/my_weather_tool
source .venv/bin/activate
pytest -v
6. Run with an Agent#
The initialization command created a paired simple agent configuration in the same YAML file. Start the servers:
config_paths="responses_api_agents/simple_agent/configs/simple_agent.yaml,\
responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/my_weather_tool/configs/my_weather_tool.yaml"
ng_run "+config_paths=[$config_paths]" \
+simple_agent.responses_api_agents.simple_agent.resources_server.name=my_weather_tool_resources_server
This starts three servers:
The simple agent server (coordinates interactions)
The OpenAI model server (provides LLM responses)
Your weather resource server (provides the `get_weather` tool)
Configure your OpenAI API key in env.yaml (located in the repository root):
openai_api_key: ${oc.env:OPENAI_API_KEY} # Reads from environment variable
policy_api_key: ${openai_api_key}
policy_base_url: https://api.openai.com/v1
policy_model_name: gpt-4o-mini
Tip
Set your API key as an environment variable before running:
export OPENAI_API_KEY="sk-your-key-here" # pragma: allowlist secret
Never commit API keys directly in YAML files.
Test the resource server#
After the servers start, test your resource server in a new terminal:
python responses_api_agents/simple_agent/client.py
The model should be able to use your get_weather tool to answer questions about weather!
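You can also exercise the tool endpoint directly, bypassing the agent. A quick manual check, assuming the resource server is listening on localhost:8080 (the port used in the unit tests above; the port assigned by ng_run may differ):

curl -X POST http://localhost:8080/get_weather \
  -H "Content-Type: application/json" \
  -d '{"city": "Paris"}'
# Expected: {"city":"Paris","weather_description":"The weather in Paris is cold."}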
7. Create Example Data#
Your resource server needs example data for testing and validation. Create resources_servers/my_weather_tool/data/example.jsonl with at least five example inputs.
Note
JSONL (JSON Lines) format: one JSON object per line, no wrapping array or trailing commas.
{"input": [{"type": "text", "text": "What's the weather in San Francisco?"}]}
{"input": [{"type": "text", "text": "Tell me the weather in New York"}]}
{"input": [{"type": "text", "text": "How's the weather in Seattle?"}]}
{"input": [{"type": "text", "text": "What is the current weather in Boston?"}]}
{"input": [{"type": "text", "text": "Can you check the weather in Chicago?"}]}
Generate Example Rollouts#
Collect rollouts by running against your example inputs. This generates interaction traces showing how models use your tools:
ng_collect_rollouts +agent_name=my_weather_tool_simple_agent \
+input_jsonl_fpath=resources_servers/my_weather_tool/data/example.jsonl \
+output_jsonl_fpath=resources_servers/my_weather_tool/data/example_rollouts.jsonl \
+limit=null \
+num_repeats=null \
+num_samples_in_parallel=null
Note
Ensure your servers are running (from step 6) before collecting rollouts. The command processes each input example, runs it through the servers, and saves the complete interaction including tool calls and verification rewards to example_rollouts.jsonl.
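Once collection finishes, you can skim the rewards to confirm verification ran. A small sketch (it assumes each line is a JSON object and prints a top-level reward field if present; the exact rollout schema may differ):

import json

with open("resources_servers/my_weather_tool/data/example_rollouts.jsonl") as f:
    for i, line in enumerate(f):
        rollout = json.loads(line)
        # Field names may vary; print whatever reward the verifier attached
        print(f"rollout {i}: reward={rollout.get('reward', 'n/a')}")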
8. Update Documentation#
Update resources_servers/my_weather_tool/README.md with licensing and usage information:
# My Weather Tool Resource Server
A simple weather information resource server demonstrating tool calling.
## Description
This resource server provides a `get_weather` tool that returns weather information for cities.
## Data
- Example data: Five synthetic weather queries
## Licensing Information
**Code**: Apache 2.0
**Data**: Apache 2.0 (synthetic examples)
## Dependencies
- nemo_gym: Apache 2.0
Important
Your PR will not be merged unless licensing information is present and accurate!
Advanced: Verification Patterns#
Multi-step verification with output parsing
For tasks requiring multiple tool calls, parse the final output to compute accuracy:
import json


async def verify(self, body: BaseVerifyRequest) -> BaseVerifyResponse:
    """Extract and validate multi-step results."""
    expected = body.expected_values  # From request

    # Parse the final submit_answer tool call output
    actual = []
    for output in reversed(body.response.output):
        if output.type == "function_call" and output.name == "submit_answer":
            actual = json.loads(output.arguments).get("values", [])
            break

    # Exact match earns full reward; set overlap is a partial-credit alternative
    accuracy = expected == actual
    set_overlap = len(set(actual) & set(expected)) / len(expected) if expected else 0.0

    return BaseVerifyResponse(
        **body.model_dump(),
        reward=float(accuracy),  # or use set_overlap for reward shaping
    )
See resources_servers/example_multi_step/app.py for a complete example.
LLM-as-judge verification
For tasks with multiple valid answers, use an LLM to judge correctness. See resources_servers/math_with_judge/app.py for the full pattern; the core judging call might look like the following sketch (it uses the openai client directly and is not the repository's actual implementation):
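from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def judge_equivalence(question: str, reference: str, candidate: str) -> float:
    """Ask a judge model whether the candidate answer matches the reference."""
    completion = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer with exactly YES or NO."},
            {
                "role": "user",
                "content": (
                    f"Question: {question}\n"
                    f"Reference answer: {reference}\n"
                    f"Candidate answer: {candidate}\n"
                    "Is the candidate answer equivalent to the reference?"
                ),
            },
        ],
    )
    verdict = (completion.choices[0].message.content or "").strip().upper()
    return 1.0 if verdict.startswith("YES") else 0.0

Judge-based rewards trade determinism for flexibility; pin the judge model and prompt so rewards stay comparable across training runs.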
Unit test verification (code generation)
For code generation tasks, run unit tests against model output. See resources_servers/code_gen/app.py for the full pattern; below is a minimal sketch of the scoring step (real implementations should sandbox untrusted code rather than executing it directly):
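import subprocess
import tempfile
from pathlib import Path

def run_unit_tests(candidate_code: str, test_code: str) -> float:
    """Run reference tests against model-generated code; 1.0 if all pass."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(candidate_code)
        Path(tmp, "test_solution.py").write_text(test_code)
        result = subprocess.run(
            ["pytest", "-q", "test_solution.py"],
            cwd=tmp,
            capture_output=True,
            timeout=60,
        )
        return 1.0 if result.returncode == 0 else 0.0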
Next Steps#
Now that you have a working resource server:
Add training data: Collect rollouts and prepare datasets for RL training
Add complex verification: Add reward shaping and detailed performance metrics
Scale up: Add more tools and more sophisticated business logic
Integrate with RL: Train models on your tasks with NeMo RL using GRPO
Troubleshooting#
Domain validation error#
If you encounter the error "A domain is required for resource servers", ensure the domain field is set in your config YAML file.
Import errors#
Ensure you are running commands from the repository root directory and have installed dependencies:
uv sync
Server does not start#
Check that:
Port is not already in use
Configuration file syntax is valid YAML
All imports in `app.py` are correct
Tests fail#
Ensure:
You are in the correct Python environment
All dependencies are installed
Test file imports match your actual file structure
Debugging server behavior#
Check server status and logs:
# View running servers
ng_status
# For detailed logs, run the server directly:
cd resources_servers/my_weather_tool
source .venv/bin/activate
python app.py
Server logs appear in the terminal where ng_run was executed.
Summary#
You’ve learned how to:
✅ Initialize a resource server with ng_init_resources_server
✅ Configure the required domain field
✅ Add tools and verification logic
✅ Write and run tests
✅ Run your server with a model
✅ Create required data artifacts
Resource servers are the foundation for building custom RL environments in NeMo Gym. Experiment with different tool implementations and verification strategies to create engaging tasks for your models!