Add a New Environment#

An environment consists of three components: Agents, Models, and Resources. Most contributors will create new resource servers while using existing agent server and model server implementations. If you need to create custom agents or models, you can reference the implementations in responses_api_agents/ and responses_api_models/.

This guide focuses on the resource server contribution process.

Tip

For a guide to building your first resource server, see Creating a Resource Server.

Required Files#

Your resource server must include these files:

File

Description

app.py

Main server implementation with verify function

configs/*.yaml

Configuration with valid domain field

tests/test_app.py

At least one unit test

data/example.jsonl

At least five example inputs

data/example_rollouts.jsonl

Pre-generated rollouts from example data (generate before submitting PR)

requirements.txt

Python dependencies

README.md

Documentation with licensing information

Contribution Workflow#

Contributing a resource server follows this sequence:

Step

Phase

Description

1

Curate Tasks

Collect or generate training tasks and create example data

2

Implementation

Build resource server with verification logic

3

Testing

Write and run unit tests

4

Example Rollouts

Generate example rollouts to verify functionality

5

Reward Profiling

Validate reward distribution with inference runs

6

Training Validation

Train with GRPO to ensure meaningful training signal

7

Submit PR

Submit pull request with all required information

8

Review

Address reviewer feedback and verify reproducibility

Detailed Steps#

1. Curate Training Tasks#

Prepare the dataset for your environment:

  • Collect or generate prompts/tasks for your environment

  • Create data/example.jsonl with at least 5 representative task examples

2. Resource Server Implementation#

Build your resource server:

  • Run ng_init_resources_server +entrypoint=resources_servers/my_server to scaffold the new resource server

  • Follow the Creating a Resource Server guide to implement your specific logic

  • Implement verification logic for your tasks by defining the verify() function

  • Set the domain field in your resource server configuration (see Domain).

  • Complete the auto-generated README.md with licensing information

3. Testing#

Write and run tests for your resource server:

  • At least one test per server is required for PR approval

  • You are responsible for ensuring your tests adequately cover your server’s functionality

4. Generate Example Rollouts#

Verify basic functionality and generate example rollouts:

  • Document the command used to start your server, for example, ng_run +entrypoint=resources_servers/my_server

  • Generate rollouts and save 5 example outputs to data/example_rollouts.jsonl to demonstrate correct reward signals

5. Reward Profiling#

Run inference to validate reward distribution:

  • Use a ~500 sample subset (minimum)

  • Use Qwen3-4B, Qwen3 30B A3B, or equivalent model

  • Generate 16 responses per prompt

  • Report reward distribution

  • For tool calling: Provide tool call metrics and correlation with rewards

6. Training-Based Validation#

Validate with actual training:

  • Train with GRPO on Qwen3-4B, Qwen 30B A3B Instruct, or equivalent model

  • Include training accuracy curve

  • Include test benchmark accuracy curve (if applicable)

7. Submit PR#

Include the following in your pull request description:

  • Description of the environment

  • Description of the verification logic

  • Description of the prompts/tasks: What is the source? Which domain does it cover?

  • Provide relevant license information for data and software. If models were used for synthetic data generation, note this in your PR description

8. PR Review Process#

After submitting your PR:

  1. A team member will be assigned to review and reproduce your environment

  2. The reviewer will verify all steps and check correctness of the 5 example rollouts

  3. The reviewer will re-run the procedure to ensure reproducibility

  4. Address any feedback from reviewers

  5. Once approved, maintainers will merge your contribution

Reference#