Core Components#
Before diving into code, let’s understand the three server components that make up a training environment in NeMo Gym.
If you are new to reinforcement learning for LLMs, we recommend you refer to Key Terminology first.
Responses API Model servers are stateless model endpoints that perform single-call text generation without conversation memory or orchestration. During training, you will always have at least one active Responses API Model server, typically called the “policy” model.
Available Implementations:
openai_model: Integration with OpenAI’s Responses APIazure_openai_model: Integration with Azure OpenAI APIvllm_model: Middleware converting local models (using vLLM) to Responses API format
Configuration: Models are configured with API endpoints and credentials using YAML files in responses_api_models/*/configs/
Resource servers host the components and logic of environments including multi-step state persistence, tool and reward function implementations. Resource servers are responsible for returning observations, such as tool results or updated environment state, and rewards as a result of actions taken by the policy model. Actions can be moves in a game, tool calls, or anything an agent can do. NeMo Gym contains a variety of NVIDIA and community contributed resource servers that you can use during training. We also have tutorials on how to add your own resource server.
Examples of Resources
A resource server usually provides tasks, possible actions, and verification logic:
Tasks: Problems or prompts that agents solve during rollouts
Actions: Actions agents can take during rollouts, including tool calling
Verification logic: Scoring logic that evaluates performance (returns reward signals for training)
Example Resource Servers
Each example shows what task the agent solves, what actions are available, and what verification logic measures success:
google_search: Web search with verificationTask: Answer knowledge questions using web search
Actions:
search()queries Google API;browse()extracts webpage contentVerification logic: Checks if final answer matches expected result for MCQA questions
math_with_code: Mathematical reasoning with code executionTask: Solve math problems using Python
Actions:
execute_python()runs Python code with numpy, scipy, pandasVerification logic: Extracts boxed answer and checks mathematical correctness
code_gen: Competitive programming problemsTask: Implement solutions to coding problems
Actions: None (agent generates code directly)
Verification logic: Executes generated code against unit test inputs/outputs
math_with_judge: Mathematical problem solvingTask: Solve math problems
Actions: None (or can be combined with
math_with_code)Verification logic: Uses math library + LLM judge to verify answer equivalence
mcqa: Multiple choice question answeringTask: Answer multiple choice questions
Actions: None (knowledge-based reasoning)
Verification logic: Checks if selected option matches ground truth
instruction_following: Instruction compliance evaluationTask: Follow specified instructions
Actions: None (evaluates response format/content)
Verification logic: Checks if response follows all specified instructions
example_single_tool_call: Mock weather APITask: Report weather information
Actions:
get_weather()returns mock weather dataVerification logic: Checks if weather tool was called correctly
Configuration: Refer to resource-specific config files in resources_servers/*/configs/
Responses API Agent servers orchestrate the rollout lifecycle—the full cycle of task execution and verification.
Implement multi-step and multi-turn agentic systems
Orchestrate the model server and resources server(s) to collect complete trajectories
NeMo Gym provides several agent patterns covering multi-step, multi-turn, and user modeling scenarios.
Examples:
simple_agent: Basic agent that coordinates model calls with resource tools
Configuration Pattern:
your_agent_name: # server ID
responses_api_agents: # server type. corresponds to the folder name in the project root
your_agent_name: # agent type. name of the folder inside the server type folder
entrypoint: app.py # server entrypoint path, relative to the agent type folder
resources_server: # which resource server to use
name: example_single_tool_call
model_server: # which model server to use
name: policy_model