Multi-Environment Training#
NeMo Gym supports training on multiple environments simultaneously. This is also called multi-verifier training.
Why Train on Multiple Environments?#
Training on multiple environments at once often yields more stable gains across benchmarks. Training on a single environment, by contrast, may cause unrecoverable degradation on other benchmarks.
How to Configure#
Suppose you want to use both the example_single_tool_call and example_multi_step training environments. To start each server individually:
For example_single_tool_call:
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/example_single_tool_call/configs/example_single_tool_call.yaml"
ng_run "+config_paths=[${config_paths}]"
For example_multi_step:
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/example_multi_step/configs/example_multi_step.yaml"
ng_run "+config_paths=[$config_paths]"
To use both environments at once, list the model config and both environment configs in a single config_paths value:
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/example_single_tool_call/configs/example_single_tool_call.yaml,\
resources_servers/example_multi_step/configs/example_multi_step.yaml"
ng_run "+config_paths=[$config_paths]"
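The combined config_paths value can also be assembled from a list of environment names. The following is a minimal sketch, not part of NeMo Gym: build_config_paths is a hypothetical helper, and it assumes every environment follows the resources_servers/<name>/configs/<name>.yaml layout seen in the two examples above.

```shell
# Hypothetical helper: join the model config with one config per environment name.
# Assumes each environment's config lives at resources_servers/<name>/configs/<name>.yaml.
build_config_paths() {
  paths="responses_api_models/openai_model/configs/openai_model.yaml"
  for env in "$@"; do
    paths="${paths},resources_servers/${env}/configs/${env}.yaml"
  done
  printf '%s\n' "$paths"
}

config_paths=$(build_config_paths example_single_tool_call example_multi_step)
echo "$config_paths"
# Launch with the same command as before:
#   ng_run "+config_paths=[$config_paths]"
```

Adding a third environment then only requires passing one more name to the helper, provided its config follows the same layout.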
Dataset Preparation#
Build a single dataset that contains data for both servers. Add an agent_ref field to each record; it is used to route each request to the correct agent server.
jq -c '. + {"agent_ref": {"name": "example_single_tool_call_simple_agent"}}' resources_servers/example_single_tool_call/data/example.jsonl >> results/test_multiverifier_input.jsonl
jq -c '. + {"agent_ref": {"name": "example_multi_step_simple_agent"}}' resources_servers/example_multi_step/data/example.jsonl >> results/test_multiverifier_input.jsonl
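Because the commands above append with >>, rerunning them adds duplicate records to the combined file, so a quick check of the result can be useful. The sketch below counts records per agent_ref name and flags any record that lacks the field. count_by_agent is a hypothetical helper name, and the only assumption about the data is the agent_ref.name field added above.

```shell
# Hypothetical helper: count records per agent_ref name in a JSONL file.
# Records without an agent_ref.name are reported as MISSING.
count_by_agent() {
  jq -r '.agent_ref.name // "MISSING"' "$1" | sort | uniq -c
}

input=results/test_multiverifier_input.jsonl
if [ -f "$input" ]; then
  count_by_agent "$input"
fi
```

Each agent name should appear with the number of records in its source example.jsonl. A MISSING line, or a count that is double what you expect, suggests the file should be deleted and rebuilt.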
Rollout Collection#
Run rollout collection as usual.
ng_collect_rollouts \
+input_jsonl_fpath=results/test_multiverifier_input.jsonl \
+output_jsonl_fpath=results/test_multiverifier_outputs.jsonl
The file results/test_multiverifier_outputs.jsonl should contain 10 rows, each with an appropriate response.
Follow the same process for data preparation and downstream training. To train on more environments, add their server configs to config_paths in the same way.
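One quick way to confirm the row count is to compare the number of lines in the input and output files. This is a minimal sketch: count_rows is a hypothetical helper, and the expectation that the two counts match assumes rollout collection writes one output row per input record.

```shell
# Hypothetical helper: count the lines (rows) in a JSONL file.
count_rows() {
  grep -c '' "$1"
}

input=results/test_multiverifier_input.jsonl
output=results/test_multiverifier_outputs.jsonl
if [ -f "$input" ] && [ -f "$output" ]; then
  echo "input rows:  $(count_rows "$input")"
  echo "output rows: $(count_rows "$output")"
fi
```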