Multi-Environment Training
NeMo Gym supports training on multiple environments simultaneously, an approach also known as multi-verifier training.
Why Train on Multiple Environments?
Training on multiple environments at once often yields more stable gains across benchmarks. Training on a single environment, by contrast, can cause unrecoverable degradation on other benchmarks.
How to Configure
Suppose you want to use both the example_single_tool_call and example_multi_step training environments. To start each server individually:
For example_single_tool_call:
For example_multi_step:
To use both environments, add the YAML configs together as follows:
Dataset Preparation
Build a dataset that contains data for both servers. In each record, include the agent ref, which is used to route the request to the correct agent server.
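The merge step can be sketched in a few lines of Python. This is a minimal illustration, not NeMo Gym's own tooling: it assumes each environment's data lives in its own JSONL file, and the field name `agent_ref`, its `{"name": ...}` shape, and the helper names are hypothetical placeholders. Check the record schema your agent servers expect before relying on it.

```python
import json
import random
from pathlib import Path


def tag_records(path, agent_name):
    """Read a JSONL file and attach an agent reference to each record."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Assumption: the routing field is named "agent_ref" and
            # identifies the agent server by name.
            record["agent_ref"] = {"name": agent_name}
            records.append(record)
    return records


def build_combined_dataset(sources, output_path, seed=0):
    """Merge per-environment JSONL files into one shuffled JSONL file.

    `sources` maps an agent name to the JSONL file holding its data.
    Returns the number of records written.
    """
    combined = []
    for agent_name, path in sources.items():
        combined.extend(tag_records(path, agent_name))
    # Shuffle with a fixed seed so the environments are interleaved
    # reproducibly instead of appearing in two contiguous blocks.
    random.Random(seed).shuffle(combined)
    with Path(output_path).open("w", encoding="utf-8") as f:
        for record in combined:
            f.write(json.dumps(record) + "\n")
    return len(combined)
```

Calling `build_combined_dataset` with a mapping from each agent name to its data file writes a single file in which every record carries its own routing information.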
Rollout Collection
Run rollout collection as usual.
Inside results/test_multiverifier_outputs.jsonl, you should see 10 rows, each with an appropriate response.
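A quick way to check the row count is a small script like the one below. It is a sketch under assumptions: it treats the output as one JSON object per line, and it guesses that each row may carry the agent reference under a hypothetical `agent_ref` field with a `name` key. Rows without that field are counted under "unknown".

```python
import json
from collections import Counter
from pathlib import Path


def summarize_rollouts(path):
    """Return (row_count, per_agent_counts) for a JSONL results file."""
    per_agent = Counter()
    rows = 0
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            row = json.loads(line)
            rows += 1
            # Assumption: the agent reference, if present, is a dict
            # stored under "agent_ref" with a "name" key.
            ref = row.get("agent_ref")
            name = ref.get("name", "unknown") if isinstance(ref, dict) else "unknown"
            per_agent[name] += 1
    return rows, per_agent
```

Running `summarize_rollouts("results/test_multiverifier_outputs.jsonl")` returns the total number of rows and, where the field is available, how many rows each agent handled.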
Data preparation and downstream training follow the same process. To include more environments, add their server configs in the same way.