Environment Tutorials#
Learn how to build custom environments for training or evaluation using NeMo Gym.
Tip
Looking to use an existing environment rather than build your own? See the Available Environments in the README.
Environment Properties#
Training environments can be broadly characterized along five dimensions:
Rollout structure: The interaction pattern between the model, environment, and user.
Core capabilities: The behaviors or skills that a model needs in order to succeed in a given use case.
Knowledge domain: What subject area, area of expertise, or field of study is involved.
Task type: The high-level use case that is represented in the training environment.
Verification method: How the environment computes rewards from model responses. See Task Verification for details.
Below are a subset of rollout structures and core capabilities found across NeMo Gym environments. We plan to add these as structured metadata to environments in the future. If you have ideas for additional properties, please let us know by opening an issue.
Rollout Structure#
Rollout structure |
Description |
|---|---|
Multi-step |
Interleaved assistant and tool messages |
Multi-turn |
Interleaved user and assistant messages |
Multi-modal |
Interleaved text, image, video, and/or audio messages |
Long context |
Message content is very large or the number of messages is very large |
Core Capabilities#
Core capability |
Developer/User need |
Rollout Structures Required |
|---|---|---|
Information dependency |
The model receives environment responses that may require changes to subsequent actions. |
Multi-step |
Proactive asking |
Developers put the model in a situation where user context is missing. The model needs to recognize user context is missing and ask the user for the missing context. |
Multi-turn |
Schema adherence |
Users need more than one piece of information delivered by the model at one time in a specified delivery format. |
|
Meta data instruction following |
User constrains the meta-properties of the model response e.g. “respond in 5 words”. |
|
Counterintuitive instruction following |
User provides instructions that are against conventional wisdom, typically making sense in the specific context in which the model is being used |
|
Information relevance |
Given a large volume of inputs, the model needs to ignore content irrelevant to the task at hand. |
Long context |
Multiple intent synthesis |
Users provide multiple tasks for the model to accomplish. |
Multi-step, Multi-turn |
Tutorials#
Implement tools, define verification logic, and test your environment end-to-end.
Run multiple training environments simultaneously for rollout collection.