NeMo Gym
NeMo Gym
NeMo Gym is a library for evaluating and improving models and agents using environments. NeMo Gym provides infrastructure to develop environments, scalably run evaluation and training, and a collection of popular benchmarks and training environments.
When to Use NeMo Gym
- You need to evaluate models or agents in stateful environments (for example, code execution, tool calling, sandboxes).
- You want reproducible evaluation across teams using shared environments and verifiers.
- You need to use environments at scale — multiple repeats per task, or thousands of concurrent requests for training.
- You want to seamlessly transition between evaluation, agent optimization, and training.
If you are scoring model outputs with a stateless check and do not need scale or training, a script is probably sufficient.
What NeMo Gym Provides
- Modular, extensible interfaces for agents, environments, tasks, and verifiers
- Environment hub of popular benchmarks and training environments
- Use your own agents or choose from built-in harnesses
- Scale to thousands of concurrent environments
- Train with the RL framework of your choice
- Battle-tested in production Nemotron training

Integrations
NeMo Gym integrates with the broader agentic ecosystem:
- Environment libraries: Seamlessly combine environments and benchmarks from other libraries alongside NeMo Gym environments.
- Training framework libraries: Use environments for SFT and RL training.
- Agent harnesses: Popular agent harnesses for evaluation and training available out of the box.
- Agent framework libraries: Use your custom agent built with agent frameworks in NeMo Gym environments.
- Sandboxes: Isolate agent runtime execution.