Quickstart


See Installation if you need to install NeMo Gym.

Configure Your Model

Create an env.yaml file in the project root with your model endpoint credentials:

policy_base_url: https://api.openai.com/v1
policy_api_key: <your-openai-api-key>
policy_model_name: gpt-4.1-2025-04-14

This quickstart uses OpenAI. NeMo Gym supports local and hosted inference — see Configure Model for vLLM, Fireworks, OpenRouter, and others.
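To see what the three `policy_*` keys amount to, here is a minimal sketch of reading them and pointing a client at that endpoint. The `parse_env_yaml` helper and the flat `key: value` parsing are illustrative only (enough for the file above, not a general YAML parser), and are not part of NeMo Gym:

```python
# Illustrative helper (not NeMo Gym API): parse flat `key: value` lines,
# which is all the quickstart env.yaml above contains.
def parse_env_yaml(text: str) -> dict:
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config

sample = """\
policy_base_url: https://api.openai.com/v1
policy_api_key: <your-openai-api-key>
policy_model_name: gpt-4.1-2025-04-14
"""
config = parse_env_yaml(sample)

# With the real file, you could then construct a standard OpenAI client:
# from pathlib import Path
# from openai import OpenAI
# config = parse_env_yaml(Path("env.yaml").read_text())
# client = OpenAI(base_url=config["policy_base_url"], api_key=config["policy_api_key"])
```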

Run Evaluation

Run your agent on a set of tasks and score the results. This example pairs the tool-calling agent simple_agent with the mcqa (multiple-choice Q&A) environment and its included example data.

1. Start servers

NeMo Gym uses local servers to coordinate your model, agent, and task verification. Start them first:

$environment_config="resources_servers/mcqa/configs/mcqa.yaml"
$model_config="responses_api_models/openai_model/configs/openai_model.yaml"
$
$ng_run "+config_paths=[${environment_config},${model_config}]"

You should see three server instances starting:

[1] mcqa (resources_servers/mcqa)
[2] mcqa_simple_agent (responses_api_agents/simple_agent)
[3] policy_model (responses_api_models/openai_model)

2. Evaluate your agent

In a new terminal, run your agent on a single task to verify everything works:

$source .venv/bin/activate
$
$ng_collect_rollouts \
> +agent_name=mcqa_simple_agent \
> +input_jsonl_fpath=resources_servers/mcqa/data/example.jsonl \
> +output_jsonl_fpath=results/mcqa_rollouts.jsonl \
> +limit=5 \
> +num_repeats=1

You should see a progress bar followed by aggregate metrics:

Collecting rollouts: 100%|██████| 5/5 [01:22<00:00, 16.44s/it]
Key metrics for mcqa_simple_agent:
{
"mean/reward": 0.8,
"pass@1[avg-of-1]/accuracy": 80.0,
"pass@1/accuracy": 80.0
}
Finished rollout collection! View results at:
Fully materialized inputs: results/mcqa_rollouts_materialized_inputs.jsonl
Rollouts: results/mcqa_rollouts.jsonl
Aggregate metrics: results/mcqa_rollouts_aggregate_metrics.json
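The rollouts file can be inspected directly. As a sketch, the snippet below recomputes a mean reward from JSONL lines, assuming each line is a JSON object carrying a numeric `reward` field (consistent with the `mean/reward` metric above); the exact rollout schema may differ by environment, so treat the field name as an assumption:

```python
import json

def mean_reward(jsonl_lines):
    # Assumes each non-empty line is a JSON object with a numeric "reward"
    # field; adjust the key if your environment's rollout schema differs.
    rewards = [json.loads(line)["reward"] for line in jsonl_lines if line.strip()]
    return sum(rewards) / len(rewards)

# Typical use against the files produced above:
# with open("results/mcqa_rollouts.jsonl") as f:
#     print(mean_reward(f))
```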

For per-task pass rates, see ng_reward_profile in the CLI Reference.
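With `num_repeats=1`, the pass@1 metrics above collapse to plain accuracy. When you collect multiple rollouts per task, pass@k is commonly estimated with the standard unbiased combinatorial estimator; the sketch below shows that formula for intuition and is not taken from NeMo Gym's source:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Standard unbiased estimator: given n rollouts of a task with c passes,
    # pass@k = 1 - C(n - c, k) / C(n, k), the probability that a random
    # size-k sample of the n rollouts contains at least one pass.
    if n - c < k:
        return 1.0  # too few failures: every size-k sample contains a pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```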

Explore

Now that you have a working setup, explore what’s available.

NeMo Gym ships with environments across many domains. You can use these existing environments in addition to building your own.

$ng_list_benchmarks
Available benchmarks in NeMo Gym
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark name ┃ Agent name ┃ Num repeats ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ aalcr │ aalcr_benchmark_simple_agent │ 16 │
│ aime25 │ aime25_math_with_judge_simple_ag… │ 32 │
│ browsecomp │ browsecomp_tavily_search_simple_… │ 1 │
│ gpqa │ gpqa_mcqa_simple_agent │ 8 │
│ ifbench │ ifbench_benchmark_simple_agent │ 5 │
│ ...              │ ...                               │ ...         │
│ tau2 │ tau2_benchmark_agent │ 8 │
│ xstest │ xstest_benchmark_simple_agent │ 4 │
└──────────────────┴───────────────────────────────────┴─────────────┘

This lists benchmarks with pre-configured agents. For the full set of environments (including training environments), see the Available Environments table.

Every CLI command supports +h=true or +help=true for detailed usage information:

$ng_help
$ng_run +help=true

Next Steps