Prompt Config | NeMo Gym

Apply YAML-based prompt templates at rollout time to build responses_create_params.input on the fly. This enables prompt sweeps without re-preparing JSONL data.

Goal: Use prompt configs to separate prompt templates from dataset preparation.

Time: ~5 minutes

In this guide, you will:

Write a prompt config YAML file
Apply it during rollout collection with gym eval run --no-serve
Optionally materialize prompts into JSONL with gym dataset render

Prerequisites:

NeMo Gym installed (Installation)
A JSONL dataset with raw fields (e.g. question, expected_answer)

Overview

A prompt config is a YAML file with a required user field and an optional system field. Placeholders like {question} are filled from each data row’s top-level fields during rollout collection.

Prompt configs and pre-populated responses_create_params.input in the JSONL data are mutually exclusive. Use one or the other. If any row already contains responses_create_params.input when a prompt config is specified, an error is raised.

Prompt Config Format

Minimal (user message only)

1 # The {question} placeholder is filled from each row's "question" field.
2 user: "{question}"

With system message

1 # Math chain-of-thought prompt with system message.
2 # Expects rows with a "question" field.
3 system: "You are a helpful math assistant. Think step by step and put your final answer in \\boxed{{}}."
4 user: "{question}"

Multiple fields

1 # Expects rows with "question" and "context" fields.
2 system: "Answer the question using the provided context."
3 user: |
4   Context: {context}
5 
6   Question: {question}

Literal braces must be doubled ({{ / }}). For example, \\boxed{{}} produces \boxed{} in the output.

Usage

At rollout time

Pass --prompt-config <path> to gym eval run --no-serve:

$ gym eval run --no-serve \
>     --agent my_agent \
>     --input data/raw_problems.jsonl \
>     --output results/rollouts.jsonl \
>     --prompt-config /path/to/my_prompt.yaml \
>     --num-repeats 5 \
>     --max-output-tokens 16384 \
>     --temperature 1.0

The --prompt-config path must be either an absolute path or a path relative to the Gym repository root.

The input JSONL should contain raw fields (e.g. question, expected_answer) without responses_create_params.input. The prompt config builds the input messages during rollout collection.

Standalone materialization

Use gym dataset render to write a prompt template into JSONL without running rollouts:

$ gym dataset render \
>     --input data/raw_problems.jsonl \
>     --prompt-config /path/to/my_prompt.yaml \
>     --output data/materialized.jsonl

This produces a new JSONL file with responses_create_params.input populated from the template. This is useful for inspection or passing to other tools that expect pre-populated input.

Input Data Format

When using prompt configs, your input JSONL should have the placeholder fields at the top level:

1 {"question": "What is 2+2?", "expected_answer": "4"}
2 {"question": "What is 3*5?", "expected_answer": "15"}

Other fields in responses_create_params (such as tools and temperature) are preserved. Only input is built from the template.

CLI Parameters

`gym eval run --no-serve`

Parameter	Required	Description
`--prompt-config`	No	Path to a prompt YAML file. Mutually exclusive with pre-populated `responses_create_params.input` in the JSONL data.

See CLI Commands for the full list of gym eval run --no-serve parameters.

`gym dataset render`

Parameter	Required	Description
`--input`	Yes	Raw JSONL data (no `responses_create_params.input`).
`--prompt-config`	Yes	Path to prompt YAML file to apply.
`--output`	Yes	Output path for materialized JSONL with populated prompts.

How It Works

The prompt YAML is loaded and validated (must have a user key)
All rows are checked for conflicts. If any row already has responses_create_params.input, an error is raised
For each row, placeholders in system and user are filled from the row’s fields
The resulting messages are set as responses_create_params.input