Prompt Config

View as Markdown

Apply YAML-based prompt templates at rollout time to build responses_create_params.input on the fly. This enables prompt sweeps without re-preparing JSONL data.

Goal: Use prompt configs to separate prompt templates from dataset preparation.

Time: ~5 minutes

In this guide, you will:

  1. Write a prompt config YAML file
  2. Apply it during rollout collection with gym eval run --no-serve
  3. Optionally materialize prompts into JSONL with gym dataset render

Prerequisites:

  • NeMo Gym installed (Installation)
  • A JSONL dataset with raw fields (e.g. question, expected_answer)

Overview

A prompt config is a YAML file with a required user field and an optional system field. Placeholders like {question} are filled from each data row’s top-level fields during rollout collection.

Prompt configs and pre-populated responses_create_params.input in the JSONL data are mutually exclusive. Use one or the other. If any row already contains responses_create_params.input when a prompt config is specified, an error is raised.


Prompt Config Format

Minimal (user message only)

1# The {question} placeholder is filled from each row's "question" field.
2user: "{question}"

With system message

1# Math chain-of-thought prompt with system message.
2# Expects rows with a "question" field.
3system: "You are a helpful math assistant. Think step by step and put your final answer in \\boxed{{}}."
4user: "{question}"

Multiple fields

1# Expects rows with "question" and "context" fields.
2system: "Answer the question using the provided context."
3user: |
4 Context: {context}
5
6 Question: {question}

Literal braces must be doubled ({{ / }}). For example, \\boxed{{}} produces \boxed{} in the output.


Usage

At rollout time

Pass --prompt-config <path> to gym eval run --no-serve:

$gym eval run --no-serve \
> --agent my_agent \
> --input data/raw_problems.jsonl \
> --output results/rollouts.jsonl \
> --prompt-config /path/to/my_prompt.yaml \
> --num-repeats 5 \
> --max-output-tokens 16384 \
> --temperature 1.0

The --prompt-config path must be either an absolute path or a path relative to the Gym repository root.

The input JSONL should contain raw fields (e.g. question, expected_answer) without responses_create_params.input. The prompt config builds the input messages during rollout collection.

Standalone materialization

Use gym dataset render to write a prompt template into JSONL without running rollouts:

$gym dataset render \
> --input data/raw_problems.jsonl \
> --prompt-config /path/to/my_prompt.yaml \
> --output data/materialized.jsonl

This produces a new JSONL file with responses_create_params.input populated from the template. This is useful for inspection or passing to other tools that expect pre-populated input.


Input Data Format

When using prompt configs, your input JSONL should have the placeholder fields at the top level:

1{"question": "What is 2+2?", "expected_answer": "4"}
2{"question": "What is 3*5?", "expected_answer": "15"}

Other fields in responses_create_params (such as tools and temperature) are preserved. Only input is built from the template.


CLI Parameters

gym eval run --no-serve

ParameterRequiredDescription
--prompt-configNoPath to a prompt YAML file. Mutually exclusive with pre-populated responses_create_params.input in the JSONL data.

See CLI Commands for the full list of gym eval run --no-serve parameters.

gym dataset render

ParameterRequiredDescription
--inputYesRaw JSONL data (no responses_create_params.input).
--prompt-configYesPath to prompt YAML file to apply.
--outputYesOutput path for materialized JSONL with populated prompts.

How It Works

  1. The prompt YAML is loaded and validated (must have a user key)
  2. All rows are checked for conflicts. If any row already has responses_create_params.input, an error is raised
  3. For each row, placeholders in system and user are filled from the row’s fields
  4. The resulting messages are set as responses_create_params.input