Prompt Config#
Apply YAML-based prompt templates at rollout time to build responses_create_params.input on the fly. This enables prompt sweeps without re-preparing JSONL data.
Goal: Use prompt configs to separate prompt templates from dataset preparation.
Time: ~5 minutes
In this guide, you will:
Write a prompt config YAML file
Apply it during rollout collection with
ng_collect_rolloutsOptionally materialize prompts into JSONL with
ng_materialize_prompts
Prerequisites:
NeMo Gym installed (Detailed Setup Guide)
A JSONL dataset with raw fields (e.g.
question,expected_answer)
Overview#
A prompt config is a YAML file with a required user field and an optional system field. Placeholders like {question} are filled from each data row’s top-level fields during rollout collection.
Prompt configs and pre-populated responses_create_params.input in the JSONL data are mutually exclusive. Use one or the other. If any row already contains responses_create_params.input when a prompt config is specified, an error is raised.
Prompt Config Format#
Minimal (user message only)#
# The {question} placeholder is filled from each row's "question" field.
user: "{question}"
With system message#
# Math chain-of-thought prompt with system message.
# Expects rows with a "question" field.
system: "You are a helpful math assistant. Think step by step and put your final answer in \\boxed{{}}."
user: "{question}"
Multiple fields#
# Expects rows with "question" and "context" fields.
system: "Answer the question using the provided context."
user: |
Context: {context}
Question: {question}
Note
Literal braces must be doubled ({{ / }}). For example, \\boxed{{}} produces \boxed{} in the output.
Usage#
At rollout time#
Pass +prompt_config=<path> to ng_collect_rollouts:
ng_collect_rollouts \
+agent_name=my_agent \
+input_jsonl_fpath=data/raw_problems.jsonl \
+output_jsonl_fpath=results/rollouts.jsonl \
+prompt_config=/path/to/my_prompt.yaml \
+num_repeats=5 \
"+responses_create_params={max_output_tokens: 16384, temperature: 1.0}"
The +prompt_config path must be either an absolute path or a path relative to the Gym repository root.
The input JSONL should contain raw fields (e.g. question, expected_answer) without responses_create_params.input. The prompt config builds the input messages during rollout collection.
Standalone materialization#
Use ng_materialize_prompts to write a prompt template into JSONL without running rollouts:
ng_materialize_prompts \
+input_jsonl_fpath=data/raw_problems.jsonl \
+prompt_config=/path/to/my_prompt.yaml \
+output_jsonl_fpath=data/materialized.jsonl
This produces a new JSONL file with responses_create_params.input populated from the template. This is useful for inspection or passing to other tools that expect pre-populated input.
Input Data Format#
When using prompt configs, your input JSONL should have the placeholder fields at the top level:
{"question": "What is 2+2?", "expected_answer": "4"}
{"question": "What is 3*5?", "expected_answer": "15"}
Other fields in responses_create_params (such as tools and temperature) are preserved. Only input is built from the template.
CLI Parameters#
ng_collect_rollouts#
Parameter |
Required |
Description |
|---|---|---|
|
No |
Path to a prompt YAML file. Mutually exclusive with pre-populated |
See CLI Commands for the full list of ng_collect_rollouts parameters.
ng_materialize_prompts#
Parameter |
Required |
Description |
|---|---|---|
|
Yes |
Raw JSONL data (no |
|
Yes |
Path to prompt YAML file to apply. |
|
Yes |
Output path for materialized JSONL with populated prompts. |
How It Works#
The prompt YAML is loaded and validated (must have a
userkey)All rows are checked for conflicts. If any row already has
responses_create_params.input, an error is raisedFor each row, placeholders in
systemanduserare filled from the row’s fieldsThe resulting messages are set as
responses_create_params.input