Data
NeMo Gym datasets use JSONL format for reinforcement learning (RL) training. Each dataset connects to an agent server (orchestrates agent-environment interactions) which routes requests to a resources server (provides tools and computes rewards).
Prerequisites
- NeMo Gym installed: See Installation
- Repository cloned (for built-in datasets):
NeMo Gym uses OpenAI-compatible schemas for model server compatibility. No OpenAI account required—local servers like vLLM use the same format.
Data Format
Each JSONL line requires a responses_create_params field following the OpenAI Responses API schema:
Additional fields like expected_answer vary by resources server—the component that provides tools and reward signals.
Required Fields
Optional Fields
Check resources_servers/<name>/README.md for fields required by each resources server’s verify() method.
The agent_ref Field
The agent_ref field maps each row to a specific resources server. A training dataset can blend multiple resources servers in a single file—agent_ref tells NeMo Gym which server handles each row.
You don’t create agent_ref manually. The ng_prepare_data tool adds it automatically based on your config file. The tool matches the agent type (responses_api_agents) with the agent name from the config.
Example Data
Quick Start
Run this command from the repository root:
Success: Finished! message and data/test/example_metrics.json created.
Dataset Types
Configuration
Define datasets in your agent server’s YAML config:
Valid Licenses
Apache 2.0 · MIT · GNU General Public License v3.0 · Creative Commons Attribution 4.0 International · Creative Commons Attribution-ShareAlike 4.0 International · TBD · NVIDIA Internal Use Only, Do Not Distribute
Workflow
Validation Modes
To prepare training data with auto-download:
HuggingFace downloads require authentication. Set hf_token in env.yaml or export HF_TOKEN.
Common Errors
Guides
Full data preparation workflow.
data-prepFetch datasets from HuggingFace Hub.
huggingfaceYAML-based prompt templates applied at rollout time.
promptsCLI Commands
See Cli Commands for details.
Large Datasets
- Validation streams line-by-line (memory-efficient)
- Single-threaded; >100K samples may take minutes
- Use
num_repeatsinstead of duplicating JSONL lines