RL Data Preparation
Data preparation for the RL pipeline downloads nvidia/Nemotron-3-Super-RL-Training-Blends from HuggingFace, resolves placeholder entries by fetching from external datasets (DAPO, Skywork), and produces 6 data blends with train/val splits.
Pipeline
%%{init: {'theme': 'base', 'themeVariables': { 'primaryBorderColor': '#333333', 'lineColor': '#333333', 'primaryTextColor': '#333333'}}}%%
flowchart LR
subgraph prep["Data Preparation"]
direction LR
hf["HuggingFace<br/>Dataset"] --> resolve["Placeholder<br/>Resolution"]
resolve --> jsonl["JSONL<br/>Format"]
jsonl --> split["Train/Val<br/>Split"]
end
split --> blends["6 Data<br/>Blends"]
style hf fill:#e1f5fe,stroke:#2196f3
style resolve fill:#e1f5fe,stroke:#2196f3
style jsonl fill:#f3e5f5,stroke:#9c27b0
style split fill:#f3e5f5,stroke:#9c27b0
style blends fill:#e8f5e9,stroke:#4caf50
| Stage | What Happens |
|---|---|
| HuggingFace Dataset | Download |
| Placeholder Resolution | Resolve |
| JSONL Format | Convert to JSONL with |
| Train/Val Split | Last 100 rows held out for validation per blend |
| 6 Data Blends | |
Quick Start (Standalone)
The simplest way to prepare data is to run the upstream script that ships with the dataset directly:
# Download RL data blends
uvx --from huggingface-hub hf download nvidia/Nemotron-3-Super-RL-Training-Blends \
  --repo-type dataset --local-dir=data_with_placeholders
# Fill in placeholders
chmod +x data_with_placeholders/fill_placeholders.py
./data_with_placeholders/fill_placeholders.py \
  --input-dir data_with_placeholders --output-dir data_filled
# Create train/val splits (last 100 rows held out for validation)
# Note: the negative count in `head -n -100` requires GNU coreutils
for f in data_filled/*.jsonl; do
  name=$(basename "$f" .jsonl)
  mkdir -p "data/$name"
  head -n -100 "$f" > "data/$name/train-split.jsonl"
  tail -n 100 "$f" > "data/$name/val-split.jsonl"
done
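The split step above can also be sketched in Python, which avoids the GNU-specific `head -n -100` and fails loudly when a blend has too few rows. This is a minimal sketch, not the upstream implementation: the function name `split_train_val` is hypothetical, and it assumes each blend fits in memory and that blank lines can be skipped.

```python
from pathlib import Path


def split_train_val(src: Path, out_dir: Path, val_rows: int = 100) -> tuple[int, int]:
    """Hold out the last `val_rows` rows of a JSONL file for validation.

    Hypothetical helper mirroring the head/tail loop above.
    """
    if val_rows <= 0:
        raise ValueError("val_rows must be positive")
    with src.open("r", encoding="utf-8") as f:
        # Assumption: blank lines carry no data and are dropped.
        lines = [line for line in f if line.strip()]
    if len(lines) <= val_rows:
        raise ValueError(f"{src} has {len(lines)} rows; need more than {val_rows}")

    out_dir.mkdir(parents=True, exist_ok=True)
    train, val = lines[:-val_rows], lines[-val_rows:]
    (out_dir / "train-split.jsonl").write_text("".join(train), encoding="utf-8")
    (out_dir / "val-split.jsonl").write_text("".join(val), encoding="utf-8")
    return len(train), len(val)


if __name__ == "__main__":
    # Same directory names as the shell loop: data_filled/ in, data/<name>/ out.
    for path in sorted(Path("data_filled").glob("*.jsonl")):
        n_train, n_val = split_train_val(path, Path("data") / path.stem)
        print(f"{path.stem}: {n_train} train / {n_val} val")
```

Unlike the shell loop, which would silently write an empty train file for a blend with 100 rows or fewer, this version raises an error in that case.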
Nemotron CLI (with xenna pipeline)
Alternatively, use the Nemotron CLI, which runs the xenna pipeline with Ray for distributed processing and W&B artifact tracking:
# Prepare data for each sub-stage
uv run nemotron super3 data prep rl rlvr --run YOUR-CLUSTER
uv run nemotron super3 data prep rl swe1 --run YOUR-CLUSTER
uv run nemotron super3 data prep rl swe2 --run YOUR-CLUSTER
uv run nemotron super3 data prep rl rlhf --run YOUR-CLUSTER
Each sub-stage has its own data prep command because the data blends differ (RLVR uses HF placeholder resolution, while SWE/RLHF use direct JSONL splitting).
--run YOUR-CLUSTER refers to a profile defined in your env.toml file. See the env.toml setup guide for details.
| Option | Description |
|---|---|
| | Execute on Slurm via NeMo-Run |
| | Limit rows per dataset (for testing) |
| | Force re-run, ignoring cache |
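When all four sub-stages need preparing, the per-stage commands listed earlier in this section can be driven from a small script. The sketch below only assembles the exact command lines shown there; the helper names `build_commands` and `run_all` are hypothetical, and running the stages one after another is an assumption rather than a documented requirement.

```python
import shlex
import subprocess

# Sub-stage names taken from the commands in this section.
SUB_STAGES = ["rlvr", "swe1", "swe2", "rlhf"]


def build_commands(profile: str) -> list[list[str]]:
    """Return one `nemotron super3 data prep rl <stage>` command per sub-stage."""
    return [
        ["uv", "run", "nemotron", "super3", "data", "prep", "rl", stage, "--run", profile]
        for stage in SUB_STAGES
    ]


def run_all(profile: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(profile):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing sub-stage.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # YOUR-CLUSTER is the placeholder profile name used in this section.
    run_all("YOUR-CLUSTER", dry_run=True)
```

The dry-run default prints the commands without launching anything, which is useful for checking the profile name before submitting jobs.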
Output
output/stage2_rl_resolved/
├── rlvr1/
│ ├── train-split.jsonl
│ └── val-split.jsonl
├── rlvr2/
│ ├── train-split.jsonl
│ └── val-split.jsonl
├── rlvr3/
│ ├── train-split.jsonl
│ └── val-split.jsonl
├── swe1/
│ ├── train-split.jsonl
│ └── val-split.jsonl
├── swe2/
│ ├── train-split.jsonl
│ └── val-split.jsonl
├── rlhf/
│ ├── train-split.jsonl
│ └── val-split.jsonl
└── manifest.json
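A quick way to confirm that a run produced the layout above is to walk the tree and check each expected file. The following is a minimal sketch: `check_output` is a hypothetical helper, the blend and file names are copied from the tree, and `manifest.json` is only checked for existence because its schema is not described here.

```python
import json
from pathlib import Path

# Names copied from the directory tree above.
BLENDS = ["rlvr1", "rlvr2", "rlvr3", "swe1", "swe2", "rlhf"]
SPLITS = ["train-split.jsonl", "val-split.jsonl"]


def check_output(root: Path) -> list[str]:
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    for blend in BLENDS:
        for split in SPLITS:
            path = root / blend / split
            if not path.is_file():
                problems.append(f"missing {path}")
                continue
            with path.open(encoding="utf-8") as f:
                for lineno, line in enumerate(f, start=1):
                    if not line.strip():
                        continue  # tolerate blank lines
                    try:
                        json.loads(line)
                    except json.JSONDecodeError as exc:
                        # Report only the first bad row per file.
                        problems.append(f"{path}:{lineno}: invalid JSON ({exc.msg})")
                        break
    if not (root / "manifest.json").is_file():
        problems.append(f"missing {root / 'manifest.json'}")
    return problems


if __name__ == "__main__":
    for problem in check_output(Path("output/stage2_rl_resolved")):
        print(problem)
```

Each row is parsed as JSON but not validated against a schema, so this catches truncated or malformed files rather than semantic errors.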
Recipe Source: src/nemotron/recipes/super3/stage2_rl/data_prep.py