RL Data Preparation#

Data preparation for the RL pipeline downloads nvidia/Nemotron-3-Super-RL-Training-Blends from HuggingFace, resolves placeholder entries by fetching from external datasets (DAPO, Skywork), and produces 6 data blends with train/val splits.


Pipeline#

        %%{init: {'theme': 'base', 'themeVariables': { 'primaryBorderColor': '#333333', 'lineColor': '#333333', 'primaryTextColor': '#333333'}}}%%
flowchart LR
    subgraph prep["Data Preparation"]
        direction LR
        hf["HuggingFace<br/>Dataset"] --> resolve["Placeholder<br/>Resolution"]
        resolve --> jsonl["JSONL<br/>Format"]
        jsonl --> split["Train/Val<br/>Split"]
    end
    split --> blends["6 Data<br/>Blends"]

    style hf fill:#e1f5fe,stroke:#2196f3
    style resolve fill:#e1f5fe,stroke:#2196f3
    style jsonl fill:#f3e5f5,stroke:#9c27b0
    style split fill:#f3e5f5,stroke:#9c27b0
    style blends fill:#e8f5e9,stroke:#4caf50
    

Stage

What Happens

HuggingFace Dataset

Download nvidia/Nemotron-3-Super-RL-Training-Blends (6 blend files)

Placeholder Resolution

Resolve _hf_placeholder records by fetching from external datasets (DAPO, Skywork) and applying template restoration

JSONL Format

Convert to JSONL with question, expected_answer, and responses_create_params fields

Train/Val Split

Last 100 rows held out for validation per blend

6 Data Blends

rlvr1/, rlvr2/, rlvr3/, swe1/, swe2/, rlhf/ — each with train-split.jsonl + val-split.jsonl


Quick Start (Standalone)#

The simplest way to prepare data is using the upstream script directly:

# Download RL data blends
uvx --from huggingface-hub hf download nvidia/Nemotron-3-Super-RL-Training-Blends \
    --repo-type dataset --local-dir=data_with_placeholders

# Fill in placeholders
chmod +x data_with_placeholders/fill_placeholders.py
./data_with_placeholders/fill_placeholders.py \
    --input-dir data_with_placeholders --output-dir data_filled

# Create train/val splits (last 100 rows held out for validation)
for f in data_filled/*.jsonl; do
  name=$(basename "$f" .jsonl)
  mkdir -p "data/$name"
  head -n -100 "$f" > "data/$name/train-split.jsonl"
  tail -n 100 "$f" > "data/$name/val-split.jsonl"
done

Nemotron CLI (with xenna pipeline)#

Alternatively, use the Nemotron CLI which runs the xenna pipeline with Ray for distributed processing and W&B artifact tracking:

# Prepare data for each sub-stage
uv run nemotron super3 data prep rl rlvr --run YOUR-CLUSTER
uv run nemotron super3 data prep rl swe1 --run YOUR-CLUSTER
uv run nemotron super3 data prep rl swe2 --run YOUR-CLUSTER
uv run nemotron super3 data prep rl rlhf --run YOUR-CLUSTER

Each sub-stage has its own data prep command because the data blends differ (RLVR uses HF placeholder resolution, while SWE/RLHF use direct JSONL splitting).

--run YOUR-CLUSTER refers to a profile defined in your env.toml file. See the env.toml setup guide for details.

Option

Description

--run <profile>

Execute on Slurm via NeMo-Run

sample=N

Limit rows per dataset (for testing)

force=true

Force re-run, ignoring cache


Output#

output/stage2_rl_resolved/
├── rlvr1/
│   ├── train-split.jsonl
│   └── val-split.jsonl
├── rlvr2/
│   ├── train-split.jsonl
│   └── val-split.jsonl
├── rlvr3/
│   ├── train-split.jsonl
│   └── val-split.jsonl
├── swe1/
│   ├── train-split.jsonl
│   └── val-split.jsonl
├── swe2/
│   ├── train-split.jsonl
│   └── val-split.jsonl
├── rlhf/
│   ├── train-split.jsonl
│   └── val-split.jsonl
└── manifest.json

Recipe Source: src/nemotron/recipes/super3/stage2_rl/data_prep.py