nemo_rl.data.hf_datasets.dpo

Module Contents

Classes

DPODataset

Dataset class for Direct Preference Optimization (DPO) training.

API

class nemo_rl.data.hf_datasets.dpo.DPODataset(train_data_path: str, val_data_path: str)

Dataset class for Direct Preference Optimization (DPO) training.

This class handles loading of preference data for DPO training. The input JSON files should contain examples with the following structure:

    {
        "prompt": str,             # The input prompt/context
        "chosen_response": str,    # The preferred/winning response
        "rejected_response": str   # The non-preferred/losing response
    }
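As a rough illustration of the structure above, the following sketch checks that each record carries the three documented string fields before it is used for training. The helper name `invalid_records` and the sample records are hypothetical, and the code assumes each file holds a top-level JSON array of such records, which the docstring does not state explicitly.

```python
import json

# Field names taken from the documented record structure.
REQUIRED_KEYS = ("prompt", "chosen_response", "rejected_response")


def invalid_records(examples):
    """Return (index, reason) pairs for records that do not match the structure.

    Hypothetical helper for illustration; it is not part of nemo_rl.
    """
    problems = []
    for i, example in enumerate(examples):
        if not isinstance(example, dict):
            problems.append((i, "not an object"))
            continue
        for key in REQUIRED_KEYS:
            if key not in example:
                problems.append((i, f"missing key: {key}"))
            elif not isinstance(example[key], str):
                problems.append((i, f"{key} is not a string"))
    return problems


# Assumption: the file content is a JSON array of preference records.
examples = json.loads("""[
  {"prompt": "What is 2 + 2?", "chosen_response": "4", "rejected_response": "5"},
  {"prompt": "Name a primary color.", "chosen_response": "Red"}
]""")

print(invalid_records(examples))  # [(1, 'missing key: rejected_response')]
```

Running a check like this on both files before training can surface malformed records early, though the loader itself may apply its own validation.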

Parameters:
  • train_data_path (str) – Path to the JSON file containing training data

  • val_data_path (str) – Path to the JSON file containing validation data
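A minimal sketch of preparing the two files and passing their paths to the constructor might look like the following. The helper `write_preference_file`, the file names, and the sample records are hypothetical, and the top-level JSON array layout is an assumption. The `DPODataset` call is left as a comment so the snippet runs without NeMo RL installed; its keyword arguments follow the signature shown above.

```python
import json
import tempfile
from pathlib import Path


def write_preference_file(path, examples):
    """Write preference records to a JSON file and return the path as a str.

    Hypothetical helper for illustration; it is not part of nemo_rl.
    """
    path = Path(path)
    path.write_text(json.dumps(examples, indent=2), encoding="utf-8")
    return str(path)


# Hypothetical sample data using the documented field names.
train_examples = [
    {
        "prompt": "Summarize: The cat sat on the mat.",
        "chosen_response": "A cat sat on a mat.",
        "rejected_response": "Dogs are great.",
    }
]
val_examples = [
    {
        "prompt": "Translate to French: Hello",
        "chosen_response": "Bonjour",
        "rejected_response": "Hola",
    }
]

tmp_dir = Path(tempfile.mkdtemp())
train_path = write_preference_file(tmp_dir / "train.json", train_examples)
val_path = write_preference_file(tmp_dir / "val.json", val_examples)

# With NeMo RL installed, the two paths map onto the constructor parameters:
# from nemo_rl.data.hf_datasets.dpo import DPODataset
# dataset = DPODataset(train_data_path=train_path, val_data_path=val_path)
```

Both parameters are plain strings, so the helper converts the `Path` objects before returning them.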

Initialization