nemo_rl.data.hf_datasets.dpo
Module Contents
Classes
DPODataset | Dataset class for Direct Preference Optimization (DPO) training.
API
- class nemo_rl.data.hf_datasets.dpo.DPODataset(train_data_path: str, val_data_path: str)
Dataset class for Direct Preference Optimization (DPO) training.
This class handles loading of preference data for DPO training. The input JSON files should contain examples with the following structure:
{
    "prompt": str,             # The input prompt/context
    "chosen_response": str,    # The preferred/winning response
    "rejected_response": str   # The non-preferred/losing response
}
- Parameters:
train_data_path (str) – Path to the JSON file containing training data
val_data_path (str) – Path to the JSON file containing validation data
Initialization
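As a minimal sketch of how the documented inputs might be prepared, the snippet below writes two small preference files and checks that each example carries the three documented string fields. It assumes each file holds a top-level JSON array of examples, which the description above does not state explicitly. The helpers `validate_examples`, `write_preference_file`, and `build_dataset` are hypothetical names introduced for illustration; only the `DPODataset(train_data_path, val_data_path)` signature comes from this page.

```python
import json
import tempfile
from pathlib import Path

# Field names taken from the documented example structure.
REQUIRED_KEYS = ("prompt", "chosen_response", "rejected_response")


def validate_examples(examples):
    """Return True if every example is a dict with the three string fields."""
    return all(
        isinstance(ex, dict)
        and all(isinstance(ex.get(key), str) for key in REQUIRED_KEYS)
        for ex in examples
    )


def write_preference_file(path, examples):
    """Write examples as a top-level JSON array (an assumed file layout)."""
    if not validate_examples(examples):
        raise ValueError(f"each example needs string fields {REQUIRED_KEYS}")
    Path(path).write_text(json.dumps(examples, indent=2), encoding="utf-8")


def build_dataset(train_path, val_path):
    """Construct DPODataset using the documented signature.

    Hypothetical helper: requires nemo_rl to be installed, so the import
    is deferred until the function is actually called.
    """
    from nemo_rl.data.hf_datasets.dpo import DPODataset

    return DPODataset(
        train_data_path=str(train_path),
        val_data_path=str(val_path),
    )


# Illustrative data only; not taken from any real dataset.
sample = [
    {
        "prompt": "What is 2 + 2?",
        "chosen_response": "2 + 2 equals 4.",
        "rejected_response": "2 + 2 equals 5.",
    },
    {
        "prompt": "Name a primary color.",
        "chosen_response": "Red is a primary color.",
        "rejected_response": "Brown is a primary color.",
    },
]

workdir = Path(tempfile.mkdtemp())
train_path = workdir / "train.json"
val_path = workdir / "val.json"
write_preference_file(train_path, sample)
write_preference_file(val_path, sample[:1])

# Read the training file back to confirm it round-trips.
loaded = json.loads(train_path.read_text(encoding="utf-8"))
```

With nemo_rl installed, `build_dataset(train_path, val_path)` would pass the two file paths to the constructor. Validating before writing keeps malformed examples from reaching the loader.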