nemo_rl.data.datasets.preference_datasets.binary_preference_dataset#

Module Contents#

Classes#

BinaryPreferenceDataset

Dataset class for binary preference data which can be loaded from a JSON file.

Functions#

API#

nemo_rl.data.datasets.preference_datasets.binary_preference_dataset.to_preference_data_format(
data: dict[str, Any],
prompt_key: str,
chosen_key: str,
rejected_key: str,
) dict[str, list[dict[str, Any]]]#
class nemo_rl.data.datasets.preference_datasets.binary_preference_dataset.BinaryPreferenceDataset(
train_data_path: str,
val_data_path: Optional[str] = None,
prompt_key: str = 'prompt',
chosen_key: str = 'chosen',
rejected_key: str = 'rejected',
train_split: Optional[str] = None,
val_split: Optional[str] = None,
)#

Dataset class for binary preference data which can be loaded from a JSON file.

This class handles loading of preference data for DPO and RM training. It will be converted to the format of PreferenceDataset through the to_preference_data_format function.

The input JSONL files should contain valid JSON objects formatted like this: { prompt_key: str, # The input prompt/context chosen_key: str, # The preferred/winning response rejected_key: str, # The non-preferred/losing response }

Parameters:
  • train_data_path โ€“ Path to the JSON file containing training data

  • val_data_path โ€“ Path to the JSON file containing validation data

  • prompt_key โ€“ Key for the input prompt/context, default is โ€œpromptโ€

  • chosen_key โ€“ Key for the preferred/winning response, default is โ€œchosenโ€

  • rejected_key โ€“ Key for the non-preferred/losing response, default is โ€œrejectedโ€

  • train_split โ€“ Split name for the training data, used for HuggingFace datasets, default is None

  • val_split โ€“ Split name for the validation data, used for HuggingFace datasets, default is None

Initialization