nemo_rl.data.datasets.response_datasets.response_dataset#

Module Contents#

Classes#

ResponseDataset

Dataset class for response data which can be loaded from a JSON file.

API#

class nemo_rl.data.datasets.response_datasets.response_dataset.ResponseDataset(
train_data_path: str,
val_data_path: Optional[str] = None,
input_key: str = 'input',
output_key: str = 'output',
train_split: Optional[str] = None,
val_split: Optional[str] = None,
)#

Dataset class for response data which can be loaded from a JSON file.

This class handles loading of response data for SFT and RL training. The input JSONL files should contain valid JSON objects formatted like this: { input_key: str, # The input prompt/context output_key: str, # The output response/answer }

Parameters:
  • train_data_path – Path to the JSON file containing training data

  • val_data_path – Path to the JSON file containing validation data

  • input_key – Key for the input text

  • output_key – Key for the output text

  • train_split – Split name for the training data, used for HuggingFace datasets, default is None

  • val_split – Split name for the validation data, used for HuggingFace datasets, default is None

Initialization

add_messages_key(
example: dict[str, Any],
) dict[str, list[dict[str, Any]]]#