`nemo_rl.data.datasets.response_datasets.response_dataset`

Module Contents

Classes

ResponseDataset

Dataset class for response data which can be loaded from a JSON file.

API

class nemo_rl.data.datasets.response_datasets.response_dataset.ResponseDataset(
    train_data_path: str,
    val_data_path: Optional[str] = None,
    input_key: str = 'input',
    output_key: str = 'output',
    train_split: Optional[str] = None,
    val_split: Optional[str] = None,
)

Bases: nemo_rl.data.datasets.raw_dataset.RawDataset

Dataset class for response data which can be loaded from a JSON file.

This class handles loading of response data for SFT and RL training. The input JSONL files should contain valid JSON objects formatted like this:

    {
        input_key: str,   # The input prompt/context
        output_key: str,  # The output response/answer
    }
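As an illustration of the record format described above, the following minimal sketch writes a small JSONL file and reads it back using only the Python standard library. It does not use `ResponseDataset` itself; the helper names `write_jsonl` and `read_jsonl`, the file name `train.jsonl`, and the sample records are hypothetical and exist only for this example. The key names match the documented defaults `'input'` and `'output'`.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path, records):
    """Write one JSON object per line (hypothetical helper, not part of nemo_rl)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path, input_key="input", output_key="output"):
    """Read a JSONL file and check that each record has both keys as strings."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            for key in (input_key, output_key):
                if not isinstance(record.get(key), str):
                    raise ValueError(
                        f"line {line_no}: missing or non-string key {key!r}"
                    )
            records.append(record)
    return records


# Made-up sample records that follow the documented layout.
sample = [
    {"input": "What is 2 + 2?", "output": "4"},
    {"input": "Name a prime number.", "output": "7"},
]

with tempfile.TemporaryDirectory() as tmp:
    train_path = Path(tmp) / "train.jsonl"
    write_jsonl(train_path, sample)
    loaded = read_jsonl(train_path)

print(len(loaded), "records loaded")
```

A file produced this way would presumably be what `train_data_path` points to; if the records use different key names, `input_key` and `output_key` would need to be set accordingly.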

Parameters:

train_data_path – Path to the JSON file containing training data
val_data_path – Optional path to the JSON file containing validation data (default: None)
input_key – Key for the input text (default: 'input')
output_key – Key for the output text (default: 'output')
train_split – Split name for the training data, used when loading HuggingFace datasets (default: None)
val_split – Split name for the validation data, used when loading HuggingFace datasets (default: None)

Initialization

add_messages_key(
    example: dict[str, Any],
    task_name: str = 'ResponseDataset',
) → dict[str, str | list[dict[str, Any]]]
