nemo_rl.data.datasets.response_datasets.response_dataset
Module Contents
Classes
ResponseDataset | Dataset class for response data which can be loaded from a JSON file.
API
- class nemo_rl.data.datasets.response_datasets.response_dataset.ResponseDataset(
- train_data_path: str,
- val_data_path: Optional[str] = None,
- input_key: str = 'input',
- output_key: str = 'output',
- train_split: Optional[str] = None,
- val_split: Optional[str] = None,
)
Dataset class for response data which can be loaded from a JSON file.
This class handles loading of response data for SFT and RL training. The input JSONL files should contain valid JSON objects formatted like this:
    {
        input_key: str,   # The input prompt/context
        output_key: str,  # The output response/answer
    }
- Parameters:
train_data_path – Path to the JSON file containing training data
val_data_path – Path to the JSON file containing validation data
input_key – Key for the input text
output_key – Key for the output text
train_split – Split name for the training data, used when loading HuggingFace datasets. Defaults to None.
val_split – Split name for the validation data, used when loading HuggingFace datasets. Defaults to None.
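As a minimal sketch of the file format described above, the following standalone Python writes a small JSONL file and checks that every record carries string values under the configured keys. It does not import or call nemo_rl. The helper names `write_jsonl` and `check_jsonl`, the file name `train.jsonl`, and the `question`/`answer` keys are hypothetical and chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path: Path, records: list[dict[str, str]]) -> None:
    """Write one JSON object per line (JSONL). Hypothetical helper."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def check_jsonl(
    path: Path, input_key: str = "input", output_key: str = "output"
) -> int:
    """Count records, raising ValueError if a record lacks a string
    value under either key. Hypothetical helper, not part of nemo_rl."""
    count = 0
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            for key in (input_key, output_key):
                if not isinstance(record.get(key), str):
                    raise ValueError(
                        f"line {line_no}: missing or non-string {key!r}"
                    )
            count += 1
    return count


# Example data using non-default key names (illustrative only).
records = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "Name a prime number.", "answer": "7"},
]
train_path = Path(tempfile.mkdtemp()) / "train.jsonl"
write_jsonl(train_path, records)
num_records = check_jsonl(train_path, input_key="question", output_key="answer")
```

With a file like this, the values would presumably be passed to the constructor as `train_data_path=str(train_path)`, `input_key="question"`, and `output_key="answer"`, following the signature documented above.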
Initialization
- add_messages_key(
- example: dict[str, Any],