nemo_rl.data.datasets.response_datasets.response_dataset
Module Contents
Classes
ResponseDataset | Dataset class for response data which can be loaded from a JSON file.
API
class nemo_rl.data.datasets.response_datasets.response_dataset.ResponseDataset(
    data_path: str,
    input_key: str = 'input',
    output_key: str = 'output',
    split: Optional[str] = None,
    split_validation_size: float = 0,
    seed: int = 42,
    **kwargs,
)
Bases: nemo_rl.data.datasets.raw_dataset.RawDataset
Dataset class for response data which can be loaded from a JSON file.
This class handles loading of response data for SFT and RL training. The input JSONL files should contain valid JSON objects formatted like this:
{
    input_key: str,   # The input prompt/context
    output_key: str,  # The output response/answer
}
Parameters:
data_path – Path to the dataset JSON file
input_key – Key for the input text, default is “input”
output_key – Key for the output text, default is “output”
split – Optional split name for the dataset, used for HuggingFace datasets
split_validation_size – Size of the validation data, default is 0
seed – Seed for train/validation split when split_validation_size > 0, default is 42
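The record layout described above can be illustrated with a small standalone sketch. It uses only the Python standard library and does not import nemo_rl. The helpers `write_jsonl` and `check_jsonl`, the sample records, and the file name `responses.jsonl` are hypothetical and exist only for illustration. The sketch assumes one JSON object per line and the default key names "input" and "output".

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample records; the key names match the documented defaults.
records = [
    {"input": "What is 2 + 2?", "output": "4"},
    {"input": "Name the capital of France.", "output": "Paris"},
]


def write_jsonl(path: Path, rows: list[dict[str, str]]) -> None:
    """Write one JSON object per line (hypothetical helper, not part of nemo_rl)."""
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def check_jsonl(path: Path, input_key: str = "input", output_key: str = "output") -> int:
    """Count records, raising ValueError if one lacks a required string field.

    Hypothetical helper, not part of nemo_rl.
    """
    count = 0
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            obj = json.loads(line)
            for key in (input_key, output_key):
                if not isinstance(obj.get(key), str):
                    raise ValueError(f"line {line_no}: missing or non-string {key!r}")
            count += 1
    return count


# Write the sample file to a temporary directory and validate it.
data_path = Path(tempfile.mkdtemp()) / "responses.jsonl"
write_jsonl(data_path, records)
num_records = check_jsonl(data_path)
```

Given the signature documented above, a file like this would presumably be passed as `ResponseDataset(data_path=str(data_path))`, with `input_key` and `output_key` supplied only when the file uses different field names. That call is not exercised here and its behavior should be checked against the installed version of the library.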
Initialization
format_data(data: dict[str, Any]) -> dict[str, Any]