nemo_rl.data.datasets.response_datasets.openmathinstruct2#

Module Contents#

Classes#

OpenMathInstruct2Dataset

Simple wrapper around the OpenMathInstruct2 dataset.

API#

class nemo_rl.data.datasets.response_datasets.openmathinstruct2.OpenMathInstruct2Dataset(
output_key: str = 'expected_answer',
split: str = 'train_1M',
split_validation_size: float = 0.05,
seed: int = 42,
**kwargs,
)#

Bases: nemo_rl.data.datasets.raw_dataset.RawDataset

Simple wrapper around the OpenMathInstruct2 dataset.

Parameters:
  • output_key – Key for the output text, default is “expected_answer”

  • split – Split name for the dataset, default is “train_1M”

  • split_validation_size – Size of the validation data, default is 0.05

  • seed – Seed for train/validation split when split_validation_size > 0, default is 42

Initialization

format_data(data: dict[str, Any]) dict[str, Any]#