`nemo_rl.data.datasets.response_datasets.openmathinstruct2`#

Module Contents#

Classes#

OpenMathInstruct2Dataset

Functions#

`format_math`
`prepare_openinstructmath2_dataset`	Load and split the OpenMathInstruct-2 dataset into train and validation sets using HF’s train_test_split.

API#

nemo_rl.data.datasets.response_datasets.openmathinstruct2.format_math( data: dict[str, str | float | int], output_key: str = 'expected_answer', task_name: str = 'OpenMathInstruct-2', ) → dict[str, list[Any] | str]#

nemo_rl.data.datasets.response_datasets.openmathinstruct2.prepare_openinstructmath2_dataset( split: str = 'train_1M', seed: int = 42, test_size: float = 0.05, output_key: str = 'expected_answer', task_name: str = 'OpenMathInstruct-2', ) → dict[str, datasets.Dataset | None]#: Load and split the OpenMathInstruct-2 dataset into train and validation sets using HF’s train_test_split.

class nemo_rl.data.datasets.response_datasets.openmathinstruct2.OpenMathInstruct2Dataset( split: str = 'train_1M', seed: int = 42, test_size: float = 0.05, output_key: str = 'expected_answer', prompt_file: Optional[str] = None, )#

Bases: nemo_rl.data.datasets.raw_dataset.RawDataset

Initialization

Initialize the OpenMathInstruct2 dataset with train/validation split.

Parameters:

seed – Random seed for reproducible splitting
test_size – Proportion of data to use for validation (0.0-1.0)

nemo_rl.data.datasets.response_datasets.openmathinstruct2#

Module Contents#

Classes#

Functions#

API#

`nemo_rl.data.datasets.response_datasets.openmathinstruct2`#