nemo_rl.data.hf_datasets.openmathinstruct2#

Module Contents#

Classes#

Functions#

format_math

prepare_openinstructmath2_dataset

Load and split the OpenMathInstruct-2 dataset into train and validation sets using HF’s train_test_split.

API#

nemo_rl.data.hf_datasets.openmathinstruct2.format_math(data, output_key: str = 'expected_answer')[source]#
nemo_rl.data.hf_datasets.openmathinstruct2.prepare_openinstructmath2_dataset(
split: str = 'train_1M',
seed=42,
test_size=0.05,
output_key: str = 'expected_answer',
)[source]#

Load and split the OpenMathInstruct-2 dataset into train and validation sets using HF’s train_test_split.

class nemo_rl.data.hf_datasets.openmathinstruct2.OpenMathInstruct2Dataset(
split: str = 'train_1M',
seed: int = 42,
test_size: float = 0.05,
output_key: str = 'expected_answer',
prompt_file: str = None,
)[source]#

Initialization

Initialize the OpenMathInstruct2 dataset with train/validation split.

Parameters:
  • seed – Random seed for reproducible splitting

  • test_size – Proportion of data to use for validation (0.0-1.0)