nemo_rl.data.hf_datasets.openmathinstruct2
Module Contents#
Classes#
- OpenMathInstruct2Dataset
Functions#
- format_math
- prepare_openinstructmath2_dataset: Load and split the OpenMathInstruct-2 dataset into train and validation sets using Hugging Face’s train_test_split.
API#
- nemo_rl.data.hf_datasets.openmathinstruct2.format_math(data, output_key: str = 'expected_answer')
- nemo_rl.data.hf_datasets.openmathinstruct2.prepare_openinstructmath2_dataset(
- split: str = 'train_1M',
- seed=42,
- test_size=0.05,
- output_key: str = 'expected_answer',
- )
Load and split the OpenMathInstruct-2 dataset into train and validation sets using Hugging Face’s train_test_split.
- class nemo_rl.data.hf_datasets.openmathinstruct2.OpenMathInstruct2Dataset(
- split: str = 'train_1M',
- seed: int = 42,
- test_size: float = 0.05,
- output_key: str = 'expected_answer',
- prompt_file: str = None,
- )
Initialization
Initialize the OpenMathInstruct2 dataset with train/validation split.
- Parameters:
seed – Random seed for reproducible splitting (default: 42)
test_size – Fraction of the data held out for validation, between 0.0 and 1.0 (default: 0.05)
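To make the seed and test_size parameters concrete, here is a minimal standard-library sketch of what a seeded, proportional split does. It is not the NeMo RL or Hugging Face implementation: split_rows is a hypothetical helper, the toy rows are made up, and the rounding rule for the validation count is an assumption chosen for illustration.

```python
import math
import random


def split_rows(rows, test_size=0.05, seed=42):
    """Hypothetical helper: shuffle with a fixed seed, then cut off a validation slice."""
    shuffled = list(rows)                  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # private RNG: the same seed gives the same order
    n_val = math.ceil(len(shuffled) * test_size)  # assumed rounding rule
    return shuffled[n_val:], shuffled[:n_val]     # (train, validation)


# Toy rows shaped like problem/answer records; the field values are invented.
rows = [{"problem": f"q{i}", "expected_answer": str(i)} for i in range(100)]
train, val = split_rows(rows, test_size=0.05, seed=42)
```

With 100 rows and test_size=0.05, five rows go to validation and the rest to training. Calling the helper again with the same seed reproduces the same partition, which is the property the seed parameter is documented to provide.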