bridge.data.hf_processors.openmathinstruct2
Processing functions for the OpenMathInstruct-2 dataset.
Dataset: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2
OpenMathInstruct-2 contains math problems with generated solutions. Each example
has ``problem``, ``generated_solution``, and ``expected_answer`` fields.
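Before raw rows reach the processor, it can help to confirm that the three documented fields are present. The following is a minimal sketch; ``REQUIRED_FIELDS`` and ``missing_fields`` are hypothetical helpers, not part of this module, and any extra columns a row may carry are simply ignored.

```python
from typing import Any

# Hypothetical helper: the field names documented for OpenMathInstruct-2 rows.
REQUIRED_FIELDS = ("problem", "generated_solution", "expected_answer")


def missing_fields(example: dict[str, Any]) -> list[str]:
    """Return the documented fields absent from a raw example, in a fixed order."""
    return [name for name in REQUIRED_FIELDS if name not in example]


record = {"problem": "What is 2 + 3?", "generated_solution": "We add 2 and 3 to get 5."}
print(missing_fields(record))  # ['expected_answer']
```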
Module Contents
Functions
- ``process_openmathinstruct2_example``: Process a single OpenMathInstruct-2 example into the required format.
API
``bridge.data.hf_processors.openmathinstruct2.process_openmathinstruct2_example(example: dict[str, Any], _tokenizer: Optional[megatron.bridge.training.tokenizers.tokenizer.MegatronTokenizer] = None)``
Process a single OpenMathInstruct-2 example into the required format.
Transforms a raw OpenMathInstruct-2 dataset example into the standard format expected by the HFDatasetBuilder for fine-tuning.
- Parameters:
  ``example`` – Raw example containing the ``problem``, ``generated_solution``, and ``expected_answer`` fields.
  ``_tokenizer`` – Optional tokenizer (not used by this processor).
- Returns:
  ``ProcessExampleOutput`` with the formatted input/output and the original answers.
.. rubric:: Example

>>> example = {
...     "problem": "What is 2 + 3?",
...     "generated_solution": "We add 2 and 3 to get 5.",
...     "expected_answer": "5",
... }
>>> result = process_openmathinstruct2_example(example)
>>> print(result["input"])
Problem: What is 2 + 3? Solution:
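The transformation described above can be approximated in a few lines. This is a hedged sketch, not the module's implementation: ``SketchOutput`` and ``sketch_process_example`` are hypothetical names; the output keys ``input``, ``output``, and ``original_answers`` are assumed from the "formatted input/output and original answers" wording; and the single-space prompt template is inferred from the example output (the real processor may use different whitespace or newlines).

```python
from typing import Any, Optional, TypedDict


class SketchOutput(TypedDict):
    """Hypothetical stand-in for ProcessExampleOutput; field names are assumed."""

    input: str
    output: str
    original_answers: list[str]


def sketch_process_example(
    example: dict[str, Any], _tokenizer: Optional[Any] = None
) -> SketchOutput:
    """Map a raw OpenMathInstruct-2 row to a prompt/target pair (assumed template)."""
    # The tokenizer is accepted only for interface compatibility and is ignored,
    # matching the documented behavior of the real processor.
    prompt = f"Problem: {example['problem']} Solution:"
    return {
        "input": prompt,
        # Assumption: the generated solution serves as the fine-tuning target.
        "output": example["generated_solution"],
        # Assumption: the expected answer is kept as a one-element list.
        "original_answers": [example["expected_answer"]],
    }


result = sketch_process_example(
    {
        "problem": "What is 2 + 3?",
        "generated_solution": "We add 2 and 3 to get 5.",
        "expected_answer": "5",
    }
)
print(result["input"])  # Problem: What is 2 + 3? Solution:
```

Keeping the expected answer separate from the training target lets an evaluation step compare a model's final answer against ``expected_answer`` without parsing it out of the full solution text.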