bridge.data.hf_processors.squad
Processing functions for the SQuAD dataset.
Module Contents
Functions
process_squad_example | Process a single SQuAD example into the required format.
API
bridge.data.hf_processors.squad.process_squad_example(
    example: dict[str, Any],
    tokenizer: Optional[megatron.bridge.training.tokenizers.tokenizer.MegatronTokenizer] = None,
)
Process a single SQuAD example into the required format.
This function transforms a raw SQuAD dataset example into the standard format that HFDatasetBuilder expects for fine-tuning.
- Parameters:
  - example – Raw SQuAD example containing "context", "question", and "answers"
  - tokenizer – Optional tokenizer (not used by this function)
- Returns:
  ProcessExampleOutput with the formatted input/output and the original answers
.. rubric:: Example

>>> example = {
...     "context": "The Amazon rainforest is a moist broadleaf forest.",
...     "question": "What type of forest is the Amazon rainforest?",
...     "answers": {
...         "text": ["moist broadleaf forest", "broadleaf forest"],
...         "answer_start": [25, 31]
...     }
... }
>>> result = process_squad_example(example)
>>> print(result["input"])
Context: The Amazon rainforest is a moist broadleaf forest. Question: What type of forest is the Amazon rainforest? Answer:
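
To make the transformation concrete, the following is a minimal standalone sketch of the formatting step, written without importing the library. It is not the actual implementation: ``format_squad_example`` and ``FormattedExample`` are hypothetical names, and the ``output`` and ``original_answers`` keys, as well as the choice of the first reference answer as the target, are assumptions based only on the description above. Only the ``input`` string layout is taken from the documented example.

```python
from typing import Any, TypedDict


class FormattedExample(TypedDict):
    """Hypothetical stand-in for ProcessExampleOutput (field names are assumed)."""

    input: str
    output: str
    original_answers: list[str]


def format_squad_example(example: dict[str, Any]) -> FormattedExample:
    """Illustrative sketch of the formatting step; not the library's code."""
    answers = example["answers"]["text"]
    # The prompt layout mirrors the documented result["input"] string.
    prompt = f"Context: {example['context']} Question: {example['question']} Answer:"
    return {
        "input": prompt,
        # Assumption: the first reference answer serves as the training target.
        "output": answers[0] if answers else "",
        # Keep every reference answer so evaluation can compare against all of them.
        "original_answers": list(answers),
    }


example = {
    "context": "The Amazon rainforest is a moist broadleaf forest.",
    "question": "What type of forest is the Amazon rainforest?",
    "answers": {
        "text": ["moist broadleaf forest", "broadleaf forest"],
        "answer_start": [25, 31],
    },
}

result = format_squad_example(example)
print(result["input"])
print(result["output"])
```

Keeping the full list of reference answers alongside a single target string is a common choice for SQuAD-style data, since evaluation metrics typically score a prediction against every reference answer.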