`bridge.data.hf_processors.squad`

Processing functions for the SQuAD dataset.

Module Contents

Functions

process_squad_example

Process a single SQuAD example into the required format.

API

bridge.data.hf_processors.squad.process_squad_example(example: dict[str, Any], tokenizer: Optional[megatron.bridge.training.tokenizers.tokenizer.MegatronTokenizer] = None) → megatron.bridge.data.builders.hf_dataset.ProcessExampleOutput

Process a single SQuAD example into the required format.

This function transforms a raw SQuAD dataset example into the standard format that HFDatasetBuilder expects for fine-tuning.

Parameters:

example – Raw SQuAD example containing the keys 'context', 'question', and 'answers'
tokenizer – Optional tokenizer (accepted but not used by this function)

Returns:

ProcessExampleOutput with formatted input/output and original answers

.. rubric:: Example

>>> example = {
...     "context": "The Amazon rainforest is a moist broadleaf forest.",
...     "question": "What type of forest is the Amazon rainforest?",
...     "answers": {
...         "text": ["moist broadleaf forest", "broadleaf forest"],
...         "answer_start": [25, 31]
...     }
... }
>>> result = process_squad_example(example)
>>> print(result["input"])
Context: The Amazon rainforest is a moist broadleaf forest. Question: What type of forest is the Amazon rainforest? Answer:
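
The transformation described above can be approximated with a short, self-contained sketch. It is not the library's implementation. The ``ExampleOutput`` type and the ``sketch_process_squad_example`` name are hypothetical stand-ins. Only the ``input`` key and its prompt layout come from the example above. The ``output`` and ``original_answers`` keys are assumed from the "Returns" description, and using the first answer text as the target is also an assumption.

```python
from typing import Any, Optional, TypedDict


class ExampleOutput(TypedDict):
    """Hypothetical stand-in for ProcessExampleOutput (field names partly assumed)."""

    input: str                    # prompt text; key name taken from the documented example
    output: str                   # assumed: the target answer text
    original_answers: list[str]   # assumed: all reference answers, unchanged


def sketch_process_squad_example(
    example: dict[str, Any],
    tokenizer: Optional[Any] = None,  # accepted for signature parity, not used
) -> ExampleOutput:
    """Format one SQuAD-style record as a prompt/target pair (illustrative only)."""
    answers = example["answers"]["text"]
    # Prompt layout mirrors the printed output in the documented example.
    prompt = f"Context: {example['context']} Question: {example['question']} Answer:"
    # Assumption: the first listed answer serves as the training target.
    return ExampleOutput(input=prompt, output=answers[0], original_answers=list(answers))
```

Keeping every reference answer alongside a single target string would let later evaluation compare a prediction against all accepted answers, which is one plausible reason for the return value to carry the original answers.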
