nemo_rl.data.datasets.eval_datasets.mmau#
MMAU (Massive Multitask Audio Understanding) evaluation dataset.
Module Contents#
Classes#
MMAU evaluation dataset. |
Data#
API#
- nemo_rl.data.datasets.eval_datasets.mmau.DEFAULT_TEMPLATE#
‘{question} Please choose the answer from the following options: {choices}. Output the final answer i…’
- class nemo_rl.data.datasets.eval_datasets.mmau.MMAUDataset(
- dataset_name: str = 'TwinkStart/MMAU',
- split: str = 'v05.15.25',
MMAU evaluation dataset.
Loads the TwinkStart/MMAU HF dataset and formats each item into the messages format expected by vlm_hf_data_processor.
- Parameters:
dataset_name – HuggingFace dataset name.
split – Dataset split to load.
Initialization
- format_data(data: dict[str, Any]) dict[str, Any]#
Convert a raw MMAU item into messages format for vlm_hf_data_processor.