nemo_rl.data.datasets.response_datasets.avqa#
Module Contents#
Classes#
Wrapper around the AVQA (Audio-Visual Question Answering) dataset. |
Functions#
Resample audio to target sample rate. |
|
Parse the HF dataset question format. |
Data#
API#
- nemo_rl.data.datasets.response_datasets.avqa.DEFAULT_TEMPLATE#
‘{question} Please choose the answer from the following options: {choices}. Output the final answer i…’
- nemo_rl.data.datasets.response_datasets.avqa._resample_audio(audio_array, orig_sr, target_sr=16000)#
Resample audio to target sample rate.
- nemo_rl.data.datasets.response_datasets.avqa._parse_question(question_text)#
Parse the HF dataset question format.
Input: “How many animals are there in the video?\nChoices:\nA. 3\nB. One\nC. 4\nD. 2” Returns: (question, choices_list)
- class nemo_rl.data.datasets.response_datasets.avqa.AVQADataset(
- split: str = 'train',
- split_validation_size: float = 0,
- seed: int = 42,
- max_samples: int | None = None,
- **kwargs,
Bases:
nemo_rl.data.datasets.raw_dataset.RawDatasetWrapper around the AVQA (Audio-Visual Question Answering) dataset.
Formats audio samples into OpenAI-style messages for audio QA fine-tuning with Qwen2.5-Omni.
- Parameters:
split – Split name for the dataset. Supported: “train”, “validation”.
max_samples – Maximum number of samples to load.
seed – Random seed for splitting the dataset.
split_validation_size – Size of the validation set.
Initialization
- task_name#
‘avqa’
- format_data(data: dict[str, Any]) dict[str, Any]#