nemo_rl.data.datasets.response_datasets.avqa#

Module Contents#

Classes#

AVQADataset

Wrapper around the AVQA (Audio-Visual Question Answering) dataset.

Functions#

_resample_audio

Resample audio to target sample rate.

_parse_question

Parse the HF dataset question format.

Data#

API#

nemo_rl.data.datasets.response_datasets.avqa.DEFAULT_TEMPLATE#

‘{question} Please choose the answer from the following options: {choices}. Output the final answer i…’

nemo_rl.data.datasets.response_datasets.avqa._resample_audio(audio_array, orig_sr, target_sr=16000)#

Resample audio to target sample rate.

nemo_rl.data.datasets.response_datasets.avqa._parse_question(question_text)#

Parse the HF dataset question format.

Input: “How many animals are there in the video?\nChoices:\nA. 3\nB. One\nC. 4\nD. 2” Returns: (question, choices_list)

class nemo_rl.data.datasets.response_datasets.avqa.AVQADataset(
split: str = 'train',
split_validation_size: float = 0,
seed: int = 42,
max_samples: int | None = None,
**kwargs,
)#

Bases: nemo_rl.data.datasets.raw_dataset.RawDataset

Wrapper around the AVQA (Audio-Visual Question Answering) dataset.

Formats audio samples into OpenAI-style messages for audio QA fine-tuning with Qwen2.5-Omni.

Parameters:
  • split – Split name for the dataset. Supported: “train”, “validation”.

  • max_samples – Maximum number of samples to load.

  • seed – Random seed for splitting the dataset.

  • split_validation_size – Size of the validation set.

Initialization

task_name#

‘avqa’

format_data(data: dict[str, Any]) dict[str, Any]#