nemo_automodel.components.datasets.vlm.collate_fns

Module Contents

Functions

_find_pattern_indices

_extract_assistant_text

build_labels

Construct label and optional loss-mask tensors aligned to assistant responses.

phi4_mm_collate_fn

Collate function for Phi-4 MM model audio input.

qwen2_5_collate_fn

Collate function for Qwen2.5 VL model.

qwen3_omni_collate_fn

Collate function for Qwen3 Omni processors.

default_collate_fn

Default collate function for multimodal VLM datasets.

Data

logger

COLLATE_FNS

API

nemo_automodel.components.datasets.vlm.collate_fns.logger

‘getLogger(…)’

nemo_automodel.components.datasets.vlm.collate_fns._find_pattern_indices(
    template,
    pattern,
    search_start_index=0,
    allow_first_token_mismatch=False,
)
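
The page gives no description for this helper, so the following is only a sketch of what a function with this signature might do: locate a token sub-sequence (`pattern`) inside a longer token sequence (`template`), starting at `search_start_index`. The name `find_pattern_indices_sketch`, the `(start, end)` return convention, and the `(-1, -1)` "not found" value are assumptions made for illustration, not the library's documented behavior. Allowing a first-token mismatch is a common workaround for tokenizers that encode the first token of a string differently depending on the preceding context.

```python
from typing import Sequence, Tuple


def find_pattern_indices_sketch(
    template: Sequence[int],
    pattern: Sequence[int],
    search_start_index: int = 0,
    allow_first_token_mismatch: bool = False,
) -> Tuple[int, int]:
    """Return (start, end) of the first match of `pattern`, or (-1, -1).

    Hypothetical stand-in: the return convention is assumed, not taken
    from the library.
    """
    n, m = len(template), len(pattern)
    if m == 0:
        return -1, -1
    # When a first-token mismatch is allowed, compare from the second token on.
    offset = 1 if allow_first_token_mismatch else 0
    for start in range(search_start_index, n - m + 1):
        if list(template[start + offset : start + m]) == list(pattern[offset:]):
            return start, start + m
    return -1, -1


template = [1, 2, 3, 4, 5, 3, 4]
first = find_pattern_indices_sketch(template, [3, 4])
second = find_pattern_indices_sketch(template, [3, 4], search_start_index=3)
fuzzy = find_pattern_indices_sketch(template, [9, 4], allow_first_token_mismatch=True)
missing = find_pattern_indices_sketch(template, [7, 7])
print(first, second, fuzzy, missing)  # (2, 4) (5, 7) (2, 4) (-1, -1)
```
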
nemo_automodel.components.datasets.vlm.collate_fns._extract_assistant_text(message: Dict[str, Any]) → str
nemo_automodel.components.datasets.vlm.collate_fns.build_labels(
    input_ids_batch: torch.Tensor,
    conversations: Sequence[Sequence[Dict[str, Any]]],
    processor,
) → tuple[torch.Tensor, Optional[torch.Tensor]]

Construct label and optional loss-mask tensors aligned to assistant responses.
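
To make "aligned to assistant responses" concrete, here is a minimal sketch of the general masking technique, written with plain Python lists rather than tensors. It assumes that tokens outside assistant spans are labeled with `-100` (the default `ignore_index` of PyTorch's cross-entropy loss) and that the assistant spans are already known as `(start, end)` index pairs. The helper name `mask_non_assistant` and both of those conventions are illustrative assumptions; the page does not state how `build_labels` locates spans or which ignore value it uses.

```python
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_assistant(
    input_ids: Sequence[int],
    assistant_spans: Sequence[Tuple[int, int]],
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: keep labels only inside assistant spans."""
    labels = [IGNORE_INDEX] * len(input_ids)
    loss_mask = [0] * len(input_ids)
    for start, end in assistant_spans:
        # Clamp the span so a bad end index cannot run past the sequence.
        for i in range(start, min(end, len(input_ids))):
            labels[i] = input_ids[i]
            loss_mask[i] = 1
    return labels, loss_mask


# Tokens 4 and 5 are the (assumed) assistant response.
input_ids = [101, 7, 8, 9, 42, 43, 102]
labels, loss_mask = mask_non_assistant(input_ids, [(4, 6)])
print(labels)     # [-100, -100, -100, -100, 42, 43, -100]
print(loss_mask)  # [0, 0, 0, 0, 1, 1, 0]
```

With labels built this way, the loss is computed only on the assistant's tokens, so the model is not trained to reproduce the user prompt.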

nemo_automodel.components.datasets.vlm.collate_fns.phi4_mm_collate_fn(examples, processor)

Collate function for Phi-4 MM model audio input.

nemo_automodel.components.datasets.vlm.collate_fns.qwen2_5_collate_fn(
    examples: list,
    processor,
) → dict[str, torch.Tensor]

Collate function for Qwen2.5 VL model.

nemo_automodel.components.datasets.vlm.collate_fns.qwen3_omni_collate_fn(
    examples: Sequence[Dict[str, Any]],
    processor,
    use_audio_in_video: bool = False,
) → Dict[str, torch.Tensor]

Collate function for Qwen3 Omni processors.

nemo_automodel.components.datasets.vlm.collate_fns.default_collate_fn(
    examples: Sequence[Dict[str, Any]],
    processor,
) → Dict[str, torch.Tensor]

Default collate function for multimodal VLM datasets.
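
Every collate function on this page takes `(examples, processor)`, whereas a PyTorch `DataLoader` calls its `collate_fn` with a single argument (the list of examples). One common way to bridge the two is `functools.partial`. The sketch below shows only that binding pattern and runs without any dependencies: `ToyProcessor` and `toy_collate_fn` are hypothetical stand-ins, not part of the library, and the `"tokens"` key is an assumed example layout.

```python
from functools import partial
from typing import Any, Dict, List, Sequence


class ToyProcessor:
    """Hypothetical stand-in for a processor: right-pads token lists."""

    pad_token_id = 0

    def __call__(self, texts: Sequence[List[int]]) -> Dict[str, List[List[int]]]:
        width = max(len(t) for t in texts)
        return {
            "input_ids": [t + [self.pad_token_id] * (width - len(t)) for t in texts],
            "attention_mask": [[1] * len(t) + [0] * (width - len(t)) for t in texts],
        }


def toy_collate_fn(examples: Sequence[Dict[str, Any]], processor) -> Dict[str, Any]:
    """Same (examples, processor) shape as the collate functions above."""
    return processor([ex["tokens"] for ex in examples])


# Bind the processor once; the result takes only the list of examples.
collate = partial(toy_collate_fn, processor=ToyProcessor())

batch = collate([{"tokens": [5, 6, 7]}, {"tokens": [8]}])
print(batch["input_ids"])       # [[5, 6, 7], [8, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
```

The bound callable could then be passed as `collate_fn=collate` when constructing a `torch.utils.data.DataLoader`; the same pattern should apply to `default_collate_fn` with a real processor, assuming the dataset yields examples in the format that function expects.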

nemo_automodel.components.datasets.vlm.collate_fns.COLLATE_FNS

None