nemo_automodel.components.datasets.vlm.collate_fns

Module Contents

Functions

create_loss_mask_with_start_of_response_token

Create a loss mask by finding start-of-turn token positions, similar to the squad.py approach.

phi4_mm_collate_fn

Collate function for Phi-4 MM model audio input.

qwen2_5_collate_fn

Collate function for Qwen2.5 VL model.

default_collate_fn

Default collate function for VLM models.

Data

COLLATE_FNS

API

nemo_automodel.components.datasets.vlm.collate_fns.create_loss_mask_with_start_of_response_token(
input_ids,
processor,
start_of_response_token=None,
)

Create a loss mask by finding start-of-turn token positions, similar to the squad.py approach.

Parameters:
  • input_ids – List or tensor of token IDs for a single example.

  • processor – Processor or tokenizer used to convert the token string to IDs.

  • start_of_response_token – String token that marks the start of a turn (e.g., "<start_of_turn>model\n").

Returns:

loss_mask, a list of 0/1 flags where 0 = masked (prompt) and 1 = unmasked (response).
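
The masking rule described above can be sketched in plain Python. This is a minimal illustration on already-tokenized IDs, not the library's implementation: the helper names `find_subsequence` and `sketch_loss_mask` are hypothetical, the marker is passed directly as token IDs rather than resolved through `processor`, and leaving every token unmasked when the marker is absent is an assumption.

```python
def find_subsequence(haystack, needle):
    """Return the index of the first occurrence of needle in haystack, or -1."""
    n = len(needle)
    if n == 0:
        return -1
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            return i
    return -1


def sketch_loss_mask(input_ids, marker_ids):
    """Hypothetical helper: 0 for prompt tokens (marker included), 1 for response tokens."""
    input_ids = list(input_ids)
    marker_ids = list(marker_ids)
    start = find_subsequence(input_ids, marker_ids)
    if start == -1:
        # Assumption: with no marker found, every token contributes to the loss.
        return [1] * len(input_ids)
    response_start = start + len(marker_ids)
    return [0] * response_start + [1] * (len(input_ids) - response_start)


# Toy IDs: 7, 8 stand in for the tokenized start-of-response marker.
mask = sketch_loss_mask([1, 2, 3, 7, 8, 4, 5], [7, 8])
```

Here the first five positions (prompt plus marker) are masked and the last two (response) are kept. Whether the marker itself counts toward the loss is a design choice; this sketch masks it.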

nemo_automodel.components.datasets.vlm.collate_fns.phi4_mm_collate_fn(examples, processor)

Collate function for Phi-4 MM model audio input.

nemo_automodel.components.datasets.vlm.collate_fns.qwen2_5_collate_fn(
examples: list,
processor,
start_of_response_token='<|im_start|>assistant\n',
) → dict[str, torch.Tensor]

Collate function for Qwen2.5 VL model.

nemo_automodel.components.datasets.vlm.collate_fns.default_collate_fn(
examples: list,
processor,
start_of_response_token=None,
) → dict[str, torch.Tensor]

Default collate function for VLM models.
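
Each collate function above takes `processor` as a second argument, whereas batching utilities such as PyTorch's `DataLoader` call `collate_fn` with the list of examples only. One common way to bridge the two is `functools.partial`. The sketch below uses a stand-in `toy_collate_fn` with the same parameter order so that it runs without any model dependencies; the stand-in, its return value, `toy_processor`, and the `"text"` example key are all hypothetical and not part of this module.

```python
from functools import partial


def toy_collate_fn(examples, processor, start_of_response_token=None):
    """Stand-in with the same parameter order as the collate functions above."""
    # A real collate function would return a dict of tensors; lists keep this runnable.
    return {
        "input_ids": [processor(ex["text"]) for ex in examples],
        "marker": start_of_response_token,
    }


def toy_processor(text):
    """Hypothetical processor: maps each character to its code point."""
    return [ord(ch) for ch in text]


# Bind everything except `examples`, leaving a one-argument callable.
collate = partial(
    toy_collate_fn,
    processor=toy_processor,
    start_of_response_token="<|im_start|>assistant\n",
)

batch = collate([{"text": "hi"}, {"text": "ok"}])
```

With PyTorch, a callable bound this way could be passed as `DataLoader(dataset, collate_fn=collate)`.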

nemo_automodel.components.datasets.vlm.collate_fns.COLLATE_FNS

None