bridge.models.nemotron_omni.data.collate_fn
Nemotron Omni collator implementations.
Module Contents
Functions
- nemotron_omni_collate_fn: Collate function for Nemotron Omni model (vision + audio + language).
Data
- CHATML_ASSISTANT_START
- CHATML_TURN_END
API
- bridge.models.nemotron_omni.data.collate_fn.CHATML_ASSISTANT_START
'<|im_start|>assistant\n'
- bridge.models.nemotron_omni.data.collate_fn.CHATML_TURN_END
'<|im_end|>'
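The two markers above delimit an assistant turn in ChatML-formatted text. This page does not say how the collator uses them; a common pattern, assumed here, is to locate assistant replies so that only those spans contribute to the loss. The following is a minimal sketch of that idea on plain strings. The helper assistant_spans is hypothetical and is not part of this module.

```python
# Values copied from the module-level constants documented above.
CHATML_ASSISTANT_START = "<|im_start|>assistant\n"
CHATML_TURN_END = "<|im_end|>"


def assistant_spans(text):
    """Return (start, end) character offsets of each assistant reply.

    Hypothetical helper for illustration only. The real collator may
    work on token ids rather than characters.
    """
    spans = []
    pos = 0
    while True:
        start = text.find(CHATML_ASSISTANT_START, pos)
        if start == -1:
            break
        content_start = start + len(CHATML_ASSISTANT_START)
        end = text.find(CHATML_TURN_END, content_start)
        if end == -1:
            # Unterminated turn: treat the rest of the text as the reply.
            end = len(text)
        spans.append((content_start, end))
        pos = end
    return spans


conversation = (
    "<|im_start|>user\nWhat is in the clip?<|im_end|>\n"
    "<|im_start|>assistant\nA dog barking.<|im_end|>\n"
)
spans = assistant_spans(conversation)
replies = [conversation[a:b] for a, b in spans]
```

In a token-level pipeline the same boundaries would be found on token ids, with every position outside the spans given an ignore label. That step is left out because it depends on the processor's tokenizer.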
- bridge.models.nemotron_omni.data.collate_fn.nemotron_omni_collate_fn(
- examples: list,
- processor,
- start_of_response_token=None,
- *,
- pack_sequences: bool = False,
Collate function for Nemotron Omni model (vision + audio + language).
Extends nemotron_nano_v2_vl_collate_fn with audio support. Each example may carry an audio_path field pointing to a 16 kHz mono WAV file. Audio is converted to mel spectrograms and added to the batch as sound_clips / sound_length tensors consumed by LLaVAModel.forward().

When pack_sequences=True, samples in the microbatch are concatenated along the sequence dimension into a single [1, sum(L_i)] batch, and cu_seqlens / cu_seqlens_unpadded / cu_seqlens_argmin / max_seqlen are emitted so that Transformer Engine's (TE) THD attention kernels handle per-sample masking without an attention mask. Requires a micro-batch size (mbs) > 1 to be meaningful.
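The packing layout can be illustrated with a small pure-Python sketch. It shows only the general idea of concatenating samples into one [1, sum(L_i)] row and recording cumulative boundaries. The helper pack_samples and its output keys are hypothetical, plain lists stand in for tensors, and the exact semantics of cu_seqlens_unpadded and cu_seqlens_argmin are not described on this page, so they are omitted.

```python
from itertools import accumulate


def pack_samples(samples):
    """Concatenate per-sample token lists into one packed row.

    Hypothetical illustration, not the collator's implementation.
    """
    lengths = [len(s) for s in samples]
    # Flatten all samples back to back: length is sum(L_i).
    packed = [tok for s in samples for tok in s]
    # Cumulative boundaries: sample i occupies packed[cu[i]:cu[i + 1]].
    cu_seqlens = [0] + list(accumulate(lengths))
    return {
        "tokens": [packed],          # batch dimension of 1
        "cu_seqlens": cu_seqlens,    # len(samples) + 1 entries
        "max_seqlen": max(lengths),  # longest individual sample
    }


batch = pack_samples([[11, 12, 13], [21, 22], [31, 32, 33, 34]])
cu = batch["cu_seqlens"]
# Recover the second sample from the packed row using its boundaries.
second = batch["tokens"][0][cu[1]:cu[2]]
```

With three samples of lengths 3, 2, and 4, the boundaries are [0, 3, 5, 9] and the packed row has 9 tokens. An attention kernel that reads these boundaries can keep each sample from attending to its neighbours, which is why no explicit attention mask is needed. With a single sample there is nothing to concatenate, which matches the note that mbs > 1 is required for packing to be meaningful.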