`bridge.models.nemotron_omni.data.collate_fn`

Nemotron Omni collator implementations.

Module Contents

Functions

`nemotron_omni_collate_fn`

Collate function for Nemotron Omni model (vision + audio + language).

Data

`CHATML_ASSISTANT_START`
`CHATML_TURN_END`

API

`bridge.models.nemotron_omni.data.collate_fn.CHATML_ASSISTANT_START`: `'<|im_start|>assistant\n'`

`bridge.models.nemotron_omni.data.collate_fn.CHATML_TURN_END`: `'<|im_end|>'`
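The page lists these two constants without describing how the collator uses them. Delimiters of this kind are commonly used to locate assistant turns in a rendered ChatML conversation, for example to restrict the training loss to assistant text. The sketch below works under that assumption only. `assistant_spans` is a hypothetical helper, not part of this module, and the constants are redefined locally so the snippet runs without the package installed.

```python
# Local copies of the documented values, so the sketch is self-contained.
CHATML_ASSISTANT_START = "<|im_start|>assistant\n"
CHATML_TURN_END = "<|im_end|>"


def assistant_spans(text):
    """Return (start, end) character offsets of each assistant reply in text.

    Hypothetical helper for illustration; the real collator may work on
    token ids rather than characters.
    """
    spans = []
    pos = 0
    while True:
        start = text.find(CHATML_ASSISTANT_START, pos)
        if start == -1:
            break
        body_start = start + len(CHATML_ASSISTANT_START)
        end = text.find(CHATML_TURN_END, body_start)
        if end == -1:
            end = len(text)  # unterminated final turn: take the rest
        spans.append((body_start, end))
        pos = end
    return spans


chat = (
    "<|im_start|>user\nDescribe the clip.<|im_end|>\n"
    "<|im_start|>assistant\nA dog barks twice.<|im_end|>\n"
)
spans = assistant_spans(chat)
print([chat[a:b] for a, b in spans])  # ['A dog barks twice.']
```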

`bridge.models.nemotron_omni.data.collate_fn.nemotron_omni_collate_fn(examples: list, processor, start_of_response_token=None, *, pack_sequences: bool = False) → dict[str, torch.Tensor]`

Collate function for Nemotron Omni model (vision + audio + language).

Extends `nemotron_nano_v2_vl_collate_fn` with audio support. Each example may carry an `audio_path` field pointing to a 16 kHz mono WAV file. The audio is converted to mel spectrograms and added to the batch as the `sound_clips` and `sound_length` tensors consumed by `LLaVAModel.forward()`.

When `pack_sequences=True`, the samples in the microbatch are concatenated along the sequence dimension into a single `[1, sum(L_i)]` batch, where `L_i` is the length of sample `i`. The collator also emits `cu_seqlens`, `cu_seqlens_unpadded`, `cu_seqlens_argmin`, and `max_seqlen` so that Transformer Engine's (TE) THD attention kernels handle per-sample masking without an attention mask. Packing is only meaningful when the micro-batch size (`mbs`) is greater than 1.
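To make the packed layout concrete, here is a minimal plain-Python sketch of how cumulative sequence lengths describe sample boundaries in a `[1, sum(L_i)]` batch. It is not the collator's implementation: `pack_samples` is a hypothetical helper, token lists stand in for tensors, and only `cu_seqlens` and `max_seqlen` are modelled (`cu_seqlens_unpadded` and `cu_seqlens_argmin` are left out because the page does not define them).

```python
from itertools import accumulate


def pack_samples(samples):
    """Concatenate per-sample token lists and record their boundaries.

    Hypothetical illustration; the real collator operates on torch tensors.
    """
    lengths = [len(s) for s in samples]
    # One row holding every token back to back: shape [1, sum(L_i)].
    packed = [[tok for s in samples for tok in s]]
    # Cumulative offsets: sample i occupies packed[0][cu[i]:cu[i + 1]].
    cu_seqlens = [0, *accumulate(lengths)]
    # Longest individual sample in the pack.
    max_seqlen = max(lengths)
    return packed, cu_seqlens, max_seqlen


samples = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
packed, cu_seqlens, max_seqlen = pack_samples(samples)
print(cu_seqlens)  # [0, 3, 5, 9]
print(max_seqlen)  # 4
```

With a single sample, `cu_seqlens` reduces to `[0, L_0]` and nothing is gained over the unpacked path, which is consistent with the note that packing needs more than one sample per microbatch.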
