nemo_automodel.components.models.qwen2_5_omni.model
Qwen2.5-Omni Thinker for ASR / multimodal text generation.
Qwen2.5-Omni is the dense predecessor of Qwen3-Omni-Moe. For NeMo
AutoModel we only train the Thinker (audio + image + video + text); the
talker and token2wav components are dropped from the loaded checkpoint by
:class:Qwen2_5OmniStateDictAdapter.
Compared with :mod:nemo_automodel.components.models.qwen3_omni_moe.model,
this module is intentionally minimal:
- inherits HF's Qwen2_5OmniThinkerForConditionalGeneration directly (the text backbone is a standard dense Qwen2 transformer with MRoPE, so no custom rewrite is needed);
- adds :class:HFCheckpointingMixin for NeMo-compatible save/load;
- attaches :class:Qwen2_5OmniStateDictAdapter for thinker.* prefix handling;
- does NOT inherit MoEFSDPSyncMixin (dense, no experts).
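The checkpoint filtering described above can be sketched as follows. This is an illustration only, not the code of Qwen2_5OmniStateDictAdapter: the helper name keep_thinker_only, the exact prefixes, the decision to strip thinker. from the kept keys, and the example key names are all assumptions made for this sketch.

```python
from typing import Any, Dict

# Assumed top-level prefixes of a full Omni checkpoint (illustrative).
DROPPED_PREFIXES = ("talker.", "token2wav.")
THINKER_PREFIX = "thinker."


def keep_thinker_only(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: drop talker/token2wav entries, strip ``thinker.``."""
    filtered: Dict[str, Any] = {}
    for key, value in state_dict.items():
        if key.startswith(DROPPED_PREFIXES):
            continue  # these components are not trained, so they are skipped
        if key.startswith(THINKER_PREFIX):
            key = key[len(THINKER_PREFIX):]
        filtered[key] = value
    return filtered


# Placeholder key names and values, chosen only to exercise the filter.
full_checkpoint = {
    "thinker.model.embed_tokens.weight": "w0",
    "thinker.lm_head.weight": "w1",
    "talker.model.embed_tokens.weight": "w2",
    "token2wav.example.weight": "w3",
}
thinker_only = keep_thinker_only(full_checkpoint)
print(sorted(thinker_only))
```

Whether the real adapter strips the prefix on load, re-adds it on save, or handles both directions is not stated here, so treat the direction shown as one plausible reading.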
Module Contents
Classes
Functions
Data
API
Bases: HFCheckpointingMixin, HFQwen2_5OmniThinkerForConditionalGeneration
Qwen2.5-Omni Thinker (audio + image + video + text → text).
Multimodal forward that mirrors HF's Thinker but supports cut-CE.
This re-implements the body of HF's
Qwen2_5OmniThinkerForConditionalGeneration.forward (same
audio/image/video embedding merge and MRoPE index computation) so we
can (a) gate the lm_head projection on logits_to_keep and
(b) surface the FINAL hidden states (the lm_head input) on the
returned :class:~transformers.modeling_outputs.CausalLMOutputWithPast.
Together these let the recipe enable
:class:FusedLinearCrossEntropy (cut-CE): it checks logits_to_keep
is in the signature and that the output carries hidden_states.
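The two conditions the recipe is described as checking can be approximated with a small stand-alone sketch. The function supports_fused_ce and the two toy forward functions are hypothetical and are not part of NeMo AutoModel; the sketch only shows one way to test for a logits_to_keep parameter and a populated hidden_states field.

```python
import inspect
from types import SimpleNamespace


def supports_fused_ce(forward_fn, sample_output) -> bool:
    """Hypothetical check mirroring the two conditions described above."""
    # Condition 1: the forward signature accepts ``logits_to_keep``.
    has_param = "logits_to_keep" in inspect.signature(forward_fn).parameters
    # Condition 2: the output object carries hidden states.
    has_hidden = getattr(sample_output, "hidden_states", None) is not None
    return has_param and has_hidden


def forward_with_gate(input_ids, logits_to_keep=0):
    # Toy forward that exposes the final hidden states on its output.
    return SimpleNamespace(logits=None, hidden_states=[[0.0, 0.0]])


def forward_plain(input_ids):
    # Toy forward with neither the parameter nor hidden states.
    return SimpleNamespace(logits=[[0.0]])


eligible = supports_fused_ce(forward_with_gate, forward_with_gate(None))
not_eligible = supports_fused_ce(forward_plain, forward_plain(None))
print(eligible, not_eligible)
```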
Audio is mandatory for ASR; image / video paths are kept enabled so the same class supports the full Thinker modality set.
Parameters:
If 0 (default), project all positions (no slice is applied, because
DTensor cannot slice a full range). Otherwise keep only the last
logits_to_keep positions of the hidden states before applying lm_head, so logits are computed for those positions only.
When set, the returned output carries the final hidden states spanning the full sequence.
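The gating on logits_to_keep can be illustrated with plain Python lists standing in for tensors. This is a sketch under stated assumptions: select_positions and lm_head are hypothetical names, and the real forward operates on batched tensors (possibly DTensors) rather than lists.

```python
from typing import List, Sequence


def select_positions(hidden: Sequence[List[float]], logits_to_keep: int = 0):
    """Return the sequence positions that should be projected to logits."""
    if logits_to_keep == 0:
        # Default: skip slicing entirely and project every position.
        return list(hidden)
    # Otherwise keep only the trailing ``logits_to_keep`` positions.
    return list(hidden[-logits_to_keep:])


def lm_head(rows: Sequence[List[float]], weight: Sequence[List[float]]):
    """Toy projection: ``weight`` has one row per vocabulary entry."""
    return [[sum(h * w for h, w in zip(row, col)) for col in weight] for row in rows]


# Four sequence positions, hidden size 2, vocabulary size 3 (made-up values).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

all_logits = lm_head(select_positions(hidden_states, 0), weight)
last_logits = lm_head(select_positions(hidden_states, 1), weight)
print(len(all_logits), len(last_logits))
```

Slicing before the projection means the vocabulary-sized matmul runs only on the kept positions, which is where the saving comes from.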
Returns:
:class:~transformers.modeling_outputs.CausalLMOutputWithPast with
Return the thinker sub-config regardless of whether a full Omni or Thinker-only config was passed in.
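A minimal sketch of that behaviour, assuming the full Omni config exposes its Thinker settings under an attribute named thinker_config (as the HF composite config does) and that a Thinker-only config has no such attribute. The helper name resolve_thinker_config and the SimpleNamespace stand-ins are hypothetical.

```python
from types import SimpleNamespace


def resolve_thinker_config(config):
    """Hypothetical helper: return the thinker sub-config for either input."""
    # A full Omni config nests the Thinker settings; a Thinker-only config
    # is already the object we want, so it is returned unchanged.
    return getattr(config, "thinker_config", config)


# Stand-ins for real config objects (field values are illustrative).
thinker_cfg = SimpleNamespace(hidden_size=8)
full_cfg = SimpleNamespace(thinker_config=thinker_cfg, talker_config=SimpleNamespace())

from_full = resolve_thinker_config(full_cfg)
from_thinker = resolve_thinker_config(thinker_cfg)
print(from_full is from_thinker)
```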