nemo_automodel.components.models.qwen3_5_moe.model
Qwen3.5-MoE (VL) NeMo Automodel support.
Module Contents
Classes
Functions
Data
API
Bases: Qwen3_5MoeTextRotaryEmbedding
Ensure inv_freq stays in float32 across .to(dtype) calls.
Bases: Qwen3_5MoeVisionRotaryEmbedding
Ensure the vision rotary inv_freq buffer remains float32.
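The two classes above keep inv_freq in float32. The stdlib-only sketch below illustrates why: it assumes standard RoPE frequencies (base 10000, head dim 128) and emulates a bfloat16 cast by round-to-nearest-even on the float32 bit pattern. It is not the module's implementation, only a demonstration of how a low-precision cast of inv_freq turns into rotation-angle error that grows linearly with position.

```python
import struct
from typing import Callable, List


def rope_inv_freq(dim: int, base: float = 10000.0) -> List[float]:
    # Standard RoPE frequencies: inv_freq[i] = 1 / base ** (2i / dim).
    return [1.0 / base ** (2 * i / dim) for i in range(dim // 2)]


def to_fp32(x: float) -> float:
    # Round a Python float (fp64) to the nearest float32 value.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    # Emulate bfloat16: round the float32 bit pattern to its upper 16 bits
    # (round-to-nearest-even; NaN/Inf handling omitted for brevity).
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def max_angle_drift(position: int, cast: Callable[[float], float]) -> float:
    # Largest rotation-angle error (radians) at `position` caused by casting inv_freq.
    exact = rope_inv_freq(128)
    return max(abs(position * (cast(f) - f)) for f in exact)


if __name__ == "__main__":
    pos = 32768
    print(f"fp32 drift: {max_angle_drift(pos, to_fp32):.2e} rad")
    print(f"bf16 drift: {max_angle_drift(pos, to_bf16):.2e} rad")
```

At a position of 32768, the float32 error stays below a milliradian, while the emulated bfloat16 cast shifts at least one frequency by tens of radians. This is why a plain .to(bfloat16) on the whole module is undesirable for this buffer.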
Bases: Block
Block that uses the native Qwen3.5-MoE GatedDeltaNet, which has separate in_proj_qkv, in_proj_z, in_proj_b, and in_proj_a projections.
Mirrors Block.forward but threads NEAT-packing kwargs into CPAwareGatedDeltaNet.
The parent Block.forward calls linear_attn with only hidden_states and attention_mask. For packed sequences, the gated_delta_rule kernel also needs cu_seqlens and indices so it can reset its state at document boundaries (issue #2131). These are derived once per forward pass from the indexed attention mask.
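The sketch below illustrates the boundary-reset idea in plain Python. It assumes that the "indexed" attention mask stores a 1-based document index per token, with 0 for padding. It also uses a scalar (d_k = d_v = 1) form of the gated delta rule, S_t = alpha_t * S_{t-1} * (1 - beta_t * k_t^2) + beta_t * v_t * k_t with output o_t = S_t * q_t, in place of the real fused kernel. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def cu_seqlens_from_indexed_mask(mask: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Derive (cu_seqlens, indices) from an indexed attention mask.

    Assumption: mask[t] is the 1-based document index of token t, 0 for padding.
    `indices` holds positions of non-padding tokens; `cu_seqlens` holds document
    start offsets into that unpadded token stream, followed by its total length.
    """
    indices = [t for t, doc in enumerate(mask) if doc != 0]
    cu_seqlens = [0]
    for k in range(1, len(indices)):
        if mask[indices[k]] != mask[indices[k - 1]]:
            cu_seqlens.append(k)  # a new document starts here
    if indices:
        cu_seqlens.append(len(indices))
    return cu_seqlens, indices


def gated_delta_rule_scalar(q, k, v, alpha, beta, cu_seqlens) -> List[float]:
    """Scalar gated delta rule over packed tokens, resetting state per document."""
    out: List[float] = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        state = 0.0  # fresh recurrent state at every document boundary
        for t in range(start, end):
            state = alpha[t] * state * (1.0 - beta[t] * k[t] * k[t]) + beta[t] * v[t] * k[t]
            out.append(state * q[t])
    return out
```

For the mask [1, 1, 1, 2, 2, 3, 0, 0], this yields cu_seqlens [0, 3, 5, 6]. Without the reset (a single segment [0, 6]), state from document 1 would leak into the first token of document 2, which is the failure mode the docstring describes.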
Bases: CausalLMOutputWithPast
Qwen3.5-MoE output extended with MTP (multi-token prediction) auxiliary hidden states.
Bases: HFCheckpointingMixin, HFQwen3_5MoeForConditionalGeneration, MoEFSDPSyncMixin
Qwen3.5-MoE VL conditional generation model using NeMo backend components.
Build full-sequence multimodal embeddings and mRoPE positions before context-parallel (CP) sharding.
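The stdlib-only sketch below suggests why the positions must be built on the full sequence before CP sharding. It makes three assumptions. First, it uses simplified Qwen2-VL-style mRoPE: text tokens share one index across the temporal, height, and width rows, an image grid of T x H x W merged patches gets per-axis offsets, and the next text token resumes at the offset plus max(T, H, W). Second, it assumes a plain contiguous split across CP ranks. Third, real CP implementations may use load-balanced chunking instead. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

Segment = Tuple[str, Union[int, Tuple[int, int, int]]]


def mrope_positions(segments: Sequence[Segment]) -> List[List[int]]:
    """Return [t_row, h_row, w_row] position ids for a text/image token sequence."""
    t_row: List[int] = []
    h_row: List[int] = []
    w_row: List[int] = []
    start = 0
    for kind, spec in segments:
        if kind == "text":
            for i in range(spec):
                for row in (t_row, h_row, w_row):
                    row.append(start + i)  # text: identical index on all three axes
            start += spec
        else:  # "image" with a (T, H, W) grid of merged patches
            T, H, W = spec
            for ti in range(T):
                for hi in range(H):
                    for wi in range(W):
                        t_row.append(start + ti)
                        h_row.append(start + hi)
                        w_row.append(start + wi)
            start += max(T, H, W)  # following text resumes after the grid's extent
    return [t_row, h_row, w_row]


def shard_for_cp(positions: List[List[int]], cp_size: int, rank: int) -> List[List[int]]:
    """Contiguous CP split of precomputed positions (sequence length must divide evenly)."""
    seq_len = len(positions[0])
    assert seq_len % cp_size == 0, "pad the sequence to a multiple of cp_size first"
    chunk = seq_len // cp_size
    return [row[rank * chunk:(rank + 1) * chunk] for row in positions]
```

With the segments [("text", 2), ("image", (1, 2, 2)), ("text", 2)] and cp_size=2, rank 1 receives global positions that start inside the image grid. If rank 1 computed positions from its own shard, it would see only half of the image and restart counting from 0.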
Bases: HFQwen3_5MoeModel
Thin wrapper that exposes language_model internals as the properties the NeMo training loop expects (e.g., model.layers).
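The property-delegation pattern described above can be sketched as follows. ToyLanguageModel, ToyWrapper, and the attribute names embed_tokens and norm are hypothetical stand-ins, not the real HF or NeMo classes. Only layers is taken from the docstring.

```python
from typing import Any, List


class ToyLanguageModel:
    """Hypothetical stand-in for the inner text decoder."""

    def __init__(self, num_layers: int) -> None:
        self.embed_tokens: Any = "embedding"  # placeholder objects for illustration
        self.layers: List[Any] = [f"layer_{i}" for i in range(num_layers)]
        self.norm: Any = "final_norm"


class ToyWrapper:
    """Exposes language_model internals through read-only properties."""

    def __init__(self, language_model: ToyLanguageModel) -> None:
        self.language_model = language_model

    @property
    def layers(self) -> List[Any]:
        # Returns the live object, not a copy, so later in-place changes are visible.
        return self.language_model.layers

    @property
    def embed_tokens(self) -> Any:
        return self.language_model.embed_tokens

    @property
    def norm(self) -> Any:
        return self.language_model.norm
```

Using properties rather than copying references in __init__ means that if the inner module's layers are later swapped or modified (for example, by a parallelization pass), the wrapper still reports the current objects.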
Bases: Module
Qwen3.5-MoE text decoder rebuilt on top of the Qwen3-Next Block.
Return a Qwen3.5-MoE backend with TE fused RoPE disabled.
The Qwen3.5 full-attention blocks reuse Qwen3-Next attention, and VLM or packed-sequence execution can present THD-shaped (packed tokens x heads x head_dim) q/k tensors. TE fused RoPE expects 4D inputs on this path, so this function switches to non-fused RoPE while preserving the rest of the backend configuration.
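The "disable one feature, keep everything else" behavior can be sketched with an immutable config and dataclasses.replace. BackendConfig, its fields, and backend_without_fused_rope are hypothetical names chosen for illustration. They are not the NeMo Automodel backend API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class BackendConfig:
    """Hypothetical backend settings; field names are illustrative only."""
    attn: str = "te"
    linear: str = "te"
    rms_norm: str = "te"
    rope_fusion: bool = True


def backend_without_fused_rope(base: Optional[BackendConfig] = None) -> BackendConfig:
    # Copy every field from `base` and override only the RoPE fusion flag.
    base = base if base is not None else BackendConfig()
    return replace(base, rope_fusion=False)
```

Returning a modified copy, rather than mutating a shared config, keeps other models that use the same base backend unaffected.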
Build Qwen3.5-MoE MTP runtime config from HF-style config fields.
Construct Qwen3.5-MoE MTP blocks.
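The two MTP helpers above can be sketched as "read HF-style fields into a runtime config, then build one block per configured MTP layer". The key name mtp_num_hidden_layers, the defaults, and the MTPRuntimeConfig and MTPBlock classes are assumptions for illustration. They are not taken from this module.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class MTPRuntimeConfig:
    """Hypothetical runtime config; field names are illustrative."""
    num_layers: int
    hidden_size: int


@dataclass
class MTPBlock:
    """Hypothetical placeholder for one MTP block."""
    layer_idx: int
    hidden_size: int


def build_mtp_config(hf_config: Dict[str, Any]) -> MTPRuntimeConfig:
    # Assumed HF-style keys; a missing MTP key is treated as "MTP disabled".
    return MTPRuntimeConfig(
        num_layers=int(hf_config.get("mtp_num_hidden_layers", 0)),
        hidden_size=int(hf_config["hidden_size"]),
    )


def build_mtp_blocks(cfg: MTPRuntimeConfig) -> List[MTPBlock]:
    # One block per configured MTP layer; an empty list when MTP is off.
    return [MTPBlock(layer_idx=i, hidden_size=cfg.hidden_size) for i in range(cfg.num_layers)]
```

Separating config parsing from block construction keeps the HF field names in one place, so the block builder only depends on the runtime config.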