nemo_automodel.components.models.mistral4.model
Module Contents
Classes
Functions
Data
API
Bases: HFCheckpointingMixin, Module, MoEFSDPSyncMixin
Full multimodal Mistral 4: Pixtral vision + projector + Mistral4 MLA/MoE text backbone.
Follows the KimiK25VLForConditionalGeneration pattern: inherits from nn.Module (not HF PreTrainedModel) to avoid FSDP conflicts.
Handles only configs whose text backbone is Mistral4 (MoE + MLA).
Bases: Module
VLM wrapper composing vision tower + projector + Mistral4 text backend.
Follows the KimiK25VLModel pattern: a plain nn.Module (not HF PreTrainedModel) to avoid FSDP conflicts from PreTrainedModel’s module registration hooks. Vision processing logic is replicated from HF Mistral3Model.
Encode images through vision tower + projector (from HF Mistral3Model).
Bases: HFCheckpointingMixin, Module, MoEFSDPSyncMixin
Bases: MLA
MLA with Llama 4 attention scaling for Mistral 4.
Compared to DeepSeek V3 MLA, adds position-dependent scaling to q_pe after RoPE (llama_4_scaling_beta). RoPE itself uses the same complex-number approach as DSV3.
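The ordering described above (rotate first, then scale) can be illustrated with a minimal pure-Python sketch. It is not the module's implementation: the function names `rope_complex` and `scale_after_rope`, the `theta` default, and the use of plain lists instead of tensors are all assumptions made for illustration. Consecutive (even, odd) pairs of `q_pe` are treated as complex numbers and rotated by a position-dependent phase, and the result is then multiplied by a per-position scale.

```python
import cmath
import math


def rope_complex(x: list[float], position: int, theta: float = 10000.0) -> list[float]:
    """Rotate consecutive (even, odd) pairs of x, viewed as complex numbers."""
    dim = len(x)
    out: list[float] = []
    for i in range(0, dim, 2):
        freq = theta ** (-i / dim)              # per-pair rotation frequency
        rot = cmath.exp(1j * position * freq)   # unit-magnitude phase factor
        z = complex(x[i], x[i + 1]) * rot
        out.extend([z.real, z.imag])
    return out


def scale_after_rope(q_pe: list[float], position: int, scale: float) -> list[float]:
    """Apply RoPE first, then the (hypothetical) position-dependent scale."""
    return [scale * v for v in rope_complex(q_pe, position)]


def l2_norm(values: list[float]) -> float:
    return math.sqrt(sum(v * v for v in values))


q_pe = [1.0, 0.0, 0.0, 1.0]
rotated = rope_complex(q_pe, position=5)
scaled = scale_after_rope(q_pe, position=5, scale=1.5)
```

Because the rotation has unit magnitude, it leaves the vector norm unchanged, so in this sketch the scale alone determines how much the positional part of the query grows.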
Bases: Module
Bases: Module
Backend-aware Mistral4 text model for use inside the multimodal wrapper.
Wraps Mistral4Model in self.model (like KimiK25VLLanguageModelBackend wraps DeepseekV3Model). This ensures embed_tokens/layers/norm are accessed via @property aliases rather than as direct nn.Module children, which avoids FSDP double-root-init when the parallelizer wraps both embed_tokens and this module.
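The aliasing idea can be shown without PyTorch. In the sketch below, `TinyModule` is a hypothetical stand-in that mimics only one behaviour of nn.Module (recording sub-modules assigned as attributes), and `InnerModel` and `TextBackend` are illustrative names, not the classes in this module. Read-only properties forward to the wrapped model, so the wrapper itself registers a single child.

```python
class TinyModule:
    """Hypothetical stand-in for nn.Module: records sub-modules assigned as attributes."""

    def __init__(self) -> None:
        object.__setattr__(self, "_children", {})

    def __setattr__(self, name: str, value: object) -> None:
        if isinstance(value, TinyModule):
            self._children[name] = value  # registered as a direct child
        object.__setattr__(self, name, value)

    def child_names(self) -> list[str]:
        return list(self._children)


class InnerModel(TinyModule):
    """Stands in for the wrapped text model that owns the real sub-modules."""

    def __init__(self) -> None:
        super().__init__()
        self.embed_tokens = TinyModule()
        self.layers = TinyModule()
        self.norm = TinyModule()


class TextBackend(TinyModule):
    """Wrapper exposing the inner sub-modules through read-only aliases."""

    def __init__(self) -> None:
        super().__init__()
        self.model = InnerModel()  # the only direct child

    @property
    def embed_tokens(self) -> TinyModule:
        return self.model.embed_tokens

    @property
    def layers(self) -> TinyModule:
        return self.model.layers

    @property
    def norm(self) -> TinyModule:
        return self.model.norm


backend = TextBackend()
```

Here `backend.embed_tokens` returns the same object as `backend.model.embed_tokens`, yet `backend.child_names()` lists only `model`, which is the property the docstring relies on.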
Build MoEConfig from a Mistral4 text config.
Position-dependent attention scaling for long-context extrapolation (Llama 4 / Mistral 4).
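As a rough illustration, the sketch below assumes a Llama 4 style form, scale = 1 + beta * log(1 + floor(position / original_max_pos)). The function name `llama4_attn_scale`, the parameter names, and the example values are assumptions; the exact formula and argument names used by this module may differ.

```python
import math


def llama4_attn_scale(position: int, beta: float, original_max_pos: int) -> float:
    """Return a multiplicative query scale for a token at `position`.

    Assumed form: 1 + beta * log(1 + floor(position / original_max_pos)).
    """
    return 1.0 + beta * math.log(1.0 + position // original_max_pos)


# Example values are illustrative only.
positions = (0, 8191, 8192, 65536)
scales = [llama4_attn_scale(p, beta=0.1, original_max_pos=8192) for p in positions]
```

Under this assumed form, positions inside the original context window get a scale of exactly 1, and the scale grows logarithmically in steps once the position passes multiples of `original_max_pos`.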