nemo_automodel.components.models.mimo_v2_flash.model
Module Contents
Classes
Functions
Data
API
Bases: Module
MiMo-V2-Flash attention with full and sliding-window variants.
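The difference between the full and sliding-window variants comes down to which keys each query position may see. The sketch below is a minimal, framework-free illustration of that idea, not the module's actual masking code. The helper name `causal_mask`, the `window` argument, and the convention that the window includes the current token are all assumptions made for illustration.

```python
def causal_mask(seq_len, window=None):
    """Return a boolean matrix; mask[q][k] is True when query q may attend to key k.

    window=None gives full causal attention. An integer limits each query to
    the `window` most recent keys, itself included (an assumed convention).
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: never look at future positions
            if window is not None:
                visible = visible and (q - k) < window  # drop keys that are too old
            row.append(visible)
        mask.append(row)
    return mask


full = causal_mask(4)            # full causal attention
swa = causal_mask(4, window=2)   # sliding-window attention, hypothetical window of 2
```

With a window of 2, the last query sees only positions 2 and 3, whereas the full variant sees positions 0 through 3.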
Bases: Module
Decoder block that alternates dense MLP and routed-MoE layers.
Bases: HFCheckpointingMixin, Module, MoEFSDPSyncMixin
Causal LM wrapper for MiMo-V2-Flash with Automodel checkpoint adapters.
Keep the sliding-window attention (SWA) rotary embedding on every pipeline-parallel (PP) stage.
Forward pass producing text logits.
Parameters:
Input token IDs [B, S] (or THD-packed [T]/[1, T]).
Pre-computed input embeddings (optional).
Optional position indices.
2D padding mask, 4D additive mask, or per-type dict.
Optional MoE padding mask.
If 0, compute logits for all positions (training default); otherwise compute only the last logits_to_keep positions.
When set, the returned output carries the final hidden states (the input to lm_head) in hidden_states.
Additional arguments forwarded to the base model.
Returns: CausalLMOutputWithPast
A transformers.modeling_outputs.CausalLMOutputWithPast with the logits and, when requested, the final hidden states in hidden_states.
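The logits_to_keep behaviour described for the forward pass (0 keeps every position, a positive value keeps only the trailing positions) can be expressed with a single negative slice. The following is a minimal sketch of that selection on a plain Python list standing in for one sequence of hidden states. The function name `select_positions` is hypothetical, and the real model presumably slices the sequence dimension of a tensor before applying lm_head.

```python
def select_positions(hidden_states, logits_to_keep=0):
    """Pick the positions whose logits would be computed.

    `hidden_states` is a per-position list for a single sequence (a stand-in
    for a [B, S, H] tensor). Because -0 == 0, slice(-0, None) selects
    everything, which gives the "0 means all positions" behaviour.
    """
    keep = slice(-logits_to_keep, None)
    return hidden_states[keep]


hidden = ["h0", "h1", "h2", "h3"]
all_positions = select_positions(hidden)                  # training default
last_only = select_positions(hidden, logits_to_keep=1)    # typical for decoding
last_two = select_positions(hidden, logits_to_keep=2)
```

Keeping only the last position avoids projecting every hidden state through the vocabulary-sized output layer when only the next-token logits are needed.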
Bases: Module
Backbone model for Xiaomi MiMo-V2-Flash.
Bases: Module
Rotary embedding module matching MiMo-V2-Flash partial-RoPE behavior.
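Partial RoPE generally means that only a leading slice of each attention head's dimensions is rotated, while the remaining dimensions pass through unchanged. The sketch below illustrates that idea for a single head vector using the common "rotate-half" pairing. It is not the module's implementation: the function name `partial_rope`, the `rotary_dim` argument, and the base of 10000 are assumptions chosen for illustration.

```python
import math


def partial_rope(vec, position, rotary_dim, base=10000.0):
    """Rotate the first `rotary_dim` entries of `vec`; pass the rest through."""
    assert rotary_dim % 2 == 0 and rotary_dim <= len(vec)
    half = rotary_dim // 2
    rot, rest = vec[:rotary_dim], vec[rotary_dim:]
    out = [0.0] * rotary_dim
    for i in range(half):
        # Each pair (i, i + half) rotates at its own frequency.
        theta = position * base ** (-2.0 * i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = rot[i], rot[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out + list(rest)  # untouched tail carries no positional signal


head = [1.0, 2.0, 3.0, 4.0]
at_origin = partial_rope(head, position=0, rotary_dim=2)
rotated = partial_rope(head, position=5, rotary_dim=2)
```

At position 0 the rotation is the identity. At any other position the rotated slice keeps its norm, and the trailing dimensions are returned as they were.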
Bases: Module
RMSNorm used by MiMo-V2-Flash decoder blocks.
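RMSNorm, in its standard form, divides the input by its root mean square over the feature dimension and then applies a learned per-feature scale. The sketch below shows that computation on a plain list. The function name `rms_norm` and the `eps` value are assumptions, and the actual module may differ in details such as dtype handling.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale `x` to unit root mean square, then apply a per-feature weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)  # eps guards against division by zero
    return [w * v * inv_rms for v, w in zip(x, weight)]


y = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
scaled = rms_norm([3.0, 4.0], weight=[2.0, 2.0])
```

Unlike LayerNorm, no mean is subtracted and no bias is added, so the output direction matches the input direction up to the learned scale.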