nemo_automodel.components.models.mimo_v2_flash.state_dict_adapter
API
Bases: MoESplitExpertsStateDictMixin, StateDictAdapter
Convert MiMo-V2-Flash HF checkpoints to Automodel’s grouped MoE layout.
HF stores routed experts as split per-expert projections:
mlp.experts.{E}.{gate,up,down}_proj.weight. Automodel groups those
into gate_and_up_projs and down_projs so that expert parallelism (EP) can shard
experts across ranks without materializing every expert on every rank.
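The grouping described above can be sketched as follows. This is a minimal, dependency-free illustration, not the adapter's actual code. Several details are assumptions: the key regex, the grouped key names (`<prefix>.gate_and_up_projs`, `<prefix>.down_projs`), the use of nested lists in place of tensors, and the choice to concatenate gate and up along the output (row) dimension and then stack experts along a new leading dimension. The real Automodel layout may transpose or order these differently.

```python
import re
from collections import defaultdict

# Hypothetical pattern for HF split expert keys, e.g.
# "model.layers.0.mlp.experts.3.gate_proj.weight"
EXPERT_KEY = re.compile(
    r"^(?P<prefix>.*\.mlp\.experts)\.(?P<e>\d+)\.(?P<proj>gate|up|down)_proj\.weight$"
)


def group_experts(hf_sd):
    """Group per-expert HF weights into stacked gate_and_up / down entries.

    Weights are nested lists (rows x cols) so the sketch has no dependencies.
    Assumption: gate and up are concatenated along the row (output) dimension.
    """
    # prefix -> expert id -> {"gate" | "up" | "down": weight}
    per_prefix = defaultdict(lambda: defaultdict(dict))
    out = {}
    for key, w in hf_sd.items():
        m = EXPERT_KEY.match(key)
        if m is None:
            out[key] = w  # non-expert weights pass through unchanged
            continue
        per_prefix[m["prefix"]][int(m["e"])][m["proj"]] = w

    for prefix, experts in per_prefix.items():
        ids = sorted(experts)  # stack experts in index order
        # Hypothetical grouped key names; each entry is [num_experts, rows, cols].
        out[f"{prefix}.gate_and_up_projs"] = [
            experts[e]["gate"] + experts[e]["up"] for e in ids
        ]
        out[f"{prefix}.down_projs"] = [experts[e]["down"] for e in ids]
    return out
```

Because every expert of a layer ends up in one stacked entry, an EP implementation can slice that entry along its leading dimension and hand each rank only its own experts.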
Convert Automodel state_dict to the HF MiMo-V2-Flash layout.
Note: The quantization parameter is accepted for interface
compatibility but is ignored. MiMo-V2-Flash is distributed as an
FP8 HF checkpoint, so this adapter always emits FP8 weights plus
_scale_inv companions for keys that match _should_quantize_key,
regardless of the caller’s preference.
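The FP8 export behavior in the note above can be sketched as below. This is an illustration under stated assumptions, not the adapter's implementation: `should_quantize_key` is a hypothetical stand-in for `_should_quantize_key`, the companion key is formed by appending `_scale_inv` to the weight key, a single per-tensor scale is used (real FP8 checkpoints may use block-wise scales), and FP8 rounding is not simulated, only the scaling into the E4M3 range. The point it shows is that the `quantization` argument has no effect on the output.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 E4M3


def should_quantize_key(key):
    # Hypothetical stand-in for _should_quantize_key: quantize projection
    # weights, keep norms, embeddings, etc. in higher precision.
    return key.endswith("_proj.weight")


def emit_fp8(state_dict, quantization=False):
    """Return an HF-style dict with scaled weights plus *_scale_inv companions.

    `quantization` is accepted for interface compatibility and ignored,
    mirroring the documented behavior. Weights are nested lists (rows x cols).
    """
    del quantization  # intentionally unused
    out = {}
    for key, w in state_dict.items():
        if not should_quantize_key(key):
            out[key] = w  # emitted unchanged, no scale companion
            continue
        amax = max((abs(x) for row in w for x in row), default=0.0)
        # Dequantization multiplier: original ~= stored_value * scale_inv.
        scale_inv = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        out[key] = [
            [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x / scale_inv)) for x in row]
            for row in w
        ]
        out[key + "_scale_inv"] = scale_inv
    return out
```

Calling `emit_fp8(sd, quantization=True)` and `emit_fp8(sd, quantization=False)` yields identical dictionaries, which is the contract the note describes.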