nemo_automodel.components.models.minimax_m2.model
Module Contents
Classes
Data
API
Bases: Module
MiniMax-M2 transformer block combining self-attention with a mixture-of-experts (MoE) feed-forward layer.
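As a rough illustration of the MoE half of such a block, the sketch below routes a single token vector through generic softmax top-k gating in plain Python. It is a toy under stated assumptions, not MiniMax-M2's router: the helper names softmax and topk_moe, the softmax scoring, k=2, and the renormalization of the selected gate weights are all illustrative choices, and the real layer operates on batched tensors.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_moe(token: Sequence[float], router_logits: Sequence[float],
             experts: Sequence[Expert], k: int = 2) -> List[float]:
    """Hypothetical helper: generic softmax top-k MoE for one token.

    Not MiniMax-M2's actual router; the scoring function, k, and the
    renormalization step are illustrative assumptions.
    """
    probs = softmax(router_logits)
    # Pick the k experts with the highest routing probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the selected gate weights so they sum to 1.
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(token)
    for i in top:
        weight = probs[i] / norm
        expert_out = experts[i](token)  # only the selected experts run
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy experts that just scale the input by 1, 2, 3 and 4.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
print(topk_moe([1.0, 2.0], [0.0, 0.0, 10.0, 10.0], experts, k=2))  # [3.5, 7.0]
```

With router logits [0.0, 0.0, 10.0, 10.0] and k=2, the token goes only to experts 2 and 3, each with weight 0.5, so the output is 0.5 * 3x + 0.5 * 4x = 3.5x. Renormalizing over the selected experts keeps the output on the scale of a single expert; some MoE designs skip that step.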
Bases: HFCheckpointingMixin, Module, MoEFSDPSyncMixin
Forward pass returning a transformers.modeling_outputs.CausalLMOutputWithPast.
Parameters:
Input token IDs: shape [B, S] for the BSHD layout, or [1, T] for the packed THD layout (the leading dimension is squeezed internally).
Optional position indices.
Optional attention mask.
Optional padding mask.
If 0 (the training default), compute logits for all positions; otherwise, compute logits only for the last logits_to_keep positions.
Whether to include the final hidden states in the returned output.
Additional arguments forwarded to the base model (e.g. qkv_format).
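The two layouts accepted for the input token IDs differ in how a batch of variable-length sequences is arranged: BSHD right-pads every sequence to a common length S, while THD concatenates them into one row of T tokens. The plain-Python sketch below contrasts the two. The helpers pack_thd and pad_bshd are hypothetical, the cu_seqlens name for cumulative sequence boundaries follows the FlashAttention convention, and how boundaries or masks are actually passed to this model is an assumption not documented on this page.

```python
from typing import List, Tuple


def pack_thd(sequences: List[List[int]]) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical helper: concatenate variable-length sequences into a single
    [1, T] row (THD-style packing) and record cumulative sequence boundaries."""
    flat: List[int] = []
    cu_seqlens = [0]  # boundary offsets; FlashAttention-style naming (assumption)
    for seq in sequences:
        flat.extend(seq)
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    # Wrap in an outer list so the shape is [1, T]; the model squeezes this dim.
    return [flat], cu_seqlens


def pad_bshd(sequences: List[List[int]],
             pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: right-pad sequences into a [B, S] batch.

    Also returns a mask where 1 marks a real token and 0 marks padding
    (an assumed convention; the model's padding-mask semantics are not stated here).
    """
    max_len = max(len(s) for s in sequences)
    ids = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return ids, mask


seqs = [[5, 6, 7], [8, 9]]
print(pack_thd(seqs))  # ([[5, 6, 7, 8, 9]], [0, 3, 5])
print(pad_bshd(seqs))  # ([[5, 6, 7], [8, 9, 0]], [[1, 1, 1], [1, 1, 0]])
```

Packing wastes no compute on pad tokens, which is the usual reason to prefer THD for variable-length training data; the cost is that attention must be told where each sequence ends so tokens do not attend across boundaries.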
Returns: CausalLMOutputWithPast
A transformers.modeling_outputs.CausalLMOutputWithPast containing the logits.
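The logits_to_keep rule decides which sequence positions are projected to vocabulary logits. The sketch below applies it to a plain list of per-position hidden states. The helper keep_last_positions is hypothetical; it uses the negative-slice idiom common in Hugging Face causal LM heads, and whether this model implements the rule the same way is an assumption.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_last_positions(hidden_states: Sequence[T], logits_to_keep: int = 0) -> List[T]:
    """Hypothetical helper: select the positions that get logits.

    Follows the documented rule: 0 keeps every position, while N > 0
    keeps only the last N positions.
    """
    if logits_to_keep < 0:
        raise ValueError("logits_to_keep must be non-negative")
    # slice(-0, None) == slice(0, None), so 0 naturally selects every position.
    return list(hidden_states[slice(-logits_to_keep, None)])


hs = ["h0", "h1", "h2", "h3"]
print(keep_last_positions(hs, 0))  # ['h0', 'h1', 'h2', 'h3']
print(keep_last_positions(hs, 1))  # ['h3']
```

During autoregressive generation only the final position's logits are needed, so passing 1 avoids projecting every position onto the full vocabulary; training keeps the default of 0 because the loss uses every position.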
Bases: Module