nemo_automodel.components.models.step3p5.model
Module Contents
Classes
Functions
Data
API
Bases: Module
Step3p5 transformer block with attention, MLP/MoE, and shared experts.
Bases: HFCheckpointingMixin, Module, MoEFSDPSyncMixin
Step3p5 model for causal language modeling.
Forward pass returning a transformers.modeling_outputs.CausalLMOutputWithPast.
Supports both BSHD format (input_ids of shape [B, S]) and THD format
(input_ids of shape [1, T]). When attn_kwargs["qkv_format"] == "thd",
inputs are squeezed to THD before the base-model forward pass, and the logits
(and the final hidden_states) are unsqueezed on exit to restore the leading batch dimension.
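The squeeze/unsqueeze round trip described above can be illustrated with a minimal sketch. It is not the library's implementation: forward_sketch, base_model and lm_head are hypothetical stand-ins (an embedding and a linear layer), and the real THD squeeze helper also deals with padding masks and other inputs that are left out here.

```python
import torch


def forward_sketch(input_ids, base_model, lm_head, attn_kwargs):
    """Hypothetical helper: route BSHD or THD inputs through a toy model."""
    is_thd = attn_kwargs.get("qkv_format") == "thd"
    if is_thd:
        # THD inputs arrive as [1, T]; drop the dummy batch dimension.
        input_ids = input_ids.squeeze(0)  # [1, T] -> [T]

    hidden = base_model(input_ids)  # [T, H] for THD, [B, S, H] for BSHD
    logits = lm_head(hidden)        # [T, V] for THD, [B, S, V] for BSHD

    if is_thd:
        # Restore the leading batch dimension on exit.
        logits = logits.unsqueeze(0)  # [T, V] -> [1, T, V]
    return logits


vocab_size, hidden_size = 11, 4
embed = torch.nn.Embedding(vocab_size, hidden_size)  # stand-in for the base model
head = torch.nn.Linear(hidden_size, vocab_size, bias=False)

thd_logits = forward_sketch(
    torch.randint(0, vocab_size, (1, 6)), embed, head, {"qkv_format": "thd"}
)
bshd_logits = forward_sketch(torch.randint(0, vocab_size, (2, 3)), embed, head, {})
```

In both cases the caller sees logits with a leading batch dimension, so downstream code does not need to know which layout was used internally.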
Parameters:
Input token IDs.
Optional position indices.
Optional 2D padding mask.
Optional padding mask used by the THD squeeze helper.
If 0 (default), compute logits for all positions; if > 0
(or a tensor), only compute logits for the last logits_to_keep positions
(avoids materialising the full logit matrix during generation / fused CE).
Whether to carry the final hidden states on the output.
Additional arguments forwarded to the base model.
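As a rough illustration of the logits_to_keep behaviour for integer values, the sketch below selects which sequence positions would be projected to logits. It assumes the common slicing idiom slice(-logits_to_keep, None), where 0 selects every position; the function name positions_to_project is hypothetical, and the tensor case is not covered.

```python
def positions_to_project(seq_len, logits_to_keep=0):
    """Hypothetical helper: return the positions whose logits are computed."""
    positions = list(range(seq_len))
    # -0 == 0, so logits_to_keep=0 yields slice(0, None), i.e. all positions.
    return positions[slice(-logits_to_keep, None)]


all_positions = positions_to_project(5)       # every position
last_position = positions_to_project(5, 1)    # typical during generation
last_two = positions_to_project(5, 2)
```

Projecting only the trailing positions means the language-model head is applied to a much smaller slice of the hidden states, which is where the memory saving comes from.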
Returns: CausalLMOutputWithPast
transformers.modeling_outputs.CausalLMOutputWithPast with logits and,
Bases: Module
Step3p5 transformer model.
Keep Step router correction bias in fp32 after module-wide dtype casts.
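One way to picture this is a helper that runs after a module-wide cast and moves the bias back to float32. The sketch below is an assumption-laden illustration, not the library's code: ToyRouter, keep_bias_fp32 and the parameter name correction_bias are all hypothetical.

```python
import torch
from torch import nn


class ToyRouter(nn.Module):
    """Hypothetical router with a weight and a correction bias."""

    def __init__(self, num_experts=4, hidden_size=8):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(num_experts, hidden_size))
        # Hypothetical parameter name, chosen only for this sketch.
        self.correction_bias = nn.Parameter(
            torch.zeros(num_experts), requires_grad=False
        )


def keep_bias_fp32(module, suffix="correction_bias"):
    """Cast parameters whose name ends with `suffix` back to float32."""
    for name, param in module.named_parameters():
        if name.endswith(suffix):
            # Only the dtype is restored; any precision lost in an earlier
            # low-precision cast is not recovered by this step.
            param.data = param.data.to(torch.float32)
    return module


# A module-wide cast moves everything to bf16; the helper undoes it for the bias.
router = keep_bias_fp32(ToyRouter().to(torch.bfloat16))
```

Keeping the bias in float32 while the rest of the module runs in a lower precision is a common choice for small tensors that influence expert selection.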
Parse moe_layers_enum to get set of MoE layer indices.
Parameters:
Tuple/list of layer indices, integer, comma-separated string, or None. HF Step-3.5-Flash uses tuple format like (3, 4, 5, …, 44).
Total number of hidden layers.
Returns: set[int]
Set of layer indices that should be MoE layers.
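A minimal sketch of such a parser, covering the four accepted input forms, might look like the following. The function name parse_moe_layers_sketch is hypothetical, and two behaviours are assumptions rather than documented facts: None is treated as "every layer except layer 0", and indices outside [0, num_hidden_layers) are dropped.

```python
def parse_moe_layers_sketch(moe_layers_enum, num_hidden_layers):
    """Hypothetical parser returning the set of MoE layer indices."""
    if moe_layers_enum is None:
        # Assumption: default to all layers except the first.
        return set(range(1, num_hidden_layers))

    if isinstance(moe_layers_enum, int):
        indices = [moe_layers_enum]
    elif isinstance(moe_layers_enum, str):
        # Comma-separated string such as "3,4,5"; int() tolerates whitespace.
        indices = [int(p) for p in moe_layers_enum.split(",") if p.strip()]
    else:
        # Tuple or list of indices.
        indices = [int(i) for i in moe_layers_enum]

    # Assumption: silently ignore indices outside the valid layer range.
    return {i for i in indices if 0 <= i < num_hidden_layers}


from_string = parse_moe_layers_sketch("3, 4,5", 8)
from_tuple = parse_moe_layers_sketch((3, 4), 8)
from_int = parse_moe_layers_sketch(2, 8)
from_none = parse_moe_layers_sketch(None, 4)
```

Returning a set makes the per-layer membership check cheap when the model decides whether a given block should use MoE or a dense MLP.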