nemo_automodel.components.models.glm_moe_dsa.model
Module Contents
Classes
Data
API
Bases: Module
Run the block and return (hidden_states, topk_indices).
topk_indices is this layer’s DSA selection: freshly computed on “full” layers, or
prev_topk_indices passed through unchanged on “shared” layers. Returning it lets the
caller thread it to subsequent shared layers (GLM IndexShare).
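The full/shared contract described above can be illustrated with a toy block. This is a minimal sketch, not the module's implementation: `run_block`, `layer_kind`, `k`, the "pick the k largest values" selection rule, and the "+1.0" update are all hypothetical stand-ins for the real indexer and sparse attention.

```python
from typing import List, Optional, Tuple


def run_block(
    hidden_states: List[float],
    prev_topk_indices: Optional[List[int]],
    layer_kind: str,
    k: int = 2,
) -> Tuple[List[float], List[int]]:
    """Toy stand-in for a decoder block (hypothetical, for illustration only)."""
    if layer_kind == "full":
        # "Full" layer: compute a fresh selection.
        # Assumption: the toy rule is "indices of the k largest values".
        order = sorted(
            range(len(hidden_states)),
            key=lambda i: hidden_states[i],
            reverse=True,
        )
        topk_indices = sorted(order[:k])
    elif layer_kind == "shared":
        # "Shared" layer: reuse the selection handed in by the caller.
        if prev_topk_indices is None:
            raise ValueError("a shared layer needs prev_topk_indices")
        topk_indices = prev_topk_indices
    else:
        raise ValueError(f"unknown layer kind: {layer_kind!r}")

    # Placeholder for attention + MLP: only the selected positions are updated.
    new_hidden = list(hidden_states)
    for i in topk_indices:
        new_hidden[i] += 1.0
    return new_hidden, topk_indices


hidden = [0.1, 0.9, 0.3, 0.7]
hidden, topk = run_block(hidden, None, "full")    # computes a fresh selection
hidden, topk = run_block(hidden, topk, "shared")  # passes the selection through
```

The caller, not the block, owns the running selection, which is why every block returns it even when it did not compute it.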
Bases: HFCheckpointingMixin, Module, MoEFSDPSyncMixin
True when this module is a trimmed pipeline-parallel stage (not the whole model).
Forward pass.
Single process (no pipeline parallelism): returns a
transformers.modeling_outputs.CausalLMOutputWithPast and threads the IndexShare
top-k internally (seeded with None).
Pipeline parallelism: input_ids is the upstream hidden state on non-first stages and
*carry holds the previous stage’s running top-k selection. Non-last stages return
(hidden_states, topk_indices) and the last stage returns the logits tensor.
Parameters:
input_ids: Token IDs (BSHD [B, S] / THD [1, T]) on the first stage, or the
upstream hidden state on later pipeline stages.
*carry: Optional (topk_indices,) carried from the previous pipeline stage.
Optional masks / positions.
logits_to_keep: If 0, compute logits for all positions; otherwise only for the last logits_to_keep positions.
When set (single-process), carry final hidden states on the output.
Additional arguments forwarded to the base model.
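The "0 means all positions" convention for logits_to_keep falls out naturally from Python slicing, because `slice(-0, None)` is the same as `slice(0, None)`. The sketch below shows that idiom on a plain list; `select_positions_for_logits` is a hypothetical helper, and whether this module uses exactly this slice is an assumption.

```python
from typing import List


def select_positions_for_logits(
    hidden_states: List[str], logits_to_keep: int = 0
) -> List[str]:
    """Pick the sequence positions that would be projected to logits.

    Hypothetical helper: the list stands in for the sequence dimension
    of the final hidden states.
    """
    # -0 == 0, so logits_to_keep=0 yields slice(0, None): every position.
    # Any positive value yields the trailing logits_to_keep positions.
    keep = slice(-logits_to_keep, None)
    return hidden_states[keep]


positions = ["h0", "h1", "h2", "h3"]
all_positions = select_positions_for_logits(positions, 0)
last_one = select_positions_for_logits(positions, 1)
last_two = select_positions_for_logits(positions, 2)
```

Keeping only the last position is the usual choice during generation, where only the next-token logits are needed.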
Declare PP inter-stage I/O metas, threading the IndexShare top-k as a carry tensor.
Non-first stages additionally receive the previous “full” layer’s top-k selection, and non-last stages emit the running selection. A stage that begins with a “shared” layer therefore has the top-k it needs, at any sequence length.
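The stage layout this implies can be summarized in a few lines. The sketch below only labels what each stage receives and emits, following the description above; `stage_io` and the string labels are hypothetical, and the real method declares tensor metas (shapes and dtypes) rather than names.

```python
from typing import List, Tuple

StageIO = Tuple[Tuple[str, ...], Tuple[str, ...]]


def stage_io(stage_index: int, num_stages: int) -> StageIO:
    """Return (inputs, outputs) labels for one pipeline stage (illustrative only)."""
    is_first = stage_index == 0
    is_last = stage_index == num_stages - 1

    # The first stage consumes token IDs; later stages consume the upstream
    # hidden state plus the carried top-k selection.
    inputs = ("input_ids",) if is_first else ("hidden_states", "topk_indices")

    # Non-last stages forward the hidden state and the running selection;
    # the last stage emits logits only.
    outputs = ("logits",) if is_last else ("hidden_states", "topk_indices")
    return inputs, outputs


layout: List[StageIO] = [stage_io(i, 3) for i in range(3)]
```

With a single stage the function returns `("input_ids",)` and `("logits",)`, so no carry tensor appears when there is nothing to carry between stages.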
Bases: Module
Run the decoder stack, returning (hidden_states, topk_indices).
prev_topk_indices seeds the IndexShare running selection (used under pipeline
parallelism, where an earlier “full” layer lives on the previous stage); it is None
in the single-process path. The returned topk_indices is the running selection at the
end of this stage’s layers, so it can be carried to the next pipeline stage.
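To see why the running selection has to be carried between stages, the sketch below runs a toy five-layer stack once as a whole and once split into two stages, where the second stage begins with a “shared” layer. Everything here is an assumption made for illustration: `run_stack`, `fresh_topk`, the integer "hidden states", and the subtract-3 update are hypothetical and unrelated to the real attention math.

```python
from typing import List, Optional, Sequence, Tuple


def fresh_topk(hidden_states: List[int], k: int = 2) -> List[int]:
    # Toy selection rule (assumption): indices of the k largest values.
    order = sorted(
        range(len(hidden_states)), key=lambda i: hidden_states[i], reverse=True
    )
    return sorted(order[:k])


def run_stack(
    hidden_states: List[int],
    layer_kinds: Sequence[str],
    prev_topk_indices: Optional[List[int]] = None,
) -> Tuple[List[int], Optional[List[int]]]:
    """Run a toy decoder stack, threading the running top-k selection."""
    topk_indices = prev_topk_indices  # None in the single-process path
    for kind in layer_kinds:
        if kind == "full":
            topk_indices = fresh_topk(hidden_states)
        elif topk_indices is None:
            raise ValueError("shared layer reached without a top-k selection")
        # Placeholder layer body: touch only the selected positions.
        hidden_states = [
            h - 3 if i in topk_indices else h for i, h in enumerate(hidden_states)
        ]
    return hidden_states, topk_indices


layers = ["full", "shared", "shared", "full", "shared"]
start = [5, 1, 4, 2]

# Single process: the selection is seeded with None and threaded internally.
whole = run_stack(start, layers)

# Two stages: stage 1 starts with a "shared" layer, so it needs stage 0's selection.
stage0_hidden, stage0_topk = run_stack(start, layers[:2])
piped = run_stack(stage0_hidden, layers[2:], prev_topk_indices=stage0_topk)
```

In this toy setup `piped` equals `whole`; calling `run_stack(stage0_hidden, layers[2:])` without the carry raises `ValueError`, which is the failure the carried selection prevents.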