nemo_automodel.components.models.deepseek_v3.model
Module Contents
Classes
Data
API
Bases: Module
Forward pass for the Transformer block.
Parameters:
Input tensor.
Precomputed complex exponential values for rotary embeddings.
Boolean tensor indicating padding positions.
Returns: torch.Tensor
Output tensor after block computation.
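The rotary-embedding and padding-mask inputs described above can be illustrated with a minimal sketch. It assumes plain RoPE (no frequency scaling), a base of 10000, pairing of adjacent feature dimensions, and that `True` in the mask marks a padding position. The helper names `precompute_freqs_cis` and `apply_rotary` and the `pad_token_id` value are hypothetical and are not taken from this module, whose actual implementation may differ.

```python
import torch


def precompute_freqs_cis(head_dim: int, seq_len: int, theta: float = 10000.0) -> torch.Tensor:
    """Hypothetical helper: complex exponentials exp(i * pos * freq), shape [seq_len, head_dim // 2]."""
    # One inverse frequency per pair of feature dimensions.
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(positions, inv_freq)  # [seq_len, head_dim // 2]
    # polar(r, angle) builds r * exp(i * angle); r = 1 gives a pure rotation.
    return torch.polar(torch.ones_like(angles), angles)


def apply_rotary(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    """Rotate x of shape [B, S, H, D] by treating adjacent feature pairs as complex numbers."""
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    rotated = x_c * freqs_cis[None, :, None, :]  # broadcast over batch and heads
    return torch.view_as_real(rotated).flatten(-2).type_as(x)


# Assumption: token ID 0 is the padding token, and True marks a padded position.
pad_token_id = 0
input_ids = torch.tensor([[5, 7, 9, 0, 0]])
padding_mask = input_ids == pad_token_id

freqs_cis = precompute_freqs_cis(head_dim=8, seq_len=5)
q = torch.randn(1, 5, 2, 8)
q_rot = apply_rotary(q, freqs_cis)
```

Because each complex factor has unit magnitude, the rotation changes only the phase of each feature pair, so per-vector norms are preserved and position 0 is left unchanged.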
Bases: HFCheckpointingMixin, Module, MoEFSDPSyncMixin
Forward pass returning a transformers.modeling_outputs.CausalLMOutputWithPast.
Parameters:
Input token IDs. BSHD: [B, S]; THD: [1, T] (squeezed internally).
Optional position indices.
Optional attention mask.
Optional padding mask.
If 0 (default), compute logits for all positions; otherwise
compute logits only for the last logits_to_keep positions (this avoids
materialising the full logit matrix during generation or fused cross-entropy training).
Whether to carry the final hidden states on the output.
Additional arguments forwarded to the base model (e.g. qkv_format, cu_seqlens, CP kwargs).
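The effect of logits_to_keep can be sketched as slicing the hidden states before the output projection. This is an illustration under assumptions, not this class's code: `project_logits` is a hypothetical helper, the hidden states are assumed to be in BSHD-style `[B, S, D]` layout, and the toy sizes are arbitrary.

```python
import torch
from torch import nn


def project_logits(hidden_states: torch.Tensor, lm_head: nn.Module, logits_to_keep: int = 0) -> torch.Tensor:
    """Hypothetical helper: project only the positions whose logits are needed."""
    # logits_to_keep == 0 keeps every position; otherwise keep the trailing positions only.
    keep = slice(-logits_to_keep, None) if logits_to_keep > 0 else slice(None)
    return lm_head(hidden_states[:, keep, :])


torch.manual_seed(0)
hidden = torch.randn(2, 6, 16)            # [B, S, D] final hidden states (toy sizes)
lm_head = nn.Linear(16, 100, bias=False)  # D -> vocabulary size

all_logits = project_logits(hidden, lm_head)                     # [2, 6, 100]
last_logits = project_logits(hidden, lm_head, logits_to_keep=1)  # [2, 1, 100]
```

Slicing before the projection means the `[B, S, vocab]` tensor is never allocated when only the final position is needed, which is the saving the parameter description refers to.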
Returns: CausalLMOutputWithPast
transformers.modeling_outputs.CausalLMOutputWithPast with logits and,
Bases: Module