nemo_automodel.components.models.gpt_oss.model
Module Contents
Classes
Data
API
Bases: Module
Bases: HFCheckpointingMixin, Module, MoEFSDPSyncMixin
Forward pass returning a transformers.modeling_outputs.CausalLMOutputWithPast.
Parameters:
Token IDs. BSHD: [B, S]; THD: [1, T] (squeezed internally).
Optional position indices.
Optional attention mask.
Optional padding mask.
If 0 (default), compute logits for all positions; otherwise compute logits only for the last logits_to_keep positions (used by memory-efficient fused cross-entropy and generation).
When truthy, the returned output carries the final hidden states (the input to lm_head) so the recipe can run fused cross-entropy.
Additional attention kwargs forwarded to the base model (e.g. qkv_format, cu_seqlens, seq_idx, CP kwargs).
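The THD layout described above packs several variable-length sequences into a single row of T tokens. The sketch below illustrates that idea in plain Python. It assumes cu_seqlens follows the common cumulative-sequence-length convention used by variable-length attention kernels (boundaries of each sequence inside the packed row); this page does not define it, and the helper names pack_thd and unpack_thd are hypothetical, not part of this module.

```python
from itertools import accumulate
from typing import List, Tuple


def pack_thd(sequences: List[List[int]]) -> Tuple[List[List[int]], List[int]]:
    """Pack variable-length token sequences into one [1, T] row.

    Hypothetical helper for illustration only. Returns the packed row and
    cu_seqlens, assumed here to hold cumulative sequence boundaries.
    """
    # Concatenate all tokens into a single row of length T.
    tokens = [tok for seq in sequences for tok in seq]
    # cu_seqlens[i] is the start offset of sequence i; the last entry is T.
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in sequences))
    return [tokens], cu_seqlens


def unpack_thd(packed: List[List[int]], cu_seqlens: List[int]) -> List[List[int]]:
    """Recover the original sequences from a packed [1, T] row."""
    row = packed[0]  # drop the leading dimension of size 1
    return [row[start:end] for start, end in zip(cu_seqlens, cu_seqlens[1:])]


packed, cu_seqlens = pack_thd([[11, 12, 13], [21, 22]])
print(packed)      # one row containing all five tokens
print(cu_seqlens)  # sequence boundaries inside that row
```

Under this assumption, two sequences of lengths 3 and 2 give T = 5 and three boundary values, and unpack_thd restores the original sequences from the packed row.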
Returns: CausalLMOutputWithPast
CausalLMOutputWithPast with logits and optional hidden_states.
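The effect of logits_to_keep can be illustrated with a small, dependency-free sketch. It is not this module's implementation: select_hidden_states and lm_head below are hypothetical stand-ins that use nested lists instead of tensors, and they only mirror the documented behavior (0 keeps every position, a positive value keeps the last logits_to_keep positions).

```python
from typing import List

Matrix = List[List[float]]


def select_hidden_states(hidden_states: Matrix, logits_to_keep: int = 0) -> Matrix:
    """Pick the positions that will be projected to logits.

    hidden_states is [S][H] for one sequence (hypothetical list-based stand-in).
    """
    if logits_to_keep < 0:
        raise ValueError("logits_to_keep must be non-negative")
    if logits_to_keep == 0:
        return hidden_states  # documented default: all positions
    return hidden_states[-logits_to_keep:]  # only the last positions


def lm_head(hidden_states: Matrix, weight: Matrix) -> Matrix:
    """Project hidden states to vocabulary logits; weight is [V][H]."""
    return [
        [sum(h * w for h, w in zip(row, vocab_row)) for vocab_row in weight]
        for row in hidden_states
    ]


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # S = 3, H = 2
weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # V = 3, H = 2

all_logits = lm_head(select_hidden_states(hidden), weight)
last_logits = lm_head(select_hidden_states(hidden, logits_to_keep=1), weight)
print(len(all_logits), len(last_logits))  # number of positions with logits
```

Projecting only the kept positions is what saves memory: the vocabulary-sized logits are computed for logits_to_keep rows rather than for the full sequence.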
Bases: Module