nemo_automodel.components.models.deepseek_v32.model


DeepSeek V3.2 Model.

Contains DeepseekV32Block, DeepseekV32Model, and DeepseekV32ForCausalLM. These classes subclass their DeepSeek V3 counterparts; the main difference is that they use DeepseekV32MLA (with Indexer) instead of the standard MLA.

Module Contents

Classes

| Name | Description |
| --- | --- |
| DeepseekV32Block | Transformer block for DeepSeek V3.2. |
| DeepseekV32ForCausalLM | DeepSeek V3.2 for Causal Language Modeling. |
| DeepseekV32Model | DeepSeek V3.2 Model. |

Data

ModelClass

API

class nemo_automodel.components.models.deepseek_v32.model.DeepseekV32Block(
layer_idx: int,
config: nemo_automodel.components.models.deepseek_v32.config.DeepseekV32Config,
moe_config: nemo_automodel.components.moe.config.MoEConfig,
backend: nemo_automodel.components.models.common.BackendConfig
)

Bases: Block

Transformer block for DeepSeek V3.2.

Subclasses V3 Block, using DeepseekV32MLA (with Indexer) instead of the standard MLA.

input_layernorm
is_moe_layer = layer_idx >= config.first_k_dense_replace
mlp
post_attention_layernorm
self_attn = DeepseekV32MLA(config, backend)
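
The `is_moe_layer` rule above can be illustrated with a minimal sketch. The helper name `moe_layer_flags` and the example values (6 layers, `first_k_dense_replace=3`) are hypothetical and chosen only for illustration; reading a `False` flag as "this block uses a dense MLP" is an assumption based on the parameter name, not something this page states.

```python
def moe_layer_flags(num_layers: int, first_k_dense_replace: int) -> list[bool]:
    """Return, for each layer index, the value is_moe_layer would take.

    Mirrors the documented expression:
        is_moe_layer = layer_idx >= config.first_k_dense_replace
    """
    return [layer_idx >= first_k_dense_replace for layer_idx in range(num_layers)]


# Illustrative values only; real configs define their own layer counts.
flags = moe_layer_flags(num_layers=6, first_k_dense_replace=3)
print(flags)  # [False, False, False, True, True, True]
```

With these values, the first three blocks are flagged as non-MoE and the remaining three as MoE.
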
class nemo_automodel.components.models.deepseek_v32.model.DeepseekV32ForCausalLM(
config: nemo_automodel.components.models.deepseek_v32.config.DeepseekV32Config,
moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
backend: nemo_automodel.components.models.common.BackendConfig | None = None,
kwargs = {}
)

Bases: DeepseekV3ForCausalLM

DeepSeek V3.2 for Causal Language Modeling.

Subclasses V3 ForCausalLM, using DeepseekV32Model and DeepSeekV32StateDictAdapter.

backend = backend or BackendConfig()
lm_head
model
state_dict_adapter
nemo_automodel.components.models.deepseek_v32.model.DeepseekV32ForCausalLM.forward(
input_ids: torch.Tensor,
position_ids: torch.Tensor | None = None,
attention_mask: torch.Tensor | None = None,
padding_mask: torch.Tensor | None = None,
logits_to_keep: typing.Union[int, torch.Tensor] = 0,
output_hidden_states: typing.Optional[bool] = None,
**attn_kwargs: typing.Any
) -> transformers.modeling_outputs.CausalLMOutputWithPast

Forward pass returning CausalLMOutputWithPast.

Supports both BSHD (input_ids shape [B, S] -> hidden states [B, S, H]) and THD (qkv_format == "thd"; hidden states [T, H] after the batch dim is squeezed, with logits unsqueezed back to [1, T, V] on exit).
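
The shape bookkeeping described above can be traced with plain tuples. This is a sketch under stated assumptions, not the library's code: the helper `forward_shapes` is hypothetical, the THD branch assumes the incoming batch dimension is 1 (so it can be squeezed), and the BSHD logits shape `[B, S, V]` assumes all positions are projected.

```python
def forward_shapes(batch: int, seq: int, hidden: int, vocab: int,
                   qkv_format: str = "bshd") -> dict[str, tuple[int, ...]]:
    """Trace tensor shapes through the forward pass using tuples only."""
    if qkv_format == "thd":
        # Assumption: THD input arrives as [1, T]; the batch dim is squeezed.
        if batch != 1:
            raise ValueError("this sketch assumes a batch dim of 1 for THD")
        hidden_states = (seq, hidden)   # [T, H]
        logits = (1, seq, vocab)        # unsqueezed back to [1, T, V] on exit
    else:
        hidden_states = (batch, seq, hidden)  # [B, S, H]
        logits = (batch, seq, vocab)          # assumed [B, S, V]
    return {"hidden_states": hidden_states, "logits": logits}


bshd = forward_shapes(batch=2, seq=16, hidden=64, vocab=100)
thd = forward_shapes(batch=1, seq=32, hidden=64, vocab=100, qkv_format="thd")
```
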

Parameters:

input_ids
torch.Tensor

Input token IDs.

position_ids
torch.Tensor | None. Defaults to None

Optional position indices.

attention_mask
torch.Tensor | None. Defaults to None

Optional attention mask.

padding_mask
torch.Tensor | None. Defaults to None

Optional padding mask.

logits_to_keep
Union[int, torch.Tensor]. Defaults to 0

If 0 (default), all positions are projected through lm_head. If a positive int, only the last logits_to_keep positions are projected; if a tensor of indices, only those positions are projected (memory-efficient generation / fused-CE training).

output_hidden_states
Optional[bool]. Defaults to None

When truthy, the returned output carries the final (pre-lm_head) hidden states spanning the full sequence.

**attn_kwargs
Any. Defaults to {}

Additional attention kwargs forwarded to the base model.
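
The `logits_to_keep` behavior described in the parameter list can be sketched with ordinary Python slicing. The helper `positions_to_project` is hypothetical, and a plain list of indices stands in for the tensor case. The `slice(-logits_to_keep, None)` idiom follows the convention used in Hugging Face Transformers; whether this class uses exactly that expression is an assumption.

```python
def positions_to_project(seq_len: int, logits_to_keep=0) -> list[int]:
    """Return the sequence positions that would be passed through lm_head."""
    positions = list(range(seq_len))
    if isinstance(logits_to_keep, int):
        # -0 == 0, so slice(-0, None) is slice(0, None): 0 keeps every position.
        return positions[slice(-logits_to_keep, None)]
    # Otherwise treat logits_to_keep as explicit indices (a list stands in
    # for the tensor of indices here).
    return [positions[i] for i in logits_to_keep]


all_positions = positions_to_project(5)           # default: project everything
last_two = positions_to_project(5, 2)             # only the final two positions
picked = positions_to_project(5, [0, 4])          # explicit indices
```

During generation, keeping only the last position avoids materializing a `[B, S, V]` logits tensor when just the next-token distribution is needed.
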

Returns: CausalLMOutputWithPast

A transformers.modeling_outputs.CausalLMOutputWithPast instance.

nemo_automodel.components.models.deepseek_v32.model.DeepseekV32ForCausalLM.from_config(
config: nemo_automodel.components.models.deepseek_v32.config.DeepseekV32Config,
moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
backend: nemo_automodel.components.models.common.BackendConfig | None = None,
kwargs = {}
)
classmethod
nemo_automodel.components.models.deepseek_v32.model.DeepseekV32ForCausalLM.from_pretrained(
pretrained_model_name_or_path: str,
model_args = (),
kwargs = {}
)
classmethod
nemo_automodel.components.models.deepseek_v32.model.DeepseekV32ForCausalLM.get_input_embeddings()
nemo_automodel.components.models.deepseek_v32.model.DeepseekV32ForCausalLM.get_output_embeddings()
nemo_automodel.components.models.deepseek_v32.model.DeepseekV32ForCausalLM.set_input_embeddings(
value
)
nemo_automodel.components.models.deepseek_v32.model.DeepseekV32ForCausalLM.set_output_embeddings(
new_embeddings
)
class nemo_automodel.components.models.deepseek_v32.model.DeepseekV32Model(
config: nemo_automodel.components.models.deepseek_v32.config.DeepseekV32Config,
backend: nemo_automodel.components.models.common.BackendConfig,
moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
moe_overrides: dict | None = None
)

Bases: DeepseekV3Model

DeepSeek V3.2 Model.

Subclasses V3 Model, using DeepseekV32Block instead of Block.
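
The "subclass and swap the block" relationship can be sketched with toy classes. Everything here is a hypothetical stand-in: the `Toy*` classes, the `block_cls` hook, and the string-keyed `layers` dict (loosely modeled on the `torch.nn.ModuleDict` attribute listed for this class) are illustrative only, and the real class may instead rebuild its layers directly in `__init__`.

```python
class ToyBlock:
    """Stand-in for the V3 Block (standard MLA)."""
    attention = "MLA"

    def __init__(self, layer_idx: int):
        self.layer_idx = layer_idx


class ToyV32Block(ToyBlock):
    """Stand-in for DeepseekV32Block (MLA with Indexer)."""
    attention = "MLA + Indexer"


class ToyV3Model:
    """Stand-in for the V3 model; builds one block per layer."""
    block_cls = ToyBlock  # hypothetical hook, not part of the documented API

    def __init__(self, num_layers: int):
        self.layers = {}
        for layer_idx in range(num_layers):
            # String keys, as a ModuleDict would require.
            self.layers[str(layer_idx)] = self.block_cls(layer_idx)


class ToyV32Model(ToyV3Model):
    """Only the block type changes; everything else is inherited."""
    block_cls = ToyV32Block


model = ToyV32Model(num_layers=2)
kinds = [type(block).__name__ for block in model.layers.values()]
```
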

embed_tokens
layers = torch.nn.ModuleDict()
max_seq_len = config.max_position_embeddings
moe_config = moe_config or MoEConfig(**moe_defaults)
norm
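
The `moe_config or MoEConfig(**moe_defaults)` fallback listed above can be sketched as follows. This is a minimal illustration, not the library's logic: `ToyMoEConfig`, its two fields, and the default values are hypothetical, and merging `moe_overrides` into the defaults before constructing the fallback is an assumption about how that constructor argument is used.

```python
from dataclasses import dataclass


@dataclass
class ToyMoEConfig:
    """Hypothetical stand-in for MoEConfig with two illustrative fields."""
    n_routed_experts: int
    n_activated_experts: int


def resolve_moe_config(moe_config=None, moe_overrides=None):
    """Return the explicit config if given, else one built from defaults."""
    # Illustrative defaults; the real ones are derived from the model config.
    moe_defaults = {"n_routed_experts": 8, "n_activated_experts": 2}
    if moe_overrides:
        # Assumption: overrides are merged over the defaults.
        moe_defaults.update(moe_overrides)
    # Same shape as the documented expression.
    return moe_config or ToyMoEConfig(**moe_defaults)


default_cfg = resolve_moe_config()
overridden = resolve_moe_config(moe_overrides={"n_routed_experts": 16})
explicit = ToyMoEConfig(n_routed_experts=4, n_activated_experts=1)
resolved = resolve_moe_config(moe_config=explicit)
```

In this sketch an explicitly passed `moe_config` wins outright, and any overrides are ignored in that case.
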
nemo_automodel.components.models.deepseek_v32.model.ModelClass = DeepseekV32ForCausalLM