nemo_automodel.components.models.deepseek_v32.model
DeepSeek V3.2 Model.
Contains DeepseekV32Block, DeepseekV32Model, and DeepseekV32ForCausalLM. These classes subclass their DeepSeek V3 counterparts; the main difference is that they use DeepseekV32MLA (with Indexer) instead of the standard MLA.
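The subclassing relationship described above can be illustrated with stand-in classes. This is a minimal sketch, not the library's code: every class name below, and the `attention_cls` / `block_cls` hooks, are hypothetical, and the real classes may swap components by overriding `__init__` instead.

```python
class StandardAttention:
    """Stand-in for the standard MLA module (hypothetical)."""
    name = "MLA"


class AttentionWithIndexer(StandardAttention):
    """Stand-in for an MLA variant that also owns an indexer (hypothetical)."""
    name = "MLA+Indexer"


class BaseBlock:
    # Hook that subclasses override to change the attention module.
    attention_cls = StandardAttention

    def __init__(self):
        self.attn = self.attention_cls()


class V32StyleBlock(BaseBlock):
    # Only the attention module changes; everything else is inherited.
    attention_cls = AttentionWithIndexer


class BaseModel:
    # Hook that subclasses override to change the block type.
    block_cls = BaseBlock

    def __init__(self, num_layers):
        self.layers = [self.block_cls() for _ in range(num_layers)]


class V32StyleModel(BaseModel):
    block_cls = V32StyleBlock


model = V32StyleModel(num_layers=2)
layer_attention = [layer.attn.name for layer in model.layers]
```

Each level replaces one component and inherits the rest, which is the shape of the V3 to V3.2 relationship this module describes.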
Module Contents
Classes
Data
API
class DeepseekV32Block
Bases: Block
Transformer block for DeepSeek V3.2.
Subclasses V3 Block, using DeepseekV32MLA (with Indexer) instead of the standard MLA.
class DeepseekV32ForCausalLM
Bases: DeepseekV3ForCausalLM
DeepSeek V3.2 for Causal Language Modeling.
Subclasses V3 ForCausalLM, using DeepseekV32Model and DeepSeekV32StateDictAdapter.
Forward pass returning CausalLMOutputWithPast.
Supports two layouts. In BSHD, input_ids has shape [B, S] and the hidden states have shape [B, S, H]. In THD (qkv_format == "thd"), the hidden states have shape [T, H] after the batch dim is squeezed, and the logits are unsqueezed back to [1, T, V] on exit.
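The shape bookkeeping for the two layouts can be sketched without any tensor library. The helper `forward_shapes` is hypothetical. It assumes that THD inputs arrive as a packed [1, T] batch and that BSHD logits come out as [B, S, V] when all positions are projected; neither detail is confirmed by this page.

```python
def forward_shapes(input_shape, hidden_size, vocab_size, qkv_format="bshd"):
    """Return (hidden_shape, logits_shape) for a given input_ids shape.

    Hypothetical helper: it tracks shapes only and runs no model.
    """
    batch, length = input_shape
    if qkv_format == "thd":
        # Assumption: packed sequences arrive with a batch dim of 1.
        if batch != 1:
            raise ValueError("this THD sketch expects input_ids of shape [1, T]")
        hidden = (length, hidden_size)        # batch dim squeezed: [T, H]
        logits = (1, length, vocab_size)      # unsqueezed on exit: [1, T, V]
    else:
        hidden = (batch, length, hidden_size)  # [B, S, H]
        logits = (batch, length, vocab_size)   # assumed [B, S, V]
    return hidden, logits


bshd_hidden, bshd_logits = forward_shapes((2, 8), hidden_size=16, vocab_size=100)
thd_hidden, thd_logits = forward_shapes((1, 10), 16, 100, qkv_format="thd")
```

In both layouts the caller receives rank-3 logits, so downstream code does not need to branch on the format.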
Parameters:
Input token IDs.
Optional position indices.
Optional attention mask.
Optional padding mask.
If 0 (default), project all positions; if > 0 (or a tensor of indices), only the last logits_to_keep positions are projected through lm_head (memory-efficient generation / fused-CE training).
When truthy, the returned output carries the final (pre-lm_head) hidden states spanning the full sequence.
Additional attention kwargs forwarded to the base model.
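The logits_to_keep behaviour described above matches a slicing idiom often used in causal LM heads, where `slice(-0, None)` selects everything. The sketch below applies that idiom to a plain Python list standing in for the sequence dimension. `select_positions` is a hypothetical helper, and whether this module slices in exactly this way is an assumption.

```python
def select_positions(hidden_states, logits_to_keep=0):
    """Choose which sequence positions would be projected through lm_head.

    hidden_states: a list standing in for one sequence of hidden vectors.
    logits_to_keep: an int (0 keeps all) or an iterable of explicit indices.
    """
    if isinstance(logits_to_keep, int):
        # -0 == 0, so slice(-0, None) is slice(0, None): every position is kept.
        return hidden_states[slice(-logits_to_keep, None)]
    # Explicit indices: keep exactly the listed positions, in the given order.
    return [hidden_states[i] for i in logits_to_keep]


positions = ["h0", "h1", "h2", "h3"]
keep_all = select_positions(positions)           # default: all positions
keep_last = select_positions(positions, 1)       # last position only
keep_some = select_positions(positions, [0, 3])  # explicit indices
```

Keeping only the last position is the usual choice during generation, since only the next-token logits are needed and the vocabulary projection for earlier positions can be skipped.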
Returns: CausalLMOutputWithPast
transformers.modeling_outputs.CausalLMOutputWithPast with
class DeepseekV32Model
Bases: DeepseekV3Model
DeepSeek V3.2 Model.
Subclasses V3 Model, using DeepseekV32Block instead of Block.