nemo_automodel.components.models.deepseek_v32.state_dict_adapter#

State dict adapter for DeepSeek V3.2.

Extends DeepSeekV3StateDictAdapter with mappings for the new Indexer weights.

Module Contents#

Classes#

DeepSeekV32StateDictAdapter

State dict adapter for DeepSeek V3.2.

API#

class nemo_automodel.components.models.deepseek_v32.state_dict_adapter.DeepSeekV32StateDictAdapter(
config: transformers.DeepseekV3Config,
moe_config: nemo_automodel.components.moe.config.MoEConfig,
backend: nemo_automodel.components.models.common.BackendConfig,
dtype: torch.dtype = torch.float32,
)#

Bases: nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter

State dict adapter for DeepSeek V3.2.

Initialization

_base_non_quantized_keys#

['input_layernorm.weight', 'post_attention_layernorm.weight', 'norm.weight', 'lm_head.weight', 'embe…

_indexer_non_quantized_keys#

['indexer.k_norm.weight', 'indexer.k_norm.bias', 'indexer.weights_proj.weight']

property _non_quantized_keys: list[str]#

Get the full list of non-quantized keys including indexer keys.

_add_quantization_scale_inv_tensors(
state_dict: dict[str, Any],
) → dict[str, Any]#

Add quantization scale tensors, handling indexer-specific keys.
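A hedged sketch of the key-level behavior this method implies: quantized weights gain a `<name>_scale_inv` companion entry, while keys matching the non-quantized suffixes (including the indexer LayerNorm keys) are left alone. The helper name and suffix-matching rule are assumptions for illustration, not the real signature:

```python
def add_scale_inv_keys(keys: list[str], non_quantized_suffixes: list[str]) -> list[str]:
    """Return `keys` extended with the "<name>_scale_inv" companions that
    block quantization expects, skipping non-quantized suffixes."""
    out = list(keys)
    for name in keys:
        if name.endswith(".weight") and not any(
            name.endswith(suffix) for suffix in non_quantized_suffixes
        ):
            # Quantized weight: emit a companion inverse-scale entry.
            out.append(name + "_scale_inv")
    return out
```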

convert_single_tensor_to_hf(
fqn: str,
tensor: Any,
**kwargs,
) → list[tuple[str, Any]]#

Convert a single tensor from native format to HuggingFace format.

Handles both standard V3 tensors and V3.2 indexer tensors, ensuring indexer LayerNorm weights are not quantized.
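The contract above (one native FQN in, a list of `(hf_name, tensor)` pairs out) can be sketched as follows. The `model.` prefix and the scale-companion rule are illustrative assumptions; only the return shape and the "indexer tensors are never quantized" behavior come from the documentation:

```python
from typing import Any


def convert_single_tensor_to_hf_sketch(fqn: str, tensor: Any) -> list[tuple[str, Any]]:
    """Illustrative only: map one native FQN to HF-style (name, tensor) pairs.

    The prefixing and scale_inv pairing below are assumptions, not the
    adapter's real mapping.
    """
    pairs = [("model." + fqn, tensor)]
    if fqn.endswith(".weight") and ".indexer." not in fqn:
        # Quantized weights fan out to an extra scale entry; V3.2 indexer
        # tensors (e.g. indexer LayerNorm weights) deliberately do not.
        pairs.append(("model." + fqn + "_scale_inv", tensor))
    return pairs
```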