nemo_automodel.components.models.deepseek_v32.state_dict_adapter#
State dict adapter for DeepSeek V3.2.
Extends DeepSeekV3StateDictAdapter with mappings for the new Indexer weights.
Module Contents#
Classes#
DeepSeekV32StateDictAdapter | State dict adapter for DeepSeek V3.2.
API#
- class nemo_automodel.components.models.deepseek_v32.state_dict_adapter.DeepSeekV32StateDictAdapter(
- config: transformers.DeepseekV3Config,
- moe_config: nemo_automodel.components.moe.config.MoEConfig,
- backend: nemo_automodel.components.models.common.BackendConfig,
- dtype: torch.dtype = torch.float32,
- )
Bases: nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter

State dict adapter for DeepSeek V3.2.
Initialization
- _base_non_quantized_keys#
['input_layernorm.weight', 'post_attention_layernorm.weight', 'norm.weight', 'lm_head.weight', 'embe…]
- _indexer_non_quantized_keys#
['indexer.k_norm.weight', 'indexer.k_norm.bias', 'indexer.weights_proj.weight']
- property _non_quantized_keys: list[str]#
Get the full list of non-quantized keys including indexer keys.
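To make the attribute relationship concrete, here is a minimal, dependency-free sketch of how such a property could merge the base V3 list with the indexer-specific entries. The names and list contents below are illustrative (the base list in this doc is truncated); the actual implementation in nemo_automodel may differ.

```python
# Illustrative sketch, not the real adapter code: the V3.2 adapter extends
# the inherited V3 list of non-quantized keys with the indexer-specific ones.
_base_non_quantized_keys = [
    "input_layernorm.weight",
    "post_attention_layernorm.weight",
    "norm.weight",
    "lm_head.weight",
]
_indexer_non_quantized_keys = [
    "indexer.k_norm.weight",
    "indexer.k_norm.bias",
    "indexer.weights_proj.weight",
]


def non_quantized_keys() -> list[str]:
    # Full list of key suffixes that must stay in full precision.
    return _base_non_quantized_keys + _indexer_non_quantized_keys
```

A caller checking whether a given weight should be quantized would test the tensor's fully qualified name against each suffix in this combined list.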
- _add_quantization_scale_inv_tensors(
- state_dict: dict[str, Any],
- )
Add quantization `scale_inv` tensors to the state dict, handling the indexer-specific non-quantized keys.
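As a rough sketch of the idea behind this method: quantized weights in DeepSeek-style checkpoints carry a companion `weight_scale_inv` entry (one inverse-scale value per block), while keys matching the non-quantized suffixes are skipped. The function below operates on plain shape tuples to stay self-contained; the function name, the suffix list, and the 128x128 block size are assumptions for illustration, not the library's actual API.

```python
# Hypothetical sketch of adding scale_inv companions, working on shapes
# rather than torch tensors so it runs without dependencies.
NON_QUANTIZED_SUFFIXES = (
    "input_layernorm.weight",
    "norm.weight",
    "indexer.k_norm.weight",
    "indexer.weights_proj.weight",
)

BLOCK = 128  # assumed quantization block size, per-(BLOCK, BLOCK) tile


def add_scale_inv_keys(shapes: dict[str, tuple]) -> dict[str, tuple]:
    out = dict(shapes)
    for fqn, shape in shapes.items():
        if not fqn.endswith(".weight"):
            continue
        if fqn.endswith(NON_QUANTIZED_SUFFIXES):
            continue  # e.g. indexer LayerNorm weights stay unquantized
        # One inverse-scale entry per block; ceil-divide each dimension.
        rows = (shape[0] + BLOCK - 1) // BLOCK
        cols = (shape[1] + BLOCK - 1) // BLOCK if len(shape) > 1 else 1
        out[fqn + "_scale_inv"] = (rows, cols)
    return out
```

The key point the real method captures is the skip condition: indexer keys must never receive a `scale_inv` companion, otherwise loading would treat their full-precision weights as quantized.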
- convert_single_tensor_to_hf(
- fqn: str,
- tensor: Any,
- **kwargs,
- )
Convert a single tensor from native format to HuggingFace format.
Handles both standard V3 tensors and V3.2 indexer tensors, ensuring indexer LayerNorm weights are not quantized.
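A simplified sketch of that dispatch logic, assuming the quantized-weight convention described above: a quantized weight yields both the weight and a `scale_inv` companion on the HF side, while indexer LayerNorm weights pass through untouched. The function signature and the placeholder scale value are illustrative only, not the adapter's real interface.

```python
# Hypothetical sketch of per-tensor native -> HF conversion.
INDEXER_NON_QUANTIZED = (
    "indexer.k_norm.weight",
    "indexer.k_norm.bias",
    "indexer.weights_proj.weight",
)


def convert_single_tensor_to_hf(fqn: str, tensor, quantize: bool = True) -> dict:
    """Return the HF-side entries produced for one native tensor."""
    out = {fqn: tensor}
    if quantize and fqn.endswith(".weight") and not fqn.endswith(INDEXER_NON_QUANTIZED):
        # Quantized weights carry a companion inverse-scale tensor in the
        # HF checkpoint; 1.0 is a placeholder for the real per-block scales.
        out[fqn + "_scale_inv"] = 1.0
    return out
```

Note how the indexer check guards the quantization branch: a V3.2 indexer norm weight produces a single pass-through entry, exactly the behavior the docstring above describes.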