nemo_automodel.components.models.deepseek_v3.state_dict_adapter#
Module Contents#
Classes#
DeepSeekV3StateDictAdapter
Functions#
dequantize_from_fp8: Minimal FP8 dequantization: cast to dtype and divide by inverse scale. Broadcasts scale_inv over the last dimension of weight.
calculate_scale_shape: Compute expected shape for per-row inverse scales.
API#
- class nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter(
- config: transformers.DeepseekV3Config,
- moe_config: nemo_automodel.components.moe.layers.MoEConfig,
- backend: nemo_automodel.components.moe.utils.BackendConfig,
- dtype: torch.dtype = torch.float32,
- )
Bases:
nemo_automodel.components.moe.state_dict_mixin.MoEStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter
- _dequantize(
- state_dict: dict[str, Any],
- )
- _add_quantization_scale_inv_tensors(
- state_dict: dict[str, Any],
- )
- to_hf(
- state_dict: dict[str, Any],
- exclude_key_regex: Optional[str] = None,
- )
Convert from native model state dict to HuggingFace format. Automatically detects format based on backend.enable_deepep configuration.
- from_hf(
- hf_state_dict: dict[str, Any],
- device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
- target_format: str = 'auto',
- )
Convert HF checkpoint to native format:
- Dequantize FP8 tensors if scale_inv buffers are provided
- Aggregate per-expert weights into grouped tensors
- If device_mesh is provided, only load the experts needed for the current rank
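The dequantization step above can be sketched in plain PyTorch. This is an illustrative sketch only: the `_scale_inv` key suffix and the helper name are assumptions, not the adapter's actual API, and expert aggregation plus device_mesh filtering are omitted.

```python
import torch


def dequantize_hf_state_dict(hf_state_dict, dtype=torch.float32):
    # Hypothetical sketch: for every weight that has a matching
    # "<key>_scale_inv" buffer, cast and divide by the inverse scale
    # (broadcast over the last dim), then drop the scale buffers.
    result = {}
    for key, value in hf_state_dict.items():
        if key.endswith("_scale_inv"):
            continue  # consumed together with its weight below
        scale_key = key + "_scale_inv"
        if scale_key in hf_state_dict:
            result[key] = value.to(dtype) / hf_state_dict[scale_key].to(dtype)
        else:
            result[key] = value  # no scale buffer: pass through unchanged
    return result


sd = {
    "w": torch.ones(2, 4),
    "w_scale_inv": torch.full((2, 1), 2.0),
    "b": torch.zeros(2),
}
native = dequantize_hf_state_dict(sd)
```

After conversion, `native` contains `"w"` and `"b"` but no scale buffers, with `"w"` dequantized in place.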
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.dequantize_from_fp8(
- weight: torch.Tensor,
- scale_inv: torch.Tensor,
- dtype: torch.dtype = torch.float32,
- )
Minimal FP8 dequantization: cast to dtype and divide by inverse scale. Broadcasts scale_inv over the last dimension of weight.
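The operation described above reduces to a cast and a broadcast divide. A minimal sketch, with float32 stand-ins for the FP8 inputs (a real checkpoint would hold `torch.float8_e4m3fn` tensors):

```python
import torch


def dequantize_from_fp8(weight, scale_inv, dtype=torch.float32):
    # Cast to the target dtype, then divide by the inverse scale;
    # scale_inv of shape [out, 1] broadcasts over weight's last dim.
    return weight.to(dtype) / scale_inv.to(dtype)


w = torch.ones(4, 8)               # stand-in for an FP8 weight
s = torch.full((4, 1), 2.0)        # per-row inverse scales
out = dequantize_from_fp8(w, s)    # shape (4, 8), values 0.5
```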
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.calculate_scale_shape(weight: torch.Tensor) -> tuple[int, ...]#
Compute expected shape for per-row inverse scales.
- 2D [out, in] -> [out, 1]
- 3D [N, out, in] -> [N, out, 1]
- Fallback: last dim collapsed to 1
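All three cases above follow one rule: keep the leading dims and collapse the last dim to 1. A minimal sketch, assuming the function is implemented exactly that way:

```python
import torch


def calculate_scale_shape(weight: torch.Tensor) -> tuple[int, ...]:
    # Keep every leading dim, collapse the last dim to 1
    # (covers the 2D, 3D, and fallback cases uniformly).
    return tuple(weight.shape[:-1]) + (1,)


print(calculate_scale_shape(torch.empty(64, 128)))     # (64, 1)
print(calculate_scale_shape(torch.empty(8, 64, 128)))  # (8, 64, 1)
```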