nemo_automodel.components.models.deepseek_v3.state_dict_adapter#

Module Contents#

Classes#

Functions#

dequantize_from_fp8

Minimal FP8 dequantization: cast to dtype and divide by inverse scale. Broadcasts scale_inv over the last dimension of weight.

calculate_scale_shape

Compute expected shape for per-row inverse scales.

API#

class nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter(
config: transformers.DeepseekV3Config,
moe_config: nemo_automodel.components.moe.layers.MoEConfig,
backend: nemo_automodel.components.moe.utils.BackendConfig,
dtype: torch.dtype = torch.float32,
)#

Bases: nemo_automodel.components.moe.state_dict_mixin.MoEStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

_dequantize(
state_dict: dict[str, Any],
) → dict[str, Any]#
_add_quantization_scale_inv_tensors(
state_dict: dict[str, Any],
) → dict[str, Any]#
to_hf(
state_dict: dict[str, Any],
exclude_key_regex: Optional[str] = None,
) → dict[str, Any]#

Convert from native model state dict to HuggingFace format. Automatically detects format based on backend.enable_deepep configuration.
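A minimal sketch of the grouped-to-per-expert direction this conversion implies: a native grouped expert tensor of shape `[N, out, in]` is split back into individual HF-style entries. The key pattern `mlp.experts.{e}.gate_proj.weight` is hypothetical, used only to illustrate the shape transformation.

```python
import torch

# Hypothetical grouped expert weight in the native format: [N, out, in].
grouped = torch.randn(4, 6, 3)

# Split along the expert dimension into per-expert [out, in] tensors,
# keyed with an HF-style name pattern (pattern is illustrative only).
hf_sd = {
    f"mlp.experts.{e}.gate_proj.weight": w
    for e, w in enumerate(grouped.unbind(dim=0))
}
```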

from_hf(
hf_state_dict: dict[str, Any],
device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
target_format: str = 'auto',
) → dict[str, Any]#

Convert HF checkpoint to native format.

  • Dequantize FP8 tensors if scale_inv buffers are provided

  • Aggregate per-expert weights into grouped tensors

  • If device_mesh is provided, only load experts needed for the current rank
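The aggregation step above can be sketched as stacking per-expert weights along a new leading expert dimension. The checkpoint keys here are hypothetical stand-ins, not the actual DeepSeek-V3 HF key names.

```python
import torch

# Hypothetical per-expert HF entries, each of shape [out, in].
n_experts, d_out, d_in = 4, 6, 3
hf_sd = {
    f"mlp.experts.{e}.gate_proj.weight": torch.randn(d_out, d_in)
    for e in range(n_experts)
}

# Aggregate into one grouped tensor of shape [N, out, in].
grouped = torch.stack(
    [hf_sd[f"mlp.experts.{e}.gate_proj.weight"] for e in range(n_experts)],
    dim=0,
)
```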

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.dequantize_from_fp8(
weight: torch.Tensor,
scale_inv: torch.Tensor,
dtype: torch.dtype = torch.float32,
) → torch.Tensor#

Minimal FP8 dequantization: cast to dtype and divide by inverse scale. Broadcasts scale_inv over the last dimension of weight.

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.calculate_scale_shape(weight: torch.Tensor) → tuple[int, ...]#

Compute expected shape for per-row inverse scales.

  • 2D [out, in] -> [out, 1]

  • 3D [N, out, in] -> [N, out, 1]

  • Fallback: last dim collapsed to 1
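The shape rules above reduce to keeping every dimension except the last, which collapses to 1. A minimal sketch:

```python
import torch

def calculate_scale_shape_sketch(weight: torch.Tensor) -> tuple[int, ...]:
    # Per-row inverse scales: keep all leading dims, collapse the last
    # dim to 1 so the scale broadcasts during dequantization.
    return (*weight.shape[:-1], 1)

shape_2d = calculate_scale_shape_sketch(torch.empty(16, 32))    # (16, 1)
shape_3d = calculate_scale_shape_sketch(torch.empty(8, 16, 32)) # (8, 16, 1)
```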