nemo_automodel.components.models.deepseek_v3.state_dict_adapter#
Module Contents#
Classes#
DeepSeekV3StateDictAdapter
Functions#
dequantize_from_fp8: Minimal FP8 dequantization: cast to dtype and divide by inverse scale. Broadcasts scale_inv over the last dimension of weight.
calculate_scale_shape: Compute expected shape for per-row inverse scales.
API#
- class nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter(
- config: transformers.DeepseekV3Config,
- moe_config: nemo_automodel.components.moe.layers.MoEConfig,
- backend: nemo_automodel.components.moe.utils.BackendConfig,
- dtype: torch.dtype = torch.float32,
- )
Bases:
nemo_automodel.components.moe.state_dict_mixin.MoEStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter
- _dequantize(
- state_dict: dict[str, Any],
- )
- _add_quantization_scale_inv_tensors(
- state_dict: dict[str, Any],
- )
- to_hf(
- state_dict: dict[str, Any],
- exclude_key_regex: Optional[str] = None,
- )
Convert from native model state dict to HuggingFace format. Automatically detects format based on backend.enable_deepep configuration.
- from_hf(
- hf_state_dict: dict[str, Any],
- device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
- target_format: str = 'auto',
- )
Convert HF checkpoint to native format:
- Dequantize FP8 tensors if scale_inv buffers are provided
- Aggregate per-expert weights into grouped tensors
- If device_mesh is provided, only load the experts needed for the current rank
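The dequantization step above can be sketched in plain PyTorch. This is an illustrative sketch only: the `_scale_inv` key suffix and the helper name are assumptions, not the adapter's actual API, and expert aggregation plus device_mesh filtering are omitted.

```python
import torch


def dequantize_hf_state_dict(hf_state_dict, dtype=torch.float32):
    # Hypothetical sketch: for every weight that has a matching
    # "<key>_scale_inv" buffer, cast and divide by the inverse scale
    # (broadcast over the last dim), then drop the scale buffers.
    result = {}
    for key, value in hf_state_dict.items():
        if key.endswith("_scale_inv"):
            continue  # consumed together with its weight below
        scale_key = key + "_scale_inv"
        if scale_key in hf_state_dict:
            result[key] = value.to(dtype) / hf_state_dict[scale_key].to(dtype)
        else:
            result[key] = value  # no scale buffer: pass through unchanged
    return result


sd = {
    "w": torch.ones(2, 4),
    "w_scale_inv": torch.full((2, 1), 2.0),
    "b": torch.zeros(2),
}
native = dequantize_hf_state_dict(sd)
```

After conversion, `native` contains `"w"` and `"b"` but no scale buffers, with `"w"` dequantized in place.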
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.dequantize_from_fp8(
- weight: torch.Tensor,
- scale_inv: torch.Tensor,
- dtype: torch.dtype = torch.float32,
- )
Minimal FP8 dequantization: cast to dtype and divide by inverse scale. Broadcasts scale_inv over the last dimension of weight.
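The operation described above reduces to a cast and a broadcast divide. A minimal sketch, with float32 stand-ins for the FP8 inputs (a real checkpoint would hold `torch.float8_e4m3fn` tensors):

```python
import torch


def dequantize_from_fp8(weight, scale_inv, dtype=torch.float32):
    # Cast to the target dtype, then divide by the inverse scale;
    # scale_inv of shape [out, 1] broadcasts over weight's last dim.
    return weight.to(dtype) / scale_inv.to(dtype)


w = torch.ones(4, 8)               # stand-in for an FP8 weight
s = torch.full((4, 1), 2.0)        # per-row inverse scales
out = dequantize_from_fp8(w, s)    # shape (4, 8), values 0.5
```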
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.calculate_scale_shape(weight: torch.Tensor) -> tuple[int, ...]#
Compute expected shape for per-row inverse scales.
- 2D [out, in] -> [out, 1]
- 3D [N, out, in] -> [N, out, 1]
- Fallback: last dim collapsed to 1
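All three cases above follow one rule: keep the leading dims and collapse the last dim to 1. A minimal sketch, assuming the function is implemented exactly that way:

```python
import torch


def calculate_scale_shape(weight: torch.Tensor) -> tuple[int, ...]:
    # Keep every leading dim, collapse the last dim to 1
    # (covers the 2D, 3D, and fallback cases uniformly).
    return tuple(weight.shape[:-1]) + (1,)


print(calculate_scale_shape(torch.empty(64, 128)))     # (64, 1)
print(calculate_scale_shape(torch.empty(8, 64, 128)))  # (8, 64, 1)
```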