nemo_automodel.components.models.deepseek_v3.state_dict_adapter

Module Contents

Classes

- DeepSeekV3StateDictAdapter

Functions

- calculate_scale_shape
- dequantize_from_fp8

Data

- logger
- BLOCK_SIZE

API
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.logger
  'getLogger(...)'
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.BLOCK_SIZE
  128
- class nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter(
  - config: transformers.DeepseekV3Config,
  - moe_config: nemo_automodel.components.moe.layers.MoEConfig,
  - backend: nemo_automodel.components.moe.utils.BackendConfig,
  - dtype: torch.dtype = torch.float32,
  )

  Bases: nemo_automodel.components.moe.state_dict_mixin.MoESplitExpertsStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter
- _dequantize(
  - state_dict: dict[str, Any],
  )
- _add_quantization_scale_inv_tensors(
  - state_dict: dict[str, Any],
  )
- to_hf(
  - state_dict: dict[str, Any],
  - exclude_key_regex: Optional[str] = None,
  - quantization: bool = False,
  - **kwargs,
  )

  Convert a native model state dict to HuggingFace format. The state dict layout is detected automatically from the backend.enable_deepep configuration.
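  A minimal usage sketch, for illustration only: `cfg`, `moe_cfg`, `backend_cfg`, and `model` are placeholder objects assumed to exist, and the `exclude_key_regex` pattern is an arbitrary example, not something this module prescribes.

  ```python
  from nemo_automodel.components.models.deepseek_v3.state_dict_adapter import (
      DeepSeekV3StateDictAdapter,
  )

  # cfg, moe_cfg, backend_cfg, and model are placeholders assumed to be
  # constructed elsewhere in your setup.
  adapter = DeepSeekV3StateDictAdapter(
      config=cfg,
      moe_config=moe_cfg,
      backend=backend_cfg,
  )

  # Export the native state dict in HuggingFace layout; keys matching the
  # (illustrative) regex are excluded from the output.
  hf_state = adapter.to_hf(
      model.state_dict(),
      exclude_key_regex=r".*rotary_emb.*",
      quantization=False,
  )
  ```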
- from_hf(
  - hf_state_dict: dict[str, Any],
  - device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
  - **kwargs,
  )

  Convert an HF checkpoint to native format:
  - Dequantizes FP8 tensors when scale_inv buffers are provided.
  - Aggregates per-expert weights into grouped tensors.
  - If device_mesh is provided, loads only the experts needed by the current rank.
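  A sketch of expert-parallel loading, continuing the example above: `adapter`, `hf_state_dict`, and `model` are placeholders, and the one-dimensional 8-way "ep" mesh is an assumed layout (a real mesh comes from your distributed launch, e.g. under torchrun), not a requirement of this API.

  ```python
  from torch.distributed.device_mesh import init_device_mesh

  # Illustrative 8-way expert-parallel mesh; requires a matching
  # distributed world size at runtime.
  mesh = init_device_mesh("cuda", (8,), mesh_dim_names=("ep",))

  # With a mesh supplied, only the experts owned by this rank are loaded.
  native_state = adapter.from_hf(hf_state_dict, device_mesh=mesh)
  model.load_state_dict(native_state)
  ```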
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.calculate_scale_shape(
  - weight: torch.Tensor,
  - BLOCK_SIZE: int = BLOCK_SIZE,
  )
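  Assuming the usual ceil-division tiling of 128x128 block-wise FP8 quantization (one inverse scale per block), the scale tensor shape is the weight shape divided by BLOCK_SIZE and rounded up; a quick sanity check of that expectation:

  ```python
  import math
  import torch

  weight = torch.empty(7168, 2048)
  BLOCK_SIZE = 128

  # One inverse scale per 128x128 block, so the scale tensor shape is the
  # ceil-divided weight shape.
  expected = (
      math.ceil(weight.shape[0] / BLOCK_SIZE),
      math.ceil(weight.shape[1] / BLOCK_SIZE),
  )
  print(expected)  # (56, 16)
  ```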
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.dequantize_from_fp8(
  - weight: torch.Tensor,
  - scale_inv: torch.Tensor,
  - dtype=torch.bfloat16,
  - BLOCK_SIZE: int = BLOCK_SIZE,
  )
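  A minimal, self-contained sketch of block-wise FP8 dequantization under the same assumption (one inverse scale per 128x128 block, mirroring the documented defaults); `blockwise_dequantize_sketch` is a hypothetical illustration, not this module's implementation.

  ```python
  import torch

  def blockwise_dequantize_sketch(
      weight: torch.Tensor,      # FP8 payload, e.g. torch.float8_e4m3fn
      scale_inv: torch.Tensor,   # assumed: one inverse scale per 128x128 block
      dtype: torch.dtype = torch.bfloat16,
      block_size: int = 128,
  ) -> torch.Tensor:
      # Upcast the FP8 payload, then expand each per-block scale back to
      # elementwise resolution and multiply. Trailing partial blocks are
      # handled by trimming the expanded scale grid to the weight shape.
      w = weight.to(dtype)
      scales = scale_inv.repeat_interleave(block_size, dim=0)
      scales = scales.repeat_interleave(block_size, dim=1)
      return w * scales[: w.shape[0], : w.shape[1]].to(dtype)
  ```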