nemo_automodel.components.models.deepseek_v3.state_dict_adapter#
Module Contents#
Classes#
DeepSeekV3StateDictAdapter
Functions#
should_quantize_key: Check if a key should be quantized based on its name.
create_scale_inv_for_weight: Create a scale_inv tensor for a weight.
_slice_scale_for_dtensor: Slice scale_inv tensor to match a DTensor weight's local portion.
Data#
API#
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.logger#
'getLogger(...)'
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.BLOCK_SIZE#
128
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.NON_QUANTIZED_KEY_PATTERNS#
['input_layernorm.weight', 'post_attention_layernorm.weight', 'norm.weight', 'lm_head.weight', 'embe…
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.should_quantize_key(key: str) -> bool#
Check if a key should be quantized based on its name.
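The page does not show the implementation, but the `NON_QUANTIZED_KEY_PATTERNS` constant above suggests a simple substring check. Here is a minimal sketch under that assumption; the pattern list below includes only the entries fully visible in the truncated constant above, and the actual list has more:

```python
# Patterns excluded from FP8 quantization (subset of the truncated list above;
# the real module defines additional entries).
NON_QUANTIZED_KEY_PATTERNS = [
    "input_layernorm.weight",
    "post_attention_layernorm.weight",
    "norm.weight",
    "lm_head.weight",
]


def should_quantize_key(key: str) -> bool:
    """Hypothetical sketch: quantize weight tensors unless they match an excluded pattern."""
    if not key.endswith(".weight"):
        return False
    return not any(pattern in key for pattern in NON_QUANTIZED_KEY_PATTERNS)


print(should_quantize_key("model.layers.0.mlp.gate_proj.weight"))   # True
print(should_quantize_key("model.layers.0.input_layernorm.weight"))  # False
```

Note that substring matching means `"norm.weight"` also excludes the final `model.norm.weight`, which is consistent with the pattern list above.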
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.create_scale_inv_for_weight(
- weight: torch.Tensor,
- block_size: int = BLOCK_SIZE,
- )#
Create a scale_inv tensor for a weight.
Note: scale_inv is always created as a regular tensor (not DTensor) because the scale_inv shape (based on 128x128 blocks) doesn't align with DTensor sharding boundaries. During dequantization, _slice_scale_for_dtensor handles extracting the correct scale blocks for DTensor weights.
- Parameters:
weight – The weight tensor (may be a DTensor)
block_size – The FP8 quantization block size
- Returns:
scale_inv tensor with shape based on GLOBAL weight shape
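The "shape based on GLOBAL weight shape" rule amounts to one scale entry per 128x128 block, rounding up when a dimension is not a multiple of the block size. A hedged sketch (the real function computes actual scales; `torch.ones` stands in here for illustration):

```python
import math

import torch

BLOCK_SIZE = 128  # FP8 quantization block size, per the constant above


def create_scale_inv_for_weight(weight: torch.Tensor,
                                block_size: int = BLOCK_SIZE) -> torch.Tensor:
    """Sketch: one scale_inv entry per block of the global weight, rounded up."""
    rows, cols = weight.shape
    shape = (math.ceil(rows / block_size), math.ceil(cols / block_size))
    # Always a regular tensor, never a DTensor (see the note above).
    return torch.ones(shape, dtype=torch.float32)


w = torch.empty(512, 300)
scale = create_scale_inv_for_weight(w)
print(scale.shape)  # torch.Size([4, 3])
```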
- class nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter(
- config: transformers.DeepseekV3Config,
- moe_config: nemo_automodel.components.moe.config.MoEConfig,
- backend: nemo_automodel.components.models.common.BackendConfig,
- dtype: torch.dtype = torch.float32,
- )#
Bases:
nemo_automodel.components.moe.state_dict_mixin.MoESplitExpertsStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter
- _dequantize(
- state_dict: dict[str, Any],
- )#
- to_hf(
- state_dict: dict[str, Any],
- exclude_key_regex: Optional[str] = None,
- quantization: bool = False,
- **kwargs,
- )#
Convert from native model state dict to HuggingFace format. Automatically detects format based on backend.dispatcher configuration.
- from_hf(
- hf_state_dict: dict[str, Any],
- device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
- **kwargs,
- )#
Convert HF checkpoint to native format:
- Dequantize FP8 tensors if scale_inv buffers are provided
- Aggregate per-expert weights into grouped tensors
- If device_mesh is provided, only load experts needed for the current rank
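To make the aggregation step concrete, here is a hedged sketch of collecting per-expert HF weights into one grouped tensor. The key names and shapes are illustrative only, not the adapter's actual FQNs:

```python
import torch


def aggregate_experts(hf_state_dict: dict, prefix: str, num_experts: int) -> torch.Tensor:
    """Sketch: stack per-expert weights into one [num_experts, out, in] tensor."""
    per_expert = [
        hf_state_dict[f"{prefix}.experts.{i}.gate_proj.weight"]
        for i in range(num_experts)
    ]
    return torch.stack(per_expert, dim=0)


# Hypothetical HF-style state dict with 3 experts of shape [4, 8].
sd = {f"layer.experts.{i}.gate_proj.weight": torch.randn(4, 8) for i in range(3)}
grouped = aggregate_experts(sd, "layer", 3)
print(grouped.shape)  # torch.Size([3, 4, 8])
```

With a device_mesh, the range of expert indices would be restricted to those owned by the current rank instead of `range(num_experts)`.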
- convert_single_tensor_to_hf(
- fqn: str,
- tensor: Any,
- **kwargs,
- )#
Convert a single tensor from native format to HuggingFace format.
- Parameters:
fqn – Fully qualified name of the tensor in native format
tensor – The tensor to convert
**kwargs – Additional arguments for conversion
- Returns:
List of (fqn, tensor) tuples in HuggingFace format
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter._slice_scale_for_dtensor(
- scale_inv: torch.Tensor,
- weight_dtensor: torch.Tensor,
- weight_local: torch.Tensor,
- block_size: int = BLOCK_SIZE,
- )#
Slice scale_inv tensor to match a DTensor weight's local portion.
When weight is sharded via DTensor but scale_inv is a regular tensor, we need to extract only the scale blocks that correspond to the local portion of the weight.
- Parameters:
scale_inv – The full (global) scale_inv tensor
weight_dtensor – The DTensor weight (has device_mesh and placements)
weight_local – The local portion of the weight
block_size – The FP8 quantization block size (default 128)
- Returns:
The sliced scale_inv tensor matching the local weight's blocks
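The essence of the slicing is mapping the local shard's element range to block indices in the global scale_inv. A simplified sketch for row-sharding only; in the real function the row offset and length come from the DTensor's device_mesh and placements rather than being passed explicitly:

```python
import math

import torch

BLOCK_SIZE = 128


def slice_scale_rows(scale_inv: torch.Tensor, row_start: int, row_len: int,
                     block_size: int = BLOCK_SIZE) -> torch.Tensor:
    """Sketch: keep the scale rows whose blocks overlap the local row range."""
    first_block = row_start // block_size
    last_block = math.ceil((row_start + row_len) / block_size)
    return scale_inv[first_block:last_block]


# Global scale_inv for a weight with 8 row-blocks and 1 column-block.
scale = torch.arange(8, dtype=torch.float32).reshape(8, 1)
# Local shard covers global rows 256..511, i.e. row-blocks 2 and 3.
local = slice_scale_rows(scale, row_start=256, row_len=256)
print(local.shape)  # torch.Size([2, 1])
```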
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.calculate_scale_shape(
- weight: torch.Tensor,
- BLOCK_SIZE: int = BLOCK_SIZE,
- )#
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter._dequantize_with_torch(
- weight: torch.Tensor,
- scale_inv: torch.Tensor,
- dtype: torch.dtype,
- block_size: int,
- )#
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter._dequantize_with_triton(
- weight: torch.Tensor,
- scale_inv: torch.Tensor,
- dtype: torch.dtype,
- block_size: int,
- )#
- nemo_automodel.components.models.deepseek_v3.state_dict_adapter.dequantize_from_fp8(
- weight: torch.Tensor,
- scale_inv: torch.Tensor,
- dtype=torch.bfloat16,
- BLOCK_SIZE: int = BLOCK_SIZE,
- name: str = '',
- )#
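The torch-based dequantization path can be sketched as a block-wise multiply: each 128x128 block of the FP8 weight is scaled by its inverse-scale entry. This is an assumption-laden illustration, not the module's implementation, and it uses a plain float tensor in place of a real FP8 dtype so it runs anywhere:

```python
import torch

BLOCK_SIZE = 128


def dequantize_blockwise(weight: torch.Tensor, scale_inv: torch.Tensor,
                         dtype: torch.dtype = torch.bfloat16,
                         block_size: int = BLOCK_SIZE) -> torch.Tensor:
    """Sketch: multiply each block_size x block_size block by its scale_inv entry."""
    out = weight.to(torch.float32).clone()
    n_row_blocks, n_col_blocks = scale_inv.shape
    for i in range(n_row_blocks):
        for j in range(n_col_blocks):
            rs, cs = i * block_size, j * block_size
            out[rs:rs + block_size, cs:cs + block_size] *= scale_inv[i, j]
    return out.to(dtype)


# 256x256 weight of ones, 2x2 scale grid of 2.0 -> every element becomes 2.0.
w = torch.ones(256, 256)
s = torch.full((2, 2), 2.0)
deq = dequantize_blockwise(w, s)
print(deq.dtype, float(deq.float().mean()))  # torch.bfloat16 2.0
```

The `_dequantize_with_triton` variant presumably fuses this loop into one GPU kernel; the block arithmetic is the same.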