nemo_automodel.components.models.deepseek_v3.state_dict_adapter#

Module Contents#

Classes#

Functions#

should_quantize_key

Check if a key should be quantized based on its name.

create_scale_inv_for_weight

Create a scale_inv tensor for a weight.

_slice_scale_for_dtensor

Slice scale_inv tensor to match a DTensor weight’s local portion.

calculate_scale_shape

_dequantize_with_torch

_dequantize_with_triton

dequantize_from_fp8

Data#

API#

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.logger#

'getLogger(…)'

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.BLOCK_SIZE#

128

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.NON_QUANTIZED_KEY_PATTERNS#

['input_layernorm.weight', 'post_attention_layernorm.weight', 'norm.weight', 'lm_head.weight', 'embe…

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.should_quantize_key(key: str) → bool#

Check if a key should be quantized based on its name.
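As an illustration of the pattern-based gating, here is a minimal pure-Python sketch (not the module's actual code; the pattern list below is abbreviated to the entries visible in `NON_QUANTIZED_KEY_PATTERNS` above, and the real list contains more):

```python
# Hypothetical sketch of should_quantize_key: only weight tensors that
# match none of the exempt patterns (norms, lm_head, embeddings, ...)
# are candidates for FP8 block quantization.
NON_QUANTIZED_KEY_PATTERNS = [
    "input_layernorm.weight",
    "post_attention_layernorm.weight",
    "norm.weight",
    "lm_head.weight",
]

def should_quantize_key(key: str) -> bool:
    # Non-weight entries (biases, buffers) are never quantized.
    if not key.endswith(".weight"):
        return False
    return not any(pattern in key for pattern in NON_QUANTIZED_KEY_PATTERNS)

print(should_quantize_key("model.layers.0.mlp.gate_proj.weight"))       # True
print(should_quantize_key("model.layers.0.input_layernorm.weight"))     # False
```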

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.create_scale_inv_for_weight(
weight: torch.Tensor,
block_size: int = BLOCK_SIZE,
) → torch.Tensor#

Create a scale_inv tensor for a weight.

Note: scale_inv is always created as a regular tensor (not DTensor) because the scale_inv shape (based on 128x128 blocks) doesn’t align with DTensor sharding boundaries. During dequantization, _slice_scale_for_dtensor handles extracting the correct scale blocks for DTensor weights.

Parameters:
  • weight – The weight tensor (may be a DTensor)

  • block_size – The FP8 quantization block size

Returns:

scale_inv tensor with shape based on GLOBAL weight shape
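Because `scale_inv` carries one entry per `block_size × block_size` tile of the global weight, its shape is the per-dimension ceiling division of the weight shape. A small sketch of that shape computation (illustrative dimensions; assumed logic, not the module's implementation):

```python
import math

BLOCK_SIZE = 128  # FP8 block-quantization tile size used throughout this module

def scale_inv_shape(weight_shape, block_size=BLOCK_SIZE):
    # One scale entry per block_size x block_size tile,
    # rounding up so edge tiles smaller than a full block still get a scale.
    return tuple(math.ceil(dim / block_size) for dim in weight_shape)

print(scale_inv_shape((7168, 18432)))  # (56, 144)
print(scale_inv_shape((300, 129)))    # (3, 2) -- partial edge blocks round up
```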

class nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter(
config: transformers.DeepseekV3Config,
moe_config: nemo_automodel.components.moe.config.MoEConfig,
backend: nemo_automodel.components.models.common.BackendConfig,
dtype: torch.dtype = torch.float32,
)#

Bases: nemo_automodel.components.moe.state_dict_mixin.MoESplitExpertsStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

_dequantize(
state_dict: dict[str, Any],
) → dict[str, Any]#
to_hf(
state_dict: dict[str, Any],
exclude_key_regex: Optional[str] = None,
quantization: bool = False,
**kwargs,
) → dict[str, Any]#

Convert from native model state dict to HuggingFace format. Automatically detects format based on backend.dispatcher configuration.

from_hf(
hf_state_dict: dict[str, Any],
device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
**kwargs,
) → dict[str, Any]#

Convert HF checkpoint to native format.

  • Dequantize FP8 tensors if scale_inv buffers are provided

  • Aggregate per-expert weights into grouped tensors

  • If device_mesh is provided, only load experts needed for the current rank
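The second step above, aggregating per-expert weights, can be sketched with a key-grouping pass over HF checkpoint names. This is a hypothetical illustration of the idea only; the key layout is DeepSeek-V3-like but the exact mapping is assumed, not taken from the module:

```python
import re

def group_expert_keys(hf_keys):
    # Collapse ".experts.<i>." keys into one grouped key per projection,
    # collecting the expert indices that contribute to it.
    grouped = {}
    for key in hf_keys:
        m = re.match(r"(.*)\.experts\.(\d+)\.(.*)", key)
        if m:
            prefix, idx, suffix = m.groups()
            grouped.setdefault(f"{prefix}.experts.{suffix}", []).append(int(idx))
        else:
            grouped[key] = []  # non-expert keys pass through ungrouped
    return grouped

keys = [
    "model.layers.3.mlp.experts.0.gate_proj.weight",
    "model.layers.3.mlp.experts.1.gate_proj.weight",
]
print(group_expert_keys(keys))
# {'model.layers.3.mlp.experts.gate_proj.weight': [0, 1]}
```

When `device_mesh` is provided, such a pass would additionally skip expert indices not owned by the current rank, so only the locally needed shards are ever materialized.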

convert_single_tensor_to_hf(
fqn: str,
tensor: Any,
**kwargs,
) → list[tuple[str, Any]]#

Convert a single tensor from native format to HuggingFace format.

Parameters:
  • fqn – Fully qualified name of the tensor in native format

  • tensor – The tensor to convert

  • **kwargs – Additional arguments for conversion

Returns:

List of (fqn, tensor) tuples in HuggingFace format
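The list return type reflects that one native tensor can fan out to several HF entries, e.g. a grouped expert tensor producing one `(fqn, tensor)` pair per expert. A hypothetical sketch of that fan-out (Python strings stand in for tensors; the key template is illustrative, not the module's real mapping):

```python
def split_grouped_experts(fqn, grouped):
    # fqn like "model.layers.3.mlp.experts.gate_proj.weight";
    # `grouped` is indexed by expert along its leading dimension.
    prefix, suffix = fqn.split(".experts.")
    return [
        (f"{prefix}.experts.{i}.{suffix}", tensor)
        for i, tensor in enumerate(grouped)
    ]

pairs = split_grouped_experts(
    "model.layers.3.mlp.experts.gate_proj.weight",
    ["w0", "w1"],  # placeholder per-expert tensors
)
print(pairs)
# [('model.layers.3.mlp.experts.0.gate_proj.weight', 'w0'),
#  ('model.layers.3.mlp.experts.1.gate_proj.weight', 'w1')]
```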

nemo_automodel.components.models.deepseek_v3.state_dict_adapter._slice_scale_for_dtensor(
scale_inv: torch.Tensor,
weight_dtensor: torch.Tensor,
weight_local: torch.Tensor,
block_size: int = BLOCK_SIZE,
) → torch.Tensor#

Slice scale_inv tensor to match a DTensor weight’s local portion.

When weight is sharded via DTensor but scale_inv is a regular tensor, we need to extract only the scale blocks that correspond to the local portion of the weight.

Parameters:
  • scale_inv – The full (global) scale_inv tensor

  • weight_dtensor – The DTensor weight (has device_mesh and placements)

  • weight_local – The local portion of the weight

  • block_size – The FP8 quantization block size (default 128)

Returns:

The sliced scale_inv tensor matching the local weight’s blocks
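The index arithmetic behind this slicing can be sketched for the row-sharded case: a shard's row range maps onto a (possibly partial) range of `scale_inv` rows. This is assumed logic for illustration; the real function derives the shard offsets from the DTensor's `device_mesh` and placements:

```python
import math

BLOCK_SIZE = 128

def scale_rows_for_shard(row_start, row_end, block_size=BLOCK_SIZE):
    # Weight rows [row_start, row_end) touch scale_inv rows
    # [row_start // block_size, ceil(row_end / block_size)).
    return row_start // block_size, math.ceil(row_end / block_size)

# Rank 1 of a 2-way row shard of a 7168-row weight owns rows 3584..7168,
# which correspond to scale_inv rows 28..56.
print(scale_rows_for_shard(3584, 7168))  # (28, 56)
```

Because shard boundaries need not align with 128-row blocks, the two ranges can overlap between ranks, which is why `scale_inv` is kept as a regular (replicated) tensor rather than a DTensor.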

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.calculate_scale_shape(
weight: torch.Tensor,
BLOCK_SIZE: int = BLOCK_SIZE,
) → torch.Size#
nemo_automodel.components.models.deepseek_v3.state_dict_adapter._dequantize_with_torch(
weight: torch.Tensor,
scale_inv: torch.Tensor,
dtype: torch.dtype,
block_size: int,
) → torch.Tensor#
nemo_automodel.components.models.deepseek_v3.state_dict_adapter._dequantize_with_triton(
weight: torch.Tensor,
scale_inv: torch.Tensor,
dtype: torch.dtype,
block_size: int,
) → torch.Tensor#
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.dequantize_from_fp8(
weight: torch.Tensor,
scale_inv: torch.Tensor,
dtype=torch.bfloat16,
BLOCK_SIZE: int = BLOCK_SIZE,
name: str = '',
) → torch.Tensor#
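The core math shared by the torch and Triton paths is element-wise rescaling by the block's inverse scale. A minimal pure-Python sketch of block-wise dequantization (tiny `block_size=2` for readability; the module uses 128 and operates on torch tensors):

```python
def dequantize_blockwise(weight, scale_inv, block_size=2):
    # Each element is multiplied by the scale_inv entry of the
    # block_size x block_size tile it belongs to.
    rows, cols = len(weight), len(weight[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            out[i][j] = weight[i][j] * scale_inv[i // block_size][j // block_size]
    return out

w = [[1, 2], [3, 4]]
s = [[0.5]]  # one scale for the single 2x2 block
print(dequantize_blockwise(w, s))  # [[0.5, 1.0], [1.5, 2.0]]
```

The torch variant vectorizes this over whole tiles, while the Triton variant fuses the load, rescale, and cast into one GPU kernel; both produce the same values up to the output `dtype`'s rounding.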