nemo_automodel.components.models.deepseek_v3.state_dict_adapter

Module Contents

Classes

Functions

| Name | Description |
| --- | --- |
| _dequantize_with_torch | - |
| _dequantize_with_triton | - |
| _slice_scale_for_dtensor | Slice scale_inv tensor to match a DTensor weight’s local portion. |
| _weight_dequant_kernel | - |
| calculate_scale_shape | - |
| create_scale_inv_for_weight | Create a scale_inv tensor for a weight. |
| dequantize_from_fp8 | - |
| should_quantize_key | Check if a key should be quantized based on its name. |

Data

BLOCK_SIZE

NON_QUANTIZED_KEY_PATTERNS

_TRITON_AVAILABLE

logger

API

class nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter(
config: transformers.DeepseekV3Config,
moe_config: nemo_automodel.components.moe.config.MoEConfig,
backend: nemo_automodel.components.models.common.BackendConfig,
dtype: torch.dtype = torch.float32
)

Bases: MoESplitExpertsStateDictMixin, StateDictAdapter

from_hf_map

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter._dequantize(
state_dict: dict[str, typing.Any]
) -> dict[str, typing.Any]

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter.convert_single_tensor_to_hf(
fqn: str,
tensor: typing.Any,
**kwargs
) -> list[tuple[str, typing.Any]]

Convert a single tensor from native format to HuggingFace format.

Parameters:

fqn
str

Fully qualified name of the tensor in native format

tensor
Any

The tensor to convert

**kwargs
Defaults to {}

Additional arguments for conversion

Returns: list[tuple[str, Any]]

List of (fqn, tensor) tuples in HuggingFace format

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter.from_hf(
hf_state_dict: dict[str, typing.Any],
device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
**kwargs
) -> dict[str, typing.Any]

Convert HF checkpoint to native format.

  • Dequantize FP8 tensors if scale_inv buffers are provided
  • Aggregate per-expert weights into grouped tensors
  • If device_mesh is provided, only load experts needed for the current rank
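
The last two bullets can be pictured with a small sketch. It assumes that experts are split evenly into contiguous ranges across expert-parallel ranks and that HF keys look like `...mlp.experts.{i}.<param>`. The helper names, the key pattern, and the partitioning scheme are illustrative assumptions, not the adapter's actual code.

```python
import re

# Hypothetical pattern for per-expert keys in an HF-style checkpoint.
EXPERT_KEY = re.compile(r"^(?P<prefix>.*\.mlp\.experts)\.(?P<idx>\d+)\.(?P<name>.+)$")


def local_expert_range(n_experts: int, ep_size: int, ep_rank: int) -> range:
    """Contiguous, even split of experts across ranks (an assumption)."""
    if n_experts % ep_size != 0:
        raise ValueError("n_experts must be divisible by ep_size in this sketch")
    per_rank = n_experts // ep_size
    return range(ep_rank * per_rank, (ep_rank + 1) * per_rank)


def group_local_experts(state_dict: dict, experts: range) -> dict:
    """Gather per-expert entries for the local experts under one grouped key.

    Values are returned as a list ordered by expert index; a real adapter
    would stack them into a single grouped tensor instead.
    """
    result: dict = {}
    grouped: dict = {}
    for key, value in state_dict.items():
        match = EXPERT_KEY.match(key)
        if match is None:
            result[key] = value  # non-expert entries pass through unchanged
            continue
        idx = int(match.group("idx"))
        if idx not in experts:
            continue  # this expert belongs to another rank, so skip it
        grouped_key = f"{match.group('prefix')}.{match.group('name')}"
        grouped.setdefault(grouped_key, {})[idx] = value
    for key, by_idx in grouped.items():
        result[key] = [by_idx[i] for i in sorted(by_idx)]
    return result
```

With 4 experts and 2 ranks, rank 1 would keep only experts 2 and 3 under this scheme, so the other experts' tensors never need to be materialized on that rank.
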
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter.to_hf(
state_dict: dict[str, typing.Any],
exclude_key_regex: typing.Optional[str] = None,
quantization: bool = False,
**kwargs
) -> dict[str, typing.Any]

Convert from native model state dict to HuggingFace format. Automatically detects format based on backend.dispatcher configuration.

nemo_automodel.components.models.deepseek_v3.state_dict_adapter._dequantize_with_torch(
weight: torch.Tensor,
scale_inv: torch.Tensor,
dtype: torch.dtype,
block_size: int
) -> torch.Tensor

nemo_automodel.components.models.deepseek_v3.state_dict_adapter._dequantize_with_triton(
weight: torch.Tensor,
scale_inv: torch.Tensor,
dtype: torch.dtype,
block_size: int
) -> torch.Tensor

nemo_automodel.components.models.deepseek_v3.state_dict_adapter._slice_scale_for_dtensor(
scale_inv: torch.Tensor,
weight_dtensor: torch.Tensor,
weight_local: torch.Tensor,
block_size: int = BLOCK_SIZE
) -> torch.Tensor

Slice scale_inv tensor to match a DTensor weight’s local portion.

When weight is sharded via DTensor but scale_inv is a regular tensor, we need to extract only the scale blocks that correspond to the local portion of the weight.

Parameters:

scale_inv
torch.Tensor

The full (global) scale_inv tensor

weight_dtensor
torch.Tensor

The DTensor weight (has device_mesh and placements)

weight_local
torch.Tensor

The local portion of the weight

block_size
int, defaults to BLOCK_SIZE

The FP8 quantization block size (default 128)

Returns: torch.Tensor

The sliced scale_inv tensor matching the local weight’s blocks
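
The index arithmetic behind this kind of slicing can be sketched for the simplest case, a weight sharded along its rows. The helper below is hypothetical. It takes the shard's global row offset and local row count directly, whereas the real function derives them from the DTensor's device_mesh and placements. A nested list stands in for the scale_inv tensor.

```python
BLOCK_SIZE = 128  # same value as the module-level constant on this page


def local_scale_rows(row_offset: int, local_rows: int, block_size: int = BLOCK_SIZE) -> slice:
    """Rows of the global scale grid that overlap a row-sharded local weight."""
    first_block = row_offset // block_size
    # Ceiling division, so a shard ending mid-block still includes that block.
    last_block = (row_offset + local_rows + block_size - 1) // block_size
    return slice(first_block, last_block)


# A 512 x 256 weight has a 4 x 2 grid of scale blocks.
global_scale = [[float(10 * r + c) for c in range(2)] for r in range(4)]

# A shard holding rows 256..383 overlaps only block row 2.
rows = local_scale_rows(row_offset=256, local_rows=128)
local_scale = global_scale[rows]
```

When shard boundaries do not fall on multiples of the block size, neighbouring shards share a block row, which is why the sketch rounds the end index up.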

nemo_automodel.components.models.deepseek_v3.state_dict_adapter._weight_dequant_kernel(
x_ptr,
s_ptr,
y_ptr,
M,
N,
stride_xm,
stride_xn,
stride_ym,
stride_yn,
stride_sm,
stride_sn,
BLOCK_SIZE: triton.language.constexpr
)

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.calculate_scale_shape(
weight: torch.Tensor,
BLOCK_SIZE: int = BLOCK_SIZE
) -> torch.Size

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.create_scale_inv_for_weight(
weight: torch.Tensor,
block_size: int = BLOCK_SIZE
) -> torch.Tensor

Create a scale_inv tensor for a weight.

Note: scale_inv is always created as a regular tensor (not DTensor) because the scale_inv shape (based on 128x128 blocks) doesn’t align with DTensor sharding boundaries. During dequantization, _slice_scale_for_dtensor handles extracting the correct scale blocks for DTensor weights.

Parameters:

weight
torch.Tensor

The weight tensor (may be a DTensor)

block_size
int, defaults to BLOCK_SIZE

The FP8 quantization block size

Returns: torch.Tensor

scale_inv tensor with shape based on GLOBAL weight shape
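
To make "shape based on GLOBAL weight shape" concrete, here is a minimal sketch of the block-count arithmetic. It assumes one scale entry per block_size x block_size block, with partial edge blocks getting their own entry (ceiling division). The function name is hypothetical and this is not the library's `calculate_scale_shape`.

```python
def blockwise_scale_shape(rows: int, cols: int, block_size: int = 128) -> tuple[int, int]:
    """Number of scale entries along each dimension of a 2-D weight."""
    # Ceiling division: a trailing partial block still needs a scale entry.
    n_row_blocks = (rows + block_size - 1) // block_size
    n_col_blocks = (cols + block_size - 1) // block_size
    return (n_row_blocks, n_col_blocks)


aligned = blockwise_scale_shape(7168, 18432)  # both dimensions divide evenly
ragged = blockwise_scale_shape(129, 256)      # one extra row spills into a second block
```

Because the shape depends only on the global dimensions, every rank would compute the same scale_inv shape regardless of how the weight itself is sharded.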

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.dequantize_from_fp8(
weight: torch.Tensor,
scale_inv: torch.Tensor,
dtype = torch.bfloat16,
BLOCK_SIZE: int = BLOCK_SIZE,
name: str = ''
) -> torch.Tensor
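
The signature above carries no description, so the following sketch illustrates only the general idea of block-wise dequantization. It assumes that each element is multiplied by the scale_inv entry of the block it falls in, which is how DeepSeek's published reference kernel treats these tensors. Plain nested lists stand in for tensors so the indexing is explicit, and the function name is hypothetical.

```python
def dequantize_blockwise(weight, scale_inv, block_size=128):
    """Multiply every element by the scale of its (block_row, block_col)."""
    return [
        [
            value * scale_inv[i // block_size][j // block_size]
            for j, value in enumerate(row)
        ]
        for i, row in enumerate(weight)
    ]


# Tiny example with block_size=2: a 2 x 4 weight has a 1 x 2 scale grid.
weight = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]
scale_inv = [[2.0, 0.5]]
dequantized = dequantize_blockwise(weight, scale_inv, block_size=2)
```

A tensor implementation would do the same thing without Python loops, for example by expanding scale_inv to the weight's shape before multiplying and then casting to the requested dtype.
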
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.should_quantize_key(
key: str
) -> bool

Check if a key should be quantized based on its name.
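
A name-based check of this kind might look like the sketch below. It assumes plain substring matching and uses only the three patterns that are visible in the truncated NON_QUANTIZED_KEY_PATTERNS value on this page, so the list is knowingly incomplete. The function name is hypothetical and the real matching rule may differ.

```python
# Only the entries visible on this page; the real list is longer.
VISIBLE_NON_QUANTIZED_PATTERNS = [
    "input_layernorm.weight",
    "post_attention_layernorm.weight",
    "norm.weight",
]


def should_quantize_key_sketch(key: str) -> bool:
    """Return False when the key contains any non-quantized pattern."""
    return not any(pattern in key for pattern in VISIBLE_NON_QUANTIZED_PATTERNS)


layernorm = should_quantize_key_sketch("model.layers.0.input_layernorm.weight")
projection = should_quantize_key_sketch("model.layers.0.self_attn.q_a_proj.weight")
```

Under these assumptions the layer-norm key is left unquantized and the projection weight is a candidate for quantization.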

nemo_automodel.components.models.deepseek_v3.state_dict_adapter.BLOCK_SIZE = 128
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.NON_QUANTIZED_KEY_PATTERNS = ['input_layernorm.weight', 'post_attention_layernorm.weight', 'norm.weight', 'lm...
nemo_automodel.components.models.deepseek_v3.state_dict_adapter._TRITON_AVAILABLE = True
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.logger = logging.getLogger(__name__)