
# nemo_automodel.components.models.deepseek_v3.state_dict_adapter

## Module Contents

### Classes

| Name                                                                                                                        | Description |
| --------------------------------------------------------------------------------------------------------------------------- | ----------- |
| [`DeepSeekV3StateDictAdapter`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-DeepSeekV3StateDictAdapter) | -           |

### Functions

| Name                                                                                                                          | Description                                                        |
| ----------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------ |
| [`_dequantize_with_torch`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-_dequantize_with_torch)           | -                                                                  |
| [`_dequantize_with_triton`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-_dequantize_with_triton)         | -                                                                  |
| [`_slice_scale_for_dtensor`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-_slice_scale_for_dtensor)       | Slice scale\_inv tensor to match a DTensor weight's local portion. |
| [`_weight_dequant_kernel`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-_weight_dequant_kernel)           | -                                                                  |
| [`calculate_scale_shape`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-calculate_scale_shape)             | -                                                                  |
| [`create_scale_inv_for_weight`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-create_scale_inv_for_weight) | Create a scale\_inv tensor for a weight.                           |
| [`dequantize_from_fp8`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-dequantize_from_fp8)                 | -                                                                  |
| [`should_quantize_key`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-should_quantize_key)                 | Check if a key should be quantized based on its name.              |

### Data

[`BLOCK_SIZE`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-BLOCK_SIZE)

[`NON_QUANTIZED_KEY_PATTERNS`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-NON_QUANTIZED_KEY_PATTERNS)

[`_TRITON_AVAILABLE`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-_TRITON_AVAILABLE)

[`logger`](#nemo_automodel-components-models-deepseek_v3-state_dict_adapter-logger)

### API

```python
class nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter(
    config: transformers.DeepseekV3Config,
    moe_config: nemo_automodel.components.moe.config.MoEConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
    dtype: torch.dtype = torch.float32
)
```

**Bases:** [MoESplitExpertsStateDictMixin](/nemo-automodel/nemo_automodel/components/moe/state_dict_mixin#nemo_automodel-components-moe-state_dict_mixin-MoESplitExpertsStateDictMixin), [StateDictAdapter](/nemo-automodel/nemo_automodel/components/checkpoint/state_dict_adapter#nemo_automodel-components-checkpoint-state_dict_adapter-StateDictAdapter)

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter._dequantize(
    state_dict: dict[str, typing.Any]
) -> dict[str, typing.Any]
```

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter.convert_single_tensor_to_hf(
    fqn: str,
    tensor: typing.Any,
    **kwargs
) -> list[tuple[str, typing.Any]]
```

Convert a single tensor from native format to HuggingFace format.

**Parameters:**

* `fqn`: Fully qualified name of the tensor in native format
* `tensor`: The tensor to convert
* `kwargs`: Additional arguments for conversion

**Returns:** `list[tuple[str, Any]]`

List of (fqn, tensor) tuples in HuggingFace format
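Because the return type is a list, one native tensor can expand into several HuggingFace entries. Below is a minimal sketch of that one-to-many mapping for a grouped expert weight. It assumes that the grouped weight's first dimension indexes local experts, and that HF keys follow the `<prefix>.<expert>.<proj>.weight` layout used by DeepSeek-V3 checkpoints. `split_grouped_experts` is a hypothetical helper, and plain lists stand in for tensors.

```python
def split_grouped_experts(prefix, proj, grouped, first_expert=0):
    """Hypothetical helper: expand a grouped expert weight into per-expert HF entries.

    `grouped` stands in for a tensor of shape [n_local_experts, ...];
    `first_expert` is the global index of this rank's first local expert.
    """
    return [
        (f"{prefix}.{first_expert + i}.{proj}.weight", grouped[i])
        for i in range(len(grouped))
    ]


pairs = split_grouped_experts("model.layers.3.mlp.experts", "gate_proj", ["w0", "w1"], first_expert=4)
# [("model.layers.3.mlp.experts.4.gate_proj.weight", "w0"),
#  ("model.layers.3.mlp.experts.5.gate_proj.weight", "w1")]
```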

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter.from_hf(
    hf_state_dict: dict[str, typing.Any],
    device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
    **kwargs
) -> dict[str, typing.Any]
```

Convert a HuggingFace (HF) checkpoint to the native format:

* Dequantize FP8 tensors if scale\_inv buffers are provided
* Aggregate per-expert weights into grouped tensors
* If device\_mesh is provided, load only the experts needed by the current rank
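The aggregation and rank-filtering steps can be sketched at the key level. This sketch assumes three things: the DeepSeek-V3 HF key layout `model.layers.<L>.mlp.experts.<E>.<proj>.weight`, a contiguous split of experts across expert-parallel ranks, and an expert count that divides evenly by the number of ranks. `local_expert_range` and `group_expert_keys` are hypothetical names, and this is not necessarily how `from_hf` selects experts.

```python
import re
from collections import defaultdict

# Per-expert HF key layout in DeepSeek-V3 checkpoints, e.g.
# "model.layers.3.mlp.experts.17.gate_proj.weight"
EXPERT_KEY = re.compile(r"^(model\.layers\.\d+\.mlp\.experts)\.(\d+)\.(\w+)\.weight$")


def local_expert_range(n_experts, ep_rank, ep_size):
    """Hypothetical contiguous split; assumes n_experts is divisible by ep_size."""
    per_rank = n_experts // ep_size
    start = ep_rank * per_rank
    return range(start, start + per_rank)


def group_expert_keys(hf_keys, n_experts, ep_rank=0, ep_size=1):
    """Hypothetical: group routed-expert keys by (prefix, projection), local experts only.

    Returns {(prefix, proj): [keys ordered by expert index]}; stacking the
    tensors behind each list would give a grouped [n_local_experts, ...] weight.
    """
    local = set(local_expert_range(n_experts, ep_rank, ep_size))
    groups = defaultdict(dict)
    for key in hf_keys:
        match = EXPERT_KEY.match(key)
        if match is None:
            continue  # not a routed-expert weight (shared experts, scale_inv, norms, ...)
        prefix, expert, proj = match.group(1), int(match.group(2)), match.group(3)
        if expert in local:
            groups[(prefix, proj)][expert] = key
    return {group: [by_expert[e] for e in sorted(by_expert)] for group, by_expert in groups.items()}


keys = [f"model.layers.0.mlp.experts.{e}.gate_proj.weight" for e in range(4)]
grouped = group_expert_keys(keys, n_experts=4, ep_rank=1, ep_size=2)
# {("model.layers.0.mlp.experts", "gate_proj"):
#     ["model.layers.0.mlp.experts.2.gate_proj.weight",
#      "model.layers.0.mlp.experts.3.gate_proj.weight"]}
```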

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.DeepSeekV3StateDictAdapter.to_hf(
    state_dict: dict[str, typing.Any],
    exclude_key_regex: typing.Optional[str] = None,
    quantization: bool = False,
    **kwargs
) -> dict[str, typing.Any]
```

Convert a native model state dict to HuggingFace format.
The format is detected automatically from the `backend.dispatcher` configuration.
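The `exclude_key_regex` and `quantization` parameters are not documented further. Below is a hedged sketch of their plausible effect on the exported key set. It assumes that keys matching the regex are dropped, and that each quantized 2-D weight gains a companion `<key>_scale_inv` entry (the `weight_scale_inv` naming used in DeepSeek-V3 FP8 checkpoints) with one scale per block. `hf_output_layout` and its `is_quantizable` predicate are hypothetical.

```python
import math
import re


def hf_output_layout(shapes, quantization=False, exclude_key_regex=None, block_size=128,
                     is_quantizable=lambda key: key.endswith(".weight")):
    """Hypothetical sketch of which HF keys an export produces, and their shapes.

    shapes: {hf_key: (rows, cols)} for tensors already converted to HF naming.
    """
    out = {}
    for key, shape in shapes.items():
        # Assumption: keys matching the regex are dropped from the export.
        if exclude_key_regex is not None and re.search(exclude_key_regex, key):
            continue
        out[key] = shape
        # Assumption: each quantized 2-D weight gets a companion "<key>_scale_inv"
        # holding one scale per block_size x block_size tile.
        if quantization and len(shape) == 2 and is_quantizable(key):
            rows, cols = shape
            out[key + "_scale_inv"] = (math.ceil(rows / block_size), math.ceil(cols / block_size))
    return out


layout = hf_output_layout(
    {"model.layers.0.self_attn.q_a_proj.weight": (1536, 7168),
     "model.embed_tokens.weight": (129280, 7168)},
    quantization=True,
    exclude_key_regex=r"embed_tokens",
)
# {"model.layers.0.self_attn.q_a_proj.weight": (1536, 7168),
#  "model.layers.0.self_attn.q_a_proj.weight_scale_inv": (12, 56)}
```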

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter._dequantize_with_torch(
    weight: torch.Tensor,
    scale_inv: torch.Tensor,
    dtype: torch.dtype,
    block_size: int
) -> torch.Tensor
```

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter._dequantize_with_triton(
    weight: torch.Tensor,
    scale_inv: torch.Tensor,
    dtype: torch.dtype,
    block_size: int
) -> torch.Tensor
```

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter._slice_scale_for_dtensor(
    scale_inv: torch.Tensor,
    weight_dtensor: torch.Tensor,
    weight_local: torch.Tensor,
    block_size: int = BLOCK_SIZE
) -> torch.Tensor
```

Slice scale\_inv tensor to match a DTensor weight's local portion.

When the weight is sharded via DTensor but scale\_inv is a regular tensor,
only the scale blocks that correspond to the local portion of the weight
need to be extracted.

**Parameters:**

* `scale_inv`: The full (global) scale\_inv tensor
* `weight_dtensor`: The DTensor weight (has device\_mesh and placements)
* `weight_local`: The local portion of the weight
* `block_size`: The FP8 quantization block size (default 128)

**Returns:** `torch.Tensor`

The sliced scale\_inv tensor matching the local weight's blocks
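The slicing step can be sketched for the simplest case of a weight row-sharded on dim 0. The sketch assumes that the shard sizes follow `torch.chunk`-style even splitting (as DTensor `Shard` placements do) and that rows are the sharded dimension. `shard_row_range` and `scale_rows_for_shard` are hypothetical helpers, not the module's API.

```python
import math


def shard_row_range(global_rows, rank, world_size):
    """Rows [start, stop) held by `rank` when dim 0 is split like torch.chunk."""
    chunk = math.ceil(global_rows / world_size)
    start = min(rank * chunk, global_rows)
    stop = min(start + chunk, global_rows)
    return start, stop


def scale_rows_for_shard(row_start, row_stop, block_size=128):
    """Scale-block rows [first, last) that cover weight rows [row_start, row_stop)."""
    if row_stop <= row_start:
        return 0, 0  # empty shard: no scale rows needed
    return row_start // block_size, math.ceil(row_stop / block_size)


for rank in range(2):
    rows = shard_row_range(300, rank, world_size=2)
    print(rank, rows, scale_rows_for_shard(*rows))
# 0 (0, 150) (0, 2)
# 1 (150, 300) (1, 3)
```

Both ranks keep scale row 1 because that block straddles the shard boundary at row 150. Rank 1's local rows also start 22 rows into that block, so a per-element scale lookup on the slice has to use the global row index (shard offset plus local row), not the local one.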

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter._weight_dequant_kernel(
    x_ptr,
    s_ptr,
    y_ptr,
    M,
    N,
    stride_xm,
    stride_xn,
    stride_ym,
    stride_yn,
    stride_sm,
    stride_sn,
    BLOCK_SIZE: triton.language.constexpr
)
```

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.calculate_scale_shape(
    weight: torch.Tensor,
    BLOCK_SIZE: int = BLOCK_SIZE
) -> torch.Size
```

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.create_scale_inv_for_weight(
    weight: torch.Tensor,
    block_size: int = BLOCK_SIZE
) -> torch.Tensor
```

Create a scale\_inv tensor for a weight.

Note: scale\_inv is always created as a regular tensor (not DTensor) because
the scale\_inv shape (based on 128x128 blocks) doesn't align with DTensor
sharding boundaries. During dequantization, \_slice\_scale\_for\_dtensor handles
extracting the correct scale blocks for DTensor weights.
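A worked example with illustrative numbers shows the misalignment. Take a weight with $M = 300$ rows, row-sharded over 2 ranks, with $B = 128$:

$$
\begin{aligned}
\text{global block rows} &= \lceil 300 / 128 \rceil = 3, \\
\text{block rows if sized per shard} &= \lceil 150 / 128 \rceil + \lceil 150 / 128 \rceil = 2 + 2 = 4 \neq 3.
\end{aligned}
$$

Block row 1 covers weight rows 128 to 255 and straddles the shard boundary at row 150, so neither shard owns it exclusively. Sizing scale\_inv from the global shape keeps exactly one scale per block.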

**Parameters:**

* `weight`: The weight tensor (may be a DTensor)
* `block_size`: The FP8 quantization block size

**Returns:** `torch.Tensor`

scale\_inv tensor whose shape is derived from the global (unsharded) weight shape

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.dequantize_from_fp8(
    weight: torch.Tensor,
    scale_inv: torch.Tensor,
    dtype = torch.bfloat16,
    BLOCK_SIZE: int = BLOCK_SIZE,
    name: str = ''
) -> torch.Tensor
```

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.should_quantize_key(
    key: str
) -> bool
```

Check if a key should be quantized based on its name.
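The matching rule is not spelled out. Below is a hedged sketch that assumes only `.weight` keys are candidates for quantization and that a key is excluded if it contains any entry of `NON_QUANTIZED_KEY_PATTERNS` as a substring. Only the entries visible in the truncated listing are used. `should_quantize_key_sketch` and `VISIBLE_NON_QUANTIZED` are hypothetical names.

```python
# Only the entries visible in the (truncated) NON_QUANTIZED_KEY_PATTERNS listing;
# the real list continues beyond what is shown.
VISIBLE_NON_QUANTIZED = ["input_layernorm.weight", "post_attention_layernorm.weight", "norm.weight"]


def should_quantize_key_sketch(key, patterns=VISIBLE_NON_QUANTIZED):
    """Hypothetical: quantize '.weight' keys unless they contain a non-quantized pattern."""
    if not key.endswith(".weight"):
        return False  # assumption: biases, buffers, and scale_inv entries are never quantized
    return not any(pattern in key for pattern in patterns)


print(should_quantize_key_sketch("model.layers.0.mlp.experts.7.down_proj.weight"))  # True
print(should_quantize_key_sketch("model.layers.0.input_layernorm.weight"))          # False
```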

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.BLOCK_SIZE = 128
```

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.NON_QUANTIZED_KEY_PATTERNS = ['input_layernorm.weight', 'post_attention_layernorm.weight', 'norm.weight', 'lm...
```

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter._TRITON_AVAILABLE = True
```

```python
nemo_automodel.components.models.deepseek_v3.state_dict_adapter.logger = logging.getLogger(__name__)
```