nemo_automodel.components.models.qwen3_5.state_dict_adapter


State-dict adapter for Qwen3.5 dense (non-MoE) models.

Qwen3.5 dense keeps its GatedDeltaNet SSM-gating parameters (A_log and dt_bias) in an fp32 _fp32_params holder. The model’s state dict therefore contains keys of the form ...linear_attn._fp32_params.A_log instead of the original ...linear_attn.A_log.

This adapter renames keys at save/load boundaries so that on-disk checkpoints match the original HF Qwen3.5 layout (bare A_log and dt_bias keys) and are directly loadable via transformers.AutoModelForImageTextToText.from_pretrained.
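The save/load renaming can be illustrated on a plain dict. This is a minimal sketch, not the adapter's implementation: the helpers strip_holder, route_to_holder, to_hf, and from_hf below are hypothetical stand-ins, the model.layers.0 key prefix is illustrative, and the sketch assumes routing applies only to A_log and dt_bias sitting directly under a .linear_attn module.

```python
# Hypothetical sketch of the save/load key renaming; not the library code.
from typing import Any

BARE_NAMES = ("A_log", "dt_bias")  # mirrors _BARE_FP32_PARAM_NAMES


def strip_holder(key: str) -> str:
    """In-memory key -> on-disk HF key: drop the _fp32_params holder."""
    return key.replace(".linear_attn._fp32_params.", ".linear_attn.")


def route_to_holder(key: str) -> str:
    """On-disk HF key -> in-memory key: insert the _fp32_params holder."""
    prefix, _, name = key.rpartition(".")
    # Assumption: only bare A_log / dt_bias directly under linear_attn are routed.
    if name in BARE_NAMES and prefix.endswith(".linear_attn"):
        return f"{prefix}._fp32_params.{name}"
    return key


def to_hf(state_dict: dict[str, Any]) -> dict[str, Any]:
    return {strip_holder(k): v for k, v in state_dict.items()}


def from_hf(hf_state_dict: dict[str, Any]) -> dict[str, Any]:
    return {route_to_holder(k): v for k, v in hf_state_dict.items()}


native = {
    "model.layers.0.linear_attn._fp32_params.A_log": 1.0,
    "model.layers.0.linear_attn._fp32_params.dt_bias": 2.0,
    "model.layers.0.linear_attn.out_proj.weight": 3.0,  # untouched by either direction
}
hf = to_hf(native)
print(sorted(hf))
assert from_hf(hf) == native  # the rename round-trips
```

Keys that are not fp32 SSM-gating parameters pass through both directions unchanged, which is what lets the on-disk file keep the stock HF layout.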

Module Contents

Classes

| Name | Description |
| --- | --- |
| Qwen3_5DenseStateDictAdapter | Adapter that hides the _fp32_params wrapping in saved checkpoints. |

Functions

| Name | Description |
| --- | --- |
| _route_to_fp32_holder | - |
| _strip_fp32_prefix | - |
| map_qwen3_5_mtp_from_hf_key | Map HF Qwen3.5 MTP keys to Automodel’s Megatron-style MTP module. |
| map_qwen3_5_mtp_to_hf_key | Map Automodel Qwen3.5 MTP keys back to HF checkpoint keys. |

Data

_BARE_FP32_PARAM_NAMES

_FP32_PARAMS_TO_BARE

_MTP_HF_TO_NATIVE

_MTP_NATIVE_TO_HF

API

class nemo_automodel.components.models.qwen3_5.state_dict_adapter.Qwen3_5DenseStateDictAdapter(
route_linear_attn_fp32_params: bool = True
)

Bases: StateDictAdapter

Adapter that hides the _fp32_params wrapping in saved checkpoints.

nemo_automodel.components.models.qwen3_5.state_dict_adapter.Qwen3_5DenseStateDictAdapter._map_from_hf_key(
key: str
) -> str
nemo_automodel.components.models.qwen3_5.state_dict_adapter.Qwen3_5DenseStateDictAdapter.convert_single_tensor_to_hf(
fqn: str,
tensor: typing.Any,
**kwargs: typing.Any
) -> list[tuple[str, typing.Any]]
nemo_automodel.components.models.qwen3_5.state_dict_adapter.Qwen3_5DenseStateDictAdapter.from_hf(
hf_state_dict: dict[str, typing.Any],
device_mesh: typing.Optional[typing.Any] = None,
**kwargs: typing.Any
) -> dict[str, typing.Any]
nemo_automodel.components.models.qwen3_5.state_dict_adapter.Qwen3_5DenseStateDictAdapter.to_hf(
state_dict: dict[str, typing.Any],
**kwargs: typing.Any
) -> dict[str, typing.Any]
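convert_single_tensor_to_hf takes one fully qualified name and tensor and returns a list of (key, tensor) pairs, a shape that suits converting a checkpoint tensor by tensor instead of building the whole HF dict at once. The sketch below is hypothetical: it assumes, for illustration only, that each in-memory tensor maps to exactly one HF key via the holder strip, and the stream_to_hf driver is an invented helper, not part of this module.

```python
# Hypothetical sketch of per-tensor conversion; not the library code.
from collections.abc import Iterator
from typing import Any


def convert_single_tensor_to_hf(fqn: str, tensor: Any) -> list[tuple[str, Any]]:
    """Convert one in-memory (fqn, tensor) pair to HF (key, tensor) pairs."""
    hf_key = fqn.replace(".linear_attn._fp32_params.", ".linear_attn.")
    # A list return presumably lets one native tensor map to zero or several
    # HF entries in general (an assumption); here it always yields exactly one.
    return [(hf_key, tensor)]


def stream_to_hf(state_dict: dict[str, Any]) -> Iterator[tuple[str, Any]]:
    """Hypothetical driver: emit HF pairs one tensor at a time."""
    for fqn, tensor in state_dict.items():
        yield from convert_single_tensor_to_hf(fqn, tensor)


native = {
    "model.layers.3.linear_attn._fp32_params.dt_bias": "t0",
    "model.layers.3.mlp.down_proj.weight": "t1",
}
for key, value in stream_to_hf(native):
    print(key, value)
```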
nemo_automodel.components.models.qwen3_5.state_dict_adapter._route_to_fp32_holder(
key: str
) -> str
nemo_automodel.components.models.qwen3_5.state_dict_adapter._strip_fp32_prefix(
key: str
) -> str
nemo_automodel.components.models.qwen3_5.state_dict_adapter.map_qwen3_5_mtp_from_hf_key(
key: str
) -> str

Map HF Qwen3.5 MTP keys to Automodel’s Megatron-style MTP module.

nemo_automodel.components.models.qwen3_5.state_dict_adapter.map_qwen3_5_mtp_to_hf_key(
key: str
) -> str

Map Automodel Qwen3.5 MTP keys back to HF checkpoint keys.
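The two MTP helpers are inverses built around a lookup table (_MTP_HF_TO_NATIVE and its inverse _MTP_NATIVE_TO_HF in the Data section). The sketch below assumes an exact-match lookup with unmatched keys passing through unchanged; the real functions may handle further patterns. Only the first table entry is visible in this reference and the rest is elided, so the table here is deliberately incomplete, and the function names are hypothetical.

```python
# Hypothetical sketch of table-driven MTP key mapping; not the library code.
# Only the first entry of _MTP_HF_TO_NATIVE is shown in this reference.
MTP_HF_TO_NATIVE = {
    "mtp.fc.weight": "mtp.layers.0.eh_proj.weight",
}
# Inverse built the same way as _MTP_NATIVE_TO_HF.
MTP_NATIVE_TO_HF = {v: k for k, v in MTP_HF_TO_NATIVE.items()}


def map_mtp_from_hf_key(key: str) -> str:
    # Assumption: keys without a table entry are returned unchanged.
    return MTP_HF_TO_NATIVE.get(key, key)


def map_mtp_to_hf_key(key: str) -> str:
    return MTP_NATIVE_TO_HF.get(key, key)


native_key = map_mtp_from_hf_key("mtp.fc.weight")
print(native_key)
print(map_mtp_to_hf_key(native_key))
```

Inverting with a dict comprehension is only lossless when the forward table is one-to-one; two HF keys mapping to the same native key would silently collapse into a single inverse entry.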

nemo_automodel.components.models.qwen3_5.state_dict_adapter._BARE_FP32_PARAM_NAMES = ('A_log', 'dt_bias')
nemo_automodel.components.models.qwen3_5.state_dict_adapter._FP32_PARAMS_TO_BARE = re.compile('(\\.linear_attn)\\._fp32_params\\.')
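_FP32_PARAMS_TO_BARE captures the .linear_attn segment so a substitution can keep it while dropping ._fp32_params. The snippet below reuses the same pattern; that _strip_fp32_prefix applies exactly this substitution is an assumption, and strip_fp32_prefix here is a hypothetical stand-in.

```python
import re

# Same pattern as _FP32_PARAMS_TO_BARE.
FP32_PARAMS_TO_BARE = re.compile(r"(\.linear_attn)\._fp32_params\.")


def strip_fp32_prefix(key: str) -> str:
    # Keep group 1 (".linear_attn"), drop "._fp32_params", and restore the
    # single dot before the parameter name.
    return FP32_PARAMS_TO_BARE.sub(r"\1.", key)


print(strip_fp32_prefix("model.layers.0.linear_attn._fp32_params.A_log"))
# A holder under any other module is left alone: the match requires ".linear_attn".
print(strip_fp32_prefix("model.layers.0.mlp._fp32_params.A_log"))
```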
nemo_automodel.components.models.qwen3_5.state_dict_adapter._MTP_HF_TO_NATIVE = {'mtp.fc.weight': 'mtp.layers.0.eh_proj.weight', 'mtp.pre_fc_norm_embedding.weig...
nemo_automodel.components.models.qwen3_5.state_dict_adapter._MTP_NATIVE_TO_HF = {v: k for k, v in (_MTP_HF_TO_NATIVE.items())}