nemo_automodel.components.models.hy_mt2.state_dict_adapter


State dict conversion between the on-disk tencent/Hy-MT2-30B-A3B HF checkpoint and Automodel’s native (grouped-experts) format.

The on-disk key layout is identical to that of tencent/Hy3-preview, because both share model_type: "hy_v3" and architectures: ["HYV3ForCausalLM"]:

    model.layers.{L}.mlp.expert_bias                              # [n_experts]
    model.layers.{L}.mlp.router.gate.weight                       # [n_experts, hidden]
    model.layers.{L}.mlp.experts.{E}.gate_proj.weight             # [moe_inter, hidden]
    model.layers.{L}.mlp.experts.{E}.up_proj.weight               # [moe_inter, hidden]
    model.layers.{L}.mlp.experts.{E}.down_proj.weight             # [hidden, moe_inter]
    model.layers.{L}.mlp.shared_mlp.{gate,up,down}_proj.weight    # shared expert
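To make the per-expert part of this layout concrete, here is a minimal, hypothetical helper (not part of this module) that recognizes per-expert keys and extracts the layer index, expert index, and projection name. The names `parse_expert_key` and `ExpertKey` are illustrative only; the regex simply encodes the key pattern listed above.

```python
import re
from typing import NamedTuple, Optional

# Per-expert weights in the on-disk layout, e.g.
# "model.layers.3.mlp.experts.17.gate_proj.weight".
_EXPERT_KEY = re.compile(
    r"^model\.layers\.(?P<layer>\d+)\.mlp\.experts\.(?P<expert>\d+)\."
    r"(?P<proj>gate_proj|up_proj|down_proj)\.weight$"
)


class ExpertKey(NamedTuple):
    layer: int
    expert: int
    proj: str


def parse_expert_key(key: str) -> Optional[ExpertKey]:
    """Return (layer, expert, proj) for a per-expert key, else None."""
    m = _EXPERT_KEY.match(key)
    if m is None:
        return None  # router, expert_bias, shared_mlp, attention, ...
    return ExpertKey(int(m["layer"]), int(m["expert"]), m["proj"])


print(parse_expert_key("model.layers.3.mlp.experts.17.gate_proj.weight"))
print(parse_expert_key("model.layers.3.mlp.shared_mlp.gate_proj.weight"))
```

Shared-expert and router keys deliberately do not match, because they are handled by renames rather than by the per-expert split/merge.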

Automodel native:

    model.layers.{L}.mlp.gate.e_score_correction_bias    # [n_local]
    model.layers.{L}.mlp.gate.weight                     # [n_experts, hidden]
    model.layers.{L}.mlp.experts.gate_and_up_projs       # grouped
    model.layers.{L}.mlp.experts.down_projs              # grouped
    model.layers.{L}.mlp.shared_experts.{gate,up,down}_proj.weight
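A hypothetical sketch of how the two layouts correspond: given a grouped native key, list the on-disk per-expert keys it stands for. It assumes that `gate_and_up_projs` packs each expert's `gate_proj` and `up_proj` (as the name suggests), that `down_projs` packs `down_proj`, and that the full, unsharded expert set is present. `hf_keys_for_grouped` is an illustrative name, not an API of this module.

```python
import re

# Grouped native expert keys, e.g. "model.layers.5.mlp.experts.down_projs".
_GROUPED = re.compile(
    r"^(?P<prefix>model\.layers\.\d+\.mlp\.experts)\."
    r"(?P<name>gate_and_up_projs|down_projs)$"
)

# Assumption: which per-expert on-disk projections each grouped tensor packs.
_PACKED_PROJS = {
    "gate_and_up_projs": ("gate_proj", "up_proj"),
    "down_projs": ("down_proj",),
}


def hf_keys_for_grouped(native_key: str, n_experts: int) -> list[str]:
    """List the on-disk per-expert keys that one grouped native key covers."""
    m = _GROUPED.match(native_key)
    if m is None:
        raise ValueError(f"not a grouped expert key: {native_key!r}")
    return [
        f"{m['prefix']}.{e}.{proj}.weight"
        for e in range(n_experts)
        for proj in _PACKED_PROJS[m["name"]]
    ]


for key in hf_keys_for_grouped("model.layers.0.mlp.experts.gate_and_up_projs", 2):
    print(key)
```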

This adapter handles three on-disk-specific renames (expert_bias <-> gate.e_score_correction_bias, router.gate <-> gate, shared_mlp <-> shared_experts) plus the per-expert split/merge provided by MoESplitExpertsStateDictMixin. It is functionally a clone of HYV3StateDictAdapter, but it is kept separate so that future Hy-MT2-only key changes (e.g. an MTP or aux-head extension that Hy-MT2 ships but Hy3-preview does not) can be added here without affecting Hy3-preview.
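The three renames can be expressed as ordered (pattern, replacement) tables, which is the shape of `_HF_TO_NATIVE_RENAMES` and `_NATIVE_TO_HF_RENAMES` listed under Data. Only the first entry of each table is visible on this page, so the other two patterns below are inferred from the key layouts above; read them as a sketch, not as the module's actual values. `apply_renames` is a hypothetical helper.

```python
import re

# Sketch of the two rename tables. Only the first entry of each real table is
# visible in these docs; the other two entries are inferred from the layouts.
HF_TO_NATIVE = (
    (re.compile(r"\.mlp\.expert_bias$"), ".mlp.gate.e_score_correction_bias"),
    (re.compile(r"\.mlp\.router\.gate\.weight$"), ".mlp.gate.weight"),
    (re.compile(r"\.mlp\.shared_mlp\."), ".mlp.shared_experts."),
)
NATIVE_TO_HF = (
    (re.compile(r"\.mlp\.gate\.e_score_correction_bias$"), ".mlp.expert_bias"),
    (re.compile(r"\.mlp\.gate\.weight$"), ".mlp.router.gate.weight"),
    (re.compile(r"\.mlp\.shared_experts\."), ".mlp.shared_mlp."),
)


def apply_renames(key: str, renames) -> str:
    """Apply the first matching (pattern, replacement) rule; else return key unchanged."""
    for pattern, repl in renames:
        new_key, n = pattern.subn(repl, key)
        if n:
            return new_key
    return key


hf_key = "model.layers.2.mlp.router.gate.weight"
native_key = apply_renames(hf_key, HF_TO_NATIVE)
print(native_key)                               # model.layers.2.mlp.gate.weight
print(apply_renames(native_key, NATIVE_TO_HF))  # model.layers.2.mlp.router.gate.weight
```

Keeping the two directions as mirror-image tables makes the round trip easy to check: every HF key that one table rewrites should come back unchanged through the other.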

Module Contents

Classes

| Name | Description |
| --- | --- |
| HyMT2StateDictAdapter | Bridges Automodel native (grouped experts) and on-disk Hy-MT2 HF format. |

Data

_HF_TO_NATIVE_RENAMES

_NATIVE_TO_HF_RENAMES

logger

API

class nemo_automodel.components.models.hy_mt2.state_dict_adapter.HyMT2StateDictAdapter(
config: typing.Any,
moe_config: nemo_automodel.components.moe.config.MoEConfig,
backend: nemo_automodel.components.models.common.BackendConfig,
dtype: torch.dtype = torch.bfloat16
)

Bases: MoESplitExpertsStateDictMixin, StateDictAdapter

Bridges Automodel native (grouped experts) and on-disk Hy-MT2 HF format.

nemo_automodel.components.models.hy_mt2.state_dict_adapter.HyMT2StateDictAdapter._is_mtp_key(
key: str
) -> bool

Return True if key belongs to an MTP layer (index >= num_hidden_layers).

Hy-MT2-30B-A3B does not appear to ship MTP layers in its public checkpoint, but the filter is kept as a defensive no-op so the adapter remains symmetric with HYV3StateDictAdapter.
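A minimal sketch of the MTP filter, assuming the layer index is read from a `model.layers.{L}.` prefix and compared against the config's `num_hidden_layers`. The standalone function `is_mtp_key` and its explicit `num_hidden_layers` argument are illustrative; per the signature above, the real method is a member of the adapter and takes only `key`.

```python
import re

# Layer index at the start of a decoder-layer key, e.g. "model.layers.12.".
_LAYER_IDX = re.compile(r"^model\.layers\.(\d+)\.")


def is_mtp_key(key: str, num_hidden_layers: int) -> bool:
    """True if `key` lives in a decoder layer past the last regular layer."""
    m = _LAYER_IDX.match(key)
    return m is not None and int(m.group(1)) >= num_hidden_layers


print(is_mtp_key("model.layers.4.mlp.expert_bias", num_hidden_layers=4))
print(is_mtp_key("lm_head.weight", num_hidden_layers=4))
```

Keys outside `model.layers.*` (embeddings, final norm, `lm_head`) never match, so when no MTP layers are on disk the filter drops nothing, which is the defensive no-op behaviour described above.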

nemo_automodel.components.models.hy_mt2.state_dict_adapter.HyMT2StateDictAdapter.convert_single_tensor_to_hf(
fqn: str,
tensor: typing.Any,
**kwargs
) -> list[tuple[str, typing.Any]]

Per-tensor variant of to_hf for streaming-save code paths.
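A rough sketch of what a per-tensor conversion has to do: a grouped expert tensor fans out into one HF key per expert (two for `gate_and_up_projs`), while every other tensor maps to exactly one, possibly renamed, key. To stay dependency-free it uses stand-ins: a grouped tensor is a Python list indexed by expert, and each `gate_and_up_projs` entry is a `(gate, up)` pair. Real grouped tensors are torch tensors whose exact packing is defined by MoESplitExpertsStateDictMixin; `convert_one_to_hf` is a hypothetical name, and only the first rename below is confirmed by this page.

```python
import re
from typing import Any

# Grouped native expert tensors, e.g. "model.layers.0.mlp.experts.down_projs".
_GROUPED = re.compile(
    r"^(model\.layers\.\d+\.mlp\.experts)\.(gate_and_up_projs|down_projs)$"
)
# Native -> HF renames (first entry from the docs, the rest inferred).
_RENAMES = (
    (re.compile(r"\.mlp\.gate\.e_score_correction_bias$"), ".mlp.expert_bias"),
    (re.compile(r"\.mlp\.gate\.weight$"), ".mlp.router.gate.weight"),
    (re.compile(r"\.mlp\.shared_experts\."), ".mlp.shared_mlp."),
)


def convert_one_to_hf(fqn: str, tensor: Any) -> list[tuple[str, Any]]:
    m = _GROUPED.match(fqn)
    if m is not None:
        prefix, name = m.groups()
        out: list[tuple[str, Any]] = []
        for e, per_expert in enumerate(tensor):  # stand-in: one entry per expert
            if name == "gate_and_up_projs":
                gate, up = per_expert  # stand-in: (gate, up) pair
                out.append((f"{prefix}.{e}.gate_proj.weight", gate))
                out.append((f"{prefix}.{e}.up_proj.weight", up))
            else:
                out.append((f"{prefix}.{e}.down_proj.weight", per_expert))
        return out
    for pattern, repl in _RENAMES:
        new_fqn, n = pattern.subn(repl, fqn)
        if n:
            return [(new_fqn, tensor)]
    return [(fqn, tensor)]  # attention, norms, embeddings: passed through


print(convert_one_to_hf("model.layers.0.mlp.experts.down_projs", ["d0", "d1"]))
```

Returning a list of pairs (rather than a dict) lets a streaming saver write each HF tensor as soon as it is produced, without materializing the whole HF state dict.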

nemo_automodel.components.models.hy_mt2.state_dict_adapter.HyMT2StateDictAdapter.from_hf(
hf_state_dict: dict[str, typing.Any],
device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
**kwargs
) -> dict[str, typing.Any]

On-disk Hy-MT2 HF -> native: filter MTP, rename, then merge experts.
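The three from_hf stages can be sketched end to end with the same stand-ins for tensors (lists indexed by expert, `(gate, up)` pairs for the packed gate/up weight). This is an illustration under stated assumptions (inferred rename patterns beyond the first, full expert set, no `device_mesh` sharding), not the library's implementation. `from_hf_sketch` is hypothetical, and the rename step and the collection of per-expert weights are folded into one pass for brevity.

```python
import re
from typing import Any

_LAYER = re.compile(r"^model\.layers\.(\d+)\.")
_EXPERT = re.compile(
    r"^(model\.layers\.\d+\.mlp\.experts)\.(\d+)\.(gate_proj|up_proj|down_proj)\.weight$"
)
# HF -> native renames (first entry from the docs, the rest inferred).
_RENAMES = (
    (re.compile(r"\.mlp\.expert_bias$"), ".mlp.gate.e_score_correction_bias"),
    (re.compile(r"\.mlp\.router\.gate\.weight$"), ".mlp.gate.weight"),
    (re.compile(r"\.mlp\.shared_mlp\."), ".mlp.shared_experts."),
)


def from_hf_sketch(
    hf: dict[str, Any], num_hidden_layers: int, n_experts: int
) -> dict[str, Any]:
    native: dict[str, Any] = {}
    buckets: dict[str, dict[str, list]] = {}  # experts prefix -> proj -> per-expert list
    for key, value in hf.items():
        # 1) Filter: drop MTP layers (layer index >= num_hidden_layers).
        m = _LAYER.match(key)
        if m is not None and int(m.group(1)) >= num_hidden_layers:
            continue
        # 3a) Collect per-expert weights by layer and projection.
        em = _EXPERT.match(key)
        if em is not None:
            prefix, idx, proj = em.group(1), int(em.group(2)), em.group(3)
            buckets.setdefault(prefix, {}).setdefault(proj, [None] * n_experts)[idx] = value
            continue
        # 2) Rename: first matching rule wins; unmatched keys pass through.
        for pattern, repl in _RENAMES:
            key, n = pattern.subn(repl, key)
            if n:
                break
        native[key] = value
    # 3b) Merge: one grouped entry per layer (stand-in: lists indexed by expert).
    for prefix, projs in buckets.items():
        native[f"{prefix}.gate_and_up_projs"] = list(zip(projs["gate_proj"], projs["up_proj"]))
        native[f"{prefix}.down_projs"] = projs["down_proj"]
    return native
```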

nemo_automodel.components.models.hy_mt2.state_dict_adapter.HyMT2StateDictAdapter.to_hf(
state_dict: dict[str, typing.Any],
exclude_key_regex: typing.Optional[str] = None,
**kwargs
) -> dict[str, typing.Any]

Native -> on-disk Hy-MT2 HF: per-expert split + name renames.
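to_hf can be viewed as the per-tensor conversion applied to every entry, followed by optional filtering. The sketch below makes that composition explicit by taking the per-tensor converter as a parameter. It assumes `exclude_key_regex` is matched with `re.match` against the resulting HF keys; the real adapter may apply it differently. `to_hf_sketch` and `toy_convert` are hypothetical, and the grouped value is a list stand-in rather than a torch tensor.

```python
import re
from typing import Any, Callable, Optional

PerTensorFn = Callable[[str, Any], list[tuple[str, Any]]]


def to_hf_sketch(
    state_dict: dict[str, Any],
    convert_single: PerTensorFn,
    exclude_key_regex: Optional[str] = None,
) -> dict[str, Any]:
    exclude = re.compile(exclude_key_regex) if exclude_key_regex else None
    hf: dict[str, Any] = {}
    for fqn, tensor in state_dict.items():
        for hf_key, value in convert_single(fqn, tensor):
            # Assumption: the exclusion regex is matched against the HF-side key.
            if exclude is not None and exclude.match(hf_key):
                continue
            hf[hf_key] = value
    return hf


# Toy per-tensor converter: splits a grouped down_projs stand-in into per-expert keys.
def toy_convert(fqn: str, tensor: Any) -> list[tuple[str, Any]]:
    if fqn.endswith(".mlp.experts.down_projs"):
        prefix = fqn[: -len(".down_projs")]
        return [(f"{prefix}.{e}.down_proj.weight", t) for e, t in enumerate(tensor)]
    return [(fqn, tensor)]


sd = {"model.layers.0.mlp.experts.down_projs": ["d0", "d1"], "lm_head.weight": "W"}
print(to_hf_sketch(sd, toy_convert, exclude_key_regex=r".*\.experts\.1\."))
```

Routing the whole-dict path through the per-tensor function keeps to_hf and convert_single_tensor_to_hf from drifting apart.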

nemo_automodel.components.models.hy_mt2.state_dict_adapter._HF_TO_NATIVE_RENAMES: tuple[tuple[Pattern[str], str], ...] = ((re.compile('\\.mlp\\.expert_bias$'), '.mlp.gate.e_score_correction_bias'), (re...
nemo_automodel.components.models.hy_mt2.state_dict_adapter._NATIVE_TO_HF_RENAMES: tuple[tuple[Pattern[str], str], ...] = ((re.compile('\\.mlp\\.gate\\.e_score_correction_bias$'), '.mlp.expert_bias'), (...
nemo_automodel.components.models.hy_mt2.state_dict_adapter.logger = logging.getLogger(__name__)