nemo_automodel.components.models.ling_v2.state_dict_adapter


HF <-> NeMo state-dict adapter for BailingMoeV2 (Ling 2.0).

Handles the rename map between the HuggingFace checkpoint layout:

    model.word_embeddings.weight
    model.layers.{N}.attention.query_key_value.weight   # fused [Q | K | V]
    model.layers.{N}.attention.dense.weight
    model.layers.{N}.attention.query_layernorm.weight
    model.layers.{N}.attention.key_layernorm.weight
    model.layers.{N}.mlp.gate.weight
    model.layers.{N}.mlp.gate.expert_bias
    model.layers.{N}.mlp.experts.{E}.{gate_proj,up_proj,down_proj}.weight
    model.layers.{N}.mlp.shared_experts.{gate_proj,up_proj,down_proj}.weight

and the native NeMo layout used by this package:

    model.embed_tokens.weight
    model.layers.{N}.self_attn.{q_proj,k_proj,v_proj,o_proj}.weight
    model.layers.{N}.self_attn.{q_norm,k_norm}.weight
    model.layers.{N}.mlp.gate.weight
    model.layers.{N}.mlp.gate.e_score_correction_bias
    model.layers.{N}.mlp.experts.{gate_and_up_projs,down_projs}
    model.layers.{N}.mlp.shared_experts.{gate_proj,up_proj,down_proj}.weight
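Comparing the two layouts, the non-expert, non-QKV keys differ only by substring renames. The sketch below expresses that as an ordered list of (HF, native) pairs applied with `str.replace` in both directions. Only the first pair is confirmed by the visible part of `_RENAME_PAIRS_HF_TO_NATIVE`; the remaining pairs are assumptions inferred from the layouts above, and the `sketch_*` functions are hypothetical stand-ins, not the module's `_rename_hf_to_native` / `_rename_native_to_hf`.

```python
# Hypothetical rename pairs (HF substring, native substring). Only the first pair
# is confirmed by the truncated _RENAME_PAIRS_HF_TO_NATIVE value; the rest are
# inferred from the two key layouts and may differ from the real table.
SKETCH_RENAME_PAIRS = (
    ("model.word_embeddings.", "model.embed_tokens."),
    (".attention.dense.", ".self_attn.o_proj."),
    (".attention.query_layernorm.", ".self_attn.q_norm."),
    (".attention.key_layernorm.", ".self_attn.k_norm."),
    (".mlp.gate.expert_bias", ".mlp.gate.e_score_correction_bias"),
)


def sketch_rename_hf_to_native(key: str) -> str:
    """Apply every HF -> native substring rename; keys with no match are unchanged."""
    for hf, native in SKETCH_RENAME_PAIRS:
        key = key.replace(hf, native)
    return key


def sketch_rename_native_to_hf(key: str) -> str:
    """Inverse direction: swap each native substring back to its HF spelling."""
    for hf, native in SKETCH_RENAME_PAIRS:
        key = key.replace(native, hf)
    return key


print(sketch_rename_hf_to_native("model.layers.2.attention.dense.weight"))
```

Because the pairs target disjoint substrings, the order in which they are applied does not matter here; the fused `query_key_value` key is deliberately absent, since it needs a split rather than a rename.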

The per-expert grouping is delegated to MoESplitExpertsStateDictMixin; this adapter only normalises the surrounding key names and splits the fused QKV.
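To make the delegated per-expert grouping concrete, here is a minimal sketch of what "grouping" means for the key layouts above: per-expert HF weights are collected per layer and stacked into one `gate_and_up_projs` and one `down_projs` tensor. The tensor layout used here (gate and up concatenated along the output dimension, experts stacked on a new leading dimension, no transpose) is an assumption for illustration; `MoESplitExpertsStateDictMixin` may use a different layout, and `sketch_group_experts` is a hypothetical helper.

```python
import re

import torch

# Hypothetical pattern for routed-expert keys; shared_experts keys do not match
# because they lack the ".mlp.experts." segment.
EXPERT_RE = re.compile(
    r"^(?P<prefix>.*\.mlp\.experts)\.(?P<expert>\d+)\.(?P<proj>gate_proj|up_proj|down_proj)\.weight$"
)


def sketch_group_experts(hf_state_dict, n_experts):
    """Stack per-expert HF weights into grouped tensors (illustrative layout only)."""
    buckets = {}  # "...mlp.experts" prefix -> {(expert_id, proj): tensor}
    out = {}
    for key, weight in hf_state_dict.items():
        match = EXPERT_RE.match(key)
        if match is None:
            out[key] = weight  # non-expert keys pass through unchanged
            continue
        bucket = buckets.setdefault(match.group("prefix"), {})
        bucket[(int(match.group("expert")), match.group("proj"))] = weight
    for prefix, bucket in buckets.items():
        # Assumed layout: gate rows then up rows per expert, experts stacked on dim 0.
        gate_and_up = [
            torch.cat([bucket[(e, "gate_proj")], bucket[(e, "up_proj")]], dim=0)
            for e in range(n_experts)
        ]
        down = [bucket[(e, "down_proj")] for e in range(n_experts)]
        out[f"{prefix}.gate_and_up_projs"] = torch.stack(gate_and_up)
        out[f"{prefix}.down_projs"] = torch.stack(down)
    return out


# Usage: 2 experts, hidden size 4, expert intermediate size 3.
hf = {"model.layers.0.mlp.shared_experts.gate_proj.weight": torch.zeros(3, 4)}
for e in range(2):
    hf[f"model.layers.0.mlp.experts.{e}.gate_proj.weight"] = torch.zeros(3, 4)
    hf[f"model.layers.0.mlp.experts.{e}.up_proj.weight"] = torch.ones(3, 4)
    hf[f"model.layers.0.mlp.experts.{e}.down_proj.weight"] = torch.zeros(4, 3)
grouped = sketch_group_experts(hf, n_experts=2)
```

Here `grouped` holds the untouched shared-expert weight plus two stacked tensors of shape `(2, 6, 4)` and `(2, 4, 3)`, keyed as in the native layout.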

Module Contents

Classes

Name | Description
--- | ---
BailingMoeV2StateDictAdapter | State-dict adapter for BailingMoeV2 / Ling 2.0 checkpoints.

Functions

_rename_hf_to_native

_rename_native_to_hf

Data

_LAYER_QKV_RE

_RENAME_PAIRS_HF_TO_NATIVE

API

class nemo_automodel.components.models.ling_v2.state_dict_adapter.BailingMoeV2StateDictAdapter(
config: nemo_automodel.components.models.ling_v2.config.BailingMoeV2Config,
moe_config: nemo_automodel.components.moe.config.MoEConfig,
backend: nemo_automodel.components.models.common.BackendConfig,
dtype: torch.dtype = torch.bfloat16
)

Bases: MoESplitExpertsStateDictMixin, StateDictAdapter

State-dict adapter for BailingMoeV2 / Ling 2.0 checkpoints.

nemo_automodel.components.models.ling_v2.state_dict_adapter.BailingMoeV2StateDictAdapter._split_fused_qkv_and_rename(
hf_state_dict: dict[str, typing.Any]
) -> dict[str, typing.Any]

Split each fused query_key_value weight into q_proj, k_proj and v_proj weights and apply the HF-to-native key renames.
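A minimal sketch of the split step, assuming the fused weight stores a contiguous Q block, then K, then V along the output (row) dimension, as the `# fused [Q | K | V]` note in the HF layout suggests, and assuming grouped-query attention sizes of `num_heads * head_dim` rows for Q and `num_kv_heads * head_dim` rows each for K and V. The regex prefix mirrors the visible part of `_LAYER_QKV_RE`; its `.weight$` suffix, the parameter names, and `sketch_split_fused_qkv` itself are hypothetical.

```python
import re

import torch

# Prefix group copies the visible part of _LAYER_QKV_RE; the ".weight$" suffix
# is an assumption because the original pattern is truncated.
QKV_RE = re.compile(
    r"^(?P<prefix>(?:.*\.)?layers\.\d+)\.attention\.query_key_value\.weight$"
)


def sketch_split_fused_qkv(hf_state_dict, num_heads, num_kv_heads, head_dim):
    """Split fused [Q | K | V] weights along dim 0; other keys pass through unchanged."""
    q_rows = num_heads * head_dim
    kv_rows = num_kv_heads * head_dim
    out = {}
    for key, value in hf_state_dict.items():
        match = QKV_RE.match(key)
        if match is None:
            out[key] = value  # renaming of non-QKV keys is omitted in this sketch
            continue
        q, k, v = torch.split(value, [q_rows, kv_rows, kv_rows], dim=0)
        prefix = match.group("prefix")
        out[f"{prefix}.self_attn.q_proj.weight"] = q
        out[f"{prefix}.self_attn.k_proj.weight"] = k
        out[f"{prefix}.self_attn.v_proj.weight"] = v
    return out


# Usage: 4 query heads, 2 KV heads, head_dim 2, hidden size 8 -> 8 + 4 + 4 = 16 rows.
fused = torch.arange(16 * 8, dtype=torch.float32).reshape(16, 8)
parts = sketch_split_fused_qkv(
    {"model.layers.0.attention.query_key_value.weight": fused},
    num_heads=4, num_kv_heads=2, head_dim=2,
)
```

If a checkpoint interleaved Q, K and V per head instead of storing three contiguous blocks, a plain `torch.split` like this would produce wrong weights, so the contiguous-block assumption matters.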

nemo_automodel.components.models.ling_v2.state_dict_adapter.BailingMoeV2StateDictAdapter.convert_single_tensor_to_hf(
fqn: str,
tensor: typing.Any,
kwargs = {}
) -> list[tuple[str, typing.Any]]

Convert a single native tensor to HuggingFace format.

The q_proj, k_proj and v_proj tensors cannot be re-fused without their two sibling projections, so callers should batch them through to_hf instead. This single-tensor path emits a per-projection HF key (not the standard fused query_key_value name) so that the value is not silently dropped by DCP save adapters that walk tensors one at a time.
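The reason re-fusing needs batching can be shown with a small stateful buffer: a q, k or v tensor is held per layer until its two siblings arrive, and only then is one fused `query_key_value` tensor emitted. This is a hypothetical illustration of the batching idea (the `QKVRefuser` class and `PROJ_RE` pattern are not part of this module, and renaming of other keys is omitted); it is not a description of how `to_hf` is implemented.

```python
import re

import torch

# Hypothetical pattern for split native projections in the native layout.
PROJ_RE = re.compile(
    r"^(?P<prefix>(?:.*\.)?layers\.\d+)\.self_attn\.(?P<proj>[qkv])_proj\.weight$"
)


class QKVRefuser:
    """Buffer q/k/v tensors per layer and emit one fused HF tensor once all three arrive."""

    def __init__(self):
        self._pending = {}  # layer prefix -> {"q" | "k" | "v": tensor}

    def add(self, fqn, tensor):
        match = PROJ_RE.match(fqn)
        if match is None:
            return [(fqn, tensor)]  # not a q/k/v projection; renaming omitted here
        prefix, proj = match.group("prefix"), match.group("proj")
        parts = self._pending.setdefault(prefix, {})
        parts[proj] = tensor
        if len(parts) < 3:
            return []  # still waiting for the sibling projections
        del self._pending[prefix]
        fused = torch.cat([parts["q"], parts["k"], parts["v"]], dim=0)
        return [(f"{prefix}.attention.query_key_value.weight", fused)]


refuser = QKVRefuser()
emitted = []
for proj, rows in (("q", 8), ("k", 4), ("v", 4)):
    emitted += refuser.add(f"model.layers.0.self_attn.{proj}_proj.weight", torch.ones(rows, 8))
```

Nothing is emitted for the first two projections; the third call yields a single fused `(16, 8)` tensor. A one-tensor-at-a-time converter without such a buffer has no place to hold the early arrivals, which is why the single-tensor path falls back to per-projection keys.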

nemo_automodel.components.models.ling_v2.state_dict_adapter.BailingMoeV2StateDictAdapter.from_hf(
hf_state_dict: dict[str, typing.Any],
device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
kwargs = {}
) -> dict[str, typing.Any]
nemo_automodel.components.models.ling_v2.state_dict_adapter.BailingMoeV2StateDictAdapter.to_hf(
state_dict: dict[str, typing.Any],
exclude_key_regex: typing.Optional[str] = None,
quantization: bool = False,
kwargs = {}
) -> dict[str, typing.Any]
nemo_automodel.components.models.ling_v2.state_dict_adapter._rename_hf_to_native(
key: str
) -> str
nemo_automodel.components.models.ling_v2.state_dict_adapter._rename_native_to_hf(
key: str
) -> str
nemo_automodel.components.models.ling_v2.state_dict_adapter._LAYER_QKV_RE = re.compile('^(?P<prefix>(?:.*\\.)?layers\\.\\d+)\\.attention\\.query_key_value\\...
nemo_automodel.components.models.ling_v2.state_dict_adapter._RENAME_PAIRS_HF_TO_NATIVE: tuple[tuple[str, str], ...] = (('model.word_embeddings.', 'model.embed_tokens.'), ('.attention.dense.', '.self...