`nemo_automodel.components.models.ling_v2.state_dict_adapter`

HF <-> NeMo state-dict adapter for BailingMoeV2 (Ling 2.0).

Handles the rename map between the HuggingFace checkpoint layout

model.word_embeddings.weight
model.layers.{N}.attention.query_key_value.weight      # fused [Q | K | V]
model.layers.{N}.attention.dense.weight
model.layers.{N}.attention.query_layernorm.weight
model.layers.{N}.attention.key_layernorm.weight
model.layers.{N}.mlp.gate.weight
model.layers.{N}.mlp.gate.expert_bias
model.layers.{N}.mlp.experts.{E}.{gate_proj,up_proj,down_proj}.weight
model.layers.{N}.mlp.shared_experts.{gate_proj,up_proj,down_proj}.weight

and the native NeMo layout used by this package

model.embed_tokens.weight
model.layers.{N}.self_attn.{q_proj,k_proj,v_proj,o_proj}.weight
model.layers.{N}.self_attn.{q_norm,k_norm}.weight
model.layers.{N}.mlp.gate.weight
model.layers.{N}.mlp.gate.e_score_correction_bias
model.layers.{N}.mlp.experts.{gate_and_up_projs,down_projs}
model.layers.{N}.mlp.shared_experts.{gate_proj,up_proj,down_proj}.weight

The per-expert grouping is delegated to MoESplitExpertsStateDictMixin; this adapter only normalises the surrounding key names and splits the fused QKV.
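As a rough illustration of the QKV split mentioned above, the sketch below slices a fused weight along its first (output) dimension. It assumes the rows are ordered [Q | K | V], with Q taking `num_heads * head_dim` rows and K and V each taking `num_kv_heads * head_dim` rows. The function name, the parameter names, and this sizing are illustrative assumptions and are not taken from the adapter's source.

```python
def split_fused_qkv(weight, num_heads, num_kv_heads, head_dim):
    """Slice a fused [Q | K | V] weight into its three projections.

    `weight` may be anything that supports len() and slicing along its
    first dimension (a torch.Tensor, a NumPy array, or a list of rows).
    The row layout and the sizes are assumptions made for this sketch.
    """
    q_rows = num_heads * head_dim
    kv_rows = num_kv_heads * head_dim
    expected = q_rows + 2 * kv_rows
    if len(weight) != expected:
        raise ValueError(f"expected {expected} rows, got {len(weight)}")
    q = weight[:q_rows]
    k = weight[q_rows:q_rows + kv_rows]
    v = weight[q_rows + kv_rows:]
    return q, k, v


# Toy input: 2 query heads, 1 KV head, head_dim 2 -> 4 + 2 + 2 = 8 rows.
fused = [[float(r)] * 3 for r in range(8)]
q, k, v = split_fused_qkv(fused, num_heads=2, num_kv_heads=1, head_dim=2)
```

Slicing rather than copying keeps the sketch agnostic to the container type; with real tensors the three slices would be views of the fused weight.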

Module Contents

Classes

BailingMoeV2StateDictAdapter

State-dict adapter for BailingMoeV2 / Ling 2.0 checkpoints.

Functions

`_rename_hf_to_native`
`_rename_native_to_hf`

Data

`_RENAME_PAIRS_HF_TO_NATIVE`
`_LAYER_QKV_RE`

API

nemo_automodel.components.models.ling_v2.state_dict_adapter._RENAME_PAIRS_HF_TO_NATIVE: tuple[tuple[str, str], ...] = (('model.word_embeddings.', 'model.embed_tokens.'), ('.attention.dense.', '.self_attn.o_proj.'), ('….

nemo_automodel.components.models.ling_v2.state_dict_adapter._LAYER_QKV_RE = 'compile(…)'

nemo_automodel.components.models.ling_v2.state_dict_adapter._rename_hf_to_native(key: str) → str

nemo_automodel.components.models.ling_v2.state_dict_adapter._rename_native_to_hf(key: str) → str
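The two helpers above are undocumented; one plausible reading is that they apply the rename pairs as plain substring substitutions, forwards and backwards. The sketch below works under that assumption. Only the first two pairs appear in the truncated value shown above; the other three are inferred from the two layouts listed at the top of this page and may not match the real tuple. All names here are hypothetical stand-ins, not the module's private helpers.

```python
# First two pairs come from the documented (truncated) value; the last
# three are inferred from the layout listings and are assumptions.
RENAME_PAIRS = (
    ("model.word_embeddings.", "model.embed_tokens."),
    (".attention.dense.", ".self_attn.o_proj."),
    (".attention.query_layernorm.", ".self_attn.q_norm."),
    (".attention.key_layernorm.", ".self_attn.k_norm."),
    (".mlp.gate.expert_bias", ".mlp.gate.e_score_correction_bias"),
)


def rename_hf_to_native(key: str) -> str:
    """Rewrite each HF fragment found in `key` to its native counterpart."""
    for hf_part, native_part in RENAME_PAIRS:
        key = key.replace(hf_part, native_part)
    return key


def rename_native_to_hf(key: str) -> str:
    """Inverse of rename_hf_to_native for the same pair table."""
    for hf_part, native_part in RENAME_PAIRS:
        key = key.replace(native_part, hf_part)
    return key


native_key = rename_hf_to_native("model.layers.0.attention.dense.weight")
hf_key = rename_native_to_hf(native_key)
```

Keys that match no pair, such as `model.layers.{N}.mlp.gate.weight`, pass through unchanged in both directions.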

class nemo_automodel.components.models.ling_v2.state_dict_adapter.BailingMoeV2StateDictAdapter(config: nemo_automodel.components.models.ling_v2.config.BailingMoeV2Config, moe_config: nemo_automodel.components.moe.config.MoEConfig, backend: nemo_automodel.components.models.common.BackendConfig, dtype: torch.dtype = torch.bfloat16)

Bases: nemo_automodel.components.moe.state_dict_mixin.MoESplitExpertsStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

State-dict adapter for BailingMoeV2 / Ling 2.0 checkpoints.

Initialization

from_hf(hf_state_dict: dict[str, Any], device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None, **kwargs) → dict[str, Any]

_split_fused_qkv_and_rename(hf_state_dict: dict[str, Any]) → dict[str, Any]

Split each fused query_key_value weight into q/k/v and apply renames.

to_hf(state_dict: dict[str, Any], exclude_key_regex: Optional[str] = None, quantization: bool = False, **kwargs) → dict[str, Any]

convert_single_tensor_to_hf(fqn: str, tensor: Any, **kwargs) → list[tuple[str, Any]]

Convert a single native tensor to HuggingFace format.

q_proj / k_proj / v_proj tensors cannot be re-fused without their two siblings; callers should batch them through `to_hf` instead. This single-tensor path emits the per-projection HF key (which is not the standard fused name) so that the value is not silently dropped by DCP save paths that walk tensors one by one.
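To illustrate why a lone projection cannot be re-fused, the sketch below shows a batched pass that has to collect all three siblings of a layer before it can concatenate them. Nested lists stand in for weight matrices; with real tensors the concatenation would typically be `torch.cat` along dimension 0. The function name, the regex, and the error raised for a missing sibling are assumptions for this sketch and do not describe how `to_hf` is actually implemented. Other keys are passed through without renaming to keep the example short.

```python
import re

# Hypothetical pattern for native per-projection keys; this is not the
# module's _LAYER_QKV_RE, whose pattern is not shown on this page.
PROJ_RE = re.compile(r"^(model\.layers\.\d+)\.self_attn\.([qkv])_proj\.weight$")


def fuse_qkv(native_state_dict):
    """Group q/k/v weights per layer and emit one fused HF-style entry each."""
    fused, groups = {}, {}
    for key, value in native_state_dict.items():
        match = PROJ_RE.match(key)
        if match is None:
            fused[key] = value  # not a q/k/v projection: pass through
            continue
        prefix, which = match.groups()
        groups.setdefault(prefix, {})[which] = value
    for prefix, parts in groups.items():
        missing = {"q", "k", "v"} - set(parts)
        if missing:
            # A single projection on its own cannot be fused.
            raise KeyError(f"{prefix}: missing projections {sorted(missing)}")
        rows = list(parts["q"]) + list(parts["k"]) + list(parts["v"])
        fused[f"{prefix}.attention.query_key_value.weight"] = rows
    return fused


native = {
    "model.layers.0.self_attn.q_proj.weight": [[1], [2]],
    "model.layers.0.self_attn.k_proj.weight": [[3]],
    "model.layers.0.self_attn.v_proj.weight": [[4]],
    "model.embed_tokens.weight": [[0]],
}
hf_like = fuse_qkv(native)
```

Calling `fuse_qkv` with only one of the three projections for a layer raises `KeyError` in this sketch, which mirrors the constraint described above.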
