nemo_automodel.components.models.ling_v2.state_dict_adapter
HF <-> NeMo state-dict adapter for BailingMoeV2 (Ling 2.0).
Handles the rename map between the HuggingFace checkpoint layout
    model.word_embeddings.weight
    model.layers.{N}.attention.query_key_value.weight   # fused [Q | K | V]
    model.layers.{N}.attention.dense.weight
    model.layers.{N}.attention.query_layernorm.weight
    model.layers.{N}.attention.key_layernorm.weight
    model.layers.{N}.mlp.gate.weight
    model.layers.{N}.mlp.gate.expert_bias
    model.layers.{N}.mlp.experts.{E}.{gate_proj,up_proj,down_proj}.weight
    model.layers.{N}.mlp.shared_experts.{gate_proj,up_proj,down_proj}.weight
and the native NeMo layout used by this package
    model.embed_tokens.weight
    model.layers.{N}.self_attn.{q_proj,k_proj,v_proj,o_proj}.weight
    model.layers.{N}.self_attn.{q_norm,k_norm}.weight
    model.layers.{N}.mlp.gate.weight
    model.layers.{N}.mlp.gate.e_score_correction_bias
    model.layers.{N}.mlp.experts.{gate_and_up_projs,down_projs}
    model.layers.{N}.mlp.shared_experts.{gate_proj,up_proj,down_proj}.weight
The per-expert grouping is delegated to MoESplitExpertsStateDictMixin; this
adapter only normalises the surrounding key names and splits the fused QKV.
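The key normalisation described above can be pictured as a small substring rename table. The following is a minimal sketch, not the adapter's actual code: the table `HF_TO_NATIVE` and the helper `rename_hf_key` are hypothetical names, and the pairings (for example `attention.dense` to `self_attn.o_proj`) are inferred by lining up the two layouts listed in this docstring.

```python
# Hypothetical rename table inferred from the two key layouts above.
# The real adapter may use a different table or regex-based rules.
HF_TO_NATIVE = [
    ("model.word_embeddings.", "model.embed_tokens."),
    (".attention.dense.", ".self_attn.o_proj."),
    (".attention.query_layernorm.", ".self_attn.q_norm."),
    (".attention.key_layernorm.", ".self_attn.k_norm."),
    (".mlp.gate.expert_bias", ".mlp.gate.e_score_correction_bias"),
]


def rename_hf_key(key: str) -> str:
    """Map one HF key to its assumed native name; unknown keys pass through."""
    for old, new in HF_TO_NATIVE:
        if old in key:
            return key.replace(old, new)
    return key


if __name__ == "__main__":
    for hf_key in (
        "model.word_embeddings.weight",
        "model.layers.3.attention.dense.weight",
        "model.layers.3.mlp.gate.expert_bias",
        "model.layers.3.mlp.shared_experts.up_proj.weight",
    ):
        print(hf_key, "->", rename_hf_key(hf_key))
```

The fused `query_key_value` weight and the per-expert weights are deliberately left out of this table, since they need a tensor split or regrouping rather than a plain rename.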
Module Contents
Classes
Functions
Data
API
Bases: MoESplitExpertsStateDictMixin, StateDictAdapter
State-dict adapter for BailingMoeV2 / Ling 2.0 checkpoints.
Split each fused query_key_value weight into q/k/v and apply renames.
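As an illustration of the split, here is a minimal sketch assuming the fused weight stacks Q, K and V as contiguous blocks along dimension 0, with block sizes derived from the head counts. The function name `split_fused_qkv` and its parameters `num_heads`, `num_kv_heads` and `head_dim` are hypothetical and are not taken from the adapter itself.

```python
import torch


def split_fused_qkv(fused, num_heads, num_kv_heads, head_dim):
    """Split a fused [Q | K | V] weight into three tensors along dim 0.

    Assumes a contiguous block layout: Q rows first, then K, then V.
    """
    q_size = num_heads * head_dim
    kv_size = num_kv_heads * head_dim
    expected_rows = q_size + 2 * kv_size
    if fused.shape[0] != expected_rows:
        raise ValueError(
            f"expected {expected_rows} rows in fused QKV, got {fused.shape[0]}"
        )
    # torch.split with a list of sizes returns views, so no data is copied.
    q, k, v = torch.split(fused, [q_size, kv_size, kv_size], dim=0)
    return q, k, v


if __name__ == "__main__":
    hidden = 8
    fused = torch.randn(4 * 2 + 2 * 2 + 2 * 2, hidden)
    q, k, v = split_fused_qkv(fused, num_heads=4, num_kv_heads=2, head_dim=2)
    print(q.shape, k.shape, v.shape)
```

The resulting tensors would then be stored under the `self_attn.q_proj`, `self_attn.k_proj` and `self_attn.v_proj` keys of the native layout.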
Convert a single native tensor to HuggingFace format.
q_proj / k_proj / v_proj tensors cannot be re-fused without
their two siblings; the caller should batch them through to_hf
instead. This single-tensor path emits the per-projection HF key (which
is not the standard fused name) so that the value is not silently
dropped when DCP save adapters walk tensors one by one.
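To show why re-fusing needs all three siblings at once, here is a minimal sketch of a batched re-fuse over a whole state dict. The helper `fuse_qkv` is a hypothetical name and only illustrates the idea; it is not the adapter's to_hf implementation, and it assumes the fused HF weight is the concatenation of Q, K and V along dimension 0.

```python
import torch

Q_SUFFIX = "self_attn.q_proj.weight"
K_SUFFIX = "self_attn.k_proj.weight"
V_SUFFIX = "self_attn.v_proj.weight"
FUSED_SUFFIX = "attention.query_key_value.weight"


def fuse_qkv(state_dict):
    """Return a new dict where each layer's q/k/v weights are re-fused.

    Raises KeyError if a q_proj tensor has no k_proj or v_proj sibling,
    which is exactly the situation a one-tensor-at-a-time path cannot resolve.
    """
    out = {}
    for key, tensor in state_dict.items():
        if key.endswith(Q_SUFFIX):
            prefix = key[: -len(Q_SUFFIX)]
            k = state_dict[prefix + K_SUFFIX]
            v = state_dict[prefix + V_SUFFIX]
            out[prefix + FUSED_SUFFIX] = torch.cat([tensor, k, v], dim=0)
        elif key.endswith(K_SUFFIX) or key.endswith(V_SUFFIX):
            continue  # consumed together with the matching q_proj
        else:
            out[key] = tensor
    return out


if __name__ == "__main__":
    layer = "model.layers.0."
    native = {
        layer + Q_SUFFIX: torch.randn(8, 8),
        layer + K_SUFFIX: torch.randn(4, 8),
        layer + V_SUFFIX: torch.randn(4, 8),
    }
    fused = fuse_qkv(native)
    print({name: tuple(t.shape) for name, t in fused.items()})
```

Given a single q_proj tensor in isolation, there is nothing to concatenate it with, so emitting a per-projection key is the only way to preserve the value.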