nemo_automodel.components.models.ling_v2.state_dict_adapter#
HF <-> NeMo state-dict adapter for BailingMoeV2 (Ling 2.0).
Handles the rename map between the HuggingFace checkpoint layout:
model.word_embeddings.weight
model.layers.{N}.attention.query_key_value.weight # fused [Q | K | V]
model.layers.{N}.attention.dense.weight
model.layers.{N}.attention.query_layernorm.weight
model.layers.{N}.attention.key_layernorm.weight
model.layers.{N}.mlp.gate.weight
model.layers.{N}.mlp.gate.expert_bias
model.layers.{N}.mlp.experts.{E}.{gate_proj,up_proj,down_proj}.weight
model.layers.{N}.mlp.shared_experts.{gate_proj,up_proj,down_proj}.weight
and the native NeMo layout used by this package:
model.embed_tokens.weight
model.layers.{N}.self_attn.{q_proj,k_proj,v_proj,o_proj}.weight
model.layers.{N}.self_attn.{q_norm,k_norm}.weight
model.layers.{N}.mlp.gate.weight
model.layers.{N}.mlp.gate.e_score_correction_bias
model.layers.{N}.mlp.experts.{gate_and_up_projs,down_projs}
model.layers.{N}.mlp.shared_experts.{gate_proj,up_proj,down_proj}.weight
The per-expert grouping is delegated to MoESplitExpertsStateDictMixin; this
adapter only normalises the surrounding key names and splits the fused QKV.
Module Contents#
Classes#
BailingMoeV2StateDictAdapter | State-dict adapter for BailingMoeV2 / Ling 2.0 checkpoints.
Functions#
Data#
API#
- nemo_automodel.components.models.ling_v2.state_dict_adapter._RENAME_PAIRS_HF_TO_NATIVE: tuple[tuple[str, str], ...]#
(('model.word_embeddings.', 'model.embed_tokens.'), ('.attention.dense.', '.self_attn.o_proj.'), ('…
- nemo_automodel.components.models.ling_v2.state_dict_adapter._LAYER_QKV_RE#
'compile(…)'
- nemo_automodel.components.models.ling_v2.state_dict_adapter._rename_hf_to_native(key: str) -> str#
- nemo_automodel.components.models.ling_v2.state_dict_adapter._rename_native_to_hf(key: str) -> str#
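The two rename helpers can be pictured as a substring substitution driven by a table of (HF, native) pairs, applied in one direction or the other. The sketch below is illustrative only: the first two pairs appear in the truncated value of _RENAME_PAIRS_HF_TO_NATIVE above, while the remaining pairs are assumptions inferred from the two key layouts in the module docstring, and the function bodies are a guess at the behaviour, not the package's code.

```python
# Illustrative rename table. Only the first two pairs are visible in the
# truncated _RENAME_PAIRS_HF_TO_NATIVE value; the rest are ASSUMED from the
# HF and native key layouts listed in the module docstring.
RENAME_PAIRS = (
    ("model.word_embeddings.", "model.embed_tokens."),
    (".attention.dense.", ".self_attn.o_proj."),
    (".attention.query_layernorm.", ".self_attn.q_norm."),  # assumed
    (".attention.key_layernorm.", ".self_attn.k_norm."),  # assumed
    (".mlp.gate.expert_bias", ".mlp.gate.e_score_correction_bias"),  # assumed
)


def rename_hf_to_native(key: str) -> str:
    """Replace the first matching HF fragment with its native counterpart."""
    for hf_part, native_part in RENAME_PAIRS:
        if hf_part in key:
            return key.replace(hf_part, native_part, 1)
    return key  # keys with no matching pair pass through unchanged


def rename_native_to_hf(key: str) -> str:
    """Inverse direction: replace the native fragment with the HF one."""
    for hf_part, native_part in RENAME_PAIRS:
        if native_part in key:
            return key.replace(native_part, hf_part, 1)
    return key


if __name__ == "__main__":
    for hf_key in (
        "model.word_embeddings.weight",
        "model.layers.0.attention.dense.weight",
        "model.layers.3.mlp.gate.expert_bias",
        "model.layers.3.mlp.gate.weight",
    ):
        print(hf_key, "->", rename_hf_to_native(hf_key))
```

Keeping a single table and deriving both directions from it avoids the two maps drifting apart; keys shared by both layouts, such as mlp.gate.weight, simply fall through.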
- class nemo_automodel.components.models.ling_v2.state_dict_adapter.BailingMoeV2StateDictAdapter(
- config: nemo_automodel.components.models.ling_v2.config.BailingMoeV2Config,
- moe_config: nemo_automodel.components.moe.config.MoEConfig,
- backend: nemo_automodel.components.models.common.BackendConfig,
- dtype: torch.dtype = torch.bfloat16,
)
Bases: nemo_automodel.components.moe.state_dict_mixin.MoESplitExpertsStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter
State-dict adapter for BailingMoeV2 / Ling 2.0 checkpoints.
Initialization
- from_hf(
- hf_state_dict: dict[str, Any],
- device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
- **kwargs,
)
- _split_fused_qkv_and_rename(
- hf_state_dict: dict[str, Any],
)
Split each fused query_key_value weight into q/k/v and apply renames.
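As a rough illustration of the split, the sketch below slices a fused weight along its first axis into Q, K, and V blocks. It assumes the [Q | K | V] order stated in the module docstring and assumes the row counts are num_heads * head_dim for Q and num_kv_heads * head_dim for each of K and V. The regular expression, function name, and parameter names are hypothetical (the real _LAYER_QKV_RE is elided on this page). Plain lists of rows stand in for tensors so that the example runs without torch; the same slicing applies to a torch.Tensor.

```python
import re
from typing import Any

# HYPOTHETICAL pattern; the actual _LAYER_QKV_RE is not shown on this page.
QKV_KEY_RE = re.compile(r"^(model\.layers\.\d+)\.attention\.query_key_value\.weight$")


def split_fused_qkv(
    hf_state_dict: dict[str, Any],
    num_heads: int,
    num_kv_heads: int,
    head_dim: int,
) -> dict[str, Any]:
    """Split each fused QKV weight row-wise; pass every other key through."""
    q_rows = num_heads * head_dim  # assumed size of the Q block
    kv_rows = num_kv_heads * head_dim  # assumed size of the K and V blocks
    out: dict[str, Any] = {}
    for key, value in hf_state_dict.items():
        match = QKV_KEY_RE.match(key)
        if match is None:
            out[key] = value
            continue
        if len(value) != q_rows + 2 * kv_rows:
            raise ValueError(f"unexpected fused QKV row count for {key}")
        prefix = match.group(1)
        # Slicing the first axis works for lists of rows and for tensors.
        out[f"{prefix}.self_attn.q_proj.weight"] = value[:q_rows]
        out[f"{prefix}.self_attn.k_proj.weight"] = value[q_rows:q_rows + kv_rows]
        out[f"{prefix}.self_attn.v_proj.weight"] = value[q_rows + kv_rows:]
    return out


if __name__ == "__main__":
    # 2 query heads, 1 KV head, head_dim 2: 4 + 2 + 2 = 8 fused rows.
    fused = [[i] * 3 for i in range(8)]
    native = split_fused_qkv(
        {"model.layers.0.attention.query_key_value.weight": fused},
        num_heads=2,
        num_kv_heads=1,
        head_dim=2,
    )
    for name, rows in native.items():
        print(name, len(rows))
```

The row-count check is there because a mismatch between the config and the checkpoint would otherwise produce silently misaligned projections.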
- to_hf(
- state_dict: dict[str, Any],
- exclude_key_regex: Optional[str] = None,
- quantization: bool = False,
- **kwargs,
)
- convert_single_tensor_to_hf(
- fqn: str,
- tensor: Any,
- **kwargs,
)
Convert a single native tensor to HuggingFace format.
q_proj/k_proj/v_proj tensors cannot be re-fused without their two siblings; the caller should batch them through to_hf instead. This single-tensor path emits the per-projection HF key (which is not the standard fused name) so that the value is not silently dropped by DCP save adapters that walk tensors one by one.
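The reason re-fusing needs a batched pass can be sketched as follows: the fused HF weight can only be built once all three projections of a layer are in hand, which a one-tensor-at-a-time conversion never sees together. This is a minimal sketch under stated assumptions, not the adapter's to_hf implementation; the function name and regular expression are hypothetical, the [Q | K | V] row order is taken from the module docstring, and plain lists of rows stand in for tensors (with torch one would concatenate with torch.cat along dim 0 instead).

```python
import re
from typing import Any

# HYPOTHETICAL pattern for the native per-projection keys.
PROJ_KEY_RE = re.compile(r"^(model\.layers\.\d+)\.self_attn\.([qkv])_proj\.weight$")


def fuse_qkv(native_state_dict: dict[str, Any]) -> dict[str, Any]:
    """Re-fuse q/k/v per layer; requires all three siblings to be present."""
    out: dict[str, Any] = {}
    pending: dict[str, dict[str, Any]] = {}  # layer prefix -> {"q": ..., ...}
    for key, value in native_state_dict.items():
        match = PROJ_KEY_RE.match(key)
        if match is None:
            out[key] = value
        else:
            pending.setdefault(match.group(1), {})[match.group(2)] = value
    for prefix, parts in pending.items():
        missing = [name for name in "qkv" if name not in parts]
        if missing:
            # A lone projection cannot be turned into the fused weight.
            raise KeyError(f"{prefix}: missing projections {missing}")
        # Row-wise concatenation in [Q | K | V] order.
        fused = list(parts["q"]) + list(parts["k"]) + list(parts["v"])
        out[f"{prefix}.attention.query_key_value.weight"] = fused
    return out


if __name__ == "__main__":
    layer = "model.layers.0.self_attn"
    state = {
        f"{layer}.q_proj.weight": [[0], [1], [2], [3]],
        f"{layer}.k_proj.weight": [[4], [5]],
        f"{layer}.v_proj.weight": [[6], [7]],
    }
    print(fuse_qkv(state))
```

Raising on an incomplete layer is a choice made for this sketch; as described above, the adapter's single-tensor path instead emits a per-projection key so the value is kept.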