`nemo_automodel.components.models.ling_v2.model`

BailingMoeV2 model (Ling 2.0 family).

Architecture summary (from the public inclusionAI/Ling-{mini,flash,1T}-2.0 checkpoints):

- GQA attention with per-head QK-RMSNorm and partial RoPE (rotates the first `head_dim * partial_rotary_factor` channels only).
- `first_k_dense_replace` dense MLP layers at the start of the stack; the remaining layers are sigmoid-routed grouped MoE with shared experts and an aux-loss-free per-expert bias (DeepSeek-V3-style routing).
- Single shared expert with intermediate size `moe_intermediate_size`.
- MTP heads (`num_nextn_predict_layers`) are disabled in all published checkpoints and intentionally not modeled here.
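
The partial-RoPE item above can be illustrated with a minimal, dependency-free sketch. It is not the module's implementation: the function name `partial_rope`, the `base` of 10000, and the half-split ("rotate half") channel pairing are assumptions made for illustration, and the factor 0.5 used below is an example value, not a documented default.

```python
import math


def partial_rope(x, position, partial_rotary_factor, base=10000.0):
    """Apply RoPE to the first rotary_dim channels of one head vector.

    Hypothetical helper: channel pairing (i, i + half) follows the common
    half-split convention, which is an assumption about this model.
    """
    head_dim = len(x)
    rotary_dim = int(head_dim * partial_rotary_factor)
    assert rotary_dim % 2 == 0, "rotary_dim must be even to form channel pairs"
    half = rotary_dim // 2

    rotated = [0.0] * rotary_dim
    for i in range(half):
        # One rotation frequency per channel pair.
        theta = position * base ** (-2.0 * i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        rotated[i] = a * c - b * s
        rotated[i + half] = b * c + a * s

    # Channels beyond rotary_dim carry no positional rotation.
    return rotated + list(x[rotary_dim:])


q = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
q_rot = partial_rope(q, position=3, partial_rotary_factor=0.5)
```

With `head_dim = 8` and a factor of 0.5, only the first four channels of `q_rot` differ from `q`; the last four pass through unchanged, and the rotated slice keeps its norm.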

Example (YAML):

model:
  _target_: nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained
  pretrained_model_name_or_path: inclusionAI/Ling-mini-2.0
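
As a rough illustration of what a `_target_` entry means, the sketch below resolves a dotted path to a callable and passes the remaining keys as keyword arguments. This follows the Hydra-style convention; how NeMo AutoModel actually resolves `_target_` is not stated here, and the helpers `resolve_target` and `instantiate` are hypothetical names.

```python
import importlib


def resolve_target(path):
    """Resolve a dotted path such as 'pkg.mod.Class.method' to an object."""
    parts = path.split(".")
    # Try the longest importable module prefix, then walk the rest as attributes.
    for split in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:split]))
        except ImportError:
            continue
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"cannot resolve {path!r}")


def instantiate(node):
    """Call node['_target_'] with every other key as a keyword argument."""
    kwargs = {k: v for k, v in node.items() if k != "_target_"}
    return resolve_target(node["_target_"])(**kwargs)


# Stand-in target from the standard library, so the sketch runs anywhere.
three_quarters = instantiate(
    {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
)

# Under the same convention, the YAML above would correspond to calling
# nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained(
#     pretrained_model_name_or_path="inclusionAI/Ling-mini-2.0")
# which loads a checkpoint and is therefore not executed here.
```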

Module Contents

`Block`	Single transformer block: attention + (dense MLP or MoE) + residuals.
`BailingMoeV2Model`	Embedding + decoder stack + final norm. No LM head.
`BailingMoeV2ForCausalLM`	Causal-LM head wrapping `BailingMoeV2Model`.

class nemo_automodel.components.models.ling_v2.model.Block(layer_idx: int, config: nemo_automodel.components.models.ling_v2.config.BailingMoeV2Config, moe_config: nemo_automodel.components.moe.config.MoEConfig, backend: nemo_automodel.components.models.common.BackendConfig)

Bases: torch.nn.Module

Single transformer block: attention + (dense MLP or MoE) + residuals.
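
The "dense MLP or MoE" choice depends on the layer index, per the architecture summary: the first `first_k_dense_replace` layers are dense and the rest are MoE. A minimal sketch of that schedule, assuming a hypothetical helper `layer_kinds` and a total layer count passed as `num_hidden_layers` (a name not confirmed by this page):

```python
def layer_kinds(num_hidden_layers, first_k_dense_replace):
    """Return 'dense' or 'moe' for each layer index in the decoder stack."""
    return [
        "dense" if layer_idx < first_k_dense_replace else "moe"
        for layer_idx in range(num_hidden_layers)
    ]


# Illustrative values only; real checkpoints define their own sizes.
schedule = layer_kinds(num_hidden_layers=4, first_k_dense_replace=1)
```

For these illustrative values the schedule is one dense layer followed by three MoE layers.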

Initialization

_mlp(x: torch.Tensor, padding_mask: torch.Tensor | None) → torch.Tensor

class nemo_automodel.components.models.ling_v2.model.BailingMoeV2Model(config: nemo_automodel.components.models.ling_v2.config.BailingMoeV2Config, backend: nemo_automodel.components.models.common.BackendConfig, *, moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None, moe_overrides: dict | None = None)

Bases: torch.nn.Module

Embedding + decoder stack + final norm. No LM head.

Initialization

update_moe_gate_bias() → None

No-op during supervised fine-tuning (SFT): the published Ling checkpoints freeze the `expert_bias` buffer, so the routing bias is left unchanged.
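
To show what a frozen per-expert bias does at routing time, here is a minimal sketch of DeepSeek-V3-style sigmoid routing for one token. It is not this module's code: the function `route` is hypothetical, group-limited selection is omitted for brevity, and renormalizing the selected weights to sum to 1 is an assumption. The key point is that the bias influences which experts are chosen but not the weights applied to their outputs.

```python
import math


def route(logits, expert_bias, top_k):
    """Pick top_k experts for one token using bias-adjusted sigmoid scores."""
    scores = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # The bias shifts the selection ranking only.
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(scores)), key=lambda e: biased[e], reverse=True)[:top_k]
    # Gate weights come from the unbiased scores (assumed renormalized).
    total = sum(scores[e] for e in chosen)
    weights = [scores[e] / total for e in chosen]
    return chosen, weights


logits = [2.0, 1.0, 0.0, -1.0]
no_bias = route(logits, [0.0, 0.0, 0.0, 0.0], top_k=2)
with_bias = route(logits, [0.0, 0.0, 0.5, 0.0], top_k=2)
```

Without a bias the two highest-scoring experts (0 and 1) are selected. With a bias of 0.5 on expert 2, that expert displaces expert 1 in the selection, yet its gate weight is still computed from its unbiased score and stays smaller than expert 0's.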

class nemo_automodel.components.models.ling_v2.model.BailingMoeV2ForCausalLM(config: nemo_automodel.components.models.ling_v2.config.BailingMoeV2Config, moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None, backend: nemo_automodel.components.models.common.BackendConfig | None = None, **kwargs)

Causal-LM head wrapping BailingMoeV2Model.

Initialization

initialize_weights(buffer_device: torch.device | None = None, dtype: torch.dtype = torch.bfloat16) → None