nemo_automodel.components.models.ling_v2.model

BailingMoeV2 model (Ling 2.0 family).

Architecture summary (from the public inclusionAI/Ling-{mini,flash,1T}-2.0 checkpoints):

  • GQA attention with per-head QK-RMSNorm and partial RoPE (rotates the first head_dim * partial_rotary_factor channels only).

  • The first first_k_dense_replace layers of the stack use a dense MLP; the remaining layers use a sigmoid-routed grouped MoE with a shared expert and an aux-loss-free per-expert bias (DeepSeek-V3-style routing).

  • Single shared expert with intermediate size moe_intermediate_size.

  • MTP heads (num_nextn_predict_layers) are disabled in all published checkpoints and intentionally not modeled here.
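To make the partial-RoPE bullet concrete, here is a minimal pure-Python sketch for a single head vector. It is not the module's implementation: the function name partial_rope, the base of 10000, and the rotate-half channel pairing are assumptions chosen for illustration, and the actual model consumes precomputed freqs_cis tensors.

```python
import math


def partial_rope(vec, position, partial_rotary_factor=0.5, base=10000.0):
    """Rotate only the first head_dim * partial_rotary_factor channels.

    Hypothetical helper for illustration. Assumes the rotated width is even
    and pairs channel i with channel i + rot_dim // 2 (rotate-half layout).
    """
    head_dim = len(vec)
    rot_dim = int(head_dim * partial_rotary_factor)
    half = rot_dim // 2

    rotated = list(vec[:rot_dim])
    for i in range(half):
        # Standard RoPE frequency for pair i, computed over the rotated width only.
        theta = position * base ** (-2.0 * i / rot_dim)
        cos, sin = math.cos(theta), math.sin(theta)
        a, b = vec[i], vec[i + half]
        rotated[i] = a * cos - b * sin
        rotated[i + half] = a * sin + b * cos

    # Channels beyond rot_dim pass through without any positional rotation.
    return rotated + list(vec[rot_dim:])


query = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
out = partial_rope(query, position=3, partial_rotary_factor=0.5)
print(out[4:])  # the unrotated tail: [5.0, 6.0, 7.0, 8.0]
```

Because each pair undergoes a pure rotation, the norm of the rotated slice is preserved, and the pass-through channels carry no positional signal.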

Example (YAML):

model:
  _target_: nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained
  pretrained_model_name_or_path: inclusionAI/Ling-mini-2.0

Module Contents

Classes

Block

Single transformer block: attention + (dense MLP or MoE) + residuals.

BailingMoeV2Model

Embedding + decoder stack + final norm. No LM head.

BailingMoeV2ForCausalLM

Causal-LM head wrapping BailingMoeV2Model.

Data

API

class nemo_automodel.components.models.ling_v2.model.Block(
layer_idx: int,
config: nemo_automodel.components.models.ling_v2.config.BailingMoeV2Config,
moe_config: nemo_automodel.components.moe.config.MoEConfig,
backend: nemo_automodel.components.models.common.BackendConfig,
)

Bases: torch.nn.Module

Single transformer block: attention + (dense MLP or MoE) + residuals.
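Which feed-forward variant a block uses follows from its layer_idx and the first_k_dense_replace setting described in the architecture summary. A small sketch of that rule, assuming the check is layer_idx < first_k_dense_replace; the helper layer_kinds and its num_hidden_layers argument are hypothetical names used only for illustration.

```python
def layer_kinds(num_hidden_layers, first_k_dense_replace):
    """Return 'dense' or 'moe' for every layer index in the stack.

    Hypothetical helper: assumes the first `first_k_dense_replace` layers
    use a dense MLP and all later layers use the MoE block.
    """
    return [
        "dense" if layer_idx < first_k_dense_replace else "moe"
        for layer_idx in range(num_hidden_layers)
    ]


print(layer_kinds(4, 1))  # ['dense', 'moe', 'moe', 'moe']
```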

Initialization

forward(
x: torch.Tensor,
*,
freqs_cis: torch.Tensor,
attention_mask: torch.Tensor | None = None,
padding_mask: torch.Tensor | None = None,
**attn_kwargs: Any,
) -> torch.Tensor
_mlp(
x: torch.Tensor,
padding_mask: torch.Tensor | None,
) -> torch.Tensor
init_weights(buffer_device: torch.device) -> None
class nemo_automodel.components.models.ling_v2.model.BailingMoeV2Model(
config: nemo_automodel.components.models.ling_v2.config.BailingMoeV2Config,
backend: nemo_automodel.components.models.common.BackendConfig,
*,
moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
moe_overrides: dict | None = None,
)

Bases: torch.nn.Module

Embedding + decoder stack + final norm. No LM head.

Initialization

forward(
input_ids: torch.Tensor | None = None,
*,
inputs_embeds: torch.Tensor | None = None,
position_ids: torch.Tensor | None = None,
attention_mask: torch.Tensor | None = None,
padding_mask: torch.Tensor | None = None,
**attn_kwargs: Any,
) -> torch.Tensor
update_moe_gate_bias() -> None

No-op for SFT; published Ling checkpoints freeze the expert_bias buffer.
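Even when frozen, the per-expert bias still takes part in expert selection. The following pure-Python sketch illustrates DeepSeek-V3-style routing for one token under stated assumptions: the bias influences only which experts are picked, while the mixing weights come from the unbiased sigmoid scores. The function route_token is hypothetical, and the sketch leaves out expert grouping and any scaling factor the actual gate may apply.

```python
import math


def route_token(logits, expert_bias, top_k):
    """Pick top_k experts for one token and return {expert_index: weight}.

    Hypothetical illustration of sigmoid routing with a selection-only bias.
    """
    # Sigmoid affinity per expert (no softmax across experts).
    scores = [1.0 / (1.0 + math.exp(-z)) for z in logits]

    # The bias shifts the ranking used for selection only.
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(scores)), key=lambda e: biased[e], reverse=True)[:top_k]

    # Mixing weights use the unbiased scores, normalized over the chosen experts.
    total = sum(scores[e] for e in chosen)
    return {e: scores[e] / total for e in chosen}


logits = [2.0, 1.0, 0.0, -1.0]
print(sorted(route_token(logits, [0.0, 0.0, 0.0, 0.0], top_k=2)))  # [0, 1]
print(sorted(route_token(logits, [0.0, 0.0, 0.0, 1.0], top_k=2)))  # [0, 3]
```

In the second call the bias pulls expert 3 into the selection, yet its mixing weight stays small because the weight is derived from the unbiased score.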

init_weights(buffer_device: torch.device | None = None) -> None
class nemo_automodel.components.models.ling_v2.model.BailingMoeV2ForCausalLM(
config: nemo_automodel.components.models.ling_v2.config.BailingMoeV2Config,
moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
backend: nemo_automodel.components.models.common.BackendConfig | None = None,
**kwargs,
)

Bases: nemo_automodel.components.models.common.hf_checkpointing_mixin.HFCheckpointingMixin, torch.nn.Module, nemo_automodel.components.moe.fsdp_mixin.MoEFSDPSyncMixin

Causal-LM head wrapping BailingMoeV2Model.

Initialization

_keep_in_fp32_modules_strict

['e_score_correction_bias']

_pp_keep_self_forward: bool

True

classmethod from_config(
config: nemo_automodel.components.models.ling_v2.config.BailingMoeV2Config,
moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
backend: nemo_automodel.components.models.common.BackendConfig | None = None,
**kwargs,
)
classmethod from_pretrained(
pretrained_model_name_or_path: str,
*model_args,
**kwargs,
)
get_input_embeddings()
set_input_embeddings(value)
get_output_embeddings()
set_output_embeddings(new_embeddings)
forward(
input_ids: torch.Tensor,
*,
position_ids: torch.Tensor | None = None,
attention_mask: torch.Tensor | None = None,
padding_mask: torch.Tensor | None = None,
**attn_kwargs: Any,
) -> torch.Tensor
update_moe_gate_bias() -> None
initialize_weights(
buffer_device: torch.device | None = None,
dtype: torch.dtype = torch.bfloat16,
) -> None
nemo_automodel.components.models.ling_v2.model.ModelClass

None