nemo_automodel.components.models.ling_v2.layers
Attention layer for BailingMoeV2 (Ling 2.0).
GQA + per-head QK-RMSNorm + partial RoPE. Equivalent to Qwen3-MoE attention
with an additional partial_rotary_factor knob that rotates only the first
head_dim * partial_rotary_factor channels and passes the rest through
(GPT-J / GPT-NeoX half-RoPE).
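The partial rotation described above can be sketched in plain Python for a single head vector. This is a minimal illustration, not the module's implementation: the function name `apply_partial_rope`, the rotate-half channel pairing (channel `i` with channel `i + rot_dim // 2`), and the frequency base of 10000 are assumptions made for the example.

```python
import math


def apply_partial_rope(head_vec, position, partial_rotary_factor=0.5, base=10000.0):
    """Rotate the leading channels of one head vector and pass the rest through.

    Assumptions (not taken from the source): rotate-half pairing of
    channel i with channel i + rot_dim // 2, and a frequency base of 10000.
    """
    head_dim = len(head_vec)
    rot_dim = int(head_dim * partial_rotary_factor)
    half = rot_dim // 2
    out = list(head_vec)  # channels at index >= rot_dim are copied unchanged
    for i in range(half):
        # Each channel pair gets its own rotation angle, scaled by position.
        theta = position * base ** (-2.0 * i / rot_dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        a, b = head_vec[i], head_vec[i + half]
        out[i] = a * cos_t - b * sin_t
        out[i + half] = a * sin_t + b * cos_t
    return out


# head_dim = 8 and partial_rotary_factor = 0.5: channels 0..3 rotate, 4..7 pass through.
q = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rotated = apply_partial_rope(q, position=3, partial_rotary_factor=0.5)
```

With `partial_rotary_factor=1.0` the same function rotates every channel, which corresponds to the full-RoPE case.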
Module Contents
Classes
BailingMoeV2Attention | Bailing MoE V2 attention block.
API
class nemo_automodel.components.models.ling_v2.layers.BailingMoeV2Attention(
    config,
    backend: nemo_automodel.components.models.common.BackendConfig,
)
Bases: torch.nn.Module
Bailing MoE V2 attention block.
Shapes:
  Input x: [B, S, H] (or [T, H] in THD format).
  Projections: q: [B, S, n_heads, head_dim]; k, v: [B, S, n_kv_heads, head_dim].
  Output: [B, S, H].
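The shape contract above can be written out as a small bookkeeping helper. This is a sketch only: `attention_shapes` is a hypothetical helper rather than part of the library, the sizes are made-up values, and the divisibility check reflects the usual grouped-query attention requirement, not something stated in this docstring.

```python
def attention_shapes(batch, seq_len, hidden, n_heads, n_kv_heads, head_dim):
    """Return the tensor shapes listed in the docstring for the [B, S, H] layout."""
    # Usual GQA requirement (assumption): each KV head serves a whole
    # number of query heads.
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    return {
        "x": (batch, seq_len, hidden),
        "q": (batch, seq_len, n_heads, head_dim),
        "k": (batch, seq_len, n_kv_heads, head_dim),
        "v": (batch, seq_len, n_kv_heads, head_dim),
        "out": (batch, seq_len, hidden),
        "q_heads_per_kv_head": n_heads // n_kv_heads,
    }


# Hypothetical sizes chosen only for illustration.
shapes = attention_shapes(
    batch=2, seq_len=16, hidden=2048, n_heads=16, n_kv_heads=4, head_dim=128
)
print(shapes["q"])  # (2, 16, 16, 128)
print(shapes["k"])  # (2, 16, 4, 128)
```

Here four query heads share each key/value head, which is where the memory saving of GQA over full multi-head attention comes from.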
Initialization
forward(
    x: torch.Tensor,
    *,
    freqs_cis: torch.Tensor,
    attention_mask: torch.Tensor | None = None,
    **attn_kwargs: Any,
)
init_weights(
    buffer_device: torch.device,
    init_std: float = 0.02,
)