nemo_automodel.components.models.hy_v3.layers
Module Contents
Classes
HYV3Attention | HYV3 attention with GQA, per-head Q/K RMSNorm, and RoPE.
API
class nemo_automodel.components.models.hy_v3.layers.HYV3Attention(
    config: Any,
    backend: nemo_automodel.components.models.common.BackendConfig,
)
Bases: torch.nn.Module
HYV3 attention with GQA, per-head Q/K RMSNorm, and RoPE.
Architecture:
q_proj: [hidden, n_heads * head_dim]
k_proj / v_proj: [hidden, n_kv_heads * head_dim]
q_norm / k_norm: RMSNorm applied per-head before RoPE
RoPE applied after per-head norm
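The layout above can be sketched as a small standalone PyTorch module. This is an illustration only, not the HYV3Attention implementation: the class name `SketchGQAAttention`, the output projection `o_proj`, the causal masking, the epsilon value, and the complex-valued form of `freqs_cis` are all assumptions that the listing above does not confirm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchGQAAttention(nn.Module):
    """Hypothetical GQA block with per-head Q/K RMSNorm followed by RoPE."""

    def __init__(self, hidden: int, n_heads: int, n_kv_heads: int, head_dim: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0, "query heads must split evenly over KV heads"
        self.n_heads, self.n_kv_heads, self.head_dim = n_heads, n_kv_heads, head_dim
        # nn.Linear stores weights as [out_features, in_features], i.e. the
        # transpose of the [hidden, n_heads * head_dim] shapes listed above.
        self.q_proj = nn.Linear(hidden, n_heads * head_dim, bias=False)
        self.k_proj = nn.Linear(hidden, n_kv_heads * head_dim, bias=False)
        self.v_proj = nn.Linear(hidden, n_kv_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * head_dim, hidden, bias=False)  # assumed
        # Per-head RMSNorm: one learned scale vector over head_dim, shared by all heads.
        self.q_norm_weight = nn.Parameter(torch.ones(head_dim))
        self.k_norm_weight = nn.Parameter(torch.ones(head_dim))

    @staticmethod
    def _rms_norm(t: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
        return t * torch.rsqrt(t.pow(2).mean(-1, keepdim=True) + eps) * weight

    @staticmethod
    def _rope(t: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
        # t: [B, S, H, D]; freqs_cis: complex [S, D/2] (assumed convention).
        t_c = torch.view_as_complex(t.float().reshape(*t.shape[:-1], -1, 2))
        return torch.view_as_real(t_c * freqs_cis[None, :, None, :]).flatten(-2).type_as(t)

    def forward(self, x: torch.Tensor, *, freqs_cis: torch.Tensor) -> torch.Tensor:
        B, S, _ = x.shape
        q = self.q_proj(x).view(B, S, self.n_heads, self.head_dim)
        k = self.k_proj(x).view(B, S, self.n_kv_heads, self.head_dim)
        v = self.v_proj(x).view(B, S, self.n_kv_heads, self.head_dim)
        # Norm first, then RoPE, matching the order stated above.
        q = self._rope(self._rms_norm(q, self.q_norm_weight), freqs_cis)
        k = self._rope(self._rms_norm(k, self.k_norm_weight), freqs_cis)
        # GQA: each KV head serves n_heads // n_kv_heads query heads.
        rep = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(rep, dim=2)
        v = v.repeat_interleave(rep, dim=2)
        out = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
        )
        return self.o_proj(out.transpose(1, 2).reshape(B, S, -1))


# Usage with toy sizes: 8 query heads share 2 KV heads.
torch.manual_seed(0)
attn = SketchGQAAttention(hidden=64, n_heads=8, n_kv_heads=2, head_dim=16)
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, 16, 2, dtype=torch.float32) / 16))
angles = torch.outer(torch.arange(5, dtype=torch.float32), inv_freq)  # [S, D/2]
freqs_cis = torch.polar(torch.ones_like(angles), angles)
y = attn(torch.randn(2, 5, 64), freqs_cis=freqs_cis)
```

With grouped-query attention the K and V projections are smaller than the Q projection by a factor of `n_heads / n_kv_heads`, which is why only `k` and `v` are repeated before the attention call.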
Initialization
forward(
    x: torch.Tensor,
    *,
    freqs_cis: torch.Tensor,
    attention_mask: torch.Tensor | None = None,
    **attn_kwargs: Any,
)
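The `freqs_cis` argument carries the rotary position table. How this library builds it is not shown here; the sketch below assumes the complex-exponential convention common in Llama-style code, and the helper names `precompute_freqs_cis` and `apply_rotary` and the base `theta=10000.0` are hypothetical.

```python
import torch


def precompute_freqs_cis(head_dim: int, seq_len: int, theta: float = 10000.0) -> torch.Tensor:
    """Return a complex [seq_len, head_dim // 2] table of unit rotations (assumed layout)."""
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    return torch.polar(torch.ones_like(angles), angles)


def apply_rotary(t: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    """Rotate adjacent feature pairs of t ([B, S, H, D]) by position-dependent angles."""
    t_c = torch.view_as_complex(t.float().reshape(*t.shape[:-1], -1, 2))
    return torch.view_as_real(t_c * freqs_cis[None, :, None, :]).flatten(-2).type_as(t)


torch.manual_seed(0)
freqs_cis = precompute_freqs_cis(head_dim=8, seq_len=6)

# Place the same query and key vector at every position, then rotate.
q = torch.randn(8).repeat(1, 6, 1, 1)  # [B=1, S=6, H=1, D=8]
k = torch.randn(8).repeat(1, 6, 1, 1)
rq, rk = apply_rotary(q, freqs_cis), apply_rotary(k, freqs_cis)

# scores[0, m, n] is the dot product of the query at position m and the key at position n.
scores = torch.einsum("bshd,bthd->bst", rq, rk)
```

Because each pair of features is multiplied by a unit-magnitude complex number, the rotation preserves vector norms, and the score between positions `m` and `n` depends only on the offset `m - n`.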
init_weights(buffer_device: torch.device, init_std: float = 0.02)