nemo_automodel.components.models.hy_v3.model
HYV3ForCausalLM: Tencent Hy3-preview (295B MoE) supervised fine-tuning (SFT) support.
Architecture (from tencent/Hy3-preview config.json):
- 80 transformer layers: layer 0 is dense, layers 1-79 are MoE
- MoE: 192 routed experts plus 1 shared expert, with the top 8 routed experts activated per token
- Sigmoid routing with expert-bias correction (e_score_correction_bias)
- Grouped-query attention (GQA): 64 query heads, 8 KV heads, head_dim=128
- Per-head RMSNorm on queries and keys (QK norm), applied before RoPE
- 256K-token context window, rope_theta=11158840
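The hyperparameters above determine a few derived sizes. The following minimal sketch only restates the listed values in a plain dataclass and computes those sizes. The class and field names are hypothetical and are not the keys used in tencent/Hy3-preview's config.json. The query-to-KV head mapping follows the usual GQA convention, where consecutive query heads share one KV head. That convention is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hy3Shape:
    # Hypothetical field names; the values are the ones listed above.
    num_layers: int = 80
    num_dense_layers: int = 1        # layer 0 is dense
    num_routed_experts: int = 192
    num_shared_experts: int = 1
    top_k: int = 8
    num_q_heads: int = 64
    num_kv_heads: int = 8
    head_dim: int = 128

    @property
    def num_moe_layers(self) -> int:
        return self.num_layers - self.num_dense_layers

    @property
    def gqa_group_size(self) -> int:
        # Number of query heads that share one KV head.
        return self.num_q_heads // self.num_kv_heads

    @property
    def q_width(self) -> int:
        # Output width of the Q projection (all query heads concatenated).
        return self.num_q_heads * self.head_dim

    @property
    def kv_width(self) -> int:
        # Output width of each of the K and V projections.
        return self.num_kv_heads * self.head_dim

    @property
    def experts_per_token(self) -> int:
        # Routed experts selected per token plus the always-active shared expert(s).
        return self.top_k + self.num_shared_experts

    def kv_head_for(self, q_head: int) -> int:
        # Assumed GQA convention: consecutive query heads share a KV head.
        return q_head // self.gqa_group_size


shape = Hy3Shape()
print(shape.num_moe_layers, shape.gqa_group_size, shape.q_width,
      shape.kv_width, shape.experts_per_token)
```

With 8 KV heads instead of 64, each layer stores K and V at width 1024 rather than 8192 per token. That makes the KV cache 8x smaller than full multi-head attention would need.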
Module Contents
Classes
- Block
- HYV3Model
- HYV3ForCausalLM
Data
- ModelClass
API
- class nemo_automodel.components.models.hy_v3.model.Block(
    layer_idx: int,
    config: Any,
    moe_config: nemo_automodel.components.moe.config.MoEConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
  )
  Bases: torch.nn.Module
- forward(
    x: torch.Tensor,
    *,
    freqs_cis: torch.Tensor,
    attention_mask: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    **attn_kwargs: Any,
  )
- _mlp(
    x: torch.Tensor,
    padding_mask: torch.Tensor | None,
  )
- init_weights(buffer_device: torch.device)
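In the MoE layers (1-79), Block._mlp is presumably where each token is routed to its experts. The sketch below shows sigmoid routing with an expert-bias correction in the form DeepSeek-V3 popularized. The bias shifts which experts are selected, while the combine weights come from the unbiased sigmoid scores, renormalized over the chosen experts. This page does not say whether Hy3 renormalizes, applies a routed scaling factor, or uses grouped selection, so treat those details as assumptions. The function `sigmoid_route` is hypothetical.

```python
import math


def sigmoid_route(logits, bias, top_k):
    """Route one token; return [(expert_index, weight), ...] for top_k experts.

    logits: router logits for this token, one per routed expert.
    bias:   per-expert correction (the role played by e_score_correction_bias).
    """
    scores = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Selection uses the biased scores, so the bias can steer load between experts...
    biased = [s + b for s, b in zip(scores, bias)]
    chosen = sorted(range(len(scores)), key=lambda e: biased[e], reverse=True)[:top_k]
    # ...but the combine weights use the unbiased scores, renormalized to sum to 1.
    total = sum(scores[e] for e in chosen)
    return [(e, scores[e] / total) for e in chosen]


# Toy example: 6 experts, top-2 routing, a positive bias on expert 4.
logits = [2.0, 1.5, 0.0, -1.0, 1.4, 0.5]
bias = [0.0, 0.0, 0.0, 0.0, 0.3, 0.0]
routes = sigmoid_route(logits, bias, top_k=2)
print(routes)
```

Here expert 4 is selected only because of its bias. Without the bias, the top 2 would be experts 0 and 1. Expert 4's weight still reflects its own unbiased score.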
- class nemo_automodel.components.models.hy_v3.model.HYV3Model(
    config: Any,
    backend: nemo_automodel.components.models.common.BackendConfig,
    *,
    moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
    moe_overrides: dict | None = None,
  )
  Bases: torch.nn.Module
- forward(
    input_ids: torch.Tensor,
    *,
    position_ids: torch.Tensor | None = None,
    attention_mask: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    **attn_kwargs: Any,
  )
- init_weights(buffer_device: torch.device | None = None) -> None
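HYV3Model.forward accepts position_ids, while Block.forward takes freqs_cis. This suggests the model converts positions into rotary frequencies once and passes them to every block. The sketch below shows the usual complex-exponential RoPE with rope_theta=11158840, applied to one query head after a per-head RMSNorm (QK norm before RoPE). Several details are assumptions not taken from this page: the channel-pairing convention (adjacent pairs here; some implementations split halves instead), the RMSNorm epsilon, and unit norm weights. All function names are hypothetical.

```python
import cmath
import math

ROPE_THETA = 11158840.0  # rope_theta listed above
HEAD_DIM = 128


def rope_freqs(position, head_dim=HEAD_DIM, theta=ROPE_THETA):
    """Unit complex rotations for one position, one per pair of channels."""
    return [
        cmath.exp(1j * position * theta ** (-2.0 * i / head_dim))
        for i in range(head_dim // 2)
    ]


def rms_norm(x, weight=None, eps=1e-6):
    """RMSNorm over one head's channels (weight defaults to ones)."""
    weight = weight or [1.0] * len(x)
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def apply_rope(x, freqs):
    """Rotate channel pairs (x[2i], x[2i+1]) by the matching complex frequency."""
    out = []
    for i, f in enumerate(freqs):
        z = complex(x[2 * i], x[2 * i + 1]) * f
        out.extend([z.real, z.imag])
    return out


# One query head: normalize first, then rotate (QK norm before RoPE).
q_head = [0.01 * (k % 7 - 3) for k in range(HEAD_DIM)]
q_rot = apply_rope(rms_norm(q_head), rope_freqs(position=1000))
```

Normalizing before the rotation keeps query and key magnitudes bounded, while RoPE only rotates channel pairs and preserves the norm. Position 0 is left unchanged. A tensor implementation would typically precompute these rotations for all positions at once instead of looping.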
- class nemo_automodel.components.models.hy_v3.model.HYV3ForCausalLM(
    config: Any,
    moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
    backend: nemo_automodel.components.models.common.BackendConfig | None = None,
    **kwargs,
  )
  Bases: nemo_automodel.components.models.common.hf_checkpointing_mixin.HFCheckpointingMixin, torch.nn.Module, nemo_automodel.components.moe.fsdp_mixin.MoEFSDPSyncMixin
- classmethod from_config(
    config: Any,
    moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
    backend: nemo_automodel.components.models.common.BackendConfig | None = None,
    **kwargs,
  )
- classmethod from_pretrained(
    pretrained_model_name_or_path: str,
    *model_args,
    **kwargs,
  )
- get_input_embeddings()
- set_input_embeddings(value)
- get_output_embeddings()
- set_output_embeddings(new_embeddings)
- forward(
    input_ids: torch.Tensor,
    *,
    position_ids: torch.Tensor | None = None,
    attention_mask: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    **attn_kwargs: Any,
  )
- update_moe_gate_bias() -> None
- initialize_weights(
    buffer_device: torch.device | None = None,
    dtype: torch.dtype = torch.bfloat16,
  )
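The name update_moe_gate_bias suggests the auxiliary-loss-free load-balancing step that usually accompanies bias-corrected sigmoid routing, as in DeepSeek-V3. This page does not document how the method measures expert load or how large its step is. The sketch below therefore shows only the common sign-based rule, under that assumption: an under-loaded expert's bias is nudged up and an over-loaded expert's bias is nudged down. The function `update_gate_bias` and the rate `gamma` are hypothetical.

```python
def update_gate_bias(bias, tokens_per_expert, gamma=1e-3):
    """Sign-based bias update for aux-loss-free load balancing (sketch).

    bias:              current per-expert correction, one float per expert.
    tokens_per_expert: number of tokens each expert received in the last step.
    gamma:             update rate (hypothetical value).
    """
    mean_load = sum(tokens_per_expert) / len(tokens_per_expert)
    new_bias = []
    for b, load in zip(bias, tokens_per_expert):
        if load < mean_load:
            b += gamma  # under-loaded: make this expert easier to select
        elif load > mean_load:
            b -= gamma  # over-loaded: make this expert harder to select
        new_bias.append(b)
    return new_bias


bias = [0.0, 0.0, 0.0, 0.0]
loads = [10, 2, 4, 4]  # mean load is 5.0
print(update_gate_bias(bias, loads, gamma=0.01))  # [-0.01, 0.01, 0.01, 0.01]
```

In this scheme the bias only influences which experts are selected. Adjusting it rebalances load without adding a loss term and without directly changing the combine weights.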
- nemo_automodel.components.models.hy_v3.model.ModelClass
None