nemo_automodel.components.models.hy_mt2.config
Module Contents
Classes
HyMT2Config | Configuration class for Tencent Hy-MT2-30B-A3B (translation MoE).
API
- class nemo_automodel.components.models.hy_mt2.config.HyMT2Config(
- vocab_size: int = 120832,
- hidden_size: int = 2048,
- intermediate_size: int = 6912,
- moe_intermediate_size: int = 768,
- expert_hidden_dim: int = 768,
- num_hidden_layers: int = 48,
- num_attention_heads: int = 32,
- num_key_value_heads: int = 4,
- head_dim: int = 128,
- num_experts: int = 128,
- num_shared_experts: int = 1,
- num_experts_per_tok: int = 8,
- router_scaling_factor: float = 2.826,
- route_norm: bool = True,
- moe_router_enable_expert_bias: bool = True,
- moe_router_use_sigmoid: bool = True,
- first_k_dense_replace: int = 1,
- max_position_embeddings: int = 262144,
- rope_theta: float = 11158840.0,
- rope_scaling: dict | None = None,
- rms_norm_eps: float = 1e-05,
- qk_norm: bool = True,
- attention_bias: bool = False,
- hidden_act: str = 'silu',
- enable_lm_head_fp32: bool = True,
- enable_attention_fp32_softmax: bool = False,
- enable_moe_fp32_combine: bool = False,
- use_cache: bool = True,
- pad_token_id: int | None = 120002,
- bos_token_id: int = 120000,
- eos_token_id: int = 120025,
- tie_word_embeddings: bool = False,
- torch_dtype: str = 'bfloat16',
- **kwargs,
- )
Bases: transformers.PretrainedConfig
Configuration class for Tencent Hy-MT2-30B-A3B (translation MoE).
Architecture (from tencent/Hy-MT2-30B-A3B config.json):
48 transformer layers; layer 0 is dense, layers 1-47 are MoE
MoE: 128 routed experts + 1 shared expert, top-8 activated
Sigmoid routing with expert-bias correction (e_score_correction_bias) and router_scaling_factor = 2.826
route_norm = True (normalize top-k routing weights)
GQA: 32 Q heads, 4 KV heads, head_dim=128, hidden_size=2048
Per-head Q/K RMSNorm before RoPE (qk_norm)
256K context, rope_theta=11158840
vocab_size=120832, dense intermediate_size=6912, moe_intermediate_size=768
enable_lm_head_fp32 = True (HF reference upcasts lm_head to fp32)
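The routing items in the list above (sigmoid scores, expert-bias correction, top-8 selection, route_norm, router_scaling_factor) can be sketched for a single token as follows. This is a minimal illustration, not the code in model.py: the function name `route_token` is hypothetical, and it assumes the common convention that the correction bias only affects which experts are selected while the mixing weights come from the unbiased sigmoid scores. The actual implementation may differ in either respect.

```python
import math
import random


def route_token(logits, expert_bias, top_k=8, scaling_factor=2.826, route_norm=True):
    """Pick top_k experts for one token and return (expert_indices, weights).

    Hypothetical helper for illustration; not part of nemo_automodel.
    """
    # Sigmoid gives every expert an independent score in (0, 1).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]

    # Assumption: the correction bias is added only for expert selection.
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]

    # Assumption: mixing weights use the unbiased scores of the chosen experts.
    weights = [scores[i] for i in chosen]
    if route_norm:
        # route_norm = True: normalize the top-k weights so they sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]

    # Apply the router scaling factor after normalization.
    weights = [w * scaling_factor for w in weights]
    return chosen, weights


# Usage: 128 routed experts, random router logits, zero correction bias.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(128)]
bias = [0.0] * 128
experts, weights = route_token(logits, bias)
```

With route_norm enabled, the eight returned weights sum to router_scaling_factor, so the routed-expert output is scaled by a constant regardless of how confident the router is.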
Note: the on-disk HF checkpoint declares model_type: "hy_v3" and architectures: ["HYV3ForCausalLM"]. NeMo AutoModel’s existing HYV3Config therefore wins AutoConfig.from_pretrained. This class is provided for tests and for standalone instantiation; the model code in model.py is duck-typed against config.<field> and works with either config class.
Initialization
- model_type
'hy_mt2'
- keys_to_ignore_at_inference
['past_key_values']
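The duck-typing described in the note above means that any object exposing the expected attributes can stand in for the config. The sketch below illustrates this with a `types.SimpleNamespace` carrying a few of the documented defaults. The helper `derived_shapes` is hypothetical and not part of nemo_automodel, and it assumes that first_k_dense_replace counts the leading dense layers, which is consistent with the architecture summary (layer 0 dense, layers 1-47 MoE).

```python
from types import SimpleNamespace


def derived_shapes(config):
    """Derive a few sizes using only config.<field> attribute access.

    Hypothetical helper for illustration; works with any object that
    exposes the fields read below.
    """
    q_width = config.num_attention_heads * config.head_dim
    kv_width = config.num_key_value_heads * config.head_dim
    return {
        # (in_features, out_features) of the query and key/value projections.
        "q_proj": (config.hidden_size, q_width),
        "kv_proj": (config.hidden_size, kv_width),
        # GQA: how many query heads share one key/value head.
        "queries_per_kv_head": config.num_attention_heads // config.num_key_value_heads,
        # Assumption: the first first_k_dense_replace layers are dense.
        "moe_layers": config.num_hidden_layers - config.first_k_dense_replace,
    }


# Stand-in object using the defaults documented in the signature above.
cfg = SimpleNamespace(
    hidden_size=2048,
    num_attention_heads=32,
    num_key_value_heads=4,
    head_dim=128,
    num_hidden_layers=48,
    first_k_dense_replace=1,
)
shapes = derived_shapes(cfg)
```

Note that the query projection width (32 × 128 = 4096) is larger than hidden_size (2048), so code that derives head_dim as hidden_size // num_attention_heads would be wrong for this model; reading config.head_dim directly avoids that.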