nemo_automodel.components.models.hy_mt2.config

Module Contents

Classes

Name	Description
`HyMT2Config`	Configuration class for Tencent Hy-MT2-30B-A3B (translation MoE).

API

class nemo_automodel.components.models.hy_mt2.config.HyMT2Config(
    vocab_size: int = 120832,
    hidden_size: int = 2048,
    intermediate_size: int = 6912,
    moe_intermediate_size: int = 768,
    expert_hidden_dim: int = 768,
    num_hidden_layers: int = 48,
    num_attention_heads: int = 32,
    num_key_value_heads: int = 4,
    head_dim: int = 128,
    num_experts: int = 128,
    num_shared_experts: int = 1,
    num_experts_per_tok: int = 8,
    router_scaling_factor: float = 2.826,
    route_norm: bool = True,
    moe_router_enable_expert_bias: bool = True,
    moe_router_use_sigmoid: bool = True,
    first_k_dense_replace: int = 1,
    max_position_embeddings: int = 262144,
    rope_theta: float = 11158840.0,
    rope_scaling: dict | None = None,
    rms_norm_eps: float = 1e-05,
    qk_norm: bool = True,
    attention_bias: bool = False,
    hidden_act: str = 'silu',
    enable_lm_head_fp32: bool = True,
    enable_attention_fp32_softmax: bool = False,
    enable_moe_fp32_combine: bool = False,
    use_cache: bool = True,
    pad_token_id: int | None = 120002,
    bos_token_id: int = 120000,
    eos_token_id: int = 120025,
    tie_word_embeddings: bool = False,
    torch_dtype: str = 'bfloat16',
    **kwargs
)
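
As a rough sanity check on these defaults, the sketch below estimates the parameter count they imply. It is back-of-the-envelope arithmetic, not the library's own accounting. It assumes bias-free gated MLPs with three weight matrices each (suggested by `hidden_act='silu'` but not confirmed here), one linear router per MoE layer, and untied input and output embeddings. It ignores norm weights and the router bias. The helpers `gated_mlp_params` and `estimate_params` are hypothetical names introduced for illustration.

```python
# Default values copied from the HyMT2Config signature.
hidden_size = 2048
intermediate_size = 6912       # dense MLP width (layer 0)
moe_intermediate_size = 768    # per-expert MLP width
num_hidden_layers = 48
first_k_dense_replace = 1
num_attention_heads = 32
num_key_value_heads = 4
head_dim = 128
num_experts = 128
num_shared_experts = 1
num_experts_per_tok = 8
vocab_size = 120832


def gated_mlp_params(hidden: int, inner: int) -> int:
    # ASSUMPTION: gate, up and down projections, no biases.
    return 3 * hidden * inner


def estimate_params(active_only: bool) -> int:
    """Estimate total parameters, or those touched per token if active_only."""
    dense_layers = first_k_dense_replace
    moe_layers = num_hidden_layers - first_k_dense_replace

    q_width = num_attention_heads * head_dim
    kv_width = num_key_value_heads * head_dim
    # Q, K, V projections plus the output projection.
    attention = hidden_size * (q_width + 2 * kv_width) + q_width * hidden_size

    expert = gated_mlp_params(hidden_size, moe_intermediate_size)
    routed = num_experts_per_tok if active_only else num_experts
    router = hidden_size * num_experts
    moe_layer = (routed + num_shared_experts) * expert + router

    dense_layer = gated_mlp_params(hidden_size, intermediate_size)
    embeddings = 2 * vocab_size * hidden_size  # input embedding + lm_head

    return (
        num_hidden_layers * attention
        + moe_layers * moe_layer
        + dense_layers * dense_layer
        + embeddings
    )


total = estimate_params(active_only=False)
active = estimate_params(active_only=True)
print(f"total  ~ {total / 1e9:.2f}B")
print(f"active ~ {active / 1e9:.2f}B")
```

Under these assumptions the estimate comes to roughly 30B total parameters and about 3.5B active per token (counting both embedding matrices), which is consistent with the "30B-A3B" in the model name.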

Bases: PretrainedConfig

Configuration class for Tencent Hy-MT2-30B-A3B (translation MoE).

Architecture (from tencent/Hy-MT2-30B-A3B config.json):

- 48 transformer layers; layer 0 is dense, layers 1-47 are MoE
- MoE: 128 routed experts + 1 shared expert, top-8 activated
- Sigmoid routing with expert-bias correction (e_score_correction_bias) and router_scaling_factor = 2.826
- route_norm = True (normalize top-k routing weights)
- GQA: 32 Q heads, 4 KV heads, head_dim=128, hidden_size=2048
- Per-head Q/K RMSNorm before RoPE (qk_norm)
- 256K context, rope_theta=11158840
- vocab_size=120832, dense intermediate_size=6912, moe_intermediate_size=768
- enable_lm_head_fp32 = True (HF reference upcasts lm_head to fp32)
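
The routing items above can be sketched for a single token in plain Python. This is a minimal illustration, not the code in model.py. It assumes the expert bias affects only which experts are selected, that the combine weights come from the unbiased sigmoid scores, and that `router_scaling_factor` is applied after the `route_norm` normalization. The function name `route_token` and the toy inputs are hypothetical.

```python
import math


def route_token(logits, expert_bias, top_k=8, scaling_factor=2.826, route_norm=True):
    """Return (expert_indices, combine_weights) for one token.

    ASSUMPTION: the bias only steers expert selection; the weights are
    taken from the unbiased sigmoid scores, normalized, then scaled.
    """
    # Sigmoid scores, one per routed expert.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # Bias-corrected scores are used for ranking only.
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]

    weights = [scores[i] for i in chosen]
    if route_norm:
        norm = sum(weights)
        weights = [w / norm for w in weights]
    weights = [w * scaling_factor for w in weights]
    return chosen, weights


# Toy example: 6 experts, top-2. The bias promotes expert 3 over expert 5.
logits = [0.1, 2.0, -1.0, 1.5, 0.0, 1.9]
bias = [0.0, 0.0, 0.0, 0.5, 0.0, 0.0]
experts, weights = route_token(logits, bias, top_k=2)
print(experts)       # [3, 1]
print(sum(weights))  # approximately 2.826
```

With `route_norm` enabled the selected weights sum to the scaling factor, so the bias changes which experts are used without changing the overall scale of their combined output.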

Note: the on-disk HF checkpoint declares model_type: "hy_v3" and architectures: ["HYV3ForCausalLM"]. As a result, AutoConfig.from_pretrained resolves the checkpoint to NeMo AutoModel's existing HYV3Config rather than to this class. HyMT2Config is provided for tests and for standalone instantiation. The model code in model.py only reads config.<field> attributes (duck typing), so it works with either config class.
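
To illustrate what duck typing means here, the sketch below derives the layer layout and attention projection widths from any object that exposes the right attributes. A `types.SimpleNamespace` stands in for either config class. The helpers `layer_kinds` and `attention_widths` are hypothetical and not part of model.py. The rule that layers with index below `first_k_dense_replace` are dense is an assumption, though it matches the layout described above.

```python
from types import SimpleNamespace


def layer_kinds(config):
    """Hypothetical helper: label each layer as dense or MoE.

    ASSUMPTION: layers with index < first_k_dense_replace are dense.
    """
    return [
        "dense" if i < config.first_k_dense_replace else "moe"
        for i in range(config.num_hidden_layers)
    ]


def attention_widths(config):
    """Hypothetical helper: Q width, KV width, and Q heads per KV head."""
    q_width = config.num_attention_heads * config.head_dim
    kv_width = config.num_key_value_heads * config.head_dim
    group_size = config.num_attention_heads // config.num_key_value_heads
    return q_width, kv_width, group_size


# Stand-in for either config class; values are the HyMT2Config defaults.
cfg = SimpleNamespace(
    num_hidden_layers=48,
    first_k_dense_replace=1,
    num_attention_heads=32,
    num_key_value_heads=4,
    head_dim=128,
)

kinds = layer_kinds(cfg)
print(kinds.count("dense"), kinds.count("moe"))  # 1 47
print(attention_widths(cfg))                     # (4096, 512, 8)
```

The Q projection width (4096) is larger than hidden_size (2048) because head_dim is set explicitly instead of being derived as hidden_size / num_attention_heads.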

keys_to_ignore_at_inference

= ['past_key_values']

model_type

= 'hy_mt2'