nemo_automodel.components.models.hy_mt2.config
Module Contents
Classes
API
Bases: PretrainedConfig
Configuration class for Tencent Hy-MT2-30B-A3B (translation MoE).
Architecture (from tencent/Hy-MT2-30B-A3B config.json):
- 48 transformer layers; layer 0 is dense, layers 1-47 are MoE
- MoE: 128 routed experts + 1 shared expert, top-8 activated
- Sigmoid routing with expert-bias correction (e_score_correction_bias) and router_scaling_factor = 2.826
- route_norm = True (normalize top-k routing weights)
- GQA: 32 Q heads, 4 KV heads, head_dim=128, hidden_size=2048
- Per-head Q/K RMSNorm before RoPE (qk_norm)
- 256K context, rope_theta=11158840
- vocab_size=120832, dense intermediate_size=6912, moe_intermediate_size=768
- enable_lm_head_fp32 = True (HF reference upcasts lm_head to fp32)
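The routing bullets above can be illustrated with a small pure-Python sketch. It is not the library's router. It assumes the correction bias is added only when selecting experts, while the final weights come from the unbiased sigmoid scores (the source does not state this detail). The function name route_token and its parameters are hypothetical; top_k=8 and scaling=2.826 are the values listed above.

```python
import math


def route_token(logits, bias, top_k=8, scaling=2.826, route_norm=True):
    """Hypothetical sketch of sigmoid top-k routing for a single token."""
    # Sigmoid gate: every expert gets an independent score in (0, 1).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # Assumption: the correction bias influences which experts are picked,
    # but not the weights applied to their outputs.
    biased = [s + b for s, b in zip(scores, bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    weights = [scores[i] for i in chosen]
    if route_norm:
        # Normalize the selected weights so they sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    # Apply the router scaling factor after normalization.
    weights = [w * scaling for w in weights]
    return chosen, weights


# Toy example with 4 experts and top-2 selection (the real model has 128 and top-8).
logits = [0.0, 2.0, -1.0, 1.0]
bias = [0.0, 0.0, 3.0, 0.0]
chosen, weights = route_token(logits, bias, top_k=2)
print(chosen, weights)
```

In the toy example, expert 2 has the lowest raw score but is selected because of its bias. With route_norm enabled, the selected weights sum to the scaling factor.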
Note: the on-disk HF checkpoint declares model_type: "hy_v3" and
architectures: ["HYV3ForCausalLM"], so AutoConfig.from_pretrained
resolves to NeMo AutoModel's existing HYV3Config rather than to this
class. This class is provided for tests and for standalone instantiation.
The model code in model.py is duck-typed: it reads config.<field>
attributes only, so it works with either config class.
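As a rough illustration of the duck typing described in the note, the sketch below reads fields by attribute name from two unrelated config objects. The helper attention_widths and the class StandaloneConfig are hypothetical. The field names num_attention_heads, num_key_value_heads, and head_dim follow common Hugging Face naming and are assumed here, not taken from model.py. The values are those listed in the architecture summary.

```python
from dataclasses import dataclass
from types import SimpleNamespace


def attention_widths(config):
    """Hypothetical helper: uses only attribute access, never the config type."""
    q_width = config.num_attention_heads * config.head_dim
    kv_width = config.num_key_value_heads * config.head_dim
    return q_width, kv_width


@dataclass
class StandaloneConfig:
    """Hypothetical stand-in for a config class built without AutoConfig."""
    num_attention_heads: int = 32
    num_key_value_heads: int = 4
    head_dim: int = 128


# A second, unrelated object exposing the same attribute names.
ns_cfg = SimpleNamespace(num_attention_heads=32, num_key_value_heads=4, head_dim=128)
dc_cfg = StandaloneConfig()

print(attention_widths(ns_cfg))
print(attention_widths(dc_cfg))
```

Both objects give the same result because the helper depends only on the attribute names. Note that 32 heads of dimension 128 give a Q projection width of 4096, which is larger than hidden_size=2048.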
keys_to_ignore_at_inference
model_type