nemo_automodel.components.models.hy_mt2.config

Module Contents

Classes

HyMT2Config

Configuration class for Tencent Hy-MT2-30B-A3B (translation MoE).

API

class nemo_automodel.components.models.hy_mt2.config.HyMT2Config(
vocab_size: int = 120832,
hidden_size: int = 2048,
intermediate_size: int = 6912,
moe_intermediate_size: int = 768,
expert_hidden_dim: int = 768,
num_hidden_layers: int = 48,
num_attention_heads: int = 32,
num_key_value_heads: int = 4,
head_dim: int = 128,
num_experts: int = 128,
num_shared_experts: int = 1,
num_experts_per_tok: int = 8,
router_scaling_factor: float = 2.826,
route_norm: bool = True,
moe_router_enable_expert_bias: bool = True,
moe_router_use_sigmoid: bool = True,
first_k_dense_replace: int = 1,
max_position_embeddings: int = 262144,
rope_theta: float = 11158840.0,
rope_scaling: dict | None = None,
rms_norm_eps: float = 1e-05,
qk_norm: bool = True,
attention_bias: bool = False,
hidden_act: str = 'silu',
enable_lm_head_fp32: bool = True,
enable_attention_fp32_softmax: bool = False,
enable_moe_fp32_combine: bool = False,
use_cache: bool = True,
pad_token_id: int | None = 120002,
bos_token_id: int = 120000,
eos_token_id: int = 120025,
tie_word_embeddings: bool = False,
torch_dtype: str = 'bfloat16',
**kwargs,
)

Bases: transformers.PretrainedConfig

Configuration class for Tencent Hy-MT2-30B-A3B (translation MoE).

Architecture (from tencent/Hy-MT2-30B-A3B config.json):

  • 48 transformer layers; layer 0 is dense, layers 1-47 are MoE

  • MoE: 128 routed experts + 1 shared expert, top-8 activated

  • Sigmoid routing with expert-bias correction (e_score_correction_bias) and router_scaling_factor = 2.826

  • route_norm = True (normalize top-k routing weights)

  • GQA: 32 Q heads, 4 KV heads, head_dim=128, hidden_size=2048

  • Per-head Q/K RMSNorm before RoPE (qk_norm)

  • 256K context, rope_theta=11158840

  • vocab_size=120832, dense intermediate_size=6912, moe_intermediate_size=768

  • enable_lm_head_fp32 = True (HF reference upcasts lm_head to fp32)
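The routing bullets above can be illustrated with a minimal pure-Python sketch for a single token. This is not the code in model.py. It assumes the DeepSeek-V3-style convention in which the expert bias only influences which experts are selected, while the combine weights come from the unbiased sigmoid scores. The function name route_token and its argument names are hypothetical.

```python
import math


def route_token(logits, expert_bias, top_k=8, scaling_factor=2.826, route_norm=True):
    """Pick top_k experts for one token and return (expert_indices, weights).

    Hypothetical helper; defaults mirror num_experts_per_tok,
    router_scaling_factor and route_norm from the config above.
    """
    # Sigmoid routing: each expert gets an independent score in (0, 1).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]

    # ASSUMPTION: the correction bias is added for selection only.
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]

    # Combine weights use the unbiased scores of the selected experts.
    weights = [scores[i] for i in chosen]
    if route_norm:
        total = sum(weights)
        weights = [w / total for w in weights]  # top-k weights now sum to 1

    # Scale the (normalized) weights by the router scaling factor.
    weights = [w * scaling_factor for w in weights]
    return chosen, weights


# 128 routed experts with increasing logits and no bias.
logits = [0.01 * i for i in range(128)]
experts, weights = route_token(logits, [0.0] * 128)
```

With route_norm enabled, the returned weights sum to router_scaling_factor regardless of the raw scores, which is why the two settings are listed together in the architecture summary.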

Note: the on-disk HF checkpoint declares model_type: "hy_v3" and architectures: ["HYV3ForCausalLM"], so AutoConfig.from_pretrained resolves it to NeMo AutoModel's existing HYV3Config rather than to this class. HyMT2Config is provided for tests and for standalone instantiation. The model code in model.py is duck-typed: it only reads config.<field> attributes, so it works with either config class.
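As a rough illustration of what duck typing against config.<field> means, the sketch below derives the attention projection sizes from any object that exposes the relevant attributes. The helper attention_projection_sizes is hypothetical, and the SimpleNamespace object is a stand-in for either config class, populated with the defaults documented above.

```python
from types import SimpleNamespace


def attention_projection_sizes(config):
    """Derive GQA projection shapes from any object with the needed fields.

    Hypothetical helper: it never checks the config's type, only reads
    attributes, so any config class with these fields is accepted.
    """
    q_out = config.num_attention_heads * config.head_dim
    kv_out = config.num_key_value_heads * config.head_dim
    return {
        "q_proj": (config.hidden_size, q_out),
        "kv_proj": (config.hidden_size, kv_out),
        "queries_per_kv_head": config.num_attention_heads // config.num_key_value_heads,
    }


# Stand-in for a config instance, using the documented default values.
config_like = SimpleNamespace(
    hidden_size=2048,
    num_attention_heads=32,
    num_key_value_heads=4,
    head_dim=128,
)
sizes = attention_projection_sizes(config_like)
# sizes == {"q_proj": (2048, 4096), "kv_proj": (2048, 512), "queries_per_kv_head": 8}
```

Note that the query projection width (32 × 128 = 4096) differs from hidden_size (2048), so head_dim has to be read from the config rather than derived as hidden_size / num_attention_heads.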

Initialization

model_type

'hy_mt2'

keys_to_ignore_at_inference

['past_key_values']