`nemo_automodel.components.models.hy_mt2.config`

Module Contents

Classes

HyMT2Config

Configuration class for Tencent Hy-MT2-30B-A3B (translation MoE).

API

class nemo_automodel.components.models.hy_mt2.config.HyMT2Config(
    vocab_size: int = 120832,
    hidden_size: int = 2048,
    intermediate_size: int = 6912,
    moe_intermediate_size: int = 768,
    expert_hidden_dim: int = 768,
    num_hidden_layers: int = 48,
    num_attention_heads: int = 32,
    num_key_value_heads: int = 4,
    head_dim: int = 128,
    num_experts: int = 128,
    num_shared_experts: int = 1,
    num_experts_per_tok: int = 8,
    router_scaling_factor: float = 2.826,
    route_norm: bool = True,
    moe_router_enable_expert_bias: bool = True,
    moe_router_use_sigmoid: bool = True,
    first_k_dense_replace: int = 1,
    max_position_embeddings: int = 262144,
    rope_theta: float = 11158840.0,
    rope_scaling: dict | None = None,
    rms_norm_eps: float = 1e-05,
    qk_norm: bool = True,
    attention_bias: bool = False,
    hidden_act: str = 'silu',
    enable_lm_head_fp32: bool = True,
    enable_attention_fp32_softmax: bool = False,
    enable_moe_fp32_combine: bool = False,
    use_cache: bool = True,
    pad_token_id: int | None = 120002,
    bos_token_id: int = 120000,
    eos_token_id: int = 120025,
    tie_word_embeddings: bool = False,
    torch_dtype: str = 'bfloat16',
    **kwargs,
)

Bases: transformers.PretrainedConfig

Configuration class for Tencent Hy-MT2-30B-A3B (translation MoE).
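As a rough sanity check on the defaults in the signature above, the weight-matrix parameter count can be estimated by hand. The sketch below is not part of the module. It assumes that every MLP (dense, routed, and shared) is a bias-free gated block with three projection matrices, that the shared expert uses moe_intermediate_size, and that norm weights and the router bias are small enough to ignore. It also assumes "30B-A3B" follows the common convention of total and per-token active parameters.

```python
# Defaults copied from the HyMT2Config signature.
vocab_size = 120832
hidden_size = 2048
intermediate_size = 6912
moe_intermediate_size = 768
num_hidden_layers = 48
num_attention_heads = 32
num_key_value_heads = 4
head_dim = 128
num_experts = 128
num_shared_experts = 1
num_experts_per_tok = 8
first_k_dense_replace = 1


def gated_mlp_params(hidden: int, inner: int) -> int:
    # Assumption: gate, up, and down projections, no biases.
    return 3 * hidden * inner


# Q and O projections span all query heads; K and V span the KV heads only.
attn = hidden_size * head_dim * (2 * num_attention_heads + 2 * num_key_value_heads)
expert = gated_mlp_params(hidden_size, moe_intermediate_size)
shared = num_shared_experts * expert  # assumption: same inner size as routed experts
router = hidden_size * num_experts
dense_mlp = gated_mlp_params(hidden_size, intermediate_size)
embeddings = 2 * vocab_size * hidden_size  # input embedding + untied lm_head

moe_layers = num_hidden_layers - first_k_dense_replace
fixed = embeddings + num_hidden_layers * attn + first_k_dense_replace * dense_mlp

# Total counts every routed expert; active counts only the top-k chosen per token.
total_params = fixed + moe_layers * (num_experts * expert + shared + router)
active_params = fixed + moe_layers * (num_experts_per_tok * expert + shared + router)

print(f"total  ~ {total_params / 1e9:.2f}B")
print(f"active ~ {active_params / 1e9:.2f}B")
```

Under these assumptions the estimate lands close to 30 billion total parameters and roughly 3.5 billion active per token, which is consistent with the model name.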

Architecture (from tencent/Hy-MT2-30B-A3B config.json):

- 48 transformer layers; layer 0 is dense, layers 1-47 are MoE
- MoE: 128 routed experts + 1 shared expert, top-8 activated
- Sigmoid routing with expert-bias correction (e_score_correction_bias) and router_scaling_factor = 2.826
- route_norm = True (normalize top-k routing weights)
- GQA: 32 Q heads, 4 KV heads, head_dim=128, hidden_size=2048
- Per-head Q/K RMSNorm before RoPE (qk_norm)
- 256K context, rope_theta=11158840
- vocab_size=120832, dense intermediate_size=6912, moe_intermediate_size=768
- enable_lm_head_fp32 = True (HF reference upcasts lm_head to fp32)
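The routing items listed above (sigmoid scores, expert-bias correction, top-k normalization, and the scaling factor) can be illustrated with a minimal pure-Python sketch for a single token. This is not the code in model.py. The function name route_token is hypothetical, and the sketch assumes the bias affects only which experts are selected while the combine weights come from the unbiased sigmoid scores, as in other sigmoid-routed MoE designs; the actual order of operations may differ.

```python
import math


def route_token(logits, expert_bias, top_k=8, scaling=2.826, route_norm=True):
    """Hypothetical single-token router sketch; returns (expert indices, weights)."""
    # Sigmoid gate: each expert is scored independently (no softmax across experts).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]

    # Assumption: the correction bias shifts the selection only.
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]

    # Combine weights use the unbiased scores of the selected experts.
    weights = [scores[i] for i in chosen]
    if route_norm:
        total = sum(weights)
        weights = [w / total for w in weights]

    # Scale the (optionally normalized) weights by router_scaling_factor.
    weights = [w * scaling for w in weights]
    return chosen, weights


# 128 routed experts with a zero bias: the 8 largest logits win.
logits = [0.01 * i for i in range(128)]
experts, weights = route_token(logits, [0.0] * 128)
print(experts, round(sum(weights), 3))
```

With route_norm enabled, the selected weights sum to the scaling factor regardless of how many experts are chosen, so the factor directly sets the overall magnitude of the routed-expert contribution.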

Note: the on-disk HF checkpoint declares model_type: "hy_v3" and architectures: ["HYV3ForCausalLM"], so AutoConfig.from_pretrained resolves to NeMo AutoModel's existing HYV3Config rather than to this class. HyMT2Config is provided for tests and for standalone instantiation. The model code in model.py is duck-typed against config.<field> and works with either config class.
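To illustrate what duck typing against config.<field> means in practice, the sketch below uses two hypothetical helpers (attention_shapes and is_moe_layer, neither of which exists in model.py) that only read attributes. Any object exposing the same field names works, so a types.SimpleNamespace stands in here for either config class. Treating first_k_dense_replace as "the first k layers are dense" is an assumption, chosen because it matches the layer layout described above.

```python
from types import SimpleNamespace


def attention_shapes(config):
    """Derive projection widths from attributes only (hypothetical helper)."""
    q_dim = config.num_attention_heads * config.head_dim
    kv_dim = config.num_key_value_heads * config.head_dim
    # Number of query heads that share one KV head under GQA.
    group_size = config.num_attention_heads // config.num_key_value_heads
    return q_dim, kv_dim, group_size


def is_moe_layer(config, layer_idx):
    # Assumption: layers with index below first_k_dense_replace use a dense MLP.
    return layer_idx >= config.first_k_dense_replace


# Stand-in object carrying the default values from the signature.
cfg = SimpleNamespace(
    num_hidden_layers=48,
    num_attention_heads=32,
    num_key_value_heads=4,
    head_dim=128,
    first_k_dense_replace=1,
)

q_dim, kv_dim, group_size = attention_shapes(cfg)
moe_layer_count = sum(is_moe_layer(cfg, i) for i in range(cfg.num_hidden_layers))
print(q_dim, kv_dim, group_size, moe_layer_count)
```

Note that the query projection width (32 heads of 128) is larger than hidden_size=2048, which is why head_dim is a separate field rather than being derived from hidden_size.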

Initialization

model_type: 'hy_mt2'

keys_to_ignore_at_inference: ['past_key_values']
