# nemo_automodel.components.models.hy_mt2.config

## Module Contents

### Classes

| Name                                                                         | Description                                                       |
| ---------------------------------------------------------------------------- | ----------------------------------------------------------------- |
| [`HyMT2Config`](#nemo_automodel-components-models-hy_mt2-config-HyMT2Config) | Configuration class for Tencent Hy-MT2-30B-A3B (translation MoE). |

### API

```python
class nemo_automodel.components.models.hy_mt2.config.HyMT2Config(
    vocab_size: int = 120832,
    hidden_size: int = 2048,
    intermediate_size: int = 6912,
    moe_intermediate_size: int = 768,
    expert_hidden_dim: int = 768,
    num_hidden_layers: int = 48,
    num_attention_heads: int = 32,
    num_key_value_heads: int = 4,
    head_dim: int = 128,
    num_experts: int = 128,
    num_shared_experts: int = 1,
    num_experts_per_tok: int = 8,
    router_scaling_factor: float = 2.826,
    route_norm: bool = True,
    moe_router_enable_expert_bias: bool = True,
    moe_router_use_sigmoid: bool = True,
    first_k_dense_replace: int = 1,
    max_position_embeddings: int = 262144,
    rope_theta: float = 11158840.0,
    rope_scaling: dict | None = None,
    rms_norm_eps: float = 1e-05,
    qk_norm: bool = True,
    attention_bias: bool = False,
    hidden_act: str = 'silu',
    enable_lm_head_fp32: bool = True,
    enable_attention_fp32_softmax: bool = False,
    enable_moe_fp32_combine: bool = False,
    use_cache: bool = True,
    pad_token_id: int | None = 120002,
    bos_token_id: int = 120000,
    eos_token_id: int = 120025,
    tie_word_embeddings: bool = False,
    torch_dtype: str = 'bfloat16',
    **kwargs
)
```

**Bases:** `PretrainedConfig`

Configuration class for Tencent Hy-MT2-30B-A3B (translation MoE).

Architecture (from tencent/Hy-MT2-30B-A3B config.json):

* 48 transformer layers; layer 0 is dense, layers 1-47 are MoE
* MoE: 128 routed experts + 1 shared expert, top-8 activated
* Sigmoid routing with expert-bias correction (`e_score_correction_bias`)
  and `router_scaling_factor` = 2.826
* `route_norm` = True (normalize top-k routing weights)
* GQA: 32 Q heads, 4 KV heads, `head_dim`=128, `hidden_size`=2048
* Per-head Q/K RMSNorm before RoPE (`qk_norm`)
* 256K context (`max_position_embeddings`=262144), `rope_theta`=11158840
* `vocab_size`=120832, dense `intermediate_size`=6912, `moe_intermediate_size`=768
* `enable_lm_head_fp32` = True (HF reference upcasts `lm_head` to fp32)
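
The routing bullets above can be illustrated with a minimal, dependency-free sketch for a single token. It is not the code in `model.py`: the function name `route_token` and the toy inputs are hypothetical, and it assumes the expert bias only influences which experts are selected, while the combine weights come from the unbiased sigmoid scores, are normalized when `route_norm` is set, and are then multiplied by `router_scaling_factor`.

```python
import math


def route_token(logits, expert_bias, top_k=8, route_norm=True, scaling_factor=2.826):
    """Pick top_k experts for one token and return (indices, weights).

    Assumption (not confirmed by the source): the bias is added only for
    expert selection; the returned weights use the unbiased sigmoid scores.
    """
    # Sigmoid gate: each expert is scored independently (no softmax).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # The bias correction shifts the scores used for ranking.
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    weights = [scores[i] for i in chosen]
    if route_norm:
        # Normalize the selected weights so they sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    # Scale the (normalized) weights by the router scaling factor.
    weights = [w * scaling_factor for w in weights]
    return chosen, weights


# Toy example: 4 experts, top-2. The bias lifts expert 2 above expert 1.
toy_logits = [2.0, 1.0, 0.0, -1.0]
toy_bias = [0.0, 0.0, 0.5, 0.0]
experts, weights = route_token(toy_logits, toy_bias, top_k=2)
unbiased_experts, _ = route_token(toy_logits, [0.0] * 4, top_k=2)
```

In the toy case the bias changes which experts are picked, while the selected weights still sum to `router_scaling_factor` because `route_norm` normalizes them before scaling.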

Note: the on-disk HF checkpoint declares `model_type: "hy_v3"` and
`architectures: ["HYV3ForCausalLM"]`, so `AutoConfig.from_pretrained`
resolves it to NeMo AutoModel's existing `HYV3Config` rather than to this
class. `HyMT2Config` is provided for tests and for standalone instantiation;
the model code in `model.py` is duck-typed against `config.<field>` and works
with either config class.
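
To illustrate what duck typing against config fields means in practice, here is a small sketch that reads only attributes and never checks the config's class. The helpers `layer_kinds` and `attention_shapes` are hypothetical and do not come from `model.py`; a `SimpleNamespace` stands in for either config class, populated with the defaults listed above. It assumes `first_k_dense_replace` counts the leading dense layers, which matches the "layer 0 is dense" description.

```python
from types import SimpleNamespace


def layer_kinds(config):
    """Label each layer 'dense' or 'moe' (assumes the first k layers are dense)."""
    return [
        "dense" if i < config.first_k_dense_replace else "moe"
        for i in range(config.num_hidden_layers)
    ]


def attention_shapes(config):
    """Derive GQA projection widths from head counts and head_dim."""
    return {
        "q_dim": config.num_attention_heads * config.head_dim,
        "kv_dim": config.num_key_value_heads * config.head_dim,
        "q_heads_per_kv_head": config.num_attention_heads // config.num_key_value_heads,
    }


# Any object exposing these attributes works; this stand-in uses the
# documented defaults rather than a real HyMT2Config or HYV3Config instance.
cfg = SimpleNamespace(
    num_hidden_layers=48,
    first_k_dense_replace=1,
    num_attention_heads=32,
    num_key_value_heads=4,
    head_dim=128,
    hidden_size=2048,
)

kinds = layer_kinds(cfg)
shapes = attention_shapes(cfg)
```

With these defaults the query projection width (32 × 128 = 4096) is larger than `hidden_size` (2048), so code consuming the config should read `head_dim` directly rather than derive it from `hidden_size // num_attention_heads`.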