# nemo_automodel.components.moe.config

MoE model configuration.

## Module Contents

### Classes

| Name                                                                         | Description                                             |
| ---------------------------------------------------------------------------- | ------------------------------------------------------- |
| [`MoEConfig`](#nemo_automodel-components-moe-config-MoEConfig)               | Configuration for routed and shared MoE expert modules. |
| [`MoEMetricsConfig`](#nemo_automodel-components-moe-config-MoEMetricsConfig) | Configuration for MoE load balance metrics logging.     |

### API

```python
class nemo_automodel.components.moe.config.MoEConfig(
    n_routed_experts: int,
    n_shared_experts: int,
    n_activated_experts: int,
    n_expert_groups: int,
    n_limited_groups: int,
    train_gate: bool,
    gate_bias_update_factor: float,
    aux_loss_coeff: float,
    score_func: str,
    route_scale: float,
    dim: int,
    inter_dim: int,
    moe_inter_dim: int,
    norm_topk_prob: bool,
    router_bias: bool = False,
    expert_bias: bool = False,
    expert_activation: typing.Literal['swiglu', 'swigluoai', 'quick_geglu', 'geglu', 'relu2'] = 'swiglu',
    activation_alpha: float = 1.702,
    activation_limit: float = 7.0,
    swiglu_limit: float = 0.0,
    softmax_before_topk: bool = False,
    dtype: str | torch.dtype = torch.bfloat16,
    shared_expert_gate: bool = False,
    shared_expert_inter_dim: int | None = None,
    shared_expert_activation: str = 'swiglu',
    force_e_score_correction_bias: bool = False,
    moe_latent_size: int | None = None
)
```

Dataclass

Configuration for routed and shared MoE expert modules.
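The signature above has fourteen parameters without defaults, so passing them by keyword tends to be easier to read than passing them by position. The following is a minimal sketch of that. All values are illustrative placeholders rather than settings from a real model, and `"softmax"` as an accepted `score_func` value is an assumption not confirmed by this page. The import is guarded so the snippet still runs where the library is not installed.

```python
# Illustrative values only; none of these come from a real model configuration.
required_kwargs = dict(
    n_routed_experts=8,
    n_shared_experts=1,
    n_activated_experts=2,
    n_expert_groups=1,
    n_limited_groups=1,
    train_gate=True,
    gate_bias_update_factor=0.0,
    aux_loss_coeff=0.0,
    score_func="softmax",  # assumed to be an accepted value
    route_scale=1.0,
    dim=1024,
    inter_dim=4096,
    moe_inter_dim=512,
    norm_topk_prob=False,
)

try:
    from nemo_automodel.components.moe.config import MoEConfig
except ImportError:
    # Library (or one of its dependencies) is unavailable; only the kwargs are built.
    MoEConfig = None

# Parameters with defaults (router_bias, expert_activation, dtype, ...) are left untouched.
config = MoEConfig(**required_kwargs) if MoEConfig is not None else None
print(sorted(required_kwargs))
```

Keeping the required values in a plain dictionary also makes it straightforward to load them from a YAML or JSON file before constructing the dataclass.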

Expert projections use the latent size (`moe_latent_size`) when it is set; otherwise they use the model dimension (`dim`).
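The selection rule for the projection dimension can be sketched as a small standalone function. `resolve_expert_dim` is a hypothetical helper written for illustration, not part of `nemo_automodel`, and it assumes that "set" means `moe_latent_size` is not `None`.

```python
from typing import Optional


def resolve_expert_dim(dim: int, moe_latent_size: Optional[int] = None) -> int:
    """Hypothetical helper: return the latent size when set, otherwise the model dim."""
    # Assumption: "set" means "not None"; the library may apply a different check.
    return moe_latent_size if moe_latent_size is not None else dim


model_dim_only = resolve_expert_dim(dim=4096)
with_latent = resolve_expert_dim(dim=4096, moe_latent_size=1024)
print(model_dim_only, with_latent)
```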

```python
nemo_automodel.components.moe.config.MoEConfig.__post_init__()
```

```python
class nemo_automodel.components.moe.config.MoEMetricsConfig(
    enabled: bool = False,
    mode: str = 'brief',
    detailed_every_steps: typing.Optional[int] = None,
    top_k_experts: int = 0
)
```

Dataclass

Configuration for MoE load balance metrics logging.
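Because every field of `MoEMetricsConfig` has a default, a caller only needs to override the fields that differ. The sketch below enables metrics and leaves the rest at the documented defaults. If the library cannot be imported, it falls back to a local stand-in dataclass that mirrors the documented signature; the stand-in is an assumption made so the snippet runs anywhere and is not the library class.

```python
from dataclasses import dataclass
from typing import Optional

try:
    from nemo_automodel.components.moe.config import MoEMetricsConfig
except ImportError:
    # Local stand-in mirroring the documented signature (not the library class).
    @dataclass
    class MoEMetricsConfig:
        enabled: bool = False
        mode: str = "brief"
        detailed_every_steps: Optional[int] = None
        top_k_experts: int = 0


# Override only `enabled`; the other fields keep their documented defaults.
metrics_config = MoEMetricsConfig(enabled=True)
print(metrics_config)
```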