nemo_automodel.components.moe.config#

MoE parallelizer configuration.

Module Contents#

Classes#

MoEParallelizerConfig

Configuration for MoE model parallelization (EP + FSDP settings).

MoEConfig

Configuration for MoE architecture and routing hyperparameters.

MoEMetricsConfig

Configuration for MoE load balance metrics logging.

API#

class nemo_automodel.components.moe.config.MoEParallelizerConfig#

Configuration for MoE model parallelization (EP + FSDP settings).

ignore_router_for_ac: bool#

False

reshard_after_forward: bool#

False

lm_head_precision: Optional[Union[str, torch.dtype]]#

None

wrap_outer_model: bool#

True

to_dict() -> Dict[str, Any]#
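
A minimal usage sketch, assuming MoEParallelizerConfig is a keyword-constructible dataclass (as the field listing above suggests); the values are illustrative, not recommendations:

```python
from nemo_automodel.components.moe.config import MoEParallelizerConfig

# Illustrative values; omitted fields keep the defaults listed above.
cfg = MoEParallelizerConfig(
    ignore_router_for_ac=True,
    reshard_after_forward=True,
    lm_head_precision="float32",
)

# to_dict() returns a plain dict, convenient for logging or YAML dumps.
print(cfg.to_dict())
```
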
class nemo_automodel.components.moe.config.MoEConfig#

Configuration for MoE architecture and routing hyperparameters.

n_routed_experts: int#

None

n_shared_experts: int#

None

n_activated_experts: int#

None

n_expert_groups: int#

None

n_limited_groups: int#

None

train_gate: bool#

None

gate_bias_update_factor: float#

None

aux_loss_coeff: float#

None

score_func: str#

None

route_scale: float#

None

dim: int#

None

inter_dim: int#

None

moe_inter_dim: int#

None

norm_topk_prob: bool#

None

router_bias: bool#

False

expert_bias: bool#

False

expert_activation: Literal['swiglu', 'quick_geglu', 'relu2']#

'swiglu'

activation_alpha: float#

1.702

activation_limit: float#

7.0

softmax_before_topk: bool#

False

dtype: str | torch.dtype#

None

shared_expert_gate: bool#

False

shared_expert_inter_dim: int | None#

None

shared_expert_activation: str#

'swiglu'

force_e_score_correction_bias: bool#

False

__post_init__()#
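
The sketch below constructs a config with illustrative, roughly DeepSeek-style values. It assumes MoEConfig is a keyword-constructible dataclass; none of the numbers are a vetted recipe:

```python
import torch

from nemo_automodel.components.moe.config import MoEConfig

# Illustrative hyperparameters only -- not a tested model recipe.
moe_cfg = MoEConfig(
    n_routed_experts=64,        # total routed experts
    n_shared_experts=2,         # always-active shared experts
    n_activated_experts=6,      # top-k routed experts per token
    n_expert_groups=1,
    n_limited_groups=1,
    train_gate=True,
    gate_bias_update_factor=1e-3,
    aux_loss_coeff=0.01,        # load-balancing auxiliary loss weight
    score_func="softmax",       # assumed valid; check the source for options
    route_scale=1.0,
    dim=2048,                   # model hidden size
    inter_dim=8192,             # dense MLP intermediate size
    moe_inter_dim=1024,         # per-expert MLP intermediate size
    norm_topk_prob=True,
    dtype=torch.bfloat16,
)
# __post_init__ runs automatically after dataclass construction,
# so any validation it performs happens here.
```
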
class nemo_automodel.components.moe.config.MoEMetricsConfig#

Configuration for MoE load balance metrics logging.

enabled: Whether to enable load balance metric tracking.

mode: Logging mode: "brief" emits scalar line charts only; "detailed" adds per-layer breakdowns.

detailed_every_steps: How often to log detailed metrics (only used when mode="detailed"). None means every step.

top_k_experts: Number of top (highest) and bottom (lowest) utilization experts to emit per layer. This limits the wandb key count for models with many experts.

enabled: bool#

False

mode: str#

'brief'

detailed_every_steps: Optional[int]#

None

top_k_experts: int#

5
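
A short sketch, assuming keyword construction of the dataclass; the values are illustrative:

```python
from nemo_automodel.components.moe.config import MoEMetricsConfig

# Log detailed per-layer load-balance metrics every 100 steps,
# emitting only the 5 most- and 5 least-utilized experts per layer.
metrics_cfg = MoEMetricsConfig(
    enabled=True,
    mode="detailed",
    detailed_every_steps=100,
    top_k_experts=5,
)
```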