nemo_automodel.components.moe.config#
MoE parallelizer configuration.
Module Contents#
Classes#
| Class | Description |
|---|---|
| `MoEParallelizerConfig` | Configuration for MoE model parallelization (EP + FSDP settings). |
| `MoEConfig` | |
| `MoEMetricsConfig` | Configuration for MoE load balance metrics logging. |
API#
- class nemo_automodel.components.moe.config.MoEParallelizerConfig#
Configuration for MoE model parallelization (EP + FSDP settings).
- ignore_router_for_ac: bool#
False
- reshard_after_forward: bool#
False
- lm_head_precision: Optional[Union[str, torch.dtype]]#
None
- wrap_outer_model: bool#
True
- to_dict() Dict[str, Any]#
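A minimal sketch of how this config might be used, assuming it behaves like a standard Python dataclass (the sketch below mirrors the documented fields and defaults; it is not the library's actual implementation):

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict, Optional


# Sketch mirroring the documented fields of MoEParallelizerConfig;
# the real class lives in nemo_automodel.components.moe.config.
@dataclass
class MoEParallelizerConfig:
    ignore_router_for_ac: bool = False
    reshard_after_forward: bool = False
    lm_head_precision: Optional[str] = None  # str or torch.dtype in the real class
    wrap_outer_model: bool = True

    def to_dict(self) -> Dict[str, Any]:
        # Serialize the config, e.g. for logging or checkpoint metadata.
        return asdict(self)


cfg = MoEParallelizerConfig(reshard_after_forward=True)
print(cfg.to_dict()["reshard_after_forward"])  # True
```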
- class nemo_automodel.components.moe.config.MoEConfig#
- n_routed_experts: int#
None
- n_activated_experts: int#
None
- n_expert_groups: int#
None
- n_limited_groups: int#
None
- train_gate: bool#
None
- gate_bias_update_factor: float#
None
- aux_loss_coeff: float#
None
- score_func: str#
None
- route_scale: float#
None
- dim: int#
None
- inter_dim: int#
None
- moe_inter_dim: int#
None
- norm_topk_prob: bool#
None
- router_bias: bool#
False
- expert_bias: bool#
False
- expert_activation: Literal['swiglu', 'quick_geglu', 'relu2']#
'swiglu'
- activation_alpha: float#
1.702
- activation_limit: float#
7.0
- softmax_before_topk: bool#
False
- dtype: str | torch.dtype#
None
- force_e_score_correction_bias: bool#
False
- __post_init__()#
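To illustrate how a few of the routing fields fit together, here is a hypothetical sketch (field names follow the documented attributes, but the defaults and the `__post_init__` validation shown here are illustrative assumptions, not the library's actual logic):

```python
from dataclasses import dataclass


# Illustrative sketch covering a subset of MoEConfig's documented
# routing fields; defaults and validation are hypothetical.
@dataclass
class MoEConfigSketch:
    n_routed_experts: int       # total experts available to the router
    n_activated_experts: int    # experts selected per token (top-k)
    score_func: str = "softmax"       # hypothetical default
    router_bias: bool = False
    expert_bias: bool = False
    expert_activation: str = "swiglu"

    def __post_init__(self):
        # Hypothetical sanity check: can't activate more experts
        # than exist.
        if self.n_activated_experts > self.n_routed_experts:
            raise ValueError(
                "n_activated_experts cannot exceed n_routed_experts"
            )


cfg = MoEConfigSketch(n_routed_experts=64, n_activated_experts=8)
```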
- class nemo_automodel.components.moe.config.MoEMetricsConfig#
Configuration for MoE load balance metrics logging.
.. attribute:: enabled
Whether to enable load balance metric tracking.
.. attribute:: mode
Logging mode: 'brief' for scalar line charts only; 'detailed' adds per-layer breakdowns.
.. attribute:: detailed_every_steps
How often to log detailed metrics (only used when mode='detailed'). None means every step.
.. attribute:: top_k_experts
Number of top (highest) and bottom (lowest) utilization experts to emit per layer. Reduces wandb key count for models with many experts.
- enabled: bool#
False
- mode: str#
'brief'
- detailed_every_steps: Optional[int]#
None
- top_k_experts: int#
5
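Assuming standard dataclass behavior, a sketch of configuring detailed metrics logging (the class below mirrors the documented fields and defaults; it is not the actual implementation):

```python
from dataclasses import dataclass
from typing import Optional


# Sketch mirroring the documented fields of MoEMetricsConfig;
# the real class is nemo_automodel.components.moe.config.MoEMetricsConfig.
@dataclass
class MoEMetricsConfig:
    enabled: bool = False
    mode: str = "brief"                         # 'brief' or 'detailed'
    detailed_every_steps: Optional[int] = None  # None => every step
    top_k_experts: int = 5                      # top/bottom experts per layer


# Enable detailed per-layer breakdowns, logged every 100 steps.
metrics = MoEMetricsConfig(enabled=True, mode="detailed",
                           detailed_every_steps=100)
```

Capping `top_k_experts` keeps the number of logged keys manageable for models with hundreds of experts per layer, as noted in the attribute description above.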