bridge.training.utils.moe_token_drop
Module Contents
Functions
| Function | Description |
|---|---|
| `apply_moe_token_drop` | Token drop improves performance by better balancing work across experts, but may affect convergence. |
API
- bridge.training.utils.moe_token_drop.apply_moe_token_drop(
- model_provider: megatron.core.transformer.transformer_config.TransformerConfig,
- moe_expert_capacity_factor: float = 1.0,
- moe_pad_expert_input_to_capacity: bool = True,
- )
Token drop improves performance by better balancing work across experts, but may affect convergence.
- Parameters:
model_provider (TransformerConfig) – The transformer config to apply the token drop settings to.
moe_expert_capacity_factor (float) – The capacity factor applied to every expert.
moe_pad_expert_input_to_capacity (bool) – Whether to pad each expert's input up to the expert capacity length.
- Raises:
AssertionError – If moe_router_load_balancing_type is not aux_loss, seq_aux_loss, or none
AssertionError – If moe_token_dispatcher_type is not alltoall or alltoall_seq
ValueError – If moe_expert_capacity_factor is not set and moe_pad_expert_input_to_capacity is True
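A minimal usage sketch follows. It is illustrative only: it assumes a recent megatron.core whose `TransformerConfig` exposes the MoE fields shown, that `apply_moe_token_drop` is importable from the module path in the heading above, and the specific config values (layer count, hidden size, expert count) are placeholders.

```python
from megatron.core.transformer.transformer_config import TransformerConfig

from bridge.training.utils.moe_token_drop import apply_moe_token_drop

# Hypothetical MoE config; the values below are placeholders chosen so the
# router and dispatcher settings satisfy the assertions listed above.
config = TransformerConfig(
    num_layers=4,
    hidden_size=256,
    num_attention_heads=8,
    num_moe_experts=8,
    moe_router_load_balancing_type="aux_loss",  # must be aux_loss, seq_aux_loss, or none
    moe_token_dispatcher_type="alltoall",       # must be alltoall or alltoall_seq
)

# Cap each expert at 1.0x its nominal capacity and pad expert inputs to that
# capacity, trading a possible convergence impact for better load balancing.
apply_moe_token_drop(
    config,
    moe_expert_capacity_factor=1.0,
    moe_pad_expert_input_to_capacity=True,
)
```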