bridge.training.utils.moe_token_drop#
Module Contents#
Functions#
| apply_moe_token_drop | Token drop improves performance by better balancing work across experts, but may affect convergence. |
Data#
API#
- bridge.training.utils.moe_token_drop.logger: logging.Logger#
'getLogger(...)'
- bridge.training.utils.moe_token_drop.apply_moe_token_drop(
- model_provider: megatron.core.transformer.transformer_config.TransformerConfig,
- moe_expert_capacity_factor: float = 1.0,
- moe_pad_expert_input_to_capacity: bool = True,
- )#
Token drop improves performance by better balancing work across experts, but may affect convergence.
MoE token drop is applicable to MoE models only.
- Parameters:
model_provider (TransformerConfig) – The transformer config to apply the token drop settings to
moe_expert_capacity_factor (float) – The capacity factor for all experts
moe_pad_expert_input_to_capacity (bool) – Pad the input for each expert to the expert capacity length
- Raises:
AssertionError – If moe_router_load_balancing_type is not aux_loss, seq_aux_loss, or none
AssertionError – If moe_token_dispatcher_type is not alltoall or alltoall_seq
ValueError – If moe_expert_capacity_factor is not set and moe_pad_expert_input_to_capacity is True
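A minimal usage sketch is shown below. It assumes a Megatron-Core `TransformerConfig` whose MoE fields (`num_moe_experts`, `moe_router_load_balancing_type`, `moe_token_dispatcher_type`) are available in your installation; the specific field values are illustrative, not defaults from this module.

```python
from megatron.core.transformer.transformer_config import TransformerConfig

from bridge.training.utils.moe_token_drop import apply_moe_token_drop

# Illustrative MoE config; only the fields relevant to token drop are shown.
# The router and dispatcher settings must satisfy the assertions listed above.
config = TransformerConfig(
    num_layers=12,
    hidden_size=768,
    num_attention_heads=12,
    num_moe_experts=8,
    moe_router_load_balancing_type="aux_loss",  # aux_loss, seq_aux_loss, or none
    moe_token_dispatcher_type="alltoall",       # alltoall or alltoall_seq
)

# Apply token-drop settings to the config: each expert is capped at a capacity
# derived from moe_expert_capacity_factor, and expert inputs are padded to that
# capacity so all experts receive equally sized batches.
apply_moe_token_drop(
    config,
    moe_expert_capacity_factor=1.0,
    moe_pad_expert_input_to_capacity=True,
)
```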