bridge.training.utils.moe_token_drop#

Module Contents#

Functions#

apply_moe_token_drop

Token drop improves performance by better balancing work across experts, but may affect convergence.

API#

bridge.training.utils.moe_token_drop.apply_moe_token_drop(
    model_provider: megatron.core.transformer.transformer_config.TransformerConfig,
    moe_expert_capacity_factor: float = 1.0,
    moe_pad_expert_input_to_capacity: bool = True,
) → None#

Token drop improves performance by better balancing work across experts, but may affect convergence.

Parameters:
  • model_provider (TransformerConfig) – The transformer config to apply the token drop settings to

  • moe_expert_capacity_factor (float) – The capacity factor for all experts

  • moe_pad_expert_input_to_capacity (bool) – Pad the input for each expert to the expert capacity length

Raises:
  • AssertionError – If moe_router_load_balancing_type is not aux_loss, seq_aux_loss, or none

  • AssertionError – If moe_token_dispatcher_type is not alltoall or alltoall_seq

  • ValueError – If moe_expert_capacity_factor is not set and moe_pad_expert_input_to_capacity is True
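A minimal usage sketch is shown below. It assumes the module is importable under the documented path and that the installed megatron-core TransformerConfig exposes the MoE router and dispatcher fields checked by this function; the specific config values are illustrative only.

```python
from megatron.core.transformer.transformer_config import TransformerConfig

from bridge.training.utils.moe_token_drop import apply_moe_token_drop

# Hypothetical MoE config; field values are illustrative, not prescriptive.
config = TransformerConfig(
    num_layers=12,
    hidden_size=768,
    num_attention_heads=12,
    num_moe_experts=8,
    moe_router_load_balancing_type="aux_loss",  # must be aux_loss, seq_aux_loss, or none
    moe_token_dispatcher_type="alltoall",       # must be alltoall or alltoall_seq
)

# Enable token drop: cap each expert at 1.0x its nominal share of tokens
# and pad each expert's input to that capacity for uniform workloads.
apply_moe_token_drop(
    config,
    moe_expert_capacity_factor=1.0,
    moe_pad_expert_input_to_capacity=True,
)
```

If the config's load-balancing type or dispatcher type does not match the allowed values above, the call raises an AssertionError rather than silently ignoring the token drop settings.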