bridge.training.utils.moe_token_drop#

Module Contents#

Functions#

apply_moe_token_drop

Token drop improves performance by better balancing work across experts, but may affect convergence.

API#

bridge.training.utils.moe_token_drop.apply_moe_token_drop(
    model_provider: megatron.core.transformer.transformer_config.TransformerConfig,
    moe_expert_capacity_factor: float = 1.0,
    moe_pad_expert_input_to_capacity: bool = True,
) → None#

Token drop improves performance by better balancing work across experts, but may affect convergence.

Parameters:
  • model_provider (TransformerConfig) – The transformer config to apply the token drop settings to

  • moe_expert_capacity_factor (float) – The capacity factor for all experts

  • moe_pad_expert_input_to_capacity (bool) – Pad the input for each expert to the expert capacity length

Raises:
  • AssertionError – If moe_router_load_balancing_type is not aux_loss, seq_aux_loss, or none

  • AssertionError – If moe_token_dispatcher_type is not alltoall or alltoall_seq

  • ValueError – If moe_expert_capacity_factor is not set and moe_pad_expert_input_to_capacity is True
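A minimal usage sketch is shown below. It assumes the module is importable under the documented path and that the installed megatron-core TransformerConfig exposes the MoE router and dispatcher fields checked by this function; the specific config values are illustrative only.

```python
from megatron.core.transformer.transformer_config import TransformerConfig

from bridge.training.utils.moe_token_drop import apply_moe_token_drop

# Hypothetical MoE config; field values are illustrative, not prescriptive.
config = TransformerConfig(
    num_layers=12,
    hidden_size=768,
    num_attention_heads=12,
    num_moe_experts=8,
    moe_router_load_balancing_type="aux_loss",  # must be aux_loss, seq_aux_loss, or none
    moe_token_dispatcher_type="alltoall",       # must be alltoall or alltoall_seq
)

# Enable token drop: cap each expert at 1.0x its nominal share of tokens
# and pad each expert's input to that capacity for uniform workloads.
apply_moe_token_drop(
    config,
    moe_expert_capacity_factor=1.0,
    moe_pad_expert_input_to_capacity=True,
)
```

If the config's load-balancing type or dispatcher type does not match the allowed values above, the call raises an AssertionError rather than silently ignoring the token drop settings.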