core.models.gpt.moe_module_specs#

Module Contents#

Functions#

get_moe_module_spec

Helper function to get module spec for MoE.

get_moe_module_spec_for_backend

Helper function to get module spec for MoE.

get_inference_optimized_moe_spec

MoE module spec for inference-optimized transformer impl.

API#

core.models.gpt.moe_module_specs.get_moe_module_spec(
use_te: Optional[bool] = True,
num_experts: Optional[int] = None,
moe_grouped_gemm: Optional[bool] = False,
moe_use_legacy_grouped_gemm: Optional[bool] = False,
) → megatron.core.transformer.spec_utils.ModuleSpec#

Helper function to get module spec for MoE.

Called by mamba_layer_specs.py for standard (non-inference) MoE specs. The GPT layer specs call get_moe_module_spec_for_backend directly.

Parameters:
  • use_te – Whether to use Transformer Engine.

  • num_experts – Number of experts.

  • moe_grouped_gemm – Whether to use grouped GEMM.

  • moe_use_legacy_grouped_gemm – Whether to use legacy grouped GEMM.
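A minimal sketch of the flag-based dispatch this helper performs. The `ModuleSpec` stand-in and the body below are illustrative only, not the real Megatron implementation; the expert-module names (`TEGroupedMLP`, `GroupedMLP`, `SequentialMLP`) are assumptions about which submodules the flags select.

```python
from dataclasses import dataclass, field
from typing import Optional

# Simplified stand-in for megatron.core.transformer.spec_utils.ModuleSpec
# (illustrative; the real class holds module classes and nested submodule specs).
@dataclass
class ModuleSpec:
    module: str
    submodules: dict = field(default_factory=dict)

def get_moe_module_spec(
    use_te: Optional[bool] = True,
    num_experts: Optional[int] = None,
    moe_grouped_gemm: Optional[bool] = False,
) -> ModuleSpec:
    """Sketch: choose an expert-MLP implementation from the flags."""
    assert num_experts is not None, "an MoE spec needs num_experts"
    if moe_grouped_gemm:
        # Grouped GEMM batches all experts' GEMMs into one kernel call;
        # Transformer Engine provides its own grouped implementation.
        experts = "TEGroupedMLP" if use_te else "GroupedMLP"
    else:
        # Fallback: run each expert's MLP sequentially.
        experts = "SequentialMLP"
    return ModuleSpec(module="MoELayer", submodules={"experts": experts})

spec = get_moe_module_spec(use_te=True, num_experts=8, moe_grouped_gemm=True)
print(spec.submodules["experts"])  # TEGroupedMLP
```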

core.models.gpt.moe_module_specs.get_moe_module_spec_for_backend(
backend: megatron.core.models.backends.BackendSpecProvider,
num_experts: Optional[int] = None,
moe_grouped_gemm: Optional[bool] = False,
use_te_activation_func: bool = False,
) → megatron.core.transformer.spec_utils.ModuleSpec#

Helper function to get module spec for MoE using the given backend.
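Unlike the flag-based helper above, this variant delegates module choices to a `BackendSpecProvider`. The sketch below mimics that pattern with hypothetical stand-in providers; the method names and returned strings are assumptions, not the real `megatron.core.models.backends` API.

```python
# Hypothetical stand-ins for BackendSpecProvider implementations: each
# backend answers "which module class should fill this slot?" so the spec
# builder itself stays backend-agnostic.
class TEBackend:
    """Sketch of a Transformer Engine backend provider."""
    def grouped_mlp(self) -> str:
        return "TEGroupedMLP"
    def sequential_mlp(self) -> str:
        return "SequentialMLP"

class LocalBackend:
    """Sketch of a plain-PyTorch ("local") backend provider."""
    def grouped_mlp(self) -> str:
        return "GroupedMLP"
    def sequential_mlp(self) -> str:
        return "SequentialMLP"

def get_moe_module_spec_for_backend(backend, num_experts=None, moe_grouped_gemm=False):
    """Sketch: ask the backend for the expert module instead of branching on flags."""
    assert num_experts is not None, "an MoE spec needs num_experts"
    experts = backend.grouped_mlp() if moe_grouped_gemm else backend.sequential_mlp()
    return {"module": "MoELayer", "experts": experts}

spec = get_moe_module_spec_for_backend(TEBackend(), num_experts=8, moe_grouped_gemm=True)
print(spec["experts"])  # TEGroupedMLP
```

The design point: adding a new backend means writing one provider class, not threading another boolean through every spec helper.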

core.models.gpt.moe_module_specs.get_inference_optimized_moe_spec() → megatron.core.transformer.spec_utils.ModuleSpec#

MoE module spec for inference-optimized transformer impl.

Uses InferenceSpecProvider to select inference-optimized modules: InferenceTopKRouter, InferenceGroupedMLP. MoELayer detects inference mode via config.transformer_impl and sets up the inference dispatcher internally.

Called by mamba_layer_specs.py and gpt_layer_specs.py.
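The config-driven detection described above can be sketched as follows. The `Config` class, the `"inference_optimized"` value of `transformer_impl`, and the dict-shaped specs are all assumptions made for illustration; only the module names `InferenceTopKRouter` and `InferenceGroupedMLP` come from the description.

```python
from dataclasses import dataclass

# Hypothetical minimal config; the real TransformerConfig carries many more fields.
@dataclass
class Config:
    transformer_impl: str = "transformer_engine"

def get_inference_optimized_moe_spec():
    # Inference-optimized modules named in the docstring above.
    return {"router": "InferenceTopKRouter", "experts": "InferenceGroupedMLP"}

def get_standard_moe_spec():
    # Placeholder standard spec for contrast (illustrative names).
    return {"router": "TopKRouter", "experts": "TEGroupedMLP"}

def select_moe_spec(config: Config):
    """Sketch of MoELayer-style dispatch on config.transformer_impl.

    The "inference_optimized" sentinel value is an assumption of this sketch.
    """
    if config.transformer_impl == "inference_optimized":
        return get_inference_optimized_moe_spec()
    return get_standard_moe_spec()

print(select_moe_spec(Config("inference_optimized"))["router"])  # InferenceTopKRouter
```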