core.models.gpt.moe_module_specs#
Module Contents#
Functions#
| get_moe_module_spec | Helper function to get module spec for MoE. |
| get_moe_module_spec_for_backend | Helper function to get module spec for MoE. |
| get_inference_optimized_moe_spec | MoE module spec for inference-optimized transformer impl. |
API#
- core.models.gpt.moe_module_specs.get_moe_module_spec(
- use_te: Optional[bool] = True,
- num_experts: Optional[int] = None,
- moe_grouped_gemm: Optional[bool] = False,
- moe_use_legacy_grouped_gemm: Optional[bool] = False,
- ) → megatron.core.transformer.spec_utils.ModuleSpec#
Helper function to get module spec for MoE.
Called by mamba_layer_specs.py for standard (non-inference) MoE specs. The GPT layer specs call get_moe_module_spec_for_backend directly.
- Parameters:
use_te – Whether to use Transformer Engine.
num_experts – Number of experts.
moe_grouped_gemm – Whether to use grouped GEMM.
moe_use_legacy_grouped_gemm – Whether to use legacy grouped GEMM.
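A minimal sketch of calling this helper, assuming the fully qualified import path is megatron.core.models.gpt.moe_module_specs (it may differ between releases); the argument values shown are illustrative.

```python
from megatron.core.models.gpt.moe_module_specs import get_moe_module_spec

# Build a standard (non-inference) MoE spec for 8 experts, using
# Transformer Engine modules with grouped GEMM enabled.
moe_spec = get_moe_module_spec(
    use_te=True,
    num_experts=8,
    moe_grouped_gemm=True,
    moe_use_legacy_grouped_gemm=False,
)
# The returned ModuleSpec can then be plugged into a transformer layer spec.
```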
- core.models.gpt.moe_module_specs.get_moe_module_spec_for_backend(
- backend: megatron.core.models.backends.BackendSpecProvider,
- num_experts: Optional[int] = None,
- moe_grouped_gemm: Optional[bool] = False,
- use_te_activation_func: bool = False,
- ) → megatron.core.transformer.spec_utils.ModuleSpec#
Helper function to get module spec for MoE, built from the given backend spec provider.
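A hedged sketch of the backend-parameterized variant. TESpecProvider is assumed here to be a concrete BackendSpecProvider in megatron.core.models.backends; substitute whichever provider your Megatron-Core release actually ships.

```python
from megatron.core.models.backends import TESpecProvider  # assumed provider name
from megatron.core.models.gpt.moe_module_specs import get_moe_module_spec_for_backend

# Select Transformer Engine-backed modules explicitly via the backend provider,
# rather than through the use_te flag of get_moe_module_spec.
backend = TESpecProvider()
moe_spec = get_moe_module_spec_for_backend(
    backend=backend,
    num_experts=8,
    moe_grouped_gemm=True,
    use_te_activation_func=False,
)
```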
- core.models.gpt.moe_module_specs.get_inference_optimized_moe_spec() → megatron.core.transformer.spec_utils.ModuleSpec#
MoE module spec for inference-optimized transformer impl.
Uses InferenceSpecProvider to select inference-optimized modules: InferenceTopKRouter, InferenceGroupedMLP. MoELayer detects inference mode via config.transformer_impl and sets up the inference dispatcher internally.
Called by mamba_layer_specs.py and gpt_layer_specs.py.
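A minimal sketch of selecting the inference-optimized spec. Per the description above, MoELayer detects inference mode through config.transformer_impl; the field value shown in the comment is an assumption about your config, not part of this function's API.

```python
from megatron.core.models.gpt.moe_module_specs import get_inference_optimized_moe_spec

# Returns a ModuleSpec that wires InferenceTopKRouter and InferenceGroupedMLP
# into the MoE layer. The inference dispatcher itself is set up inside
# MoELayer based on config.transformer_impl (e.g. an inference-optimized
# value; exact setting depends on your TransformerConfig).
inference_moe_spec = get_inference_optimized_moe_spec()
```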