core.models.gpt.gpt_layer_specs
Module Contents
Functions
- get_gpt_layer_with_inference_spec – Use this spec to use inference-optimized linear layers.
- get_gpt_layer_with_transformer_engine_spec – Use this spec to use lower-level Transformer Engine modules (required for fp8 training).
- get_gpt_layer_local_spec – Use this spec for an implementation using only modules in Megatron-Core.
- get_mlp_module_spec – Helper function to get the module spec for MLP/MoE.
- get_mlp_module_spec_for_backend – Helper function to get the module spec for MLP/MoE.
- get_gpt_decoder_block_spec – GPT block spec.
- get_gpt_mtp_block_spec – GPT Multi-Token Prediction (MTP) block spec.
- get_gpt_mtp_block_spec_for_backend – GPT Multi-Token Prediction (MTP) block spec.
API
- core.models.gpt.gpt_layer_specs.get_gpt_layer_with_inference_spec(
- qk_layernorm: Optional[bool] = False,
- multi_latent_attention: Optional[bool] = False,
- qk_l2_norm: Optional[bool] = False,
- )
Use this spec to use inference-optimized linear layers.
- Parameters:
qk_layernorm (bool, optional) – Whether to apply layernorm to queries/keys. Defaults to False.
multi_latent_attention (bool, optional) – Whether to use multi-latent attention (MLA). Defaults to False.
qk_l2_norm (bool, optional) – Whether to apply L2 normalization to queries/keys. Defaults to False.
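A minimal usage sketch, assuming the module is importable as megatron.core.models.gpt.gpt_layer_specs (the argument values shown are illustrative):

```python
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_with_inference_spec

# Transformer-layer spec with inference-optimized linear layers;
# enable QK layernorm, keep the remaining options at their defaults.
layer_spec = get_gpt_layer_with_inference_spec(qk_layernorm=True)
```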
- core.models.gpt.gpt_layer_specs.get_gpt_layer_with_transformer_engine_spec(
- num_experts: Optional[int] = None,
- moe_grouped_gemm: Optional[bool] = False,
- qk_layernorm: Optional[bool] = False,
- multi_latent_attention: Optional[bool] = False,
- fp8: Optional[str] = None,
- moe_use_legacy_grouped_gemm: Optional[bool] = False,
- qk_l2_norm: Optional[bool] = False,
- use_te_op_fuser: Optional[bool] = False,
- use_kitchen: bool = False,
- use_te_activation_func: bool = False,
- use_kitchen_attention: bool = False,
- kitchen_attention_backend: str = 'sdpa',
- )
Use this spec to use lower-level Transformer Engine modules (required for fp8 training).
- Parameters:
num_experts (int, optional) – Number of experts. Defaults to None.
moe_grouped_gemm (bool, optional) – Whether to use grouped GEMM for the MoE expert MLPs. Defaults to False.
qk_layernorm (bool, optional) – Whether to apply layernorm to queries/keys. Defaults to False.
fp8 (str, optional) – Deprecated. Retained for temporary NeMo compatibility.
moe_use_legacy_grouped_gemm (bool, optional) – Force use of the legacy GroupedMLP. Defaults to False.
qk_l2_norm (bool, optional) – Whether to apply L2 normalization to queries/keys. Defaults to False.
use_te_op_fuser (bool, optional) – Whether to use Transformer Engine’s operation-based API, which may enable certain operation fusions. Defaults to False.
- Returns:
Module specification with TE modules.
- Return type:
ModuleSpec
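A minimal sketch of how this spec is typically consumed, assuming Transformer Engine is installed and Megatron's model-parallel state has been initialized (the config and GPTModel keyword values are illustrative):

```python
from megatron.core.models.gpt.gpt_model import GPTModel
from megatron.core.models.gpt.gpt_layer_specs import (
    get_gpt_layer_with_transformer_engine_spec,
)
from megatron.core.transformer.transformer_config import TransformerConfig

# Small illustrative config; real values come from training arguments.
config = TransformerConfig(num_layers=2, hidden_size=128, num_attention_heads=8)

# Layer spec built from lower-level Transformer Engine modules.
layer_spec = get_gpt_layer_with_transformer_engine_spec(qk_layernorm=True)

model = GPTModel(
    config=config,
    transformer_layer_spec=layer_spec,
    vocab_size=32000,
    max_sequence_length=2048,
)
```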
- core.models.gpt.gpt_layer_specs.get_gpt_layer_local_spec(
- num_experts: Optional[int] = None,
- moe_grouped_gemm: Optional[bool] = False,
- qk_layernorm: Optional[bool] = False,
- multi_latent_attention: Optional[bool] = False,
- fp8: Optional[str] = None,
- moe_use_legacy_grouped_gemm: Optional[bool] = False,
- normalization: Optional[str] = None,
- qk_l2_norm: Optional[bool] = False,
- use_kitchen: bool = False,
- use_kitchen_attention: bool = False,
- kitchen_attention_backend: str = 'sdpa',
- )
Use this spec for an implementation using only modules in Megatron-Core.
- Parameters:
num_experts (int, optional) – Number of experts. Defaults to None.
moe_grouped_gemm (bool, optional) – Whether to use grouped GEMM for the MoE expert MLPs. Defaults to False.
qk_layernorm (bool, optional) – Whether to apply layernorm to queries/keys. Defaults to False.
fp8 (str, optional) – Deprecated. Retained for temporary NeMo compatibility.
moe_use_legacy_grouped_gemm (bool, optional) – Force use of the legacy GroupedMLP. Defaults to False.
qk_l2_norm (bool, optional) – Whether to apply L2 normalization to queries/keys. Defaults to False.
- Returns:
Module specification with Megatron-Core modules.
- Return type:
ModuleSpec
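A sketch contrasting a dense spec with an MoE spec built from this function (argument values are illustrative):

```python
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_local_spec

# Dense layer spec built only from Megatron-Core modules (no Transformer Engine).
dense_spec = get_gpt_layer_local_spec(qk_layernorm=True)

# MoE layer spec with 8 experts using grouped GEMM for the expert MLPs.
moe_spec = get_gpt_layer_local_spec(num_experts=8, moe_grouped_gemm=True)
```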
- core.models.gpt.gpt_layer_specs._get_mlp_module_spec(
- use_te: Optional[bool] = True,
- num_experts: Optional[int] = None,
- moe_grouped_gemm: Optional[bool] = False,
- fp8: Optional[str] = None,
- moe_use_legacy_grouped_gemm: Optional[bool] = False,
- )
- core.models.gpt.gpt_layer_specs.get_mlp_module_spec(
- use_te: Optional[bool] = True,
- num_experts: Optional[int] = None,
- moe_grouped_gemm: Optional[bool] = False,
- fp8: Optional[str] = None,
- moe_use_legacy_grouped_gemm: Optional[bool] = False,
- use_te_op_fuser: Optional[bool] = False,
- )
Helper function to get the module spec for MLP/MoE.
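A sketch of requesting dense and MoE MLP specs (argument values are illustrative):

```python
from megatron.core.models.gpt.gpt_layer_specs import get_mlp_module_spec

# Dense MLP spec using Transformer Engine linear layers.
dense_mlp = get_mlp_module_spec(use_te=True)

# MoE spec with 8 experts; grouped GEMM batches the expert GEMMs together.
moe_mlp = get_mlp_module_spec(use_te=True, num_experts=8, moe_grouped_gemm=True)
```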
- core.models.gpt.gpt_layer_specs.get_mlp_module_spec_for_backend(
- backend: megatron.core.models.backends.BackendSpecProvider,
- num_experts: Optional[int] = None,
- moe_grouped_gemm: Optional[bool] = False,
- moe_use_legacy_grouped_gemm: Optional[bool] = False,
- use_te_op_fuser: Optional[bool] = False,
- use_te_activation_func: bool = False,
- )
Helper function to get the module spec for MLP/MoE.
- core.models.gpt.gpt_layer_specs.get_gpt_decoder_block_spec(
- config: megatron.core.transformer.transformer_config.TransformerConfig,
- use_transformer_engine: bool,
- normalization: Optional[str] = None,
- qk_l2_norm: Optional[bool] = False,
- vp_stage: Optional[int] = None,
- pp_rank: Optional[int] = None,
- )
GPT block spec.
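A sketch of building a per-layer spec for the whole decoder block from a transformer config, useful when dense and MoE layers are mixed across the stack (config values are illustrative):

```python
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_decoder_block_spec
from megatron.core.transformer.transformer_config import TransformerConfig

# Illustrative config; real values come from training arguments.
config = TransformerConfig(num_layers=4, hidden_size=256, num_attention_heads=8)

# Block spec covering all decoder layers, using the local (non-TE) backend.
block_spec = get_gpt_decoder_block_spec(config, use_transformer_engine=False)
```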
- core.models.gpt.gpt_layer_specs.get_gpt_mtp_block_spec(
- config: megatron.core.transformer.transformer_config.TransformerConfig,
- spec: Union[megatron.core.transformer.transformer_block.TransformerBlockSubmodules, megatron.core.transformer.spec_utils.ModuleSpec],
- use_transformer_engine: bool,
- vp_stage: Optional[int] = None,
- pp_rank: Optional[int] = None,
- )
GPT Multi-Token Prediction (MTP) block spec.
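A sketch of attaching an MTP block spec on top of an existing decoder layer spec; it assumes the config exposes an mtp_num_layers field that controls the number of MTP layers (all values are illustrative):

```python
from megatron.core.models.gpt.gpt_layer_specs import (
    get_gpt_layer_with_transformer_engine_spec,
    get_gpt_mtp_block_spec,
)
from megatron.core.transformer.transformer_config import TransformerConfig

# Illustrative config; mtp_num_layers enables the MTP block (assumed field).
config = TransformerConfig(
    num_layers=4, hidden_size=256, num_attention_heads=8, mtp_num_layers=1
)

# The MTP block reuses the decoder layer spec passed via `spec`.
layer_spec = get_gpt_layer_with_transformer_engine_spec()
mtp_spec = get_gpt_mtp_block_spec(config, spec=layer_spec, use_transformer_engine=True)
```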
- core.models.gpt.gpt_layer_specs.get_gpt_mtp_block_spec_for_backend(
- config: megatron.core.transformer.transformer_config.TransformerConfig,
- spec: Union[megatron.core.transformer.transformer_block.TransformerBlockSubmodules, megatron.core.transformer.spec_utils.ModuleSpec],
- backend: megatron.core.models.backends.BackendSpecProvider,
- vp_stage: Optional[int] = None,
- pp_rank: Optional[int] = None,
- )
GPT Multi-Token Prediction (MTP) block spec.