core.models.gpt.gpt_layer_specs#

Module Contents#

Functions#

  • get_gpt_layer_with_inference_spec – Use this spec to use inference-optimized linear layers.

  • get_gpt_layer_with_transformer_engine_spec – Use this spec to use lower-level Transformer Engine modules (required for fp8 training).

  • get_gpt_layer_local_spec – Use this spec for an implementation using only modules in Megatron-Core.

  • _get_mlp_module_spec

  • get_mlp_module_spec – Helper function to get the module spec for an MLP/MoE layer.

  • get_mlp_module_spec_for_backend – Helper function to get the module spec for an MLP/MoE layer.

  • get_gpt_decoder_block_spec – GPT decoder block spec.

  • get_gpt_mtp_block_spec – GPT Multi-Token Prediction (MTP) block spec.

  • get_gpt_mtp_block_spec_for_backend – GPT Multi-Token Prediction (MTP) block spec.

API#

core.models.gpt.gpt_layer_specs.get_gpt_layer_with_inference_spec(
qk_layernorm: Optional[bool] = False,
multi_latent_attention: Optional[bool] = False,
qk_l2_norm: Optional[bool] = False,
) → megatron.core.transformer.spec_utils.ModuleSpec#

Use this spec to use inference-optimized linear layers.

Parameters:
  • qk_layernorm (bool, optional) – Whether to use layer normalization for queries/keys. Defaults to False.

  • multi_latent_attention (bool, optional) – Whether to use multi-latent attention (MLA). Defaults to False.

  • qk_l2_norm (bool, optional) – Whether to apply L2 normalization to queries/keys. Defaults to False.
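
A minimal usage sketch follows, assuming the module is importable as megatron.core.models.gpt.gpt_layer_specs (the megatron. prefix is inferred from the return-type annotations on this page); the argument values are illustrative, and wiring the resulting ModuleSpec into a model constructor is outside the scope of this page.

# Hedged sketch: build the inference-optimized layer spec with QK layernorm enabled.
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_with_inference_spec

layer_spec = get_gpt_layer_with_inference_spec(
    qk_layernorm=True,             # layer-normalize queries/keys
    multi_latent_attention=False,  # keep standard attention rather than MLA
)
# layer_spec is a megatron.core.transformer.spec_utils.ModuleSpec instance.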

core.models.gpt.gpt_layer_specs.get_gpt_layer_with_transformer_engine_spec(
num_experts: Optional[int] = None,
moe_grouped_gemm: Optional[bool] = False,
qk_layernorm: Optional[bool] = False,
multi_latent_attention: Optional[bool] = False,
fp8: Optional[str] = None,
moe_use_legacy_grouped_gemm: Optional[bool] = False,
qk_l2_norm: Optional[bool] = False,
use_te_op_fuser: Optional[bool] = False,
use_kitchen: bool = False,
use_te_activation_func: bool = False,
use_kitchen_attention: bool = False,
kitchen_attention_backend: str = 'sdpa',
) → megatron.core.transformer.spec_utils.ModuleSpec#

Use this spec to use lower-level Transformer Engine modules (required for fp8 training).

Parameters:
  • num_experts (int, optional) – Number of experts. Defaults to None.

  • moe_grouped_gemm (bool, optional) – Whether to use Grouped GEMM. Defaults to False.

  • qk_layernorm (bool, optional) – Whether to use layer normalization for queries/keys. Defaults to False.

  • multi_latent_attention (bool, optional) – Whether to use multi-latent attention (MLA). Defaults to False.

  • fp8 (str, optional) – Deprecated; retained temporarily for NeMo compatibility.

  • moe_use_legacy_grouped_gemm (bool, optional) – Force the use of the legacy GroupedMLP. Defaults to False.

  • qk_l2_norm (bool, optional) – Whether to apply L2 normalization to queries/keys. Defaults to False.

  • use_te_op_fuser (bool, optional) – Use Transformer Engine’s operation-based API, which may enable certain operation fusions. Defaults to False.

Returns:

Module specification with TE modules

Return type:

ModuleSpec
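
As a hedged illustration, the call below builds a Transformer Engine layer spec for a mixture-of-experts layer; the expert count and flags are example values, not recommendations from this page.

# Hedged sketch: TE-backed layer spec for an MoE layer with Grouped GEMM.
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_with_transformer_engine_spec

moe_layer_spec = get_gpt_layer_with_transformer_engine_spec(
    num_experts=8,          # illustrative expert count
    moe_grouped_gemm=True,  # run the experts with Grouped GEMM
    qk_layernorm=True,
)
# The returned ModuleSpec uses lower-level Transformer Engine modules,
# which this page notes is required for fp8 training.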

core.models.gpt.gpt_layer_specs.get_gpt_layer_local_spec(
num_experts: Optional[int] = None,
moe_grouped_gemm: Optional[bool] = False,
qk_layernorm: Optional[bool] = False,
multi_latent_attention: Optional[bool] = False,
fp8: Optional[str] = None,
moe_use_legacy_grouped_gemm: Optional[bool] = False,
normalization: Optional[str] = None,
qk_l2_norm: Optional[bool] = False,
use_kitchen: bool = False,
use_kitchen_attention: bool = False,
kitchen_attention_backend: str = 'sdpa',
) → megatron.core.transformer.spec_utils.ModuleSpec#

Use this spec for an implementation using only modules in Megatron-Core.

Parameters:
  • num_experts (int, optional) – Number of experts. Defaults to None.

  • moe_grouped_gemm (bool, optional) – Whether to use Grouped GEMM. Defaults to False.

  • qk_layernorm (bool, optional) – Whether to use layer normalization for queries/keys. Defaults to False.

  • multi_latent_attention (bool, optional) – Whether to use multi-latent attention (MLA). Defaults to False.

  • fp8 (str, optional) – Deprecated; retained temporarily for NeMo compatibility.

  • moe_use_legacy_grouped_gemm (bool, optional) – Force the use of the legacy GroupedMLP. Defaults to False.

  • qk_l2_norm (bool, optional) – Whether to apply L2 normalization to queries/keys. Defaults to False.

Returns:

Module specification with Megatron-Core modules

Return type:

ModuleSpec
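
A comparable sketch for the Megatron-Core-only spec, e.g. when the lower-level Transformer Engine modules are not wanted (that motivation is an assumption; the page only states that this spec uses Megatron-Core modules).

# Hedged sketch: layer spec built purely from Megatron-Core modules.
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_local_spec

local_spec = get_gpt_layer_local_spec(
    num_experts=None,  # dense MLP rather than MoE
    qk_layernorm=True,
)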

core.models.gpt.gpt_layer_specs._get_mlp_module_spec(
use_te: Optional[bool] = True,
num_experts: Optional[int] = None,
moe_grouped_gemm: Optional[bool] = False,
fp8: Optional[str] = None,
moe_use_legacy_grouped_gemm: Optional[bool] = False,
)#

core.models.gpt.gpt_layer_specs.get_mlp_module_spec(
use_te: Optional[bool] = True,
num_experts: Optional[int] = None,
moe_grouped_gemm: Optional[bool] = False,
fp8: Optional[str] = None,
moe_use_legacy_grouped_gemm: Optional[bool] = False,
use_te_op_fuser: Optional[bool] = False,
) → megatron.core.transformer.spec_utils.ModuleSpec#

Helper function to get the module spec for an MLP/MoE layer.
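
A short sketch of the helper; passing num_experts=None versus an integer is what switches between a dense MLP spec and an MoE spec (that reading is inferred from the parameter names rather than stated explicitly here).

# Hedged sketch: dense vs. MoE MLP module specs.
from megatron.core.models.gpt.gpt_layer_specs import get_mlp_module_spec

dense_mlp_spec = get_mlp_module_spec(use_te=True, num_experts=None)
moe_mlp_spec = get_mlp_module_spec(use_te=True, num_experts=8, moe_grouped_gemm=True)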

core.models.gpt.gpt_layer_specs.get_mlp_module_spec_for_backend(
backend: megatron.core.models.backends.BackendSpecProvider,
num_experts: Optional[int] = None,
moe_grouped_gemm: Optional[bool] = False,
moe_use_legacy_grouped_gemm: Optional[bool] = False,
use_te_op_fuser: Optional[bool] = False,
use_te_activation_func: bool = False,
) → megatron.core.transformer.spec_utils.ModuleSpec#

Helper function to get the module spec for an MLP/MoE layer.
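
The backend-parameterized variant replaces the use_te flag with a BackendSpecProvider. The concrete provider class named below (LocalSpecProvider) is an assumption based on the parameter's type annotation; consult megatron.core.models.backends for the providers actually available.

# Hedged sketch: MLP spec via an explicit backend provider.
# LocalSpecProvider is assumed to exist in megatron.core.models.backends.
from megatron.core.models.backends import LocalSpecProvider
from megatron.core.models.gpt.gpt_layer_specs import get_mlp_module_spec_for_backend

mlp_spec = get_mlp_module_spec_for_backend(
    backend=LocalSpecProvider(),
    num_experts=None,  # dense MLP
)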

core.models.gpt.gpt_layer_specs.get_gpt_decoder_block_spec(
config: megatron.core.transformer.transformer_config.TransformerConfig,
use_transformer_engine: bool,
normalization: Optional[str] = None,
qk_l2_norm: Optional[bool] = False,
vp_stage: Optional[int] = None,
pp_rank: Optional[int] = None,
) → megatron.core.transformer.transformer_block.TransformerBlockSubmodules#

GPT decoder block spec: builds the TransformerBlockSubmodules for the GPT decoder from the given transformer config.
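
A hedged end-to-end sketch: construct a TransformerConfig and derive the decoder block spec from it. The config fields used (num_layers, hidden_size, num_attention_heads) come from TransformerConfig itself, and the sizes are illustrative.

# Hedged sketch: full GPT decoder block spec from a transformer config.
from megatron.core.transformer.transformer_config import TransformerConfig
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_decoder_block_spec

config = TransformerConfig(
    num_layers=12,  # illustrative sizes, not recommendations
    hidden_size=768,
    num_attention_heads=12,
)
block_spec = get_gpt_decoder_block_spec(config, use_transformer_engine=True)
# block_spec is a TransformerBlockSubmodules describing the decoder layers.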

core.models.gpt.gpt_layer_specs.get_gpt_mtp_block_spec(
config: megatron.core.transformer.transformer_config.TransformerConfig,
spec: Union[megatron.core.transformer.transformer_block.TransformerBlockSubmodules, megatron.core.transformer.spec_utils.ModuleSpec],
use_transformer_engine: bool,
vp_stage: Optional[int] = None,
pp_rank: Optional[int] = None,
) → megatron.core.transformer.multi_token_prediction.MultiTokenPredictionBlockSubmodules#

GPT Multi-Token Prediction (MTP) block spec.
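
The MTP block spec is built from the same config plus an existing layer spec. The mtp_num_layers field below is an assumption about how multi-token prediction is enabled in TransformerConfig; it is not documented on this page.

# Hedged sketch: MTP block spec derived from a TE layer spec.
from megatron.core.transformer.transformer_config import TransformerConfig
from megatron.core.models.gpt.gpt_layer_specs import (
    get_gpt_layer_with_transformer_engine_spec,
    get_gpt_mtp_block_spec,
)

config = TransformerConfig(
    num_layers=12,
    hidden_size=768,
    num_attention_heads=12,
    mtp_num_layers=1,  # assumed field enabling one MTP layer
)
layer_spec = get_gpt_layer_with_transformer_engine_spec()
mtp_spec = get_gpt_mtp_block_spec(config, layer_spec, use_transformer_engine=True)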

core.models.gpt.gpt_layer_specs.get_gpt_mtp_block_spec_for_backend(
config: megatron.core.transformer.transformer_config.TransformerConfig,
spec: Union[megatron.core.transformer.transformer_block.TransformerBlockSubmodules, megatron.core.transformer.spec_utils.ModuleSpec],
backend: megatron.core.models.backends.BackendSpecProvider,
vp_stage: Optional[int] = None,
pp_rank: Optional[int] = None,
) → megatron.core.transformer.multi_token_prediction.MultiTokenPredictionBlockSubmodules#

GPT Multi-Token Prediction (MTP) block spec.