core.post_training.modelopt.gpt.model_specs#

Module Contents#

Functions#

get_gpt_modelopt_spec

Mix the native spec with TENorm.

API#

core.post_training.modelopt.gpt.model_specs.get_gpt_modelopt_spec(
config: megatron.core.transformer.transformer_config.TransformerConfig,
local_core_attention: bool = False,
remap_te_layernorm: bool = False,
real_quant_cfg: str = 'None',
qk_l2_norm: bool = False,
use_arbitrary_attention_mask: bool = False,
)#

Mix the native spec with TENorm.

This is essentially the native local spec, except that the layernorm implementation uses TENorm from Transformer Engine. The switch is needed because Apex's FusedLayerNorm no longer supports the RMSNorm variant required by Llama.

Parameters:
  • config – the model's transformer config

  • local_core_attention – whether to use the local DotProductAttention instead of TEDotProductAttention

  • remap_te_layernorm – whether to perform sharded state_dict prefix remapping on the layernorm

  • real_quant_cfg – Model Optimizer real quantization config

  • qk_l2_norm – whether to use the Llama4-style L2 norm for Q and K

  • use_arbitrary_attention_mask – whether to use an arbitrary attention mask instead of a causal mask
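A minimal usage sketch follows. It assumes the module is importable as megatron.core.post_training.modelopt.gpt.model_specs, that Megatron's parallel state has already been initialized, and the config values shown are illustrative only; the returned spec is passed to GPTModel as its transformer layer spec.

```python
from megatron.core.models.gpt import GPTModel
from megatron.core.post_training.modelopt.gpt.model_specs import get_gpt_modelopt_spec
from megatron.core.transformer.transformer_config import TransformerConfig

# Illustrative config; real models use much larger sizes.
config = TransformerConfig(
    num_layers=2,
    hidden_size=128,
    num_attention_heads=4,
    normalization="RMSNorm",  # TENorm supports RMSNorm, unlike Apex FusedLayerNorm
)

# Build the ModelOpt layer spec; remap_te_layernorm keeps sharded
# state_dict layernorm prefixes consistent with the TE-based layout.
transformer_layer_spec = get_gpt_modelopt_spec(
    config,
    remap_te_layernorm=True,
)

# The spec plugs into GPTModel like any other layer spec
# (assumes model-parallel state has been initialized beforehand).
model = GPTModel(
    config=config,
    transformer_layer_spec=transformer_layer_spec,
    vocab_size=32000,
    max_sequence_length=2048,
)
```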