bridge.models.gpt_provider#

Module Contents#

Classes#

GPTModelProvider

Configuration and provider for Megatron Core GPT models.

GPTProvider175B

Configuration for a 175B parameter GPT model.

Functions#

transformer_engine_layer_spec

Create a Transformer Engine layer specification based on the provided config.

transformer_engine_full_layer_spec

Create a full Transformer Engine layer specification with autocast support.

local_layer_spec

Create a local layer specification without Transformer Engine.

modelopt_transformer_layer_spec

Layer specification for quantization with ModelOpt.

default_layer_spec

Determine the most appropriate layer specification based on availability.

mtp_block_spec

Return the MTP block spec if the model has MTP layers.

_patch_yarn_concentration_factor

Patch MCore _yarn_get_concentration_factor_from_config for None handling.

_patch_te_grouped_linear_single_grouped_weight

Guard for main/dev branch submodule compat: single_grouped_weight/bias kwargs.

Data#

API#

bridge.models.gpt_provider.logger#

‘getLogger(…)’

bridge.models.gpt_provider.transformer_engine_layer_spec(
config: GPTModelProvider,
) → megatron.core.transformer.ModuleSpec#

Create a Transformer Engine layer specification based on the provided config.
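In practice this helper is rarely called by hand: the transformer_layer_spec field documented below accepts either a ModuleSpec or a callable of exactly this shape, so the helper can be handed over uncalled. A minimal sketch (the layer sizes are illustrative, and constructing the provider normally assumes an initialized Megatron environment):

```python
from bridge.models.gpt_provider import GPTModelProvider, transformer_engine_layer_spec

# transformer_layer_spec accepts a Callable[[GPTModelProvider], ModuleSpec],
# so the helper is passed uncalled and resolved against this config later.
provider = GPTModelProvider(
    num_layers=12,            # illustrative sizes (inherited TransformerConfig fields)
    hidden_size=768,
    num_attention_heads=12,
    transformer_layer_spec=transformer_engine_layer_spec,
)
```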

bridge.models.gpt_provider.transformer_engine_full_layer_spec(
config: GPTModelProvider,
) → megatron.core.transformer.ModuleSpec#

Create a full Transformer Engine layer specification with autocast support.

Parameters:

config – GPT configuration object

Returns:

Module specification for full TE layers

Return type:

ModuleSpec

bridge.models.gpt_provider.local_layer_spec(
config: GPTModelProvider,
) → megatron.core.transformer.ModuleSpec#

Create a local layer specification without Transformer Engine.

Parameters:

config – GPT configuration object

Returns:

Module specification for local implementation layers

Return type:

ModuleSpec

bridge.models.gpt_provider.modelopt_transformer_layer_spec(
config: GPTModelProvider,
) → megatron.core.transformer.ModuleSpec#

Layer specification for quantization with ModelOpt.

bridge.models.gpt_provider.default_layer_spec(
config: GPTModelProvider,
) → megatron.core.transformer.ModuleSpec#

Determine the most appropriate layer specification based on availability.
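The docstring does not spell out the selection order. A plausible reading, sketched under the assumption that "availability" means Transformer Engine importability (pick_layer_spec is a hypothetical stand-in, not this module's function):

```python
from bridge.models.gpt_provider import (
    GPTModelProvider,
    local_layer_spec,
    transformer_engine_layer_spec,
)

try:
    import transformer_engine  # noqa: F401
    HAVE_TE = True
except ImportError:
    HAVE_TE = False

def pick_layer_spec(config: GPTModelProvider):
    """Hypothetical stand-in for default_layer_spec's documented intent."""
    if HAVE_TE:
        # Honors the documented use_transformer_engine_full_layer_spec toggle.
        if config.use_transformer_engine_full_layer_spec:
            from bridge.models.gpt_provider import transformer_engine_full_layer_spec
            return transformer_engine_full_layer_spec(config)
        return transformer_engine_layer_spec(config)
    # Fall back to the pure-PyTorch implementation when TE is unavailable.
    return local_layer_spec(config)
```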

class bridge.models.gpt_provider.GPTModelProvider#

Bases: megatron.bridge.models.transformer_config.TransformerConfig, megatron.bridge.models.model_provider.ModelProviderMixin[megatron.core.models.gpt.GPTModel]

Configuration and provider for Megatron Core GPT models.

This class extends TransformerConfig with GPT-specific parameters and provides a method to instantiate configured GPT models.

fp16_lm_cross_entropy: bool#

False

parallel_output: bool#

True

share_embeddings_and_output_weights: bool#

True

make_vocab_size_divisible_by: int#

128

position_embedding_type: Literal['learned_absolute', 'rope', 'yarn']#

‘learned_absolute’

rotary_base: int#

10000

rotary_percent: float#

1.0

rope_scaling: bool#

False

rope_scaling_factor: float#

1.0

rotary_scaling_factor: Optional[float]#

None

seq_len_interpolation_factor: Optional[float]#

None

yarn_rotary_scaling_factor: Optional[float]#

None

yarn_original_max_position_embeddings: Optional[int]#

None

yarn_beta_fast: Optional[float]#

None

yarn_beta_slow: Optional[float]#

None

yarn_mscale: Optional[float]#

None

yarn_mscale_all_dim: Optional[float]#

None

yarn_correction_range_round_to_int: Optional[bool]#

None

seq_length: int#

1024

attention_softmax_in_fp32: bool#

False

deallocate_pipeline_outputs: bool#

True

scatter_embedding_sequence_parallel: bool#

True

tp_only_amax_red: bool#

False

tp_comm_overlap_cfg: Optional[Union[str, dict[str, Any]]]#

None

Path to a config file, or an inline dict, used when tp_comm_overlap is enabled.

use_transformer_engine_full_layer_spec: bool#

False

use_transformer_engine_op_fuser: bool#

False

transformer_layer_spec: Union[megatron.core.transformer.ModuleSpec, Callable[[bridge.models.gpt_provider.GPTModelProvider], megatron.core.transformer.ModuleSpec]]#

None

hf_model_id: str | None#

None

Optional HuggingFace model identifier associated with this provider.

vocab_size: Optional[int]#

None

should_pad_vocab: bool#

False

num_moe_experts: Optional[int]#

None

moe_grouped_gemm: bool#

False

qk_layernorm: bool#

False

fp8: Optional[str]#

None

normalization: str#

‘LayerNorm’

mtp_enabled: bool#

False

init_model_with_meta_device: bool#

False

use_te_rng_tracker: bool#

False

virtual_pipeline_model_parallel_size: Optional[int]#

None

account_for_embedding_in_pipeline_split: bool#

False

account_for_loss_in_pipeline_split: bool#

False

masked_softmax_fusion: bool#

True

cross_entropy_loss_fusion: bool#

True

gradient_accumulation_fusion: bool#

‘field(…)’

restore_modelopt_state: bool#

False

use_arbitrary_attention_mask: Optional[bool]#

None

_pg_collection: Optional[megatron.core.process_groups_config.ProcessGroupCollection]#

None

provide(
pre_process=None,
post_process=None,
vp_stage=None,
) → megatron.core.models.gpt.GPTModel#

Configure and instantiate a Megatron Core GPT model based on this configuration.

Parameters:
  • pre_process – Whether to include pre-processing in the model; defaults to True on the first pipeline stage

  • post_process – Whether to include post-processing in the model; defaults to True on the last pipeline stage

  • vp_stage – Virtual pipeline stage

Returns:

Configured Megatron Core GPT model instance

Return type:

MCoreGPTModel
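A minimal end-to-end sketch of the provider pattern (the sizes are illustrative, and calling provide() assumes Megatron's distributed/model-parallel state has already been initialized):

```python
from bridge.models.gpt_provider import GPTModelProvider

provider = GPTModelProvider(
    num_layers=24,                   # inherited TransformerConfig fields
    hidden_size=2048,
    num_attention_heads=16,
    seq_length=4096,                 # overrides the documented default of 1024
    position_embedding_type="rope",  # one of the documented Literal values
    vocab_size=50304,
)
model = provider.provide(pre_process=True, post_process=True)
```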

bridge.models.gpt_provider.mtp_block_spec(
config: bridge.models.gpt_provider.GPTModelProvider,
vp_stage: Optional[int] = None,
) → Optional[megatron.core.transformer.ModuleSpec]#

Return the MTP block spec if the model has MTP layers.

Parameters:
  • config – GPT configuration object

  • vp_stage – Virtual pipeline stage

Returns:

The MTP module specification, or None when the model has no MTP layers

Return type:

Optional[ModuleSpec]
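Given the Optional return type and the mtp_enabled field defaulting to False, a plain GPT config is expected to yield no MTP spec. A small sketch (sizes are illustrative):

```python
from bridge.models.gpt_provider import GPTModelProvider, mtp_block_spec

cfg = GPTModelProvider(num_layers=2, hidden_size=128, num_attention_heads=4)
# mtp_enabled defaults to False, so there are no MTP layers and no block spec.
assert mtp_block_spec(cfg, vp_stage=None) is None
```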

class bridge.models.gpt_provider.GPTProvider175B#

Bases: bridge.models.gpt_provider.GPTModelProvider

Configuration for a 175B parameter GPT model.

Predefined configuration for a massive GPT model with 96 layers, a hidden size of 12288, and 96 attention heads; a construction sketch follows the attribute list below.

seq_length: int#

2048

num_layers: int#

96

hidden_size: int#

12288

ffn_hidden_size: int#

49152

num_attention_heads: int#

96

hidden_dropout: float#

0.0

attention_dropout: float#

0.0

bias_activation_fusion: bool#

True

bias_dropout_add_fusion: bool#

True

use_transformer_engine_full_layer_spec: bool#

True

layernorm_zero_centered_gamma: bool#

True
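A quick construction sketch; the asserts restate defaults documented above, and instantiating the full model additionally requires parallelism settings and hardware well beyond a single device:

```python
from bridge.models.gpt_provider import GPTProvider175B

provider = GPTProvider175B()
assert provider.num_layers == 96
assert provider.hidden_size == 12288
assert provider.ffn_hidden_size == 49152
assert provider.seq_length == 2048
# provider.provide() would build the 175B GPTModel; in practice this is
# combined with tensor/pipeline parallel sizes from TransformerConfig.
```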

bridge.models.gpt_provider._patch_yarn_concentration_factor()#

Patch MCore _yarn_get_concentration_factor_from_config for None handling.

GPTModelProvider defines yarn_rotary_scaling_factor as Optional[float] = None, but MCore uses hasattr(), which returns True for dataclass fields set to None. This crashes non-YARN models. The patch uses getattr plus an is not None check instead.

TODO: Remove once upstream MCore merges the fix.
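A sketch of the patched check's shape (the names follow this docstring; the real MCore function body is richer and may differ):

```python
def _yarn_get_concentration_factor_from_config(config) -> float:
    """Illustrative only: shows the getattr + None check, not MCore's math."""
    # hasattr(config, "yarn_rotary_scaling_factor") is True even when the
    # dataclass field is None, which is what crashed non-YARN models.
    factor = getattr(config, "yarn_rotary_scaling_factor", None)
    if factor is None:
        return 1.0  # assumed neutral value for non-YARN models
    return factor   # stand-in for MCore's actual YARN concentration math
```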

bridge.models.gpt_provider._patch_te_grouped_linear_single_grouped_weight()#

Guard for main/dev branch submodule compat: single_grouped_weight/bias kwargs.

MCore dev (commit 5c544844) passes single_grouped_weight and single_grouped_bias to TE GroupedLinear.__init__ when is_te_min_version("2.14.0"). However, some TE 2.14.0 builds expose only a single single_grouped_parameter kwarg. The guard remaps the kwargs so both APIs work.

TODO: Remove the guard once TE ships the split weight/bias API in a stable release and the CI container is updated.
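A hedged sketch of what such a guard can look like; only the kwarg names come from this docstring, while the wrapper, its name, and the collapse rule are assumptions:

```python
import functools

def _remap_single_grouped_kwargs(te_init):
    """Hypothetical wrapper around TE GroupedLinear.__init__."""
    @functools.wraps(te_init)
    def wrapper(self, *args, **kwargs):
        weight = kwargs.pop("single_grouped_weight", None)
        bias = kwargs.pop("single_grouped_bias", None)
        if weight is not None or bias is not None:
            # Assumed collapse rule: enable the combined kwarg only when both
            # split flags are set; the real guard may merge them differently.
            kwargs["single_grouped_parameter"] = bool(weight) and bool(bias)
        return te_init(self, *args, **kwargs)
    return wrapper
```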