bridge.models.gpt_provider#
Module Contents#
Classes#
- GPTModelProvider: Configuration and provider for Megatron Core GPT models.
- GPTProvider175B: Configuration for a 175B parameter GPT model.
Functions#
- transformer_engine_layer_spec: Create a Transformer Engine layer specification based on the provided config.
- transformer_engine_full_layer_spec: Create a full Transformer Engine layer specification with autocast support.
- local_layer_spec: Create a local layer specification without Transformer Engine.
- modelopt_transformer_layer_spec: Layer specification for quantization with ModelOpt.
- default_layer_spec: Determine the most appropriate layer specification based on availability.
- mtp_block_spec: Pass in the MTP block spec if model has MTP layers.
- _patch_yarn_concentration_factor: Patch MCore _yarn_get_concentration_factor_from_config for None handling.
- _patch_te_grouped_linear_single_grouped_weight: Guard for main/dev branch submodule compat: single_grouped_weight/bias kwargs.
Data#
API#
- bridge.models.gpt_provider.logger#
'getLogger(...)'
- bridge.models.gpt_provider.transformer_engine_layer_spec(
- config: GPTModelProvider,
- )
Create a Transformer Engine layer specification based on the provided config.
- bridge.models.gpt_provider.transformer_engine_full_layer_spec(
- config: GPTModelProvider,
- )
Create a full Transformer Engine layer specification with autocast support.
- Parameters:
config – GPT configuration object
- Returns:
Module specification for full TE layers
- Return type:
ModuleSpec
- bridge.models.gpt_provider.local_layer_spec(
- config: GPTModelProvider,
- )
Create a local layer specification without Transformer Engine.
- Parameters:
config – GPT configuration object
- Returns:
Module specification for local implementation layers
- Return type:
ModuleSpec
- bridge.models.gpt_provider.modelopt_transformer_layer_spec(
- config: GPTModelProvider,
- )
Layer specification for quantization with ModelOpt.
- bridge.models.gpt_provider.default_layer_spec(
- config: GPTModelProvider,
- )
Determine the most appropriate layer specification based on availability.
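The selection logic can be sketched as follows. This is an illustrative sketch only, not the actual implementation: it assumes Transformer Engine availability is probed by checking whether the `transformer_engine` package is importable, and it returns the name of the spec builder rather than a real `ModuleSpec` so the example stays self-contained.

```python
import importlib.util


def pick_layer_spec_name(use_full_te: bool = False) -> str:
    """Sketch of how a default layer spec might be chosen.

    Falls back to the local (non-TE) spec when Transformer Engine
    is not installed; otherwise prefers the full TE spec only when
    explicitly requested (mirroring use_transformer_engine_full_layer_spec).
    """
    te_available = importlib.util.find_spec("transformer_engine") is not None
    if not te_available:
        return "local_layer_spec"
    if use_full_te:
        return "transformer_engine_full_layer_spec"
    return "transformer_engine_layer_spec"


print(pick_layer_spec_name())
```

The real `default_layer_spec` returns a `ModuleSpec` built from the provider config; the sketch only captures the availability-based branching described above.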
- class bridge.models.gpt_provider.GPTModelProvider#
Bases:
megatron.bridge.models.transformer_config.TransformerConfig, megatron.bridge.models.model_provider.ModelProviderMixin[megatron.core.models.gpt.GPTModel]
Configuration and provider for Megatron Core GPT models.
This class extends TransformerConfig with GPT-specific parameters and provides a method to instantiate configured GPT models.
- fp16_lm_cross_entropy: bool#
False
- parallel_output: bool#
True
- make_vocab_size_divisible_by: int#
128
- position_embedding_type: Literal[learned_absolute, rope, yarn]#
'learned_absolute'
- rotary_base: int#
10000
- rotary_percent: float#
1.0
- rope_scaling: bool#
False
- rope_scaling_factor: float#
1.0
- rotary_scaling_factor: Optional[float]#
None
- seq_len_interpolation_factor: Optional[float]#
None
- yarn_rotary_scaling_factor: Optional[float]#
None
- yarn_original_max_position_embeddings: Optional[int]#
None
- yarn_beta_fast: Optional[float]#
None
- yarn_beta_slow: Optional[float]#
None
- yarn_mscale: Optional[float]#
None
- yarn_mscale_all_dim: Optional[float]#
None
- yarn_correction_range_round_to_int: Optional[bool]#
None
- seq_length: int#
1024
- attention_softmax_in_fp32: bool#
False
- deallocate_pipeline_outputs: bool#
True
- scatter_embedding_sequence_parallel: bool#
True
- tp_only_amax_red: bool#
False
- tp_comm_overlap_cfg: Optional[Union[str, dict[str, Any]]]#
None
Path to a config file, or an inline dict, used when tp_comm_overlap is enabled.
- use_transformer_engine_full_layer_spec: bool#
False
- use_transformer_engine_op_fuser: bool#
False
- transformer_layer_spec: Union[megatron.core.transformer.ModuleSpec, Callable[[bridge.models.gpt_provider.GPTModelProvider], megatron.core.transformer.ModuleSpec]]#
None
- hf_model_id: str | None#
None
Optional HuggingFace model identifier associated with this provider.
- vocab_size: Optional[int]#
None
- should_pad_vocab: bool#
False
- num_moe_experts: Optional[int]#
None
- moe_grouped_gemm: bool#
False
- qk_layernorm: bool#
False
- fp8: Optional[str]#
None
- normalization: str#
'LayerNorm'
- mtp_enabled: bool#
False
- init_model_with_meta_device: bool#
False
- use_te_rng_tracker: bool#
False
- virtual_pipeline_model_parallel_size: Optional[int]#
None
- account_for_embedding_in_pipeline_split: bool#
False
- account_for_loss_in_pipeline_split: bool#
False
- masked_softmax_fusion: bool#
True
- cross_entropy_loss_fusion: bool#
True
- gradient_accumulation_fusion: bool#
'field(...)'
- restore_modelopt_state: bool#
False
- use_arbitrary_attention_mask: Optional[bool]#
None
- _pg_collection: Optional[megatron.core.process_groups_config.ProcessGroupCollection]#
None
- provide(
- pre_process=None,
- post_process=None,
- vp_stage=None,
- )
Configure and instantiate a Megatron Core GPT model based on this configuration.
- Parameters:
pre_process – Whether to include pre-processing in the model, defaults to first pipeline stage
post_process – Whether to include post-processing in the model, defaults to last pipeline stage
vp_stage – Virtual pipeline stage
- Returns:
Configured Megatron Core GPT model instance
- Return type:
MCoreGPTModel
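A hedged usage sketch of the provider. It assumes the import path `megatron.bridge.models.gpt_provider` (this page abbreviates it to `bridge.models.gpt_provider`), and the small field values below are illustrative, not recommended settings. `provide()` normally runs inside an initialized Megatron parallel state, so the call is guarded here.

```python
# Usage sketch: requires megatron.bridge; skipped cleanly when it is absent.
try:
    from megatron.bridge.models.gpt_provider import GPTModelProvider
except ImportError:
    GPTModelProvider = None  # megatron.bridge is not installed

if GPTModelProvider is not None:
    # A tiny configuration; field names follow the attribute list above.
    provider = GPTModelProvider(
        num_layers=2,
        hidden_size=128,
        num_attention_heads=4,
        seq_length=512,
        vocab_size=1024,
    )
    try:
        # provide() builds an MCoreGPTModel; pre_process/post_process
        # default to the first/last pipeline stage when left as None.
        model = provider.provide()
    except Exception:
        # Typically raised when Megatron parallel state / torch.distributed
        # has not been initialized in this process.
        model = None
```

The provider is a dataclass, so any TransformerConfig field can be overridden the same way at construction time.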
- bridge.models.gpt_provider.mtp_block_spec(
- config: bridge.models.gpt_provider.GPTModelProvider,
- vp_stage: Optional[int] = None,
- )
Pass in the MTP block spec if model has MTP layers.
- Parameters:
config – GPT configuration object
- Returns:
The MTP module specification
- Return type:
ModuleSpec
- class bridge.models.gpt_provider.GPTProvider175B#
Bases:
bridge.models.gpt_provider.GPTModelProvider
Configuration for a 175B parameter GPT model.
Predefined configuration for a massive GPT model with 96 layers, 12288 hidden size, and 96 attention heads.
- seq_length: int#
2048
- num_layers: int#
96
- hidden_size: int#
12288
- ffn_hidden_size: int#
49152
- num_attention_heads: int#
96
- hidden_dropout: float#
0.0
- attention_dropout: float#
0.0
- bias_activation_fusion: bool#
True
- bias_dropout_add_fusion: bool#
True
- use_transformer_engine_full_layer_spec: bool#
True
- layernorm_zero_centered_gamma: bool#
True
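As a sanity check, the dimensions listed above roughly reproduce the 175B parameter count. Biases, layernorms, and positional embeddings are omitted, and the 50257-token GPT BPE vocabulary is an assumption, since the vocab size is not listed on this page.

```python
# Rough parameter count from the GPTProvider175B dimensions above.
num_layers, hidden, vocab = 96, 12288, 50257  # vocab is an assumption
per_layer = 12 * hidden ** 2  # attention (4*h^2) + 4x-expanded MLP (8*h^2)
total = num_layers * per_layer + vocab * hidden  # + token embedding table
print(f"{total / 1e9:.1f}B")  # 174.6B
```

Note that `ffn_hidden_size = 49152 = 4 * 12288`, which is where the `8*h^2` MLP term comes from.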
- bridge.models.gpt_provider._patch_yarn_concentration_factor()#
Patch MCore _yarn_get_concentration_factor_from_config for None handling.
GPTModelProvider defines yarn_rotary_scaling_factor as Optional[float] = None, but MCore uses hasattr(), which returns True for dataclass fields set to None. This causes a crash for non-YARN models. The patch uses getattr + is not None instead.
TODO: Remove once upstream MCore merges the fix.
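The pitfall is easy to reproduce in isolation. The `Cfg` dataclass below is a self-contained stand-in for the provider, not the real class:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cfg:
    # Mirrors GPTModelProvider's Optional field defaulting to None.
    yarn_rotary_scaling_factor: Optional[float] = None


cfg = Cfg()
# hasattr() is True even though the field is None, so a hasattr-based
# check wrongly takes the YARN path for non-YARN models.
print(hasattr(cfg, "yarn_rotary_scaling_factor"))  # True
# The patched check treats a None field the same as a missing one.
print(getattr(cfg, "yarn_rotary_scaling_factor", None) is not None)  # False
```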
- bridge.models.gpt_provider._patch_te_grouped_linear_single_grouped_weight()#
Guard for main/dev branch submodule compat: single_grouped_weight/bias kwargs.
MCore dev (commit 5c544844) passes single_grouped_weight and single_grouped_bias to TEGroupedLinear.__init__ when is_te_min_version("2.14.0"). However, some TE 2.14.0 builds only expose a single single_grouped_parameter kwarg. Remap so both APIs work.
TODO: remove guard once TE ships the split weight/bias API in a stable release and the CI container is updated.
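The remap pattern can be sketched as follows. `LegacyGroupedLinear` and `remap_grouped_kwargs` are hypothetical stand-ins written for this example; the real patch wraps `TEGroupedLinear.__init__` in Transformer Engine, which is not imported here. Only the kwarg names are taken from the docstring above.

```python
import inspect


class LegacyGroupedLinear:
    """Stand-in for a TE build exposing only single_grouped_parameter."""

    def __init__(self, *, single_grouped_parameter: bool = False):
        self.single_grouped_parameter = single_grouped_parameter


def remap_grouped_kwargs(cls, **kwargs):
    """Fold single_grouped_weight/bias into single_grouped_parameter
    when the target __init__ does not accept the split kwargs."""
    params = inspect.signature(cls.__init__).parameters
    if "single_grouped_weight" in kwargs and "single_grouped_weight" not in params:
        weight = kwargs.pop("single_grouped_weight")
        bias = kwargs.pop("single_grouped_bias", False)
        kwargs["single_grouped_parameter"] = weight or bias
    return cls(**kwargs)


# Caller uses the new split API; the guard remaps it for the legacy build.
layer = remap_grouped_kwargs(
    LegacyGroupedLinear, single_grouped_weight=True, single_grouped_bias=False
)
print(layer.single_grouped_parameter)  # True
```

When the target class already accepts the split kwargs, the `inspect.signature` check leaves them untouched, so both API generations keep working.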