bridge.models.gpt_provider#
Module Contents#
Classes#
- GPTModelProvider: Configuration and provider for Megatron Core GPT models.
- GPTProvider126M: Configuration for a 126M parameter GPT model.
- GPTProvider5B: Configuration for a 5B parameter GPT model.
- GPTProvider7B: Configuration for a 7B parameter GPT model.
- GPTProvider20B: Configuration for a 20B parameter GPT model.
- GPTProvider40B: Configuration for a 40B parameter GPT model.
- GPTProvider175B: Configuration for a 175B parameter GPT model.
Functions#
- transformer_engine_layer_spec: Create a Transformer Engine layer specification based on the provided config.
- transformer_engine_full_layer_spec: Create a full Transformer Engine layer specification with autocast support.
- local_layer_spec: Create a local layer specification without Transformer Engine.
- default_layer_spec: Determine the most appropriate layer specification based on Transformer Engine availability.
- get_vocab_size: Calculate the padded vocabulary size for tensor parallelism.
- mtp_block_spec: Return the MTP block specification if the model has MTP layers.
Data#
API#
- bridge.models.gpt_provider.logger#
‘getLogger(…)’
- bridge.models.gpt_provider.transformer_engine_layer_spec(config: GPTModelProvider)#
Create a Transformer Engine layer specification based on the provided config.
- bridge.models.gpt_provider.transformer_engine_full_layer_spec(config: GPTModelProvider)#
Create a full Transformer Engine layer specification with autocast support.
- Parameters:
config – GPT configuration object
- Returns:
Module specification for full TE layers
- Return type:
ModuleSpec
- bridge.models.gpt_provider.local_layer_spec(config: GPTModelProvider)#
Create a local layer specification without Transformer Engine.
- Parameters:
config – GPT configuration object
- Returns:
Module specification for local implementation layers
- Return type:
ModuleSpec
- bridge.models.gpt_provider.default_layer_spec(config: GPTModelProvider)#
Determine the most appropriate layer specification based on Transformer Engine availability.
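The snippet below is a minimal sketch of how these spec builders are typically wired into a provider. It assumes the module is importable as megatron.bridge.models.gpt_provider and that GPTModelProvider only needs the usual TransformerConfig architecture fields shown; Transformer Engine must be installed for the TE-backed builders to work.

```python
from megatron.bridge.models.gpt_provider import (
    GPTModelProvider,
    local_layer_spec,
    transformer_engine_layer_spec,
)

# transformer_layer_spec accepts either a ready ModuleSpec or a callable
# that receives the provider and returns one, so any builder above fits.
provider = GPTModelProvider(
    num_layers=2,
    hidden_size=256,
    num_attention_heads=4,
    transformer_layer_spec=local_layer_spec,  # skip Transformer Engine
)

# Alternatively, resolve a TE-backed spec explicitly from the same provider
# (requires Transformer Engine to be installed).
te_spec = transformer_engine_layer_spec(provider)
```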
- class bridge.models.gpt_provider.GPTModelProvider#
Bases: megatron.core.transformer.transformer_config.TransformerConfig, megatron.bridge.models.model_provider.ModelProviderMixin[megatron.core.models.gpt.GPTModel]

Configuration and provider for Megatron Core GPT models.
This class extends TransformerConfig with GPT-specific parameters and provides a method to instantiate configured GPT models.
- fp16_lm_cross_entropy: bool#
False
- parallel_output: bool#
True
- make_vocab_size_divisible_by: int#
128
- position_embedding_type: Literal[learned_absolute, rope]#
‘learned_absolute’
- rotary_base: int#
10000
- rotary_percent: float#
1.0
- seq_len_interpolation_factor: Optional[float]#
None
- seq_length: int#
1024
- attention_softmax_in_fp32: bool#
False
- deallocate_pipeline_outputs: bool#
True
- scatter_embedding_sequence_parallel: bool#
True
- tp_only_amax_red: bool#
False
- tp_comm_overlap_cfg: Optional[Union[str, dict[str, Any]]]#
None
Path to a config file, or an inline dict, used when tp_comm_overlap is enabled.
- use_transformer_engine_full_layer_spec: bool#
False
- transformer_layer_spec: Union[megatron.core.transformer.ModuleSpec, Callable[[bridge.models.gpt_provider.GPTModelProvider], megatron.core.transformer.ModuleSpec]]#
None
- generation_config: Optional[Any]#
None
- vocab_size: Optional[int]#
None
- num_moe_experts: Optional[int]#
None
- moe_grouped_gemm: bool#
False
- qk_layernorm: bool#
False
- fp8: Optional[str]#
None
- normalization: str#
‘LayerNorm’
- mtp_enabled: bool#
False
- init_model_with_meta_device: bool#
False
- use_te_rng_tracker: bool#
False
- enable_cuda_graph: bool#
False
- virtual_pipeline_model_parallel_size: Optional[int]#
None
- account_for_embedding_in_pipeline_split: bool#
False
- account_for_loss_in_pipeline_split: bool#
False
- masked_softmax_fusion: bool#
‘field(…)’
- cross_entropy_loss_fusion: bool#
True
- gradient_accumulation_fusion: bool#
‘field(…)’
- bias_activation_fusion: bool#
False
- persist_layer_norm: bool#
False
- bias_dropout_fusion: bool#
‘field(…)’
- apply_rope_fusion: bool#
‘field(…)’
- provide(pre_process=None, post_process=None, vp_stage=None, tokenizer=None)#
Configure and instantiate a Megatron Core GPT model based on this configuration.
- Parameters:
pre_process – Whether to include pre-processing in the model; defaults to True on the first pipeline stage
post_process – Whether to include post-processing in the model; defaults to True on the last pipeline stage
vp_stage – Virtual pipeline stage
tokenizer – Tokenizer used with the model
- Returns:
Configured Megatron Core GPT model instance
- Return type:
MCoreGPTModel
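A minimal sketch of provide(), assuming the module is importable as megatron.bridge.models.gpt_provider, that Megatron Core's model-parallel state has already been initialized for the launched topology (not shown), and that the small architecture below is sufficient.

```python
from megatron.bridge.models.gpt_provider import GPTModelProvider

provider = GPTModelProvider(
    num_layers=12,
    hidden_size=768,
    num_attention_heads=12,
    seq_length=2048,
    vocab_size=50304,
)

# pre_process / post_process default to the first / last pipeline stage.
model = provider.provide()
```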
- bridge.models.gpt_provider.get_vocab_size(config: megatron.core.transformer.transformer_config.TransformerConfig, vocab_size: int, make_vocab_size_divisible_by: int)#
Calculate the padded vocabulary size for tensor parallelism.
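Illustrative only: the usual Megatron padding rule rounds the vocabulary up to the next multiple of make_vocab_size_divisible_by * tensor_model_parallel_size. The helper below reproduces that arithmetic for illustration; the authoritative behavior is whatever get_vocab_size implements.

```python
def padded_vocab_size(vocab_size: int, divisible_by: int, tp_size: int) -> int:
    # Round up to the next multiple of (divisible_by * tp_size).
    multiple = divisible_by * tp_size
    return ((vocab_size + multiple - 1) // multiple) * multiple

print(padded_vocab_size(50257, 128, 1))  # 50304
print(padded_vocab_size(50257, 128, 8))  # 51200
```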
- bridge.models.gpt_provider.mtp_block_spec(config: bridge.models.gpt_provider.GPTModelProvider, vp_stage: Optional[int] = None)#
Return the MTP block specification if the model has MTP layers.
- Parameters:
config – GPT configuration object
- Returns:
The MTP module specification
- Return type:
ModuleSpec
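A hedged sketch of calling mtp_block_spec. It assumes that switching on multi-token prediction via mtp_enabled=True (plus whatever MTP depth fields the underlying TransformerConfig requires in your Megatron Core version) is what makes the helper return a usable spec; consult the function itself for the exact conditions.

```python
from megatron.bridge.models.gpt_provider import GPTModelProvider, mtp_block_spec

provider = GPTModelProvider(
    num_layers=2,
    hidden_size=256,
    num_attention_heads=4,
    mtp_enabled=True,  # additional MTP fields may be required (assumption)
)

spec = mtp_block_spec(provider)  # MTP ModuleSpec, per the Returns section above
```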
- class bridge.models.gpt_provider.GPTProvider126M#
Bases:
bridge.models.gpt_provider.GPTModelProvider
Configuration for a 126M parameter GPT model.
Predefined configuration for a small GPT model with 12 layers, 768 hidden size, and 12 attention heads.
- seq_length: int#
2048
- num_layers: int#
12
- hidden_size: int#
768
- ffn_hidden_size: int#
3072
- num_attention_heads: int#
12
- bias_activation_fusion: bool#
True
- bias_dropout_add_fusion: bool#
True
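A short sketch of using a predefined size, assuming the presets are plain dataclass-style configs whose fields can be overridden at construction time.

```python
from megatron.bridge.models.gpt_provider import GPTProvider126M

# The preset fixes the 126M architecture (12 layers, 768 hidden, 12 heads);
# individual fields can still be overridden, e.g. a longer sequence length.
provider = GPTProvider126M(seq_length=4096)
model = provider.provide()
```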
- class bridge.models.gpt_provider.GPTProvider5B#
Bases:
bridge.models.gpt_provider.GPTModelProvider
Configuration for a 5B parameter GPT model.
Predefined configuration for a medium-sized GPT model with 24 layers, 4096 hidden size, and 32 attention heads.
- seq_length: int#
2048
- num_layers: int#
24
- hidden_size: int#
4096
- ffn_hidden_size: int#
16384
- num_attention_heads: int#
32
- bias_activation_fusion: bool#
True
- bias_dropout_add_fusion: bool#
True
- class bridge.models.gpt_provider.GPTProvider7B#
Bases:
bridge.models.gpt_provider.GPTModelProvider
Configuration for a 7B parameter GPT model.
Predefined configuration for a medium-sized GPT model with 32 layers, 4096 hidden size, and 32 attention heads.
- seq_length: int#
2048
- num_layers: int#
32
- hidden_size: int#
4096
- ffn_hidden_size: int#
10880
- num_attention_heads: int#
32
- bias_activation_fusion: bool#
True
- bias_dropout_add_fusion: bool#
True
- class bridge.models.gpt_provider.GPTProvider20B#
Bases:
bridge.models.gpt_provider.GPTModelProvider
Configuration for a 20B parameter GPT model.
Predefined configuration for a large GPT model with 44 layers, 6144 hidden size, and 48 attention heads.
- seq_length: int#
2048
- num_layers: int#
44
- hidden_size: int#
6144
- ffn_hidden_size: int#
24576
- num_attention_heads: int#
48
- bias_activation_fusion: bool#
True
- bias_dropout_add_fusion: bool#
True
- class bridge.models.gpt_provider.GPTProvider40B#
Bases:
bridge.models.gpt_provider.GPTModelProvider
Configuration for a 40B parameter GPT model.
Predefined configuration for a large GPT model with 48 layers, 8192 hidden size, and 64 attention heads.
- seq_length: int#
2048
- num_layers: int#
48
- hidden_size: int#
8192
- ffn_hidden_size: int#
32768
- num_attention_heads: int#
64
- bias_activation_fusion: bool#
True
- bias_dropout_add_fusion: bool#
True
- class bridge.models.gpt_provider.GPTProvider175B#
Bases:
bridge.models.gpt_provider.GPTModelProvider
Configuration for a 175B parameter GPT model.
Predefined configuration for a massive GPT model with 96 layers, 12288 hidden size, and 96 attention heads.
- seq_length: int#
2048
- num_layers: int#
96
- hidden_size: int#
12288
- ffn_hidden_size: int#
49152
- num_attention_heads: int#
96
- hidden_dropout: float#
0.0
- attention_dropout: float#
0.0
- bias_activation_fusion: bool#
True
- bias_dropout_add_fusion: bool#
True
- layernorm_zero_centered_gamma: bool#
True