bridge.models.gpt_provider#

Module Contents#

Classes#

GPTModelProvider

Configuration and provider for Megatron Core GPT models.

GPTProvider126M

Configuration for a 126M parameter GPT model.

GPTProvider5B

Configuration for a 5B parameter GPT model.

GPTProvider7B

Configuration for a 7B parameter GPT model.

GPTProvider20B

Configuration for a 20B parameter GPT model.

GPTProvider40B

Configuration for a 40B parameter GPT model.

GPTProvider175B

Configuration for a 175B parameter GPT model.

Functions#

transformer_engine_layer_spec

Create a Transformer Engine layer specification based on the provided config.

transformer_engine_full_layer_spec

Create a full Transformer Engine layer specification with autocast support.

local_layer_spec

Create a local layer specification without Transformer Engine.

default_layer_spec

Determine the most appropriate layer specification based on Transformer Engine availability.

get_vocab_size

Calculate padded vocab size for tensor parallelism.

mtp_block_spec

Return the MTP block spec when the model has MTP layers.

Data#

API#

bridge.models.gpt_provider.logger#

‘getLogger(…)’

bridge.models.gpt_provider.transformer_engine_layer_spec(
config: GPTModelProvider,
) → megatron.core.transformer.ModuleSpec#

Create a Transformer Engine layer specification based on the provided config.

bridge.models.gpt_provider.transformer_engine_full_layer_spec(
config: GPTModelProvider,
) → megatron.core.transformer.ModuleSpec#

Create a full Transformer Engine layer specification with autocast support.

Parameters:

config – GPT configuration object

Returns:

Module specification for full TE layers

Return type:

ModuleSpec

bridge.models.gpt_provider.local_layer_spec(
config: GPTModelProvider,
) → megatron.core.transformer.ModuleSpec#

Create a local layer specification without Transformer Engine.

Parameters:

config – GPT configuration object

Returns:

Module specification for local implementation layers

Return type:

ModuleSpec

bridge.models.gpt_provider.default_layer_spec(
config: GPTModelProvider,
) → megatron.core.transformer.ModuleSpec#

Determine the most appropriate layer specification based on Transformer Engine availability.
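Because transformer_layer_spec on GPTModelProvider accepts either a ModuleSpec or a callable that takes the provider and returns one, the spec functions above can be assigned directly. A minimal sketch, assuming the installed import path is megatron.bridge.models.gpt_provider and that Transformer Engine is only required when a TE spec is chosen:

```python
from megatron.bridge.models.gpt_provider import (
    GPTModelProvider,
    local_layer_spec,
    transformer_engine_layer_spec,  # needs Transformer Engine at runtime
)

# Pass the spec function itself; it is invoked with the provider when the
# model is built. Constructor arguments here are illustrative.
provider = GPTModelProvider(
    num_layers=12,
    hidden_size=768,
    num_attention_heads=12,
    transformer_layer_spec=local_layer_spec,  # or transformer_engine_layer_spec
)
```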

class bridge.models.gpt_provider.GPTModelProvider#

Bases: megatron.core.transformer.transformer_config.TransformerConfig, megatron.bridge.models.model_provider.ModelProviderMixin[megatron.core.models.gpt.GPTModel]

Configuration and provider for Megatron Core GPT models.

This class extends TransformerConfig with GPT-specific parameters and provides a method to instantiate configured GPT models.

fp16_lm_cross_entropy: bool#

False

parallel_output: bool#

True

share_embeddings_and_output_weights: bool#

True

make_vocab_size_divisible_by: int#

128

position_embedding_type: Literal['learned_absolute', 'rope']#

‘learned_absolute’

rotary_base: int#

10000

rotary_percent: float#

1.0

seq_len_interpolation_factor: Optional[float]#

None

seq_length: int#

1024

attention_softmax_in_fp32: bool#

False

deallocate_pipeline_outputs: bool#

True

scatter_embedding_sequence_parallel: bool#

True

tp_only_amax_red: bool#

False

tp_comm_overlap_cfg: Optional[Union[str, dict[str, Any]]]#

None

Path to a config file, or a dict of settings, used when tp_comm_overlap is enabled.

use_transformer_engine_full_layer_spec: bool#

False

transformer_layer_spec: Union[megatron.core.transformer.ModuleSpec, Callable[[bridge.models.gpt_provider.GPTModelProvider], megatron.core.transformer.ModuleSpec]]#

None

generation_config: Optional[Any]#

None

vocab_size: Optional[int]#

None

num_moe_experts: Optional[int]#

None

moe_grouped_gemm: bool#

False

qk_layernorm: bool#

False

fp8: Optional[str]#

None

normalization: str#

‘LayerNorm’

mtp_enabled: bool#

False

init_model_with_meta_device: bool#

False

use_te_rng_tracker: bool#

False

enable_cuda_graph: bool#

False

virtual_pipeline_model_parallel_size: Optional[int]#

None

account_for_embedding_in_pipeline_split: bool#

False

account_for_loss_in_pipeline_split: bool#

False

masked_softmax_fusion: bool#

‘field(…)’

cross_entropy_loss_fusion: bool#

True

gradient_accumulation_fusion: bool#

‘field(…)’

bias_activation_fusion: bool#

False

persist_layer_norm: bool#

False

bias_dropout_fusion: bool#

‘field(…)’

apply_rope_fusion: bool#

‘field(…)’

provide(
pre_process=None,
post_process=None,
vp_stage=None,
tokenizer=None,
) → megatron.core.models.gpt.GPTModel#

Configure and instantiate a Megatron Core GPT model based on this configuration.

Parameters:
  • pre_process – Whether to include pre-processing in the model; defaults to True on the first pipeline stage

  • post_process – Whether to include post-processing in the model; defaults to True on the last pipeline stage

  • vp_stage – Virtual pipeline stage

  • tokenizer – Tokenizer used with the model

Returns:

Configured Megatron Core GPT model instance

Return type:

MCoreGPTModel
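A minimal sketch of configuring a provider and building the model, assuming torch.distributed and Megatron Core model-parallel state are already initialized; the field values are illustrative:

```python
from megatron.bridge.models.gpt_provider import GPTModelProvider

provider = GPTModelProvider(
    num_layers=12,
    hidden_size=768,
    num_attention_heads=12,
    seq_length=2048,
    vocab_size=50304,  # illustrative; padded internally for tensor parallelism
)
model = provider.provide()  # megatron.core.models.gpt.GPTModel instance
```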

bridge.models.gpt_provider.get_vocab_size(
config: megatron.core.transformer.transformer_config.TransformerConfig,
vocab_size: int,
make_vocab_size_divisible_by: int,
) → int#

Calculate padded vocab size for tensor parallelism.
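The padding rounds the raw vocabulary size up to a multiple derived from make_vocab_size_divisible_by and the tensor-parallel size. A sketch of the arithmetic, assuming the usual Megatron convention of make_vocab_size_divisible_by × tensor_model_parallel_size as the multiple:

```python
# Hypothetical helper mirroring the assumed padding convention; not the
# library function itself.
def padded_vocab_size(vocab_size: int, make_vocab_size_divisible_by: int, tp_size: int) -> int:
    multiple = make_vocab_size_divisible_by * tp_size
    return ((vocab_size + multiple - 1) // multiple) * multiple

# 50257 tokens, make_vocab_size_divisible_by=128, tensor-parallel size 2:
# multiple = 256, so the padded size is 50432.
print(padded_vocab_size(50257, 128, 2))  # 50432
```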

bridge.models.gpt_provider.mtp_block_spec(
config: bridge.models.gpt_provider.GPTModelProvider,
vp_stage: Optional[int] = None,
) → Optional[megatron.core.transformer.ModuleSpec]#

Return the MTP block spec when the model has MTP layers.

Parameters:

config – GPT configuration object

Returns:

The MTP module specification, or None if the model has no MTP layers

Return type:

ModuleSpec
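A short sketch of the call pattern; the None result for a model without MTP layers is inferred from the Optional return type, and the constructor arguments are illustrative:

```python
from megatron.bridge.models.gpt_provider import GPTModelProvider, mtp_block_spec

provider = GPTModelProvider(num_layers=12, hidden_size=768, num_attention_heads=12)
spec = mtp_block_spec(provider)  # expected to be None: no MTP layers configured
```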

class bridge.models.gpt_provider.GPTProvider126M#

Bases: bridge.models.gpt_provider.GPTModelProvider

Configuration for a 126M parameter GPT model.

Predefined configuration for a small GPT model with 12 layers, 768 hidden size, and 12 attention heads.

seq_length: int#

2048

num_layers: int#

12

hidden_size: int#

768

ffn_hidden_size: int#

3072

num_attention_heads: int#

12

bias_activation_fusion: bool#

True

bias_dropout_add_fusion: bool#

True
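The predefined providers are dataclasses, so individual fields can be overridden at construction time. A sketch with assumed values:

```python
from megatron.bridge.models.gpt_provider import GPTProvider126M

provider = GPTProvider126M(
    seq_length=4096,   # default above is 2048
    vocab_size=50304,  # illustrative
)
```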

class bridge.models.gpt_provider.GPTProvider5B#

Bases: bridge.models.gpt_provider.GPTModelProvider

Configuration for a 5B parameter GPT model.

Predefined configuration for a medium-sized GPT model with 24 layers, 4096 hidden size, and 32 attention heads.

seq_length: int#

2048

num_layers: int#

24

hidden_size: int#

4096

ffn_hidden_size: int#

16384

num_attention_heads: int#

32

bias_activation_fusion: bool#

True

bias_dropout_add_fusion: bool#

True

class bridge.models.gpt_provider.GPTProvider7B#

Bases: bridge.models.gpt_provider.GPTModelProvider

Configuration for a 7B parameter GPT model.

Predefined configuration for a medium-sized GPT model with 32 layers, 4096 hidden size, and 32 attention heads.

seq_length: int#

2048

num_layers: int#

32

hidden_size: int#

4096

ffn_hidden_size: int#

10880

num_attention_heads: int#

32

bias_activation_fusion: bool#

True

bias_dropout_add_fusion: bool#

True

class bridge.models.gpt_provider.GPTProvider20B#

Bases: bridge.models.gpt_provider.GPTModelProvider

Configuration for a 20B parameter GPT model.

Predefined configuration for a large GPT model with 44 layers, 6144 hidden size, and 48 attention heads.

seq_length: int#

2048

num_layers: int#

44

hidden_size: int#

6144

ffn_hidden_size: int#

24576

num_attention_heads: int#

48

bias_activation_fusion: bool#

True

bias_dropout_add_fusion: bool#

True

class bridge.models.gpt_provider.GPTProvider40B#

Bases: bridge.models.gpt_provider.GPTModelProvider

Configuration for a 40B parameter GPT model.

Predefined configuration for a large GPT model with 48 layers, 8192 hidden size, and 64 attention heads.

seq_length: int#

2048

num_layers: int#

48

hidden_size: int#

8192

ffn_hidden_size: int#

32768

num_attention_heads: int#

64

bias_activation_fusion: bool#

True

bias_dropout_add_fusion: bool#

True

class bridge.models.gpt_provider.GPTProvider175B#

Bases: bridge.models.gpt_provider.GPTModelProvider

Configuration for a 175B parameter GPT model.

Predefined configuration for a massive GPT model with 96 layers, 12288 hidden size, and 96 attention heads.

seq_length: int#

2048

num_layers: int#

96

hidden_size: int#

12288

ffn_hidden_size: int#

49152

num_attention_heads: int#

96

hidden_dropout: float#

0.0

attention_dropout: float#

0.0

bias_activation_fusion: bool#

True

bias_dropout_add_fusion: bool#

True

layernorm_zero_centered_gamma: bool#

True