bridge.models.gpt.gpt_builder#
Module Contents#
Classes#
| GPTModelConfig | Configuration for a Megatron Core GPT model. |
| GPTModelBuilder | Builder to construct Megatron Core GPT models. |
Functions#
| transformer_engine_layer_spec | Create a Transformer Engine layer specification based on the provided config. |
| transformer_engine_full_layer_spec | Create a full Transformer Engine layer specification with autocast support. |
| local_layer_spec | Create a local layer specification without Transformer Engine. |
| modelopt_transformer_layer_spec | Layer specification for quantization with ModelOpt. |
| default_layer_spec | Determine the most appropriate layer specification based on availability. |
| mtp_block_spec | Create MTP block spec if model has MTP layers. |
Data#
API#
- bridge.models.gpt.gpt_builder.logger#
‘getLogger(…)’
- bridge.models.gpt.gpt_builder.transformer_engine_layer_spec(
- config: GPTModelConfig,
)#
Create a Transformer Engine layer specification based on the provided config.
- bridge.models.gpt.gpt_builder.transformer_engine_full_layer_spec(
- config: megatron.bridge.models.transformer_config.TransformerConfig,
)#
Create a full Transformer Engine layer specification with autocast support.
- Parameters:
config – GPT configuration object
- Returns:
Module specification for full TE layers
- Return type:
ModuleSpec
- bridge.models.gpt.gpt_builder.local_layer_spec(
- config: megatron.bridge.models.transformer_config.TransformerConfig,
)#
Create a local layer specification without Transformer Engine.
- Parameters:
config – GPT configuration object
- Returns:
Module specification for local implementation layers
- Return type:
ModuleSpec
- bridge.models.gpt.gpt_builder.modelopt_transformer_layer_spec(
- config: GPTModelConfig,
)#
Layer specification for quantization with ModelOpt.
- bridge.models.gpt.gpt_builder.default_layer_spec(
- config: GPTModelConfig,
)#
Determine the most appropriate layer specification based on availability.
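The availability-based fallback can be sketched in isolation. This is an illustration of the selection pattern only, not the library's actual logic; the function name `choose_layer_spec_name` and the fallback order are assumptions:

```python
import importlib.util

def choose_layer_spec_name(prefer_te: bool = True) -> str:
    """Illustrative fallback: prefer Transformer Engine when it is
    importable, otherwise fall back to the local implementation."""
    te_available = importlib.util.find_spec("transformer_engine") is not None
    if prefer_te and te_available:
        return "transformer_engine_layer_spec"
    return "local_layer_spec"
```

In the real builder the selected spec is then passed as `transformer_layer_spec` on the config, which also accepts a callable taking the config itself.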
- class bridge.models.gpt.gpt_builder.GPTModelConfig#
Bases:
megatron.bridge.models.common.ModelConfig
Configuration for a Megatron Core GPT model.
This is purely a configuration object. All model construction logic lives in GPTModelBuilder. Contains a TransformerConfig alongside GPT-specific parameters. Attributes on the embedded transformer config are accessible directly on this object via __getattr__/__setattr__ proxying.
.. note::
vocab_size must be set before passing this config to GPTModelBuilder.
- builder: ClassVar[str]#
‘megatron.bridge.models.GPTModelBuilder’
- transformer: megatron.bridge.models.transformer_config.TransformerConfig#
None
- transformer_layer_spec: megatron.core.transformer.ModuleSpec | Callable[[bridge.models.gpt.gpt_builder.GPTModelConfig], megatron.core.transformer.ModuleSpec]#
None
- vocab_size: int | None#
None
This represents the unpadded vocab size. The padded vocab size is automatically calculated in the GPTModelBuilder.
- make_vocab_size_divisible_by: int#
128
- should_pad_vocab: bool#
False
Controls whether the vocab size should be padded for tensor parallelism. Set this if the tokenizer provides the vocab size; in that case, the vocab size will be padded.
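The padded vocab size mentioned under `vocab_size` is typically rounded up to a multiple of `make_vocab_size_divisible_by` times the tensor-parallel size. The sketch below follows the usual Megatron-LM convention; the exact helper inside GPTModelBuilder may differ:

```python
def pad_vocab_size(vocab_size: int,
                   make_vocab_size_divisible_by: int = 128,
                   tensor_model_parallel_size: int = 1) -> int:
    """Round the unpadded vocab size up to the nearest multiple of
    make_vocab_size_divisible_by * tensor_model_parallel_size."""
    multiple = make_vocab_size_divisible_by * tensor_model_parallel_size
    return ((vocab_size + multiple - 1) // multiple) * multiple

pad_vocab_size(32000)          # already a multiple of 128 -> 32000
pad_vocab_size(50257, 128, 8)  # rounded up to a multiple of 1024 -> 51200
```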
- seq_length: int#
1024
- fp16_lm_cross_entropy: bool#
False
- parallel_output: bool#
True
- position_embedding_type: Literal[learned_absolute, rope, mrope, yarn, none]#
‘learned_absolute’
- rotary_percent: float#
1.0
- rotary_base: int#
10000
- rope_scaling: bool#
False
- rope_scaling_factor: float#
8.0
- scatter_embedding_sequence_parallel: bool#
True
- seq_len_interpolation_factor: float | None#
None
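For `position_embedding_type='rope'`, the `rotary_base` and `rotary_percent` attributes feed the standard RoPE inverse-frequency computation. The sketch below shows that standard formulation; it is an assumption that the builder's implementation matches it exactly:

```python
def rope_inv_freq(head_dim: int,
                  rotary_percent: float = 1.0,
                  rotary_base: int = 10000) -> list[float]:
    """Inverse frequencies for rotary position embeddings.
    Only the first rotary_percent fraction of the head dim is rotated."""
    dim = int(head_dim * rotary_percent)
    # One frequency per rotated pair of dimensions.
    return [1.0 / (rotary_base ** (2 * i / dim)) for i in range(dim // 2)]

rope_inv_freq(128)[:2]  # first frequency is always 1.0
```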
- tp_comm_overlap_cfg: str | dict[str, Any] | None#
None
Path to a config file, or an inline dict, used when tp_comm_overlap is enabled.
- use_transformer_engine_full_layer_spec: bool#
False
- use_transformer_engine_op_fuser: bool#
False
- use_arbitrary_attention_mask: bool | None#
None
- __getattr__(name: str, /) → Any#
- __setattr__(name: str, value: Any, /) → None#
- finalize() → None#
One-time validation run once the config is ready to be used by the builder.
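The `__getattr__`/`__setattr__` proxying described for GPTModelConfig can be sketched in isolation. This is an illustration of the pattern with hypothetical minimal classes, not the library's actual code:

```python
class TransformerConfig:
    def __init__(self, num_layers: int, hidden_size: int):
        self.num_layers = num_layers
        self.hidden_size = hidden_size

class ProxyConfig:
    """Unknown attributes are forwarded to the embedded transformer config."""
    def __init__(self, transformer: TransformerConfig):
        object.__setattr__(self, "transformer", transformer)

    def __getattr__(self, name: str):
        # Called only when normal lookup fails; forward to the inner config.
        return getattr(self.transformer, name)

    def __setattr__(self, name: str, value) -> None:
        # Route writes to the inner config when it owns the attribute.
        if hasattr(self.transformer, name):
            setattr(self.transformer, name, value)
        else:
            object.__setattr__(self, name, value)

cfg = ProxyConfig(TransformerConfig(num_layers=32, hidden_size=4096))
cfg.num_layers        # -> 32, read through the proxy
cfg.hidden_size = 8192  # write routed to the inner config
```

Note that `__getattr__` (unlike `__getattribute__`) only fires when normal attribute lookup fails, which is what makes this forwarding safe and non-recursive.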
- class bridge.models.gpt.gpt_builder.GPTModelBuilder(
- model_config: bridge.models.gpt.gpt_builder.GPTModelConfig,
)#
Bases:
megatron.bridge.models.common.ModelBuilder[megatron.core.models.gpt.GPTModel, bridge.models.gpt.gpt_builder.GPTModelConfig]
Builder to construct Megatron Core GPT models.
.. rubric:: Example
transformer_cfg = TransformerConfig(num_layers=32, hidden_size=4096, …)
model_cfg = GPTModelConfig(transformer=transformer_cfg, vocab_size=32000, seq_length=2048, …)
# Single stage (e.g. inference)
model = GPTModelBuilder(model_cfg).build_model(pg_collection)
# Distributed training
models = GPTModelBuilder(model_cfg).build_distributed_models(pg_collection)
Initialization
- build_model(
- pg_collection: megatron.core.process_groups_config.ProcessGroupCollection,
- pre_process: bool | None = None,
- post_process: bool | None = None,
- vp_stage: int | None = None,
)#
Build a single MCore GPTModel stage.
- Parameters:
pg_collection – Process groups for distributed training
pre_process – Include embedding layer
post_process – Include output layer
vp_stage – Virtual pipeline stage
- Returns:
The constructed model
.. note:: Virtual pipeline model parallelism is not supported for Mamba models.
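When `pre_process`/`post_process` are left as `None`, a typical pipeline-parallel convention (stated here as an assumption, not a quote of the builder's code) is that the first stage owns the embedding and the last stage owns the output layer:

```python
def stage_flags(stage: int, num_stages: int) -> tuple[bool, bool]:
    """Illustrative defaults for pre_process/post_process:
    first pipeline stage gets the embedding, last gets the output layer."""
    pre_process = stage == 0
    post_process = stage == num_stages - 1
    return pre_process, post_process

stage_flags(0, 4)  # (True, False): embedding only
stage_flags(3, 4)  # (False, True): output layer only
stage_flags(0, 1)  # (True, True): single stage holds both
```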
- build_distributed_models(
- pg_collection: megatron.core.process_groups_config.ProcessGroupCollection,
- ddp_config: megatron.core.distributed.DistributedDataParallelConfig | None = None,
- overlap_param_gather_with_optimizer_step: bool = False,
- use_megatron_fsdp: bool = False,
- use_torch_fsdp2: bool = False,
- wrap_with_ddp: bool = True,
- data_parallel_random_init: bool = True,
- mixed_precision_wrapper: Callable[[Any, megatron.core.transformer.MegatronModule], megatron.core.transformer.MegatronModule] | None = Float16Module,
- model_type: megatron.core.enums.ModelType = ModelType.encoder_or_decoder,
)#
Build model stages and wrap for distributed training.
- Parameters:
pg_collection – Model communication process groups.
ddp_config – DistributedDataParallel configuration
overlap_param_gather_with_optimizer_step – Whether to overlap parameter gather with optimizer step.
use_megatron_fsdp – Whether to use Megatron FSDP
use_torch_fsdp2 – Whether to use Torch FSDP 2.0
wrap_with_ddp – Set to False to skip the DDP/FSDP wrapper.
data_parallel_random_init – Whether to use data parallel random initialization
mixed_precision_wrapper – Mixed precision wrapper, e.g. Float16Module
model_type – Deprecated flag, only used for backwards compatibility.
- Returns:
List of model stages.
- bridge.models.gpt.gpt_builder.mtp_block_spec(
- config: bridge.models.gpt.gpt_builder.GPTModelConfig,
- transformer_layer_spec: megatron.core.transformer.ModuleSpec,
- vp_stage: int | None = None,
)#
Create MTP block spec if model has MTP layers.
- Parameters:
config – full model config
- Returns:
The MTP module specification
- Return type:
ModuleSpec