bridge.models.transformer_config
Bridge wrapper classes for Megatron Core transformer configurations.
These classes provide deferred post-initialization to support the Bridge configuration override system while maintaining compatibility with Megatron Core's __post_init__ behavior.
Module Contents
Classes
TransformerConfig | Megatron Core TransformerConfig with deferred post-init.
MLATransformerConfig | Megatron Core MLATransformerConfig with deferred post-init.
HeterogeneousTransformerConfig | Megatron Core HeterogeneousTransformerConfig with deferred post-init.
API
- class bridge.models.transformer_config.TransformerConfig
Bases: megatron.core.transformer.transformer_config.TransformerConfig
Megatron Core TransformerConfig with deferred post-init.
This class inherits from Megatron Core's TransformerConfig but defers the execution of __post_init__() until finalize() is explicitly called. This allows fields to be modified after construction but before computed fields are calculated.
Usage:
    # Create config with deferred post-init
    config = TransformerConfig(num_layers=32, hidden_size=4096)

    # Modify fields as needed
    config.seq_length = 8192
    config.tensor_model_parallel_size = 2

    # Finalize to compute derived fields
    config.finalize()
- __post_init__() → None
Skip MCore __post_init__ during initial construction.
The original __post_init__ logic is deferred until finalize() is called. This allows for field modifications after construction without invalidating computed fields.
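The deferral described above can be illustrated with plain dataclasses. This is a minimal sketch, not the Bridge implementation: BaseConfig and LazyConfig are hypothetical stand-ins, and the derived field rule is chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseConfig:
    """Hypothetical stand-in for an upstream config with an eager __post_init__."""

    hidden_size: int = 1024
    ffn_hidden_size: Optional[int] = None  # derived when left unset

    def __post_init__(self) -> None:
        # Upstream-style derivation: fill in a computed field.
        if self.ffn_hidden_size is None:
            self.ffn_hidden_size = 4 * self.hidden_size


@dataclass
class LazyConfig(BaseConfig):
    """Hypothetical wrapper that defers the parent's __post_init__."""

    def __post_init__(self) -> None:
        # Intentionally skip the parent's logic during construction.
        pass

    def finalize(self) -> None:
        # Run the deferred parent logic against the current field values.
        BaseConfig.__post_init__(self)


config = LazyConfig(hidden_size=1024)
before = config.ffn_hidden_size   # None: nothing has been derived yet
config.hidden_size = 2048         # override after construction
config.finalize()
after = config.ffn_hidden_size    # 8192: derived from the overridden value
```

Had the derivation run eagerly in the constructor, the computed field would have been based on the original hidden_size rather than the overridden one.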
- finalize() → None
Execute the deferred MCore post-init logic.
This method calls the original Megatron Core TransformerConfig.__post_init__() to compute derived fields based on the current field values. It can be called multiple times safely.
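As a rough illustration of what repeated finalize() calls can look like, the sketch below uses hypothetical stand-in classes (BaseConfig, LazyConfig) rather than the real Megatron Core config. It also shows one caveat of this stand-in: a derived field guarded by an "is None" check is not recomputed once it has been set. Whether the same holds for a given Megatron Core field depends on how its __post_init__ is written.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseConfig:
    """Hypothetical stand-in for an upstream config."""

    hidden_size: int = 1024
    num_attention_heads: int = 8
    kv_channels: Optional[int] = None  # derived when left unset

    def __post_init__(self) -> None:
        if self.kv_channels is None:
            self.kv_channels = self.hidden_size // self.num_attention_heads


@dataclass
class LazyConfig(BaseConfig):
    """Hypothetical wrapper with deferred post-init."""

    def __post_init__(self) -> None:
        pass  # deferred until finalize()

    def finalize(self) -> None:
        BaseConfig.__post_init__(self)


config = LazyConfig(hidden_size=4096, num_attention_heads=32)
config.finalize()
first = config.kv_channels    # 4096 // 32
config.finalize()             # second call: no error, same result
second = config.kv_channels

config.hidden_size = 8192     # change an input after finalizing
config.finalize()
third = config.kv_channels    # unchanged in this stand-in: the field is no longer None
```

In a setup like this one, applying all overrides before the first finalize() call avoids derived fields that lag behind their inputs.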
- class bridge.models.transformer_config.MLATransformerConfig
Bases: bridge.models.transformer_config.TransformerConfig, megatron.core.transformer.transformer_config.MLATransformerConfig
Megatron Core MLATransformerConfig with deferred post-init.
This class inherits from Megatron Core's MLATransformerConfig but defers the execution of __post_init__() until finalize() is explicitly called. This allows fields to be modified after construction but before computed fields are calculated.
Usage:
    # Create config with deferred post-init
    config = MLATransformerConfig(num_layers=32, hidden_size=4096)

    # Modify fields as needed
    config.q_lora_rank = 1536
    config.kv_lora_rank = 512

    # Finalize to compute derived fields
    config.finalize()
- __post_init__() → None
Skip MCore __post_init__ during initial construction.
The original __post_init__ logic is deferred until finalize() is called. This allows for field modifications after construction without invalidating computed fields.
- finalize() → None
Execute the deferred MCore post-init logic.
This method calls the original Megatron Core MLATransformerConfig.__post_init__() to compute derived fields based on the current field values. It can be called multiple times safely.
- class bridge.models.transformer_config.HeterogeneousTransformerConfig
Bases: bridge.models.transformer_config.TransformerConfig, megatron.core.transformer.heterogeneous.heterogeneous_config.HeterogeneousTransformerConfig
Megatron Core HeterogeneousTransformerConfig with deferred post-init.
This class inherits from both our lazy TransformerConfig and Megatron Core’s HeterogeneousTransformerConfig. The MRO ensures that our lazy post-init behavior is preserved while maintaining all heterogeneous functionality.
CRITICAL: The inheritance order is important for MRO:
1. TransformerConfig (our lazy version) comes first
2. MCoreHeterogeneousTransformerConfig (Megatron Core's HeterogeneousTransformerConfig) comes second
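Why the base-class order matters can be seen in a small sketch. All four classes below are hypothetical stand-ins, not the Bridge or Megatron Core classes. With the lazy class listed first, Python's method resolution order finds the no-op __post_init__ before the eager upstream one. With the order reversed, the eager logic runs during construction.

```python
from dataclasses import dataclass


@dataclass
class UpstreamBase:
    """Hypothetical stand-in for the upstream base config."""

    num_layers: int = 2
    finalized: bool = False

    def __post_init__(self) -> None:
        self.finalized = True  # eager derivation work


@dataclass
class UpstreamHetero(UpstreamBase):
    """Hypothetical stand-in for an upstream subclass that parses extra config."""

    blocks_json: str = ""
    parsed: bool = False

    def __post_init__(self) -> None:
        super().__post_init__()
        self.parsed = True  # eager parsing work


@dataclass
class LazyBase(UpstreamBase):
    """Hypothetical lazy wrapper around the base config."""

    def __post_init__(self) -> None:
        pass  # deferred until finalize()

    def finalize(self) -> None:
        UpstreamBase.__post_init__(self)


@dataclass
class LazyHetero(LazyBase, UpstreamHetero):
    """Lazy class first: LazyBase.__post_init__ wins in the MRO."""

    def finalize(self) -> None:
        UpstreamHetero.__post_init__(self)


@dataclass
class WrongOrder(UpstreamHetero, LazyBase):
    """Reversed order: the eager UpstreamHetero.__post_init__ wins."""


mro_names = [cls.__name__ for cls in LazyHetero.__mro__]
lazy = LazyHetero(num_layers=4)
parsed_before = lazy.parsed        # False: parsing was deferred
lazy.finalize()
parsed_after = lazy.parsed         # True: parsing ran in finalize()
eager = WrongOrder(num_layers=4)
parsed_eagerly = eager.parsed      # True: parsing ran during construction
```

Inspecting the class's __mro__ attribute, as done for mro_names here, is a quick way to confirm which __post_init__ will be picked up.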
Usage:
    # Create config with deferred post-init
    config = HeterogeneousTransformerConfig(
        num_layers=32,
        hidden_size=4096,
        heterogeneous_layers_config_encoded_json=json_string,
    )

    # Modify fields as needed
    config.seq_length = 8192
    config.tensor_model_parallel_size = 2

    # Finalize to compute derived fields and parse heterogeneous config
    config.finalize()
- __post_init__() → None
Skip MCore __post_init__ during initial construction.
The original __post_init__ logic is deferred until finalize() is called. This allows for field modifications after construction without invalidating computed fields.
- finalize() → None
Execute the deferred MCore post-init logic.
This method calls the original Megatron Core HeterogeneousTransformerConfig.__post_init__() to compute derived fields and parse the heterogeneous block configurations. It can be called multiple times safely.