bridge.models.transformer_config

Bridge wrapper classes for Megatron Core transformer configurations.

These classes defer post-initialization so that the Bridge configuration override system can modify fields first, while remaining compatible with Megatron Core’s __post_init__ behavior.

Module Contents

Classes

TransformerConfig

Megatron Core TransformerConfig with deferred post-init.

MLATransformerConfig

Megatron Core MLATransformerConfig with deferred post-init.

API

class bridge.models.transformer_config.TransformerConfig

Bases: megatron.core.transformer.transformer_config.TransformerConfig

Megatron Core TransformerConfig with deferred post-init.

This class inherits from Megatron Core’s TransformerConfig but defers execution of __post_init__() until finalize() is explicitly called. Fields can therefore be modified after construction but before derived fields are computed.

Usage:

# Create config with deferred post-init
config = TransformerConfig(num_layers=32, hidden_size=4096)

# Modify fields as needed
config.seq_length = 8192
config.tensor_model_parallel_size = 2

# Finalize to compute derived fields
config.finalize()

__post_init__() -> None

Skip MCore __post_init__ during initial construction.

The original __post_init__ logic is deferred until finalize() is called. Fields can be modified after construction without leaving already-computed derived fields stale.

finalize() -> None

Execute the deferred MCore post-init logic.

This method calls the original Megatron Core TransformerConfig.__post_init__() to compute derived fields from the current field values. It can be called multiple times safely.
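
The deferral pattern described above can be illustrated with plain dataclasses. This is a minimal sketch, not the Bridge implementation: BaseConfig, DeferredConfig, and the kv_channels derivation are hypothetical stand-ins chosen so the example runs without Megatron Core installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseConfig:
    """Hypothetical stand-in for an upstream config that derives fields in __post_init__."""

    hidden_size: int = 0
    num_attention_heads: int = 0
    kv_channels: Optional[int] = None  # derived when left as None

    def __post_init__(self) -> None:
        if self.kv_channels is None:
            self.kv_channels = self.hidden_size // self.num_attention_heads


@dataclass
class DeferredConfig(BaseConfig):
    """Same fields, but the derivation waits for an explicit finalize() call."""

    def __post_init__(self) -> None:
        # Intentionally empty: skip the parent's derivation at construction time.
        pass

    def finalize(self) -> None:
        # Run the parent's original post-init against the current field values.
        BaseConfig.__post_init__(self)


config = DeferredConfig(hidden_size=4096, num_attention_heads=32)
assert config.kv_channels is None  # nothing derived yet

config.num_attention_heads = 64  # override before finalizing
config.finalize()
print(config.kv_channels)  # 4096 // 64 = 64
```

In this sketch a second finalize() call is harmless only because the derivation is guarded by the None check; whether repeated calls are equally harmless for every real derived field depends on the upstream __post_init__ logic.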

class bridge.models.transformer_config.MLATransformerConfig

Bases: bridge.models.transformer_config.TransformerConfig, megatron.core.transformer.transformer_config.MLATransformerConfig

Megatron Core MLATransformerConfig with deferred post-init.

This class inherits from Megatron Core’s MLATransformerConfig but defers execution of __post_init__() until finalize() is explicitly called. Fields can therefore be modified after construction but before derived fields are computed.

Usage:

# Create config with deferred post-init
config = MLATransformerConfig(num_layers=32, hidden_size=4096)

# Modify fields as needed
config.q_lora_rank = 1536
config.kv_lora_rank = 512

# Finalize to compute derived fields
config.finalize()

__post_init__() -> None

Skip MCore __post_init__ during initial construction.

The original __post_init__ logic is deferred until finalize() is called. Fields can be modified after construction without leaving already-computed derived fields stale.

finalize() -> None

Execute the deferred MCore post-init logic.

This method calls the original Megatron Core MLATransformerConfig.__post_init__() to compute derived fields from the current field values. It can be called multiple times safely.
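
Because this class lists both the Bridge TransformerConfig and the Megatron Core MLATransformerConfig as bases, method resolution order decides which __post_init__ runs at construction and which one finalize() reaches. The sketch below shows one way such a hierarchy can behave, using plain dataclasses. All class names and the ffn_hidden_size and total_lora_rank derivations are hypothetical and are not taken from Megatron Core.

```python
from dataclasses import dataclass


@dataclass
class CoreConfig:
    """Hypothetical stand-in for an upstream base config."""

    hidden_size: int = 0
    ffn_hidden_size: int = 0  # derived when left at 0

    def __post_init__(self) -> None:
        if self.ffn_hidden_size == 0:
            self.ffn_hidden_size = 4 * self.hidden_size


@dataclass
class CoreMLAConfig(CoreConfig):
    """Hypothetical stand-in for an upstream MLA config with its own derivation."""

    q_lora_rank: int = 512
    kv_lora_rank: int = 512
    total_lora_rank: int = 0  # illustrative derived field, not a real MCore field

    def __post_init__(self) -> None:
        super().__post_init__()  # resolves to CoreConfig, even for deferred subclasses
        self.total_lora_rank = self.q_lora_rank + self.kv_lora_rank


@dataclass
class DeferredConfig(CoreConfig):
    def __post_init__(self) -> None:
        pass  # skip derivation at construction

    def finalize(self) -> None:
        CoreConfig.__post_init__(self)


@dataclass
class DeferredMLAConfig(DeferredConfig, CoreMLAConfig):
    def __post_init__(self) -> None:
        pass  # skip derivation at construction

    def finalize(self) -> None:
        # Call the MLA post-init explicitly so both levels of derivation run.
        CoreMLAConfig.__post_init__(self)


config = DeferredMLAConfig(hidden_size=1024)
config.q_lora_rank = 1536
config.kv_lora_rank = 512
config.finalize()

print([cls.__name__ for cls in DeferredMLAConfig.__mro__])
print(config.ffn_hidden_size, config.total_lora_rank)
```

Calling the MLA parent's __post_init__ by name, rather than through super() from the deferred class, avoids landing back on the no-op override that sits earlier in the resolution order.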