`bridge.models.transformer_config`#

Bridge wrapper classes for Megatron Core transformer configurations.

These classes defer post-initialization so that the Bridge configuration override system can modify fields first, while staying compatible with Megatron Core’s __post_init__ behavior.

Module Contents#

Classes#

`TransformerConfig`	Megatron Core TransformerConfig with deferred post-init.
`MLATransformerConfig`	Megatron Core MLATransformerConfig with deferred post-init.
`HeterogeneousTransformerConfig`	Megatron Core HeterogeneousTransformerConfig with deferred post-init.

Functions#

`_safe_asdict`	Shallow asdict variant that preserves handles like process groups.
`_resolve_string_fields`	Resolve string-valued fields to their runtime types.

API#

bridge.models.transformer_config._safe_asdict(obj, skip_keys: set[str]) → dict#

Shallow asdict variant that preserves handles like process groups.

dataclasses.asdict performs a deep copy of every leaf value, which breaks objects that should remain shared references (e.g., ProcessGroupCollection). This helper mirrors the structure of asdict but returns leaf objects as-is so they are not deep-copied.
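The difference between the two behaviors can be illustrated with a small, self-contained sketch. `FakeProcessGroup`, `DemoConfig`, and `shallow_asdict` are hypothetical stand-ins written for this example, not the module's implementation. The sketch also omits the `skip_keys` parameter, whose exact semantics are not described here.

```python
import dataclasses
from dataclasses import dataclass, field


class FakeProcessGroup:
    """Stand-in for a handle that must stay shared (hypothetical)."""


@dataclass
class DemoConfig:
    hidden_size: int
    pg: FakeProcessGroup = field(default_factory=FakeProcessGroup)


def shallow_asdict(obj) -> dict:
    """Build a field-name -> value dict without copying any value."""
    return {f.name: getattr(obj, f.name) for f in dataclasses.fields(obj)}


cfg = DemoConfig(hidden_size=4096)

deep = dataclasses.asdict(cfg)   # deep-copies the non-dataclass leaf
shallow = shallow_asdict(cfg)    # returns the leaf as-is

deep_is_shared = deep["pg"] is cfg.pg
shallow_is_shared = shallow["pg"] is cfg.pg
print(deep_is_shared, shallow_is_shared)
```

With the standard library's `dataclasses.asdict`, the dict holds a copy of the handle. The shallow variant hands back the very same object, which is the property that matters for process-group-like handles.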

bridge.models.transformer_config._resolve_string_fields(config: megatron.core.transformer.transformer_config.TransformerConfig) → None#

Resolve string-valued fields to their runtime types.

Handles activation_func (e.g. model.activation_func=silu) and the dtype fields params_dtype and pipeline_dtype (e.g. model.params_dtype=bf16) when they arrive as strings via CLI overrides.
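A minimal sketch of this kind of in-place string resolution is shown below. Everything in it is an assumption made for illustration: the `DType` enum stands in for framework dtypes so the example needs no torch import, and the `ACTIVATIONS` and `DTYPES` lookup tables are hypothetical, since the tables the module actually uses are not shown on this page.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Any


class DType(Enum):
    """Stand-in for a framework dtype (hypothetical)."""
    BF16 = "bf16"
    FP16 = "fp16"
    FP32 = "fp32"


def silu(x: float) -> float:
    """Scalar SiLU: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


# Hypothetical lookup tables used only by this sketch.
ACTIVATIONS = {"silu": silu}
DTYPES = {member.value: member for member in DType}


@dataclass
class DemoConfig:
    activation_func: Any = silu
    params_dtype: Any = DType.FP32
    pipeline_dtype: Any = None


def resolve_string_fields(config: DemoConfig) -> None:
    """Replace string values in place; leave already-resolved values untouched."""
    if isinstance(config.activation_func, str):
        config.activation_func = ACTIVATIONS[config.activation_func]
    for name in ("params_dtype", "pipeline_dtype"):
        value = getattr(config, name)
        if isinstance(value, str):
            setattr(config, name, DTYPES[value])


# Values as they might arrive from CLI overrides.
cfg = DemoConfig(activation_func="silu", params_dtype="bf16")
resolve_string_fields(cfg)
```

Values that are already callables or dtype objects pass through unchanged, so in this sketch the function is harmless to call on a config that was never overridden from the command line.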

class bridge.models.transformer_config.TransformerConfig#

Bases: megatron.core.transformer.transformer_config.TransformerConfig

Megatron Core TransformerConfig with deferred post-init.

This class inherits from Megatron Core’s TransformerConfig but defers the execution of __post_init__() until finalize() is explicitly called. This lets fields be modified after construction but before derived fields are computed.

Usage:

# Create config with deferred post-init
config = TransformerConfig(num_layers=32, hidden_size=4096)

# Modify fields as needed
config.seq_length = 8192
config.tensor_model_parallel_size = 2

# Finalize to compute derived fields
config.finalize()
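The deferred pattern itself can be sketched with plain dataclasses. `EagerConfig` and `LazyConfig` are hypothetical stand-ins for the Megatron Core class and its Bridge wrapper. The derived field and its formula are chosen only for illustration and are not taken from Megatron Core.

```python
from dataclasses import dataclass


@dataclass
class EagerConfig:
    """Stand-in for an upstream config that derives fields in __post_init__."""
    hidden_size: int = 1024
    num_attention_heads: int = 8
    kv_channels: int = 0

    def __post_init__(self) -> None:
        # Derived field: computed from whatever values are present right now.
        self.kv_channels = self.hidden_size // self.num_attention_heads


@dataclass
class LazyConfig(EagerConfig):
    """Same fields, but derivation waits for an explicit finalize()."""

    def __post_init__(self) -> None:
        # Intentionally skip the parent's logic during construction.
        pass

    def finalize(self) -> None:
        # Run the parent's original post-init against the current field values.
        EagerConfig.__post_init__(self)


cfg = LazyConfig(hidden_size=4096)
before = cfg.kv_channels       # still the default; nothing derived yet
cfg.num_attention_heads = 32   # override after construction
cfg.finalize()
after = cfg.kv_channels        # derived from the overridden values
print(before, after)
```

Had the derivation run at construction time, `kv_channels` would have been computed from the default head count and gone stale after the override. Deferring it avoids that.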

_NO_COPY_KEYS#: None

__post_init__() → None#

Skip MCore __post_init__ during initial construction.

The original __post_init__ logic is deferred until finalize() is called. This allows for field modifications after construction without invalidating computed fields.

finalize() → None#

Execute the deferred MCore post-init logic.

This method calls the original Megatron Core TransformerConfig.__post_init__() to compute derived fields based on the current field values. It can be called multiple times safely.

__deepcopy__(memo)#

Custom deepcopy to preserve process group handles when cloning configs.

Certain attributes (_pg_collection, etc.) should remain shared references rather than being wiped or re-created during deepcopy.

TODO: This is a temporary hack. Once providers stop embedding the Transformer config and instead hold the MCore config as an attribute, we can remove this.
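One common way to write such a `__deepcopy__` is sketched below. This is an illustration under assumptions, not the class's actual code: `FakeProcessGroup` and `DemoConfig` are hypothetical, and the sketch assumes that `_NO_COPY_KEYS` holds the names of attributes to share by reference.

```python
import copy


class FakeProcessGroup:
    """Stand-in for a handle that must stay shared (hypothetical)."""


class DemoConfig:
    # Assumed role: names of attributes that copies share by reference.
    _NO_COPY_KEYS = {"_pg_collection"}

    def __init__(self) -> None:
        self.layers = [1, 2, 3]
        self._pg_collection = FakeProcessGroup()

    def __deepcopy__(self, memo):
        cls = self.__class__
        clone = cls.__new__(cls)
        memo[id(self)] = clone  # register early so reference cycles resolve
        for key, value in self.__dict__.items():
            if key in self._NO_COPY_KEYS:
                setattr(clone, key, value)  # keep the same object
            else:
                setattr(clone, key, copy.deepcopy(value, memo))
        return clone


original = DemoConfig()
clone = copy.deepcopy(original)
```

Ordinary attributes such as `layers` are copied as usual, while the listed handle is the same object in both the original and the clone.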

asdict() → dict#

Return a dict view without deep-copying shared handles (e.g., process groups).

class bridge.models.transformer_config.MLATransformerConfig#

Bases: bridge.models.transformer_config.TransformerConfig, megatron.core.transformer.transformer_config.MLATransformerConfig

Megatron Core MLATransformerConfig with deferred post-init.

This class inherits from Megatron Core’s MLATransformerConfig but defers the execution of __post_init__() until finalize() is explicitly called. This lets fields be modified after construction but before derived fields are computed.

Usage:

# Create config with deferred post-init
config = MLATransformerConfig(num_layers=32, hidden_size=4096)

# Modify fields as needed
config.q_lora_rank = 1536
config.kv_lora_rank = 512

# Finalize to compute derived fields
config.finalize()

__post_init__() → None#

Skip MCore __post_init__ during initial construction.

The original __post_init__ logic is deferred until finalize() is called. This allows for field modifications after construction without invalidating computed fields.

finalize() → None#

Execute the deferred MCore post-init logic.

This method calls the original Megatron Core MLATransformerConfig.__post_init__() to compute derived fields based on the current field values. It can be called multiple times safely.

class bridge.models.transformer_config.HeterogeneousTransformerConfig#

Bases: bridge.models.transformer_config.TransformerConfig, megatron.core.transformer.heterogeneous.heterogeneous_config.HeterogeneousTransformerConfig

Megatron Core HeterogeneousTransformerConfig with deferred post-init.

This class inherits from both our lazy TransformerConfig and Megatron Core’s HeterogeneousTransformerConfig. The MRO ensures that our lazy post-init behavior is preserved while maintaining all heterogeneous functionality.

CRITICAL: The inheritance order matters for the MRO:

1. TransformerConfig (our lazy version) comes first.
2. MCoreHeterogeneousTransformerConfig (Megatron Core’s HeterogeneousTransformerConfig) comes second.
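Why the order of the bases matters can be seen in a toy hierarchy. The class names and the `hook` method below are hypothetical and exist only to show how Python's method resolution order (MRO) picks between two parents that override the same method.

```python
class Base:
    def hook(self) -> str:
        return "eager base"


class Lazy(Base):
    """Plays the role of the lazy wrapper."""
    def hook(self) -> str:
        return "deferred"


class Hetero(Base):
    """Plays the role of the upstream heterogeneous config."""
    def hook(self) -> str:
        return "eager hetero"


class LazyFirst(Lazy, Hetero):
    pass


class HeteroFirst(Hetero, Lazy):
    pass


# The MRO lists the classes in the order attribute lookup visits them.
lazy_first_mro = [c.__name__ for c in LazyFirst.__mro__]
lazy_first_result = LazyFirst().hook()
hetero_first_result = HeteroFirst().hook()
print(lazy_first_mro, lazy_first_result, hetero_first_result)
```

Listing the lazy class first means its override is found first. Swapping the bases would silently restore the eager behavior for any method the subclass does not define itself.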

Usage:

# Create config with deferred post-init
config = HeterogeneousTransformerConfig(
    num_layers=32,
    hidden_size=4096,
    heterogeneous_layers_config_encoded_json=json_string,
)

# Modify fields as needed
config.seq_length = 8192
config.tensor_model_parallel_size = 2

# Finalize to compute derived fields and parse heterogeneous config
config.finalize()

__post_init__() → None#

Skip MCore __post_init__ during initial construction.

The original __post_init__ logic is deferred until finalize() is called. This allows for field modifications after construction without invalidating computed fields.

finalize() → None#

Execute the deferred MCore post-init logic.

This method calls the original Megatron Core HeterogeneousTransformerConfig.__post_init__() to compute derived fields and parse the heterogeneous block configurations. It can be called multiple times safely.

get_config_for_layer(layer_number: int) → megatron.core.transformer.transformer_config.TransformerConfig#

Return a layer-specific TransformerConfig without deep-copying process groups.
