bridge.models.conversion.auto_bridge#
Module Contents#
Classes#
AutoBridge – Automatically select and instantiate the appropriate bridge for a model.
Data#
API#
- bridge.models.conversion.auto_bridge.MegatronModelT#
'TypeVar(…)'
- bridge.models.conversion.auto_bridge.DataclassT#
'TypeVar(…)'
- class bridge.models.conversion.auto_bridge.AutoBridge(
- hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM | transformers.configuration_utils.PretrainedConfig,
Bases: typing.Generic[bridge.models.conversion.auto_bridge.MegatronModelT]
Automatically select and instantiate the appropriate bridge for a model.
This unified bridge class combines automatic model detection with full bridge functionality for converting models between HuggingFace and Megatron formats. It handles the conversion of causal language models (e.g., GPT, Llama, Phi) between HuggingFace’s transformers library format and Megatron-Core’s distributed training format. It manages weight mapping, tensor parallelism distribution, and configuration translation.
The bridge supports both directions of conversion:
- HuggingFace → Megatron: For training or inference with Megatron
- Megatron → HuggingFace: For saving trained models in HF format
- Parameters:
hf_pretrained – Either a PreTrainedCausalLM instance with loaded model, or a PretrainedConfig for configuration-only operations
Example:
# Load and convert a model to Megatron format
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3-8B")
provider = bridge.to_megatron_provider()
megatron_model = provider.provide_distributed_model(wrap_with_ddp=False)

# Export a Megatron model back to HuggingFace format
bridge.save_hf_pretrained(megatron_model, "./exported_model")

# Convert weights with custom settings
for name, weight in bridge.export_hf_weights(
    megatron_model,
    cpu=True,
):
    print(f"Exported {name}: {weight.shape}")

# Check if a model is supported before loading
if AutoBridge.can_handle("microsoft/phi-2"):
    bridge = AutoBridge.from_hf_pretrained("microsoft/phi-2")
Note:
The bridge automatically detects the model architecture and applies the appropriate weight mappings. Custom architectures require implementing a MegatronModelBridge subclass.
Initialization
- classmethod list_supported_models() list[str] #
List all model architectures currently supported by the bridge system.
- Returns:
List of supported HuggingFace model architecture names
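For instance, the returned architecture names can be printed directly; the exact contents depend on the bridge implementations registered in your installation:
for arch in AutoBridge.list_supported_models():
    print(arch)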
- classmethod supports(config: Any) bool #
Check if this bridge supports the given model configuration.
A model is supported if it has at least one architecture ending with 'ForCausalLM'.
- Parameters:
config – HuggingFace model config object
- Returns:
True if this bridge can handle the model, False otherwise
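A minimal configuration-level check; loading the config via transformers' AutoConfig is standard usage, and the model ID is only an example:
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-3-8B")
# "LlamaForCausalLM" ends with "ForCausalLM", so this is expected to print True
print(AutoBridge.supports(config))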
- classmethod from_hf_config(
- config: transformers.configuration_utils.PretrainedConfig,
Create an AutoBridge from a HuggingFace configuration.
This method creates a bridge instance from just a model configuration, without loading any weights. This is useful for:
Creating Megatron models with random initialization
Working with model architectures without downloading weights
Testing and development scenarios
- Parameters:
config – HuggingFace PretrainedConfig instance containing model architecture information
- Returns:
Bridge instance configured for the architecture
- Return type:
AutoBridge
- Raises:
ValueError – If the configuration is not for a supported CausalLM model
Example:
from transformers import AutoConfig

# Load just the configuration
config = AutoConfig.from_pretrained("meta-llama/Llama-3-8B")

# Create bridge from config (no weights)
bridge = AutoBridge.from_hf_config(config)

# Create Megatron model with random initialization
provider = bridge.to_megatron_provider(load_weights=False)
model = provider.provide_distributed_model(wrap_with_ddp=False)

# Or use for architecture exploration
transformer_config = bridge.transformer_config
print(f"Hidden size: {transformer_config.hidden_size}")
print(f"Num layers: {transformer_config.num_layers}")
See also:
from_hf_pretrained: Create bridge with loaded weights
transformer_config: Access the Megatron TransformerConfig
- classmethod from_hf_pretrained(
- path: Union[str, pathlib.Path],
- **kwargs,
Load an AutoBridge from a pretrained model, automatically detecting the model type.
This method loads a model from HuggingFace Hub or a local directory and creates a bridge instance ready for conversion operations. The model architecture is validated to ensure compatibility.
- Parameters:
path – HuggingFace model ID or path to model directory. Examples: "meta-llama/Llama-3-8B", "./my_model"
**kwargs –
Additional arguments passed to HuggingFace from_hf_pretrained. Common options include:
- torch_dtype: Model precision (torch.float16, torch.bfloat16)
- device_map: Device placement strategy ("auto", "cuda:0", etc.)
- trust_remote_code: Allow custom model code execution
- attn_implementation: Attention implementation ("flash_attention_2", etc.)
- Returns:
Bridge instance with loaded model
- Return type:
AutoBridge
- Raises:
ValueError – If the model architecture is not supported
Example:
# Basic loading
bridge = AutoBridge.from_hf_pretrained("gpt2")

# Load with specific settings
bridge = AutoBridge.from_hf_pretrained(
    "meta-llama/Llama-3-8B",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Works with local paths too
bridge = AutoBridge.from_hf_pretrained("/path/to/model")
- classmethod can_handle(
- path: Union[str, pathlib.Path],
- trust_remote_code: bool = False,
Check if the bridge can handle the model at the given path.
This method allows you to verify model compatibility before attempting to load it, which can be useful for validation or UI feedback.
- Parameters:
path – Path to model directory or HuggingFace model ID. Examples: "meta-llama/Llama-3-8B", "/models/my_model"
trust_remote_code – Whether to trust remote code when loading config. Set to True for models that use custom modeling code.
- Returns:
True if the bridge supports the model, False otherwise
- Return type:
bool
Example:
# Check if a model is supported
if AutoBridge.can_handle("meta-llama/Llama-3-8B"):
    print("Model is supported!")
else:
    print("Model requires a custom bridge implementation")
- load_hf_weights(
- model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
- hf_path: str | pathlib.Path | None = None,
Load HuggingFace weights into a Megatron model.
This method handles the conversion and distribution of weights from HuggingFace format to Megatron’s distributed format, including proper tensor parallel and pipeline parallel distribution.
- Parameters:
model – List of Megatron model instances (one per virtual pipeline stage)
hf_path – Optional path to load weights from. If None, uses weights from the bridge’s hf_pretrained instance
- Returns:
The input model with loaded weights
- Raises:
ValueError – If hf_path is None and bridge was created without weights
Example:
# Load weights from bridge's pretrained model
bridge = AutoBridge.from_hf_pretrained("gpt2")
megatron_model = create_megatron_model()  # Your model creation
bridge.load_hf_weights(megatron_model)

# Load weights from a different checkpoint
bridge.load_hf_weights(megatron_model, "./finetuned_model")
- export_hf_weights(
- model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
- cpu: bool = False,
- show_progress: bool = True,
- conversion_tasks: Optional[List[megatron.bridge.models.conversion.model_bridge.WeightConversionTask]] = None,
Export Megatron model weights to HuggingFace format.
This method yields weight tensors in HuggingFace format, handling the gathering of distributed tensors and format conversion. It’s useful for streaming weight export or custom processing. All ranks get full tensors.
- Parameters:
model – Megatron model instance or list of instances
cpu – Whether to move tensors to CPU before yielding
show_progress – Display progress bar during export
conversion_tasks (Optional[List[WeightConversionTask]]) – Pre-built conversion tasks. If not provided, tasks are built automatically from the models. This is an advanced feature and should be used with caution: build the tasks with the get_conversion_tasks method first and adjust them carefully to your needs.
- Yields:
HFWeightTuple – Named tuples of (param_name, weight_tensor)
Example:
# Export and process weights
for name, weight in bridge.export_hf_weights(model):
    print(f"{name}: {weight.shape}")

# Export with specific settings
weights = list(bridge.export_hf_weights(
    model,
    cpu=True,
))
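The advanced conversion_tasks path described in the parameters above can be sketched roughly as follows; how the tasks are inspected or adjusted is application-specific, so the commented placeholder is only illustrative:
# Build the tasks explicitly, optionally adjust them, then pass them back in
tasks = bridge.get_conversion_tasks(model)
# ... inspect or modify tasks here (advanced; use with caution) ...
for name, weight in bridge.export_hf_weights(model, conversion_tasks=tasks):
    print(f"{name}: {weight.shape}")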
- save_hf_pretrained(
- model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
- path: str | pathlib.Path,
- show_progress: bool = True,
Save a Megatron model in HuggingFace format.
This method exports the complete model including configuration, tokenizer, and weights to a directory that can be loaded with HuggingFace’s from_pretrained methods.
- Parameters:
model – Megatron model instance or list of instances
path – Directory path to save the model
show_progress – Display progress bar during weight export
Example:
# Save model after training
bridge.save_hf_pretrained(megatron_model, "./my_finetuned_model")

# Load the saved model with HuggingFace
from transformers import AutoModelForCausalLM
hf_model = AutoModelForCausalLM.from_pretrained("./my_finetuned_model")
Note:
This method is collective - all ranks must call it. Only rank 0 saves the configuration files, while weight saving is coordinated across all ranks.
- save_hf_weights(
- model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
- path: str | pathlib.Path,
- show_progress: bool = True,
Save Megatron model weights in HuggingFace safetensors format.
This method exports only the model weights (not configuration or tokenizer) to safetensors files compatible with HuggingFace. It uses streaming save to handle large models efficiently without requiring all weights in memory at once.
The weights are gathered from distributed ranks and saved in the standard HuggingFace sharded format when the model is large.
- Parameters:
model – Megatron model instance or list of instances
path – Directory path where weight files will be saved
show_progress – Display progress bar during export
- Raises:
ValueError – If the state source doesn’t support streaming save
Example:
# Save just the weights
bridge.save_hf_weights(megatron_model, "./model_weights")

# Save without progress bar (useful in scripts)
bridge.save_hf_weights(megatron_model, "./weights", show_progress=False)
Note:
- This method is collective and must be called by all ranks
- Uses safetensors format for efficient loading and security
- Automatically handles model sharding for large models
- The saved weights can be loaded with HuggingFace's from_pretrained
- save_megatron_model(
- model: list[megatron.core.transformer.module.MegatronModule],
- path: str | pathlib.Path,
Save a Megatron model in native Megatron checkpoint format without optimizer state.
This method saves the model in Megatron's native checkpoint format, which can be loaded directly by Megatron for training or inference. The checkpoint includes the model configuration and weights, but no optimizer state or other artifacts.
- Parameters:
model – Megatron model instance or list of instances
path – Directory path where the checkpoint will be saved
ckpt_format – Checkpoint format to use (“torch_dist” or other supported formats)
Example:
# Save model checkpoint after conversion
bridge.save_megatron_model(megatron_model, "./megatron_checkpoint")
Note:
- This method is collective and must be called by all ranks
- The saved checkpoint can be loaded with Megatron's checkpoint loading utilities
- The checkpoint format follows Megatron's standard structure for compatibility
- load_megatron_model(
- path: str | pathlib.Path,
- **kwargs: typing_extensions.Unpack[megatron.bridge.models.model_provider.GetModelKwargs],
Load a Megatron model from a native Megatron checkpoint.
This method loads a model from a Megatron checkpoint that was saved using the save_megatron_model method. It reads the checkpoint configuration, creates the appropriate model provider, and loads the weights.
- Parameters:
path – Directory path where the Megatron checkpoint is stored
**kwargs – Additional arguments passed to the model provider
- Returns:
List of Megatron model instances loaded from the checkpoint
Example:
# Load a previously saved Megatron model
bridge = AutoBridge.from_hf_config(config)
model = bridge.load_megatron_model("./megatron_checkpoint")

# Load and specify model configuration
model = bridge.load_megatron_model(
    "./megatron_checkpoint",
    wrap_with_ddp=False,
)
Note:
- This method is collective and must be called by all ranks
- The checkpoint must have been saved with save_megatron_model
- The model architecture must match the bridge configuration
- classmethod import_ckpt(
- hf_model_id: str | pathlib.Path,
- megatron_path: str | pathlib.Path,
- **kwargs,
Import a HuggingFace model and save it as a Megatron checkpoint.
This is a convenience method that combines loading a HuggingFace model, converting it to Megatron format, and saving it as a native Megatron checkpoint. This is useful for preparing models for Megatron training or creating Megatron checkpoints from pretrained HuggingFace models.
- Parameters:
hf_model_id – HuggingFace model ID or path to model directory. Examples: "meta-llama/Llama-3-8B", "./my_model"
megatron_path – Directory path where the Megatron checkpoint will be saved
**kwargs –
Additional arguments passed to from_hf_pretrained. Common options include:
- torch_dtype: Model precision (torch.float16, torch.bfloat16)
- device_map: Device placement strategy ("auto", "cuda:0", etc.)
- trust_remote_code: Allow custom model code execution
- attn_implementation: Attention implementation ("flash_attention_2", etc.)
Example:
# Basic import
AutoBridge.import_ckpt(
    "meta-llama/Llama-3-8B",
    "./megatron_checkpoints/llama3_8b",
)

# Import with specific settings
AutoBridge.import_ckpt(
    "meta-llama/Llama-3-8B",
    "./megatron_checkpoints/llama3_8b",
    torch_dtype=torch.float16,
    device_map="auto",
)
- export_ckpt(
- megatron_path: str | pathlib.Path,
- hf_path: str | pathlib.Path,
- show_progress: bool = True,
Export a Megatron checkpoint to HuggingFace format.
This is a convenience method that loads a Megatron checkpoint and exports it to HuggingFace format. This is useful for sharing trained models or deploying them with HuggingFace inference tools.
- Parameters:
megatron_path – Directory path where the Megatron checkpoint is stored
hf_path – Directory path where the HuggingFace model will be saved
show_progress – Display progress bar during weight export
Example:
# Basic export
bridge = AutoBridge.from_hf_config(config)
bridge.export_ckpt(
    "./megatron_checkpoints/my_model",
    "./hf_exports/my_model",
)

# Export with specific settings
bridge.export_ckpt(
    "./megatron_checkpoints/my_model",
    "./hf_exports/my_model",
    show_progress=False,
)

# Load the exported model with HuggingFace
from transformers import AutoModelForCausalLM
hf_model = AutoModelForCausalLM.from_pretrained("./hf_exports/my_model")
- push_to_hub(path: str | pathlib.Path) None #
- to_megatron_model(
- load_weights: bool = True,
- hf_path: str | pathlib.Path | None = None,
- **kwargs: typing_extensions.Unpack[megatron.bridge.models.model_provider.GetModelKwargs],
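No description is rendered for this method. The call below simply mirrors the usage that appears later in the get_conversion_tasks example; reading it as a one-step alternative to to_megatron_provider (build the provider and instantiate the model in a single call) is an assumption:
# Build a Megatron model directly from the bridge (usage mirrored from the example further down)
megatron_model = bridge.to_megatron_model(load_weights=False, wrap_with_ddp=False)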
- to_megatron_provider(
- load_weights: bool = True,
- hf_path: str | pathlib.Path | None = None,
Convert to a Megatron model provider.
This method creates a GPTModelProvider configured to match the HuggingFace model’s architecture. The provider can then be used to instantiate Megatron models for training or inference.
- Parameters:
load_weights – Whether to configure the provider to load weights from HuggingFace format. If False, creates model with random initialization.
hf_path – Optional path to load weights from. If None, uses weights from the bridge’s hf_pretrained instance. Useful for loading weights from a different checkpoint.
- Returns:
A configured model provider ready to create Megatron models
- Return type:
GPTModelProvider
Example:
# Create provider and model with loaded weights
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3-8B")
provider = bridge.to_megatron_provider()
model = provider.get_model()

# Create provider without loading weights (for training from scratch)
provider = bridge.to_megatron_provider(load_weights=False)
model = provider.get_model()  # Random initialization

# Load weights from a different checkpoint
bridge = AutoBridge.from_hf_config(config)  # Config only
provider = bridge.to_megatron_provider(hf_path="./finetuned_model")
model = provider.get_model()  # Loads finetuned weights
See also:
GPTModelProvider: The provider class for creating models
load_weights: Method to load weights into existing models
- get_conversion_tasks(
- megatron_model: Union[bridge.models.conversion.auto_bridge.MegatronModelT, List[bridge.models.conversion.auto_bridge.MegatronModelT]],
- hf_path: str | pathlib.Path | None = None,
Get the conversion tasks for weight conversion between HuggingFace and Megatron formats.
This method returns the planned conversion tasks that would be executed during weight conversion in either direction. Each task contains information about parameter mappings, source and target parameters, and the conversion logic required.
The tasks can be used for both HF→Megatron and Megatron→HF conversions since they contain bidirectional mapping information.
- Parameters:
megatron_model – Megatron model instance or list of instances (one per virtual pipeline stage) that participate in the conversion.
hf_path – Optional path to load HF weights from. If None, uses weights from the bridge’s hf_pretrained instance.
- Returns:
List of conversion tasks that would be executed. Each task contains:
- param_name: Megatron parameter name
- mapping: The parameter mapping object handling the conversion
- pp_rank: Pipeline parallel rank that owns the parameter
- vp_stage: Virtual pipeline stage index
- megatron_module: Reference to the Megatron module owning the parameter
- param_weight: The actual parameter tensor
- Return type:
List[WeightConversionTask]
Example:
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B")
megatron_model = bridge.to_megatron_model(load_weights=False, wrap_with_ddp=False)
tasks = bridge.get_conversion_tasks(megatron_model)

for task in tasks:
    # For HF→Megatron direction
    print(f"HF param {task.mapping.hf_param} -> Megatron param {task.param_name}")

    # For Megatron→HF direction
    hf_params = task.mapping.hf_param
    if isinstance(hf_params, str):
        print(f"Megatron param {task.param_name} -> HF param {hf_params}")
    else:
        print(f"Megatron param {task.param_name} -> HF params {list(hf_params.values())}")

    print(f"  Mapping type: {type(task.mapping).__name__}")
    print(f"  PP rank: {task.pp_rank}, VP stage: {task.vp_stage}")
Note:
This method is useful for:
- Debugging weight conversion issues in both directions
- Understanding parameter mappings between formats
- Custom weight conversion implementations
- Analyzing model structure differences
- Verifying parameter alignment and shapes
- property transformer_config: megatron.core.transformer.transformer_config.TransformerConfig#
- property mla_transformer_config: megatron.core.transformer.transformer_config.MLATransformerConfig#
- property _model_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge#
- _get_causal_lm_architecture()#
Get the CausalLM architecture class from the HuggingFace model.
- Returns:
The transformers class for the CausalLM architecture
- Raises:
ValueError – If no CausalLM architecture is found or if the class cannot be imported
- classmethod _validate_config(
- config: transformers.configuration_utils.PretrainedConfig,
- path: str | None = None,
- _get_model_instance(
- model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
- _create_config_from_provider(
- source_obj: Any,
- target_dataclass: Type[bridge.models.conversion.auto_bridge.DataclassT],
- __repr__() str #