bridge.models.conversion.auto_bridge#

Module Contents#

Classes#

AutoBridge

Automatically select and instantiate the appropriate bridge for a model.

Data#

API#

bridge.models.conversion.auto_bridge.MegatronModelT#

'TypeVar(...)'

bridge.models.conversion.auto_bridge.DataclassT#

'TypeVar(...)'

class bridge.models.conversion.auto_bridge.AutoBridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM | transformers.configuration_utils.PretrainedConfig,
)#

Bases: typing.Generic[bridge.models.conversion.auto_bridge.MegatronModelT]

Automatically select and instantiate the appropriate bridge for a model.

This unified bridge class combines automatic model detection with full bridge functionality for converting models between HuggingFace and Megatron formats. It handles the conversion of causal language models (e.g., GPT, Llama, Phi) between HuggingFace’s transformers library format and Megatron-Core’s distributed training format. It manages weight mapping, tensor parallelism distribution, and configuration translation.

The bridge supports both directions of conversion:

  • HuggingFace → Megatron: For training or inference with Megatron

  • Megatron → HuggingFace: For saving trained models in HF format

Parameters:

hf_pretrained – Either a PreTrainedCausalLM instance with loaded model, or a PretrainedConfig for configuration-only operations

.. rubric:: Example

Load and convert a model to Megatron format

bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3-8B")
provider = bridge.to_megatron_provider()
megatron_model = provider.provide_distributed_model(wrap_with_ddp=False)

Export a Megatron model back to HuggingFace format

bridge.save_hf_pretrained(megatron_model, "./exported_model")

Convert weights with custom settings

for name, weight in bridge.export_hf_weights(megatron_model, cpu=True):
    print(f"Exported {name}: {weight.shape}")

Check if a model is supported before loading

if AutoBridge.can_handle("microsoft/phi-2"):
    bridge = AutoBridge.from_hf_pretrained("microsoft/phi-2")

.. note::

The bridge automatically detects the model architecture and applies the appropriate weight mappings. Custom architectures require implementing a MegatronModelBridge subclass.

Initialization

classmethod list_supported_models() list[str]#

List all model architectures currently supported by the bridge system.

Returns:

List of supported HuggingFace model architecture names
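
A minimal usage sketch (the printed architecture name in the comment is illustrative):

supported = AutoBridge.list_supported_models()
for arch in supported:
    print(arch)  # e.g. "LlamaForCausalLM"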

classmethod supports(config: Any) bool#

Check if this bridge supports the given model configuration.

A model is supported if it has at least one architecture ending with ‘ForCausalLM’.

Parameters:

config – HuggingFace model config object

Returns:

True if this bridge can handle the model, False otherwise
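
A minimal sketch of checking support from a configuration before building a bridge (AutoConfig is the standard HuggingFace config loader; the model ID is illustrative):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-3-8B")
# Per the description above, any architecture name ending in "ForCausalLM" qualifies.
if AutoBridge.supports(config):
    bridge = AutoBridge.from_hf_config(config)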

classmethod from_hf_config(
config: transformers.configuration_utils.PretrainedConfig,
) bridge.models.conversion.auto_bridge.AutoBridge#

Create an AutoBridge from a HuggingFace configuration.

This method creates a bridge instance from just a model configuration, without loading any weights. This is useful for:

  • Creating Megatron models with random initialization

  • Working with model architectures without downloading weights

  • Testing and development scenarios

Parameters:

config – HuggingFace PretrainedConfig instance containing model architecture information

Returns:

Bridge instance configured for the architecture

Return type:

AutoBridge

Raises:

ValueError – If the configuration is not for a supported CausalLM model

.. rubric:: Example

from transformers import AutoConfig

Load just the configuration

config = AutoConfig.from_pretrained("meta-llama/Llama-3-8B")

Create bridge from config (no weights)

bridge = AutoBridge.from_hf_config(config)

Create Megatron model with random initialization

provider = bridge.to_megatron_provider(load_weights=False)
model = provider.provide_distributed_model(wrap_with_ddp=False)

Or use for architecture exploration

transformer_config = bridge.transformer_config
print(f"Hidden size: {transformer_config.hidden_size}")
print(f"Num layers: {transformer_config.num_layers}")

.. seealso::

from_hf_pretrained: Create bridge with loaded weights
transformer_config: Access the Megatron TransformerConfig

classmethod from_hf_pretrained(
path: Union[str, pathlib.Path],
**kwargs,
) bridge.models.conversion.auto_bridge.AutoBridge#

Load an AutoBridge from a pretrained model, automatically detecting the model type.

This method loads a model from HuggingFace Hub or a local directory and creates a bridge instance ready for conversion operations. The model architecture is validated to ensure compatibility.

Parameters:
  • path – HuggingFace model ID or path to a model directory. Examples: "meta-llama/Llama-3-8B", "./my_model"

  • **kwargs

    Additional arguments passed to HuggingFace from_hf_pretrained. Common options include:

    • torch_dtype: Model precision (torch.float16, torch.bfloat16)

    • device_map: Device placement strategy (“auto”, “cuda:0”, etc.)

    • trust_remote_code: Allow custom model code execution

    • attn_implementation: Attention implementation (“flash_attention_2”, etc.)

Returns:

Bridge instance with loaded model

Return type:

AutoBridge

Raises:

ValueError – If the model architecture is not supported

.. rubric:: Example

Basic loading

bridge = AutoBridge.from_hf_pretrained("gpt2")

Load with specific settings

bridge = AutoBridge.from_hf_pretrained(
    "meta-llama/Llama-3-8B",
    torch_dtype=torch.float16,
    device_map="auto",
)

Works with local paths too

bridge = AutoBridge.from_hf_pretrained("/path/to/model")

classmethod can_handle(
path: Union[str, pathlib.Path],
trust_remote_code: bool = False,
) bool#

Check if the bridge can handle the model at the given path.

This method allows you to verify model compatibility before attempting to load it, which can be useful for validation or UI feedback.

Parameters:
  • path – Path to a model directory or HuggingFace model ID. Examples: "meta-llama/Llama-3-8B", "/models/my_model"

  • trust_remote_code – Whether to trust remote code when loading config. Set to True for models that use custom modeling code.

Returns:

True if the bridge supports the model, False otherwise

Return type:

bool

.. rubric:: Example

Check if a model is supported

if AutoBridge.can_handle("meta-llama/Llama-3-8B"):
    print("Model is supported!")
else:
    print("Model requires a custom bridge implementation")

load_hf_weights(
model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
hf_path: str | pathlib.Path | None = None,
) None#

Load HuggingFace weights into a Megatron model.

This method handles the conversion and distribution of weights from HuggingFace format to Megatron’s distributed format, including proper tensor parallel and pipeline parallel distribution.

Parameters:
  • model – List of Megatron model instances (one per virtual pipeline stage)

  • hf_path – Optional path to load weights from. If None, uses weights from the bridge’s hf_pretrained instance

Returns:

The input model with loaded weights

Raises:

ValueError – If hf_path is None and bridge was created without weights

.. rubric:: Example

Load weights from bridge’s pretrained model

bridge = AutoBridge.from_hf_pretrained("gpt2")
megatron_model = create_megatron_model()  # Your model creation
bridge.load_hf_weights(megatron_model)

Load weights from a different checkpoint

bridge.load_hf_weights(megatron_model, "./finetuned_model")

export_hf_weights(
model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
cpu: bool = False,
show_progress: bool = True,
conversion_tasks: Optional[List[megatron.bridge.models.conversion.model_bridge.WeightConversionTask]] = None,
) Iterable[megatron.bridge.models.conversion.model_bridge.HFWeightTuple]#

Export Megatron model weights to HuggingFace format.

This method yields weight tensors in HuggingFace format, handling the gathering of distributed tensors and format conversion. It’s useful for streaming weight export or custom processing. All ranks get full tensors.

Parameters:
  • model – Megatron model instance or list of instances

  • cpu – Whether to move tensors to CPU before yielding

  • show_progress – Display progress bar during export

  • conversion_tasks (Optional[List[WeightConversionTask]]) – Pre-built conversion tasks. If not provided, tasks are built automatically from the models. This is an advanced feature and should be used with caution: the tasks need to be built with the get_conversion_tasks method first and carefully adjusted to your needs.

Yields:

HFWeightTuple – Named tuples of (param_name, weight_tensor)

.. rubric:: Example

Export and process weights

for name, weight in bridge.export_hf_weights(model):
    print(f"{name}: {weight.shape}")

Export with specific settings

weights = list(bridge.export_hf_weights(model, cpu=True))

save_hf_pretrained(
model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
path: str | pathlib.Path,
show_progress: bool = True,
) None#

Save a Megatron model in HuggingFace format.

This method exports the complete model including configuration, tokenizer, and weights to a directory that can be loaded with HuggingFace’s from_pretrained methods.

Parameters:
  • model – Megatron model instance or list of instances

  • path – Directory path to save the model

  • show_progress – Display progress bar during weight export

.. rubric:: Example

Save model after training

bridge.save_hf_pretrained(megatron_model, "./my_finetuned_model")

Load the saved model with HuggingFace

from transformers import AutoModelForCausalLM
hf_model = AutoModelForCausalLM.from_pretrained("./my_finetuned_model")

.. note::

This method is collective - all ranks must call it. Only rank 0 saves the configuration files, while weight saving is coordinated across all ranks.

save_hf_weights(
model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
path: str | pathlib.Path,
show_progress: bool = True,
) None#

Save Megatron model weights in HuggingFace safetensors format.

This method exports only the model weights (not configuration or tokenizer) to safetensors files compatible with HuggingFace. It uses streaming save to handle large models efficiently without requiring all weights in memory at once.

The weights are gathered from distributed ranks and saved in the standard HuggingFace sharded format when the model is large.

Parameters:
  • model – Megatron model instance or list of instances

  • path – Directory path where weight files will be saved

  • show_progress – Display progress bar during export

Raises:

ValueError – If the state source doesn’t support streaming save

.. rubric:: Example

Save just the weights

bridge.save_hf_weights(megatron_model, "./model_weights")

Save without progress bar (useful in scripts)

bridge.save_hf_weights(megatron_model, "./weights", show_progress=False)

.. note::

  • This method is collective and must be called by all ranks

  • Uses safetensors format for efficient loading and security

  • Automatically handles model sharding for large models

  • The saved weights can be loaded with HuggingFace’s from_pretrained

save_megatron_model(
model: list[megatron.core.transformer.module.MegatronModule],
path: str | pathlib.Path,
) None#

Save a Megatron model in native Megatron checkpoint format without optimizer state.

This method saves the model in Megatron's native checkpoint format, which can be loaded directly by Megatron for training or inference. The checkpoint includes the model configuration and weights, but no optimizer state or other training artifacts.

Parameters:
  • model – Megatron model instance or list of instances

  • path – Directory path where the checkpoint will be saved

  • ckpt_format – Checkpoint format to use (“torch_dist” or other supported formats)

.. rubric:: Example

Save model checkpoint after conversion

bridge.save_megatron_model(megatron_model, "./megatron_checkpoint")

.. note::

  • This method is collective and must be called by all ranks

  • The saved checkpoint can be loaded with Megatron’s checkpoint loading utilities

  • The checkpoint format follows Megatron’s standard structure for compatibility

load_megatron_model(
path: str | pathlib.Path,
**kwargs: typing_extensions.Unpack[megatron.bridge.models.model_provider.GetModelKwargs],
) list[bridge.models.conversion.auto_bridge.MegatronModelT]#

Load a Megatron model from a native Megatron checkpoint.

This method loads a model from a Megatron checkpoint that was saved using the save_megatron_model method. It reads the checkpoint configuration, creates the appropriate model provider, and loads the weights.

Parameters:
  • path – Directory path where the Megatron checkpoint is stored

  • **kwargs – Additional arguments passed to the model provider

Returns:

List of Megatron model instances loaded from the checkpoint

.. rubric:: Example

Load a previously saved Megatron model

bridge = AutoBridge.from_hf_config(config)
model = bridge.load_megatron_model("./megatron_checkpoint")

Load and specify model configuration

model = bridge.load_megatron_model(
    "./megatron_checkpoint",
    wrap_with_ddp=False,
)

.. note::

  • This method is collective and must be called by all ranks

  • The checkpoint must have been saved with save_megatron_model

  • The model architecture must match the bridge configuration

classmethod import_ckpt(
hf_model_id: str | pathlib.Path,
megatron_path: str | pathlib.Path,
**kwargs,
) None#

Import a HuggingFace model and save it as a Megatron checkpoint.

This is a convenience method that combines loading a HuggingFace model, converting it to Megatron format, and saving it as a native Megatron checkpoint. This is useful for preparing models for Megatron training or creating Megatron checkpoints from pretrained HuggingFace models.

Parameters:
  • hf_model_id – HuggingFace model ID or path to a model directory. Examples: "meta-llama/Llama-3-8B", "./my_model"

  • megatron_path – Directory path where the Megatron checkpoint will be saved

  • **kwargs

    Additional arguments passed to from_hf_pretrained. Common options include:

    • torch_dtype: Model precision (torch.float16, torch.bfloat16)

    • device_map: Device placement strategy (“auto”, “cuda:0”, etc.)

    • trust_remote_code: Allow custom model code execution

    • attn_implementation: Attention implementation (“flash_attention_2”, etc.)

.. rubric:: Example

Basic import

AutoBridge.import_ckpt(
    "meta-llama/Llama-3-8B",
    "./megatron_checkpoints/llama3_8b",
)

Import with specific settings

AutoBridge.import_ckpt(
    "meta-llama/Llama-3-8B",
    "./megatron_checkpoints/llama3_8b",
    torch_dtype=torch.float16,
    device_map="auto",
)

export_ckpt(
megatron_path: str | pathlib.Path,
hf_path: str | pathlib.Path,
show_progress: bool = True,
) None#

Export a Megatron checkpoint to HuggingFace format.

This is a convenience method that loads a Megatron checkpoint and exports it to HuggingFace format. This is useful for sharing trained models or deploying them with HuggingFace inference tools.

Parameters:
  • megatron_path – Directory path where the Megatron checkpoint is stored

  • hf_path – Directory path where the HuggingFace model will be saved

  • show_progress – Display progress bar during weight export

.. rubric:: Example

Basic export

bridge = AutoBridge.from_hf_config(config)
bridge.export_ckpt(
    "./megatron_checkpoints/my_model",
    "./hf_exports/my_model",
)

Export with specific settings

bridge.export_ckpt(
    "./megatron_checkpoints/my_model",
    "./hf_exports/my_model",
    show_progress=False,
)

Load the exported model with HuggingFace

from transformers import AutoModelForCausalLM
hf_model = AutoModelForCausalLM.from_pretrained("./hf_exports/my_model")

push_to_hub(path: str | pathlib.Path) None#
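
push_to_hub is not documented here; a minimal sketch, assuming path names the destination Hugging Face Hub repository:

# Hypothetical repo ID; the exact semantics of `path` are not documented on this page.
bridge.push_to_hub("my-org/my-exported-model")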
to_megatron_model(
load_weights: bool = True,
hf_path: str | pathlib.Path | None = None,
**kwargs: typing_extensions.Unpack[megatron.bridge.models.model_provider.GetModelKwargs],
) list[bridge.models.conversion.auto_bridge.MegatronModelT]#
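
to_megatron_model carries no docstring here; based on its signature and its use in the get_conversion_tasks example below, a minimal sketch (wrap_with_ddp is assumed to be one of the GetModelKwargs options):

# Create Megatron model instances in one call; load_weights=True also loads the HF weights.
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3-8B")
megatron_model = bridge.to_megatron_model(load_weights=True, wrap_with_ddp=False)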
to_megatron_provider(
load_weights: bool = True,
hf_path: str | pathlib.Path | None = None,
) megatron.bridge.models.gpt_provider.GPTModelProvider#

Convert to a Megatron model provider.

This method creates a GPTModelProvider configured to match the HuggingFace model’s architecture. The provider can then be used to instantiate Megatron models for training or inference.

Parameters:
  • load_weights – Whether to configure the provider to load weights from HuggingFace format. If False, creates model with random initialization.

  • hf_path – Optional path to load weights from. If None, uses weights from the bridge’s hf_pretrained instance. Useful for loading weights from a different checkpoint.

Returns:

A configured model provider ready to create Megatron models

Return type:

GPTModelProvider

.. rubric:: Example

Create provider and model with loaded weights

bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3-8B")
provider = bridge.to_megatron_provider()
model = provider.get_model()

Create provider without loading weights (for training from scratch)

provider = bridge.to_megatron_provider(load_weights=False)
model = provider.get_model()  # Random initialization

Load weights from a different checkpoint

bridge = AutoBridge.from_hf_config(config)  # Config only
provider = bridge.to_megatron_provider(hf_path="./finetuned_model")
model = provider.get_model()  # Loads finetuned weights

.. seealso::

GPTModelProvider: The provider class for creating models
load_weights: Method to load weights into existing models

get_conversion_tasks(
megatron_model: Union[bridge.models.conversion.auto_bridge.MegatronModelT, List[bridge.models.conversion.auto_bridge.MegatronModelT]],
hf_path: str | pathlib.Path | None = None,
) List[megatron.bridge.models.conversion.model_bridge.WeightConversionTask]#

Get the conversion tasks for weight conversion between HuggingFace and Megatron formats.

This method returns the planned conversion tasks that would be executed during weight conversion in either direction. Each task contains information about parameter mappings, source and target parameters, and the conversion logic required.

The tasks can be used for both HF→Megatron and Megatron→HF conversions since they contain bidirectional mapping information.

Parameters:
  • megatron_model – Megatron model instance or list of instances (one per virtual pipeline stage) that participate in the conversion.

  • hf_path – Optional path to load HF weights from. If None, uses weights from the bridge’s hf_pretrained instance.

Returns:

List of conversion tasks that would be executed. Each task contains:

  • param_name: Megatron parameter name

  • mapping: The parameter mapping object handling the conversion

  • pp_rank: Pipeline parallel rank that owns the parameter

  • vp_stage: Virtual pipeline stage index

  • megatron_module: Reference to the Megatron module owning the parameter

  • param_weight: The actual parameter tensor

Return type:

List[WeightConversionTask]

.. rubric:: Example

bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B")
megatron_model = bridge.to_megatron_model(load_weights=False, wrap_with_ddp=False)
tasks = bridge.get_conversion_tasks(megatron_model)

for task in tasks:
    # For HF→Megatron direction
    print(f"HF param {task.mapping.hf_param} -> Megatron param {task.param_name}")

    # For Megatron→HF direction
    hf_params = task.mapping.hf_param
    if isinstance(hf_params, str):
        print(f"Megatron param {task.param_name} -> HF param {hf_params}")
    else:
        print(f"Megatron param {task.param_name} -> HF params {list(hf_params.values())}")

    print(f"  Mapping type: {type(task.mapping).__name__}")
    print(f"  PP rank: {task.pp_rank}, VP stage: {task.vp_stage}")

.. note::

This method is useful for:

  • Debugging weight conversion issues in both directions

  • Understanding parameter mappings between formats

  • Custom weight conversion implementations

  • Analyzing model structure differences

  • Verifying parameter alignment and shapes

property transformer_config: megatron.core.transformer.transformer_config.TransformerConfig#
property mla_transformer_config: megatron.core.transformer.transformer_config.MLATransformerConfig#
property _model_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge#
_get_causal_lm_architecture()#

Get the CausalLM architecture class from the HuggingFace model.

Returns:

The transformers class for the CausalLM architecture

Raises:

ValueError – If no CausalLM architecture is found or if the class cannot be imported

classmethod _validate_config(
config: transformers.configuration_utils.PretrainedConfig,
path: str | None = None,
) None#
_get_model_instance(
model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
) bridge.models.conversion.auto_bridge.MegatronModelT#
_create_config_from_provider(
source_obj: Any,
target_dataclass: Type[bridge.models.conversion.auto_bridge.DataclassT],
) bridge.models.conversion.auto_bridge.DataclassT#
__repr__() str#