bridge.models.conversion.auto_bridge#

Module Contents#

Classes#

AutoBridge

Automatically select and instantiate the appropriate bridge for a model.

Data#

API#

bridge.models.conversion.auto_bridge.MegatronModelT#

'TypeVar(...)'

bridge.models.conversion.auto_bridge.DataclassT#

'TypeVar(...)'

class bridge.models.conversion.auto_bridge.AutoBridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM | transformers.configuration_utils.PretrainedConfig,
)#

Bases: typing.Generic[bridge.models.conversion.auto_bridge.MegatronModelT]

Automatically select and instantiate the appropriate bridge for a model.

This unified bridge class combines automatic model detection with full bridge functionality for converting models between HuggingFace and Megatron formats. It handles the conversion of causal language models (e.g., GPT, Llama, Phi) between HuggingFace’s transformers library format and Megatron-Core’s distributed training format. It manages weight mapping, tensor parallelism distribution, and configuration translation.

The bridge supports both directions of conversion:

  • HuggingFace → Megatron: For training or inference with Megatron

  • Megatron → HuggingFace: For saving trained models in HF format

Parameters:

hf_pretrained – Either a PreTrainedCausalLM instance with loaded model, or a PretrainedConfig for configuration-only operations

.. rubric:: Example

Load and convert a model to Megatron format

bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3-8B")
provider = bridge.to_megatron_provider()
megatron_model = provider.provide_distributed_model(wrap_with_ddp=False)

Export a Megatron model back to HuggingFace format

bridge.save_hf_pretrained(megatron_model, "./exported_model")

Convert weights with custom settings

for name, weight in bridge.export_hf_weights(megatron_model, cpu=True):
    print(f"Exported {name}: {weight.shape}")

Check if a model is supported before loading

if AutoBridge.can_handle("microsoft/phi-2"):
    bridge = AutoBridge.from_hf_pretrained("microsoft/phi-2")

.. note::

The bridge automatically detects the model architecture and applies the appropriate weight mappings. Custom architectures require implementing a MegatronModelBridge subclass.

Initialization

classmethod list_supported_models() list[str]#

List all model architectures currently supported by the bridge system.

Returns:

List of supported HuggingFace model architecture names
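
A minimal usage sketch (the printed architecture name in the comment is illustrative):

supported = AutoBridge.list_supported_models()
for arch in supported:
    print(arch)  # e.g. "LlamaForCausalLM"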

classmethod supports(config: Any) bool#

Check if this bridge supports the given model configuration.

A model is supported if it has at least one architecture ending with ‘ForCausalLM’.

Parameters:

config – HuggingFace model config object

Returns:

True if this bridge can handle the model, False otherwise
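
A minimal sketch of checking support from a configuration before building a bridge (AutoConfig is the standard HuggingFace config loader; the model ID is illustrative):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-3-8B")
# Per the description above, any architecture name ending in "ForCausalLM" qualifies.
if AutoBridge.supports(config):
    bridge = AutoBridge.from_hf_config(config)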

classmethod from_hf_config(
config: transformers.configuration_utils.PretrainedConfig,
) bridge.models.conversion.auto_bridge.AutoBridge#

Create an AutoBridge from a HuggingFace configuration.

This method creates a bridge instance from just a model configuration, without loading any weights. This is useful for:

  • Creating Megatron models with random initialization

  • Working with model architectures without downloading weights

  • Testing and development scenarios

Parameters:

config – HuggingFace PretrainedConfig instance containing model architecture information

Returns:

Bridge instance configured for the architecture

Return type:

AutoBridge

Raises:

ValueError – If the configuration is not for a supported CausalLM model

.. rubric:: Example

from transformers import AutoConfig

Load just the configuration

config = AutoConfig.from_pretrained("meta-llama/Llama-3-8B")

Create bridge from config (no weights)

bridge = AutoBridge.from_hf_config(config)

Create Megatron model with random initialization

provider = bridge.to_megatron_provider(load_weights=False)
model = provider.provide_distributed_model(wrap_with_ddp=False)

Or use for architecture exploration

transformer_config = bridge.transformer_config
print(f"Hidden size: {transformer_config.hidden_size}")
print(f"Num layers: {transformer_config.num_layers}")

.. seealso::

from_hf_pretrained: Create bridge with loaded weights
transformer_config: Access the Megatron TransformerConfig

classmethod from_hf_pretrained(
path: Union[str, pathlib.Path],
**kwargs,
) bridge.models.conversion.auto_bridge.AutoBridge#

Load an AutoBridge from a pretrained model, automatically detecting the model type.

This method loads a model from HuggingFace Hub or a local directory and creates a bridge instance ready for conversion operations. The model architecture is validated to ensure compatibility.

Parameters:
  • path – HuggingFace model ID or path to a model directory. Examples: "meta-llama/Llama-3-8B", "./my_model"

  • **kwargs

    Additional arguments passed to HuggingFace from_hf_pretrained. Common options include:

    • torch_dtype: Model precision (torch.float16, torch.bfloat16)

    • device_map: Device placement strategy (“auto”, “cuda:0”, etc.)

    • trust_remote_code: Allow custom model code execution

    • attn_implementation: Attention implementation (“flash_attention_2”, etc.)

Returns:

Bridge instance with loaded model

Return type:

AutoBridge

Raises:

ValueError – If the model architecture is not supported

.. rubric:: Example

Basic loading

bridge = AutoBridge.from_hf_pretrained("gpt2")

Load with specific settings

bridge = AutoBridge.from_hf_pretrained(
    "meta-llama/Llama-3-8B",
    torch_dtype=torch.float16,
    device_map="auto",
)

Works with local paths too

bridge = AutoBridge.from_hf_pretrained("/path/to/model")

classmethod can_handle(
path: Union[str, pathlib.Path],
trust_remote_code: bool = False,
) bool#

Check if the bridge can handle the model at the given path.

This method allows you to verify model compatibility before attempting to load it, which can be useful for validation or UI feedback.

Parameters:
  • path – Path to a model directory or HuggingFace model ID. Examples: "meta-llama/Llama-3-8B", "/models/my_model"

  • trust_remote_code – Whether to trust remote code when loading config. Set to True for models that use custom modeling code.

Returns:

True if the bridge supports the model, False otherwise

Return type:

bool

.. rubric:: Example

Check if a model is supported

if AutoBridge.can_handle("meta-llama/Llama-3-8B"):
    print("Model is supported!")
else:
    print("Model requires a custom bridge implementation")

load_hf_weights(
model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
hf_path: str | pathlib.Path | None = None,
) None#

Load HuggingFace weights into a Megatron model.

This method handles the conversion and distribution of weights from HuggingFace format to Megatron’s distributed format, including proper tensor parallel and pipeline parallel distribution.

Parameters:
  • model – List of Megatron model instances (one per virtual pipeline stage)

  • hf_path – Optional path to load weights from. If None, uses weights from the bridge’s hf_pretrained instance

Returns:

The input model with loaded weights

Raises:

ValueError – If hf_path is None and bridge was created without weights

.. rubric:: Example

Load weights from bridge’s pretrained model

bridge = AutoBridge.from_hf_pretrained("gpt2")
megatron_model = create_megatron_model()  # Your model creation
bridge.load_hf_weights(megatron_model)

Load weights from a different checkpoint

bridge.load_hf_weights(megatron_model, "./finetuned_model")

export_hf_weights(
model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
cpu: bool = False,
show_progress: bool = True,
conversion_tasks: Optional[List[megatron.bridge.models.conversion.model_bridge.WeightConversionTask]] = None,
) Iterable[megatron.bridge.models.conversion.model_bridge.HFWeightTuple]#

Export Megatron model weights to HuggingFace format.

This method yields weight tensors in HuggingFace format, handling the gathering of distributed tensors and format conversion. It’s useful for streaming weight export or custom processing. All ranks get full tensors.

Parameters:
  • model – Megatron model instance or list of instances

  • cpu – Whether to move tensors to CPU before yielding

  • show_progress – Display progress bar during export

  • conversion_tasks (Optional[List[WeightConversionTask]]) – Pre-built conversion tasks. If not provided, tasks are built automatically from the models. This is an advanced feature and should be used with caution: the tasks need to be built with the get_conversion_tasks method first and carefully adjusted to your needs.

Yields:

HFWeightTuple – Named tuples of (param_name, weight_tensor)

.. rubric:: Example

Export and process weights

for name, weight in bridge.export_hf_weights(model):
    print(f"{name}: {weight.shape}")

Export with specific settings

weights = list(bridge.export_hf_weights(model, cpu=True))

save_hf_pretrained(
model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
path: str | pathlib.Path,
show_progress: bool = True,
) None#

Save a Megatron model in HuggingFace format.

This method exports the complete model including configuration, tokenizer, and weights to a directory that can be loaded with HuggingFace’s from_pretrained methods.

Parameters:
  • model – Megatron model instance or list of instances

  • path – Directory path to save the model

  • show_progress – Display progress bar during weight export

.. rubric:: Example

Save model after training

bridge.save_hf_pretrained(megatron_model, "./my_finetuned_model")

Load the saved model with HuggingFace

from transformers import AutoModelForCausalLM
hf_model = AutoModelForCausalLM.from_pretrained("./my_finetuned_model")

.. note::

This method is collective - all ranks must call it. Only rank 0 saves the configuration files, while weight saving is coordinated across all ranks.

save_hf_weights(
model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
path: str | pathlib.Path,
show_progress: bool = True,
) None#

Save Megatron model weights in HuggingFace safetensors format.

This method exports only the model weights (not configuration or tokenizer) to safetensors files compatible with HuggingFace. It uses streaming save to handle large models efficiently without requiring all weights in memory at once.

The weights are gathered from distributed ranks and saved in the standard HuggingFace sharded format when the model is large.

Parameters:
  • model – Megatron model instance or list of instances

  • path – Directory path where weight files will be saved

  • show_progress – Display progress bar during export

Raises:

ValueError – If the state source doesn’t support streaming save

.. rubric:: Example

Save just the weights

bridge.save_hf_weights(megatron_model, "./model_weights")

Save without progress bar (useful in scripts)

bridge.save_hf_weights(megatron_model, "./weights", show_progress=False)

.. note::

  • This method is collective and must be called by all ranks

  • Uses safetensors format for efficient loading and security

  • Automatically handles model sharding for large models

  • The saved weights can be loaded with HuggingFace’s from_pretrained

save_megatron_model(
model: list[megatron.core.transformer.module.MegatronModule],
path: str | pathlib.Path,
) None#

Save a Megatron model in native Megatron checkpoint format without optimizer state.

This method saves the model in Megatron's native checkpoint format, which can be loaded directly by Megatron for training or inference. The checkpoint includes the model configuration and weights, but no optimizer state or other training artifacts.

Parameters:
  • model – Megatron model instance or list of instances

  • path – Directory path where the checkpoint will be saved

  • ckpt_format – Checkpoint format to use (“torch_dist” or other supported formats)

.. rubric:: Example

Save model checkpoint after conversion

bridge.save_megatron_model(megatron_model, "./megatron_checkpoint")

.. note::

  • This method is collective and must be called by all ranks

  • The saved checkpoint can be loaded with Megatron’s checkpoint loading utilities

  • The checkpoint format follows Megatron’s standard structure for compatibility

load_megatron_model(
path: str | pathlib.Path,
**kwargs: typing_extensions.Unpack[megatron.bridge.models.model_provider.GetModelKwargs],
) list[bridge.models.conversion.auto_bridge.MegatronModelT]#

Load a Megatron model from a native Megatron checkpoint.

This method loads a model from a Megatron checkpoint that was saved using the save_megatron_model method. It reads the checkpoint configuration, creates the appropriate model provider, and loads the weights.

Parameters:
  • path – Directory path where the Megatron checkpoint is stored

  • **kwargs – Additional arguments passed to the model provider

Returns:

List of Megatron model instances loaded from the checkpoint

.. rubric:: Example

Load a previously saved Megatron model

bridge = AutoBridge.from_hf_config(config)
model = bridge.load_megatron_model("./megatron_checkpoint")

Load and specify model configuration

model = bridge.load_megatron_model(
    "./megatron_checkpoint",
    wrap_with_ddp=False,
)

.. note::

  • This method is collective and must be called by all ranks

  • The checkpoint must have been saved with save_megatron_model

  • The model architecture must match the bridge configuration

classmethod import_ckpt(
hf_model_id: str | pathlib.Path,
megatron_path: str | pathlib.Path,
**kwargs,
) None#

Import a HuggingFace model and save it as a Megatron checkpoint.

This is a convenience method that combines loading a HuggingFace model, converting it to Megatron format, and saving it as a native Megatron checkpoint. This is useful for preparing models for Megatron training or creating Megatron checkpoints from pretrained HuggingFace models.

Parameters:
  • hf_model_id – HuggingFace model ID or path to a model directory. Examples: "meta-llama/Llama-3-8B", "./my_model"

  • megatron_path – Directory path where the Megatron checkpoint will be saved

  • **kwargs

    Additional arguments passed to from_hf_pretrained. Common options include:

    • torch_dtype: Model precision (torch.float16, torch.bfloat16)

    • device_map: Device placement strategy (“auto”, “cuda:0”, etc.)

    • trust_remote_code: Allow custom model code execution

    • attn_implementation: Attention implementation (“flash_attention_2”, etc.)

.. rubric:: Example

Basic import

AutoBridge.import_ckpt(
    "meta-llama/Llama-3-8B",
    "./megatron_checkpoints/llama3_8b",
)

Import with specific settings

AutoBridge.import_ckpt(
    "meta-llama/Llama-3-8B",
    "./megatron_checkpoints/llama3_8b",
    torch_dtype=torch.float16,
    device_map="auto",
)

export_ckpt(
megatron_path: str | pathlib.Path,
hf_path: str | pathlib.Path,
show_progress: bool = True,
) None#

Export a Megatron checkpoint to HuggingFace format.

This is a convenience method that loads a Megatron checkpoint and exports it to HuggingFace format. This is useful for sharing trained models or deploying them with HuggingFace inference tools.

Parameters:
  • megatron_path – Directory path where the Megatron checkpoint is stored

  • hf_path – Directory path where the HuggingFace model will be saved

  • show_progress – Display progress bar during weight export

.. rubric:: Example

Basic export

bridge = AutoBridge.from_hf_config(config)
bridge.export_ckpt(
    "./megatron_checkpoints/my_model",
    "./hf_exports/my_model",
)

Export with specific settings

bridge.export_ckpt(
    "./megatron_checkpoints/my_model",
    "./hf_exports/my_model",
    show_progress=False,
)

Load the exported model with HuggingFace

from transformers import AutoModelForCausalLM
hf_model = AutoModelForCausalLM.from_pretrained("./hf_exports/my_model")

push_to_hub(path: str | pathlib.Path) None#
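
push_to_hub is not documented here; a minimal sketch, assuming path names the destination Hugging Face Hub repository:

# Hypothetical repo ID; the exact semantics of `path` are not documented on this page.
bridge.push_to_hub("my-org/my-exported-model")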
to_megatron_model(
load_weights: bool = True,
hf_path: str | pathlib.Path | None = None,
**kwargs: typing_extensions.Unpack[megatron.bridge.models.model_provider.GetModelKwargs],
) list[bridge.models.conversion.auto_bridge.MegatronModelT]#
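
to_megatron_model carries no docstring here; based on its signature and its use in the get_conversion_tasks example below, a minimal sketch (wrap_with_ddp is assumed to be one of the GetModelKwargs options):

# Create Megatron model instances in one call; load_weights=True also loads the HF weights.
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3-8B")
megatron_model = bridge.to_megatron_model(load_weights=True, wrap_with_ddp=False)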
to_megatron_provider(
load_weights: bool = True,
hf_path: str | pathlib.Path | None = None,
) megatron.bridge.models.gpt_provider.GPTModelProvider#

Convert to a Megatron model provider.

This method creates a GPTModelProvider configured to match the HuggingFace model’s architecture. The provider can then be used to instantiate Megatron models for training or inference.

Parameters:
  • load_weights – Whether to configure the provider to load weights from HuggingFace format. If False, creates model with random initialization.

  • hf_path – Optional path to load weights from. If None, uses weights from the bridge’s hf_pretrained instance. Useful for loading weights from a different checkpoint.

Returns:

A configured model provider ready to create Megatron models

Return type:

GPTModelProvider

.. rubric:: Example

Create provider and model with loaded weights

bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3-8B")
provider = bridge.to_megatron_provider()
model = provider.get_model()

Create provider without loading weights (for training from scratch)

provider = bridge.to_megatron_provider(load_weights=False)
model = provider.get_model()  # Random initialization

Load weights from a different checkpoint

bridge = AutoBridge.from_hf_config(config)  # Config only
provider = bridge.to_megatron_provider(hf_path="./finetuned_model")
model = provider.get_model()  # Loads finetuned weights

.. seealso::

GPTModelProvider: The provider class for creating models
load_weights: Method to load weights into existing models

get_conversion_tasks(
megatron_model: Union[bridge.models.conversion.auto_bridge.MegatronModelT, List[bridge.models.conversion.auto_bridge.MegatronModelT]],
hf_path: str | pathlib.Path | None = None,
) List[megatron.bridge.models.conversion.model_bridge.WeightConversionTask]#

Get the conversion tasks for weight conversion between HuggingFace and Megatron formats.

This method returns the planned conversion tasks that would be executed during weight conversion in either direction. Each task contains information about parameter mappings, source and target parameters, and the conversion logic required.

The tasks can be used for both HF→Megatron and Megatron→HF conversions since they contain bidirectional mapping information.

Parameters:
  • megatron_model – Megatron model instance or list of instances (one per virtual pipeline stage) that participate in the conversion.

  • hf_path – Optional path to load HF weights from. If None, uses weights from the bridge’s hf_pretrained instance.

Returns:

List of conversion tasks that would be executed. Each task contains:

  • param_name: Megatron parameter name

  • mapping: The parameter mapping object handling the conversion

  • pp_rank: Pipeline parallel rank that owns the parameter

  • vp_stage: Virtual pipeline stage index

  • megatron_module: Reference to the Megatron module owning the parameter

  • param_weight: The actual parameter tensor

Return type:

List[WeightConversionTask]

.. rubric:: Example

bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B")
megatron_model = bridge.to_megatron_model(load_weights=False, wrap_with_ddp=False)
tasks = bridge.get_conversion_tasks(megatron_model)

for task in tasks:
    # For HF→Megatron direction
    print(f"HF param {task.mapping.hf_param} -> Megatron param {task.param_name}")

    # For Megatron→HF direction
    hf_params = task.mapping.hf_param
    if isinstance(hf_params, str):
        print(f"Megatron param {task.param_name} -> HF param {hf_params}")
    else:
        print(f"Megatron param {task.param_name} -> HF params {list(hf_params.values())}")

    print(f"  Mapping type: {type(task.mapping).__name__}")
    print(f"  PP rank: {task.pp_rank}, VP stage: {task.vp_stage}")

.. note::

This method is useful for:

  • Debugging weight conversion issues in both directions

  • Understanding parameter mappings between formats

  • Custom weight conversion implementations

  • Analyzing model structure differences

  • Verifying parameter alignment and shapes

property transformer_config: megatron.core.transformer.transformer_config.TransformerConfig#
property mla_transformer_config: megatron.core.transformer.transformer_config.MLATransformerConfig#
property _model_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge#
_get_causal_lm_architecture()#

Get the CausalLM architecture class from the HuggingFace model.

Returns:

The transformers class for the CausalLM architecture

Raises:

ValueError – If no CausalLM architecture is found or if the class cannot be imported

classmethod _validate_config(
config: transformers.configuration_utils.PretrainedConfig,
path: str | None = None,
) None#
_get_model_instance(
model: list[bridge.models.conversion.auto_bridge.MegatronModelT],
) bridge.models.conversion.auto_bridge.MegatronModelT#
_create_config_from_provider(
source_obj: Any,
target_dataclass: Type[bridge.models.conversion.auto_bridge.DataclassT],
) bridge.models.conversion.auto_bridge.DataclassT#
__repr__() str#