bridge.models.megatron_mimo.conversion.orchestrator

MegatronMIMO HF<->Megatron weight conversion orchestrator.

Module Contents

Classes

MIMOComponent

One MIMO component route in the source bridge mapping registry.

MIMOConversionTask

A standard conversion task annotated with its MIMO route.

MegatronMIMOBridge

AutoBridge subclass for MegatronMIMO checkpoint conversion.

Functions

validate_route_table

Validate component routes against the MIMO parallelism config.

_check_no_prefix_overlap

register_mimo_conversion_spec

Register a MIMO conversion spec builder for a standard bridge class.

_build_default_mimo_provider

Build a MIMO provider from the source bridge’s standard provider.

_build_default_mimo_conversion_spec

Build MIMO provider/routes from standard bridge/provider metadata.

_build_default_mimo_routes

get_mimo_conversion_spec

Resolve an explicit or metadata-derived MIMO conversion spec builder.

supports_mimo_conversion

Return whether a standard bridge advertises MIMO conversion support.

_reset_registry_for_tests

Clear all registered conversion specs. Test-only helper.

build_route_local_registry

Filter and prefix-strip a source mapping registry for one MIMO route.

make_route_local_bridge

Clone a source bridge and override its registry for one route.

_bridged_parallel_state

Temporarily set Megatron-Core parallel_state globals from a MIMO pg_collection.

component_pg_context

Temporarily attach pg_collection to a module for the duration of conversion.

_iter_active_routes

Yield (route, pg_collection) pairs for components this rank owns.

import_hf_to_megatron_mimo

Import HF weights into a constructed MegatronMIMO model.

export_megatron_mimo_to_hf

Export a MegatronMIMO model to HF format, yielding (name, tensor) pairs.

save_hf_pretrained_mimo

Save a MegatronMIMO model in HuggingFace format.

_copy_hf_artifacts

_stream_mimo_weights_to_rank0

Stream global HF tensors on rank 0 while all active ranks drain collectives.

_is_component_export_representative

_process_group_rank

_is_safetensors_source

Data

logger

MIMOConversionSpecBuilder

_CONVERSION_SPECS

API

bridge.models.megatron_mimo.conversion.orchestrator.logger#

‘getLogger(…)’

class bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent#

One MIMO component route in the source bridge mapping registry.

name: str#

source_prefix: str#

target_module_path: str#

__post_init__() None#

bridge.models.megatron_mimo.conversion.orchestrator.validate_route_table(
routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
*,
parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
modality_submodules_spec: Optional[dict[str, megatron.core.transformer.spec_utils.ModuleSpec]] = None,
) None#

Validate component routes against the MIMO parallelism config.

bridge.models.megatron_mimo.conversion.orchestrator._check_no_prefix_overlap(
routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
) None#

bridge.models.megatron_mimo.conversion.orchestrator.MIMOConversionSpecBuilder#

None

bridge.models.megatron_mimo.conversion.orchestrator._CONVERSION_SPECS: dict[type, bridge.models.megatron_mimo.conversion.orchestrator.MIMOConversionSpecBuilder]#

None

bridge.models.megatron_mimo.conversion.orchestrator.register_mimo_conversion_spec(
source_bridge_class: type,
) Callable[[bridge.models.megatron_mimo.conversion.orchestrator.MIMOConversionSpecBuilder], bridge.models.megatron_mimo.conversion.orchestrator.MIMOConversionSpecBuilder]#

Register a MIMO conversion spec builder for a standard bridge class.

bridge.models.megatron_mimo.conversion.orchestrator._build_default_mimo_provider(
source_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge,
hf_pretrained: Any,
parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
) megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider#

Build a MIMO provider from the source bridge’s standard provider.

bridge.models.megatron_mimo.conversion.orchestrator._build_default_mimo_conversion_spec(
source_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge,
hf_pretrained: Any,
parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
) tuple[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider, list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent]]#

Build MIMO provider/routes from standard bridge/provider metadata.

bridge.models.megatron_mimo.conversion.orchestrator._build_default_mimo_routes(
source_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge,
standard_provider: object,
) list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent]#

bridge.models.megatron_mimo.conversion.orchestrator.get_mimo_conversion_spec(
source_bridge_class: type,
) bridge.models.megatron_mimo.conversion.orchestrator.MIMOConversionSpecBuilder#

Resolve an explicit or metadata-derived MIMO conversion spec builder.

bridge.models.megatron_mimo.conversion.orchestrator.supports_mimo_conversion(source_bridge_class: type) bool#

Return whether a standard bridge advertises MIMO conversion support.

bridge.models.megatron_mimo.conversion.orchestrator._reset_registry_for_tests() None#

Clear all registered conversion specs. Test-only helper.

bridge.models.megatron_mimo.conversion.orchestrator.build_route_local_registry(
source_registry: megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry,
route: bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent,
) megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Filter and prefix-strip a source mapping registry for one MIMO route.
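
To make "filter and prefix-strip" concrete, here is a small sketch that uses a plain dict in place of MegatronMappingRegistry. The `Route` dataclass mirrors the three MIMOComponent fields listed on this page; `route_local_mappings`, the parameter names, and the assumption that the prefix applies to the Megatron-side name are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Route:  # simplified stand-in for MIMOComponent
    name: str
    source_prefix: str
    target_module_path: str


def route_local_mappings(source: Dict[str, str], route: Route) -> Dict[str, str]:
    """Keep entries under route.source_prefix and strip that prefix from the key."""
    prefix = route.source_prefix
    return {
        megatron_name[len(prefix):]: hf_name
        for megatron_name, hf_name in source.items()
        if megatron_name.startswith(prefix)
    }


# Hypothetical Megatron-name -> HF-name mappings for a two-component model.
source = {
    "language_model.decoder.layers.0.weight": "model.layers.0.weight",
    "vision_model.patch_embed.weight": "visual.patch_embed.weight",
}
llm = Route("language", "language_model.", "language_model")
local = route_local_mappings(source, llm)
```

In this sketch `local` holds only the language-model entry, keyed by the name relative to the component.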

bridge.models.megatron_mimo.conversion.orchestrator.make_route_local_bridge(
source_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge,
route: bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent,
*,
route_local_registry: megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry | None = None,
) megatron.bridge.models.conversion.model_bridge.MegatronModelBridge#

Clone a source bridge and override its registry for one route.

class bridge.models.megatron_mimo.conversion.orchestrator.MIMOConversionTask#

A standard conversion task annotated with its MIMO route.

route: bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent#

task: megatron.bridge.models.conversion.model_bridge.WeightConversionTask#

class bridge.models.megatron_mimo.conversion.orchestrator.MegatronMIMOBridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM | transformers.configuration_utils.PretrainedConfig,
*,
parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
source_bridge: Optional[megatron.bridge.models.conversion.model_bridge.MegatronModelBridge] = None,
)#

Bases: megatron.bridge.models.conversion.auto_bridge.AutoBridge

AutoBridge subclass for MegatronMIMO checkpoint conversion.

Initialization

classmethod from_bridge(
bridge: megatron.bridge.models.conversion.auto_bridge.AutoBridge,
*,
parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
) bridge.models.megatron_mimo.conversion.orchestrator.MegatronMIMOBridge#

Create a MIMO bridge from a resolved standard bridge.

classmethod from_hf_pretrained(
path: Union[str, pathlib.Path],
*,
parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
**kwargs,
) bridge.models.megatron_mimo.conversion.orchestrator.MegatronMIMOBridge#

Resolve the standard bridge from HF, then wrap it for MIMO.

abstractmethod classmethod from_hf_config(
config: transformers.configuration_utils.PretrainedConfig,
) bridge.models.megatron_mimo.conversion.orchestrator.MegatronMIMOBridge#

abstractmethod classmethod from_auto_config(
megatron_path: str,
hf_model_id: str,
trust_remote_code: bool = False,
) megatron.bridge.models.conversion.auto_bridge.AutoBridge#

property _model_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge#

property routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent]#

Return the route table resolved for this source bridge.

abstractmethod to_megatron_provider(
load_weights: bool = False,
hf_path: str | pathlib.Path | None = None,
) megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider#

Use to_megatron_mimo_provider() for MegatronMIMO conversion.

to_megatron_mimo_provider(
load_weights: bool = False,
hf_path: str | pathlib.Path | None = None,
) megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider#

Build the MIMO provider and route table for this HF source.

validate_mimo_conversion_support() None#

Validate MIMO conversion support by resolving the real provider and routes.

to_megatron_model(
load_weights: bool = True,
hf_path: str | pathlib.Path | None = None,
**kwargs,
) list[megatron.core.transformer.module.MegatronModule]#

Build a distributed MIMO model and optionally import HF weights.

build_mimo_model(
*,
mimo_provider: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider] = None,
ddp_config: Optional[megatron.core.distributed.DistributedDataParallelConfig] = None,
fp16: bool = False,
bf16: bool = True,
seed: int = 0,
wrap_with_ddp: bool = True,
data_parallel_random_init: bool = True,
) megatron.core.models.mimo.MimoModel#

Build the MIMO model and cache its infrastructure.

load_hf_weights(
model: megatron.core.models.mimo.MimoModel | list[megatron.core.models.mimo.MimoModel],
hf_path: str | pathlib.Path | None = None,
allowed_mismatched_params: list[str] | None = None,
*,
infra: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra] = None,
) megatron.core.models.mimo.MimoModel#

Load HF weights into a constructed MegatronMIMO model.

export_hf_weights(
model: megatron.core.models.mimo.MimoModel | list[megatron.core.models.mimo.MimoModel],
cpu: bool = True,
show_progress: bool = True,
conversion_tasks: dict[str, list[megatron.bridge.models.conversion.model_bridge.WeightConversionTask]] | None = None,
merge_adapter_weights: bool = True,
*,
infra: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra] = None,
) Iterable[megatron.bridge.models.conversion.model_bridge.HFWeightTuple]#

Export MIMO weights as a rank-0 HF tensor stream.

get_conversion_tasks(
megatron_model: megatron.core.models.mimo.MimoModel | list[megatron.core.models.mimo.MimoModel],
hf_path: str | pathlib.Path | None = None,
*,
infra: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra] = None,
) list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOConversionTask]#

Return route-annotated conversion tasks for active MIMO components.

save_hf_pretrained(
model: megatron.core.models.mimo.MimoModel | list[megatron.core.models.mimo.MimoModel],
path: str | pathlib.Path,
show_progress: bool = True,
source_path: Optional[Union[str, pathlib.Path]] = None,
strict: bool = False,
*,
infra: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra] = None,
) None#

Save a MegatronMIMO model in HuggingFace format.

save_megatron_model(
model: megatron.core.models.mimo.MimoModel | list[megatron.core.models.mimo.MimoModel],
path: str | pathlib.Path,
hf_tokenizer_path: Optional[str | pathlib.Path] = None,
low_memory_save: bool = False,
hf_tokenizer_kwargs: Optional[dict] = None,
*,
infra: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra] = None,
mimo_provider: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider] = None,
) None#

Save a MegatronMIMO checkpoint.

load_megatron_model(
path: str | pathlib.Path,
*,
parallelism_config: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig] = None,
ddp_config: Optional[megatron.core.distributed.DistributedDataParallelConfig] = None,
fp16: bool = False,
bf16: bool = True,
wrap_with_ddp: bool = False,
data_parallel_random_init: bool = False,
) megatron.core.models.mimo.MimoModel#

Load a MegatronMIMO checkpoint and cache its provider/infra.

import_ckpt(
megatron_path: str | pathlib.Path,
*,
hf_tokenizer_path: Optional[str | pathlib.Path] = None,
hf_tokenizer_kwargs: Optional[dict] = None,
) None#

Import HF weights and write a MegatronMIMO checkpoint.

export_ckpt(
megatron_path: str | pathlib.Path,
hf_path: str | pathlib.Path,
show_progress: bool = True,
strict: bool = False,
source_path: Optional[Union[str, pathlib.Path]] = None,
) None#

Load a MegatronMIMO checkpoint and export it to HuggingFace.

abstractmethod export_adapter_weights(*args, **kwargs)#

abstractmethod save_hf_adapter(*args, **kwargs) None#

abstractmethod export_adapter_ckpt(*args, **kwargs) None#

_resolve_hf_pretrained(
hf_path: str | pathlib.Path | None,
) Any#

_require_infra(
infra: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra] = None,
) megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra#

_require_provider() megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider#

static _coerce_mimo_model(
model: megatron.core.models.mimo.MimoModel | list[megatron.core.models.mimo.MimoModel],
) megatron.core.models.mimo.MimoModel#

_hf_identifier() str | None#

bridge.models.megatron_mimo.conversion.orchestrator._bridged_parallel_state(
pg_collection: Any,
) Iterator[None]#

Temporarily set Megatron-Core parallel_state globals from a MIMO pg_collection.

MIMO never initialises the MCore parallel_state globals, but the standard bridge reads them directly. This context manager installs the per-route process groups into those globals for the duration of the block and restores the previous values on exit.
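
The set-then-restore behaviour can be sketched with the standard library alone. This is an illustration of the pattern under stated assumptions, not the module's code: `fake_state` stands in for a module that holds process-group globals, and the attribute names `tp_group` and `pp_group` are hypothetical.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Any, Dict, Iterator

# Hypothetical stand-in for a module holding process-group globals.
fake_state = SimpleNamespace(tp_group=None, pp_group=None)


@contextmanager
def bridged_state(groups: Dict[str, Any]) -> Iterator[None]:
    """Install `groups` on fake_state, restoring the previous values on exit."""
    saved = {name: getattr(fake_state, name) for name in groups}
    try:
        for name, value in groups.items():
            setattr(fake_state, name, value)
        yield
    finally:
        # Runs on normal exit and on exceptions, so the globals never leak.
        for name, value in saved.items():
            setattr(fake_state, name, value)


with bridged_state({"tp_group": "tp-for-route", "pp_group": "pp-for-route"}):
    inside = (fake_state.tp_group, fake_state.pp_group)
after = (fake_state.tp_group, fake_state.pp_group)
```

Restoring in a `finally` clause means a failure during conversion of one route cannot leave another route reading stale groups.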

bridge.models.megatron_mimo.conversion.orchestrator.component_pg_context(
module: torch.nn.Module,
pg_collection: Any,
) Iterator[None]#

Temporarily attach pg_collection to a module for the duration of conversion.

If the module already carries a pg_collection (the normal MIMO-provider path), it is trusted and not overwritten. Otherwise the supplied pg_collection is attached and removed on exit.
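
The "trust an existing attribute, otherwise attach and clean up" rule described above might look like the following sketch. `ToyModule` replaces torch.nn.Module so the example runs without PyTorch, and `attach_pg` is a hypothetical name.

```python
from contextlib import contextmanager
from typing import Any, Iterator


class ToyModule:  # stand-in for torch.nn.Module
    pass


@contextmanager
def attach_pg(module: Any, pg_collection: Any) -> Iterator[None]:
    """Attach pg_collection only if the module does not already carry one."""
    if getattr(module, "pg_collection", None) is not None:
        yield  # existing attribute is trusted and left untouched
        return
    module.pg_collection = pg_collection
    try:
        yield
    finally:
        del module.pg_collection  # remove only what this context added


bare = ToyModule()
with attach_pg(bare, "route-groups"):
    during = bare.pg_collection

owned = ToyModule()
owned.pg_collection = "provider-groups"
with attach_pg(owned, "route-groups"):
    kept = owned.pg_collection
```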

bridge.models.megatron_mimo.conversion.orchestrator._iter_active_routes(
routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
pg_collections: dict[str, Any],
) Iterator[tuple[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent, Any]]#

Yield (route, pg_collection) pairs for components this rank owns.

Skips any route whose entry in pg_collections is None, since this rank does not own that component. Raises if a route name is missing from pg_collections entirely, which means the MIMO infra was built with a different component set than the route table declares.
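
The distinction between a missing key (an error) and a key mapped to None (skipped) is easy to get wrong, so a short sketch may help. Route names stand in for MIMOComponent objects here, and the choice of KeyError is an assumption; the page only says that the function raises.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_active(
    route_names: List[str], pg_collections: Dict[str, Any]
) -> Iterator[Tuple[str, Any]]:
    """Yield (name, pg_collection) for routes whose entry is present and not None."""
    for name in route_names:
        if name not in pg_collections:
            # Assumed exception type; signals a route table / infra mismatch.
            raise KeyError(f"route {name!r} has no pg_collections entry")
        pg = pg_collections[name]
        if pg is None:
            continue  # entry exists but is None: skipped on this rank
        yield name, pg


active = list(
    iter_active(["language", "vision"], {"language": "pg-llm", "vision": None})
)
```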

bridge.models.megatron_mimo.conversion.orchestrator.import_hf_to_megatron_mimo(
*,
source_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge,
hf_pretrained: Any,
mimo_model: torch.nn.Module,
routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
pg_collections: dict[str, Any],
allowed_mismatched_params: list[str] | None = None,
) torch.nn.Module#

Import HF weights into a constructed MegatronMIMO model.

Drives MegatronModelBridge.load_weights_hf_to_megatron once per active route with a prefix-stripped registry and the route’s pg_collection. Returns mimo_model for convenience.

bridge.models.megatron_mimo.conversion.orchestrator.export_megatron_mimo_to_hf(
*,
source_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge,
hf_pretrained: Any,
mimo_model: torch.nn.Module,
routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
pg_collections: dict[str, Any],
cpu: bool = True,
show_progress: bool = True,
conversion_tasks: dict[str, list[Any]] | None = None,
merge_adapter_weights: bool = True,
) Iterator[megatron.bridge.models.conversion.model_bridge.HFWeightTuple]#

Export a MegatronMIMO model to HF format, yielding (name, tensor) pairs.

Drives MegatronModelBridge.stream_weights_megatron_to_hf once per active route. HF names are unchanged from the source bridge; only the Megatron-side megatron_param is prefix-stripped, so routes produce disjoint subsets of the HF state dict.
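
The disjointness claim follows from each Megatron-side name matching at most one route prefix. The toy check below illustrates this with a hypothetical mapping table and hypothetical prefixes; it does not call the real bridge.

```python
from typing import Dict, List, Set

# Hypothetical Megatron-name -> HF-name table for a two-component model.
source: Dict[str, str] = {
    "language_model.embedding.weight": "model.embed_tokens.weight",
    "language_model.output_layer.weight": "lm_head.weight",
    "vision_model.proj.weight": "visual.proj.weight",
}
prefixes: List[str] = ["language_model.", "vision_model."]


def hf_names_for(prefix: str) -> Set[str]:
    """HF names a route with this source prefix would emit (names are unchanged)."""
    return {hf for megatron, hf in source.items() if megatron.startswith(prefix)}


per_route = [hf_names_for(p) for p in prefixes]
overlap = per_route[0] & per_route[1]
union = per_route[0] | per_route[1]
```

With non-overlapping prefixes, `overlap` is empty and `union` covers every HF name in the table.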

bridge.models.megatron_mimo.conversion.orchestrator.save_hf_pretrained_mimo(
bridge: megatron.bridge.models.conversion.auto_bridge.AutoBridge,
mimo_model: torch.nn.Module,
routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
pg_collections: dict[str, Any],
path: Union[str, pathlib.Path],
*,
source_path: Optional[Union[str, pathlib.Path]] = None,
strict: bool = False,
show_progress: bool = True,
) None#

Save a MegatronMIMO model in HuggingFace format.

bridge.models.megatron_mimo.conversion.orchestrator._copy_hf_artifacts(
bridge: megatron.bridge.models.conversion.auto_bridge.AutoBridge,
output_path: pathlib.Path,
*,
source_path: Optional[Union[str, pathlib.Path]] = None,
) None#

bridge.models.megatron_mimo.conversion.orchestrator._stream_mimo_weights_to_rank0(
*,
source_bridge: Any,
hf_pretrained: Any,
mimo_model: torch.nn.Module,
routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
pg_collections: dict[str, Any],
show_progress: bool,
) Iterator[tuple[str, torch.Tensor]]#

Stream global HF tensors on rank 0 while all active ranks drain collectives.
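
One way to picture "rank 0 streams while other ranks drain" is a generator that every rank iterates to completion while only rank 0 yields. The sketch below simulates this without torch.distributed: `fake_export` and its log list are hypothetical stand-ins for an export stream whose iteration would trigger collectives.

```python
from typing import Iterable, Iterator, List, Tuple


def stream_to_rank0(
    weights: Iterable[Tuple[str, object]], rank: int
) -> Iterator[Tuple[str, object]]:
    """Consume `weights` on every rank but yield items only when rank == 0."""
    for name, tensor in weights:
        # Advancing the iterator is what would join the collective in a real
        # run, so non-zero ranks still walk the whole stream.
        if rank == 0:
            yield name, tensor


def fake_export(log: List[str]) -> Iterator[Tuple[str, object]]:
    for name in ("a.weight", "b.weight"):
        log.append(name)  # stands in for a collective every rank must join
        yield name, 0.0


log0: List[str] = []
log1: List[str] = []
rank0_items = list(stream_to_rank0(fake_export(log0), rank=0))
rank1_items = list(stream_to_rank0(fake_export(log1), rank=1))
```

Both logs record every step, while only rank 0 receives the (name, tensor) pairs.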

bridge.models.megatron_mimo.conversion.orchestrator._is_component_export_representative(pg_collection: Any) bool#

bridge.models.megatron_mimo.conversion.orchestrator._process_group_rank(group: Any) int#

bridge.models.megatron_mimo.conversion.orchestrator._is_safetensors_source(
bridge: megatron.bridge.models.conversion.auto_bridge.AutoBridge,
) bool#