bridge.models.megatron_mimo.conversion.orchestrator
MegatronMIMO HF<->Megatron weight conversion orchestrator.
Module Contents
Classes
| Name | Description |
| --- | --- |
| MIMOComponent | One MIMO component route in the source bridge mapping registry. |
| MIMOConversionTask | A standard conversion task annotated with its MIMO route. |
| MegatronMIMOBridge | AutoBridge subclass for MegatronMIMO checkpoint conversion. |
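Functions
| Name | Description |
| --- | --- |
| validate_route_table | Validate component routes against the MIMO parallelism config. |
| register_mimo_conversion_spec | Register a MIMO conversion spec builder for a standard bridge class. |
| _build_default_mimo_provider | Build a MIMO provider from the source bridge’s standard provider. |
| _build_default_mimo_conversion_spec | Build MIMO provider/routes from standard bridge/provider metadata. |
| get_mimo_conversion_spec | Resolve an explicit or metadata-derived MIMO conversion spec builder. |
| supports_mimo_conversion | Return whether a standard bridge advertises MIMO conversion support. |
| _reset_registry_for_tests | Clear all registered conversion specs. Test-only helper. |
| build_route_local_registry | Filter and prefix-strip a source mapping registry for one MIMO route. |
| make_route_local_bridge | Clone a source bridge and override its registry for one route. |
| _bridged_parallel_state | Temporarily set Megatron-Core `parallel_state` globals from a MIMO pg_collection. |
| component_pg_context | Temporarily attach `pg_collection` to a module for the duration of conversion. |
| _iter_active_routes | Yield (route, pg_collection) pairs for components this rank owns. |
| import_hf_to_megatron_mimo | Import HF weights into a constructed MegatronMIMO model. |
| export_megatron_mimo_to_hf | Export a MegatronMIMO model to HF format, yielding `(name, tensor)` pairs. |
| save_hf_pretrained_mimo | Save a MegatronMIMO model in HuggingFace format. |
| _stream_mimo_weights_to_rank0 | Stream global HF tensors on rank 0 while all active ranks drain collectives. |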
Data
API
- bridge.models.megatron_mimo.conversion.orchestrator.logger
‘getLogger(…)’
- class bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent
One MIMO component route in the source bridge mapping registry.
- name: str
- source_prefix: str
- target_module_path: str
- __post_init__() -> None
- bridge.models.megatron_mimo.conversion.orchestrator.validate_route_table(
- routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
- *,
- parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
- modality_submodules_spec: Optional[dict[str, megatron.core.transformer.spec_utils.ModuleSpec]] = None,
Validate component routes against the MIMO parallelism config.
- bridge.models.megatron_mimo.conversion.orchestrator._check_no_prefix_overlap( ) -> None
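The route-table checks can be illustrated with a small, self-contained sketch. It assumes each `source_prefix` ends at a module boundary (for example with a trailing "."), and it reads "validate against the parallelism config" as "every route name must have a parallelism entry", modelling the config as a plain set of component names. `Route`, `check_no_prefix_overlap`, and `validate_routes` are hypothetical stand-ins, not this module's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical stand-in for MIMOComponent (same three fields)."""

    name: str
    source_prefix: str
    target_module_path: str


def check_no_prefix_overlap(routes: list[Route]) -> None:
    """Raise ValueError if one route's source_prefix is a prefix of another's."""
    prefixes = sorted(r.source_prefix for r in routes)
    # After sorting, any prefix relation also shows up between neighbours.
    for shorter, longer in zip(prefixes, prefixes[1:]):
        if longer.startswith(shorter):
            raise ValueError(f"overlapping route prefixes: {shorter!r} and {longer!r}")


def validate_routes(routes: list[Route], configured_components: set[str]) -> None:
    """Assumed checks: unique names, a parallelism entry per route, disjoint prefixes."""
    names = [r.name for r in routes]
    if len(names) != len(set(names)):
        raise ValueError(f"duplicate route names in {names}")
    missing = set(names) - configured_components
    if missing:
        raise ValueError(f"routes without a parallelism entry: {sorted(missing)}")
    check_no_prefix_overlap(routes)
```

Checking only adjacent pairs is enough: if prefix `a` starts some longer prefix `c`, every string sorted between them also starts with `a`, so some neighbouring pair exposes the overlap.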
- bridge.models.megatron_mimo.conversion.orchestrator.MIMOConversionSpecBuilder
- bridge.models.megatron_mimo.conversion.orchestrator._CONVERSION_SPECS: dict[type, bridge.models.megatron_mimo.conversion.orchestrator.MIMOConversionSpecBuilder]
- bridge.models.megatron_mimo.conversion.orchestrator.register_mimo_conversion_spec(
- source_bridge_class: type,
Register a MIMO conversion spec builder for a standard bridge class.
- bridge.models.megatron_mimo.conversion.orchestrator._build_default_mimo_provider(
- source_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge,
- hf_pretrained: Any,
- parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
Build a MIMO provider from the source bridge’s standard provider.
- bridge.models.megatron_mimo.conversion.orchestrator._build_default_mimo_conversion_spec(
- source_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge,
- hf_pretrained: Any,
- parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
Build MIMO provider/routes from standard bridge/provider metadata.
- bridge.models.megatron_mimo.conversion.orchestrator._build_default_mimo_routes(
- source_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge,
- standard_provider: object,
- bridge.models.megatron_mimo.conversion.orchestrator.get_mimo_conversion_spec(
- source_bridge_class: type,
Resolve an explicit or metadata-derived MIMO conversion spec builder.
- bridge.models.megatron_mimo.conversion.orchestrator.supports_mimo_conversion(source_bridge_class: type) -> bool
Return whether a standard bridge advertises MIMO conversion support.
- bridge.models.megatron_mimo.conversion.orchestrator._reset_registry_for_tests() -> None
Clear all registered conversion specs. Test-only helper.
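The registry functions above (`register_mimo_conversion_spec`, `get_mimo_conversion_spec`, `supports_mimo_conversion`, `_reset_registry_for_tests`) describe a class-keyed lookup with an explicit path and a metadata-derived fallback. The sketch below is one way to build such a lookup. The decorator form, the MRO walk, and the `MIMO_ROUTE_PREFIXES` metadata attribute are all hypothetical and not taken from the module.

```python
from typing import Callable, Optional

SpecBuilder = Callable[..., object]

# Hypothetical registry keyed by bridge class, in the spirit of _CONVERSION_SPECS.
_SPECS: dict[type, SpecBuilder] = {}


def register_spec(bridge_cls: type) -> Callable[[SpecBuilder], SpecBuilder]:
    """Decorator that records an explicit builder for one bridge class."""

    def decorator(builder: SpecBuilder) -> SpecBuilder:
        _SPECS[bridge_cls] = builder
        return builder

    return decorator


def default_builder(*args, **kwargs) -> object:
    """Placeholder for a builder derived from bridge/provider metadata."""
    return "default spec"


def get_spec(bridge_cls: type) -> Optional[SpecBuilder]:
    # An explicit registration wins; walking the MRO lets subclasses inherit it.
    for cls in bridge_cls.__mro__:
        if cls in _SPECS:
            return _SPECS[cls]
    # Hypothetical metadata flag standing in for "metadata-derived" support.
    if getattr(bridge_cls, "MIMO_ROUTE_PREFIXES", None):
        return default_builder
    return None


def supports(bridge_cls: type) -> bool:
    return get_spec(bridge_cls) is not None


def reset_for_tests() -> None:
    _SPECS.clear()
```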
- bridge.models.megatron_mimo.conversion.orchestrator.build_route_local_registry(
- source_registry: megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry,
- route: bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent,
Filter and prefix-strip a source mapping registry for one MIMO route.
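Filtering and prefix-stripping a registry for one route can be pictured with plain name pairs. In this sketch, `ParamMapping` and `route_local_mappings` are hypothetical stand-ins for entries of `MegatronMappingRegistry` and for `build_route_local_registry`, and the parameter names are illustrative only. Consistent with the export description further down this page, only the Megatron-side name is stripped; the HF name passes through unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamMapping:
    """Hypothetical stand-in for one Megatron<->HF mapping entry."""

    megatron_param: str
    hf_param: str


def route_local_mappings(mappings: list[ParamMapping], source_prefix: str) -> list[ParamMapping]:
    """Keep mappings under source_prefix and strip it from the Megatron name only."""
    local = []
    for m in mappings:
        if m.megatron_param.startswith(source_prefix):
            local.append(ParamMapping(m.megatron_param[len(source_prefix):], m.hf_param))
    return local
```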
- bridge.models.megatron_mimo.conversion.orchestrator.make_route_local_bridge(
- source_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge,
- route: bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent,
- *,
- route_local_registry: megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry | None = None,
Clone a source bridge and override its registry for one route.
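Cloning a bridge and overriding its registry for one route can be done without mutating the shared source bridge. A minimal sketch, assuming the registry is obtained through a `mapping_registry()` method; `ToyBridge` and `make_route_local` are hypothetical names, and `copy.copy` is one possible cloning strategy, not necessarily the one `make_route_local_bridge` uses.

```python
import copy


class ToyBridge:
    """Hypothetical bridge whose mapping registry comes from a method."""

    def mapping_registry(self):
        return ["full registry"]


def make_route_local(bridge, local_registry):
    # Shallow copy: the clone gets its own __dict__ but shares referenced config objects.
    clone = copy.copy(bridge)
    # Shadow the method on the clone's instance only; the source bridge is untouched.
    clone.mapping_registry = lambda: local_registry
    return clone
```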
- class bridge.models.megatron_mimo.conversion.orchestrator.MIMOConversionTask
A standard conversion task annotated with its MIMO route.
- task: megatron.bridge.models.conversion.model_bridge.WeightConversionTask
- class bridge.models.megatron_mimo.conversion.orchestrator.MegatronMIMOBridge(
- hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM | transformers.configuration_utils.PretrainedConfig,
- *,
- parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
- source_bridge: Optional[megatron.bridge.models.conversion.model_bridge.MegatronModelBridge] = None,
Bases: megatron.bridge.models.conversion.auto_bridge.AutoBridge
AutoBridge subclass for MegatronMIMO checkpoint conversion.
- classmethod from_bridge(
- bridge: megatron.bridge.models.conversion.auto_bridge.AutoBridge,
- *,
- parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
Create a MIMO bridge from a resolved standard bridge.
- classmethod from_hf_pretrained(
- path: Union[str, pathlib.Path],
- *,
- parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
- **kwargs,
Resolve the standard bridge from HF, then wrap it for MIMO.
- abstractmethod classmethod from_hf_config(
- config: transformers.configuration_utils.PretrainedConfig,
- abstractmethod classmethod from_auto_config(
- megatron_path: str,
- hf_model_id: str,
- trust_remote_code: bool = False,
- property _model_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge
- property routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent]
Return the route table resolved for this source bridge.
- abstractmethod to_megatron_provider(
- load_weights: bool = False,
- hf_path: str | pathlib.Path | None = None,
Use to_megatron_mimo_provider() for MegatronMIMO conversion.
- to_megatron_mimo_provider(
- load_weights: bool = False,
- hf_path: str | pathlib.Path | None = None,
Build the MIMO provider and route table for this HF source.
- validate_mimo_conversion_support() -> None
Validate MIMO conversion support by resolving the real provider and routes.
- to_megatron_model(
- load_weights: bool = True,
- hf_path: str | pathlib.Path | None = None,
- **kwargs,
Build a distributed MIMO model and optionally import HF weights.
- build_mimo_model(
- *,
- mimo_provider: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider] = None,
- ddp_config: Optional[megatron.core.distributed.DistributedDataParallelConfig] = None,
- fp16: bool = False,
- bf16: bool = True,
- seed: int = 0,
- wrap_with_ddp: bool = True,
- data_parallel_random_init: bool = True,
Build the MIMO model and cache its infrastructure.
- load_hf_weights(
- model: megatron.core.models.mimo.MimoModel | list[megatron.core.models.mimo.MimoModel],
- hf_path: str | pathlib.Path | None = None,
- allowed_mismatched_params: list[str] | None = None,
- *,
- infra: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra] = None,
Load HF weights into a constructed MegatronMIMO model.
- export_hf_weights(
- model: megatron.core.models.mimo.MimoModel | list[megatron.core.models.mimo.MimoModel],
- cpu: bool = True,
- show_progress: bool = True,
- conversion_tasks: dict[str, list[megatron.bridge.models.conversion.model_bridge.WeightConversionTask]] | None = None,
- merge_adapter_weights: bool = True,
- *,
- infra: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra] = None,
Export MIMO weights as a rank-0 HF tensor stream.
- get_conversion_tasks(
- megatron_model: megatron.core.models.mimo.MimoModel | list[megatron.core.models.mimo.MimoModel],
- hf_path: str | pathlib.Path | None = None,
- *,
- infra: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra] = None,
Return route-annotated conversion tasks for active MIMO components.
- save_hf_pretrained(
- model: megatron.core.models.mimo.MimoModel | list[megatron.core.models.mimo.MimoModel],
- path: str | pathlib.Path,
- show_progress: bool = True,
- source_path: Optional[Union[str, pathlib.Path]] = None,
- strict: bool = False,
- *,
- infra: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra] = None,
Save a MegatronMIMO model in HuggingFace format.
- save_megatron_model(
- model: megatron.core.models.mimo.MimoModel | list[megatron.core.models.mimo.MimoModel],
- path: str | pathlib.Path,
- hf_tokenizer_path: Optional[str | pathlib.Path] = None,
- low_memory_save: bool = False,
- hf_tokenizer_kwargs: Optional[dict] = None,
- *,
- infra: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra] = None,
- mimo_provider: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider] = None,
Save a MegatronMIMO checkpoint.
- load_megatron_model(
- path: str | pathlib.Path,
- *,
- parallelism_config: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig] = None,
- ddp_config: Optional[megatron.core.distributed.DistributedDataParallelConfig] = None,
- fp16: bool = False,
- bf16: bool = True,
- wrap_with_ddp: bool = False,
- data_parallel_random_init: bool = False,
Load a MegatronMIMO checkpoint and cache its provider/infra.
- import_ckpt(
- megatron_path: str | pathlib.Path,
- *,
- hf_tokenizer_path: Optional[str | pathlib.Path] = None,
- hf_tokenizer_kwargs: Optional[dict] = None,
Import HF weights and write a MegatronMIMO checkpoint.
- export_ckpt(
- megatron_path: str | pathlib.Path,
- hf_path: str | pathlib.Path,
- show_progress: bool = True,
- strict: bool = False,
- source_path: Optional[Union[str, pathlib.Path]] = None,
Load a MegatronMIMO checkpoint and export it to HuggingFace.
- abstractmethod export_adapter_weights(*args, **kwargs)
- abstractmethod save_hf_adapter(*args, **kwargs) -> None
- abstractmethod export_adapter_ckpt(*args, **kwargs) -> None
- _resolve_hf_pretrained(
- hf_path: str | pathlib.Path | None,
- _require_infra(
- infra: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra] = None,
- _require_provider() -> megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider
- static _coerce_mimo_model(
- model: megatron.core.models.mimo.MimoModel | list[megatron.core.models.mimo.MimoModel],
- _hf_identifier() -> str | None
- bridge.models.megatron_mimo.conversion.orchestrator._bridged_parallel_state(
- pg_collection: Any,
Temporarily set Megatron-Core `parallel_state` globals from a MIMO pg_collection.
MIMO never initialises the MCore `parallel_state` globals, but the standard bridge reads them directly. This context manager bridges the per-route groups into those globals and restores the original values on exit.
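The save/set/restore pattern described for `_bridged_parallel_state` can be sketched with `contextlib.contextmanager`. The sketch takes the state module as an argument so it runs without Megatron-Core. The two global names resemble Megatron-Core's, but which globals the real helper sets, and the `tp`/`pp` attribute names read from the pg_collection, are assumptions.

```python
from contextlib import contextmanager

# Hypothetical: which globals to bridge and which pg_collection attribute feeds each.
_GLOBALS_TO_ATTRS = {
    "_TENSOR_MODEL_PARALLEL_GROUP": "tp",
    "_PIPELINE_MODEL_PARALLEL_GROUP": "pp",
}
_MISSING = object()


@contextmanager
def bridged_globals(state_module, pg_collection):
    """Point module-level group globals at one route's groups, then restore them."""
    saved = {name: getattr(state_module, name, _MISSING) for name in _GLOBALS_TO_ATTRS}
    try:
        for name, attr in _GLOBALS_TO_ATTRS.items():
            setattr(state_module, name, getattr(pg_collection, attr))
        yield state_module
    finally:
        # Runs on normal exit and on error; globals that did not exist are removed.
        for name, value in saved.items():
            if value is _MISSING:
                if hasattr(state_module, name):
                    delattr(state_module, name)
            else:
                setattr(state_module, name, value)
```

Restoring in `finally` matters here: a failed conversion for one route must not leave the next route reading the previous route's groups.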
- bridge.models.megatron_mimo.conversion.orchestrator.component_pg_context(
- module: torch.nn.Module,
- pg_collection: Any,
Temporarily attach `pg_collection` to a module for the duration of conversion.
If the module already carries a pg_collection (the normal MIMO-provider path), it is trusted and not overwritten. Otherwise, the supplied pg_collection is attached and removed on exit.
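The attach-unless-present rule of `component_pg_context` can be sketched as follows. `attach_pg_collection` is a hypothetical name, and treating an attribute that is set to `None` as "not carrying" a pg_collection is an assumption of this sketch.

```python
from contextlib import contextmanager


@contextmanager
def attach_pg_collection(module, pg_collection):
    """Attach pg_collection for the duration of the block unless one is already set."""
    existing = getattr(module, "pg_collection", None)
    if existing is not None:
        # Already attached (e.g. by the provider): trust it and leave it in place.
        yield existing
        return
    had_attr = hasattr(module, "pg_collection")
    module.pg_collection = pg_collection
    try:
        yield pg_collection
    finally:
        if had_attr:
            module.pg_collection = None  # restore the previous None value
        else:
            del module.pg_collection
```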
- bridge.models.megatron_mimo.conversion.orchestrator._iter_active_routes(
- routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
- pg_collections: dict[str, Any],
Yield (route, pg_collection) pairs for components this rank owns.
Skips any route whose `pg_collections.get(route.name)` is `None`. Raises if a route name is missing from `pg_collections` entirely; that means the MIMO infra was built with a different component set than the route table declares.
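The skip-versus-raise behaviour of `_iter_active_routes` fits in a short generator. In this sketch `Route` and `iter_active_routes` are hypothetical stand-ins, and raising `KeyError` is an assumption; the description above says only that the function raises.

```python
from typing import Any, Iterator, NamedTuple


class Route(NamedTuple):
    """Hypothetical stand-in for MIMOComponent."""

    name: str
    source_prefix: str
    target_module_path: str


def iter_active_routes(
    routes: list[Route], pg_collections: dict[str, Any]
) -> Iterator[tuple[Route, Any]]:
    for route in routes:
        if route.name not in pg_collections:
            # Infra and route table disagree about the component set.
            raise KeyError(f"route {route.name!r} has no entry in pg_collections")
        pg_collection = pg_collections[route.name]
        if pg_collection is None:
            continue  # this rank does not own the component
        yield route, pg_collection
```

Because this is a generator, the error surfaces only when iteration reaches the offending route, not when the generator is created.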
- bridge.models.megatron_mimo.conversion.orchestrator.import_hf_to_megatron_mimo(
- *,
- source_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge,
- hf_pretrained: Any,
- mimo_model: torch.nn.Module,
- routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
- pg_collections: dict[str, Any],
- allowed_mismatched_params: list[str] | None = None,
Import HF weights into a constructed MegatronMIMO model.
Drives `MegatronModelBridge.load_weights_hf_to_megatron` once per active route, with a prefix-stripped registry and the route’s pg_collection. Returns `mimo_model` for convenience.
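The per-route import loop can be illustrated with a toy model in which each submodule holds a plain `params` dict instead of tensors. The sketch assumes `target_module_path` is a dotted attribute path from the MIMO model root, and it collapses the per-route `load_weights_hf_to_megatron` call into a direct dictionary copy. `resolve_submodule` and `import_routes` are hypothetical, and no tensor-parallel sharding is modelled.

```python
from functools import reduce


def resolve_submodule(root, dotted_path: str):
    """Follow a dotted attribute path such as 'modality_submodules.images'."""
    return reduce(getattr, dotted_path.split("."), root)


def import_routes(mimo_model, routes, pg_collections, hf_state, mappings):
    """Toy HF -> MIMO import: one pass per route that this rank owns.

    routes: (name, source_prefix, target_module_path) tuples.
    mappings: (megatron_param, hf_param) pairs from the source bridge registry.
    """
    for name, prefix, target_path in routes:
        if pg_collections[name] is None:
            continue  # another rank owns this component
        target = resolve_submodule(mimo_model, target_path)
        for megatron_param, hf_param in mappings:
            if megatron_param.startswith(prefix):
                # Megatron side is prefix-stripped; the HF name is used unchanged.
                target.params[megatron_param[len(prefix):]] = hf_state[hf_param]
    return mimo_model
```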
- bridge.models.megatron_mimo.conversion.orchestrator.export_megatron_mimo_to_hf(
- *,
- source_bridge: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge,
- hf_pretrained: Any,
- mimo_model: torch.nn.Module,
- routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
- pg_collections: dict[str, Any],
- cpu: bool = True,
- show_progress: bool = True,
- conversion_tasks: dict[str, list[Any]] | None = None,
- merge_adapter_weights: bool = True,
Export a MegatronMIMO model to HF format, yielding `(name, tensor)` pairs.
Drives `MegatronModelBridge.stream_weights_megatron_to_hf` once per active route. HF names are unchanged from the source bridge; only the Megatron-side `megatron_param` is prefix-stripped, so the routes produce disjoint subsets of the HF state dict.
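Because each route yields a disjoint subset of HF names, the per-route export streams can be concatenated into one HF state dict. The sketch below concatenates hypothetical per-route streams and checks the disjointness property as it goes. Whether the real exporter performs such a check is not stated here, so treat `merge_route_streams` as illustrative only.

```python
from typing import Iterable, Iterator


def merge_route_streams(
    streams: dict[str, Iterable[tuple[str, object]]],
) -> Iterator[tuple[str, object]]:
    """Concatenate per-route (hf_name, tensor) streams, rejecting duplicate HF names."""
    owner: dict[str, str] = {}
    for route_name, stream in streams.items():
        for hf_name, tensor in stream:
            if hf_name in owner:
                raise ValueError(
                    f"{hf_name!r} produced by routes {owner[hf_name]!r} and {route_name!r}"
                )
            owner[hf_name] = route_name
            yield hf_name, tensor
```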
- bridge.models.megatron_mimo.conversion.orchestrator.save_hf_pretrained_mimo(
- bridge: megatron.bridge.models.conversion.auto_bridge.AutoBridge,
- mimo_model: torch.nn.Module,
- routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
- pg_collections: dict[str, Any],
- path: Union[str, pathlib.Path],
- *,
- source_path: Optional[Union[str, pathlib.Path]] = None,
- strict: bool = False,
- show_progress: bool = True,
Save a MegatronMIMO model in HuggingFace format.
- bridge.models.megatron_mimo.conversion.orchestrator._copy_hf_artifacts(
- bridge: megatron.bridge.models.conversion.auto_bridge.AutoBridge,
- output_path: pathlib.Path,
- *,
- source_path: Optional[Union[str, pathlib.Path]] = None,
- bridge.models.megatron_mimo.conversion.orchestrator._stream_mimo_weights_to_rank0(
- *,
- source_bridge: Any,
- hf_pretrained: Any,
- mimo_model: torch.nn.Module,
- routes: list[bridge.models.megatron_mimo.conversion.orchestrator.MIMOComponent],
- pg_collections: dict[str, Any],
- show_progress: bool,
Stream global HF tensors on rank 0 while all active ranks drain collectives.
- bridge.models.megatron_mimo.conversion.orchestrator._is_component_export_representative(pg_collection: Any) -> bool
- bridge.models.megatron_mimo.conversion.orchestrator._process_group_rank(group: Any) -> int
- bridge.models.megatron_mimo.conversion.orchestrator._is_safetensors_source(
- bridge: megatron.bridge.models.conversion.auto_bridge.AutoBridge,