bridge.models.megatron_mimo.conversion.mimo_model_io

MegatronMIMO model save/load helpers.

Module Contents

Functions

save_megatron_mimo_model

Save a MegatronMIMO model in Megatron distributed-checkpoint format.

load_megatron_mimo_model

Load a MegatronMIMO model from a Megatron distributed-checkpoint.

_snapshot_derived_spec_fields

Capture the provider’s runtime-derived spec fields so they can be restored after saving.

_clear_derived_spec_fields

Reset derived spec fields so YAML serialisation captures only the input fields.

_restore_derived_spec_fields

Restore derived spec fields after a save, leaving the provider usable.

_resolve_iter_folder

Resolve path to an iter_* folder: return it as-is if it already is one, otherwise pick the latest iter_* folder under it.

Data

API

bridge.models.megatron_mimo.conversion.mimo_model_io.logger

‘getLogger(…)’

bridge.models.megatron_mimo.conversion.mimo_model_io.save_megatron_mimo_model(
model: megatron.core.models.mimo.MimoModel,
infra: megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra,
provider: megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider,
path: Union[str, pathlib.Path],
*,
hf_tokenizer_path: Optional[Union[str, pathlib.Path]] = None,
hf_tokenizer_kwargs: Optional[dict] = None,
ckpt_format: str = 'torch_dist',
) -> None

Save a MegatronMIMO model in Megatron distributed-checkpoint format.

Parameters:
  • model – Constructed MimoModel.

  • infra – MegatronMIMOInfra from model construction.

  • provider – Provider used to reconstruct the model on load.

  • path – Directory to save the dist-checkpoint into.

  • hf_tokenizer_path – Optional HF model ID or path for tokenizer assets.

  • hf_tokenizer_kwargs – Optional kwargs for AutoTokenizer.from_pretrained.

  • ckpt_format – Checkpoint format. Defaults to "torch_dist".

bridge.models.megatron_mimo.conversion.mimo_model_io.load_megatron_mimo_model(
path: Union[str, pathlib.Path],
*,
parallelism_config: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig] = None,
ddp_config: Optional[megatron.core.distributed.DistributedDataParallelConfig] = None,
fp16: bool = False,
bf16: bool = True,
wrap_with_ddp: bool = False,
data_parallel_random_init: bool = False,
) -> tuple[megatron.core.models.mimo.MimoModel, megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra, megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider]

Load a MegatronMIMO model from a Megatron distributed-checkpoint.

Parameters:
  • path – Checkpoint parent directory or an iter_* directory.

  • parallelism_config – Optional per-component parallelism override.

  • ddp_config – DDP config forwarded to build_megatron_mimo_model.

  • fp16 / bf16 – Precision flags forwarded to model construction (defaults: fp16=False, bf16=True).

  • wrap_with_ddp – Whether to wrap the constructed model in DDP.

  • data_parallel_random_init – Forwarded to build_megatron_mimo_model.

Returns:

A tuple (mimo_model, infra, provider).

bridge.models.megatron_mimo.conversion.mimo_model_io._snapshot_derived_spec_fields(
provider: megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider,
) -> dict

Capture the provider’s runtime-derived spec fields so they can be restored after saving.

bridge.models.megatron_mimo.conversion.mimo_model_io._clear_derived_spec_fields(
provider: megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider,
) -> None

Reset derived spec fields so YAML serialisation captures only the input fields.

bridge.models.megatron_mimo.conversion.mimo_model_io._restore_derived_spec_fields(
provider: megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider,
saved: dict,
) -> None

Restore derived spec fields after a save, leaving the provider usable.
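The three private helpers above suggest a snapshot / clear / restore pattern around writing the provider config. Below is a minimal sketch of that pattern, not the module's actual implementation. It assumes a hypothetical ToyProvider dataclass with one user-supplied field and one runtime-derived field. The names DERIVED_FIELDS, snapshot_derived, clear_derived, restore_derived and serialise_inputs are illustrative. JSON stands in for YAML so the example needs only the standard library.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyProvider:
    """Hypothetical stand-in for MegatronMIMOProvider."""

    hidden_size: int  # user-supplied input that should be serialised
    derived_spec: Optional[dict] = None  # assumed to be filled in at runtime


# Hypothetical list of fields derived at runtime rather than set by the user.
DERIVED_FIELDS = ("derived_spec",)


def snapshot_derived(provider: ToyProvider) -> dict:
    # Record current values so they can be put back after the save.
    return {name: getattr(provider, name) for name in DERIVED_FIELDS}


def clear_derived(provider: ToyProvider) -> None:
    # Reset derived fields so the serialised config holds only inputs.
    for name in DERIVED_FIELDS:
        setattr(provider, name, None)


def restore_derived(provider: ToyProvider, saved: dict) -> None:
    # Put the runtime values back so the live provider stays usable.
    for name, value in saved.items():
        setattr(provider, name, value)


def serialise_inputs(provider: ToyProvider) -> str:
    saved = snapshot_derived(provider)
    clear_derived(provider)
    try:
        # JSON stands in for YAML to keep the sketch dependency-free.
        return json.dumps(dataclasses.asdict(provider), sort_keys=True)
    finally:
        # Runs even if serialisation raises, so the provider is never left cleared.
        restore_derived(provider, saved)


provider = ToyProvider(hidden_size=1024, derived_spec={"num_layers": 24})
config_text = serialise_inputs(provider)
# config_text == '{"derived_spec": null, "hidden_size": 1024}'
```

The try/finally ordering reflects the docstring's goal of "leaving the provider usable". The written config is cleared of derived values, while the in-memory provider keeps them.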

bridge.models.megatron_mimo.conversion.mimo_model_io._resolve_iter_folder(path: pathlib.Path) -> pathlib.Path

Resolve path to an iter_* folder: return it as-is if it already is one, otherwise pick the latest iter_* folder under it.
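The docstring does not say how "latest" is chosen. Here is a minimal sketch under two assumptions. First, checkpoint folders are named iter_<number>. Second, "latest" means the highest iteration number. The function resolve_iter_folder and the _ITER_RE pattern are hypothetical, and the real helper may follow a different rule.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming convention: iter_<digits>, e.g. iter_0000100.
_ITER_RE = re.compile(r"^iter_(\d+)$")


def resolve_iter_folder(path: Path) -> Path:
    """Return `path` if it is an iter_* folder, else its highest-numbered iter_* child."""
    path = Path(path)
    if _ITER_RE.match(path.name):
        return path
    candidates = [
        child
        for child in path.iterdir()
        if child.is_dir() and _ITER_RE.match(child.name)
    ]
    if not candidates:
        raise FileNotFoundError(f"No iter_* folder found under {path}")
    # Compare numeric suffixes so iter_10 beats iter_9 even without zero padding.
    return max(candidates, key=lambda p: int(_ITER_RE.match(p.name).group(1)))


# Example: a parent directory holding two checkpoints.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "iter_9").mkdir()
    (root / "iter_10").mkdir()
    latest_name = resolve_iter_folder(root).name  # parent dir: picks the latest
    direct_name = resolve_iter_folder(root / "iter_9").name  # iter_* dir: returned as-is
```

Comparing the parsed integer rather than the folder name avoids the lexicographic trap where "iter_9" would sort after "iter_10".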