bridge.models.megatron_mimo.conversion.mimo_model_io#
MegatronMIMO model save/load helpers.
Module Contents#
Functions#
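save_megatron_mimo_model | Save a MegatronMIMO model in Megatron distributed-checkpoint format.
load_megatron_mimo_model | Load a MegatronMIMO model from a Megatron distributed-checkpoint.
_snapshot_derived_spec_fields | Capture the provider’s runtime-derived spec fields for restoration.
_clear_derived_spec_fields | Reset derived spec fields so yaml serialisation captures only inputs.
_restore_derived_spec_fields | Restore derived spec fields after a save, leaving the provider usable.
_resolve_iter_folder | Resolve path to an iter_* folder, or pick the latest under it.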
Data#
API#
- bridge.models.megatron_mimo.conversion.mimo_model_io.logger#
‘getLogger(…)’
- bridge.models.megatron_mimo.conversion.mimo_model_io.save_megatron_mimo_model(
- model: megatron.core.models.mimo.MimoModel,
- infra: megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOInfra,
- provider: megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider,
- path: Union[str, pathlib.Path],
- *,
- hf_tokenizer_path: Optional[Union[str, pathlib.Path]] = None,
- hf_tokenizer_kwargs: Optional[dict] = None,
- ckpt_format: str = 'torch_dist',
)
Save a MegatronMIMO model in Megatron distributed-checkpoint format.
- Parameters:
model – Constructed MimoModel.
infra – MegatronMIMOInfra from model construction.
provider – Provider used to reconstruct the model on load.
path – Directory to save the dist-checkpoint into.
hf_tokenizer_path – Optional HF model ID or path for tokenizer assets.
hf_tokenizer_kwargs – Optional kwargs for AutoTokenizer.from_pretrained.
ckpt_format – Checkpoint format. Default "torch_dist".
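A minimal usage sketch of the signature above. It assumes the module is importable as `megatron.bridge.models.megatron_mimo.conversion.mimo_model_io` (the `megatron.bridge` prefix is inferred from the type annotations, not stated on this page) and that `model`, `infra`, and `provider` were already built elsewhere. The wrapper name `save_mimo_checkpoint` is hypothetical.

```python
from pathlib import Path
from typing import Optional, Union


def save_mimo_checkpoint(
    model,
    infra,
    provider,
    out_dir: Union[str, Path],
    tokenizer: Optional[Union[str, Path]] = None,
) -> Path:
    """Hypothetical wrapper: save a MegatronMIMO model and return the target directory."""
    # Imported lazily so the sketch can be defined without Megatron installed.
    # The `megatron.bridge` package prefix is an assumption.
    from megatron.bridge.models.megatron_mimo.conversion.mimo_model_io import (
        save_megatron_mimo_model,
    )

    out_dir = Path(out_dir)
    # Everything after `path` is keyword-only in the documented signature.
    save_megatron_mimo_model(
        model,
        infra,
        provider,
        out_dir,
        hf_tokenizer_path=tokenizer,
        ckpt_format="torch_dist",  # documented default, spelled out for clarity
    )
    return out_dir
```

Passing the same `provider` that built the model matters because, per the parameter description, it is what reconstructs the model on load.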
- bridge.models.megatron_mimo.conversion.mimo_model_io.load_megatron_mimo_model(
- path: Union[str, pathlib.Path],
- *,
- parallelism_config: Optional[megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig] = None,
- ddp_config: Optional[megatron.core.distributed.DistributedDataParallelConfig] = None,
- fp16: bool = False,
- bf16: bool = True,
- wrap_with_ddp: bool = False,
- data_parallel_random_init: bool = False,
)
Load a MegatronMIMO model from a Megatron distributed-checkpoint.
- Parameters:
path – Checkpoint parent directory or an iter_* directory.
parallelism_config – Optional per-component parallelism override.
ddp_config – DDP config forwarded to build_megatron_mimo_model.
fp16 / bf16 – Precision flags forwarded to model construction.
wrap_with_ddp – Whether to wrap the model with DDP.
data_parallel_random_init – Forwarded to build_megatron_mimo_model.
- Returns:
(mimo_model, infra, provider).
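A short sketch of calling the loader and unpacking its three-element return value. As before, the `megatron.bridge` import prefix is an assumption, and the wrapper name `load_mimo_checkpoint` is hypothetical. The call is presumably meant to run inside an initialised distributed job; that requirement is an assumption here, not something this page states.

```python
from pathlib import Path
from typing import Union


def load_mimo_checkpoint(ckpt: Union[str, Path]):
    """Hypothetical wrapper: load a checkpoint in bf16 without DDP wrapping."""
    # Lazy import so the sketch can be defined without Megatron installed.
    # The `megatron.bridge` package prefix is an assumption.
    from megatron.bridge.models.megatron_mimo.conversion.mimo_model_io import (
        load_megatron_mimo_model,
    )

    # `path` may be the checkpoint parent directory or a specific iter_* directory.
    mimo_model, infra, provider = load_megatron_mimo_model(
        Path(ckpt),
        fp16=False,           # documented default
        bf16=True,            # documented default
        wrap_with_ddp=False,  # documented default
    )
    return mimo_model, infra, provider
```

All options after `path` are keyword-only, so they cannot be passed positionally.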
- bridge.models.megatron_mimo.conversion.mimo_model_io._snapshot_derived_spec_fields(
- provider: megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider,
)
Capture the provider’s runtime-derived spec fields for restoration.
- bridge.models.megatron_mimo.conversion.mimo_model_io._clear_derived_spec_fields(
- provider: megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider,
)
Reset derived spec fields so yaml serialisation captures only inputs.
- bridge.models.megatron_mimo.conversion.mimo_model_io._restore_derived_spec_fields(
- provider: megatron.bridge.models.megatron_mimo.megatron_mimo_provider.MegatronMIMOProvider,
- saved: dict,
)
Restore derived spec fields after a save, leaving the provider usable.
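The three helpers above describe a snapshot, clear, restore cycle around serialisation. The following is a self-contained sketch of that pattern only, not the module's implementation: `ToyProvider`, the names in `DERIVED_FIELDS`, and the choice to reset fields to `None` are all hypothetical, since this page does not list the real derived fields.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

# Hypothetical field names; the real provider's derived fields are not listed here.
DERIVED_FIELDS = ("language_model_spec", "modality_submodules_spec")


@dataclass
class ToyProvider:
    """Stand-in for a provider: one user input plus two runtime-derived fields."""
    hidden_size: int = 1024
    language_model_spec: Optional[Any] = None
    modality_submodules_spec: Optional[Any] = None


def snapshot(provider: Any) -> Dict[str, Any]:
    """Capture the current values of the derived fields."""
    return {name: getattr(provider, name) for name in DERIVED_FIELDS}


def clear(provider: Any) -> None:
    """Reset derived fields so serialisation sees only user inputs."""
    for name in DERIVED_FIELDS:
        setattr(provider, name, None)


def restore(provider: Any, saved: Dict[str, Any]) -> None:
    """Put the captured values back so the provider stays usable."""
    for name, value in saved.items():
        setattr(provider, name, value)


def serialise_inputs_only(provider: ToyProvider) -> Dict[str, Any]:
    saved = snapshot(provider)
    clear(provider)
    try:
        return asdict(provider)  # stands in for the yaml dump
    finally:
        restore(provider, saved)  # runs even if serialisation raises
```

Wrapping the restore in `finally` is a design choice of this sketch: it keeps the provider intact even when the serialisation step fails.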
- bridge.models.megatron_mimo.conversion.mimo_model_io._resolve_iter_folder(path: pathlib.Path) -> pathlib.Path#
Resolve path to an iter_* folder, or pick the latest under it.
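One way such a resolution could work, as an illustrative sketch rather than the module's actual logic: it assumes "latest" means the highest iteration number in the folder name, whereas the real helper may use a different rule (for example, a tracker file). The function name `resolve_iter_folder` and the error raised are choices of this sketch.

```python
import re
from pathlib import Path

_ITER_RE = re.compile(r"^iter_(\d+)$")


def resolve_iter_folder(path: Path) -> Path:
    """Return `path` if it is an iter_* folder, else the highest-numbered one below it."""
    path = Path(path)
    if _ITER_RE.match(path.name):
        return path  # already points at a specific iteration

    best_step = -1
    best_dir = None
    for child in path.iterdir():
        match = _ITER_RE.match(child.name)
        if child.is_dir() and match:
            step = int(match.group(1))  # compare numerically, not lexically
            if step > best_step:
                best_step, best_dir = step, child

    if best_dir is None:
        raise FileNotFoundError(f"no iter_* folder under {path}")
    return best_dir
```

Comparing the parsed integer instead of the folder name avoids mis-ordering when iteration numbers are not zero-padded to the same width.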