bridge.training.post_training.checkpointing#

Input/output checkpointing for ModelOpt.

Module Contents#

Functions#

_get_modelopt_checkpoint_path

Get the path to use for ModelOpt operations (handles iteration directories).

has_modelopt_state

Check if ModelOpt state exists inside the checkpoint path.

load_modelopt_state

Load modelopt_state from a checkpoint.

API#

bridge.training.post_training.checkpointing._get_modelopt_checkpoint_path(checkpoint_path: str) → str#

Get the path to use for ModelOpt operations (handles iteration directories).

bridge.training.post_training.checkpointing.has_modelopt_state(checkpoint_path: str) → bool#

Check if ModelOpt state exists inside the checkpoint path.

Checks for modelopt_state in iteration directories (iter_*) or in the root directory of the checkpoint. Note: distillation state is ignored because it is deprecated and unused.

Parameters:

checkpoint_path – Path to the checkpoint directory

Returns:

True if a modelopt_state folder exists and contains nontrivial state; otherwise False.
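
The lookup described above can be pictured with a small standalone sketch. It is not the library's implementation: the helper names `find_modelopt_state_dir` and `has_state_sketch`, the choice to try `iter_*` directories last-name-first before the root, and the reading of "nontrivial" as "the folder is non-empty" are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def find_modelopt_state_dir(checkpoint_path: str) -> Optional[Path]:
    """Hypothetical helper: locate a non-empty ``modelopt_state`` folder.

    Assumptions (not confirmed by the library): iteration directories are
    tried in reverse name order, then the checkpoint root; "nontrivial" is
    approximated as "the folder contains at least one entry".
    """
    root = Path(checkpoint_path)
    if not root.is_dir():
        return None

    # Candidate parents: iter_* subdirectories (last name first), then root.
    iter_dirs = sorted((p for p in root.glob("iter_*") if p.is_dir()), reverse=True)
    for parent in [*iter_dirs, root]:
        state_dir = parent / "modelopt_state"
        if state_dir.is_dir() and any(state_dir.iterdir()):
            return state_dir
    return None


def has_state_sketch(checkpoint_path: str) -> bool:
    """Boolean wrapper mirroring the documented return contract."""
    return find_modelopt_state_dir(checkpoint_path) is not None
```

The real function may apply stricter checks on the contents of the folder, for example to skip the deprecated distillation state mentioned above.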

bridge.training.post_training.checkpointing.load_modelopt_state(
model: list[megatron.core.transformer.module.MegatronModule],
checkpoint_path: str,
) → None#

Load modelopt_state from a checkpoint.

Parameters:
  • model – The model to load the modelopt_state into

  • checkpoint_path – Path to the checkpoint directory
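
A caller may want to check for state before attempting to load it. The sketch below shows that pattern with the two functions passed in as arguments, so it runs without the package installed. The wrapper name `restore_modelopt_state_if_present` is hypothetical, and the guard-then-load order is an assumption about usage rather than documented behavior.

```python
from typing import Any, Callable, Sequence


def restore_modelopt_state_if_present(
    model: Sequence[Any],
    checkpoint_path: str,
    has_state: Callable[[str], bool],
    load_state: Callable[[Sequence[Any], str], None],
) -> bool:
    """Hypothetical wrapper: load ModelOpt state only when it exists.

    ``has_state`` and ``load_state`` stand in for ``has_modelopt_state`` and
    ``load_modelopt_state``; they are injected so the sketch is self-contained.
    Returns True if a load was attempted, else False.
    """
    if not has_state(checkpoint_path):
        return False
    load_state(model, checkpoint_path)
    return True
```

In real code the two callables would be the functions documented on this page, and `model` would be the list of `MegatronModule` instances given in the signature.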