bridge.training.post_training.checkpointing

Input/output checkpointing for ModelOpt.

Module Contents

Functions

_get_modelopt_checkpoint_path

Get the path to use for ModelOpt operations (handles iteration directories).

has_modelopt_state

Check if modelopt_state folder exists inside the checkpoint path.

load_modelopt_state

Load modelopt_state from a checkpoint.

_has_only_kd_state

API

bridge.training.post_training.checkpointing._get_modelopt_checkpoint_path(checkpoint_path: str) → str

Get the path to use for ModelOpt operations (handles iteration directories).
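
The handling of iteration directories can be illustrated with a minimal sketch. This is not the library's implementation: the helper name resolve_modelopt_path, the choice of the highest-numbered iter_* directory, and the fallback to the input path are all assumptions made for illustration.

```python
import os
import re

# Assumed naming scheme for iteration directories, e.g. "iter_0000100".
_ITER_DIR = re.compile(r"^iter_(\d+)$")


def resolve_modelopt_path(checkpoint_path: str) -> str:
    """Hypothetical stand-in for _get_modelopt_checkpoint_path."""
    # If the path is not a directory there is nothing to scan.
    if not os.path.isdir(checkpoint_path):
        return checkpoint_path

    # Collect (iteration number, directory name) for every iter_* subdirectory.
    iterations = []
    for name in os.listdir(checkpoint_path):
        match = _ITER_DIR.match(name)
        if match and os.path.isdir(os.path.join(checkpoint_path, name)):
            iterations.append((int(match.group(1)), name))

    # No iteration directories: use the checkpoint root itself.
    if not iterations:
        return checkpoint_path

    # Assumption: the most recent (highest-numbered) iteration is preferred.
    _, latest = max(iterations)
    return os.path.join(checkpoint_path, latest)
```

Under these assumptions, a root containing iter_0000010 and iter_0000200 resolves to the iter_0000200 subdirectory, while a root without any iter_* directory is returned unchanged.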

bridge.training.post_training.checkpointing.has_modelopt_state(
    checkpoint_path: str,
    ignore_kd_state: bool = False,
) → bool

Check if modelopt_state folder exists inside the checkpoint path.

Checks for modelopt_state in iteration directories (iter_*) or root directory.

Parameters:
  • checkpoint_path – Path to the checkpoint directory

  • ignore_kd_state – If True, ignore the distillation state, as it is a placeholder

Returns:

True if a modelopt_state folder exists. When ignore_kd_state is True, a folder that contains only distillation state is treated as absent, so the result is False in that case. False otherwise.
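
The decision logic can be sketched as follows. This is a hedged illustration rather than the actual function body: has_state_sketch and the only_kd predicate are hypothetical names, the sketch takes an already-resolved directory, and it assumes (following the description of ignore_kd_state) that a distillation-only folder counts as absent when that flag is set. How distillation-only state is detected is not documented here, so it is passed in as a callable.

```python
import os
from typing import Callable


def has_state_sketch(
    checkpoint_dir: str,
    ignore_kd_state: bool = False,
    only_kd: Callable[[str], bool] = lambda state_dir: False,
) -> bool:
    """Hypothetical illustration of the has_modelopt_state decision."""
    state_dir = os.path.join(checkpoint_dir, "modelopt_state")

    # No modelopt_state folder at all: nothing to load.
    if not os.path.isdir(state_dir):
        return False

    # Assumption: a distillation-only folder is a placeholder and is
    # reported as absent when the caller asks to ignore it.
    if ignore_kd_state and only_kd(state_dir):
        return False

    return True
```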

bridge.training.post_training.checkpointing.load_modelopt_state(
    model: list[megatron.core.transformer.module.MegatronModule],
    checkpoint_path: str,
) → None

Load modelopt_state from a checkpoint.

Parameters:
  • model – The model to load the modelopt_state into

  • checkpoint_path – Path to the checkpoint directory
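
A common pattern is to check for the state before loading it. The sketch below shows that flow with the two functions injected as callables, so it runs without the library installed. The wrapper restore_if_present is a hypothetical helper, not part of this module, and whether load_modelopt_state itself tolerates a missing folder is not stated here.

```python
from typing import Any, Callable


def restore_if_present(
    model: Any,
    checkpoint_path: str,
    has_state: Callable[[str], bool],
    load_state: Callable[[Any, str], None],
) -> bool:
    """Hypothetical wrapper: load ModelOpt state only when it exists.

    Pass has_modelopt_state and load_modelopt_state from this module as
    has_state and load_state. Returns True if a load was attempted.
    """
    if not has_state(checkpoint_path):
        return False
    load_state(model, checkpoint_path)
    return True
```

Injecting the callables keeps the wrapper independent of the import path and makes the flow easy to exercise with stand-ins in tests.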

bridge.training.post_training.checkpointing._has_only_kd_state(modelopt_state_path: str) → bool