core.dist_checkpointing.core#
Module for managing distributed checkpoints metadata.
Module Contents#
Classes#
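CheckpointingConfig | Documents backends used in the checkpoint. |
Functions#
check_is_distributed_checkpoint | Checks if metadata.json exists in the checkpoint and is a valid config. |
maybe_load_config | Returns checkpoint config if checkpoint_dir is a distributed checkpoint and None otherwise. |
save_config | Save given config to checkpoint directory. |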
Data#
CONFIG_FNAME |
API#
- core.dist_checkpointing.core.CONFIG_FNAME#
'metadata.json'
- exception core.dist_checkpointing.core.CheckpointingException#
Bases: Exception
Base checkpointing-related exception.
Initialization
Initialize self. See help(type(self)) for accurate signature.
- class core.dist_checkpointing.core.CheckpointingConfig#
Documents backends used in the checkpoint.
Checkpoint config keeps track of formats used for storing the sharded tensors (sharded_backend) and other objects (common_backend).
Note that versioning is not for the checkpoint content (which is application specific), but for the checkpoint format itself.
- sharded_backend: str#
None
- sharded_backend_version: int#
1
- common_backend: str#
'torch'
- common_backend_version: int#
1
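To make the field list above concrete, here is a minimal stand-alone sketch of a dataclass with the same fields and defaults. It is a stand-in written for illustration, not the library class itself; it assumes `sharded_backend` is required (it is documented without a default), and the backend name `"torch_dist"` is only an example value.

```python
from dataclasses import asdict, dataclass


@dataclass
class CheckpointingConfig:
    """Illustrative stand-in mirroring the documented fields (not the library class)."""

    sharded_backend: str               # assumed required: documented without a default
    sharded_backend_version: int = 1   # version of the storage format, not of the content
    common_backend: str = "torch"
    common_backend_version: int = 1


# "torch_dist" is only an example backend name used for this sketch.
config = CheckpointingConfig(sharded_backend="torch_dist")

# A plain dict view of the config, convenient for serialization.
config_dict = asdict(config)
print(config_dict)
```

The version fields describe the on-disk format of each backend, so two checkpoints with different model contents can still share the same config.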
- core.dist_checkpointing.core.check_is_distributed_checkpoint(checkpoint_dir)#
Checks if metadata.json exists in the checkpoint and is a valid config.
- Parameters:
checkpoint_dir – checkpoint directory
- Returns:
True if metadata.json exists in the checkpoint and is a valid config.
- Return type:
bool
- core.dist_checkpointing.core.maybe_load_config(checkpoint_dir: str)#
Returns checkpoint config if checkpoint_dir is a distributed checkpoint and None otherwise.
- Parameters:
checkpoint_dir – checkpoint directory
- Returns:
The checkpoint config, or None if the checkpoint is not a valid distributed checkpoint
- Return type:
CheckpointingConfig (optional)
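The behavior described for `check_is_distributed_checkpoint` and `maybe_load_config` can be sketched with the standard library alone. This is an assumption-laden illustration, not the library's code: it assumes `metadata.json` holds the config fields as a JSON object, the `*_sketch` function names are hypothetical, and it only handles the missing-file case (a malformed file would raise here).

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

CONFIG_FNAME = "metadata.json"


@dataclass
class CheckpointingConfig:
    """Illustrative stand-in mirroring the documented fields."""

    sharded_backend: str
    sharded_backend_version: int = 1
    common_backend: str = "torch"
    common_backend_version: int = 1


def maybe_load_config_sketch(checkpoint_dir: str) -> Optional[CheckpointingConfig]:
    """Return the config stored in checkpoint_dir, or None if there is no metadata file."""
    config_path = Path(checkpoint_dir) / CONFIG_FNAME
    if not config_path.exists():
        return None
    with config_path.open() as f:
        # Assumption: the file is a JSON object keyed by the config field names.
        return CheckpointingConfig(**json.load(f))


def check_is_distributed_checkpoint_sketch(checkpoint_dir: str) -> bool:
    """True when a config could be loaded from checkpoint_dir."""
    return maybe_load_config_sketch(checkpoint_dir) is not None


with tempfile.TemporaryDirectory() as tmp:
    # An empty directory is not a distributed checkpoint.
    empty_result = maybe_load_config_sketch(tmp)

    # Write a metadata file by hand; "torch_dist" is only an example value.
    (Path(tmp) / CONFIG_FNAME).write_text(json.dumps({"sharded_backend": "torch_dist"}))
    loaded = maybe_load_config_sketch(tmp)
    is_dist = check_is_distributed_checkpoint_sketch(tmp)
```

Returning None instead of raising lets callers branch on "is this directory a distributed checkpoint at all?" before choosing a loading path.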
- core.dist_checkpointing.core.save_config(config: core.dist_checkpointing.core.CheckpointingConfig, checkpoint_dir: str)#
Save given config to checkpoint directory.
- Parameters:
config – checkpoint config
checkpoint_dir – checkpoint directory
- Returns:
None
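As a rough illustration of what saving a config might involve, the sketch below writes the dataclass fields to `metadata.json` inside the checkpoint directory. It is a stand-alone approximation under stated assumptions: the JSON layout, the `save_config_sketch` name, and the `"torch_dist"` backend value are all chosen for this example and are not confirmed by the text above.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path

CONFIG_FNAME = "metadata.json"


@dataclass
class CheckpointingConfig:
    """Illustrative stand-in mirroring the documented fields."""

    sharded_backend: str
    sharded_backend_version: int = 1
    common_backend: str = "torch"
    common_backend_version: int = 1


def save_config_sketch(config: CheckpointingConfig, checkpoint_dir: str) -> None:
    """Write the config fields as JSON to <checkpoint_dir>/metadata.json."""
    config_path = Path(checkpoint_dir) / CONFIG_FNAME
    with config_path.open("w") as f:
        json.dump(asdict(config), f)


with tempfile.TemporaryDirectory() as tmp:
    save_config_sketch(CheckpointingConfig(sharded_backend="torch_dist"), tmp)
    # Read the file back to inspect what was written.
    written = json.loads((Path(tmp) / CONFIG_FNAME).read_text())
    print(written)
```

In this sketch the directory must already exist; the function does not create it.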