nemo_automodel.checkpoint._backports.default_planner
#
Module Contents#
Classes#
| Class | Description |
|---|---|
| `DefaultSavePlanner` | |
| `DefaultLoadPlanner` | DefaultLoadPlanner that adds multiple features on top of LoadPlanner. |
| `_EmptyStateDictLoadPlanner` | Extension of DefaultLoadPlanner, which rebuilds state_dict from the saved metadata. Useful for loading in state_dict without first initializing a model, such as when converting a DCP checkpoint into a Torch save file. |
Functions#
| Function | Description |
|---|---|
| `create_default_local_load_plan` | |
| `create_default_global_load_plan` | Create global load plan used by DefaultLoadPlanner. |
| `create_default_local_save_plan` | Create the `SavePlan` used by DefaultSavePlanner. |
| `create_default_global_save_plan` | Create the global plan and metadata used by DefaultSavePlanner. |
| `_create_default_local_metadata` | Return the `Metadata` if DefaultSavePlanner was used to checkpoint `state_dict`. |
| `_check_box_overlap` | Check if two boxes overlap. Tuples are (offset, lengths). |
Data#
API#
- nemo_automodel.checkpoint._backports.default_planner.logger: logging.Logger#
'getLogger(...)'
- nemo_automodel.checkpoint._backports.default_planner.__all__#
['DefaultSavePlanner', 'DefaultLoadPlanner', 'create_default_local_load_plan', 'create_default_globa…
- class nemo_automodel.checkpoint._backports.default_planner.DefaultSavePlanner(
- flatten_state_dict: bool = True,
- flatten_sharded_tensors: bool = True,
- dedup_replicated_tensors: Optional[bool] = None,
- dedup_save_to_lowest_rank: bool = False,
- enable_plan_caching: bool = False,
Bases:
torch.distributed.checkpoint.planner.SavePlanner
- mappings: torch.distributed.checkpoint._nested_dict.FLATTEN_MAPPING#
None
- set_up_planner(
- state_dict: torch.distributed.checkpoint.metadata.STATE_DICT_TYPE,
- storage_meta: Optional[torch.distributed.checkpoint.metadata.StorageMeta] = None,
- is_coordinator: bool = False,
- _dedup_save_plans(
- all_plans: list[torch.distributed.checkpoint.planner.SavePlan],
- _create_global_plan(
- all_plans: list[torch.distributed.checkpoint.planner.SavePlan],
- _create_global_plan_with_caching(
- all_plans: list[torch.distributed.checkpoint.planner.SavePlan],
Create global plan with caching. Returns a tuple of global_plan_delta, global_plan, metadata.
- create_global_plan(
- all_plans: list[torch.distributed.checkpoint.planner.SavePlan],
- _finish_plan_with_caching(
- new_plan: torch.distributed.checkpoint.planner.SavePlan,
- finish_plan(
- new_plan: torch.distributed.checkpoint.planner.SavePlan,
- resolve_data(
- write_item: torch.distributed.checkpoint.planner.WriteItem,
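A minimal single-process sketch of wiring this planner into `torch.distributed.checkpoint.save`; the checkpoint path and state-dict contents are hypothetical, and a recent PyTorch build that accepts the `checkpoint_id` keyword is assumed:

```python
import torch
import torch.distributed.checkpoint as dcp

from nemo_automodel.checkpoint._backports.default_planner import DefaultSavePlanner

# Toy single-process state dict; real usage would pass a model/optimizer state dict.
state_dict = {"weight": torch.randn(4, 4), "step": 10}

dcp.save(
    state_dict,
    checkpoint_id="/tmp/example_ckpt",  # hypothetical checkpoint directory
    planner=DefaultSavePlanner(flatten_state_dict=True),
)
```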
- class nemo_automodel.checkpoint._backports.default_planner.DefaultLoadPlanner(
- flatten_state_dict: bool = True,
- flatten_sharded_tensors: bool = True,
- allow_partial_load: bool = False,
Bases:
torch.distributed.checkpoint.planner.LoadPlanner
DefaultLoadPlanner that adds multiple features on top of LoadPlanner.
In particular it adds the following:
- flatten_state_dict: Handle state_dict with nested dicts
- flatten_sharded_tensors: For FSDP in 2D parallel mode
- allow_partial_load: If False, will raise a runtime error if a key is present in state_dict, but not in the checkpoint.
Initialization
- original_state_dict: torch.distributed.checkpoint.metadata.STATE_DICT_TYPE#
None
- mappings: torch.distributed.checkpoint._nested_dict.FLATTEN_MAPPING#
None
- set_up_planner(
- state_dict: torch.distributed.checkpoint.metadata.STATE_DICT_TYPE,
- metadata: Optional[torch.distributed.checkpoint.metadata.Metadata] = None,
- is_coordinator: bool = False,
- create_global_plan(
- global_plan: list[torch.distributed.checkpoint.planner.LoadPlan],
- finish_plan(
- new_plan: torch.distributed.checkpoint.planner.LoadPlan,
- load_bytes(
- read_item: torch.distributed.checkpoint.planner.ReadItem,
- value: io.BytesIO,
- commit_tensor(
- read_item: torch.distributed.checkpoint.planner.ReadItem,
- tensor: torch.Tensor,
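A minimal sketch of passing this planner to `torch.distributed.checkpoint.load`; the checkpoint path and tensor shape are hypothetical, and the destination state_dict is assumed to be pre-allocated:

```python
import torch
import torch.distributed.checkpoint as dcp

from nemo_automodel.checkpoint._backports.default_planner import DefaultLoadPlanner

# Tensors must already exist with the right shapes; dcp.load fills them in place.
state_dict = {"weight": torch.empty(4, 4)}

dcp.load(
    state_dict,
    checkpoint_id="/tmp/example_ckpt",  # hypothetical checkpoint directory
    planner=DefaultLoadPlanner(allow_partial_load=True),  # tolerate state_dict keys absent from the checkpoint
)
```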
- class nemo_automodel.checkpoint._backports.default_planner._EmptyStateDictLoadPlanner(keys=None, *args, **kwargs)#
Bases:
nemo_automodel.checkpoint._backports.default_planner.DefaultLoadPlanner
Extension of DefaultLoadPlanner, which rebuilds state_dict from the saved metadata. Useful for loading in state_dict without first initializing a model, such as when converting a DCP checkpoint into a Torch save file.
N.B. `state_dict` must be an empty dictionary when used with this LoadPlanner.
Warning: Because the entire state dict is initialized, it's recommended to only use this LoadPlanner on a single rank or process to avoid OOM.
Initialization
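A sketch of the DCP-to-torch.save conversion this planner enables, mirroring the pattern used by `torch.distributed.checkpoint.format_utils.dcp_to_torch_save`; it relies on the private `_load_state_dict` helper and hypothetical paths, so treat it as an assumption about a recent PyTorch layout rather than a guaranteed API:

```python
import torch
from torch.distributed.checkpoint import FileSystemReader
from torch.distributed.checkpoint.state_dict_loader import _load_state_dict  # private torch helper

from nemo_automodel.checkpoint._backports.default_planner import _EmptyStateDictLoadPlanner

state_dict: dict = {}  # must start empty; the planner rebuilds it from checkpoint metadata
_load_state_dict(
    state_dict,
    storage_reader=FileSystemReader("/tmp/example_ckpt"),  # hypothetical DCP checkpoint dir
    planner=_EmptyStateDictLoadPlanner(),
    no_dist=True,  # single-process load, no process group required
)
torch.save(state_dict, "/tmp/example_ckpt.pt")  # hypothetical output path
```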
- nemo_automodel.checkpoint._backports.default_planner.create_default_local_load_plan(
- state_dict: dict[str, Any],
- metadata: torch.distributed.checkpoint.metadata.Metadata,
- strict: bool = True,
- nemo_automodel.checkpoint._backports.default_planner.create_default_global_load_plan(
- all_plans: list[torch.distributed.checkpoint.planner.LoadPlan],
Create global load plan used by DefaultLoadPlanner.
The default load behavior involves no global coordination, and this function currently doesn't change the local plans.
- nemo_automodel.checkpoint._backports.default_planner.create_default_local_save_plan(
- state_dict: dict[str, Any],
- is_coordinator: bool,
Create the `SavePlan` used by DefaultSavePlanner.
On non-coordinator ranks, this function ignores tensors and non-tensor objects, only producing writes for ShardedTensor objects. On the coordinator rank, it produces writes for all values.
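An illustrative direct call on a toy state dict, assuming a single process acting as coordinator; the entry names are hypothetical:

```python
import torch

from nemo_automodel.checkpoint._backports.default_planner import create_default_local_save_plan

local_state = {"weight": torch.randn(2, 2), "step": 10}

# Per the docstring above, the coordinator produces a WriteItem for every value;
# non-coordinator ranks would only emit items for sharded/distributed tensors.
plan = create_default_local_save_plan(local_state, is_coordinator=True)
print([item.index.fqn for item in plan.items])  # e.g. ['weight', 'step']
```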
- nemo_automodel.checkpoint._backports.default_planner.create_default_global_save_plan(
- all_plans: list[torch.distributed.checkpoint.planner.SavePlan],
- rewrite_index_hints: bool = True,
Create the global plan and metadata used by DefaultSavePlanner.
Metadata is produced by concatenating the metadata of all `WriteItem`s from the supplied plans. The only global planning change is to update index hints in all `MetadataIndex` objects if `rewrite_index_hints` is True.
- nemo_automodel.checkpoint._backports.default_planner._create_default_local_metadata(
- state_dict: torch.distributed.checkpoint.metadata.STATE_DICT_TYPE,
Return the `Metadata` if DefaultSavePlanner was used to checkpoint `state_dict`.
- nemo_automodel.checkpoint._backports.default_planner._check_box_overlap(
- box0: torch.distributed.checkpoint.metadata.ChunkStorageMetadata,
- box1: torch.distributed.checkpoint.metadata.ChunkStorageMetadata,
Check if two boxes overlap. Tuples are (offset, lengths).
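An illustrative re-implementation of the overlap test on plain (offset, lengths) tuples, not the backport's exact code: two axis-aligned boxes overlap only if their extents intersect along every dimension.

```python
def boxes_overlap(offsets0, lengths0, offsets1, lengths1) -> bool:
    """Return True if the two n-dimensional boxes share any volume."""
    for o0, l0, o1, l1 in zip(offsets0, lengths0, offsets1, lengths1):
        if o0 + l0 <= o1 or o1 + l1 <= o0:  # disjoint along this dimension
            return False
    return True

assert boxes_overlap((0, 0), (4, 4), (2, 2), (4, 4))      # interiors intersect
assert not boxes_overlap((0, 0), (4, 4), (4, 0), (4, 4))  # touching edges do not overlap
```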