nemo_automodel.components.checkpoint._backports.default_planner#
Module Contents#
Classes#
|  |  |
| --- | --- |
| `DefaultSavePlanner` |  |
| `DefaultLoadPlanner` | DefaultLoadPlanner that adds multiple features on top of LoadPlanner. |
| `_EmptyStateDictLoadPlanner` | Extension of DefaultLoadPlanner, which rebuilds state_dict from the saved metadata. Useful for loading in state_dict without first initializing a model, such as when converting a DCP checkpoint into a Torch save file. |
Functions#
|  |  |
| --- | --- |
| `create_default_local_load_plan` |  |
| `create_default_global_load_plan` | Create global load plan used by DefaultLoadPlanner. |
| `create_default_local_save_plan` | Create the `SavePlan` used by DefaultSavePlanner. |
| `create_default_global_save_plan` | Create the global plan and metadata used by DefaultSavePlanner. |
| `_create_default_local_metadata` | Return the `Metadata` if DefaultSavePlanner was used to checkpoint `state_dict`. |
| `_check_box_overlap` | Check if two boxes overlap. Tuples are (offset, lengths). |
| `_check_box_bounds` |  |
| `_validate_global_plan` |  |
Data#
API#
- nemo_automodel.components.checkpoint._backports.default_planner.logger: logging.Logger#
'getLogger(…)'
- nemo_automodel.components.checkpoint._backports.default_planner.__all__#
['DefaultSavePlanner', 'DefaultLoadPlanner', 'create_default_local_load_plan', 'create_default_globa…
- class nemo_automodel.components.checkpoint._backports.default_planner.DefaultSavePlanner(
- flatten_state_dict: bool = True,
- flatten_sharded_tensors: bool = True,
- dedup_replicated_tensors: Optional[bool] = None,
- dedup_save_to_lowest_rank: bool = False,
- enable_plan_caching: bool = False,
)#

Bases:
torch.distributed.checkpoint.planner.SavePlanner

Initialization
- mappings: torch.distributed.checkpoint._nested_dict.FLATTEN_MAPPING#
None
- set_up_planner(
- state_dict: torch.distributed.checkpoint.metadata.STATE_DICT_TYPE,
- storage_meta: Optional[torch.distributed.checkpoint.metadata.StorageMeta] = None,
- is_coordinator: bool = False,
)#
- create_local_plan() → torch.distributed.checkpoint.planner.SavePlan#
- _dedup_save_plans(
- all_plans: list[torch.distributed.checkpoint.planner.SavePlan],
)#
- _create_global_plan(
- all_plans: list[torch.distributed.checkpoint.planner.SavePlan],
)#
- _create_global_plan_with_caching(
- all_plans: list[torch.distributed.checkpoint.planner.SavePlan],
)#
Create global plan with caching. Returns a tuple of global_plan_delta, global_plan, metadata.
- create_global_plan(
- all_plans: list[torch.distributed.checkpoint.planner.SavePlan],
)#
- _finish_plan_with_caching(
- new_plan: torch.distributed.checkpoint.planner.SavePlan,
)#
- finish_plan(
- new_plan: torch.distributed.checkpoint.planner.SavePlan,
)#
- resolve_data(
- write_item: torch.distributed.checkpoint.planner.WriteItem,
)#
- lookup_object(
- index: torch.distributed.checkpoint.metadata.MetadataIndex,
)#
Extension from the planner interface to make it easy to extend the default planner.
- transform_object(
- write_item: torch.distributed.checkpoint.planner.WriteItem,
- object: Any,
)#
Extension from the planner interface to make it easy to extend the default planner.
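To show the save planner in context, here is a minimal, hedged sketch of driving it through `torch.distributed.checkpoint`; the checkpoint path is illustrative, `model` is assumed to exist, and multi-rank use assumes an initialized process group.

```python
# Minimal sketch (assumptions: `model` exists and, for multi-rank runs, a
# process group has been initialized). The path is illustrative.
import torch.distributed.checkpoint as dcp

from nemo_automodel.components.checkpoint._backports.default_planner import (
    DefaultSavePlanner,
)

state_dict = {"model": model.state_dict()}
dcp.save(
    state_dict,
    checkpoint_id="/tmp/ckpt",  # illustrative checkpoint directory
    planner=DefaultSavePlanner(dedup_save_to_lowest_rank=True),
)
```

Passing `dedup_save_to_lowest_rank=True` here merely illustrates the constructor flag above: replicated tensors are written by the lowest rank that owns them rather than an arbitrary one.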
- class nemo_automodel.components.checkpoint._backports.default_planner.DefaultLoadPlanner(
- flatten_state_dict: bool = True,
- flatten_sharded_tensors: bool = True,
- allow_partial_load: bool = False,
)#
Bases:
torch.distributed.checkpoint.planner.LoadPlanner

DefaultLoadPlanner that adds multiple features on top of LoadPlanner.
In particular it adds the following:
- flatten_state_dict: Handle state_dict with nested dicts.
- flatten_sharded_tensors: For FSDP in 2D parallel mode.
- allow_partial_load: If False, will raise a runtime error if a key is present in state_dict, but not in the checkpoint.
Initialization
- original_state_dict: torch.distributed.checkpoint.metadata.STATE_DICT_TYPE#
None
- mappings: torch.distributed.checkpoint._nested_dict.FLATTEN_MAPPING#
None
- set_up_planner(
- state_dict: torch.distributed.checkpoint.metadata.STATE_DICT_TYPE,
- metadata: Optional[torch.distributed.checkpoint.metadata.Metadata] = None,
- is_coordinator: bool = False,
)#
- create_local_plan() → torch.distributed.checkpoint.planner.LoadPlan#
- create_global_plan(
- global_plan: list[torch.distributed.checkpoint.planner.LoadPlan],
)#
- finish_plan(
- new_plan: torch.distributed.checkpoint.planner.LoadPlan,
)#
- load_bytes(
- read_item: torch.distributed.checkpoint.planner.ReadItem,
- value: io.BytesIO,
)#
- resolve_tensor(
- read_item: torch.distributed.checkpoint.planner.ReadItem,
)#
- commit_tensor(
- read_item: torch.distributed.checkpoint.planner.ReadItem,
- tensor: torch.Tensor,
)#
- lookup_tensor(
- index: torch.distributed.checkpoint.metadata.MetadataIndex,
)#
Extension from the planner interface to make it easy to extend the default planner.
- transform_tensor(
- read_item: torch.distributed.checkpoint.planner.ReadItem,
- tensor: torch.Tensor,
)#
Extension from the planner interface to make it easy to extend the default planner.
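A matching load-side sketch, under the same assumptions; note that `dcp.load` restores data in place into the tensors of the passed state_dict.

```python
# Minimal sketch: read the checkpoint written above back into an existing
# model. `model` and the path are assumptions carried over from the save side.
import torch.distributed.checkpoint as dcp

from nemo_automodel.components.checkpoint._backports.default_planner import (
    DefaultLoadPlanner,
)

state_dict = {"model": model.state_dict()}
dcp.load(
    state_dict,
    checkpoint_id="/tmp/ckpt",  # illustrative path
    planner=DefaultLoadPlanner(allow_partial_load=False),
)
model.load_state_dict(state_dict["model"])
```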
- class nemo_automodel.components.checkpoint._backports.default_planner._EmptyStateDictLoadPlanner(keys=None, *args, **kwargs)#
Bases:
nemo_automodel.components.checkpoint._backports.default_planner.DefaultLoadPlanner

Extension of DefaultLoadPlanner, which rebuilds state_dict from the saved metadata. Useful for loading in state_dict without first initializing a model, such as when converting a DCP checkpoint into a Torch save file.

N.B. `state_dict` must be an empty dictionary when used with this LoadPlanner.

Warning: Because the entire state dict is initialized, it is recommended to use this LoadPlanner only on a single rank or process to avoid OOM.
Initialization
- _should_include_key(
- key: str,
- metadata: torch.distributed.checkpoint.metadata.Metadata,
)#
- set_up_planner(
- state_dict: torch.distributed.checkpoint.metadata.STATE_DICT_TYPE,
- metadata: Optional[torch.distributed.checkpoint.metadata.Metadata] = None,
- is_coordinator: bool = False,
)#
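The intended pattern mirrors torch's DCP-to-`torch.save` conversion: hand the planner an empty dict and let it rebuild the state_dict from checkpoint metadata. A sketch under that assumption; note that `_load_state_dict` is a private torch API and may change between releases.

```python
# Minimal sketch: materialize a DCP checkpoint as a plain torch.save file
# without first constructing the model. Paths are illustrative; run this on
# a single process, as the warning above advises.
import torch
from torch.distributed.checkpoint import FileSystemReader
from torch.distributed.checkpoint.state_dict_loader import _load_state_dict

from nemo_automodel.components.checkpoint._backports.default_planner import (
    _EmptyStateDictLoadPlanner,
)

state_dict = {}  # must be empty; the planner fills it from saved metadata
_load_state_dict(
    state_dict,
    storage_reader=FileSystemReader("/tmp/ckpt"),
    planner=_EmptyStateDictLoadPlanner(),
    no_dist=True,  # no process group needed on a single process
)
torch.save(state_dict, "/tmp/ckpt.pt")
```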
- nemo_automodel.components.checkpoint._backports.default_planner.create_default_local_load_plan(
- state_dict: dict[str, Any],
- metadata: torch.distributed.checkpoint.metadata.Metadata,
- strict: bool = True,
)#
- nemo_automodel.components.checkpoint._backports.default_planner.create_default_global_load_plan(
- all_plans: list[torch.distributed.checkpoint.planner.LoadPlan],
)#
Create global load plan used by DefaultLoadPlanner.
The default load behavior involves no global coordination, and this function currently doesn't change the local plans.
- nemo_automodel.components.checkpoint._backports.default_planner.create_default_local_save_plan(
- state_dict: dict[str, Any],
- is_coordinator: bool,
)#

Create the `SavePlan` used by DefaultSavePlanner.

On non-coordinator ranks, this function ignores tensors and non-tensor objects, producing writes only for ShardedTensor objects. On the coordinator rank, it produces writes for all values.
- nemo_automodel.components.checkpoint._backports.default_planner.create_default_global_save_plan(
- all_plans: list[torch.distributed.checkpoint.planner.SavePlan],
- rewrite_index_hints: bool = True,
)#
Create the global plan and metadata used by DefaultSavePlanner.
Metadata is produced by concatenating the metadata of all `WriteItem` from the supplied plans.

The only global planning change is to update index hints in all `MetadataIndex` objects if `rewrite_index_hints` is True.
- nemo_automodel.components.checkpoint._backports.default_planner._create_default_local_metadata(
- state_dict: torch.distributed.checkpoint.metadata.STATE_DICT_TYPE,
)#

Return the `Metadata` if DefaultSavePlanner was used to checkpoint `state_dict`.
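As a quick illustration (a sketch; tensor values are arbitrary), the helper can be used to inspect the metadata a single-process checkpoint would carry:

```python
# Minimal sketch: compute the Metadata DefaultSavePlanner would record for a
# plain local state_dict and list the keys it would checkpoint.
import torch

from nemo_automodel.components.checkpoint._backports.default_planner import (
    _create_default_local_metadata,
)

md = _create_default_local_metadata(
    {"weight": torch.zeros(4, 4), "bias": torch.zeros(4)}
)
print(list(md.state_dict_metadata.keys()))  # ['weight', 'bias']
```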
- nemo_automodel.components.checkpoint._backports.default_planner._check_box_overlap(
- box0: torch.distributed.checkpoint.metadata.ChunkStorageMetadata,
- box1: torch.distributed.checkpoint.metadata.ChunkStorageMetadata,
)#
Check if two boxes overlap. Tuples are (offset, lengths).
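The check is the standard n-dimensional interval test: two boxes overlap iff their `[offset, offset + length)` ranges intersect in every dimension. An illustrative reimplementation, not the module's actual code:

```python
# Illustrative reimplementation of the overlap test described above; the
# backport's own helper may differ in detail.
from torch.distributed.checkpoint.metadata import ChunkStorageMetadata


def boxes_overlap(box0: ChunkStorageMetadata, box1: ChunkStorageMetadata) -> bool:
    for off0, len0, off1, len1 in zip(
        box0.offsets, box0.sizes, box1.offsets, box1.sizes
    ):
        # If the ranges are disjoint along any single dimension,
        # the boxes cannot overlap.
        if off0 + len0 <= off1 or off1 + len1 <= off0:
            return False
    return True
```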
- nemo_automodel.components.checkpoint._backports.default_planner._check_box_bounds(
- outer_box_size: torch.Size,
- inner_box: torch.distributed.checkpoint.metadata.ChunkStorageMetadata,
)#
- nemo_automodel.components.checkpoint._backports.default_planner._validate_global_plan(
- global_plan: list[torch.distributed.checkpoint.planner.SavePlan],
- metadata: torch.distributed.checkpoint.metadata.Metadata,
)#