`nemo_automodel.components.checkpoint._backports.hf_storage`#

Module Contents#

Classes#

`_HuggingFaceStorageWriter`	A writer that writes to a huggingface repository in the huggingface format. Uses Fsspec back-end to communicate with back-end storage. Fsspec registration of the storage solution is required.
`_HuggingFaceStorageReader`	A reader that reads from a huggingface repository in the huggingface format. Uses Fsspec back-end to communicate with storage. Fsspec registration of the storage solution is required.

Functions#

`_extract_file_index`	Return the 1-based shard index encoded in a safetensors filename.
`get_fqn_to_file_index_mapping`	Get the FQN to file index mapping from the metadata.
`_get_key_renaming_mapping`

Data#

__all__

API#

nemo_automodel.components.checkpoint._backports.hf_storage.__all__#: ['_HuggingFaceStorageWriter', '_HuggingFaceStorageReader']

class nemo_automodel.components.checkpoint._backports.hf_storage._HuggingFaceStorageWriter( path: str, fqn_to_index_mapping: Optional[dict[str, int]] = None, thread_count: int = 1, token: Optional[str] = None, save_sharded: bool = False, consolidated_output_path: Optional[str] = None, num_threads_consolidation: Optional[int] = None, )[source]#

Bases: nemo_automodel.components.checkpoint._backports._fsspec_filesystem.FsspecWriter

A writer that writes to a huggingface repository in the huggingface format. Uses Fsspec back-end to communicate with back-end storage. Fsspec registration of the storage solution is required.

Initialization

Initialize the huggingface writer pointing to path.

Parameters:

path – hf directory where the checkpoint will be written to. Can be on any fsspec-supported storage, including localFS and hf://. This needs to be a remote path if you want to enable consolidation after saving.
fqn_to_index_mapping – A mapping from tensor FQN to the index of the file that the tensor should be written to. Indices are from 1 to N, where N is the number of files. If not provided (None), all the tensors on the same rank will be written to a single file.
token – The token to use to authenticate with huggingface hub.
save_sharded – If True, save the checkpoint as a sharded checkpoint where every rank saves its own shard. Default is False which assumes full tensors are being saved.
consolidated_output_path – If provided, the output path where the consolidated files will be written in the finish step. This needs to be a local fs path right now.
num_threads_consolidation – Number of threads to use for parallel processing of saving data to output files. If not provided, the default value is the number of output files.
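The role of fqn_to_index_mapping can be illustrated with a minimal, self-contained sketch. It is not the library's `_split_by_storage_plan`: the helper name `split_by_file_index` and the tensor names are hypothetical, and plain strings stand in for `WriteItem` objects. It only shows the grouping idea described above, with a single bucket when no mapping is given.

```python
from collections import defaultdict
from typing import Optional


def split_by_file_index(
    fqns: list[str],
    fqn_to_index_mapping: Optional[dict[str, int]] = None,
) -> dict[int, list[str]]:
    """Group tensor names by the 1-based index of their target file.

    Hypothetical helper for illustration only.
    """
    # Without a mapping, every tensor goes to one file (index 1).
    if fqn_to_index_mapping is None:
        return {1: list(fqns)}

    buckets: dict[int, list[str]] = defaultdict(list)
    for fqn in fqns:
        buckets[fqn_to_index_mapping[fqn]].append(fqn)
    return dict(buckets)


# Made-up tensor names spread over two files.
mapping = {"embed.weight": 1, "layers.0.weight": 1, "lm_head.weight": 2}
groups = split_by_file_index(list(mapping), mapping)
single = split_by_file_index(["a", "b"])
```

Each bucket would then correspond to one output file, for example one named in the `model-0000X-of-0000N.safetensors` style.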

prepare_global_plan( plans: list[torch.distributed.checkpoint.planner.SavePlan], ) → list[torch.distributed.checkpoint.planner.SavePlan][source]#

write_data( plan: torch.distributed.checkpoint.planner.SavePlan, planner: torch.distributed.checkpoint.planner.SavePlanner, ) → torch.futures.Future[list[torch.distributed.checkpoint.storage.WriteResult]][source]#

finish( metadata: torch.distributed.checkpoint.metadata.Metadata, results: list[list[torch.distributed.checkpoint.storage.WriteResult]], ) → None[source]#

_split_by_storage_plan( storage_plan: Optional[dict[str, int]], items: list[torch.distributed.checkpoint.planner.WriteItem], ) → dict[int, list[torch.distributed.checkpoint.planner.WriteItem]][source]#

property metadata_path: str#

class nemo_automodel.components.checkpoint._backports.hf_storage._HuggingFaceStorageReader( path: str, token: Optional[str] = None, key_mapping: Optional[dict[str, str]] = None, )[source]#

Bases: nemo_automodel.components.checkpoint._backports._fsspec_filesystem.FsspecReader

A reader that reads from a huggingface repository in the huggingface format. Uses Fsspec back-end to communicate with storage. Fsspec registration of the storage solution is required.

Initialization

Initialize the huggingface reader pointing to path.

Parameters:

path – hf directory where the checkpoint will be read from. Needs to have .safetensors files, but can be from any fsspec supported storage, including localFS and hf://.
token – The token to use to authenticate with huggingface hub.
key_mapping – VLMs in HuggingFace can have their FQNs remapped at load time. This means that the state dict keys are not the same as the loaded model’s FQNs. This mapping is used to map the state dict keys to the loaded model’s FQNs.
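A rough sketch of how a key_mapping might be applied to a single state dict key follows. This page does not document the matching rules, so the sketch assumes the mapping holds regular-expression patterns and replacement strings and that the first matching pattern wins. The helper name `rename_key` and the example keys are hypothetical and not taken from the library.

```python
import re
from typing import Optional


def rename_key(key: str, key_mapping: Optional[dict[str, str]] = None) -> str:
    """Rename one state dict key with a pattern -> replacement mapping.

    Assumption: patterns are regular expressions and the first match wins.
    """
    if not key_mapping:
        return key
    for pattern, replacement in key_mapping.items():
        new_key, count = re.subn(pattern, replacement, key)
        if count:
            return new_key
    # No pattern matched: the key is returned unchanged.
    return key


# Hypothetical mapping that moves a checkpoint prefix to a model prefix.
key_mapping = {r"^language_model\.model\.": "model.language_model."}
renamed = rename_key("language_model.model.layers.0.weight", key_mapping)
untouched = rename_key("vision_tower.patch_embed.weight", key_mapping)
```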


read_data( plan: torch.distributed.checkpoint.planner.LoadPlan, planner: torch.distributed.checkpoint.planner.LoadPlanner, ) → torch.futures.Future[None][source]#

read_metadata() → torch.distributed.checkpoint.metadata.Metadata[source]#

nemo_automodel.components.checkpoint._backports.hf_storage._extract_file_index(filename: str) → int[source]#

Return the 1-based shard index encoded in a safetensors filename.

Supported patterns:

model-00001-of-00008.safetensors
shard-00000-model-00002-of-00008.safetensors
model.safetensors  (single-file checkpoints)

Parameters:

filename – The (relative) safetensors filename.

Returns:

The numeric shard index, defaulting to 1 when no explicit index is present or when the filename cannot be parsed.
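The documented behavior can be approximated with a short regular-expression sketch. This is not the library's implementation: the name `extract_file_index` and the pattern are assumptions chosen to satisfy the three supported filename patterns and the fall-back to 1.

```python
import re

# Assumed pattern: the shard index is the number right before "-of-<total>".
_SHARD_RE = re.compile(r"-(\d+)-of-\d+\.safetensors$")


def extract_file_index(filename: str) -> int:
    """Return the 1-based shard index, or 1 if none can be parsed."""
    match = _SHARD_RE.search(filename)
    return int(match.group(1)) if match else 1


idx_plain = extract_file_index("model-00003-of-00008.safetensors")
idx_prefixed = extract_file_index("shard-00000-model-00002-of-00008.safetensors")
idx_single = extract_file_index("model.safetensors")
idx_unparsable = extract_file_index("notes.txt")
```

Anchoring the pattern at the end of the name means the leading `shard-00000` prefix in the second pattern is ignored and only the `00002` before `-of-` is read.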

nemo_automodel.components.checkpoint._backports.hf_storage.get_fqn_to_file_index_mapping( reference_model_path: str, key_mapping: Optional[dict[str, str]] = None, ) → dict[str, int][source]#

Get the FQN to file index mapping from the metadata.

Parameters:

reference_model_path – Path to reference model to copy file structure from.

Returns:

A mapping from tensor FQN to the index of the file that the tensor should be written to. Indices are from 1 to N, where N is the number of files.
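As an illustration of what such a mapping looks like, the sketch below builds one from a `model.safetensors.index.json` file, the index that sharded Hugging Face checkpoints ship with and whose `weight_map` maps tensor names to shard filenames. Whether the library function reads exactly this file is an assumption. The helper name `fqn_to_file_index` and the tensor names are hypothetical, and key_mapping handling is left out.

```python
import json
import re
import tempfile
from pathlib import Path


def fqn_to_file_index(reference_model_path: str) -> dict[str, int]:
    """Map each tensor FQN to the 1-based index of its shard file.

    Assumption: the reference directory contains model.safetensors.index.json.
    """
    index_file = Path(reference_model_path) / "model.safetensors.index.json"
    weight_map = json.loads(index_file.read_text())["weight_map"]

    mapping: dict[str, int] = {}
    for fqn, filename in weight_map.items():
        match = re.search(r"-(\d+)-of-\d+\.safetensors$", filename)
        # Fall back to 1 when the filename carries no shard index.
        mapping[fqn] = int(match.group(1)) if match else 1
    return mapping


# Build a tiny, made-up index file to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    index = {
        "weight_map": {
            "embed.weight": "model-00001-of-00002.safetensors",
            "lm_head.weight": "model-00002-of-00002.safetensors",
        }
    }
    (Path(tmp) / "model.safetensors.index.json").write_text(json.dumps(index))
    result = fqn_to_file_index(tmp)
```

A mapping of this shape is what the writer's fqn_to_index_mapping parameter expects, so a saved checkpoint can mirror the file layout of the reference model.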

nemo_automodel.components.checkpoint._backports.hf_storage._get_key_renaming_mapping( key: str, key_mapping: Optional[dict[str, str]] = None, ) → str[source]#
