`nemo_automodel.checkpoint._backports.hf_storage`#

Module Contents#

Classes#

`_HuggingFaceStorageWriter`	A writer that writes to a huggingface repository in the huggingface format. Uses Fsspec back-end to communicate with back-end storage. Fsspec registration of the storage solution is required.
`_HuggingFaceStorageReader`	A reader that reads from a huggingface repository in the huggingface format. Uses Fsspec back-end to communicate with storage. Fsspec registration of the storage solution is required.

Functions#

`_extract_file_index`	Return the 1-based shard index encoded in a safetensors filename.
`get_fqn_to_file_index_mapping`	Get the FQN to file index mapping from the metadata.

Data#

__all__

API#

nemo_automodel.checkpoint._backports.hf_storage.__all__#: ['_HuggingFaceStorageWriter', '_HuggingFaceStorageReader']

class nemo_automodel.checkpoint._backports.hf_storage._HuggingFaceStorageWriter( path: str, fqn_to_index_mapping: Optional[dict[str, int]] = None, thread_count: int = 1, token: Optional[str] = None, save_sharded: bool = False, consolidated_output_path: Optional[str] = None, num_threads_consolidation: Optional[int] = None, )[source]#

Bases: nemo_automodel.checkpoint._backports._fsspec_filesystem.FsspecWriter

A writer that writes to a huggingface repository in the huggingface format. Uses Fsspec back-end to communicate with back-end storage. Fsspec registration of the storage solution is required.

Initialization

Initialize the huggingface writer pointing to path.

Parameters:

path – hf directory where the checkpoint will be written to. Files are stored in .safetensors format, and the path can be on any fsspec supported storage, including localFS and hf://. This needs to be a remote path if you want to enable consolidation after saving.
fqn_to_index_mapping – A mapping from tensor FQN to the index of the file that the tensor should be written to. Indices are from 1 to N, where N is the number of files. If not provided (None), all the tensors on the same rank will be written to a single file.
token – The token to use to authenticate with huggingface hub.
save_sharded – If True, save the checkpoint as a sharded checkpoint where every rank saves its own shard. Default is False which assumes full tensors are being saved.
consolidated_output_path – If provided, the output path where the consolidated files will be written in the finish step. Currently this must be a local filesystem path.
num_threads_consolidation – Number of threads to use for parallel processing of saving data to output files. If not provided, the default value is the number of output files.
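
The role of fqn_to_index_mapping can be illustrated with a minimal, self-contained sketch. It groups tensor names by the 1-based index of the file they are assigned to. The helper name `group_fqns_by_file` and the choice of index 1 for the no-mapping case are assumptions made for illustration; this is not the writer's actual implementation.

```python
from collections import defaultdict
from typing import Optional


def group_fqns_by_file(
    fqns: list[str],
    fqn_to_index_mapping: Optional[dict[str, int]] = None,
) -> dict[int, list[str]]:
    """Group tensor names by target file index (hypothetical helper)."""
    if fqn_to_index_mapping is None:
        # Assumption: without a mapping, everything lands in one file (index 1).
        return {1: list(fqns)}
    groups: dict[int, list[str]] = defaultdict(list)
    for fqn in fqns:
        # Each FQN is looked up in the mapping; indices run from 1 to N.
        groups[fqn_to_index_mapping[fqn]].append(fqn)
    return dict(groups)


fqns = ["embed.weight", "layers.0.weight", "layers.1.weight", "lm_head.weight"]
mapping = {
    "embed.weight": 1,
    "layers.0.weight": 1,
    "layers.1.weight": 2,
    "lm_head.weight": 2,
}
grouped = group_fqns_by_file(fqns, mapping)
single = group_fqns_by_file(fqns)
```

With the mapping above, the first two tensors are assigned to file 1 and the last two to file 2; with no mapping, all four end up in a single group.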

prepare_global_plan( plans: list[torch.distributed.checkpoint.planner.SavePlan], ) → list[torch.distributed.checkpoint.planner.SavePlan][source]#

write_data( plan: torch.distributed.checkpoint.planner.SavePlan, planner: torch.distributed.checkpoint.planner.SavePlanner, ) → torch.futures.Future[list[torch.distributed.checkpoint.storage.WriteResult]][source]#

finish( metadata: torch.distributed.checkpoint.metadata.Metadata, results: list[list[torch.distributed.checkpoint.storage.WriteResult]], ) → None[source]#

_split_by_storage_plan( storage_plan: Optional[dict[str, int]], items: list[torch.distributed.checkpoint.planner.WriteItem], ) → dict[int, list[torch.distributed.checkpoint.planner.WriteItem]][source]#

property metadata_path: str#

class nemo_automodel.checkpoint._backports.hf_storage._HuggingFaceStorageReader( path: str, token: Optional[str] = None, )[source]#

Bases: nemo_automodel.checkpoint._backports._fsspec_filesystem.FsspecReader

A reader that reads from a huggingface repository in the huggingface format. Uses Fsspec back-end to communicate with storage. Fsspec registration of the storage solution is required.

Initialization

Initialize the huggingface reader pointing to path.

Parameters:

path – hf directory where the checkpoint will be read from. Needs to have .safetensors files, but can be from any fsspec supported storage, including localFS and hf://.
token – The token to use to authenticate with huggingface hub.

read_data( plan: torch.distributed.checkpoint.planner.LoadPlan, planner: torch.distributed.checkpoint.planner.LoadPlanner, ) → torch.futures.Future[None][source]#

read_metadata() → torch.distributed.checkpoint.metadata.Metadata[source]#

nemo_automodel.checkpoint._backports.hf_storage._extract_file_index(filename: str) → int[source]#

Return the 1-based shard index encoded in a safetensors filename.

Supported patterns:

model-00001-of-00008.safetensors
shard-00000-model-00002-of-00008.safetensors
model.safetensors  (single-file checkpoints)

Parameters: filename – The (relative) safetensors filename.
Returns: The numeric shard index, defaulting to 1 when no explicit index is present or when the filename cannot be parsed.
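
The documented behavior can be approximated with a short regular-expression sketch. The function name `extract_file_index_sketch` and the pattern itself are assumptions written to match the supported filename patterns listed above; the library's own parsing may differ.

```python
import re

# Matches a trailing "-<index>-of-<total>.safetensors" segment.
_SHARD_RE = re.compile(r"-(\d+)-of-\d+\.safetensors$")


def extract_file_index_sketch(filename: str) -> int:
    """Return the 1-based shard index, or 1 if none can be parsed (illustrative)."""
    match = _SHARD_RE.search(filename)
    if match is None:
        # Single-file checkpoints and unparseable names fall back to 1.
        return 1
    return int(match.group(1))


idx_plain = extract_file_index_sketch("model-00001-of-00008.safetensors")
idx_prefixed = extract_file_index_sketch("shard-00000-model-00002-of-00008.safetensors")
idx_single = extract_file_index_sketch("model.safetensors")
idx_unparsed = extract_file_index_sketch("not-a-checkpoint.txt")
```

In this sketch the prefixed form resolves to the `model-00002-of-00008` part of the name, so the leading `shard-00000` segment does not affect the result.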

nemo_automodel.checkpoint._backports.hf_storage.get_fqn_to_file_index_mapping( reference_model_path: str, ) → dict[str, int][source]#

Get the FQN to file index mapping from the metadata.

Parameters: reference_model_path – Path to reference model to copy file structure from.
Returns: A mapping from tensor FQN to the index of the file that the tensor should be written to. Indices are from 1 to N, where N is the number of files.
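
As a rough illustration of what such a mapping looks like, the sketch below derives one from a Hugging Face style `model.safetensors.index.json` file, whose `weight_map` entry maps each tensor name to a shard filename. It assumes the reference model ships that index file; the helper name `build_fqn_to_file_index` is hypothetical, and the real function may obtain the metadata differently.

```python
import json
import re
import tempfile
from pathlib import Path

# Same trailing "-<index>-of-<total>.safetensors" convention as sharded HF checkpoints.
_SHARD_RE = re.compile(r"-(\d+)-of-\d+\.safetensors$")


def build_fqn_to_file_index(index_json_path: str) -> dict[str, int]:
    """Map each tensor FQN to its 1-based shard index (hypothetical helper)."""
    with open(index_json_path, encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]
    mapping: dict[str, int] = {}
    for fqn, filename in weight_map.items():
        match = _SHARD_RE.search(filename)
        # Fall back to 1 when the filename carries no explicit index.
        mapping[fqn] = int(match.group(1)) if match else 1
    return mapping


# Build a tiny index file in a temporary directory to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "model.safetensors.index.json"
    index_path.write_text(
        json.dumps(
            {
                "metadata": {"total_size": 0},
                "weight_map": {
                    "embed.weight": "model-00001-of-00002.safetensors",
                    "lm_head.weight": "model-00002-of-00002.safetensors",
                },
            }
        ),
        encoding="utf-8",
    )
    demo_mapping = build_fqn_to_file_index(str(index_path))
```

A mapping of this shape is what the writer's fqn_to_index_mapping parameter expects, so that a saved checkpoint can mirror the file layout of the reference model.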
