nemo_automodel.checkpoint._backports.hf_storage#

Module Contents#

Classes#

_HuggingFaceStorageWriter

A writer that writes to a Hugging Face repository in the Hugging Face format. Uses an fsspec back-end to communicate with storage. The storage solution must be registered with fsspec.

_HuggingFaceStorageReader

A reader that reads from a Hugging Face repository in the Hugging Face format. Uses an fsspec back-end to communicate with storage. The storage solution must be registered with fsspec.

Functions#

_extract_file_index

Return the 1-based shard index encoded in a safetensors filename.

get_fqn_to_file_index_mapping

Get the FQN to file index mapping from the metadata.

Data#

API#

nemo_automodel.checkpoint._backports.hf_storage.__all__#

['_HuggingFaceStorageWriter', '_HuggingFaceStorageReader']

class nemo_automodel.checkpoint._backports.hf_storage._HuggingFaceStorageWriter(
path: str,
fqn_to_index_mapping: Optional[dict[str, int]] = None,
thread_count: int = 1,
token: Optional[str] = None,
save_sharded: bool = False,
consolidated_output_path: Optional[str] = None,
num_threads_consolidation: Optional[int] = None,
)[source]#

Bases: nemo_automodel.checkpoint._backports._fsspec_filesystem.FsspecWriter

A writer that writes to a Hugging Face repository in the Hugging Face format. Uses an fsspec back-end to communicate with storage. The storage solution must be registered with fsspec.

Initialization

Initialize the huggingface writer pointing to path.

Parameters:
  • path – hf directory where the checkpoint will be written to. The checkpoint is stored as .safetensors files, and the directory can be on any fsspec supported storage, including localFS and hf://. This needs to be a remote path if you want to enable consolidation after saving.

  • fqn_to_index_mapping – A mapping from tensor FQN to the index of the file that the tensor should be written to. Indices are from 1 to N, where N is the number of files. If not provided (None), all the tensors on the same rank are written to a single file.

  • token – The token to use to authenticate with huggingface hub.

  • save_sharded – If True, save the checkpoint as a sharded checkpoint where every rank saves its own shard. Default is False which assumes full tensors are being saved.

  • consolidated_output_path – If provided, the output path where the consolidated files will be written in the finish step. This needs to be a local fs path right now.

  • num_threads_consolidation – Number of threads to use for parallel processing of saving data to output files. If not provided, the default value is the number of output files.
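The index convention for fqn_to_index_mapping (indices run from 1 to N, where N is the number of files) can be checked before a mapping is handed to the writer. The following is a minimal sketch of such a check; `check_index_mapping` is a hypothetical helper for illustration and is not part of this module.

```python
from typing import Optional


def check_index_mapping(mapping: Optional[dict[str, int]]) -> int:
    """Return the number of files N implied by the mapping.

    Hypothetical helper, not part of nemo_automodel. It only encodes the
    documented convention that file indices run from 1 to N.
    """
    if mapping is None:
        # Documented fallback: with no mapping, tensors go to a single file.
        return 1
    used = set(mapping.values())
    n = len(used)
    if used != set(range(1, n + 1)):
        raise ValueError(f"file indices must be exactly 1..{n}, got {sorted(used)}")
    return n
```

A mapping such as {"embed.weight": 1, "lm_head.weight": 2} passes the check, while one that uses index 0 or skips an index raises ValueError.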

prepare_global_plan(
plans: list[torch.distributed.checkpoint.planner.SavePlan],
) → list[torch.distributed.checkpoint.planner.SavePlan][source]#

write_data(
plan: torch.distributed.checkpoint.planner.SavePlan,
planner: torch.distributed.checkpoint.planner.SavePlanner,
) → torch.futures.Future[list[torch.distributed.checkpoint.storage.WriteResult]][source]#

finish(
metadata: torch.distributed.checkpoint.metadata.Metadata,
results: list[list[torch.distributed.checkpoint.storage.WriteResult]],
) → None[source]#

_split_by_storage_plan(
storage_plan: Optional[dict[str, int]],
items: list[torch.distributed.checkpoint.planner.WriteItem],
) → dict[int, list[torch.distributed.checkpoint.planner.WriteItem]][source]#

property metadata_path: str#

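The signature of _split_by_storage_plan suggests that write items are grouped by the file index assigned to their FQN. The sketch below illustrates that grouping under two assumptions that the page does not confirm: without a storage plan every item goes to file index 1, and every item's FQN appears in the plan. `FakeWriteItem` and `split_by_storage_plan` are illustrative stand-ins, not the module's code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeWriteItem:
    """Stand-in for torch's WriteItem; only the FQN is modelled here.

    The real WriteItem exposes the FQN as item.index.fqn.
    """

    fqn: str


def split_by_storage_plan(
    storage_plan: Optional[dict[str, int]],
    items: list[FakeWriteItem],
) -> dict[int, list[FakeWriteItem]]:
    """Group items by target file index (illustrative sketch)."""
    if storage_plan is None:
        # Assumption: no plan means everything lands in file index 1.
        return {1: list(items)}
    buckets: dict[int, list[FakeWriteItem]] = defaultdict(list)
    for item in items:
        # Assumption: every FQN is present in the plan (KeyError otherwise).
        buckets[storage_plan[item.fqn]].append(item)
    return dict(buckets)
```

Grouping preserves the input order within each bucket, so items destined for the same file keep their relative order.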
class nemo_automodel.checkpoint._backports.hf_storage._HuggingFaceStorageReader(
path: str,
token: Optional[str] = None,
)[source]#

Bases: nemo_automodel.checkpoint._backports._fsspec_filesystem.FsspecReader

A reader that reads from a Hugging Face repository in the Hugging Face format. Uses an fsspec back-end to communicate with storage. The storage solution must be registered with fsspec.

Initialization

Initialize the huggingface reader pointing to path.

Parameters:
  • path – hf directory where the checkpoint will be read from. Needs to have .safetensors files, but can be from any fsspec supported storage, including localFS and hf://.

  • token – The token to use to authenticate with huggingface hub.

read_data(
plan: torch.distributed.checkpoint.planner.LoadPlan,
planner: torch.distributed.checkpoint.planner.LoadPlanner,
) → torch.futures.Future[None][source]#

read_metadata() → torch.distributed.checkpoint.metadata.Metadata[source]#

nemo_automodel.checkpoint._backports.hf_storage._extract_file_index(filename: str) → int[source]#

Return the 1-based shard index encoded in a safetensors filename.

Supported patterns:

  • model-00001-of-00008.safetensors
  • shard-00000-model-00002-of-00008.safetensors
  • model.safetensors (single-file checkpoints)

Parameters:

filename – The (relative) safetensors filename.

Returns:

The numeric shard index, defaulting to 1 when no explicit index is present or when the filename cannot be parsed.
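The documented behaviour can be illustrated with a small regular-expression sketch. This is one possible way to obtain the described results, not necessarily how the module implements it; `extract_file_index` and `_SHARD_RE` are illustrative names.

```python
import re

# Matches the trailing "-<index>-of-<total>.safetensors" part of a filename.
_SHARD_RE = re.compile(r"-(\d+)-of-\d+\.safetensors$")


def extract_file_index(filename: str) -> int:
    """Return the 1-based shard index, or 1 if none can be parsed (sketch)."""
    match = _SHARD_RE.search(filename)
    return int(match.group(1)) if match else 1
```

With this sketch, "shard-00000-model-00002-of-00008.safetensors" yields 2 because only the number directly before "-of-" is captured, and "model.safetensors" falls back to 1.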

nemo_automodel.checkpoint._backports.hf_storage.get_fqn_to_file_index_mapping(
reference_model_path: str,
) → dict[str, int][source]#

Get the FQN to file index mapping from the metadata.

Parameters:

reference_model_path – Path to reference model to copy file structure from.

Returns:

A mapping from tensor FQN to the index of the file that the tensor should be written to. Indices are from 1 to N, where N is the number of files.
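To make the shape of the returned mapping concrete, the sketch below derives it from a Hugging Face style model.safetensors.index.json file. It is an assumption that the reference model directory contains such a file with a "weight_map" entry (the usual layout for sharded Hugging Face checkpoints); the page only says the mapping comes "from the metadata". The function name `fqn_to_file_index_from_index_json` is hypothetical.

```python
import json
import os
import re

# Captures the shard number that directly precedes "-of-<total>.safetensors".
_SHARD_RE = re.compile(r"-(\d+)-of-\d+\.safetensors$")


def fqn_to_file_index_from_index_json(reference_model_path: str) -> dict[str, int]:
    """Build an FQN -> file index mapping from a local index file (sketch).

    Assumes <reference_model_path>/model.safetensors.index.json exists and
    holds a "weight_map" of tensor name -> safetensors filename.
    """
    index_path = os.path.join(reference_model_path, "model.safetensors.index.json")
    with open(index_path, encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]

    mapping: dict[str, int] = {}
    for fqn, filename in weight_map.items():
        match = _SHARD_RE.search(filename)
        # Filenames without an explicit shard number default to index 1.
        mapping[fqn] = int(match.group(1)) if match else 1
    return mapping
```

A mapping produced this way has the form expected by the fqn_to_index_mapping parameter of _HuggingFaceStorageWriter, so a saved checkpoint can mirror the file layout of the reference model.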