nemo_automodel.checkpoint._backports.hf_storage
Module Contents#
Classes#
_HuggingFaceStorageWriter | A writer that writes to a huggingface repository in the huggingface format. Uses Fsspec back-end to communicate with back-end storage. Fsspec registration of the storage solution is required. |
_HuggingFaceStorageReader | A reader that reads from a huggingface repository in the huggingface format. Uses Fsspec back-end to communicate with storage. Fsspec registration of the storage solution is required. |
Functions#
_extract_file_index | Return the 1-based shard index encoded in a safetensors filename. |
get_fqn_to_file_index_mapping | Get the FQN to file index mapping from the metadata. |
Data#
API#
- nemo_automodel.checkpoint._backports.hf_storage.__all__#
['_HuggingFaceStorageWriter', '_HuggingFaceStorageReader']
- class nemo_automodel.checkpoint._backports.hf_storage._HuggingFaceStorageWriter(
- path: str,
- fqn_to_index_mapping: Optional[dict[str, int]] = None,
- thread_count: int = 1,
- token: Optional[str] = None,
- save_sharded: bool = False,
- consolidated_output_path: Optional[str] = None,
- num_threads_consolidation: Optional[int] = None,
)
Bases:
nemo_automodel.checkpoint._backports._fsspec_filesystem.FsspecWriter
A writer that writes to a huggingface repository in the huggingface format. Uses Fsspec back-end to communicate with back-end storage. Fsspec registration of the storage solution is required.
Initialization
Initialize the huggingface writer pointing to path.
- Parameters:
path – hf directory where the checkpoint will be written to, as .safetensors files. Can be on any fsspec supported storage, including localFS and hf://. This needs to be a remote path if you want to enable consolidation after saving.
fqn_to_index_mapping – A mapping from tensor FQN to the index of the file that the tensor should be written to. Indices are from 1 to N, where N is the number of files. If not provided (None), all the tensors on the same rank will be written to the same file.
token – The token to use to authenticate with huggingface hub.
save_sharded – If True, save the checkpoint as a sharded checkpoint where every rank saves its own shard. Default is False which assumes full tensors are being saved.
consolidated_output_path – If provided, the output path where the consolidated files will be written in the finish step. This needs to be a local fs path right now.
num_threads_consolidation – Number of threads to use for parallel processing of saving data to output files. If not provided, the default value is the number of output files.
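The effect of fqn_to_index_mapping can be illustrated with a minimal sketch. The helper `group_by_file_index` below is hypothetical and not part of this module; it only shows, under the semantics described above, how 1-based indices assign tensors to output files and how an absent mapping places everything in a single file.

```python
from collections import defaultdict
from typing import Optional


def group_by_file_index(
    fqns: list[str],
    fqn_to_index_mapping: Optional[dict[str, int]] = None,
) -> dict[int, list[str]]:
    """Group tensor FQNs by the 1-based index of their target file.

    Hypothetical helper for illustration only; the writer's real
    splitting logic may differ.
    """
    if fqn_to_index_mapping is None:
        # No mapping given: every tensor lands in one file (index 1).
        return {1: list(fqns)}
    groups: dict[int, list[str]] = defaultdict(list)
    for fqn in fqns:
        groups[fqn_to_index_mapping[fqn]].append(fqn)
    return dict(groups)


# Example tensor names are made up for the demonstration.
fqns = ["embed.weight", "layers.0.weight", "lm_head.weight"]
mapping = {"embed.weight": 1, "layers.0.weight": 1, "lm_head.weight": 2}

grouped = group_by_file_index(fqns, mapping)
single = group_by_file_index(fqns)
```

Here `grouped` places the first two tensors in file 1 and the last in file 2, while `single` keeps all three in file 1.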
- prepare_global_plan(
- plans: list[torch.distributed.checkpoint.planner.SavePlan],
)
- write_data(
- plan: torch.distributed.checkpoint.planner.SavePlan,
- planner: torch.distributed.checkpoint.planner.SavePlanner,
)
- finish(
- metadata: torch.distributed.checkpoint.metadata.Metadata,
- results: list[list[torch.distributed.checkpoint.storage.WriteResult]],
)
- _split_by_storage_plan(
- storage_plan: Optional[dict[str, int]],
- items: list[torch.distributed.checkpoint.planner.WriteItem],
)
- property metadata_path: str#
- class nemo_automodel.checkpoint._backports.hf_storage._HuggingFaceStorageReader(
- path: str,
- token: Optional[str] = None,
)
Bases:
nemo_automodel.checkpoint._backports._fsspec_filesystem.FsspecReader
A reader that reads from a huggingface repository in the huggingface format. Uses Fsspec back-end to communicate with storage. Fsspec registration of the storage solution is required.
Initialization
Initialize the huggingface reader pointing to path.
- Parameters:
path – hf directory where the checkpoint will be read from. Needs to have .safetensors files, but can be from any fsspec supported storage, including localFS and hf://.
token – The token to use to authenticate with huggingface hub.
- nemo_automodel.checkpoint._backports.hf_storage._extract_file_index(filename: str) → int#
Return the 1-based shard index encoded in a safetensors filename.
Supported patterns:
model-00001-of-00008.safetensors
shard-00000-model-00002-of-00008.safetensors
model.safetensors (single-file checkpoints)
- Parameters:
filename – The (relative) safetensors filename.
- Returns:
The numeric shard index, defaulting to 1 when no explicit index is present or when the filename cannot be parsed.
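The documented behaviour can be approximated with a short regular-expression sketch. The function name `extract_file_index_sketch` and the exact pattern are assumptions made for illustration; the module's own parsing may be implemented differently.

```python
import re

# Assumed pattern: the name ends in "-<index>-of-<total>.safetensors".
_SHARD_RE = re.compile(r"-(\d+)-of-\d+\.safetensors$")


def extract_file_index_sketch(filename: str) -> int:
    """Return the 1-based shard index, or 1 if none can be parsed.

    Illustrative re-implementation, not the library function itself.
    """
    match = _SHARD_RE.search(filename)
    return int(match.group(1)) if match else 1


idx_plain = extract_file_index_sketch("model-00001-of-00008.safetensors")
idx_prefixed = extract_file_index_sketch(
    "shard-00000-model-00002-of-00008.safetensors"
)
idx_single = extract_file_index_sketch("model.safetensors")
idx_unparsable = extract_file_index_sketch("notes.txt")
```

In the prefixed case the sketch picks up the index next to `-of-` (2), not the leading `shard-00000` component, and it falls back to 1 for single-file or unparsable names.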
- nemo_automodel.checkpoint._backports.hf_storage.get_fqn_to_file_index_mapping(
- reference_model_path: str,
)
Get the FQN to file index mapping from the metadata.
- Parameters:
reference_model_path – Path to reference model to copy file structure from.
- Returns:
A mapping from tensor FQN to the index of the file that the tensor should be written to. Indices are from 1 to N, where N is the number of files.
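As a rough sketch of how such a mapping could be derived, the example below assumes the reference model directory contains a Hugging Face style `model.safetensors.index.json` whose `weight_map` maps each tensor name to a shard filename. That file layout, the helper name `fqn_to_index_from_index_json`, and the demo tensor names are assumptions; the library function may read the metadata in a different way.

```python
import json
import os
import re
import tempfile

# Assumed shard naming: "...-<index>-of-<total>.safetensors".
_SHARD_RE = re.compile(r"-(\d+)-of-\d+\.safetensors$")


def fqn_to_index_from_index_json(model_dir: str) -> dict[str, int]:
    """Build an FQN -> 1-based file index mapping from an index JSON.

    Hypothetical helper for illustration only.
    """
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    mapping = {}
    for fqn, filename in weight_map.items():
        match = _SHARD_RE.search(filename)
        # Fall back to 1 when the filename carries no shard index.
        mapping[fqn] = int(match.group(1)) if match else 1
    return mapping


# Demonstrate with a throwaway directory and a made-up two-shard index.
with tempfile.TemporaryDirectory() as tmp:
    index = {
        "metadata": {},
        "weight_map": {
            "embed.weight": "model-00001-of-00002.safetensors",
            "lm_head.weight": "model-00002-of-00002.safetensors",
        },
    }
    with open(os.path.join(tmp, "model.safetensors.index.json"), "w") as f:
        json.dump(index, f)
    demo_mapping = fqn_to_index_from_index_json(tmp)
```

A mapping built this way has the shape expected by the fqn_to_index_mapping parameter of the writer, so a saved checkpoint can mirror the file structure of the reference model.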