nemo_automodel.checkpoint._backports._fsspec_filesystem#

Module Contents#

Classes#

FileSystem

FsspecWriter

Basic implementation of StorageWriter using fsspec.

FsspecReader

Data#

API#

nemo_automodel.checkpoint._backports._fsspec_filesystem.__all__#

['FsspecWriter', 'FsspecReader']

class nemo_automodel.checkpoint._backports._fsspec_filesystem.FileSystem[source]#

Bases: nemo_automodel.checkpoint._backports.filesystem.FileSystemBase

create_stream(
path: Union[str, os.PathLike],
mode: str,
) → collections.abc.Generator[io.IOBase, None, None][source]#
concat_path(
path: Union[str, os.PathLike],
suffix: str,
) → Union[str, os.PathLike][source]#
init_path(
path: Union[str, os.PathLike],
**kwargs,
) → Union[str, os.PathLike][source]#
rename(
path: Union[str, os.PathLike],
new_path: Union[str, os.PathLike],
) → None[source]#
mkdir(path: Union[str, os.PathLike]) → None[source]#
classmethod validate_checkpoint_id(
checkpoint_id: Union[str, os.PathLike],
) → bool[source]#
exists(path: Union[str, os.PathLike]) → bool[source]#
rm_file(path: Union[str, os.PathLike]) → None[source]#
ls(path: Union[str, os.PathLike]) → list[str][source]#
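
The method list above describes a small path-oriented interface. As a rough illustration of how such an interface fits together, the following is a minimal local-disk sketch using only the standard library. `LocalFileSystemSketch` and `demo_filesystem_sketch` are hypothetical names introduced here; this is not the library's implementation, which targets fsspec backends. Treating `create_stream` as a context manager is an assumption suggested by its generator return type.

```python
import contextlib
import os
import tempfile
from typing import IO, Iterator, List, Tuple, Union

PathLike = Union[str, os.PathLike]


class LocalFileSystemSketch:
    """Hypothetical local-disk stand-in that mirrors the method names above."""

    @contextlib.contextmanager
    def create_stream(self, path: PathLike, mode: str) -> Iterator[IO]:
        # Open the file and make sure it is closed when the with-block exits.
        with open(path, mode) as stream:
            yield stream

    def concat_path(self, path: PathLike, suffix: str) -> PathLike:
        return os.path.join(path, suffix)

    def init_path(self, path: PathLike, **kwargs) -> PathLike:
        # Assumption: a local path needs no backend setup, so return it as-is.
        return path

    def rename(self, path: PathLike, new_path: PathLike) -> None:
        os.replace(path, new_path)

    def mkdir(self, path: PathLike) -> None:
        os.makedirs(path, exist_ok=True)

    def exists(self, path: PathLike) -> bool:
        return os.path.exists(path)

    def rm_file(self, path: PathLike) -> None:
        os.remove(path)

    def ls(self, path: PathLike) -> List[str]:
        return sorted(os.path.join(path, name) for name in os.listdir(path))


def demo_filesystem_sketch() -> Tuple[List[str], bool]:
    """Exercise each method once inside a throwaway temporary directory."""
    fs = LocalFileSystemSketch()
    with tempfile.TemporaryDirectory() as root:
        ckpt_dir = fs.init_path(fs.concat_path(root, "ckpt"))
        fs.mkdir(ckpt_dir)
        first = fs.concat_path(ckpt_dir, "shard.partial")
        final = fs.concat_path(ckpt_dir, "shard.bin")
        with fs.create_stream(first, "wb") as stream:
            stream.write(b"payload")
        fs.rename(first, final)
        listed = [os.path.basename(p) for p in fs.ls(ckpt_dir)]
        fs.rm_file(final)
        return listed, fs.exists(final)
```

Calling `demo_filesystem_sketch()` writes one file, renames it, lists the directory, and removes the file again, so nothing is left behind on disk.
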
class nemo_automodel.checkpoint._backports._fsspec_filesystem.FsspecWriter(
path: Union[str, os.PathLike],
single_file_per_rank: bool = True,
sync_files: bool = True,
thread_count: int = 1,
per_thread_copy_ahead: int = 10000000,
overwrite: bool = True,
_extensions: Optional[collections.abc.Sequence[torch.distributed.checkpoint._extension.StreamTransformExtension]] = None,
serialization_format: nemo_automodel.checkpoint._backports.filesystem.SerializationFormat = SerializationFormat.TORCH_SAVE,
**kwargs,
)[source]#

Bases: nemo_automodel.checkpoint._backports.filesystem.FileSystemWriter

Basic implementation of StorageWriter using fsspec.

This implementation makes the following assumptions and simplifications:

  • The checkpoint path is an empty or non-existing directory.

  • File creation is atomic.

The checkpoint consists of one file per write request plus a .metadata file with the serialized metadata.
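
To make the described layout concrete, here is a standard-library sketch that mimics it: one data file per write request, followed by a `.metadata` file written last. The file names (`request_0.bin`, `.metadata.tmp`), the JSON metadata encoding, and the write-then-rename step are illustrative assumptions, not the library's actual on-disk format or procedure.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_sketch_checkpoint(root: str, requests: Dict[str, bytes]) -> List[str]:
    """Write one file per request, then a .metadata index, and list the result."""
    os.makedirs(root, exist_ok=True)
    index = {}
    for i, (key, payload) in enumerate(requests.items()):
        file_name = f"request_{i}.bin"  # hypothetical naming scheme
        with open(os.path.join(root, file_name), "wb") as f:
            f.write(payload)
        index[key] = {"file": file_name, "length": len(payload)}

    # Write the metadata under a temporary name, then rename it into place,
    # so a reader never observes a half-written .metadata file.
    tmp_path = os.path.join(root, ".metadata.tmp")
    with open(tmp_path, "w") as f:
        json.dump(index, f)
    os.replace(tmp_path, os.path.join(root, ".metadata"))
    return sorted(os.listdir(root))


def demo_layout() -> List[str]:
    """Build a two-request checkpoint in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        ckpt_dir = os.path.join(tmp, "ckpt")
        return write_sketch_checkpoint(
            ckpt_dir, {"weight": b"\x00" * 8, "bias": b"\x01" * 2}
        )
```

Writing the metadata last means its presence can serve as a signal that every data file was already written, which is one reason the atomic-creation assumption matters.
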

Initialization

Initialize the writer pointing to path.

Parameters:
  • path – Directory where the checkpoint will be written.

  • single_file_per_rank – Produce one file per rank instead of one file per tensor/blob. Defaults to True.

  • sync_files – Force files to be synced to permanent storage. Defaults to True.

  • thread_count – Number of IO threads to use for writing. Defaults to 1.

  • per_thread_copy_ahead – Number of bytes to copy from the GPU ahead of saving them. Defaults to 10,000,000 bytes (10 MB).

  • overwrite – Whether to allow overwriting existing checkpoints. Defaults to True.

  • _extensions – Extensions to apply to output streams (EXPERIMENTAL).

N.B. If sync_files is disabled, there’s no guarantee that the checkpoint will be consistent in the case of a failure.
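
A minimal sketch of constructing the writer with the parameters documented above. The module path, class name, and parameter names come from this page; the specific values (`thread_count=4`, `overwrite=False`) and the helper names `writer_kwargs`, `copy_ahead_mib`, and `make_writer` are illustrative assumptions. The import is guarded so the snippet still loads where `nemo_automodel` is not installed.

```python
import importlib
from typing import Any, Dict, Optional

MODULE = "nemo_automodel.checkpoint._backports._fsspec_filesystem"


def writer_kwargs(path: str) -> Dict[str, Any]:
    """Keyword arguments using the documented constructor parameter names."""
    return {
        "path": path,
        "single_file_per_rank": True,
        "sync_files": True,  # keep enabled for consistency after a failure
        "thread_count": 4,  # illustrative; the documented default is 1
        "per_thread_copy_ahead": 10_000_000,  # documented default, in bytes
        "overwrite": False,  # illustrative; the documented default is True
    }


def copy_ahead_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return num_bytes / 2**20


def make_writer(path: str) -> Optional[Any]:
    """Return an FsspecWriter, or None when the package is unavailable."""
    try:
        module = importlib.import_module(MODULE)
    except ImportError:
        return None
    return module.FsspecWriter(**writer_kwargs(path))
```

Note that the default `per_thread_copy_ahead` of 10,000,000 bytes is 10 MB in decimal units, or roughly 9.54 MiB.
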

classmethod validate_checkpoint_id(
checkpoint_id: Union[str, os.PathLike],
) → bool[source]#
class nemo_automodel.checkpoint._backports._fsspec_filesystem.FsspecReader(path: Union[str, os.PathLike], **kwargs)[source]#

Bases: nemo_automodel.checkpoint._backports.filesystem.FileSystemReader

classmethod validate_checkpoint_id(
checkpoint_id: Union[str, os.PathLike],
) → bool[source]#