nemo_automodel.checkpoint.checkpointing#

Checkpoint management utilities for Hugging Face models.

Module Contents#

Classes#

CheckpointingConfig

Configuration for checkpointing.

Functions#

save_model

Save a model state dictionary to a weights path.

load_model

Load a model state dictionary from a weights path.

save_optimizer

Save an optimizer state dictionary to a weights path.

load_optimizer

Load an optimizer state dictionary from a weights path.

_get_safetensors_index_path

Return the directory containing the first model.safetensors.index.json found for the given model.

API#

class nemo_automodel.checkpoint.checkpointing.CheckpointingConfig[source]#

Configuration for checkpointing.

enabled: bool#

checkpoint_dir: str | pathlib.Path#

model_save_format: nemo_automodel.checkpoint._backports.filesystem.SerializationFormat | str#

model_cache_dir: str | pathlib.Path#

model_repo_id: str#

save_consolidated: bool#

is_peft: bool#

__post_init__()[source]#

Convert a raw string such as "safetensors" into the right Enum.
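As a hedged illustration of this dataclass and its `__post_init__` string-to-enum conversion, here is a minimal stdlib stand-in. The `SerializationFormat` members shown (`SAFETENSORS`, `TORCH_SAVE`) are assumptions inferred from the formats listed under `save_model` below and may not match the real enum in `nemo_automodel.checkpoint._backports.filesystem`:

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from pathlib import Path


# Hypothetical stand-in for the real SerializationFormat enum;
# its actual members are not shown in this reference.
class SerializationFormat(Enum):
    SAFETENSORS = "safetensors"
    TORCH_SAVE = "torch_save"


@dataclass
class CheckpointingConfigSketch:
    enabled: bool
    checkpoint_dir: str | Path
    model_save_format: SerializationFormat | str
    model_cache_dir: str | Path
    model_repo_id: str
    save_consolidated: bool
    is_peft: bool

    def __post_init__(self):
        # Convert a raw string such as "safetensors" into the right Enum member.
        if isinstance(self.model_save_format, str):
            self.model_save_format = SerializationFormat(self.model_save_format)


cfg = CheckpointingConfigSketch(
    enabled=True,
    checkpoint_dir="/tmp/ckpts",
    model_save_format="safetensors",  # raw string, converted in __post_init__
    model_cache_dir="/tmp/hf_cache",
    model_repo_id="meta-llama/Llama-3.2-3B",
    save_consolidated=False,
    is_peft=False,
)
print(cfg.model_save_format)  # SerializationFormat.SAFETENSORS
```

Accepting either the enum or its string value keeps YAML/CLI configs simple while giving the rest of the code a single typed representation.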

nemo_automodel.checkpoint.checkpointing.save_model(
model: torch.nn.Module | transformers.PreTrainedModel,
weights_path: str,
checkpoint_config: nemo_automodel.checkpoint.checkpointing.CheckpointingConfig,
)[source]#

Save a model state dictionary to a weights path.

This function can save a model in the following formats:

  • safetensors (in HF format)

  • torch_save (in DCP format)

Parameters:
  • model – Model to save

  • weights_path – Path to save model weights

  • checkpoint_config – Checkpointing configuration

nemo_automodel.checkpoint.checkpointing.load_model(
model: torch.nn.Module | transformers.PreTrainedModel,
weights_path: str,
checkpoint_config: nemo_automodel.checkpoint.checkpointing.CheckpointingConfig,
)[source]#

Load a model state dictionary from a weights path.

Parameters:
  • model – Model to load state into

  • weights_path – Path to load model weights from

  • checkpoint_config – Checkpointing configuration
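To make the save/load pattern concrete, here is a self-contained sketch of the round trip. It is not the library's implementation: `TinyModel`, `save_model_sketch`, and `load_model_sketch` are hypothetical names, JSON stands in for the real safetensors/DCP serialization, and the `CheckpointingConfig` argument is omitted. The model only needs `state_dict()`/`load_state_dict()`, as `torch.nn.Module` provides:

```python
import json
import tempfile
from pathlib import Path


# Minimal model-like object exposing the state_dict protocol.
class TinyModel:
    def __init__(self):
        self.weights = {"layer.weight": [0.0, 0.0]}

    def state_dict(self):
        return dict(self.weights)

    def load_state_dict(self, state):
        self.weights.update(state)


def save_model_sketch(model, weights_path):
    # Write the model state dictionary under weights_path.
    path = Path(weights_path)
    path.mkdir(parents=True, exist_ok=True)
    (path / "model.json").write_text(json.dumps(model.state_dict()))


def load_model_sketch(model, weights_path):
    # Read the state dictionary back and load it into the model.
    state = json.loads((Path(weights_path) / "model.json").read_text())
    model.load_state_dict(state)


with tempfile.TemporaryDirectory() as tmp:
    src = TinyModel()
    src.weights["layer.weight"] = [1.5, -2.0]
    save_model_sketch(src, tmp)

    dst = TinyModel()
    load_model_sketch(dst, tmp)
    print(dst.weights["layer.weight"])  # [1.5, -2.0]
```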

nemo_automodel.checkpoint.checkpointing.save_optimizer(
optimizer: torch.optim.Optimizer,
model: torch.nn.Module,
weights_path: str,
scheduler: Optional[Any] = None,
)[source]#

Save an optimizer state dictionary to a weights path.

Parameters:
  • optimizer – Optimizer to save

  • model – Model to save optimizer state for

  • weights_path – Path to save optimizer weights

  • scheduler – Optional scheduler to save

nemo_automodel.checkpoint.checkpointing.load_optimizer(
optimizer: torch.optim.Optimizer,
model: torch.nn.Module,
weights_path: str,
scheduler: Optional[Any] = None,
)[source]#

Load an optimizer state dictionary from a weights path.

Parameters:
  • optimizer – Optimizer to load state into

  • model – Model to load optimizer state for

  • weights_path – Path to load optimizer weights from

  • scheduler – Optional scheduler to load state into
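The optimizer round trip follows the same pattern, with the optional scheduler saved and restored alongside. The sketch below is hypothetical: `FakeOptimizer` and `FakeScheduler` stand in for `torch.optim.Optimizer` and a scheduler (both expose `state_dict()`/`load_state_dict()`), JSON replaces the real serialization, and the `model` argument the real functions take is omitted:

```python
import json
import tempfile
from pathlib import Path


class FakeOptimizer:
    def __init__(self):
        self.state = {"step": 0, "lr": 1e-3}

    def state_dict(self):
        return dict(self.state)

    def load_state_dict(self, state):
        self.state.update(state)


class FakeScheduler(FakeOptimizer):
    def __init__(self):
        self.state = {"last_epoch": 0}


def save_optimizer_sketch(optimizer, weights_path, scheduler=None):
    # Bundle optimizer (and optionally scheduler) state into one file.
    path = Path(weights_path)
    path.mkdir(parents=True, exist_ok=True)
    payload = {"optimizer": optimizer.state_dict()}
    if scheduler is not None:
        payload["scheduler"] = scheduler.state_dict()
    (path / "optim.json").write_text(json.dumps(payload))


def load_optimizer_sketch(optimizer, weights_path, scheduler=None):
    # Restore optimizer state, and scheduler state if both sides have it.
    payload = json.loads((Path(weights_path) / "optim.json").read_text())
    optimizer.load_state_dict(payload["optimizer"])
    if scheduler is not None and "scheduler" in payload:
        scheduler.load_state_dict(payload["scheduler"])


with tempfile.TemporaryDirectory() as tmp:
    opt, sched = FakeOptimizer(), FakeScheduler()
    opt.state["step"] = 100
    sched.state["last_epoch"] = 3
    save_optimizer_sketch(opt, tmp, scheduler=sched)

    opt2, sched2 = FakeOptimizer(), FakeScheduler()
    load_optimizer_sketch(opt2, tmp, scheduler=sched2)
    print(opt2.state["step"], sched2.state["last_epoch"])  # 100 3
```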

nemo_automodel.checkpoint.checkpointing._get_safetensors_index_path(cache_dir: str, repo_id: str) → str[source]#

Return the directory containing the first model.safetensors.index.json found for the given model.

If no model.safetensors.index.json is found, None is returned.

For example, if the index file found is

/opt/models/models--meta-llama--Llama-3.2-3B/snapshots/13afe.../model.safetensors.index.json

this function will return the directory path

/opt/models/models--meta-llama--Llama-3.2-3B/snapshots/13afe...

This will error if the model hasn’t been downloaded or if the cache directory is incorrect.

Parameters:
  • cache_dir – Path to cache directory

  • repo_id – Hugging Face repository ID

Returns:

Path to the directory containing the index file.

Raises:

FileNotFoundError – If the index file is not found.
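Such a lookup can be sketched against the Hugging Face hub cache layout (`models--{org}--{name}/snapshots/{hash}/`). This is a hedged reimplementation, not the library's code: `get_safetensors_index_path_sketch` is a hypothetical name, the `13afe` snapshot hash is made up for the test, and the sketch follows the documented `FileNotFoundError` behavior when no index file exists:

```python
import tempfile
from pathlib import Path


def get_safetensors_index_path_sketch(cache_dir: str, repo_id: str) -> str:
    # HF hub cache stores "org/name" repos under "models--org--name".
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    # Look for the index file inside any snapshot directory.
    matches = sorted(repo_dir.glob("snapshots/*/model.safetensors.index.json"))
    if not matches:
        raise FileNotFoundError(
            f"No model.safetensors.index.json for {repo_id} under {cache_dir}"
        )
    # Return the directory containing the first index file found.
    return str(matches[0].parent)


with tempfile.TemporaryDirectory() as cache:
    # Build a fake cache entry with a hypothetical snapshot hash.
    snap = Path(cache) / "models--meta-llama--Llama-3.2-3B" / "snapshots" / "13afe"
    snap.mkdir(parents=True)
    (snap / "model.safetensors.index.json").write_text("{}")

    found = get_safetensors_index_path_sketch(cache, "meta-llama/Llama-3.2-3B")
    print(found.endswith("13afe"))  # True
```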