nemo_automodel.components.checkpoint.config


Public config surface for the checkpoint component.

CheckpointingConfig holds the typed parameters that drive checkpointing behaviour and exposes .build() to construct the Checkpointer engine (defined in checkpointing.py). Every field has a sensible default so the recipe layer can construct it directly from the YAML checkpoint: block plus the model-derived model_repo_id / model_cache_dir / is_peft arguments; there is no separate builder/adapter.

Module Contents

Classes

| Name | Description |
| --- | --- |
| CheckpointingConfig | Configuration for checkpointing. |
| SaveConsolidatedMode | Controls when consolidated HF safetensors are exported. |

Functions

| Name | Description |
| --- | --- |
| _is_geq_torch_2_9 | Check if the current torch version is greater than or equal to 2.9.0. |
| _normalize_save_consolidated | Normalize legacy bools and string aliases to a consolidated export mode. |

Data

__all__

API

class nemo_automodel.components.checkpoint.config.CheckpointingConfig(
enabled: bool = True,
checkpoint_dir: str | pathlib.Path = 'checkpoints/',
model_save_format: str = 'safetensors',
model_cache_dir: str | pathlib.Path | None = None,
model_repo_id: str | None = None,
save_consolidated: bool | str | nemo_automodel.components.checkpoint.config.SaveConsolidatedMode = 'final',
is_peft: bool = False,
model_state_dict_keys: list[str] | None = None,
is_async: bool = False,
dequantize_base_checkpoint: bool | None = None,
original_model_root_dir: str | None = None,
skip_task_head_prefixes_for_base_model: list[str] | None = None,
single_rank_consolidation: bool = False,
staging_dir: str | None = None,
v4_compatible: bool = False,
diffusers_compatible: bool = False,
best_metric_key: str = 'default'
)
Dataclass

Configuration for checkpointing.

Every field has a default so the recipe layer can construct this directly from the YAML checkpoint: block merged with the model-derived model_repo_id / model_cache_dir / is_peft values. When model_cache_dir is None it falls back to the HF hub cache.
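
The construction path described above can be sketched as follows. This is a minimal illustration, not the recipe layer's actual code: `CheckpointingConfigSketch` is a stand-in dataclass that copies only a few fields and defaults from the signature above so the example runs without nemo_automodel installed, and `config_from_recipe`, the merge order, and all literal values are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class CheckpointingConfigSketch:
    """Stand-in with a subset of the documented fields and defaults."""

    enabled: bool = True
    checkpoint_dir: str | Path = "checkpoints/"
    model_save_format: str = "safetensors"
    model_cache_dir: str | Path | None = None
    model_repo_id: str | None = None
    is_peft: bool = False


def config_from_recipe(checkpoint_block: dict, **model_derived) -> CheckpointingConfigSketch:
    """Hypothetical helper: YAML values first, model-derived values override them."""
    return CheckpointingConfigSketch(**{**checkpoint_block, **model_derived})


# What a parsed YAML `checkpoint:` block might look like (values are made up).
checkpoint_block = {"checkpoint_dir": "runs/exp1/checkpoints", "enabled": True}

cfg = config_from_recipe(
    checkpoint_block,
    model_repo_id="org/model-name",  # hypothetical repo id
    is_peft=False,
)
```

Fields absent from both sources keep their defaults, so `cfg.model_save_format` stays `'safetensors'` and `cfg.model_cache_dir` stays `None` here. With the real class, importing `CheckpointingConfig` from `nemo_automodel.components.checkpoint.config` and passing the same keyword arguments should follow the same shape.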

best_metric_key
str = 'default'
checkpoint_dir
str | Path = 'checkpoints/'
dequantize_base_checkpoint
bool | None = None
diffusers_compatible
bool = False
enabled
bool = True
is_async
bool = False
is_peft
bool = False
model_cache_dir
str | Path | None = None
model_repo_id
str | None = None
model_save_format
str = 'safetensors'
model_state_dict_keys
list[str] | None = None
original_model_root_dir
str | None = None
save_consolidated
bool | str | SaveConsolidatedMode = 'final'
single_rank_consolidation
bool = False
skip_task_head_prefixes_for_base_model
list[str] | None = None
staging_dir
str | None = None
v4_compatible
bool = False
nemo_automodel.components.checkpoint.config.CheckpointingConfig.__post_init__()

Resolve the cache dir, enforce PEFT constraints, and coerce the save format/mode.

nemo_automodel.components.checkpoint.config.CheckpointingConfig.build(
dp_rank: int,
tp_rank: int,
pp_rank: int,
moe_mesh: torch.distributed.device_mesh.DeviceMesh | None = None
) -> nemo_automodel.components.checkpoint.checkpointing.Checkpointer

Build the Checkpointer engine for this config.

Checkpointer is imported lazily to avoid a circular import (checkpointing.py imports CheckpointingConfig from this module) and to keep the heavy DCP/safetensors deps out of module load.
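
The lazy-import pattern mentioned above can be sketched in isolation. This is an illustration under stated assumptions, not the body of the real `build()`: `ConfigSketch` is a hypothetical stand-in, and the standard-library `json` module plays the role of the heavy dependency.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for a config class with a lazy build()."""

    checkpoint_dir: str = "checkpoints/"

    def build(self) -> str:
        # Deferred import: executed on the first build() call rather than when
        # the defining module is loaded. If the imported module itself imports
        # this one, deferring the import breaks the cycle at load time.
        import json  # stand-in for the heavy engine module

        return json.dumps({"checkpoint_dir": self.checkpoint_dir})


engine = ConfigSketch().build()
```

Python caches imported modules in `sys.modules`, so repeated `build()` calls pay the import cost only once.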

Parameters:

dp_rank
int

Data-parallel rank.

tp_rank
int

Tensor-parallel rank.

pp_rank
int

Pipeline-parallel rank.

moe_mesh
DeviceMesh | None, defaults to None

Optional device mesh for MoE checkpointing.

Returns: Checkpointer

The Checkpointer engine built from this config.

class nemo_automodel.components.checkpoint.config.SaveConsolidatedMode

Bases: enum.Enum

Controls when consolidated HF safetensors are exported.

EVERY
= 'every'
FALSE
= 'false'
FINAL
= 'final'
nemo_automodel.components.checkpoint.config._is_geq_torch_2_9() -> bool

Check if the current torch version is greater than or equal to 2.9.0.
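
This page does not show how the check is implemented. One way such a check could work, as a sketch only: compare the numeric release components as a tuple rather than as strings. `is_geq_version` is a hypothetical helper that takes the version string as an argument so it runs without torch installed; a real check would presumably pass `torch.__version__`.

```python
from __future__ import annotations

import re


def is_geq_version(version: str, minimum: tuple[int, int, int]) -> bool:
    """Hypothetical helper: compare the leading X.Y[.Z] of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    major, minor, patch = (int(part or 0) for part in match.groups())
    return (major, minor, patch) >= minimum


# Local build suffixes such as "+cu128" are ignored by the regex.
newer = is_geq_version("2.10.0+cu128", (2, 9, 0))
older = is_geq_version("2.8.1", (2, 9, 0))
```

Tuple comparison matters here: as plain strings, `"2.10.0"` sorts before `"2.9.0"`. This sketch ignores pre-release tags, so a hypothetical `"2.9.0a0"` nightly would count as 2.9.0.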

nemo_automodel.components.checkpoint.config._normalize_save_consolidated(
value: bool | str | nemo_automodel.components.checkpoint.config.SaveConsolidatedMode
) -> nemo_automodel.components.checkpoint.config.SaveConsolidatedMode

Normalize legacy bools and string aliases to a consolidated export mode.
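
A rough sketch of what such normalization might look like is given below. The enum members are taken from SaveConsolidatedMode above; everything else is an assumption. In particular, this page does not state which mode a legacy `True` maps to or which string aliases are accepted, so the bool mapping and the case-insensitive matching here are guesses, and `normalize_save_consolidated` is a stand-in rather than the library function.

```python
from __future__ import annotations

from enum import Enum


class SaveConsolidatedMode(Enum):
    """Local copy of the documented enum so the sketch is self-contained."""

    EVERY = "every"
    FALSE = "false"
    FINAL = "final"


def normalize_save_consolidated(value: bool | str | SaveConsolidatedMode) -> SaveConsolidatedMode:
    if isinstance(value, SaveConsolidatedMode):
        return value
    if isinstance(value, bool):
        # ASSUMPTION: legacy True means "export at every checkpoint".
        return SaveConsolidatedMode.EVERY if value else SaveConsolidatedMode.FALSE
    if isinstance(value, str):
        # ASSUMPTION: strings match enum values case-insensitively.
        # An unknown string raises ValueError from the Enum lookup.
        return SaveConsolidatedMode(value.strip().lower())
    raise TypeError(f"unsupported save_consolidated value: {value!r}")


mode = normalize_save_consolidated("Final")
```

Normalizing once, at construction time, lets the rest of the code compare against enum members instead of re-checking bools and strings.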

nemo_automodel.components.checkpoint.config.__all__ = ['CheckpointingConfig', 'SaveConsolidatedMode']