nemo_rl.utils.automodel_checkpoint#
Checkpoint management utilities for HF models.
Module Contents#
Functions#
| _infer_checkpoint_root | Infer checkpoint root directory from weights path. |
| detect_checkpoint_format | Detect model save format and PEFT status from checkpoint directory. |
| save_checkpoint | Save a checkpoint of the model and optionally optimizer state. |
| load_checkpoint | Load model weights and optionally optimizer state. |
API#
- nemo_rl.utils.automodel_checkpoint._infer_checkpoint_root(weights_path: str) → str#
Infer the checkpoint root directory from a weights path.
When weights_path ends with ".../weights/model", the checkpoint root is the parent of the weights directory, not the weights directory itself.
- Parameters:
weights_path – Path to model weights (e.g., "/path/to/policy/weights/model")
- Returns:
Checkpoint root directory (e.g., "/path/to/policy")
- Return type:
str
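The path inference described above can be sketched as follows. This is an illustrative stand-in, not the library's actual implementation; the name infer_checkpoint_root is hypothetical.

```python
from pathlib import Path

def infer_checkpoint_root(weights_path: str) -> str:
    # Illustrative sketch: if the path ends in .../weights/model,
    # the checkpoint root is two levels up; otherwise return the path unchanged.
    p = Path(weights_path)
    if p.name == "model" and p.parent.name == "weights":
        return str(p.parent.parent)
    return str(p)

# infer_checkpoint_root("/path/to/policy/weights/model") yields "/path/to/policy"
```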
- nemo_rl.utils.automodel_checkpoint.detect_checkpoint_format(weights_path: str) → tuple[str, bool]#
Detect model save format and PEFT status from checkpoint directory.
- Parameters:
weights_path – Path to the checkpoint directory (e.g., weights/model)
- Returns:
(model_save_format, is_peft), where model_save_format is "torch_save" for DCP checkpoints or "safetensors" for safetensors checkpoints, and is_peft is True if PEFT/adapter patterns are detected.
- Return type:
tuple
- nemo_rl.utils.automodel_checkpoint.save_checkpoint(
- model: torch.nn.Module,
- weights_path: str,
- optimizer: Optional[torch.optim.Optimizer] = None,
- scheduler: Optional[Any] = None,
- optimizer_path: Optional[str] = None,
- tokenizer: Optional[Any] = None,
- tokenizer_path: Optional[str] = None,
- model_save_format: str = 'safetensors',
- is_peft: bool = False,
- peft_config: Optional[Any] = None,
- save_consolidated: bool = False,
- model_state_dict_keys: Optional[list[str]] = None,
- )#
Save a checkpoint of the model and optionally optimizer state.
- Parameters:
model – The PyTorch model to save
weights_path – Path to save model weights
optimizer – Optional optimizer to save
scheduler – Optional scheduler to save
optimizer_path – Path to save optimizer state (required if optimizer provided)
tokenizer – Optional tokenizer to save
tokenizer_path – Path to save tokenizer state (required if tokenizer provided)
model_save_format – Format for saving model (“torch_save” or “safetensors”)
is_peft – Whether the model uses PEFT
peft_config – PEFT configuration if is_peft is True
save_consolidated – Whether to save consolidated checkpoints (for HF compatibility)
model_state_dict_keys – Copy of the model state dict keys before any parallelization. If None, the keys are extracted from the model's current state dict.
- nemo_rl.utils.automodel_checkpoint.load_checkpoint(
- model: torch.nn.Module,
- weights_path: str,
- optimizer: Optional[torch.optim.Optimizer] = None,
- scheduler: Optional[Any] = None,
- optimizer_path: Optional[str] = None,
- )#
Load model weights and optionally optimizer state.
- Parameters:
model – The PyTorch model whose weights to update
weights_path – Path to load model weights from
optimizer – Optional optimizer to load state into
scheduler – Optional scheduler to load state into
optimizer_path – Path to load optimizer state from (required if optimizer provided)