
PhysicsNeMo Launch Utils

The PhysicsNeMo Launch Utils module provides utilities that support saving and loading model checkpoints. These utilities are used internally by the LaunchLogger, but they can also be called directly by users to save and load model checkpoints.

physicsnemo.launch.utils.checkpoint.get_checkpoint_dir(base_dir: str, model_name: str) → str[source]

Get a checkpoint directory based on a given base directory and model name

Parameters
  • base_dir (str) – Path to the base directory where checkpoints are stored

  • model_name (str) – Name of the model which is generating the checkpoint

Returns

Checkpoint directory

Return type

str
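
A minimal usage sketch; the base directory and model name below are illustrative, not part of the API:

    from physicsnemo.launch.utils.checkpoint import get_checkpoint_dir

    # Resolve the directory used to store checkpoints for a named model.
    # The exact path layout is an implementation detail of PhysicsNeMo;
    # only the returned string should be relied upon.
    ckpt_dir = get_checkpoint_dir(base_dir="./outputs", model_name="fno")
    print(ckpt_dir)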

physicsnemo.launch.utils.checkpoint.load_checkpoint(path: str, models: Optional[Union[Module, List[Module]]] = None, optimizer: Optional[optimizer] = None, scheduler: Optional[scheduler] = None, scaler: Optional[scaler] = None, epoch: Optional[int] = None, metadata_dict: Optional[Dict[str, Any]] = {}, device: Union[str, device] = 'cpu') → int[source]

Checkpoint loading utility

This loader is designed to be used with the save_checkpoint utility in PhysicsNeMo Launch. Given a path, this method will try to find a checkpoint and load the state dictionaries into the provided training objects.

Parameters
  • path (str) – Path to training checkpoint

  • models (Union[torch.nn.Module, List[torch.nn.Module], None], optional) – A single or list of PyTorch models, by default None

  • optimizer (Union[optimizer, None], optional) – Optimizer, by default None

  • scheduler (Union[scheduler, None], optional) – Learning rate scheduler, by default None

  • scaler (Union[scaler, None], optional) – AMP grad scaler, by default None

  • epoch (Union[int, None], optional) – Epoch checkpoint to load. If none is provided this will attempt to load the checkpoint with the largest index, by default None

  • metadata_dict (Optional[Dict[str, Any]], optional) – Dictionary to store metadata from the checkpoint, by default {}

  • device (Union[str, torch.device], optional) – Target device, by default “cpu”

Returns

Loaded epoch

Return type

int
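
A sketch of resuming training with this loader. The model, optimizer, scheduler, and the "./checkpoints" path are illustrative stand-ins; any torch.nn.Module and standard PyTorch training objects apply:

    import torch

    from physicsnemo.launch.utils.checkpoint import load_checkpoint

    # Stand-in training objects for illustration.
    model = torch.nn.Linear(32, 32)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

    # With epoch=None, the checkpoint with the largest index is loaded.
    # The return value is the loaded epoch, which can seed the training loop.
    loaded_epoch = load_checkpoint(
        "./checkpoints",
        models=model,
        optimizer=optimizer,
        scheduler=scheduler,
        device="cuda" if torch.cuda.is_available() else "cpu",
    )

    for epoch in range(loaded_epoch, 100):
        ...  # resume training from the restored state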

physicsnemo.launch.utils.checkpoint.save_checkpoint(path: str, models: Optional[Union[Module, List[Module]]] = None, optimizer: Optional[optimizer] = None, scheduler: Optional[scheduler] = None, scaler: Optional[scaler] = None, epoch: Optional[int] = None, metadata: Optional[Dict[str, Any]] = None) → None[source]

Training checkpoint saving utility

This will save a training checkpoint at the provided path following the file naming convention “checkpoint.{model parallel id}.{epoch/index}.mdlus”. The load_checkpoint utility in PhysicsNeMo core can then be used to read this file.

Parameters
  • path (str) – Path to save the training checkpoint

  • models (Union[torch.nn.Module, List[torch.nn.Module], None], optional) – A single or list of PyTorch models, by default None

  • optimizer (Union[optimizer, None], optional) – Optimizer, by default None

  • scheduler (Union[scheduler, None], optional) – Learning rate scheduler, by default None

  • scaler (Union[scaler, None], optional) – AMP grad scaler. If none is provided, this will attempt to save the scaler used in static capture, by default None

  • epoch (Union[int, None], optional) – Epoch index to save the checkpoint under. If none is provided, this will save the checkpoint at the next valid index, by default None

  • metadata (Optional[Dict[str, Any]], optional) – Additional metadata to save, by default None
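
A sketch of periodic checkpointing inside a training loop; the save interval, training objects, and metadata below are illustrative assumptions:

    import torch

    from physicsnemo.launch.utils.checkpoint import save_checkpoint

    # Stand-in training objects for illustration.
    model = torch.nn.Linear(32, 32)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(100):
        ...  # one epoch of training

        # Passing the epoch explicitly indexes the checkpoint files;
        # omitting it would save at the next valid index instead.
        if epoch % 10 == 0:
            save_checkpoint(
                "./checkpoints",
                models=model,
                optimizer=optimizer,
                epoch=epoch,
                metadata={"note": "illustrative metadata"},
            )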
