Modulus Launch Utils

modulus.launch.utils.checkpoint.load_checkpoint(path: str, models: Optional[Union[Module, List[Module]]] = None, optimizer: Optional[optimizer] = None, scheduler: Optional[scheduler] = None, scaler: Optional[scaler] = None, epoch: Optional[int] = None, device: Union[str, device] = 'cpu') → int

Checkpoint loading utility

This loader is designed to be used with the save checkpoint utility in Modulus Launch. Given a path, it looks for a checkpoint there and loads the saved state dictionaries into whichever training objects are provided.

Parameters
  • path (str) – Path to training checkpoint

  • models (Union[torch.nn.Module, List[torch.nn.Module], None], optional) – A single PyTorch model or a list of PyTorch models, by default None

  • optimizer (Union[optimizer, None], optional) – Optimizer, by default None

  • scheduler (Union[scheduler, None], optional) – Learning rate scheduler, by default None

  • scaler (Union[scaler, None], optional) – AMP grad scaler, by default None

  • epoch (Union[int, None], optional) – Epoch of the checkpoint to load. If None, the loader attempts to load the checkpoint with the largest index, by default None

  • device (Union[str, torch.device], optional) – Target device, by default “cpu”

Returns

Loaded epoch

Return type

int
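The "largest index" lookup described for the epoch parameter can be sketched as follows. This is a minimal, standard-library-only illustration and not the Modulus Launch implementation. The helper name latest_checkpoint_index, its model_parallel_id argument, and the regular expression are hypothetical; the file pattern is assumed from the naming convention "checkpoint.{model parallel id}.{epoch/index}.mdlus" stated for the save utility.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed pattern, taken from the documented naming convention:
# checkpoint.{model parallel id}.{epoch/index}.mdlus
_CKPT_RE = re.compile(r"^checkpoint\.(\d+)\.(\d+)\.mdlus$")


def latest_checkpoint_index(path: str, model_parallel_id: int = 0) -> Optional[int]:
    """Hypothetical helper: return the largest checkpoint index in `path`.

    Returns None when the directory does not exist or holds no matching file.
    """
    directory = Path(path)
    if not directory.is_dir():
        return None

    indices = []
    for entry in directory.iterdir():
        match = _CKPT_RE.match(entry.name)
        # Only consider files written for the requested model parallel id.
        if match and int(match.group(1)) == model_parallel_id:
            indices.append(int(match.group(2)))

    return max(indices) if indices else None
```

Indices are compared as integers rather than strings, so index 12 ranks above index 2.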

modulus.launch.utils.checkpoint.save_checkpoint(path: str, models: Optional[Union[Module, List[Module]]] = None, optimizer: Optional[optimizer] = None, scheduler: Optional[scheduler] = None, scaler: Optional[scaler] = None, epoch: Optional[int] = None) → None

Training checkpoint saving utility

This will save a training checkpoint in the provided path following the file naming convention “checkpoint.{model parallel id}.{epoch/index}.mdlus”. The load checkpoint method in Modulus core can then be used to read this file.

Parameters
  • path (str) – Path to save the training checkpoint

  • models (Union[torch.nn.Module, List[torch.nn.Module], None], optional) – A single PyTorch model or a list of PyTorch models, by default None

  • optimizer (Union[optimizer, None], optional) – Optimizer, by default None

  • scheduler (Union[scheduler, None], optional) – Learning rate scheduler, by default None

  • scaler (Union[scaler, None], optional) – AMP grad scaler. If none is provided, the utility will attempt to save the one in static capture, by default None

  • epoch (Union[int, None], optional) – Epoch of the checkpoint to save. If None, the checkpoint is saved at the next valid index, by default None
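The file naming convention and the "next valid index" behavior can be illustrated with a short sketch. It uses only the standard library and is not the Modulus Launch implementation. The helpers checkpoint_filename and next_checkpoint_path are hypothetical, and the sketch assumes that the next valid index is the first index, counting from 0, for which no file exists yet.

```python
from pathlib import Path
from typing import Optional


def checkpoint_filename(model_parallel_id: int, index: int) -> str:
    """Hypothetical helper: build a name following the documented convention."""
    return f"checkpoint.{model_parallel_id}.{index}.mdlus"


def next_checkpoint_path(
    path: str, model_parallel_id: int = 0, epoch: Optional[int] = None
) -> Path:
    """Hypothetical helper: choose the file a new checkpoint would be written to.

    With an explicit epoch, that epoch is used as the index. Otherwise the
    first unused index is picked (an assumption about "next valid index").
    """
    directory = Path(path)
    if epoch is not None:
        return directory / checkpoint_filename(model_parallel_id, epoch)

    index = 0
    while (directory / checkpoint_filename(model_parallel_id, index)).exists():
        index += 1
    return directory / checkpoint_filename(model_parallel_id, index)
```

For example, in a directory that already holds checkpoint.0.0.mdlus and checkpoint.0.1.mdlus, calling next_checkpoint_path with no epoch yields a path ending in checkpoint.0.2.mdlus.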

© Copyright 2023, NVIDIA Modulus Team. Last updated on Sep 21, 2023.