PhysicsNeMo Optim#

Optimizer utilities for PhysicsNeMo.

The PhysicsNeMo Optim module provides optimization utilities for training physics-informed machine learning models. These utilities are designed to work seamlessly with PyTorch’s optimizer ecosystem while providing additional functionality for complex training scenarios.

CombinedOptimizer#

The CombinedOptimizer allows combining multiple PyTorch optimizers into a unified interface. This is particularly useful when different parts of a model require different optimization strategies, for example using Adam for encoder layers and SGD with momentum for decoder layers.

class physicsnemo.optim.CombinedOptimizer(
optimizers: Sequence[Optimizer],
torch_compile_kwargs: dict[str, Any] | None = None,
)[source]#

Bases: Optimizer

Combine multiple PyTorch optimizers into a single Optimizer-like interface.

This wrapper allows you to use different optimizers for different parts of a model while presenting a unified interface compatible with PyTorch’s training loops and learning rate schedulers. The param_groups from all contained optimizers are concatenated, enabling schedulers to operate transparently across all parameters.

Parameters:
  • optimizers (Sequence[torch.optim.Optimizer]) – Sequence of PyTorch Optimizer instances to combine. Each optimizer should already be configured with its own parameters and hyperparameters. Must contain at least one optimizer.

  • torch_compile_kwargs (dict[str, Any], optional) – Optional dictionary of keyword arguments to pass to torch.compile() when compiling each optimizer’s step function. If None, step functions are not compiled. Compiling can improve performance but may affect serialization. Default is None.

Raises:

ValueError – If optimizers is empty, or if any parameter appears in multiple optimizers (parameter groups must be disjoint).

Notes

  • Parameter Groups: The param_groups attribute aggregates parameter groups from all underlying optimizers, making this wrapper compatible with learning rate schedulers.

  • Closure Behavior: When step() is called with a closure, the closure is passed to each underlying optimizer sequentially. This results in the closure being evaluated multiple times (at least once per optimizer), which triggers multiple forward and backward passes. This behavior matches calling step(closure) on each optimizer individually.

  • Dynamic Parameter Addition: The add_param_group() method is not supported. To add parameters dynamically, add them to the individual optimizers before creating the CombinedOptimizer, or create a new instance.

  • State Access: The state attribute inherited from the base class may not accurately reflect the optimizer state. Access state through the individual optimizers in the optimizers attribute instead.

  • Serialization: The optimizer can be pickled and unpickled. When torch_compile_kwargs is provided, the compiled step functions are reconstructed during unpickling.

Examples

Combine Adam for model backbone and SGD for the head:

>>> import torch
>>> import torch.nn as nn
>>> from torch.optim import Adam, SGD
>>> from physicsnemo.optim import CombinedOptimizer
>>>
>>> model = nn.Sequential(
...     nn.Linear(10, 20),  # backbone
...     nn.ReLU(),
...     nn.Linear(20, 2),   # head
... )
>>> backbone_params = list(model[0].parameters())
>>> head_params = list(model[2].parameters())
>>>
>>> opt1 = Adam(backbone_params, lr=1e-4)
>>> opt2 = SGD(head_params, lr=1e-2, momentum=0.9)
>>> combined_opt = CombinedOptimizer([opt1, opt2])
>>>
>>> # Use with a learning rate scheduler
>>> scheduler = torch.optim.lr_scheduler.StepLR(combined_opt, step_size=10)
>>>
>>> # Standard training loop
>>> for epoch in range(100):
...     combined_opt.zero_grad()
...     loss = model(torch.randn(32, 10)).sum()
...     loss.backward()
...     combined_opt.step()
...     scheduler.step()
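The aggregation pattern that makes the example above work can be sketched in plain PyTorch. This is a simplified illustration only, not the library implementation: the real CombinedOptimizer subclasses torch.optim.Optimizer, validates that parameter groups are disjoint, and optionally compiles each step function.

```python
import torch
import torch.nn as nn

class TinyCombined:
    """Simplified sketch of the combination pattern (illustration only)."""

    def __init__(self, optimizers):
        self.optimizers = list(optimizers)
        # Concatenate param_groups; the group dicts are shared with the
        # underlying optimizers, so edits (e.g. by a scheduler) propagate.
        self.param_groups = [g for opt in self.optimizers for g in opt.param_groups]

    def zero_grad(self, set_to_none=True):
        for opt in self.optimizers:
            opt.zero_grad(set_to_none=set_to_none)

    def step(self, closure=None):
        loss = None
        for opt in self.optimizers:
            result = opt.step(closure)
            if result is not None:
                loss = result  # last non-None value wins, as documented
        return loss

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
combined = TinyCombined([
    torch.optim.Adam(model[0].parameters(), lr=1e-4),
    torch.optim.SGD(model[2].parameters(), lr=1e-2),
])
combined.zero_grad()
model(torch.randn(5, 4)).sum().backward()
combined.step()
```

Because the group dicts are shared rather than copied, lowering combined.param_groups[i]["lr"] changes the learning rate seen by the corresponding underlying optimizer; this sharing is what lets a single scheduler drive all optimizers at once.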
__init__(
optimizers: Sequence[Optimizer],
torch_compile_kwargs: dict[str, Any] | None = None,
)[source]#
add_param_group(
param_group: dict[str, Any],
) → None[source]#

Add a param group to the Optimizer’s param_groups.

This method is not supported for CombinedOptimizer as it would require logic to determine which underlying optimizer should handle the new group.

Parameters:

param_group (dict[str, Any]) – The parameter group to add.

Raises:

NotImplementedError – Always raises NotImplementedError unless called during initialization.

load_state_dict(
state_dict: dict[str, Any],
) → None[source]#

Load the state of all optimizers from a dictionary.

This method restores the state of each underlying optimizer from the provided state dictionary. The state dictionary must have been created by state_dict() from a CombinedOptimizer with the same number of optimizers.

Parameters:

state_dict (dict[str, Any]) – A dictionary containing optimizer states, as returned by state_dict(). Must contain an "optimizers" key mapping to a list of state dictionaries.

Raises:
  • ValueError – If the number of optimizers in state_dict does not match the number of optimizers in this instance.

  • KeyError – If state_dict does not contain the expected structure.

Notes

After loading state, the param_groups attribute is refreshed to reflect any changes in the underlying optimizers.
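A checkpoint round trip following the documented layout might look like this. The sketch uses plain PyTorch optimizers to stand in for the underlying optimizers; the "optimizers" key and the requirement that count and order match are taken from the description above.

```python
import torch

p1 = torch.nn.Parameter(torch.randn(3))
p2 = torch.nn.Parameter(torch.randn(3))
opt1 = torch.optim.SGD([p1], lr=0.01, momentum=0.9)
opt2 = torch.optim.Adam([p2], lr=0.001)

# Take one step so both optimizers accumulate per-parameter state
# (SGD momentum buffer, Adam first/second moments).
(p1.sum() + p2.sum()).backward()
opt1.step()
opt2.step()

# Checkpoint in the documented layout: a single "optimizers" key
# holding the per-optimizer state dicts, in order.
checkpoint = {"optimizers": [opt.state_dict() for opt in (opt1, opt2)]}

# Restore into freshly constructed optimizers with matching structure;
# the number and order of optimizers must match.
fresh1 = torch.optim.SGD([p1], lr=0.01, momentum=0.9)
fresh2 = torch.optim.Adam([p2], lr=0.001)
for opt, state in zip((fresh1, fresh2), checkpoint["optimizers"]):
    opt.load_state_dict(state)
```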

state_dict() → dict[str, Any][source]#

Return the state of all optimizers as a dictionary.

The returned dictionary contains the state dictionaries of all underlying optimizers, allowing the combined optimizer to be checkpointed and restored.

Returns:

A dictionary with a single key "optimizers" mapping to a list of state dictionaries, one for each underlying optimizer in order.

Return type:

dict[str, Any]

Examples

>>> import torch
>>> from physicsnemo.optim import CombinedOptimizer
>>> param1 = torch.nn.Parameter(torch.randn(3))
>>> param2 = torch.nn.Parameter(torch.randn(3))
>>> opt1 = torch.optim.SGD([param1], lr=0.01)
>>> opt2 = torch.optim.Adam([param2], lr=0.001)
>>> combined_opt = CombinedOptimizer([opt1, opt2])
>>> state = combined_opt.state_dict()
>>> list(state.keys())
['optimizers']
>>> len(state["optimizers"])
2
step(
closure: Callable[[], float] | None = None,
) → float | None[source]#

Perform a single optimization step.

This method calls the step() method of each underlying optimizer. If a closure is provided, it is passed to each optimizer.

Parameters:

closure (Callable[[], float], optional) – Optional callable that reevaluates the model and returns the loss. If provided, it will be passed to each optimizer’s step function. Default is None.

Returns:

The loss value returned by the last optimizer that returns a non-None value, or None if no closure was provided or no optimizer returned a value. When multiple optimizers return values, the result from the last optimizer in sequence takes precedence.

Return type:

float or None

Notes

The return value semantics match PyTorch’s Optimizer.step() interface, which returns float | None. In practice, most closures return a torch.Tensor loss, and PyTorch optimizers that use the closure will call .item() on it internally before returning.
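The closure semantics can be observed with plain PyTorch optimizers. The loop below mirrors what the description above says step() does internally; it is a sketch based on that description, not library code.

```python
import torch

p1 = torch.nn.Parameter(torch.tensor([2.0]))
p2 = torch.nn.Parameter(torch.tensor([3.0]))
opt_a = torch.optim.SGD([p1], lr=0.1)
opt_b = torch.optim.SGD([p2], lr=0.1)

calls = {"n": 0}

def closure():
    # Re-evaluate the model and return the loss, as step() requires.
    calls["n"] += 1
    opt_a.zero_grad()
    opt_b.zero_grad()
    loss = (p1 ** 2).sum() + (p2 ** 2).sum()
    loss.backward()
    return loss

# Passing the closure to each optimizer in sequence evaluates it once
# per optimizer: two forward/backward passes here.
loss = None
for opt in (opt_a, opt_b):
    result = opt.step(closure)
    if result is not None:
        loss = result  # last non-None result takes precedence
```

Each optimizer in the sequence triggers its own closure evaluation, which is why closures with expensive forward passes should be used with care when combining many optimizers.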

zero_grad(set_to_none: bool = True) → None[source]#

Clear the gradients of all optimized parameters.

This method delegates to the zero_grad() method of each underlying optimizer.

Parameters:

set_to_none (bool, optional) – If True (default), sets gradients to None instead of zero. This reduces memory usage and can improve performance. Matches the upstream PyTorch Optimizer.zero_grad() interface.
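The effect of set_to_none can be checked directly. Since this method simply delegates to each underlying optimizer, the sketch uses a single plain PyTorch optimizer.

```python
import torch

p = torch.nn.Parameter(torch.ones(2))
opt = torch.optim.SGD([p], lr=0.1)

p.sum().backward()
opt.zero_grad(set_to_none=True)   # gradient tensor released entirely
grad_after_none = p.grad          # None

p.sum().backward()
opt.zero_grad(set_to_none=False)  # gradient kept but zero-filled
grad_after_zero = p.grad          # a zeros tensor of shape (2,)
```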