PhysicsNeMo Optim#
Optimizer utilities for PhysicsNeMo.
The PhysicsNeMo Optim module provides optimization utilities for training physics-informed machine learning models. These utilities are designed to work seamlessly with PyTorch’s optimizer ecosystem while providing additional functionality for complex training scenarios.
CombinedOptimizer#
The CombinedOptimizer allows combining multiple PyTorch optimizers into a unified
interface. This is particularly useful when different parts of a model require different
optimization strategies - for example, using Adam for encoder layers and SGD with momentum
for decoder layers.
class physicsnemo.optim.CombinedOptimizer(optimizers: Sequence[Optimizer], torch_compile_kwargs: dict[str, Any] | None = None)
Bases: Optimizer

Combine multiple PyTorch optimizers into a single Optimizer-like interface.

This wrapper allows you to use different optimizers for different parts of a model while presenting a unified interface compatible with PyTorch's training loops and learning rate schedulers. The param_groups from all contained optimizers are concatenated, enabling schedulers to operate transparently across all parameters.

- Parameters:
optimizers (Sequence[torch.optim.Optimizer]) – Sequence of PyTorch Optimizer instances to combine. Each optimizer should already be configured with its own parameters and hyperparameters. Must contain at least one optimizer.

torch_compile_kwargs (dict[str, Any], optional) – Optional dictionary of keyword arguments to pass to torch.compile() when compiling each optimizer's step function. If None, step functions are not compiled. Compiling can improve performance but may affect serialization. Default is None.
- Raises:
ValueError – If optimizers is empty, or if any parameter appears in multiple optimizers (parameter groups must be disjoint).
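The disjointness requirement can be sketched in plain Python. This is a hypothetical stand-in (dicts and `object()` instances in place of real optimizers and tensors), not the actual PhysicsNeMo implementation:

```python
# Sketch of the validation described above: reject empty input and
# parameters shared between optimizers. Hypothetical stand-ins, not
# the real implementation.

def check_disjoint(optimizers):
    """Raise ValueError if any parameter appears in more than one optimizer."""
    if not optimizers:
        raise ValueError("optimizers must contain at least one optimizer")
    seen = set()
    for opt in optimizers:
        for group in opt["param_groups"]:
            for p in group["params"]:
                if id(p) in seen:  # compare by identity, as torch does
                    raise ValueError("parameter appears in multiple optimizers")
                seen.add(id(p))

# Two fake "optimizers" sharing one parameter object
shared = object()
opt_a = {"param_groups": [{"params": [shared, object()]}]}
opt_b = {"param_groups": [{"params": [shared]}]}

try:
    check_disjoint([opt_a, opt_b])
except ValueError as e:
    print("rejected:", e)
```

Comparing by object identity (rather than value equality) mirrors how parameters are tracked: the same tensor registered in two optimizers is the error, not two tensors that happen to hold equal values.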
Notes

Parameter Groups: The param_groups attribute aggregates parameter groups from all underlying optimizers, making this wrapper compatible with learning rate schedulers.

Closure Behavior: When step() is called with a closure, the closure is passed to each underlying optimizer sequentially. This results in the closure being evaluated multiple times (at least once per optimizer), which triggers multiple forward and backward passes. This behavior matches calling step(closure) on each optimizer individually.

Dynamic Parameter Addition: The add_param_group() method is not supported. To add parameters dynamically, add them to the individual optimizers before creating the CombinedOptimizer, or create a new instance.

State Access: The state attribute inherited from the base class may not accurately reflect the optimizer state. Access state through the individual optimizers in the optimizers attribute instead.

Serialization: The optimizer can be pickled and unpickled. When torch_compile_kwargs is provided, the compiled step functions are reconstructed during unpickling.
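The closure-evaluation note above can be demonstrated with stub optimizers. This is a hypothetical sketch (plain Python objects in place of torch optimizers), not the library's code:

```python
# Illustrates the "Closure Behavior" note: with N underlying optimizers,
# a closure passed to step() is evaluated N times. StubOptimizer and
# combined_step are hypothetical stand-ins.

calls = {"n": 0}

def closure():
    calls["n"] += 1  # count forward/backward evaluations
    return 0.5       # stand-in loss value

class StubOptimizer:
    def step(self, closure=None):
        return closure() if closure is not None else None

def combined_step(optimizers, closure=None):
    # Delegate to each optimizer in order, keeping the last non-None result.
    result = None
    for opt in optimizers:
        out = opt.step(closure)
        if out is not None:
            result = out
    return result

loss = combined_step([StubOptimizer(), StubOptimizer(), StubOptimizer()], closure)
print(calls["n"])  # the closure ran once per optimizer: 3
```

If the closure triggers an expensive forward and backward pass, this multiplied cost is worth keeping in mind when combining many optimizers.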
Examples
Combine Adam for model backbone and SGD for the head:
>>> import torch
>>> import torch.nn as nn
>>> from torch.optim import Adam, SGD
>>> from physicsnemo.optim import CombinedOptimizer
>>>
>>> model = nn.Sequential(
...     nn.Linear(10, 20),  # backbone
...     nn.ReLU(),
...     nn.Linear(20, 2),  # head
... )
>>> backbone_params = list(model[0].parameters())
>>> head_params = list(model[2].parameters())
>>>
>>> opt1 = Adam(backbone_params, lr=1e-4)
>>> opt2 = SGD(head_params, lr=1e-2, momentum=0.9)
>>> combined_opt = CombinedOptimizer([opt1, opt2])
>>>
>>> # Use with a learning rate scheduler
>>> scheduler = torch.optim.lr_scheduler.StepLR(combined_opt, step_size=10)
>>>
>>> # Standard training loop
>>> for epoch in range(100):
...     combined_opt.zero_grad()
...     loss = model(torch.randn(32, 10)).sum()
...     loss.backward()
...     combined_opt.step()
...     scheduler.step()
- __init__(optimizers: Sequence[Optimizer], torch_compile_kwargs: dict[str, Any] | None = None)
- add_param_group(param_group: dict[str, Any])
Add a param group to the Optimizer’s param_groups.
This method is not supported for CombinedOptimizer as it would require logic to determine which underlying optimizer should handle the new group.
- Parameters:
param_group (dict[str, Any]) – The parameter group to add.
- Raises:
NotImplementedError – Always raises NotImplementedError unless called during initialization.
- load_state_dict(state_dict: dict[str, Any])
Load the state of all optimizers from a dictionary.
This method restores the state of each underlying optimizer from the provided state dictionary. The state dictionary must have been created by state_dict() from a CombinedOptimizer with the same number of optimizers.

- Parameters:
state_dict (dict[str, Any]) – A dictionary containing optimizer states, as returned by state_dict(). Must contain an "optimizers" key mapping to a list of state dictionaries.

- Raises:
ValueError – If the number of optimizers in state_dict does not match the number of optimizers in this instance.
KeyError – If state_dict does not contain the expected structure.
Notes
After loading state, the param_groups attribute is refreshed to reflect any changes in the underlying optimizers.
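The load-time checks described above can be sketched with plain dicts as stand-ins for optimizer state. This is a hypothetical illustration of the validation, not the library's code:

```python
# Sketch of load_state_dict validation: the "optimizers" list must exist
# and its length must match the number of wrapped optimizers. Plain-dict
# stand-ins; hypothetical, not the real implementation.

def load_combined_state(own_optimizer_states, state_dict):
    if "optimizers" not in state_dict:
        raise KeyError('state_dict must contain an "optimizers" key')
    saved = state_dict["optimizers"]
    if len(saved) != len(own_optimizer_states):
        raise ValueError(
            f"expected {len(own_optimizer_states)} optimizer states, "
            f"got {len(saved)}"
        )
    for i, s in enumerate(saved):
        # In the real class this delegates to each optimizer's load_state_dict.
        own_optimizer_states[i] = s
    return own_optimizer_states

states = [{}, {}]
ok = load_combined_state(states, {"optimizers": [{"lr": 0.01}, {"lr": 0.001}]})

try:
    load_combined_state(states, {"optimizers": [{}]})  # wrong count
except ValueError as e:
    print("mismatch:", e)
```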
- state_dict() → dict[str, Any][source]#
Return the state of all optimizers as a dictionary.
The returned dictionary contains the state dictionaries of all underlying optimizers, allowing the combined optimizer to be checkpointed and restored.
- Returns:
A dictionary with a single key "optimizers" mapping to a list of state dictionaries, one for each underlying optimizer in order.

- Return type:
dict[str, Any]
Examples
>>> import torch
>>> from physicsnemo.optim import CombinedOptimizer
>>> param1 = torch.nn.Parameter(torch.randn(3))
>>> param2 = torch.nn.Parameter(torch.randn(3))
>>> opt1 = torch.optim.SGD([param1], lr=0.01)
>>> opt2 = torch.optim.Adam([param2], lr=0.001)
>>> combined_opt = CombinedOptimizer([opt1, opt2])
>>> state = combined_opt.state_dict()
>>> list(state.keys())
['optimizers']
>>> len(state["optimizers"])
2
- step(closure: Callable[[], float] | None = None)
Perform a single optimization step.
This method calls the step() method of each underlying optimizer. If a closure is provided, it is passed to each optimizer.

- Parameters:
closure (Callable[[], float], optional) – Optional callable that reevaluates the model and returns the loss. If provided, it will be passed to each optimizer’s step function. Default is None.
- Returns:
The loss value returned by the last optimizer that returns a non-None value, or None if no closure was provided or no optimizer returned a value. When multiple optimizers return values, the result from the last optimizer in sequence takes precedence.
- Return type:
float or None
Notes
The return value semantics match PyTorch's Optimizer.step() interface, which returns float | None. In practice, most closures return a torch.Tensor loss, and PyTorch optimizers that use the closure will call .item() on it internally before returning.
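The "last non-None value takes precedence" rule can be isolated into a tiny helper. This is a hypothetical sketch of the selection logic only, not the library's code:

```python
# Sketch of the return-value rule for step(): iterate over the values
# returned by the underlying optimizers and keep the last non-None one.
# last_non_none is a hypothetical helper, not part of PhysicsNeMo.

def last_non_none(values):
    result = None
    for v in values:
        if v is not None:
            result = v
    return result

# e.g. only some optimizers make use of the closure and return a loss
print(last_non_none([0.7, None, 0.3]))  # -> 0.3
print(last_non_none([None, None]))      # -> None
```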
- zero_grad(set_to_none: bool = True) → None[source]#
Clear the gradients of all optimized parameters.
This method delegates to the zero_grad() method of each underlying optimizer.

- Parameters:
set_to_none (bool, optional) – If True (default), sets gradients to None instead of zero. This reduces memory usage and can improve performance. Matches the upstream PyTorch Optimizer.zero_grad() interface.
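The set_to_none distinction can be illustrated with a toy parameter class. This is a hypothetical stand-in (real torch parameters hold tensors, and freeing the buffer is what saves memory), not the library's code:

```python
# Illustrates the set_to_none semantics noted above. FakeParam is a
# hypothetical stand-in for a torch parameter.

class FakeParam:
    def __init__(self):
        self.grad = [1.0, 2.0]  # pretend accumulated gradient

def zero_grad(params, set_to_none=True):
    for p in params:
        if p.grad is not None:
            if set_to_none:
                p.grad = None                 # drop the buffer entirely
            else:
                p.grad = [0.0] * len(p.grad)  # keep the buffer, zeroed

a, b = FakeParam(), FakeParam()
zero_grad([a], set_to_none=True)
zero_grad([b], set_to_none=False)
print(a.grad, b.grad)  # None [0.0, 0.0]
```

With set_to_none=True the next backward pass allocates a fresh gradient buffer; with set_to_none=False the existing buffer is reused, which some code relies on when it reads .grad unconditionally.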