PhysicsNeMo Models#
Basics#
PhysicsNeMo contains its own Model class for constructing neural networks. This model class
is built on top of PyTorch’s nn.Module
and can be used interchangeably within the
PyTorch ecosystem. Using PhysicsNeMo models allows you to leverage various features of
PhysicsNeMo aimed at improving performance and ease of use. These features include, but are
not limited to, model zoo, automatic mixed-precision, CUDA Graphs, and easy checkpointing.
We discuss each of these features in the following sections.
Model Zoo#
PhysicsNeMo contains several optimized, customizable and easy-to-use models. These include some very general models like Fourier Neural Operators (FNOs), ResNet, and Graph Neural Networks (GNNs) as well as domain-specific models like Deep Learning Weather Prediction (DLWP) and Spherical Fourier Neural Operators (SFNO).
For a list of currently available models, please refer to the models on GitHub.
Below are some simple examples of how to use these models.
>>> import torch
>>> from physicsnemo.models.mlp.fully_connected import FullyConnected
>>> model = FullyConnected(in_features=32, out_features=64)
>>> input = torch.randn(128, 32)
>>> output = model(input)
>>> output.shape
torch.Size([128, 64])
>>> import torch
>>> from physicsnemo.models.fno.fno import FNO
>>> model = FNO(
...     in_channels=4,
...     out_channels=3,
...     decoder_layers=2,
...     decoder_layer_size=32,
...     dimension=2,
...     latent_channels=32,
...     num_fno_layers=2,
...     padding=0,
... )
>>> input = torch.randn(32, 4, 32, 32) #(N, C, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([32, 3, 32, 32])
How to write your own PhysicsNeMo model#
There are a few different ways to construct a PhysicsNeMo model. If you are a seasoned PyTorch user, the easiest way is to write your model using the optimized layers and utilities from PhysicsNeMo or PyTorch. Let's take a look at a simple UNet example, first as a plain PyTorch implementation and then as a PhysicsNeMo implementation that supports CUDA Graphs and Automatic Mixed-Precision.
import torch.nn as nn

class UNet(nn.Module):
    def __init__(self, in_channels=1, out_channels=1):
        super(UNet, self).__init__()

        self.enc1 = self.conv_block(in_channels, 64)
        self.enc2 = self.conv_block(64, 128)

        self.dec1 = self.upconv_block(128, 64)
        self.final = nn.Conv2d(64, out_channels, kernel_size=1)

    def conv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2)
        )

    def upconv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, 2, stride=2),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x1 = self.enc1(x)
        x2 = self.enc2(x1)
        x = self.dec1(x2)
        return self.final(x)
Now we show this model rewritten in PhysicsNeMo. First, we subclass the model from physicsnemo.Module instead of torch.nn.Module. The physicsnemo.Module class acts as a direct replacement for torch.nn.Module and provides additional functionality for saving and loading checkpoints, etc. Refer to the API docs of physicsnemo.Module for further details. Additionally, we add metadata to the model to capture the optimizations that this model supports. In this case we enable CUDA Graphs and Automatic Mixed-Precision.
from dataclasses import dataclass

import physicsnemo
import torch.nn as nn

@dataclass
class UNetMetaData(physicsnemo.ModelMetaData):
    name: str = "UNet"
    # Optimization
    jit: bool = True
    cuda_graphs: bool = True
    amp_cpu: bool = True
    amp_gpu: bool = True

class UNet(physicsnemo.Module):
    def __init__(self, in_channels=1, out_channels=1):
        super(UNet, self).__init__(meta=UNetMetaData())

        self.enc1 = self.conv_block(in_channels, 64)
        self.enc2 = self.conv_block(64, 128)

        self.dec1 = self.upconv_block(128, 64)
        self.final = nn.Conv2d(64, out_channels, kernel_size=1)

    def conv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2)
        )

    def upconv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, 2, stride=2),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x1 = self.enc1(x)
        x2 = self.enc2(x1)
        x = self.dec1(x2)
        return self.final(x)
Now that we have our PhysicsNeMo model, we can make use of these optimizations using the physicsnemo.utils.StaticCaptureTraining decorator. This decorator captures the training step function and applies the optimizations declared in the model metadata (such as CUDA Graphs and AMP) to it.
import torch
from physicsnemo.utils import StaticCaptureTraining

model = UNet().to("cuda")
input = torch.randn(8, 1, 128, 128).to("cuda")
output = torch.zeros(8, 1, 64, 64).to("cuda")

optim = torch.optim.Adam(model.parameters(), lr=0.001)

# Create training step function with optimization wrapper
# StaticCaptureTraining calls `backward` on the loss and
# `optimizer.step()` so you don't have to do that
# explicitly.
@StaticCaptureTraining(
    model=model,
    optim=optim,
    cuda_graph_warmup=11,
)
def training_step(invar, outvar):
    predvar = model(invar)
    loss = torch.sum(torch.pow(predvar - outvar, 2))
    return loss

# Sample training loop
for i in range(20):
    # In-place copy of input and output to support CUDA Graphs
    input.copy_(torch.randn(8, 1, 128, 128).to("cuda"))
    output.copy_(torch.zeros(8, 1, 64, 64).to("cuda"))
    # Run training step
    loss = training_step(input, output)
For the simple model above, you can observe ~1.1x speed-up due to CUDA Graphs and AMP. The speed-up observed changes from model to model and is typically greater for more complex models.
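A similar capture utility can be used on the inference side. Below is a minimal sketch, assuming the StaticCaptureEvaluateNoGrad decorator is available in physicsnemo.utils as the forward-only counterpart of StaticCaptureTraining; the eval_step name is illustrative.

from physicsnemo.utils import StaticCaptureEvaluateNoGrad

# Forward-only capture; no loss, backward pass, or optimizer step is involved.
@StaticCaptureEvaluateNoGrad(model=model)
def eval_step(invar):
    return model(invar)

pred = eval_step(input)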
Note
The ModelMetaData and physicsnemo.Module do not automatically make the model support optimizations such as CUDA Graphs and AMP. The user is responsible for writing model code that enables each of these optimizations.
Models in the PhysicsNeMo Model Zoo are written to support many of these optimizations and are checked against PhysicsNeMo's CI to ensure that they work correctly.
Note
The StaticCaptureTraining
decorator is still under development and may be
refactored in the future.
Converting PyTorch Models to PhysicsNeMo Models#
In the above example we showed how to construct a PhysicsNeMo model from scratch. However, you can also convert existing PyTorch models to PhysicsNeMo models in order to leverage PhysicsNeMo features. To do this, use the Module.from_torch method as shown below.
from dataclasses import dataclass

import physicsnemo
import torch.nn as nn

class TorchModel(nn.Module):
    def __init__(self):
        super(TorchModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = self.conv1(x)
        return self.conv2(x)

@dataclass
class ConvMetaData(physicsnemo.ModelMetaData):
    name: str = "TorchModel"
    # Optimization
    jit: bool = True
    cuda_graphs: bool = True
    amp_cpu: bool = True
    amp_gpu: bool = True

PhysicsNeMoModel = physicsnemo.Module.from_torch(TorchModel, meta=ConvMetaData())
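The resulting PhysicsNeMoModel class behaves like any other PhysicsNeMo model, so the checkpoint utilities described below apply to it as well. A minimal usage sketch (the checkpoint file name is hypothetical):

model = PhysicsNeMoModel()        # accepts the same constructor arguments as TorchModel
model.save("torch_model.mdlus")   # hypothetical .mdlus file name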
Saving and Loading PhysicsNeMo Models#
As mentioned above, PhysicsNeMo models are interoperable with PyTorch models. This means that you can save and load PhysicsNeMo models using the standard PyTorch APIs; however, we provide a few additional utilities to make this process easier. A key challenge in saving and loading models is keeping track of the model metadata, such as layer sizes. PhysicsNeMo models can be saved with this metadata to a custom .mdlus file. These files allow for easy loading and instantiation of the model. We show two examples of this below.
The first example shows saving and loading a model from an already instantiated model.
>>> from physicsnemo.models.mlp.fully_connected import FullyConnected
>>> model = FullyConnected(in_features=32, out_features=64)
>>> model.save("model.mdlus") # Save model to .mdlus file
>>> model.load("model.mdlus") # Load model weights from .mdlus file from already instantiated model
>>> model
FullyConnected(
(layers): ModuleList(
(0): FCLayer(
(activation_fn): SiLU()
(linear): Linear(in_features=32, out_features=512, bias=True)
)
(1-5): 5 x FCLayer(
(activation_fn): SiLU()
(linear): Linear(in_features=512, out_features=512, bias=True)
)
)
(final_layer): FCLayer(
(activation_fn): Identity()
(linear): Linear(in_features=512, out_features=64, bias=True)
)
)
The second example shows loading a model from a .mdlus
file without having to
instantiate the model first. We note that in this case we don’t know the class or
parameters to pass to the constructor of the model. However, we can still load the
model from the .mdlus
file.
>>> from physicsnemo import Module
>>> fc_model = Module.from_checkpoint("model.mdlus") # Instantiate model from .mdlus file.
>>> fc_model
FullyConnected(
(layers): ModuleList(
(0): FCLayer(
(activation_fn): SiLU()
(linear): Linear(in_features=32, out_features=512, bias=True)
)
(1-5): 5 x FCLayer(
(activation_fn): SiLU()
(linear): Linear(in_features=512, out_features=512, bias=True)
)
)
(final_layer): FCLayer(
(activation_fn): Identity()
(linear): Linear(in_features=512, out_features=64, bias=True)
)
)
Note
In order to make use of this functionality, the model must have JSON-serializable inputs to the __init__ function. It is highly recommended that all PhysicsNeMo models be developed with this requirement in mind.
Note
Using Module.from_checkpoint
will not work if the model has any buffers or
parameters that are registered outside of the model’s __init__
function due to
the above requirement. In that case, one should use Module.load
, or ensure
that all model parameters and buffers are registered inside __init__
.
PhysicsNeMo Model Registry and Entry Points#
PhysicsNeMo contains a model registry that allows for easy access and ingestion of models. Below is a simple example of how to use the model registry to obtain a model class.
>>> from physicsnemo.registry import ModelRegistry
>>> model_registry = ModelRegistry()
>>> model_registry.list_models()
['AFNO', 'DLWP', 'FNO', 'FullyConnected', 'GraphCastNet', 'MeshGraphNet', 'One2ManyRNN', 'Pix2Pix', 'SFNO', 'SRResNet']
>>> FullyConnected = model_registry.factory("FullyConnected")
>>> model = FullyConnected(in_features=32, out_features=64)
The model registry also allows exposing models via entry points. This allows for easy integration of models from external packages into the PhysicsNeMo ecosystem. For example, suppose you have a package MyPackage that contains a model MyModel. You can expose this model to the PhysicsNeMo registry by adding an entry point to your pyproject.toml file. Suppose your package structure is as follows:
# setup.py
from setuptools import setup, find_packages
setup()
# pyproject.toml
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "MyPackage"
description = "My Neural Network Zoo."
version = "0.1.0"
[project.entry-points."physicsnemo.models"]
MyPhysicsNeMoModel = "mypackage.models:MyPhysicsNeMoModel"
# mypackage/models.py
import torch.nn as nn
from physicsnemo.models import Module

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = self.conv1(x)
        return self.conv2(x)

MyPhysicsNeMoModel = Module.from_torch(MyModel)
Once this package is installed, you can access the model via the PhysicsNeMo model registry.
>>> from physicsnemo.registry import ModelRegistry
>>> model_registry = ModelRegistry()
>>> model_registry.list_models()
['MyPhysicsNeMoModel', 'AFNO', 'DLWP', 'FNO', 'FullyConnected', 'GraphCastNet', 'MeshGraphNet', 'One2ManyRNN', 'Pix2Pix', 'SFNO', 'SRResNet']
>>> MyPhysicsNeMoModel = model_registry.factory("MyPhysicsNeMoModel")
For more information on entry points and potential use cases, see this blog post.
Fully Connected Network#
- class physicsnemo.models.mlp.fully_connected.FullyConnected(*args, **kwargs)[source]#
Bases:
Module
A densely-connected MLP architecture
- Parameters:
in_features (int, optional) – Size of input features, by default 512
layer_size (int, optional) – Size of every hidden layer, by default 512
out_features (int, optional) – Size of output features, by default 512
num_layers (int, optional) – Number of hidden layers, by default 6
activation_fn (Union[str, List[str]], optional) – Activation function to use, by default ‘silu’
skip_connections (bool, optional) – Add skip connections every 2 hidden layers, by default False
adaptive_activations (bool, optional) – Use an adaptive activation function, by default False
weight_norm (bool, optional) – Use weight norm on fully connected layers, by default False
weight_fact (bool, optional) – Use weight factorization on fully connected layers, by default False
Example
>>> model = physicsnemo.models.mlp.FullyConnected(in_features=32, out_features=64)
>>> input = torch.randn(128, 32)
>>> output = model(input)
>>> output.size()
torch.Size([128, 64])
- forward(x: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.mlp.fully_connected.MetaData(
- name: str = 'FullyConnected',
- jit: bool = True,
- cuda_graphs: bool = True,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = True,
- bf16: bool = False,
- onnx: bool = True,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = True,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = True,
- auto_grad: bool = True,
Bases:
ModelMetaData
Fourier Neural Operators#
- class physicsnemo.models.fno.fno.FNO(*args, **kwargs)[source]#
Bases:
Module
Fourier neural operator (FNO) model.
Note
The FNO architecture supports options for 1D, 2D, 3D and 4D fields which can be controlled using the dimension parameter.
- Parameters:
in_channels (int) – Number of input channels
out_channels (int) – Number of output channels
decoder_layers (int, optional) – Number of decoder layers, by default 1
decoder_layer_size (int, optional) – Number of neurons in decoder layers, by default 32
decoder_activation_fn (str, optional) – Activation function for decoder, by default “silu”
dimension (int) – Model dimensionality (supports 1, 2, 3).
latent_channels (int, optional) – Latent features size in spectral convolutions, by default 32
num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4
num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16
padding (int, optional) – Domain padding for spectral convolutions, by default 8
padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant”
activation_fn (str, optional) – Activation function, by default “gelu”
coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True
Example
>>> # define the 2d FNO model
>>> model = physicsnemo.models.fno.FNO(
...     in_channels=4,
...     out_channels=3,
...     decoder_layers=2,
...     decoder_layer_size=32,
...     dimension=2,
...     latent_channels=32,
...     num_fno_layers=2,
...     padding=0,
... )
>>> input = torch.randn(32, 4, 32, 32) #(N, C, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([32, 3, 32, 32])
Note
Reference: Li, Zongyi, et al. “Fourier neural operator for parametric partial differential equations.” arXiv preprint arXiv:2010.08895 (2020).
- forward(x: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.fno.fno.FNO1DEncoder(
- in_channels: int = 1,
- num_fno_layers: int = 4,
- fno_layer_size: int = 32,
- num_fno_modes: int | List[int] = 16,
- padding: int | List[int] = 8,
- padding_type: str = 'constant',
- activation_fn: Module = GELU(approximate='none'),
- coord_features: bool = True,
Bases:
Module
1D Spectral encoder for FNO
- Parameters:
in_channels (int, optional) – Number of input channels, by default 1
num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4
fno_layer_size (int, optional) – Latent features size in spectral convolutions, by default 32
num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16
padding (Union[int, List[int]], optional) – Domain padding for spectral convolutions, by default 8
padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant”
activation_fn (nn.Module, optional) – Activation function, by default nn.GELU
coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True
- build_fno(num_fno_modes: List[int]) None [source]#
Construct the FNO block.
- Parameters:
num_fno_modes (List[int]) – Number of Fourier modes kept in spectral convolutions
- forward(x: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- grid_to_points(
- value: Tensor,
Convert from a grid-based (image) to a point-based representation.
- Parameters:
value (Meshgrid tensor)
- Returns:
Tensor, meshgrid shape
- Return type:
Tuple
- class physicsnemo.models.fno.fno.FNO2DEncoder(
- in_channels: int = 1,
- num_fno_layers: int = 4,
- fno_layer_size: int = 32,
- num_fno_modes: int | List[int] = 16,
- padding: int | List[int] = 8,
- padding_type: str = 'constant',
- activation_fn: Module = GELU(approximate='none'),
- coord_features: bool = True,
Bases:
Module
2D Spectral encoder for FNO
- Parameters:
in_channels (int, optional) – Number of input channels, by default 1
num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4
fno_layer_size (int, optional) – Latent features size in spectral convolutions, by default 32
num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16
padding (Union[int, List[int]], optional) – Domain padding for spectral convolutions, by default 8
padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant”
activation_fn (nn.Module, optional) – Activation function, by default nn.GELU
coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True
- build_fno(num_fno_modes: List[int]) None [source]#
Construct the FNO block.
- Parameters:
num_fno_modes (List[int]) – Number of Fourier modes kept in spectral convolutions
- forward(x: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- grid_to_points(
- value: Tensor,
Convert from a grid-based (image) to a point-based representation.
- Parameters:
value (Meshgrid tensor)
- Returns:
Tensor, meshgrid shape
- Return type:
Tuple
- class physicsnemo.models.fno.fno.FNO3DEncoder(
- in_channels: int = 1,
- num_fno_layers: int = 4,
- fno_layer_size: int = 32,
- num_fno_modes: int | List[int] = 16,
- padding: int | List[int] = 8,
- padding_type: str = 'constant',
- activation_fn: Module = GELU(approximate='none'),
- coord_features: bool = True,
Bases:
Module
3D Spectral encoder for FNO
- Parameters:
in_channels (int, optional) – Number of input channels, by default 1
num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4
fno_layer_size (int, optional) – Latent features size in spectral convolutions, by default 32
num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16
padding (Union[int, List[int]], optional) – Domain padding for spectral convolutions, by default 8
padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant”
activation_fn (nn.Module, optional) – Activation function, by default nn.GELU
coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True
- build_fno(num_fno_modes: List[int]) None [source]#
Construct the FNO block.
- Parameters:
num_fno_modes (List[int]) – Number of Fourier modes kept in spectral convolutions
- forward(x: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- grid_to_points(
- value: Tensor,
Convert from a grid-based (image) to a point-based representation.
- Parameters:
value (Meshgrid tensor)
- Returns:
Tensor, meshgrid shape
- Return type:
Tuple
- class physicsnemo.models.fno.fno.FNO4DEncoder(
- in_channels: int = 1,
- num_fno_layers: int = 4,
- fno_layer_size: int = 32,
- num_fno_modes: int | List[int] = 16,
- padding: int | List[int] = 8,
- padding_type: str = 'constant',
- activation_fn: Module = GELU(approximate='none'),
- coord_features: bool = True,
Bases:
Module
4D Spectral encoder for FNO
- Parameters:
in_channels (int, optional) – Number of input channels, by default 1
num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4
fno_layer_size (int, optional) – Latent features size in spectral convolutions, by default 32
num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16
padding (Union[int, List[int]], optional) – Domain padding for spectral convolutions, by default 8
padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant”
activation_fn (nn.Module, optional) – Activation function, by default nn.GELU
coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True
- build_fno(num_fno_modes: List[int]) None [source]#
Construct the FNO block.
- Parameters:
num_fno_modes (List[int]) – Number of Fourier modes kept in spectral convolutions
- forward(x: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- grid_to_points(
- value: Tensor,
Convert from a grid-based (image) to a point-based representation.
- Parameters:
value (Meshgrid tensor)
- Returns:
Tensor, meshgrid shape
- Return type:
Tuple
- class physicsnemo.models.fno.fno.MetaData(
- name: str = 'FourierNeuralOperator',
- jit: bool = True,
- cuda_graphs: bool = True,
- amp: bool = False,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = False,
- onnx_cpu: bool = False,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
- class physicsnemo.models.afno.afno.AFNO(*args, **kwargs)[source]#
Bases:
Module
Adaptive Fourier neural operator (AFNO) model.
Note
AFNO is a model that is designed for 2D images only.
- Parameters:
inp_shape (List[int]) – Input image dimensions [height, width]
in_channels (int) – Number of input channels
out_channels (int) – Number of output channels
patch_size (List[int], optional) – Size of image patches, by default [16, 16]
embed_dim (int, optional) – Embedded channel size, by default 256
depth (int, optional) – Number of AFNO layers, by default 4
mlp_ratio (float, optional) – Ratio of layer MLP latent variable size to input feature size, by default 4.0
drop_rate (float, optional) – Drop out rate in layer MLPs, by default 0.0
num_blocks (int, optional) – Number of blocks in the block-diag frequency weight matrices, by default 16
sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01
hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1
Example
>>> model = physicsnemo.models.afno.AFNO(
...     inp_shape=[32, 32],
...     in_channels=2,
...     out_channels=1,
...     patch_size=(8, 8),
...     embed_dim=16,
...     depth=2,
...     num_blocks=2,
... )
>>> input = torch.randn(32, 2, 32, 32) #(N, C, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([32, 1, 32, 32])
Note
Reference: Guibas, John, et al. “Adaptive fourier neural operators: Efficient token mixers for transformers.” arXiv preprint arXiv:2111.13587 (2021).
- forward(x: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.afno.afno.AFNO2DLayer(
- hidden_size: int,
- num_blocks: int = 8,
- sparsity_threshold: float = 0.01,
- hard_thresholding_fraction: float = 1,
- hidden_size_factor: int = 1,
Bases:
Module
AFNO spectral convolution layer
- Parameters:
hidden_size (int) – Feature dimensionality
num_blocks (int, optional) – Number of blocks used in the block diagonal weight matrix, by default 8
sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01
hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1
hidden_size_factor (int, optional) – Factor to increase spectral features by after weight multiplication, by default 1
- forward(x: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.afno.afno.AFNOMlp(
- in_features: int,
- latent_features: int,
- out_features: int,
- activation_fn: Module = GELU(approximate='none'),
- drop: float = 0.0,
Bases:
Module
Fully-connected Multi-layer perception used inside AFNO
- Parameters:
in_features (int) – Input feature size
latent_features (int) – Latent feature size
out_features (int) – Output feature size
activation_fn (nn.Module, optional) – Activation function, by default nn.GELU
drop (float, optional) – Drop out rate, by default 0.0
- forward(x: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.afno.afno.Block(
- embed_dim: int,
- num_blocks: int = 8,
- mlp_ratio: float = 4.0,
- drop: float = 0.0,
- activation_fn: ~torch.nn.modules.module.Module = GELU(approximate='none'),
- norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.normalization.LayerNorm'>,
- double_skip: bool = True,
- sparsity_threshold: float = 0.01,
- hard_thresholding_fraction: float = 1.0,
Bases:
Module
AFNO block, spectral convolution and MLP
- Parameters:
embed_dim (int) – Embedded feature dimensionality
num_blocks (int, optional) – Number of blocks used in the block diagonal weight matrix, by default 8
mlp_ratio (float, optional) – Ratio of MLP latent variable size to input feature size, by default 4.0
drop (float, optional) – Drop out rate in MLP, by default 0.0
activation_fn (nn.Module, optional) – Activation function used in MLP, by default nn.GELU
norm_layer (nn.Module, optional) – Normalization function, by default nn.LayerNorm
double_skip (bool, optional) – Residual, by default True
sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01
hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1
- forward(x: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.afno.afno.MetaData(
- name: str = 'AFNO',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = True,
- onnx_cpu: bool = False,
- onnx_runtime: bool = True,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
- class physicsnemo.models.afno.afno.PatchEmbed(
- inp_shape: List[int],
- in_channels: int,
- patch_size: List[int] = [16, 16],
- embed_dim: int = 256,
Bases:
Module
Patch embedding layer
Converts 2D patch into a 1D vector for input to AFNO
- Parameters:
inp_shape (List[int]) – Input image dimensions [height, width]
in_channels (int) – Number of input channels
patch_size (List[int], optional) – Size of image patches, by default [16, 16]
embed_dim (int, optional) – Embedded channel size, by default 256
- forward(x: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.afno.modafno.Block(
- embed_dim: int,
- mod_dim: int,
- num_blocks: int = 8,
- mlp_ratio: float = 4.0,
- drop: float = 0.0,
- activation_fn: ~torch.nn.modules.module.Module = GELU(approximate='none'),
- norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.normalization.LayerNorm'>,
- double_skip: bool = True,
- sparsity_threshold: float = 0.01,
- hard_thresholding_fraction: float = 1.0,
- modulate_filter: bool = True,
- modulate_mlp: bool = True,
- scale_shift_mode: ~typing.Literal['complex',
- 'real'] = 'real',
Bases:
Module
AFNO block, spectral convolution and MLP
- Parameters:
embed_dim (int) – Embedded feature dimensionality
mod_dim (int) – Modulation input dimensionality
num_blocks (int, optional) – Number of blocks used in the block diagonal weight matrix, by default 8
mlp_ratio (float, optional) – Ratio of MLP latent variable size to input feature size, by default 4.0
drop (float, optional) – Drop out rate in MLP, by default 0.0
activation_fn (nn.Module, optional) – Activation function used in MLP, by default nn.GELU
norm_layer (nn.Module, optional) – Normalization function, by default nn.LayerNorm
double_skip (bool, optional) – Residual, by default True
sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01
hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1
modulate_filter (bool, optional) – Whether to compute the modulation for the FFT filter
modulate_mlp (bool, optional) – Whether to compute the modulation for the MLP
scale_shift_mode (["complex", "real"]) – If ‘complex’ (default), compute the scale-shift operation using complex operations. If ‘real’, use real operations.
- forward(
- x: Tensor,
- mod_embed: Tensor,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.afno.modafno.MetaData(
- name: str = 'ModAFNO',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = True,
- onnx_cpu: bool = False,
- onnx_runtime: bool = True,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
- class physicsnemo.models.afno.modafno.ModAFNO(*args, **kwargs)[source]#
Bases:
Module
Modulated Adaptive Fourier neural operator (ModAFNO) model.
- Parameters:
inp_shape (List[int]) – Input image dimensions [height, width]
in_channels (int, optional) – Number of input channels
out_channels (int, optional) – Number of output channels
embed_model (dict, optional) – Dictionary of arguments to pass to the ModEmbedNet embedding model
patch_size (List[int], optional) – Size of image patches, by default [16, 16]
embed_dim (int, optional) – Embedded channel size, by default 256
mod_dim (int) – Modulation input dimensionality
modulate_filter (bool, optional) – Whether to compute the modulation for the FFT filter, by default True
modulate_mlp (bool, optional) – Whether to compute the modulation for the MLP, by default True
scale_shift_mode (["complex", "real"]) – If ‘complex’ (default), compute the scale-shift operation using complex operations. If ‘real’, use real operations.
depth (int, optional) – Number of AFNO layers, by default 4
mlp_ratio (float, optional) – Ratio of layer MLP latent variable size to input feature size, by default 4.0
drop_rate (float, optional) – Drop out rate in layer MLPs, by default 0.0
num_blocks (int, optional) – Number of blocks in the block-diag frequency weight matrices, by default 16
sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01
hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1
The default settings correspond to the implementation in the paper cited below.
Example
>>> import torch
>>> from physicsnemo.models.afno import ModAFNO
>>> model = ModAFNO(
...     inp_shape=[32, 32],
...     in_channels=2,
...     out_channels=1,
...     patch_size=(8, 8),
...     embed_dim=16,
...     depth=2,
...     num_blocks=2,
... )
>>> input = torch.randn(32, 2, 32, 32) #(N, C, H, W)
>>> time = torch.full((32, 1), 0.5)
>>> output = model(input, time)
>>> output.size()
torch.Size([32, 1, 32, 32])
Note
Reference: Leinonen et al. “Modulated Adaptive Fourier Neural Operators for Temporal Interpolation of Weather Forecasts.” arXiv preprint arXiv:TODO (2024).
- class physicsnemo.models.afno.modafno.ModAFNO2DLayer(
- hidden_size: int,
- mod_features: int,
- num_blocks: int = 8,
- sparsity_threshold: float = 0.01,
- hard_thresholding_fraction: float = 1,
- hidden_size_factor: int = 1,
- scale_shift_kwargs: dict | None = None,
- scale_shift_mode: Literal['complex', 'real'] = 'complex',
Bases:
AFNO2DLayer
AFNO spectral convolution layer
- Parameters:
hidden_size (int) – Feature dimensionality
mod_features (int) – Number of modulation features
num_blocks (int, optional) – Number of blocks used in the block diagonal weight matrix, by default 8
sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01
hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1
hidden_size_factor (int, optional) – Factor to increase spectral features by after weight multiplication, by default 1
scale_shift_kwargs (dict, optional) – Options to the MLP that computes the scale-shift parameters
scale_shift_mode (["complex", "real"]) – If ‘complex’ (default), compute the scale-shift operation using complex operations. If ‘real’, use real operations.
- forward(
- x: Tensor,
- mod_embed: Tensor,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.afno.modafno.ModAFNOMlp(
- in_features: int,
- latent_features: int,
- out_features: int,
- mod_features: int,
- activation_fn: Module = GELU(approximate='none'),
- drop: float = 0.0,
- scale_shift_kwargs: dict | None = None,
Bases:
AFNOMlp
Modulated MLP used inside ModAFNO
- Parameters:
in_features (int) – Input feature size
latent_features (int) – Latent feature size
out_features (int) – Output feature size
activation_fn (nn.Module, optional) – Activation function, by default nn.GELU
drop (float, optional) – Drop out rate, by default 0.0
scale_shift_kwargs (dict, optional) – Options to the MLP that computes the scale-shift parameters
- forward(
- x: Tensor,
- mod_embed: Tensor,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.afno.modafno.ScaleShiftMlp(
- in_features: int,
- out_features: int,
- hidden_features: int | None = None,
- hidden_layers: int = 0,
- activation_fn: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.GELU'>,
Bases:
Module
MLP used to compute the scale and shift parameters of the ModAFNO block
- Parameters:
in_features (int) – Input feature size
out_features (int) – Output feature size
hidden_features (int, optional) – Hidden feature size, defaults to 2 * out_features
hidden_layers (int, optional) – Number of hidden layers, defaults to 0
activation_fn (nn.Module, optional) – Activation function, by default nn.GELU
- forward(x: Tensor)[source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
Graph Neural Networks#
- class physicsnemo.models.meshgraphnet.meshgraphnet.MeshGraphNet(*args, **kwargs)[source]#
Bases:
Module
MeshGraphNet network architecture
- Parameters:
input_dim_nodes (int) – Number of node features
input_dim_edges (int) – Number of edge features
output_dim (int) – Number of outputs
processor_size (int, optional) – Number of message passing blocks, by default 15
mlp_activation_fn (Union[str, List[str]], optional) – Activation function to use, by default ‘relu’
num_layers_node_processor (int, optional) – Number of MLP layers for processing nodes in each message passing block, by default 2
num_layers_edge_processor (int, optional) – Number of MLP layers for processing edge features in each message passing block, by default 2
hidden_dim_processor (int, optional) – Hidden layer size for the message passing blocks, by default 128
hidden_dim_node_encoder (int, optional) – Hidden layer size for the node feature encoder, by default 128
num_layers_node_encoder (Union[int, None], optional) – Number of MLP layers for the node feature encoder, by default 2. If None is provided, the MLP will collapse to a Identity function, i.e. no node encoder
hidden_dim_edge_encoder (int, optional) – Hidden layer size for the edge feature encoder, by default 128
num_layers_edge_encoder (Union[int, None], optional) – Number of MLP layers for the edge feature encoder, by default 2. If None is provided, the MLP will collapse to a Identity function, i.e. no edge encoder
hidden_dim_node_decoder (int, optional) – Hidden layer size for the node feature decoder, by default 128
num_layers_node_decoder (Union[int, None], optional) – Number of MLP layers for the node feature decoder, by default 2. If None is provided, the MLP will collapse to a Identity function, i.e. no decoder
aggregation (str, optional) – Message aggregation type, by default “sum”
do_concat_trick (bool, optional) – Whether to replace concat+MLP with MLP+idx+sum, by default False
num_processor_checkpoint_segments (int, optional) – Number of processor segments for gradient checkpointing, by default 0 (checkpointing disabled)
checkpoint_offloading (bool, optional) – Whether to offload the checkpointing to the CPU, by default False
Example
>>> # `norm_type` in MeshGraphNet is deprecated,
>>> # TE will be automatically used if possible unless told otherwise.
>>> # (You don't have to set this variable, it's faster to use TE!)
>>> # Example of how to disable:
>>> import os
>>> os.environ['PHYSICSNEMO_FORCE_TE'] = 'False'
>>>
>>> model = physicsnemo.models.meshgraphnet.MeshGraphNet(
...     input_dim_nodes=4,
...     input_dim_edges=3,
...     output_dim=2,
... )
>>> graph = dgl.rand_graph(10, 5)
>>> node_features = torch.randn(10, 4)
>>> edge_features = torch.randn(5, 3)
>>> output = model(node_features, edge_features, graph)
>>> output.size()
torch.Size([10, 2])
Note
Reference: Pfaff, Tobias, et al. “Learning mesh-based simulation with graph networks.” arXiv preprint arXiv:2010.03409 (2020).
- forward(
- node_features: Tensor,
- edge_features: Tensor,
- graph: physicsnemo.models.gnn_layers.utils.GraphType,
- **kwargs,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.meshgraphnet.meshgraphnet.MeshGraphNetProcessor(
- processor_size: int = 15,
- input_dim_node: int = 128,
- input_dim_edge: int = 128,
- num_layers_node: int = 2,
- num_layers_edge: int = 2,
- aggregation: str = 'sum',
- norm_type: str = 'LayerNorm',
- activation_fn: Module = ReLU(),
- do_concat_trick: bool = False,
- num_processor_checkpoint_segments: int = 0,
- checkpoint_offloading: bool = False,
Bases:
Module
MeshGraphNet processor block
- forward(
- node_features: Tensor,
- edge_features: Tensor,
- graph: physicsnemo.models.gnn_layers.utils.GraphType,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- run_function(
- segment_start: int,
- segment_end: int,
Custom forward for gradient checkpointing
- Parameters:
segment_start (int) – Layer index as start of the segment
segment_end (int) – Layer index as end of the segment
- Returns:
Custom forward function
- Return type:
Callable
- class physicsnemo.models.meshgraphnet.meshgraphnet.MetaData(
- name: str = 'MeshGraphNet',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = True,
- auto_grad: bool = True,
Bases:
ModelMetaData
- class physicsnemo.models.mesh_reduced.mesh_reduced.Mesh_Reduced(
- input_dim_nodes: int,
- input_dim_edges: int,
- output_decode_dim: int,
- output_encode_dim: int = 3,
- processor_size: int = 15,
- num_layers_node_processor: int = 2,
- num_layers_edge_processor: int = 2,
- hidden_dim_processor: int = 128,
- hidden_dim_node_encoder: int = 128,
- num_layers_node_encoder: int = 2,
- hidden_dim_edge_encoder: int = 128,
- num_layers_edge_encoder: int = 2,
- hidden_dim_node_decoder: int = 128,
- num_layers_node_decoder: int = 2,
- k: int = 3,
- aggregation: str = 'mean',
Bases:
Module
PbGMR-GMUS architecture.
A mesh-reduced architecture that combines encoding and decoding processors for physics prediction in reduced mesh space.
- Parameters:
input_dim_nodes (int) – Number of node features.
input_dim_edges (int) – Number of edge features.
output_decode_dim (int) – Number of decoding outputs (per node).
output_encode_dim (int, optional) – Number of encoding outputs (per pivotal position), by default 3.
processor_size (int, optional) – Number of message passing blocks, by default 15.
num_layers_node_processor (int, optional) – Number of MLP layers for processing nodes in each message passing block, by default 2.
num_layers_edge_processor (int, optional) – Number of MLP layers for processing edge features in each message passing block, by default 2.
hidden_dim_processor (int, optional) – Hidden layer size for the message passing blocks, by default 128.
hidden_dim_node_encoder (int, optional) – Hidden layer size for the node feature encoder, by default 128.
num_layers_node_encoder (int, optional) – Number of MLP layers for the node feature encoder, by default 2.
hidden_dim_edge_encoder (int, optional) – Hidden layer size for the edge feature encoder, by default 128.
num_layers_edge_encoder (int, optional) – Number of MLP layers for the edge feature encoder, by default 2.
hidden_dim_node_decoder (int, optional) – Hidden layer size for the node feature decoder, by default 128.
num_layers_node_decoder (int, optional) – Number of MLP layers for the node feature decoder, by default 2.
k (int, optional) – Number of nodes considered for per pivotal position, by default 3.
aggregation (str, optional) – Message aggregation type, by default “mean”.
Notes
Reference: Han, Xu, et al. “Predicting physics in mesh-reduced space with temporal attention.” arXiv preprint arXiv:2201.09113 (2022).
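Example (a minimal construction sketch; the feature sizes below are hypothetical, and encode/decode additionally require a graph plus mesh and pivotal position tensors as described in the parameters above):
>>> from physicsnemo.models.mesh_reduced.mesh_reduced import Mesh_Reduced
>>> model = Mesh_Reduced(
...     input_dim_nodes=3,
...     input_dim_edges=4,
...     output_decode_dim=3,
... )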
- decode(
- x,
- edge_features,
- graph,
- position_mesh,
- position_pivotal,
Decode pivotal features back to mesh space.
- Parameters:
x (torch.Tensor) – Input features in pivotal space.
edge_features (torch.Tensor) – Edge features.
graph (Union[DGLGraph, pyg.data.Data]) – Input graph.
position_mesh (torch.Tensor) – Mesh positions.
position_pivotal (torch.Tensor) – Pivotal positions.
- Returns:
Decoded features in mesh space.
- Return type:
torch.Tensor
- encode(
- x,
- edge_features,
- graph,
- position_mesh,
- position_pivotal,
Encode mesh features to pivotal space.
- Parameters:
x (torch.Tensor) – Input node features.
edge_features (torch.Tensor) – Edge features.
graph (Union[DGLGraph, pyg.data.Data]) – Input graph.
position_mesh (torch.Tensor) – Mesh positions.
position_pivotal (torch.Tensor) – Pivotal positions.
- Returns:
Encoded features in pivotal space.
- Return type:
torch.Tensor
- knn_interpolate(
- x: Tensor,
- pos_x: Tensor,
- pos_y: Tensor,
- batch_x: Tensor = None,
- batch_y: Tensor = None,
- k: int = 3,
- num_workers: int = 1,
Perform k-nearest neighbor interpolation.
- Parameters:
x (torch.Tensor) – Input features to interpolate.
pos_x (torch.Tensor) – Source positions.
pos_y (torch.Tensor) – Target positions.
batch_x (torch.Tensor, optional) – Batch indices for source positions, by default None.
batch_y (torch.Tensor, optional) – Batch indices for target positions, by default None.
k (int, optional) – Number of nearest neighbors to consider, by default 3.
num_workers (int, optional) – Number of workers for parallel processing, by default 1.
- Returns:
torch.Tensor – Interpolated features.
torch.Tensor – Source indices.
torch.Tensor – Target indices.
torch.Tensor – Interpolation weights.
- class physicsnemo.models.meshgraphnet.bsms_mgn.BiStrideMeshGraphNet(*args, **kwargs)[source]#
Bases:
MeshGraphNet
Bi-stride MeshGraphNet network architecture
- Parameters:
input_dim_nodes (int) – Number of node features
input_dim_edges (int) – Number of edge features
output_dim (int) – Number of outputs
processor_size (int, optional) – Number of message passing blocks, by default 15
mlp_activation_fn (Union[str, List[str]], optional) – Activation function to use, by default ‘relu’
num_layers_node_processor (int, optional) – Number of MLP layers for processing nodes in each message passing block, by default 2
num_layers_edge_processor (int, optional) – Number of MLP layers for processing edge features in each message passing block, by default 2
hidden_dim_processor (int, optional) – Hidden layer size for the message passing blocks, by default 128
hidden_dim_node_encoder (int, optional) – Hidden layer size for the node feature encoder, by default 128
num_layers_node_encoder (Union[int, None], optional) – Number of MLP layers for the node feature encoder, by default 2. If None is provided, the MLP will collapse to a Identity function, i.e. no node encoder
hidden_dim_edge_encoder (int, optional) – Hidden layer size for the edge feature encoder, by default 128
num_layers_edge_encoder (Union[int, None], optional) – Number of MLP layers for the edge feature encoder, by default 2. If None is provided, the MLP will collapse to a Identity function, i.e. no edge encoder
hidden_dim_node_decoder (int, optional) – Hidden layer size for the node feature decoder, by default 128
num_layers_node_decoder (Union[int, None], optional) – Number of MLP layers for the node feature decoder, by default 2. If None is provided, the MLP will collapse to a Identity function, i.e. no decoder
aggregation (str, optional) – Message aggregation type, by default “sum”
do_concat_trick (bool, optional) – Whether to replace concat+MLP with MLP+idx+sum, by default False
num_processor_checkpoint_segments (int, optional) – Number of processor segments for gradient checkpointing, by default 0 (checkpointing disabled). The number of segments should be a factor of 2 * processor_size, for example, if processor_size is 15, then num_processor_checkpoint_segments can be 10 since it’s a factor of 15 * 2 = 30. It is recommended to start with a smaller number of segments until the model fits into memory since each segment will affect model training speed.
- forward(
- node_features: Tensor,
- edge_features: Tensor,
- graph: dgl.DGLGraph,
- ms_edges: Iterable[Tensor] = (),
- ms_ids: Iterable[Tensor] = (),
- **kwargs,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.meshgraphnet.bsms_mgn.MetaData(
- name: str = 'BiStrideMeshGraphNet',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = True,
- auto_grad: bool = True,
Bases:
ModelMetaData
Convolutional Networks#
- class physicsnemo.models.pix2pix.pix2pix.MetaData(
- name: str = 'Pix2Pix',
- jit: bool = True,
- cuda_graphs: bool = True,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = True,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = True,
- auto_grad: bool = True,
Bases:
ModelMetaData
- class physicsnemo.models.pix2pix.pix2pix.Pix2Pix(*args, **kwargs)[source]#
Bases:
Module
Convolutional encoder-decoder based on pix2pix generator models.
Note
The pix2pix architecture supports options for 1D, 2D and 3D fields, which can be controlled using the dimension parameter.
- Parameters:
in_channels (int) – Number of input channels
out_channels (Union[int, Any], optional) – Number of output channels
dimension (int) – Model dimensionality (supports 1, 2, 3).
conv_layer_size (int, optional) – Latent channel size after first convolution, by default 64
n_downsampling (int, optional) – Number of downsampling blocks, by default 3
n_upsampling (int, optional) – Number of upsampling blocks, by default 3
n_blocks (int, optional) – Number of residual blocks in middle of model, by default 3
activation_fn (Any, optional) – Activation function, by default “relu”
batch_norm (bool, optional) – Batch normalization, by default False
padding_type (str, optional) – Padding type (‘reflect’, ‘replicate’ or ‘zero’), by default “reflect”
Example
>>> # 2D convolutional encoder decoder
>>> model = physicsnemo.models.pix2pix.Pix2Pix(
...     in_channels=1,
...     out_channels=2,
...     dimension=2,
...     conv_layer_size=4)
>>> input = torch.randn(4, 1, 32, 32) #(N, C, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([4, 2, 32, 32])
Note
Reference: Isola, Phillip, et al. “Image-To-Image translation with conditional adversarial networks” Conference on Computer Vision and Pattern Recognition, 2017. https://arxiv.org/abs/1611.07004
Reference: Wang, Ting-Chun, et al. “High-Resolution image synthesis and semantic manipulation with conditional GANs” Conference on Computer Vision and Pattern Recognition, 2018. https://arxiv.org/abs/1711.11585
Note
Based on the implementation: NVIDIA/pix2pixHD
- forward(input: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.pix2pix.pix2pix.ResnetBlock(
- dimension: int,
- channels: int,
- padding_type: str = 'reflect',
- activation: Module = ReLU(),
- use_batch_norm: bool = False,
- use_dropout: bool = False,
Bases:
Module
A simple ResNet block
- Parameters:
dimension (int) – Model dimensionality (supports 1, 2, 3).
channels (int) – Number of feature channels
padding_type (str, optional) – Padding type (‘reflect’, ‘replicate’ or ‘zero’), by default “reflect”
activation (nn.Module, optional) – Activation function, by default nn.ReLU()
use_batch_norm (bool, optional) – Batch normalization, by default False
- forward(x: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.srrn.super_res_net.ConvolutionalBlock3d(
- in_channels: int,
- out_channels: int,
- kernel_size: int,
- stride: int = 1,
- batch_norm: bool = False,
- activation_fn: Module = Identity(),
Bases:
Module
3D convolutional block
- Parameters:
in_channels (int) – Input channels
out_channels (int) – Output channels
kernel_size (int) – Kernel size
stride (int, optional) – Convolutional stride, by default 1
batch_norm (bool, optional) – Use batchnorm, by default False
- forward(input: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.srrn.super_res_net.MetaData(
- name: str = 'SuperResolution',
- jit: bool = True,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = False,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = True,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = True,
- auto_grad: bool = True,
Bases:
ModelMetaData
- class physicsnemo.models.srrn.super_res_net.PixelShuffle3d(scale: int)[source]#
Bases:
Module
3D pixel-shuffle operation
- Parameters:
scale (int) – Factor to downscale channel count by
- forward(input: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
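For intuition, a 3D pixel shuffle rearranges blocks of channels into the three spatial dimensions, trading channel count for resolution. The following is a minimal illustrative sketch of that operation (it is not the PhysicsNeMo implementation, which may differ in layout details):
import torch

def pixel_shuffle_3d(x: torch.Tensor, scale: int) -> torch.Tensor:
    # x: (B, C, D, H, W) with C divisible by scale**3
    B, C, D, H, W = x.shape
    C_out = C // scale**3
    # Split the channel dimension into (C_out, scale, scale, scale)
    x = x.view(B, C_out, scale, scale, scale, D, H, W)
    # Interleave each scale factor with the matching spatial dimension
    x = x.permute(0, 1, 5, 2, 6, 3, 7, 4).contiguous()
    return x.view(B, C_out, D * scale, H * scale, W * scale)

# (4, 64, 8, 8, 8) -> (4, 8, 16, 16, 16) with scale=2
print(pixel_shuffle_3d(torch.randn(4, 64, 8, 8, 8), scale=2).shape)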
- class physicsnemo.models.srrn.super_res_net.ResidualConvBlock3d(
- n_layers: int = 1,
- kernel_size: int = 3,
- conv_layer_size: int = 64,
- activation_fn: Module = Identity(),
Bases:
Module
3D ResNet block
- Parameters:
n_layers (int, optional) – Number of convolutional layers, by default 1
kernel_size (int, optional) – Kernel size, by default 3
conv_layer_size (int, optional) – Latent channel size, by default 64
activation_fn (nn.Module, optional) – Activation function, by default nn.Identity()
- forward(input: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.srrn.super_res_net.SRResNet(*args, **kwargs)[source]#
Bases:
Module
3D convolutional super-resolution network
- Parameters:
in_channels (int) – Number of input channels
out_channels (int) – Number of output channels
large_kernel_size (int, optional) – convolutional kernel size for first and last convolution, by default 7
small_kernel_size (int, optional) – convolutional kernel size for internal convolutions, by default 3
conv_layer_size (int, optional) – Latent channel size, by default 32
n_resid_blocks (int, optional) – Number of residual blocks, by default 8
scaling_factor (int, optional) – Scaling factor to increase the output feature size compared to the input (2, 4, or 8), by default 8
activation_fn (Any, optional) – Activation function, by default “prelu”
Example
>>> # 3D convolutional encoder decoder
>>> model = physicsnemo.models.srrn.SRResNet(
...     in_channels=1,
...     out_channels=2,
...     conv_layer_size=4,
...     scaling_factor=2)
>>> input = torch.randn(4, 1, 8, 8, 8)  # (N, C, D, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([4, 2, 16, 16, 16])
Note
Based on the implementation: sgrvinod/a-PyTorch-Tutorial-to-Super-Resolution
- forward(in_vars: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.srrn.super_res_net.SubPixel_ConvolutionalBlock3d(
- kernel_size: int = 3,
- conv_layer_size: int = 64,
- scaling_factor: int = 2,
Bases:
Module
Convolutional block with Pixel Shuffle operation
- Parameters:
kernel_size (int, optional) – Kernel size, by default 3
conv_layer_size (int, optional) – Latent channel size, by default 64
scaling_factor (int, optional) – Pixel shuffle scaling factor, by default 2
- forward(
- input: Tensor,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
Recurrent Neural Networks#
- class physicsnemo.models.rnn.rnn_one2many.MetaData(
- name: str = 'One2ManyRNN',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = True,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
- class physicsnemo.models.rnn.rnn_one2many.One2ManyRNN(*args, **kwargs)[source]#
Bases:
Module
An RNN model with encoder/decoder for 2d/3d problems that provides predictions based on a single initial condition.
- Parameters:
input_channels (int) – Number of channels in the input
dimension (int, optional) – Spatial dimension of the input. Only 2d and 3d are supported, by default 2
nr_latent_channels (int, optional) – Channels for encoding/decoding, by default 512
nr_residual_blocks (int, optional) – Number of residual blocks, by default 2
activation_fn (str, optional) – Activation function to use, by default “relu”
nr_downsamples (int, optional) – Number of downsamples, by default 2
nr_tsteps (int, optional) – Time steps to predict, by default 32
Example
>>> model = physicsnemo.models.rnn.One2ManyRNN(
...     input_channels=6,
...     dimension=2,
...     nr_latent_channels=32,
...     activation_fn="relu",
...     nr_downsamples=2,
...     nr_tsteps=16,
... )
>>> input = invar = torch.randn(4, 6, 1, 16, 16)  # [N, C, T, H, W]
>>> output = model(input)
>>> output.size()
torch.Size([4, 6, 16, 16, 16])
- forward(x: Tensor) Tensor [source]#
Forward pass
- Parameters:
x (Tensor) – Expects a tensor of size [N, C, 1, H, W] for 2D or [N, C, 1, D, H, W] for 3D, where N is the batch size, C is the number of channels, 1 is the number of input timesteps, and D, H, W are spatial dimensions.
- Returns:
Size [N, C, T, H, W] for 2D or [N, C, T, D, H, W] for 3D, where T is the number of timesteps being predicted.
- Return type:
Tensor
- class physicsnemo.models.rnn.rnn_seq2seq.MetaData(
- name: str = 'Seq2SeqRNN',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = True,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
- class physicsnemo.models.rnn.rnn_seq2seq.Seq2SeqRNN(*args, **kwargs)[source]#
Bases:
Module
An RNN model with encoder/decoder for 2d/3d problems. Given input from time steps 0 to t-1, predicts the signal from t to t + nr_tsteps.
- Parameters:
input_channels (int) – Number of channels in the input
dimension (int, optional) – Spatial dimension of the input. Only 2d and 3d are supported, by default 2
nr_latent_channels (int, optional) – Channels for encoding/decoding, by default 512
nr_residual_blocks (int, optional) – Number of residual blocks, by default 2
activation_fn (str, optional) – Activation function to use, by default “relu”
nr_downsamples (int, optional) – Number of downsamples, by default 2
nr_tsteps (int, optional) – Time steps to predict, by default 32
Example
>>> model = physicsnemo.models.rnn.Seq2SeqRNN(
...     input_channels=6,
...     dimension=2,
...     nr_latent_channels=32,
...     activation_fn="relu",
...     nr_downsamples=2,
...     nr_tsteps=16,
... )
>>> input = invar = torch.randn(4, 6, 16, 16, 16)  # [N, C, T, H, W]
>>> output = model(input)
>>> output.size()
torch.Size([4, 6, 16, 16, 16])
- forward(x: Tensor) Tensor [source]#
Forward pass
- Parameters:
x (Tensor) – Expects a tensor of size [N, C, T, H, W] for 2D or [N, C, T, D, H, W] for 3D, where N is the batch size, C is the number of channels, T is the number of input timesteps, and D, H, W are spatial dimensions. Currently, this requires the number of input time steps to be the same as the number of predicted time steps.
- Returns:
Size [N, C, T, H, W] for 2D or [N, C, T, D, H, W] for 3D, where T is the number of timesteps being predicted.
- Return type:
Tensor
Operator Models#
This code contains the DoMINO model architecture. The DoMINO class provides an architecture to model both surface and volume quantities, either together or separately (controlled using the config.yaml file).
- class physicsnemo.models.domino.model.AggregationModel(
- input_features: int,
- output_features: int,
- model_parameters=None,
- new_change: bool = True,
Bases:
Module
Neural network module to aggregate local geometry encoding with basis functions.
This module combines basis function representations with geometry encodings to predict the final output quantities. It serves as the final prediction layer that integrates all available information sources.
- forward(x: Tensor) Tensor [source]#
Process the combined input features to predict output quantities.
This method applies a series of fully connected layers to the input, which typically contains a combination of basis functions, geometry encodings, and potentially parameter encodings.
- Parameters:
x – Input tensor containing combined features
- Returns:
Tensor containing predicted output quantities
- class physicsnemo.models.domino.model.BQWarp(
- grid_resolution=None,
- radius: float = 0.25,
- neighbors_in_radius: int = 10,
Bases:
Module
Warp-based ball-query layer for finding neighboring points within a specified radius.
This layer uses an accelerated ball query implementation to efficiently find points within a specified radius of query points.
- forward(
- x: Tensor,
- p_grid: Tensor,
- reverse_mapping: bool = True,
Performs ball query operation to find neighboring points and their features.
This method uses the Warp-accelerated ball query implementation to find points within a specified radius. It can operate in two modes:
- Forward mapping: find points from x that are near p_grid points (reverse_mapping=False)
- Reverse mapping: find points from p_grid that are near x points (reverse_mapping=True)
- Parameters:
x – Tensor of shape (batch_size, num_points, 3+features) containing point coordinates and their features
p_grid – Tensor of shape (batch_size, grid_x, grid_y, grid_z, 3) containing grid point coordinates
reverse_mapping – Boolean flag controlling the direction of the mapping:
- True: find p_grid points near x points
- False: find x points near p_grid points
- Returns:
mapping: Tensor containing indices of neighboring points
outputs: Tensor containing coordinates of the neighboring points
- Return type:
tuple containing
- class physicsnemo.models.domino.model.DoMINO(
- input_features: int,
- output_features_vol: int | None = None,
- output_features_surf: int | None = None,
- global_features: int = 2,
- model_parameters=None,
Bases:
Module
DoMINO model architecture for predicting both surface and volume quantities.
The DoMINO (Deep Operational Modal Identification and Nonlinear Optimization) model is designed to model both surface and volume physical quantities in aerodynamic simulations. It can operate in three modes:
1. Surface-only: predicting only surface quantities
2. Volume-only: predicting only volume quantities
3. Combined: predicting both surface and volume quantities
The model uses a combination of:
- Geometry representation modules
- Neural network basis functions
- Parameter encoding
- Local and global geometry processing
- Aggregation models for final prediction
- Parameters:
input_features (int) – Number of point input features
output_features_vol (int, optional) – Number of output features in volume
output_features_surf (int, optional) – Number of output features on surface
model_parameters – Model parameters controlled by config.yaml
Example
>>> from physicsnemo.models.domino.model import DoMINO
>>> import torch, os
>>> from hydra import compose, initialize
>>> from omegaconf import OmegaConf
>>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
>>> cfg = OmegaConf.register_new_resolver("eval", eval)
>>> with initialize(version_base="1.3", config_path="examples/cfd/external_aerodynamics/domino/src/conf"):
...     cfg = compose(config_name="config")
>>> cfg.model.model_type = "combined"
>>> model = DoMINO(
...     input_features=3,
...     output_features_vol=5,
...     output_features_surf=4,
...     model_parameters=cfg.model
... ).to(device)
Warp ...
>>> bsize = 1
>>> nx, ny, nz = cfg.model.interp_res
>>> num_neigh = 7
>>> global_features = 2
>>> pos_normals_closest_vol = torch.randn(bsize, 100, 3).to(device)
>>> pos_normals_com_vol = torch.randn(bsize, 100, 3).to(device)
>>> pos_normals_com_surface = torch.randn(bsize, 100, 3).to(device)
>>> geom_centers = torch.randn(bsize, 100, 3).to(device)
>>> grid = torch.randn(bsize, nx, ny, nz, 3).to(device)
>>> surf_grid = torch.randn(bsize, nx, ny, nz, 3).to(device)
>>> sdf_grid = torch.randn(bsize, nx, ny, nz).to(device)
>>> sdf_surf_grid = torch.randn(bsize, nx, ny, nz).to(device)
>>> sdf_nodes = torch.randn(bsize, 100, 1).to(device)
>>> surface_coordinates = torch.randn(bsize, 100, 3).to(device)
>>> surface_neighbors = torch.randn(bsize, 100, num_neigh, 3).to(device)
>>> surface_normals = torch.randn(bsize, 100, 3).to(device)
>>> surface_neighbors_normals = torch.randn(bsize, 100, num_neigh, 3).to(device)
>>> surface_sizes = torch.rand(bsize, 100).to(device) + 1e-6  # Note this needs to be > 0.0
>>> surface_neighbors_areas = torch.rand(bsize, 100, num_neigh).to(device) + 1e-6
>>> volume_coordinates = torch.randn(bsize, 100, 3).to(device)
>>> vol_grid_max_min = torch.randn(bsize, 2, 3).to(device)
>>> surf_grid_max_min = torch.randn(bsize, 2, 3).to(device)
>>> global_params_values = torch.randn(bsize, global_features, 1).to(device)
>>> global_params_reference = torch.randn(bsize, global_features, 1).to(device)
>>> input_dict = {
...     "pos_volume_closest": pos_normals_closest_vol,
...     "pos_volume_center_of_mass": pos_normals_com_vol,
...     "pos_surface_center_of_mass": pos_normals_com_surface,
...     "geometry_coordinates": geom_centers,
...     "grid": grid,
...     "surf_grid": surf_grid,
...     "sdf_grid": sdf_grid,
...     "sdf_surf_grid": sdf_surf_grid,
...     "sdf_nodes": sdf_nodes,
...     "surface_mesh_centers": surface_coordinates,
...     "surface_mesh_neighbors": surface_neighbors,
...     "surface_normals": surface_normals,
...     "surface_neighbors_normals": surface_neighbors_normals,
...     "surface_areas": surface_sizes,
...     "surface_neighbors_areas": surface_neighbors_areas,
...     "volume_mesh_centers": volume_coordinates,
...     "volume_min_max": vol_grid_max_min,
...     "surface_min_max": surf_grid_max_min,
...     "global_params_reference": global_params_values,
...     "global_params_values": global_params_reference,
... }
>>> output = model(input_dict)
>>> print(f"{output[0].shape}, {output[1].shape}")
torch.Size([1, 100, 5]), torch.Size([1, 100, 4])
- calculate_solution(
- volume_mesh_centers,
- encoding_g,
- encoding_node,
- global_params_values,
- global_params_reference,
- eval_mode,
- num_sample_points=20,
- noise_intensity=50,
- return_volume_neighbors=False,
Function to approximate the solution by sampling the neighborhood information
- calculate_solution_with_neighbors(
- surface_mesh_centers,
- encoding_g,
- encoding_node,
- surface_mesh_neighbors,
- surface_normals,
- surface_neighbors_normals,
- surface_areas,
- surface_neighbors_areas,
- global_params_values,
- global_params_reference,
- num_sample_points=7,
Function to approximate the solution given the neighborhood information
- forward(data_dict, return_volume_neighbors=False)[source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- geo_encoding_local(
- encoding_g,
- volume_mesh_centers,
- p_grid,
- mode='volume',
Function to calculate local geometry encoding from global encoding
- position_encoder(
- encoding_node: Tensor,
- eval_mode: Literal['surface', 'volume'] = 'volume',
Compute positional encoding for input points.
- Parameters:
encoding_node – Tensor containing node position information
eval_mode – Mode of evaluation, either “volume” or “surface”
- Returns:
Tensor containing positional encoding features
- sample_sphere(center, r, num_points)[source]#
Uniformly sample points in a 3D sphere around the center.
This method generates random points within a sphere of radius r centered at each point in the input tensor. The sampling is uniform in volume, meaning points are more likely to be sampled in the outer regions of the sphere.
- Parameters:
center – Tensor of shape (batch_size, num_points, 3) containing center coordinates
r – Radius of the sphere for sampling
num_points – Number of points to sample per center
- Returns:
Tensor of shape (batch_size, num_points, num_samples, 3) containing the sampled points around each center
- sample_sphere_shell(center, r_inner, r_outer, num_points)[source]#
Uniformly sample points in a 3D spherical shell around a center.
This method generates random points within a spherical shell (annulus) between inner radius r_inner and outer radius r_outer centered at each point in the input tensor. The sampling is uniform in volume within the shell.
- Parameters:
center – Tensor of shape (batch_size, num_points, 3) containing center coordinates
r_inner – Inner radius of the spherical shell
r_outer – Outer radius of the spherical shell
num_points – Number of points to sample per center
- Returns:
Tensor of shape (batch_size, num_points, num_samples, 3) containing the sampled points within the spherical shell around each center
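For intuition, uniform-in-volume sampling inside a sphere (or a spherical shell) can be implemented by drawing random directions and scaling the radius with a cube-root transform of a uniform variable. The sketch below illustrates the idea; it is not taken from the DoMINO source, and the exact sampling used there may differ:
import torch

def sample_sphere_sketch(center: torch.Tensor, r: float, num_points: int) -> torch.Tensor:
    # center: (B, N, 3) -> returns (B, N, num_points, 3)
    B, N, _ = center.shape
    d = torch.randn(B, N, num_points, 3)
    d = d / d.norm(dim=-1, keepdim=True)               # random unit directions
    u = torch.rand(B, N, num_points, 1)
    radius = r * u.pow(1.0 / 3.0)                       # cube root => uniform in volume
    return center.unsqueeze(2) + radius * d

def sample_sphere_shell_sketch(center, r_inner, r_outer, num_points):
    # Same idea, but the radius interpolates between the cubed shell radii
    B, N, _ = center.shape
    d = torch.randn(B, N, num_points, 3)
    d = d / d.norm(dim=-1, keepdim=True)
    u = torch.rand(B, N, num_points, 1)
    radius = (r_inner**3 + u * (r_outer**3 - r_inner**3)).pow(1.0 / 3.0)
    return center.unsqueeze(2) + radius * d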
- class physicsnemo.models.domino.model.GeoConvOut(
- input_features: int,
- model_parameters,
- grid_resolution=None,
Bases:
Module
Geometry layer to project STL geometry data onto regular grids.
- forward(
- x: Tensor,
- grid: Tensor,
- radius: float = 0.025,
- neighbors_in_radius: int = 10,
Process and project geometric features onto a 3D grid.
- Parameters:
x – Input tensor containing coordinates of the neighboring points (batch_size, nx*ny*nz, 3, n_points)
grid – Input tensor represented as a grid of shape (batch_size, nx, ny, nz, 3)
- Returns:
Processed geometry features of shape (batch_size, base_neurons_in, nx, ny, nz)
- class physicsnemo.models.domino.model.GeoProcessor(
- input_filters: int,
- output_filters: int,
- model_parameters,
Bases:
Module
Geometry processing layer using CNNs
- forward(x: Tensor) Tensor [source]#
Process geometry information through the 3D CNN network.
The network follows an encoder-decoder architecture with skip connections:
1. Downsampling path (encoder) with three levels of max pooling
2. Processing loop in the bottleneck
3. Upsampling path (decoder) with skip connections from the encoder
- Parameters:
x – Input tensor containing grid-represented geometry of shape (batch_size, input_filters, nx, ny, nz)
- Returns:
Processed geometry features of shape (batch_size, 1, nx, ny, nz)
- class physicsnemo.models.domino.model.GeometryRep(
- input_features: int,
- radii: Sequence[float],
- neighbors_in_radius,
- hops=1,
- model_parameters=None,
Bases:
Module
Geometry representation module that processes STL geometry data.
This module constructs a multiscale representation of geometry by:
1. Computing multi-scale geometry encoding for local and global context
2. Processing signed distance field (SDF) data for surface information
The combined encoding enables the model to reason about both local and global geometric properties.
- forward(
- x: Tensor,
- p_grid: Tensor,
- sdf: Tensor,
Process geometry data to create a comprehensive representation.
This method combines short-range, long-range, and SDF-based geometry encodings to create a rich representation of the geometry.
- Parameters:
x – Input tensor containing geometric point data
p_grid – Grid points for sampling
sdf – Signed distance field tensor
- Returns:
Comprehensive geometry encoding that concatenates short-range, SDF-based, and long-range features
- class physicsnemo.models.domino.model.LocalPointConv(
- input_features,
- base_layer,
- output_features,
- model_parameters=None,
Bases:
Module
Layer for local geometry point kernel
- forward(x)[source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.domino.model.NNBasisFunctions(input_features: int, model_parameters=None)[source]#
Bases:
Module
Basis function layer for point clouds
- class physicsnemo.models.domino.model.ParameterModel(input_features: int, model_parameters=None)[source]#
Bases:
Module
Neural network module to encode simulation parameters.
This module encodes physical global parameters into a learned latent representation that can be incorporated into the model's prediction process.
- class physicsnemo.models.domino.model.PositionEncoder(input_features: int, model_parameters=None)[source]#
Bases:
Module
Positional encoding of point clouds
- physicsnemo.models.domino.model.calculate_pos_encoding(nx, d=8)[source]#
Function to calculate positional encoding
- physicsnemo.models.domino.model.fourier_encode(coords, num_freqs)[source]#
Function to calculate Fourier features
- physicsnemo.models.domino.model.fourier_encode_vectorized(coords, freqs)[source]#
Vectorized Fourier feature encoding
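As an illustration of the idea behind these helpers, Fourier feature encoding maps each input coordinate to sines and cosines at several frequencies. The frequency schedule and output layout below are assumptions made for the sketch; the actual DoMINO helpers may differ:
import torch

def fourier_encode_sketch(coords: torch.Tensor, num_freqs: int) -> torch.Tensor:
    # coords: (..., d) -> features: (..., d * 2 * num_freqs)
    freqs = 2.0 ** torch.arange(num_freqs, dtype=coords.dtype)  # assumed schedule
    angles = coords.unsqueeze(-1) * freqs                       # (..., d, num_freqs)
    feats = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return feats.flatten(start_dim=-2)

print(fourier_encode_sketch(torch.randn(4, 100, 3), num_freqs=8).shape)  # (4, 100, 48)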
- physicsnemo.models.domino.model.get_activation(
- activation: Literal['relu', 'gelu'],
Return a PyTorch activation function corresponding to the given name.
- physicsnemo.models.domino.model.scale_sdf(sdf: Tensor) Tensor [source]#
Scale a signed distance function (SDF) to emphasize surface regions.
This function applies a non-linear scaling to the SDF values that compresses the range while preserving the sign, effectively giving more weight to points near surfaces where abs(SDF) is small.
- Parameters:
sdf – Tensor containing signed distance function values
- Returns:
Tensor with scaled SDF values in range [-1, 1]
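One simple scaling with these properties (range compressed to (-1, 1), sign preserved, steepest near the surface) is a rational function of the SDF. The constant and exact form below are assumptions made for illustration; the formula used by PhysicsNeMo may differ:
import torch

def scale_sdf_sketch(sdf: torch.Tensor, sharpness: float = 0.4) -> torch.Tensor:
    # Points near the surface (|sdf| small) keep most of their magnitude,
    # while far-away points saturate towards +/- 1.
    return sdf / (sharpness + sdf.abs())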
Diffusion Models#
The PhysicsNeMo diffusion library provides three categories of models that serve different purposes. All models are based on the Module class.
- Model backbones:
Those are highly configurable architectures that can be used as a building block for more complex models.
- Specialized architectures:
Those are models that usually inherit from the model backbones, with some specific additional functionalities.
- Application-specific interfaces:
These Modules are not truly architectures, but rather wrappers around the model backbones or specialized architectures. Their intent is to provide a more user-friendly interface for specific applications.
In addition to these model architectures, PhysicsNeMo provides diffusion preconditioners, which are essentially wrappers around model architectures that rescale the inputs and outputs of diffusion models to improve their performance.
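For context, EDM-style preconditioners (Karras et al.) typically compute \(D_\theta(\mathbf{x}, \sigma) = c_{skip}(\sigma)\,\mathbf{x} + c_{out}(\sigma)\,F_\theta(c_{in}(\sigma)\,\mathbf{x}, c_{noise}(\sigma))\), where \(F_\theta\) is the underlying backbone. The sketch below shows this rescaling in isolation; it is illustrative and not the exact PhysicsNeMo preconditioner API:
import torch

def edm_precondition_sketch(net, x, sigma, sigma_data=0.5):
    # x: (B, C, H, W), sigma: (B, 1, 1, 1) noise level per sample.
    # Coefficients from "Elucidating the Design Space of Diffusion-Based
    # Generative Models" (Karras et al., 2022); illustrative only.
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / (sigma**2 + sigma_data**2).sqrt()
    c_in = 1.0 / (sigma_data**2 + sigma**2).sqrt()
    c_noise = sigma.flatten().log() / 4.0  # passed to the backbone as the noise labels
    return c_skip * x + c_out * net(c_in * x, c_noise)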
Architecture Backbones#
Diffusion model backbones are highly configurable architectures that can be used
as a building block for more complex models. Backbones support
both conditional and unconditional modeling. Currently, there are two provided
backbones: the SongUNet, as implemented in the SongUNet class, and the DhariwalUNet, as implemented in the DhariwalUNet class. These models were introduced in the papers Score-based generative modeling through stochastic differential equations, Song et al., and Diffusion models beat GANs on image synthesis, Dhariwal et al. The PhysicsNeMo implementation of these models closely follows the one used in the paper Elucidating the Design Space of Diffusion-Based Generative Models, Karras et al. The original implementation of these models can be found in the EDM repository.
Model backbones can be used as is, such as in the StormCast example, but they can also be used as a base class for more complex models.
One of the most common diffusion backbones for image generation is the SongUNet class. Its latent state \(\mathbf{x}\) is a tensor of shape \((B, C, H, W)\), where \(B\) is the batch size, \(C\) is the number of channels, and \(H\) and \(W\) are the height and width of the feature map. The model is conditional on the noise level, and can additionally be conditioned on vector-valued class labels and/or images. The model is organized into levels, whose number is determined by len(channel_mult), and each level operates at half the resolution of the previous level (odd resolutions are rounded down). Each level is composed of a sequence of UNet blocks that optionally contain self-attention layers, as controlled by the attn_resolutions parameter. The feature map resolution is halved at the first block of each level and then remains constant within the level.
Here we start by creating a SongUNet model with 3 levels that applies self-attention at levels 1 and 2. The model is unconditional, i.e. it is not conditioned on any class labels or images (but is still conditioned on the noise level, as is standard practice for diffusion models).
import torch
from physicsnemo.models.diffusion import SongUNet
B, C_x, res = 3, 6, 40 # Batch size, channels, and resolution of the latent state
model = SongUNet(
img_resolution=res,
in_channels=C_x,
out_channels=C_x, # No conditioning on image: number of output channels is the same as the input channels
label_dim=0, # No conditioning on vector-valued class labels
augment_dim=0,
model_channels=64,
channel_mult=[1, 2, 3], # 3-levels UNet with 64, 128, and 192 channels at each level, respectively
num_blocks=4, # 4 UNet blocks at each level
attn_resolutions=[20, 10], # Attention is applied at level 1 (resolution 20x20) and level 2 (resolution 10x10)
)
x = torch.randn(B, C_x, res, res) # Latent state
noise_labels = torch.randn(B) # Noise level for each sample
# The feature map resolution is 40 at level 0, 20 at level 1, and 10 at level 2
out = model(x, noise_labels, None)
print(out.shape) # Shape: (B, C_x, res, res), same as the latent state
# The same model can be used on images of different resolution
# Note: the attention is still applied at levels 1 and 2
x_32 = torch.randn(B, C_x, 32, 32) # Lower resolution latent state
out_32 = model(x_32, noise_labels, None) # None means no conditioning on class labels
print(out_32.shape) # Shape: (B, C_x, 32, 32), same as the latent state
The unconditional SongUNet can be extended to be conditional on class labels and/or images. Conditioning on images is performed by channel-wise concatenation of the image to the latent state \(\mathbf{x}\) before passing it to the model. The model does not perform conditioning on images internally, and this operation is left to the user. For conditioning on class labels (or any vector-valued quantity whose dimension is label_dim), the model internally generates embeddings for the class labels and adds them to intermediate activations within the UNet blocks. Here we extend the previous example to be conditional on a 16-dimensional vector-valued class label and a 3-channel image.
import torch
from physicsnemo.models.diffusion import SongUNet
B, C_x, res = 3, 10, 40
C_cond = 3
model = SongUNet(
img_resolution=res,
in_channels=C_x + C_cond, # Conditioning on an image with C_cond channels
out_channels=C_x, # Output channels: only those of the latent state
label_dim=16, # Conditioning on 16-dimensional vector-valued class labels
augment_dim=0,
model_channels=64,
channel_mult=[1, 2, 2],
num_blocks=4,
attn_resolutions=[20, 10],
)
x = torch.randn(B, C_x, res, res) # Latent state
cond = torch.randn(B, C_cond, res, res) # Conditioning image
x_cond = torch.cat([x, cond], dim=1) # Channel-wise concatenation of the conditioning image before passing to the model
noise_labels = torch.randn(B)
class_labels = torch.randn(B, 16) # Conditioning on vector-valued class labels
out = model(x_cond, noise_labels, class_labels)
print(out.shape) # Shape: (B, C_x, res, res), same as the latent state
Specialized Architectures#
Note that even though backbones can be used as is, some of the PhysicsNeMo examples use specialized architectures. These specialized architectures typically inherit from the backbones and implement additional functionalities for specific applications. For example, the CorrDiff example uses the specialized architectures SongUNetPosEmbd and SongUNetPosLtEmbd to implement the diffusion model.
Positional embeddings#
Multi-diffusion (also called patch-based diffusion) is a technique to scale
diffusion models to large domains. The idea is to split the full domain into
patches, and run a diffusion model on each patch in parallel. The generated
patches are then fused back to form the final image. This technique is
particularly useful for domains that are too large to fit into the memory of
a single GPU. The CorrDiff example
uses patch-based diffusion for weather downscaling on large domains. A key
ingredient in the implementation of patch-based diffusion is the use of a global spatial grid, which is used to inform each patch of its respective position in the full domain. The SongUNetPosEmbd class implements this functionality by providing multiple methods to encode the global spatial coordinates of the pixels into a global positional embedding grid. In addition to multi-diffusion, spatial positional embeddings have also been observed to improve the quality of the generated images, even for diffusion models that operate on the full domain.
The following example shows how to use the specialized architecture SongUNetPosEmbd to implement a multi-diffusion model. First, we create a SongUNetPosEmbd model similar to the one in the conditional SongUNet example, with a global positional embedding grid of shape (C_PE, res, res). We show that the model can be used with the entire latent state (full domain).
import torch
from physicsnemo.models.diffusion import SongUNetPosEmbd
B, C_x, res = 3, 10, 40
C_cond = 3
C_PE = 8 # Number of channels in the positional embedding grid
# Create a SongUNet with a global positional embedding grid of shape (C_PE, res, res)
model = SongUNetPosEmbd(
img_resolution=res, # Define the resolution of the global positional embedding grid
in_channels=C_x + C_cond + C_PE, # in_channels must include the number of channels in the positional embedding grid
out_channels=C_x,
label_dim=16,
augment_dim=0,
model_channels=64,
channel_mult=[1, 2, 2],
num_blocks=4,
attn_resolutions=[20, 10],
gridtype="learnable", # Use a learnable grid of positional embeddings
N_grid_channels=C_PE # Number of channels in the positional embedding grid
)
# Can pass the entire latent state to the model
x_global = torch.randn(B, C_x, res, res) # Entire latent state
cond = torch.randn(B, C_cond, res, res) # Conditioning image
x_cond = torch.cat([x_global, cond], dim=1) # Latent state with conditioning image
noise_labels = torch.randn(B)
class_labels = torch.randn(B, 16)
# The model internally concatenates the global positional embedding grid to the
# input x_cond before the first UNet block.
# Note: global_index=None means use the entire positional embedding grid
out = model(x_cond, noise_labels, class_labels, global_index=None)
print(out.shape) # Shape: (B, C_x, res, res), same as the latent state
Now we show that the model can be used on local patches of the latent state (multi-diffusion approach). We manually extract 3 patches from the latent state. Patches are treated as individual samples, so they are concatenated along the batch dimension. We also create a global grid of indices grid that contains the indices of the pixels in the full domain, and we extract the same 3 patches from the global grid and pass them to the global_index parameter. The model internally uses global_index to extract the corresponding patches from the positional embedding grid and concatenates them to the input x_cond_patches before the first UNet block. Note that conditional multi-diffusion still requires each patch to be conditioned on the entire conditioning image cond, which is why we interpolate the conditioning image to the patch resolution and concatenate it to each individual patch.
In practice it is not necessary to manually extract the patches from the latent state and the global grid, as PhysicsNeMo provides utilities to help with the patching operations in patching; a short sketch using these utilities is shown after the example below. For a complete example of how to use them, see the CorrDiff example.
# Can pass local patches to the model
# Create batch of 3 patches from `x_global` with resolution 16x16
pres = 16 # Patch resolution
p1 = x_global[0:1, :, :pres, :pres] # Patch 1
p2 = x_global[3:4, :, pres:2*pres, pres:2*pres] # Patch 2
p3 = x_global[1:2, :, -pres:, pres:2*pres] # Patch 3
patches = torch.cat([p1, p2, p3], dim=0) # Batch of 3 patches
# Note: the conditioning image needs interpolation (or other operations) to
# match the patch resolution
cond1 = torch.nn.functional.interpolate(cond[0:1], size=(pres, pres), mode="bilinear")
cond2 = torch.nn.functional.interpolate(cond[3:4], size=(pres, pres), mode="bilinear")
cond3 = torch.nn.functional.interpolate(cond[1:2], size=(pres, pres), mode="bilinear")
cond_patches = torch.cat([cond1, cond2, cond3], dim=0)
# Concatenate the patches and the conditioning image
x_cond_patches = torch.cat([patches, cond_patches], dim=1)
# Create corresponding global indices for the patches
Ny, Nx = torch.arange(res).int(), torch.arange(res).int()
grid = torch.stack(torch.meshgrid(Ny, Nx, indexing="ij"), dim=0)
idx_patch1 = grid[:, :pres, :pres] # Global indices for patch 1
idx_patch2 = grid[:, pres:2*pres, pres:2*pres] # Global indices for patch 2
idx_patch3 = grid[:, -pres:, pres:2*pres] # Global indices for patch 3
global_index = torch.stack([idx_patch1, idx_patch2, idx_patch3], dim=0)
# The model internally extracts the corresponding patches from the global
# positional embedding grid and concatenates them to the input x_cond_patches
# before the first UNet block.
out = model(x_cond_patches, noise_labels, class_labels, global_index=global_index)
print(out.shape) # Shape: (3, C_x, pres, pres), same as the patches extracted from the latent state
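As mentioned above, the manual patch extraction can be delegated to the GridPatching2D utility from physicsnemo.utils.patching (its API also appears in the reference examples further below). The following is a rough sketch of that workflow using the same variable names as the example above; exact shapes and batching conventions are best checked against the CorrDiff example:
from physicsnemo.utils.patching import GridPatching2D

# Decompose the full res x res domain into pres x pres patches
patching = GridPatching2D(img_shape=(res, res), patch_shape=(pres, pres))
# Global (j, i) pixel indices of every patch, shape (num_patches, 2, pres, pres)
global_index = patching.global_index(batch_size=B)
# Extract the matching patches from a full-domain tensor; patches are stacked
# along the batch dimension and can then be passed to the model together with
# global_index, as in the manual example above
x_patches = patching.apply(x_cond)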
Lead-time aware models#
In many diffusion applications, the latent state is time-dependent, and the diffusion process should account for the time-dependence of the latent state. For instance, a forecast model could provide latent states \(\mathbf{x}(T)\) (current time), \(\mathbf{x}(T + \Delta t)\) (one time step forward), …, up to \(\mathbf{x}(T + K \Delta t)\) (K time steps forward). Such prediction horizons are called lead-times (a term adopted from the weather and climate forecasting community) and we want to apply diffusion to each of these latent states while accounting for their associated lead-time information.
PhysicsNeMo provides a specialized architecture, SongUNetPosLtEmbd, that implements lead-time aware models. It is an extension of the SongUNetPosEmbd class that additionally supports lead-time information. In its forward pass, the model uses the lead_time_label parameter to internally retrieve the associated lead-time embeddings; it then conditions the diffusion process on those with a channel-wise concatenation to the latent state before the first UNet block.
Here we show an example extending the previous ones with lead-time information. We assume that we have a batch of 3 latent states at times \(T + 2 \Delta t\) (2 time intervals forward), \(T + 0 \Delta t\) (current time), and \(T + \Delta t\) (1 time interval forward). The associated lead-time labels are [2, 0, 1]. In addition, the SongUNetPosLtEmbd model has the ability to predict probabilities for some channels of the latent state, specified by the prob_channels parameter. Here we assume that channels 1 and 3 are probability (i.e. classification) outputs, while the other channels are regression outputs.
import torch
from physicsnemo.models.diffusion import SongUNetPosLtEmbd
B, C_x, res = 3, 10, 40
C_cond = 3
C_PE = 8
lead_time_steps = 3 # Maximum supported lead-time is 2 * dt
C_LT = 6 # 6 channels for each lead-time embeddings
# Create a SongUNet with a lead-time embedding grid of shape
# (lead_time_steps, C_lt_emb, res, res)
model = SongUNetPosLtEmbd(
img_resolution=res,
in_channels=C_x + C_cond + C_PE + C_LT, # in_channels must include the number of channels in lead-time grid
out_channels=C_x,
label_dim=16,
augment_dim=0,
model_channels=64,
channel_mult=[1, 2, 2],
num_blocks=4,
attn_resolutions=[10, 5],
gridtype="learnable",
N_grid_channels=C_PE,
lead_time_channels=C_LT,
lead_time_steps=lead_time_steps, # Maximum supported lead-time horizon
prob_channels=[1, 3], # Channels 1 and 3 from the latent state are probability outputs
)
x = torch.randn(B, C_x, res, res) # Latent state at times T+2*dt, T+0*dt, and T + 1*dt
cond = torch.randn(B, C_cond, res, res)
x_cond = torch.cat([x, cond], dim=1)
noise_labels = torch.randn(B)
class_labels = torch.randn(B, 16)
lead_time_label = torch.tensor([2, 0, 1]) # Lead-time labels for each sample
# The model internally extracts the lead-time embeddings corresponding to the
# lead-time labels 2, 0, 1 and concatenates them to the input x_cond before the first
# UNet block. In training mode, the model outputs logits for channels 1 and 3.
out = model(x_cond, noise_labels, class_labels, lead_time_label=lead_time_label)
print(out.shape) # Shape: (B, C_x, res, res), same as the latent state
# In eval mode, the model outputs probabilities for channels 1 and 3
model.eval()
out = model(x_cond, noise_labels, class_labels, lead_time_label=lead_time_label)
Note
The SongUNetPosLtEmbd is not an autoregressive model that performs a rollout to produce future predictions. From the point of view of the SongUNetPosLtEmbd, the lead-time information is frozen. The lead-time dependent latent state \(\mathbf{x}\) might however be produced by such an autoregressive/rollout model.
Note
The SongUNetPosLtEmbd model cannot be scaled to very long lead-time horizons (controlled by the lead_time_steps parameter). This is because the lead-time embeddings are represented by a grid of learnable parameters of shape (lead_time_steps, C_LT, res, res). For very long lead-times, the size of this grid of embeddings becomes prohibitively large.
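As a rough order of magnitude (the larger domain size below is hypothetical), the parameter count of this grid is lead_time_steps * C_LT * res * res:
# Parameters in the lead-time embedding grid: lead_time_steps * C_LT * res * res
print(3 * 6 * 40 * 40)    # 28800 for the small example above
print(9 * 6 * 448 * 448)  # 10838016 for a hypothetical 448 x 448 domain with 9 lead-time steps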
Note
In a given input batch x, the associated lead-times are not necessarily consecutive or in order. They do not even need to originate from the same forecast trajectory. For example, the lead-time labels might be [0, 1, 2] instead of [2, 0, 1], or even [2, 2, 1].
Application-specific Interfaces#
Application-specific interfaces are not true architectures, but rather wrappers around the model backbones or specialized architectures that provide a more user-friendly interface for specific applications. Note that not all of these classes are diffusion models themselves; some can also be used in conjunction with diffusion models. For instance, the CorrDiff example uses the UNet class to implement a regression model.
- class physicsnemo.models.diffusion.song_unet.SongUNet(*args, **kwargs)[source]#
Bases:
Module
This architecture is a diffusion backbone for 2D image generation. It is a reimplementation of the DDPM++ and NCSN++ architectures, which are U-Net variants with optional self-attention, embeddings, and encoder-decoder components.
This model supports conditional and unconditional setups, as well as several options for various internal architectural choices such as encoder and decoder type, embedding type, etc., making it flexible and adaptable to different tasks and configurations.
This architecture supports conditioning on the noise level (called noise labels), as well as on additional vector-valued labels (called class labels) and (optional) vector-valued augmentation labels. The conditioning mechanism relies on addition of the conditioning embeddings in the U-Net blocks of the encoder. To condition on images, the simplest mechanism is to concatenate the image to the input before passing it to the SongUNet.
The model first applies a mapping operation to generate embeddings for all the conditioning inputs (the noise level, the class labels, and the optional augmentation labels).
Then, at each level in the U-Net encoder, a sequence of blocks is applied:
A first block downsamples the feature map resolution by a factor of 2 (odd resolutions are floored). This block does not change the number of channels.
A sequence of num_blocks U-Net blocks is applied, each with a different number of channels. These blocks do not change the feature map resolution, but they multiply the number of channels by a factor specified in channel_mult. If required, the U-Net blocks also apply self-attention at the specified resolutions. At the end of the level, the feature map is cached to be used in a skip connection in the decoder.
The decoder is a mirror of the encoder, with the same number of levels and the same number of blocks per level. It multiplies the feature map resolution by a factor of 2 at each level.
- Parameters:
img_resolution (Union[List[int, int], int]) – The resolution of the input/output image. Can be a single int \(H\) for square images or a list \([H, W]\) for rectangular images. Note: This parameter is only used as a convenience to build the network. In practice, the model can still be used with images of different resolutions. The only exception to this rule is when additive_pos_embed is True, in which case the resolution of the latent state \(\mathbf{x}\) must match img_resolution.
in_channels (int) – Number of channels \(C_{in}\) in the input image. May include channels from both the latent state and additional channels when conditioning on images. For an unconditional model, this should be equal to out_channels.
out_channels (int) – Number of channels \(C_{out}\) in the output image. Should be equal to the number of channels \(C_{\mathbf{x}}\) in the latent state.
label_dim (int, optional, default=0) – Dimension of the vector-valued class_labels conditioning; 0 indicates no conditioning on class labels.
augment_dim (int, optional, default=0) – Dimension of the vector-valued augment_labels conditioning; 0 means no conditioning on augmentation labels.
model_channels (int, optional, default=128) – Base multiplier for the number of channels across the entire network.
channel_mult (List[int], optional, default=[1, 2, 2, 2]) – Multipliers for the number of channels at every level in the encoder and decoder. The length of channel_mult determines the number of levels in the U-Net. At level i, the number of channels in the feature map is channel_mult[i] * model_channels.
channel_mult_emb (int, optional, default=4) – Multiplier for the number of channels in the embedding vector. The embedding vector has model_channels * channel_mult_emb channels.
num_blocks (int, optional, default=4) – Number of U-Net blocks at each level.
attn_resolutions (List[int], optional, default=[16]) – Resolutions of the levels at which self-attention layers are applied. Note that the feature map resolution must match exactly the value provided in attn_resolutions for the self-attention layers to be applied.
dropout (float, optional, default=0.10) – Dropout probability applied to intermediate activations within the U-Net blocks.
label_dropout (float, optional, default=0.0) – Dropout probability applied to the class_labels. Typically used for classifier-free guidance.
embedding_type (Literal["fourier", "positional", "zero"], optional, default="positional") – Diffusion timestep embedding type: 'positional' for DDPM++, 'fourier' for NCSN++, 'zero' for none.
channel_mult_noise (int, optional, default=1) – Multiplier for the number of channels in the noise level embedding. The noise level embedding vector has model_channels * channel_mult_noise channels.
encoder_type (Literal["standard", "skip", "residual"], optional, default="standard") – Encoder architecture: 'standard' for DDPM++, 'residual' for NCSN++, 'skip' for skip connections.
decoder_type (Literal["standard", "skip"], optional, default="standard") – Decoder architecture: 'standard' or 'skip' for skip connections.
resample_filter (List[int], optional, default=[1, 1]) – Resampling filter coefficients applied in the U-Net blocks convolutions: [1,1] for DDPM++, [1,3,3,1] for NCSN++.
checkpoint_level (int, optional, default=0) – Number of levels that should use gradient checkpointing. Only levels at which the feature map resolution is large enough will be checkpointed (0 disables checkpointing, higher values mean more layers are checkpointed). Higher values trade memory for computation.
additive_pos_embed (bool, optional, default=False) – If True, adds a learnable positional embedding after the first convolution layer. Used in the StormCast model. Note: these positional embeddings encode spatial position information of the image pixels, unlike the embedding_type parameter which encodes temporal information about the diffusion process. In that sense it is a simpler version of the positional embedding used in SongUNetPosEmbd.
use_apex_gn (bool, optional, default=False) – A flag indicating whether to use Apex GroupNorm for NHWC layout. Apex needs to be installed for this to work. Must be set to False on CPU.
act (str, optional, default=None) – The activation function to use when fusing activation with GroupNorm. Required when use_apex_gn is True.
profile_mode (bool, optional, default=False) – A flag indicating whether to enable all nvtx annotations during profiling.
amp_mode (bool, optional, default=False) – A flag indicating whether mixed-precision (AMP) training is enabled.
Forward#
- x (torch.Tensor)
The input image of shape \((B, C_{in}, H_{in}, W_{in})\). In general x is the channel-wise concatenation of the latent state \(\mathbf{x}\) and additional images used for conditioning. For an unconditional model, x is simply the latent state \(\mathbf{x}\). Note: \(H_{in}\) and \(W_{in}\) do not need to match \(H\) and \(W\) defined in img_resolution, except when additive_pos_embed is True. In that case, the resolution of x must match img_resolution.
- noise_labels (torch.Tensor)
The noise labels of shape \((B,)\). Used for conditioning on the diffusion noise level.
- class_labels (torch.Tensor)
The class labels of shape \((B, \text{label_dim})\). Used for conditioning on any vector-valued quantity. Can pass None when label_dim is 0.
- augment_labels (torch.Tensor, optional, default=None)
The augmentation labels of shape \((B, \text{augment_dim})\). Used for conditioning on any additional vector-valued quantity. Can pass None when augment_dim is 0.
Outputs#
- torch.Tensor
The denoised latent state of shape \((B, C_{out}, H_{in}, W_{in})\).
Important
The terms noise levels (or noise labels) are used to refer to the diffusion time-step, as these are conceptually equivalent.
The terms labels and classes originate from the original paper and EDM repository, where this architecture was used for class-conditional image generation. While these terms suggest class-based conditioning, the architecture can actually be conditioned on any vector-valued conditioning.
The term positional embedding used in the embedding_type parameter also comes from the original paper and EDM repository. Here, positional refers to the diffusion time-step, similar to how position is used in transformer architectures. Despite the name, these embeddings encode temporal information about the diffusion process rather than spatial position information.
Limitations on input image resolution: for a model that has \(N\) levels, the latent state \(\mathbf{x}\) must have resolution that is a multiple of \(2^N\) in each dimension. This is due to a limitation in the decoder that does not support shape mismatch in the residual connections from the encoder to the decoder. For images that do not match this requirement, it is recommended to interpolate your data on a grid of the required resolution beforehand.
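For example, a latent state whose resolution is not a multiple of \(2^N\) can be interpolated up front with torch.nn.functional.interpolate; a minimal sketch for a 3-level model:
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 45, 93)        # resolution not divisible by 2**3
N = 3                                # number of levels, i.e. len(channel_mult)
H = -(-x.shape[-2] // 2**N) * 2**N   # round up to the nearest multiple of 8 -> 48
W = -(-x.shape[-1] // 2**N) * 2**N   # -> 96
x_interp = F.interpolate(x, size=(H, W), mode="bilinear")
print(x_interp.shape)                # torch.Size([1, 2, 48, 96])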
Example
>>> model = SongUNet(img_resolution=16, in_channels=2, out_channels=2)
>>> noise_labels = torch.randn([1])
>>> class_labels = torch.randint(0, 1, (1, 1))
>>> input_image = torch.ones([1, 2, 16, 16])
>>> output_image = model(input_image, noise_labels, class_labels)
>>> output_image.shape
torch.Size([1, 2, 16, 16])
- property amp_mode#
Should be set to True to enable automatic mixed precision.
- property profile_mode#
Should be set to True to enable profiling.
- class physicsnemo.models.diffusion.dhariwal_unet.DhariwalUNet(*args, **kwargs)[source]#
Bases:
Module
This architecture is a diffusion backbone for 2D image generation. It reimplements the ADM architecture, a U-Net variant, with optional self-attention.
It is highly similar to the U-Net backbone defined in SongUNet, and only differs in a few aspects:
The embedding conditioning mechanism relies on adaptive scaling of the group normalization layers within the U-Net blocks.
The parameter initialization follows Kaiming uniform initialization.
- Parameters:
img_resolution (int) – The resolution \(H = W\) of the input/output image. Assumes square images. Note: This parameter is only used as a convenience to build the network. In practice, the model can still be used with images of different resolutions.
in_channels (int) – Number of channels \(C_{in}\) in the input image. May include channels from both the latent state \(\mathbf{x}\) and additional channels when conditioning on images. For an unconditional model, this should be equal to out_channels.
out_channels (int) – Number of channels \(C_{out}\) in the output image. Should be equal to the number of channels \(C_{\mathbf{x}}\) in the latent state.
label_dim (int, optional, default=0) – Dimension of the vector-valued class_labels conditioning; 0 indicates no conditioning on class labels.
augment_dim (int, optional, default=0) – Dimension of the vector-valued augment_labels conditioning; 0 means no conditioning on augmentation labels.
model_channels (int, optional, default=128) – Base multiplier for the number of channels across the entire network.
channel_mult (List[int], optional, default=[1,2,2,2]) – Multipliers for the number of channels at every level in the encoder and decoder. The length of channel_mult determines the number of levels in the U-Net. At level i, the number of channels in the feature map is channel_mult[i] * model_channels.
channel_mult_emb (int, optional, default=4) – Multiplier for the number of channels in the embedding vector. The embedding vector has model_channels * channel_mult_emb channels.
num_blocks (int, optional, default=3) – Number of U-Net blocks at each level.
attn_resolutions (List[int], optional, default=[16]) – Resolutions of the levels at which self-attention layers are applied. Note that the feature map resolution must match exactly the value provided in attn_resolutions for the self-attention layers to be applied.
dropout (float, optional, default=0.10) – Dropout probability applied to intermediate activations within the U-Net blocks.
label_dropout (float, optional, default=0.0) – Dropout probability applied to the class_labels. Typically used for classifier-free guidance.
Forward#
- x (torch.Tensor)
The input tensor of shape \((B, C_{in}, H_{in}, W_{in})\). In general x is the channel-wise concatenation of the latent state \(\mathbf{x}\) and additional images used for conditioning. For an unconditional model, x is simply the latent state \(\mathbf{x}\).
- noise_labels (torch.Tensor)
The noise labels of shape \((B,)\). Used for conditioning on the noise level.
- class_labels (torch.Tensor)
The class labels of shape \((B, \text{label_dim})\). Used for conditioning on any vector-valued quantity. Can pass None when label_dim is 0.
- augment_labels (torch.Tensor, optional, default=None)
The augmentation labels of shape \((B, \text{augment_dim})\). Used for conditioning on any additional vector-valued quantity. Can pass None when augment_dim is 0.
Outputs#
- torch.Tensor:
The denoised latent state of shape \((B, C_{out}, H_{in}, W_{in})\).
Examples
>>> model = DhariwalUNet(img_resolution=16, in_channels=2, out_channels=2)
>>> noise_labels = torch.randn([1])
>>> class_labels = torch.randint(0, 1, (1, 1))
>>> input_image = torch.ones([1, 2, 16, 16])
>>> output_image = model(input_image, noise_labels, class_labels)
- property amp_mode#
Should be set to True to enable automatic mixed precision.
- property profile_mode#
Should be set to True to enable profiling.
- class physicsnemo.models.diffusion.song_unet.SongUNetPosEmbd(*args, **kwargs)[source]#
Bases:
SongUNet
This specialized architecture extends SongUNet with positional embeddings that encode global spatial coordinates of the pixels.
This model supports the same types of conditioning as the base SongUNet and can, in addition, be conditioned on the positional embeddings. Conditioning on the positional embeddings is performed with a channel-wise concatenation to the input image before the first layer of the U-Net. Multiple types of positional embeddings are supported. Positional embeddings are represented by a 2D grid of shape \((C_{PE}, H, W)\), where \(H\) and \(W\) correspond to the img_resolution parameter.
The following types of positional embeddings are supported:
learnable: uses a 2D grid of learnable parameters.
linear: uses a 2D rectilinear grid over the domain \([-1, 1] \times [-1, 1]\).
sinusoidal: uses sinusoidal functions of the spatial coordinates, with possibly multiple frequency bands.
test: uses a 2D grid of integer indices, only used for testing.
When the input image spatial resolution is smaller than the global positional embeddings, it is necessary to select a subset (or patch) of the embedding grid that corresponds to the spatial locations of the input image pixels. The model provides two methods for selecting the subset of positional embeddings:
Using a selector function. See positional_embedding_selector() for details.
Using global indices. See positional_embedding_indexing() for details.
If none of these are provided, the entire grid of positional embeddings is used and channel-wise concatenated to the input image.
Most parameters are the same as in the parent class SongUNet. Only the ones that differ are listed below.
- Parameters:
img_resolution (Union[List[int, int], int]) – The resolution of the input/output image. Can be a single int for square images or a list \([H, W]\) for rectangular images. Used to set the resolution of the positional embedding grid. It must correspond to the spatial resolution of the global domain/image.
in_channels (int) – Number of channels \(C_{in} + C_{PE}\), where \(C_{in}\) is the number of channels in the image passed to the U-Net and \(C_{PE}\) is the number of channels in the positional embedding grid. Important: in comparison to the base SongUNet, this parameter should also include the number of channels in the positional embedding grid \(C_{PE}\).
gridtype (Literal["sinusoidal", "learnable", "linear", "test"], optional, default="sinusoidal") – Type of positional embedding to use. Controls how spatial pixel locations are encoded.
N_grid_channels (int, optional, default=4) – Number of channels \(C_{PE}\) in the positional embedding grid. For 'sinusoidal' must be 4 or a multiple of 4. For 'linear' and 'test' must be 2. For 'learnable' can be any value.
lead_time_mode (bool, optional, default=False) – Provided for convenience. It is recommended to use the architecture SongUNetPosLtEmbd for a lead-time aware model.
lead_time_channels (int, optional, default=None) – Provided for convenience. Refer to SongUNetPosLtEmbd.
lead_time_steps (int, optional, default=9) – Provided for convenience. Refer to SongUNetPosLtEmbd.
prob_channels (List[int], optional, default=[]) – Provided for convenience. Refer to SongUNetPosLtEmbd.
Forward#
- xtorch.Tensor
The input image of shape \((B, C_{in}, H_{in}, W_{in})\), where \(H_{in}\) and \(W_{in}\) are the spatial dimensions of the input image (does not need to be the full image). In general
x
is the channel-wise concatenation of the latent state \(\mathbf{x}\) and additional images used for conditioning. For an unconditional model,x
is simply the latent state \(\mathbf{x}\).Note: \(H_{in}\) and \(W_{in}\) do not need to match the
img_resolution
parameter, except whenadditive_pos_embed
isTrue
. In all other cases, the resolution ofx
must be smaller thanimg_resolution
.- noise_labelstorch.Tensor
The noise labels of shape \((B,)\). Used for conditioning on the diffusion noise level.
- class_labelstorch.Tensor
The class labels of shape \((B, \text{label_dim})\). Used for conditioning on any vector-valued quantity. Can pass
None
whenlabel_dim
is 0.- global_indextorch.Tensor, optional, default=None
The global indices of the positional embeddings to use. If neither
global_index
norembedding_selector
are provided, the entire positional embedding grid of shape \((C_{PE}, H, W)\) is used. In this casex
must have the same spatial resolution as the positional embedding grid. Seepositional_embedding_indexing()
for details.- embedding_selectorCallable, optional, default=None
A function that selects the positional embeddings to use. See positional_embedding_selector() for details.
- augment_labels : torch.Tensor, optional, default=None
The augmentation labels of shape \((B, \text{augment_dim})\). Used for conditioning on any additional vector-valued quantity. Can pass None when augment_dim is 0.
Outputs#
- torch.Tensor
The output tensor of shape \((B, C_{out}, H_{in}, W_{in})\).
Important
Unlike the positional embeddings defined by embedding_type in the parent class SongUNet, which encode the diffusion time-step (or noise level), the positional embeddings in this specialized architecture encode global spatial coordinates of the pixels.
Examples
>>> import torch
>>> from physicsnemo.models.diffusion.song_unet import SongUNetPosEmbd
>>> from physicsnemo.utils.patching import GridPatching2D
>>>
>>> # Model initialization - in_channels must include both original input channels (2)
>>> # and the positional embedding channels (N_grid_channels=4 by default)
>>> model = SongUNetPosEmbd(img_resolution=16, in_channels=2+4, out_channels=2)
>>> noise_labels = torch.randn([1])
>>> class_labels = torch.randint(0, 1, (1, 1))
>>> # The input has only the original 2 channels - positional embeddings are
>>> # added automatically inside the forward method
>>> input_image = torch.ones([1, 2, 16, 16])
>>> output_image = model(input_image, noise_labels, class_labels)
>>> output_image.shape
torch.Size([1, 2, 16, 16])
>>>
>>> # Using a global index to select all positional embeddings
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(16, 16))
>>> global_index = patching.global_index(batch_size=1)
>>> output_image = model(
...     input_image, noise_labels, class_labels,
...     global_index=global_index
... )
>>> output_image.shape
torch.Size([1, 2, 16, 16])
>>>
>>> # Using a custom embedding selector to select all positional embeddings
>>> def patch_embedding_selector(emb):
...     return patching.apply(emb[None].expand(1, -1, -1, -1))
>>> output_image = model(
...     input_image, noise_labels, class_labels,
...     embedding_selector=patch_embedding_selector
... )
>>> output_image.shape
torch.Size([1, 2, 16, 16])
- property amp_mode#
Should be set to True to enable automatic mixed precision.
- positional_embedding_indexing(
- x: Tensor,
- global_index: Tensor | None = None,
- lead_time_label=None,
Select positional embeddings using global indices.
This method uses global indices to select a specific subset of the positional embedding grid (called patches). If no indices are provided, the entire positional embedding grid is returned.
- Parameters:
x (torch.Tensor) – Input tensor of shape \((P \times B, C, H_{in}, W_{in})\). Only used to determine batch size \(B\) and device.
global_index (Optional[torch.Tensor], default=None) – Tensor of shape \((P, 2, H_{in}, W_{in})\) that corresponds to the patches to extract from the positional embedding grid. \(P\) is the number of distinct patches in the input tensor x. The channel dimension should contain the \(j\), \(i\) indices of the pixels to extract from the embedding grid.
- Returns:
Selected positional embeddings with shape \((P \times B, C_{PE}, H_{in}, W_{in})\) (same spatial resolution as global_index) if global_index is provided. If global_index is None, the entire positional embedding grid is duplicated \(B\) times and returned with shape \((B, C_{PE}, H, W)\).
- Return type:
torch.Tensor
Example
>>> # Create global indices using patching utility:
>>> from physicsnemo.utils.patching import GridPatching2D
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(8, 8))
>>> global_index = patching.global_index(batch_size=3)
>>> print(global_index.shape)
torch.Size([4, 2, 8, 8])
Notes
This method is typically used in patch-based diffusion (or multi-diffusion), where a large input image is split into multiple patches. The batch dimension of the input tensor contains the patches. Patches are processed independently by the model, and the global_index parameter is used to select the grid of positional embeddings corresponding to each patch.
See the global_index() method of physicsnemo.utils.patching.BasePatching2D for generating the global_index parameter.
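For intuition about the \(j\), \(i\) indexing described above, the following sketch (a plain-tensor illustration, not this method's internal code) shows how a patch of integer indices selects pixels from a \((C_{PE}, H, W)\) embedding grid:
>>> import torch
>>> emb = torch.randn(4, 16, 16)  # positional embedding grid (C_PE, H, W)
>>> # j, i indices of an 8x8 patch anchored at the top-left corner
>>> j, i = torch.meshgrid(torch.arange(8), torch.arange(8), indexing="ij")
>>> patch = emb[:, j, i]          # selected embeddings, shape (C_PE, 8, 8)
>>> patch.shape
torch.Size([4, 8, 8])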
- positional_embedding_selector(
- x: Tensor,
- embedding_selector: Callable[[Tensor], Tensor],
- lead_time_label=None,
Select positional embeddings using a selector function.
Similar to positional_embedding_indexing(), but instead uses a selector function to select the embeddings.
- Parameters:
x (torch.Tensor) – Input tensor of shape \((P \times B, C, H_{in}, W_{in})\). Only used to determine batch size \(B\), dtype and device.
embedding_selector (Callable) – Function that takes as input the entire embedding grid of shape \((C_{PE}, H, W)\) and returns selected embeddings with shape \((P \times B, C_{PE}, H_{in}, W_{in})\). Each selected embedding should correspond to the portion of the embedding grid that corresponds to the batch element in x. Typically this should be based on the physicsnemo.utils.patching.BasePatching2D.apply() method to maintain consistency with patch extraction.
lead_time_label (Optional[torch.Tensor], default=None) – Tensor of shape \((P,)\) that corresponds to the lead-time label for each patch. Only used if lead_time_mode is True.
- Returns:
A tensor of shape \((P \times B, C_{PE} [+ C_{LT}], H_{in}, W_{in})\). \(C_{PE}\) is the number of embedding channels in the positional embedding grid, and \(C_{LT}\) is the number of embedding channels in the lead-time embedding grid. If lead_time_label is provided, the lead-time embedding channels are included.
- Return type:
torch.Tensor
Notes
This method is typically used in patch-based diffusion (or multi-diffusion), where a large input image is split into multiple patches. The batch dimension of the input tensor contains the patches. Patches are processed independently by the model, and the embedding_selector function is used to select the grid of positional embeddings corresponding to each patch.
See the apply() method of physicsnemo.utils.patching.BasePatching2D for generating the embedding_selector parameter.
Example
>>> # Define a selector function with a patching utility:
>>> from physicsnemo.utils.patching import GridPatching2D
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(8, 8))
>>> batch_size = 4
>>> def embedding_selector(emb):
...     return patching.apply(emb[None].expand(batch_size, -1, -1, -1))
>>>
- property profile_mode#
Should be set to True to enable profiling.
- class physicsnemo.models.diffusion.song_unet.SongUNetPosLtEmbd(*args, **kwargs)[source]#
Bases:
SongUNetPosEmbd
This specialized architecture extends SongUNetPosEmbd with two additional capabilities:
The model can be conditioned on lead-time labels. These labels encode physical time information, such as a forecasting horizon.
Similarly to the parent SongUNetPosEmbd, this model predicts regression targets, but it can also produce classification predictions. More precisely, some of the output channels are probability outputs that are passed through a softmax activation function. This is useful for multi-task applications, where the objective is a combination of both regression and classification losses.
The mechanism to condition on lead-time labels is implemented by:
First, generate a grid of learnable lead-time embeddings of shape \((\text{lead_time_steps}, C_{LT}, H, W)\). The spatial resolution of the lead-time embeddings is the same as the input/output image.
Then, given an input x, select the lead-time embeddings that correspond to the lead-times associated with the samples in the input x.
Finally, concatenate channel-wise the selected lead-time embeddings and positional embeddings to the input x and pass them to the U-Net network (see the sketch below).
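The sketch below is a minimal, hypothetical illustration of the selection-and-concatenation steps above; the tensor names and sizes are assumptions chosen for illustration, not the model's internal code:
>>> import torch
>>> lead_time_steps, C_LT, H, W = 9, 4, 16, 16
>>> # Grid of learnable lead-time embeddings of shape (lead_time_steps, C_LT, H, W)
>>> lt_embd = torch.nn.Parameter(torch.randn(lead_time_steps, C_LT, H, W))
>>> lead_time_label = torch.tensor([3, 0])   # one lead-time index per sample
>>> selected = lt_embd[lead_time_label]      # selected embeddings, shape (B, C_LT, H, W)
>>> x = torch.ones(2, 2, H, W)               # latent state / conditioning channels
>>> torch.cat([x, selected], dim=1).shape    # channel-wise concatenation
torch.Size([2, 6, 16, 16])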
Most parameters are similar to the parent SongUNetPosEmbd, with the exception of the ones listed below.
- Parameters:
in_channels (int) –
Number of channels \(C_{in} + C_{PE} + C_{LT}\) in the image passed to the U-Net.
Important: in comparison to the base SongUNet, this parameter should also include the number of channels in the positional embedding grid \(C_{PE}\) and the number of channels in the lead-time embedding grid \(C_{LT}\).
lead_time_channels (int, optional, default=None) – Number of channels \(C_{LT}\) in the lead-time embedding. These are learned embeddings that encode physical time information.
lead_time_steps (int, optional, default=9) – Number of discrete lead time steps to support. Each step gets its own learned embedding vector of shape \((C_{LT}, H, W)\).
prob_channels (List[int], optional, default=[]) – Indices of channels that are probability outputs (or classification predictions). In training mode, the model outputs logits for these probability channels; in eval mode, it applies a softmax to output the probabilities.
Forward#
x (torch.Tensor) – The input image of shape \((B, C_{in}, H_{in}, W_{in})\), where \(H_{in}\) and \(W_{in}\) are the spatial dimensions of the input image (does not need to be the full image).
noise_labels (torch.Tensor) – The noise labels of shape \((B,)\). Used for conditioning on the diffusion noise level.
class_labels (torch.Tensor) – The class labels of shape \((B, \text{label_dim})\). Used for conditioning on any vector-valued quantity. Can pass None when label_dim is 0.
global_index (torch.Tensor, optional, default=None) – The global indices of the positional embeddings to use. See positional_embedding_indexing() for details. If neither global_index nor embedding_selector are provided, the entire positional embedding grid is used.
embedding_selector (Callable, optional, default=None) – A function that selects the positional embeddings to use. See positional_embedding_selector() for details.
augment_labels (torch.Tensor, optional, default=None) – The augmentation labels of shape \((B, \text{augment_dim})\). Used for conditioning on any additional vector-valued quantity.
lead_time_label (torch.Tensor, optional, default=None) – The lead-time labels of shape \((B,)\). Used for selecting lead-time embeddings. It should contain the indices of the lead-time embeddings that correspond to the lead-time of each sample in the batch.
Outputs#
torch.Tensor – The output tensor of shape \((B, C_{out}, H_{in}, W_{in})\).
Notes
The lead-time embeddings differ from the diffusion time embeddings used in the SongUNet class, as they do not encode the diffusion time-step but the physical forecast time.
Example
>>> import torch
>>> from physicsnemo.models.diffusion.song_unet import SongUNetPosLtEmbd
>>> from physicsnemo.utils.patching import GridPatching2D
>>>
>>> # Model initialization - in_channels must include original input channels (2),
>>> # positional embedding channels (N_grid_channels=4 by default) and
>>> # lead time embedding channels (4)
>>> model = SongUNetPosLtEmbd(
...     img_resolution=16, in_channels=2+4+4, out_channels=2,
...     lead_time_channels=4, lead_time_steps=9
... )
>>> noise_labels = torch.randn([1])
>>> class_labels = torch.randint(0, 1, (1, 1))
>>> # The input has only the original 2 channels - positional embeddings and
>>> # lead time embeddings are added automatically inside the forward method
>>> input_image = torch.ones([1, 2, 16, 16])
>>> lead_time_label = torch.tensor([3])
>>> output_image = model(
...     input_image, noise_labels, class_labels,
...     lead_time_label=lead_time_label
... )
>>> output_image.shape
torch.Size([1, 2, 16, 16])
>>>
>>> # Using global_index to select all the positional and lead time embeddings
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(16, 16))
>>> global_index = patching.global_index(batch_size=1)
>>> output_image = model(
...     input_image, noise_labels, class_labels,
...     lead_time_label=lead_time_label,
...     global_index=global_index
... )
>>> output_image.shape
torch.Size([1, 2, 16, 16])
- property amp_mode#
Should be set to True to enable automatic mixed precision.
- positional_embedding_indexing(
- x: Tensor,
- global_index: Tensor | None = None,
- lead_time_label=None,
Select positional embeddings using global indices.
This method uses global indices to select a specific subset of the positional embedding grid (called patches). If no indices are provided, the entire positional embedding grid is returned.
- Parameters:
x (torch.Tensor) – Input tensor of shape \((P \times B, C, H_{in}, W_{in})\). Only used to determine batch size \(B\) and device.
global_index (Optional[torch.Tensor], default=None) – Tensor of shape \((P, 2, H_{in}, W_{in})\) that corresponds to the patches to extract from the positional embedding grid. \(P\) is the number of distinct patches in the input tensor x. The channel dimension should contain the \(j\), \(i\) indices of the pixels to extract from the embedding grid.
- Returns:
Selected positional embeddings with shape \((P \times B, C_{PE}, H_{in}, W_{in})\) (same spatial resolution as global_index) if global_index is provided. If global_index is None, the entire positional embedding grid is duplicated \(B\) times and returned with shape \((B, C_{PE}, H, W)\).
- Return type:
torch.Tensor
Example
>>> # Create global indices using patching utility:
>>> from physicsnemo.utils.patching import GridPatching2D
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(8, 8))
>>> global_index = patching.global_index(batch_size=3)
>>> print(global_index.shape)
torch.Size([4, 2, 8, 8])
Notes
This method is typically used in patch-based diffusion (or multi-diffusion), where a large input image is split into multiple patches. The batch dimension of the input tensor contains the patches. Patches are processed independently by the model, and the global_index parameter is used to select the grid of positional embeddings corresponding to each patch.
See the global_index() method of physicsnemo.utils.patching.BasePatching2D for generating the global_index parameter.
- positional_embedding_selector(
- x: Tensor,
- embedding_selector: Callable[[Tensor], Tensor],
- lead_time_label=None,
Select positional embeddings using a selector function.
Similar to positional_embedding_indexing(), but instead uses a selector function to select the embeddings.
- Parameters:
x (torch.Tensor) – Input tensor of shape \((P \times B, C, H_{in}, W_{in})\). Only used to determine batch size \(B\), dtype and device.
embedding_selector (Callable) – Function that takes as input the entire embedding grid of shape \((C_{PE}, H, W)\) and returns selected embeddings with shape \((P \times B, C_{PE}, H_{in}, W_{in})\). Each selected embedding should correspond to the portion of the embedding grid that corresponds to the batch element in x. Typically this should be based on the physicsnemo.utils.patching.BasePatching2D.apply() method to maintain consistency with patch extraction.
lead_time_label (Optional[torch.Tensor], default=None) – Tensor of shape \((P,)\) that corresponds to the lead-time label for each patch. Only used if lead_time_mode is True.
- Returns:
A tensor of shape \((P \times B, C_{PE} [+ C_{LT}], H_{in}, W_{in})\). \(C_{PE}\) is the number of embedding channels in the positional embedding grid, and \(C_{LT}\) is the number of embedding channels in the lead-time embedding grid. If lead_time_label is provided, the lead-time embedding channels are included.
- Return type:
torch.Tensor
Notes
This method is typically used in patch-based diffusion (or multi-diffusion), where a large input image is split into multiple patches. The batch dimension of the input tensor contains the patches. Patches are processed independently by the model, and the embedding_selector function is used to select the grid of positional embeddings corresponding to each patch.
See the apply() method of physicsnemo.utils.patching.BasePatching2D for generating the embedding_selector parameter.
Example
>>> # Define a selector function with a patching utility:
>>> from physicsnemo.utils.patching import GridPatching2D
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(8, 8))
>>> batch_size = 4
>>> def embedding_selector(emb):
...     return patching.apply(emb[None].expand(batch_size, -1, -1, -1))
>>>
- property profile_mode#
Should be set to True to enable profiling.
- class physicsnemo.models.diffusion.unet.UNet(*args, **kwargs)[source]#
Bases:
Module
This interface provides a U-Net wrapper for the CorrDiff deterministic regression model (and other deterministic downsampling models). It supports the architectures listed under the model_type parameter.
It shares the same architecture as a conditional diffusion model. It does so by concatenating a conditioning image to a zero-filled latent state, and by setting the noise level and the class labels to zero.
- Parameters:
img_resolution (Union[int, Tuple[int, int]]) – The resolution of the input/output image. If a single int is provided, then the image is assumed to be square.
img_in_channels (int) – Number of channels in the input image.
img_out_channels (int) – Number of channels in the output image.
use_fp16 (bool, optional, default=False) – Execute the underlying model at FP16 precision.
model_type (Literal['SongUNet', 'SongUNetPosEmbd', 'SongUNetPosLtEmbd', 'DhariwalUNet'], optional, default='SongUNetPosEmbd') – Class name of the underlying architecture. Must be one of the following: ‘SongUNet’, ‘SongUNetPosEmbd’, ‘SongUNetPosLtEmbd’, ‘DhariwalUNet’.
**model_kwargs (dict) – Keyword arguments passed to the underlying architecture __init__ method. Please refer to the documentation of these classes for details on how to call and use these models directly.
Forward#
x (torch.Tensor) – The input tensor, typically zero-filled, of shape \((B, C_{in}, H_{in}, W_{in})\).
img_lr (torch.Tensor) – Conditioning image of shape \((B, C_{lr}, H_{in}, W_{in})\).
**model_kwargs – Additional keyword arguments to pass to the underlying architecture forward method.
Outputs#
torch.Tensor – Output tensor of shape \((B, C_{out}, H_{in}, W_{in})\) (same spatial dimensions as the input).
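The following is a conceptual sketch of the trick described above (zero-filled latent state, zero noise level), written against a plain SongUNet backbone with illustrative channel counts; it is an assumption-laden illustration, not the wrapper's internal code:
>>> import torch
>>> from physicsnemo.models.diffusion.song_unet import SongUNet
>>> backbone = SongUNet(img_resolution=16, in_channels=2 + 3, out_channels=3)
>>> img_lr = torch.randn(1, 2, 16, 16)   # conditioning image
>>> latent = torch.zeros(1, 3, 16, 16)   # zero-filled latent state
>>> noise_labels = torch.zeros(1)        # noise level set to zero
>>> out = backbone(torch.cat([latent, img_lr], dim=1), noise_labels, None)
>>> out.shape
torch.Size([1, 3, 16, 16])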
- property amp_mode#
Set to True when using automatic mixed precision.
- property profile_mode#
Set to True to enable profiling of the wrapped model.
- property use_fp16#
Whether the model uses float16 precision.
- Returns:
True if the model is in float16 mode, False otherwise.
- Return type:
bool
- Type:
bool
Diffusion Preconditioners#
Preconditioning is an essential technique to improve the performance of diffusion models. It consists of scaling the latent state and the noise level that are passed to the network. Some preconditioning schemes also require re-scaling the output of the network. PhysicsNeMo provides a set of preconditioning classes that are wrappers around backbones or specialized architectures.
Preconditioning schemes used in the paper “Elucidating the Design Space of Diffusion-Based Generative Models”.
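As a concrete illustration of these scalings, the sketch below reproduces the EDM-style preconditioning from Karras et al. (2022) around a stand-in network F; it is a simplified illustration, not the code of the classes documented below:
>>> import torch
>>> sigma_data = 0.5
>>> sigma = torch.tensor(1.0)                        # noise level
>>> c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
>>> c_out = sigma * sigma_data / (sigma**2 + sigma_data**2).sqrt()
>>> c_in = 1 / (sigma_data**2 + sigma**2).sqrt()
>>> c_noise = sigma.log() / 4
>>> x = torch.randn(1, 3, 16, 16)                    # noisy latent state
>>> F = lambda x_in, t: x_in                         # stand-in for the raw network
>>> D_x = c_skip * x + c_out * F(c_in * x, c_noise)  # preconditioned denoiser output
>>> D_x.shape
torch.Size([1, 3, 16, 16])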
- class physicsnemo.models.diffusion.preconditioning.EDMPrecond(*args, **kwargs)[source]#
Bases:
Module
Improved preconditioning proposed in the paper “Elucidating the Design Space of Diffusion-Based Generative Models” (EDM)
- Parameters:
img_resolution (int) – Image resolution.
img_channels (int) – Number of color channels (for both input and output). If your model requires a different number of input or output channels, override this by passing either of the optional img_in_channels or img_out_channels args.
label_dim (int) – Number of class labels, 0 = unconditional, by default 0.
use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.
sigma_min (float) – Minimum supported noise level, by default 0.0.
sigma_max (float) – Maximum supported noise level, by default inf.
sigma_data (float) – Expected standard deviation of the training data, by default 0.5.
model_type (str) – Class name of the underlying model, by default “DhariwalUNet”.
img_in_channels (int) – Optional setting for when the number of input channels differs from the number of output channels. If set, overrides img_channels for the input. This is useful in the case of additional (conditional) channels.
img_out_channels (int) – Optional setting for when the number of input channels differs from the number of output channels. If set, overrides img_channels for the output.
**model_kwargs (dict) – Keyword arguments for the underlying model.
Note
Reference: Karras, T., Aittala, M., Aila, T. and Laine, S., 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35, pp.26565-26577.
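Below is a small usage sketch; the parameter values are illustrative only and not a recommended configuration:
>>> import torch
>>> from physicsnemo.models.diffusion.preconditioning import EDMPrecond
>>> model = EDMPrecond(img_resolution=16, img_channels=2)  # illustrative sizes
>>> x = torch.randn(1, 2, 16, 16)   # noisy latent state
>>> sigma = torch.tensor([1.0])     # noise level
>>> D_x = model(x, sigma)           # denoised prediction, shape (1, 2, 16, 16)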
- forward(
- x,
- sigma,
- condition=None,
- class_labels=None,
- force_fp32=False,
- **model_kwargs,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.diffusion.preconditioning.EDMPrecondMetaData(
- name: str = 'EDMPrecond',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
EDMPrecond meta data
- class physicsnemo.models.diffusion.preconditioning.EDMPrecondSR(*args, **kwargs)[source]#
Bases:
EDMPrecondSuperResolution
NOTE: This is a deprecated version of the EDMPrecondSuperResolution model. It is kept to maintain backwards compatibility and to allow loading old models. Please use the EDMPrecondSuperResolution model instead.
Improved preconditioning proposed in the paper “Elucidating the Design Space of Diffusion-Based Generative Models” (EDM) for super-resolution tasks
- Parameters:
img_resolution (int) – Image resolution.
img_channels (int) – Number of color channels (deprecated, not used).
img_in_channels (int) – Number of input color channels.
img_out_channels (int) – Number of output color channels.
use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.
sigma_min (float) – Minimum supported noise level, by default 0.0.
sigma_max (float) – Maximum supported noise level, by default inf.
sigma_data (float) – Expected standard deviation of the training data, by default 0.5.
model_type (str) – Class name of the underlying model, by default “SongUNetPosEmbd”.
scale_cond_input (bool) – Whether to scale the conditional input (deprecated), by default True.
**model_kwargs (dict) – Keyword arguments for the underlying model.
Note
References: - Karras, T., Aittala, M., Aila, T. and Laine, S., 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35, pp.26565-26577. - Mardani, M., Brenowitz, N., Cohen, Y., Pathak, J., Chen, C.Y., Liu, C.C., Vahdat, A., Kashinath, K., Kautz, J. and Pritchard, M., 2023. Generative Residual Diffusion Modeling for Km-scale Atmospheric Downscaling. arXiv preprint arXiv:2309.15214.
- forward(
- x,
- img_lr,
- sigma,
- force_fp32=False,
- **model_kwargs,
Forward pass of the EDMPrecondSR model wrapper.
- Parameters:
x (torch.Tensor) – Noisy high-resolution image of shape (B, C_hr, H, W).
img_lr (torch.Tensor) – Low-resolution conditioning image of shape (B, C_lr, H, W).
sigma (torch.Tensor) – Noise level of shape (B) or (B, 1) or (B, 1, 1, 1).
force_fp32 (bool, optional) – Whether to force FP32 precision regardless of the use_fp16 attribute, by default False.
**model_kwargs (dict) – Additional keyword arguments to pass to the underlying model.
- Returns:
Denoised high-resolution image of shape (B, C_hr, H, W).
- Return type:
torch.Tensor
- class physicsnemo.models.diffusion.preconditioning.EDMPrecondSRMetaData(
- name: str = 'EDMPrecondSR',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
EDMPrecondSR meta data
- class physicsnemo.models.diffusion.preconditioning.EDMPrecondSuperResolution(*args, **kwargs)[source]#
Bases:
Module
Improved preconditioning proposed in the paper “Elucidating the Design Space of Diffusion-Based Generative Models” (EDM).
This is a variant of EDMPrecond that is specifically designed for super-resolution tasks. It wraps a neural network that predicts the denoised high-resolution image given a noisy high-resolution image and additional conditioning that includes a low-resolution image and a noise level.
- Parameters:
img_resolution (Union[int, Tuple[int, int]]) – Spatial resolution \((H, W)\) of the image. If a single int is provided, the image is assumed to be square.
img_in_channels (int) – Number of input channels in the low-resolution input image.
img_out_channels (int) – Number of output channels in the high-resolution output image.
use_fp16 (bool, optional) – Whether to use half-precision floating point (FP16) for model execution, by default False.
model_type (str, optional) – Class name of the underlying model. Must be one of the following: ‘SongUNet’, ‘SongUNetPosEmbd’, ‘SongUNetPosLtEmbd’, ‘DhariwalUNet’. Defaults to ‘SongUNetPosEmbd’.
sigma_data (float, optional) – Expected standard deviation of the training data, by default 0.5.
sigma_min (float, optional) – Minimum supported noise level, by default 0.0.
sigma_max (float, optional) – Maximum supported noise level, by default inf.
**model_kwargs (dict) – Keyword arguments passed to the underlying model __init__ method.
See also
SongUNet
Basic U-Net for diffusion models
SongUNetPosEmbd
U-Net with positional embeddings
SongUNetPosLtEmbd
U-Net with positional and lead-time embeddings
Note
References: - Karras, T., Aittala, M., Aila, T. and Laine, S., 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35, pp.26565-26577. - Mardani, M., Brenowitz, N., Cohen, Y., Pathak, J., Chen, C.Y., Liu, C.C., Vahdat, A., Kashinath, K., Kautz, J. and Pritchard, M., 2023. Generative Residual Diffusion Modeling for Km-scale Atmospheric Downscaling. arXiv preprint arXiv:2309.15214.
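The following is an illustrative usage sketch; the parameter values, and the choice of the plain SongUNet backbone instead of the default SongUNetPosEmbd (which would additionally require accounting for its positional-embedding channels), are assumptions made for brevity, not a prescribed configuration:
>>> import torch
>>> from physicsnemo.models.diffusion.preconditioning import EDMPrecondSuperResolution
>>> model = EDMPrecondSuperResolution(
...     img_resolution=16,
...     img_in_channels=2,      # low-resolution conditioning channels
...     img_out_channels=3,     # high-resolution output channels
...     model_type="SongUNet",
... )
>>> x = torch.randn(1, 3, 16, 16)       # noisy high-resolution image
>>> img_lr = torch.randn(1, 2, 16, 16)  # low-resolution conditioning image
>>> sigma = torch.tensor([1.0])         # noise level
>>> D_x = model(x, img_lr, sigma)       # denoised image, shape (1, 3, 16, 16)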
- property amp_mode#
Set to True when using automatic mixed precision.
- forward(
- x: Tensor,
- img_lr: Tensor,
- sigma: Tensor,
- force_fp32: bool = False,
- **model_kwargs: dict,
Forward pass of the EDMPrecondSuperResolution model wrapper.
This method applies the EDM preconditioning to compute the denoised image from a noisy high-resolution image and low-resolution conditioning image.
- Parameters:
x (torch.Tensor) – Noisy high-resolution image of shape (B, C_hr, H, W). The number of channels C_hr should be equal to img_out_channels.
img_lr (torch.Tensor) – Low-resolution conditioning image of shape (B, C_lr, H, W). The number of channels C_lr should be equal to img_in_channels.
sigma (torch.Tensor) – Noise level of shape (B) or (B, 1) or (B, 1, 1, 1).
force_fp32 (bool, optional) – Whether to force FP32 precision regardless of the use_fp16 attribute, by default False.
**model_kwargs (dict) – Additional keyword arguments to pass to the underlying model self.model forward method.
- Returns:
Denoised high-resolution image of shape (B, C_hr, H, W).
- Return type:
torch.Tensor
- Raises:
ValueError – If the model output dtype doesn’t match the expected dtype.
- property profile_mode#
Set to True to enable profiling of the wrapped model.
- static round_sigma(
- sigma: float | List | Tensor,
Convert a given sigma value(s) to a tensor representation.
- Parameters:
sigma (Union[float, List, torch.Tensor]) – Sigma value(s) to convert.
- Returns:
Tensor representation of sigma values.
- Return type:
torch.Tensor
- property use_fp16#
Whether the model uses float16 precision.
- Returns:
True if the model is in float16 mode, False otherwise.
- Return type:
bool
- Type:
bool
- class physicsnemo.models.diffusion.preconditioning.EDMPrecondSuperResolutionMetaData(
- name: str = 'EDMPrecondSuperResolution',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
EDMPrecondSuperResolution meta data
- class physicsnemo.models.diffusion.preconditioning.VEPrecond(*args, **kwargs)[source]#
Bases:
Module
Preconditioning corresponding to the variance exploding (VE) formulation.
- Parameters:
img_resolution (int) – Image resolution.
img_channels (int) – Number of color channels.
label_dim (int) – Number of class labels, 0 = unconditional, by default 0.
use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.
sigma_min (float) – Minimum supported noise level, by default 0.02.
sigma_max (float) – Maximum supported noise level, by default 100.0.
model_type (str) – Class name of the underlying model, by default “SongUNet”.
**model_kwargs (dict) – Keyword arguments for the underlying model.
Note
Reference: Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S. and Poole, B., 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.
- forward(
- x,
- sigma,
- class_labels=None,
- force_fp32=False,
- **model_kwargs,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.diffusion.preconditioning.VEPrecondMetaData(
- name: str = 'VEPrecond',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
VEPrecond meta data
- class physicsnemo.models.diffusion.preconditioning.VEPrecond_dfsr(
- img_resolution: int,
- img_channels: int,
- label_dim: int = 0,
- use_fp16: bool = False,
- sigma_min: float = 0.02,
- sigma_max: float = 100.0,
- dataset_mean: float = 5.85e-05,
- dataset_scale: float = 4.79,
- model_type: str = 'SongUNet',
- **model_kwargs: dict,
Bases:
Module
Preconditioning for the dfsr model, modified from the VEPrecond class: the input argument ‘sigma’ in the forward propagation function is used to receive the timestep of the backward diffusion process.
- Parameters:
img_resolution (int) – Image resolution.
img_channels (int) – Number of color channels.
label_dim (int) – Number of class labels, 0 = unconditional, by default 0.
use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.
sigma_min (float) – Minimum supported noise level, by default 0.02.
sigma_max (float) – Maximum supported noise level, by default 100.0.
model_type (str) – Class name of the underlying model, by default “SongUNet”.
**model_kwargs (dict) – Keyword arguments for the underlying model.
Note
Reference: Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Advances in neural information processing systems. 2020;33:6840-51.
- forward(
- x,
- sigma,
- class_labels=None,
- force_fp32=False,
- **model_kwargs,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.diffusion.preconditioning.VEPrecond_dfsr_cond(
- img_resolution: int,
- img_channels: int,
- label_dim: int = 0,
- use_fp16: bool = False,
- sigma_min: float = 0.02,
- sigma_max: float = 100.0,
- dataset_mean: float = 5.85e-05,
- dataset_scale: float = 4.79,
- model_type: str = 'SongUNet',
- **model_kwargs: dict,
Bases:
Module
Preconditioning for the dfsr model with physics-informed conditioning input, modified from the VEPrecond class: the input argument ‘sigma’ in the forward propagation function is used to receive the timestep of the backward diffusion process. The gradient of the PDE residual with respect to the vorticity in the governing Navier-Stokes equation is computed as the physics-informed conditioning variable and is combined with the backward diffusion timestep before being sent to the underlying model for noise prediction.
- Parameters:
img_resolution (int) – Image resolution.
img_channels (int) – Number of color channels.
label_dim (int) – Number of class labels, 0 = unconditional, by default 0.
use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.
sigma_min (float) – Minimum supported noise level, by default 0.02.
sigma_max (float) – Maximum supported noise level, by default 100.0.
model_type (str) – Class name of the underlying model, by default “SongUNet”.
**model_kwargs (dict) – Keyword arguments for the underlying model.
Note
Reference: [1] Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S. and Poole, B., 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456. [2] Shu D, Li Z, Farimani AB. A physics-informed diffusion model for high-fidelity flow field reconstruction. Journal of Computational Physics. 2023 Apr 1;478:111972.
- forward(
- x,
- sigma,
- class_labels=None,
- force_fp32=False,
- **model_kwargs,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- voriticity_residual(w, re=1000.0, dt=0.03125)[source]#
Compute the gradient of the PDE residual with respect to a given vorticity w using the spectral method.
- Parameters:
w (torch.Tensor) – The fluid flow data sample (vorticity).
re (float) – The value of Reynolds number used in the governing Navier-Stokes equation.
dt (float) – Time step used to compute the time-derivative of vorticity included in the governing Navier-Stokes equation.
- Returns:
The computed vorticity gradient.
- Return type:
torch.Tensor
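For intuition, the sketch below illustrates spectral differentiation with the FFT, which is the core idea behind this residual computation; it is a simplified, self-contained illustration, not the method's actual implementation:
>>> import torch
>>> N = 64
>>> x = torch.linspace(0, 2 * torch.pi, N + 1)[:-1]   # periodic grid
>>> w = torch.sin(x)                                  # test vorticity field
>>> k = torch.fft.fftfreq(N, d=2 * torch.pi / N) * 2 * torch.pi  # angular wavenumbers
>>> w_hat = torch.fft.fft(w)
>>> dwdx = torch.fft.ifft(1j * k * w_hat).real        # spectral derivative d(w)/dx
>>> torch.allclose(dwdx, torch.cos(x), atol=1e-4)
True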
- class physicsnemo.models.diffusion.preconditioning.VPPrecond(*args, **kwargs)[source]#
Bases:
Module
Preconditioning corresponding to the variance preserving (VP) formulation.
- Parameters:
img_resolution (int) – Image resolution.
img_channels (int) – Number of color channels.
label_dim (int) – Number of class labels, 0 = unconditional, by default 0.
use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.
beta_d (float) – Extent of the noise level schedule, by default 19.9.
beta_min (float) – Initial slope of the noise level schedule, by default 0.1.
M (int) – Original number of timesteps in the DDPM formulation, by default 1000.
epsilon_t (float) – Minimum t-value used during training, by default 1e-5.
model_type (str) – Class name of the underlying model, by default “SongUNet”.
**model_kwargs (dict) – Keyword arguments for the underlying model.
Note
Reference: Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S. and Poole, B., 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.
- forward(
- x,
- sigma,
- class_labels=None,
- force_fp32=False,
- **model_kwargs,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- round_sigma(sigma: float | List | Tensor)[source]#
Convert a given sigma value(s) to a tensor representation.
- Parameters:
sigma (Union[float, List, torch.Tensor]) – The sigma value(s) to convert.
- Returns:
The tensor representation of the provided sigma value(s).
- Return type:
torch.Tensor
- sigma(t: float | Tensor)[source]#
Compute the sigma(t) value for a given t based on the VP formulation.
The function calculates the noise level schedule for the diffusion process based on the given parameters beta_d and beta_min.
- Parameters:
t (Union[float, torch.Tensor]) – The timestep or set of timesteps for which to compute sigma(t).
- Returns:
The computed sigma(t) value(s).
- Return type:
torch.Tensor
- sigma_inv(sigma: float | Tensor)[source]#
Compute the inverse of the sigma function for a given sigma.
This function effectively calculates t from a given sigma(t) based on the parameters beta_d and beta_min.
- Parameters:
sigma (Union[float, torch.Tensor]) – The sigma(t) value or set of sigma(t) values for which to compute the inverse.
- Returns:
The computed t value(s) corresponding to the provided sigma(t).
- Return type:
torch.Tensor
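A minimal numerical sketch, assuming the closed-form schedule from the EDM reference implementation, \(\sigma(t) = \sqrt{e^{\frac{1}{2}\beta_d t^2 + \beta_{min} t} - 1}\), and checking that the inverse recovers \(t\):
>>> import torch
>>> beta_d, beta_min = 19.9, 0.1
>>> t = torch.tensor(0.5)
>>> sigma = ((0.5 * beta_d * t**2 + beta_min * t).exp() - 1).sqrt()
>>> t_rec = ((beta_min**2 + 2 * beta_d * (1 + sigma**2).log()).sqrt() - beta_min) / beta_d
>>> torch.allclose(t, t_rec, atol=1e-4)
True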
- class physicsnemo.models.diffusion.preconditioning.VPPrecondMetaData(
- name: str = 'VPPrecond',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
VPPrecond meta data
- class physicsnemo.models.diffusion.preconditioning.iDDPMPrecond(*args, **kwargs)[source]#
Bases:
Module
Preconditioning corresponding to the improved DDPM (iDDPM) formulation.
- Parameters:
img_resolution (int) – Image resolution.
img_channels (int) – Number of color channels.
label_dim (int) – Number of class labels, 0 = unconditional, by default 0.
use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.
C_1 (float) – Timestep adjustment at low noise levels, by default 0.001.
C_2 (float) – Timestep adjustment at high noise levels, by default 0.008.
M (int) – Original number of timesteps in the DDPM formulation, by default 1000.
model_type (str) – Class name of the underlying model, by default “DhariwalUNet”.
**model_kwargs (dict) – Keyword arguments for the underlying model.
Note
Reference: Nichol, A.Q. and Dhariwal, P., 2021, July. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning (pp. 8162-8171). PMLR.
- alpha_bar(j)[source]#
Compute the alpha_bar(j) value for a given j based on the iDDPM formulation.
- Parameters:
j (Union[int, torch.Tensor]) – The timestep or set of timesteps for which to compute alpha_bar(j).
- Returns:
The computed alpha_bar(j) value(s).
- Return type:
torch.Tensor
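As an illustration, assuming the closed form used in the EDM reference implementation of this preconditioning, \(\bar{\alpha}(j) = \sin^2\left(\frac{\pi}{2}\,\frac{j}{M (C_2 + 1)}\right)\), the schedule increases monotonically in \(j\):
>>> import torch
>>> M, C_2 = 1000, 0.008
>>> j = torch.arange(0, M + 1)
>>> alpha_bar = (0.5 * torch.pi * j / (M * (C_2 + 1))).sin() ** 2
>>> bool((alpha_bar[1:] > alpha_bar[:-1]).all())   # monotonically increasing in j
True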
- forward(
- x,
- sigma,
- class_labels=None,
- force_fp32=False,
- **model_kwargs,
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- round_sigma(sigma, return_index=False)[source]#
Round the provided sigma value(s) to the nearest value(s) in a pre-defined set u.
- Parameters:
sigma (Union[float, list, torch.Tensor]) – The sigma value(s) to round.
return_index (bool, optional) – Whether to return the index/indices of the rounded value(s) in u instead of the rounded value(s) themselves, by default False.
- Returns:
The rounded sigma value(s) or their index/indices in u, depending on the value of return_index.
- Return type:
torch.Tensor
- class physicsnemo.models.diffusion.preconditioning.iDDPMPrecondMetaData(
- name: str = 'iDDPMPrecond',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
iDDPMPrecond meta data
Weather / Climate Models#
- class physicsnemo.models.dlwp.dlwp.DLWP(*args, **kwargs)[source]#
Bases:
Module
A Convolutional model for Deep Learning Weather Prediction that works on Cubed-sphere grids.
This model expects the input to be of shape [N, C, 6, Res, Res]
- Parameters:
nr_input_channels (int) – Number of channels in the input
nr_output_channels (int) – Number of channels in the output
nr_initial_channels (int) – Number of channels in the initial convolution. This governs the overall channels in the model.
activation_fn (str) – Activation function for the convolutions
depth (int) – Depth for the U-Net
clamp_activation (Tuple of ints, floats or None) – The min and max value used for torch.clamp()
Example
>>> model = physicsnemo.models.dlwp.DLWP(
...     nr_input_channels=2,
...     nr_output_channels=4,
... )
>>> input = torch.randn(4, 2, 6, 64, 64) # [N, C, F, Res, Res]
>>> output = model(input)
>>> output.size()
torch.Size([4, 4, 6, 64, 64])
Note
- Reference: Weyn, Jonathan A., et al. “Sub‐seasonal forecasting with a large ensemble of deep‐learning weather prediction models.” Journal of Advances in Modeling Earth Systems 13.7 (2021): e2021MS002502.
- forward(cubed_sphere_input)[source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.models.dlwp.dlwp.MetaData(
- name: str = 'DLWP',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = False,
- amp_cpu: bool = True,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
- class physicsnemo.models.dlwp_healpix.HEALPixRecUNet.HEALPixRecUNet(*args, **kwargs)[source]#
Bases:
Module
Deep Learning Weather Prediction (DLWP) recurrent UNet model on the HEALPix mesh.
- forward(
- inputs: Sequence,
- output_only_last=False,
Forward pass of the HEALPixUnet
- Parameters:
inputs (Sequence) – Inputs to the model, of the form [prognostics | TISR | constants]. [B, F, T, C, H, W] is the format for prognostics and TISR; [F, C, H, W] is the format for constants.
output_only_last (bool, optional) – If only the last dimension of the outputs should be returned
- Returns:
Predicted outputs
- Return type:
th.Tensor
- property integration_steps#
Number of integration steps
- class physicsnemo.models.dlwp_healpix.HEALPixRecUNet.MetaData(
- name: str = 'DLWP_HEALPixRec',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = False,
- amp_cpu: bool = True,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
Metadata for the DLWP HEALPix Model
- class physicsnemo.models.graphcast.graph_cast_net.GraphCastNet(*args, **kwargs)[source]#
Bases:
Module
GraphCast network architecture
- Parameters:
multimesh_level (int, optional) – Level of the latent mesh, by default 6
multimesh (bool, optional) – If the latent mesh is a multimesh, by default True. If True, the latent mesh includes the nodes corresponding to the specified mesh_level and incorporates the edges from all mesh levels ranging from level 0 up to and including mesh_level.
input_res (Tuple[int, int]) – Input resolution of the latitude-longitude grid
input_dim_grid_nodes (int, optional) – Input dimensionality of the grid node features, by default 474
input_dim_mesh_nodes (int, optional) – Input dimensionality of the mesh node features, by default 3
input_dim_edges (int, optional) – Input dimensionality of the edge features, by default 4
output_dim_grid_nodes (int, optional) – Final output dimensionality of the grid node features, by default 227
processor_type (str, optional) – The type of processor used in this model. Available options are ‘MessagePassing’, and ‘GraphTransformer’, which correspond to the processors in GraphCast and GenCast, respectively. By default ‘MessagePassing’.
khop_neighbors (int, optional) – Number of khop neighbors used in the GraphTransformer. This option is ignored if ‘MessagePassing’ processor is used. By default 0.
processor_layers (int, optional) – Number of processor layers, by default 16
hidden_layers (int, optional) – Number of hidden layers, by default 1
hidden_dim (int, optional) – Number of neurons in each hidden layer, by default 512
aggregation (str, optional) – Message passing aggregation method (“sum”, “mean”), by default “sum”
activation_fn (str, optional) – Type of activation function, by default “silu”
norm_type (str, optional) – Normalization type [“TELayerNorm”, “LayerNorm”]. Use “TELayerNorm” for optimal performance. By default “LayerNorm”.
use_cugraphops_encoder (bool, default=False) – Flag to select cugraphops kernels in encoder
use_cugraphops_processor (bool, default=False) – Flag to select cugraphops kernels in the processor
use_cugraphops_decoder (bool, default=False) – Flag to select cugraphops kernels in the decoder
do_concat_trick (bool, default=False) – Whether to replace concat+MLP with MLP+idx+sum
recompute_activation (bool, optional) – Flag for recomputing activation in backward to save memory, by default False. Currently, only SiLU is supported.
partition_size (int, default=1) – Number of process groups across which graphs are distributed. If equal to 1, the model is run in a normal Single-GPU configuration.
partition_group_name (str, default=None) – Name of the process group across which graphs are distributed. If partition_size is set to 1, the model is run in a normal Single-GPU configuration and the specification of a process group is not necessary. If partition_size > 1, passing no process group name leads to a parallelism across the default process group. Otherwise, the group size of a process group is expected to match partition_size.
use_lat_lon_partitioning (bool, default=False) – Flag to specify whether all graphs (grid-to-mesh, mesh, mesh-to-grid) are partitioned based on lat-lon-coordinates of nodes or based on IDs.
expect_partitioned_input (bool, default=False) – Flag indicating whether the model expects the input to be already partitioned. This can be helpful e.g. in multi-step rollouts to avoid aggregating the output just to distribute it in the next step again.
global_features_on_rank_0 (bool, default=False) – Flag indicating whether the model expects the input to be present in its “global” form only on group_rank 0. During the input preparation phase, the model will take care of scattering the input accordingly onto all ranks of the process group across which the graph is partitioned. Note that only either this flag or expect_partitioned_input can be set at a time.
produce_aggregated_output (bool, default=True) – Flag indicating whether the model produces the aggregated output on each rank of the process group across which the graph is distributed or whether the output is kept distributed. This can be helpful e.g. in multi-step rollouts to avoid aggregating the output just to distribute it in the next step again.
produce_aggregated_output_on_all_ranks (bool, default=True) – Flag indicating - if produce_aggregated_output is True - whether the model produces the aggregated output on each rank of the process group across which the group is distributed or only on group_rank 0. This can be helpful for computing the loss using global targets only on a single rank, which avoids having to distribute the computation of the loss function.
Note
Based on these papers:
- “GraphCast: Learning skillful medium-range global weather forecasting”
- “Forecasting Global Weather with Graph Neural Networks”
- “Learning Mesh-Based Simulation with Graph Networks”
- “MultiScale MeshGraphNets”
- “GenCast: Diffusion-based ensemble forecasting for medium-range weather”
- custom_forward(
- grid_nfeat: Tensor,
GraphCast forward method with support for gradient checkpointing.
- Parameters:
grid_nfeat (Tensor) – Node features of the latitude-longitude graph.
- Returns:
grid_nfeat_finale – Predicted node features of the latitude-longitude graph.
- Return type:
Tensor
- decoder_forward(
- mesh_efeat_processed: Tensor,
- mesh_nfeat_processed: Tensor,
- grid_nfeat_encoded: Tensor,
Forward method for the last layer of the processor, the decoder, and the final MLP.
- Parameters:
mesh_efeat_processed (Tensor) – Multimesh edge features processed by the processor.
mesh_nfeat_processed (Tensor) – Multi-mesh node features processed by the processor.
grid_nfeat_encoded (Tensor) – The encoded node features for the latitude-longitude grid.
- Returns:
grid_nfeat_finale – The final node features for the latitude-longitude grid.
- Return type:
Tensor
- encoder_forward(
- grid_nfeat: Tensor,
Forward method for the embedder, encoder, and the first layer of the processor.
- Parameters:
grid_nfeat (Tensor) – Node features for the latitude-longitude grid.
- Returns:
mesh_efeat_processed (Tensor) – Processed edge features for the multimesh.
mesh_nfeat_processed (Tensor) – Processed node features for the multimesh.
grid_nfeat_encoded (Tensor) – Encoded node features for the latitude-longitude grid.
- forward(grid_nfeat: Tensor) Tensor [source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- prepare_input(
- invar: Tensor,
- expect_partitioned_input: bool,
- global_features_on_rank_0: bool,
Prepares the input to the model in the required shape.
- Parameters:
invar (Tensor) – Input in the shape [N, C, H, W].
expect_partitioned_input (bool) – Flag indicating whether the input is partitioned according to the graph partitioning scheme.
global_features_on_rank_0 (bool) – Flag indicating whether input is in its “global” form only on group_rank 0 which requires a scatter operation beforehand. Note that only either this flag or expect_partitioned_input can be set at a time.
- Returns:
Reshaped input.
- Return type:
Tensor
- prepare_output(
- outvar: Tensor,
- produce_aggregated_output: bool,
- produce_aggregated_output_on_all_ranks: bool = True,
Prepares the output of the model in the shape [N, C, H, W].
- Parameters:
outvar (Tensor) – Output of the final MLP of the model.
produce_aggregated_output (bool) – Flag indicating whether the output is gathered onto each rank or kept distributed.
produce_aggregated_output_on_all_ranks (bool) – Flag indicating whether the output is gathered on each rank or only gathered at group_rank 0, True by default and only valid if produce_aggregated_output is set.
- Returns:
The reshaped output of the model.
- Return type:
Tensor
- set_checkpoint_decoder(checkpoint_flag: bool)[source]#
Sets checkpoint function for the last layer of the processor, the decoder, and the final MLP.
This function returns the appropriate checkpoint function based on the provided checkpoint_flag flag. If checkpoint_flag is True, the function returns the checkpoint function from PyTorch’s torch.utils.checkpoint. Otherwise, it returns an identity function that simply passes the inputs through the given layer.
- Parameters:
checkpoint_flag (bool) – Whether to use checkpointing for gradient computation. Checkpointing can reduce memory usage during backpropagation at the cost of increased computation time.
- Returns:
The selected checkpoint function to use for gradient computation.
- Return type:
Callable
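The flag-based selection described here can be sketched as follows (an illustration of the described behavior, not the library's internal code):
>>> import torch
>>> from torch.utils.checkpoint import checkpoint
>>> def select_checkpoint_fn(checkpoint_flag: bool):
...     # Return torch.utils.checkpoint.checkpoint, or an identity wrapper that
...     # simply applies the layer to its inputs.
...     if checkpoint_flag:
...         return checkpoint
...     return lambda layer, *args, **kwargs: layer(*args, **kwargs)
>>> run = select_checkpoint_fn(False)
>>> layer = torch.nn.Linear(4, 4)
>>> out = run(layer, torch.randn(2, 4))   # identity path: layer applied directly
>>> out.shape
torch.Size([2, 4])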
- set_checkpoint_encoder(checkpoint_flag: bool)[source]#
Sets checkpoint function for the embedder, encoder, and the first layer of the processor.
This function returns the appropriate checkpoint function based on the provided checkpoint_flag flag. If checkpoint_flag is True, the function returns the checkpoint function from PyTorch’s torch.utils.checkpoint. Otherwise, it returns an identity function that simply passes the inputs through the given layer.
- Parameters:
checkpoint_flag (bool) – Whether to use checkpointing for gradient computation. Checkpointing can reduce memory usage during backpropagation at the cost of increased computation time.
- Returns:
The selected checkpoint function to use for gradient computation.
- Return type:
Callable
- set_checkpoint_model(checkpoint_flag: bool)[source]#
Sets checkpoint function for the entire model.
This function returns the appropriate checkpoint function based on the provided checkpoint_flag flag. If checkpoint_flag is True, the function returns the checkpoint function from PyTorch’s torch.utils.checkpoint. In this case, all the other gradient checkpointing settings will be disabled. Otherwise, it returns an identity function that simply passes the inputs through the given layer.
- Parameters:
checkpoint_flag (bool) – Whether to use checkpointing for gradient computation. Checkpointing can reduce memory usage during backpropagation at the cost of increased computation time.
- Returns:
The selected checkpoint function to use for gradient computation.
- Return type:
Callable
- set_checkpoint_processor(checkpoint_segments: int)[source]#
Sets checkpoint function for the processor excluding the first and last layers.
This function returns the appropriate checkpoint function based on the provided checkpoint_segments flag. If checkpoint_segments is positive, the function returns the checkpoint function from PyTorch’s torch.utils.checkpoint, with number of checkpointing segments equal to checkpoint_segments. Otherwise, it returns an identity function that simply passes the inputs through the given layer.
- Parameters:
checkpoint_segments (int) – Number of checkpointing segments for gradient computation. Checkpointing can reduce memory usage during backpropagation at the cost of increased computation time.
- Returns:
The selected checkpoint function to use for gradient computation.
- Return type:
Callable
- to(
- *args: Any,
- **kwargs: Any,
Moves the object to the specified device, dtype, or format. This method moves the object and its underlying graph and graph features to the specified device, dtype, or format, and returns the updated object.
- Parameters:
*args (Any) – Positional arguments to be passed to the torch._C._nn._parse_to function.
**kwargs (Any) – Keyword arguments to be passed to the torch._C._nn._parse_to function.
- Returns:
The updated object after moving to the specified device, dtype, or format.
- Return type:
GraphCastNet
- class physicsnemo.models.graphcast.graph_cast_net.MetaData(
- name: str = 'GraphCastNet',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = True,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
- physicsnemo.models.graphcast.graph_cast_net.get_lat_lon_partition_separators(partition_size: int)[source]#
Utility function to get separation intervals for the lat-lon grid for partition_sizes of interest.
- Parameters:
partition_size (int) – size of graph partition
- class physicsnemo.models.fengwu.fengwu.Fengwu(*args, **kwargs)[source]#
Bases:
Module
FengWu: A PyTorch implementation of FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead - https://arxiv.org/pdf/2304.02948.pdf
- Parameters:
img_size – Image size (Lat, Lon). Default: (721, 1440)
pressure_level – Number of pressure levels. Default: 37
embed_dim (int) – Patch embedding dimension. Default: 192
patch_size (tuple[int]) – Patch token size. Default: (4,4)
num_heads (tuple[int]) – Number of attention heads in different layers.
window_size (tuple[int]) – Window size.
- forward(x)[source]#
- Parameters:
surface (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=4.
z (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.
r (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.
u (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.
v (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.
t (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.
- prepare_input(surface, z, r, u, v, t)[source]#
Prepares the input to the model in the required shape.
- Parameters:
surface (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=4.
z (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.
r (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.
u (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.
v (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.
t (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.
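A hedged sketch of how prepare_input and the forward call might fit together. The values for num_heads and window_size and the channel-first tensor layout are illustrative assumptions, not documented defaults; verify them against the implementation.
import torch
from physicsnemo.models.fengwu.fengwu import Fengwu
# Documented defaults where available; num_heads and window_size are assumed.
model = Fengwu(
    img_size=(721, 1440),
    pressure_level=37,
    embed_dim=192,
    patch_size=(4, 4),
    num_heads=(6, 12, 12, 6),   # assumed value
    window_size=(7, 12),        # assumed value
)
# Channel-first layout without a batch dimension is an assumption.
surface = torch.randn(4, 721, 1440)   # 4 surface channels
z = torch.randn(37, 721, 1440)        # 37 pressure levels
r = torch.randn(37, 721, 1440)
u = torch.randn(37, 721, 1440)
v = torch.randn(37, 721, 1440)
t = torch.randn(37, 721, 1440)
x = model.prepare_input(surface, z, r, u, v, t)
out = model(x)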
- class physicsnemo.models.fengwu.fengwu.MetaData(
- name: str = 'Fengwu',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = True,
- onnx_cpu: bool = False,
- onnx_runtime: bool = True,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
- class physicsnemo.models.pangu.pangu.MetaData(
- name: str = 'Pangu',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = True,
- onnx_cpu: bool = False,
- onnx_runtime: bool = True,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
- class physicsnemo.models.pangu.pangu.Pangu(*args, **kwargs)[source]#
Bases:
Module
A PyTorch implementation of Pangu-Weather: A 3D High-Resolution Model for Fast and Accurate Global Weather Forecast - https://arxiv.org/abs/2211.02556
- Parameters:
img_size (tuple[int]) – Image size [Lat, Lon].
patch_size (tuple[int]) – Patch token size [Lat, Lon].
embed_dim (int) – Patch embedding dimension. Default: 192
num_heads (tuple[int]) – Number of attention heads in different layers.
window_size (tuple[int]) – Window size.
- prepare_input(surface, surface_mask, upper_air)[source]#
Prepares the input to the model in the required shape.
- Parameters:
surface (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=4.
surface_mask (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=3.
upper_air (torch.Tensor) – 3D n_pl=13, n_lat=721, n_lon=1440, chans=5.
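A hedged usage sketch for Pangu. The constructor values, the channel-first tensor layout, and the final forward call are illustrative assumptions drawn from the documented parameter names, not verified defaults.
import torch
from physicsnemo.models.pangu.pangu import Pangu
# All constructor values below are assumptions for illustration only.
model = Pangu(
    img_size=(721, 1440),
    patch_size=(4, 4),
    embed_dim=192,
    num_heads=(6, 12, 12, 6),
    window_size=(2, 6, 12),
)
surface = torch.randn(4, 721, 1440)        # chans=4 (assumed channel-first layout)
surface_mask = torch.randn(3, 721, 1440)   # chans=3
upper_air = torch.randn(5, 13, 721, 1440)  # chans=5 over 13 pressure levels
x = model.prepare_input(surface, surface_mask, upper_air)
out = model(x)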
- class physicsnemo.models.swinvrnn.swinvrnn.MetaData(
- name: str = 'SwinRNN',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = True,
- onnx_cpu: bool = False,
- onnx_runtime: bool = True,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
Bases:
ModelMetaData
- class physicsnemo.models.swinvrnn.swinvrnn.SwinRNN(*args, **kwargs)[source]#
Bases:
Module
Implementation of SwinRNN - https://arxiv.org/abs/2205.13158
- Parameters:
img_size (Sequence[int], optional) – Image size [T, Lat, Lon].
patch_size (Sequence[int], optional) – Patch token size [T, Lat, Lon].
in_chans (int, optional) – Number of input channels.
out_chans (int, optional) – Number of output channels.
embed_dim (int, optional) – Number of embed channels.
num_groups (Sequence[int] | int, optional) – Number of groups to separate the channels into.
num_heads (int, optional) – Number of attention heads.
window_size (int | tuple[int], optional) – Local window size.
- forward(x: Tensor)[source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
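As the note says, the module instance should be called rather than forward() directly. A minimal sketch follows; every constructor value and the input layout are illustrative assumptions, not documented defaults.
import torch
from physicsnemo.models.swinvrnn.swinvrnn import SwinRNN
# All values below are assumptions for illustration only.
model = SwinRNN(
    img_size=(2, 32, 64),   # [T, Lat, Lon]
    patch_size=(1, 1, 1),
    in_chans=4,
    out_chans=4,
    embed_dim=128,
)
x = torch.randn(1, 4, 2, 32, 64)  # assumed (batch, channels, T, Lat, Lon) layout
# Call the instance, not model.forward(x), so registered hooks run.
y = model(x)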