PhysicsNeMo Models#

Basics#

PhysicsNeMo contains its own Model class for constructing neural networks. This model class is built on top of PyTorch’s nn.Module and can be used interchangeably within the PyTorch ecosystem. Using PhysicsNeMo models allows you to leverage various features of PhysicsNeMo aimed at improving performance and ease of use. These features include, but are not limited to, model zoo, automatic mixed-precision, CUDA Graphs, and easy checkpointing. We discuss each of these features in the following sections.

Model Zoo#

PhysicsNeMo contains several optimized, customizable, and easy-to-use models. These include general-purpose models such as Fourier Neural Operators (FNOs), ResNet, and Graph Neural Networks (GNNs), as well as domain-specific models such as Deep Learning Weather Prediction (DLWP) and Spherical Fourier Neural Operators (SFNO).

For a list of currently available models, please refer to the models on GitHub.

Below are some simple examples of how to use these models.

>>> import torch
>>> from physicsnemo.models.mlp.fully_connected import FullyConnected
>>> model = FullyConnected(in_features=32, out_features=64)
>>> input = torch.randn(128, 32)
>>> output = model(input)
>>> output.shape
torch.Size([128, 64])
>>> import torch
>>> from physicsnemo.models.fno.fno import FNO
>>> model = FNO(
...     in_channels=4,
...     out_channels=3,
...     decoder_layers=2,
...     decoder_layer_size=32,
...     dimension=2,
...     latent_channels=32,
...     num_fno_layers=2,
...     padding=0,
... )
>>> input = torch.randn(32, 4, 32, 32) #(N, C, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([32, 3, 32, 32])

How to write your own PhysicsNeMo model#

There are a few different ways to construct a PhysicsNeMo model. If you are a seasoned PyTorch user, the easiest way is to write your model using the optimized layers and utilities from PhysicsNeMo or PyTorch. Let’s look at a simple UNet model, first as a plain PyTorch implementation and then as a PhysicsNeMo implementation that supports CUDA Graphs and Automatic Mixed-Precision (AMP).

import torch.nn as nn

class UNet(nn.Module):
    def __init__(self, in_channels=1, out_channels=1):
        super(UNet, self).__init__()

        self.enc1 = self.conv_block(in_channels, 64)
        self.enc2 = self.conv_block(64, 128)

        self.dec1 = self.upconv_block(128, 64)
        self.final = nn.Conv2d(64, out_channels, kernel_size=1)

    def conv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2)
        )

    def upconv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, 2, stride=2),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x1 = self.enc1(x)
        x2 = self.enc2(x1)
        x = self.dec1(x2)
        return self.final(x)

Now we show this model rewritten in PhysicsNeMo. First, subclass the model from physicsnemo.Module instead of torch.nn.Module. The physicsnemo.Module class acts as a drop-in replacement for torch.nn.Module and provides additional functionality, such as saving and loading checkpoints. Refer to the API docs of physicsnemo.Module for further details. Additionally, we add metadata to the model to capture the optimizations that this model supports. In this case we enable CUDA Graphs and Automatic Mixed-Precision.

from dataclasses import dataclass
import physicsnemo
import torch.nn as nn

@dataclass
class UNetMetaData(physicsnemo.ModelMetaData):
    name: str = "UNet"
    # Optimization
    jit: bool = True
    cuda_graphs: bool = True
    amp_cpu: bool = True
    amp_gpu: bool = True

class UNet(physicsnemo.Module):
    def __init__(self, in_channels=1, out_channels=1):
        super(UNet, self).__init__(meta=UNetMetaData())

        self.enc1 = self.conv_block(in_channels, 64)
        self.enc2 = self.conv_block(64, 128)

        self.dec1 = self.upconv_block(128, 64)
        self.final = nn.Conv2d(64, out_channels, kernel_size=1)

    def conv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2)
        )

    def upconv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, 2, stride=2),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x1 = self.enc1(x)
        x2 = self.enc2(x1)
        x = self.dec1(x2)
        return self.final(x)

Now that we have our PhysicsNeMo model, we can make use of these optimizations using the physicsnemo.utils.StaticCaptureTraining decorator. This decorator captures the training step function and applies the optimizations declared in the model’s metadata.

import torch
from physicsnemo.utils import StaticCaptureTraining

model = UNet().to("cuda")
input = torch.randn(8, 1, 128, 128).to("cuda")
output = torch.zeros(8, 1, 64, 64).to("cuda")

optim = torch.optim.Adam(model.parameters(), lr=0.001)

# Create training step function with optimization wrapper
# StaticCaptureTraining calls `backward` on the loss and
# `optimizer.step()` so you don't have to do that
# explicitly.
@StaticCaptureTraining(
    model=model,
    optim=optim,
    cuda_graph_warmup=11,
)
def training_step(invar, outvar):
    predvar = model(invar)
    loss = torch.sum(torch.pow(predvar - outvar, 2))
    return loss

# Sample training loop
for i in range(20):
    # In place copy of input and output to support cuda graphs
    input.copy_(torch.randn(8, 1, 128, 128).to("cuda"))
    output.copy_(torch.zeros(8, 1, 64, 64).to("cuda"))

    # Run training step
    loss = training_step(input, output)

For the simple model above, you can observe ~1.1x speed-up due to CUDA Graphs and AMP. The speed-up observed changes from model to model and is typically greater for more complex models.
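
PhysicsNeMo also provides a companion decorator for inference, physicsnemo.utils.StaticCaptureEvaluateNoGrad, which wraps a forward-only step in the same way. The sketch below assumes it accepts the same model and cuda_graph_warmup arguments as the training decorator; check the API docs for the exact signature.

import torch
from physicsnemo.utils import StaticCaptureEvaluateNoGrad

model = UNet().to("cuda")
input = torch.randn(8, 1, 128, 128).to("cuda")

# Assumed to mirror StaticCaptureTraining: runs the wrapped function without
# gradients and applies CUDA Graphs / AMP where the model metadata allows it.
@StaticCaptureEvaluateNoGrad(model=model, cuda_graph_warmup=11)
def eval_step(invar):
    return model(invar)

for i in range(20):
    # In-place copy of the input keeps tensor addresses static for CUDA Graphs
    input.copy_(torch.randn(8, 1, 128, 128).to("cuda"))
    pred = eval_step(input)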

Note

The ModelMetaData and physicsnemo.Module do not automatically make a model support optimizations such as CUDA Graphs and AMP. The user is responsible for writing model code that enables each of these optimizations. Models in the PhysicsNeMo Model Zoo are written to support many of these optimizations and are checked in PhysicsNeMo’s CI to ensure that they work correctly.
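
As a rough illustration of what this responsibility means in practice, the hypothetical block below avoids data-dependent Python control flow and CPU/GPU synchronization (for example, calls to .item()), both of which prevent CUDA Graph capture; the remaining tensor operations are also safe under autocast for AMP.

import torch
import torch.nn as nn

class GraphFriendlyBlock(nn.Module):
    """Hypothetical block illustrating CUDA Graph / AMP friendly patterns."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Avoid branches such as `if x.max().item() > 0:`; .item() forces a
        # device synchronization, and data-dependent control flow cannot be
        # captured in a CUDA Graph.
        y = self.conv(x)
        # Plain tensor ops work under both graph capture and torch.autocast.
        return torch.relu(y)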

Note

The StaticCaptureTraining decorator is still under development and may be refactored in the future.

Converting PyTorch Models to PhysicsNeMo Models#

The example above shows how to construct a PhysicsNeMo model from scratch. However, you can also convert existing PyTorch models to PhysicsNeMo models in order to leverage PhysicsNeMo features. To do this, use the Module.from_torch method as shown below.

from dataclasses import dataclass
import physicsnemo
import torch.nn as nn

class TorchModel(nn.Module):
    def __init__(self):
        super(TorchModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = self.conv1(x)
        return self.conv2(x)

@dataclass
class ConvMetaData(physicsnemo.ModelMetaData):
    name: str = "ConvModel"
    # Optimization
    jit: bool = True
    cuda_graphs: bool = True
    amp_cpu: bool = True
    amp_gpu: bool = True

PhysicsNeMoModel = physicsnemo.Module.from_torch(TorchModel, meta=ConvMetaData())
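
Once converted, the resulting class behaves like any other PhysicsNeMo module; a minimal sketch (the expected output size simply reflects the two 5x5 convolutions without padding):

import torch

model = PhysicsNeMoModel()                 # same constructor arguments as TorchModel
output = model(torch.randn(1, 1, 32, 32))  # expected shape: (1, 20, 24, 24)
model.save("conv_model.mdlus")             # checkpointing utilities described below also work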

Saving and Loading PhysicsNeMo Models#

As mentioned above, PhysicsNeMo models are interoperable with PyTorch models. This means that you can save and load PhysicsNeMo models using the standard PyTorch APIs; however, PhysicsNeMo provides a few additional utilities to make this process easier. A key challenge in saving and loading models is keeping track of model metadata such as layer sizes. PhysicsNeMo models can be saved together with this metadata to a custom .mdlus file, which allows for easy loading and instantiation of the model. We show two examples of this below. The first example shows saving and loading weights from an already instantiated model.

 >>> from physicsnemo.models.mlp.fully_connected import FullyConnected
 >>> model = FullyConnected(in_features=32, out_features=64)
 >>> model.save("model.mdlus") # Save model to .mdlus file
 >>> model.load("model.mdlus") # Load model weights from .mdlus file from already instantiated model
 >>> model
 FullyConnected(
  (layers): ModuleList(
    (0): FCLayer(
      (activation_fn): SiLU()
      (linear): Linear(in_features=32, out_features=512, bias=True)
    )
    (1-5): 5 x FCLayer(
      (activation_fn): SiLU()
      (linear): Linear(in_features=512, out_features=512, bias=True)
    )
  )
  (final_layer): FCLayer(
    (activation_fn): Identity()
    (linear): Linear(in_features=512, out_features=64, bias=True)
  )
)

The second example shows loading a model from a .mdlus file without instantiating the model first. In this case we do not need to know the model’s class or the parameters to pass to its constructor; the model can be instantiated directly from the .mdlus file.

 >>> from physicsnemo import Module
 >>> fc_model = Module.from_checkpoint("model.mdlus") # Instantiate model from .mdlus file.
 >>> fc_model
 FullyConnected(
  (layers): ModuleList(
    (0): FCLayer(
      (activation_fn): SiLU()
      (linear): Linear(in_features=32, out_features=512, bias=True)
    )
    (1-5): 5 x FCLayer(
      (activation_fn): SiLU()
      (linear): Linear(in_features=512, out_features=512, bias=True)
    )
  )
  (final_layer): FCLayer(
    (activation_fn): Identity()
    (linear): Linear(in_features=512, out_features=64, bias=True)
  )
)

Note

To make use of this functionality, the arguments to the model’s __init__ function must be JSON-serializable. It is highly recommended that all PhysicsNeMo models be developed with this requirement in mind.
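
As a rough sketch of what this requirement means in practice (hypothetical model names), constructor arguments should be plain Python values such as ints, floats, strings, bools, or lists of those:

import torch
import physicsnemo

class GoodModel(physicsnemo.Module):
    """Hypothetical model whose constructor arguments serialize cleanly to JSON."""

    def __init__(self, hidden_size: int = 64, activation: str = "relu"):
        super().__init__()
        self.hidden_size = hidden_size
        self.activation = activation

class ProblematicModel(physicsnemo.Module):
    """Hypothetical model: a torch.Tensor constructor argument cannot be stored
    in the .mdlus metadata, so Module.from_checkpoint cannot rebuild the model."""

    def __init__(self, initial_weight: torch.Tensor):
        super().__init__()
        self.register_buffer("w", initial_weight)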

Note

Using Module.from_checkpoint will not work if the model has any buffers or parameters that are registered outside of the model’s __init__ function, because from_checkpoint must be able to reconstruct the model from its constructor arguments alone before loading the saved state. In that case, use Module.load, or ensure that all model parameters and buffers are registered inside __init__.
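
A minimal sketch of this pitfall, using a hypothetical model: a buffer created lazily in forward does not exist on a freshly constructed instance, so a checkpoint that contains it can no longer be loaded through Module.from_checkpoint.

import torch
import physicsnemo

class LazyBufferModel(physicsnemo.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("scale", torch.ones(1))  # fine: exists right after construction

    def forward(self, x):
        if not hasattr(self, "running_max"):
            # Problematic: this buffer only appears after the first forward call,
            # so Module.from_checkpoint cannot recreate it before loading weights.
            self.register_buffer("running_max", x.detach().amax().reshape(1))
        return x * self.scale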

PhysicsNeMo Model Registry and Entry Points#

PhysicsNeMo contains a model registry that allows for easy access and ingestion of models. Below is a simple example of how to use the model registry to obtain a model class.

>>> from physicsnemo.registry import ModelRegistry
>>> model_registry = ModelRegistry()
>>> model_registry.list_models()
['AFNO', 'DLWP', 'FNO', 'FullyConnected', 'GraphCastNet', 'MeshGraphNet', 'One2ManyRNN', 'Pix2Pix', 'SFNO', 'SRResNet']
>>> FullyConnected = model_registry.factory("FullyConnected")
>>> model = FullyConnected(in_features=32, out_features=64)

The model registry also allows models to be exposed via entry points, which lets third-party packages integrate their models into the PhysicsNeMo ecosystem. For example, suppose you have a package MyPackage that contains a model MyModel. You can expose this model to the PhysicsNeMo registry by adding an entry point to your pyproject.toml. Suppose your package structure is as follows:

# setup.py

from setuptools import setup, find_packages

setup()
# pyproject.toml

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "MyPackage"
description = "My Neural Network Zoo."
version = "0.1.0"

[project.entry-points."physicsnemo.models"]
MyPhysicsNeMoModel = "mypackage.models.MyPhysicsNeMoModel:MyPhysicsNeMoModel"
# mypackage/models.py

import torch.nn as nn
from physicsnemo.models import Module

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = self.conv1(x)
        return self.conv2(x)

MyPhysicsNeMoModel = Module.from_torch(MyModel)

Once this package is installed, you can access the model via the PhysicsNeMo model registry.

>>> from physicsnemo.registry import ModelRegistry
>>> model_registry = ModelRegistry()
>>> model_registry.list_models()
['MyPhysicsNeMoModel', 'AFNO', 'DLWP', 'FNO', 'FullyConnected', 'GraphCastNet', 'MeshGraphNet', 'One2ManyRNN', 'Pix2Pix', 'SFNO', 'SRResNet']
>>> MyPhysicsNeMoModel = model_registry.factory("MyPhysicsNeMoModel")

For more information on entry points and potential use cases, see this blog post.

Fully Connected Network#

class physicsnemo.models.mlp.fully_connected.FullyConnected(*args, **kwargs)[source]#

Bases: Module

A densely-connected MLP architecture

Parameters:
  • in_features (int, optional) – Size of input features, by default 512

  • layer_size (int, optional) – Size of every hidden layer, by default 512

  • out_features (int, optional) – Size of output features, by default 512

  • num_layers (int, optional) – Number of hidden layers, by default 6

  • activation_fn (Union[str, List[str]], optional) – Activation function to use, by default ‘silu’

  • skip_connections (bool, optional) – Add skip connections every 2 hidden layers, by default False

  • adaptive_activations (bool, optional) – Use an adaptive activation function, by default False

  • weight_norm (bool, optional) – Use weight norm on fully connected layers, by default False

  • weight_fact (bool, optional) – Use weight factorization on fully connected layers, by default False

Example

>>> model = physicsnemo.models.mlp.FullyConnected(in_features=32, out_features=64)
>>> input = torch.randn(128, 32)
>>> output = model(input)
>>> output.size()
torch.Size([128, 64])
forward(x: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.mlp.fully_connected.MetaData(
name: str = 'FullyConnected',
jit: bool = True,
cuda_graphs: bool = True,
amp: bool = True,
amp_cpu: bool = None,
amp_gpu: bool = None,
torch_fx: bool = True,
bf16: bool = False,
onnx: bool = True,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = True,
trt: bool = False,
var_dim: int = -1,
func_torch: bool = True,
auto_grad: bool = True,
)[source]#

Bases: ModelMetaData

Fourier Neural Operators#

class physicsnemo.models.fno.fno.FNO(*args, **kwargs)[source]#

Bases: Module

Fourier neural operator (FNO) model.

Note

The FNO architecture supports options for 1D, 2D, 3D, and 4D fields, which can be controlled using the dimension parameter.

Parameters:
  • in_channels (int) – Number of input channels

  • out_channels (int) – Number of output channels

  • decoder_layers (int, optional) – Number of decoder layers, by default 1

  • decoder_layer_size (int, optional) – Number of neurons in decoder layers, by default 32

  • decoder_activation_fn (str, optional) – Activation function for decoder, by default “silu”

  • dimension (int) – Model dimensionality (supports 1, 2, 3, 4).

  • latent_channels (int, optional) – Latent features size in spectral convolutions, by default 32

  • num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4

  • num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16

  • padding (int, optional) – Domain padding for spectral convolutions, by default 8

  • padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant”

  • activation_fn (str, optional) – Activation function, by default “gelu”

  • coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True

Example

>>> # define the 2d FNO model
>>> model = physicsnemo.models.fno.FNO(
...     in_channels=4,
...     out_channels=3,
...     decoder_layers=2,
...     decoder_layer_size=32,
...     dimension=2,
...     latent_channels=32,
...     num_fno_layers=2,
...     padding=0,
... )
>>> input = torch.randn(32, 4, 32, 32) #(N, C, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([32, 3, 32, 32])

Note

Reference: Li, Zongyi, et al. “Fourier neural operator for parametric partial differential equations.” arXiv preprint arXiv:2010.08895 (2020).

forward(x: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.fno.fno.FNO1DEncoder(
in_channels: int = 1,
num_fno_layers: int = 4,
fno_layer_size: int = 32,
num_fno_modes: int | List[int] = 16,
padding: int | List[int] = 8,
padding_type: str = 'constant',
activation_fn: Module = GELU(approximate='none'),
coord_features: bool = True,
)[source]#

Bases: Module

1D Spectral encoder for FNO

Parameters:
  • in_channels (int, optional) – Number of input channels, by default 1

  • num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4

  • fno_layer_size (int, optional) – Latent features size in spectral convolutions, by default 32

  • num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16

  • padding (Union[int, List[int]], optional) – Domain padding for spectral convolutions, by default 8

  • padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant”

  • activation_fn (nn.Module, optional) – Activation function, by default nn.GELU

  • coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True

build_fno(num_fno_modes: List[int]) None[source]#

Construct the FNO block.

Parameters:

num_fno_modes (List[int]) – Number of Fourier modes kept in spectral convolutions

build_lift_network() None[source]#

Construct the network for lifting variables to the latent space.

forward(x: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

grid_to_points(
value: Tensor,
) Tuple[Tensor, List[int]][source]#

Convert from a grid-based (image) representation to a point-based representation

Parameters:

value (Meshgrid tensor)

Returns:

Tensor, meshgrid shape

Return type:

Tuple

meshgrid(
shape: List[int],
device: device,
) Tensor[source]#

Creates 1D meshgrid feature

Parameters:
  • shape (List[int]) – Tensor shape

  • device (torch.device) – Device model is on

Returns:

Meshgrid tensor

Return type:

Tensor

points_to_grid(
value: Tensor,
shape: List[int],
) Tensor[source]#

Convert from a point-based representation to a grid-based (image) representation

Parameters:
  • value (Tensor) – Tensor

  • shape (List[int]) – meshgrid shape

Returns:

Meshgrid tensor

Return type:

Tensor

class physicsnemo.models.fno.fno.FNO2DEncoder(
in_channels: int = 1,
num_fno_layers: int = 4,
fno_layer_size: int = 32,
num_fno_modes: int | List[int] = 16,
padding: int | List[int] = 8,
padding_type: str = 'constant',
activation_fn: Module = GELU(approximate='none'),
coord_features: bool = True,
)[source]#

Bases: Module

2D Spectral encoder for FNO

Parameters:
  • in_channels (int, optional) – Number of input channels, by default 1

  • num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4

  • fno_layer_size (int, optional) – Latent features size in spectral convolutions, by default 32

  • num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16

  • padding (Union[int, List[int]], optional) – Domain padding for spectral convolutions, by default 8

  • padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant”

  • activation_fn (nn.Module, optional) – Activation function, by default nn.GELU

  • coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True
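
The spectral encoders can also be used on their own to produce latent features; a minimal sketch, assuming the channel-first (N, C, H, W) layout used by FNO (the latent output carries fno_layer_size channels):

>>> import torch
>>> from physicsnemo.models.fno.fno import FNO2DEncoder
>>> encoder = FNO2DEncoder(in_channels=3, num_fno_layers=2, fno_layer_size=16, padding=0)
>>> input = torch.randn(2, 3, 32, 32) #(N, C, H, W)
>>> latent = encoder(input) # latent features with fno_layer_size channels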

build_fno(num_fno_modes: List[int]) None[source]#

Construct the FNO block.

Parameters:

num_fno_modes (List[int]) – Number of Fourier modes kept in spectral convolutions

build_lift_network() None[source]#

Construct the network for lifting variables to the latent space.

forward(x: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

grid_to_points(
value: Tensor,
) Tuple[Tensor, List[int]][source]#

Convert from a grid-based (image) representation to a point-based representation

Parameters:

value (Meshgrid tensor)

Returns:

Tensor, meshgrid shape

Return type:

Tuple

meshgrid(
shape: List[int],
device: device,
) Tensor[source]#

Creates 2D meshgrid feature

Parameters:
  • shape (List[int]) – Tensor shape

  • device (torch.device) – Device model is on

Returns:

Meshgrid tensor

Return type:

Tensor

points_to_grid(
value: Tensor,
shape: List[int],
) Tensor[source]#

Convert from a point-based representation to a grid-based (image) representation

Parameters:
  • value (Tensor) – Tensor

  • shape (List[int]) – meshgrid shape

Returns:

Meshgrid tensor

Return type:

Tensor

class physicsnemo.models.fno.fno.FNO3DEncoder(
in_channels: int = 1,
num_fno_layers: int = 4,
fno_layer_size: int = 32,
num_fno_modes: int | List[int] = 16,
padding: int | List[int] = 8,
padding_type: str = 'constant',
activation_fn: Module = GELU(approximate='none'),
coord_features: bool = True,
)[source]#

Bases: Module

3D Spectral encoder for FNO

Parameters:
  • in_channels (int, optional) – Number of input channels, by default 1

  • num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4

  • fno_layer_size (int, optional) – Latent features size in spectral convolutions, by default 32

  • num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16

  • padding (Union[int, List[int]], optional) – Domain padding for spectral convolutions, by default 8

  • padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant”

  • activation_fn (nn.Module, optional) – Activation function, by default nn.GELU

  • coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True

build_fno(num_fno_modes: List[int]) None[source]#

Construct the FNO block.

Parameters:

num_fno_modes (List[int]) – Number of Fourier modes kept in spectral convolutions

build_lift_network() None[source]#

Construct the network for lifting variables to the latent space.

forward(x: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

grid_to_points(
value: Tensor,
) Tuple[Tensor, List[int]][source]#

Convert from a grid-based (image) representation to a point-based representation

Parameters:

value (Meshgrid tensor)

Returns:

Tensor, meshgrid shape

Return type:

Tuple

meshgrid(
shape: List[int],
device: device,
) Tensor[source]#

Creates 3D meshgrid feature

Parameters:
  • shape (List[int]) – Tensor shape

  • device (torch.device) – Device model is on

Returns:

Meshgrid tensor

Return type:

Tensor

points_to_grid(
value: Tensor,
shape: List[int],
) Tensor[source]#

Convert from a point-based representation to a grid-based (image) representation

Parameters:
  • value (Tensor) – Tensor

  • shape (List[int]) – meshgrid shape

Returns:

Meshgrid tensor

Return type:

Tensor

class physicsnemo.models.fno.fno.FNO4DEncoder(
in_channels: int = 1,
num_fno_layers: int = 4,
fno_layer_size: int = 32,
num_fno_modes: int | List[int] = 16,
padding: int | List[int] = 8,
padding_type: str = 'constant',
activation_fn: Module = GELU(approximate='none'),
coord_features: bool = True,
)[source]#

Bases: Module

4D Spectral encoder for FNO

Parameters:
  • in_channels (int, optional) – Number of input channels, by default 1

  • num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4

  • fno_layer_size (int, optional) – Latent features size in spectral convolutions, by default 32

  • num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16

  • padding (Union[int, List[int]], optional) – Domain padding for spectral convolutions, by default 8

  • padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant”

  • activation_fn (nn.Module, optional) – Activation function, by default nn.GELU

  • coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True

build_fno(num_fno_modes: List[int]) None[source]#

Construct the FNO block.

Parameters:

num_fno_modes (List[int]) – Number of Fourier modes kept in spectral convolutions

build_lift_network() None[source]#

Construct the network for lifting variables to the latent space.

forward(x: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

grid_to_points(
value: Tensor,
) Tuple[Tensor, List[int]][source]#

Convert from a grid-based (image) representation to a point-based representation

Parameters:

value (Meshgrid tensor)

Returns:

Tensor, meshgrid shape

Return type:

Tuple

meshgrid(
shape: List[int],
device: device,
) Tensor[source]#

Creates 4D meshgrid feature

Parameters:
  • shape (List[int]) – Tensor shape

  • device (torch.device) – Device model is on

Returns:

Meshgrid tensor

Return type:

Tensor

points_to_grid(
value: Tensor,
shape: List[int],
) Tensor[source]#

Convert from a point-based representation to a grid-based (image) representation

Parameters:
  • value (Tensor) – Tensor

  • shape (List[int]) – meshgrid shape

Returns:

Meshgrid tensor

Return type:

Tensor

class physicsnemo.models.fno.fno.MetaData(
name: str = 'FourierNeuralOperator',
jit: bool = True,
cuda_graphs: bool = True,
amp: bool = False,
amp_cpu: bool = None,
amp_gpu: bool = None,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = False,
onnx_cpu: bool = False,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = 1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

class physicsnemo.models.afno.afno.AFNO(*args, **kwargs)[source]#

Bases: Module

Adaptive Fourier neural operator (AFNO) model.

Note

AFNO is a model that is designed for 2D images only.

Parameters:
  • inp_shape (List[int]) – Input image dimensions [height, width]

  • in_channels (int) – Number of input channels

  • out_channels (int) – Number of output channels

  • patch_size (List[int], optional) – Size of image patches, by default [16, 16]

  • embed_dim (int, optional) – Embedded channel size, by default 256

  • depth (int, optional) – Number of AFNO layers, by default 4

  • mlp_ratio (float, optional) – Ratio of layer MLP latent variable size to input feature size, by default 4.0

  • drop_rate (float, optional) – Drop out rate in layer MLPs, by default 0.0

  • num_blocks (int, optional) – Number of blocks in the block-diag frequency weight matrices, by default 16

  • sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01

  • hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1

Example

>>> model = physicsnemo.models.afno.AFNO(
...     inp_shape=[32, 32],
...     in_channels=2,
...     out_channels=1,
...     patch_size=(8, 8),
...     embed_dim=16,
...     depth=2,
...     num_blocks=2,
... )
>>> input = torch.randn(32, 2, 32, 32) #(N, C, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([32, 1, 32, 32])

Note

Reference: Guibas, John, et al. “Adaptive fourier neural operators: Efficient token mixers for transformers.” arXiv preprint arXiv:2111.13587 (2021).

forward(x: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

forward_features(x: Tensor) Tensor[source]#

Forward pass of core AFNO

class physicsnemo.models.afno.afno.AFNO2DLayer(
hidden_size: int,
num_blocks: int = 8,
sparsity_threshold: float = 0.01,
hard_thresholding_fraction: float = 1,
hidden_size_factor: int = 1,
)[source]#

Bases: Module

AFNO spectral convolution layer

Parameters:
  • hidden_size (int) – Feature dimensionality

  • num_blocks (int, optional) – Number of blocks used in the block diagonal weight matrix, by default 8

  • sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01

  • hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1

  • hidden_size_factor (int, optional) – Factor to increase spectral features by after weight multiplication, by default 1

forward(x: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.afno.afno.AFNOMlp(
in_features: int,
latent_features: int,
out_features: int,
activation_fn: Module = GELU(approximate='none'),
drop: float = 0.0,
)[source]#

Bases: Module

Fully-connected multi-layer perceptron used inside AFNO

Parameters:
  • in_features (int) – Input feature size

  • latent_features (int) – Latent feature size

  • out_features (int) – Output feature size

  • activation_fn (nn.Module, optional) – Activation function, by default nn.GELU

  • drop (float, optional) – Drop out rate, by default 0.0
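
Since this block is a plain MLP applied over the last (feature) dimension, it can be exercised directly; a minimal sketch:

>>> import torch
>>> from physicsnemo.models.afno.afno import AFNOMlp
>>> mlp = AFNOMlp(in_features=16, latent_features=64, out_features=16)
>>> input = torch.randn(4, 10, 16) # (..., in_features)
>>> output = mlp(input) # same leading dims, out_features in the last dimension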

forward(x: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.afno.afno.Block(
embed_dim: int,
num_blocks: int = 8,
mlp_ratio: float = 4.0,
drop: float = 0.0,
activation_fn: ~torch.nn.modules.module.Module = GELU(approximate='none'),
norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.normalization.LayerNorm'>,
double_skip: bool = True,
sparsity_threshold: float = 0.01,
hard_thresholding_fraction: float = 1.0,
)[source]#

Bases: Module

AFNO block, spectral convolution and MLP

Parameters:
  • embed_dim (int) – Embedded feature dimensionality

  • num_blocks (int, optional) – Number of blocks used in the block diagonal weight matrix, by default 8

  • mlp_ratio (float, optional) – Ratio of MLP latent variable size to input feature size, by default 4.0

  • drop (float, optional) – Drop out rate in MLP, by default 0.0

  • activation_fn (nn.Module, optional) – Activation function used in MLP, by default nn.GELU

  • norm_layer (nn.Module, optional) – Normalization function, by default nn.LayerNorm

  • double_skip (bool, optional) – Residual, by default True

  • sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01

  • hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1

forward(x: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.afno.afno.MetaData(
name: str = 'AFNO',
jit: bool = False,
cuda_graphs: bool = True,
amp: bool = True,
amp_cpu: bool = None,
amp_gpu: bool = None,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = True,
onnx_cpu: bool = False,
onnx_runtime: bool = True,
trt: bool = False,
var_dim: int = 1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

class physicsnemo.models.afno.afno.PatchEmbed(
inp_shape: List[int],
in_channels: int,
patch_size: List[int] = [16, 16],
embed_dim: int = 256,
)[source]#

Bases: Module

Patch embedding layer

Converts 2D patch into a 1D vector for input to AFNO

Parameters:
  • inp_shape (List[int]) – Input image dimensions [height, width]

  • in_channels (int) – Number of input channels

  • patch_size (List[int], optional) – Size of image patches, by default [16, 16]

  • embed_dim (int, optional) – Embedded channel size, by default 256
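
A minimal usage sketch (with a 32 x 32 input and 16 x 16 patches, the expected output is one embedding vector per patch, i.e. 4 patches):

>>> import torch
>>> from physicsnemo.models.afno.afno import PatchEmbed
>>> embed = PatchEmbed(inp_shape=[32, 32], in_channels=2, patch_size=[16, 16], embed_dim=64)
>>> input = torch.randn(8, 2, 32, 32) #(N, C, H, W)
>>> patches = embed(input) # expected shape: (8, 4, 64) -> (N, num_patches, embed_dim)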

forward(x: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.afno.modafno.Block(
embed_dim: int,
mod_dim: int,
num_blocks: int = 8,
mlp_ratio: float = 4.0,
drop: float = 0.0,
activation_fn: ~torch.nn.modules.module.Module = GELU(approximate='none'),
norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.normalization.LayerNorm'>,
double_skip: bool = True,
sparsity_threshold: float = 0.01,
hard_thresholding_fraction: float = 1.0,
modulate_filter: bool = True,
modulate_mlp: bool = True,
scale_shift_mode: ~typing.Literal['complex',
'real'] = 'real',
)[source]#

Bases: Module

AFNO block, spectral convolution and MLP

Parameters:
  • embed_dim (int) – Embedded feature dimensionality

  • mod_dim (int) – Modulation input dimensionality

  • num_blocks (int, optional) – Number of blocks used in the block diagonal weight matrix, by default 8

  • mlp_ratio (float, optional) – Ratio of MLP latent variable size to input feature size, by default 4.0

  • drop (float, optional) – Drop out rate in MLP, by default 0.0

  • activation_fn (nn.Module, optional) – Activation function used in MLP, by default nn.GELU

  • norm_layer (nn.Module, optional) – Normalization function, by default nn.LayerNorm

  • double_skip (bool, optional) – Residual, by default True

  • sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01

  • hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1

  • modulate_filter (bool, optional) – Whether to compute the modulation for the FFT filter

  • modulate_mlp (bool, optional) – Whether to compute the modulation for the MLP

  • scale_shift_mode (["complex", "real"]) – If ‘complex’, compute the scale-shift operation using complex operations. If ‘real’ (the default for this block), use real operations.

forward(
x: Tensor,
mod_embed: Tensor,
) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.afno.modafno.MetaData(
name: str = 'ModAFNO',
jit: bool = False,
cuda_graphs: bool = True,
amp: bool = True,
amp_cpu: bool = None,
amp_gpu: bool = None,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = True,
onnx_cpu: bool = False,
onnx_runtime: bool = True,
trt: bool = False,
var_dim: int = 1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

class physicsnemo.models.afno.modafno.ModAFNO(*args, **kwargs)[source]#

Bases: Module

Modulated Adaptive Fourier neural operator (ModAFNO) model.

Parameters:
  • inp_shape (List[int]) – Input image dimensions [height, width]

  • in_channels (int, optional) – Number of input channels

  • out_channels (int, optional) – Number of output channels

  • embed_model (dict, optional) – Dictionary of arguments to pass to the ModEmbedNet embedding model

  • patch_size (List[int], optional) – Size of image patches, by default [16, 16]

  • embed_dim (int, optional) – Embedded channel size, by default 256

  • mod_dim (int) – Modulation input dimensionality

  • modulate_filter (bool, optional) – Whether to compute the modulation for the FFT filter, by default True

  • modulate_mlp (bool, optional) – Whether to compute the modulation for the MLP, by default True

  • scale_shift_mode (["complex", "real"]) – If ‘complex’ (default), compute the scale-shift operation using complex operations. If ‘real’, use real operations.

  • depth (int, optional) – Number of AFNO layers, by default 4

  • mlp_ratio (float, optional) – Ratio of layer MLP latent variable size to input feature size, by default 4.0

  • drop_rate (float, optional) – Drop out rate in layer MLPs, by default 0.0

  • num_blocks (int, optional) – Number of blocks in the block-diag frequency weight matrices, by default 16

  • sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01

  • hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1

The default settings correspond to the implementation in the paper cited below.

Example

>>> import torch
>>> from physicsnemo.models.afno import ModAFNO
>>> model = ModAFNO(
...     inp_shape=[32, 32],
...     in_channels=2,
...     out_channels=1,
...     patch_size=(8, 8),
...     embed_dim=16,
...     depth=2,
...     num_blocks=2,
... )
>>> input = torch.randn(32, 2, 32, 32) #(N, C, H, W)
>>> time = torch.full((32, 1), 0.5)
>>> output = model(input, time)
>>> output.size()
torch.Size([32, 1, 32, 32])

Note

Reference: Leinonen et al. “Modulated Adaptive Fourier Neural Operators for Temporal Interpolation of Weather Forecasts.” arXiv preprint arXiv:TODO (2024).

forward(x: Tensor, mod: Tensor) Tensor[source]#

The full ModAFNO model logic.

forward_features(
x: Tensor,
mod: Tensor,
) Tensor[source]#

Forward pass of core ModAFNO

class physicsnemo.models.afno.modafno.ModAFNO2DLayer(
hidden_size: int,
mod_features: int,
num_blocks: int = 8,
sparsity_threshold: float = 0.01,
hard_thresholding_fraction: float = 1,
hidden_size_factor: int = 1,
scale_shift_kwargs: dict | None = None,
scale_shift_mode: Literal['complex', 'real'] = 'complex',
)[source]#

Bases: AFNO2DLayer

AFNO spectral convolution layer

Parameters:
  • hidden_size (int) – Feature dimensionality

  • mod_features (int) – Number of modulation features

  • num_blocks (int, optional) – Number of blocks used in the block diagonal weight matrix, by default 8

  • sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01

  • hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1

  • hidden_size_factor (int, optional) – Factor to increase spectral features by after weight multiplication, by default 1

  • scale_shift_kwargs (dict, optional) – Options to the MLP that computes the scale-shift parameters

  • scale_shift_mode (["complex", "real"]) – If ‘complex’ (default), compute the scale-shift operation using complex operations. If ‘real’, use real operations.

forward(
x: Tensor,
mod_embed: Tensor,
) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.afno.modafno.ModAFNOMlp(
in_features: int,
latent_features: int,
out_features: int,
mod_features: int,
activation_fn: Module = GELU(approximate='none'),
drop: float = 0.0,
scale_shift_kwargs: dict | None = None,
)[source]#

Bases: AFNOMlp

Modulated MLP used inside ModAFNO

Parameters:
  • in_features (int) – Input feature size

  • latent_features (int) – Latent feature size

  • out_features (int) – Output feature size

  • activation_fn (nn.Module, optional) – Activation function, by default nn.GELU

  • drop (float, optional) – Drop out rate, by default 0.0

  • scale_shift_kwargs (dict, optional) – Options to the MLP that computes the scale-shift parameters

forward(
x: Tensor,
mod_embed: Tensor,
) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.afno.modafno.ScaleShiftMlp(
in_features: int,
out_features: int,
hidden_features: int | None = None,
hidden_layers: int = 0,
activation_fn: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.GELU'>,
)[source]#

Bases: Module

MLP used to compute the scale and shift parameters of the ModAFNO block

Parameters:
  • in_features (int) – Input feature size

  • out_features (int) – Output feature size

  • hidden_features (int, optional) – Hidden feature size, defaults to 2 * out_features

  • hidden_layers (int, optional) – Number of hidden layers, defaults to 0

  • activation_fn (nn.Module, optional) – Activation function, by default nn.GELU

forward(x: Tensor)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Graph Neural Networks#

class physicsnemo.models.meshgraphnet.meshgraphnet.MeshGraphNet(*args, **kwargs)[source]#

Bases: Module

MeshGraphNet network architecture

Parameters:
  • input_dim_nodes (int) – Number of node features

  • input_dim_edges (int) – Number of edge features

  • output_dim (int) – Number of outputs

  • processor_size (int, optional) – Number of message passing blocks, by default 15

  • mlp_activation_fn (Union[str, List[str]], optional) – Activation function to use, by default ‘relu’

  • num_layers_node_processor (int, optional) – Number of MLP layers for processing nodes in each message passing block, by default 2

  • num_layers_edge_processor (int, optional) – Number of MLP layers for processing edge features in each message passing block, by default 2

  • hidden_dim_processor (int, optional) – Hidden layer size for the message passing blocks, by default 128

  • hidden_dim_node_encoder (int, optional) – Hidden layer size for the node feature encoder, by default 128

  • num_layers_node_encoder (Union[int, None], optional) – Number of MLP layers for the node feature encoder, by default 2. If None is provided, the MLP will collapse to an Identity function, i.e. no node encoder

  • hidden_dim_edge_encoder (int, optional) – Hidden layer size for the edge feature encoder, by default 128

  • num_layers_edge_encoder (Union[int, None], optional) – Number of MLP layers for the edge feature encoder, by default 2. If None is provided, the MLP will collapse to an Identity function, i.e. no edge encoder

  • hidden_dim_node_decoder (int, optional) – Hidden layer size for the node feature decoder, by default 128

  • num_layers_node_decoder (Union[int, None], optional) – Number of MLP layers for the node feature decoder, by default 2. If None is provided, the MLP will collapse to an Identity function, i.e. no decoder

  • aggregation (str, optional) – Message aggregation type, by default “sum”

  • do_concat_trick (bool, optional) – Whether to replace concat+MLP with MLP+idx+sum, by default False

  • num_processor_checkpoint_segments (int, optional) – Number of processor segments for gradient checkpointing, by default 0 (checkpointing disabled)

  • checkpoint_offloading (bool, optional) – Whether to offload the checkpointing to the CPU, by default False

Example

>>> # `norm_type` in MeshGraphNet is deprecated,
>>> # TE will be automatically used if possible unless told otherwise.
>>> # (You don't have to set this variable, it's faster to use TE!)
>>> # Example of how to disable:
>>> import os
>>> os.environ['PHYSICSNEMO_FORCE_TE'] = 'False'
>>>
>>> model = physicsnemo.models.meshgraphnet.MeshGraphNet(
...         input_dim_nodes=4,
...         input_dim_edges=3,
...         output_dim=2,
...     )
>>> graph = dgl.rand_graph(10, 5)
>>> node_features = torch.randn(10, 4)
>>> edge_features = torch.randn(5, 3)
>>> output = model(node_features, edge_features, graph)
>>> output.size()
torch.Size([10, 2])

Note

Reference: Pfaff, Tobias, et al. “Learning mesh-based simulation with graph networks.” arXiv preprint arXiv:2010.03409 (2020).

forward(
node_features: Tensor,
edge_features: Tensor,
graph: physicsnemo.models.gnn_layers.utils.GraphType,
**kwargs,
) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.meshgraphnet.meshgraphnet.MeshGraphNetProcessor(
processor_size: int = 15,
input_dim_node: int = 128,
input_dim_edge: int = 128,
num_layers_node: int = 2,
num_layers_edge: int = 2,
aggregation: str = 'sum',
norm_type: str = 'LayerNorm',
activation_fn: Module = ReLU(),
do_concat_trick: bool = False,
num_processor_checkpoint_segments: int = 0,
checkpoint_offloading: bool = False,
)[source]#

Bases: Module

MeshGraphNet processor block

forward(
node_features: Tensor,
edge_features: Tensor,
graph: physicsnemo.models.gnn_layers.utils.GraphType,
) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

run_function(
segment_start: int,
segment_end: int,
) Callable[[Tensor, Tensor, physicsnemo.models.gnn_layers.utils.GraphType], Tuple[Tensor, Tensor]][source]#

Custom forward for gradient checkpointing

Parameters:
  • segment_start (int) – Layer index as start of the segment

  • segment_end (int) – Layer index as end of the segment

Returns:

Custom forward function

Return type:

Callable

set_checkpoint_offload_ctx(enabled: bool)[source]#

Set the context for CPU offloading of checkpoints

Parameters:

checkpoint_offloading (bool) – whether to offload the checkpointing to the CPU

set_checkpoint_segments(
checkpoint_segments: int,
)[source]#

Set the number of checkpoint segments

Parameters:

checkpoint_segments (int) – number of checkpoint segments

Raises:

ValueError – if the number of processor layers is not a multiple of the number of checkpoint segments

class physicsnemo.models.meshgraphnet.meshgraphnet.MetaData(
name: str = 'MeshGraphNet',
jit: bool = False,
cuda_graphs: bool = False,
amp: bool = False,
amp_cpu: bool = False,
amp_gpu: bool = True,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = -1,
func_torch: bool = True,
auto_grad: bool = True,
)[source]#

Bases: ModelMetaData

class physicsnemo.models.mesh_reduced.mesh_reduced.Mesh_Reduced(
input_dim_nodes: int,
input_dim_edges: int,
output_decode_dim: int,
output_encode_dim: int = 3,
processor_size: int = 15,
num_layers_node_processor: int = 2,
num_layers_edge_processor: int = 2,
hidden_dim_processor: int = 128,
hidden_dim_node_encoder: int = 128,
num_layers_node_encoder: int = 2,
hidden_dim_edge_encoder: int = 128,
num_layers_edge_encoder: int = 2,
hidden_dim_node_decoder: int = 128,
num_layers_node_decoder: int = 2,
k: int = 3,
aggregation: str = 'mean',
)[source]#

Bases: Module

PbGMR-GMUS architecture.

A mesh-reduced architecture that combines encoding and decoding processors for physics prediction in reduced mesh space.

Parameters:
  • input_dim_nodes (int) – Number of node features.

  • input_dim_edges (int) – Number of edge features.

  • output_decode_dim (int) – Number of decoding outputs (per node).

  • output_encode_dim (int, optional) – Number of encoding outputs (per pivotal position), by default 3.

  • processor_size (int, optional) – Number of message passing blocks, by default 15.

  • num_layers_node_processor (int, optional) – Number of MLP layers for processing nodes in each message passing block, by default 2.

  • num_layers_edge_processor (int, optional) – Number of MLP layers for processing edge features in each message passing block, by default 2.

  • hidden_dim_processor (int, optional) – Hidden layer size for the message passing blocks, by default 128.

  • hidden_dim_node_encoder (int, optional) – Hidden layer size for the node feature encoder, by default 128.

  • num_layers_node_encoder (int, optional) – Number of MLP layers for the node feature encoder, by default 2.

  • hidden_dim_edge_encoder (int, optional) – Hidden layer size for the edge feature encoder, by default 128.

  • num_layers_edge_encoder (int, optional) – Number of MLP layers for the edge feature encoder, by default 2.

  • hidden_dim_node_decoder (int, optional) – Hidden layer size for the node feature decoder, by default 128.

  • num_layers_node_decoder (int, optional) – Number of MLP layers for the node feature decoder, by default 2.

  • k (int, optional) – Number of nodes considered for per pivotal position, by default 3.

  • aggregation (str, optional) – Message aggregation type, by default “mean”.

Notes

Reference: Han, Xu, et al. “Predicting physics in mesh-reduced space with temporal attention.” arXiv preprint arXiv:2201.09113 (2022).

decode(
x,
edge_features,
graph,
position_mesh,
position_pivotal,
)[source]#

Decode pivotal features back to mesh space.

Parameters:
  • x (torch.Tensor) – Input features in pivotal space.

  • edge_features (torch.Tensor) – Edge features.

  • graph (Union[DGLGraph, pyg.data.Data]) – Input graph.

  • position_mesh (torch.Tensor) – Mesh positions.

  • position_pivotal (torch.Tensor) – Pivotal positions.

Returns:

Decoded features in mesh space.

Return type:

torch.Tensor

encode(
x,
edge_features,
graph,
position_mesh,
position_pivotal,
)[source]#

Encode mesh features to pivotal space.

Parameters:
  • x (torch.Tensor) – Input node features.

  • edge_features (torch.Tensor) – Edge features.

  • graph (Union[DGLGraph, pyg.data.Data]) – Input graph.

  • position_mesh (torch.Tensor) – Mesh positions.

  • position_pivotal (torch.Tensor) – Pivotal positions.

Returns:

Encoded features in pivotal space.

Return type:

torch.Tensor

knn_interpolate(
x: Tensor,
pos_x: Tensor,
pos_y: Tensor,
batch_x: Tensor = None,
batch_y: Tensor = None,
k: int = 3,
num_workers: int = 1,
)[source]#

Perform k-nearest neighbor interpolation.

Parameters:
  • x (torch.Tensor) – Input features to interpolate.

  • pos_x (torch.Tensor) – Source positions.

  • pos_y (torch.Tensor) – Target positions.

  • batch_x (torch.Tensor, optional) – Batch indices for source positions, by default None.

  • batch_y (torch.Tensor, optional) – Batch indices for target positions, by default None.

  • k (int, optional) – Number of nearest neighbors to consider, by default 3.

  • num_workers (int, optional) – Number of workers for parallel processing, by default 1.

Returns:

  • torch.Tensor – Interpolated features.

  • torch.Tensor – Source indices.

  • torch.Tensor – Target indices.

  • torch.Tensor – Interpolation weights.

class physicsnemo.models.meshgraphnet.bsms_mgn.BiStrideMeshGraphNet(*args, **kwargs)[source]#

Bases: MeshGraphNet

Bi-stride MeshGraphNet network architecture

Parameters:
  • input_dim_nodes (int) – Number of node features

  • input_dim_edges (int) – Number of edge features

  • output_dim (int) – Number of outputs

  • processor_size (int, optional) – Number of message passing blocks, by default 15

  • mlp_activation_fn (Union[str, List[str]], optional) – Activation function to use, by default ‘relu’

  • num_layers_node_processor (int, optional) – Number of MLP layers for processing nodes in each message passing block, by default 2

  • num_layers_edge_processor (int, optional) – Number of MLP layers for processing edge features in each message passing block, by default 2

  • hidden_dim_processor (int, optional) – Hidden layer size for the message passing blocks, by default 128

  • hidden_dim_node_encoder (int, optional) – Hidden layer size for the node feature encoder, by default 128

  • num_layers_node_encoder (Union[int, None], optional) – Number of MLP layers for the node feature encoder, by default 2. If None is provided, the MLP will collapse to an Identity function, i.e. no node encoder

  • hidden_dim_edge_encoder (int, optional) – Hidden layer size for the edge feature encoder, by default 128

  • num_layers_edge_encoder (Union[int, None], optional) – Number of MLP layers for the edge feature encoder, by default 2. If None is provided, the MLP will collapse to an Identity function, i.e. no edge encoder

  • hidden_dim_node_decoder (int, optional) – Hidden layer size for the node feature decoder, by default 128

  • num_layers_node_decoder (Union[int, None], optional) – Number of MLP layers for the node feature decoder, by default 2. If None is provided, the MLP will collapse to an Identity function, i.e. no decoder

  • aggregation (str, optional) – Message aggregation type, by default “sum”

  • do_concat_trick (bool, optional) – Whether to replace concat+MLP with MLP+idx+sum, by default False

  • num_processor_checkpoint_segments (int, optional) – Number of processor segments for gradient checkpointing, by default 0 (checkpointing disabled). The number of segments should be a factor of 2 * processor_size, for example, if processor_size is 15, then num_processor_checkpoint_segments can be 10 since it’s a factor of 15 * 2 = 30. It is recommended to start with a smaller number of segments until the model fits into memory since each segment will affect model training speed.

forward(
node_features: Tensor,
edge_features: Tensor,
graph: dgl.DGLGraph,
ms_edges: Iterable[Tensor] = (),
ms_ids: Iterable[Tensor] = (),
**kwargs,
) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.meshgraphnet.bsms_mgn.MetaData(
name: str = 'BiStrideMeshGraphNet',
jit: bool = False,
cuda_graphs: bool = False,
amp: bool = False,
amp_cpu: bool = False,
amp_gpu: bool = True,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = -1,
func_torch: bool = True,
auto_grad: bool = True,
)[source]#

Bases: ModelMetaData

Convolutional Networks#

class physicsnemo.models.pix2pix.pix2pix.MetaData(
name: str = 'Pix2Pix',
jit: bool = True,
cuda_graphs: bool = True,
amp: bool = False,
amp_cpu: bool = False,
amp_gpu: bool = True,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = True,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = 1,
func_torch: bool = True,
auto_grad: bool = True,
)[source]#

Bases: ModelMetaData

class physicsnemo.models.pix2pix.pix2pix.Pix2Pix(*args, **kwargs)[source]#

Bases: Module

Convolutional encoder-decoder based on pix2pix generator models.

Note

The pix2pix architecture supports options for 1D, 2D and 3D fields which can be controlled using the dimension parameter.

Parameters:
  • in_channels (int) – Number of input channels

  • out_channels (Union[int, Any], optional) – Number of output channels

  • dimension (int) – Model dimensionality (supports 1, 2, 3).

  • conv_layer_size (int, optional) – Latent channel size after first convolution, by default 64

  • n_downsampling (int, optional) – Number of downsampling blocks, by default 3

  • n_upsampling (int, optional) – Number of upsampling blocks, by default 3

  • n_blocks (int, optional) – Number of residual blocks in middle of model, by default 3

  • activation_fn (Any, optional) – Activation function, by default “relu”

  • batch_norm (bool, optional) – Batch normalization, by default False

  • padding_type (str, optional) – Padding type (‘reflect’, ‘replicate’ or ‘zero’), by default “reflect”

Example

>>> #2D convolutional encoder decoder
>>> model = physicsnemo.models.pix2pix.Pix2Pix(
... in_channels=1,
... out_channels=2,
... dimension=2,
... conv_layer_size=4)
>>> input = torch.randn(4, 1, 32, 32) #(N, C, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([4, 2, 32, 32])

Note

Reference: Isola, Phillip, et al. “Image-To-Image translation with conditional adversarial networks” Conference on Computer Vision and Pattern Recognition, 2017. https://arxiv.org/abs/1611.07004

Reference: Wang, Ting-Chun, et al. “High-Resolution image synthesis and semantic manipulation with conditional GANs” Conference on Computer Vision and Pattern Recognition, 2018. https://arxiv.org/abs/1711.11585

Note

Based on the implementation: NVIDIA/pix2pixHD

forward(input: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.pix2pix.pix2pix.ResnetBlock(
dimension: int,
channels: int,
padding_type: str = 'reflect',
activation: Module = ReLU(),
use_batch_norm: bool = False,
use_dropout: bool = False,
)[source]#

Bases: Module

A simple ResNet block

Parameters:
  • dimension (int) – Model dimensionality (supports 1, 2, 3).

  • channels (int) – Number of feature channels

  • padding_type (str, optional) – Padding type (‘reflect’, ‘replicate’ or ‘zero’), by default “reflect”

  • activation (nn.Module, optional) – Activation function, by default nn.ReLU()

  • use_batch_norm (bool, optional) – Batch normalization, by default False

forward(x: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
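
A rough usage sketch (shapes are assumptions): a 2D block with channels=8 is expected to map an (N, 8, H, W) tensor to a tensor of the same shape, since a residual block preserves channel count and spatial size.

import torch
from physicsnemo.models.pix2pix.pix2pix import ResnetBlock

block = ResnetBlock(dimension=2, channels=8)
x = torch.randn(2, 8, 16, 16)   # (N, C, H, W) with C == channels
y = block(x)                    # residual connection: shape expected to be preserved
print(y.shape)                  # expected: torch.Size([2, 8, 16, 16])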

class physicsnemo.models.srrn.super_res_net.ConvolutionalBlock3d(
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
batch_norm: bool = False,
activation_fn: Module = Identity(),
)[source]#

Bases: Module

3D convolutional block

Parameters:
  • in_channels (int) – Input channels

  • out_channels (int) – Output channels

  • kernel_size (int) – Kernel size

  • stride (int, optional) – Convolutional stride, by default 1

  • batch_norm (bool, optional) – Use batchnorm, by default False

forward(input: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
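
A small usage sketch (shapes are illustrative): based on the documented parameters, the block applies a 3D convolution with the given kernel size and stride, optionally followed by batch normalization and the given activation.

import torch
from physicsnemo.models.srrn.super_res_net import ConvolutionalBlock3d

block = ConvolutionalBlock3d(in_channels=1, out_channels=8, kernel_size=3)
x = torch.randn(2, 1, 8, 8, 8)  # (N, C, D, H, W)
y = block(x)                    # 8 output channels; spatial size depends on kernel/stride/padding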

class physicsnemo.models.srrn.super_res_net.MetaData(
name: str = 'SuperResolution',
jit: bool = True,
cuda_graphs: bool = False,
amp: bool = False,
amp_cpu: bool = False,
amp_gpu: bool = False,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = True,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = 1,
func_torch: bool = True,
auto_grad: bool = True,
)[source]#

Bases: ModelMetaData

class physicsnemo.models.srrn.super_res_net.PixelShuffle3d(scale: int)[source]#

Bases: Module

3D pixel-shuffle operation

Parameters:

scale (int) – Factor to downscale channel count by

forward(input: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
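
A usage sketch, assuming the operation behaves like a 3D analogue of torch.nn.PixelShuffle: a scale of 2 trades channels for spatial resolution, dividing the channel count by scale**3 and multiplying each spatial dimension by scale.

import torch
from physicsnemo.models.srrn.super_res_net import PixelShuffle3d

shuffle = PixelShuffle3d(scale=2)
x = torch.randn(1, 16, 4, 4, 4)  # channel count must be divisible by scale**3 == 8
y = shuffle(x)                   # expected shape: (1, 2, 8, 8, 8)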

class physicsnemo.models.srrn.super_res_net.ResidualConvBlock3d(
n_layers: int = 1,
kernel_size: int = 3,
conv_layer_size: int = 64,
activation_fn: Module = Identity(),
)[source]#

Bases: Module

3D ResNet block

Parameters:
  • n_layers (int, optional) – Number of convolutional layers, by default 1

  • kernel_size (int, optional) – Kernel size, by default 3

  • conv_layer_size (int, optional) – Latent channel size, by default 64

  • activation_fn (nn.Module, optional) – Activation function, by default nn.Identity()

forward(input: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
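
A usage sketch, assuming the block expects conv_layer_size channels in and out, as is typical for a residual block.

import torch
from physicsnemo.models.srrn.super_res_net import ResidualConvBlock3d

block = ResidualConvBlock3d(n_layers=2, kernel_size=3, conv_layer_size=16)
x = torch.randn(2, 16, 8, 8, 8)  # (N, conv_layer_size, D, H, W)
y = block(x)                     # expected to preserve the input shape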

class physicsnemo.models.srrn.super_res_net.SRResNet(*args, **kwargs)[source]#

Bases: Module

3D convolutional super-resolution network

Parameters:
  • in_channels (int) – Number of input channels

  • out_channels (int) – Number of output channels

  • large_kernel_size (int, optional) – Convolutional kernel size for the first and last convolutions, by default 7

  • small_kernel_size (int, optional) – Convolutional kernel size for the internal convolutions, by default 3

  • conv_layer_size (int, optional) – Latent channel size, by default 32

  • n_resid_blocks (int, optional) – Number of residual blocks, by default 8

  • scaling_factor (int, optional) – Scaling factor to increase the output feature size compared to the input (2, 4, or 8), by default 8

  • activation_fn (Any, optional) – Activation function, by default “prelu”

Example

>>> #3D convolutional encoder decoder
>>> model = physicsnemo.models.srrn.SRResNet(
... in_channels=1,
... out_channels=2,
... conv_layer_size=4,
... scaling_factor=2)
>>> input = torch.randn(4, 1, 8, 8, 8) #(N, C, D, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([4, 2, 16, 16, 16])

Note

Based on the implementation: sgrvinod/a-PyTorch-Tutorial-to-Super-Resolution

forward(in_vars: Tensor) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.srrn.super_res_net.SubPixel_ConvolutionalBlock3d(
kernel_size: int = 3,
conv_layer_size: int = 64,
scaling_factor: int = 2,
)[source]#

Bases: Module

Convolutional block with Pixel Shuffle operation

Parameters:
  • kernel_size (int, optional) – Kernel size, by default 3

  • conv_layer_size (int, optional) – Latent channel size, by default 64

  • scaling_factor (int, optional) – Pixel shuffle scaling factor, by default 2

forward(
input: Tensor,
) Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
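
A usage sketch, assuming the block follows the usual sub-pixel convolution pattern: the channel count is preserved while each spatial dimension is multiplied by scaling_factor.

import torch
from physicsnemo.models.srrn.super_res_net import SubPixel_ConvolutionalBlock3d

block = SubPixel_ConvolutionalBlock3d(kernel_size=3, conv_layer_size=16, scaling_factor=2)
x = torch.randn(2, 16, 8, 8, 8)  # (N, conv_layer_size, D, H, W)
y = block(x)                     # expected shape: (2, 16, 16, 16, 16)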

Recurrent Neural Networks#

class physicsnemo.models.rnn.rnn_one2many.MetaData(
name: str = 'One2ManyRNN',
jit: bool = False,
cuda_graphs: bool = False,
amp: bool = True,
amp_cpu: bool = None,
amp_gpu: bool = None,
torch_fx: bool = True,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = -1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

class physicsnemo.models.rnn.rnn_one2many.One2ManyRNN(*args, **kwargs)[source]#

Bases: Module

An RNN model with encoder/decoder for 2D/3D problems that provides predictions based on a single initial condition.

Parameters:
  • input_channels (int) – Number of channels in the input

  • dimension (int, optional) – Spatial dimension of the input. Only 2d and 3d are supported, by default 2

  • nr_latent_channels (int, optional) – Channels for encoding/decoding, by default 512

  • nr_residual_blocks (int, optional) – Number of residual blocks, by default 2

  • activation_fn (str, optional) – Activation function to use, by default “relu”

  • nr_downsamples (int, optional) – Number of downsamples, by default 2

  • nr_tsteps (int, optional) – Time steps to predict, by default 32

Example

>>> model = physicsnemo.models.rnn.One2ManyRNN(
... input_channels=6,
... dimension=2,
... nr_latent_channels=32,
... activation_fn="relu",
... nr_downsamples=2,
... nr_tsteps=16,
... )
>>> input = torch.randn(4, 6, 1, 16, 16) # [N, C, T, H, W]
>>> output = model(input)
>>> output.size()
torch.Size([4, 6, 16, 16, 16])
forward(x: Tensor) Tensor[source]#

Forward pass

Parameters:

x (Tensor) – Expects a tensor of size [N, C, 1, H, W] for 2D or [N, C, 1, D, H, W] for 3D, where N is the batch size, C is the number of channels, 1 is the number of input timesteps, and D, H, W are spatial dimensions.

Returns:

Size [N, C, T, H, W] for 2D or [N, C, T, D, H, W] for 3D, where T is the number of timesteps being predicted.

Return type:

Tensor

class physicsnemo.models.rnn.rnn_seq2seq.MetaData(
name: str = 'Seq2SeqRNN',
jit: bool = False,
cuda_graphs: bool = False,
amp: bool = True,
amp_cpu: bool = None,
amp_gpu: bool = None,
torch_fx: bool = True,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = -1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

class physicsnemo.models.rnn.rnn_seq2seq.Seq2SeqRNN(*args, **kwargs)[source]#

Bases: Module

An RNN model with encoder/decoder for 2D/3D problems. Given inputs at time steps 0 to t-1, predicts the signal from t to t + nr_tsteps.

Parameters:
  • input_channels (int) – Number of channels in the input

  • dimension (int, optional) – Spatial dimension of the input. Only 2d and 3d are supported, by default 2

  • nr_latent_channels (int, optional) – Channels for encoding/decoding, by default 512

  • nr_residual_blocks (int, optional) – Number of residual blocks, by default 2

  • activation_fn (str, optional) – Activation function to use, by default “relu”

  • nr_downsamples (int, optional) – Number of downsamples, by default 2

  • nr_tsteps (int, optional) – Time steps to predict, by default 32

Example

>>> model = physicsnemo.models.rnn.Seq2SeqRNN(
... input_channels=6,
... dimension=2,
... nr_latent_channels=32,
... activation_fn="relu",
... nr_downsamples=2,
... nr_tsteps=16,
... )
>>> input = torch.randn(4, 6, 16, 16, 16) # [N, C, T, H, W]
>>> output = model(input)
>>> output.size()
torch.Size([4, 6, 16, 16, 16])
forward(x: Tensor) Tensor[source]#

Forward pass

Parameters:

x (Tensor) – Expects a tensor of size [N, C, T, H, W] for 2D or [N, C, T, D, H, W] for 3D, where N is the batch size, C is the number of channels, T is the number of input timesteps, and D, H, W are spatial dimensions. Currently, this requires the number of input time steps to be the same as the number of predicted time steps.

Returns:

Size [N, C, T, H, W] for 2D or [N, C, T, D, H, W] for 3D, where T is the number of timesteps being predicted.

Return type:

Tensor

Operator Models#

This module contains the DoMINO model architecture. The DoMINO class models both surface and volume quantities, either together or separately (controlled using the config.yaml file).

class physicsnemo.models.domino.model.AggregationModel(
input_features: int,
output_features: int,
model_parameters=None,
new_change: bool = True,
)[source]#

Bases: Module

Neural network module to aggregate local geometry encoding with basis functions.

This module combines basis function representations with geometry encodings to predict the final output quantities. It serves as the final prediction layer that integrates all available information sources.

forward(x: Tensor) Tensor[source]#

Process the combined input features to predict output quantities.

This method applies a series of fully connected layers to the input, which typically contains a combination of basis functions, geometry encodings, and potentially parameter encodings.

Parameters:

x – Input tensor containing combined features

Returns:

Tensor containing predicted output quantities

class physicsnemo.models.domino.model.BQWarp(
grid_resolution=None,
radius: float = 0.25,
neighbors_in_radius: int = 10,
)[source]#

Bases: Module

Warp-based ball-query layer for finding neighboring points within a specified radius.

This layer uses an accelerated ball query implementation to efficiently find points within a specified radius of query points.

forward(
x: Tensor,
p_grid: Tensor,
reverse_mapping: bool = True,
) tuple[Tensor, Tensor][source]#

Performs ball query operation to find neighboring points and their features.

This method uses the Warp-accelerated ball query implementation to find points within a specified radius. It can operate in two modes:

  • Forward mapping: find points from x that are near p_grid points (reverse_mapping=False)

  • Reverse mapping: find points from p_grid that are near x points (reverse_mapping=True)

Parameters:
  • x – Tensor of shape (batch_size, num_points, 3+features) containing point coordinates and their features

  • p_grid – Tensor of shape (batch_size, grid_x, grid_y, grid_z, 3) containing grid point coordinates

  • reverse_mapping – Boolean flag to control the direction of the mapping: True finds p_grid points near x points; False finds x points near p_grid points

Returns:

  • mapping: Tensor containing indices of neighboring points

  • outputs: Tensor containing coordinates of the neighboring points

Return type:

tuple containing

class physicsnemo.models.domino.model.DoMINO(
input_features: int,
output_features_vol: int | None = None,
output_features_surf: int | None = None,
global_features: int = 2,
model_parameters=None,
)[source]#

Bases: Module

DoMINO model architecture for predicting both surface and volume quantities.

The DoMINO (Deep Operational Modal Identification and Nonlinear Optimization) model is designed to model both surface and volume physical quantities in aerodynamic simulations. It can operate in three modes:

  1. Surface-only: predicting only surface quantities

  2. Volume-only: predicting only volume quantities

  3. Combined: predicting both surface and volume quantities

The model uses a combination of:

  • Geometry representation modules

  • Neural network basis functions

  • Parameter encoding

  • Local and global geometry processing

  • Aggregation models for final prediction

Parameters:
  • input_features (int) – Number of point input features

  • output_features_vol (int, optional) – Number of output features in volume

  • output_features_surf (int, optional) – Number of output features on surface

  • model_parameters – Model parameters controlled by config.yaml

Example

>>> from physicsnemo.models.domino.model import DoMINO
>>> import torch, os
>>> from hydra import compose, initialize
>>> from omegaconf import OmegaConf
>>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
>>> cfg = OmegaConf.register_new_resolver("eval", eval)
>>> with initialize(version_base="1.3", config_path="examples/cfd/external_aerodynamics/domino/src/conf"):
...    cfg = compose(config_name="config")
>>> cfg.model.model_type = "combined"
>>> model = DoMINO(
...         input_features=3,
...         output_features_vol=5,
...         output_features_surf=4,
...         model_parameters=cfg.model
...     ).to(device)

Warp ...
>>> bsize = 1
>>> nx, ny, nz = cfg.model.interp_res
>>> num_neigh = 7
>>> global_features = 2
>>> pos_normals_closest_vol = torch.randn(bsize, 100, 3).to(device)
>>> pos_normals_com_vol = torch.randn(bsize, 100, 3).to(device)
>>> pos_normals_com_surface = torch.randn(bsize, 100, 3).to(device)
>>> geom_centers = torch.randn(bsize, 100, 3).to(device)
>>> grid = torch.randn(bsize, nx, ny, nz, 3).to(device)
>>> surf_grid = torch.randn(bsize, nx, ny, nz, 3).to(device)
>>> sdf_grid = torch.randn(bsize, nx, ny, nz).to(device)
>>> sdf_surf_grid = torch.randn(bsize, nx, ny, nz).to(device)
>>> sdf_nodes = torch.randn(bsize, 100, 1).to(device)
>>> surface_coordinates = torch.randn(bsize, 100, 3).to(device)
>>> surface_neighbors = torch.randn(bsize, 100, num_neigh, 3).to(device)
>>> surface_normals = torch.randn(bsize, 100, 3).to(device)
>>> surface_neighbors_normals = torch.randn(bsize, 100, num_neigh, 3).to(device)
>>> surface_sizes = torch.rand(bsize, 100).to(device) + 1e-6  # Note this needs to be > 0.0
>>> surface_neighbors_areas = torch.rand(bsize, 100, num_neigh).to(device) + 1e-6
>>> volume_coordinates = torch.randn(bsize, 100, 3).to(device)
>>> vol_grid_max_min = torch.randn(bsize, 2, 3).to(device)
>>> surf_grid_max_min = torch.randn(bsize, 2, 3).to(device)
>>> global_params_values = torch.randn(bsize, global_features, 1).to(device)
>>> global_params_reference = torch.randn(bsize, global_features, 1).to(device)
>>> input_dict = {
...     "pos_volume_closest": pos_normals_closest_vol,
...     "pos_volume_center_of_mass": pos_normals_com_vol,
...     "pos_surface_center_of_mass": pos_normals_com_surface,
...     "geometry_coordinates": geom_centers,
...     "grid": grid,
...     "surf_grid": surf_grid,
...     "sdf_grid": sdf_grid,
...     "sdf_surf_grid": sdf_surf_grid,
...     "sdf_nodes": sdf_nodes,
...     "surface_mesh_centers": surface_coordinates,
...     "surface_mesh_neighbors": surface_neighbors,
...     "surface_normals": surface_normals,
...     "surface_neighbors_normals": surface_neighbors_normals,
...     "surface_areas": surface_sizes,
...     "surface_neighbors_areas": surface_neighbors_areas,
...     "volume_mesh_centers": volume_coordinates,
...     "volume_min_max": vol_grid_max_min,
...     "surface_min_max": surf_grid_max_min,
...     "global_params_reference": global_params_values,
...     "global_params_values": global_params_reference,
... }
>>> output = model(input_dict)
>>> print(f"{output[0].shape}, {output[1].shape}")
torch.Size([1, 100, 5]), torch.Size([1, 100, 4])

calculate_solution(
volume_mesh_centers,
encoding_g,
encoding_node,
global_params_values,
global_params_reference,
eval_mode,
num_sample_points=20,
noise_intensity=50,
return_volume_neighbors=False,
)[source]#

Function to approximate solution sampling the neighborhood information

calculate_solution_with_neighbors(
surface_mesh_centers,
encoding_g,
encoding_node,
surface_mesh_neighbors,
surface_normals,
surface_neighbors_normals,
surface_areas,
surface_neighbors_areas,
global_params_values,
global_params_reference,
num_sample_points=7,
)[source]#

Function to approximate solution given the neighborhood information

forward(data_dict, return_volume_neighbors=False)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

geo_encoding_local(
encoding_g,
volume_mesh_centers,
p_grid,
mode='volume',
)[source]#

Function to calculate local geometry encoding from global encoding

position_encoder(
encoding_node: Tensor,
eval_mode: Literal['surface', 'volume'] = 'volume',
) Tensor[source]#

Compute positional encoding for input points.

Parameters:
  • encoding_node – Tensor containing node position information

  • eval_mode – Mode of evaluation, either “volume” or “surface”

Returns:

Tensor containing positional encoding features

sample_sphere(center, r, num_points)[source]#

Uniformly sample points in a 3D sphere around the center.

This method generates random points within a sphere of radius r centered at each point in the input tensor. The sampling is uniform in volume, meaning points are more likely to be sampled in the outer regions of the sphere.

Parameters:
  • center – Tensor of shape (batch_size, num_points, 3) containing center coordinates

  • r – Radius of the sphere for sampling

  • num_points – Number of points to sample per center

Returns:

Tensor of shape (batch_size, num_points, num_samples, 3) containing the sampled points around each center

sample_sphere_shell(center, r_inner, r_outer, num_points)[source]#

Uniformly sample points in a 3D spherical shell around a center.

This method generates random points within a spherical shell (annulus) between inner radius r_inner and outer radius r_outer centered at each point in the input tensor. The sampling is uniform in volume within the shell.

Parameters:
  • center – Tensor of shape (batch_size, num_points, 3) containing center coordinates

  • r_inner – Inner radius of the spherical shell

  • r_outer – Outer radius of the spherical shell

  • num_points – Number of points to sample per center

Returns:

Tensor of shape (batch_size, num_points, num_samples, 3) containing the sampled points within the spherical shell around each center

class physicsnemo.models.domino.model.GeoConvOut(
input_features: int,
model_parameters,
grid_resolution=None,
)[source]#

Bases: Module

Geometry layer to project STL geometry data onto regular grids.

forward(
x: Tensor,
grid: Tensor,
radius: float = 0.025,
neighbors_in_radius: int = 10,
) Tensor[source]#

Process and project geometric features onto a 3D grid.

Parameters:
  • x – Input tensor containing coordinates of the neighboring points (batch_size, nx*ny*nz, 3, n_points)

  • grid – Input tensor represented as a grid of shape (batch_size, nx, ny, nz, 3)

Returns:

Processed geometry features of shape (batch_size, base_neurons_in, nx, ny, nz)

class physicsnemo.models.domino.model.GeoProcessor(
input_filters: int,
output_filters: int,
model_parameters,
)[source]#

Bases: Module

Geometry processing layer using CNNs

forward(x: Tensor) Tensor[source]#

Process geometry information through the 3D CNN network.

The network follows an encoder-decoder architecture with skip connections:

  1. Downsampling path (encoder) with three levels of max pooling

  2. Processing loop in the bottleneck

  3. Upsampling path (decoder) with skip connections from the encoder

Parameters:

x – Input tensor containing grid-represented geometry of shape (batch_size, input_filters, nx, ny, nz)

Returns:

Processed geometry features of shape (batch_size, 1, nx, ny, nz)

class physicsnemo.models.domino.model.GeometryRep(
input_features: int,
radii: Sequence[float],
neighbors_in_radius,
hops=1,
model_parameters=None,
)[source]#

Bases: Module

Geometry representation module that processes STL geometry data.

This module constructs a multiscale representation of geometry by:

  1. Computing multi-scale geometry encoding for local and global context

  2. Processing signed distance field (SDF) data for surface information

The combined encoding enables the model to reason about both local and global geometric properties.

forward(
x: Tensor,
p_grid: Tensor,
sdf: Tensor,
) Tensor[source]#

Process geometry data to create a comprehensive representation.

This method combines short-range, long-range, and SDF-based geometry encodings to create a rich representation of the geometry.

Parameters:
  • x – Input tensor containing geometric point data

  • p_grid – Grid points for sampling

  • sdf – Signed distance field tensor

Returns:

Comprehensive geometry encoding that concatenates short-range, SDF-based, and long-range features

class physicsnemo.models.domino.model.LocalPointConv(
input_features,
base_layer,
output_features,
model_parameters=None,
)[source]#

Bases: Module

Layer for local geometry point kernel

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.domino.model.NNBasisFunctions(input_features: int, model_parameters=None)[source]#

Bases: Module

Basis function layer for point clouds

forward(x: Tensor) Tensor[source]#

Transform point features into a basis function representation.

Parameters:

x – Input tensor containing point features

Returns:

Tensor containing basis function coefficients

class physicsnemo.models.domino.model.ParameterModel(input_features: int, model_parameters=None)[source]#

Bases: Module

Neural network module to encode simulation parameters.

This module encodes physical global parameters into a learned latent representation that can be incorporated into the model's prediction process.

forward(x: Tensor) Tensor[source]#

Encode physical parameters into a latent representation.

Parameters:

x – Input tensor containing physical parameters (e.g., inlet velocity, air density)

Returns:

Tensor containing encoded parameter representation

class physicsnemo.models.domino.model.PositionEncoder(input_features: int, model_parameters=None)[source]#

Bases: Module

Positional encoding of point clouds

forward(x: Tensor) Tensor[source]#

Transform point features into a basis function representation.

Parameters:

x – Input tensor containing point features

Returns:

Tensor containing positional encoding features

physicsnemo.models.domino.model.calculate_pos_encoding(nx, d=8)[source]#

Function to calculate positional encoding

physicsnemo.models.domino.model.fourier_encode(coords, num_freqs)[source]#

Function to calculate Fourier features

physicsnemo.models.domino.model.fourier_encode_vectorized(coords, freqs)[source]#

Vectorized Fourier feature encoding

physicsnemo.models.domino.model.get_activation(
activation: Literal['relu', 'gelu'],
) Callable[source]#

Return a PyTorch activation function corresponding to the given name.
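
A minimal usage sketch (the expected output assumes a standard ReLU):

import torch
from physicsnemo.models.domino.model import get_activation

act = get_activation("relu")              # returns a callable activation
y = act(torch.tensor([-1.0, 0.0, 2.0]))   # expected: tensor([0., 0., 2.])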

physicsnemo.models.domino.model.scale_sdf(sdf: Tensor) Tensor[source]#

Scale a signed distance function (SDF) to emphasize surface regions.

This function applies a non-linear scaling to the SDF values that compresses the range while preserving the sign, effectively giving more weight to points near surfaces where abs(SDF) is small.

Parameters:

sdf – Tensor containing signed distance function values

Returns:

Tensor with scaled SDF values in range [-1, 1]
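
A minimal usage sketch illustrating the documented effect (the exact scaling curve is internal to the function): values are compressed into [-1, 1] while the sign is preserved, so points near the surface retain more resolution.

import torch
from physicsnemo.models.domino.model import scale_sdf

sdf = torch.tensor([-2.0, -0.1, 0.0, 0.1, 2.0])
scaled = scale_sdf(sdf)  # same shape as sdf, values in [-1, 1], sign preserved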

Diffusion Models#

The PhysicsNeMo diffusion library provides three categories of models that serve different purposes. All models are based on the Module class.

  • Model backbones:

    These are highly configurable architectures that can be used as building blocks for more complex models.

  • Specialized architectures:

    These are models that usually inherit from the model backbones, with some additional application-specific functionality.

  • Application-specific interfaces:

    These Modules are not truly architectures, but rather wrappers around the model backbones or specialized architectures. Their intent is to provide a more user-friendly interface for specific applications.

In addition to these model architectures, PhysicsNeMo provides diffusion preconditioners, which are essentially wrappers around model architectures that rescale the inputs and outputs of diffusion models to improve their performance.
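
As a rough illustration of what such a preconditioner does, the sketch below shows the EDM-style rescaling from Karras et al.: the wrapped network is evaluated on a rescaled input, and its output is combined with a noise-level-dependent skip connection. The function name and signature are illustrative only and are not the PhysicsNeMo API.

# Schematic sketch of EDM-style preconditioning (Karras et al., 2022):
#   D(x; sigma) = c_skip(sigma) * x + c_out(sigma) * F(c_in(sigma) * x, c_noise(sigma))
# `net` stands for any diffusion backbone; `edm_precondition` is a hypothetical name.
def edm_precondition(net, x, sigma, sigma_data=0.5):
    # sigma is a tensor broadcastable against x, e.g. of shape (B, 1, 1, 1)
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / (sigma**2 + sigma_data**2).sqrt()
    c_in = 1 / (sigma**2 + sigma_data**2).sqrt()
    c_noise = sigma.log() / 4
    return c_skip * x + c_out * net(c_in * x, c_noise)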

Architecture Backbones#

Diffusion model backbones are highly configurable architectures that can be used as building blocks for more complex models. Backbones support both conditional and unconditional modeling. Currently, there are two provided backbones: the SongUNet, as implemented in the SongUNet class, and the DhariwalUNet, as implemented in the DhariwalUNet class. These models were introduced in the papers Score-based generative modeling through stochastic differential equations, Song et al., and Diffusion models beat GANs on image synthesis, Dhariwal et al. The PhysicsNeMo implementation of these models closely follows the one used in the paper Elucidating the Design Space of Diffusion-Based Generative Models, Karras et al. The original implementation of these models can be found in the EDM repository.

Model backbones can be used as is, such as in the StormCast example, but they can also be used as a base class for more complex models.

One of the most common diffusion backbones for image generation is the SongUNet class. Its latent state \(\mathbf{x}\) is a tensor of shape \((B, C, H, W)\), where \(B\) is the batch size, \(C\) is the number of channels, and \(H\) and \(W\) are the height and width of the feature map. The model is conditional on the noise level, and can additionally be conditioned on vector-valued class labels and/or images. The model is organized into levels, whose number is determined by len(channel_mult), and each level operates at half the resolution of the previous level (odd resolutions are rounded down). Each level is composed of a sequence of UNet blocks that optionally contain self-attention layers, as controlled by the attn_resolutions parameter. The feature map resolution is halved at the first block of each level and then remains constant within the level.

Here we start by creating a SongUNet model with 3 levels that applies self-attention at levels 1 and 2. The model is unconditional, i.e. it is not conditioned on any class labels or images (but is still conditioned on the noise level, as is standard practice for diffusion models).

import torch
from physicsnemo.models.diffusion import SongUNet

B, C_x, res = 3, 6, 40   # Batch size, channels, and resolution of the latent state

model = SongUNet(
    img_resolution=res,
    in_channels=C_x,
    out_channels=C_x,  # No conditioning on image: number of output channels is the same as the input channels
    label_dim=0,  # No conditioning on vector-valued class labels
    augment_dim=0,
    model_channels=64,
    channel_mult=[1, 2, 3],  # 3-level UNet with 64, 128, and 192 channels at each level, respectively
    num_blocks=4,  # 4 UNet blocks at each level
    attn_resolutions=[20, 10],  # Attention is applied at level 1 (resolution 20x20) and level 2 (resolution 10x10)
)

x = torch.randn(B, C_x, res, res)  # Latent state
noise_labels = torch.randn(B)  # Noise level for each sample

# The feature map resolution is 40 at level 0, 20 at level 1, and 10 at level 2
out = model(x, noise_labels, None)
print(out.shape)  # Shape: (B, C_x, res, res), same as the latent state

# The same model can be used on images of different resolution
# Note: the attention is still applied at levels 1 and 2
x_32 = torch.randn(B, C_x, 32, 32)  # Lower resolution latent state
out_32 = model(x_32, noise_labels, None)  # None means no conditioning on class labels
print(out_32.shape)  # Shape: (B, C_x, 32, 32), same as the latent state

The unconditional SongUNet can be extended to be conditional on class labels and/or images. Conditioning on images is performed by channel-wise concatenation of the image to the latent state \(\mathbf{x}\) before passing it to the model. The model does not perform conditioning on images internally, and this operation is left to the user. For conditioning on class labels (or any vector-valued quantity whose dimension is label_dim), the model internally generates embeddings for the class labels and adds them to intermediate activations within the UNet blocks. Here we extend the previous example to be conditional on a 16-dimensional vector-valued class label and a 3-channel image.

import torch
from physicsnemo.models.diffusion import SongUNet

B, C_x, res = 3, 10, 40
C_cond = 3

model = SongUNet(
    img_resolution=res,
    in_channels=C_x + C_cond,  # Conditioning on an image with C_cond channels
    out_channels=C_x,  # Output channels: only those of the latent state
    label_dim=16,  # Conditioning on 16-dimensional vector-valued class labels
    augment_dim=0,
    model_channels=64,
    channel_mult=[1, 2, 2],
    num_blocks=4,
    attn_resolutions=[20, 10],
)

x = torch.randn(B, C_x, res, res)  # Latent state
cond = torch.randn(B, C_cond, res, res)  # Conditioning image
x_cond = torch.cat([x, cond], dim=1)  # Channel-wise concatenation of the conditioning image before passing to the model
noise_labels = torch.randn(B)
class_labels = torch.randn(B, 16)  # Conditioning on vector-valued class labels

out = model(x_cond, noise_labels, class_labels)
print(out.shape)  # Shape: (B, C_x, res, res), same as the latent state

Specialized Architectures#

Note that even though backbones can be used as is, some of the PhysicsNeMo examples use specialized architectures. These specialized architectures typically inherit from the backbones and implement additional functionalities for specific applications. For example, the CorrDiff example uses the specialized architectures SongUNetPosEmbd and SongUNetPosLtEmbd to implement the diffusion model.

Positional embeddings#

Multi-diffusion (also called patch-based diffusion) is a technique to scale diffusion models to large domains. The idea is to split the full domain into patches and run a diffusion model on each patch in parallel. The generated patches are then fused back to form the final image. This technique is particularly useful for domains that are too large to fit into the memory of a single GPU. The CorrDiff example uses patch-based diffusion for weather downscaling on large domains. A key ingredient in the implementation of patch-based diffusion is the use of a global spatial grid that informs each patch of its respective position in the full domain. The SongUNetPosEmbd class implements this functionality by providing multiple methods to encode the global spatial coordinates of the pixels into a global positional embedding grid. In addition to multi-diffusion, spatial positional embeddings have also been observed to improve the quality of the generated images, even for diffusion models that operate on the full domain.

The following example shows how to use the specialized architecture SongUNetPosEmbd to implement a multi-diffusion model. First, we create a SongUNetPosEmbd model similar to the one in the conditional SongUNet example, with a global positional embedding grid of shape (C_PE, res, res). We show that the model can be used with the entire latent state (full domain).

import torch
from physicsnemo.models.diffusion import SongUNetPosEmbd

B, C_x, res = 3, 10, 40
C_cond = 3
C_PE = 8  # Number of channels in the positional embedding grid

# Create a SongUNet with a global positional embedding grid of shape (C_PE, res, res)
model = SongUNetPosEmbd(
    img_resolution=res,  # Define the resolution of the global positional embedding grid
    in_channels=C_x + C_cond + C_PE,  # in_channels must include the number of channels in the positional embedding grid
    out_channels=C_x,
    label_dim=16,
    augment_dim=0,
    model_channels=64,
    channel_mult=[1, 2, 2],
    num_blocks=4,
    attn_resolutions=[20, 10],
    gridtype="learnable",  # Use a learnable grid of positional embeddings
    N_grid_channels=C_PE  # Number of channels in the positional embedding grid
)

# Can pass the entire latent state to the model
x_global = torch.randn(B, C_x, res, res)  # Entire latent state
cond = torch.randn(B, C_cond, res, res)  # Conditioning image
x_cond = torch.cat([x_global, cond], dim=1)  # Latent state with conditioning image
noise_labels = torch.randn(B)
class_labels = torch.randn(B, 16)

# The model internally concatenates the global positional embedding grid to the
# input x_cond before the first UNet block.
# Note: global_index=None means use the entire positional embedding grid
out = model(x_cond, noise_labels, class_labels, global_index=None)
print(out.shape)  # Shape: (B, C_x, res, res), same as the latent state

Now we show that the model can be used on local patches of the latent state (multi-diffusion approach). We manually extract 3 patches from the latent state. Patches are treated as individual samples, so they are concatenated along the batch dimension. We also create a global grid of indices grid that contains the indices of the pixels in the full domain, and we extract the same 3 patches from the global grid and pass them to the global_index parameter. The model internally uses global_index to extract the corresponding patches from the positional embedding grid and concatenate them to the input x_cond_patches before the first UNet block. Note that conditional multi-diffusion still requires each patch to be conditioned on the entire conditioning image cond, which is why we interpolate the conditioning image to the patch resolution and concatenate it to each individual patch. In practice, it is not necessary to manually extract the patches from the latent state and the global grid, as PhysicsNeMo provides utilities to help with the patching operations, in patching. For an example of how to use these utilities, see the CorrDiff example.

# Can pass local patches to the model
# Create batch of 3 patches from `x_global` with resolution 16x16
pres = 16  # Patch resolution
p1 = x_global[0:1, :, :pres, :pres]  # Patch 1
p2 = x_global[3:4, :, pres:2*pres, pres:2*pres]  # Patch 2
p3 = x_global[1:2, :, -pres:, pres:2*pres]  # Patch 3
patches = torch.cat([p1, p2, p3], dim=0)  # Batch of 3 patches

# Note: the conditioning image needs interpolation (or other operations) to
# match the patch resolution
cond1 = torch.nn.functional.interpolate(cond[0:1], size=(pres, pres), mode="bilinear")
cond2 = torch.nn.functional.interpolate(cond[3:4], size=(pres, pres), mode="bilinear")
cond3 = torch.nn.functional.interpolate(cond[1:2], size=(pres, pres), mode="bilinear")
cond_patches = torch.cat([cond1, cond2, cond3], dim=0)

# Concatenate the patches and the conditioning image
x_cond_patches = torch.cat([patches, cond_patches], dim=1)

# Create corresponding global indices for the patches
Ny, Nx = torch.arange(res).int(), torch.arange(res).int()
grid = torch.stack(torch.meshgrid(Ny, Nx, indexing="ij"), dim=0)
idx_patch1 = grid[:, :pres, :pres]  # Global indices for patch 1
idx_patch2 = grid[:, pres:2*pres, pres:2*pres]  # Global indices for patch 2
idx_patch3 = grid[:, -pres:, pres:2*pres]  # Global indices for patch 3
global_index = torch.stack([idx_patch1, idx_patch2, idx_patch3], dim=0)

# The model internally extracts the corresponding patches from the global
# positional embedding grid and concatenates them to the input x_cond_patches
# before the first UNet block.
out = model(x_cond_patches, noise_labels, class_labels, global_index=global_index)
print(out.shape)  # Shape: (3, C_x, pres, pres), same as the patches extracted from the latent state
Lead-time aware models#

In many diffusion applications, the latent state is time-dependent, and the diffusion process should account for the time-dependence of the latent state. For instance, a forecast model could provide latent states \(\mathbf{x}(T)\) (current time), \(\mathbf{x}(T + \Delta t)\) (one time step forward), …, up to \(\mathbf{x}(T + K \Delta t)\) (K time steps forward). Such prediction horizons are called lead-times (a term adopted from the weather and climate forecasting community) and we want to apply diffusion to each of these latent states while accounting for their associated lead-time information.

PhysicsNeMo provides a specialized architecture SongUNetPosLtEmbd that implements lead-time aware models. This is an extension of the SongUNetPosEmbd class that additionally supports lead-time information. In its forward pass, the model uses the lead_time_label parameter to internally retrieve the associated lead-time embeddings; it then conditions the diffusion process on these embeddings via channel-wise concatenation to the latent state before the first UNet block.

Here we show an example extending the previous ones with lead-time information. We assume that we have a batch of 3 latent states at times \(T + 2 \Delta t\) (2 time intervals forward), \(T + 0 \Delta t\) (current time), and \(T + \Delta t\) (1 time interval forward). The associated lead-time labels are [2, 0, 1]. In addition, the SongUNetPosLtEmbd model has the ability to predict probabilities for some channels of the latent state, specified by the prob_channels parameter. Here we assume that channels 1 and 3 are probability (i.e. classification) outputs, while other channels are regression outputs.

import torch
from physicsnemo.models.diffusion import SongUNetPosLtEmbd

B, C_x, res = 3, 10, 40
C_cond = 3
C_PE = 8
lead_time_steps = 3  # Maximum supported lead-time is 2 * dt
C_LT = 6  # 6 channels for each lead-time embedding

# Create a SongUNet with a lead-time embedding grid of shape
# (lead_time_steps, C_LT, res, res)
model = SongUNetPosLtEmbd(
    img_resolution=res,
    in_channels=C_x + C_cond + C_PE + C_LT,  # in_channels must include the number of channels in lead-time grid
    out_channels=C_x,
    label_dim=16,
    augment_dim=0,
    model_channels=64,
    channel_mult=[1, 2, 2],
    num_blocks=4,
    attn_resolutions=[10, 5],
    gridtype="learnable",
    N_grid_channels=C_PE,
    lead_time_channels=C_LT,
    lead_time_steps=lead_time_steps,  # Maximum supported lead-time horizon
    prob_channels=[1, 3],  # Channels 1 and 3 from the latent state are probability outputs
)

x = torch.randn(B, C_x, res, res)  # Latent state at times T+2*dt, T+0*dt, and T + 1*dt
cond = torch.randn(B, C_cond, res, res)
x_cond = torch.cat([x, cond], dim=1)
noise_labels = torch.randn(B)
class_labels = torch.randn(B, 16)
lead_time_label = torch.tensor([2, 0, 1])  # Lead-time labels for each sample

# The model internally extracts the lead-time embeddings corresponding to the
# lead-time labels 2, 0, 1 and concatenates them to the input x_cond before the first
# UNet block. In training mode, the model outputs logits for channels 1 and 3.
out = model(x_cond, noise_labels, class_labels, lead_time_label=lead_time_label)
print(out.shape)  # Shape: (B, C_x, res, res), same as the latent state

# In eval mode, the model outputs probabilities for channels 1 and 3
model.eval()
out = model(x_cond, noise_labels, class_labels, lead_time_label=lead_time_label)

Note

The SongUNetPosLtEmbd is not an autoregressive model that performs a rollout to produce future predictions. From the point of view of the SongUNetPosLtEmbd, the lead-time information is frozen. The lead-time dependent latent state \(\mathbf{x}\) might however be produced by such an autoregressive/rollout model.

Note

The SongUNetPosLtEmbd model cannot be scaled to very long lead-time horizons (controlled by the lead_time_steps parameter). This is because the lead-time embeddings are represented by a grid of learnable parameters of shape (lead_time_steps, C_LT, res, res). For very long lead-time, the size of this grid of embeddings becomes prohibitively large.

Note

In a given input batch x, the associated lead-times are not necessarily consecutive or in order. They do not even need to originate from the same forecast trajectory. For example, the lead-time labels might be [0, 1, 2] instead of [2, 0, 1], or even [2, 2, 1].

Application-specific Interfaces#

Application-specific interfaces are not true architectures, but rather wrappers around the model backbones or specialized architectures that provide a more user-friendly interface for specific applications. Note that not all of these classes are diffusion models themselves; some are used in conjunction with diffusion models. For instance, the CorrDiff example uses the UNet class to implement a regression model.

class physicsnemo.models.diffusion.song_unet.SongUNet(*args, **kwargs)[source]#

Bases: Module

This architecture is a diffusion backbone for 2D image generation. It is a reimplementation of the DDPM++ and NCSN++ architectures, which are U-Net variants with optional self-attention, embeddings, and encoder-decoder components.

This model supports conditional and unconditional setups, as well as several options for various internal architectural choices such as encoder and decoder type, embedding type, etc., making it flexible and adaptable to different tasks and configurations.

This architecture supports conditioning on the noise level (called noise labels), as well as on additional vector-valued labels (called class labels) and (optional) vector-valued augmentation labels. The conditioning mechanism relies on addition of the conditioning embeddings in the U-Net blocks of the encoder. To condition on images, the simplest mechanism is to concatenate the image to the input before passing it to the SongUNet.

The model first applies a mapping operation to generate embeddings for all the conditioning inputs (the noise level, the class labels, and the optional augmentation labels).

Then, at each level in the U-Net encoder, a sequence of blocks is applied:

  • A first block downsamples the feature map resolution by a factor of 2 (odd resolutions are floored). This block does not change the number of channels.

  • A sequence of num_blocks U-Net blocks are applied, each with a different number of channels. These blocks do not change the feature map resolution, but they multiply the number of channels by a factor specified in channel_mult. If required, the U-Net blocks also apply self-attention at the specified resolutions.

  • At the end of the level, the feature map is cached to be used in a skip connection in the decoder.

The decoder is a mirror of the encoder, with the same number of levels and the same number of blocks per level. It multiplies the feature map resolution by a factor of 2 at each level.

Parameters:
  • img_resolution (Union[List[int, int], int]) –

    The resolution of the input/output image. Can be a single int \(H\) for square images or a list \([H, W]\) for rectangular images.

    Note: This parameter is only used as a convenience to build the network. In practice, the model can still be used with images of different resolutions. The only exception to this rule is when additive_pos_embed is True, in which case the resolution of the latent state \(\mathbf{x}\) must match img_resolution.

  • in_channels (int) – Number of channels \(C_{in}\) in the input image. May include channels from both the latent state and additional channels when conditioning on images. For an unconditional model, this should be equal to out_channels.

  • out_channels (int) – Number of channels \(C_{out}\) in the output image. Should be equal to the number of channels \(C_{\mathbf{x}}\) in the latent state.

  • label_dim (int, optional, default=0) – Dimension of the vector-valued class_labels conditioning; 0 indicates no conditioning on class labels.

  • augment_dim (int, optional, default=0) – Dimension of the vector-valued augment_labels conditioning; 0 means no conditioning on augmentation labels.

  • model_channels (int, optional, default=128) – Base multiplier for the number of channels across the entire network.

  • channel_mult (List[int], optional, default=[1, 2, 2, 2]) – Multipliers for the number of channels at every level in the encoder and decoder. The length of channel_mult determines the number of levels in the U-Net. At level i, the number of channels in the feature map is channel_mult[i] * model_channels.

  • channel_mult_emb (int, optional, default=4) – Multiplier for the number of channels in the embedding vector. The embedding vector has model_channels * channel_mult_emb channels.

  • num_blocks (int, optional, default=4) – Number of U-Net blocks at each level.

  • attn_resolutions (List[int], optional, default=[16]) – Resolutions of the levels at which self-attention layers are applied. Note that the feature map resolution must match exactly the value provided in attn_resolutions for the self-attention layers to be applied.

  • dropout (float, optional, default=0.10) – Dropout probability applied to intermediate activations within the U-Net blocks.

  • label_dropout (float, optional, default=0.0) – Dropout probability applied to the class_labels. Typically used for classifier-free guidance.

  • embedding_type (Literal["fourier", "positional", "zero"], optional, default="positional") – Diffusion timestep embedding type: ‘positional’ for DDPM++, ‘fourier’ for NCSN++, ‘zero’ for none.

  • channel_mult_noise (int, optional, default=1) – Multiplier for the number of channels in the noise level embedding. The noise level embedding vector has model_channels * channel_mult_noise channels.

  • encoder_type (Literal["standard", "skip", "residual"], optional, default="standard") – Encoder architecture: ‘standard’ for DDPM++, ‘residual’ for NCSN++, ‘skip’ for skip connections.

  • decoder_type (Literal["standard", "skip"], optional, default="standard") – Decoder architecture: ‘standard’ or ‘skip’ for skip connections.

  • resample_filter (List[int], optional, default=[1, 1]) – Resampling filter coefficients applied in the U-Net blocks convolutions: [1,1] for DDPM++, [1,3,3,1] for NCSN++.

  • checkpoint_level (int, optional, default=0) – Number of levels that should use gradient checkpointing. Only levels at which the feature map resolution is large enough will be checkpointed (0 disables checkpointing; higher values mean more layers are checkpointed). Higher values trade memory for computation.

  • additive_pos_embed (bool, optional, default=False) –

    If True, adds a learnable positional embedding after the first convolution layer. Used in StormCast model.

    Note: Those positional embeddings encode spatial position information of the image pixels, unlike the embedding_type parameter which encodes temporal information about the diffusion process. In that sense it is a simpler version of the positional embedding used in SongUNetPosEmbd.

  • use_apex_gn (bool, optional, default=False) – A flag indicating whether to use Apex GroupNorm for the NHWC layout. Apex must be installed for this to work, and the flag must be set to False on CPU.

  • act (str, optional, default=None) – The activation function to use when fusing activation with GroupNorm. Required when use_apex_gn is True.

  • profile_mode (bool, optional, default=False) – A flag indicating whether to enable all nvtx annotations during profiling.

  • amp_mode (bool, optional, default=False) – A flag indicating whether mixed-precision (AMP) training is enabled.

Forward#
x : torch.Tensor

The input image of shape \((B, C_{in}, H_{in}, W_{in})\). In general x is the channel-wise concatenation of the latent state \(\mathbf{x}\) and additional images used for conditioning. For an unconditional model, x is simply the latent state \(\mathbf{x}\).

Note: \(H_{in}\) and \(W_{in}\) do not need to match \(H\) and \(W\) defined in img_resolution, except when additive_pos_embed is True. In that case, the resolution of x must match img_resolution.

noise_labels : torch.Tensor

The noise labels of shape \((B,)\). Used for conditioning on the diffusion noise level.

class_labels : torch.Tensor

The class labels of shape \((B, \text{label_dim})\). Used for conditioning on any vector-valued quantity. Can pass None when label_dim is 0.

augment_labels : torch.Tensor, optional, default=None

The augmentation labels of shape \((B, \text{augment_dim})\). Used for conditioning on any additional vector-valued quantity. Can pass None when augment_dim is 0.

Outputs#
torch.Tensor

The denoised latent state of shape \((B, C_{out}, H_{in}, W_{in})\).

Important

  • The terms noise levels (or noise labels) are used to refer to the diffusion time-step, as these are conceptually equivalent.

  • The terms labels and classes originate from the original paper and EDM repository, where this architecture was used for class-conditional image generation. While these terms suggest class-based conditioning, the architecture can actually be conditioned on any vector-valued quantity.

  • The term positional embedding used in the embedding_type parameter also comes from the original paper and EDM repository. Here, positional refers to the diffusion time-step, similar to how position is used in transformer architectures. Despite the name, these embeddings encode temporal information about the diffusion process rather than spatial position information.

  • Limitations on input image resolution: for a model that has \(N\) levels, the latent state \(\mathbf{x}\) must have resolution that is a multiple of \(2^N\) in each dimension. This is due to a limitation in the decoder that does not support shape mismatch in the residual connections from the encoder to the decoder. For images that do not match this requirement, it is recommended to interpolate your data on a grid of the required resolution beforehand.

Example

>>> model = SongUNet(img_resolution=16, in_channels=2, out_channels=2)
>>> noise_labels = torch.randn([1])
>>> class_labels = torch.randint(0, 1, (1, 1))
>>> input_image = torch.ones([1, 2, 16, 16])
>>> output_image = model(input_image, noise_labels, class_labels)
>>> output_image.shape
torch.Size([1, 2, 16, 16])
property amp_mode#

Should be set to True to enable automatic mixed precision.

property profile_mode#

Should be set to True to enable profiling.

class physicsnemo.models.diffusion.dhariwal_unet.DhariwalUNet(*args, **kwargs)[source]#

Bases: Module

This architecture is a diffusion backbone for 2D image generation. It reimplements the ADM architecture, a U-Net variant, with optional self-attention.

It is highly similar to the U-Net backbone defined in SongUNet, and only differs in a few aspects:

  • The embedding conditioning mechanism relies on adaptive scaling of the group normalization layers within the U-Net blocks.

  • The parameters initialization follows Kaiming uniform initialization.

Parameters:
  • img_resolution (int) –

    The resolution \(H = W\) of the input/output image. Assumes square images.

    Note: This parameter is only used as a convenience to build the network. In practice, the model can still be used with images of different resolutions.

  • in_channels (int) – Number of channels \(C_{in}\) in the input image. May include channels from both the latent state \(\mathbf{x}\) and additional channels when conditioning on images. For an unconditional model, this should be equal to out_channels.

  • out_channels (int) – Number of channels \(C_{out}\) in the output image. Should be equal to the number of channels \(C_{\mathbf{x}}\) in the latent state.

  • label_dim (int, optional, default=0) – Dimension of the vector-valued class_labels conditioning; 0 indicates no conditioning on class labels.

  • augment_dim (int, optional, default=0) – Dimension of the vector-valued augment_labels conditioning; 0 means no conditioning on augmentation labels.

  • model_channels (int, optional, default=128) – Base multiplier for the number of channels across the entire network.

  • channel_mult (List[int], optional, default=[1,2,2,2]) – Multipliers for the number of channels at every level in the encoder and decoder. The length of channel_mult determines the number of levels in the U-Net. At level i, the number of channels in the feature map is channel_mult[i] * model_channels.

  • channel_mult_emb (int, optional, default=4) – Multiplier for the number of channels in the embedding vector. The embedding vector has model_channels * channel_mult_emb channels.

  • num_blocks (int, optional, default=3) – Number of U-Net blocks at each level.

  • attn_resolutions (List[int], optional, default=[16]) – Resolutions of the levels at which self-attention layers are applied. Note that the feature map resolution must match exactly the value provided in attn_resolutions for the self-attention layers to be applied.

  • dropout (float, optional, default=0.10) – Dropout probability applied to intermediate activations within the U-Net blocks.

  • label_dropout (float, optional, default=0.0) – Dropout probability applied to the class_labels. Typically used for classifier-free guidance.

Forward#
x : torch.Tensor

The input tensor of shape \((B, C_{in}, H_{in}, W_{in})\). In general x is the channel-wise concatenation of the latent state \(\mathbf{x}\) and additional images used for conditioning. For an unconditional model, x is simply the latent state \(\mathbf{x}\).

noise_labels : torch.Tensor

The noise labels of shape \((B,)\). Used for conditioning on the noise level.

class_labels : torch.Tensor

The class labels of shape \((B, \text{label_dim})\). Used for conditioning on any vector-valued quantity. Can pass None when label_dim is 0.

augment_labels : torch.Tensor, optional, default=None

The augmentation labels of shape \((B, \text{augment_dim})\). Used for conditioning on any additional vector-valued quantity. Can pass None when augment_dim is 0.

Outputs#
torch.Tensor:

The denoised latent state of shape \((B, C_{out}, H_{in}, W_{in})\).

Examples

>>> import torch
>>> from physicsnemo.models.diffusion.dhariwal_unet import DhariwalUNet
>>> model = DhariwalUNet(img_resolution=16, in_channels=2, out_channels=2)
>>> noise_labels = torch.randn([1])
>>> class_labels = torch.randint(0, 1, (1, 1))
>>> input_image = torch.ones([1, 2, 16, 16])
>>> output_image = model(input_image, noise_labels, class_labels)
>>> output_image.shape
torch.Size([1, 2, 16, 16])
property amp_mode#

Should be set to True to enable automatic mixed precision.

property profile_mode#

Should be set to True to enable profiling.

class physicsnemo.models.diffusion.song_unet.SongUNetPosEmbd(*args, **kwargs)[source]#

Bases: SongUNet

This specialized architecture extends SongUNet with positional embeddings that encode global spatial coordinates of the pixels.

This model supports the same types of conditioning as the base SongUNet and can additionally be conditioned on positional embeddings. Conditioning on the positional embeddings is performed with a channel-wise concatenation to the input image before the first layer of the U-Net. Multiple types of positional embeddings are supported. Positional embeddings are represented by a 2D grid of shape \((C_{PE}, H, W)\), where \(H\) and \(W\) correspond to the img_resolution parameter.

The following types of positional embeddings are supported:

  • learnable: uses a 2D grid of learnable parameters.

  • linear: uses a 2D rectilinear grid over the domain \([-1, 1] \times [-1, 1]\).

  • sinusoidal: uses sinusoidal functions of the spatial coordinates, with possibly multiple frequency bands.

  • test: uses a 2D grid of integer indices, only used for testing.

When the spatial resolution of the input image is smaller than that of the global positional embedding grid, it is necessary to select the subset (or patch) of the embedding grid that corresponds to the spatial locations of the input image pixels. The model provides two methods for selecting the subset of positional embeddings:

  1. Using a selector function. See positional_embedding_selector() for details.

  2. Using global indices. See positional_embedding_indexing() for details.

If none of these are provided, the entire grid of positional embeddings is used and channel-wise concatenated to the input image.

Most parameters are the same as in the parent class SongUNet. Only the ones that differ are listed below.

Parameters:
  • img_resolution (Union[List[int, int], int]) – The resolution of the input/output image. Can be a single int for square images or a list \([H, W]\) for rectangular images. Used to set the resolution of the positional embedding grid. It must correspond to the spatial resolution of the global domain/image.

  • in_channels (int) –

    Number of channels \(C_{in} + C_{PE}\), where \(C_{in}\) is the number of channels in the image passed to the U-Net and \(C_{PE}\) is the number of channels in the positional embedding grid.

    Important: in comparison to the base SongUNet, this parameter should also include the number of channels in the positional embedding grid \(C_{PE}\).

  • gridtype (Literal["sinusoidal", "learnable", "linear", "test"], optional, default="sinusoidal") – Type of positional embedding to use. Controls how spatial pixel locations are encoded.

  • N_grid_channels (int, optional, default=4) – Number of channels \(C_{PE}\) in the positional embedding grid. For ‘sinusoidal’ it must be 4 or a multiple of 4. For ‘linear’ and ‘test’ it must be 2. For ‘learnable’ it can be any value.

  • lead_time_mode (bool, optional, default=False) – Provided for convenience. It is recommended to use the architecture SongUNetPosLtEmbd for a lead-time aware model.

  • lead_time_channels (int, optional, default=None) – Provided for convenience. Refer to SongUNetPosLtEmbd.

  • lead_time_steps (int, optional, default=9) – Provided for convenience. Refer to SongUNetPosLtEmbd.

  • prob_channels (List[int], optional, default=[]) – Provided for convenience. Refer to SongUNetPosLtEmbd.

Forward#
x : torch.Tensor

The input image of shape \((B, C_{in}, H_{in}, W_{in})\), where \(H_{in}\) and \(W_{in}\) are the spatial dimensions of the input image (does not need to be the full image). In general x is the channel-wise concatenation of the latent state \(\mathbf{x}\) and additional images used for conditioning. For an unconditional model, x is simply the latent state \(\mathbf{x}\).

Note: \(H_{in}\) and \(W_{in}\) do not need to match the img_resolution parameter, except when additive_pos_embed is True. In all other cases, the resolution of x must be smaller than img_resolution.

noise_labels : torch.Tensor

The noise labels of shape \((B,)\). Used for conditioning on the diffusion noise level.

class_labels : torch.Tensor

The class labels of shape \((B, \text{label_dim})\). Used for conditioning on any vector-valued quantity. Can pass None when label_dim is 0.

global_index : torch.Tensor, optional, default=None

The global indices of the positional embeddings to use. If neither global_index nor embedding_selector are provided, the entire positional embedding grid of shape \((C_{PE}, H, W)\) is used. In this case x must have the same spatial resolution as the positional embedding grid. See positional_embedding_indexing() for details.

embedding_selector : Callable, optional, default=None

A function that selects the positional embeddings to use. See positional_embedding_selector() for details.

augment_labels : torch.Tensor, optional, default=None

The augmentation labels of shape \((B, \text{augment_dim})\). Used for conditioning on any additional vector-valued quantity. Can pass None when augment_dim is 0.

Outputs#
torch.Tensor

The output tensor of shape \((B, C_{out}, H_{in}, W_{in})\).

Important

Unlike positional embeddings defined by embedding_type in the parent class SongUNet that encode the diffusion time-step (or noise level), the positional embeddings in this specialized architecture encode global spatial coordinates of the pixels.

Examples

>>> import torch
>>> from physicsnemo.models.diffusion.song_unet import SongUNetPosEmbd
>>> from physicsnemo.utils.patching import GridPatching2D
>>>
>>> # Model initialization - in_channels must include both original input channels (2)
>>> # and the positional embedding channels (N_grid_channels=4 by default)
>>> model = SongUNetPosEmbd(img_resolution=16, in_channels=2+4, out_channels=2)
>>> noise_labels = torch.randn([1])
>>> class_labels = torch.randint(0, 1, (1, 1))
>>> # The input has only the original 2 channels - positional embeddings are
>>> # added automatically inside the forward method
>>> input_image = torch.ones([1, 2, 16, 16])
>>> output_image = model(input_image, noise_labels, class_labels)
>>> output_image.shape
torch.Size([1, 2, 16, 16])
>>>
>>> # Using a global index to select all positional embeddings
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(16, 16))
>>> global_index = patching.global_index(batch_size=1)
>>> output_image = model(
...     input_image, noise_labels, class_labels,
...     global_index=global_index
... )
>>> output_image.shape
torch.Size([1, 2, 16, 16])
>>>
>>> # Using a custom embedding selector to select all positional embeddings
>>> def patch_embedding_selector(emb):
...     return patching.apply(emb[None].expand(1, -1, -1, -1))
>>> output_image = model(
...     input_image, noise_labels, class_labels,
...     embedding_selector=patch_embedding_selector
... )
>>> output_image.shape
torch.Size([1, 2, 16, 16])
property amp_mode#

Should be set to True to enable automatic mixed precision.

positional_embedding_indexing(
x: Tensor,
global_index: Tensor | None = None,
lead_time_label=None,
) Tensor[source]#

Select positional embeddings using global indices.

This method uses global indices to select a specific subset of the positional embedding grid (called patches). If no indices are provided, the entire positional embedding grid is returned.

Parameters:
  • x (torch.Tensor) – Input tensor of shape \((P \times B, C, H_{in}, W_{in})\). Only used to determine batch size \(B\) and device.

  • global_index (Optional[torch.Tensor], default=None) – Tensor of shape \((P, 2, H_{in}, W_{in})\) that corresponds to the patches to extract from the positional embedding grid. \(P\) is the number of distinct patches in the input tensor x. The channel dimension should contain the \(j\), \(i\) indices of the pixels to extract from the embedding grid.

Returns:

Selected positional embeddings with shape \((P \times B, C_{PE}, H_{in}, W_{in})\) (same spatial resolution as global_index) if global_index is provided. If global_index is None, the entire positional embedding grid is duplicated \(B\) times and returned with shape \((B, C_{PE}, H, W)\).

Return type:

torch.Tensor

Example

>>> # Create global indices using patching utility:
>>> from physicsnemo.utils.patching import GridPatching2D
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(8, 8))
>>> global_index = patching.global_index(batch_size=3)
>>> print(global_index.shape)
torch.Size([4, 2, 8, 8])
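A hedged continuation of the example above (it assumes model is the SongUNetPosEmbd instance from the class-level example, so the embedding grid has \(C_{PE} = 4\) channels by default):

>>> # x stacks the P=4 patches of each of the B=3 samples along the batch dimension
>>> x = torch.ones(4 * 3, 2, 8, 8)
>>> pos_emb = model.positional_embedding_indexing(x, global_index=global_index)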

Notes

  • This method is typically used in patch-based diffusion (or multi-diffusion), where a large input image is split into multiple patches. The batch dimension of the input tensor contains the patches. Patches are processed independently by the model, and the global_index parameter is used to select the grid of positional embeddings corresponding to each patch.

  • See this method from physicsnemo.utils.patching.BasePatching2D for generating the global_index parameter: global_index().

positional_embedding_selector(
x: Tensor,
embedding_selector: Callable[[Tensor], Tensor],
lead_time_label=None,
) Tensor[source]#

Select positional embeddings using a selector function.

Similar to positional_embedding_indexing(), but instead uses a selector function to select the embeddings.

Parameters:
  • x (torch.Tensor) – Input tensor of shape \((P \times B, C, H_{in}, W_{in})\). Only used to determine batch size \(B\), dtype and device.

  • embedding_selector (Callable) – Function that takes as input the entire embedding grid of shape \((C_{PE}, H, W)\) and returns selected embeddings with shape \((P \times B, C_{PE}, H_{in}, W_{in})\). Each selected embedding should correspond to the portion of the embedding grid that corresponds to the batch element in x. Typically this should be based on physicsnemo.utils.patching.BasePatching2D.apply() method to maintain consistency with patch extraction.

  • lead_time_label (Optional[torch.Tensor], default=None) – Tensor of shape \((P,)\) that corresponds to the lead-time label for each patch. Only used if lead_time_mode is True.

Returns:

A tensor of shape \((P \times B, C_{PE} [+ C_{LT}], H_{in}, W_{in})\). \(C_{PE}\) is the number of embedding channels in the positional embedding grid, and \(C_{LT}\) is the number of embedding channels in the lead-time embedding grid. If lead_time_label is provided, the lead-time embedding channels are included.

Return type:

torch.Tensor

Notes

  • This method is typically used in patch-based diffusion (or multi-diffusion), where a large input image is split into multiple patches. The batch dimension of the input tensor contains the patches. Patches are processed independently by the model, and the embedding_selector function is used to select the grid of positional embeddings corresponding to each patch.

  • See this method from physicsnemo.utils.patching.BasePatching2D for generating the embedding_selector parameter: apply()

Example

>>> # Define a selector function with a patching utility:
>>> from physicsnemo.utils.patching import GridPatching2D
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(8, 8))
>>> batch_size = 4
>>> def embedding_selector(emb):
...     return patching.apply(emb[None].expand(batch_size, -1, -1, -1))
>>>
property profile_mode#

Should be set to True to enable profiling.

class physicsnemo.models.diffusion.song_unet.SongUNetPosLtEmbd(*args, **kwargs)[source]#

Bases: SongUNetPosEmbd

This specialized architecture extends SongUNetPosEmbd with two additional capabilities:

  1. The model can be conditioned on lead-time labels. These labels encode physical time information, such as a forecasting horizon.

  2. Similarly to the parent SongUNetPosEmbd, this model predicts regression targets, but it can also produce classification predictions. More precisely, some of the output channels are probability outputs that are passed through a softmax activation function. This is useful for multi-task applications, where the objective combines both regression and classification losses.

The mechanism to condition on lead-time labels is implemented by:

  • First generating a grid of learnable lead-time embeddings of shape \((\text{lead_time_steps}, C_{LT}, H, W)\). The spatial resolution of the lead-time embeddings is the same as the input/output image.

  • Then, given an input x, select the lead-time embeddings that correspond to the lead-times associated with the samples in the input x.

  • Finally, concatenate channel-wise the selected lead-time embeddings and positional embeddings to the input x and pass them to the U-Net network.

Most parameters are similar to those of the parent SongUNetPosEmbd, with the exception of the ones listed below.

Parameters:
  • in_channels (int) –

    Number of channels \(C_{in} + C_{PE} + C_{LT}\) in the image passed to the U-Net.

    Important: in comparison to the base SongUNet, this parameter should also include the number of channels in the positional embedding grid \(C_{PE}\) and the number of channels in the lead-time embedding grid \(C_{LT}\).

  • lead_time_channels (int, optional, default=None) – Number of channels \(C_{LT}\) in the lead time embedding. These are learned embeddings that encode physical time information.

  • lead_time_steps (int, optional, default=9) – Number of discrete lead time steps to support. Each step gets its own learned embedding vector of shape \((C_{LT}, H, W)\).

  • prob_channels (List[int], optional, default=[]) – Indices of channels that are probability outputs (or classification predictions). In training mode, the model outputs logits for these probability channels; in eval mode, it applies a softmax to output the probabilities.

Forward#
x : torch.Tensor

The input image of shape \((B, C_{in}, H_{in}, W_{in})\), where \(H_{in}\) and \(W_{in}\) are the spatial dimensions of the input image (does not need to be the full image).

noise_labels : torch.Tensor

The noise labels of shape \((B,)\). Used for conditioning on the diffusion noise level.

class_labels : torch.Tensor

The class labels of shape \((B, \text{label_dim})\). Used for conditioning on any vector-valued quantity. Can pass None when label_dim is 0.

global_index : torch.Tensor, optional, default=None

The global indices of the positional embeddings to use. See positional_embedding_indexing() for details. If neither global_index nor embedding_selector are provided, the entire positional embedding grid is used.

embedding_selector : Callable, optional, default=None

A function that selects the positional embeddings to use. See positional_embedding_selector() for details.

augment_labels : torch.Tensor, optional, default=None

The augmentation labels of shape \((B, \text{augment_dim})\). Used for conditioning on any additional vector-valued quantity.

lead_time_label : torch.Tensor, optional, default=None

The lead-time labels of shape \((B,)\). Used for selecting lead-time embeddings. It should contain the indices of the lead-time embeddings that correspond to the lead-time of each sample in the batch.

Outputs#
torch.Tensor

The output tensor of shape \((B, C_{out}, H_{in}, W_{in})\).

Notes

  • The lead-time embeddings differ from the diffusion time embeddings used in SongUNet class, as they do not encode diffusion time-step but physical forecast time.

Example

>>> import torch
>>> from physicsnemo.models.diffusion.song_unet import SongUNetPosLtEmbd
>>> from physicsnemo.utils.patching import GridPatching2D
>>>
>>> # Model initialization - in_channels must include original input channels (2),
>>> # positional embedding channels (N_grid_channels=4 by default) and
>>> # lead time embedding channels (4)
>>> model = SongUNetPosLtEmbd(
...     img_resolution=16, in_channels=2+4+4, out_channels=2,
...     lead_time_channels=4, lead_time_steps=9
... )
>>> noise_labels = torch.randn([1])
>>> class_labels = torch.randint(0, 1, (1, 1))
>>> # The input has only the original 2 channels - positional embeddings and
>>> # lead time embeddings are added automatically inside the forward method
>>> input_image = torch.ones([1, 2, 16, 16])
>>> lead_time_label = torch.tensor([3])
>>> output_image = model(
...     input_image, noise_labels, class_labels,
...     lead_time_label=lead_time_label
... )
>>> output_image.shape
torch.Size([1, 2, 16, 16])
>>>
>>> # Using global_index to select all the positional and lead time embeddings
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(16, 16))
>>> global_index = patching.global_index(batch_size=1)
>>> output_image = model(
...     input_image, noise_labels, class_labels,
...     lead_time_label=lead_time_label,
...     global_index=global_index
... )
>>> output_image.shape
torch.Size([1, 2, 16, 16])
property amp_mode#

Should be set to True to enable automatic mixed precision.

positional_embedding_indexing(
x: Tensor,
global_index: Tensor | None = None,
lead_time_label=None,
) Tensor#

Select positional embeddings using global indices.

This method uses global indices to select a specific subset of the positional embedding grid (called patches). If no indices are provided, the entire positional embedding grid is returned.

Parameters:
  • x (torch.Tensor) – Input tensor of shape \((P \times B, C, H_{in}, W_{in})\). Only used to determine batch size \(B\) and device.

  • global_index (Optional[torch.Tensor], default=None) – Tensor of shape \((P, 2, H_{in}, W_{in})\) that corresponds to the patches to extract from the positional embedding grid. \(P\) is the number of distinct patches in the input tensor x. The channel dimension should contain the \(j\), \(i\) indices of the pixels to extract from the embedding grid.

Returns:

Selected positional embeddings with shape \((P \times B, C_{PE}, H_{in}, W_{in})\) (same spatial resolution as global_index) if global_index is provided. If global_index is None, the entire positional embedding grid is duplicated \(B\) times and returned with shape \((B, C_{PE}, H, W)\).

Return type:

torch.Tensor

Example

>>> # Create global indices using patching utility:
>>> from physicsnemo.utils.patching import GridPatching2D
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(8, 8))
>>> global_index = patching.global_index(batch_size=3)
>>> print(global_index.shape)
torch.Size([4, 2, 8, 8])

Notes

  • This method is typically used in patch-based diffusion (or multi-diffusion), where a large input image is split into multiple patches. The batch dimension of the input tensor contains the patches. Patches are processed independently by the model, and the global_index parameter is used to select the grid of positional embeddings corresponding to each patch.

  • See this method from physicsnemo.utils.patching.BasePatching2D for generating the global_index parameter: global_index().

positional_embedding_selector(
x: Tensor,
embedding_selector: Callable[[Tensor], Tensor],
lead_time_label=None,
) Tensor#

Select positional embeddings using a selector function.

Similar to positional_embedding_indexing(), but instead uses a selector function to select the embeddings.

Parameters:
  • x (torch.Tensor) – Input tensor of shape \((P \times B, C, H_{in}, W_{in})\). Only used to determine batch size \(B\), dtype and device.

  • embedding_selector (Callable) – Function that takes as input the entire embedding grid of shape \((C_{PE}, H, W)\) and returns selected embeddings with shape \((P \times B, C_{PE}, H_{in}, W_{in})\). Each selected embedding should correspond to the portion of the embedding grid that corresponds to the batch element in x. Typically this should be based on physicsnemo.utils.patching.BasePatching2D.apply() method to maintain consistency with patch extraction.

  • lead_time_label (Optional[torch.Tensor], default=None) – Tensor of shape \((P,)\) that corresponds to the lead-time label for each patch. Only used if lead_time_mode is True.

Returns:

A tensor of shape \((P \times B, C_{PE} [+ C_{LT}], H_{in}, W_{in})\). \(C_{PE}\) is the number of embedding channels in the positional embedding grid, and \(C_{LT}\) is the number of embedding channels in the lead-time embedding grid. If lead_time_label is provided, the lead-time embedding channels are included.

Return type:

torch.Tensor

Notes

  • This method is typically used in patch-based diffusion (or multi-diffusion), where a large input image is split into multiple patches. The batch dimension of the input tensor contains the patches. Patches are processed independently by the model, and the embedding_selector function is used to select the grid of positional embeddings corresponding to each patch.

  • See this method from physicsnemo.utils.patching.BasePatching2D for generating the embedding_selector parameter: apply()

Example

>>> # Define a selector function with a patching utility:
>>> from physicsnemo.utils.patching import GridPatching2D
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(8, 8))
>>> batch_size = 4
>>> def embedding_selector(emb):
...     return patching.apply(emb[None].expand(batch_size, -1, -1, -1))
>>>
property profile_mode#

Should be set to True to enable profiling.

class physicsnemo.models.diffusion.unet.UNet(*args, **kwargs)[source]#

Bases: Module

This interface provides a U-Net wrapper for the CorrDiff deterministic regression model (and other deterministic downsampling models). It supports the following underlying architectures: SongUNet, SongUNetPosEmbd, SongUNetPosLtEmbd, and DhariwalUNet.

It shares the same architecture as a conditional diffusion model: it concatenates a conditioning image to a zero-filled latent state and sets the noise level and the class labels to zero.

Parameters:
  • img_resolution (Union[int, Tuple[int, int]]) – The resolution of the input/output image. If a single int is provided, then the image is assumed to be square.

  • img_in_channels (int) – Number of channels in the input image.

  • img_out_channels (int) – Number of channels in the output image.

  • use_fp16 (bool, optional, default=False) – Execute the underlying model at FP16 precision.

  • model_type (Literal['SongUNet', 'SongUNetPosEmbd', 'SongUNetPosLtEmbd', 'DhariwalUNet'], optional, default='SongUNetPosEmbd') – Class name of the underlying architecture. Must be one of the following: ‘SongUNet’, ‘SongUNetPosEmbd’, ‘SongUNetPosLtEmbd’, ‘DhariwalUNet’. Please refer to the documentation of these classes for details on how to call and use these models directly.

  • **model_kwargs (dict) – Keyword arguments passed to the underlying architecture __init__ method.

Forward#
x : torch.Tensor

The input tensor, typically zero-filled, of shape \((B, C_{in}, H_{in}, W_{in})\).

img_lr : torch.Tensor

Conditioning image of shape \((B, C_{lr}, H_{in}, W_{in})\).

**model_kwargs

Additional keyword arguments to pass to the underlying architecture forward method.

Outputs#
torch.Tensor

Output tensor of shape \((B, C_{out}, H_{in}, W_{in})\) (same spatial dimensions as the input).
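Example

A hedged usage sketch. The channel counts below are illustrative assumptions, and model_type='SongUNet' is chosen to avoid the positional-embedding channel bookkeeping of the default SongUNetPosEmbd backbone.

>>> import torch
>>> from physicsnemo.models.diffusion.unet import UNet
>>> model = UNet(
...     img_resolution=16, img_in_channels=2, img_out_channels=2,
...     model_type="SongUNet",
... )
>>> x = torch.zeros(1, 2, 16, 16)       # zero-filled latent state
>>> img_lr = torch.randn(1, 2, 16, 16)  # conditioning image
>>> out = model(x, img_lr)              # expected shape: (1, 2, 16, 16)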

property amp_mode#

Set to True when using automatic mixed precision.

property profile_mode#

Set to True to enable profiling of the wrapped model.

property use_fp16#

Whether the model uses float16 precision.

Returns:

True if the model is in float16 mode, False otherwise.

Return type:

bool

Type:

bool

Diffusion Preconditioners#

Preconditioning is an essential technique for improving the performance of diffusion models. It consists of scaling the latent state and the noise level that are passed to a network; some preconditioning schemes also require re-scaling the output of the network. PhysicsNeMo provides a set of preconditioning classes that are wrappers around backbones or specialized architectures.

Preconditioning schemes used in the paper “Elucidating the Design Space of Diffusion-Based Generative Models”.
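As a concrete illustration, below is a minimal sketch of EDM-style scaling based on the formulas in that paper; it is not the PhysicsNeMo implementation, and the function name is hypothetical.

import torch

def edm_denoiser(net, x, sigma, sigma_data=0.5):
    # EDM scaling coefficients (Karras et al., 2022); sigma is broadcast over (B, 1, 1, 1).
    sigma = sigma.reshape(-1, 1, 1, 1)
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / (sigma**2 + sigma_data**2).sqrt()
    c_in = 1 / (sigma_data**2 + sigma**2).sqrt()
    c_noise = sigma.log().flatten() / 4
    # Skip connection plus the scaled network output gives the denoised estimate.
    return c_skip * x + c_out * net(c_in * x, c_noise)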

class physicsnemo.models.diffusion.preconditioning.EDMPrecond(*args, **kwargs)[source]#

Bases: Module

Improved preconditioning proposed in the paper “Elucidating the Design Space of Diffusion-Based Generative Models” (EDM)

Parameters:
  • img_resolution (int) – Image resolution.

  • img_channels (int) – Number of color channels (for both input and output). If your model requires a different number of input or output channels, override this by passing either of the optional img_in_channels or img_out_channels arguments.

  • label_dim (int) – Number of class labels, 0 = unconditional, by default 0.

  • use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.

  • sigma_min (float) – Minimum supported noise level, by default 0.0.

  • sigma_max (float) – Maximum supported noise level, by default inf.

  • sigma_data (float) – Expected standard deviation of the training data, by default 0.5.

  • model_type (str) – Class name of the underlying model, by default “DhariwalUNet”.

  • img_in_channels (int) – Optional setting for when the number of input channels differs from the number of output channels. If set, overrides img_channels for the input. This is useful in the case of additional (conditional) channels.

  • img_out_channels (int) – Optional setting for when the number of input channels differs from the number of output channels. If set, overrides img_channels for the output.

  • **model_kwargs (dict) – Keyword arguments for the underlying model.

Note

Reference: Karras, T., Aittala, M., Aila, T. and Laine, S., 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35, pp.26565-26577.
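Example

A hedged usage sketch; the channel count and resolution are illustrative assumptions, and the default DhariwalUNet backbone is used.

>>> import torch
>>> from physicsnemo.models.diffusion.preconditioning import EDMPrecond
>>> precond = EDMPrecond(img_resolution=16, img_channels=2)
>>> x = torch.randn(1, 2, 16, 16)   # noisy latent state
>>> sigma = torch.tensor([1.0])     # per-sample noise level
>>> denoised = precond(x, sigma)    # expected shape: (1, 2, 16, 16)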

forward(
x,
sigma,
condition=None,
class_labels=None,
force_fp32=False,
**model_kwargs,
)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

static round_sigma(sigma: float | List | Tensor)[source]#

Convert a given sigma value(s) to a tensor representation.

Parameters:

sigma (Union[float list, torch.Tensor]) – The sigma value(s) to convert.

Returns:

The tensor representation of the provided sigma value(s).

Return type:

torch.Tensor

class physicsnemo.models.diffusion.preconditioning.EDMPrecondMetaData(
name: str = 'EDMPrecond',
jit: bool = False,
cuda_graphs: bool = False,
amp: bool = False,
amp_cpu: bool = False,
amp_gpu: bool = True,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = -1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

EDMPrecond meta data

class physicsnemo.models.diffusion.preconditioning.EDMPrecondSR(*args, **kwargs)[source]#

Bases: EDMPrecondSuperResolution

NOTE: This is a deprecated version of the EDMPrecondSuperResolution model. It is kept to maintain backwards compatibility and to allow loading old models. Please use the EDMPrecondSuperResolution model instead.

Improved preconditioning proposed in the paper “Elucidating the Design Space of Diffusion-Based Generative Models” (EDM) for super-resolution tasks

Parameters:
  • img_resolution (int) – Image resolution.

  • img_channels (int) – Number of color channels (deprecated, not used).

  • img_in_channels (int) – Number of input color channels.

  • img_out_channels (int) – Number of output color channels.

  • use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.

  • sigma_min (float) – Minimum supported noise level, by default 0.0.

  • sigma_max (float) – Maximum supported noise level, by default inf.

  • sigma_data (float) – Expected standard deviation of the training data, by default 0.5.

  • model_type (str) – Class name of the underlying model, by default “SongUNetPosEmbd”.

  • scale_cond_input (bool) – Whether to scale the conditional input (deprecated), by default True.

  • **model_kwargs (dict) – Keyword arguments for the underlying model.

Note

References:

  • Karras, T., Aittala, M., Aila, T. and Laine, S., 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35, pp.26565-26577.

  • Mardani, M., Brenowitz, N., Cohen, Y., Pathak, J., Chen, C.Y., Liu, C.C., Vahdat, A., Kashinath, K., Kautz, J. and Pritchard, M., 2023. Generative Residual Diffusion Modeling for Km-scale Atmospheric Downscaling. arXiv preprint arXiv:2309.15214.

forward(
x,
img_lr,
sigma,
force_fp32=False,
**model_kwargs,
)[source]#

Forward pass of the EDMPrecondSR model wrapper.

Parameters:
  • x (torch.Tensor) – Noisy high-resolution image of shape (B, C_hr, H, W).

  • img_lr (torch.Tensor) – Low-resolution conditioning image of shape (B, C_lr, H, W).

  • sigma (torch.Tensor) – Noise level of shape (B) or (B, 1) or (B, 1, 1, 1).

  • force_fp32 (bool, optional) – Whether to force FP32 precision regardless of the use_fp16 attribute, by default False.

  • **model_kwargs (dict) – Additional keyword arguments to pass to the underlying model.

Returns:

Denoised high-resolution image of shape (B, C_hr, H, W).

Return type:

torch.Tensor

class physicsnemo.models.diffusion.preconditioning.EDMPrecondSRMetaData(
name: str = 'EDMPrecondSR',
jit: bool = False,
cuda_graphs: bool = False,
amp: bool = False,
amp_cpu: bool = False,
amp_gpu: bool = True,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = -1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

EDMPrecondSR meta data

class physicsnemo.models.diffusion.preconditioning.EDMPrecondSuperResolution(*args, **kwargs)[source]#

Bases: Module

Improved preconditioning proposed in the paper “Elucidating the Design Space of Diffusion-Based Generative Models” (EDM).

This is a variant of EDMPrecond that is specifically designed for super-resolution tasks. It wraps a neural network that predicts the denoised high-resolution image given a noisy high-resolution image, and additional conditioning that includes a low-resolution image, and a noise level.

Parameters:
  • img_resolution (Union[int, Tuple[int, int]]) – Spatial resolution \((H, W)\) of the image. If a single int is provided, the image is assumed to be square.

  • img_in_channels (int) – Number of input channels in the low-resolution input image.

  • img_out_channels (int) – Number of output channels in the high-resolution output image.

  • use_fp16 (bool, optional) – Whether to use half-precision floating point (FP16) for model execution, by default False.

  • model_type (str, optional) – Class name of the underlying model. Must be one of the following: ‘SongUNet’, ‘SongUNetPosEmbd’, ‘SongUNetPosLtEmbd’, ‘DhariwalUNet’. Defaults to ‘SongUNetPosEmbd’.

  • sigma_data (float, optional) – Expected standard deviation of the training data, by default 0.5.

  • sigma_min (float, optional) – Minimum supported noise level, by default 0.0.

  • sigma_max (float, optional) – Maximum supported noise level, by default inf.

  • **model_kwargs (dict) – Keyword arguments passed to the underlying model __init__ method.

See also

SongUNet

Basic U-Net for diffusion models

SongUNetPosEmbd

U-Net with positional embeddings

SongUNetPosLtEmbd

U-Net with positional and lead-time embeddings

Note

References:

  • Karras, T., Aittala, M., Aila, T. and Laine, S., 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35, pp.26565-26577.

  • Mardani, M., Brenowitz, N., Cohen, Y., Pathak, J., Chen, C.Y., Liu, C.C., Vahdat, A., Kashinath, K., Kautz, J. and Pritchard, M., 2023. Generative Residual Diffusion Modeling for Km-scale Atmospheric Downscaling. arXiv preprint arXiv:2309.15214.
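Example

A hedged, illustrative usage sketch; the channel counts are assumptions, and model_type='SongUNet' is used here to avoid the positional-embedding channel bookkeeping of the default SongUNetPosEmbd backbone.

>>> import torch
>>> from physicsnemo.models.diffusion.preconditioning import EDMPrecondSuperResolution
>>> precond = EDMPrecondSuperResolution(
...     img_resolution=16, img_in_channels=3, img_out_channels=2,
...     model_type="SongUNet",
... )
>>> x = torch.randn(1, 2, 16, 16)         # noisy high-resolution latent state
>>> img_lr = torch.randn(1, 3, 16, 16)    # low-resolution conditioning on the high-resolution grid
>>> sigma = torch.tensor([1.0])
>>> denoised = precond(x, img_lr, sigma)  # expected shape: (1, 2, 16, 16)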

property amp_mode#

Set to True when using automatic mixed precision.

forward(
x: Tensor,
img_lr: Tensor,
sigma: Tensor,
force_fp32: bool = False,
**model_kwargs: dict,
) Tensor[source]#

Forward pass of the EDMPrecondSuperResolution model wrapper.

This method applies the EDM preconditioning to compute the denoised image from a noisy high-resolution image and low-resolution conditioning image.

Parameters:
  • x (torch.Tensor) – Noisy high-resolution image of shape (B, C_hr, H, W). The number of channels C_hr should be equal to img_out_channels.

  • img_lr (torch.Tensor) – Low-resolution conditioning image of shape (B, C_lr, H, W). The number of channels C_lr should be equal to img_in_channels.

  • sigma (torch.Tensor) – Noise level of shape (B) or (B, 1) or (B, 1, 1, 1).

  • force_fp32 (bool, optional) – Whether to force FP32 precision regardless of the use_fp16 attribute, by default False.

  • **model_kwargs (dict) – Additional keyword arguments to pass to the underlying model self.model forward method.

Returns:

Denoised high-resolution image of shape (B, C_hr, H, W).

Return type:

torch.Tensor

Raises:

ValueError – If the model output dtype doesn’t match the expected dtype.

property profile_mode#

Set to True to enable profiling of the wrapped model.

static round_sigma(
sigma: float | List | Tensor,
) Tensor[source]#

Convert a given sigma value(s) to a tensor representation.

Parameters:

sigma (Union[float, List, torch.Tensor]) – Sigma value(s) to convert.

Returns:

Tensor representation of sigma values.

Return type:

torch.Tensor

property use_fp16#

Whether the model uses float16 precision.

Returns:

True if the model is in float16 mode, False otherwise.

Return type:

bool

Type:

bool

class physicsnemo.models.diffusion.preconditioning.EDMPrecondSuperResolutionMetaData(
name: str = 'EDMPrecondSuperResolution',
jit: bool = False,
cuda_graphs: bool = False,
amp: bool = False,
amp_cpu: bool = False,
amp_gpu: bool = True,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = -1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

EDMPrecondSuperResolution meta data

class physicsnemo.models.diffusion.preconditioning.VEPrecond(*args, **kwargs)[source]#

Bases: Module

Preconditioning corresponding to the variance exploding (VE) formulation.

Parameters:
  • img_resolution (int) – Image resolution.

  • img_channels (int) – Number of color channels.

  • label_dim (int) – Number of class labels, 0 = unconditional, by default 0.

  • use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.

  • sigma_min (float) – Minimum supported noise level, by default 0.02.

  • sigma_max (float) – Maximum supported noise level, by default 100.0.

  • model_type (str) – Class name of the underlying model, by default “SongUNet”.

  • **model_kwargs (dict) – Keyword arguments for the underlying model.

Note

Reference: Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S. and Poole, B., 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.
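For reference, in the EDM framework the VE formulation corresponds to the scalings \(c_{\text{skip}}(\sigma) = 1\), \(c_{\text{out}}(\sigma) = \sigma\) and \(c_{\text{in}}(\sigma) = 1\), so the denoiser is parameterized as \(D(\mathbf{x}; \sigma) = \mathbf{x} + \sigma\, F_\theta(\mathbf{x}; \cdot)\) (a summary based on the EDM paper, not an excerpt from the implementation).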

forward(
x,
sigma,
class_labels=None,
force_fp32=False,
**model_kwargs,
)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

round_sigma(sigma: float | List | Tensor)[source]#

Convert a given sigma value(s) to a tensor representation.

Parameters:

sigma (Union[float list, torch.Tensor]) – The sigma value(s) to convert.

Returns:

The tensor representation of the provided sigma value(s).

Return type:

torch.Tensor

class physicsnemo.models.diffusion.preconditioning.VEPrecondMetaData(
name: str = 'VEPrecond',
jit: bool = False,
cuda_graphs: bool = False,
amp: bool = False,
amp_cpu: bool = False,
amp_gpu: bool = True,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = -1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

VEPrecond meta data

class physicsnemo.models.diffusion.preconditioning.VEPrecond_dfsr(
img_resolution: int,
img_channels: int,
label_dim: int = 0,
use_fp16: bool = False,
sigma_min: float = 0.02,
sigma_max: float = 100.0,
dataset_mean: float = 5.85e-05,
dataset_scale: float = 4.79,
model_type: str = 'SongUNet',
**model_kwargs: dict,
)[source]#

Bases: Module

Preconditioning for the dfsr model, modified from the VEPrecond class, where the input argument ‘sigma’ in the forward propagation function is used to receive the timestep of the backward diffusion process.

Parameters:
  • img_resolution (int) – Image resolution.

  • img_channels (int) – Number of color channels.

  • label_dim (int) – Number of class labels, 0 = unconditional, by default 0.

  • use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.

  • sigma_min (float) – Minimum supported noise level, by default 0.02.

  • sigma_max (float) – Maximum supported noise level, by default 100.0.

  • model_type (str) – Class name of the underlying model, by default “SongUNet”.

  • **model_kwargs (dict) – Keyword arguments for the underlying model.

Note

Reference: Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Advances in neural information processing systems. 2020;33:6840-51.

forward(
x,
sigma,
class_labels=None,
force_fp32=False,
**model_kwargs,
)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.diffusion.preconditioning.VEPrecond_dfsr_cond(
img_resolution: int,
img_channels: int,
label_dim: int = 0,
use_fp16: bool = False,
sigma_min: float = 0.02,
sigma_max: float = 100.0,
dataset_mean: float = 5.85e-05,
dataset_scale: float = 4.79,
model_type: str = 'SongUNet',
**model_kwargs: dict,
)[source]#

Bases: Module

Preconditioning for the dfsr model with physics-informed conditioning input, modified from the VEPrecond class, where the input argument ‘sigma’ in the forward propagation function is used to receive the timestep of the backward diffusion process. The gradient of the PDE residual with respect to the vorticity in the governing Navier-Stokes equation is computed as the physics-informed conditioning variable and is combined with the backward diffusion timestep before being sent to the underlying model for noise prediction.

Parameters:
  • img_resolution (int) – Image resolution.

  • img_channels (int) – Number of color channels.

  • label_dim (int) – Number of class labels, 0 = unconditional, by default 0.

  • use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.

  • sigma_min (float) – Minimum supported noise level, by default 0.02.

  • sigma_max (float) – Maximum supported noise level, by default 100.0.

  • model_type (str) – Class name of the underlying model, by default “SongUNet”.

  • **model_kwargs (dict) – Keyword arguments for the underlying model.

Note

References:

  • [1] Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S. and Poole, B., 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.

  • [2] Shu, D., Li, Z. and Farimani, A.B., 2023. A physics-informed diffusion model for high-fidelity flow field reconstruction. Journal of Computational Physics, 478, p.111972.

forward(
x,
sigma,
class_labels=None,
force_fp32=False,
**model_kwargs,
)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

voriticity_residual(w, re=1000.0, dt=0.03125)[source]#

Compute the gradient of PDE residual with respect to a given vorticity w using the spectrum method.

Parameters:
  • w (torch.Tensor) – The fluid flow data sample (vorticity).

  • re (float) – The value of Reynolds number used in the governing Navier-Stokes equation.

  • dt (float) – Time step used to compute the time-derivative of vorticity included in the governing Navier-Stokes equation.

Returns:

The computed vorticity gradient.

Return type:

torch.Tensor

class physicsnemo.models.diffusion.preconditioning.VPPrecond(*args, **kwargs)[source]#

Bases: Module

Preconditioning corresponding to the variance preserving (VP) formulation.

Parameters:
  • img_resolution (int) – Image resolution.

  • img_channels (int) – Number of color channels.

  • label_dim (int) – Number of class labels, 0 = unconditional, by default 0.

  • use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.

  • beta_d (float) – Extent of the noise level schedule, by default 19.9.

  • beta_min (float) – Initial slope of the noise level schedule, by default 0.1.

  • M (int) – Original number of timesteps in the DDPM formulation, by default 1000.

  • epsilon_t (float) – Minimum t-value used during training, by default 1e-5.

  • model_type (str) – Class name of the underlying model, by default “SongUNet”.

  • **model_kwargs (dict) – Keyword arguments for the underlying model.

Note

Reference: Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S. and Poole, B., 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.

forward(
x,
sigma,
class_labels=None,
force_fp32=False,
**model_kwargs,
)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

round_sigma(sigma: float | List | Tensor)[source]#

Convert a given sigma value(s) to a tensor representation.

Parameters:

sigma (Union[float list, torch.Tensor]) – The sigma value(s) to convert.

Returns:

The tensor representation of the provided sigma value(s).

Return type:

torch.Tensor

sigma(t: float | Tensor)[source]#

Compute the sigma(t) value for a given t based on the VP formulation.

The function calculates the noise level schedule for the diffusion process based on the given parameters beta_d and beta_min.

Parameters:

t (Union[float, torch.Tensor]) – The timestep or set of timesteps for which to compute sigma(t).

Returns:

The computed sigma(t) value(s).

Return type:

torch.Tensor

sigma_inv(sigma: float | Tensor)[source]#

Compute the inverse of the sigma function for a given sigma.

This function effectively calculates t from a given sigma(t) based on the parameters beta_d and beta_min.

Parameters:

sigma (Union[float, torch.Tensor]) – The sigma(t) value or set of sigma(t) values for which to compute the inverse.

Returns:

The computed t value(s) corresponding to the provided sigma(t).

Return type:

torch.Tensor
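For reference, the VP noise schedule from the EDM paper is \(\sigma(t) = \sqrt{e^{\frac{1}{2}\beta_d t^2 + \beta_{\min} t} - 1}\), with inverse \(t = \left(\sqrt{\beta_{\min}^2 + 2\beta_d \ln(1 + \sigma^2)} - \beta_{\min}\right) / \beta_d\) (a summary based on the paper; numerical details in the implementation may differ).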

class physicsnemo.models.diffusion.preconditioning.VPPrecondMetaData(
name: str = 'VPPrecond',
jit: bool = False,
cuda_graphs: bool = False,
amp: bool = False,
amp_cpu: bool = False,
amp_gpu: bool = True,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = -1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

VPPrecond meta data

class physicsnemo.models.diffusion.preconditioning.iDDPMPrecond(*args, **kwargs)[source]#

Bases: Module

Preconditioning corresponding to the improved DDPM (iDDPM) formulation.

Parameters:
  • img_resolution (int) – Image resolution.

  • img_channels (int) – Number of color channels.

  • label_dim (int) – Number of class labels, 0 = unconditional, by default 0.

  • use_fp16 (bool) – Execute the underlying model at FP16 precision?, by default False.

  • C_1 (float) – Timestep adjustment at low noise levels, by default 0.001.

  • C_2 (float) – Timestep adjustment at high noise levels, by default 0.008.

  • M (int) – Original number of timesteps in the DDPM formulation, by default 1000.

  • model_type (str) – Class name of the underlying model, by default “DhariwalUNet”.

  • **model_kwargs (dict) – Keyword arguments for the underlying model.

Note

Reference: Nichol, A.Q. and Dhariwal, P., 2021, July. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning (pp. 8162-8171). PMLR.

alpha_bar(j)[source]#

Compute the alpha_bar(j) value for a given j based on the iDDPM formulation.

Parameters:

j (Union[int, torch.Tensor]) – The timestep or set of timesteps for which to compute alpha_bar(j).

Returns:

The computed alpha_bar(j) value(s).

Return type:

torch.Tensor
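For reference, the EDM reference implementation of iDDPM uses \(\bar{\alpha}(j) = \sin^2\!\left(\frac{\pi}{2}\, \frac{j}{M (C_2 + 1)}\right)\) (a summary of that implementation; consult the PhysicsNeMo source for the exact expression used here).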

forward(
x,
sigma,
class_labels=None,
force_fp32=False,
**model_kwargs,
)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

round_sigma(sigma, return_index=False)[source]#

Round the provided sigma value(s) to the nearest value(s) in a pre-defined set u.

Parameters:
  • sigma (Union[float, list, torch.Tensor]) – The sigma value(s) to round.

  • return_index (bool, optional) – Whether to return the index/indices of the rounded value(s) in u instead of the rounded value(s) themselves, by default False.

Returns:

The rounded sigma value(s) or their index/indices in u, depending on the value of return_index.

Return type:

torch.Tensor

class physicsnemo.models.diffusion.preconditioning.iDDPMPrecondMetaData(
name: str = 'iDDPMPrecond',
jit: bool = False,
cuda_graphs: bool = False,
amp: bool = False,
amp_cpu: bool = False,
amp_gpu: bool = True,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = -1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

iDDPMPrecond meta data

Weather / Climate Models#

class physicsnemo.models.dlwp.dlwp.DLWP(*args, **kwargs)[source]#

Bases: Module

A Convolutional model for Deep Learning Weather Prediction that works on Cubed-sphere grids.

This model expects the input to be of shape [N, C, 6, Res, Res]

Parameters:
  • nr_input_channels (int) – Number of channels in the input

  • nr_output_channels (int) – Number of channels in the output

  • nr_initial_channels (int) – Number of channels in the initial convolution. This governs the overall channels in the model.

  • activation_fn (str) – Activation function for the convolutions

  • depth (int) – Depth for the U-Net

  • clamp_activation (Tuple of ints, floats or None) – The min and max value used for torch.clamp()

Example

>>> import torch
>>> from physicsnemo.models.dlwp.dlwp import DLWP
>>> model = DLWP(
... nr_input_channels=2,
... nr_output_channels=4,
... )
>>> input = torch.randn(4, 2, 6, 64, 64) # [N, C, F, Res, Res]
>>> output = model(input)
>>> output.size()
torch.Size([4, 4, 6, 64, 64])

Note

Reference: Weyn, Jonathan A., et al. “Sub‐seasonal forecasting with a large ensemble of deep‐learning weather prediction models.” Journal of Advances in Modeling Earth Systems 13.7 (2021): e2021MS002502.

forward(cubed_sphere_input)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class physicsnemo.models.dlwp.dlwp.MetaData(
name: str = 'DLWP',
jit: bool = False,
cuda_graphs: bool = True,
amp: bool = False,
amp_cpu: bool = True,
amp_gpu: bool = True,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = 1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

class physicsnemo.models.dlwp_healpix.HEALPixRecUNet.HEALPixRecUNet(*args, **kwargs)[source]#

Bases: Module

Deep Learning Weather Prediction (DLWP) recurrent UNet model on the HEALPix mesh.

forward(
inputs: Sequence,
output_only_last=False,
) Tensor[source]#

Forward pass of the HEALPixUnet

Parameters:
  • inputs (Sequence) – Inputs to the model, of the form [prognostics|TISR|constants]. [B, F, T, C, H, W] is the format for prognostics and TISR; [F, C, H, W] is the format for constants.

  • output_only_last (bool, optional) – If only the last dimension of the outputs should be returned

Returns:

Predicted outputs

Return type:

th.Tensor

property integration_steps#

Number of integration steps

reset()[source]#

Resets the state of the network

class physicsnemo.models.dlwp_healpix.HEALPixRecUNet.MetaData(
name: str = 'DLWP_HEALPixRec',
jit: bool = False,
cuda_graphs: bool = True,
amp: bool = False,
amp_cpu: bool = True,
amp_gpu: bool = True,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = 1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

Metadata for the DLWP HEALPix Model

class physicsnemo.models.graphcast.graph_cast_net.GraphCastNet(*args, **kwargs)[source]#

Bases: Module

GraphCast network architecture

Parameters:
  • multimesh_level (int, optional) – Level of the latent mesh, by default 6

  • multimesh (bool, optional) – If the latent mesh is a multimesh, by default True. If True, the latent mesh includes the nodes corresponding to the specified mesh_level and incorporates the edges from all mesh levels ranging from level 0 up to and including mesh_level.

  • input_res (Tuple[int, int]) – Input resolution of the latitude-longitude grid

  • input_dim_grid_nodes (int, optional) – Input dimensionality of the grid node features, by default 474

  • input_dim_mesh_nodes (int, optional) – Input dimensionality of the mesh node features, by default 3

  • input_dim_edges (int, optional) – Input dimensionality of the edge features, by default 4

  • output_dim_grid_nodes (int, optional) – Final output dimensionality of the grid node features, by default 227

  • processor_type (str, optional) – The type of processor used in this model. Available options are ‘MessagePassing’, and ‘GraphTransformer’, which correspond to the processors in GraphCast and GenCast, respectively. By default ‘MessagePassing’.

  • khop_neighbors (int, optional) – Number of khop neighbors used in the GraphTransformer. This option is ignored if ‘MessagePassing’ processor is used. By default 0.

  • processor_layers (int, optional) – Number of processor layers, by default 16

  • hidden_layers (int, optional) – Number of hidden layers, by default 1

  • hidden_dim (int, optional) – Number of neurons in each hidden layer, by default 512

  • aggregation (str, optional) – Message passing aggregation method (“sum”, “mean”), by default “sum”

  • activation_fn (str, optional) – Type of activation function, by default “silu”

  • norm_type (str, optional) – Normalization type [“TELayerNorm”, “LayerNorm”]. Use “TELayerNorm” for optimal performance. By default “LayerNorm”.

  • use_cugraphops_encoder (bool, default=False) – Flag to select cugraphops kernels in encoder

  • use_cugraphops_processor (bool, default=False) – Flag to select cugraphops kernels in the processor

  • use_cugraphops_decoder (bool, default=False) – Flag to select cugraphops kernels in the decoder

  • do_concat_trick (bool, default=False) – Whether to replace concat+MLP with MLP+idx+sum

  • recompute_activation (bool, optional) – Flag for recomputing activation in backward to save memory, by default False. Currently, only SiLU is supported.

  • partition_size (int, default=1) – Number of process groups across which graphs are distributed. If equal to 1, the model is run in a normal Single-GPU configuration.

  • partition_group_name (str, default=None) – Name of the process group across which graphs are distributed. If partition_size is set to 1, the model is run in a normal Single-GPU configuration and the specification of a process group is not necessary. If partition_size > 1, passing no process group name leads to parallelism across the default process group. Otherwise, the group size of a process group is expected to match partition_size.

  • use_lat_lon_partitioning (bool, default=False) – flag to specify whether all graphs (grid-to-mesh, mesh, mesh-to-grid) are partitioned based on lat-lon-coordinates of nodes or based on IDs.

  • expect_partitioned_input (bool, default=False) – Flag indicating whether the model expects the input to be already partitioned. This can be helpful e.g. in multi-step rollouts to avoid aggregating the output just to distribute it in the next step again.

  • global_features_on_rank_0 (bool, default=False) – Flag indicating whether the model expects the input to be present in its “global” form only on group_rank 0. During the input preparation phase, the model will take care of scattering the input accordingly onto all ranks of the process group across which the graph is partitioned. Note that only either this flag or expect_partitioned_input can be set at a time.

  • produce_aggregated_output (bool, default=True) – Flag indicating whether the model produces the aggregated output on each rank of the process group across which the graph is distributed or whether the output is kept distributed. This can be helpful e.g. in multi-step rollouts to avoid aggregating the output just to distribute it in the next step again.

  • produce_aggregated_output_on_all_ranks (bool, default=True) – Flag indicating - if produce_aggregated_output is True - whether the model produces the aggregated output on each rank of the process group across which the group is distributed or only on group_rank 0. This can be helpful for computing the loss using global targets only on a single rank which can avoid either having to distribute the computation of a loss function.
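
Putting the constructor options above together, a minimal instantiation might look as follows. This is only a sketch: the keyword arguments shown are taken from the parameter list above, all other arguments (for example the grid and mesh specification documented earlier in this section) are left at their defaults, and building the full default-resolution graph can be memory-intensive.

>>> from physicsnemo.models.graphcast.graph_cast_net import GraphCastNet
>>> model = GraphCastNet(
        processor_type="MessagePassing",
        processor_layers=16,
        hidden_dim=512,
        aggregation="sum",
        activation_fn="silu",
        norm_type="LayerNorm",
    )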

Note

Based on these papers:

custom_forward(
grid_nfeat: Tensor,
) → Tensor[source]#

GraphCast forward method with support for gradient checkpointing.

Parameters:

grid_nfeat (Tensor) – Node features of the latitude-longitude graph.

Returns:

grid_nfeat_finale – Predicted node features of the latitude-longitude graph.

Return type:

Tensor

decoder_forward(
mesh_efeat_processed: Tensor,
mesh_nfeat_processed: Tensor,
grid_nfeat_encoded: Tensor,
) → Tensor[source]#

Forward method for the last layer of the processor, the decoder, and the final MLP.

Parameters:
  • mesh_efeat_processed (Tensor) – Multimesh edge features processed by the processor.

  • mesh_nfeat_processed (Tensor) – Multimesh node features processed by the processor.

  • grid_nfeat_encoded (Tensor) – The encoded node features for the latitude-longitude grid.

Returns:

grid_nfeat_finale – The final node features for the latitude-longitude grid.

Return type:

Tensor

encoder_forward(
grid_nfeat: Tensor,
) → Tensor[source]#

Forward method for the embedder, the encoder, and the first layer of the processor.

Parameters:

grid_nfeat (Tensor) – Node features for the latitude-longitude grid.

Returns:

  • mesh_efeat_processed (Tensor) – Processed edge features for the multimesh.

  • mesh_nfeat_processed (Tensor) – Processed node features for the multimesh.

  • grid_nfeat_encoded (Tensor) – Encoded node features for the latitude-longitude grid.

forward(grid_nfeat: Tensor) → Tensor[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

prepare_input(
invar: Tensor,
expect_partitioned_input: bool,
global_features_on_rank_0: bool,
) → Tensor[source]#

Prepares the input to the model in the required shape.

Parameters:
  • invar (Tensor) – Input in the shape [N, C, H, W].

  • expect_partitioned_input (bool) – Flag indicating whether the input is already partitioned according to the graph partitioning scheme.

  • global_features_on_rank_0 (bool) – Flag indicating whether the input is present in its “global” form only on group_rank 0, which requires a scatter operation beforehand. Note that only one of this flag and expect_partitioned_input can be set at a time.

Returns:

Reshaped input.

Return type:

Tensor

prepare_output(
outvar: Tensor,
produce_aggregated_output: bool,
produce_aggregated_output_on_all_ranks: bool = True,
) → Tensor[source]#

Prepares the output of the model in the shape [N, C, H, W].

Parameters:
  • outvar (Tensor) – Output of the final MLP of the model.

  • produce_aggregated_output (bool) – Flag indicating whether the output is gathered onto each rank or kept distributed.

  • produce_aggregated_output_on_all_ranks (bool) – Flag indicating whether the output is gathered on every rank or only on group_rank 0. True by default and only valid if produce_aggregated_output is set.

Returns:

The reshaped output of the model.

Return type:

Tensor

set_checkpoint_decoder(checkpoint_flag: bool)[source]#

Sets checkpoint function for the last layer of the processor, the decoder, and the final MLP.

This function returns the appropriate checkpoint function based on the provided checkpoint_flag. If checkpoint_flag is True, the function returns the checkpoint function from PyTorch’s torch.utils.checkpoint. Otherwise, it returns an identity function that simply passes the inputs through the given layer.

Parameters:

checkpoint_flag (bool) – Whether to use checkpointing for gradient computation. Checkpointing can reduce memory usage during backpropagation at the cost of increased computation time.

Returns:

The selected checkpoint function to use for gradient computation.

Return type:

Callable

set_checkpoint_encoder(checkpoint_flag: bool)[source]#

Sets checkpoint function for the embedder, the encoder, and the first layer of the processor.

This function returns the appropriate checkpoint function based on the provided checkpoint_flag. If checkpoint_flag is True, the function returns the checkpoint function from PyTorch’s torch.utils.checkpoint. Otherwise, it returns an identity function that simply passes the inputs through the given layer.

Parameters:

checkpoint_flag (bool) – Whether to use checkpointing for gradient computation. Checkpointing can reduce memory usage during backpropagation at the cost of increased computation time.

Returns:

The selected checkpoint function to use for gradient computation.

Return type:

Callable

set_checkpoint_model(checkpoint_flag: bool)[source]#

Sets checkpoint function for the entire model.

This function returns the appropriate checkpoint function based on the provided checkpoint_flag. If checkpoint_flag is True, the function returns the checkpoint function from PyTorch’s torch.utils.checkpoint; in this case, all other gradient checkpointing settings are disabled. Otherwise, it returns an identity function that simply passes the inputs through the given layer.

Parameters:

checkpoint_flag (bool) – Whether to use checkpointing for gradient computation. Checkpointing can reduce memory usage during backpropagation at the cost of increased computation time.

Returns:

The selected checkpoint function to use for gradient computation.

Return type:

Callable

set_checkpoint_processor(checkpoint_segments: int)[source]#

Sets checkpoint function for the processor excluding the first and last layers.

This function returns the appropriate checkpoint function based on the provided checkpoint_segments value. If checkpoint_segments is positive, the function returns the checkpoint function from PyTorch’s torch.utils.checkpoint, with the number of checkpointing segments equal to checkpoint_segments. Otherwise, it returns an identity function that simply passes the inputs through the given layer.

Parameters:

checkpoint_segments (int) – Number of checkpointing segments for gradient computation. Checkpointing can reduce memory usage during backpropagation at the cost of increased computation time.

Returns:

The selected checkpoint function to use for gradient computation.

Return type:

Callable
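
As a usage sketch (assuming model is a constructed GraphCastNet instance), the checkpointing helpers above let you trade extra compute for lower memory during backpropagation, either for the whole model or per stage:

>>> model.set_checkpoint_model(True)      # checkpoint the entire model; disables the finer-grained settings
>>> # alternatively, with set_checkpoint_model(False), control each stage separately:
>>> model.set_checkpoint_encoder(True)
>>> model.set_checkpoint_processor(4)     # split the inner processor layers into 4 checkpointed segments
>>> model.set_checkpoint_decoder(True)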

to(
*args: Any,
**kwargs: Any,
) → Self[source]#

Moves the object and its underlying graph and graph features to the specified device, dtype, or memory format, and returns the updated object.

Parameters:
  • *args (Any) – Positional arguments to be passed to the torch._C._nn._parse_to function.

  • **kwargs (Any) – Keyword arguments to be passed to the torch._C._nn._parse_to function.

Returns:

The updated object after moving to the specified device, dtype, or format.

Return type:

GraphCastNet
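
For example, moving a constructed GraphCastNet (including its internal graph and graph features) to the GPU, or casting it to lower precision, follows the usual PyTorch pattern:

>>> import torch
>>> model = model.to("cuda")              # moves parameters and the internal graph structures
>>> model = model.to(dtype=torch.bfloat16)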

class physicsnemo.models.graphcast.graph_cast_net.MetaData(
name: str = 'GraphCastNet',
jit: bool = False,
cuda_graphs: bool = False,
amp: bool = False,
amp_cpu: bool = False,
amp_gpu: bool = True,
torch_fx: bool = False,
bf16: bool = True,
onnx: bool = False,
onnx_gpu: bool = None,
onnx_cpu: bool = None,
onnx_runtime: bool = False,
trt: bool = False,
var_dim: int = -1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

physicsnemo.models.graphcast.graph_cast_net.get_lat_lon_partition_separators(partition_size: int)[source]#

Utility function to get separation intervals of the latitude-longitude grid for the partition size of interest.

Parameters:

partition_size (int) – Size of the graph partition.
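
A minimal call sketch follows; the structure of the returned separators is not documented here, so treat it as opaque and hand it to the lat-lon partitioning logic (e.g. when use_lat_lon_partitioning is enabled):

>>> from physicsnemo.models.graphcast.graph_cast_net import get_lat_lon_partition_separators
>>> separators = get_lat_lon_partition_separators(partition_size=4)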

class physicsnemo.models.fengwu.fengwu.Fengwu(*args, **kwargs)[source]#

Bases: Module

FengWu: A PyTorch implementation of “FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead” - https://arxiv.org/pdf/2304.02948.pdf

Parameters:
  • img_size – Image size (Lat, Lon). Default: (721, 1440)

  • pressure_level – Number of pressure levels. Default: 37

  • embed_dim (int) – Patch embedding dimension. Default: 192

  • patch_size (tuple[int]) – Patch token size. Default: (4,4)

  • num_heads (tuple[int]) – Number of attention heads in different layers.

  • window_size (tuple[int]) – Window size.

forward(x)[source]#
Parameters:
  • surface (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=4.

  • z (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.

  • r (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.

  • u (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.

  • v (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.

  • t (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.

prepare_input(surface, z, r, u, v, t)[source]#

Prepares the input to the model in the required shape.

Parameters:
  • surface (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=4.

  • z (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.

  • r (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.

  • u (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.

  • v (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.

  • t (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37.
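
A hedged usage sketch, assuming model is a constructed Fengwu instance and assuming a [batch, channels, lat, lon] layout for the tensors described above (the exact layout expected by prepare_input is not spelled out here):

>>> import torch
>>> surface = torch.randn(1, 4, 721, 1440)    # assumed layout: [batch, chans, lat, lon]
>>> z, r, u, v, t = (torch.randn(1, 37, 721, 1440) for _ in range(5))
>>> x = model.prepare_input(surface, z, r, u, v, t)
>>> out = model(x)                            # assuming prepare_input returns the tensor expected by forward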

class physicsnemo.models.fengwu.fengwu.MetaData(
name: str = 'Fengwu',
jit: bool = False,
cuda_graphs: bool = True,
amp: bool = True,
amp_cpu: bool = None,
amp_gpu: bool = None,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = True,
onnx_cpu: bool = False,
onnx_runtime: bool = True,
trt: bool = False,
var_dim: int = 1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

class physicsnemo.models.pangu.pangu.MetaData(
name: str = 'Pangu',
jit: bool = False,
cuda_graphs: bool = True,
amp: bool = True,
amp_cpu: bool = None,
amp_gpu: bool = None,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = True,
onnx_cpu: bool = False,
onnx_runtime: bool = True,
trt: bool = False,
var_dim: int = 1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

class physicsnemo.models.pangu.pangu.Pangu(*args, **kwargs)[source]#

Bases: Module

Pangu: A PyTorch implementation of “Pangu-Weather: A 3D High-Resolution Model for Fast and Accurate Global Weather Forecast” - https://arxiv.org/abs/2211.02556

Parameters:
  • img_size (tuple[int]) – Image size [Lat, Lon].

  • patch_size (tuple[int]) – Patch token size [Lat, Lon].

  • embed_dim (int) – Patch embedding dimension. Default: 192

  • num_heads (tuple[int]) – Number of attention heads in different layers.

  • window_size (tuple[int]) – Window size.

forward(x)[source]#
Parameters:

x (torch.Tensor) – [batch, 4+3+5*13, lat, lon]

prepare_input(surface, surface_mask, upper_air)[source]#

Prepares the input to the model in the required shape.

Parameters:
  • surface (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=4.

  • surface_mask (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=3.

  • upper_air (torch.Tensor) – 3D n_pl=13, n_lat=721, n_lon=1440, chans=5.
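
A forward-pass sketch, assuming model is a constructed Pangu instance; the channel count follows the input shape documented for forward above, and the 721 x 1440 resolution is taken from prepare_input:

>>> import torch
>>> x = torch.randn(1, 4 + 3 + 5 * 13, 721, 1440)   # [batch, channels, lat, lon]
>>> out = model(x)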

class physicsnemo.models.swinvrnn.swinvrnn.MetaData(
name: str = 'SwinRNN',
jit: bool = False,
cuda_graphs: bool = True,
amp: bool = True,
amp_cpu: bool = None,
amp_gpu: bool = None,
torch_fx: bool = False,
bf16: bool = False,
onnx: bool = False,
onnx_gpu: bool = True,
onnx_cpu: bool = False,
onnx_runtime: bool = True,
trt: bool = False,
var_dim: int = 1,
func_torch: bool = False,
auto_grad: bool = False,
)[source]#

Bases: ModelMetaData

class physicsnemo.models.swinvrnn.swinvrnn.SwinRNN(*args, **kwargs)[source]#

Bases: Module

Implementation of SwinRNN - https://arxiv.org/abs/2205.13158

Parameters:
  • img_size (Sequence[int], optional) – Image size [T, Lat, Lon].

  • patch_size (Sequence[int], optional) – Patch token size [T, Lat, Lon].

  • in_chans (int, optional) – Number of input channels.

  • out_chans (int, optional) – Number of output channels.

  • embed_dim (int, optional) – Number of embed channels.

  • num_groups (Sequence[int] | int, optional) – Number of groups to separate the channels into.

  • num_heads (int, optional) – Number of attention heads.

  • window_size (int | tuple[int], optional) – Local window size.

forward(x: Tensor)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.