PhysicsNeMo Models#
Basics#
PhysicsNeMo contains its own Model class for constructing neural networks. This model class
is built on top of PyTorch’s nn.Module and can be used interchangeably within the
PyTorch ecosystem. Using PhysicsNeMo models allows you to leverage various features of
PhysicsNeMo aimed at improving performance and ease of use. These features include, but are
not limited to, model zoo, automatic mixed-precision, CUDA Graphs, and easy checkpointing.
We discuss each of these features in the following sections.
Model Zoo#
PhysicsNeMo contains several optimized, customizable and easy-to-use models. These include some very general models like Fourier Neural Operators (FNOs), ResNet, and Graph Neural Networks (GNNs) as well as domain-specific models like Deep Learning Weather Prediction (DLWP) and Spherical Fourier Neural Operators (SFNO).
For a list of currently available models, please refer to the models on GitHub.
Below are some simple examples of how to use these models.
>>> import torch
>>> from physicsnemo.models.mlp.fully_connected import FullyConnected
>>> model = FullyConnected(in_features=32, out_features=64)
>>> input = torch.randn(128, 32)
>>> output = model(input)
>>> output.shape
torch.Size([128, 64])
>>> import torch
>>> from physicsnemo.models.fno.fno import FNO
>>> model = FNO(
...     in_channels=4,
...     out_channels=3,
...     decoder_layers=2,
...     decoder_layer_size=32,
...     dimension=2,
...     latent_channels=32,
...     num_fno_layers=2,
...     padding=0,
... )
>>> input = torch.randn(32, 4, 32, 32) #(N, C, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([32, 3, 32, 32])
How to write your own PhysicsNeMo model#
There are a few different ways to construct a PhysicsNeMo model. If you are a seasoned PyTorch user, the easiest way is to write your model using the optimized layers and utilities from PhysicsNeMo or PyTorch. Let’s look at a simple UNet example: first a plain PyTorch implementation, then a PhysicsNeMo implementation that supports CUDA Graphs and Automatic Mixed-Precision.
import torch.nn as nn
class UNet(nn.Module):
    def __init__(self, in_channels=1, out_channels=1):
        super(UNet, self).__init__()
        self.enc1 = self.conv_block(in_channels, 64)
        self.enc2 = self.conv_block(64, 128)
        self.dec1 = self.upconv_block(128, 64)
        self.final = nn.Conv2d(64, out_channels, kernel_size=1)
    def conv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2)
        )
    def upconv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, 2, stride=2),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True)
        )
    def forward(self, x):
        x1 = self.enc1(x)
        x2 = self.enc2(x1)
        x = self.dec1(x2)
        return self.final(x)
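The encoder above halves the spatial size twice and the decoder doubles it once, so the network output has half the input resolution. The sketch below traces one spatial axis through the layers using the standard output-size formulas for convolution, max pooling and transposed convolution. The helper names are illustrative and not part of PyTorch or PhysicsNeMo.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def max_pool2d_out(size, kernel):
    """Output length of max pooling whose stride equals its kernel size."""
    return (size - kernel) // kernel + 1


def conv_transpose2d_out(size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def unet_output_size(size):
    """Trace one spatial axis through the UNet layers defined above."""
    for _ in range(2):  # enc1 and enc2: 3x3 conv (padding 1), then 2x2 max pool
        size = conv2d_out(size, kernel=3, padding=1)
        size = max_pool2d_out(size, kernel=2)
    # dec1: 2x2 transposed conv with stride 2, then 3x3 conv (padding 1)
    size = conv_transpose2d_out(size, kernel=2, stride=2)
    size = conv2d_out(size, kernel=3, padding=1)
    # final: a 1x1 conv keeps the spatial size
    return conv2d_out(size, kernel=1)


output_size = unet_output_size(128)
```

This is why a 128 x 128 input pairs with a 64 x 64 target when training this particular UNet.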
Now we show this model rewritten in PhysicsNeMo. First, let us subclass the model from
physicsnemo.Module instead of torch.nn.Module. The
physicsnemo.Module class acts like a direct replacement for the
torch.nn.Module and provides additional functionality for saving and loading
checkpoints, etc. Refer to the API docs of physicsnemo.Module for further
details. Additionally, we will add metadata to the model to capture the optimizations
that this model supports. In this case we will enable CUDA Graphs and Automatic Mixed-Precision.
from dataclasses import dataclass
import physicsnemo
import torch.nn as nn
@dataclass
class UNetMetaData(physicsnemo.ModelMetaData):
    name: str = "UNet"
    # Optimization
    jit: bool = True
    cuda_graphs: bool = True
    amp_cpu: bool = True
    amp_gpu: bool = True
class UNet(physicsnemo.Module):
    def __init__(self, in_channels=1, out_channels=1):
        super(UNet, self).__init__(meta=UNetMetaData())
        self.enc1 = self.conv_block(in_channels, 64)
        self.enc2 = self.conv_block(64, 128)
        self.dec1 = self.upconv_block(128, 64)
        self.final = nn.Conv2d(64, out_channels, kernel_size=1)
    def conv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2)
        )
    def upconv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, 2, stride=2),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True)
        )
    def forward(self, x):
        x1 = self.enc1(x)
        x2 = self.enc2(x1)
        x = self.dec1(x2)
        return self.final(x)
Now that we have our PhysicsNeMo model, we can make use of these optimizations using the
physicsnemo.utils.StaticCaptureTraining decorator. This decorator will capture the
training step function and optimize it for the specified optimizations.
import torch
from physicsnemo.utils import StaticCaptureTraining
model = UNet().to("cuda")
input = torch.randn(8, 1, 128, 128).to("cuda")
output = torch.zeros(8, 1, 64, 64).to("cuda")
optim = torch.optim.Adam(model.parameters(), lr=0.001)
# Create training step function with optimization wrapper
# StaticCaptureTraining calls `backward` on the loss and
# `optimizer.step()` so you don't have to do that
# explicitly.
@StaticCaptureTraining(
    model=model,
    optim=optim,
    cuda_graph_warmup=11,
)
def training_step(invar, outvar):
    predvar = model(invar)
    loss = torch.sum(torch.pow(predvar - outvar, 2))
    return loss
# Sample training loop
for i in range(20):
    # In place copy of input and output to support cuda graphs
    input.copy_(torch.randn(8, 1, 128, 128).to("cuda"))
    output.copy_(torch.zeros(8, 1, 64, 64).to("cuda"))
    # Run training step
    loss = training_step(input, output)
For the simple model above, you can observe ~1.1x speed-up due to CUDA Graphs and AMP. The speed-up observed changes from model to model and is typically greater for more complex models.
Note
The ModelMetaData and physicsnemo.Module do not make the model
support CUDA Graphs, AMP, and other optimizations automatically. The user is responsible
for writing model code that enables each of these optimizations.
Models in the PhysicsNeMo Model Zoo are written to support many of these optimizations
and checked against PhysicsNeMo’s CI to ensure that they work correctly.
Note
The StaticCaptureTraining decorator is still under development and may be
refactored in the future.
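To make the calling convention concrete, here is a plain-Python sketch of a decorator that, like the comments in the training example describe, calls backward on the returned loss and steps the optimizer on the caller's behalf. It is only an illustration of the pattern: simple_training_capture and the recording stub classes are hypothetical, and the sketch performs no CUDA Graph capture or mixed-precision handling, so it should not be read as how StaticCaptureTraining is implemented.

```python
import functools


def simple_training_capture(optim):
    """Hypothetical stand-in: wraps a loss function into a full training step."""

    def decorator(loss_fn):
        @functools.wraps(loss_fn)
        def training_step(*args, **kwargs):
            optim.zero_grad()                # clear gradients from the previous step
            loss = loss_fn(*args, **kwargs)  # user code only computes the loss
            loss.backward()                  # the wrapper runs the backward pass
            optim.step()                     # and the optimizer update
            return loss

        return training_step

    return decorator


class RecordingLoss:
    """Stub loss that records when backward() is called."""

    def __init__(self, events):
        self.events = events

    def backward(self):
        self.events.append("backward")


class RecordingOptim:
    """Stub optimizer that records zero_grad() and step() calls."""

    def __init__(self, events):
        self.events = events

    def zero_grad(self):
        self.events.append("zero_grad")

    def step(self):
        self.events.append("step")


events = []
optim = RecordingOptim(events)


@simple_training_capture(optim)
def compute_loss():
    events.append("forward")
    return RecordingLoss(events)


compute_loss()
```

Any object with zero_grad() and step() methods, and any loss with a backward() method, fits this sketch, which is why the stubs can stand in for a real optimizer and tensor.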
Converting PyTorch Models to PhysicsNeMo Models#
The example above constructs a PhysicsNeMo model from scratch. However, you
can also convert existing PyTorch models to PhysicsNeMo models in order to leverage
PhysicsNeMo features. To do this, you can use the Module.from_torch method as shown
below.
from dataclasses import dataclass
import physicsnemo
import torch.nn as nn
class TorchModel(nn.Module):
    def __init__(self):
        super(TorchModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)
    def forward(self, x):
        x = self.conv1(x)
        return self.conv2(x)
@dataclass
class ConvMetaData(physicsnemo.ModelMetaData):
    name: str = "UNet"
    # Optimization
    jit: bool = True
    cuda_graphs: bool = True
    amp_cpu: bool = True
    amp_gpu: bool = True
PhysicsNeMoModel = physicsnemo.Module.from_torch(TorchModel, meta=ConvMetaData())
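As a rough mental model, a class-conversion helper can be pictured as a factory that returns a new class holding an instance of the original class together with a metadata object. The sketch below shows that pattern in plain Python. It is an assumption-laden illustration, not a description of how Module.from_torch works internally: wrap_class, SketchMetaData and Doubler are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SketchMetaData:
    """Hypothetical metadata record, loosely modeled on the fields shown above."""

    name: str = "SketchModel"
    cuda_graphs: bool = False
    amp_gpu: bool = False


class Doubler:
    """Stand-in for an existing model class: multiplies its input by a factor."""

    def __init__(self, factor=2):
        self.factor = factor

    def __call__(self, x):
        return self.factor * x


def wrap_class(inner_cls, meta):
    """Return a new class that builds inner_cls and carries meta alongside it."""

    class Wrapped:
        def __init__(self, *args, **kwargs):
            self.meta = meta
            # Constructor arguments are forwarded unchanged to the wrapped class.
            self.inner_model = inner_cls(*args, **kwargs)

        def __call__(self, *inputs):
            return self.inner_model(*inputs)

    Wrapped.__name__ = f"Wrapped{inner_cls.__name__}"
    return Wrapped


WrappedDoubler = wrap_class(Doubler, SketchMetaData(name="Doubler", amp_gpu=True))
model = WrappedDoubler(factor=3)
result = model(5)
```

Note that the factory returns a class, not an instance, which matches how the converted class is instantiated separately in the example above.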
Saving and Loading PhysicsNeMo Models#
As mentioned above, PhysicsNeMo models are interoperable with PyTorch models. This means that
you can save and load PhysicsNeMo models using the standard PyTorch APIs. However, we provide
a few additional utilities to make this process easier. A key challenge in saving and
loading models is keeping track of the model metadata, such as layer sizes. PhysicsNeMo
models can be saved with this metadata to a custom .mdlus file. These files allow
for easy loading and instantiation of the model. We show two examples of this below.
The first example shows saving and loading a model from an already instantiated model.
 >>> from physicsnemo.models.mlp.fully_connected import FullyConnected
 >>> model = FullyConnected(in_features=32, out_features=64)
 >>> model.save("model.mdlus") # Save model to .mdlus file
 >>> model.load("model.mdlus") # Load model weights from .mdlus file from already instantiated model
 >>> model
 FullyConnected(
  (layers): ModuleList(
    (0): FCLayer(
      (activation_fn): SiLU()
      (linear): Linear(in_features=32, out_features=512, bias=True)
    )
    (1-5): 5 x FCLayer(
      (activation_fn): SiLU()
      (linear): Linear(in_features=512, out_features=512, bias=True)
    )
  )
  (final_layer): FCLayer(
    (activation_fn): Identity()
    (linear): Linear(in_features=512, out_features=64, bias=True)
  )
)
The second example shows loading a model from a .mdlus file without having to
instantiate the model first. We note that in this case we don’t know the class or
parameters to pass to the constructor of the model. However, we can still load the
model from the .mdlus file.
 >>> from physicsnemo import Module
 >>> fc_model = Module.from_checkpoint("model.mdlus") # Instantiate model from .mdlus file.
 >>> fc_model
 FullyConnected(
  (layers): ModuleList(
    (0): FCLayer(
      (activation_fn): SiLU()
      (linear): Linear(in_features=32, out_features=512, bias=True)
    )
    (1-5): 5 x FCLayer(
      (activation_fn): SiLU()
      (linear): Linear(in_features=512, out_features=512, bias=True)
    )
  )
  (final_layer): FCLayer(
    (activation_fn): Identity()
    (linear): Linear(in_features=512, out_features=64, bias=True)
  )
)
Note
In order to make use of this functionality, all inputs to the model’s __init__
function must be JSON-serializable. It is highly recommended that all PhysicsNeMo
models be developed with this requirement in mind.
Note
Using Module.from_checkpoint will not work if the model has any buffers or
parameters that are registered outside of the model’s __init__ function due to
the above requirement. In that case, one should use Module.load, or ensure
that all model parameters and buffers are registered inside __init__.
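The JSON requirement is easier to see with a small example. The sketch below records a toy model's constructor arguments as JSON and rebuilds an equivalent instance from that record alone, then shows that an argument such as an arbitrary Python object cannot be stored this way. TinyModel and both helper functions are hypothetical, and the record layout is made up for illustration; it is not the actual .mdlus format.

```python
import json


class TinyModel:
    """Toy model that remembers the arguments it was constructed with."""

    def __init__(self, in_features=32, out_features=64):
        self.init_args = {"in_features": in_features, "out_features": out_features}
        self.in_features = in_features
        self.out_features = out_features


def dump_constructor_args(model):
    """Serialize the class name and constructor arguments to a JSON string."""
    return json.dumps({"class": type(model).__name__, "args": model.init_args})


def rebuild(payload, known_classes):
    """Re-instantiate a model from the JSON record without prior knowledge of its arguments."""
    record = json.loads(payload)
    return known_classes[record["class"]](**record["args"])


payload = dump_constructor_args(TinyModel(in_features=8, out_features=4))
restored = rebuild(payload, {"TinyModel": TinyModel})

# A constructor argument that is not JSON-serializable cannot be recorded.
try:
    json.dumps({"activation": object()})
    non_serializable_ok = True
except TypeError:
    non_serializable_ok = False
```

Passing simple values such as numbers, strings, lists and booleans to __init__ keeps a model compatible with this kind of round trip.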
PhysicsNeMo Model Registry and Entry Points#
PhysicsNeMo contains a model registry that allows for easy access and ingestion of models. Below is a simple example of how to use the model registry to obtain a model class.
>>> from physicsnemo.registry import ModelRegistry
>>> model_registry = ModelRegistry()
>>> model_registry.list_models()
['AFNO', 'DLWP', 'FNO', 'FullyConnected', 'GraphCastNet', 'MeshGraphNet', 'One2ManyRNN', 'Pix2Pix', 'SFNO', 'SRResNet']
>>> FullyConnected = model_registry.factory("FullyConnected")
>>> model = FullyConnected(in_features=32, out_features=64)
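Conceptually, a registry of this kind is a mapping from names to model classes. The following minimal sketch shows the idea with a dictionary. MiniRegistry is hypothetical; only list_models and factory mirror method names used above, and the register method is an assumption added so the sketch is self-contained.

```python
class MiniRegistry:
    """Hypothetical name-to-class registry illustrating list_models and factory."""

    def __init__(self):
        self._models = {}

    def register(self, model_cls, name=None):
        """Store a class under the given name, or under its class name by default."""
        key = name or model_cls.__name__
        if key in self._models:
            raise ValueError(f"A model named {key!r} is already registered")
        self._models[key] = model_cls

    def list_models(self):
        """Return the registered names in sorted order."""
        return sorted(self._models)

    def factory(self, name):
        """Return the class (not an instance) registered under name."""
        try:
            return self._models[name]
        except KeyError:
            raise KeyError(f"No model named {name!r} is registered") from None


class ModelA:
    pass


class ModelB:
    pass


registry = MiniRegistry()
registry.register(ModelB)
registry.register(ModelA, name="Alpha")
names = registry.list_models()
model_cls = registry.factory("Alpha")
```

As with the registry above, factory returns a class, so the caller still chooses the constructor arguments.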
The model registry also allows exposing models via entry points. This allows for
integration of models into the PhysicsNeMo ecosystem. For example, suppose you have a
package MyPackage that contains a model MyModel. You can expose this model
to the PhysicsNeMo registry by adding an entry point to your pyproject.toml file. For
example, suppose your package contains the following files:
# setup.py
from setuptools import setup, find_packages
setup()
# pyproject.toml
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "MyPackage"
description = "My Neural Network Zoo."
version = "0.1.0"
[project.entry-points."physicsnemo.models"]
MyPhysicsNeMoModel = "mypackage.models:MyPhysicsNeMoModel"
# mypackage/models.py
import torch.nn as nn
from physicsnemo.models import Module
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)
    def forward(self, x):
        x = self.conv1(x)
        return self.conv2(x)
MyPhysicsNeMoModel = Module.from_torch(MyModel)
Once this package is installed, you can access the model via the PhysicsNeMo model registry.
>>> from physicsnemo.registry import ModelRegistry
>>> model_registry = ModelRegistry()
>>> model_registry.list_models()
['MyPhysicsNeMoModel', 'AFNO', 'DLWP', 'FNO', 'FullyConnected', 'GraphCastNet', 'MeshGraphNet', 'One2ManyRNN', 'Pix2Pix', 'SFNO', 'SRResNet']
>>> MyPhysicsNeMoModel = model_registry.factory("MyPhysicsNeMoModel")
For more information on entry points and potential use cases, see this blog post.
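Entry points can be inspected with the Python standard library, which is a convenient way to check whether an installed package advertises anything under the physicsnemo.models group. The sketch below lists the names and targets in that group and splits a target of the generic module.path:attribute form. The helper names and the example target string are illustrative assumptions, and this is not necessarily how the PhysicsNeMo registry performs its own discovery.

```python
from importlib.metadata import entry_points


def parse_entry_point(value):
    """Split a 'module.path:attribute' target into its two parts."""
    module_path, _, attribute = value.partition(":")
    return module_path, attribute


def discover(group="physicsnemo.models"):
    """Return {name: target} for every installed entry point in the group."""
    found = entry_points()
    if hasattr(found, "select"):
        # Python 3.10 and newer expose a select() method.
        selected = found.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = found.get(group, [])
    return {ep.name: ep.value for ep in selected}


# The result is empty when no installed package declares this group.
discovered = discover()

# Hypothetical target string, used only to show the parsing step.
module_path, attribute = parse_entry_point("some_package.some_module:SomeModel")
```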
Fully Connected Network#
- class physicsnemo.models.mlp.fully_connected.FullyConnected(*args, **kwargs)[source]#
- Bases: Module
- A densely-connected MLP architecture
- Parameters:
- in_features (int, optional) – Size of input features, by default 512 
- layer_size (int, optional) – Size of every hidden layer, by default 512 
- out_features (int, optional) – Size of output features, by default 512 
- num_layers (int, optional) – Number of hidden layers, by default 6 
- activation_fn (Union[str, List[str]], optional) – Activation function to use, by default ‘silu’ 
- skip_connections (bool, optional) – Add skip connections every 2 hidden layers, by default False 
- adaptive_activations (bool, optional) – Use an adaptive activation function, by default False 
- weight_norm (bool, optional) – Use weight norm on fully connected layers, by default False 
- weight_fact (bool, optional) – Use weight factorization on fully connected layers, by default False 
 
 - Example
>>> model = physicsnemo.models.mlp.FullyConnected(in_features=32, out_features=64)
>>> input = torch.randn(128, 32)
>>> output = model(input)
>>> output.size()
torch.Size([128, 64])
 - forward(x: Tensor) → Tensor[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
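The relationship between in_features, layer_size, out_features and num_layers can be read off the printed FullyConnected model in the saving and loading section: num_layers hidden layers followed by one final layer. The sketch below reproduces those layer shapes in plain Python. The function names are illustrative, and the parameter count assumes only linear weights and biases, with skip connections, adaptive activations, weight norm and weight factorization left at their defaults.

```python
def fully_connected_layer_dims(in_features=512, layer_size=512, out_features=512, num_layers=6):
    """Return (in, out) sizes for each hidden layer followed by the final layer."""
    dims = []
    width = in_features
    for _ in range(num_layers):
        dims.append((width, layer_size))
        width = layer_size
    dims.append((width, out_features))  # final layer maps to the output size
    return dims


def linear_parameter_count(dims):
    """Count weights plus biases for a stack of linear layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in dims)


dims = fully_connected_layer_dims(in_features=32, out_features=64)
total_parameters = linear_parameter_count(dims)
```

For in_features=32 and out_features=64 this gives one 32 to 512 layer, five 512 to 512 layers and a final 512 to 64 layer, matching the printed module structure.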
 
- class physicsnemo.models.mlp.fully_connected.MetaData(
- name: str = 'FullyConnected',
- jit: bool = True,
- cuda_graphs: bool = True,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = True,
- bf16: bool = False,
- onnx: bool = True,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = True,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = True,
- auto_grad: bool = True,
- )
- Bases: ModelMetaData
Fourier Neural Operators#
- class physicsnemo.models.fno.fno.FNO(*args, **kwargs)[source]#
- Bases: Module
- Fourier neural operator (FNO) model.
- Note
- The FNO architecture supports options for 1D, 2D, 3D and 4D fields, which can be controlled using the dimension parameter.
- Parameters:
- in_channels (int) – Number of input channels 
- out_channels (int) – Number of output channels 
- decoder_layers (int, optional) – Number of decoder layers, by default 1 
- decoder_layer_size (int, optional) – Number of neurons in decoder layers, by default 32 
- decoder_activation_fn (str, optional) – Activation function for decoder, by default “silu” 
- dimension (int) – Model dimensionality (supports 1, 2, 3, 4). 
- latent_channels (int, optional) – Latent features size in spectral convolutions, by default 32 
- num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4 
- num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16 
- padding (int, optional) – Domain padding for spectral convolutions, by default 8 
- padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant” 
- activation_fn (str, optional) – Activation function, by default “gelu” 
- coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True 
 
 - Example
>>> # define the 2d FNO model
>>> model = physicsnemo.models.fno.FNO(
...     in_channels=4,
...     out_channels=3,
...     decoder_layers=2,
...     decoder_layer_size=32,
...     dimension=2,
...     latent_channels=32,
...     num_fno_layers=2,
...     padding=0,
... )
>>> input = torch.randn(32, 4, 32, 32) #(N, C, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([32, 3, 32, 32])
- Note
- Reference: Li, Zongyi, et al. “Fourier neural operator for parametric partial differential equations.” arXiv preprint arXiv:2010.08895 (2020).
 - forward(x: Tensor) → Tensor[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
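The num_fno_modes parameter is described as the number of Fourier modes kept in spectral convolutions. The sketch below illustrates only the truncation idea on a 1D signal using a naive discrete Fourier transform from the standard library: transform, zero every mode at or above the cutoff, and transform back. It is a simplified illustration under stated assumptions. An actual spectral convolution also multiplies the kept modes by learned weights, and the function names here are hypothetical.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform (O(n^2), fine for tiny inputs)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def keep_low_modes(signal, num_modes):
    """Zero every Fourier mode whose frequency index is num_modes or higher."""
    n = len(signal)
    spectrum = dft(signal)
    truncated = [
        # min(k, n - k) is the frequency of bin k, counting its mirrored bin too
        coeff if min(k, n - k) < num_modes else 0j
        for k, coeff in enumerate(spectrum)
    ]
    return [value.real for value in inverse_dft(truncated)]


n = 16
low = [math.sin(2 * math.pi * t / n) for t in range(n)]             # frequency 1
high = [0.5 * math.sin(2 * math.pi * 6 * t / n) for t in range(n)]  # frequency 6
signal = [a + b for a, b in zip(low, high)]

filtered = keep_low_modes(signal, num_modes=4)
max_error = max(abs(f - l) for f, l in zip(filtered, low))
```

With a cutoff of 4 modes, the frequency-6 component is removed and the frequency-1 component passes through, so the filtered signal matches the low-frequency part up to floating-point error.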
 
- class physicsnemo.models.fno.fno.FNO1DEncoder(
- in_channels: int = 1,
- num_fno_layers: int = 4,
- fno_layer_size: int = 32,
- num_fno_modes: int | List[int] = 16,
- padding: int | List[int] = 8,
- padding_type: str = 'constant',
- activation_fn: Module = GELU(approximate='none'),
- coord_features: bool = True,
- )
- Bases: Module
- 1D Spectral encoder for FNO
- Parameters:
- in_channels (int, optional) – Number of input channels, by default 1 
- num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4 
- fno_layer_size (int, optional) – Latent features size in spectral convolutions, by default 32 
- num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16 
- padding (Union[int, List[int]], optional) – Domain padding for spectral convolutions, by default 8 
- padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant” 
- activation_fn (nn.Module, optional) – Activation function, by default nn.GELU 
- coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True 
 
 - build_fno(num_fno_modes: List[int]) → None[source]#
- Construct the FNO block.
- Parameters:
- num_fno_modes (List[int]) – Number of Fourier modes kept in spectral convolutions
 - forward(x: Tensor) → Tensor[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 - grid_to_points(value: Tensor)
- Convert from a grid-based (image) representation to a point-based representation.
- Parameters:
- value (Meshgrid tensor) 
- Returns:
- Tensor, meshgrid shape 
- Return type:
- Tuple 
 
 
- class physicsnemo.models.fno.fno.FNO2DEncoder(
- in_channels: int = 1,
- num_fno_layers: int = 4,
- fno_layer_size: int = 32,
- num_fno_modes: int | List[int] = 16,
- padding: int | List[int] = 8,
- padding_type: str = 'constant',
- activation_fn: Module = GELU(approximate='none'),
- coord_features: bool = True,
- )
- Bases: Module
- 2D Spectral encoder for FNO
- Parameters:
- in_channels (int, optional) – Number of input channels, by default 1 
- num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4 
- fno_layer_size (int, optional) – Latent features size in spectral convolutions, by default 32 
- num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16 
- padding (Union[int, List[int]], optional) – Domain padding for spectral convolutions, by default 8 
- padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant” 
- activation_fn (nn.Module, optional) – Activation function, by default nn.GELU 
- coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True 
 
 - build_fno(num_fno_modes: List[int]) → None[source]#
- Construct the FNO block.
- Parameters:
- num_fno_modes (List[int]) – Number of Fourier modes kept in spectral convolutions
 - forward(x: Tensor) → Tensor[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 - grid_to_points(value: Tensor)
- Convert from a grid-based (image) representation to a point-based representation.
- Parameters:
- value (Meshgrid tensor) 
- Returns:
- Tensor, meshgrid shape 
- Return type:
- Tuple 
 
 
- class physicsnemo.models.fno.fno.FNO3DEncoder(
- in_channels: int = 1,
- num_fno_layers: int = 4,
- fno_layer_size: int = 32,
- num_fno_modes: int | List[int] = 16,
- padding: int | List[int] = 8,
- padding_type: str = 'constant',
- activation_fn: Module = GELU(approximate='none'),
- coord_features: bool = True,
- )
- Bases: Module
- 3D Spectral encoder for FNO
- Parameters:
- in_channels (int, optional) – Number of input channels, by default 1 
- num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4 
- fno_layer_size (int, optional) – Latent features size in spectral convolutions, by default 32 
- num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16 
- padding (Union[int, List[int]], optional) – Domain padding for spectral convolutions, by default 8 
- padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant” 
- activation_fn (nn.Module, optional) – Activation function, by default nn.GELU 
- coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True 
 
 - build_fno(num_fno_modes: List[int]) → None[source]#
- Construct the FNO block.
- Parameters:
- num_fno_modes (List[int]) – Number of Fourier modes kept in spectral convolutions
 - forward(x: Tensor) → Tensor[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 - grid_to_points(value: Tensor)
- Convert from a grid-based (image) representation to a point-based representation.
- Parameters:
- value (Meshgrid tensor) 
- Returns:
- Tensor, meshgrid shape 
- Return type:
- Tuple 
 
 
- class physicsnemo.models.fno.fno.FNO4DEncoder(
- in_channels: int = 1,
- num_fno_layers: int = 4,
- fno_layer_size: int = 32,
- num_fno_modes: int | List[int] = 16,
- padding: int | List[int] = 8,
- padding_type: str = 'constant',
- activation_fn: Module = GELU(approximate='none'),
- coord_features: bool = True,
- )
- Bases: Module
- 4D Spectral encoder for FNO
- Parameters:
- in_channels (int, optional) – Number of input channels, by default 1 
- num_fno_layers (int, optional) – Number of spectral convolutional layers, by default 4 
- fno_layer_size (int, optional) – Latent features size in spectral convolutions, by default 32 
- num_fno_modes (Union[int, List[int]], optional) – Number of Fourier modes kept in spectral convolutions, by default 16 
- padding (Union[int, List[int]], optional) – Domain padding for spectral convolutions, by default 8 
- padding_type (str, optional) – Type of padding for spectral convolutions, by default “constant” 
- activation_fn (nn.Module, optional) – Activation function, by default nn.GELU 
- coord_features (bool, optional) – Use coordinate grid as additional feature map, by default True 
 
 - build_fno(num_fno_modes: List[int]) → None[source]#
- Construct the FNO block.
- Parameters:
- num_fno_modes (List[int]) – Number of Fourier modes kept in spectral convolutions
 - forward(x: Tensor) → Tensor[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 - grid_to_points(value: Tensor)
- Convert from a grid-based (image) representation to a point-based representation.
- Parameters:
- value (Meshgrid tensor) 
- Returns:
- Tensor, meshgrid shape 
- Return type:
- Tuple 
 
 
- class physicsnemo.models.fno.fno.MetaData(
- name: str = 'FourierNeuralOperator',
- jit: bool = True,
- cuda_graphs: bool = True,
- amp: bool = False,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = False,
- onnx_cpu: bool = False,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
- )
- Bases: ModelMetaData
- class physicsnemo.models.afno.afno.AFNO(*args, **kwargs)[source]#
- Bases: Module
- Adaptive Fourier neural operator (AFNO) model.
- Note
- AFNO is a model that is designed for 2D images only.
- Parameters:
- inp_shape (List[int]) – Input image dimensions [height, width] 
- in_channels (int) – Number of input channels 
- out_channels (int) – Number of output channels 
- patch_size (List[int], optional) – Size of image patches, by default [16, 16] 
- embed_dim (int, optional) – Embedded channel size, by default 256 
- depth (int, optional) – Number of AFNO layers, by default 4 
- mlp_ratio (float, optional) – Ratio of layer MLP latent variable size to input feature size, by default 4.0 
- drop_rate (float, optional) – Drop out rate in layer MLPs, by default 0.0 
- num_blocks (int, optional) – Number of blocks in the block-diag frequency weight matrices, by default 16 
- sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01 
- hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1 
 
 - Example
>>> model = physicsnemo.models.afno.AFNO(
...     inp_shape=[32, 32],
...     in_channels=2,
...     out_channels=1,
...     patch_size=(8, 8),
...     embed_dim=16,
...     depth=2,
...     num_blocks=2,
... )
>>> input = torch.randn(32, 2, 32, 32) #(N, C, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([32, 1, 32, 32])
- Note
- Reference: Guibas, John, et al. “Adaptive fourier neural operators: Efficient token mixers for transformers.” arXiv preprint arXiv:2111.13587 (2021).
 - forward(x: Tensor) → Tensor[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.afno.afno.AFNO2DLayer(
- hidden_size: int,
- num_blocks: int = 8,
- sparsity_threshold: float = 0.01,
- hard_thresholding_fraction: float = 1,
- hidden_size_factor: int = 1,
- )
- Bases: Module
- AFNO spectral convolution layer
- Parameters:
- hidden_size (int) – Feature dimensionality 
- num_blocks (int, optional) – Number of blocks used in the block diagonal weight matrix, by default 8 
- sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01 
- hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1 
- hidden_size_factor (int, optional) – Factor to increase spectral features by after weight multiplication, by default 1 
 
 - forward(x: Tensor) → Tensor[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
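The sparsity_threshold parameter is described in terms of softshrink. For reference, the sketch below implements the standard soft-shrinkage function on real scalars: values whose magnitude is at most the threshold become zero, and larger values are moved toward zero by the threshold. How the AFNO layer applies this to its complex spectral features is not stated in this section, so the example covers only the scalar definition.

```python
def softshrink(x, lambd=0.01):
    """Standard soft-shrinkage: shrink x toward zero by lambd, clamping small values to 0."""
    if x > lambd:
        return x - lambd
    if x < -lambd:
        return x + lambd
    return 0.0


small = softshrink(0.005)      # magnitude below the threshold
positive = softshrink(0.5)     # shifted down by the threshold
negative = softshrink(-0.5)    # shifted up by the threshold
```

A larger threshold zeroes more of the small-magnitude values, which is the sense in which it controls sparsity.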
 
- class physicsnemo.models.afno.afno.AFNOMlp(
- in_features: int,
- latent_features: int,
- out_features: int,
- activation_fn: Module = GELU(approximate='none'),
- drop: float = 0.0,
- )
- Bases: Module
- Fully-connected multi-layer perceptron used inside AFNO
- Parameters:
- in_features (int) – Input feature size 
- latent_features (int) – Latent feature size 
- out_features (int) – Output feature size 
- activation_fn (nn.Module, optional) – Activation function, by default nn.GELU 
- drop (float, optional) – Drop out rate, by default 0.0 
 
 - forward(x: Tensor) → Tensor[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.afno.afno.Block(
- embed_dim: int,
- num_blocks: int = 8,
- mlp_ratio: float = 4.0,
- drop: float = 0.0,
- activation_fn: ~torch.nn.modules.module.Module = GELU(approximate='none'),
- norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.normalization.LayerNorm'>,
- double_skip: bool = True,
- sparsity_threshold: float = 0.01,
- hard_thresholding_fraction: float = 1.0,
- )
- Bases: Module
- AFNO block, spectral convolution and MLP
- Parameters:
- embed_dim (int) – Embedded feature dimensionality 
- num_blocks (int, optional) – Number of blocks used in the block diagonal weight matrix, by default 8 
- mlp_ratio (float, optional) – Ratio of MLP latent variable size to input feature size, by default 4.0 
- drop (float, optional) – Drop out rate in MLP, by default 0.0 
- activation_fn (nn.Module, optional) – Activation function used in MLP, by default nn.GELU 
- norm_layer (nn.Module, optional) – Normalization function, by default nn.LayerNorm 
- double_skip (bool, optional) – Residual, by default True 
- sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01 
- hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1 
 
 - forward(x: Tensor) → Tensor[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.afno.afno.MetaData(
- name: str = 'AFNO',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = True,
- onnx_cpu: bool = False,
- onnx_runtime: bool = True,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
- )
- Bases: ModelMetaData
- class physicsnemo.models.afno.afno.PatchEmbed(
- inp_shape: List[int],
- in_channels: int,
- patch_size: List[int] = [16, 16],
- embed_dim: int = 256,
- )
- Bases: Module
- Patch embedding layer
- Converts 2D patch into a 1D vector for input to AFNO
- Parameters:
- inp_shape (List[int]) – Input image dimensions [height, width] 
- in_channels (int) – Number of input channels 
- patch_size (List[int], optional) – Size of image patches, by default [16, 16] 
- embed_dim (int, optional) – Embedded channel size, by default 256 
 
 - forward(x: Tensor) → Tensor[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
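The inp_shape and patch_size parameters together determine how many patches an image is split into. A minimal sketch of that arithmetic follows, assuming non-overlapping patches and an image that divides evenly by the patch size. The helper name and the divisibility check are assumptions for illustration, not taken from the PatchEmbed implementation.

```python
def patch_grid(inp_shape, patch_size):
    """Return the number of patches along height and width."""
    height, width = inp_shape
    patch_h, patch_w = patch_size
    if height % patch_h or width % patch_w:
        raise ValueError("Image dimensions must be divisible by the patch size")
    return height // patch_h, width // patch_w


# Values from the AFNO example: a 32 x 32 image with 8 x 8 patches.
rows, cols = patch_grid([32, 32], (8, 8))
num_patches = rows * cols
```

For the AFNO example settings this yields a 4 x 4 grid, so each image becomes a sequence of 16 patch vectors.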
 
- class physicsnemo.models.afno.modafno.Block(
- embed_dim: int,
- mod_dim: int,
- num_blocks: int = 8,
- mlp_ratio: float = 4.0,
- drop: float = 0.0,
- activation_fn: ~torch.nn.modules.module.Module = GELU(approximate='none'),
- norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.normalization.LayerNorm'>,
- double_skip: bool = True,
- sparsity_threshold: float = 0.01,
- hard_thresholding_fraction: float = 1.0,
- modulate_filter: bool = True,
- modulate_mlp: bool = True,
- scale_shift_mode: ~typing.Literal['complex', 'real'] = 'real',
- Bases: - Module- AFNO block, spectral convolution and MLP - Parameters:
- embed_dim (int) – Embedded feature dimensionality 
- mod_dim (int) – Modulation input dimensionality 
- num_blocks (int, optional) – Number of blocks used in the block diagonal weight matrix, by default 8 
- mlp_ratio (float, optional) – Ratio of MLP latent variable size to input feature size, by default 4.0 
- drop (float, optional) – Drop out rate in MLP, by default 0.0 
- activation_fn (nn.Module, optional) – Activation function used in MLP, by default nn.GELU 
- norm_layer (nn.Module, optional) – Normalization function, by default nn.LayerNorm 
- double_skip (bool, optional) – Residual, by default True 
- sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01 
- hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1 
- modulate_filter (bool, optional) – Whether to compute the modulation for the FFT filter 
- modulate_mlp (bool, optional) – Whether to compute the modulation for the MLP 
- scale_shift_mode (["complex", "real"]) – If ‘complex’, compute the scale-shift operation using complex operations. If ‘real’ (the default shown in the signature above), use real operations. 
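The `num_blocks` parameter refers to a block-diagonal weight matrix. The sketch below shows only that structure, using real-valued Python lists. The actual layer operates on spectral features, so this is an assumption-laden illustration, and both helper names are hypothetical.

```python
from typing import List


def block_diag_matvec(blocks: List[List[List[float]]], x: List[float]) -> List[float]:
    """Multiply a block-diagonal matrix, given as its square blocks, by a vector."""
    block_size = len(blocks[0])
    if len(x) != block_size * len(blocks):
        raise ValueError("vector length must equal num_blocks * block_size")
    out: List[float] = []
    for index, block in enumerate(blocks):
        # Each block only sees its own slice of the input features.
        chunk = x[index * block_size:(index + 1) * block_size]
        for row in block:
            out.append(sum(w * v for w, v in zip(row, chunk)))
    return out


def block_diag_param_count(embed_dim: int, num_blocks: int) -> int:
    """Weights in a block-diagonal matrix versus embed_dim**2 for a dense one."""
    if embed_dim % num_blocks:
        raise ValueError("embed_dim must be divisible by num_blocks")
    block_size = embed_dim // num_blocks
    return num_blocks * block_size * block_size
```

With `embed_dim=256` and `num_blocks=8`, the block-diagonal form holds 8192 weights instead of 65536 for a dense matrix of the same size.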
 
 - forward(
- x: Tensor,
- mod_embed: Tensor,
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.afno.modafno.MetaData(
- name: str = 'ModAFNO',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = True,
- onnx_cpu: bool = False,
- onnx_runtime: bool = True,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData
- class physicsnemo.models.afno.modafno.ModAFNO(*args, **kwargs)[source]#
- Bases: - Module- Modulated Adaptive Fourier neural operator (ModAFNO) model. - Parameters:
- inp_shape (List[int]) – Input image dimensions [height, width] 
- in_channels (int, optional) – Number of input channels 
- out_channels (int, optional) – Number of output channels 
- embed_model (dict, optional) – Dictionary of arguments to pass to the ModEmbedNet embedding model 
- patch_size (List[int], optional) – Size of image patches, by default [16, 16] 
- embed_dim (int, optional) – Embedded channel size, by default 256 
- mod_dim (int) – Modulation input dimensionality 
- modulate_filter (bool, optional) – Whether to compute the modulation for the FFT filter, by default True 
- modulate_mlp (bool, optional) – Whether to compute the modulation for the MLP, by default True 
- scale_shift_mode (["complex", "real"]) – If ‘complex’ (default), compute the scale-shift operation using complex operations. If ‘real’, use real operations. 
- depth (int, optional) – Number of AFNO layers, by default 4 
- mlp_ratio (float, optional) – Ratio of layer MLP latent variable size to input feature size, by default 4.0 
- drop_rate (float, optional) – Drop out rate in layer MLPs, by default 0.0 
- num_blocks (int, optional) – Number of blocks in the block-diag frequency weight matrices, by default 16 
- sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01 
- hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1 
- The default settings correspond to the implementation in the paper cited below. 
 
 - Example
>>> import torch
>>> from physicsnemo.models.afno import ModAFNO
>>> model = ModAFNO(
...     inp_shape=[32, 32],
...     in_channels=2,
...     out_channels=1,
...     patch_size=(8, 8),
...     embed_dim=16,
...     depth=2,
...     num_blocks=2,
... )
>>> input = torch.randn(32, 2, 32, 32)  # (N, C, H, W)
>>> time = torch.full((32, 1), 0.5)
>>> output = model(input, time)
>>> output.size()
torch.Size([32, 1, 32, 32])
 - Note - Reference: Leinonen et al. “Modulated Adaptive Fourier Neural Operators for Temporal Interpolation of Weather Forecasts.” arXiv preprint arXiv:TODO (2024). 
- class physicsnemo.models.afno.modafno.ModAFNO2DLayer(
- hidden_size: int,
- mod_features: int,
- num_blocks: int = 8,
- sparsity_threshold: float = 0.01,
- hard_thresholding_fraction: float = 1,
- hidden_size_factor: int = 1,
- scale_shift_kwargs: dict | None = None,
- scale_shift_mode: Literal['complex', 'real'] = 'complex',
- Bases: - AFNO2DLayer- AFNO spectral convolution layer - Parameters:
- hidden_size (int) – Feature dimensionality 
- mod_features (int) – Number of modulation features 
- num_blocks (int, optional) – Number of blocks used in the block diagonal weight matrix, by default 8 
- sparsity_threshold (float, optional) – Sparsity threshold (softshrink) of spectral features, by default 0.01 
- hard_thresholding_fraction (float, optional) – Threshold for limiting number of modes used [0,1], by default 1 
- hidden_size_factor (int, optional) – Factor to increase spectral features by after weight multiplication, by default 1 
- scale_shift_kwargs (dict, optional) – Options to the MLP that computes the scale-shift parameters 
- scale_shift_mode (["complex", "real"]) – If ‘complex’ (default), compute the scale-shift operation using complex operations. If ‘real’, use real operations. 
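The `sparsity_threshold` parameter is described as a softshrink threshold. For reference, a minimal scalar version of the standard softshrink function is sketched below. How the layer applies it to complex spectral features is not shown in this section, so the sketch covers only the real-valued definition.

```python
def softshrink(value: float, sparsity_threshold: float = 0.01) -> float:
    """Standard softshrink: shrink towards zero and zero out small magnitudes."""
    if value > sparsity_threshold:
        return value - sparsity_threshold
    if value < -sparsity_threshold:
        return value + sparsity_threshold
    # Magnitudes at or below the threshold are set exactly to zero.
    return 0.0
```

Larger thresholds therefore zero out more features, which is what makes the spectral representation sparse.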
 
 - forward(
- x: Tensor,
- mod_embed: Tensor,
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.afno.modafno.ModAFNOMlp(
- in_features: int,
- latent_features: int,
- out_features: int,
- mod_features: int,
- activation_fn: Module = GELU(approximate='none'),
- drop: float = 0.0,
- scale_shift_kwargs: dict | None = None,
- Bases: - AFNOMlp- Modulated MLP used inside ModAFNO - Parameters:
- in_features (int) – Input feature size 
- latent_features (int) – Latent feature size 
- out_features (int) – Output feature size 
- activation_fn (nn.Module, optional) – Activation function, by default nn.GELU 
- drop (float, optional) – Drop out rate, by default 0.0 
- scale_shift_kwargs (dict, optional) – Options to the MLP that computes the scale-shift parameters 
 
 - forward(
- x: Tensor,
- mod_embed: Tensor,
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.afno.modafno.ScaleShiftMlp(
- in_features: int,
- out_features: int,
- hidden_features: int | None = None,
- hidden_layers: int = 0,
- activation_fn: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.GELU'>,
- Bases: - Module- MLP used to compute the scale and shift parameters of the ModAFNO block - Parameters:
- in_features (int) – Input feature size 
- out_features (int) – Output feature size 
- hidden_features (int, optional) – Hidden feature size, defaults to 2 * out_features 
- hidden_layers (int, optional) – Number of hidden layers, defaults to 0 
- activation_fn (nn.Module, optional) – Activation function, by default nn.GELU 
 
 - forward(x: Tensor)[source]#
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
Graph Neural Networks#
- class physicsnemo.models.meshgraphnet.meshgraphnet.MeshGraphNet(*args, **kwargs)[source]#
- Bases: - Module- MeshGraphNet network architecture - Parameters:
- input_dim_nodes (int) – Number of node features 
- input_dim_edges (int) – Number of edge features 
- output_dim (int) – Number of outputs 
- processor_size (int, optional) – Number of message passing blocks, by default 15 
- mlp_activation_fn (Union[str, List[str]], optional) – Activation function to use, by default ‘relu’ 
- num_layers_node_processor (int, optional) – Number of MLP layers for processing nodes in each message passing block, by default 2 
- num_layers_edge_processor (int, optional) – Number of MLP layers for processing edge features in each message passing block, by default 2 
- hidden_dim_processor (int, optional) – Hidden layer size for the message passing blocks, by default 128 
- hidden_dim_node_encoder (int, optional) – Hidden layer size for the node feature encoder, by default 128 
- num_layers_node_encoder (Union[int, None], optional) – Number of MLP layers for the node feature encoder, by default 2. If None is provided, the MLP will collapse to an Identity function, i.e. no node encoder 
- hidden_dim_edge_encoder (int, optional) – Hidden layer size for the edge feature encoder, by default 128 
- num_layers_edge_encoder (Union[int, None], optional) – Number of MLP layers for the edge feature encoder, by default 2. If None is provided, the MLP will collapse to an Identity function, i.e. no edge encoder 
- hidden_dim_node_decoder (int, optional) – Hidden layer size for the node feature decoder, by default 128 
- num_layers_node_decoder (Union[int, None], optional) – Number of MLP layers for the node feature decoder, by default 2. If None is provided, the MLP will collapse to an Identity function, i.e. no decoder 
- aggregation (str, optional) – Message aggregation type, by default “sum” 
- do_concat_trick (bool, optional) – Whether to replace concat+MLP with MLP+idx+sum, by default False 
- num_processor_checkpoint_segments (int, optional) – Number of processor segments for gradient checkpointing, by default 0 (checkpointing disabled) 
- checkpoint_offloading (bool, optional) – Whether to offload the checkpointing to the CPU, by default False 
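The `aggregation` parameter selects how messages are combined. The sketch below illustrates "sum" versus "mean" aggregation in plain Python, assuming edge messages are gathered onto each edge's destination node. The function `aggregate_messages` is hypothetical and does not reproduce the library's graph backend.

```python
from typing import List, Tuple


def aggregate_messages(
    num_nodes: int,
    edges: List[Tuple[int, int]],
    edge_features: List[List[float]],
    aggregation: str = "sum",
) -> List[List[float]]:
    """Combine per-edge features onto destination nodes (illustrative only)."""
    if aggregation not in ("sum", "mean"):
        raise ValueError("aggregation must be 'sum' or 'mean'")
    dim = len(edge_features[0]) if edge_features else 0
    totals = [[0.0] * dim for _ in range(num_nodes)]
    counts = [0] * num_nodes
    for (_, dst), feat in zip(edges, edge_features):
        counts[dst] += 1
        for i, value in enumerate(feat):
            totals[dst][i] += value
    if aggregation == "mean":
        for node in range(num_nodes):
            if counts[node]:  # nodes without incoming edges stay at zero
                totals[node] = [v / counts[node] for v in totals[node]]
    return totals
```

Sum aggregation grows with node degree, while mean aggregation normalizes by the number of incoming edges.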
 
 - Example
>>> # `norm_type` in MeshGraphNet is deprecated,
>>> # TE will be automatically used if possible unless told otherwise.
>>> # (You don't have to set this variable, it's faster to use TE!)
>>> # Example of how to disable:
>>> import os
>>> os.environ['PHYSICSNEMO_FORCE_TE'] = 'False'
>>>
>>> model = physicsnemo.models.meshgraphnet.MeshGraphNet(
...     input_dim_nodes=4,
...     input_dim_edges=3,
...     output_dim=2,
... )
>>> graph = dgl.rand_graph(10, 5)
>>> node_features = torch.randn(10, 4)
>>> edge_features = torch.randn(5, 3)
>>> output = model(node_features, edge_features, graph)
>>> output.size()
torch.Size([10, 2])
 - Note - Reference: Pfaff, Tobias, et al. “Learning mesh-based simulation with graph networks.” arXiv preprint arXiv:2010.03409 (2020).
 - forward(
- node_features: Tensor,
- edge_features: Tensor,
- graph: physicsnemo.models.gnn_layers.utils.GraphType,
- **kwargs,
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.meshgraphnet.meshgraphnet.MeshGraphNetProcessor(
- processor_size: int = 15,
- input_dim_node: int = 128,
- input_dim_edge: int = 128,
- num_layers_node: int = 2,
- num_layers_edge: int = 2,
- aggregation: str = 'sum',
- norm_type: str = 'LayerNorm',
- activation_fn: Module = ReLU(),
- do_concat_trick: bool = False,
- num_processor_checkpoint_segments: int = 0,
- checkpoint_offloading: bool = False,
- Bases: - Module- MeshGraphNet processor block - forward(
- node_features: Tensor,
- edge_features: Tensor,
- graph: physicsnemo.models.gnn_layers.utils.GraphType,
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 - run_function(
- segment_start: int,
- segment_end: int,
- Custom forward for gradient checkpointing - Parameters:
- segment_start (int) – Layer index as start of the segment 
- segment_end (int) – Layer index as end of the segment 
 
- Returns:
- Custom forward function 
- Return type:
- Callable 
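The idea behind `run_function` can be sketched as a closure over a slice of layers. The version below uses plain Python callables instead of network layers and assumes `segment_end` is exclusive, which this section does not specify. `make_run_function` is a hypothetical stand-in, not the library method.

```python
from typing import Callable, Sequence


def make_run_function(
    layers: Sequence[Callable[[float], float]],
    segment_start: int,
    segment_end: int,
) -> Callable[[float], float]:
    """Return a callable that applies layers[segment_start:segment_end] in order."""
    segment = layers[segment_start:segment_end]

    def custom_forward(x: float) -> float:
        # Only this segment is re-run when its activations are recomputed.
        for layer in segment:
            x = layer(x)
        return x

    return custom_forward
```

A gradient checkpointing utility can then call each returned function separately, so that intermediate activations inside a segment are recomputed during the backward pass rather than stored.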
 
 
- class physicsnemo.models.meshgraphnet.meshgraphnet.MetaData(
- name: str = 'MeshGraphNet',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = True,
- auto_grad: bool = True,
- Bases: - ModelMetaData
- class physicsnemo.models.mesh_reduced.mesh_reduced.Mesh_Reduced(
- input_dim_nodes: int,
- input_dim_edges: int,
- output_decode_dim: int,
- output_encode_dim: int = 3,
- processor_size: int = 15,
- num_layers_node_processor: int = 2,
- num_layers_edge_processor: int = 2,
- hidden_dim_processor: int = 128,
- hidden_dim_node_encoder: int = 128,
- num_layers_node_encoder: int = 2,
- hidden_dim_edge_encoder: int = 128,
- num_layers_edge_encoder: int = 2,
- hidden_dim_node_decoder: int = 128,
- num_layers_node_decoder: int = 2,
- k: int = 3,
- aggregation: str = 'mean',
- Bases: - Module- PbGMR-GMUS architecture. - A mesh-reduced architecture that combines encoding and decoding processors for physics prediction in reduced mesh space. - Parameters:
- input_dim_nodes (int) – Number of node features. 
- input_dim_edges (int) – Number of edge features. 
- output_decode_dim (int) – Number of decoding outputs (per node). 
- output_encode_dim (int, optional) – Number of encoding outputs (per pivotal position), by default 3. 
- processor_size (int, optional) – Number of message passing blocks, by default 15. 
- num_layers_node_processor (int, optional) – Number of MLP layers for processing nodes in each message passing block, by default 2. 
- num_layers_edge_processor (int, optional) – Number of MLP layers for processing edge features in each message passing block, by default 2. 
- hidden_dim_processor (int, optional) – Hidden layer size for the message passing blocks, by default 128. 
- hidden_dim_node_encoder (int, optional) – Hidden layer size for the node feature encoder, by default 128. 
- num_layers_node_encoder (int, optional) – Number of MLP layers for the node feature encoder, by default 2. 
- hidden_dim_edge_encoder (int, optional) – Hidden layer size for the edge feature encoder, by default 128. 
- num_layers_edge_encoder (int, optional) – Number of MLP layers for the edge feature encoder, by default 2. 
- hidden_dim_node_decoder (int, optional) – Hidden layer size for the node feature decoder, by default 128. 
- num_layers_node_decoder (int, optional) – Number of MLP layers for the node feature decoder, by default 2. 
- k (int, optional) – Number of nodes considered for per pivotal position, by default 3. 
- aggregation (str, optional) – Message aggregation type, by default “mean”. 
 
 - Notes - Reference: Han, Xu, et al. “Predicting physics in mesh-reduced space with temporal attention.” arXiv preprint arXiv:2201.09113 (2022). - decode(
- x,
- edge_features,
- graph,
- position_mesh,
- position_pivotal,
- Decode pivotal features back to mesh space. - Parameters:
- x (torch.Tensor) – Input features in pivotal space. 
- edge_features (torch.Tensor) – Edge features. 
- graph (Union[DGLGraph, pyg.data.Data]) – Input graph. 
- position_mesh (torch.Tensor) – Mesh positions. 
- position_pivotal (torch.Tensor) – Pivotal positions. 
 
- Returns:
- Decoded features in mesh space. 
- Return type:
- torch.Tensor 
 
 - encode(
- x,
- edge_features,
- graph,
- position_mesh,
- position_pivotal,
- Encode mesh features to pivotal space. - Parameters:
- x (torch.Tensor) – Input node features. 
- edge_features (torch.Tensor) – Edge features. 
- graph (Union[DGLGraph, pyg.data.Data]) – Input graph. 
- position_mesh (torch.Tensor) – Mesh positions. 
- position_pivotal (torch.Tensor) – Pivotal positions. 
 
- Returns:
- Encoded features in pivotal space. 
- Return type:
- torch.Tensor 
 
 - knn_interpolate(
- x: Tensor,
- pos_x: Tensor,
- pos_y: Tensor,
- batch_x: Tensor = None,
- batch_y: Tensor = None,
- k: int = 3,
- num_workers: int = 1,
- Perform k-nearest neighbor interpolation. - Parameters:
- x (torch.Tensor) – Input features to interpolate. 
- pos_x (torch.Tensor) – Source positions. 
- pos_y (torch.Tensor) – Target positions. 
- batch_x (torch.Tensor, optional) – Batch indices for source positions, by default None. 
- batch_y (torch.Tensor, optional) – Batch indices for target positions, by default None. 
- k (int, optional) – Number of nearest neighbors to consider, by default 3. 
- num_workers (int, optional) – Number of workers for parallel processing, by default 1. 
 
- Returns:
- torch.Tensor – Interpolated features. 
- torch.Tensor – Source indices. 
- torch.Tensor – Target indices. 
- torch.Tensor – Interpolation weights. 
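A k-nearest-neighbor interpolation can be sketched in plain Python as follows. The inverse squared-distance weighting used here is an assumption borrowed from common implementations, and the sketch returns only the interpolated features, not the index and weight tensors listed above. The name `knn_interpolate_sketch` is hypothetical.

```python
from typing import List


def knn_interpolate_sketch(
    x: List[List[float]],
    pos_x: List[List[float]],
    pos_y: List[List[float]],
    k: int = 3,
    eps: float = 1e-16,
) -> List[List[float]]:
    """Interpolate source features x at pos_x onto target positions pos_y."""
    dim = len(x[0])
    interpolated = []
    for target in pos_y:
        # Squared distance from this target to every source, nearest first.
        nearest = sorted(
            (sum((a - b) ** 2 for a, b in zip(src, target)), i)
            for i, src in enumerate(pos_x)
        )[:k]
        # Assumed weighting: inverse squared distance, clamped to avoid 1/0.
        weights = [1.0 / max(dist, eps) for dist, _ in nearest]
        total = sum(weights)
        interpolated.append([
            sum(w * x[i][c] for w, (_, i) in zip(weights, nearest)) / total
            for c in range(dim)
        ])
    return interpolated
```

A target halfway between two sources receives the average of their features, while a target that coincides with a source is dominated by that source.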
 
 
 
- class physicsnemo.models.meshgraphnet.bsms_mgn.BiStrideMeshGraphNet(*args, **kwargs)[source]#
- Bases: - MeshGraphNet- Bi-stride MeshGraphNet network architecture - Parameters:
- input_dim_nodes (int) – Number of node features 
- input_dim_edges (int) – Number of edge features 
- output_dim (int) – Number of outputs 
- processor_size (int, optional) – Number of message passing blocks, by default 15 
- mlp_activation_fn (Union[str, List[str]], optional) – Activation function to use, by default ‘relu’ 
- num_layers_node_processor (int, optional) – Number of MLP layers for processing nodes in each message passing block, by default 2 
- num_layers_edge_processor (int, optional) – Number of MLP layers for processing edge features in each message passing block, by default 2 
- hidden_dim_processor (int, optional) – Hidden layer size for the message passing blocks, by default 128 
- hidden_dim_node_encoder (int, optional) – Hidden layer size for the node feature encoder, by default 128 
- num_layers_node_encoder (Union[int, None], optional) – Number of MLP layers for the node feature encoder, by default 2. If None is provided, the MLP will collapse to an Identity function, i.e. no node encoder 
- hidden_dim_edge_encoder (int, optional) – Hidden layer size for the edge feature encoder, by default 128 
- num_layers_edge_encoder (Union[int, None], optional) – Number of MLP layers for the edge feature encoder, by default 2. If None is provided, the MLP will collapse to an Identity function, i.e. no edge encoder 
- hidden_dim_node_decoder (int, optional) – Hidden layer size for the node feature decoder, by default 128 
- num_layers_node_decoder (Union[int, None], optional) – Number of MLP layers for the node feature decoder, by default 2. If None is provided, the MLP will collapse to an Identity function, i.e. no decoder 
- aggregation (str, optional) – Message aggregation type, by default “sum” 
- do_concat_trick (bool, optional) – Whether to replace concat+MLP with MLP+idx+sum, by default False 
- num_processor_checkpoint_segments (int, optional) – Number of processor segments for gradient checkpointing, by default 0 (checkpointing disabled). The number of segments should be a factor of 2 * processor_size, for example, if processor_size is 15, then num_processor_checkpoint_segments can be 10 since it’s a factor of 15 * 2 = 30. It is recommended to start with a smaller number of segments until the model fits into memory since each segment will affect model training speed. 
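The divisibility rule for `num_processor_checkpoint_segments` can be checked with a few lines of Python. The helper below simply lists the factors of `2 * processor_size`, following the rule stated above. Its name is hypothetical.

```python
from typing import List


def valid_checkpoint_segments(processor_size: int) -> List[int]:
    """List every segment count that divides 2 * processor_size evenly."""
    total_layers = 2 * processor_size
    return [n for n in range(1, total_layers + 1) if total_layers % n == 0]
```

For `processor_size=15` this yields 1, 2, 3, 5, 6, 10, 15 and 30, which includes the value 10 used in the example above; 0 remains the separate setting that disables checkpointing.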
 
 - forward(
- node_features: Tensor,
- edge_features: Tensor,
- graph: dgl.DGLGraph,
- ms_edges: Iterable[Tensor] = (),
- ms_ids: Iterable[Tensor] = (),
- **kwargs,
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.meshgraphnet.bsms_mgn.MetaData(
- name: str = 'BiStrideMeshGraphNet',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = True,
- auto_grad: bool = True,
- Bases: - ModelMetaData
Convolutional Networks#
- class physicsnemo.models.pix2pix.pix2pix.MetaData(
- name: str = 'Pix2Pix',
- jit: bool = True,
- cuda_graphs: bool = True,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = True,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = True,
- auto_grad: bool = True,
- Bases: - ModelMetaData
- class physicsnemo.models.pix2pix.pix2pix.Pix2Pix(*args, **kwargs)[source]#
- Bases: - Module - Convolutional encoder-decoder based on pix2pix generator models. - Note - The pix2pix architecture supports options for 1D, 2D and 3D fields which can be controlled using the dimension parameter. - Parameters:
- in_channels (int) – Number of input channels 
- out_channels (Union[int, Any], optional) – Number of output channels 
- dimension (int) – Model dimensionality (supports 1, 2, 3). 
- conv_layer_size (int, optional) – Latent channel size after first convolution, by default 64 
- n_downsampling (int, optional) – Number of downsampling blocks, by default 3 
- n_upsampling (int, optional) – Number of upsampling blocks, by default 3 
- n_blocks (int, optional) – Number of residual blocks in middle of model, by default 3 
- activation_fn (Any, optional) – Activation function, by default “relu” 
- batch_norm (bool, optional) – Batch normalization, by default False 
- padding_type (str, optional) – Padding type (‘reflect’, ‘replicate’ or ‘zero’), by default “reflect” 
 
 - Example
>>> # 2D convolutional encoder decoder
>>> model = physicsnemo.models.pix2pix.Pix2Pix(
...     in_channels=1,
...     out_channels=2,
...     dimension=2,
...     conv_layer_size=4)
>>> input = torch.randn(4, 1, 32, 32)  # (N, C, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([4, 2, 32, 32])
 - Note - Reference: Isola, Phillip, et al. “Image-To-Image translation with conditional adversarial networks” Conference on Computer Vision and Pattern Recognition, 2017. https://arxiv.org/abs/1611.07004
 - Reference: Wang, Ting-Chun, et al. “High-Resolution image synthesis and semantic manipulation with conditional GANs” Conference on Computer Vision and Pattern Recognition, 2018. https://arxiv.org/abs/1711.11585
 - Note - Based on the implementation: NVIDIA/pix2pixHD
 - forward(input: Tensor) Tensor[source]#
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.pix2pix.pix2pix.ResnetBlock(
- dimension: int,
- channels: int,
- padding_type: str = 'reflect',
- activation: Module = ReLU(),
- use_batch_norm: bool = False,
- use_dropout: bool = False,
- Bases: - Module- A simple ResNet block - Parameters:
- dimension (int) – Model dimensionality (supports 1, 2, 3). 
- channels (int) – Number of feature channels 
- padding_type (str, optional) – Padding type (‘reflect’, ‘replicate’ or ‘zero’), by default “reflect” 
- activation (nn.Module, optional) – Activation function, by default nn.ReLU() 
- use_batch_norm (bool, optional) – Batch normalization, by default False 
 
 - forward(x: Tensor) Tensor[source]#
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.srrn.super_res_net.ConvolutionalBlock3d(
- in_channels: int,
- out_channels: int,
- kernel_size: int,
- stride: int = 1,
- batch_norm: bool = False,
- activation_fn: Module = Identity(),
- Bases: - Module- 3D convolutional block - Parameters:
- in_channels (int) – Input channels 
- out_channels (int) – Output channels 
- kernel_size (int) – Kernel size 
- stride (int, optional) – Convolutional stride, by default 1 
- batch_norm (bool, optional) – Use batchnorm, by default False 
 
 - forward(input: Tensor) Tensor[source]#
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.srrn.super_res_net.MetaData(
- name: str = 'SuperResolution',
- jit: bool = True,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = False,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = True,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = True,
- auto_grad: bool = True,
- Bases: - ModelMetaData
- class physicsnemo.models.srrn.super_res_net.PixelShuffle3d(scale: int)[source]#
- Bases: - Module- 3D pixel-shuffle operation - Parameters:
- scale (int) – Factor to downscale channel count by 
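The shape bookkeeping of a 3D pixel shuffle can be sketched as below. This assumes the standard convention in which the channel count shrinks by `scale**3` while each spatial dimension grows by `scale`. The section does not spell this out, so treat it as an assumption. `pixel_shuffle_3d_shape` is a hypothetical helper.

```python
from typing import Tuple


def pixel_shuffle_3d_shape(
    shape: Tuple[int, int, int, int, int], scale: int
) -> Tuple[int, int, int, int, int]:
    """Output shape of a 3D pixel shuffle under the assumed standard convention."""
    n, c, d, h, w = shape
    if c % scale**3:
        raise ValueError("channel count must be divisible by scale**3")
    # Channels are traded for spatial resolution along D, H and W.
    return (n, c // scale**3, d * scale, h * scale, w * scale)
```

Under this convention, a `(4, 16, 8, 8, 8)` tensor with `scale=2` becomes `(4, 2, 16, 16, 16)`.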
 - Note - forward(input: Tensor) Tensor[source]#
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.srrn.super_res_net.ResidualConvBlock3d(
- n_layers: int = 1,
- kernel_size: int = 3,
- conv_layer_size: int = 64,
- activation_fn: Module = Identity(),
- Bases: - Module- 3D ResNet block - Parameters:
- n_layers (int, optional) – Number of convolutional layers, by default 1 
- kernel_size (int, optional) – Kernel size, by default 3 
- conv_layer_size (int, optional) – Latent channel size, by default 64 
- activation_fn (nn.Module, optional) – Activation function, by default nn.Identity() 
 
 - forward(input: Tensor) Tensor[source]#
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.srrn.super_res_net.SRResNet(*args, **kwargs)[source]#
- Bases: - Module- 3D convolutional super-resolution network - Parameters:
- in_channels (int) – Number of input channels 
- out_channels (int) – Number of output channels 
- large_kernel_size (int, optional) – convolutional kernel size for first and last convolution, by default 7 
- small_kernel_size (int, optional) – convolutional kernel size for internal convolutions, by default 3 
- conv_layer_size (int, optional) – Latent channel size, by default 32 
- n_resid_blocks (int, optional) – Number of residual blocks before , by default 8 
- scaling_factor (int, optional) – Scaling factor to increase the output feature size compared to the input (2, 4, or 8), by default 8 
- activation_fn (Any, optional) – Activation function, by default “prelu” 
 
 - Example
>>> # 3D convolutional encoder decoder
>>> model = physicsnemo.models.srrn.SRResNet(
...     in_channels=1,
...     out_channels=2,
...     conv_layer_size=4,
...     scaling_factor=2)
>>> input = torch.randn(4, 1, 8, 8, 8)  # (N, C, D, H, W)
>>> output = model(input)
>>> output.size()
torch.Size([4, 2, 16, 16, 16])
 - Note - Based on the implementation: sgrvinod/a-PyTorch-Tutorial-to-Super-Resolution
 - forward(in_vars: Tensor) Tensor[source]#
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.srrn.super_res_net.SubPixel_ConvolutionalBlock3d(
- kernel_size: int = 3,
- conv_layer_size: int = 64,
- scaling_factor: int = 2,
- Bases: - Module- Convolutional block with Pixel Shuffle operation - Parameters:
- kernel_size (int, optional) – Kernel size, by default 3 
- conv_layer_size (int, optional) – Latent channel size, by default 64 
- scaling_factor (int, optional) – Pixel shuffle scaling factor, by default 2 
 
 - forward(
- input: Tensor,
- Define the computation performed at every call. Should be overridden by all subclasses. Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 
Recurrent Neural Networks#
- class physicsnemo.models.rnn.rnn_one2many.MetaData(
- name: str = 'One2ManyRNN',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = True,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData
- class physicsnemo.models.rnn.rnn_one2many.One2ManyRNN(*args, **kwargs)[source]#
- Bases: - Module - An RNN model with encoder/decoder for 2D/3D problems that provides predictions based on a single initial condition. - Parameters:
- input_channels (int) – Number of channels in the input 
- dimension (int, optional) – Spatial dimension of the input. Only 2d and 3d are supported, by default 2 
- nr_latent_channels (int, optional) – Channels for encoding/decoding, by default 512 
- nr_residual_blocks (int, optional) – Number of residual blocks, by default 2 
- activation_fn (str, optional) – Activation function to use, by default “relu” 
- nr_downsamples (int, optional) – Number of downsamples, by default 2 
- nr_tsteps (int, optional) – Time steps to predict, by default 32 
 
 - Example
>>> model = physicsnemo.models.rnn.One2ManyRNN(
...     input_channels=6,
...     dimension=2,
...     nr_latent_channels=32,
...     activation_fn="relu",
...     nr_downsamples=2,
...     nr_tsteps=16,
... )
>>> input = invar = torch.randn(4, 6, 1, 16, 16) # [N, C, T, H, W]
>>> output = model(input)
>>> output.size()
torch.Size([4, 6, 16, 16, 16])
 - forward(x: Tensor) Tensor[source]#
- Forward pass - Parameters:
- x (Tensor) – Expects a tensor of size [N, C, 1, H, W] for 2D or [N, C, 1, D, H, W] for 3D, where N is the batch size, C is the number of channels, 1 is the number of input timesteps, and D, H, W are spatial dimensions. 
- Returns:
- Size [N, C, T, H, W] for 2D or [N, C, T, D, H, W] for 3D, where T is the number of timesteps being predicted. 
- Return type:
- Tensor 
 
 
- class physicsnemo.models.rnn.rnn_seq2seq.MetaData(
- name: str = 'Seq2SeqRNN',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = True,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData
- class physicsnemo.models.rnn.rnn_seq2seq.Seq2SeqRNN(*args, **kwargs)[source]#
- Bases: - Module- An RNN model with encoder/decoder for 2d/3d problems. Given input 0 to t-1, predicts signal t to t + nr_tsteps. - Parameters:
- input_channels (int) – Number of channels in the input 
- dimension (int, optional) – Spatial dimension of the input. Only 2d and 3d are supported, by default 2 
- nr_latent_channels (int, optional) – Channels for encoding/decoding, by default 512 
- nr_residual_blocks (int, optional) – Number of residual blocks, by default 2 
- activation_fn (str, optional) – Activation function to use, by default “relu” 
- nr_downsamples (int, optional) – Number of downsamples, by default 2 
- nr_tsteps (int, optional) – Time steps to predict, by default 32 
 
 - Example
>>> model = physicsnemo.models.rnn.Seq2SeqRNN(
...     input_channels=6,
...     dimension=2,
...     nr_latent_channels=32,
...     activation_fn="relu",
...     nr_downsamples=2,
...     nr_tsteps=16,
... )
>>> input = invar = torch.randn(4, 6, 16, 16, 16) # [N, C, T, H, W]
>>> output = model(input)
>>> output.size()
torch.Size([4, 6, 16, 16, 16])
 - forward(x: Tensor) Tensor[source]#
- Forward pass - Parameters:
- x (Tensor) – Expects a tensor of size [N, C, T, H, W] for 2D or [N, C, T, D, H, W] for 3D, where N is the batch size, C is the number of channels, T is the number of input timesteps, and D, H, W are spatial dimensions. Currently, the number of input time steps must equal the number of predicted time steps. 
- Returns:
- Size [N, C, T, H, W] for 2D or [N, C, T, D, H, W] for 3D, where T is the number of timesteps being predicted. 
- Return type:
- Tensor 
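The shape contracts documented for the two RNN models can be summarized in a small helper. This is only a sketch of the documented input/output shapes, written in plain Python; the function name `expected_output_shape` and the `variant` labels are hypothetical and are not part of PhysicsNeMo.

```python
def expected_output_shape(input_shape, nr_tsteps, variant):
    """Return the output shape implied by the documented shape contract.

    variant is "one2many" or "seq2seq"; both labels exist only in this sketch.
    """
    n, c, t, *spatial = input_shape
    if len(spatial) not in (2, 3):
        raise ValueError("expected 2 or 3 spatial dimensions")
    if variant == "one2many" and t != 1:
        # One2ManyRNN is documented to take a single initial condition.
        raise ValueError("One2ManyRNN expects a single input time step")
    if variant == "seq2seq" and t != nr_tsteps:
        # Seq2SeqRNN is documented to need as many input steps as predicted steps.
        raise ValueError("Seq2SeqRNN expects T == nr_tsteps")
    return (n, c, nr_tsteps, *spatial)


one2many_shape = expected_output_shape((4, 6, 1, 16, 16), 16, "one2many")
seq2seq_shape = expected_output_shape((4, 6, 16, 16, 16), 16, "seq2seq")
print(one2many_shape, seq2seq_shape)
```

Both calls mirror the examples above: the time axis (index 2) of the output always has length `nr_tsteps`.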
 
 
Operator Models#
This code contains the DoMINO model architecture. The DoMINO class contains an architecture to model both surface and volume quantities, together or separately (controlled using the config.yaml file).
- class physicsnemo.models.domino.model.AggregationModel(
- input_features: int,
- output_features: int,
- model_parameters=None,
- new_change: bool = True,
- Bases: - Module- Neural network module to aggregate local geometry encoding with basis functions. - This module combines basis function representations with geometry encodings to predict the final output quantities. It serves as the final prediction layer that integrates all available information sources. - forward(x: Tensor) Tensor[source]#
- Process the combined input features to predict output quantities. - This method applies a series of fully connected layers to the input, which typically contains a combination of basis functions, geometry encodings, and potentially parameter encodings. - Parameters:
- x – Input tensor containing combined features 
- Returns:
- Tensor containing predicted output quantities 
 
 
- class physicsnemo.models.domino.model.BQWarp(
- grid_resolution=None,
- radius: float = 0.25,
- neighbors_in_radius: int = 10,
- Bases: - Module- Warp-based ball-query layer for finding neighboring points within a specified radius. - This layer uses an accelerated ball query implementation to efficiently find points within a specified radius of query points. - forward(
- x: Tensor,
- p_grid: Tensor,
- reverse_mapping: bool = True,
- Performs ball query operation to find neighboring points and their features. - This method uses the Warp-accelerated ball query implementation to find points within a specified radius. It can operate in two modes: - Forward mapping: Find points from x that are near p_grid points (reverse_mapping=False) - Reverse mapping: Find points from p_grid that are near x points (reverse_mapping=True) - Parameters:
- x – Tensor of shape (batch_size, num_points, 3+features) containing point coordinates and their features 
- p_grid – Tensor of shape (batch_size, grid_x, grid_y, grid_z, 3) containing grid point coordinates 
- reverse_mapping – Boolean flag to control the direction of the mapping: - True: Find p_grid points near x points - False: Find x points near p_grid points 
 
- Returns:
- mapping: Tensor containing indices of neighboring points 
- outputs: Tensor containing coordinates of the neighboring points 
 
- Return type:
- tuple containing 
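To make the ball-query operation concrete, here is a brute-force sketch in plain Python. It is not the Warp-accelerated implementation used by BQWarp; the function name `ball_query` and its argument names are hypothetical. It only shows the idea of collecting, for each query point, up to a fixed number of neighbor indices within a radius.

```python
import math


def ball_query(queries, points, radius, max_neighbors):
    """For each query point, return indices of up to max_neighbors points within radius."""
    mapping = []
    for q in queries:
        hits = []
        for idx, p in enumerate(points):
            if math.dist(q, p) <= radius:
                hits.append(idx)
                if len(hits) == max_neighbors:
                    break  # cap the neighbor count, like neighbors_in_radius
        mapping.append(hits)
    return mapping


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.2, 0.0)]
queries = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.9)]

mapping = ball_query(queries, points, radius=0.25, max_neighbors=10)
# Coordinates of the neighbors, analogous to the "outputs" tensor described above.
outputs = [[points[i] for i in hits] for hits in mapping]
print(mapping)
```

This quadratic-time loop is only meant for reading; an accelerated implementation would typically use a spatial data structure on the GPU instead.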
 
 
- class physicsnemo.models.domino.model.DoMINO(
- input_features: int,
- output_features_vol: int | None = None,
- output_features_surf: int | None = None,
- global_features: int = 2,
- model_parameters=None,
- Bases: - Module- DoMINO model architecture for predicting both surface and volume quantities. - The DoMINO (Deep Operational Modal Identification and Nonlinear Optimization) model is designed to model both surface and volume physical quantities in aerodynamic simulations. It can operate in three modes: 1. Surface-only: Predicting only surface quantities 2. Volume-only: Predicting only volume quantities 3. Combined: Predicting both surface and volume quantities - The model uses a combination of: - Geometry representation modules - Neural network basis functions - Parameter encoding - Local and global geometry processing - Aggregation models for final prediction - Parameters:
- input_features (int) – Number of point input features 
- output_features_vol (int, optional) – Number of output features in volume 
- output_features_surf (int, optional) – Number of output features on surface 
- model_parameters – Model parameters controlled by config.yaml 
 
 - Example
>>> from physicsnemo.models.domino.model import DoMINO
>>> import torch, os
>>> from hydra import compose, initialize
>>> from omegaconf import OmegaConf
>>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
>>> cfg = OmegaConf.register_new_resolver("eval", eval)
>>> with initialize(version_base="1.3", config_path="examples/cfd/external_aerodynamics/domino/src/conf"):
...     cfg = compose(config_name="config")
>>> cfg.model.model_type = "combined"
>>> model = DoMINO(
...     input_features=3,
...     output_features_vol=5,
...     output_features_surf=4,
...     model_parameters=cfg.model
... ).to(device)
Warp ...
>>> bsize = 1
>>> nx, ny, nz = cfg.model.interp_res
>>> num_neigh = 7
>>> global_features = 2
>>> pos_normals_closest_vol = torch.randn(bsize, 100, 3).to(device)
>>> pos_normals_com_vol = torch.randn(bsize, 100, 3).to(device)
>>> pos_normals_com_surface = torch.randn(bsize, 100, 3).to(device)
>>> geom_centers = torch.randn(bsize, 100, 3).to(device)
>>> grid = torch.randn(bsize, nx, ny, nz, 3).to(device)
>>> surf_grid = torch.randn(bsize, nx, ny, nz, 3).to(device)
>>> sdf_grid = torch.randn(bsize, nx, ny, nz).to(device)
>>> sdf_surf_grid = torch.randn(bsize, nx, ny, nz).to(device)
>>> sdf_nodes = torch.randn(bsize, 100, 1).to(device)
>>> surface_coordinates = torch.randn(bsize, 100, 3).to(device)
>>> surface_neighbors = torch.randn(bsize, 100, num_neigh, 3).to(device)
>>> surface_normals = torch.randn(bsize, 100, 3).to(device)
>>> surface_neighbors_normals = torch.randn(bsize, 100, num_neigh, 3).to(device)
>>> surface_sizes = torch.rand(bsize, 100).to(device) + 1e-6 # Note this needs to be > 0.0
>>> surface_neighbors_areas = torch.rand(bsize, 100, num_neigh).to(device) + 1e-6
>>> volume_coordinates = torch.randn(bsize, 100, 3).to(device)
>>> vol_grid_max_min = torch.randn(bsize, 2, 3).to(device)
>>> surf_grid_max_min = torch.randn(bsize, 2, 3).to(device)
>>> global_params_values = torch.randn(bsize, global_features, 1).to(device)
>>> global_params_reference = torch.randn(bsize, global_features, 1).to(device)
>>> input_dict = {
...     "pos_volume_closest": pos_normals_closest_vol,
...     "pos_volume_center_of_mass": pos_normals_com_vol,
...     "pos_surface_center_of_mass": pos_normals_com_surface,
...     "geometry_coordinates": geom_centers,
...     "grid": grid,
...     "surf_grid": surf_grid,
...     "sdf_grid": sdf_grid,
...     "sdf_surf_grid": sdf_surf_grid,
...     "sdf_nodes": sdf_nodes,
...     "surface_mesh_centers": surface_coordinates,
...     "surface_mesh_neighbors": surface_neighbors,
...     "surface_normals": surface_normals,
...     "surface_neighbors_normals": surface_neighbors_normals,
...     "surface_areas": surface_sizes,
...     "surface_neighbors_areas": surface_neighbors_areas,
...     "volume_mesh_centers": volume_coordinates,
...     "volume_min_max": vol_grid_max_min,
...     "surface_min_max": surf_grid_max_min,
...     "global_params_reference": global_params_reference,
...     "global_params_values": global_params_values,
... }
>>> output = model(input_dict)
>>> print(f"{output[0].shape}, {output[1].shape}")
torch.Size([1, 100, 5]), torch.Size([1, 100, 4])
 - calculate_solution(
- volume_mesh_centers,
- encoding_g,
- encoding_node,
- global_params_values,
- global_params_reference,
- eval_mode,
- num_sample_points=20,
- noise_intensity=50,
- return_volume_neighbors=False,
- Function to approximate solution sampling the neighborhood information 
 - calculate_solution_with_neighbors(
- surface_mesh_centers,
- encoding_g,
- encoding_node,
- surface_mesh_neighbors,
- surface_normals,
- surface_neighbors_normals,
- surface_areas,
- surface_neighbors_areas,
- global_params_values,
- global_params_reference,
- num_sample_points=7,
- Function to approximate solution given the neighborhood information 
 - forward(data_dict, return_volume_neighbors=False)[source]#
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 - geo_encoding_local(
- encoding_g,
- volume_mesh_centers,
- p_grid,
- mode='volume',
- Function to calculate local geometry encoding from global encoding 
 - position_encoder(
- encoding_node: Tensor,
- eval_mode: Literal['surface', 'volume'] = 'volume',
- Compute positional encoding for input points. - Parameters:
- encoding_node – Tensor containing node position information 
- eval_mode – Mode of evaluation, either “volume” or “surface” 
 
- Returns:
- Tensor containing positional encoding features 
 
 - sample_sphere(center, r, num_points)[source]#
- Uniformly sample points in a 3D sphere around the center. - This method generates random points within a sphere of radius r centered at each point in the input tensor. The sampling is uniform in volume, meaning points are more likely to be sampled in the outer regions of the sphere. - Parameters:
- center – Tensor of shape (batch_size, num_points, 3) containing center coordinates 
- r – Radius of the sphere for sampling 
- num_points – Number of points to sample per center 
 
- Returns:
- Tensor of shape (batch_size, num_points, num_samples, 3) containing the sampled points around each center 
 
 - sample_sphere_shell(center, r_inner, r_outer, num_points)[source]#
- Uniformly sample points in a 3D spherical shell around a center. - This method generates random points within a spherical shell (annulus) between inner radius r_inner and outer radius r_outer centered at each point in the input tensor. The sampling is uniform in volume within the shell. - Parameters:
- center – Tensor of shape (batch_size, num_points, 3) containing center coordinates 
- r_inner – Inner radius of the spherical shell 
- r_outer – Outer radius of the spherical shell 
- num_points – Number of points to sample per center 
 
- Returns:
- Tensor of shape (batch_size, num_points, num_samples, 3) containing the sampled points within the spherical shell around each center 
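Sampling that is uniform in volume, as described for sample_sphere and sample_sphere_shell, can be sketched with inverse-CDF sampling of the radius. The snippet below is a plain-Python illustration for a single center, not the library's tensor implementation; the function name `sample_in_shell` is hypothetical. Setting `r_inner=0` gives the full-sphere case.

```python
import math
import random


def sample_in_shell(center, r_inner, r_outer, num_samples, rng=random):
    """Sample points uniformly in volume between r_inner and r_outer around center."""
    cx, cy, cz = center
    samples = []
    for _ in range(num_samples):
        # Uniform direction: normalize a 3D Gaussian draw.
        gx, gy, gz = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(gx * gx + gy * gy + gz * gz) or 1.0
        # Volume grows like r**3, so interpolate uniformly in r**3 and take the cube root.
        u = rng.random()
        r = (r_inner**3 + u * (r_outer**3 - r_inner**3)) ** (1.0 / 3.0)
        samples.append((cx + r * gx / norm, cy + r * gy / norm, cz + r * gz / norm))
    return samples


rng = random.Random(0)
ball = sample_in_shell((0.0, 0.0, 0.0), 0.0, 1.0, 2000, rng)   # full sphere
shell = sample_in_shell((1.0, 2.0, 3.0), 0.5, 1.0, 2000, rng)  # spherical shell

# In a unit ball, about 7/8 of the volume lies beyond radius 0.5.
outer_fraction = sum(math.dist(p, (0.0, 0.0, 0.0)) > 0.5 for p in ball) / len(ball)
print(round(outer_fraction, 2))
```

The cube root is what makes points more likely to land in the outer regions: drawing the radius uniformly instead would over-sample the center.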
 
 
- class physicsnemo.models.domino.model.GeoConvOut(
- input_features: int,
- model_parameters,
- grid_resolution=None,
- Bases: - Module- Geometry layer to project STL geometry data onto regular grids. - forward(
- x: Tensor,
- grid: Tensor,
- radius: float = 0.025,
- neighbors_in_radius: int = 10,
- Process and project geometric features onto a 3D grid. - Parameters:
- x – Input tensor containing coordinates of the neighboring points (batch_size, nx*ny*nz, 3, n_points) 
- grid – Input tensor represented as a grid of shape (batch_size, nx, ny, nz, 3) 
 
- Returns:
- Processed geometry features of shape (batch_size, base_neurons_in, nx, ny, nz) 
 
 
- class physicsnemo.models.domino.model.GeoProcessor(
- input_filters: int,
- output_filters: int,
- model_parameters,
- Bases: - Module- Geometry processing layer using CNNs - forward(x: Tensor) Tensor[source]#
- Process geometry information through the 3D CNN network. - The network follows an encoder-decoder architecture with skip connections: 1. Downsampling path (encoder) with three levels of max pooling 2. Processing loop in the bottleneck 3. Upsampling path (decoder) with skip connections from the encoder - Parameters:
- x – Input tensor containing grid-represented geometry of shape (batch_size, input_filters, nx, ny, nz) 
- Returns:
- Processed geometry features of shape (batch_size, 1, nx, ny, nz) 
 
 
- class physicsnemo.models.domino.model.GeometryRep(
- input_features: int,
- radii: Sequence[float],
- neighbors_in_radius,
- hops=1,
- model_parameters=None,
- Bases: - Module- Geometry representation module that processes STL geometry data. - This module constructs a multiscale representation of geometry by: 1. Computing multi-scale geometry encoding for local and global context 2. Processing signed distance field (SDF) data for surface information - The combined encoding enables the model to reason about both local and global geometric properties. - forward(
- x: Tensor,
- p_grid: Tensor,
- sdf: Tensor,
- Process geometry data to create a comprehensive representation. - This method combines short-range, long-range, and SDF-based geometry encodings to create a rich representation of the geometry. - Parameters:
- x – Input tensor containing geometric point data 
- p_grid – Grid points for sampling 
- sdf – Signed distance field tensor 
 
- Returns:
- Comprehensive geometry encoding that concatenates short-range, SDF-based, and long-range features 
 
 
- class physicsnemo.models.domino.model.LocalPointConv(
- input_features,
- base_layer,
- output_features,
- model_parameters=None,
- Bases: - Module- Layer for local geometry point kernel - forward(x)[source]#
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.domino.model.NNBasisFunctions(input_features: int, model_parameters=None)[source]#
- Bases: - Module- Basis function layer for point clouds 
- class physicsnemo.models.domino.model.ParameterModel(input_features: int, model_parameters=None)[source]#
- Bases: - Module- Neural network module to encode simulation parameters. - This module encodes physical global parameters into a learned latent representation that can be incorporated into the model’s prediction process. 
- class physicsnemo.models.domino.model.PositionEncoder(input_features: int, model_parameters=None)[source]#
- Bases: - Module- Positional encoding of point clouds 
- physicsnemo.models.domino.model.calculate_pos_encoding(nx, d=8)[source]#
- Function to calculate positional encoding 
- physicsnemo.models.domino.model.fourier_encode(coords, num_freqs)[source]#
- Function to calculate Fourier features 
- physicsnemo.models.domino.model.fourier_encode_vectorized(coords, freqs)[source]#
- Vectorized Fourier feature encoding 
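For readers unfamiliar with Fourier feature encoding, the following is a generic sketch in plain Python. It assumes frequencies of the form 2**k * pi, which is a common choice; this page does not state which frequencies fourier_encode actually uses, so both the frequency schedule and the function name `fourier_features` are assumptions made for illustration.

```python
import math


def fourier_features(coords, num_freqs):
    """Map each coordinate to sin/cos features at num_freqs frequencies.

    Assumed frequency schedule: 2**k * pi for k = 0 .. num_freqs - 1.
    """
    feats = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        feats.extend(math.sin(freq * c) for c in coords)
        feats.extend(math.cos(freq * c) for c in coords)
    return feats


features = fourier_features([0.0, 0.5, 0.25], num_freqs=2)
print(len(features))  # 3 coordinates * 2 functions * 2 frequencies
```

Such encodings give a network access to high-frequency variations of the coordinates that are hard to learn from the raw values alone.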
- physicsnemo.models.domino.model.get_activation(
- activation: Literal['relu', 'gelu'],
- Return a PyTorch activation function corresponding to the given name. 
- physicsnemo.models.domino.model.scale_sdf(sdf: Tensor) Tensor[source]#
- Scale a signed distance function (SDF) to emphasize surface regions. - This function applies a non-linear scaling to the SDF values that compresses the range while preserving the sign, effectively giving more weight to points near surfaces where abs(SDF) is small. - Parameters:
- sdf – Tensor containing signed distance function values 
- Returns:
- Tensor with scaled SDF values in range [-1, 1] 
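One function with the properties described above (sign-preserving, compressing, bounded by -1 and 1, steepest near zero) is a softsign-style map. The sketch below assumes the form sdf / (c + |sdf|) with an illustrative constant `c`; the exact formula and constant used by scale_sdf are not given on this page, so treat both as assumptions.

```python
def scale_sdf_sketch(sdf_values, c=0.4):
    """Softsign-style scaling: keeps the sign and compresses large magnitudes.

    The constant c is an illustrative choice, not a documented value.
    """
    return [s / (c + abs(s)) for s in sdf_values]


scaled = scale_sdf_sketch([-10.0, -0.1, 0.0, 0.1, 10.0])
print([round(v, 3) for v in scaled])
```

Near the surface (small |sdf|) the map is approximately linear with slope 1/c, while far from the surface it saturates, which is how points near surfaces receive relatively more resolution.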
 
Diffusion Models#
The PhysicsNeMo diffusion library provides three categories of models that serve
different purposes. All models are based on the
Module class.
- Model backbones:
These are highly configurable architectures that can be used as building blocks for more complex models.
- Specialized architectures:
These are models that usually inherit from the model backbones and add functionality for specific use cases.
- Application-specific interfaces:
These Modules are not truly architectures, but rather wrappers around the model backbones or specialized architectures. Their intent is to provide a more user-friendly interface for specific applications.
In addition to these model architectures, PhysicsNeMo provides diffusion preconditioners, which are essentially wrappers around model architectures that rescale the inputs and outputs of diffusion models to improve their performance.
Architecture Backbones#
Diffusion model backbones are highly configurable architectures that can be used
as a building block for more complex models. Backbones support
both conditional and unconditional modeling. Currently, there are two provided
backbones: the SongUNet, as implemented in the
SongUNet class and the DhariwalUNet,
as implemented in the DhariwalUNet
class. These models were introduced in the papers Score-Based Generative Modeling through Stochastic
Differential Equations (Song et al.) and
Diffusion Models Beat GANs on Image Synthesis (Dhariwal et al.).
The PhysicsNeMo implementation of these models closely follows the one used in the paper
Elucidating the Design Space of Diffusion-Based Generative Models (Karras et al.). The original implementation of these
models can be found in the EDM repository.
Model backbones can be used as is, as in the StormCast example, but they can also be used as a base class for more complex models.
One of the most common diffusion backbones for image generation is the
SongUNet
class. Its latent state \(\mathbf{x}\) is a tensor of shape \((B, C, H, W)\),
where \(B\) is the batch size, \(C\) is the number of channels,
and \(H\) and \(W\) are the height and width of the feature map. The
model is conditional on the noise level, and can additionally be conditioned on
vector-valued class labels and/or images. The model is organized into levels,
whose number is determined by len(channel_mult), and each level operates at half the resolution of the
previous level (odd resolutions are rounded down). Each level is composed of a sequence of UNet blocks, that optionally contain
self-attention layers, as controlled by the attn_resolutions parameter. The feature map resolution
is halved at the first block of each level and then remains constant within the level.
Here we start by creating a SongUNet model with 3 levels that applies self-attention
at levels 1 and 2. The model is unconditional, i.e. it is not conditioned on any
class labels or images (but is still conditional on the noise level, as is
standard practice for diffusion models).
import torch
from physicsnemo.models.diffusion import SongUNet
B, C_x, res = 3, 6, 40   # Batch size, channels, and resolution of the latent state
model = SongUNet(
    img_resolution=res,
    in_channels=C_x,
    out_channels=C_x,  # No conditioning on image: number of output channels is the same as the input channels
    label_dim=0,  # No conditioning on vector-valued class labels
    augment_dim=0,
    model_channels=64,
    channel_mult=[1, 2, 3],  # 3-levels UNet with 64, 128, and 192 channels at each level, respectively
    num_blocks=4,  # 4 UNet blocks at each level
    attn_resolutions=[20, 10],  # Attention is applied at level 1 (resolution 20x20) and level 2 (resolution 10x10)
)
x = torch.randn(B, C_x, res, res)  # Latent state
noise_labels = torch.randn(B)  # Noise level for each sample
# The feature map resolution is 40 at level 0, 20 at level 1, and 10 at level 2
out = model(x, noise_labels, None)
print(out.shape)  # Shape: (B, C_x, res, res), same as the latent state
# The same model can be used on images of different resolution
# Note: the attention is still applied at levels 1 and 2
x_32 = torch.randn(B, C_x, 32, 32)  # Lower resolution latent state
out_32 = model(x_32, noise_labels, None)  # None means no conditioning on class labels
print(out_32.shape)  # Shape: (B, C_x, 32, 32), same as the latent state
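The level bookkeeping described above (resolution halved per level, channels given by `model_channels * channel_mult[i]`, attention where the level resolution appears in `attn_resolutions`) can be tabulated with a few lines of plain Python. This is a reading aid, not the SongUNet code; the helper name `level_layout` is hypothetical, and it assumes that attention placement is decided from `img_resolution` at construction time, which is consistent with the comment that attention is still applied at levels 1 and 2 for the 32x32 input.

```python
def level_layout(img_resolution, model_channels, channel_mult, attn_resolutions):
    """List resolution, channel count, and attention flag for each UNet level."""
    layout = []
    for level, mult in enumerate(channel_mult):
        res = img_resolution >> level  # halve once per level, rounding down
        layout.append(
            {
                "level": level,
                "resolution": res,
                "channels": model_channels * mult,
                "attention": res in attn_resolutions,
            }
        )
    return layout


layout = level_layout(
    img_resolution=40, model_channels=64, channel_mult=[1, 2, 3], attn_resolutions=[20, 10]
)
for row in layout:
    print(row)
```

For the configuration used in the example this gives resolutions 40, 20, 10 with 64, 128, 192 channels, and attention enabled at levels 1 and 2 only.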
The unconditional SongUNet can be extended to be conditional on class labels and/or
images. Conditioning on images is performed by channel-wise concatenation of the image
to the latent state \(\mathbf{x}\) before passing it to the model. The model does not perform
conditioning on images internally, and this operation is left to the user. For
conditioning on class labels (or any vector-valued quantity whose dimension is label_dim),
the model internally generates embeddings for the class labels
and adds them to intermediate activations within the UNet blocks. Here we
extend the previous example to be conditional on a 16-dimensional vector-valued
class label and a 3-channel image.
import torch
from physicsnemo.models.diffusion import SongUNet
B, C_x, res = 3, 10, 40
C_cond = 3
model = SongUNet(
    img_resolution=res,
    in_channels=C_x + C_cond,  # Conditioning on an image with C_cond channels
    out_channels=C_x,  # Output channels: only those of the latent state
    label_dim=16,  # Conditioning on 16-dimensional vector-valued class labels
    augment_dim=0,
    model_channels=64,
    channel_mult=[1, 2, 2],
    num_blocks=4,
    attn_resolutions=[20, 10],
)
x = torch.randn(B, C_x, res, res)  # Latent state
cond = torch.randn(B, C_cond, res, res)  # Conditioning image
x_cond = torch.cat([x, cond], dim=1)  # Channel-wise concatenation of the conditioning image before passing to the model
noise_labels = torch.randn(B)
class_labels = torch.randn(B, 16)  # Conditioning on vector-valued class labels
out = model(x_cond, noise_labels, class_labels)
print(out.shape)  # Shape: (B, C_x, res, res), same as the latent state
Specialized Architectures#
Note that even though backbones can be used as is, some of the
PhysicsNeMo examples use specialized architectures. These specialized architectures
typically inherit from the backbones and implement additional functionalities for specific
applications. For example, the CorrDiff example
uses the specialized architectures SongUNetPosEmbd
and SongUNetPosLtEmbd to implement
the diffusion model.
Positional embeddings#
Multi-diffusion (also called patch-based diffusion) is a technique to scale
diffusion models to large domains. The idea is to split the full domain into
patches, and run a diffusion model on each patch in parallel. The generated
patches are then fused back to form the final image. This technique is
particularly useful for domains that are too large to fit into the memory of
a single GPU. The CorrDiff example
uses patch-based diffusion for weather downscaling on large domains. A key
ingredient in the implementation of patch-based diffusion is the use of a
global spatial grid that informs each patch of its
position in the full domain. The SongUNetPosEmbd
class implements this functionality by providing multiple methods to encode
global spatial coordinates of the pixels into a global positional embedding grid.
In addition to multi-diffusion, spatial positional embeddings have also been
observed to improve the quality of the generated images, even for diffusion models
that operate on the full domain.
The following example shows how to use the specialized architecture
SongUNetPosEmbd to implement a
multi-diffusion model. First, we create a SongUNetPosEmbd model similar to
the one in the conditional SongUNet example
with a global positional embedding grid of shape (C_PE, res, res). We
show that the model can be used with the entire latent state (full domain).
import torch
from physicsnemo.models.diffusion import SongUNetPosEmbd
B, C_x, res = 3, 10, 40
C_cond = 3
C_PE = 8  # Number of channels in the positional embedding grid
# Create a SongUNet with a global positional embedding grid of shape (C_PE, res, res)
model = SongUNetPosEmbd(
    img_resolution=res,  # Define the resolution of the global positional embedding grid
    in_channels=C_x + C_cond + C_PE,  # in_channels must include the number of channels in the positional embedding grid
    out_channels=C_x,
    label_dim=16,
    augment_dim=0,
    model_channels=64,
    channel_mult=[1, 2, 2],
    num_blocks=4,
    attn_resolutions=[20, 10],
    gridtype="learnable",  # Use a learnable grid of positional embeddings
    N_grid_channels=C_PE  # Number of channels in the positional embedding grid
)
# Can pass the entire latent state to the model
x_global = torch.randn(B, C_x, res, res)  # Entire latent state
cond = torch.randn(B, C_cond, res, res)  # Conditioning image
x_cond = torch.cat([x_global, cond], dim=1)  # Latent state with conditioning image
noise_labels = torch.randn(B)
class_labels = torch.randn(B, 16)
# The model internally concatenates the global positional embedding grid to the
# input x_cond before the first UNet block.
# Note: global_index=None means use the entire positional embedding grid
out = model(x_cond, noise_labels, class_labels, global_index=None)
print(out.shape)  # Shape: (B, C_x, res, res), same as the latent state
Now we show that the model can be used on local patches of the latent state
(multi-diffusion approach). We manually extract 3 patches from the latent
state. Patches are treated as individual samples, so they are concatenated along
the batch dimension. We also create a global grid of indices grid that
contains the indices of the pixels in the full domain, and we extract the same
3 patches from the global grid and pass them to the global_index
parameter. The model internally uses global_index to extract the corresponding
patches from the positional embedding grid and concatenate them to the input
x_cond_patches before the first UNet block. Note that conditional
multi-diffusion still requires each patch to be conditioned on the entire
conditioning image cond, which is why we interpolate the conditioning image
to the patch resolution and concatenate it to each individual patch.
In practice it is not necessary to manually extract the patches from the latent
state and the global grid, as PhysicsNeMo provides utilities to help with the
patching operations, in patching. For an example of how
to use these utilities, see the CorrDiff example.
# Can pass local patches to the model
# Create batch of 3 patches from `x_global` with resolution 16x16
pres = 16  # Patch resolution
p1 = x_global[0:1, :, :pres, :pres]  # Patch 1
p2 = x_global[2:3, :, pres:2*pres, pres:2*pres]  # Patch 2 (batch size is 3, so the last sample is index 2)
p3 = x_global[1:2, :, -pres:, pres:2*pres]  # Patch 3
patches = torch.cat([p1, p2, p3], dim=0)  # Batch of 3 patches
# Note: the conditioning image needs interpolation (or other operations) to
# match the patch resolution
cond1 = torch.nn.functional.interpolate(cond[0:1], size=(pres, pres), mode="bilinear")
cond2 = torch.nn.functional.interpolate(cond[2:3], size=(pres, pres), mode="bilinear")
cond3 = torch.nn.functional.interpolate(cond[1:2], size=(pres, pres), mode="bilinear")
cond_patches = torch.cat([cond1, cond2, cond3], dim=0)
# Concatenate the patches and the conditioning image
x_cond_patches = torch.cat([patches, cond_patches], dim=1)
# Create corresponding global indices for the patches
Ny, Nx = torch.arange(res).int(), torch.arange(res).int()
grid = torch.stack(torch.meshgrid(Ny, Nx, indexing="ij"), dim=0)
idx_patch1 = grid[:, :pres, :pres]  # Global indices for patch 1
idx_patch2 = grid[:, pres:2*pres, pres:2*pres]  # Global indices for patch 2
idx_patch3 = grid[:, -pres:, pres:2*pres]  # Global indices for patch 3
global_index = torch.stack([idx_patch1, idx_patch2, idx_patch3], dim=0)
# The model internally extracts the corresponding patches from the global
# positional embedding grid and concatenates them to the input x_cond_patches
# before the first UNet block.
out = model(x_cond_patches, noise_labels, class_labels, global_index=global_index)
print(out.shape)  # Shape: (3, C_x, pres, pres), same as the patches extracted from the latent state
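To see what the `global_index` tensor contains, the same indices can be built with plain Python lists. This sketch mirrors the three patches of the example (top-left corners at rows/columns (0, 0), (16, 16), and (24, 16)); the helper name `global_index_for_patch` is hypothetical and is not one of the PhysicsNeMo patching utilities.

```python
def global_index_for_patch(top, left, patch_res):
    """Return [rows, cols]: two patch_res x patch_res grids of full-domain pixel indices."""
    rows = [[top + i for _ in range(patch_res)] for i in range(patch_res)]
    cols = [[left + j for j in range(patch_res)] for _ in range(patch_res)]
    return [rows, cols]


res, pres = 40, 16
# (top, left) corner of each patch, matching the slices used in the example.
corners = [(0, 0), (pres, pres), (res - pres, pres)]
global_index = [global_index_for_patch(top, left, pres) for top, left in corners]

# Nested shape is (3 patches, 2 index channels, pres, pres).
print(len(global_index), len(global_index[0]), len(global_index[0][0]), len(global_index[0][0][0]))
print(global_index[2][0][0][0], global_index[2][1][0][0])  # first pixel of patch 3: row, column
```

Channel 0 holds row indices and channel 1 holds column indices, which matches stacking the outputs of a meshgrid with `indexing="ij"`.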
Lead-time aware models#
In many diffusion applications, the latent state is time-dependent, and the diffusion process should account for the time-dependence of the latent state. For instance, a forecast model could provide latent states \(\mathbf{x}(T)\) (current time), \(\mathbf{x}(T + \Delta t)\) (one time step forward), …, up to \(\mathbf{x}(T + K \Delta t)\) (K time steps forward). Such prediction horizons are called lead-times (a term adopted from the weather and climate forecasting community) and we want to apply diffusion to each of these latent states while accounting for their associated lead-time information.
PhysicsNeMo provides a specialized architecture
SongUNetPosLtEmbd that implements
lead-time aware models. This is an extension of the
SongUNetPosEmbd class, and
additionally supports lead-time information. In its forward pass, the model
uses the lead_time_label parameter to internally retrieve the associated
lead-time embeddings; it then conditions the diffusion process on those with a
channel-wise concatenation to the latent-state before the first UNet block.
Here we show an example extending the previous ones with lead-time information.
We assume that we have a batch of 3 latent states at times \(T + 2 \Delta t\)
(2 time intervals forward), \(T + 0 \Delta t\) (current time),
and \(T + \Delta t\) (1 time interval forward). The associated lead-time labels are
[2, 0, 1]. In addition, the SongUNetPosLtEmbd model has the ability to
predict probabilities for some channels of the latent state, specified by the
prob_channels parameter. Here we assume that channels 1 and 3 are
probability (i.e. classification) outputs, while other channels are regression
outputs.
import torch
from physicsnemo.models.diffusion import SongUNetPosLtEmbd
B, C_x, res = 3, 10, 40
C_cond = 3
C_PE = 8
lead_time_steps = 3  # Maximum supported lead-time is 2 * dt
C_LT = 6  # 6 channels for each lead-time embedding
# Create a SongUNet with a lead-time embedding grid of shape
# (lead_time_steps, C_LT, res, res)
model = SongUNetPosLtEmbd(
    img_resolution=res,
    in_channels=C_x + C_cond + C_PE + C_LT,  # in_channels must include the number of channels in lead-time grid
    out_channels=C_x,
    label_dim=16,
    augment_dim=0,
    model_channels=64,
    channel_mult=[1, 2, 2],
    num_blocks=4,
    attn_resolutions=[10, 5],
    gridtype="learnable",
    N_grid_channels=C_PE,
    lead_time_channels=C_LT,
    lead_time_steps=lead_time_steps,  # Maximum supported lead-time horizon
    prob_channels=[1, 3],  # Channels 1 and 3 from the latent state are probability outputs
)
x = torch.randn(B, C_x, res, res)  # Latent state at times T+2*dt, T+0*dt, and T + 1*dt
cond = torch.randn(B, C_cond, res, res)
x_cond = torch.cat([x, cond], dim=1)
noise_labels = torch.randn(B)
class_labels = torch.randn(B, 16)
lead_time_label = torch.tensor([2, 0, 1])  # Lead-time labels for each sample
# The model internally extracts the lead-time embeddings corresponding to the
# lead-time labels 2, 0, 1 and concatenates them to the input x_cond before the first
# UNet block. In training mode, the model outputs logits for channels 1 and 3.
out = model(x_cond, noise_labels, class_labels, lead_time_label=lead_time_label)
print(out.shape)  # Shape: (B, C_x, res, res), same as the latent state
# In eval mode, the model outputs probabilities for channels 1 and 3
model.eval()
out = model(x_cond, noise_labels, class_labels, lead_time_label=lead_time_label)
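The channel bookkeeping in the example above can be checked with plain arithmetic. The following is a minimal sketch; the helper name expected_in_channels is hypothetical and not part of PhysicsNeMo.

```python
def expected_in_channels(c_x, c_cond, c_pe, c_lt):
    """Channels seen by the first UNet block (hypothetical helper).

    The caller passes only the latent state and conditioning image
    (c_x + c_cond channels); the positional (c_pe) and lead-time (c_lt)
    embeddings are concatenated inside the forward pass.
    """
    return c_x + c_cond + c_pe + c_lt


# Values taken from the example above: C_x=10, C_cond=3, C_PE=8, C_LT=6
in_channels = expected_in_channels(c_x=10, c_cond=3, c_pe=8, c_lt=6)
print(in_channels)  # 27
```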
Note
The SongUNetPosLtEmbd is not an autoregressive model that performs a rollout
to produce future predictions. From the point of view of the SongUNetPosLtEmbd,
the lead-time information is frozen. The lead-time dependent latent state \(\mathbf{x}\)
might however be produced by such an autoregressive/rollout model.
Note
The SongUNetPosLtEmbd model cannot be scaled to very long lead-time
horizons (controlled by the lead_time_steps parameter). This is because
the lead-time embeddings are represented by a grid of learnable parameters of
shape (lead_time_steps, C_LT, res, res). For very long lead-time horizons, the
size of this grid of embeddings becomes prohibitively large.
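To make the scaling concrete, the number of learnable values in the lead-time grid is the product of its four dimensions. The sketch below is illustrative only: the helper name is hypothetical, the second configuration is an invented example, and the memory figure assumes float32 storage.

```python
def lead_time_grid_numel(lead_time_steps, c_lt, height, width):
    """Number of learnable values in a (lead_time_steps, C_LT, H, W) grid."""
    return lead_time_steps * c_lt * height * width


# Configuration from the example above: 3 steps, 6 channels, 40 x 40 grid
small = lead_time_grid_numel(3, 6, 40, 40)
print(small)  # 28800

# Hypothetical long-horizon configuration: 240 steps on a 1024 x 1024 grid
large = lead_time_grid_numel(240, 6, 1024, 1024)
bytes_fp32 = large * 4  # assuming 4 bytes per float32 value
print(large, bytes_fp32 / 2**30)  # 1509949440 values, 5.625 GiB
```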
Note
In a given input batch x, the associated lead-times are not necessarily
consecutive or in order. They do not even need to originate from the same forecast
trajectory. For example, the lead-time labels might be [0, 1, 2] instead of [2, 0, 1],
or even [2, 2, 1].
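Because the labels are indices into a grid with lead_time_steps entries, the only constraint on a batch of labels is that each one lies in the supported range. A minimal sketch of such a check follows; check_lead_time_labels is a hypothetical helper operating on plain Python integers, not a PhysicsNeMo function.

```python
def check_lead_time_labels(labels, lead_time_steps):
    """Verify that every label is an integer index in [0, lead_time_steps)."""
    for label in labels:
        if not isinstance(label, int) or not 0 <= label < lead_time_steps:
            raise ValueError(
                f"lead-time label {label!r} is outside [0, {lead_time_steps})"
            )
    return list(labels)


# Order and repetition do not matter, only the range does
check_lead_time_labels([2, 0, 1], lead_time_steps=3)
check_lead_time_labels([2, 2, 1], lead_time_steps=3)
```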
Application-specific Interfaces#
Application-specific interfaces are not true architectures, but rather wrappers
around the model backbones or specialized architectures that provide a more
user-friendly interface for specific applications. Note that not all of these
classes are true diffusion models; some are instead used in conjunction with
diffusion models. For instance, the CorrDiff example uses the UNet
class to implement a regression model.
- class physicsnemo.models.diffusion.song_unet.SongUNet(*args, **kwargs)[source]#
- Bases: Module
This architecture is a diffusion backbone for 2D image generation. It is a reimplementation of the DDPM++ and NCSN++ architectures, which are U-Net variants with optional self-attention, embeddings, and encoder-decoder components.
This model supports conditional and unconditional setups, as well as several options for various internal architectural choices such as encoder and decoder type, embedding type, etc., making it flexible and adaptable to different tasks and configurations.
This architecture supports conditioning on the noise level (called noise labels), as well as on additional vector-valued labels (called class labels) and (optional) vector-valued augmentation labels. The conditioning mechanism relies on addition of the conditioning embeddings in the U-Net blocks of the encoder. To condition on images, the simplest mechanism is to concatenate the image to the input before passing it to the SongUNet.
The model first applies a mapping operation to generate embeddings for all the conditioning inputs (the noise level, the class labels, and the optional augmentation labels).
Then, at each level in the U-Net encoder, a sequence of blocks is applied:
- A first block downsamples the feature map resolution by a factor of 2 (odd resolutions are floored). This block does not change the number of channels.
- A sequence of num_blocks U-Net blocks is applied, each with a different number of channels. These blocks do not change the feature map resolution, but they multiply the number of channels by a factor specified in channel_mult. If required, the U-Net blocks also apply self-attention at the specified resolutions.
- At the end of the level, the feature map is cached to be used in a skip connection in the decoder.
The decoder is a mirror of the encoder, with the same number of levels and the same number of blocks per level. It multiplies the feature map resolution by a factor of 2 at each level.
- Parameters:
- img_resolution (Union[List[int, int], int]) – The resolution of the input/output image. Can be a single int \(H\) for square images or a list \([H, W]\) for rectangular images. Note: This parameter is only used as a convenience to build the network. In practice, the model can still be used with images of different resolutions. The only exception to this rule is when additive_pos_embed is True, in which case the resolution of the latent state \(\mathbf{x}\) must match img_resolution.
- in_channels (int) – Number of channels \(C_{in}\) in the input image. May include channels from both the latent state and additional channels when conditioning on images. For an unconditional model, this should be equal to out_channels.
- out_channels (int) – Number of channels \(C_{out}\) in the output image. Should be equal to the number of channels \(C_{\mathbf{x}}\) in the latent state.
- label_dim (int, optional, default=0) – Dimension of the vector-valued class_labels conditioning; 0 indicates no conditioning on class labels.
- augment_dim (int, optional, default=0) – Dimension of the vector-valued augment_labels conditioning; 0 means no conditioning on augmentation labels.
- model_channels (int, optional, default=128) – Base multiplier for the number of channels across the entire network.
- channel_mult (List[int], optional, default=[1, 2, 2, 2]) – Multipliers for the number of channels at every level in the encoder and decoder. The length of channel_mult determines the number of levels in the U-Net. At level i, the number of channels in the feature map is channel_mult[i] * model_channels.
- channel_mult_emb (int, optional, default=4) – Multiplier for the number of channels in the embedding vector. The embedding vector has model_channels * channel_mult_emb channels.
- num_blocks (int, optional, default=4) – Number of U-Net blocks at each level. 
- attn_resolutions (List[int], optional, default=[16]) – Resolutions of the levels at which self-attention layers are applied. Note that the feature map resolution must match exactly the value provided in attn_resolutions for the self-attention layers to be applied. 
- dropout (float, optional, default=0.10) – Dropout probability applied to intermediate activations within the U-Net blocks. 
- label_dropout (float, optional, default=0.0) – Dropout probability applied to the class_labels. Typically used for classifier-free guidance. 
- embedding_type (Literal["fourier", "positional", "zero"], optional, default="positional") – Diffusion timestep embedding type: ‘positional’ for DDPM++, ‘fourier’ for NCSN++, ‘zero’ for none. 
- channel_mult_noise (int, optional, default=1) – Multiplier for the number of channels in the noise level embedding. The noise level embedding vector has model_channels * channel_mult_noise channels.
- encoder_type (Literal["standard", "skip", "residual"], optional, default="standard") – Encoder architecture: ‘standard’ for DDPM++, ‘residual’ for NCSN++, ‘skip’ for skip connections. 
- decoder_type (Literal["standard", "skip"], optional, default="standard") – Decoder architecture: ‘standard’ or ‘skip’ for skip connections. 
- resample_filter (List[int], optional, default=[1, 1]) – Resampling filter coefficients applied in the U-Net blocks convolutions: [1,1] for DDPM++, [1,3,3,1] for NCSN++. 
- checkpoint_level (int, optional, default=0) – Number of levels that should use gradient checkpointing. Only levels at which the feature map resolution is large enough will be checkpointed (0 disables checkpointing, higher values mean more layers are checkpointed). Higher values reduce memory usage at the cost of additional computation.
- additive_pos_embed (bool, optional, default=False) – If True, adds a learnable positional embedding after the first convolution layer. Used in the StormCast model. Note: Those positional embeddings encode spatial position information of the image pixels, unlike the embedding_type parameter which encodes temporal information about the diffusion process. In that sense it is a simpler version of the positional embedding used in SongUNetPosEmbd.
- use_apex_gn (bool, optional, default=False) – A flag indicating whether to use Apex GroupNorm for NHWC layout. Apex needs to be installed for this to work. Must be set to False on CPU.
- act (str, optional, default=None) – The activation function to use when fusing activation with GroupNorm. Required when use_apex_gn is True.
- profile_mode (bool, optional, default=False) – A flag indicating whether to enable all nvtx annotations during profiling. 
- amp_mode (bool, optional, default=False) – A flag indicating whether mixed-precision (AMP) training is enabled. 
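The relationship between model_channels, channel_mult and channel_mult_emb described above can be sketched with a few lines of arithmetic. This is only an illustration of the formulas stated in the parameter list; unet_channel_plan is a hypothetical helper and does not inspect an actual model.

```python
def unet_channel_plan(model_channels=128, channel_mult=(1, 2, 2, 2), channel_mult_emb=4):
    """Return (channels per level, embedding channels) from the documented formulas."""
    # At level i the feature map has channel_mult[i] * model_channels channels
    per_level = [mult * model_channels for mult in channel_mult]
    # The embedding vector has model_channels * channel_mult_emb channels
    emb_channels = model_channels * channel_mult_emb
    return per_level, emb_channels


per_level, emb_channels = unet_channel_plan()
print(per_level, emb_channels)  # [128, 256, 256, 256] 512
```

The number of levels \(N\) in the U-Net is simply len(channel_mult), which is 4 with the defaults.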
 
 - Forward#
- x (torch.Tensor) – The input image of shape \((B, C_{in}, H_{in}, W_{in})\). In general x is the channel-wise concatenation of the latent state \(\mathbf{x}\) and additional images used for conditioning. For an unconditional model, x is simply the latent state \(\mathbf{x}\). Note: \(H_{in}\) and \(W_{in}\) do not need to match \(H\) and \(W\) defined in img_resolution, except when additive_pos_embed is True. In that case, the resolution of x must match img_resolution.
- noise_labels (torch.Tensor) – The noise labels of shape \((B,)\). Used for conditioning on the diffusion noise level.
- class_labels (torch.Tensor) – The class labels of shape \((B, \text{label_dim})\). Used for conditioning on any vector-valued quantity. Can pass None when label_dim is 0.
- augment_labels (torch.Tensor, optional, default=None) – The augmentation labels of shape \((B, \text{augment_dim})\). Used for conditioning on any additional vector-valued quantity. Can pass None when augment_dim is 0.
 - Outputs#
- torch.Tensor – The denoised latent state of shape \((B, C_{out}, H_{in}, W_{in})\).
 - Important - The terms noise levels (or noise labels) are used to refer to the diffusion time-step, as these are conceptually equivalent. 
- The terms labels and classes originate from the original paper and EDM repository, where this architecture was used for class-conditional image generation. While these terms suggest class-based conditioning, the architecture can actually be conditioned on any vector-valued conditioning. 
- The term positional embedding used in the embedding_type parameter also comes from the original paper and EDM repository. Here, positional refers to the diffusion time-step, similar to how position is used in transformer architectures. Despite the name, these embeddings encode temporal information about the diffusion process rather than spatial position information. 
- Limitations on input image resolution: for a model that has \(N\) levels, the latent state \(\mathbf{x}\) must have resolution that is a multiple of \(2^N\) in each dimension. This is due to a limitation in the decoder that does not support shape mismatch in the residual connections from the encoder to the decoder. For images that do not match this requirement, it is recommended to interpolate your data on a grid of the required resolution beforehand. 
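The resolution constraint in the last point can be turned into a small helper that returns the smallest admissible size at or above a given one. This is a sketch under the stated rule only (multiple of \(2^N\) for \(N\) levels); next_valid_resolution is a hypothetical name, and the interpolation step itself is left to the user.

```python
def next_valid_resolution(size, num_levels):
    """Smallest multiple of 2**num_levels that is >= size."""
    factor = 2 ** num_levels
    # Ceiling division written with integer arithmetic only
    return -(-size // factor) * factor


# 3 levels (e.g. channel_mult=[1, 2, 2]) require multiples of 8
print(next_valid_resolution(40, num_levels=3))   # 40, already valid
# 4 levels (the default channel_mult) require multiples of 16
print(next_valid_resolution(100, num_levels=4))  # 112
```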
 - Example
>>> import torch
>>> from physicsnemo.models.diffusion.song_unet import SongUNet
>>> model = SongUNet(img_resolution=16, in_channels=2, out_channels=2)
>>> noise_labels = torch.randn([1])
>>> class_labels = torch.randint(0, 1, (1, 1))
>>> input_image = torch.ones([1, 2, 16, 16])
>>> output_image = model(input_image, noise_labels, class_labels)
>>> output_image.shape
torch.Size([1, 2, 16, 16])
 - property amp_mode#
- Should be set to True to enable automatic mixed precision.
 - property profile_mode#
- Should be set to True to enable profiling.
 
- class physicsnemo.models.diffusion.dhariwal_unet.DhariwalUNet(*args, **kwargs)[source]#
- Bases: Module
This architecture is a diffusion backbone for 2D image generation. It reimplements the ADM architecture, a U-Net variant, with optional self-attention.
It is highly similar to the U-Net backbone defined in SongUNet, and only differs in a few aspects:
- The embedding conditioning mechanism relies on adaptive scaling of the group normalization layers within the U-Net blocks.
- The parameter initialization follows Kaiming uniform initialization.
- Parameters:
- img_resolution (int) – The resolution \(H = W\) of the input/output image. Assumes square images. Note: This parameter is only used as a convenience to build the network. In practice, the model can still be used with images of different resolutions.
- in_channels (int) – Number of channels \(C_{in}\) in the input image. May include channels from both the latent state \(\mathbf{x}\) and additional channels when conditioning on images. For an unconditional model, this should be equal to out_channels.
- out_channels (int) – Number of channels \(C_{out}\) in the output image. Should be equal to the number of channels \(C_{\mathbf{x}}\) in the latent state.
- label_dim (int, optional, default=0) – Dimension of the vector-valued class_labels conditioning; 0 indicates no conditioning on class labels.
- augment_dim (int, optional, default=0) – Dimension of the vector-valued augment_labels conditioning; 0 means no conditioning on augmentation labels.
- model_channels (int, optional, default=128) – Base multiplier for the number of channels across the entire network.
- channel_mult (List[int], optional, default=[1,2,2,2]) – Multipliers for the number of channels at every level in the encoder and decoder. The length of channel_mult determines the number of levels in the U-Net. At level i, the number of channels in the feature map is channel_mult[i] * model_channels.
- channel_mult_emb (int, optional, default=4) – Multiplier for the number of channels in the embedding vector. The embedding vector has model_channels * channel_mult_emb channels.
- num_blocks (int, optional, default=3) – Number of U-Net blocks at each level.
- attn_resolutions (List[int], optional, default=[16]) – Resolutions of the levels at which self-attention layers are applied. Note that the feature map resolution must match exactly the value provided in attn_resolutions for the self-attention layers to be applied.
- dropout (float, optional, default=0.10) – Dropout probability applied to intermediate activations within the U-Net blocks.
- label_dropout (float, optional, default=0.0) – Dropout probability applied to the class_labels. Typically used for classifier-free guidance.
 
 - Forward#
- x (torch.Tensor) – The input tensor of shape \((B, C_{in}, H_{in}, W_{in})\). In general x is the channel-wise concatenation of the latent state \(\mathbf{x}\) and additional images used for conditioning. For an unconditional model, x is simply the latent state \(\mathbf{x}\).
- noise_labels (torch.Tensor) – The noise labels of shape \((B,)\). Used for conditioning on the noise level.
- class_labels (torch.Tensor) – The class labels of shape \((B, \text{label_dim})\). Used for conditioning on any vector-valued quantity. Can pass None when label_dim is 0.
- augment_labels (torch.Tensor, optional, default=None) – The augmentation labels of shape \((B, \text{augment_dim})\). Used for conditioning on any additional vector-valued quantity. Can pass None when augment_dim is 0.
 - Outputs#
- torch.Tensor – The denoised latent state of shape \((B, C_{out}, H_{in}, W_{in})\).
 - Examples
>>> import torch
>>> from physicsnemo.models.diffusion.dhariwal_unet import DhariwalUNet
>>> model = DhariwalUNet(img_resolution=16, in_channels=2, out_channels=2)
>>> noise_labels = torch.randn([1])
>>> class_labels = torch.randint(0, 1, (1, 1))
>>> input_image = torch.ones([1, 2, 16, 16])
>>> output_image = model(input_image, noise_labels, class_labels)
 - property amp_mode#
- Should be set to True to enable automatic mixed precision.
 - property profile_mode#
- Should be set to True to enable profiling.
 
- class physicsnemo.models.diffusion.song_unet.SongUNetPosEmbd(*args, **kwargs)[source]#
- Bases: SongUNet
This specialized architecture extends SongUNet with positional embeddings that encode global spatial coordinates of the pixels.
This model supports the same types of conditioning as the base SongUNet, and can additionally be conditioned on the positional embeddings. Conditioning on the positional embeddings is performed with a channel-wise concatenation to the input image before the first layer of the U-Net. Multiple types of positional embeddings are supported. Positional embeddings are represented by a 2D grid of shape \((C_{PE}, H, W)\), where \(H\) and \(W\) correspond to the img_resolution parameter.
The following types of positional embeddings are supported:
- learnable: uses a 2D grid of learnable parameters.
- linear: uses a 2D rectilinear grid over the domain \([-1, 1] \times [-1, 1]\).
- sinusoidal: uses sinusoidal functions of the spatial coordinates, with possibly multiple frequency bands.
- test: uses a 2D grid of integer indices, only used for testing.
When the input image spatial resolution is smaller than the global positional embeddings, it is necessary to select a subset (or patch) of the embedding grid that corresponds to the spatial locations of the input image pixels. The model provides two methods for selecting the subset of positional embeddings:
- Using a selector function. See positional_embedding_selector() for details.
- Using global indices. See positional_embedding_indexing() for details.
If neither of these is provided, the entire grid of positional embeddings is used and channel-wise concatenated to the input image.
Most parameters are the same as in the parent class SongUNet. Only the ones that differ are listed below.
- Parameters:
- img_resolution (Union[List[int, int], int]) – The resolution of the input/output image. Can be a single int for square images or a list \([H, W]\) for rectangular images. Used to set the resolution of the positional embedding grid. It must correspond to the spatial resolution of the global domain/image. 
- in_channels (int) – Number of channels \(C_{in} + C_{PE}\), where \(C_{in}\) is the number of channels in the image passed to the U-Net and \(C_{PE}\) is the number of channels in the positional embedding grid. Important: in comparison to the base SongUNet, this parameter should also include the number of channels in the positional embedding grid \(C_{PE}\).
- gridtype (Literal["sinusoidal", "learnable", "linear", "test"], optional, default="sinusoidal") – Type of positional embedding to use. Controls how spatial pixels locations are encoded. 
- N_grid_channels (int, optional, default=4) – Number of channels \(C_{PE}\) in the positional embedding grid. For ‘sinusoidal’ must be 4 or multiple of 4. For ‘linear’ and ‘test’ must be 2. For ‘learnable’ can be any value. 
- lead_time_mode (bool, optional, default=False) – Provided for convenience. It is recommended to use the architecture SongUNetPosLtEmbd for a lead-time aware model.
- lead_time_channels (int, optional, default=None) – Provided for convenience. Refer to SongUNetPosLtEmbd.
- lead_time_steps (int, optional, default=9) – Provided for convenience. Refer to SongUNetPosLtEmbd.
- prob_channels (List[int], optional, default=[]) – Provided for convenience. Refer to SongUNetPosLtEmbd.
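The constraints that gridtype places on N_grid_channels can be summarized as a small validation routine. This is a sketch of the rules listed above rather than the library's own check; check_grid_channels is a hypothetical helper, and treating zero channels as invalid for 'sinusoidal' is an assumption.

```python
def check_grid_channels(gridtype, n_grid_channels):
    """Return True if n_grid_channels is admissible for the given gridtype."""
    if gridtype == "sinusoidal":
        # Must be 4 or a multiple of 4 (zero is assumed to be invalid)
        return n_grid_channels > 0 and n_grid_channels % 4 == 0
    if gridtype in ("linear", "test"):
        # One channel per spatial coordinate
        return n_grid_channels == 2
    if gridtype == "learnable":
        # Any number of channels is accepted
        return True
    raise ValueError(f"Unknown gridtype: {gridtype!r}")


print(check_grid_channels("sinusoidal", 8))  # True
print(check_grid_channels("linear", 4))      # False
print(check_grid_channels("learnable", 8))   # True
```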
 
 - Forward#
- x (torch.Tensor) – The input image of shape \((B, C_{in}, H_{in}, W_{in})\), where \(H_{in}\) and \(W_{in}\) are the spatial dimensions of the input image (does not need to be the full image). In general x is the channel-wise concatenation of the latent state \(\mathbf{x}\) and additional images used for conditioning. For an unconditional model, x is simply the latent state \(\mathbf{x}\). Note: \(H_{in}\) and \(W_{in}\) do not need to match the img_resolution parameter, except when additive_pos_embed is True. In all other cases, the resolution of x must be smaller than or equal to img_resolution.
- noise_labels (torch.Tensor) – The noise labels of shape \((B,)\). Used for conditioning on the diffusion noise level.
- class_labels (torch.Tensor) – The class labels of shape \((B, \text{label_dim})\). Used for conditioning on any vector-valued quantity. Can pass None when label_dim is 0.
- global_index (torch.Tensor, optional, default=None) – The global indices of the positional embeddings to use. If neither global_index nor embedding_selector is provided, the entire positional embedding grid of shape \((C_{PE}, H, W)\) is used. In this case x must have the same spatial resolution as the positional embedding grid. See positional_embedding_indexing() for details.
- embedding_selector (Callable, optional, default=None) – A function that selects the positional embeddings to use. See positional_embedding_selector() for details.
- augment_labels (torch.Tensor, optional, default=None) – The augmentation labels of shape \((B, \text{augment_dim})\). Used for conditioning on any additional vector-valued quantity. Can pass None when augment_dim is 0.
 - Outputs#
- torch.Tensor – The output tensor of shape \((B, C_{out}, H_{in}, W_{in})\).
 - Important
Unlike positional embeddings defined by embedding_type in the parent class SongUNet that encode the diffusion time-step (or noise level), the positional embeddings in this specialized architecture encode global spatial coordinates of the pixels.
 - Examples
>>> import torch
>>> from physicsnemo.models.diffusion.song_unet import SongUNetPosEmbd
>>> from physicsnemo.utils.patching import GridPatching2D
>>>
>>> # Model initialization - in_channels must include both original input channels (2)
>>> # and the positional embedding channels (N_grid_channels=4 by default)
>>> model = SongUNetPosEmbd(img_resolution=16, in_channels=2+4, out_channels=2)
>>> noise_labels = torch.randn([1])
>>> class_labels = torch.randint(0, 1, (1, 1))
>>> # The input has only the original 2 channels - positional embeddings are
>>> # added automatically inside the forward method
>>> input_image = torch.ones([1, 2, 16, 16])
>>> output_image = model(input_image, noise_labels, class_labels)
>>> output_image.shape
torch.Size([1, 2, 16, 16])
>>>
>>> # Using a global index to select all positional embeddings
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(16, 16))
>>> global_index = patching.global_index(batch_size=1)
>>> output_image = model(
...     input_image, noise_labels, class_labels,
...     global_index=global_index
... )
>>> output_image.shape
torch.Size([1, 2, 16, 16])
>>>
>>> # Using a custom embedding selector to select all positional embeddings
>>> def patch_embedding_selector(emb):
...     return patching.apply(emb[None].expand(1, -1, -1, -1))
>>> output_image = model(
...     input_image, noise_labels, class_labels,
...     embedding_selector=patch_embedding_selector
... )
>>> output_image.shape
torch.Size([1, 2, 16, 16])
 - property amp_mode#
- Should be set to True to enable automatic mixed precision.
 - positional_embedding_indexing(x: Tensor, global_index: Tensor | None = None, lead_time_label=None)
Select positional embeddings using global indices.
This method uses global indices to select specific subsets of the positional embedding grid (called patches). If no indices are provided, the entire positional embedding grid is returned.
- Parameters:
- x (torch.Tensor) – Input tensor of shape \((P \times B, C, H_{in}, W_{in})\). Only used to determine batch size \(B\) and device.
- global_index (Optional[torch.Tensor], default=None) – Tensor of shape \((P, 2, H_{in}, W_{in})\) that corresponds to the patches to extract from the positional embedding grid. \(P\) is the number of distinct patches in the input tensor x. The channel dimension should contain the \(j\), \(i\) indices of the pixels to extract from the embedding grid.
 
- Returns:
- Selected positional embeddings with shape \((P \times B, C_{PE}, H_{in}, W_{in})\) (same spatial resolution as global_index) if global_index is provided. If global_index is None, the entire positional embedding grid is duplicated \(B\) times and returned with shape \((B, C_{PE}, H, W)\).
- Return type:
- torch.Tensor
 - Example
>>> # Create global indices using patching utility:
>>> from physicsnemo.utils.patching import GridPatching2D
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(8, 8))
>>> global_index = patching.global_index(batch_size=3)
>>> print(global_index.shape)
torch.Size([4, 2, 8, 8])
 - Notes
- This method is typically used in patch-based diffusion (or multi-diffusion), where a large input image is split into multiple patches. The batch dimension of the input tensor contains the patches. Patches are processed independently by the model, and the global_index parameter is used to select the grid of positional embeddings corresponding to each patch.
- See this method from physicsnemo.utils.patching.BasePatching2D for generating the global_index parameter: global_index().
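To illustrate what a tensor of shape \((P, 2, H_{in}, W_{in})\) holds, the sketch below builds the same structure with nested Python lists for non-overlapping tiles. It is not the GridPatching2D implementation: grid_patch_indices is a hypothetical helper, and both the patch ordering (row-major over tiles) and the channel ordering (row index first, column index second) are assumptions made for illustration.

```python
def grid_patch_indices(img_shape, patch_shape):
    """Per-patch pixel indices as nested lists of shape (P, 2, ph, pw)."""
    height, width = img_shape
    ph, pw = patch_shape
    patches = []
    for top in range(0, height, ph):
        for left in range(0, width, pw):
            # Channel 0 (assumed): global row index of every pixel in the patch
            rows = [[top + r for _ in range(pw)] for r in range(ph)]
            # Channel 1 (assumed): global column index of every pixel in the patch
            cols = [[left + c for c in range(pw)] for _ in range(ph)]
            patches.append([rows, cols])
    return patches


idx = grid_patch_indices(img_shape=(16, 16), patch_shape=(8, 8))
shape = (len(idx), len(idx[0]), len(idx[0][0]), len(idx[0][0][0]))
print(shape)  # (4, 2, 8, 8)
```

Each patch therefore carries the global coordinates of its own pixels, which is what allows the model to look up the matching portion of the embedding grid.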
 
 - positional_embedding_selector(x: Tensor, embedding_selector: Callable[[Tensor], Tensor], lead_time_label=None)
Select positional embeddings using a selector function.
Similar to positional_embedding_indexing(), but instead uses a selector function to select the embeddings.
- Parameters:
- x (torch.Tensor) – Input tensor of shape \((P \times B, C, H_{in}, W_{in})\). Only used to determine batch size \(B\), dtype and device.
- embedding_selector (Callable) – Function that takes as input the entire embedding grid of shape \((C_{PE}, H, W)\) and returns selected embeddings with shape \((P \times B, C_{PE}, H_{in}, W_{in})\). Each selected embedding should correspond to the portion of the embedding grid that corresponds to the batch element in x. Typically this should be based on the physicsnemo.utils.patching.BasePatching2D.apply() method to maintain consistency with patch extraction.
- lead_time_label (Optional[torch.Tensor], default=None) – Tensor of shape \((P,)\) that corresponds to the lead-time label for each patch. Only used if lead_time_mode is True.
 
- Returns:
- A tensor of shape \((P \times B, C_{PE} [+ C_{LT}], H_{in}, W_{in})\). \(C_{PE}\) is the number of embedding channels in the positional embedding grid, and \(C_{LT}\) is the number of embedding channels in the lead-time embedding grid. If lead_time_label is provided, the lead-time embedding channels are included.
- Return type:
- torch.Tensor
 - Notes
- This method is typically used in patch-based diffusion (or multi-diffusion), where a large input image is split into multiple patches. The batch dimension of the input tensor contains the patches. Patches are processed independently by the model, and the embedding_selector function is used to select the grid of positional embeddings corresponding to each patch.
- See this method from physicsnemo.utils.patching.BasePatching2D for generating the embedding_selector parameter: apply()
 - Example
>>> # Define a selector function with a patching utility:
>>> from physicsnemo.utils.patching import GridPatching2D
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(8, 8))
>>> batch_size = 4
>>> def embedding_selector(emb):
...     return patching.apply(emb[None].expand(batch_size, -1, -1, -1))
 - property profile_mode#
- Should be set to True to enable profiling.
 
- class physicsnemo.models.diffusion.song_unet.SongUNetPosLtEmbd(*args, **kwargs)[source]#
- Bases: SongUNetPosEmbd
This specialized architecture extends SongUNetPosEmbd with two additional capabilities:
- The model can be conditioned on lead-time labels. These labels encode physical time information, such as a forecasting horizon.
- Similarly to the parent SongUNetPosEmbd, this model predicts regression targets, but it can also produce classification predictions. More precisely, some of the output channels are probability outputs that are passed through a softmax activation function. This is useful for multi-task applications, where the objective is a combination of both regression and classification losses.
The mechanism to condition on lead-time labels is implemented by:
- First generating a grid of learnable lead-time embeddings of shape \((\text{lead_time_steps}, C_{LT}, H, W)\). The spatial resolution of the lead-time embeddings is the same as the input/output image.
- Then, given an input x, selecting the lead-time embeddings that correspond to the lead-times associated with the samples in the input x.
- Finally, concatenating channel-wise the selected lead-time embeddings and positional embeddings to the input x and passing them to the U-Net network.
Most parameters are similar to the parent SongUNetPosEmbd, with the exception of the ones listed below.
- Parameters:
- in_channels (int) – Number of channels \(C_{in} + C_{PE} + C_{LT}\) in the image passed to the U-Net. Important: in comparison to the base SongUNet, this parameter should also include the number of channels in the positional embedding grid \(C_{PE}\) and the number of channels in the lead-time embedding grid \(C_{LT}\).
- lead_time_channels (int, optional, default=None) – Number of channels \(C_{LT}\) in the lead time embedding. These are learned embeddings that encode physical time information.
- lead_time_steps (int, optional, default=9) – Number of discrete lead time steps to support. Each step gets its own learned embedding of shape \((C_{LT}, H, W)\).
- prob_channels (List[int], optional, default=[]) – Indices of channels that are probability outputs (or classification predictions). In training mode, the model outputs logits for these probability channels, and in eval mode, the model applies a softmax to output the probabilities.
 - Forward#
- x (torch.Tensor) – The input image of shape \((B, C_{in}, H_{in}, W_{in})\), where \(H_{in}\) and \(W_{in}\) are the spatial dimensions of the input image (does not need to be the full image).
- noise_labels (torch.Tensor) – The noise labels of shape \((B,)\). Used for conditioning on the diffusion noise level.
- class_labels (torch.Tensor) – The class labels of shape \((B, \text{label_dim})\). Used for conditioning on any vector-valued quantity. Can pass None when label_dim is 0.
- global_index (torch.Tensor, optional, default=None) – The global indices of the positional embeddings to use. See positional_embedding_indexing() for details. If neither global_index nor embedding_selector is provided, the entire positional embedding grid is used.
- embedding_selector (Callable, optional, default=None) – A function that selects the positional embeddings to use. See positional_embedding_selector() for details.
- augment_labels (torch.Tensor, optional, default=None) – The augmentation labels of shape \((B, \text{augment_dim})\). Used for conditioning on any additional vector-valued quantity.
- lead_time_label (torch.Tensor, optional, default=None) – The lead-time labels of shape \((B,)\). Used for selecting lead-time embeddings. It should contain the indices of the lead-time embeddings that correspond to the lead-time of each sample in the batch.
 - Outputs#
- torch.Tensor – The output tensor of shape \((B, C_{out}, H_{in}, W_{in})\). 
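The training/eval behavior of prob_channels can be illustrated on a single pixel with plain Python. This is a sketch only: apply_prob_channels is a hypothetical helper, and it assumes that the softmax is taken jointly across the listed probability channels at each pixel, which the description above does not state explicitly.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_prob_channels(pixel, prob_channels, training):
    """Return per-channel outputs for one pixel.

    In training mode the values are left untouched (logits); in eval mode
    the channels listed in prob_channels are replaced by probabilities.
    """
    out = list(pixel)
    if not training and prob_channels:
        probs = softmax([pixel[c] for c in prob_channels])
        for channel, prob in zip(prob_channels, probs):
            out[channel] = prob
    return out


pixel = [0.5, 2.0, -1.0, 2.0]  # raw outputs for channels 0..3
train_out = apply_prob_channels(pixel, prob_channels=[1, 3], training=True)
eval_out = apply_prob_channels(pixel, prob_channels=[1, 3], training=False)
print(train_out)  # [0.5, 2.0, -1.0, 2.0]
print(eval_out)   # [0.5, 0.5, -1.0, 0.5]
```

Regression channels (0 and 2 here) are unaffected in both modes.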
 
 - Notes - The lead-time embeddings differ from the diffusion time embeddings used in - SongUNetclass, as they do not encode diffusion time-step but physical forecast time.
 - Example - >>> import torch >>> from physicsnemo.models.diffusion.song_unet import SongUNetPosLtEmbd >>> from physicsnemo.utils.patching import GridPatching2D >>> >>> # Model initialization - in_channels must include original input channels (2), >>> # positional embedding channels (N_grid_channels=4 by default) and >>> # lead time embedding channels (4) >>> model = SongUNetPosLtEmbd( ... img_resolution=16, in_channels=2+4+4, out_channels=2, ... lead_time_channels=4, lead_time_steps=9 ... ) >>> noise_labels = torch.randn([1]) >>> class_labels = torch.randint(0, 1, (1, 1)) >>> # The input has only the original 2 channels - positional embeddings and >>> # lead time embeddings are added automatically inside the forward method >>> input_image = torch.ones([1, 2, 16, 16]) >>> lead_time_label = torch.tensor([3]) >>> output_image = model( ... input_image, noise_labels, class_labels, ... lead_time_label=lead_time_label ... ) >>> output_image.shape torch.Size([1, 2, 16, 16]) >>> >>> # Using global_index to select all the positional and lead time embeddings >>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(16, 16)) >>> global_index = patching.global_index(batch_size=1) >>> output_image = model( ... input_image, noise_labels, class_labels, ... lead_time_label=lead_time_label, ... global_index=global_index ... ) >>> output_image.shape torch.Size([1, 2, 16, 16]) - property amp_mode#
- Should be set to True to enable automatic mixed precision.
 - positional_embedding_indexing(
- x: Tensor,
- global_index: Tensor | None = None,
- lead_time_label=None,
- Select positional embeddings using global indices. - This method uses global indices to select specific subsets (called patches) of the positional embedding grid. If no indices are provided, the entire positional embedding grid is returned. - Parameters:
- x (torch.Tensor) – Input tensor of shape \((P \times B, C, H_{in}, W_{in})\). Only used to determine batch size \(B\) and device. 
- global_index (Optional[torch.Tensor], default=None) – Tensor of shape \((P, 2, H_{in}, W_{in})\) that corresponds to the patches to extract from the positional embedding grid. \(P\) is the number of distinct patches in the input tensor x. The channel dimension should contain the \(j\), \(i\) indices of the pixels to extract from the embedding grid.
 
- Returns:
- Selected positional embeddings with shape \((P \times B, C_{PE}, H_{in}, W_{in})\) (same spatial resolution as global_index) if global_index is provided. If global_index is None, the entire positional embedding grid is duplicated \(B\) times and returned with shape \((B, C_{PE}, H, W)\).
- Return type:
- torch.Tensor 
 - Example
>>> # Create global indices using patching utility:
>>> from physicsnemo.utils.patching import GridPatching2D
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(8, 8))
>>> global_index = patching.global_index(batch_size=3)
>>> print(global_index.shape)
torch.Size([4, 2, 8, 8])
 - Notes - This method is typically used in patch-based diffusion (or multi-diffusion), where a large input image is split into multiple patches. The batch dimension of the input tensor contains the patches. Patches are processed independently by the model, and the global_index parameter is used to select the grid of positional embeddings corresponding to each patch.
- To generate the global_index parameter, see the global_index() method of physicsnemo.utils.patching.BasePatching2D.
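The indexing described above can be pictured with a small pure-Python sketch. It is only an illustration, not the library implementation: the helper names `grid_patch_indices` and `select_patch` are hypothetical, and the sketch assumes non-overlapping patches, image dimensions divisible by the patch size, row-major patch ordering, and that the first index is the row and the second the column.

```python
def grid_patch_indices(img_shape, patch_shape):
    """Return one (row_indices, col_indices) pair of 2D grids per patch.

    Hypothetical helper: assumes non-overlapping patches that tile the image
    exactly, enumerated in row-major order.
    """
    height, width = img_shape
    patch_h, patch_w = patch_shape
    patches = []
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            rows = [[top + r for _ in range(patch_w)] for r in range(patch_h)]
            cols = [[left + c for c in range(patch_w)] for _ in range(patch_h)]
            patches.append((rows, cols))
    return patches


def select_patch(grid, rows, cols):
    """Gather the values of a 2D grid at the given per-pixel indices."""
    return [
        [grid[r][c] for r, c in zip(row_r, row_c)]
        for row_r, row_c in zip(rows, cols)
    ]


# A toy single-channel "embedding grid" whose value encodes its own position.
grid = [[16 * r + c for c in range(16)] for r in range(16)]

patches = grid_patch_indices((16, 16), (8, 8))
num_patches = len(patches)  # 4 patches of 8x8 pixels
last_rows, last_cols = patches[-1]
last_patch = select_patch(grid, last_rows, last_cols)  # bottom-right patch
```

Each patch carries two index grids of the patch's spatial size, which matches the \((P, 2, H_{in}, W_{in})\) layout described for global_index; the real method operates on tensors and may order patches differently.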
 
 - positional_embedding_selector(
- x: Tensor,
- embedding_selector: Callable[[Tensor], Tensor],
- lead_time_label=None,
- Select positional embeddings using a selector function. - Similar to positional_embedding_indexing(), but uses a selector function to select the embeddings. - Parameters:
- x (torch.Tensor) – Input tensor of shape \((P \times B, C, H_{in}, W_{in})\). Only used to determine batch size \(B\), dtype and device. 
- embedding_selector (Callable) – Function that takes as input the entire embedding grid of shape \((C_{PE}, H, W)\) and returns selected embeddings with shape \((P \times B, C_{PE}, H_{in}, W_{in})\). Each selected embedding should correspond to the portion of the embedding grid that corresponds to the batch element in x. Typically this should be based on the physicsnemo.utils.patching.BasePatching2D.apply() method to maintain consistency with patch extraction.
- lead_time_label (Optional[torch.Tensor], default=None) – Tensor of shape \((P,)\) that corresponds to the lead-time label for each patch. Only used if lead_time_mode is True.
 
- Returns:
- A tensor of shape \((P \times B, C_{PE} [+ C_{LT}], H_{in}, W_{in})\). \(C_{PE}\) is the number of embedding channels in the positional embedding grid, and \(C_{LT}\) is the number of embedding channels in the lead-time embedding grid. If lead_time_label is provided, the lead-time embedding channels are included.
- Return type:
- torch.Tensor 
 - Notes - This method is typically used in patch-based diffusion (or multi-diffusion), where a large input image is split into multiple patches. The batch dimension of the input tensor contains the patches. Patches are processed independently by the model, and the embedding_selector function is used to select the grid of positional embeddings corresponding to each patch.
- To build the embedding_selector parameter, see the apply() method of physicsnemo.utils.patching.BasePatching2D.
 - Example
>>> # Define a selector function with a patching utility:
>>> from physicsnemo.utils.patching import GridPatching2D
>>> patching = GridPatching2D(img_shape=(16, 16), patch_shape=(8, 8))
>>> batch_size = 4
>>> def embedding_selector(emb):
...     return patching.apply(emb[None].expand(batch_size, -1, -1, -1))
 - property profile_mode#
- Should be set to True to enable profiling.
 
- class physicsnemo.models.diffusion.unet.UNet(*args, **kwargs)[source]#
- Bases: - Module - This interface provides a U-Net wrapper for the CorrDiff deterministic regression model (and other deterministic downsampling models). The supported architectures are those listed under model_type. - It shares the same architecture as a conditional diffusion model: it concatenates a conditioning image to a zero-filled latent state and sets the noise level and the class labels to zero. - Parameters:
- img_resolution (Union[int, Tuple[int, int]]) – The resolution of the input/output image. If a single int is provided, then the image is assumed to be square. 
- img_in_channels (int) – Number of channels in the input image. 
- img_out_channels (int) – Number of channels in the output image. 
- use_fp16 (bool, optional, default=False) – Execute the underlying model at FP16 precision. 
- model_type (Literal['SongUNet', 'SongUNetPosEmbd', 'SongUNetPosLtEmbd', 'DhariwalUNet'], optional, default='SongUNetPosEmbd') – Class name of the underlying architecture. Must be one of the following: ‘SongUNet’, ‘SongUNetPosEmbd’, ‘SongUNetPosLtEmbd’, ‘DhariwalUNet’. 
- **model_kwargs (dict) – Keyword arguments passed to the underlying architecture __init__ method. 
- Please refer to the documentation of these classes for details on how to call and use these models directly. 
- Forward 
- ------- 
- x (torch.Tensor) – The input tensor, typically zero-filled, of shape \((B, C_{in}, H_{in}, W_{in})\). 
- img_lr (torch.Tensor) – Conditioning image of shape \((B, C_{lr}, H_{in}, W_{in})\). 
- **model_kwargs – Additional keyword arguments to pass to the underlying architecture forward method. 
- Outputs 
- ------- 
- torch.Tensor – Output tensor of shape \((B, C_{out}, H_{in}, W_{in})\) (same spatial dimensions as the input). 
 
 - property amp_mode#
- Set to True when using automatic mixed precision.
 - property profile_mode#
- Set to True to enable profiling of the wrapped model.
 - property use_fp16#
- Whether the model uses float16 precision. - Returns:
- True if the model is in float16 mode, False otherwise. 
- Return type:
- bool 
- Type:
- bool 
 
 
Diffusion Preconditioners#
Preconditioning is an essential technique for improving the performance of diffusion models. It consists of scaling the latent state and the noise level that are passed to a network. Some preconditioning schemes also require rescaling the output of the network. PhysicsNeMo provides a set of preconditioning classes that wrap backbones or specialized architectures.
Preconditioning schemes used in the paper “Elucidating the Design Space of Diffusion-Based Generative Models”.
- class physicsnemo.models.diffusion.preconditioning.EDMPrecond(*args, **kwargs)[source]#
- Bases: - Module- Improved preconditioning proposed in the paper “Elucidating the Design Space of Diffusion-Based Generative Models” (EDM) - Parameters:
- img_resolution (int) – Image resolution. 
- img_channels (int) – Number of color channels (for both input and output). If your model requires a different number of input or output channels, override this by passing either of the optional img_in_channels or img_out_channels arguments. 
- label_dim (int) – Number of class labels, 0 = unconditional, by default 0. 
- use_fp16 (bool) – Whether to execute the underlying model at FP16 precision, by default False. 
- sigma_min (float) – Minimum supported noise level, by default 0.0. 
- sigma_max (float) – Maximum supported noise level, by default inf. 
- sigma_data (float) – Expected standard deviation of the training data, by default 0.5. 
- model_type (str) – Class name of the underlying model, by default “DhariwalUNet”. 
- img_in_channels (int) – Optional setting for when the number of input channels differs from the number of output channels. If set, overrides img_channels for the input. This is useful when additional (conditional) channels are present. 
- img_out_channels (int) – Optional setting for when the number of input channels differs from the number of output channels. If set, overrides img_channels for the output. 
- **model_kwargs (dict) – Keyword arguments for the underlying model. 
 
 - Note - Reference: Karras, T., Aittala, M., Aila, T. and Laine, S., 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35, pp.26565-26577. - forward(
- x,
- sigma,
- condition=None,
- class_labels=None,
- force_fp32=False,
- **model_kwargs,
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the registered hooks while the latter silently ignores them.
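As a rough illustration of what this preconditioning does, the scalar sketch below evaluates the scaling coefficients from Karras et al. (2022) and combines them as \(D(x; \sigma) = c_{skip}\, x + c_{out}\, F(c_{in}\, x, c_{noise})\). The function names `edm_coefficients` and `edm_denoise` and the stand-in network are hypothetical; the sketch assumes that EDMPrecond follows the paper's formulas and ignores conditioning, class labels, and tensor shapes.

```python
import math


def edm_coefficients(sigma, sigma_data=0.5):
    """Return (c_skip, c_out, c_in, c_noise) as defined in Karras et al. (2022)."""
    total_var = sigma**2 + sigma_data**2
    c_skip = sigma_data**2 / total_var              # weight of the noisy input
    c_out = sigma * sigma_data / math.sqrt(total_var)  # scale of the network output
    c_in = 1.0 / math.sqrt(total_var)               # scale applied to the network input
    c_noise = math.log(sigma) / 4.0                 # noise-level conditioning value
    return c_skip, c_out, c_in, c_noise


def edm_denoise(x, sigma, raw_network, sigma_data=0.5):
    """Scalar illustration of D(x; sigma) = c_skip * x + c_out * F(c_in * x, c_noise)."""
    c_skip, c_out, c_in, c_noise = edm_coefficients(sigma, sigma_data)
    return c_skip * x + c_out * raw_network(c_in * x, c_noise)


# Stand-in "network" that always predicts zero, so only the skip path remains.
coefficients = edm_coefficients(0.5)
denoised = edm_denoise(2.0, sigma=0.5, raw_network=lambda x_in, noise: 0.0)
```

With `sigma == sigma_data` the skip weight is 0.5, so the zero network above returns half of the noisy input.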
 
- class physicsnemo.models.diffusion.preconditioning.EDMPrecondMetaData(
- name: str = 'EDMPrecond',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData- EDMPrecond meta data 
- class physicsnemo.models.diffusion.preconditioning.EDMPrecondSR(*args, **kwargs)[source]#
- Bases: - EDMPrecondSuperResolution- NOTE: This is a deprecated version of the EDMPrecondSuperResolution model. This was used to maintain backwards compatibility and allow loading old models. Please use the EDMPrecondSuperResolution model instead. - Improved preconditioning proposed in the paper “Elucidating the Design Space of Diffusion-Based Generative Models” (EDM) for super-resolution tasks - Parameters:
- img_resolution (int) – Image resolution. 
- img_channels (int) – Number of color channels (deprecated, not used). 
- img_in_channels (int) – Number of input color channels. 
- img_out_channels (int) – Number of output color channels. 
- use_fp16 (bool) – Whether to execute the underlying model at FP16 precision, by default False. 
- sigma_min (float) – Minimum supported noise level, by default 0.0. 
- sigma_max (float) – Maximum supported noise level, by default inf. 
- sigma_data (float) – Expected standard deviation of the training data, by default 0.5. 
- model_type (str) – Class name of the underlying model, by default “SongUNetPosEmbd”. 
- scale_cond_input (bool) – Whether to scale the conditional input (deprecated), by default True. 
- **model_kwargs (dict) – Keyword arguments for the underlying model. 
 
 - Note - References: - Karras, T., Aittala, M., Aila, T. and Laine, S., 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35, pp.26565-26577. - Mardani, M., Brenowitz, N., Cohen, Y., Pathak, J., Chen, C.Y., Liu, C.C., Vahdat, A., Kashinath, K., Kautz, J. and Pritchard, M., 2023. Generative Residual Diffusion Modeling for Km-scale Atmospheric Downscaling. arXiv preprint arXiv:2309.15214. - forward(
- x,
- img_lr,
- sigma,
- force_fp32=False,
- **model_kwargs,
- Forward pass of the EDMPrecondSR model wrapper. - Parameters:
- x (torch.Tensor) – Noisy high-resolution image of shape (B, C_hr, H, W). 
- img_lr (torch.Tensor) – Low-resolution conditioning image of shape (B, C_lr, H, W). 
- sigma (torch.Tensor) – Noise level of shape (B) or (B, 1) or (B, 1, 1, 1). 
- force_fp32 (bool, optional) – Whether to force FP32 precision regardless of the use_fp16 attribute, by default False. 
- **model_kwargs (dict) – Additional keyword arguments to pass to the underlying model. 
 
- Returns:
- Denoised high-resolution image of shape (B, C_hr, H, W). 
- Return type:
- torch.Tensor 
 
 
- class physicsnemo.models.diffusion.preconditioning.EDMPrecondSRMetaData(
- name: str = 'EDMPrecondSR',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData- EDMPrecondSR meta data 
- class physicsnemo.models.diffusion.preconditioning.EDMPrecondSuperResolution(*args, **kwargs)[source]#
- Bases: - Module- Improved preconditioning proposed in the paper “Elucidating the Design Space of Diffusion-Based Generative Models” (EDM). - This is a variant of EDMPrecond that is specifically designed for super-resolution tasks. It wraps a neural network that predicts the denoised high-resolution image given a noisy high-resolution image, and additional conditioning that includes a low-resolution image, and a noise level. - Parameters:
- img_resolution (Union[int, Tuple[int, int]]) – Spatial resolution \((H, W)\) of the image. If a single int is provided, the image is assumed to be square. 
- img_in_channels (int) – Number of input channels in the low-resolution input image. 
- img_out_channels (int) – Number of output channels in the high-resolution output image. 
- use_fp16 (bool, optional) – Whether to use half-precision floating point (FP16) for model execution, by default False. 
- model_type (str, optional) – Class name of the underlying model. Must be one of the following: ‘SongUNet’, ‘SongUNetPosEmbd’, ‘SongUNetPosLtEmbd’, ‘DhariwalUNet’. Defaults to ‘SongUNetPosEmbd’. 
- sigma_data (float, optional) – Expected standard deviation of the training data, by default 0.5. 
- sigma_min (float, optional) – Minimum supported noise level, by default 0.0. 
- sigma_max (float, optional) – Maximum supported noise level, by default inf. 
- **model_kwargs (dict) – Keyword arguments passed to the underlying model __init__ method. 
 
 - See also - SongUNet
- Basic U-Net for diffusion models 
- SongUNetPosEmbd
- U-Net with positional embeddings 
- SongUNetPosLtEmbd
- U-Net with positional and lead-time embeddings 
 - Note - References: - Karras, T., Aittala, M., Aila, T. and Laine, S., 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35, pp.26565-26577. - Mardani, M., Brenowitz, N., Cohen, Y., Pathak, J., Chen, C.Y., Liu, C.C., Vahdat, A., Kashinath, K., Kautz, J. and Pritchard, M., 2023. Generative Residual Diffusion Modeling for Km-scale Atmospheric Downscaling. arXiv preprint arXiv:2309.15214. - property amp_mode#
- Set to True when using automatic mixed precision.
 - forward(
- x: Tensor,
- img_lr: Tensor,
- sigma: Tensor,
- force_fp32: bool = False,
- **model_kwargs: dict,
- Forward pass of the EDMPrecondSuperResolution model wrapper. - This method applies the EDM preconditioning to compute the denoised image from a noisy high-resolution image and low-resolution conditioning image. - Parameters:
- x (torch.Tensor) – Noisy high-resolution image of shape (B, C_hr, H, W). The number of channels C_hr should be equal to img_out_channels. 
- img_lr (torch.Tensor) – Low-resolution conditioning image of shape (B, C_lr, H, W). The number of channels C_lr should be equal to img_in_channels. 
- sigma (torch.Tensor) – Noise level of shape (B) or (B, 1) or (B, 1, 1, 1). 
- force_fp32 (bool, optional) – Whether to force FP32 precision regardless of the use_fp16 attribute, by default False. 
- **model_kwargs (dict) – Additional keyword arguments to pass to the underlying model self.model forward method. 
 
- Returns:
- Denoised high-resolution image of shape (B, C_hr, H, W). 
- Return type:
- torch.Tensor 
- Raises:
- ValueError – If the model output dtype doesn’t match the expected dtype. 
 
 - property profile_mode#
- Set to True to enable profiling of the wrapped model.
 - static round_sigma(
- sigma: float | List | Tensor,
- Convert a given sigma value(s) to a tensor representation. - Parameters:
- sigma (Union[float, List, torch.Tensor]) – Sigma value(s) to convert. 
- Returns:
- Tensor representation of sigma values. 
- Return type:
- torch.Tensor 
 - property use_fp16#
- Whether the model uses float16 precision. - Returns:
- True if the model is in float16 mode, False otherwise. 
- Return type:
- bool 
- Type:
- bool 
 
 
- class physicsnemo.models.diffusion.preconditioning.EDMPrecondSuperResolutionMetaData(
- name: str = 'EDMPrecondSuperResolution',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData- EDMPrecondSuperResolution meta data 
- class physicsnemo.models.diffusion.preconditioning.VEPrecond(*args, **kwargs)[source]#
- Bases: - Module- Preconditioning corresponding to the variance exploding (VE) formulation. - Parameters:
- img_resolution (int) – Image resolution. 
- img_channels (int) – Number of color channels. 
- label_dim (int) – Number of class labels, 0 = unconditional, by default 0. 
- use_fp16 (bool) – Whether to execute the underlying model at FP16 precision, by default False. 
- sigma_min (float) – Minimum supported noise level, by default 0.02. 
- sigma_max (float) – Maximum supported noise level, by default 100.0. 
- model_type (str) – Class name of the underlying model, by default “SongUNet”. 
- **model_kwargs (dict) – Keyword arguments for the underlying model. 
 
 - Note - Reference: Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S. and Poole, B., 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456. - forward(
- x,
- sigma,
- class_labels=None,
- force_fp32=False,
- **model_kwargs,
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the registered hooks while the latter silently ignores them.
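For comparison with EDM, the variance exploding scheme uses much simpler scaling coefficients. The scalar sketch below lists them as given for the VE formulation in Karras et al. (2022); the function name `ve_coefficients` is hypothetical, and it is an assumption that VEPrecond applies exactly these values.

```python
import math


def ve_coefficients(sigma):
    """Return (c_skip, c_out, c_in, c_noise) for the VE formulation."""
    c_skip = 1.0                       # noisy input is passed through unchanged
    c_out = sigma                      # network output is scaled by the noise level
    c_in = 1.0                         # network input is not rescaled
    c_noise = math.log(0.5 * sigma)    # noise-level conditioning value
    return c_skip, c_out, c_in, c_noise


coefficients = ve_coefficients(2.0)
```

The denoised estimate is then assembled the same way as in the other schemes: \(c_{skip}\, x + c_{out}\, F(c_{in}\, x, c_{noise})\).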
 
- class physicsnemo.models.diffusion.preconditioning.VEPrecondMetaData(
- name: str = 'VEPrecond',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData- VEPrecond meta data 
- class physicsnemo.models.diffusion.preconditioning.VEPrecond_dfsr(
- img_resolution: int,
- img_channels: int,
- label_dim: int = 0,
- use_fp16: bool = False,
- sigma_min: float = 0.02,
- sigma_max: float = 100.0,
- dataset_mean: float = 5.85e-05,
- dataset_scale: float = 4.79,
- model_type: str = 'SongUNet',
- **model_kwargs: dict,
- Bases: - Module- Preconditioning for dfsr model, modified from class VEPrecond, where the input argument ‘sigma’ in forward propagation function is used to receive the timestep of the backward diffusion process. - Parameters:
- img_resolution (int) – Image resolution. 
- img_channels (int) – Number of color channels. 
- label_dim (int) – Number of class labels, 0 = unconditional, by default 0. 
- use_fp16 (bool) – Whether to execute the underlying model at FP16 precision, by default False. 
- sigma_min (float) – Minimum supported noise level, by default 0.02. 
- sigma_max (float) – Maximum supported noise level, by default 100.0. 
- model_type (str) – Class name of the underlying model, by default “SongUNet”. 
- **model_kwargs (dict) – Keyword arguments for the underlying model. 
 
 - Note - Reference: Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Advances in neural information processing systems. 2020;33:6840-51. - forward(
- x,
- sigma,
- class_labels=None,
- force_fp32=False,
- **model_kwargs,
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.diffusion.preconditioning.VEPrecond_dfsr_cond(
- img_resolution: int,
- img_channels: int,
- label_dim: int = 0,
- use_fp16: bool = False,
- sigma_min: float = 0.02,
- sigma_max: float = 100.0,
- dataset_mean: float = 5.85e-05,
- dataset_scale: float = 4.79,
- model_type: str = 'SongUNet',
- **model_kwargs: dict,
- Bases: - Module- Preconditioning for dfsr model with physics-informed conditioning input, modified from class VEPrecond, where the input argument ‘sigma’ in forward propagation function is used to receive the timestep of the backward diffusion process. The gradient of PDE residual with respect to the vorticity in the governing Navier-Stokes equation is computed as the physics-informed conditioning variable and is combined with the backward diffusion timestep before being sent to the underlying model for noise prediction. - Parameters:
- img_resolution (int) – Image resolution. 
- img_channels (int) – Number of color channels. 
- label_dim (int) – Number of class labels, 0 = unconditional, by default 0. 
- use_fp16 (bool) – Whether to execute the underlying model at FP16 precision, by default False. 
- sigma_min (float) – Minimum supported noise level, by default 0.02. 
- sigma_max (float) – Maximum supported noise level, by default 100.0. 
- model_type (str) – Class name of the underlying model, by default “SongUNet”. 
- **model_kwargs (dict) – Keyword arguments for the underlying model. 
 
 - Note - Reference: [1] Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S. and Poole, B., 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456. [2] Shu D, Li Z, Farimani AB. A physics-informed diffusion model for high-fidelity flow field reconstruction. Journal of Computational Physics. 2023 Apr 1;478:111972. - forward(
- x,
- sigma,
- class_labels=None,
- force_fp32=False,
- **model_kwargs,
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the registered hooks while the latter silently ignores them.
 - voriticity_residual(w, re=1000.0, dt=0.03125)[source]#
- Compute the gradient of the PDE residual with respect to a given vorticity w using a spectral method. - Parameters:
- w (torch.Tensor) – The fluid flow data sample (vorticity). 
- re (float) – The value of Reynolds number used in the governing Navier-Stokes equation. 
- dt (float) – Time step used to compute the time-derivative of vorticity included in the governing Navier-Stokes equation. 
 
- Returns:
- The computed vorticity gradient. 
- Return type:
- torch.Tensor 
 
 
- class physicsnemo.models.diffusion.preconditioning.VPPrecond(*args, **kwargs)[source]#
- Bases: - Module- Preconditioning corresponding to the variance preserving (VP) formulation. - Parameters:
- img_resolution (int) – Image resolution. 
- img_channels (int) – Number of color channels. 
- label_dim (int) – Number of class labels, 0 = unconditional, by default 0. 
- use_fp16 (bool) – Whether to execute the underlying model at FP16 precision, by default False. 
- beta_d (float) – Extent of the noise level schedule, by default 19.9. 
- beta_min (float) – Initial slope of the noise level schedule, by default 0.1. 
- M (int) – Original number of timesteps in the DDPM formulation, by default 1000. 
- epsilon_t (float) – Minimum t-value used during training, by default 1e-5. 
- model_type (str) – Class name of the underlying model, by default “SongUNet”. 
- **model_kwargs (dict) – Keyword arguments for the underlying model. 
 
 - Note - Reference: Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S. and Poole, B., 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456. - forward(
- x,
- sigma,
- class_labels=None,
- force_fp32=False,
- **model_kwargs,
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the registered hooks while the latter silently ignores them.
 - round_sigma(sigma: float | List | Tensor)[source]#
- Convert a given sigma value(s) to a tensor representation. - Parameters:
- sigma (Union[float, list, torch.Tensor]) – The sigma value(s) to convert. 
- Returns:
- The tensor representation of the provided sigma value(s). 
- Return type:
- torch.Tensor 
 
 - sigma(t: float | Tensor)[source]#
- Compute the sigma(t) value for a given t based on the VP formulation. - The function calculates the noise level schedule for the diffusion process based on the given parameters beta_d and beta_min. - Parameters:
- t (Union[float, torch.Tensor]) – The timestep or set of timesteps for which to compute sigma(t). 
- Returns:
- The computed sigma(t) value(s). 
- Return type:
- torch.Tensor 
 
 - sigma_inv(sigma: float | Tensor)[source]#
- Compute the inverse of the sigma function for a given sigma. - This function effectively calculates t from a given sigma(t) based on the parameters beta_d and beta_min. - Parameters:
- sigma (Union[float, torch.Tensor]) – The sigma(t) value or set of sigma(t) values for which to compute the inverse. 
- Returns:
- The computed t value(s) corresponding to the provided sigma(t). 
- Return type:
- torch.Tensor 
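The relationship between sigma() and sigma_inv() can be checked with a scalar sketch. It uses the VP noise schedule \(\sigma(t) = \sqrt{e^{\frac{1}{2}\beta_d t^2 + \beta_{min} t} - 1}\) from Karras et al. (2022) and its closed-form inverse; the function names `vp_sigma` and `vp_sigma_inv` are hypothetical, and it is an assumption that the library methods follow these formulas.

```python
import math


def vp_sigma(t, beta_d=19.9, beta_min=0.1):
    """Noise level sigma(t) of the VP schedule."""
    return math.sqrt(math.exp(0.5 * beta_d * t**2 + beta_min * t) - 1.0)


def vp_sigma_inv(sigma, beta_d=19.9, beta_min=0.1):
    """Recover t from sigma(t) by solving the quadratic in t."""
    discriminant = beta_min**2 + 2.0 * beta_d * math.log(1.0 + sigma**2)
    return (math.sqrt(discriminant) - beta_min) / beta_d


t = 0.5
sigma_t = vp_sigma(t)
t_back = vp_sigma_inv(sigma_t)  # should reproduce t up to rounding error
```

The inverse follows from \(\ln(1 + \sigma^2) = \frac{1}{2}\beta_d t^2 + \beta_{min} t\), which makes the expression under the square root equal to \((\beta_d t + \beta_{min})^2\).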
 
 
- class physicsnemo.models.diffusion.preconditioning.VPPrecondMetaData(
- name: str = 'VPPrecond',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData- VPPrecond meta data 
- class physicsnemo.models.diffusion.preconditioning.iDDPMPrecond(*args, **kwargs)[source]#
- Bases: - Module- Preconditioning corresponding to the improved DDPM (iDDPM) formulation. - Parameters:
- img_resolution (int) – Image resolution. 
- img_channels (int) – Number of color channels. 
- label_dim (int) – Number of class labels, 0 = unconditional, by default 0. 
- use_fp16 (bool) – Whether to execute the underlying model at FP16 precision, by default False. 
- C_1 (float) – Timestep adjustment at low noise levels, by default 0.001. 
- C_2 (float) – Timestep adjustment at high noise levels, by default 0.008. 
- M (int) – Original number of timesteps in the DDPM formulation, by default 1000. 
- model_type (str) – Class name of the underlying model, by default “DhariwalUNet”. 
- **model_kwargs (dict) – Keyword arguments for the underlying model. 
 
 - Note - Reference: Nichol, A.Q. and Dhariwal, P., 2021, July. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning (pp. 8162-8171). PMLR. - alpha_bar(j)[source]#
- Compute the alpha_bar(j) value for a given j based on the iDDPM formulation. - Parameters:
- j (Union[int, torch.Tensor]) – The timestep or set of timesteps for which to compute alpha_bar(j). 
- Returns:
- The computed alpha_bar(j) value(s). 
- Return type:
- torch.Tensor 
 
 - forward(
- x,
- sigma,
- class_labels=None,
- force_fp32=False,
- **model_kwargs,
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the registered hooks while the latter silently ignores them.
 - round_sigma(sigma, return_index=False)[source]#
- Round the provided sigma value(s) to the nearest value(s) in a pre-defined set u. - Parameters:
- sigma (Union[float, list, torch.Tensor]) – The sigma value(s) to round. 
- return_index (bool, optional) – Whether to return the index/indices of the rounded value(s) in u instead of the rounded value(s) themselves, by default False. 
 
- Returns:
- The rounded sigma value(s) or their index/indices in u, depending on the value of return_index. 
- Return type:
- torch.Tensor 
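The pre-defined set u and the rounding step can be sketched in pure Python. This follows the iDDPM discretization as written in the reference implementation accompanying Karras et al. (2022); that iDDPMPrecond builds u the same way is an assumption, and `build_noise_levels` is a hypothetical helper name.

```python
import math


def alpha_bar(j, M=1000, C_2=0.008):
    """Cosine-style cumulative signal level for timestep j."""
    return math.sin(0.5 * math.pi * j / M / (C_2 + 1.0)) ** 2


def build_noise_levels(M=1000, C_1=0.001, C_2=0.008):
    """Build the list u of M + 1 noise levels, with u[M] = 0 and u[0] the largest."""
    u = [0.0] * (M + 1)
    for j in range(M, 0, -1):
        # The ratio is clipped from below by C_1 to avoid a blow-up near j = 0.
        ratio = max(alpha_bar(j - 1, M, C_2) / alpha_bar(j, M, C_2), C_1)
        u[j - 1] = math.sqrt((u[j] ** 2 + 1.0) / ratio - 1.0)
    return u


def round_sigma(sigma, u, return_index=False):
    """Round sigma to the nearest entry of u, or return that entry's index."""
    index = min(range(len(u)), key=lambda k: abs(u[k] - sigma))
    return index if return_index else u[index]


u = build_noise_levels()
index_of_zero = round_sigma(0.0, u, return_index=True)
rounded = round_sigma(u[500] * 1.0001, u)  # a slightly perturbed level snaps back
```

Because u decreases monotonically with the index, a larger index corresponds to a lower noise level.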
 
 
- class physicsnemo.models.diffusion.preconditioning.iDDPMPrecondMetaData(
- name: str = 'iDDPMPrecond',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData- iDDPMPrecond meta data 
Weather / Climate Models#
- class physicsnemo.models.dlwp.dlwp.DLWP(*args, **kwargs)[source]#
- Bases: - Module- A Convolutional model for Deep Learning Weather Prediction that works on Cubed-sphere grids. - This model expects the input to be of shape [N, C, 6, Res, Res] - Parameters:
- nr_input_channels (int) – Number of channels in the input 
- nr_output_channels (int) – Number of channels in the output 
- nr_initial_channels (int) – Number of channels in the initial convolution. This governs the overall channels in the model. 
- activation_fn (str) – Activation function for the convolutions 
- depth (int) – Depth for the U-Net 
- clamp_activation (Tuple of ints, floats or None) – The min and max value used for torch.clamp() 
 
 - Example
>>> import torch
>>> import physicsnemo.models.dlwp
>>> model = physicsnemo.models.dlwp.DLWP(
...     nr_input_channels=2,
...     nr_output_channels=4,
... )
>>> input = torch.randn(4, 2, 6, 64, 64) # [N, C, F, Res, Res]
>>> output = model(input)
>>> output.size()
torch.Size([4, 4, 6, 64, 64])
 - Note - Reference: Weyn, Jonathan A., et al. “Sub‐seasonal forecasting with a large ensemble of deep‐learning weather prediction models.” Journal of Advances in Modeling Earth Systems 13.7 (2021): e2021MS002502. 
 - forward(cubed_sphere_input)[source]#
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class physicsnemo.models.dlwp.dlwp.MetaData(
- name: str = 'DLWP',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = False,
- amp_cpu: bool = True,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData
- class physicsnemo.models.dlwp_healpix.HEALPixRecUNet.HEALPixRecUNet(*args, **kwargs)[source]#
- Bases: Module
- Deep Learning Weather Prediction (DLWP) recurrent UNet model on the HEALPix mesh.
- forward(
- inputs: Sequence,
- output_only_last=False,
- Forward pass of the HEALPixUnet - Parameters:
- inputs (Sequence) – Inputs to the model, of the form [prognostics|TISR|constants]. [B, F, T, C, H, W] is the format for prognostics and TISR; [F, C, H, W] is the format for constants. 
- output_only_last (bool, optional) – Whether only the last dimension of the outputs should be returned 
 
- Returns:
- Predicted outputs 
- Return type:
- th.Tensor 
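The input layout described above can be validated with plain shape checks before calling the model. This is a sketch under stated assumptions: the helper `check_healpix_inputs` is hypothetical, and the reading of the axis letters (B batch, F faces, T time, C channels, H and W face height and width) is an interpretation of the documented formats rather than something this reference spells out.

```python
def check_healpix_inputs(prognostics_shape, tisr_shape, constants_shape):
    """Check that [B, F, T, C, H, W] and [F, C, H, W] shapes agree with each other."""
    prognostics_shape = tuple(prognostics_shape)
    tisr_shape = tuple(tisr_shape)
    constants_shape = tuple(constants_shape)
    if len(prognostics_shape) != 6 or len(tisr_shape) != 6:
        raise ValueError("prognostics and TISR must have 6 dimensions [B, F, T, C, H, W]")
    if len(constants_shape) != 4:
        raise ValueError("constants must have 4 dimensions [F, C, H, W]")
    b, f, t, _, h, w = prognostics_shape
    # Channel counts may differ between inputs; every other axis must match.
    if tisr_shape[:3] != (b, f, t) or tisr_shape[4:] != (h, w):
        raise ValueError("TISR must match prognostics on B, F, T, H and W")
    if (constants_shape[0], constants_shape[2], constants_shape[3]) != (f, h, w):
        raise ValueError("constants must match prognostics on F, H and W")
    return {"batch": b, "faces": f, "time": t, "height": h, "width": w}


# Illustrative shapes only; the channel counts are made up for the example.
print(check_healpix_inputs((2, 12, 2, 7, 32, 32), (2, 12, 2, 1, 32, 32), (12, 2, 32, 32)))
```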
 
 - property integration_steps#
- Number of integration steps 
 
- class physicsnemo.models.dlwp_healpix.HEALPixRecUNet.MetaData(
- name: str = 'DLWP_HEALPixRec',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = False,
- amp_cpu: bool = True,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: ModelMetaData
- Metadata for the DLWP HEALPix Model 
- class physicsnemo.models.graphcast.graph_cast_net.GraphCastNet(*args, **kwargs)[source]#
- Bases: Module
- GraphCast network architecture
- Parameters:
- multimesh_level (int, optional) – Level of the latent mesh, by default 6 
- multimesh (bool, optional) – If the latent mesh is a multimesh, by default True. If True, the latent mesh includes the nodes corresponding to the specified mesh_level and incorporates the edges from all mesh levels ranging from level 0 up to and including mesh_level. 
- input_res (Tuple[int, int]) – Input resolution of the latitude-longitude grid 
- input_dim_grid_nodes (int, optional) – Input dimensionality of the grid node features, by default 474 
- input_dim_mesh_nodes (int, optional) – Input dimensionality of the mesh node features, by default 3 
- input_dim_edges (int, optional) – Input dimensionality of the edge features, by default 4 
- output_dim_grid_nodes (int, optional) – Final output dimensionality of the grid node features, by default 227 
- processor_type (str, optional) – The type of processor used in this model. Available options are ‘MessagePassing’, and ‘GraphTransformer’, which correspond to the processors in GraphCast and GenCast, respectively. By default ‘MessagePassing’. 
- khop_neighbors (int, optional) – Number of khop neighbors used in the GraphTransformer. This option is ignored if ‘MessagePassing’ processor is used. By default 0. 
- processor_layers (int, optional) – Number of processor layers, by default 16 
- hidden_layers (int, optional) – Number of hidden layers, by default 1 
- hidden_dim (int, optional) – Number of neurons in each hidden layer, by default 512 
- aggregation (str, optional) – Message passing aggregation method (“sum”, “mean”), by default “sum” 
- activation_fn (str, optional) – Type of activation function, by default “silu” 
- norm_type (str, optional) – Normalization type [“TELayerNorm”, “LayerNorm”]. Use “TELayerNorm” for optimal performance. By default “LayerNorm”. 
- use_cugraphops_encoder (bool, default=False) – Flag to select cugraphops kernels in encoder 
- use_cugraphops_processor (bool, default=False) – Flag to select cugraphops kernels in the processor 
- use_cugraphops_decoder (bool, default=False) – Flag to select cugraphops kernels in the decoder 
- do_concat_trick (bool, default=False) – Whether to replace concat+MLP with MLP+idx+sum 
- recompute_activation (bool, optional) – Flag for recomputing activation in backward to save memory, by default False. Currently, only SiLU is supported. 
- partition_size (int, default=1) – Number of process groups across which graphs are distributed. If equal to 1, the model is run in a normal Single-GPU configuration. 
- partition_group_name (str, default=None) – Name of process group across which graphs are distributed. If partition_size is set to 1, the model is run in a normal Single-GPU configuration and the specification of a process group is not necessary. If partition_size > 1, passing no process group name leads to a parallelism across the default process group. Otherwise, the group size of a process group is expected to match partition_size. 
- use_lat_lon_partitioning (bool, default=False) – flag to specify whether all graphs (grid-to-mesh, mesh, mesh-to-grid) are partitioned based on lat-lon-coordinates of nodes or based on IDs. 
- expect_partitioned_input (bool, default=False) – Flag indicating whether the model expects the input to be already partitioned. This can be helpful e.g. in multi-step rollouts to avoid aggregating the output just to distribute it in the next step again. 
- global_features_on_rank_0 (bool, default=False) – Flag indicating whether the model expects the input to be present in its “global” form only on group_rank 0. During the input preparation phase, the model will take care of scattering the input accordingly onto all ranks of the process group across which the graph is partitioned. Note that only either this flag or expect_partitioned_input can be set at a time. 
- produce_aggregated_output (bool, default=True) – Flag indicating whether the model produces the aggregated output on each rank of the process group across which the graph is distributed or whether the output is kept distributed. This can be helpful e.g. in multi-step rollouts to avoid aggregating the output just to distribute it in the next step again. 
- produce_aggregated_output_on_all_ranks (bool, default=True) – Flag indicating, if produce_aggregated_output is True, whether the model produces the aggregated output on each rank of the process group across which the graph is distributed or only on group_rank 0. This can be helpful for computing the loss using global targets only on a single rank, which avoids having to distribute the computation of the loss function. 
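The partitioning flags above interact: `expect_partitioned_input` and `global_features_on_rank_0` cannot both be set, and `partition_size` equal to 1 means a single-GPU run. A minimal sketch of that rule follows. The function `validate_partition_flags` is hypothetical and only restates the documented constraints; it does not show how `GraphCastNet` itself reacts to an invalid combination.

```python
def validate_partition_flags(
    partition_size=1,
    expect_partitioned_input=False,
    global_features_on_rank_0=False,
):
    """Check the documented flag constraints and report the run mode."""
    if partition_size < 1:
        raise ValueError("partition_size must be at least 1")
    # Documented rule: only one of these two flags can be set at a time.
    if expect_partitioned_input and global_features_on_rank_0:
        raise ValueError(
            "set either expect_partitioned_input or global_features_on_rank_0, not both"
        )
    return "single-GPU" if partition_size == 1 else "distributed"


print(validate_partition_flags())                                   # default configuration
print(validate_partition_flags(4, expect_partitioned_input=True))   # graph split over 4 ranks
```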
 
- Note
- Based on these papers:
- “GraphCast: Learning skillful medium-range global weather forecasting”
 
- “Forecasting Global Weather with Graph Neural Networks”
 
- “Learning Mesh-Based Simulation with Graph Networks”
 
- “MultiScale MeshGraphNets”
 
- “GenCast: Diffusion-based ensemble forecasting for medium-range weather”
 
 - custom_forward(
- grid_nfeat: Tensor,
- GraphCast forward method with support for gradient checkpointing. - Parameters:
- grid_nfeat (Tensor) – Node features of the latitude-longitude graph. 
- Returns:
- grid_nfeat_finale – Predicted node features of the latitude-longitude graph. 
- Return type:
- Tensor 
 
 - decoder_forward(
- mesh_efeat_processed: Tensor,
- mesh_nfeat_processed: Tensor,
- grid_nfeat_encoded: Tensor,
- Forward method for the last layer of the processor, the decoder, and the final MLP. - Parameters:
- mesh_efeat_processed (Tensor) – Multimesh edge features processed by the processor. 
- mesh_nfeat_processed (Tensor) – Multi-mesh node features processed by the processor. 
- grid_nfeat_encoded (Tensor) – The encoded node features for the latitude-longitude grid. 
 
- Returns:
- grid_nfeat_finale – The final node features for the latitude-longitude grid. 
- Return type:
- Tensor 
 
 - encoder_forward(
- grid_nfeat: Tensor,
- Forward method for the embedder, encoder, and the first layer of the processor. - Parameters:
- grid_nfeat (Tensor) – Node features for the latitude-longitude grid. 
- Returns:
- mesh_efeat_processed (Tensor) – Processed edge features for the multimesh. 
- mesh_nfeat_processed (Tensor) – Processed node features for the multimesh. 
- grid_nfeat_encoded (Tensor) – Encoded node features for the latitude-longitude grid. 
 
 
 - forward(grid_nfeat: Tensor) → Tensor[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 - prepare_input(
- invar: Tensor,
- expect_partitioned_input: bool,
- global_features_on_rank_0: bool,
- Prepares the input to the model in the required shape. - Parameters:
- invar (Tensor) – Input in the shape [N, C, H, W]. 
- expect_partitioned_input (bool) – Flag indicating whether the input is partitioned according to the graph partitioning scheme 
- global_features_on_rank_0 (bool) – Flag indicating whether input is in its “global” form only on group_rank 0 which requires a scatter operation beforehand. Note that only either this flag or expect_partitioned_input can be set at a time. 
 
- Returns:
- Reshaped input. 
- Return type:
- Tensor 
 
 - prepare_output(
- outvar: Tensor,
- produce_aggregated_output: bool,
- produce_aggregated_output_on_all_ranks: bool = True,
- Prepares the output of the model in the shape [N, C, H, W]. - Parameters:
- outvar (Tensor) – Output of the final MLP of the model. 
- produce_aggregated_output (bool) – flag indicating whether output is gathered onto each rank or kept distributed 
- produce_aggregated_output_on_all_ranks (bool) – Flag indicating whether output is gathered on each rank or only gathered at group_rank 0. True by default and only valid if produce_aggregated_output is set. 
 
- Returns:
- The reshaped output of the model. 
- Return type:
- Tensor 
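`prepare_input` and `prepare_output` convert between the [N, C, H, W] grid layout and the per-node feature layout used by the graph. The sketch below illustrates one such conversion on nested Python lists. It makes several assumptions: the batch dimension is dropped (a single sample), each latitude-longitude point becomes one node in row-major order, and no partitioning is involved. The real methods operate on tensors and may order nodes differently.

```python
def grid_to_nodes(grid):
    """Convert a [C][H][W] grid into [H*W][C] node features (row-major node order)."""
    channels, height, width = len(grid), len(grid[0]), len(grid[0][0])
    return [
        [grid[c][h][w] for c in range(channels)]
        for h in range(height)
        for w in range(width)
    ]


def nodes_to_grid(nodes, height, width):
    """Inverse of grid_to_nodes: convert [H*W][C] node features back to [C][H][W]."""
    channels = len(nodes[0])
    return [
        [[nodes[h * width + w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# Two channels on a 2 x 3 grid.
grid = [
    [[1, 2, 3], [4, 5, 6]],
    [[10, 20, 30], [40, 50, 60]],
]
nodes = grid_to_nodes(grid)
print(nodes[0], len(nodes))                  # features of the first node, node count
print(nodes_to_grid(nodes, 2, 3) == grid)    # the round trip restores the grid
```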
 
 - set_checkpoint_decoder(checkpoint_flag: bool)[source]#
- Sets checkpoint function for the last layer of the processor, the decoder, and the final MLP. - This function returns the appropriate checkpoint function based on the provided checkpoint_flag flag. If checkpoint_flag is True, the function returns the checkpoint function from PyTorch’s torch.utils.checkpoint. Otherwise, it returns an identity function that simply passes the inputs through the given layer. - Parameters:
- checkpoint_flag (bool) – Whether to use checkpointing for gradient computation. Checkpointing can reduce memory usage during backpropagation at the cost of increased computation time. 
- Returns:
- The selected checkpoint function to use for gradient computation. 
- Return type:
- Callable 
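The selection logic described above (a checkpoint function when the flag is True, an identity pass-through otherwise) can be sketched as follows. The names `identity_fn` and `select_checkpoint_fn` are hypothetical, and a recording stand-in replaces `torch.utils.checkpoint.checkpoint` so the sketch runs without PyTorch; in real use that PyTorch function would be passed as `checkpoint_impl`.

```python
def identity_fn(layer, *args, **kwargs):
    """Pass the inputs straight through the layer, with no checkpointing."""
    return layer(*args, **kwargs)


def select_checkpoint_fn(checkpoint_flag, checkpoint_impl):
    """Return checkpoint_impl when checkpointing is requested, else the identity."""
    return checkpoint_impl if checkpoint_flag else identity_fn


calls = []


def recording_checkpoint(layer, *args, **kwargs):
    # Stand-in for torch.utils.checkpoint.checkpoint: it only records the call.
    calls.append(layer.__name__)
    return layer(*args, **kwargs)


def double(x):
    return 2 * x


fn_on = select_checkpoint_fn(True, recording_checkpoint)
fn_off = select_checkpoint_fn(False, recording_checkpoint)
print(fn_on(double, 3), fn_off(double, 3), calls)
```

Both paths return the same value; only the checkpointed path goes through the wrapper, which is where activation recomputation would happen during the backward pass.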
 
 - set_checkpoint_encoder(checkpoint_flag: bool)[source]#
- Sets checkpoint function for the embedder, encoder, and the first layer of the processor. - This function returns the appropriate checkpoint function based on the provided checkpoint_flag flag. If checkpoint_flag is True, the function returns the checkpoint function from PyTorch’s torch.utils.checkpoint. Otherwise, it returns an identity function that simply passes the inputs through the given layer. - Parameters:
- checkpoint_flag (bool) – Whether to use checkpointing for gradient computation. Checkpointing can reduce memory usage during backpropagation at the cost of increased computation time. 
- Returns:
- The selected checkpoint function to use for gradient computation. 
- Return type:
- Callable 
 
 - set_checkpoint_model(checkpoint_flag: bool)[source]#
- Sets checkpoint function for the entire model. - This function returns the appropriate checkpoint function based on the provided checkpoint_flag flag. If checkpoint_flag is True, the function returns the checkpoint function from PyTorch’s torch.utils.checkpoint. In this case, all other gradient checkpointing will be disabled. Otherwise, it returns an identity function that simply passes the inputs through the given layer. - Parameters:
- checkpoint_flag (bool) – Whether to use checkpointing for gradient computation. Checkpointing can reduce memory usage during backpropagation at the cost of increased computation time. 
- Returns:
- The selected checkpoint function to use for gradient computation. 
- Return type:
- Callable 
 
 - set_checkpoint_processor(checkpoint_segments: int)[source]#
- Sets checkpoint function for the processor excluding the first and last layers. - This function returns the appropriate checkpoint function based on the provided checkpoint_segments flag. If checkpoint_segments is positive, the function returns the checkpoint function from PyTorch’s torch.utils.checkpoint, with number of checkpointing segments equal to checkpoint_segments. Otherwise, it returns an identity function that simply passes the inputs through the given layer. - Parameters:
- checkpoint_segments (int) – Number of checkpointing segments for gradient computation. Checkpointing can reduce memory usage during backpropagation at the cost of increased computation time. 
- Returns:
- The selected checkpoint function to use for gradient computation. 
- Return type:
- Callable 
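To make the notion of checkpointing segments concrete, the sketch below splits a run of processor layers into contiguous segments of nearly equal size. It is illustrative only: `split_into_segments` is a hypothetical helper, and neither PhysicsNeMo nor PyTorch is claimed to partition layers in exactly this way. The count of 14 layers assumes the default of 16 processor layers minus the first and last.

```python
def split_into_segments(num_layers, checkpoint_segments):
    """Split layer indices 0..num_layers-1 into contiguous, nearly equal segments."""
    if checkpoint_segments <= 0:
        # No checkpointing requested: treat all layers as one plain pass.
        return [list(range(num_layers))]
    segments = min(checkpoint_segments, num_layers)
    base, extra = divmod(num_layers, segments)
    result, start = [], 0
    for i in range(segments):
        size = base + (1 if i < extra else 0)  # the first `extra` segments get one more layer
        result.append(list(range(start, start + size)))
        start += size
    return result


print(split_into_segments(14, 4))
```

Only the inputs at segment boundaries would be stored; activations inside a segment are recomputed during the backward pass, so more segments trade memory savings against recomputation differently.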
 
 - to(
- *args: Any,
- **kwargs: Any,
- Moves the object to the specified device, dtype, or format. This method moves the object and its underlying graph and graph features to the specified device, dtype, or format, and returns the updated object. - Parameters:
- *args (Any) – Positional arguments to be passed to the torch._C._nn._parse_to function. 
- **kwargs (Any) – Keyword arguments to be passed to the torch._C._nn._parse_to function. 
 
- Returns:
- The updated object after moving to the specified device, dtype, or format. 
- Return type:
 
 
- class physicsnemo.models.graphcast.graph_cast_net.MetaData(
- name: str = 'GraphCastNet',
- jit: bool = False,
- cuda_graphs: bool = False,
- amp: bool = False,
- amp_cpu: bool = False,
- amp_gpu: bool = True,
- torch_fx: bool = False,
- bf16: bool = True,
- onnx: bool = False,
- onnx_gpu: bool = None,
- onnx_cpu: bool = None,
- onnx_runtime: bool = False,
- trt: bool = False,
- var_dim: int = -1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData
- physicsnemo.models.graphcast.graph_cast_net.get_lat_lon_partition_separators(partition_size: int)[source]#
- Utility Function to get separation intervals for lat-lon grid for partition_sizes of interest. - Parameters:
- partition_size (int) – size of graph partition 
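As an illustration of what separation intervals for a lat-lon grid can look like, the sketch below cuts the longitude range into equal bands, one per partition. This is a hypothetical example, not the behaviour of `get_lat_lon_partition_separators`: its return format is not described here, and the real separators may also split along latitude or use hand-picked boundaries.

```python
def longitude_bands(partition_size, lon_min=-180.0, lon_max=180.0):
    """Split [lon_min, lon_max] into partition_size equal (start, end) bands."""
    if partition_size < 1:
        raise ValueError("partition_size must be at least 1")
    width = (lon_max - lon_min) / partition_size
    return [
        (lon_min + i * width, lon_min + (i + 1) * width)
        for i in range(partition_size)
    ]


print(longitude_bands(4))
```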
 
- class physicsnemo.models.fengwu.fengwu.Fengwu(*args, **kwargs)[source]#
- Bases: Module
- FengWu: a PyTorch implementation of “FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead”
- https://arxiv.org/pdf/2304.02948.pdf
- Parameters:
- img_size – Image size (Lat, Lon). Default: (721, 1440) 
- pressure_level – Number of pressure levels. Default: 37 
- embed_dim (int) – Patch embedding dimension. Default: 192 
- patch_size (tuple[int]) – Patch token size. Default: (4,4) 
- num_heads (tuple[int]) – Number of attention heads in different layers. 
- window_size (tuple[int]) – Window size. 
 
 - forward(x)[source]#
- Parameters:
- surface (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=4. 
- z (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37. 
- r (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37. 
- u (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37. 
- v (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37. 
- t (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37. 
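The channel counts listed above add up to 4 + 5 × 37 = 189. The sketch below computes the channel range each field would occupy if the six fields were stacked along a single channel axis. That stacking, and the order surface, z, r, u, v, t, are assumptions taken from the order of the parameter list; `channel_slices` is a hypothetical helper.

```python
# (name, number of channels) in the order the fields are documented.
FENGWU_FIELDS = [("surface", 4), ("z", 37), ("r", 37), ("u", 37), ("v", 37), ("t", 37)]


def channel_slices(fields):
    """Return {name: (start, stop)} channel ranges and the total channel count."""
    slices, start = {}, 0
    for name, num_channels in fields:
        slices[name] = (start, start + num_channels)
        start += num_channels
    return slices, start


slices, total = channel_slices(FENGWU_FIELDS)
print(total)         # total number of stacked channels
print(slices["z"])   # channel range assumed for z
```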
 
 
 - prepare_input(surface, z, r, u, v, t)[source]#
- Prepares the input to the model in the required shape.
- Parameters:
- surface (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=4. 
- z (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37. 
- r (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37. 
- u (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37. 
- v (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37. 
- t (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=37. 
 
- class physicsnemo.models.fengwu.fengwu.MetaData(
- name: str = 'Fengwu',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = True,
- onnx_cpu: bool = False,
- onnx_runtime: bool = True,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData
- class physicsnemo.models.pangu.pangu.MetaData(
- name: str = 'Pangu',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = True,
- onnx_cpu: bool = False,
- onnx_runtime: bool = True,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData
- class physicsnemo.models.pangu.pangu.Pangu(*args, **kwargs)[source]#
- Bases: Module
- Pangu: a PyTorch implementation of “Pangu-Weather: A 3D High-Resolution Model for Fast and Accurate Global Weather Forecast”
- https://arxiv.org/abs/2211.02556
- Parameters:
- img_size (tuple[int]) – Image size [Lat, Lon]. 
- patch_size (tuple[int]) – Patch token size [Lat, Lon]. 
- embed_dim (int) – Patch embedding dimension. Default: 192 
- num_heads (tuple[int]) – Number of attention heads in different layers. 
- window_size (tuple[int]) – Window size. 
 
 - prepare_input(surface, surface_mask, upper_air)[source]#
- Prepares the input to the model in the required shape.
- Parameters:
- surface (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=4. 
- surface_mask (torch.Tensor) – 2D n_lat=721, n_lon=1440, chans=3. 
- upper_air (torch.Tensor) – 3D n_pl=13, n_lat=721, n_lon=1440, chans=5. 
 
- class physicsnemo.models.swinvrnn.swinvrnn.MetaData(
- name: str = 'SwinRNN',
- jit: bool = False,
- cuda_graphs: bool = True,
- amp: bool = True,
- amp_cpu: bool = None,
- amp_gpu: bool = None,
- torch_fx: bool = False,
- bf16: bool = False,
- onnx: bool = False,
- onnx_gpu: bool = True,
- onnx_cpu: bool = False,
- onnx_runtime: bool = True,
- trt: bool = False,
- var_dim: int = 1,
- func_torch: bool = False,
- auto_grad: bool = False,
- Bases: - ModelMetaData
- class physicsnemo.models.swinvrnn.swinvrnn.SwinRNN(*args, **kwargs)[source]#
- Bases: Module
- Implementation of SwinRNN https://arxiv.org/abs/2205.13158
- Parameters:
- img_size (Sequence[int], optional) – Image size [T, Lat, Lon]. 
- patch_size (Sequence[int], optional) – Patch token size [T, Lat, Lon]. 
- in_chans (int, optional) – Number of input channels. 
- out_chans (int, optional) – Number of output channels. 
- embed_dim (int, optional) – Number of embed channels. 
- num_groups (Sequence[int] | int, optional) – Number of groups to separate the channels into. 
- num_heads (int, optional) – Number of attention heads. 
- window_size (int | tuple[int], optional) – Local window size. 
 - forward(x: Tensor)[source]#
- Define the computation performed at every call.
- Should be overridden by all subclasses.
- Note
- Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
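The `img_size` and `patch_size` parameters of SwinRNN together determine how many patch tokens the model works with. The sketch below computes that grid per dimension. It assumes a dimension that is not divisible by its patch size is padded up to the next multiple, and the sizes used are illustrative values rather than documented defaults; `patch_grid` is a hypothetical helper.

```python
def patch_grid(img_size, patch_size):
    """Return the number of patches along each of the [T, Lat, Lon] dimensions."""
    if len(img_size) != len(patch_size):
        raise ValueError("img_size and patch_size must have the same length")
    # Integer ceiling division: assumes padding up to a multiple of the patch size.
    return tuple(-(-size // patch) for size, patch in zip(img_size, patch_size))


grid = patch_grid((2, 721, 1440), (2, 4, 4))
num_tokens = grid[0] * grid[1] * grid[2]
print(grid, num_tokens)
```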