Model Architectures#

PhysicsNeMo, built on PyTorch, gives developers the flexibility to create and train any model architecture. The following classes of fine-tuned and optimized model architecture blocks, such as transformer layers and GNN layers, serve as off-the-shelf building blocks that developers can configure to develop their own proprietary models. Additionally, PhysicsNeMo provides a library of application-specific training recipes that use these model architectures for various use cases; each serves as a reference implementation of an end-to-end training pipeline using a specific architecture for a particular use case.

Graph Neural Networks#

NVIDIA PhysicsNeMo has extensive and versatile support for Graph Neural Networks (GNNs) across a wide range of physics simulations, from fluid dynamics and structural mechanics to large-scale weather forecasting. The framework leverages GNNs primarily for their ability to operate on unstructured or irregular meshes, a significant advantage over grid-based models like CNNs. It provides robust support for state-of-the-art GNN architectures like MeshGraphNet and GraphCast. These models are applied to a diverse set of problems, showcasing their strength in handling irregular meshes, particle-based systems, transient dynamics, and large-scale distributed computing.
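As an illustration, the sketch below instantiates the MeshGraphNet building block shipped with PhysicsNeMo. The import path follows the library’s module layout, but the constructor arguments and feature dimensions shown are placeholders that vary by use case and release.

```python
import torch

# Minimal usage sketch; the constructor arguments shown here are
# illustrative and may differ between PhysicsNeMo releases.
from physicsnemo.models.meshgraphnet import MeshGraphNet

model = MeshGraphNet(
    input_dim_nodes=4,   # e.g., node type + 2D velocity + pressure
    input_dim_edges=3,   # e.g., relative displacement + edge length
    output_dim=2,        # e.g., predicted per-node velocity update
)
# The forward pass takes per-node features, per-edge features, and a graph
# object describing the mesh connectivity (see the MeshGraphNet tutorial).
```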

Fig. 1 Graph partitioning and halo regions for handling massive graphs in PhysicsNeMo#

Table 2 GNN features and use cases#

Recipe: Time-dependent simulation on an irregular 2D mesh
Architecture: MeshGraphNet
Data structures: Loads time-series data from TFRecord files.
Scaling: Multi-GPU/multi-node training (DDP).

Recipe: Solves the shallow water equations on a sphere
Architecture: GraphCast
Data structures: On-the-fly data generation on an icosahedral mesh.
Scaling: Distributed message passing.

Recipe: Predicts surface forces on automotive geometries
Architecture: X-MeshGraphNet, a multi-scale extension of MeshGraphNet that partitions large graphs and incorporates halo regions
Data structures: Constructs custom graphs directly from tessellated geometry using point clouds and k-nearest neighbors; also builds multi-scale graphs.
Scaling: Scales to meshes of 100 million cells or more.

Recipe: Simulates a deforming plate with varying geometries and boundary conditions
Architecture: Hybrid MeshGraphNet
Data structures: Uses a heterogeneous graph with multiple edge types to handle complex boundary conditions.
Scaling: Multi-GPU/multi-node training (DDP).

Recipe: Models particle-based systems such as fluids or granular materials
Architecture: MeshGraphNet
Data structures: Handles point-cloud-like data where the connectivity between nodes (particles) can change over time.
Scaling: Multi-GPU/multi-node training (DDP).

Recipe: Global weather prediction for up to 10 days
Architecture: GraphCast, with a transient, auto-regressive approach
Data structures: Multi-scale icosahedral mesh.
Scaling: Multi-GPU (DDP).
Perf: Gradient checkpointing, concatenation trick, fused SiLU, multi-step rollout.

For a detailed tutorial on Mesh Graph Networks in PhysicsNeMo, refer to MeshGraphNet: A Practical User Tutorial.

Transformers#

NVIDIA PhysicsNeMo supports transformer-based neural operators, exemplified by the Transolver model, for simulating complex physics problems such as external aerodynamics in CFD. This is a cutting-edge approach that moves beyond traditional CNNs and GNNs, showcasing PhysicsNeMo’s support for building powerful surrogate models from sophisticated transformer layers. Transformers excel at modeling the long-range interactions that are crucial in many physics problems, particularly in fluid dynamics. By treating the input as an unordered set of points, these models can handle complex geometries more flexibly than grid-based models like CNNs. PhysicsNeMo offers optimized training recipes for transformer architectures at scale.

Fig. 2 Optimizing for Tensor cores using Transformer Engine in PhysicsNeMo#
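As a concrete illustration of the Transformer Engine integration shown in Fig. 2, the sketch below wraps a stock Transformer Engine layer in an FP8 autocast region. This is a minimal, generic Transformer Engine example, not the Transolver recipe itself; the layer sizes and recipe settings are placeholder values.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# FP8 recipe: delayed scaling with the hybrid E4M3/E5M2 format.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# A stock Transformer Engine block; sizes here are placeholders.
layer = te.TransformerLayer(
    hidden_size=1024,
    ffn_hidden_size=4096,
    num_attention_heads=16,
).cuda()

x = torch.randn(128, 8, 1024, device="cuda")  # (sequence, batch, hidden)

# GEMMs inside this context run in FP8 on GPUs that support it.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```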

In addition to the following recipes that use the transformer architecture, other recipes use particular transformer layers, such as self-attention, in weather forecasting (GraphCast), weather diagnostic models (AFNO), and many of the diffusion models (see below).

Table 3 Transformers features and use cases#

Recipe: Predict surface forces on automotive geometries
Architecture: Transolver
Data structures: Irregular mesh; applies PhysicsAttention to surface mesh data and on-the-fly signed distance field values.
Scaling: Multi-GPU (DDP), domain parallel.
Perf: Transformer Engine gives a 25% speedup for large models; supports training and inference in fp8 on the latest NVIDIA GPUs.

Recipe: Transolver for Darcy Flow
Architecture: Transolver
Data structures: 2D image data.
Scaling: Multi-GPU (DDP).

Recipe: Solving the 2D Navier-Stokes equations
Architecture: DPOT
Data structures: Uses spectral Fourier attention to build a PDE foundation model.
Scaling: Single GPU; multi-GPU support coming.

Neural Operators#

Neural operators are a class of deep learning models designed to learn mappings between infinite-dimensional function spaces. In engineering and science, this means they learn to solve entire families of Partial Differential Equations (PDEs), making them powerful, mesh-independent tools for accelerating physical simulations. The most prominent neural operator architectures include the Fourier Neural Operator (FNO), the Deep Operator Network (DeepONet), and variants such as the physics-informed FNO (PINO). More recent innovations include DoMINO (Decomposable Multi-scale Iterative Neural Operator), which has shown remarkable generalizability across domains. SciML developers can use all of these architectures out of the box, ready to scale for enterprise-scale AI model development.
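These operators are importable as regular PyTorch modules. The hedged sketch below instantiates the FNO shipped with PhysicsNeMo for a 2D problem; the argument names follow the public constructor but may vary between releases, and the channel counts are placeholders.

```python
import torch

# Assumes the FNO module shipped with PhysicsNeMo; argument names are
# illustrative and may differ between releases.
from physicsnemo.models.fno import FNO

model = FNO(
    in_channels=1,      # input field, e.g., permeability in Darcy flow
    out_channels=1,     # output field, e.g., pressure
    dimension=2,        # 2D structured grid
    latent_channels=32,
    num_fno_layers=4,
    num_fno_modes=12,   # Fourier modes retained per spatial dimension
)

x = torch.randn(8, 1, 64, 64)  # (batch, channels, height, width)
y = model(x)                   # same spatial resolution as the input
```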

Table 4 Neural operator features and use cases#

Recipe: Physics-informed FNO for magnetohydrodynamics
Architecture: FNO, a physics-informed extension using spectral and finite-difference gradients
Data structures: 2D structured grid.
Scaling: Multi-GPU (DDP).

Recipe: Predict surface and volume fields on automotive geometries
Architecture: DoMINO, a first-of-its-kind multi-scale neural operator
Data structures: Point cloud sampled from the CAD model; builds local stencils that alleviate the need for a simulation mesh.
Scaling: Multi-GPU (DDP), domain parallel.
Perf: Tight integration with custom Warp kernels enables point-cloud-to-point-cloud spatial projection, with over 10x better end-to-end performance than brute-force methods in PyTorch.

Recipe: Medium-range weather forecasting
Architecture: SFNO
Data structures: N-D tensor on a structured lat-lon grid.
Scaling: Spatial model parallelism splits both the model and the data across multiple GPUs.
Perf: Various optimizations (automatic mixed precision, activation checkpointing, etc.) to fit into GPU memory.

Recipe: Darcy Flow
Architecture: DeepONet (FNO branch net, fully connected trunk net), physics-informed using automatic differentiation
Data structures: 2D structured tensors; the fully connected component also allows random unstructured sampling.
Scaling: Single GPU.

Diffusion Models#

NVIDIA PhysicsNeMo provides broad support for state-of-the-art diffusion techniques across multiple scientific areas, including computational fluid dynamics (CFD), weather and climate, geophysics, and generative design. It offers ready-to-use diffusion model backbones that can be used directly or combined with the framework’s accompanying utilities—such as samplers, loss functions, and pre-conditioners—to build complete diffusion workflows without starting from scratch. This combination of core architectures and supporting tools makes diffusion a practical and flexible approach for tackling ill-posed inverse problems, enabling applications such as super-resolution, downscaling, and direct geometry synthesis.

Fig. 3 EDM sampler available in PhysicsNeMo#
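For reference, the core of an EDM sampler is compact enough to sketch in a few lines. The following is a minimal, self-contained Heun (second-order) EDM sampling loop in the style of Karras et al. (2022), assuming only a trained denoiser D(x, sigma); it illustrates the algorithm rather than reproducing the exact PhysicsNeMo implementation.

```python
import torch

@torch.no_grad()
def edm_sampler(denoiser, latents, num_steps=18,
                sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Minimal Heun (2nd order) EDM sampler after Karras et al. (2022).

    `denoiser(x, sigma)` is any EDM-preconditioned model returning the
    denoised estimate of x at noise level sigma.
    """
    # Noise-level schedule sigma_i for i = 0..num_steps, ending at 0.
    step = torch.arange(num_steps, dtype=torch.float64, device=latents.device)
    t = (sigma_max ** (1 / rho) + step / (num_steps - 1)
         * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
    t = torch.cat([t, torch.zeros(1, dtype=torch.float64, device=latents.device)])

    x = latents.to(torch.float64) * t[0]
    for i in range(num_steps):
        t_cur, t_next = t[i], t[i + 1]
        # Euler step along the probability-flow ODE: dx/dt = (x - D(x, t)) / t
        d_cur = (x - denoiser(x.float(), t_cur.float()).to(torch.float64)) / t_cur
        x_next = x + (t_next - t_cur) * d_cur
        # Second-order (Heun) correction, skipped at the final step (t_next = 0).
        if t_next > 0:
            d_next = (x_next - denoiser(x_next.float(), t_next.float()).to(torch.float64)) / t_next
            x_next = x + (t_next - t_cur) * 0.5 * (d_cur + d_next)
        x = x_next
    return x.float()
```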

Table 5 Diffusion model features and use cases#

Recipe: Emulate a convection-allowing model at a few kilometers of resolution
Architecture: Diffusion UNet
Sampler: EDM.
Data structures: N-D tensor on a structured lat-lon grid.
Scaling: Multi-GPU (DDP).
Perf: U-Net architecture optimized for compilation with fused operations; uses Apex GroupNorm and bf16 AMP.

Recipe: Generate km-scale weather predictions from a coarse simulation
Architecture: Diffusion UNet
Sampler: EDM.
Data structures: N-D tensor on a structured lat-lon grid.
Scaling: Multi-GPU (DDP); can be scaled to very large domains (over 2000x2000 pixels) with multi-diffusion.
Perf: Architecture optimized for compilation with fused operations; uses Apex GroupNorm and bf16 AMP. Training supports gradient accumulation and patch-wise gradient accumulation (for multi-diffusion settings).

Recipe: Reconstruct a subsurface velocity model from recorded seismic waveforms
Architecture: Diffusion UNet + Global Filter Network
Sampler: EDM with physics-informed Diffusion Posterior Sampling (DPS).
Data structures: 2D structured data (subsurface model) plus an unstructured set of N channels of 1D data (seismic observations).
Scaling: Multi-GPU (DDP).
Perf: Architectures partially optimized for compilation with fused ops; uses Apex GroupNorm and the fused Adam optimizer.

Recipe: Generate new, unique topologies that satisfy specific engineering constraints
Architecture: Diffusion UNet + custom UNet encoder
Sampler: Custom DDPM with Diffusion Posterior Sampling (DPS).
Data structures: 2D data on a structured grid.
Scaling: Single GPU.

Voxel based models#

Voxel-based models are a class of deep learning models that predict the results of CAE simulations by treating the 3D physical domain as a regular grid of “3D pixels,” or voxels. This approach transforms a mesh-based physics problem into a 3D image-processing task, which is ideal for leveraging the power of Convolutional Neural Networks (CNNs). It allows engineers to apply powerful and highly optimized computer vision architectures, such as the U-Net, directly to complex physics problems. The U-Net is the most prominent and effective of these architectures; it uses an encoder-decoder structure with skip connections to efficiently capture geometric features at multiple scales and reconstruct a detailed output field.
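To make the encoder-decoder pattern concrete, below is a minimal, self-contained 3D U-Net sketch in PyTorch with a single downsampling level and one skip connection. It is purely illustrative; PhysicsNeMo’s optimized U-Net backbones are substantially more elaborate.

```python
import torch
import torch.nn as nn

class TinyUNet3D(nn.Module):
    """Illustrative 3D U-Net: one downsampling level and one skip
    connection, showing the encoder-decoder pattern described above."""

    def __init__(self, in_ch=1, out_ch=1, width=16):
        super().__init__()
        def block(ci, co):
            return nn.Sequential(
                nn.Conv3d(ci, co, 3, padding=1), nn.GELU(),
                nn.Conv3d(co, co, 3, padding=1), nn.GELU(),
            )
        self.enc = block(in_ch, width)
        self.down = nn.MaxPool3d(2)
        self.mid = block(width, 2 * width)
        self.up = nn.ConvTranspose3d(2 * width, width, 2, stride=2)
        self.dec = block(2 * width, width)   # 2x width: upsampled + skip
        self.head = nn.Conv3d(width, out_ch, 1)

    def forward(self, x):
        s = self.enc(x)                        # full-resolution features (skip)
        h = self.mid(self.down(s))             # coarse-scale features
        h = torch.cat([self.up(h), s], dim=1)  # skip connection
        return self.head(self.dec(h))

# e.g., a 64^3 voxel grid with one input channel (such as a signed distance field)
y = TinyUNet3D()(torch.randn(2, 1, 64, 64, 64))   # -> (2, 1, 64, 64, 64)
```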

PhysicsNeMo provides a set of optimized U-Net backbones that can be used to develop custom proprietary models. Refer to the diffusion models section, where the U-Net is also used heavily; the table below lists other key U-Net-based training recipes and their key features:

Table 6 Voxel based model features and use cases#

Recipe: Thermal and airflow surrogate model for data center design
Architecture: 3D UNet, physics-informed using finite differences
Data structures: Structured 3D grids representing volume fields.
Scaling: Multi-GPU (DDP).

Recipe: Predict surface forces on automotive geometries
Architecture: FigConvUNet
Data structures: Point cloud sampled from the 3D CAD model or the simulation mesh.
Scaling: Multi-GPU (DDP).
Perf: Supports AMP (fp16/bf16).

Recipe: Predict the evolution of a 3D Gray-Scott system given the initial condition
Architecture: RNN + UNet
Data structures: 4D data (a time dimension plus three spatial dimensions representing volume fields on structured grids).
Scaling: Single GPU.