Introduction#

The PhysicsNeMo diffusion framework provides a modular, composable toolkit for building, training, and sampling from diffusion models. It is designed for scientists and engineers who want to apply diffusion-based generative modeling to real-world problems, from weather forecasting and climate downscaling to geophysical inversion and materials design, while remaining flexible enough for research-level experimentation.

Diffusion models learn to generate data by reversing a gradual noising process. During training, the model sees data that has been corrupted by varying amounts of noise, and it learns to predict the clean data (or, equivalently, the noise or the score) from the corrupted version. At inference time, the model starts from pure noise and iteratively removes it, step by step, to produce a new sample.
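The recipe above can be sketched end to end on one-dimensional data without any dependencies. This is a toy illustration, not framework code: it assumes a schedule with \(\sigma(t) = t\) and \(\alpha(t) = 1\), Gaussian data so that the ideal x0-predictor has a closed form (standing in for a trained network), and an explicit Euler integrator. All function names are hypothetical.

```python
import math
import random

MU, S = 2.0, 0.5              # toy data distribution: x0 ~ N(MU, S^2)
T_MAX, T_MIN = 80.0, 0.002    # largest and smallest noise level

def add_noise(x0, t, rng):
    """Forward process with alpha(t) = 1 and sigma(t) = t."""
    return x0 + t * rng.gauss(0.0, 1.0)

def x0_predictor(x, t):
    """Closed-form E[x0 | x_t = x] for Gaussian data (stand-in for a trained model)."""
    return MU + (S**2 / (S**2 + t**2)) * (x - MU)

def sample(rng, num_steps=200):
    """Start from noise at T_MAX and integrate dx/dt = (x - x0_hat) / t down to T_MIN."""
    # Geometrically spaced times from T_MAX down to T_MIN.
    ts = [T_MAX * (T_MIN / T_MAX) ** (i / num_steps) for i in range(num_steps + 1)]
    x = T_MAX * rng.gauss(0.0, 1.0)  # (approximately) pure noise
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * (x - x0_predictor(x, t)) / t  # explicit Euler step
    return x

rng = random.Random(0)
noisy = add_noise(MU, 1.0, rng)  # the kind of input a model sees during training
samples = [sample(rng) for _ in range(4000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((v - mean) ** 2 for v in samples) / len(samples))
print(f"mean={mean:.3f} std={std:.3f}")  # should land close to MU and S
```

In a real application the closed-form predictor is replaced by a trained network, and the hand-written loop by a solver and sampler.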

This basic recipe turns out to be remarkably powerful. Diffusion models achieve state-of-the-art results in image generation and are increasingly used for scientific applications where the goal is to sample from complex, high-dimensional distributions conditioned on physical observations or constraints.

The framework is organized around a small number of clearly defined abstractions. Each abstraction maps to a specific role in the diffusion pipeline, and the abstractions compose naturally.

Framework Components at a Glance#

The table below summarizes the main framework components and their role in training and inference.

| Component | Training | Inference |
|---|---|---|
| Noise schedulers | Sample times, add noise, compute loss weights | Generate time steps, initialize latent state, build denoisers |
| Model backbones | Combined with preconditioner and losses to learn to denoise | Serve as the predictor |
| Preconditioners | Optionally rescale inputs/outputs for stable training | Act as the predictor passed to the denoiser factory |
| Losses | Denoising score matching objective | - |
| Samplers and solvers | - | Numerically integrate the reverse process |
| Guidance | - | Steer sampling toward observations or constraints |
| Multi-diffusion | Patch-based model wrapper and losses | Patch-based denoiser and guidance |

The noise scheduler is a particularly central component that is used in both stages. Its role loosely parallels that of the Scheduler in HuggingFace Diffusers, which similarly encapsulates the noise schedule and provides methods for both training and inference. A key difference is that the PhysicsNeMo noise scheduler also owns the get_denoiser() factory, which converts a predictor into a denoiser suitable for the chosen solver.

Design Philosophy: Layered Customization#

Diffusion models are used across a wide spectrum of applications in scientific machine learning and physics-AI, by users with very different needs, for example:

  • diffusion experts who require full control over the forward process, the solver, or the guidance mechanism

  • domain experts in science and engineering who use diffusion as a tool and need reliable, easy-to-use components

The framework is designed to serve both audiences.

A central design principle of the framework is to offer multiple levels of customization, so that users can choose the trade-off between convenience and flexibility that best fits their needs. Not every abstraction exposes all levels (the exact set depends on the component), but the underlying philosophy is consistent throughout:

Protocols (maximum flexibility). Where appropriate, an abstraction defines a minimal protocol (a set of method signatures with no implementation). Any object that satisfies the protocol can be used within the framework. This is the right level when you want to drop in a fully custom component, for instance, a noise scheduler for a non-Gaussian forward process, or a custom solver.

# Any class with the right methods satisfies the NoiseScheduler protocol
from physicsnemo.diffusion.noise_schedulers import NoiseScheduler

class MyCustomScheduler:
    def sample_time(self, N, *, device=None, dtype=None): ...
    def add_noise(self, x0, time): ...
    def timesteps(self, num_steps, *, device=None, dtype=None): ...
    def init_latents(self, spatial_shape, tN, *, device=None, dtype=None): ...
    def get_denoiser(self, **kwargs): ...
    def loss_weight(self, t): ...

assert isinstance(MyCustomScheduler(), NoiseScheduler)  # True
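The isinstance check above implies that NoiseScheduler is a runtime-checkable protocol. The standard-library sketch below shows how such a check behaves, using a hypothetical two-method ToyScheduler protocol rather than the framework's own class. Note that the check only verifies that the methods exist. It does not validate their signatures or return types.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class ToyScheduler(Protocol):
    """Hypothetical two-method protocol (not the framework's NoiseScheduler)."""
    def add_noise(self, x0, time): ...
    def timesteps(self, num_steps): ...

class Complete:
    def add_noise(self, x0, time): return x0
    def timesteps(self, num_steps): return list(range(num_steps, 0, -1))

class Incomplete:
    # Missing timesteps(), so it does not satisfy the protocol.
    def add_noise(self, x0, time): return x0

class WrongSignature:
    # Both method names exist, but the signatures do not match the protocol.
    def add_noise(self): return None
    def timesteps(self): return []

print(isinstance(Complete(), ToyScheduler))        # True
print(isinstance(Incomplete(), ToyScheduler))      # False
print(isinstance(WrongSignature(), ToyScheduler))  # True: only names are checked
```

Because signatures are not checked at runtime, a static type checker or a small smoke test is a useful complement when writing a fully custom component.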

Abstract base classes (structured extensibility). Some components provide abstract base classes that implement shared logic. Subclasses only override a few methods to define their variant. For example, LinearGaussianNoiseScheduler handles noise injection, score conversion, and denoiser construction for any linear-Gaussian schedule. You just define \(\alpha(t)\), \(\sigma(t)\), and the discretization.

import torch
from physicsnemo.diffusion.noise_schedulers import LinearGaussianNoiseScheduler

class MyScheduler(LinearGaussianNoiseScheduler):
    # EDM-like schedule: sigma(t) = t, alpha(t) = 1
    def sigma(self, t): return t
    def sigma_inv(self, sigma): return sigma
    def sigma_dot(self, t): return torch.ones_like(t)
    def alpha(self, t): return torch.ones_like(t)
    def alpha_dot(self, t): return torch.zeros_like(t)
    # Discretization, time sampling, and loss weighting are left to the subclass
    def timesteps(self, num_steps, *, device=None, dtype=None): ...
    def sample_time(self, N, *, device=None, dtype=None): ...
    def loss_weight(self, t): ...
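The pattern behind this level can be illustrated with the standard-library abc module. The sketch below is a hypothetical, scalar-valued stand-in for the idea, not the framework's LinearGaussianNoiseScheduler: the base class writes the shared noise-injection logic once in terms of \(\alpha(t)\) and \(\sigma(t)\), and each subclass only supplies those two functions.

```python
import math
from abc import ABC, abstractmethod

class ToyLinearGaussianScheduler(ABC):
    """Hypothetical base class: shared logic expressed through alpha and sigma."""

    @abstractmethod
    def alpha(self, t): ...

    @abstractmethod
    def sigma(self, t): ...

    def add_noise(self, x0, t, eps):
        # x_t = alpha(t) * x0 + sigma(t) * eps, identical for every subclass
        return self.alpha(t) * x0 + self.sigma(t) * eps

class ToyEDM(ToyLinearGaussianScheduler):
    # Variance-exploding, EDM-like schedule
    def alpha(self, t): return 1.0
    def sigma(self, t): return t

class ToyVP(ToyLinearGaussianScheduler):
    # Variance-preserving schedule on t in (0, 1): alpha^2 + sigma^2 = 1
    def alpha(self, t): return math.cos(0.5 * math.pi * t)
    def sigma(self, t): return math.sin(0.5 * math.pi * t)

print(ToyEDM().add_noise(1.0, 2.0, 0.5))  # 1.0 * 1.0 + 2.0 * 0.5
print(ToyVP().add_noise(1.0, 0.3, 0.5))
```

Instantiating the base class directly raises a TypeError, which is how the abstract-base-class level signals that a variant is incomplete.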

Ready-to-use components (zero boilerplate). The framework ships fully configured components that work out of the box. Most can still be subclassed for light customization.

from physicsnemo.diffusion.noise_schedulers import EDMNoiseScheduler

scheduler = EDMNoiseScheduler(sigma_min=0.002, sigma_max=80.0, rho=7)
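The parameter names sigma_min, sigma_max, and rho match the time-step discretization proposed in the EDM paper (Karras et al., 2022), which spaces noise levels uniformly in \(\sigma^{1/\rho}\). The sketch below implements that published formula in plain Python. Whether EDMNoiseScheduler uses exactly this discretization internally is an assumption, and karras_timesteps is a hypothetical helper name.

```python
def karras_timesteps(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, uniform in sigma**(1/rho)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    # Interpolate linearly in the warped space, then map back with the power rho.
    return [(hi + i / (num_steps - 1) * (lo - hi)) ** rho for i in range(num_steps)]

steps = karras_timesteps(5)
print([round(s, 4) for s in steps])  # decreasing from sigma_max to sigma_min
```

Larger values of rho concentrate more steps near sigma_min, where the sampling trajectory usually needs finer resolution.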

Each component’s documentation describes which of these levels it supports. For example, noise schedulers offer all three levels, while solvers provide a protocol and concrete implementations, and preconditioners provide an abstract base class and concrete implementations.

Core Concepts: DiffusionModel, Predictor, and Denoiser#

The framework defines three protocol classes that capture the key signatures involved in a diffusion pipeline. Understanding the distinction between them is essential.

DiffusionModel (DiffusionModel)

The interface for models during training. A DiffusionModel takes the noisy state \(\mathbf{x}_t\), the diffusion time \(t\), and optional conditioning information, and returns a prediction. The prediction target is not fixed by the interface: it can be the clean data \(\hat{\mathbf{x}}_0\), the score \(\nabla_{\mathbf{x}} \log p(\mathbf{x}_t)\), the noise \(\boldsymbol{\epsilon}\), or the velocity \(\mathbf{v}\). Which target the model predicts depends on the training objective and the choice of preconditioner.

Predictor (Predictor)

The interface for trained models during inference. A Predictor is a callable (x, t) -> prediction that does not require conditioning as a separate argument. A predictor is typically obtained from a DiffusionModel by binding the conditioning using functools.partial, but it can be anything: an x0-predictor, a score-predictor, a guidance-augmented predictor, or any combination. The type of prediction a predictor returns depends on how the underlying model was trained. Importantly, not all predictors originate from a trained model. For example, DPS-style guidance predictors are computed on the fly during sampling and always produce a score. Although all predictors share the same (x, t) -> prediction signature, it is your responsibility to know what kind of prediction is returned and to use it accordingly (for example, passing an x0-predictor versus a score-predictor to the noise scheduler’s get_denoiser factory).

Denoiser (Denoiser)

The update function consumed by solvers during sampling. A denoiser is obtained from a predictor using the noise scheduler’s get_denoiser() factory. For more details on how the denoiser fits in the sampling loop, refer to the samplers documentation.

These three types form a pipeline:

Training:   data  ->  NoiseScheduler.add_noise  ->  DiffusionModel  ->  Loss

Inference:  DiffusionModel  ->  partial(...) / closure  ->  Predictor
            Predictor  ->  (optional: + Guidance)       ->  Predictor
            Predictor  ->  (optional: x0 <-> score)     ->  Predictor
            Predictor  ->  NoiseScheduler.get_denoiser  ->  Denoiser
            Denoiser  ->  Solver.step  (sample loop)    ->  samples

In the inference pipeline, partial(...) binds the conditioning into the predictor (a closure or wrapper function achieves the same result). Guidance is optionally composed at the predictor level—for example, DPS guidance combines an x0-predictor with observation constraints to produce a guided score-predictor. When the predictor type does not match what the denoiser factory expects (for example, the model was trained as an x0-predictor but guidance produces a score), the noise scheduler can convert between the two via x0_to_score / score_to_x0.
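The inference stages can be traced with plain Python scalars. The sketch below is illustrative only: it assumes \(\sigma(t) = t\) and \(\alpha(t) = 1\), uses a toy closed-form model in place of a trained network, and adds a deliberately simple guidance term (a pull toward a hypothetical observation y_obs) that is not the DPS algorithm. None of the function names are framework API.

```python
from functools import partial

def model(x, t, condition=None):
    """Toy DiffusionModel: x0-prediction for data ~ N(condition, 1) with sigma(t) = t."""
    mean = 0.0 if condition is None else condition
    return mean + (x - mean) / (1.0 + t**2)

# 1. Bind the conditioning: DiffusionModel -> Predictor with signature (x, t).
x0_predictor = partial(model, condition=3.0)

# 2. Convert the x0-predictor into a score-predictor (alpha = 1, sigma = t).
def score_predictor(x, t):
    return (x0_predictor(x, t) - x) / t**2

# 3. Compose guidance at the predictor level by adding a second score term.
#    Hypothetical guidance: pull x toward an observation y_obs.
def guided_score_predictor(x, t, y_obs=4.0, obs_var=0.25):
    return score_predictor(x, t) + (y_obs - x) / (obs_var + t**2)

# 4. Predictor -> Denoiser: probability-flow ODE right-hand side, dx/dt = -t * score.
def denoiser(x, t):
    return -t * guided_score_predictor(x, t)

# 5. One explicit Euler solver step from t to t_next.
def euler_step(x, t, t_next):
    return x + (t_next - t) * denoiser(x, t)

x_next = euler_step(5.0, 2.0, 1.5)
print(x_next)  # moves from 5.0 toward the region favored by the model and y_obs
```

Every stage before step 4 keeps the same (x, t) signature, which is what lets guidance and conversions be stacked freely before the predictor is handed to the denoiser factory.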

Prediction Types#

Diffusion models can be trained to predict different targets. The PhysicsNeMo framework currently supports three prediction types, enumerated by the PredictorType alias:

  • x0-predictor ("x0"): The model estimates the clean data \(\hat{\mathbf{x}}_0\) from the noisy state \(\mathbf{x}_t\). This is the most common choice when using a preconditioner.

  • Score-predictor ("score"): The model estimates the score function \(\nabla_{\mathbf{x}} \log p(\mathbf{x}_t)\).

  • Epsilon-predictor ("epsilon"): The model estimates the noise \(\hat{\boldsymbol{\epsilon}}\) such that \(\mathbf{x}_t = \alpha(t)\mathbf{x}_0 + \sigma(t)\boldsymbol{\epsilon}\).

For linear-Gaussian noise schedules these three representations are analytically interchangeable, and the framework handles the conversion internally when building a denoiser for sampling. For other schedule families, the conversion depends on the specific formulation and may need to be handled by the noise scheduler implementation.
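For reference, the conversions follow directly from the forward relation \(\mathbf{x}_t = \alpha(t)\mathbf{x}_0 + \sigma(t)\boldsymbol{\epsilon}\). Solving it for the noise, and using the fact that the score of the Gaussian perturbation kernel is \(-\boldsymbol{\epsilon}/\sigma(t)\), gives the standard identities below. They are stated here as a derivation sketch, independent of how the framework implements them:

\[\hat{\boldsymbol{\epsilon}} = \frac{\mathbf{x}_t - \alpha(t)\,\hat{\mathbf{x}}_0}{\sigma(t)}, \qquad \hat{\mathbf{x}}_0 = \frac{\mathbf{x}_t - \sigma(t)\,\hat{\boldsymbol{\epsilon}}}{\alpha(t)}\]

\[\nabla_{\mathbf{x}} \log p(\mathbf{x}_t) \approx -\frac{\hat{\boldsymbol{\epsilon}}}{\sigma(t)} = \frac{\alpha(t)\,\hat{\mathbf{x}}_0 - \mathbf{x}_t}{\sigma(t)^2}\]

With \(\alpha(t) = 1\) and \(\sigma(t) = t\), the last expression reduces to \((\hat{\mathbf{x}}_0 - \mathbf{x}_t)/t^2\).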

Other prediction types (e.g. velocity-predictor) can be supported by implementing a custom NoiseScheduler that handles the appropriate conversions in its get_denoiser() method, and by extending PredictorType accordingly.

API Reference#

DiffusionModel#

class physicsnemo.diffusion.DiffusionModel(*args, **kwargs)

Protocol defining the common interface for diffusion models.

A diffusion model is any neural network or function that transforms a noisy state x at diffusion time (or noise level) t into a prediction. This protocol defines the standard interface that all diffusion models must satisfy.

Any model or function that implements this interface can be used with preconditioners, losses, samplers, and other diffusion utilities.

The interface is prediction-agnostic: whether your model predicts clean data (\(\mathbf{x}_0\)), noise (\(\boldsymbol{\epsilon}\)), score (\(\nabla \log p\)), or velocity (\(\mathbf{v}\)), the signature remains the same.

The interface supports both conditional and unconditional diffusion models. The condition argument supports different conditioning scenarios:

  • torch.Tensor: Use when there is a single conditioning tensor (e.g., a class embedding or a single image).

  • TensorDict: Use when multiple conditioning tensors are needed, possibly with different shapes. The string keys can be used to provide semantic information about each conditioning tensor.

  • None: Use for unconditional generation or specific scenarios like classifier-free guidance where the model should ignore conditioning.

Examples

>>> import torch
>>> import torch.nn.functional as F
>>> from physicsnemo.diffusion import DiffusionModel
>>>
>>> class Model:
...     def __call__(self, x, t, condition=None, **kwargs):
...         return F.relu(x)
...
>>> isinstance(Model(), DiffusionModel)
True
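The None case matters for classifier-free guidance, where the same model is evaluated with and without conditioning and the two predictions are blended. The sketch below uses plain scalars and a toy model. The blending rule, uncond + w * (cond - uncond), is the commonly used formulation (some references parameterize the scale differently), and cfg_predictor is a hypothetical helper, not a framework function.

```python
def model(x, t, condition=None):
    """Toy DiffusionModel: shrinks x toward the condition, or toward 0 when it is None."""
    target = 0.0 if condition is None else condition
    return target + (x - target) / (1.0 + t**2)

def cfg_predictor(x, t, condition, guidance_scale):
    """Blend conditional and unconditional predictions from the same model."""
    uncond = model(x, t, condition=None)       # model ignores conditioning
    cond = model(x, t, condition=condition)    # model uses conditioning
    return uncond + guidance_scale * (cond - uncond)

x, t = 1.0, 2.0
print(cfg_predictor(x, t, condition=3.0, guidance_scale=0.0))  # unconditional prediction
print(cfg_predictor(x, t, condition=3.0, guidance_scale=1.0))  # conditional prediction
print(cfg_predictor(x, t, condition=3.0, guidance_scale=2.0))  # extrapolates past conditional
```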

Predictor#

class physicsnemo.diffusion.Predictor(*args, **kwargs)

Protocol defining a predictor interface for diffusion models.

A predictor is any callable that takes a noisy state x and diffusion time t, and returns a prediction about the clean data or the noise. Common types of predictors include the x0-predictor (predicts the clean data \(\mathbf{x}_0\)), the score-predictor, the noise-predictor (predicts the noise \(\boldsymbol{\epsilon}\)), and the velocity-predictor.

This protocol is generic and does not assume any specific type of prediction. A predictor can be a trained neural network, a guidance function (e.g., classifier-free guidance, DPS-style guidance), or any combination thereof. The exact meaning of the output depends on the predictor type and how it is used. Any callable that implements this interface can be used as a predictor in sampling utilities.

This protocol is typically used during inference. For training, which often requires additional inputs like conditioning, use the more general DiffusionModel protocol instead. A Predictor can be obtained from a DiffusionModel by partially applying the condition and any other keyword arguments using functools.partial.

Relationship to Denoiser:

A Denoiser is the update function used during sampling (e.g., the right-hand side of an ODE/SDE). It is obtained from a Predictor via the get_denoiser() factory. A typical case is ODE/SDE-based sampling, where one solves:

\[\frac{d\mathbf{x}}{dt} = D(\mathbf{x}, t;\, P(\mathbf{x}, t))\]

where \(P\) is the predictor and \(D\) is the denoiser that wraps it. This equation captures the essence of how these two concepts are related in the framework.

See also

Denoiser

The interface for sampling update functions.

get_denoiser()

Factory to convert a predictor into a denoiser.

Examples

Example 1: Convert a trained conditional model into a predictor using functools.partial:

>>> import torch
>>> from functools import partial
>>> from tensordict import TensorDict
>>> from physicsnemo.diffusion import Predictor
>>>
>>> class MyModel:
...     def __call__(self, x, t, condition=None):
...         # x0-predictor: returns estimate of clean data
...         # (here assumes conditional normal distribution N(x|y))
...         t_bc = t.view(-1, *([1] * (x.ndim - 1)))
...         return x / (1 + t_bc**2) + condition["y"]
...
>>> model = MyModel()
>>> cond = TensorDict({"y": torch.randn(2, 4)}, batch_size=[2])
>>> x0_predictor = partial(model, condition=cond)
>>> isinstance(x0_predictor, Predictor)
True

Example 2: Convert the x0-predictor above into a score-predictor (using a simple EDM-like schedule where \(\sigma(t) = t\) and \(\alpha(t) = 1\)):

>>> def x0_to_score(x0, x_t, t):
...     # Broadcast t over all non-batch dimensions of x_t
...     sigma_sq = t.view(-1, *([1] * (x_t.ndim - 1))) ** 2
...     return (x0 - x_t) / sigma_sq
>>>
>>> def score_predictor(x, t):
...     x0_pred = x0_predictor(x, t)
...     return x0_to_score(x0_pred, x, t)
>>>
>>> isinstance(score_predictor, Predictor)
True

Denoiser#

class physicsnemo.diffusion.Denoiser(*args, **kwargs)

Protocol defining a denoiser interface for diffusion model sampling.

A denoiser is the update function used during sampling. It takes a noisy state x and diffusion time t, and returns the update term consumed by a Solver. For continuous-time methods this is typically the right-hand side of the ODE/SDE, but the interface is generic and can support other sampling methods as well.

This is the interface used by Solver classes and the sample() function. Any callable that implements this interface can be used as a denoiser.

Important distinction from Predictor:

  • A Predictor is any callable that outputs a raw prediction (e.g., clean data \(\mathbf{x}_0\), score, guidance signal, etc.).

  • A Denoiser is the update function derived from one or more predictors, used directly by the solver during sampling.

Typical workflow:

  1. Start with one or more Predictor instances (e.g., a trained model)

  2. Optionally combine predictors (e.g., conditional + guidance scores)

  3. Convert to a Denoiser using get_denoiser()

  4. Pass the denoiser to sample() together with a Solver

See also

Predictor

The interface for raw predictions.

get_denoiser()

Factory to convert a predictor into a denoiser.

sample()

The sampling function that uses this denoiser interface.

Examples

Manually creating a denoiser from an x0-predictor using a simple EDM-like schedule (\(\sigma(t)=t\), \(\alpha(t)=1\)):

>>> import torch
>>> from physicsnemo.diffusion import Denoiser
>>>
>>> # Start from a predictor (x0-predictor)
>>> def x0_predictor(x, t):
...     t_bc = t.view(-1, *([1] * (x.ndim - 1)))
...     return x / (1 + t_bc**2)
>>>
>>> # Build a denoiser (ODE RHS) from scratch:
>>> # score = (x0 - x) / sigma^2,  ODE RHS = -0.5 * g^2 * score
>>> # For EDM: sigma = t, g^2 = 2*t, so RHS = -t * score = (x - x0) / t
>>> def my_denoiser(x, t):
...     x0 = x0_predictor(x, t)
...     t_bc = t.view(-1, *([1] * (x.ndim - 1)))
...     return (x - x0) / t_bc
...
>>> isinstance(my_denoiser, Denoiser)
True
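To make the last step of the workflow concrete, the sketch below shows how a solver step might consume a denoiser, using plain scalars. It assumes \(\sigma(t) = t\), \(\alpha(t) = 1\), and the ODE right-hand side \((x - \hat{x}_0)/t\). The functions euler_step, heun_step, and integrate are hypothetical helpers, not the framework's Solver or sample() API. Heun's method is the standard second-order predictor-corrector scheme. A closed-form solution for Gaussian data serves as the reference.

```python
import math

MU, S = 0.0, 1.0  # toy data distribution N(MU, S^2)

def x0_predictor(x, t):
    """Closed-form E[x0 | x_t = x] for Gaussian data."""
    return MU + (S**2 / (S**2 + t**2)) * (x - MU)

def denoiser(x, t):
    """ODE right-hand side dx/dt = (x - x0_hat) / t."""
    return (x - x0_predictor(x, t)) / t

def euler_step(denoiser, x, t, t_next):
    """First-order step: one denoiser evaluation."""
    return x + (t_next - t) * denoiser(x, t)

def heun_step(denoiser, x, t, t_next):
    """Second-order step: average the slopes at t and at the Euler-predicted point."""
    d = denoiser(x, t)
    x_euler = x + (t_next - t) * d
    d_next = denoiser(x_euler, t_next)
    return x + (t_next - t) * 0.5 * (d + d_next)

def integrate(step, x, ts):
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = step(denoiser, x, t, t_next)
    return x

# 20 geometrically spaced steps from t = 80 down to t = 0.002.
ts = [80.0 * (0.002 / 80.0) ** (i / 20) for i in range(21)]
x_start = 40.0
# Exact solution of this linear ODE: (x - MU) scales with sqrt(S^2 + t^2).
exact = MU + (x_start - MU) * math.sqrt((S**2 + ts[-1] ** 2) / (S**2 + ts[0] ** 2))
euler_err = abs(integrate(euler_step, x_start, ts) - exact)
heun_err = abs(integrate(heun_step, x_start, ts) - exact)
print(f"euler_err={euler_err:.4f} heun_err={heun_err:.4f}")
```

Both steps see the denoiser only through its (x, t) signature, which is why the same denoiser can be reused across solvers of different order.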

PredictorType#