Introduction
The PhysicsNeMo diffusion framework provides a modular, composable toolkit for building, training, and sampling from diffusion models. It is designed for scientists and engineers who want to apply diffusion-based generative modeling to real-world problems, from weather forecasting and climate downscaling to geophysical inversion and materials design, while remaining flexible enough for research-level experimentation.
Diffusion models learn to generate data by reversing a gradual noising process. During training, the model sees data that has been corrupted by varying amounts of noise, and it learns to predict the clean data (or, equivalently, the noise or the score) from the corrupted version. At inference time, the model starts from pure noise and iteratively removes it, step by step, to produce a new sample.
This basic recipe turns out to be remarkably powerful. Diffusion models achieve state-of-the-art results in image generation and are increasingly used for scientific applications where the goal is to sample from complex, high-dimensional distributions conditioned on physical observations or constraints.
The framework is organized around a small number of clearly defined abstractions. Each abstraction maps to a specific role in the diffusion pipeline, and the abstractions compose naturally. The abstractions include:
- a noise scheduler to control the forward and reverse processes
- a model backbone to implement the neural network
- a preconditioner to rescale model inputs and outputs for stable training
- a loss function to define the training objective
- a sampler to generate new data at inference time
- a guidance mechanism to steer sampling toward desired properties
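To make the training-side roles concrete, here is a minimal sketch of a single training step in plain Python. It is not framework code: every function name is hypothetical, the forward process assumes \(\alpha(t) = 1\) and \(\sigma(t) = t\), and the log-normal noise sampling and loss weighting follow the EDM recipe of Karras et al. (2022), which may differ from what a given PhysicsNeMo scheduler or loss uses.

```python
import math
import random

SIGMA_DATA = 0.5  # assumed standard deviation of the clean data


def sample_sigma(rng, p_mean=-1.2, p_std=1.2):
    """Scheduler role: draw a noise level from a log-normal distribution."""
    return math.exp(rng.gauss(p_mean, p_std))


def add_noise(x0, sigma, rng):
    """Scheduler role: forward process x_t = x0 + sigma * eps (alpha = 1)."""
    return [v + sigma * rng.gauss(0.0, 1.0) for v in x0]


def loss_weight(sigma):
    """Scheduler role: EDM weight (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2."""
    return (sigma**2 + SIGMA_DATA**2) / (sigma * SIGMA_DATA) ** 2


def toy_model(x_t, sigma):
    """Backbone stand-in: optimal x0-predictor for zero-mean Gaussian data."""
    shrink = SIGMA_DATA**2 / (SIGMA_DATA**2 + sigma**2)
    return [shrink * v for v in x_t]


def training_loss(x0, rng):
    """Loss role: weighted squared error between predicted and clean data."""
    sigma = sample_sigma(rng)
    x_t = add_noise(x0, sigma, rng)
    x0_hat = toy_model(x_t, sigma)
    mse = sum((a - b) ** 2 for a, b in zip(x0_hat, x0)) / len(x0)
    return loss_weight(sigma) * mse


rng = random.Random(0)
x0 = [SIGMA_DATA * rng.gauss(0.0, 1.0) for _ in range(8)]
loss = training_loss(x0, rng)
```

In a real setup the toy model would be a neural network, optionally wrapped in a preconditioner, and the loss would be backpropagated through it.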
Framework Components at a Glance
The table below summarizes the main framework components and their role in training and inference.
| Component | Training | Inference |
|---|---|---|
| Noise scheduler | Sample times, add noise, compute loss weights | Generate time-steps, initialize latent state, build denoisers |
| Model backbone | Combined with preconditioner and losses to learn to denoise | Serves as the predictor |
| Preconditioner | Optionally rescales inputs/outputs for stable training | Acts as the predictor passed to the denoiser factory |
| Loss function | Denoising score matching objective | |
| Sampler | | Numerically integrate the reverse process |
| Guidance | | Steer sampling toward observations or constraints |
| Patch-based components | Patch-based model wrapper and losses | Patch-based denoiser and guidance |
The noise scheduler is a particularly central component that is used in both stages. Its role loosely parallels that of the Scheduler in HuggingFace Diffusers, which similarly encapsulates the noise schedule and provides methods for both training and inference. A key difference is that the PhysicsNeMo noise scheduler also owns the get_denoiser() factory, which converts a predictor into a denoiser suitable for the chosen solver.
Design Philosophy: Layered Customization
Diffusion models are used across a wide spectrum of applications in scientific machine learning and physics-AI, by users with very different needs, for example:
- diffusion experts who require full control over the forward process, the solver, or the guidance mechanism
- domain experts in science and engineering who use diffusion as a tool and need reliable, easy-to-use components
The framework is designed to serve both audiences.
A central design principle of the framework is to offer multiple levels of customization, so that users can choose the trade-off between convenience and flexibility that best fits their needs. Not every abstraction exposes all levels (the exact set depends on the component), but the underlying philosophy is consistent throughout:
Protocols (maximum flexibility). Where appropriate, an abstraction defines a minimal protocol (a set of method signatures with no implementation). Any object that satisfies the protocol can be used within the framework. This is the right level when you want to drop in a fully custom component, for instance, a noise scheduler for a non-Gaussian forward process, or a custom solver.
# Any class with the right methods satisfies the NoiseScheduler protocol
from physicsnemo.diffusion.noise_schedulers import NoiseScheduler

class MyCustomScheduler:
    def sample_time(self, N, *, device=None, dtype=None): ...
    def add_noise(self, x0, time): ...
    def timesteps(self, num_steps, *, device=None, dtype=None): ...
    def init_latents(self, spatial_shape, tN, *, device=None, dtype=None): ...
    def get_denoiser(self, **kwargs): ...
    def loss_weight(self, t): ...

assert isinstance(MyCustomScheduler(), NoiseScheduler)  # True
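The isinstance check above succeeds without any inheritance, which is the behavior of Python's typing.Protocol combined with runtime_checkable. The sketch below illustrates that mechanism with a hypothetical two-method protocol named SchedulerLike; it assumes, but does not confirm, that the framework's protocols are built the same way.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SchedulerLike(Protocol):
    """Hypothetical protocol, for illustration only (not a framework class)."""

    def add_noise(self, x0, time): ...

    def timesteps(self, num_steps): ...


class Complete:
    """Implements both methods, with no inheritance from SchedulerLike."""

    def add_noise(self, x0, time):
        return x0

    def timesteps(self, num_steps):
        return list(range(num_steps, 0, -1))


class Incomplete:
    """Lacks timesteps(), so it does not satisfy the protocol."""

    def add_noise(self, x0, time):
        return x0


ok = isinstance(Complete(), SchedulerLike)
missing = isinstance(Incomplete(), SchedulerLike)
```

Note that a runtime protocol check only verifies that the methods exist. It does not validate their signatures or return types, so a class can pass the check and still fail when the pipeline calls it.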
Abstract base classes (structured extensibility). Some components provide abstract base classes that implement shared logic. Subclasses override only a few methods to define their variant. For example, LinearGaussianNoiseScheduler handles noise injection, score conversion, and denoiser construction for any linear-Gaussian schedule. You just define \(\alpha(t)\), \(\sigma(t)\), and the discretization.
import torch
from physicsnemo.diffusion.noise_schedulers import LinearGaussianNoiseScheduler

class MyScheduler(LinearGaussianNoiseScheduler):
    # EDM-like schedule: sigma(t) = t and alpha(t) = 1
    def sigma(self, t): return t
    def sigma_inv(self, sigma): return sigma
    def sigma_dot(self, t): return torch.ones_like(t)
    def alpha(self, t): return torch.ones_like(t)
    def alpha_dot(self, t): return torch.zeros_like(t)
    # Discretization, time sampling, and loss weighting are left to the subclass
    def timesteps(self, num_steps, *, device=None, dtype=None): ...
    def sample_time(self, N, *, device=None, dtype=None): ...
    def loss_weight(self, t): ...
Ready-to-use components (zero boilerplate). The framework ships fully configured components that work out of the box. Most can still be subclassed for light customization.
from physicsnemo.diffusion.noise_schedulers import EDMNoiseScheduler
scheduler = EDMNoiseScheduler(sigma_min=0.002, sigma_max=80.0, rho=7)
Each component’s documentation describes which of these levels it supports. For example, noise schedulers offer all three levels, while solvers provide a protocol and concrete implementations, and preconditioners provide an abstract base class and concrete implementations.
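The sigma_min, sigma_max, and rho arguments in the ready-to-use example match the parameters of the noise-level discretization proposed in the EDM paper (Karras et al., 2022). The sketch below computes that discretization in plain Python. The function name edm_sigma_steps is hypothetical, and whether EDMNoiseScheduler produces exactly these values is an assumption, not something this page confirms.

```python
def edm_sigma_steps(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Return noise levels decreasing from sigma_max to sigma_min.

    sigma_i = (sigma_max^(1/rho)
               + i / (N - 1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    root_max = sigma_max ** (1.0 / rho)
    root_min = sigma_min ** (1.0 / rho)
    return [
        (root_max + i / (num_steps - 1) * (root_min - root_max)) ** rho
        for i in range(num_steps)
    ]


sigmas = edm_sigma_steps(18)
```

Larger values of rho concentrate more steps near sigma_min, where fine detail is resolved, at the expense of steps at high noise levels.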
Core Concepts: DiffusionModel, Predictor, and Denoiser
The framework defines three protocol classes that capture the key signatures involved in a diffusion pipeline. Understanding the distinction between them is essential.
- DiffusionModel: The interface for models during training. A DiffusionModel takes the noisy state \(\mathbf{x}_t\), the diffusion time \(t\), and optional conditioning information, and returns a prediction. The prediction target can be anything: clean data \(\hat{\mathbf{x}}_0\), score \(\nabla_{\mathbf{x}} \log p(\mathbf{x})\), noise \(\boldsymbol{\epsilon}\), or velocity \(\mathbf{v}\). Which target the model predicts depends on the training objective and the choice of preconditioner.
- Predictor: The interface for trained models during inference. A Predictor is a callable (x, t) -> prediction that does not require conditioning as a separate argument. A predictor is typically obtained from a DiffusionModel by binding the conditioning using functools.partial, but it can be anything: an x0-predictor, a score-predictor, a guidance-augmented predictor, or any combination. The type of prediction a predictor returns depends on how the underlying model was trained. Importantly, not all predictors originate from a trained model. For example, DPS-style guidance predictors are computed on the fly during sampling and always produce a score. Although all predictors share the same (x, t) -> prediction signature, it is your responsibility to know what kind of prediction is returned and to use it accordingly (for example, passing an x0-predictor versus a score-predictor to the noise scheduler’s get_denoiser factory).
- Denoiser: The update function consumed by solvers during sampling. A denoiser is obtained from a predictor using the noise scheduler’s get_denoiser() factory. For more details on how the denoiser fits in the sampling loop, refer to the samplers documentation.
These three types form a pipeline:
Training:   data -> NoiseScheduler.add_noise -> DiffusionModel -> Loss
Inference:  DiffusionModel -> partial(...) / closure -> Predictor
            Predictor -> (optional: + Guidance) -> Predictor
            Predictor -> (optional: x0 <-> score) -> Predictor
            Predictor -> NoiseScheduler.get_denoiser -> Denoiser
            Denoiser -> Solver.step (sample loop) -> samples
In the inference pipeline, partial(...) binds the conditioning into the predictor (a closure or wrapper function achieves the same result). Guidance is optionally composed at the predictor level. For example, DPS guidance combines an x0-predictor with observation constraints to produce a guided score-predictor. When the predictor type does not match what the denoiser factory expects (for example, the model was trained as an x0-predictor but guidance produces a score), the noise scheduler can convert between the two via x0_to_score / score_to_x0.
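The inference chain described above can be sketched end to end with scalar stand-ins. This is an illustration only: the toy model, make_score_predictor, and make_ode_denoiser are hypothetical names rather than framework API, the schedule assumes \(\sigma(t) = t\) and \(\alpha(t) = 1\), and the denoiser uses the probability-flow ODE convention \(d\mathbf{x}/dt = -t \, \nabla_{\mathbf{x}} \log p(\mathbf{x}_t)\).

```python
from functools import partial


def model(x, t, condition=None):
    """Toy x0-predictor: posterior mean for data ~ N(condition, 1)."""
    mean = 0.0 if condition is None else condition
    return (x + t**2 * mean) / (1.0 + t**2)


def x0_to_score(x0_hat, x, t):
    """Convert an x0 estimate to a score for sigma(t) = t, alpha(t) = 1."""
    return (x0_hat - x) / t**2


def make_score_predictor(x0_predictor):
    """Wrap an x0-predictor so that it returns a score instead."""
    def score_predictor(x, t):
        return x0_to_score(x0_predictor(x, t), x, t)
    return score_predictor


def make_ode_denoiser(score_predictor):
    """Hypothetical denoiser factory: ODE right-hand side -t * score."""
    def denoiser(x, t):
        return -t * score_predictor(x, t)
    return denoiser


# DiffusionModel -> Predictor: bind the conditioning
x0_predictor = partial(model, condition=3.0)
# Predictor -> Predictor: convert the prediction type
score_predictor = make_score_predictor(x0_predictor)
# Predictor -> Denoiser: build the update function for a solver
denoiser = make_ode_denoiser(score_predictor)
```

Every stage keeps the same (x, t) call signature, which is what lets guidance terms and type conversions be stacked in any order before the denoiser is built.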
Prediction Types
Diffusion models can be trained to predict different targets. The PhysicsNeMo
framework currently supports three prediction types, enumerated by the
PredictorType alias:
- x0-predictor ("x0"): The model estimates the clean data \(\hat{\mathbf{x}}_0\) from the noisy state \(\mathbf{x}_t\). This is the most common choice when using a preconditioner.
- Score-predictor ("score"): The model estimates the score function \(\nabla_{\mathbf{x}} \log p(\mathbf{x}_t)\).
- Epsilon-predictor ("epsilon"): The model estimates the noise \(\hat{\boldsymbol{\epsilon}}\) such that \(\mathbf{x}_t = \alpha(t)\mathbf{x}_0 + \sigma(t)\boldsymbol{\epsilon}\).
For linear-Gaussian noise schedules these three representations are analytically interchangeable, and the framework handles the conversion internally when building a denoiser for sampling. For other schedule families, the conversion depends on the specific formulation and may need to be handled by the noise scheduler implementation.
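For the linear-Gaussian case, the interchangeability follows directly from \(\mathbf{x}_t = \alpha(t)\mathbf{x}_0 + \sigma(t)\boldsymbol{\epsilon}\). The sketch below writes out the standard identities as scalar functions and checks that they round-trip. The function names and signatures are chosen for illustration and are not assumed to match the framework's own conversion methods.

```python
def x0_to_epsilon(x0, x_t, alpha, sigma):
    """Solve x_t = alpha * x0 + sigma * eps for eps."""
    return (x_t - alpha * x0) / sigma


def epsilon_to_x0(eps, x_t, alpha, sigma):
    """Solve x_t = alpha * x0 + sigma * eps for x0."""
    return (x_t - sigma * eps) / alpha


def epsilon_to_score(eps, sigma):
    """Score and noise differ by a factor of -1 / sigma."""
    return -eps / sigma


def x0_to_score(x0, x_t, alpha, sigma):
    """Score expressed through the clean-data estimate."""
    return (alpha * x0 - x_t) / sigma**2


def score_to_x0(score, x_t, alpha, sigma):
    """Inverse of x0_to_score."""
    return (x_t + sigma**2 * score) / alpha


# Round-trip check with arbitrary illustrative values
alpha, sigma = 0.8, 0.6
x0, eps = 1.5, -0.5
x_t = alpha * x0 + sigma * eps
score = x0_to_score(x0, x_t, alpha, sigma)
```

Because each conversion needs \(\alpha(t)\) and \(\sigma(t)\), it is natural for the noise scheduler, which owns the schedule, to perform it.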
Other prediction types (e.g. velocity-predictor) can be supported by implementing a custom NoiseScheduler that handles the appropriate conversions in its get_denoiser() method, and by extending PredictorType accordingly.
API Reference
DiffusionModel
- class physicsnemo.diffusion.DiffusionModel(*args, **kwargs)
Protocol defining the common interface for diffusion models.
A diffusion model is any neural network or function that transforms a noisy state x at diffusion time (or noise level) t into a prediction. This protocol defines the standard interface that all diffusion models must satisfy.
Any model or function that implements this interface can be used with preconditioners, losses, samplers, and other diffusion utilities.
The interface is prediction-agnostic: whether your model predicts clean data (\(\mathbf{x}_0\)), noise (\(\epsilon\)), score (\(\nabla \log p\)), or velocity (\(\mathbf{v}\)), the signature remains the same.
The interface supports both conditional and unconditional diffusion models. The condition argument supports different conditioning scenarios:
- torch.Tensor: Use when there is a single conditioning tensor (e.g., a class embedding or a single image).
- TensorDict: Use when multiple conditioning tensors are needed, possibly with different shapes. The string keys can be used to provide semantic information about each conditioning tensor.
- None: Use for unconditional generation or specific scenarios like classifier-free guidance where the model should ignore conditioning.
Examples
>>> import torch
>>> import torch.nn.functional as F
>>> from physicsnemo.diffusion import DiffusionModel
>>>
>>> class Model:
...     def __call__(self, x, t, condition=None, **kwargs):
...         return F.relu(x)
...
>>> isinstance(Model(), DiffusionModel)
True
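The None conditioning case mentioned above is what classifier-free guidance relies on: the same model is evaluated with and without the condition, and the two outputs are blended. Below is a minimal scalar sketch of that idea, using the common convention in which a guidance scale of 1 recovers the conditional prediction. The toy model and cfg_predictor are hypothetical and do not represent the framework's guidance API.

```python
from functools import partial


def model(x, t, condition=None, **kwargs):
    """Toy model: the condition shifts the output; None means no shift."""
    shift = 0.0 if condition is None else condition
    return x / (1.0 + t**2) + shift


def cfg_predictor(x, t, *, model, condition, guidance_scale):
    """Blend predictions: uncond + scale * (cond - uncond)."""
    cond = model(x, t, condition=condition)
    uncond = model(x, t, condition=None)
    return uncond + guidance_scale * (cond - uncond)


# Binding everything except (x, t) yields a predictor-style callable
predictor = partial(
    cfg_predictor, model=model, condition=2.0, guidance_scale=1.5
)
guided = predictor(4.0, 1.0)
```

A scale above 1 extrapolates past the conditional prediction, which strengthens the effect of the condition at the cost of sample diversity.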
Predictor
- class physicsnemo.diffusion.Predictor(*args, **kwargs)
Protocol defining a predictor interface for diffusion models.
A predictor is any callable that takes a noisy state x and diffusion time t, and returns a prediction about the clean data or the noise. Common types of predictors include x0-predictor (predicts the clean data \(\mathbf{x}_0\)), score-predictor, noise-predictor (predicts the noise \(\boldsymbol{\epsilon}\)), velocity-predictor, etc.
This protocol is generic and does not assume any specific type of prediction. A predictor can be a trained neural network, a guidance function (e.g., classifier-free guidance, DPS-style guidance), or any combination thereof. The exact meaning of the output depends on the predictor type and how it is used. Any callable that implements this interface can be used as a predictor in sampling utilities.
This protocol is typically used during inference. For training, which often requires additional inputs like conditioning, use the more general DiffusionModel protocol instead. A Predictor can be obtained from a DiffusionModel by partially applying the condition and any other keyword arguments using functools.partial.
Relationship to Denoiser:
A Denoiser is the update function used during sampling (e.g., the right-hand side of an ODE/SDE). It is obtained from a Predictor via the get_denoiser() factory. A typical case is ODE/SDE-based sampling, where one solves:
\[\frac{d\mathbf{x}}{dt} = D(\mathbf{x}, t;\, P(\mathbf{x}, t))\]
where \(P\) is the predictor and \(D\) is the denoiser that wraps it. This equation captures the essence of how these two concepts are related in the framework.
See also
Denoiser: The interface for sampling update functions.
get_denoiser(): Factory to convert a predictor into a denoiser.
Examples
Example 1: Convert a trained conditional model into a predictor using functools.partial:
>>> import torch
>>> from functools import partial
>>> from tensordict import TensorDict
>>> from physicsnemo.diffusion import Predictor
>>>
>>> class MyModel:
...     def __call__(self, x, t, condition=None):
...         # x0-predictor: returns estimate of clean data
...         # (here assumes conditional normal distribution N(x|y))
...         t_bc = t.view(-1, *([1] * (x.ndim - 1)))
...         return x / (1 + t_bc**2) + condition["y"]
...
>>> model = MyModel()
>>> cond = TensorDict({"y": torch.randn(2, 4)}, batch_size=[2])
>>> x0_predictor = partial(model, condition=cond)
>>> isinstance(x0_predictor, Predictor)
True
Example 2: Convert the x0-predictor above into a score-predictor (using a simple EDM-like schedule where \(\sigma(t) = t\) and \(\alpha(t) = 1\)):
>>> def x0_to_score(x0, x_t, t):
...     sigma_sq = t.view(-1, 1) ** 2
...     return (x0 - x_t) / sigma_sq
>>>
>>> def score_predictor(x, t):
...     x0_pred = x0_predictor(x, t)
...     return x0_to_score(x0_pred, x, t)
>>>
>>> isinstance(score_predictor, Predictor)
True
Denoiser
- class physicsnemo.diffusion.Denoiser(*args, **kwargs)
Protocol defining a denoiser interface for diffusion model sampling.
A denoiser is the update function used during sampling. It takes a noisy state x and diffusion time t, and returns the update term consumed by a Solver. For continuous-time methods this is typically the right-hand side of the ODE/SDE, but the interface is generic and can support other sampling methods as well.
This is the interface used by Solver classes and the sample() function. Any callable that implements this interface can be used as a denoiser.
Important distinction from Predictor:
- A Predictor is any callable that outputs a raw prediction (e.g., clean data \(\mathbf{x}_0\), score, guidance signal, etc.).
- A Denoiser is the update function derived from one or more predictors, used directly by the solver during sampling.
Typical workflow:
1. Start with one or more Predictor instances (e.g. trained model)
2. Optionally combine predictors (e.g., conditional + guidance scores)
3. Convert to a Denoiser using get_denoiser()
4. Pass the denoiser to sample() together with a Solver
See also
Predictor: The interface for raw predictions.
get_denoiser(): Factory to convert a predictor into a denoiser.
sample(): The sampling function that uses this denoiser interface.
Examples
Manually creating a denoiser from an x0-predictor using a simple EDM-like schedule (\(\sigma(t)=t\), \(\alpha(t)=1\)):
>>> import torch
>>> from physicsnemo.diffusion import Denoiser
>>>
>>> # Start from a predictor (x0-predictor)
>>> def x0_predictor(x, t):
...     t_bc = t.view(-1, *([1] * (x.ndim - 1)))
...     return x / (1 + t_bc**2)
>>>
>>> # Build a denoiser (ODE RHS) from scratch:
>>> # score = (x0 - x) / sigma^2, ODE RHS = -0.5 * g^2 * score
>>> # For EDM: sigma = t, g^2 = 2*t, so RHS = -t * score = (x - x0) / t
>>> def my_denoiser(x, t):
...     x0 = x0_predictor(x, t)
...     t_bc = t.view(-1, *([1] * (x.ndim - 1)))
...     return (x - x0) / t_bc
...
>>> isinstance(my_denoiser, Denoiser)
True
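To show how a solver might consume such a denoiser, here is a minimal scalar sketch of an explicit Euler sampling loop. It is a stand-in, not the framework's Solver or sample() implementation. It assumes \(\sigma(t) = t\), \(\alpha(t) = 1\), unit-variance Gaussian data, a geometric time grid chosen only for illustration, and a denoiser equal to the probability-flow ODE right-hand side \(-t \cdot \text{score} = (x - x_0)/t\).

```python
import math


def x0_predictor(x, t):
    """Exact x0-predictor for data ~ N(0, 1) with sigma(t) = t."""
    return x / (1.0 + t**2)


def denoiser(x, t):
    """ODE right-hand side: -t * score = (x - x0) / t."""
    return (x - x0_predictor(x, t)) / t


def euler_sample(denoiser, x, times):
    """Integrate dx/dt = denoiser(x, t) along a decreasing list of times."""
    for t_cur, t_next in zip(times[:-1], times[1:]):
        x = x + (t_next - t_cur) * denoiser(x, t_cur)
    return x


num_steps, t_max, t_min = 2000, 80.0, 1e-3
# Geometric grid from t_max down to t_min (an illustrative choice)
times = [
    t_max * (t_min / t_max) ** (i / (num_steps - 1))
    for i in range(num_steps)
]

x_start = 40.0  # one draw from the high-noise prior, fixed for clarity
x_end = euler_sample(denoiser, x_start, times)

# For this toy problem the ODE has the closed form x(t) = C * sqrt(1 + t^2)
expected = x_start * math.sqrt((1.0 + t_min**2) / (1.0 + t_max**2))
```

Comparing x_end with the closed-form value is a quick way to sanity-check the sign convention: with the opposite sign, the state grows instead of contracting toward the data scale.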