Diffusion Preconditioners
Preconditioning is an essential technique for improving the performance of diffusion models. It consists of scaling the latent state and the noise level that are passed to a network. Some preconditioning schemes also rescale the output of the network. PhysicsNeMo provides a set of preconditioning classes that wrap backbones or specialized architectures.
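As a rough illustration of the idea, the sketch below follows the general form used in Karras et al. (2022), D(x; sigma) = c_skip * x + c_out * F(c_in * x; c_noise), where F is the raw network. It is plain Python on lists of floats, not the PhysicsNeMo implementation. The names `precondition`, `toy_net`, and `toy_coeffs` are hypothetical, and the coefficient values are arbitrary placeholders.

```python
from typing import Callable, List, Tuple

# (c_skip, c_out, c_in, c_noise) for a given noise level sigma
Coeffs = Tuple[float, float, float, float]


def precondition(
    net: Callable[[List[float], float], List[float]],
    coeffs: Callable[[float], Coeffs],
    x: List[float],
    sigma: float,
) -> List[float]:
    """Evaluate c_skip * x + c_out * net(c_in * x, c_noise) element-wise."""
    c_skip, c_out, c_in, c_noise = coeffs(sigma)
    # The network sees a scaled input and a transformed noise level.
    f_x = net([c_in * v for v in x], c_noise)
    # The network output is re-scaled and combined with a skip connection.
    return [c_skip * v + c_out * f for v, f in zip(x, f_x)]


def toy_net(x: List[float], c_noise: float) -> List[float]:
    """Hypothetical stand-in for a backbone: adds the noise label to each value."""
    return [v + c_noise for v in x]


def toy_coeffs(sigma: float) -> Coeffs:
    """Arbitrary placeholder coefficients, for illustration only."""
    return (0.5, 2.0, 0.1, sigma)


denoised = precondition(toy_net, toy_coeffs, [1.0, 2.0], sigma=3.0)
```

Each class documented on this page can be read as a particular choice of these four coefficients, paired with a backbone selected through `model_type`.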
- class physicsnemo.models.diffusion.preconditioning.VPPrecond(*args, **kwargs)
Bases: Module
Preconditioning corresponding to the variance preserving (VP) formulation.
- Parameters:
img_resolution (int) – Image resolution.
img_channels (int) – Number of color channels.
label_dim (int) – Number of class labels, 0 = unconditional, by default 0.
use_fp16 (bool) – Whether to execute the underlying model at FP16 precision, by default False.
beta_d (float) – Extent of the noise level schedule, by default 19.9.
beta_min (float) – Initial slope of the noise level schedule, by default 0.1.
M (int) – Original number of timesteps in the DDPM formulation, by default 1000.
epsilon_t (float) – Minimum t-value used during training, by default 1e-5.
model_type (str) – Class name of the underlying model, by default “SongUNet”.
**model_kwargs (dict) – Keyword arguments for the underlying model.
Note
Reference: Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S. and Poole, B., 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.
- round_sigma(sigma: float | List | Tensor)
Convert a given sigma value(s) to a tensor representation.
- Parameters:
sigma (Union[float, list, torch.Tensor]) – The sigma value(s) to convert.
- Returns:
The tensor representation of the provided sigma value(s).
- Return type:
torch.Tensor
- sigma(t: float | Tensor)
Compute the sigma(t) value for a given t based on the VP formulation.
The function calculates the noise level schedule for the diffusion process based on the given parameters beta_d and beta_min.
- Parameters:
t (Union[float, torch.Tensor]) – The timestep or set of timesteps for which to compute sigma(t).
- Returns:
The computed sigma(t) value(s).
- Return type:
torch.Tensor
- sigma_inv(sigma: float | Tensor)
Compute the inverse of the sigma function for a given sigma.
This function effectively calculates t from a given sigma(t) based on the parameters beta_d and beta_min.
- Parameters:
sigma (Union[float, torch.Tensor]) – The sigma(t) value or set of sigma(t) values for which to compute the inverse.
- Returns:
The computed t value(s) corresponding to the provided sigma(t).
- Return type:
torch.Tensor
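The relationship between `sigma` and `sigma_inv` can be sketched in plain Python. The formulas below are the VP noise schedule as written in Karras et al. (2022), using the default `beta_d` and `beta_min` listed above. Treat it as an assumption that the PhysicsNeMo methods compute exactly this; the function names `vp_sigma` and `vp_sigma_inv` are hypothetical.

```python
import math

BETA_D = 19.9   # default beta_d listed above
BETA_MIN = 0.1  # default beta_min listed above


def vp_sigma(t: float, beta_d: float = BETA_D, beta_min: float = BETA_MIN) -> float:
    """sigma(t) = sqrt(exp(0.5 * beta_d * t^2 + beta_min * t) - 1)."""
    # expm1 keeps precision when the exponent is very small.
    return math.sqrt(math.expm1(0.5 * beta_d * t**2 + beta_min * t))


def vp_sigma_inv(sigma: float, beta_d: float = BETA_D, beta_min: float = BETA_MIN) -> float:
    """Recover t from sigma by solving the quadratic in t given by log(1 + sigma^2)."""
    log_term = math.log1p(sigma**2)
    return (math.sqrt(beta_min**2 + 2.0 * beta_d * log_term) - beta_min) / beta_d


t = 0.5
sigma = vp_sigma(t)           # noise level at t
t_back = vp_sigma_inv(sigma)  # round trip back to t
```

The round trip `vp_sigma_inv(vp_sigma(t))` returns `t` up to floating-point error, which is what lets a sampler move freely between the time and noise-level parameterizations.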
- class physicsnemo.models.diffusion.preconditioning.VEPrecond(*args, **kwargs)
Bases: Module
Preconditioning corresponding to the variance exploding (VE) formulation.
- Parameters:
img_resolution (int) – Image resolution.
img_channels (int) – Number of color channels.
label_dim (int) – Number of class labels, 0 = unconditional, by default 0.
use_fp16 (bool) – Whether to execute the underlying model at FP16 precision, by default False.
sigma_min (float) – Minimum supported noise level, by default 0.02.
sigma_max (float) – Maximum supported noise level, by default 100.0.
model_type (str) – Class name of the underlying model, by default “SongUNet”.
**model_kwargs (dict) – Keyword arguments for the underlying model.
Note
Reference: Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S. and Poole, B., 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.
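For orientation, the VE scaling coefficients as tabulated in Karras et al. (2022) are simple enough to write out directly. This is a sketch of the published formulas only, assuming the class follows them; `ve_coefficients` is a hypothetical helper, not part of PhysicsNeMo.

```python
import math
from typing import Tuple


def ve_coefficients(sigma: float) -> Tuple[float, float, float, float]:
    """Return (c_skip, c_out, c_in, c_noise) for the VE formulation."""
    c_skip = 1.0                     # the noisy input is passed through unscaled
    c_out = sigma                    # the network output is scaled by the noise level
    c_in = 1.0                       # the network input is not rescaled
    c_noise = math.log(0.5 * sigma)  # noise label fed to the network
    return c_skip, c_out, c_in, c_noise


coeffs = ve_coefficients(2.0)
```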
- class physicsnemo.models.diffusion.preconditioning.iDDPMPrecond(*args, **kwargs)
Bases: Module
Preconditioning corresponding to the improved DDPM (iDDPM) formulation.
- Parameters:
img_resolution (int) – Image resolution.
img_channels (int) – Number of color channels.
label_dim (int) – Number of class labels, 0 = unconditional, by default 0.
use_fp16 (bool) – Whether to execute the underlying model at FP16 precision, by default False.
C_1 (float) – Timestep adjustment at low noise levels, by default 0.001.
C_2 (float) – Timestep adjustment at high noise levels, by default 0.008.
M (int) – Original number of timesteps in the DDPM formulation, by default 1000.
model_type (str) – Class name of the underlying model, by default “DhariwalUNet”.
**model_kwargs (dict) – Keyword arguments for the underlying model.
Note
Reference: Nichol, A.Q. and Dhariwal, P., 2021, July. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning (pp. 8162-8171). PMLR.
- alpha_bar(j)
Compute the alpha_bar(j) value for a given j based on the iDDPM formulation.
- Parameters:
j (Union[int, torch.Tensor]) – The timestep or set of timesteps for which to compute alpha_bar(j).
- Returns:
The computed alpha_bar(j) value(s).
- Return type:
torch.Tensor
- round_sigma(sigma, return_index=False)
Round the provided sigma value(s) to the nearest value(s) in a pre-defined set u.
- Parameters:
sigma (Union[float, list, torch.Tensor]) – The sigma value(s) to round.
return_index (bool, optional) – Whether to return the index/indices of the rounded value(s) in u instead of the rounded value(s) themselves, by default False.
- Returns:
The rounded sigma value(s) or their index/indices in u, depending on the value of return_index.
- Return type:
torch.Tensor
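The interplay between `alpha_bar`, the set `u`, and `round_sigma` can be sketched in plain Python. The formulas follow the iDDPM discretization as given in Karras et al. (2022), with the default `M`, `C_1`, and `C_2` listed above. That the class builds `u` in exactly this way is an assumption, and `build_u` is a hypothetical helper name.

```python
import math
from typing import List, Union

M = 1000     # default M listed above
C_1 = 0.001  # default C_1 listed above
C_2 = 0.008  # default C_2 listed above


def alpha_bar(j: int) -> float:
    """alpha_bar(j) = sin(pi/2 * j / (M * (C_2 + 1)))^2."""
    return math.sin(0.5 * math.pi * j / M / (C_2 + 1)) ** 2


def build_u() -> List[float]:
    """Build the M + 1 discrete noise levels, from u[M] = 0 down to u[0]."""
    u = [0.0] * (M + 1)
    for j in range(M, 0, -1):
        # The ratio is clipped from below by C_1 to keep the step bounded.
        ratio = max(alpha_bar(j - 1) / alpha_bar(j), C_1)
        u[j - 1] = math.sqrt((u[j] ** 2 + 1.0) / ratio - 1.0)
    return u


def round_sigma(sigma: float, u: List[float], return_index: bool = False) -> Union[int, float]:
    """Return the entry of u closest to sigma, or its index."""
    index = min(range(len(u)), key=lambda i: abs(u[i] - sigma))
    return index if return_index else u[index]


u = build_u()
rounded = round_sigma(1.0, u)                   # nearest supported noise level
index = round_sigma(1.0, u, return_index=True)  # position of that level in u
```

In this construction `u` decreases with the index, so `u[0]` is the largest noise level and `u[M]` is zero.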
- class physicsnemo.models.diffusion.preconditioning.EDMPrecond(*args, **kwargs)
Bases: Module
Improved preconditioning proposed in the paper “Elucidating the Design Space of Diffusion-Based Generative Models” (EDM).
- Parameters:
img_resolution (int) – Image resolution.
img_channels (int) – Number of color channels (for both input and output). If your model requires a different number of input or output channels, override this by passing either of the optional img_in_channels or img_out_channels arguments.
label_dim (int) – Number of class labels, 0 = unconditional, by default 0.
use_fp16 (bool) – Whether to execute the underlying model at FP16 precision, by default False.
sigma_min (float) – Minimum supported noise level, by default 0.0.
sigma_max (float) – Maximum supported noise level, by default inf.
sigma_data (float) – Expected standard deviation of the training data, by default 0.5.
model_type (str) – Class name of the underlying model, by default “DhariwalUNet”.
img_in_channels (int) – Optional setting for when the number of input channels differs from the number of output channels. If set, overrides img_channels for the input. This is useful when there are additional (conditional) channels.
img_out_channels (int) – Optional setting for when the number of input channels differs from the number of output channels. If set, overrides img_channels for the output.
**model_kwargs (dict) – Keyword arguments for the underlying model.
Note
Reference: Karras, T., Aittala, M., Aila, T. and Laine, S., 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35, pp.26565-26577.
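The role of `sigma_data` is easiest to see from the EDM scaling coefficients in Karras et al. (2022). The sketch below evaluates them for a single noise level in plain Python. It assumes the class follows the published formulas, and `edm_coefficients` is a hypothetical helper, not part of PhysicsNeMo.

```python
import math
from typing import Tuple


def edm_coefficients(sigma: float, sigma_data: float = 0.5) -> Tuple[float, float, float, float]:
    """Return (c_skip, c_out, c_in, c_noise) for the EDM formulation."""
    total_var = sigma**2 + sigma_data**2
    c_skip = sigma_data**2 / total_var                 # weight of the skip connection
    c_out = sigma * sigma_data / math.sqrt(total_var)  # scale of the network output
    c_in = 1.0 / math.sqrt(total_var)                  # scale of the network input
    c_noise = math.log(sigma) / 4.0                    # noise label fed to the network
    return c_skip, c_out, c_in, c_noise


# Coefficients at sigma equal to the default sigma_data of 0.5.
c_skip, c_out, c_in, c_noise = edm_coefficients(0.5)
```

As `sigma` approaches zero, `c_skip` tends to 1 and `c_out` to 0, so the output follows the input; at large `sigma` the skip weight vanishes and the network output dominates.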
- class physicsnemo.models.diffusion.preconditioning.EDMPrecondSuperResolution(*args, **kwargs)
Bases: Module
Improved preconditioning proposed in the paper “Elucidating the Design Space of Diffusion-Based Generative Models” (EDM).
This is a variant of EDMPrecond that is specifically designed for super-resolution tasks. It wraps a neural network that predicts the denoised high-resolution image given a noisy high-resolution image, and additional conditioning that includes a low-resolution image, and a noise level.
- Parameters:
img_resolution (Union[int, Tuple[int, int]]) – Spatial resolution \((H, W)\) of the image. If a single int is provided, the image is assumed to be square.
img_in_channels (int) – Number of input channels in the low-resolution input image.
img_out_channels (int) – Number of output channels in the high-resolution output image.
use_fp16 (bool, optional) – Whether to use half-precision floating point (FP16) for model execution, by default False.
model_type (str, optional) – Class name of the underlying model. Must be one of the following: ‘SongUNet’, ‘SongUNetPosEmbd’, ‘SongUNetPosLtEmbd’, ‘DhariwalUNet’. Defaults to ‘SongUNetPosEmbd’.
sigma_data (float, optional) – Expected standard deviation of the training data, by default 0.5.
sigma_min (float, optional) – Minimum supported noise level, by default 0.0.
sigma_max (float, optional) – Maximum supported noise level, by default inf.
**model_kwargs (dict) – Keyword arguments passed to the underlying model __init__ method.
See also
SongUNet – Basic U-Net for diffusion models
SongUNetPosEmbd – U-Net with positional embeddings
SongUNetPosLtEmbd – U-Net with positional and lead-time embeddings
Note
References:
- Karras, T., Aittala, M., Aila, T. and Laine, S., 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35, pp.26565-26577.
- Mardani, M., Brenowitz, N., Cohen, Y., Pathak, J., Chen, C.Y., Liu, C.C., Vahdat, A., Kashinath, K., Kautz, J. and Pritchard, M., 2023. Generative Residual Diffusion Modeling for Km-scale Atmospheric Downscaling. arXiv preprint arXiv:2309.15214.
- property amp_mode
Set to True when using automatic mixed precision.
- property profile_mode
Set to True to enable profiling of the wrapped model.
- static round_sigma(sigma: float | List | Tensor)
Convert a given sigma value(s) to a tensor representation.
- Parameters:
sigma (Union[float, List, torch.Tensor]) – Sigma value(s) to convert.
- Returns:
Tensor representation of sigma values.
- Return type:
torch.Tensor
- property use_fp16
Whether the model uses float16 precision.
- Returns:
True if the model is in float16 mode, False otherwise.
- Return type:
bool