Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.

Stable Diffusion and SDXL

Stable Diffusion (SD) [[Paper]](https://arxiv.org/pdf/2112.10752v2.pdf) is a powerful generative model that can produce high-quality images based on textual descriptions. By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) have achieved state-of-the-art synthesis results on image data and beyond. However, because they operate directly in pixel space, optimizing powerful DMs is computationally expensive and can consume hundreds of GPU days. To address this challenge, SD operates in the latent space of a powerful pretrained autoencoder. This enables DM training on limited computational resources while retaining quality and flexibility, greatly boosting visual fidelity.
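The computational saving comes from diffusing a small latent tensor rather than a full-resolution image, and the diffusion itself reduces to a closed-form noising process. The following is a minimal NumPy sketch of that forward process, not NeMo's implementation: the latent shape, the linear beta schedule, and the `q_sample` helper are illustrative assumptions (SD's actual schedule and VAE differ in detail).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent: 4x64x64 (as in an 8x-downsampled VAE space)
# instead of a 3x512x512 pixel image -- this is the computational saving.
z0 = rng.standard_normal((4, 64, 64))  # clean latent from the (assumed) VAE encoder

# Illustrative linear beta schedule over T steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)   # cumulative signal-retention factor

def q_sample(z0, t, eps):
    """Closed-form forward noising: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alphas_bar[t]) * z0 + np.sqrt(1.0 - alphas_bar[t]) * eps

eps = rng.standard_normal(z0.shape)
zt = q_sample(z0, T - 1, eps)
# At t = T-1 the latent is nearly pure Gaussian noise (alphas_bar[-1] is ~4e-5 here).
print(zt.shape, alphas_bar[-1])
```

A denoising network is then trained to predict `eps` from `zt` and `t`; sampling runs the process in reverse, and only the final latent is decoded back to pixels.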

The SD model also introduces cross-attention layers into the model architecture, allowing it to turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes. As a result, the SD model achieves a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution. Additionally, the SD model significantly reduces computational requirements compared to pixel-based DMs, making it an attractive solution for a wide range of applications.
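The conditioning mechanism described above is scaled dot-product cross-attention: queries come from the U-Net's latent features, while keys and values come from the conditioning encoder (e.g. text embeddings). Below is a minimal NumPy sketch under assumed, illustrative dimensions (4096 latent positions, 77 text tokens, a tiny width of 8); it is not NeMo's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Illustrative sizes: 4096 latent positions (64x64), 77 text tokens, width d = 8.
n_img, n_txt, d = 4096, 77, 8
img = rng.standard_normal((n_img, d))  # queries: U-Net latent features
txt = rng.standard_normal((n_txt, d))  # keys/values: text-encoder output

Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))

Q, K, V = img @ Wq, txt @ Wk, txt @ Wv
attn = softmax(Q @ K.T / np.sqrt(d))   # (n_img, n_txt): each position attends over tokens
out = attn @ V                         # conditioned image features, shape (n_img, d)
print(out.shape)
```

Because the keys and values can come from any encoded modality, the same layer supports text, bounding boxes, or other conditioning inputs without changing the diffusion backbone.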

Note

We provide predefined training and inference configurations for Stable Diffusion, Stable Diffusion v2, and SDXL in the NeMo Framework.

| Feature | Training | Inference |
|---|---|---|
| Data parallelism | Yes | N/A |
| Tensor parallelism | No | No |
| Pipeline parallelism | No | No |
| Sequence parallelism | No | No |
| Activation checkpointing | No | No |
| FP32/TF32 | Yes | Yes (FP16 enabled by default) |
| AMP/FP16 | Yes | Yes |
| AMP/BF16 | No | No |
| BF16 O2 | No | No |
| TransformerEngine/FP8 | No | No |
| Multi-GPU | Yes | Yes |
| Multi-Node | Yes | Yes |
| Inference deployment | N/A | NVIDIA Triton supported |
| SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform | Slurm DeepOps/Base Command Manager/Base Command Platform |
| NVfuser | No | N/A |
| Distributed Optimizer | No | N/A |
| TorchInductor | Yes | N/A |
| Flash Attention | Yes | N/A |
| NHWC GroupNorm | Yes | Yes |
| FSDP | Yes (SDXL only) | Yes (SDXL only) |