# Diffusion Models

## Introduction

Diffusion models are a class of generative models that learn to produce images or videos by iteratively denoising samples from a noise distribution. NeMo AutoModel supports training diffusion models using **flow matching**, a framework that regresses velocity fields along straight interpolation paths between noise and data.
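As a rough illustration of the flow matching objective, the sketch below builds a point on the straight path between a data sample and a noise sample, then scores a predicted velocity against the path's constant velocity. It is pure Python on flat lists to stay self-contained. The time convention (`t = 0` is data, `t = 1` is noise), every function name, and the toy `zero_model` are assumptions made for this example; none of it is NeMo AutoModel's API or its actual loss implementation.

```python
import random


def interpolate(data, noise, t):
    """Point on the straight path: t=0 returns data, t=1 returns noise."""
    return [(1.0 - t) * d + t * n for d, n in zip(data, noise)]


def target_velocity(data, noise):
    """Velocity of the straight path, d(x_t)/dt = noise - data (constant in t)."""
    return [n - d for d, n in zip(data, noise)]


def flow_matching_loss(predict_velocity, data, rng):
    """Mean squared error between predicted and target velocity for one sample."""
    noise = [rng.gauss(0.0, 1.0) for _ in data]  # sample from a standard normal
    t = rng.random()                             # uniform time in [0, 1)
    x_t = interpolate(data, noise, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(data, noise)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(data)


def zero_model(x_t, t):
    """Hypothetical stand-in for a network: always predicts zero velocity."""
    return [0.0 for _ in x_t]


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [0.5, -1.0, 2.0]  # toy "latent" standing in for a VAE-encoded sample
    print(flow_matching_loss(zero_model, latent, rng))
```

In real training the lists would be latent tensors and `zero_model` would be the DiT being trained. Implementations also differ in how they sample `t` and in which end of the path they call noise.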

NeMo AutoModel integrates with [Hugging Face Diffusers](https://huggingface.co/docs/diffusers) for model loading and generation, while providing its own distributed training infrastructure via the `TrainDiffusionRecipe`. This recipe handles FSDP2 parallelization, flow matching loss computation, multiresolution bucketed dataloading, and checkpoint management.
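To give a sense of what multiresolution bucketing means in general, the sketch below groups samples by resolution so that every batch contains items of a single shape and can be stacked without padding or resizing. The sample dictionaries, the `height` and `width` keys, and the function name are hypothetical. This is not the `TrainDiffusionRecipe` dataloader, which may bucket differently.

```python
from collections import defaultdict


def bucket_by_resolution(samples, batch_size):
    """Group samples by (height, width) and split each group into batches.

    Returns a list of (resolution, batch) pairs; every batch holds samples
    of exactly one resolution. The last batch of a bucket may be smaller.
    """
    buckets = defaultdict(list)
    for sample in samples:
        buckets[(sample["height"], sample["width"])].append(sample)

    batches = []
    for resolution, items in buckets.items():
        for start in range(0, len(items), batch_size):
            batches.append((resolution, items[start:start + batch_size]))
    return batches


if __name__ == "__main__":
    # Hypothetical metadata records; a real pipeline would carry latents too.
    samples = [
        {"id": 0, "height": 512, "width": 512},
        {"id": 1, "height": 768, "width": 512},
        {"id": 2, "height": 512, "width": 512},
        {"id": 3, "height": 512, "width": 512},
        {"id": 4, "height": 768, "width": 512},
    ]
    for resolution, batch in bucket_by_resolution(samples, batch_size=2):
        print(resolution, [s["id"] for s in batch])
```

A production dataloader would typically also shuffle within and across buckets and shard batches across ranks; those steps are omitted here.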

## Supported Models

| Owner                | Model                                                          | Task          | Architecture        |
| -------------------- | -------------------------------------------------------------- | ------------- | ------------------- |
| Wan AI               | [Wan 2.1 T2V](/model-coverage/diffusion/wan-2-1-t2v)           | Text-to-Video | DiT (Flow Matching) |
| Black Forest Labs    | [FLUX.1-dev](/model-coverage/diffusion/flux-1-dev)             | Text-to-Image | DiT (Flow Matching) |
| Hunyuan Community    | [HunyuanVideo 1.5](/model-coverage/diffusion/hunyuanvideo-1-5) | Text-to-Video | DiT (Flow Matching) |
| Qwen / Alibaba Cloud | [Qwen-Image](/model-coverage/diffusion/qwen-image)             | Text-to-Image | DiT (Flow Matching) |

## Supported Workflows

* **Pretraining**: Train from randomly initialized weights on large-scale datasets
* **Fine-tuning**: Adapt pretrained model weights to a specific dataset or style
* **Generation**: Run inference with pretrained or fine-tuned checkpoints

## Dataset

Diffusion training reads pre-encoded `.meta` files containing VAE latents and text embeddings, so raw videos or images must be preprocessed into this format before training. For the preprocessing steps, see the [Diffusion Dataset Preparation](/datasets/diffusion-dataset) guide.

## Train Diffusion Models

For a complete walkthrough of training configuration, model-specific settings, and launch commands, see the [Diffusion Training and Fine-Tuning Guide](/recipes-e2e-examples/diffusion-fine-tuning).