# Diffusion Models
## Introduction
Diffusion models are a class of generative models that learn to produce images or videos by iteratively denoising samples from a noise distribution. NeMo AutoModel supports training diffusion models using flow matching, a framework that regresses velocity fields along straight interpolation paths between noise and data.
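Concretely, flow matching trains a network to predict the constant velocity `v = x1 - x0` along the straight path `x_t = (1 - t) * x0 + t * x1` between a noise sample `x0` and a data sample `x1`. The following NumPy sketch illustrates the training target and loss only; the model is stubbed out, and this is not the recipe's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A data sample x1 (e.g. a VAE latent) and a Gaussian noise sample x0.
x1 = rng.standard_normal((4, 8, 8))
x0 = rng.standard_normal((4, 8, 8))

# Sample a timestep t in [0, 1] and form the straight-line interpolant.
t = rng.uniform()
x_t = (1.0 - t) * x0 + t * x1

# Along this path the true velocity is constant: dx_t/dt = x1 - x0.
target_velocity = x1 - x0

# A real model would predict the velocity from (x_t, t, text embedding);
# here it is stubbed with zeros just to show the loss computation.
predicted_velocity = np.zeros_like(x_t)

# Flow matching regresses the prediction onto the target with MSE.
loss = np.mean((predicted_velocity - target_velocity) ** 2)
print(f"flow matching MSE loss: {loss:.4f}")
```

At inference time, integrating the learned velocity field from `t = 0` to `t = 1` carries a noise sample to a data sample.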
NeMo AutoModel integrates with Hugging Face Diffusers for model loading and generation, while providing its own distributed training infrastructure via the `TrainDiffusionRecipe`. This recipe handles FSDP2 parallelization, flow matching loss computation, multiresolution bucketed dataloading, and checkpoint management.
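Multiresolution bucketed dataloading groups samples of similar resolution so that each batch can be stacked without excessive padding or cropping. The recipe's internal implementation is not shown here; the following is an illustrative pure-Python sketch of the idea, with hypothetical bucket definitions:

```python
from collections import defaultdict

# Hypothetical resolution buckets (height, width); a real configuration
# would define these per model and dataset.
BUCKETS = [(256, 256), (320, 192), (192, 320), (512, 512)]

def nearest_bucket(height, width):
    """Assign a sample to the bucket with the closest aspect ratio and area."""
    aspect = width / height
    return min(
        BUCKETS,
        key=lambda b: abs(b[1] / b[0] - aspect)
        + abs(b[0] * b[1] - height * width) / (height * width),
    )

def bucket_samples(sample_shapes):
    """Group sample indices by bucket so each group batches cleanly."""
    groups = defaultdict(list)
    for idx, (h, w) in enumerate(sample_shapes):
        groups[nearest_bucket(h, w)].append(idx)
    return dict(groups)

shapes = [(250, 250), (300, 180), (200, 330), (260, 260)]
batches = bucket_samples(shapes)
for bucket, indices in batches.items():
    print(bucket, indices)
```

Batches are then drawn from within a single bucket, so every sample in a batch is resized to the same target resolution.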
## Supported Models

| Model | HF Model ID | Task | Parameters | Parallelization | Example YAMLs |
|---|---|---|---|---|---|
| Wan 2.1 T2V 1.3B | | Text-to-Video | 1.3B | FSDP2 | |
| FLUX.1-dev | | Text-to-Image | 12B | FSDP2 | |
| HunyuanVideo 1.5 | | Text-to-Video | 13B | FSDP2 | |
## Supported Workflows

- **Pretraining**: Train from randomly initialized weights on large-scale datasets.
- **Fine-tuning**: Adapt pretrained model weights to a specific dataset or style.
- **Generation**: Run inference with pretrained or fine-tuned checkpoints.
## Dataset

Diffusion training requires pre-encoded `.meta` files containing VAE latents and text embeddings. Raw videos or images must be preprocessed before training.
For detailed instructions on data preparation, see the Diffusion Dataset Preparation guide.
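To make the idea of a pre-encoded sample concrete, the sketch below serializes and restores one record with `pickle`. The field names and shapes here are hypothetical illustrations only; the actual `.meta` schema is defined by the Diffusion Dataset Preparation guide:

```python
import pickle
import numpy as np

# Hypothetical pre-encoded record. In practice each .meta file on disk
# would hold latents and embeddings produced offline by the VAE and
# text encoder, so neither model needs to run in the training loop.
record = {
    "latent": np.zeros((16, 4, 32, 32), dtype=np.float16),        # e.g. (frames, C, H, W)
    "text_embedding": np.zeros((77, 4096), dtype=np.float16),     # encoded caption
    "caption": "a red fox running through snow",
}

# Serialize and restore (standing in for writing/reading a .meta file).
blob = pickle.dumps(record)
loaded = pickle.loads(blob)
print(loaded["latent"].shape, loaded["text_embedding"].shape)
```

Because the expensive encoders run once during preprocessing, the training loop only reads arrays and computes the flow matching loss.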
## Train Diffusion Models
For a complete walkthrough of training configuration, model-specific settings, and launch commands, see the Diffusion Training and Fine-Tuning Guide.