Diffusion Language Models (dLLMs) | NVIDIA NeMo AutoModel

Diffusion language models (dLLMs) generate text by denoising rather than left-to-right autoregression. A fixed-length response “canvas” is corrupted and then iteratively refined, so tokens are produced in parallel and can be revised across steps. NeMo AutoModel supports fine-tuning block-diffusion dLLMs with the same recipe-driven, FSDP2/Expert-Parallel training stack used for LLMs and VLMs.

Supported Models

Owner	Model Family	Architectures
Google	DiffusionGemma	`DiffusionGemmaForBlockDiffusion`

Fine-Tuning

See the DiffusionGemma Fine-Tuning Guide for the block-diffusion training objective (uniform-random token corruption, no [MASK]), self-conditioning, and the supported feature set (SFT, LoRA, Expert Parallelism, activation checkpointing).