dLLM Fine-Tuning
Introduction
Diffusion language models (dLLMs) generate text by iteratively denoising masked tokens, rather than generating one token at a time left-to-right like autoregressive (AR) models. Starting from a sequence of [MASK] tokens, the model progressively unmasks the most confident positions over multiple denoising steps until the full response is revealed.
This approach enables parallel token generation and uses bidirectional attention, so each prediction can draw on context from both sides of a position rather than only from the preceding tokens, as in AR models.
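The unmasking loop described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `toy_predict` is a hypothetical stand-in for a real model forward pass, the even split of unmasked positions across steps is one possible schedule, and none of these names come from NeMo AutoModel.

```python
import math
import random

MASK = "[MASK]"


def toy_predict(seq, rng):
    """Hypothetical stand-in for a dLLM forward pass.

    Returns {position: (token, confidence)} for every masked position.
    """
    vocab = ["the", "cat", "sat", "down"]
    return {
        i: (rng.choice(vocab), rng.random())
        for i, tok in enumerate(seq)
        if tok == MASK
    }


def denoise(length, steps, predict=toy_predict, seed=0):
    """Start fully masked and reveal the most confident positions each step."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        preds = predict(seq, rng)
        # Assumed schedule: spread the remaining masks evenly over remaining steps.
        k = math.ceil(len(masked) / (steps - step))
        # Commit the k most confident predictions; the rest stay masked.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            seq[i] = preds[i][0]
    return seq


print(denoise(length=8, steps=4))
```

Because several positions are committed per step, the number of model calls is bounded by `steps` rather than by the sequence length, which is where the parallelism comes from.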
NeMo AutoModel currently supports the following dLLM model family:
- LLaDA (MDLM): Bidirectional masked diffusion. The model receives corrupted tokens and predicts the clean token at each masked position.
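To make the masked-diffusion objective concrete, here is a minimal sketch of the two pieces it implies: corrupting a clean sequence by masking, and scoring predictions only at the masked positions. It is a simplified illustration, not the recipe's implementation: `MASK_ID`, `corrupt`, and `masked_nll` are hypothetical names, the model output is replaced by a uniform distribution, and any per-sample loss weighting is omitted.

```python
import math
import random

MASK_ID = -1  # hypothetical mask token id, chosen for illustration only


def corrupt(tokens, mask_prob, rng):
    """Replace each token with MASK_ID independently with probability mask_prob."""
    noisy, is_masked = [], []
    for tok in tokens:
        hit = rng.random() < mask_prob
        noisy.append(MASK_ID if hit else tok)
        is_masked.append(hit)
    return noisy, is_masked


def masked_nll(probs, targets, is_masked):
    """Mean negative log-likelihood of the clean tokens, over masked positions only."""
    losses = [
        -math.log(p[t]) for p, t, m in zip(probs, targets, is_masked) if m
    ]
    return sum(losses) / len(losses) if losses else 0.0


rng = random.Random(0)
clean = [5, 2, 7, 1, 3, 0]
noisy, is_masked = corrupt(clean, mask_prob=0.5, rng=rng)

# Stand-in for model output: a uniform distribution over an 8-token vocabulary.
vocab_size = 8
probs = [[1.0 / vocab_size] * vocab_size for _ in clean]
loss = masked_nll(probs, clean, is_masked)
print(noisy, loss)
```

Unmasked positions contribute nothing to the loss, so the model is trained only on recovering tokens it cannot already see in its input.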
Workflow Overview
Supported Models
Install NeMo AutoModel
Alternatively, use the pre-built Docker container:
For the full set of installation methods, see the installation guide.
Configure Your Training Recipe
dLLM fine-tuning is driven by:
- A recipe script (train_ft.py): orchestrates the training loop with dLLM-specific corruption, loss, and batch handling.
- A YAML configuration file: specifies the model, data, optimizer, dLLM-specific settings, and distributed training strategy.
The recipe uses a strategy pattern to handle differences between model families. The dllm.mode field in the YAML selects the strategy:
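As a rough picture of how a mode field can select a strategy, the sketch below maps a `dllm.mode` value to a strategy object through a registry. Everything here is hypothetical and for illustration only: the class name, the registry, the `select_strategy` helper, and the mode value `"mdlm"` are assumptions, not NeMo AutoModel's actual API or accepted values.

```python
from dataclasses import dataclass


@dataclass
class MDLMStrategy:
    """Hypothetical strategy for masked-diffusion models such as LLaDA."""

    name: str = "mdlm"

    def describe(self) -> str:
        return "corrupt by masking; predict clean tokens at masked positions"


# Hypothetical registry keyed by the value of `dllm.mode`.
STRATEGIES = {"mdlm": MDLMStrategy}


def select_strategy(config: dict):
    """Look up and instantiate the strategy named by config['dllm']['mode']."""
    mode = config["dllm"]["mode"]
    try:
        return STRATEGIES[mode]()
    except KeyError:
        raise ValueError(f"unknown dllm.mode: {mode!r}") from None


# Assumed config shape, mirroring a parsed YAML file.
strategy = select_strategy({"dllm": {"mode": "mdlm"}})
print(strategy.describe())
```

Keeping the model-family differences behind one lookup means the training loop itself does not need to branch on the model type.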
LLaDA Configuration
See llada_sft.yaml for the full working config. The key dLLM-specific sections are:
Key dLLM Config Fields
Fine-Tune the Model
Generation / Inference
The generation script (generate.py) supports chat, raw, and infilling modes for LLaDA checkpoints.