dLLM Fine-Tuning

Introduction

Diffusion language models (dLLMs) generate text by iteratively denoising masked tokens, rather than generating one token at a time left-to-right like autoregressive (AR) models. Starting from a sequence of [MASK] tokens, the model progressively unmasks the most confident positions over multiple denoising steps until the full response is revealed.

This approach enables parallel token generation and bidirectional attention: each prediction can condition on tokens to both the left and the right of a position, whereas an AR model sees only the preceding tokens.
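The denoising loop described above can be sketched in a few lines of plain Python. This is a toy illustration only: `toy_model` is a hypothetical stand-in that returns the correct token with a pseudo-random confidence, not a real dLLM forward pass, and the function names are not part of NeMo AutoModel.

```python
import random

MASK = "[MASK]"

def toy_model(tokens, target):
    """Hypothetical stand-in for a dLLM forward pass.

    Returns one (prediction, confidence) pair per position. A real model
    would compute these with bidirectional attention over `tokens`.
    """
    rng = random.Random(0)  # fixed seed so the sketch is deterministic
    return [(target[i], rng.random()) for i in range(len(tokens))]

def denoise(target, steps):
    """Start fully masked and reveal the most confident positions each step."""
    tokens = [MASK] * len(target)
    per_step = -(-len(target) // steps)  # ceiling division
    for _ in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        preds = toy_model(tokens, target)
        # Rank the still-masked positions by confidence, highest first.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:
            tokens[i] = preds[i][0]
    return tokens

result = denoise(["a", "neural", "network", "learns"], steps=2)
```

Each step reveals several positions at once, which is where the parallelism comes from; an AR decoder would need one forward pass per token.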

NeMo AutoModel currently supports the following dLLM model family:

  • LLaDA (MDLM) — Bidirectional masked diffusion. The model receives corrupted tokens and predicts the clean token at each masked position.

Workflow Overview

┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ 1. Install   │--->│ 2. Configure │--->│ 3. Train     │--->│ 4. Generate  │
│              │    │    YAML      │    │              │    │              │
│ pip install  │    │ Recipe +     │    │ torchrun     │    │ Run dLLM     │
│ or Docker    │    │ dLLM config  │    │              │    │ inference    │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘

| Step | Section | What You Do |
|------|---------|-------------|
| 1. Install | Install NeMo AutoModel | Install the package via pip or Docker |
| 2. Configure | Configure Your Training Recipe | Write a YAML config specifying model, data, dLLM mode, and training settings |
| 3. Train | Fine-Tune the Model | Launch training with torchrun |
| 4. Generate | Generation / Inference | Generate text from a fine-tuned checkpoint |

Supported Models

| Model Family | dLLM Mode | Loss | Inference | Example Config |
|--------------|-----------|------|-----------|----------------|
| LLaDA | mdlm | MDLM cross-entropy | Block-by-block, full-forward (no KV cache) | llada_sft.yaml |

Install NeMo AutoModel

$ pip3 install nemo-automodel

Alternatively, use the pre-built Docker container:

$ docker pull nvcr.io/nvidia/nemo-automodel:26.04.00
$ docker run --gpus all -it --rm --shm-size=8g nvcr.io/nvidia/nemo-automodel:26.04.00

For the full set of installation methods, see the installation guide.

Configure Your Training Recipe

dLLM fine-tuning is driven by:

  1. A recipe script (train_ft.py) — orchestrates the training loop with dLLM-specific corruption, loss, and batch handling.
  2. A YAML configuration file — specifies the model, data, optimizer, dLLM-specific settings, and distributed training strategy.

The recipe uses a strategy pattern to handle differences between model families. The dllm.mode field in the YAML selects the strategy:

| Mode | Strategy | Description |
|------|----------|-------------|
| mdlm | MDLMStrategy | LLaDA-style: model receives corrupted tokens, MDLM cross-entropy loss |
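To make "corrupted tokens" and "MDLM cross-entropy" concrete, here is a minimal pure-Python sketch of one training sample. It assumes the formulation used by the public LLaDA reference code (a per-sample masking probability bounded below by `eps`, and a loss over masked positions weighted by the inverse of that probability); the recipe's actual `MDLMStrategy` may normalize differently. The helper names `corrupt` and `mdlm_loss` are hypothetical.

```python
import math
import random

MASK_TOKEN_ID = 126336  # dllm.mask_token_id for LLaDA
EPS = 0.001             # dllm.eps, the minimum corruption ratio

def corrupt(input_ids, rng, eps=EPS, mask_id=MASK_TOKEN_ID):
    """Mask each token independently with a per-sample probability p_mask."""
    t = rng.random()
    # Assumed form: rescale t so that p_mask never drops below eps.
    p_mask = (1.0 - eps) * t + eps
    is_masked = [rng.random() < p_mask for _ in input_ids]
    noisy = [mask_id if m else tok for tok, m in zip(input_ids, is_masked)]
    return noisy, is_masked, p_mask

def mdlm_loss(clean_log_probs, is_masked, p_mask):
    """Negative log-likelihood of the clean tokens at masked positions.

    `clean_log_probs[i]` is the log-probability the model assigns to the
    clean token at position i. Unmasked positions contribute nothing.
    The 1 / p_mask weighting and length normalization are assumptions.
    """
    total = sum(-lp for lp, m in zip(clean_log_probs, is_masked) if m)
    return total / (p_mask * len(is_masked))

rng = random.Random(0)
clean = [11, 22, 33, 44, 55, 66]
noisy, is_masked, p_mask = corrupt(clean, rng)
# Pretend the model assigns probability 0.5 to every clean token.
loss = mdlm_loss([math.log(0.5)] * len(clean), is_masked, p_mask)
```

Weighting by `1 / p_mask` keeps lightly corrupted samples, which have few masked positions, from contributing almost nothing to the gradient.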

LLaDA Configuration

See llada_sft.yaml for the full working config. The key dLLM-specific sections are:

model:
  pretrained_model_name_or_path: GSAI-ML/LLaDA-8B-Base
  torch_dtype: float32
  trust_remote_code: true

dllm:
  mode: mdlm
  mask_token_id: 126336  # LLaDA mask token
  eps: 0.001             # Minimum corruption ratio

dataset:
  unshifted: true        # Required for dLLM training

Key dLLM Config Fields

| Field | Description |
|-------|-------------|
| dllm.mode | Training strategy (mdlm) |
| dllm.mask_token_id | Token ID used for masking (126336 for LLaDA) |
| dllm.eps | Minimum corruption ratio to avoid zero-corruption samples |
| dataset.unshifted | Must be true for dLLM; disables the autoregressive input/target shift |
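The effect of `dataset.unshifted` is easiest to see by comparing the two labeling conventions side by side. The sketch below is illustrative only; `shifted_pair` and `unshifted_pair` are hypothetical helpers, and where exactly the shift is applied inside the real data pipeline is not specified here.

```python
def shifted_pair(token_ids):
    """Autoregressive convention: position i is trained to predict token i + 1."""
    return token_ids[:-1], token_ids[1:]

def unshifted_pair(token_ids):
    """dLLM convention: position i is trained to predict the clean token at i."""
    return list(token_ids), list(token_ids)

tokens = [101, 7, 8, 9, 102]
ar_inputs, ar_labels = shifted_pair(tokens)
dllm_inputs, dllm_labels = unshifted_pair(tokens)
```

With the shift left on, a masked position would be scored against its right-hand neighbor rather than the token that was actually masked, which is why the flag is required for dLLM training.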

Fine-Tune the Model

$ torchrun --nproc-per-node=8 \
    nemo_automodel/recipes/dllm/train_ft.py \
    -c examples/dllm_sft/llada_sft.yaml

Generation / Inference

The generation script (generate.py) supports chat, raw, and infilling modes for LLaDA checkpoints.

LLaDA Generation

$ python examples/dllm_generate/generate.py \
    --checkpoint <path> \
    --prompt "Explain what a neural network is."

Generation Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| --steps | Number of denoising steps | 128 |
| --max_new_tokens | Maximum tokens to generate | 128 |
| --block_size | Tokens per denoising block | 32 |
| --temperature | Gumbel noise temperature (0 = greedy) | 0.0 |
| --remasking | Confidence scoring strategy for selecting which positions to unmask | low_confidence |
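The first three parameters interact: the response is split into blocks, and the denoising steps are divided among them. The sketch below shows one plausible way to derive the schedule, modeled on the public LLaDA reference sampler. Whether `generate.py` enforces the same divisibility rules is an assumption, and `block_schedule` is a hypothetical helper, not part of the script.

```python
def block_schedule(steps=128, max_new_tokens=128, block_size=32):
    """Return (num_blocks, steps_per_block, tokens_unmasked_per_step)."""
    # Assumed constraints, following the LLaDA reference sampler.
    if max_new_tokens % block_size != 0:
        raise ValueError("max_new_tokens must be a multiple of block_size")
    num_blocks = max_new_tokens // block_size
    if steps % num_blocks != 0:
        raise ValueError("steps must be a multiple of the number of blocks")
    steps_per_block = steps // num_blocks
    # Spread the block's tokens as evenly as possible over its steps;
    # any remainder goes to the earliest steps.
    base, rem = divmod(block_size, steps_per_block)
    per_step = [base + (1 if i < rem else 0) for i in range(steps_per_block)]
    return num_blocks, steps_per_block, per_step

num_blocks, steps_per_block, per_step = block_schedule()
```

Under these assumptions the defaults give 4 blocks of 32 steps, unmasking one token per step. Halving `--steps` to 64 would unmask two tokens per step, trading some quality for speed.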