GLM-5 / GLM-5.1 (MoE + DSA)


GLM-5 and GLM-5.1 are Zhipu AI’s latest open-source large Mixture-of-Experts models. They use a DeepSeek-style architecture that combines MLA (Multi-head Latent Attention) with DSA (DeepSeek Sparse Attention). GLM-5.1 uses the same glm_moe_dsa architecture as GLM-5, with updated weights.

Task: Text Generation (MoE)
Architecture: GlmMoeDsaForCausalLM
Parameters: 256 routed experts, 8 active per token
HF Org: zai-org

Key Features

  • Mixture of Experts (MoE): 256 routed experts with 8 active per token
  • 78 layers, hidden size 6144, with MLA using KV compression (kv_lora_rank=512) and head_dim=64
  • ~200k context window (max_position_embeddings=202,752)
  • 3 dense layers followed by MoE layers (first_k_dense_replace=3)
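The layer layout implied by these numbers can be worked out with a few lines of arithmetic. The sketch below is illustrative only: the `GlmMoeDsaShape` class and its field names are hypothetical labels chosen for this example (only `first_k_dense_replace` is a name taken from the list above), and it assumes that every layer after the first three is an MoE layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlmMoeDsaShape:
    """Hypothetical container for the figures quoted in Key Features."""

    num_layers: int = 78             # "78 layers"
    first_k_dense_replace: int = 3   # first 3 layers are dense
    n_routed_experts: int = 256      # routed experts per MoE layer
    experts_per_token: int = 8       # experts active for each token

    def layer_kinds(self) -> list[str]:
        # Assumption: layers with index < first_k_dense_replace are dense,
        # all remaining layers are MoE.
        return [
            "dense" if i < self.first_k_dense_replace else "moe"
            for i in range(self.num_layers)
        ]

    def active_expert_fraction(self) -> float:
        # Share of routed experts that process any single token.
        return self.experts_per_token / self.n_routed_experts


shape = GlmMoeDsaShape()
kinds = shape.layer_kinds()
print(f"dense layers: {kinds.count('dense')}, MoE layers: {kinds.count('moe')}")
print(f"routed experts active per token: {shape.active_expert_fraction():.4%}")
```

Under these assumptions the model has 3 dense layers and 75 MoE layers, and each token is routed to 8 of 256 experts (about 3.1% of the routed experts in a layer).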

Available Models

  • GLM-5 (GlmMoeDsaForCausalLM)
  • GLM-5.1 (GlmMoeDsaForCausalLM): updated weights

Example HF Models

| Model | HF ID |
|---|---|
| GLM-5 | zai-org/GLM-5 |
| GLM-5.1 | zai-org/GLM-5.1 |

Example Recipes

| Recipe | Description |
|---|---|
| glm_5_hellaswag_pp.yaml | SFT: GLM-5 with EP=64, PP=4 on 32 nodes |
| glm_5.1_hellaswag_pp.yaml | SFT: GLM-5.1 with EP=64, PP=4 on 32 nodes |

Parallel Setup

Both recipes scale training with Expert Parallelism and Pipeline Parallelism: EP=64 and PP=4 across 32 nodes of 8× H100 GPUs (256 GPUs in total).

distributed:
  strategy: fsdp2
  tp_size: 1
  cp_size: 1
  pp_size: 4
  ep_size: 64
  sequence_parallel: false
  activation_checkpointing: true
  pipeline:
    pp_schedule: interleaved1f1b
    pp_microbatch_size: 1
    round_virtual_stages_to_pp_multiple: down
    scale_grads_in_schedule: false
    patch_inner_model: false
    patch_causal_lm_model: false
    layers_per_stage: 2
  moe:
    reshard_after_forward: false
    wrap_outer_model: false
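As a sanity check on these sizes, the following sketch derives the GPU counts the configuration implies. It is a back-of-the-envelope calculation, not NeMo AutoModel code: the `parallel_layout` helper is hypothetical, and it assumes that the data-parallel size is the world size divided by `tp_size * cp_size * pp_size` and that routed experts are split evenly across the expert-parallel ranks.

```python
def parallel_layout(nodes: int, gpus_per_node: int, tp_size: int, cp_size: int,
                    pp_size: int, ep_size: int, n_routed_experts: int) -> dict[str, int]:
    """Hypothetical helper: derive group sizes from the recipe's parallel settings."""
    world_size = nodes * gpus_per_node

    # The model-parallel dimensions must tile the cluster exactly.
    model_parallel = tp_size * cp_size * pp_size
    if world_size % model_parallel != 0:
        raise ValueError("tp_size * cp_size * pp_size must divide the world size")

    # Assumption: experts are distributed evenly over the expert-parallel ranks.
    if n_routed_experts % ep_size != 0:
        raise ValueError("ep_size must divide the number of routed experts")

    return {
        "world_size": world_size,
        "ranks_per_pipeline_stage": world_size // pp_size,
        "data_parallel_size": world_size // model_parallel,
        "experts_per_ep_rank": n_routed_experts // ep_size,
    }


layout = parallel_layout(nodes=32, gpus_per_node=8, tp_size=1, cp_size=1,
                         pp_size=4, ep_size=64, n_routed_experts=256)
for name, value in layout.items():
    print(f"{name}: {value}")
```

With the values above, each of the 4 pipeline stages spans 64 GPUs, which matches `ep_size: 64`, and each expert-parallel rank would hold 4 of the 256 routed experts per MoE layer.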

Try with NeMo AutoModel

1. Install the package (see the Installation Guide for full instructions):

$ pip install nemo-automodel

2. Clone the repo to get the example recipes:

$ git clone https://github.com/NVIDIA-NeMo/Automodel.git
$ cd Automodel

This recipe was validated on 32 nodes × 8 GPUs (256 H100s). See the Launcher Guide for multi-node setup.

3. Run the recipe from inside the repo:

$ automodel --nproc-per-node=8 examples/llm_finetune/glm/glm_5_hellaswag_pp.yaml

Alternatively, run the recipe inside the NeMo AutoModel container:

1. Pull the container and mount a checkpoint directory:

$ docker run --gpus all -it --rm \
    --shm-size=8g \
    -v $(pwd)/checkpoints:/opt/Automodel/checkpoints \
    nvcr.io/nvidia/nemo-automodel:26.06.00

2. Navigate to the AutoModel directory (where the recipes are):

$ cd /opt/Automodel

3. Run the recipe:

$ automodel --nproc-per-node=8 examples/llm_finetune/glm/glm_5_hellaswag_pp.yaml

See the Installation Guide and LLM Fine-Tuning Guide.

Fine-Tuning

See the LLM Fine-Tuning Guide and the Large MoE Fine-Tuning Guide.

Hugging Face Model Cards

  • https://huggingface.co/zai-org/GLM-5
  • https://huggingface.co/zai-org/GLM-5.1