GLM-5 / GLM-5.1 (MoE + DSA)#
GLM-5 and GLM-5.1 are Zhipu AI’s latest open-source large Mixture-of-Experts models featuring a DeepSeek-style MLA (Multi-head Latent Attention) + DSA (Dynamic Sparse Attention) architecture. GLM-5.1 shares the glm_moe_dsa architecture with GLM-5, with updated weights.
Task |
Text Generation (MoE) |
Architecture |
|
Parameters |
256 routed experts, 8 active per token |
HF Org |
Key Features#
Mixture of Experts (MoE): 256 routed experts with 8 active per token
78 layers, hidden size 6144, with MLA using KV compression (kv_lora_rank=512) and head_dim=64
~200k context window (max_position_embeddings=202,752)
3 dense layers followed by MoE layers (first_k_dense_replace=3)
Available Models#
GLM-5 (
GlmMoeDsaForCausalLM)GLM-5.1 (
GlmMoeDsaForCausalLM): updated weights
Example HF Models#
Model |
HF ID |
|---|---|
GLM-5 |
|
GLM-5.1 |
Example Recipes#
Recipe |
Description |
|---|---|
SFT — GLM-5 with EP=64, PP=4 on 32 nodes |
|
SFT — GLM-5.1 with EP=64, PP=4 on 32 nodes |
Parallel Setup#
The recipe scales training using Expert Parallelism and Pipeline Parallelism (EP=64, PP=4 across 32 nodes of 8× H100 GPUs).
distributed:
strategy: fsdp2
tp_size: 1
cp_size: 1
pp_size: 4
ep_size: 64
sequence_parallel: false
activation_checkpointing: true
pipeline:
pp_schedule: interleaved1f1b
pp_microbatch_size: 1
round_virtual_stages_to_pp_multiple: down
scale_grads_in_schedule: false
patch_inner_model: false
patch_causal_lm_model: false
layers_per_stage: 2
moe:
reshard_after_forward: false
wrap_outer_model: false
Try with NeMo AutoModel#
1. Install (full instructions):
pip install nemo-automodel
2. Clone the repo to get the example recipes:
git clone https://github.com/NVIDIA-NeMo/Automodel.git
cd Automodel
Note
This recipe was validated on 32 nodes × 8 GPUs (256 H100s). See the Launcher Guide for multi-node setup.
3. Run the recipe from inside the repo:
automodel --nproc-per-node=8 examples/llm_finetune/glm/glm_5_hellaswag_pp.yaml
Run with Docker
1. Pull the container and mount a checkpoint directory:
docker run --gpus all -it --rm \
--shm-size=8g \
-v $(pwd)/checkpoints:/opt/Automodel/checkpoints \
nvcr.io/nvidia/nemo-automodel:26.02.00
2. Navigate to the AutoModel directory (where the recipes are):
cd /opt/Automodel
3. Run the recipe:
automodel --nproc-per-node=8 examples/llm_finetune/glm/glm_5_hellaswag_pp.yaml
See the Installation Guide and LLM Fine-Tuning Guide.
Fine-Tuning#
See the LLM Fine-Tuning Guide and the Large MoE Fine-Tuning Guide.