DeepSeek V4#
DeepSeek-V4 is the next-generation Mixture-of-Experts language model from DeepSeek-AI. It extends the V3 design with Hyper-Connections (mHC) for multi-stream residual mixing, Compressed Sparse Attention (CSA) with a learned token-importance indexer (DSA), hash-routed MoE layers for the first few decoder blocks, and a refined Multi-Token Prediction (MTP) head with separate e_proj / h_proj projections.
DeepSeek V4 models are supported via the Bridge system with auto-detected configuration and weight mapping.
Model Architecture Features#
Hybrid Attention (DSv4HybridSelfAttention): Per-layer mix of dense MLA and Compressed Sparse Attention selected by
compress_ratiosCompressed Sparse Attention (CSA) with DSA Indexer: Top-k token selection over windowed keys;
index_n_heads,index_head_dim,index_topkcontrol the indexerHyper-Connections (mHC): 4-stream residual mixing per layer (
hc_mult = 4) with sinkhorn-iterated attention; per-MTP-layerhc_head_*learns output contractionHash-Routed MoE: First few decoder layers use a deterministic vocab → expert mapping (
tid2eid) instead of softmax routingMulti-Token Prediction (MTP): One MTP layer with separate
e_projandh_projprojections (post-MCore #4518)YaRN RoPE:
rotary_scaling_factor=16,original_max_position_embeddings=65536;mscale=mscale_all_dim=1.0for V4Sigmoid Gating with Expert Bias:
noaux_tcload balancing,sqrtsoftplusscoring, expert bias enabledo_groupsOutput Projection:o_lora_ranklow-rank output projection split intoo_groupsparallel groups
Examples, Parallelism, and Limitations#
For checkpoint conversion and inference scripts, recommended parallelism settings, and current known limitations, see the DeepSeek V4 examples README.
Hugging Face Model Cards & References#
Hugging Face Model Cards#
DeepSeek-V4-Flash: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
DeepSeek-V4-Flash-Base: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-Base
DeepSeek-V4-Pro: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
DeepSeek-V4-Pro-Base: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-Base
Additional Resources#
GitHub Repository: https://github.com/deepseek-ai/DeepSeek-V4