DeepSeek V4 Flash
DeepSeek V4 Flash is DeepSeek’s latest fine-grained Mixture-of-Experts (MoE) language model. Its 43-layer backbone is all-MoE: every block has 256 routed experts plus one shared expert, with top-6 routing. Attention is chosen per layer from three variants (SWA / CSA / HCA) through compress_ratios. The first num_hash_layers blocks use a hash-clustering gate, and every block maintains hc_mult=4 Hyper-Connection streams that are mixed by a learned col-norm-first Sinkhorn router.
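The two routing ideas in the paragraph above can be illustrated with a small, framework-free sketch. It is not the model's implementation. The expert count, top-k value, and stream count come from the description; the function names, the random stand-in scores, the sum-based renormalization, the reading of "col-norm-first" as "normalize columns before rows in each iteration", and the iteration count are all assumptions made for illustration. The hash-clustering gate and the shared expert (which is not part of the top-k selection) are left out.

```python
import math
import random

NUM_ROUTED_EXPERTS = 256  # from the model description
TOP_K = 6                 # from the model description
HC_MULT = 4               # number of Hyper-Connection streams, from the description


def route_top_k(scores, k=TOP_K):
    """Select the k highest-scoring routed experts for one token.

    Assumption: the selected scores are renormalized to sum to 1. The real
    gate's scoring function and scaling are not specified here.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    return [(i, scores[i] / total) for i in top]


def sinkhorn_col_first(logits, n_iters=30):
    """Push exp(logits) toward a doubly stochastic matrix.

    Assumption: each iteration normalizes columns first, then rows.
    n_iters is a hypothetical setting, not a documented default.
    """
    n = len(logits)
    m = [[math.exp(v) for v in row] for row in logits]
    for _ in range(n_iters):
        # Column step: divide every entry by the sum of its column.
        col_sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
        m = [[m[r][c] / col_sums[c] for c in range(n)] for r in range(n)]
        # Row step: divide every entry by the sum of its row.
        m = [[v / sum(row) for v in row] for row in m]
    return m


rng = random.Random(0)

# Stand-in gate scores for one token (positive values, one per routed expert).
scores = [rng.random() for _ in range(NUM_ROUTED_EXPERTS)]
selected = route_top_k(scores)

# Stand-in mixing logits for the hc_mult x hc_mult stream router.
mix = sinkhorn_col_first(
    [[rng.uniform(-1.0, 1.0) for _ in range(HC_MULT)] for _ in range(HC_MULT)]
)

print("selected experts:", [i for i, _ in selected])
print("row sums:", [round(sum(row), 6) for row in mix])
```

Each token ends up with six (expert index, weight) pairs whose weights sum to 1, and the 4x4 mixing matrix has rows and columns that each sum to approximately 1, so mixing the streams roughly preserves their total weight.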
Available Models
- DeepSeek-V4-Flash
Architecture
DeepseekV4ForCausalLM
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (full instructions):
2. Clone the repo to get the example recipes:
The full 43-layer schedule requires a multi-node run. See the header of the recipe YAML for ep_size / pp_size guidance, and the Launcher Guide for multi-node setup.
3. Run the recipe from inside the repo:
Run with Docker
1. Pull the container and mount a checkpoint directory:
2. Navigate to the AutoModel directory (where the recipes are):
3. Run the recipe:
See the Installation Guide and LLM Fine-Tuning Guide.
Fine-Tuning
See the Fine-Tune DeepSeek V4 Flash guide and the Large MoE Fine-Tuning Guide.