Hy3 (HunyuanLarge)
Hy3-preview is a 295B-parameter Mixture-of-Experts (MoE) language model from Tencent. It has 80 transformer layers: layer 0 is dense and layers 1–79 are MoE. Each MoE layer uses 192 routed experts plus 1 shared expert with top-8 sigmoid routing, and the gate carries an e_score_correction_bias buffer for expert-load correction. Attention is Grouped Query Attention (64 Q / 8 KV heads) with per-head QK RMSNorm and RoPE. The model supports a 256K-token context window.
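To make the routing description concrete, here is a minimal pure-Python sketch of top-k sigmoid routing with a correction bias for a single token. It is an illustration, not the model's actual gate code. The function name `route_token` is hypothetical. The sketch also assumes two things this page does not state: that the bias affects only which experts are selected, and that the final weights are the unbiased sigmoid scores renormalized to sum to 1.

```python
import math


def route_token(logits, correction_bias, top_k=8):
    """Select top_k experts for one token; return (expert_indices, weights).

    Hypothetical helper for illustration only.
    """
    if len(logits) != len(correction_bias):
        raise ValueError("logits and correction_bias must have the same length")
    if top_k > len(logits):
        raise ValueError("top_k cannot exceed the number of experts")

    # Sigmoid scores each expert independently in (0, 1), unlike softmax,
    # which makes the experts compete for a shared probability mass.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]

    # Assumption: the correction bias only shifts the selection ranking,
    # nudging under-used experts into the top-k.
    biased = [s + b for s, b in zip(scores, correction_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]

    # Assumption: mixing weights come from the unbiased scores,
    # renormalized over the selected experts.
    total = sum(scores[i] for i in chosen)
    weights = [scores[i] / total for i in chosen]
    return chosen, weights


# Usage: 192 routed experts, one strong logit, one expert boosted by the bias.
logits = [0.0] * 192
logits[5] = 4.0
bias = [0.0] * 192
bias[7] = 1.0  # pretend expert 7 has been under-used
experts, weights = route_token(logits, bias)
```

In this example expert 7 is selected first because of its bias, but it receives a smaller weight than expert 5, whose unbiased score is higher. The shared expert is not part of this selection; it would run for every token regardless of the gate.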
Available Models
- Hy3-preview: 295B total parameters, with the top-8 routed experts activated per token
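The figures quoted on this page imply a few derived quantities, worked out in the short sketch below. The function names `derived_stats` and `kv_head_for` are hypothetical. The sketch assumes that the shared expert runs for every token and that consecutive query heads share a KV head, which is the common Grouped Query Attention convention but is not confirmed here.

```python
# Values taken from the model description on this page.
NUM_LAYERS = 80
DENSE_LAYERS = 1        # layer 0
ROUTED_EXPERTS = 192
SHARED_EXPERTS = 1
TOP_K = 8
Q_HEADS = 64
KV_HEADS = 8


def derived_stats():
    """Return quantities implied by the published configuration."""
    return {
        # Layers 1-79 are MoE.
        "moe_layers": NUM_LAYERS - DENSE_LAYERS,
        # Assumption: the shared expert is always active alongside the top-8.
        "experts_run_per_token_per_moe_layer": TOP_K + SHARED_EXPERTS,
        # Share of the routed experts that a single token touches in a layer.
        "routed_fraction": TOP_K / ROUTED_EXPERTS,
        # Query heads that share each KV head under GQA.
        "q_heads_per_kv_head": Q_HEADS // KV_HEADS,
    }


def kv_head_for(q_head):
    """Map a query head index to its KV head (assumed contiguous grouping)."""
    if not 0 <= q_head < Q_HEADS:
        raise ValueError("q_head out of range")
    return q_head // (Q_HEADS // KV_HEADS)


stats = derived_stats()
```

Each token therefore touches 8 of the 192 routed experts in a MoE layer, roughly 4% of them, which is why the active parameter count per token is much smaller than the 295B total.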
Architectures
- HYV3ForCausalLM
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (NeMo AutoModel):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
See the NeMo AutoModel Installation Guide and LLM Fine-Tuning Guide.
Fine-Tuning
See the LLM Fine-Tuning Guide and the Large MoE Fine-Tuning Guide.