Moonlight
Moonlight is a Mixture-of-Experts (MoE) language model from Moonshot AI, trained with the Muon optimizer. It uses the DeepseekV3ForCausalLM architecture, with 16B total parameters and 3B activated per token.
Available Models
- Moonlight-16B-A3B: 16B total parameters, 3B activated per token
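To put the two parameter counts in perspective, the short sketch below computes the fraction of the model that is active for each token and gives a rough estimate of weight memory. It uses only the rounded figures quoted on this page. The 2-bytes-per-parameter figure assumes bf16 weights, which is an assumption for illustration and not a statement about how the checkpoint is stored.

```python
# Rounded parameter counts as quoted on this page, in billions.
TOTAL_PARAMS_B = 16
ACTIVE_PARAMS_B = 3

# Fraction of the parameters that take part in computing one token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Active per token: {active_fraction:.2%} of all parameters")

# Assumption: weights held in bf16 (2 bytes per parameter).
# All experts usually stay resident in memory, so the total count
# drives memory use even though only a subset is active per token.
BYTES_PER_PARAM = 2
weight_gb = TOTAL_PARAMS_B * 1e9 * BYTES_PER_PARAM / 1e9
print(f"Approximate weight memory: {weight_gb:.0f} GB (weights only)")
```

The estimate covers weights only. Optimizer state, gradients, activations and the KV cache add to it, so treat the number as a lower bound when sizing hardware.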
Architecture
DeepseekV3ForCausalLM (same architecture as DeepSeek-V3)
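The phrase "activated per token" refers to expert routing: for each token, a router scores the experts and only a few of them run. The sketch below shows generic softmax top-k routing in plain Python to illustrate the idea. The function name `top_k_route`, the expert count and the value of `k` are hypothetical, and the actual router in DeepseekV3ForCausalLM may score and select experts differently.

```python
import math
import random


def top_k_route(router_logits, k):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    # Softmax over all experts (subtract the max for numerical stability).
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the k most probable experts and renormalize their weights to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


# Illustrative values only, not Moonlight's real configuration.
NUM_EXPERTS, TOP_K = 8, 2
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

weights = top_k_route(logits, TOP_K)
print(weights)  # only TOP_K of the NUM_EXPERTS experts receive this token
```

Because only the selected experts run, per-token compute scales with the activated parameter count while the total parameter count sets the model's capacity.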
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (full instructions):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
Run with Docker
1. Pull the container and mount a checkpoint directory:
2. Navigate to the AutoModel directory (where the recipes are):
3. Run the recipe:
See the Installation Guide and LLM Fine-Tuning Guide.
Fine-Tuning
See the LLM Fine-Tuning Guide.