Mixtral
Mixtral is Mistral AI’s sparse Mixture-of-Experts (MoE) model series. In each MoE layer a router sends every token to a small subset of experts (2 of 8), so the model carries a large total parameter count while per-token compute scales only with the active parameters.
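The routing idea above can be sketched in a few lines of plain Python. This is a toy illustration and not the NeMo AutoModel or Hugging Face implementation: the function names `top_k_route` and `moe_layer`, the scalar "experts", and the logit values are all hypothetical, and it assumes the gate weights are a softmax taken over the selected experts' logits.

```python
import math

def top_k_route(router_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    # Pick the k experts with the largest router logits.
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(x, router_logits, experts, k=2):
    """Weighted sum of the selected experts' outputs; other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Toy setup (hypothetical): 8 scalar "experts", each scaling its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.3, 1.2]

routes = top_k_route(logits)
output = moe_layer(3.0, logits, experts)
print(routes)   # two (index, weight) pairs; the weights sum to 1
print(output)
```

Only the two selected experts are evaluated for the token, which is why per-token compute tracks the active parameters and not the total.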
Available Models
- Mixtral-8x7B: 8 experts, 2 active per token (~13B active of ~47B total)
- Mixtral-8x7B-Instruct: instruction-tuned variant
- Mixtral-8x22B: 8 experts, 2 active per token (~39B active of ~141B total)
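The active-parameter figure for Mixtral-8x7B can be reproduced with back-of-the-envelope arithmetic. The sketch below is illustrative only: the helper `mixtral_param_counts` is hypothetical, and the dimensions passed to it (hidden size 4096, expert FFN size 14336, 32 layers, 32 attention heads, 8 key/value heads, vocabulary 32000) are assumptions taken from the publicly released Mixtral-8x7B configuration, not from this page. It also assumes untied input and output embeddings and gated (three-matrix) expert MLPs.

```python
def mixtral_param_counts(hidden, ffn, layers, heads, kv_heads, vocab,
                         num_experts, experts_per_token):
    """Return (total, active) parameter counts for a Mixtral-style decoder."""
    head_dim = hidden // heads
    # Attention: Q and O are hidden x hidden; K and V use the smaller
    # grouped-query width kv_heads * head_dim.
    attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim
    # One expert is a gated MLP with three hidden x ffn matrices.
    expert = 3 * hidden * ffn
    router = hidden * num_experts   # linear gate producing one logit per expert
    norms = 2 * hidden              # two RMSNorm weight vectors per layer
    # Parameters every token passes through, regardless of routing.
    shared = (layers * (attn + router + norms)
              + 2 * vocab * hidden  # input embeddings + output head
              + hidden)             # final norm
    total = shared + layers * num_experts * expert
    active = shared + layers * experts_per_token * expert
    return total, active

total_params, active_params = mixtral_param_counts(
    hidden=4096, ffn=14336, layers=32, heads=32, kv_heads=8, vocab=32000,
    num_experts=8, experts_per_token=2,
)
print(f"total ~ {total_params / 1e9:.1f}B, active ~ {active_params / 1e9:.1f}B")
# prints: total ~ 46.7B, active ~ 12.9B
```

Because attention, embeddings, and norms are shared by every token, the total is well below 8 x 7B and the active count is more than a quarter of it; only the expert MLPs are divided by the 2-of-8 routing.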
Architecture
MixtralForCausalLM
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install NeMo AutoModel (see the full instructions):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
Run with Docker
1. Pull the container and mount a checkpoint directory:
2. Navigate to the AutoModel directory (where the recipes are):
3. Run the recipe:
For details, see the Installation Guide and the LLM Fine-Tuning Guide.
Fine-Tuning
See the LLM Fine-Tuning Guide and the Large MoE Fine-Tuning Guide.