Qwen3 MoE
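Qwen3 MoE is the Mixture-of-Experts (MoE) variant of the Qwen3 series from Alibaba Cloud. Each token activates only a small fraction of the model's parameters (roughly 10% for the models listed below), which keeps per-token compute low and makes large-scale training more efficient.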
Available Models
- Qwen3-30B-A3B: 30B total parameters, 3B activated per token
- Qwen3-235B-A22B: 235B total parameters, 22B activated per token
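To make the "total vs. activated" distinction concrete, the short sketch below computes the fraction of parameters that is active per token for each model. The counts are the rounded figures from the list above; the `MODELS` table and the `active_fraction` helper are illustrative names, not part of any library.

```python
# Parameter counts in billions (total, activated per token),
# copied from the model list above. Rounded figures, not exact counts.
MODELS = {
    "Qwen3-30B-A3B": (30, 3),
    "Qwen3-235B-A22B": (235, 22),
}


def active_fraction(total_b, active_b):
    """Fraction of the model's parameters used for a single token."""
    return active_b / total_b


for name, (total_b, active_b) in MODELS.items():
    frac = active_fraction(total_b, active_b)
    print(f"{name}: {frac:.1%} of parameters active per token")

# Qwen3-30B-A3B: 10.0% of parameters active per token
# Qwen3-235B-A22B: 9.4% of parameters active per token
```

Note that all parameters still have to be stored and sharded across devices; only the per-token compute scales with the activated count.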
Architecture
Qwen3MoeForCausalLM
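As a rough illustration of how an MoE layer activates only some of its parameters per token, here is a minimal pure-Python sketch of generic top-k expert routing. It is a toy under stated assumptions, not the Qwen3MoeForCausalLM implementation: the function names (`top_k_route`, `moe_forward`), the scalar "experts", and the renormalization of the selected weights are all illustrative choices.

```python
import math


def top_k_route(router_logits, k):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    # Softmax over all experts (subtract the max for numerical stability).
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k most probable experts for this token.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the selected weights sum to 1 (an assumption of this sketch).
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, router_logits, experts, k):
    """Weighted sum of the selected experts' outputs; other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Hypothetical toy experts: each one just scales its input by a constant.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print([i for i, _ in routes])  # [1, 3]
print(moe_forward(1.0, [0.1, 2.0, -1.0, 1.5], experts, k=2))
```

With four experts and `k=2`, only two expert functions are evaluated for the token. In a real model each expert is a feed-forward network and the router logits come from a learned linear layer applied to the token's hidden state.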
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (full instructions):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
Run with Docker
1. Pull the container and mount a checkpoint directory:
2. Navigate to the AutoModel directory (where the recipes are):
3. Run the recipe:
See the Installation Guide and the LLM Fine-Tuning Guide.
Fine-Tuning
See the LLM Fine-Tuning Guide and the Large MoE Fine-Tuning Guide.