Qwen3-VL / Qwen3-VL-MoE
Qwen3-VL / Qwen3-VL-MoE
Qwen3-VL is Alibaba Cloud’s third-generation vision language model series. The MoE variant activates a fraction of parameters per token for efficient large-scale inference.
Available Models
- Qwen3-VL-8B-Instruct: 8B
- Qwen3-VL-4B-Instruct: 4B
- Qwen3-VL-MoE-30B: 30B total (MoE)
- Qwen3-VL-MoE-235B: 235B total (MoE)
Architecture
Qwen3VLForConditionalGeneration
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (full instructions):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
Run with Docker
1. Pull the container and mount a checkpoint directory:
2. Navigate to the AutoModel directory (where the recipes are):
3. Run the recipe:
See the Installation Guide and VLM Fine-Tuning Guide.
Fine-Tuning
See the VLM Fine-Tuning Guide.